
The Oxford Handbook of

Perceptual Organization

Edited by

Johan Wagemans

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Oxford University Press 2015
The moral rights of the authors have been asserted
First Edition published in 2015
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2014955474
ISBN 978–0–19–968685–8
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Oxford University Press makes no representation, express or implied, that the
drug dosages in this book are correct. Readers must therefore always check
the product information and clinical procedures with the most up-to-date
published product information and data sheets provided by the manufacturers
and the most recent codes of conduct and safety regulations. The authors and
the publishers do not accept responsibility or legal liability for any errors in the
text or for the misuse or misapplication of material in this work. Except where
otherwise stated, drug dosages and recommendations are for the non-pregnant
adult who is not breast-feeding
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Foreword
Stephen E. Palmer

The topic of perceptual organization typically refers to the problems of how visual information is structured into qualitatively distinct elements over time and space during the process of perceiving and how that structuring influences the visual properties observers experience.
Corresponding work on analogous topics in other sensory modalities is also an active area of
research (see Section 7), but the vast majority of the literature concerns perceptual organization
in vision (as reflected in the rest of the volume). If one grants that the smallest, lowest-level visual
elements are likely to be the outputs of retinal receptors and that the largest, highest level ele-
ments are the consciously experienced, meaningful environmental scenes and events that human
observers use to plan and execute behaviors in their physical and social environments, then the
fundamental question of perceptual organization is nothing less than this: how does the visual
system manage to get from locally meaningless receptor outputs to globally meaningful scenes
and events in the observer’s perceived environment? When stated in this way, the field of percep-
tual organization encompasses most of human perception, including the perception of groups,
patterns, and textures (Section 2), contours and shapes (Section 3), figures, grounds, and depth
(Section 4), surfaces and colors (Section 5), motion and events (Section 6), as well as analogous
issues in other sensory modalities (Section 7). (The present volume also includes two further
sections on topics that have evolved from the material covered in Sections 2–7, one on specialized topics (Section 8) and another on practical applications (Section 9).) Indeed, nearly the only
aspects of perception typically excluded from discussions of perceptual organization are very low-
level sensory processing (such as detecting lines and edges) and very high-level pattern recogni-
tion (such as recognizing objects and scenes). This division has led to a somewhat unfortunate
and uninformative classification of vision into low-level, mid-level, and high-level processing,
with perceptual organization being identified with mid-level processing: essentially, whatever is
left over between basic sensory processing and pattern recognition of known objects and scenes.
Even so, some topics are more closely associated with the field of perceptual organization than
others, and the ones represented in this volume constitute an excellent sample of those topics.
Perceptual organization not only spans a wide array of empirical phenomena in human vision,
but the approaches to understanding it encompass four distinct, but tightly interrelated domains:
phenomenology, physiology, ecology, and computation. Phenomenology concerns the conscious
appearance of the visible world, seeking to answer questions about the structural units of visual
experience (e.g., regions, surfaces, and volumetric objects) and the properties people experience
as defining them (e.g., their colors, shapes, sizes and positions). Physiology (i.e., neuroscience)
concerns how neural events in the brain produce these experiences of perceived elements and
properties, addressing the problem of how the brain achieves that organization of visual experi-
ences. Ecology concerns the relation between observers and their environments (including physi-
cal, social, and cultural aspects), attempting to determine why the world is experienced in terms
of these units rather than others and why the brain processes the corresponding sensory informa-
tion in the way it does. Computation concerns formal theories of how perceptual organization
might be achieved by the processing of information at a more abstract level than that of physi-
ological mechanisms in the brain. Computation thus provides a theoretical interlingua in which
the other three domains can potentially be related to each other. All four domains are crucial in
understanding perceptual organization and are mentioned throughout this volume. They are also
addressed quite explicitly in the final, theoretical section (Section 10).
The topic of perceptual organization in vision has a fascinating, roller-coaster history that is
relevant to understanding the field. Until the late 19th and early 20th centuries, organizational
issues in vision, at least as they are currently considered, were virtually nonexistent. The reason
is that the dominant theoretical paradigm in 18th century philosophy came from British empiricists, such as Locke, Berkeley, and Hume, who proposed that high-level perceptions arose from
a mechanistic, associative process in which low-level sensory atoms — i.e., primitive, indivisible,
basic elements (akin to the outputs of retinal receptors) — evoked other sensory atoms that were
linked together in memory due to repeated prior joint occurrences. The result of these activated
associations, they believed, was the perception of meaningful objects and scenes. This atomistic,
associative view, which became known as “Structuralism” in the hands of 19th century psycholo-
gists, such as Wundt and Titchener, includes no interesting role for structure between low-level
sensory atoms and high-level perceptions, as if the latter arose from unstructured concatenations
(or “summative bundles”) of the appropriate sensory atoms.
The theoretical landscape became more interesting in the late 19th century with the develop-
ment of philosophical phenomenology (see Chapter 2), in which the structure of internal experi-
ences was ascribed a much more important role. Phenomenologists, such as Brentano, Husserl,
and Merleau-Ponty, analyzed the subjective organization and content of internal experiences (i.e.,
the appearance of perceptual objects) into a sophisticated taxonomy of parts and wholes. The
development of such ideas in the hands of philosophers and early psychologists eventually led
to the seminal singularity in the history of perceptual organization: the advent of the Gestalt
revolution in the early 20th century. “Gestalt” is a German word that can roughly be translated
as “whole-form” or “configuration,” but its meaning as the name for this school of psychology
goes considerably beyond such superficial renderings because of its deep theoretical implications.
Gestalt psychology was nothing less than a revolutionary movement that advocated the over-
throw of Structuralism’s theoretical framework, undermining the assumptions of both atomism
and associationism. Following important earlier work by von Ehrenfels on the emergent quali-
ties of melodies, Gestalt psychologists, most notably including Wertheimer, Köhler and Koffka,
argued forcefully against the Structuralist views of Wundt and his followers, replacing their claims
about atomism and associationism with the opposing view that high-level percepts have intrinsic
emergent structure in which wholes are primary and parts secondary, the latter being determined
by their relations to and within the whole. This viewpoint is often expressed through the well-
known Gestalt rallying cry that “the whole is different from the sum of its parts.” Indeed, it was
only when the Gestaltists focused attention on the nature and importance of part-whole organiza-
tion that it was recognized as a significant problem for the scientific understanding of vision. It is
now a central – though not yet well understood – topic, acknowledged by virtually all perceptual
scientists. The historical evolution of the Gestalt approach to perceptual organization is described
in scholarly detail in Chapter 1.
Gestalt psychologists succeeded in demolishing the atomistic, associative edifice of
Structuralism through a series of profound and elegant demonstrations of the importance of
organization in visual perception. Indeed, these demonstrations, which Koenderink (Chapter 3)
calls “compelling visual proofs,” were so clear and definitive that they required only a solid
consensus about the subjective experiences of perceivers when viewing the examples, usually
without reporting quantitative measurements. Their success is evident in the fact that many
of these initial demonstrations of organizational phenomena have spawned entire fields of
subsequent research in which more sophisticated, objective, and quantitative research meth-
ods have been developed and employed (see Chapter 3). Indeed, the primary topic of this
handbook is the distillation of current, cutting-edge knowledge about the phenomenologi-
cal, physiological, ecological, and computational aspects of perceptual organization that have
been achieved using these modern methods.
Research on the initial organizational phenomena discovered by Gestalt psychologists, such as
grouping (Chapter 4), apparent motion (Chapter 23), and other forms of organization in motion
and depth (Chapter 25), got off to a quick start, impelled largely by their crucial role in undermin-
ing the Structuralist dogma that held sway during the early 20th century, especially in Europe. (The
Gestalt approach was not as successful in the US, largely because American psychology was mired
in theoretical and methodological Behaviorism.) Indeed, Gestalt theorists advanced some claims
about alternatives to Structuralism that were quite radical. Among them were Köhler’s claims
that the brain is a “physical Gestalt” and that it achieves perception through electrical brain fields
that interact dynamically to minimize physical energy. Gestalt theorizing encountered resistance
partly because it went against the accepted consensus that science makes progress by analyzing
complex entities into more elementary constituents and the interactions among them, a claim
explicitly rejected by Gestalt theorists. More importantly, however, acceptance of Gestalt theory
plummeted when Köhler’s electrical field hypothesis was tested physiologically and found to be
inconsistent with the results (see Chapter 1 for details).
The wholesale rejection of Gestalt ideas that followed was an unfortunate example of throwing
the baby out with the bathwater. The poorly understood problem is that Gestalt theory was (and
is) much more general and abstract than Köhler’s electrical field theory or indeed any other par-
ticular implementation of it (see Palmer, 2009, for further explanation). For example, one of the
most central tenets of Gestalt theory is the principle of Prägnanz (or simplicity), which claims that
the organization of the percept that is achieved will be the simplest one possible given the available
stimulation. That is, the visual system attempts both to maximize the “goodness-of-fit” between
the sensory data and the perceptual interpretation and to minimize the perceptual interpretation’s
complexity (see Chapters 50 and 51). Köhler identified complexity with the energy of the electri-
cal brain field, which tends naturally toward a minimum in dynamic interaction within a physical
Gestalt system, which he claimed the brain to be. It is tempting to suppose that if electrical field
theory is incorrect, as implied by the results of experiments, then Gestalt theory in general must
be incorrect. However, subsequent analyses have shown, for example, that certain classes of neural
networks with feedback loops exhibit behavior that is functionally isomorphic to that of energy
minimization in electrical fields. If perception is achieved by activity in such recurrent networks
of neurons, then Gestalt theory would be vindicated, even though Köhler’s electrical field conjec-
ture was incorrect.
An equally important factor in the stagnation of research on perceptual organization was the
advent of World War II, which turned attention and resources away from scientific enterprises
unrelated to the war effort and sent many prominent German Gestaltists into exile in the US. The
Gestalt movement retained a significant prominence in Italy, however, where psychologists such
as Musatti, Metelli, and Kanizsa kept the tradition alive and made significant discoveries concern-
ing the perception of transparency (Chapters 20 and 22) and contours (Chapters 10–12). Other
important findings about perceptual organization were made by Michotte (in Leuven, Belgium),
whose analysis of the perception of causality challenged the long-held philosophical belief that
causality was cognitively inferred rather than directly perceived. These and other contributions to
the phenomena of perceptual organization kept the field alive, but the period from the 1940s to
the 1960s was a nadir for research in this field.
A variety of forces have converged since the 1960s to revitalize interest in perceptual organization and bring it into the mainstream of the emerging field of vision science. One was the use of modern, quantitative methods to understand and extend classic Gestalt phenomena. These include
both direct psychophysical measures of organization (e.g., verbal reports of grouping) and visual
features (e.g., surface lightness) and indirect measures of performance in objective tasks (e.g.,
reaction time measures of interference effects). Among the many important examples of such
research are Wallach’s and Gilchrist’s contributions to understanding lightness constancy, Rock’s
work on reference frames in shape perception, Palmer’s studies of new grouping principles and
measures, Kubovy’s quantitative laws for integrating multiple grouping principles, Peterson’s
exploration of the role of past experience in figure-ground organization, Navon’s work on global
precedence, and Pomerantz’s research into configural superiority effects. Such empirical findings
intrigued a new generation of vision scientists, who failed to find low-level sensory explanations
of them – hence the invention of the term “mid-level vision.” A second force was the healthy desire
to shore up the foundations of Gestalt theory by formalizing and quantifying the Gestalt principle
of Prägnanz. This enterprise was advanced considerably by seminal contributions from Attneave,
Hochberg, Garner, Leeuwenberg, van der Helm, and others who applied concepts from informa-
tion theory and complexity theory to phenomena of perceptual organization. A third force that
eventually began to have an effect was the study of the neural mechanisms of organization. Hubel
and Wiesel revolutionized sensory physiology by discovering that the receptive fields of neurons
in visual cortex corresponded to oriented line- and edge-based structures. Their results, and the explosion of physiological research that followed, are not generally discussed as part of the field of perceptual organization – rather, they are considered “low-level vision” – but they surely can be viewed that way, as they specify an early level of structure between retinal receptor outputs and
high-level perceptual interpretations. Subsequent neuroscientific research and theory by pioneers
such as von der Heydt, Lamme, von der Malsburg, and van Leeuwen addressed higher-level
structure involved in figure-ground organization, subjective (or illusory) contours, and grouping.
A fourth converging force was the idea that perception – indeed, all psychological processes –
could be modeled within an abstract computational framework. This hypothesis can ultimately be
traced back to Turing, but its application to issues of visual organization is perhaps most clearly
represented by Marr’s influential contributions, which attempted to bridge subjective phenom-
ena with ecological constraints and neural mechanisms through computational models. More
recently, Bayesian approaches to the problem of perceptual organization are having an increas-
ing impact on the field due in part to their generality and compatibility with hypotheses such as
Helmholtz’s likelihood principle and certain formulations of a simplicity principle. Many of the
theoretical discussions in this volume are couched in computational terms, and it seems almost
certain that computational theory will continue to loom large in future efforts to understand per-
ceptual organization.
The present volume brings together all of these diverse threads of empirical and theoretical
research on perceptual organization. It will rightly be considered a modern landmark in the com-
plex and rapidly evolving history of the field of perceptual organization. It follows and builds upon
two extensive scholarly review papers that were published exactly 100 years after Wertheimer’s
landmark 1912 article on the phi phenomenon that launched the Gestalt movement (see
Wagemans, Elder, Kubovy, Palmer, Peterson, Singh, & von der Heydt, 2012; Wagemans, Feldman, Gepshtein, Kimchi, Pomerantz, van der Helm, & van Leeuwen, 2012). The 51 scholarly chapters it contains are authored by world-renowned researchers and present comprehensive, state-of-the-art
reviews about how perceivers arrive at knowledge about meaningful external objects, scenes, and
events from the meaningless, ambiguous, piecemeal evidence registered by sensory receptors.
This perceptual feat is nothing short of a miracle, and although we do not yet understand how it
is accomplished, we know a great deal more than was known a century ago when the enterprise
began in earnest. This handbook is thus equally suitable for students who are just beginning to
explore the literature on perceptual organization and for experts who want definitive, up-to-date
treatments of topics with which they are already familiar. And it is, above all, a fitting tribute to the
founding of an important field of scientific knowledge that was born a century ago and the quite
remarkable progress scientists have made in understanding it during that time.
Stephen E. Palmer
Professor of the Graduate School
Psychology & Cognitive Science
University of California, Berkeley, CA
U.S.A.

References
Palmer, S. E. (2009). Gestalt theory. In T. Bayne, A. Cleeremans, & P. Wilken (Eds.), The Oxford Companion to Consciousness (pp. 327–330). Oxford, UK: Oxford University Press.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R.
(2012). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure–ground
organization. Psychological Bulletin, 138(6), 1172–1217.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P. A., & van Leeuwen,
C. (2012). A century of Gestalt psychology in visual perception: II. Conceptual and theoretical
foundations. Psychological Bulletin, 138(6), 1218–1252.
Preface

Perceptual organization is a central aspect of perception. Indeed, it is often considered as the interface between the low-level building blocks of incoming sensations and the high-level interpretation of these inputs as meaningful objects, scenes and events in the world. This is most obvious in the visual modality, where the features signalled by the neurons in low-level cortical areas
must be combined in order for the high-level areas to make sense of them. However, a similar
functionality of perceptual organization is also present in other modalities (e.g. audition and
haptics). In other words, for vision, perceptual organization is more or less synonymous with
mid-level vision. Mid-level vision is the two-way relay station between low-level and high-level
vision, referring to a wide range of processes such as perceptual grouping, figure-ground organi-
zation, filling-in, completion, and perceptual switching, amongst others. Such processes are most
notable in the context of shape perception but they also play a role in other areas including (but
not restricted to) texture perception, lightness perception, colour perception, motion perception, and depth perception. In summary, perceptual organization deals with a variety of perceptual phenomena of central interest. It is no wonder then that this lively area of research is studied from
many different perspectives, including psychophysics, experimental psychology, neuropsychology, neuroimaging, neurophysiology, and computational modelling. Given its central importance
in phenomenal experience, perceptual organization has also figured prominently in old Gestalt
writings on the topic, touching upon deep philosophical issues regarding mind-brain relation-
ships and consciousness. In addition to its historical importance, it still attracts a great deal of
interest from people working in the applied areas of visual art, design, architecture, and music.
The Oxford Handbook of Perceptual Organization brings together the different areas of con-
temporary research in the field of perceptual organization into one comprehensive and authorita-
tive volume. The handbook provides an extensive review of the current literature, written in an
accessible form for scholars and students, functioning as a reference work for many years to come.
The handbook is aimed primarily at researchers and students interested in perceptual organiza-
tion. The majority of this audience will be vision scientists, an interdisciplinary network of psy-
chologists, physicists, optometrists, ophthalmologists, neuroscientists, and engineers – all working
on vision. However, given the central importance of perceptual organization in the broader area
of sensation and perception, experimental and cognitive psychologists should be interested as
well. In addition, in view of the philosophical, historical, and cultural roots of the Gestalt tradition in which perceptual organization played a key role, some interest is to be expected from the humanities in addition to psychology. Finally, perceptual organization has recently become
a hot topic in computer vision and graphics, as well as in web design, art, and other applied areas.
Intellectuals from all kinds of disciplinary background will therefore find material in this hand-
book to trigger their curiosity.
Acknowledgements

Editing a handbook such as this is a serious undertaking. It has been high on my list of priorities
for over 3 years, from the first draft of the proposal to the writing of this paragraph. I was aided
in my initial steps by the helpful suggestions of many colleagues, including those who accepted
invitations to become members of the Scientific Advisory Board: Marlene Behrmann, Patrick
Cavanagh, Walter Gerbino, Glyn Humphreys, Stephen E. Palmer, and Pieter Roelfsema. I was
struck by the great level of enthusiasm I received from those I approached to write specific chap-
ters. Almost all accepted right away, and those who did not, explained how much they regret-
ted being unable to contribute due to other commitments. I thank everyone for tolerating my
persistence during the more difficult aspects of the editorial process, such as the coordination
of submissions, reviews, revisions, author proofs, and copyright forms. I would especially like to
thank all of the authors for their excellent contributions, and all of the reviewers (many of them
authors themselves or current and former postdoctoral collaborators) for the useful feedback and
specific suggestions for further improvements. A word of gratitude is in order for Martin Baum
(Senior Commissioning Editor for Psychology and Neuroscience at Oxford University Press),
for his enthusiasm and support throughout the whole process, from the very beginning to the
very end. I would also like to thank Charlotte Green (Senior Assistant Commissioning Editor for
Psychology and Social Work at Oxford University Press) and all the staff at OUP (and their service
companies) for their professional assistance during all steps from manuscript to final production
in electronic and book form. You have all done a marvellous job, thanks a lot!
I would like to thank my university (KU Leuven) and faculty (Psychology and Educational
Sciences) for allowing me a sabbatical when I started to work on this handbook, and the Research
Foundation–Flanders (K8.009.12N) for funding it. In addition, I thank the “Institut d’études avancées” (IEA), Paris for providing an excellent environment to work on a large and time-consuming
project such as this. Last but not least, I thank the Flemish Government for the long-term struc-
tural funding of my large-scale research program, aimed at reintegrating Gestalt psychology into
contemporary vision science and cognitive neuroscience (METH/08/02 and METH/14/02). With
this handbook I hope to significantly contribute to realizing this ambition.
Contents

Contributors  xix

Section 1  General Background


1 Historical and conceptual background: Gestalt theory  3
Johan Wagemans
2 Philosophical background: Phenomenology  21
Liliana Albertazzi
3 Methodological background: Experimental phenomenology  41
Jan J. Koenderink

Section 2  Groups, Patterns, Textures


4 Traditional and new principles of perceptual grouping  57
Joseph L. Brooks
5 Emergent features and feature combination  88
James R. Pomerantz and Anna I. Cragin
6 Symmetry perception  108
Peter A. van der Helm
7 The perception of hierarchical structure  129
Ruth Kimchi
8 Seeing statistical regularities  150
Steven Dakin
9 Texture perception  167
Ruth Rosenholtz

Section 3  Contours and Shapes


10 Contour integration: Psychophysical, neurophysiological and
computational perspectives  189
Robert F. Hess, Keith A. May, and Serge O. Dumoulin
11 Bridging the dimensional gap: Perceptual organization of contour
into two-dimensional shape  207
James H. Elder
12 Visual representation of contour and shape  236
Manish Singh

Section 4  Figure-Ground Organization


13 Low-level and high-level contributions to figure-ground organization  259
Mary A. Peterson
14 Figures and holes  281
Marco Bertamini and Roberto Casati
15 Perceptual completions  294
Rob van Lier and Walter Gerbino
16 The neural mechanisms of figure-ground segregation  321
Matthew W. Self and Pieter R. Roelfsema
17 Neural mechanisms of figure-ground organization: Border-ownership,
competition and perceptual switching   342
Naoki Kogo and Raymond van Ee
18 Border inference and border ownership: The challenge of integrating
geometry and topology  363
Steven W. Zucker

Section 5  Surface and Color Perception


19 Perceptual organization in lightness  391
Alan Gilchrist
20 Achromatic transparency  413
Walter Gerbino
21 Perceptual organization of color  436
Hannah E. Smithson
22 The perceptual representation of transparency, lightness, and gloss  466
Barton L. Anderson

Section 6  Motion and Event Perception


23 Apparent motion and reference frames  487
Haluk Öğmen and Michael H. Herzog
24 Perceptual organization and the aperture problem  504
Nicola Bruno and Marco Bertamini
25 Stereokinetic effect, kinetic depth effect, and structure from motion  521
Stefano Vezzani, Peter Kramer, and Paola Bressan
26 Interactions of form and motion in the perception of moving objects  541
Christopher D. Blair, Peter U. Tse, and Gideon P. Caplovitz
27 Dynamic grouping motion: A method for determining perceptual organization
for objects with connected surfaces  560
Howard S. Hock
28 Biological and body motion perception  575
Martin A. Giese

Section 7  Perceptual Organization and Other Modalities


29 Auditory perceptual organization  601
Susan L. Denham and István Winkler
30 Tactile and haptic perceptual organization  621
Astrid M. L. Kappers and Wouter M. Bergmann Tiest
31 Cross-modal perceptual organization  639
Charles Spence
32 Sensory substitution: A new perceptual experience  655
Noelle R. B. Stiles and Shinsuke Shimojo
33 Different modes of visual organization for perception and for action   672
Melvyn A. Goodale and Tzvi Ganel

Section 8  Special Interest Topics


34 Development of perceptual organization in infancy  691
Paul C. Quinn and Ramesh S. Bhatt
35 Individual differences in local and global perceptual organization  713
Lee de-Wit and Johan Wagemans
36 Mutual interplay between perceptual organization and attention: A
neuropsychological perspective  736
Céline R. Gillebert and Glyn W. Humphreys
37 Holistic face perception  758
Marlene Behrmann, Jennifer J. Richler, Galia Avidan, and Ruth Kimchi
38 Binocular rivalry and perceptual ambiguity  775
David Alais and Randolph Blake
39 Perceptual organization and consciousness  799
D. Samuel Schwarzkopf and Geraint Rees
40 The temporal organization of perception  820
Alex Holcombe

Section 9  Applications of Perceptual Organization


41 Camouflage and perceptual organization in the animal kingdom  843
Daniel C. Osorio and Innes C. Cuthill
42 Design insights: Gestalt, Bauhaus, and Japanese gardens  863
Gert J. van Tonder and Dhanraj Vishwanath
43 Perceptual organization in visual art  886
Jan J. Koenderink

Section 10  Theoretical Approaches


44 Hierarchical organization by and-or tree  919
Jungseock Joo, Shuo Wang, and Song-Chun Zhu
45 Probabilistic models of perceptual features  933
Jacob Feldman
46 On the dynamic perceptual characteristics of Gestalten: Theory-based methods  948
James T. Townsend and Michael J. Wenger
47 Hierarchical stages or emergence in perceptual integration?  969
Cees van Leeuwen
48 Cortical dynamics and oscillations: What controls what we see?  989
Cees van Leeuwen
49 Bayesian models of perceptual organization  1008
Jacob Feldman
50 Simplicity in perceptual organization  1027
Peter A. van der Helm
51 Gestalts as ecological templates  1046
Jan J. Koenderink

Index of Names  1063

Subject Index  1077


Contributors

David Alais
School of Psychology, The University of Sydney, Australia

Liliana Albertazzi
CIMeC & Department of Humanities, University of Trento, Italy

Barton L. Anderson
School of Psychology, The University of Sydney, Australia

Galia Avidan
Department of Psychology, Ben-Gurion University of the Negev, Israel

Marlene Behrmann
Cognitive Neuroscience Lab, Carnegie-Mellon University, USA

Wouter M. Bergmann Tiest
MOVE Research Institute, Faculty of Human Movement Sciences, VU University, Amsterdam, The Netherlands

Marco Bertamini
School of Psychology, University of Liverpool, UK

Ramesh S. Bhatt
Department of Psychology, University of Kentucky, USA

Christopher D. Blair
Department of Psychology, University of Nevada Reno, USA

Randolph Blake
Department of Psychological Sciences, College of Arts and Science, Vanderbilt University, USA

Paola Bressan
Department of General Psychology, University of Padua, Italy

Joseph L. Brooks
School of Psychology, University of Kent, UK

Nicola Bruno
Department of Psychology, University of Parma, Italy

Gideon P. Caplovitz
Department of Psychology, University of Nevada Reno, USA

Roberto Casati
Institut Jean Nicod, CNRS ENS-DEC EHESS, France

Anna I. Cragin
Department of Psychology, Rice University, USA

Innes C. Cuthill
School of Biological Sciences, University of Bristol, UK

Steven C. Dakin
Optometry and Vision Science, University of Auckland, New Zealand

Susan Denham
Cognition Institute and School of Psychology, University of Plymouth, UK

Lee de-Wit
Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Belgium

Serge O. Dumoulin
Experimental Psychology, Helmholtz Institute, Utrecht University, The Netherlands

James H. Elder
Center for Vision Research, Department of Electrical Engineering & Computer Science, Department of Psychology, York University, Ontario, Canada

Jacob Feldman
Rutgers Center for Cognitive Science, Rutgers University, USA

Tzvi Ganel
Department of Psychology, Ben-Gurion University of the Negev, Israel

Walter Gerbino
Department of Life Sciences, Psychology Unit “Gaetano Kanizsa”, University of Trieste, Italy

Martin A. Giese
Department of Cognitive Neurology, University of Tübingen, Germany

Alan Gilchrist
Psychology Department, Newark Campus, Rutgers University, USA

Céline R. Gillebert
Department of Experimental Psychology, University of Oxford, UK

Melvyn A. Goodale
Department of Psychology, Western University, Ontario, Canada

Michael H. Herzog
Laboratory of Psychophysics, EPFL SV BMI LPSY, Switzerland

Robert F. Hess
McGill Vision Research, McGill University, Montreal, Canada

Howard S. Hock
Department of Psychology, Florida Atlantic University, USA

Alex Holcombe
School of Psychology, The University of Sydney, Australia

Glyn W. Humphreys
Department of Experimental Psychology, Oxford University, UK

Jungseock Joo
Computer Science Department, University of California Los Angeles (UCLA), USA

Astrid Kappers
MOVE Research Institute, Faculty of Human Movement Sciences, VU University Amsterdam, The Netherlands

Ruth Kimchi
Department of Psychology, Institute of Information Processing and Decision Making, Max Wertheimer Minerva Center for Cognitive Processes and Human Performance, University of Haifa, Israel

Jan J. Koenderink
Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Belgium

Naoki Kogo
Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Belgium

Peter Kramer
Department of General Psychology, University of Padua, Italy

Keith A. May
Division of Optometry and Visual Science, City University London, UK

Haluk Öğmen
Department of Electrical and Computer Engineering, Center for Neuro-Engineering and Cognitive Science, University of Houston, USA

Daniel C. Osorio
School of Life Sciences, University of Sussex, UK

Mary A. Peterson
Department of Psychology, University of Arizona, USA

James R. Pomerantz
Department of Psychology, Rice University, USA

Paul C. Quinn
Department of Psychological and Brain Sciences, University of Delaware, USA

Geraint Rees
Institute of Cognitive Neuroscience, University College London, UK

Jennifer J. Richler
Department of Psychology, Vanderbilt University, USA

Pieter R. Roelfsema
Netherlands Institute for Neuroscience, The Netherlands

Ruth Rosenholtz
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, USA

D. Samuel Schwarzkopf
Experimental Psychology, University College London, UK

Matthew W. Self
Netherlands Institute for Neuroscience, The Netherlands

Shinsuke Shimojo
Division of Biology and Biological Engineering, California Institute of Technology, USA

Manish Singh
Rutgers Center for Cognitive Science, Rutgers University, USA

Hannah Smithson
Department of Experimental Psychology, Oxford University, UK

Charles Spence
Department of Experimental Psychology, Oxford University, UK

Noelle R. B. Stiles
Computation and Neural Systems, California Institute of Technology, USA

James T. Townsend
Department of Psychology, Indiana University, USA

Peter U. Tse
Department of Psychological and Brain Sciences, Dartmouth College, USA

Peter A. van der Helm
Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Belgium

Raymond Van Ee
Philips Research Laboratories, Department of Brain, Body & Behavior, Eindhoven, The Netherlands; Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Belgium; Donders Institute, Radboud University, Department of Biophysics, Nijmegen, The Netherlands

Cees van Leeuwen
Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Belgium

Rob van Lier
Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, The Netherlands

Gert J. van Tonder
Laboratory of Visual Psychology, Kyoto Institute of Technology, Japan

Stefano Vezzani
Department of General Psychology, University of Padua, Italy

Dhanraj Vishwanath
School of Psychology and Neuroscience, University of St Andrews, UK

Johan Wagemans
Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Belgium

Shuo Wang
Computation and Neural Systems, California Institute of Technology, USA

Michael J. Wenger
Department of Psychology, The Pennsylvania State University, USA

István Winkler
Institute of Psychology and Cognitive Neuroscience, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Hungary

Song-Chun Zhu
Center for Vision, Cognition, Learning and Art, University of California Los Angeles (UCLA), USA

Steven W. Zucker
Department of Computer Science, Yale University, USA
Section 1

General background
Chapter 1

Historical and conceptual background: Gestalt theory
Johan Wagemans

Introduction
In 2012, it was exactly 100 years since Wertheimer published his paper on phi motion (1912) –
the perception of pure motion, that is, motion without a moving object – which many consider to be the start of
Gestalt psychology as an important school of thought. The present status of Gestalt psychology is
quite ambiguous. On the one hand, most psychologists believe that the Gestalt school died with its
founding fathers in the 1940s, after some devastating empirical findings regarding electrical field
theory in the 1950s, or declined naturally because of fundamental obstacles to further progress and
because stronger theoretical and experimental frameworks (e.g., cognitive science, neuroscience)
arose and gained dominance from the 1960s and 1970s onwards.
still contain a Gestalt-like chapter on perceptual organization (although often quite detached from the
other chapters), and new empirical papers on Gestalt phenomena are published on a regular basis.
I believe that Gestalt psychology is quite relevant to current psychology in several ways.
Contemporary scientific research has continued to address classic questions regarding the
emergence of structure in perceptual experience and the subjective nature of phenomenal
awareness (e.g., visual illusions, perceptual switching, context effects), using advanced methods
and tools that were not at the Gestaltists’ disposal. Moreover, I believe that the revolutionary
ideas of the Gestalt movement can still function as a dissonant element to question some of the
fundamental assumptions of mainstream vision science and cognitive neuroscience (e.g., elementary
building blocks, channels, modules, information-processing stages).
made in the field of non-linear dynamical systems, theoretically and empirically (e.g., techniques
to measure and analyze cortical dynamics), which allows us to surpass some of the limitations in
old-school Gestalt psychology, as well as in mainstream vision research.
To be able to situate all the reviews of a century of theoretical and empirical work on perceptual
organization in this handbook against the background of this special position of Gestalt psychol-
ogy, I will first introduce the key findings and ideas in old-school Gestalt psychology, its historical
origin and development, rise and fall. I will sketch only the main lines of thought and major steps
in the history. For a more extensive treatment of the topic, I refer to Ash (1995).

Early History of Gestalt Psychology


Wertheimer’s discovery of phi motion (1912)
What Max Wertheimer1 discovered was not the phenomenon of apparent motion – that is, the
perception of motion between two stationary light sources, flashing on and off at given intervals –
but a special case. It concerned perceived motion without seeing an object moving, so rather than
the standard case of seeing an object first at location a, and then, after an interval φ, at location b
(i.e., apparent motion from a to b), here it concerned pure φ, without a percept of a or b. The general
phenomenon of apparent motion had already been observed as early as 1850 by the Belgian
physicist Joseph Plateau; Sigmund Exner (one of Wertheimer’s teachers) had obtained it with two
electric sparks in 1875; and in 1895 the Lumière brothers had patented the ‘cinématographe’, an
invention based on the phenomenon. (For an excellent discussion of its historical importance, see
Sekuler, 1996; for a demonstration of the phenomenon and for a review of its misrepresentation
in later sources, see Steinman, Pizlo, & Pizlo, 2000; for a recent review of apparent motion, see
Herzog & Öğmen, this volume.)

1  The names in boldface are the historically most important Gestalt psychologists.
According to a famous anecdote, Wertheimer came to the idea for this experiment when he saw
alternating lights on a railway signal, while on his way from Vienna to the Rhineland for vaca-
tion in the autumn of 1910. He got off the train in Frankfurt, bought a toy stroboscope and began
constructing figures to test the idea in his hotel room. He then called Wolfgang Köhler, who had
just begun to work as an assistant at the Psychological Institute there. Köhler provided him with
laboratory space and a tachistoscope with a rotating wheel, especially constructed by Schumann
(the Institute’s Director) to study successive expositions. According to the conventional view of
apparent motion perception, we see an object on several positions successively and something is
then added subjectively. If this were correct, then an object would have to be seen moving, and
at least two positions, the starting and end points, would be required to produce seen motion.
Neither of these conditions held in the case of phi motion. By systematically varying the form,
color, and intensity of the objects, as well as the exposure intervals and stimulus distances between
them, and by examining the role of attitude and attention, Wertheimer was able to refute all of the
current theories of motion perception.
In the standard experiment, a white strip was placed on a dark background in each slit, while
the rotation speed of the tachistoscope wheel was adjusted to vary the time required for the light
to pass from one slit to the next. Above a specific threshold value (~200 ms), observers saw the
two lines in succession. With much faster rotation (~30 ms), the two lines flashed simultane-
ously. At the so-called optimal stage (~60 ms), observers saw a definite motion that could not
be distinguished from real motion. When the time interval was decreased slightly below 60 ms,
after repeated exposures, observers saw motion without a moving object. Although he used only
three observers (Wolfgang Köhler, Kurt Koffka, and Koffka’s wife Mira), he was quite confident
in the validity of the results: the characteristic phenomena appeared in every case unequivocally,
spontaneously, and compellingly. After confirming Exner’s observation that apparent motion pro-
duces negative after-images in the same way as real motion, Wertheimer proposed a physiological
model based on some kind of physiological short circuit, and a flooding back of the current flow,
creating a unitary continuous whole-process. He then extended this to the psychology of pure
simultaneity (for the perception of form or shape) and of pure succession (for the perception of
rhythm or melody). This extension was the decisive step for the emergence of the Gestalt theory.

Implications: Gestalt theory
The phi phenomenon was simply a process, a transition (‘an across in itself ’) that cannot be com-
posed from the usual optical contents of single object percepts at two locations. In other words,
perceived motion was not just added subjectively after the sensory registration of two spatiotem-
poral events (or snapshots), but something special with its own phenomenological characteris-
tics and ontological status. Indeed, based on the phi phenomenon, Wertheimer argued that not
sensations, but structured wholes or Gestalten are the primary units of mental life. This was the
key idea of the new and revolutionary Gestalt theory.
The notion of ‘Gestalt’ had already been introduced into psychology by Christian von Ehrenfels in
his essay ‘On Gestalt qualities’ (1890), one of the founding documents of Gestalt theory. Because we
can recognize two melodies as identical, even when no two notes in them are the same, he argued
that these forms must be something more than the sum of the elements. They must have what
he called a ‘Gestalt quality’: a characteristic which is immediately given, along with the elementary
presentations that serve as its fundament, dependent upon the objects but rising above them. In
his discussion of the epistemological implications of his discovery of phi motion, Wertheimer
went considerably beyond von Ehrenfels’s notion of one-sided dependence of Gestalt qualities
on sense data, which made wholes more than the sum of their parts, while maintaining the parts
as foundations (‘Grundlage’). He claimed instead that specifiable functional relations exist that
decide what will appear or function as a whole and as parts (i.e., two-sided dependency). Often
the whole is grasped even before the individual parts enter consciousness. The contents of our
awareness are mostly not summative, but constitute a particular characteristic ‘togetherness’, a
segregated structure, often comprehended from an inner centre, to which the other parts of the
structure are related in a hierarchical system. Such structures were called ‘Gestalten,’ which are
clearly different from the sum of the parts. They were assumed to arise on the basis of continuous
whole-processes in the brain, rather than associated combinations of elementary excitations.
With this significant step, Wertheimer separated himself from the Graz school of Gestalt psy-
chology, represented by Alexius Meinong, Christian von Ehrenfels, and Vittorio Benussi, who
maintained a distinction between sensation and perception, the latter produced on the basis of the
former (Boudewijnse, 1999; for further discussion, see Albertazzi, this volume). The Berlin school,
represented by Max Wertheimer, Kurt Koffka, and Wolfgang Köhler, went further and considered
a Gestalt as a whole in itself, not founded on any more elementary objects. Instead of perception
being produced from sensations, a percept organizes itself by mutual interactions, a percept arises
non-mechanically by an autonomous process in the brain. The Berlin school also did not accept a
stage theory of perception and, hence, distinguished itself from the Leipzig school, represented by
Felix Krüger, Friedrich Sander, and Erich Jaensch, in which the stepwise emergence of Gestalten
(‘Aktualgenese’ or ‘microgenesis’) played a central role (see van Leeuwen, this volume).
Although the Berlin theorists adhered to a non-mechanistic theory of causation and did not
want to analyze the processes into stages, they did believe that the critical functional relations in
the emergence of Gestalts could be specified by several so-called Gestalt laws of perceptual organ-
ization. They were inspired by Johann Wolfgang Goethe, who introduced the notion of ‘Gestalt’
to refer to the self-actualizing wholeness of organic forms. For Goethe, the functional role of an
organism’s parts is determined by a dynamic law inherent in the whole, filled with comings and
goings, but not mechanical operations. The ideal end results of these dynamic interactions are clas-
sically proportioned forms, signs of balance, lawfulness, and order realizing itself in nature, not
imposed upon it by an ordering mind. However, at the same time, the Berlin theorists wanted to
give this notion a naturalistic underpinning to avoid the anti-physicalist attitude of Felix Krüger’s
holistic psychology (‘Ganzheitspsychologie’), which was characteristic of the Leipzig school.
They were all trained in experimental psychology by Carl Stumpf in Berlin, who strongly
believed in the immediately given as the basis of all science (cf. Brentano) and in the lawfulness
of the given, which included not only simple sensations of color or tone, but also spatially and
temporally extended and distributed appearances, as well as relationships among appearances,
such as similarity, fusion, or gradation. The laws of these relationships are neither causal nor
functional, but immanent structural laws according to Stumpf. It is these structural laws that
the Berlin school was about to uncover. Already at a meeting of the Society for Experimental
Psychology in 1914, Wertheimer announced that he had discovered a general kind of Gestalt law,
a tendency towards simple formation (‘Gestaltung’), called the law of the Prägnanz of the Gestalt.
Unfortunately, the promised publication did not appear until 1923, although the experiments
were essentially from the years 1911–1914.

Further Developments of Gestalt Psychology


Although Max Wertheimer could be considered the founding father of the Berlin school, his
younger colleagues, Kurt Koffka and Wolfgang Köhler, were just as important in its further
development. The initial period was characterized by efforts to explain how radically revolutionary
the new Gestalt theory was. For instance, in his essay ‘On unnoticed sensations and errors of
judgment,’ Köhler (1913) criticized the tendency shared by Helmholtz and Stumpf to regard
perceptions and sensations as unambiguously determined by peripheral stimulation as much as possible.
In the same spirit, Koffka (1914) argued that a complete transformation of perceptual theory had
occurred because sensation was now understood from the point of view of perception, instead of
the other way around. Koffka clarified this position in a 1915 polemic against Vittorio Benussi,
a vehement proponent of the Graz school; this polemic became the first full statement of Gestalt theory
as a psychological system. The fundamental break with the Graz school was a radical revision
in the meaning of the word ‘stimulus.’ In this new conception, this word no longer referred to a
pattern of excitations on a sense organ, as it had throughout the 19th century, but to real objects
outside of and in functional relation to a perceiving and acting organism. Benussi, being trained
in ontology by Meinong (see Albertazzi, this volume), insisted on maintaining the distinction
between stimulation and perception. In fact, he distinguished sensory responses from different
kinds of presentations (‘Vorstellungen’), for instance, elementary ones and perceived Gestalts,
the latter being produced from the former in different phases (Albertazzi, 2001). Koffka instead
cared only about psychological experience, not in the analysis of the building blocks or process-
ing phases or stages. After this dispute, Koffka went further to expand the Gestalt notion from
perception to motor action, which became considered as an organized whole process too, with a
structure that cannot be reduced to a bundle of reflexes. As Koffka boldly asserted, ‘there are real
Gestalten.’ After this initial period, two major developments are generally considered as high-
lights in the history of Gestalt psychology: Köhler’s ‘physical Gestalten’ (1920) and Wertheimer’s
‘Gestalt laws’ (1923).

Köhler’s ‘physical Gestalten’ (1920) and isomorphism


In 1920, Wolfgang Köhler published ‘Die physischen Gestalten in Ruhe und im stationären Zustand,’
in which he extended the Gestalt concept from perception and behavior to the physical world,
and thus attempted to unify holism and natural science in a way that was very distinct from
the holistic psychology of the Leipzig school. Inspired by work of his friends in physics (Albert
Einstein, James Clerk Maxwell, and Max Planck), Köhler proposed to treat the neurophysiological
processes underlying Gestalt phenomena in terms of the physics of field continua rather than that
of particles or point-masses. In a well-insulated ellipsoid conductor, for instance, the density of
charge is greatest at the points of greatest curvature and smallest at the points of least curvature.
The distribution of charge in such a conductor thus depends on the shape of the conductor (i.e.,
the system’s topography), but is independent of the materials used or the total quantity of charge
involved. In such physical systems, which he called ‘strong Gestalten,’ the mutual dependence
among the parts is so great that no displacement or change of state can occur without influencing
all the other parts of the system. Köhler then showed that stationary electric currents, heat cur-
rents, and all phenomena of flow are strong Gestalten in this sense. These he distinguished from
what he called ‘weak Gestalten,’ which are not immediately dependent on the system’s topography
(e.g., a group of isolated conductors connected by fine wires). Weak Gestalten are satisfactorily
treated with simultaneous linear algebraic functions, whereas strong Gestalten must be described
either with integrals or with series of partial differential equations.
In addition, Köhler tried to construct a specific testable theory of brain processes that could
account plausibly for perceived Gestalten in vision. In short, he presented visual Gestalten as the
result of an integrated Gestalt process in which the whole optic sector from the retina onward
is involved, including transverse functional connections among conducting nerve fibres. The
strongest argument for proposing that the brain acted as a whole system was the fact that Gestalts
were found at many different levels: seen movement, stationary Gestalten, the subjective geom-
etry of the visual field, motor patterns, and insightful problem solving in animals. This theory
had dramatic consequences. For Gestalt theory, the 3-D world that we see is not constructed by
cognitive processes on the basis of insufficient sensory information. Rather, the lines of flow are
free to follow different paths within the homogeneous conducting system, and the place where a
given line of flow will end in the central field is determined in every case by the conditions in the
system as a whole. In modern terms, Köhler described the optic sector as a self-organizing
physical system.
Based on this general theory of physical Gestalten and this specific theory of the brain as a
self-organizing physical system within which experienced Gestalten emerge, Köhler then came
to the postulate of ‘psychophysical isomorphism’ between the psychological facts and the brain
events that underlie them. With this he meant, as Wertheimer before him, functional instead
of geometrical similarity, so it is not the case that brain processes must somehow look like per-
ceived objects. Köhler also insisted that such a view does not prescribe featureless continuity in
the cortex, but is perfectly compatible with rigorous articulation. He conceded that experiments
to establish the postulated connections between experienced and physical Gestalten in the brain
were nearly unthinkable at the time from a practical point of view, but that this should not detract
from its possibility in principle. In the meantime, Köhler tried to show that his postulate was
practical by applying it to the figure-ground phenomena first reported by Edgar Rubin in 1915.
Decades later, after Köhler emigrated to the USA, he attempted to carry out such experiments (see
Section “In the USA” below).
All of the examples Köhler had offered of physical Gestalten were equilibrium processes, such as
the equalization of osmotic pressures in two solutions by the migration of ions across the boundary
between them, or the spontaneous distribution of charged particles on conductors. As Maxwell’s
field diagrams showed, we could predict from a purely structural point of view the movements
of conductors and magnets, and the groupings of their corresponding fields, in the direction of
increased evenness of distribution, simplicity, and symmetry. This was a qualitative version of the
tendency (described by Planck) of all processes in physical systems left to themselves, to achieve
the maximum level of stability, which is synonymous with the minimum expenditure of energy,
allowed by the prevailing conditions. Köhler explained this tendency – based on the second law of
thermodynamics or the entropy principle – with an example from hydrostatics. When dipping wire
frames of different forms into a solution of water and soap, one can see that such physical sys-
tems tend toward end states characterized by the simplest and most regular form, a tendency that
Köhler called the tendency to the simplest shape or toward ‘the Prägnanz of the Gestalt,’ alluding
to the principle already enunciated but rather vaguely by Wertheimer at the meeting of the Society
for Experimental Psychology in 1914.

Wertheimer’s ‘Gestalt laws’ (1923)


Around the same time, Max Wertheimer developed his Gestalt epistemology further and he out-
lined the research practice of experimental phenomenology that was based on it. He first stated
the principles publicly in a manifesto published in Volume 1 of Psychologische Forschung in
1922:  ‘Untersuchungen zur Lehre von der Gestalt, I:  Prinzipielle Bemerkungen.’ There he called
for descriptions of conscious experience in terms of the units people naturally perceive, rather
than the artificial ones assumed to be in agreement with proper scientific method. Implicit in
conventional psychological descriptions is what he called a mosaic or bundle-hypothesis – the
assumption that conscious experience is composed of units analogous to physical point-masses
or chemical elements. By making this assumption, psychologists constrain themselves to link con-
tents of consciousness in a piecemeal fashion, building up so-called higher entities from below,
with the help of associative connections, habits, hypothesized functions, and acts or a presup-
posed unity of consciousness.
In fact, however, such ‘and-sums,’ as Wertheimer delightfully called them, appear only seldom
(i.e., under certain characteristic, limited conditions) and perhaps even only in approximation.
Rather, the given is, in itself, formed (‘gestaltet’) – given are more or less completely structured,
more or less determinative wholes and whole-processes, each with its own inner laws. The con-
stitution of parts in such wholes is a very real process that changes the given in many ways.
In research, therefore, proceeding ‘from below to above’ (‘von unten nach oben’) would not be
adequate, but rather the way ‘from above to below’ (‘von oben nach unten’) is often required. Note
that this twin-set of concepts is not what we nowadays indicate by ‘bottom-up’ and ‘top-down,’
respectively. The latter notions refer more to ‘sense-driven’ and ‘concept-driven,’ respectively,
and in this regard Gestalts are more sense-driven or bottom-up, by being based on autonomous
tendencies, not depending on previous knowledge, expectations, voluntary sets, observer inten-
tions, etc.
Wertheimer offered evocative examples of what he meant by working ‘from above’ instead of
‘from below’ in 1923, when he presented a full account of the ‘Gestalt laws’ or tendencies that he
had announced in 1914. The perceptual field does not appear to us as a collection of sensations
with no meaningful connection to one another, but is organized in a particular way, with a spon-
taneous, natural, normally-expected combination and segregation of objects. Wertheimer’s (1923)
paper was an attempt to elucidate the fundamental principles of that organization. Most general
was the law of Prägnanz. This states, in its broadest form, that the perceptual field and objects
within it take on the simplest and most impressive structure permitted by the given conditions.2
More specific were the laws of proximity, similarity, closure, and good continuation. These laws
are discussed in more detail in many of the chapters to follow (e.g. Brooks, this volume), but here
I will attempt to remove some common misunderstandings about them.
Wertheimer was not the first to outline these principles. Indeed, Schumann (1900) and Müller
(1904) had mentioned the existence of such tendencies in perception much earlier, but they had said
only that these tendencies make the perception of stimulus patterns easier (for a recent review of this
history, see Vezzani et al., 2012). Wertheimer, instead, maintained that they are determinative for
the perception of figures and for form perception in general. Wertheimer also recognized the pow-
erful effect of observers’ attitudes and mental set, but by this he understood primarily a tendency
to continue seeing the pattern initially seen, even under changed conditions. Nor did he deny the
influence of previous experience, such as habit or drill, but he insisted that these factors operate only
in interaction with the autonomous figurative forces at work in the immediate situation. Moreover,
Wertheimer did not exclude quantitative measurements from his program but he made it clear that
such measurements should be undertaken only in conjunction with detailed phenomenological
description to discover what ought to or meaningfully could be measured. In fact, Wertheimer had
not elaborated a finished theory, but had presented an open-ended research program. He converted
the culturally resonant term ‘Gestalt’ and the claim that the given is ‘gestaltet’ into a complex research
program to discover the principles of perceptual organization in both its static and dynamic aspects.

2  The German word ‘Prägnanz’ is derived from the verb ‘prägen’ – to mint a coin. Hence, by describing the principle of Prägnanz as the tendency towards the formation of Gestalten, which are as regular, simple, symmetric (‘ausgezeichnet’, according to Wertheimer’s term) as possible given the conditions, a connection is made to the notion of ‘Gestalt’ as the characteristic shape of a person or object, or the likeness of a depiction to the original (which was the colloquial German meaning before Goethe and von Ehrenfels assigned it its more technical meaning as we know it today). For this reason, ‘Prägnanz’ has often been translated as ‘goodness.’

The Rise and Fall of Gestalt Psychology


Significant expansion in 1920–1933
The development of Wertheimer’s open-ended research program was significantly facilitated by
the establishment of a real Gestalt school. The founding fathers acquired professorships at major
universities in Germany (Koffka in Giessen in 1919, Köhler in Berlin in 1922, and Wertheimer
in Frankfurt in 1929), and they founded the journal Psychologische Forschung in 1921. Together
they supervised a large number of PhD theses, which amounted to unpacking the empirical and
theoretical implications of Wertheimer’s (1923) paper. The initial steps were usually disarm-
ingly simple demonstrations. Friedrich Wulf (1922) had already attempted to demonstrate the
applicability of the law of Prägnanz to memory before Wertheimer’s paper appeared. Wilhelm
Benary (1924) employed an experiment devised by Wertheimer to test the law of Prägnanz on a
phenomenon of brightness contrast, and introduced the principle of ‘belongingness’. Following
up on Koffka’s (1923) experimental proof that achromatic (black-white) color contrast does not
depend on the absolute amount of available light but on what he called ‘stimulus gradients,’
Susanne Liebmann (1927) pursued this line of investigation further by relating chromatic color
to principles of organization, specifically to the figure-ground phenomenon originally studied by
Edgar Rubin (1915). In 1923, Adhemar Gelb and Ragnar Granit had already demonstrated that
thresholds for seeing a given color were lower when it was regarded as figure than when it was
seen as background.
Perhaps the most spectacular demonstration of the fundamental role of organization in percep-
tion came from Wolfgang Metzger’s (1930) research with a homogeneous ‘Ganzfeld’ (i.e. a way to
stimulate an observer’s visual field uniformly and remove all structure from it). Kurt Gottschaldt
(1926, 1929) tested Wertheimer’s claim that habit and drill are secondary to organization, and
showed that so-called ‘embedded figures’ were not found more easily by a group of subjects who
had seen them in isolation 520 times than by a group of subjects who had seen them only
three times. Herta Kopfermann (1930) explored the role of the Gestalt tendencies in the appear-
ance of plane figures as 3-D.
In research on motion and organization, there was a progression from relatively simple dem-
onstration experiments to more complicated apparatus-driven designs. Josef Ternus (1926)
asked what kinds of perceived motion are needed to experience ‘phenomenal identity’, i.e. unified
moving objects. In a spectacular demonstration of both Prägnanz and depth effects in motion
perception, Wolfgang Metzger (1934) used an ingenious setup of his own design, which he
10 Wagemans

called a rotating light-shadow apparatus, yielding what is now known as the ‘kinetic depth effect’
(Wallach & O’Connell, 1953; see also Vezzani, Kramer, & Bressan, this volume). In between
Ternus and Metzger, Karl Duncker (1929) altered both the research mode and the terms of
discourse about these issues in his research on what he called ‘induced motion.’ In this work, he
combined some remarks from Wertheimer’s 1912 paper about the role of the observer’s position
in motion perception with terminology from relativity theory in physics (borrowing the term
‘egocentric frames of reference’ from Georg Elias Müller).
were carried out by Brown (1931a,b,c) and Hans Wallach (1935). (For recent reviews of motion
perception in the Gestalt tradition, see Herzog & Öğmen, this volume, and Bruno & Bertamini,
this volume.)
In the meantime, Gestalt thinking also affected research on other sense modalities (e.g., bin-
aural hearing by Erich von Hornbostel), on learning and memory (e.g., Otto von Lauenstein
and Hedwig von Restorff, both working under Köhler in search of physiological trace fields),
and on thought (e.g., Karl Duncker’s work on stages in productive thinking, moving away from
Wertheimer’s work on re-centering and Köhler’s work on sudden insight). At first sight, Gestalt
theory seemed to develop, rather consistently, from studying the fundamental laws of psychol-
ogy first under the simplest conditions, in rather elementary problems of perception, and then
including more and more complex sets of conditions, turning to memory, thinking, and acting.
At the same time, however, the findings did not always fit the original theories, which consti-
tuted serious challenges to the Gestalt framework. This was even more true for applications of
Gestalt theory to action and emotion (by Kurt Lewin), to neuropathology and the organism
as a whole (by Adhemar Gelb and Kurt Goldstein), and to film theory and aesthetics (by Rudolf
Arnheim).
In summary, the period from 1920 to 1933 marked the high point, but not the end of Gestalt
psychology’s theoretical development, its research productivity, and its impact on German science
and culture. At the same time, Gestalt theory had some impact on research in the USA, as well,
mainly owing to Kurt Koffka (e.g., the notion of vector field inspired some interesting empirical
work published in the American Journal of Psychology; see Brown & Voth, 1937; Orbison, 1939).
Reviews of Gestalt psychology appeared in Psychological Review on a regular basis (e.g., Helson,
1933; Hsiao, 1928), a comprehensive book on state-of-the-art Gestalt psychology was published
as early as 1935 (Hartmann, 1935), and three years later Ellis’s (1938) influential collection of
translated excerpts of core Gestalt readings made some of the original sources accessible to a
non-German-speaking audience. Already in 1922, at Robert Ogden’s invitation, Koffka had published
a full account of the Gestalt view on perception in Psychological Bulletin. He emigrated to the USA
mainly for professional reasons, after accepting a job at Smith College in 1927, long before such a
step became politically necessary, as it did for many other Gestaltists.

From 1933 to World War II


General situation
In this period, many of the psychology professors at German universities lost their posts
because of their Jewish origin, and many emigrated to the USA taking on new positions there
(e.g., Wertheimer at the New School for Social Research in New York in 1933, Kurt Lewin at
Cornell University in 1934). Wolfgang Köhler, who was not a Jew, protested frequently and resisted
for a long time, but then accepted a position at Swarthmore College in 1935. Rudolf Arnheim first
moved to Rome, then to England, and finally to the USA. Others stayed, like Wolfgang Metzger,
Kurt Gottschaldt, and Edwin Rausch. Much has been said and written about the relationships

between the Gestalt psychologists at German universities during this period, and the political
attitudes and acts of the Nazi regime (e.g., Mandler, 2002; Prinz, 1985; Wyatt & Teuber, 1944),
which clearly went beyond pragmatic survival behavior in some cases (e.g., Erich Jaensch’s empir-
ical anthropology). I will focus only on the scientific contributions and impact on Gestalt psy-
chology here. Compared with the flourishing previous period, the institutional conditions for
Gestalt-theoretic research in the Nazi period were considerably reduced, but it was possible to
continue at least some of the lines of work already begun.
After the appearance of a pioneering monograph, ‘Thing and Shadow,’ by Vienna psychologist
Ludwig Kardos in 1934, Gestalt researchers pursued the issue further, for instance, examining
spatial effects of brightness contrast or applying Duncker’s work on induced motion to bright-
ness perception. Perhaps the most interesting research in this period was Erich Goldmeier’s study
of judgment of similarity in perception, published in 1937. His starting point was the problem
originally raised by Harald Höffding and Ernst Mach in the 1890s: how do we know an object
or feature is the same as one we have seen before; or, how do we recognize forms as the same
even when they are presented in different positions? In Goldmeier’s view, his results showed that
what is conserved in perceived similarity is the phenomenal function of the parts within the
perceived whole, or the agreement of those qualities that determine the phenomenal organiza-
tion of the field in question. He found that similarity of form properties was best preserved by
proportional enlargement, while similarity of material properties was best preserved by keeping
their measure constant.
Around the same time, two major developments in Gestalt theory occurred that have generally
been ignored outside Germany: Edwin Rausch’s monograph on ‘summative’ and ‘nonsummative’
concepts (1937) and Wolfgang Metzger’s theoretical masterpiece, ‘Psychology’ (1941).

Edwin Rausch
Rausch’s aim was to develop a more systematic account of the concepts of part and whole, with
the aid of innovations in symbolic logic pioneered by Bertrand Russell, Rudolf Carnap, Giuseppe
Peano, and others. Despite some conceptual difficulties, Rausch’s work had an immediate impact
(although not outside Germany). In an analysis of the Gestalt concept published in 1938, the emi-
grated logical empiricist philosophers Kurt Grelling and Paul Oppenheim attempted, in explicit
agreement with Rausch, to clarify the notions of sum, aggregate, and complex, in a way that would
elucidate the actual content of von Ehrenfels’s and Köhler’s Gestalt concepts and differentiate
them from one another. Such analyses could have saved the Gestalt concept from the recurring
charge of vagueness, if they had not been ignored at the time. However, because they presupposed
an empiricist standpoint, Grelling and Oppenheim failed to engage the epistemological core of
Gestalt theory – Wertheimer’s claim that Gestalten are immanent in experience, not categories
imposed upon experience. For a thorough discussion, see Smith (1988).

Wolfgang Metzger
After Wertheimer’s dismissal, Wolfgang Metzger became de facto head of the Frankfurt Institute,
and he was able to maintain his major lines of research by taking a collaborative stance regarding
the Nazi regime. In 1936, Metzger published a synoptic account of research on the Gestalt theory
of perception entitled ‘Gesetze des Sehens’ (‘Laws of seeing’), since reissued and vastly expanded
three times, and translated into English in 2006.
Even more important from a theoretical perspective was Metzger’s (1941) book, ‘Psychology: The
development of its fundamental assumptions since the introduction of the experiment.’ The original
title was ‘Gestalt theory,’ but he changed it to make clear that his aim was to make Gestalt theory

the conceptual foundation of general psychology. To achieve this, he employed a strategy rather
different from that of Kurt Koffka’s major text of the same period, ‘Principles of Gestalt Psychology’
(1935), which he wrote in the USA. Koffka wrote mainly against positivism (materialism, vital-
ism, E. B. Titchener, and behaviorism), while Metzger wrote mainly against non-positivists who
opposed natural-scientific psychology, or those who criticized Gestalt theory for its alleged lack
of biological orientation. Koffka structured his textbook in a standard way, enunciating general
Gestalt principles and then applying them to standard topics, beginning with a detailed account
of visual perception, proceeding to a critical reworking of Lewin’s work on action and emotion,
incorporating research by Wertheimer, Duncker, and Köhler on thinking, learning, and memory,
and finally applying Gestalt principles to personality and society. Metzger, however, presented not
a conventional textbook, but an attempt to revise the theoretical presuppositions of modern psy-
chology. His hope was that this approach would put an end to the misunderstanding that Gestalt
theory was merely a psychophysical theory that seeks to explain the entire psychical realm at any
price by means of known physical laws. The assumption that he questioned was that real causes
of events must be sought only behind, not within phenomena. The strategy he employed was to
convert Gestalt principles into meta-theoretical concepts and depict them as names for intrinsic
natural orderings. His chapter headings were, therefore, not standard textbook topics, but rather
terms from Gestalt-type phenomenology of perception, such as qualities, contexts, relational sys-
tems, centering, order, and effects.
Of particular interest and originality was Metzger’s discussion of psychological frames of ref-
erence or relational systems. The presupposition under attack was that of psychological space
as a collection of empty, indifferent locations. Instead, he argued that all location in space and
time, as well as all phenomenal judgment, is based on relations in more extended psychological
regions. To explain why relatedness is ordinarily hidden from immediate experience and why,
in ordinary life, the absolute quality of things appears to be their most outstanding characteristic, he
recognized that Wertheimer’s application of the word Gestalt to both seen objects and the struc-
ture of the perceptual field as a whole required modification. Specifically, Metzger acknowledged
that the characteristic membership of regions in a relational system is correlative to but different
from the relation of parts to their whole. A true part is in a two-sided relation with its whole;
a part of a relational system is in a one-sided, open-ended relation with the system as a whole.
A thing in space, for example, leaves no gap on removal, but a piece of a puzzle does. With this
modification, Metzger could get a conceptual grip on the myriad tendencies he and his stu-
dents had to suppose to account for the results that could not be explained by simple analogies
to Wertheimer’s Gestalt laws. To cover these, he posited a principle of branched effects, which
stated that wherever the experienced field had more dimensions than the stimulus field, an infi-
nite variety of experiences can emerge from the same stimulus constellation, depending on the
structure of the environmental situation and the state of the perceiving organism. With this
principle, it became possible to portray processes considered psychological, such as attention
and attitudes, as relational systems, and thus bring them into the purview of Gestalt theory. It also
implied the possibility of extending Gestalt theory from perception and cognition to personality
and the social realm.
Metzger’s book was an eloquent statement of Gestalt principles and their conceptual founda-
tions but it was problematic both as a summary of what Gestalt theory had achieved and as a
response to its critics. Unexperienced entities such as Gestalt centres of gravity are not causes of what
we perceive, but part of a larger, self-organizing Gestalt context that includes the given. In addi-
tion, the organism-environment nexus is a relational system, not a Gestalt. In this way, Metzger
had reached Gestalt theory’s conceptual limits for which he tried to compensate in part with

terminological concessions to Leipzig’s holistic psychology. Like that of Koffka from the same
period, Metzger’s book considerably expanded the conceptual range of Gestalt theory. Precisely
that elaboration gave Gestalt theory a new, more finished look – the look of a system – during the
1930s, which it had not had before. However, because it now lacked the necessary institutional
base in Germany (e.g., very few PhD students), the book did not have a major impact on the field
as a whole in this period. Hence, this was at the same time the culmination of Gestalt theory and
the start of its decline.

After World War II
In the USA
After their emigration to the USA, the founding fathers of Gestalt psychology did not perform
much new experimental work. Instead, they mainly wrote books in which they outlined their
views (e.g., Koffka, 1935; Köhler, 1940; Wertheimer, 1945). The big exception was Köhler, who had
taken up physiological psychology, using EEGs and other methods in an attempt to verify his iso-
morphism postulate directly. Initially, his results with Hans Wallach on so-called figural afteref-
fects appeared to support his interpretation in terms of satiation effects of direct cortical currents
(Köhler & Wallach, 1944). Afterwards, he was able to measure cortical currents directly – as EEG
responses picked up from electrodes at the scalp – which flowed in directions corresponding to
bright objects moving in the visual field (Köhler & Held, 1949).
However, soon after that breakthrough, Lashley and colleagues (Lashley et al., 1951) per-
formed a more critical test of Köhler’s electric field theory (and its underlying postulate of iso-
morphism). If the flows of current picked up from the scalp in Köhler and Held’s experiments
were supposed to reflect the organized pattern of perception and not merely the applied stimu-
lation, and if that pattern of perception resulted from a global figure-field across the whole
cortex, a marked alteration of the currents should distort visual figures and make them unrec-
ognizable. By inserting metallic strips and metal pins in large regions of the visual cortex of rhe-
sus monkeys, Lashley et al. could short-circuit the cortical currents. Surprisingly, the monkeys
could still perform the learned shape discriminations, which demonstrated that global cortical
currents were not necessary for pattern perception. In subsequent experiments, Sperry and
colleagues (Sperry et al., 1955) performed extensive subpial slicing and dense impregnation
with metallic wires across the entire visual cortex of cats, and showed that these animals too
could still perform rather difficult shape discriminations (e.g., between a prototypical triangle
and several different ones with small distortions). Together, these two studies effectively ruled
out electrical field theory as an explanation of cortical integration and, therefore, removed the
empirical basis of isomorphism between cortical flows of current and organized patterns of
perception.
Of course, Köhler (1965) reacted to these experiments. Lashley’s experiments he rejected
because he thought that the inserted gold foils had probably depolarized at once, which would
have made them incapable of conducting and deflecting the cortical currents, and thus of
disturbing pattern vision. Sperry’s results he found too good to be acceptable as reliable evidence. Based
on the many deep cuts in large parts of the visual cortex, the cats should have been partially
blind when they were tested, and yet they made very few mistakes on these difficult discrimi-
nation tasks. Because the learning was initially already so difficult (forcing reliance on local
details), the animals probably learned to react not only to visual cues associated with the pro-
totypical test figure (which was repeated over and over again), but to other, non-visual cues
(e.g., smell) as well. The necessary methodological precautions to rule out these alternative cues

(e.g., changing all objects from trial to trial) had not been taken. However, Köhler’s rather con-
vincing counter-arguments and suggestions for further experiments were largely ignored, and
for most scientists at the time (especially, for physiological psychologists), the matter was closed
and electrical field theory, which was one of the pillars of Gestalt psychology’s scientific basis,
was considered dead and buried.

In Germany
In Germany, Gestalt psychology made little further progress after World War II.
Under Metzger’s guidance, the Psychological Institute in Münster became the largest in Western
Germany in 1965. This had much to do with Metzger’s public defense of experimental psychology,
presenting Gestalt theory as a humanistic worldview, based on experimental science. Metzger also
worked steadily to develop links with American psychologists, but that involvement did not actu-
ally rehabilitate the Gestalt position because, in doing so, he conceded much to conventional views
of machine modelling as causal explanation. In contrast to Metzger’s broad range and willingness
to address non-academic audiences, Rausch devoted nearly all of his publications to extremely
exact phenomenological illumination and conceptual clarification of issues from Gestalt theory.
For instance, in a major essay on the problem of qualities or properties in perception (Rausch,
1966), he provided an exhaustive taxonomy of Gestalt qualities (in von Ehrenfels’s sense) and
whole qualities (in Wertheimer’s sense), and he argued that whether a given complex is a Gestalt
or not is not a yes-or-no decision, but a matter of gradations on a continuum. Gottschaldt focused
mainly on clinical psychology.

Elsewhere
While Gestalt psychology declined in the English-speaking world after World War II, Italy remained
a stronghold of Gestalt psychology. For instance, Wolfgang Metzger, the most important and
orthodox Gestalt psychologist in Germany at the time, dedicated his ‘Gesetze des Sehens’ (3rd
edn, 1975) to the memory of his ‘Italian and Japanese friends.’ Among his friends were Musatti,
Metelli, and Kanizsa, three major figures in Italian psychology. In spite of being Benussi’s student
and successor (from the Graz school), Cesare Musatti was responsible for introducing the Berlin
school’s Gestalt theory in Italy and training important students in this tradition, most notably
Metelli and Kanizsa, whose contribution continues to be felt today (see Bertamini & Casati, this
volume; Vezzani, Kramer, & Bressan, this volume; Bruno & Bertamini, this volume; Gerbino,
this volume; Kogo & van Ee, this volume; van Lier & Gerbino, this volume). Fabio Metelli is best
known for his work on the perception of transparency (e.g., Metelli, 1974). Gaetano Kanizsa’s
most famous work was performed in the 1950s with papers on subjective contours, modes of
color appearance, and phenomenal transparency (Kanizsa, 1954, 1955a, b; all translated into
English in 1979).
In the edited volume, ‘Documents of Gestalt psychology’ (Henle, 1961), the most important col-
lection of Gestalt work from the 1940s and 1950s, no Italian work was included. Although it
was not recognized by the emigrated German psychologists in the USA, the work put forward
by the Italian Gestalt psychologists was in many respects very orthodox Gestalt psychology. For
instance, Kanizsa (1955b/1979) took the phenomenon of ‘subjective contours,’ already pointed
out by Friedrich Schumann (1900), and gave a Gestalt explanation of the effect in terms of the
tendency toward Prägnanz. He showed how the contour could affect the brightness of an area,
just as Berlin Gestaltists had shown that contour could affect the figural character of an area.
Kanizsa (1952) even published a polemic against stage theories of perception, in which he argued
that, since according to Gestalt principles perception was caused by simultaneous autonomous

processes, it was meaningless to hypothesize perceiving as a stage-like process. This work symbol-
ized his complete separation from Graz thinking. In fact, one could talk about this tradition as the
Padua–Trieste school of Gestalt psychology (see Verstegen, 2000).
Besides Italy, Gestalt psychology was also strong in Belgium and in Japan. Albert
Michotte became famous with his work on the perception of causality (1946/1963), in which
he could demonstrate that even a seemingly cognitive inference like causality could be linked
directly to specific higher-order attributes in the spatiotemporal events presented to observers.
This work was very much in the same spirit as work by Fritz Heider on perceived animacy
and attribution of intentions (Heider, 1944; Heider & Simmel, 1944), which was the empirical
basis for his later attribution theory (Heider, 1958). Together with his coworkers, Michotte
also introduced the notions of modal and amodal completion (Michotte et al., 1964), and
studied several configural influences on these processes (for a further discussion of Michotte’s
heritage, see Wagemans et al., 2006). Building on earlier collaborations of Japanese students
with major German Gestalt psychologists (e.g., Sakuma with Lewin, Morinaga with Metzger),
Gestalt psychology continued to develop further in Japan after World War II. For instance,
Tadasu Oyama did significant work on figural aftereffects (e.g., Sagara & Oyama, 1957) and
perceptual grouping (e.g., Oyama, 1961). The Gestalt tradition is still continued in Japanese
perceptual psychology today (e.g., Noguchi et al., 2008), especially in their work on visual illu-
sions (e.g., Akiyoshi Kitaoka).

Historical Evaluation of Gestalt Psychology


Despite signs of well-deserved respect in the USA and in Germany (e.g., Köhler’s honorary degrees
in 1967 and his APA presidency in 1957; Wertheimer’s posthumous Wilhelm Wundt Medal of the
German Society for Psychology in 1983), the Gestalt theorists’ ideas were ambivalently received.
They raised central issues and provoked important debates in psychology, theoretical biology, and
other fields, but their mode of thinking and research style accommodated uncomfortably to the
intellectual and social climate of the post-war world. Two explanations have been given for this
outcome (Ash, 1995).
One emphasizes institutional, political, and biographical contingencies. For example, Kurt
Koffka received insufficient funding for his Giessen institute in the 1920s and the remaining lead-
ers were cut off from their bases in Berlin and Frankfurt while they were still in their prime.
The Gestalt school suffered severe personal blows with the early deaths of Wertheimer in 1943,
Koffka in 1941, Gelb in 1935, and Lewin in 1947. In addition, three of Köhler’s most outstanding
students – Karl Duncker, Otto Lauenstein, and Hedwig von Restorff – all died young. After they
left Germany, the founders of Gestalt theory all obtained positions where they could do excellent
research, but could not train PhDs. The situation in Germany was different: Metzger, Rausch, and
Gottschaldt produced more students between them than Köhler, Koffka, and Wertheimer did, but
relatively few carried on in the Gestalt tradition. They all broadened the scope of their research
portfolio well beyond traditional Gestalt topics, in the direction of developmental psychology,
educational psychology, sport psychology, personality, clinical psychology, psychotherapy, and so
forth.
The second explanation concerns conceptual issues. The strengths and limitations of Gestalt
theory determined both how well it could live up to its creators’ own hopes for a new scientific
worldview, and how well their students could adapt to social and cultural change. For instance,
one of the issues that did not fit the Gestalt approach well was language. The reason for this is
clear. In psychologies and epistemologies based on rationalist categories, language constitutes

meaning. For Gestalt theory, in contrast, language expresses meaning that is already there in
the appearance or in the world (e.g., Pinna, 2010). Orthodox Gestalt theorists also refrained
from applying Gestalt thinking to personality and social psychology, fearing a lack of rigor. The
preferred route to such extensions was analogy or metaphor, and the further the metaphors
were stretched, the harder it became to connect them with Köhler’s concept of brain action. As
the work of Rudolf Arnheim on expression and art, and of Kurt Lewin on action and emotion
showed, extensions of the Gestalt approach were possible so long as one separated them from
Köhler’s psychophysics. Further extensions in that direction were largely an American phenom-
enon (e.g., Solomon Asch).
Ultimately decisive in the further decline of Gestalt theory was a meta-theoretical impasse
between its theoretical and research styles and those of the rest of psychology. Gestalt theory was
and remains interesting because it was a revolt against mechanistic explanations in science, as well
as against the non-scientific flavor of holism. Especially after 1950, its critics increasingly insisted
on causal explanations, by which they meant positing cognitive operations in the mind or neural
mechanisms in the brain. As sophisticated as the Gestalt theorists were in their appreciation of
the way order emerges from the flow of experience, one must ask how such a process philosophy
can be reconciled with strict causal determination, as Köhler at least wished to do. Koffka tried to
accomplish this feat by insisting that the very principles of simplicity and order that the Gestalt
theorists claimed to find in experience should also be criteria for evaluating both descriptions
and explanations. For him, the best argument for isomorphism was his desire for one universe of
discourse. Koffka and his co-workers never succeeded in convincing their colleagues that it was
logically necessary or scientifically fruitful to think that the external world, its phenomenal coun-
terpart, and the brain events mediating interactions between them, all have the same structure or
function, according to the same dynamical principles.
James J. Gibson (1971) has written that the question Koffka asked in his ‘Principles of Gestalt
Psychology’ – ‘Why do things look as they do?’ – has fundamentally reshaped research on percep-
tion. In the last two decades, central issues of Berlin school research, such as perceptual grouping
and figure-ground organization, have returned to centre stage (e.g., Kimchi et al., 2003; see also
Wagemans et al., 2012a, for a recent review), although concepts of top-down processing offered
to deal with the question have at best a questionable relationship to Gestalt theory. The status of
Wertheimer’s Gestalt laws and particularly of the so-called minimum principle of Prägnanz he
enunciated remains contested, which is another way of saying that the issues involved are still
important (e.g., Hatfield & Epstein, 1985; see also Wagemans et al., 2012b; van der Helm, this vol-
ume). Although it may be true that the Gestalt theorists failed to develop a complete and accept-
able theory to account for the important phenomena they adduced, it is also true that no one else
has either. The challenges for contemporary vision scientists are still significant.

Acknowledgments
I am supported by long-term structural funding from the Flemish Government (METH/08/02).

References
Albertazzi, L. (2001). The legacy of the Graz psychologists. In The School of Alexius Meinong, edited by
L. Albertazzi, D. Jacquette, & R. Poli, pp. 321–345. Farnham: Ashgate Publishing Ltd.
Ash, M. G. (1995). Gestalt Psychology in German Culture, 1890–1967: Holism and the Quest for Objectivity.
Cambridge: Cambridge University Press.

Benary, W. (1924). Beobachtungen zu einem Experiment über Helligkeitskontrast [Observations
concerning an experiment on brightness contrast]. Psychol Forsch 5(1), 131–142.
Boudewijnse, G. (1999). The rise and fall of the Graz school. Gestalt Theory 21, 140–158.
Brown, J. F. (1931a). The visual perception of velocity. Psychol Forsch 14, 199–232.
Brown, J. F. (1931b). On time perception in visual movement fields. Psychol Forsch 14, 233–248.
Brown, J. F. (1931c). The thresholds for visual movement. Psychol Forsch 14, 249–268.
Brown, J. F., & Voth, A. C. (1937). The path of seen movement as a function of the vector-field. Am J
Psychol 49, 543–563.
Duncker, K. (1929). Über induzierte Bewegung (Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung) [Concerning induced movement (Contribution to the theory of visually perceived
movement)]. Psychol Forsch 12, 180–259.
Ellis, W. D. (1938). A Source Book of Gestalt Psychology. New York/London: Harcourt, Brace and Company/
Routledge & Kegan Paul.
Gelb, A., & Granit, R. (1923). Die Bedeutung von ‘Figur’ und ‘Grund’ für die Farbenschwelle [The
significance of figure and ground for the color thresholds]. Zeitschr Psychol 93, 83–118.
Gibson, J. J. (1971). The legacies of Koffka’s principles. J Hist Behav Sci 7, 3–9.
Goldmeier, E. (1937). Über Ähnlichkeit bei gesehenen Figuren. Psychol Forsch 21(1), 146–208. [Translation
reprinted as ‘Similarity in visually perceived forms’ (1972). Psychol Issues, 8 (1, Monograph 29)].
Gottschaldt, K. (1926). Über den Einfluß der Erfahrung auf die Wahrnehmung von Figuren. I. Über den
Einfluß gehäufter Einprägung von Figuren auf ihre Sichtbarkeit in umfassenden Konfigurationen
[About the influence of experience on the perception of figures, I]. Psychol Forsch 8, 261–317.
Gottschaldt, K. (1929). Über den Einfluß der Erfahrung auf die Wahrnehmung von Figuren. II.
Vergleichende Untersuchungen über die Wirkung figuraler Einprägung und den Einfluß spezifischer
Geschehensverläufe auf die Auffassung optischer Komplexe [About the influence of experience on the
perception of figures, II]. Psychol Forsch 12, 1–87.
Grelling, K., & Oppenheim, P. (1938). The concept of Gestalt in the light of modern logic. In Foundations
of Gestalt Theory, edited by B. Smith, pp. 191–209. Munich: Philosophia Verlag.
Hartmann, G. W. (1935). Gestalt Psychology: A Survey of Facts and Principles. New York: Ronald Press.
Hatfield, G., & Epstein, W. (1985). The status of the minimum principle in the theoretical analysis of visual
perception. Psychol Bull 97, 155–186.
Heider, F. (1944). Social perception and phenomenal causality. Psychol Rev 51, 358–374.
Heider, F. (1958). The Psychology of Interpersonal Relations. New York: John Wiley & Sons.
Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. Am J Psychol 57, 243–259.
Helson, H. (1933). The fundamental propositions of Gestalt psychology. Psychol Rev 40, 13–32.
Henle, M. (Ed.). (1961). Documents of Gestalt Psychology. Berkeley: University of California Press.
Hsiao, H. H. (1928). A suggestive review of Gestalt theory. Psychol Rev 35, 280–297.
Kanizsa, G. (1952). Legittimità di un’analisi del processo percettivo fondata su una distinzione in fasi o stadi
[Legitimacy of an analysis of the perceptual process based on a distinction of phases or stages]. Arch
Psicol Neurol Psichiat 13, 292–323.
Kanizsa, G. (1954). Alcune osservazioni sull’ effetto Musatti. Arch Psicol Neurol Psichiat 15, 265–271.
[Translation reprinted as ‘Some observations on color assimilation’. In Organization in Vision: Essays on
Gestalt Perception, edited by G. Kanizsa (1979), pp. 143–150. New York: Praeger Publishers.]
Kanizsa, G. (1955a). Condizioni ed effetti della trasparenza fenomenica. Riv Psicol 49, 3–18. [Translation
reprinted as ‘Phenomenal transparency’. In Organization in Vision: Essays on Gestalt Perception, edited
by G. Kanizsa (1979), pp. 151–169. New York: Praeger Publishers.]
Kanizsa, G. (1955b). Margini quasi-percettivi in campi con stimolazione omogenea [Quasi-perceptual
margins in homogeneously stimulated fields]. Riv Psicol 49, 7–30.
18 Wagemans

Kanizsa, G. (1979). Organization in Vision: Essays on Gestalt Perception. New York: Praeger.
Kardos, L. (1934). Ding und Schatten: Eine experimentelle Untersuchung über die Grundlagen des
Farbensehens [Object and shadow]. Zeitschr Psychol 23, 1–184.
Kimchi, R., Behrmann, M., & Olson, C. R. (Eds.). (2003). Perceptual Organization in Vision: Behavioral and
Neural Perspectives. Mahwah: Erlbaum.
Koffka, K. (1914). Die Psychologie der Wahrnehmung [Psychology of Perception]. Die Geisteswissenschaft
26 and 29, 711–716, and 796–800.
Koffka, K. (1915). Beiträge zur Psychologie der Gestalt. III. Zur Grundlegung der
Wahrnehmungspsychologie. Eine Auseinandersetzung mit V. Benussi. Zeitschr Psychol 73, 11–90.
[Translated extract reprinted as ‘Contributions to Gestalt psychology. III. Toward a foundation of the
psychology of perception. A debate with V. Benussi’. In A Source Book of Gestalt Psychology, edited by
W. D. Ellis (1938), pp. 371–378. London: Routledge & Kegan Paul Ltd.]
Koffka, K. (1922). Perception: an introduction to the ‘Gestalt-Theorie’. Psychol Bull 19, 531–585.
Koffka, K. (1923). Über Feldbegrenzung und Felderfüllung [On filling-in and boundaries of visual fields].
Psychol Forsch 4, 176–203.
Koffka, K. (1935). Principles of Gestalt Psychology. London: Lund Humphries.
Köhler, W. (1913). Über unbemerkte Empfindungen und Urteilstäuschungen [On unnoticed sensations and
errors of judgment]. Zeitschr Psychol 66, 51–80.
Köhler, W. (1920). Die physischen Gestalten in Ruhe und im stationären Zustand. Eine natur-philosophische
Untersuchung. Braunschweig, Germany: Friedr. Vieweg und Sohn. [Translated extract reprinted as
‘Physical Gestalten’. In A Source Book of Gestalt Psychology, edited by W. D. Ellis (1938), pp. 17–54.
London: Routledge & Kegan Paul Ltd.]
Köhler, W. (1940). Dynamics in Psychology. New York: Liveright.
Köhler, W. (1965). Unsolved problems in the field of figural after-effects. Psychol Record 15, 63–83.
Köhler, W., & Held, R. (1949). The cortical correlate of pattern vision. Science 110, 414–419.
Köhler, W., & Wallach, H. (1944). Figural after-effects: an investigation of visual processes. Proc Am
Philosoph Soc 88, 269–357.
Kopfermann, H. (1930). Psychologische Untersuchungen über die Wirkung zweidimensionaler
Darstellungen körperlicher Gebilde [Psychological studies on the effect of two-dimensional
representations of physical structures]. Psychol Forsch 13(1), 293–364.
Lashley, K. S., Chow, K. L., & Semmes, J. (1951). An examination of the electrical field theory of cerebral
integration. Psychol Rev 58, 123–136.
Liebmann, S. (1927). Über das Verhalten farbiger Formen bei Helligkeitsgleichheit von Figur und
Grund [Behavior of colored forms with equiluminance of figure and ground]. Psychol Forsch 9(1),
300–353.
Mandler, G. (2002). Psychologists and the National Socialist access to power. Hist Psychol 5, 190–200.
Metelli, F. (1974). The perception of transparency. Scient Am 230, 90–98.
Metzger, W. (1930). Optische Untersuchungen am Ganzfeld. II. Zur Phänomenologie des homogenen
Ganzfeldes [Optical investigations of the Ganzfeld. II. Toward the phenomenology of the homogeneous
Ganzfeld]. Psychol Forsch 13, 6–29.
Metzger, W. (1934). Beobachtungen über phänomenale Identität [Observations on phenomenal identity].
Psychol Forsch 19, 1–60.
Metzger, W. (1936). Gesetze des Sehens. Frankfurt am Main: Kramer. [Translation reprinted as Laws of
Seeing, translated by L. Spillmann, M. Wertheimer, & S. Lehar (2006). Cambridge, MA: MIT Press].
Metzger, W. (1941). Psychologie: Die Entwicklung ihrer Grundannahmen seit der Einführung des Experiments
[Psychology: The Development of Basic Principles Since the Introduction of the Experimental Method].
Darmstadt: Verlag von Dr. Dietrich Steinkopff.
Metzger, W. (1975). Gesetze des Sehens, 3rd edn. Frankfurt am Main: Kramer.
Michotte, A. (1963). The Perception of Causality, translated by T. R. Miles & E. Miles. New York: Basic
Books. (Original work published 1946.)
Michotte, A., Thinès, G., & Crabbé, G. (1964). Les compléments amodaux des structures perceptives [Amodal
Completion of Perceptual Structures]. Leuven: Publications Universitaires de Louvain.
Müller, G. E. (1904). Die Gesichtspunkte und die Tatsachen der psychophysischen Methodik [Viewpoints
and the facts of psychophysical methodology]. In Ergebnisse der Physiologie, Vol. II, Jahrgang, II,
Abteilung Biophysik und Psychophysik, edited by L. Asher & K. Spiro, pp. 267–516. Wiesbaden:
J. F. Bergmann.
Noguchi, K., Kitaoka, A., & Takashima, M. (2008). Gestalt-oriented perceptual research in Japan: past
and present. Gestalt Theory 30, 11–28.
Orbison, W. D. (1939). Shape as a function of the vector-field. Am J Psychol 52, 31–45.
Oyama, T. (1961). Perceptual grouping as a function of proximity. Percept Motor Skills 13, 305–306.
Pinna, B. (2010). New Gestalt principles of perceptual organization: an extension from grouping to shape
and meaning. Gestalt Theory 32, 11–78.
Prinz, W. (1985). Ganzheits- und Gestaltpsychologie und Nationalsozialismus [Holistic and Gestalt
psychology and National Socialism]. In Wissenschaft im Dritten Reich [Science in the Third Reich],
edited by P. Lundgreen, pp. 55–81. Frankfurt: Suhrkamp.
Rausch, E. (1937). Über Summativität und Nichtsummativität [On summativity and nonsummativity].
Psychol Forsch 21, 209–289.
Rausch, E. (1966). Das Eigenschaftsproblem in der Gestalttheorie der Wahrnehmung. [The problem of
properties in the Gestalt theory of perception]. In Handbuch der Psychologie: Vol. 1: Wahrnehmung
und Bewusstsein [Handbook of psychology: Vol. 1 Perception and consciousness] edited by W. Metzger &
H. Erke, pp. 866–953. Göttingen, Germany: Hogrefe.
Rubin, E. (1915). Synsoplevede Figurer. Studier i psykologisk Analyse /Visuell wahrgenommene Figuren.
Studien in psychologischer Analyse [Visually perceived figures. Studies in psychological analysis].
Copenhagen, Denmark/Berlin, Germany: Gyldendalske Boghandel.
Sagara, M., & Oyama, T. (1957). Experimental studies on figural after-effects in Japan. Psychol Bull 54,
327–338.
Schumann, F. (1900). Beiträge zur Analyse der Gesichtswahrnehmungen. I. Einige Beobachtungen über
die Zusammenfassung von Gesichtseindrücken zu Einheiten [Contributions to the analysis of visual
perception. I. Some observations on the combination of visual impressions into units]. Zeitschr Psychol
Physiol Sinnesorgane 23, 1–32.
Sekuler, R. (1996). Motion perception: a modern view of Wertheimer’s 1912 monograph. Perception 25,
1243–1258.
Smith, B. (1988). Foundations of Gestalt Theory. Munich: Philosophia Verlag.
Sperry, R. W., Miner, N., & Myers, R. E. (1955). Visual pattern perception following subpial slicing and
tantalum wire implantations in the visual cortex. J Comp Physiol Psychol 48, 50–58.
Steinman, R. M., Pizlo, Z., & Pizlo, F. J. (2000). Phi is not beta, and why Wertheimer’s discovery launched
the Gestalt revolution. Vision Res 40, 2257–2264.
Ternus, J. (1926). Experimentelle Untersuchungen über phänomenale Identität. Psychol Forsch 7, 81–136.
[Translated extract reprinted as ‘The problem of phenomenal identity’. In A Source Book of Gestalt
Psychology, edited by W. D. Ellis (1938), pp. 149–160. London: Routledge & Kegan Paul Ltd.]
Verstegen, I. (2000). Gestalt psychology in Italy. J Hist Behav Sci 36, 31–42.
Vezzani, S., Marino, B. F. M., & Giora, E. (2012). An early history of the Gestalt factors of organization.
Perception 41, 148–167.
von Ehrenfels, C. (1890). Über ‘Gestaltqualitäten’. Vierteljahrsschr wissenschaftl Philosoph 14, 224–292.
[Translated as ‘On ‘Gestalt qualities’. In Foundations of Gestalt theory, edited and translated by B. Smith
(1988), pp. 82–117. Munich, Germany/Vienna, Austria: Philosophia Verlag.]
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R.
(2012a). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground
organization. Psychol Bull 138(6), 1172–1217.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P., & van
Leeuwen, C. (2012b). A century of Gestalt psychology in visual perception: II. Conceptual and
theoretical foundations. Psychol Bull 138(6), 1218–1252.
Wagemans, J., van Lier, R., & Scholl, B. J. (Eds.). (2006). Introduction to Michotte’s heritage in perception
and cognition research. Acta Psychol 123, 1–19.
Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung [On visually perceived direction of
motion]. Psychol Forsch 20(1), 325–380.
Wallach, H., & O’Connell, D. N. (1953). The kinetic depth effect. J Exp Psychol 45(4), 205–217.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung. Zeitschr Psychol 61,
161–265. [Translated as ‘Experimental studies on seeing motion’. In On Motion and Figure-ground
Organization edited by L. Spillmann (2012), pp. 1–91. Cambridge, MA: MIT Press.]
Wertheimer, M. (1922). Untersuchungen zur Lehre von der Gestalt, I: Prinzipielle Bemerkungen. Psychol
Forsch 1, 47–58. [Translated extract reprinted as ‘The general theoretical situation,’ in A Source Book of
Gestalt Psychology, edited by W. D. Ellis (1938), pp. 12–16. London: Routledge & Kegan Paul Ltd.]
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II. Psychol Forsch 4, 301–350.
[Translated as ‘Investigations on Gestalt principles, II,’ in On Motion and Figure-ground Organization
edited by L. Spillmann (2012), pp. 127–182. Cambridge, MA: MIT Press.]
Wertheimer, M. (1945). Productive Thinking. New York: Harper & Brothers Publishers.
Wulf, F. (1922). Beiträge zur Psychologie der Gestalt; VI Über die Veränderung von Vorstellungen
(Gedächtnis und Gestalt). Psychol Forsch 1, 333–373. [Translated extract reprinted as ‘Tendencies in
figural variation’. In A Source Book of Gestalt Psychology, edited by W. D. Ellis (1938), pp. 136–148.
London: Routledge & Kegan Paul Ltd.]
Wyatt, F., & Teuber, H. L. (1944). German psychology under the Nazi system: 1933–1940. Psychol Rev 51,
229–247.
Chapter 2

Philosophical background:
Phenomenology
Liliana Albertazzi

Verae philosophiae methodus nulla alia nisi scientia naturalis est
[The true method of philosophy is none other than that of natural science]
(Brentano, IV Habilitationsthesen)

The Philosophical Origins


Phenomenology, understood as the science of phenomena, appearances, or subjective experiences,
was born as a philosophical theory. It is a complex neo-Aristotelian theory that first originated
in the empirical and descriptive psychology of Brentano (Brentano, 1874/1995a, 1976/1988),
although it is generally best known in the version developed by Husserl (1913/1989). Husserl’s
analysis, however, for a series of reasons, remained essentially theoretical. Apart from a few cases
(Merleau-Ponty, Ingarden, Becker, Schütz, Gurwitsch (1966)), the majority of Husserl’s successors
(Heidegger and Sartre, Derrida, Levinas, Ricoeur, Henry, Marion) abandoned the contact with
the sciences and the problem of their foundation—aspects that were fundamental for Husserl
(see Spiegelberg, 1982).
When in 1874 Brentano introduced the notion of intentional reference in his Psychology from
an Empirical Standpoint (PES), he might not have immediately foreseen all the consequences that
would ensue from that particular, and so ambiguous, passage in his book. And yet it sparked a
surprising intellectual debate and gave rise, through Stumpf and Meinong, two of his best pupils,
to an astonishing flourishing of experimental research in the Berlin and Graz schools of Gestalt
psychology (Albertazzi, 2001c; Wagemans et al., 2012), of which the basis was that perceiving,
grounded on the subjective, inner space-time dynamics of psychic presentations, is the perceiving
of appearances.
Described in what follows are those aspects of the Brentanian theory that drove the devel-
opment of experimental studies in perception, and mainly in vision. Descriptive psychology,
in fact, was the origin of, and the first systematic effort in, experimental phenomenology (see
Koenderink’s chapter, this volume; Albertazzi, 2013). The extreme complexity of the theory,
however, extends far beyond a summary of what is known to be Brentano’s contribution to the
science of psychology, although it was constrained to perception studies. The reader is invited
to refer on individual points to the literature cited (for a general introduction to Brentano and
related literature see Albertazzi, 2006a).

Presentations
In PES Brentano defines the nature of the psychic phenomena (Vorstellungen) as acts (i.e.
processes) of psychic energy (a sort of Jamesian flow of awareness; hence James’s esteem for Brentano
as expressed in James, 1890/1950, I, p. 547). Presentations may originate either in perception (as
seeing, noticing, observing, etc.), or in the phantasy, generally understood in terms of the capacity
to present or to visualize (when thinking, remembering, imagining, etc.).
Presentations usually do not exist on their own but in the context of other intentional modali-
ties like judgements and phenomena of interest, founded on presentations themselves. Whatever
their occurrence, and however complex simultaneously occurring psychic phenomena may be,
conscious experience is always unitary, because the acts are unitarily directed to the same object
(say, a landscape) and because individually they are partial phenomena (non-detachable parts)
of a single whole, i.e. of actual presenting. In Brentano’s theory, in fact, consciousness is not
‘a summative bundle’ (Hume, 1739/2007) in which perceptions arise as parcelled pieces or sensations,
to be later associated with each other according to traces of earlier perceptions, memory, etc.
(Wertheimer, 1925b/1938, p. 12). A bundle, as Brentano observes, ‘strictly speaking requires a rope or wire or something else
binding it together’; on the other hand consciousness consists of a multitude of internally related
parts (Brentano, 1995b, pp. 13–14).
As to perceiving, in Brentanian terms it consists neither in the symbolic or probabilistic rep-
resentation of an objective external physical reality, as for example assumed by the inferential
approach (Marr, 1982; Rock, 1983), nor in a direct or indirect resonance of such a reality due to
action, as for example assumed in the Gibsonian (Gibson, 1979) and enactive approaches (Noë,
2004) to perception. The ecological approach to vision still plays an important role in current
studies of perception (Koenderink, 1990; Lappin et al., 2011; Mace, 1977; Todd, 2004; Warren,
2005, 2006), and it is certainly closer to a Brentanian viewpoint than inferentialism; however, in
the Brentanian stance, one perceives qualitative wholes, not physical entities or physical invari-
ants. As to inferentialism, in the Brentanian framework this plays a role only insofar as the nature
of the transcendent world is concerned: in fact, appearances, the sole objects of our experience,
have only an extrinsic relationship with entities and unknown processes (PES, p. 129). Contrary
to inferentialism, however, a descriptive approach does not need to verify/justify the veridicality
or illusoriness of appearances with respect to the stimuli, because appearances are experienced
as evidently given in actual perceiving: at issue is the coherence of the structure, not the so-called
veridicality of the objects (Brentano, 1874/1995a).
Brentano identifies the essential characteristic of intentional presentation in its being directed
towards an inner object of some kind. As he writes in a celebrated but dense passage:

Every psychic phenomenon is characterized by what the medieval scholastics termed the intentional
(i.e. mental) in/existence of an object and which I shall call, albeit using expressions not devoid of
ambiguity, reference to a content, directedness towards an object (Objectum) (which should not be
taken to be real), or immanent objectivity. Every psychic phenomenon contains something in itself as
an object (Gegenstand), although each of them does not do so in the same way. In presentation some-
thing is presented, in judgement something is accepted or rejected, in love something is loved, in hate
hated, in desire desired, etc.
(PES, p. 88).

Brentano was clearly aware from the outset of an intrinsic ambiguity in this formulation,
which was exacerbated by the medieval implications of the term intentional, whether or
not it implied an act of will related to a goal, i.e., an ‘intention’ as generally understood in
contemporary theory of intentionality; or whose behaviour, in modern parlance, could be
explained or predicted by relying on ascriptions to the system of beliefs and desires (and hopes,
fear, intentions, hunches as well, as in Dennett, 1978), or even in terms of a perception-action
relation (O’Regan and Noë, 2001).
One of the problems immediately raised by definitions of psychic phenomenon concerns the
relationship between the immanent object and the content of the presentation process, which are
often treated as synonyms by commentators (Höfler, 1897; Twardowski, 1894/1977; Husserl,
1896/1979; Passmore, 1968, p. 178). To greatly simplify the question, the distinction concerns, say,
the appearance of something like a red patch in seeing (‘Seeing a colour’, Brentano, 1874/1995a,
p. 79). Because a perceived surface, as a part of the visual space, is necessarily a coloured
appearance, a spatial quality and a red textured quality are both contents and object of a presentation
(concrescent, non-detachable parts in Brentano’s 1995b terminology) of the red patch as a whole.
Other distinctions concern the difference between seeing, thinking, remembering, judging, or
loving an object like a red patch, or a cat, which means having the same object in mind under
specific and different psychic relations. On seeing a cat, for example, the perceiver’s presenta-
tion grounds on specific shape perspectival aspects appearing in awareness: the cat being white/
grey/black, running/standing, stretched out or curled up, etc., i.e. all the partial contents of the
object of presentation ‘cat’ that directly offer the cues for it to be perceptually completed as either
a modal or amodal cat (Tse, 1998). Assuming this standpoint means conceiving human experi-
ences as based on internal mental forms, be they figural patterns and/or colour appearances (see
Smithson’s chapter, this volume).

Experimental phenomenology
In Brentano’s approach the world is built from within, but not in a neurophysiological sense.
Neurophysiological aspects are not relevant to this kind of inquiry, which concerns itself only with
the modes of appearance of perceptive objects (on the relation between phenomenology of appear-
ances and neuroscience see Spillmann and Ehrenstein, 2004; Spillmann, 2009). What Brentano
affirms is that the world of experience is reducible neither to external nor to internal physiological
psychophysics (Wackermann, 2010): it is a primary, conscious, evident, qualitative level made up
of perception of colours, shapes, landscapes, movements, cats, and so on. This also means that
information is qualitative, immediately given, and endowed with meaning, not a product of the
computational retrieval and elaboration of stimuli. These are also the main tenets of an experi-
mental phenomenology focused on qualitative perceiving and its laws.
As Kanizsa put it:
The goal pursued by experimental phenomenology does not differ from that of other sectors of psy-
chology: discovery and analysis of necessary functional connections among visual phenomena, identi-
fication of the conditions that help or hinder their appearance or the degree of their evidence, in other
words: determination of the laws which the phenomenological field obeys. And this without leaving the
phenomenal domain; without, that is, referring to the underlying neurophysical processes (to a large
extent unknown) or to the concomitant non-visual psychological activities (logical, mnestic, affective
activities which are just as enigmatic as vision itself). The influence of such processes and activities cer-
tainly cannot be denied, but they must not be identified with seeing . . . The experimental phenomenol-
ogy of vision is not concerned with the brain but with that result of the brain’s activity that is seeing.
This is not a second-best choice justified by the slowness of progress in neurophysiological research and
its uncertain prospects, it is a methodological option taken for specific epistemological reasons. And
mainly the conviction that the phenomenal reality cannot be addressed and even much less explained
with a neuro-reductive approach because it is a level of reality which has its own specificity, which
requires and legitimates a type of analysis suited to its specificity. The knowledge obtained in this way
is to be considered just as scientific as the knowledge obtained in any other domain of reality with
methods commensurate to that domain.
(Kanizsa, 1991, pp. 43–44; emphasis added).

In other words, phenomenological description comes first and it is also able to explain the laws
of seeing as the conditions governing appearances in visual space. The point has also been stressed
by Metzger when describing the task and method of an experimental phenomenology:
. . . we have proceeded exclusively and without any glance into physics, chemistry, anatomy, and physi-
ology, from within, from the immediate percept, and without even thinking of rejecting any aspect of
our findings or even changing its place, just because it does not fit with our contemporary knowledge of
nature so far. With our perceptual theory we do not bow to physiology, but rather we present challenges
to it. Whether physiology will be able to address these challenges, whether on its course, by external
observation of the body and its organs, it will be able to penetrate into the laws of perception, is point-
less to argue about in advance.
(Metzger, 1936/2006, p. 197).

A phenomenological approach to perception obviously does not deny the existence of stimuli,
but it treats them as external triggers and considers them extraneous to the phenomenological
level of analysis. Nor does it deny the psychophysical correlation between the stimulus and the
behavioural response, nor its measurement. In short, it does not deny classical psychophysics but
distinguishes among what pertains to psychophysics, what pertains to brain analysis, and what
pertains to a qualitative analysis of phenomena.
The Gestaltists adopted several features of the phenomenological method outlined by Brentano,
such as the description of the appearance of phenomena (Koffka, 1935, Part III). Katz, for
example, in his eidetic (Gestalt) analysis of colour, furnished an exemplary description of what is a
phenomenological variation (Husserl, 1913/1989, section 137) by showing that a particular
appearance of red is nothing but an instance of a certain shade of red in general (as pure colour)
and that there is a phenomenal difference between surface colours and film or volumetric colours
(Katz, 1935, Part I). Hering provided a psychological grounding for this method of analysis in
the first two chapters of his Outlines of a Theory of the Light Sense (Hering, 1920/1964), which led
to the recovery of the laws of opponency among the unique colours, which were subsequently
confirmed at the neurophysiological level (Hurvich and Jameson, 1955). Although further research has
cast doubt on some of the results obtained by neuroscientific investigation (Valberg, 1971, 2001),
it has not changed in the slightest the validity of Hering’s analysis at the phenomenological level,
nor of Brentano’s proposed methodology.

The Information Content of Presentation


However complex the riddle of the structural embedding of the act, content, and object in a
whole of presentation, as addressed in detail in Descriptive Psychology (Brentano, 1995b), may
seem at first sight, it highlights some aspects crucial for a science of experiential perceiving: for
example, the non-detachability of visual space and visual objects in the organization of
perception, as was later demonstrated (Koffka, 1935, Chapter 3; Kopfermann, 1930), and the fact
that qualities as they appear in configurations like ‘coloured patches’ or ‘cats’ are intrinsically
relational and cannot be analysed in atomistic terms, even less in terms of physical properties.
What constitutes the identity of phenomenal objects like a seen cat, which is of neither a logical
nor a physical kind but a whole made up of merely qualitative, internally-related appearances,
and what constitutes its phenomenal permanence in the flow of our awareness, are questions to
be explained. In fact, they were later addressed by, among others, Husserl (1966a/1991), Benussi
(1913), and Michotte (1950/1991).
It should also be noted that appearances in presentations may have stronger or weaker degrees
of intentional existence like that of a presented, remembered, or dreamed cat (Albertazzi, 2010).
For example, Metzger (1941/1963, Chapter 1) would later distinguish between an occurring event
(presented reality) and the same event represented (represented reality).
Consider a play, which takes place during a certain period of physical time, and is watched
‘live’ with a subjective experiencing that varies in relation to the spectator’s attention, interest,
and emotional involvement. Then consider the representation of the event in static photographic
images or as reported in a newspaper. Mainstream science represents events in a quantitatively
parametrized mode, but doing so involves structural changes in the lived experience.
A second difference within the level of phenomenal reality is given by the present reality in
its fullness, and by the reality that is equally given but present in the form of a lack, a void, or
an absence. Examples of this difference are almost structural at presentative level because of the
organization of appearances into figure/ground, so that in the visual field there is always a ‘double
presentation’ (Rubin, 1958). Other striking examples are provided by the phenomena of occlu-
sions, film colour, or the determinateness versus indeterminateness of colours, or the volume of a
half-full and half-empty glass.
A further difference within the phenomenal level of reality is that between forms of reality
that present themselves as phenomenally real and forms that present themselves as phenom-
enally apparent. In the latter case, they have a lower degree of phenomenal reality. Examples
are mirror images, after-images, and eidetic images, as well as hallucinations, delusions, illusions,
etc. A phenomenological conception is not a disjunctivist conception, as has sometimes been
argued (see for example Smith, 2008; for a review of the varieties of disjunctivism see: http://
plato.stanford.edu/entries/perception-disjunctive/). In fact, what is seen is only a difference in
the degree of reality among veridical, deceptive, and hallucinatory perceptions. This is because
the reality of an appearance is not classifiable in terms of its possible veridicality upon the
stimulus. As said, for Brentano a ‘physical phenomenon’ is the object of a presentation or an
appearance. A complex and paradigmatic example of this difference is provided by amodal
shadows, like those produced on the basis of anomalous contours in an unfolding stereokinetic
truncated cone (Albertazzi, 2004).
Perceptual appearances may also have different modalities of existence. One thinks of the amodal
triangle (Kanizsa), of the impossible triangle (Penrose), of the length of lines in the Müller-Lyer
illusion (1889), or of the size of the circles in the Ebbinghaus illusion (1902), or more simply of the
already mentioned diverse modes of appearance of colour (Katz, 1935), including their valence
characteristics in harmony, which is still a controversial topic (Allen and Guilford, 1936; Da Pos,
1995; Geissler, 1917; Granger, 1955; Guilford and Smith, 1959; Major, 1895; von Allesch, 1925a, b).
Distinguishing and classifying the multifarious variety of immanent object/s and content/s also
in regard to the different kinds of psychic processes (ranging among presentations, judgements,
emotional presentations, and assumptions) was the specific goal of both Twardowski (1894/1977)
and Meinong (1910), while the subjective space-time nature and internal dependence of act,
object, and content were the specific concern of Husserl’s, Meinong’s, and Benussi’s research, as
well as the phenomenological-experimental approach to the study of consciousness.

What is Physical in Qualitative Perceiving?


One of the most revolutionary aspects of Brentano’s theory concerns the distinction between what
should be understood as being psychic and what should be understood as being physical, in per-
ceiving. This distinction is still a matter of debate, and it may have significant potential for the
advancement of perception studies.
As Brentano wrote in another famous passage:
Every presentation which we acquire either through sense perception or imagination is an example
of a psychic phenomenon. By presentation I do not mean what is presented, but rather the act of
presentation. Thus, hearing a sound, seeing a coloured object, feeling warm or cold, as well as similar
states of imagination are examples of what I mean by this term. I also mean by it the thinking of a
general concept, provided such a thing actually does occur. Furthermore, every judgment, every rec-
ollection, every expectation, every inference, every conviction or opinion, every doubt, is a psychic
phenomenon. Also to be included under this term is every emotion: joy, sorrow, fear, hope, courage,
despair, anger, love, hate, desire, act of will, intention, astonishment, admiration, contempt, etc.
(Brentano, 1874/1995a, pp. 78–79, tr. slightly modified).

Brentano distinguished very clearly between psychic and physical phenomena. He wrote,
Examples of physical phenomena, on the other hand, are a colour, a figure, a landscape which I see, a
chord which I hear, warmth, cold, odour which I sense; as well as similar images which appear in the
imagination.
(Brentano, 1874/1995a, pp. 79–80).

Although his theory underwent subsequent developments, Brentano always maintained his
assumption that ‘psychic phenomena’ like a seeing, a feeling, a hearing, an imagining, and so on,
constitute what effectively exists in the strong sense (Brentano, 1982, p. 21). They are mental pro-
cesses, in fact, expressed in verbal form.
Psychic phenomena are essentially distinct from ‘physical phenomena’, which for Brentano
are immanent and intentional objects of the presentations themselves, i.e. appearances, and are
expressed in nominal form (Brentano, 1874/1995a, pp.  78–79). Essentially, physical phenom-
ena are composed of two non-detachable parts, i.e. phenomenal place and quality (Brentano,
1874/1995a, pp. 79–80; 1907/1979, p. 167; 1982, pp. 89, 159 ff.). For example, if two blue spots,
a grey spot, and a yellow one appear in the visual field, they differ as to colour and place; each
of the blue spots, in its turn, is different from the yellow and the grey one. But they are also dif-
ferent from each other because of a difference in place; colour and place, in fact, being two (dis-
tinctional) parts of the same visual phenomenon (Brentano, 1995b, p. 17 ff; Albertazzi, 2006a,
Chapter 4).
The point is important, because readers of whatever provenance easily misunderstand what
Brentano conceives to be physical phenomena, as distinguished from psychic phenomena, mostly
because of the equivocalness of the term ‘physical’. Given that the objects of a presentation are
wholly internal to the mental process, it is not surprising, in this framework, that a seen colour, a
heard sound, an imagined cat, a loved poem, etc. are conceived as the only ‘physical phenomena’
of our subjective experience. Brentano’s ‘sublunar Aristotelian physics’ is a physics of man, or an
observer-dependent physics (Koenderink, 2010). One might think that avoiding equivocalness
and, for example, speaking in terms of processes and appearances would be more fruitful for
understanding Brentano’s theory. However, one notes that a similar radical position was later
assumed by Hering when he addressed the nature of the visual world. In defining the nature of
objects in a visual presentation, Hering declares:

Colors are the substance of the seen object. When we open our eyes in an illuminated room, we see
a manifold of spatially extended forms that are differentiated or separated from one another through
differences in their colors . . . Colors are what fill in the outlines of these forms, they are the stuff out of
which visual phenomena are built up; our visual world consists solely of different formed colors; and
objects, from the point of view of seeing them, that is, seen objects, are nothing other than colors of dif-
ferent kinds and forms.
(Hering, 1920/1964, Chapter 1, p. 1; emphasis added).

Nothing could be more Brentanian than Hering’s account of vision, both from a psychological
and an ontological viewpoint. Interlocked perceptual appearances like colour, shape, and space,
in the Brentanian/Heringian framework, are in fact the initial direct information presented to us
in awareness (Albertazzi et al., 2013). They are not the primary properties of what are commonly
understood as physical entities, even though they are correlated with stimuli defined on the basis
of physics. Appearances in visual awareness are not simply representations of ‘external’ stimuli;
rather, they are internal presentations of active perceptual constructs, co-dependent on, but quali-
tatively unattainable through, a mere transformation of stimuli (see Mausfeld, 2010). For example,
the intentional object ‘horse’ is not the ‘represented horse’, but the inner object of whoever has it in
mind (Brentano, 1966/1979, pp. 119–121). The references of the phenomenal domain are not
located in the transcendent world but are the subjective, qualitative appearances produced by the
process of perceiving. Consequently, phenomena of occlusion, transparency, so-called illusions,
trompe l’oeil, and so on, because they are almost independent from external stimuli, are entirely
ordinary perceptive phenomena; they are not odd, deceptive perceptions as has been maintained
(Gregory, 1986). In fact, from the point of view of experience, appearances are prior to any
construction of physical theories: consider, for example, a visual point in which one can distinguish
between a where (the place in the field where the point appears) and a what (its ‘pointness’), some-
thing very dissimilar from the abstraction of a Euclidean point. We perceive the world and we do
so with evidence (the Brentanian concept of internal perception, innere Wahrnehmung) before
making of it an object of successive observations and scientific abstractions.

Psychology from a First Person Account


Descriptive Psychology (Brentano, 1995b) presents a sophisticated taxonomy of wholes and parts,
intended to lay down a science of the mental components of the process of intentional reference
and their laws of organization. Brentano painstakingly itemizes the different varieties of distinc-
tional parts of a psychic whole, not necessarily detachable, and how they relate to each other. For
example, he distinguishes between concrescent parts, like the place and colour of a patch and
parts of the psychic phenomenon regarding awareness of an object and self-awareness of being
conscious of it. Furthermore, he distinguishes between the different varieties of the detachability
that parts can undergo within the unitary consciousness: bilateral detachability as in simultane-
ously seeing and hearing; one-side detachability as between side-by-side red and yellow patches,
as separate instances of the common species ‘colour’, this being their logical part; or the one-side
detachability between a presentation and a phenomenon of interest. In so doing, he shows not
only the psychological but also the ontological nature of the processes and of the part-processes.
Thus, descriptive psychology plays the role of a general foundation of science.
Brentano, in fact, maintained that his descriptive psychology, i.e. a pure non-physiological psy-
chology, was far more advanced than physics, because it aimed systematically to describe, distin-
guish, and explain the nature of subjective experiences and their laws before they are correlated
with our conceiving and understanding of the transcendent world in terms of physics. In other
words, phenomenology ‘is prior in the natural order’ (Brentano, 1995b, pp. 8, 13), and provides
guidance for correlated neurophysiological and psychophysical researches, but it also explains the
nature of appearances themselves, i.e. the conditions of their appearing.
This is why a science of phenomena must be strictly and formally constructed on the basis
of subjective judgements in first person account. Experimental-phenomenological science must
then identify the specific units of representations and the specific metrics with which to measure
them and construct a generalized model of appearances (Kubovy and Wagemans, 1995). In his
criticism of Fechner (1860/1966), Brentano maintained that explanation is required not only of
the classical psychophysical just noticeable differences (jnd), but also of ‘just perceivable differ-
ences’ (jpd), i.e. magnitudes of a qualitative nature that constitute the perception of difference,
like the ‘pointness’, ‘squareness’, ‘acuteness’, or ‘remoteness’ of an appearance in presentation.
What is evaluated here is the phenomenic magnitude of a subjective, anisotropic, non-Euclidean,
dynamic space (Koenderink et al., 2010; Albertazzi, 2012a). The nature of such units (for exam-
ple, temporal momentum), depending on the conditions and the context of their appearances,
requires a non-linear metric for their measurement. Contemporary science has not yet devel-
oped a geometry of visual awareness in terms of seeing, although this is a necessary preliminary
step in order to be able to address the question in proper terms, but there are some proposals
more or less organized into theories (Koenderink, 2002, 2010, 2013; Koenderink and van Doorn,
2006). This radical standpoint obviously raises numerous issues as to the proper science of psy-
chology, its feasibility, its laws of explanation, its correlation with the sciences of psychophysics
and neurophysiology, its methods, and its measurement of psychic processes and their appear-
ances. Last but not least, how the construction and the final identity of the object of a presenta-
tion develops in the flow is something that cannot be explained until we have a general theory of
subjective time-space, and of the inner relations of dependence among the parts of the contents
of our awareness in their flowing.
One only need look at Brentano’s analysis of the intensity of colour perception, for example, to
understand how distant from classical psychophysics his approach is (On Individuation, Multiple
Quality and the Intensity of Sensible Appearances, Brentano, 1907/1979, Chapter  1, pp.  66–89);
or at what should be framed as a geometry of the subjective space-time continuum, presented in
the Lectures on Space, Time and the Continuum (see the contributions in Albertazzi, 2002a), to
be aware of what could be the foundations of a science of subjective experiencing or, strictly in
Brentano’s terms, a science of psychic phenomena. These pioneering studies are at the roots of a
theory of consciousness as a whole.

Perceptual Grouping
Wholes and parts
The theory of wholes and parts is a cornerstone of Gestalt psychology (Brentano, 1982). However,
closer inspection of the subject shows how complex the question may be, how many different
aspects of our awareness it may concern, and at the same time the still enormous potential that it
has for the study of perceptual organization and of awareness in current science. Gestalt mereol-
ogy, in fact, concerns different aspects of perceiving, and intrinsically correlated topics like the
continuity, variance, and isomorphism of the inner relations of the parts of a perceptual whole,
this being a process of a very brief duration.
Mostly unknown in psychological studies, however, is that it was Twardowsky’s book
(1894/1977) on the object (i.e. phenomenon or appearance) and content of a presentation, and
his distinction between the different types of parts in a whole, which prompted several strik-
ing developments in mereology among the Brentanians. It was the starting point for Husserl’s
mereology (1900–01/1970, Third Logical Investigation), Stumpf ’s analyses of the process of
fusion (Verschmelzung) between the parts of an acoustic whole (Stumpf, 1883), and Meinong’s
works on relations (Meinong, 1877, 1882) and on higher order mental objects like Gestalt wholes
(Meinong, 1899). Fusion is today studied in light of the concept of ‘unitization’ (Goldstone, 1998;
Czerwinski et al., 1992; Welham and Wills, 2011) but is generally seen as the product of percep-
tual learning.
All the above-mentioned developments were painstaking analyses that distinguished the
many ways in which something is part of a whole, and how a whole is made up of parts,
as well as the hierarchy of acts, objects, and parts of contents in a presentation. Most nota-
bly, Stumpf ’s analysis of tonal fusion was based on similarity of sounds, in contrast with
Helmholtz’s neurophysiological explanation, which was framed within a quantitative summa-
tive theory (Zanarini, 2001). Wertheimer, Koffka, and Köhler, all Stumpf ’s pupils, inherited
also his concept of the colour of a musical interval and the Gestalt concept of vocality. The
concept of fusion was then taken up by Husserl (1891/2003, § 29) when he considered mental
aggregates and manifolds. Husserl’s Logical Investigations (Husserl, 1900–01/1970), in fact, are
dedicated to Carl Stumpf.
Over the years, the analyses concentrated mainly on the nature of the already-organized percept
and its laws of organization in the so-called Berlin style (Koffka, 1935; Metzger, 1934, 1936/2006,
1941/1963), giving rise to what today is generally conceived as the Gestalt approach to percep-
tion. Less developed was the analysis of the process itself, in the so-called ‘Graz style’, i.e. how the
percept unfolds from within, in presentation. Wertheimer himself, however, in clarifying the role
and the goal of Gestalt theory, wrote:
There are wholes, the behaviour of which is not determined by that of their individual elements, but
where the part-processes are themselves determined by the intrinsic nature of the whole. It is the hope
of Gestalt theory to determine the nature of such wholes.
(Wertheimer, 1925a/1938, p. 2).

The nature of this type of whole is explained as follows:


Empirical enquiry discloses not a construction of primary pieces, but gradations of givenness
(Gegebenheit) ‘in broad strokes’ (relative to more inclusive whole properties), and varying articulation.
The upper limit is complete internal organization of the entire given; the lower limit is that of additive
adjacency between two or more relatively independent wholes. To sever ‘a part’ from the organized whole
in which it occurs—whether it itself be a subsidiary whole or an ‘element’—is a very real process usu-
ally involving alterations in that ‘part’. Modification of a part frequently involves changes elsewhere in
the whole itself. Nor is the nature of these alterations arbitrary, for they too are determined by whole
conditions and the events initiated by their occurrence run a course defined by the laws of functional
dependence in wholes. The role played here by the parts is one of ‘parts’ genuinely ‘participating’—not
extraneous, independent and-units.
(Wertheimer, 1925b/1938, p. 14).

Emphasizing that the concept of Gestalt had nothing to do with ‘sums of aggregated contents
erected subjectively upon primary given pieces’, or ‘qualities as piecemeal elements’, or ‘some-
thing formal added to already given material’, expressed by kindred concepts, Wertheimer defined
these types of wholes as ‘wholes and whole processes’ possessed of specific inner intrinsic laws
(Wertheimer, 1925a/1938, p.  14; Albertazzi, 2006b), whose ‘pieces’ almost always appear as
non-detachable ‘parts’ in the whole process: that is, they are not detachable from them. Finally,
he stated:
The processes of whole-phenomena are not blind, arbitrary, and devoid of meaning . . . To comprehend
an inner coherence is meaningful; it is meaningful to sense an inner necessity.
(Wertheimer, 1925a/1938, p. 16).

In short, according to Wertheimer, Gestalt wholes are made up of non-independent parts; they
are presented as phenomenal appearances with different degrees of reality; and they are intrinsi-
cally meaningful, which signifies that they do not have to refer to transcendent entities for their
truth, validity, and consistency. From where do these statements derive? And, can we say that over
the years Wertheimer’s theory, with all its richness, has received adequate explanation?
One may distinguish between two main approaches in the analysis of whole and parts: a line
of inquiry that can be broadly ascribed to Stumpf, Husserl, Wertheimer, Koffka, and Köhler, and
a line of inquiry broadly ascribable to Ehrenfels, Meinong, and Benussi, although matters are not
so clear-cut. Kenkel (1913), Lindemann (1922), Hartmann (1932), and Kopferman (1930), for
example, worked on the dynamic aspects of the apprehension of Gestalten; while the positions
taken up by Meinong, Benussi, Höfler, Witasek (1899), and Ameseder (1904) exhibit features in
common with what was the main concern of the Leipzig school of Ganzheitspsychologie (Sander,
1930; Klages, 1933; Krueger, 1953; Wellek, 1954; Ehrenstein, 1965). In fact, there is a time of the
development of phenomena (what the Leipzigers called ‘actual genesis’) that inheres in the onset
of a form at a certain temporal point of consciousness. From this point of view, the individual
Gestalten are sub-wholes of a larger whole, that is, the entire content of consciousness (see also
Husserl’s theory of double intentionality in Husserl, 1966a/1991).
Briefly, the Berliners focused mainly on appearances and their laws of organization in percep-
tual fields and their physiological correlates, while the Grazers were mainly interested in the con-
struction and the deployment of appearances in the subjective duration. Both approaches were
essentially concerned with the question of relations of a specific kind: the figural qualities, and
how they appear in perceiving. The solutions, however, were different.

Gestalt qualities
The term ‘Gestalt qualities’ was initially proposed by von Ehrenfels (1890/1988), Meinong (1891),
Cornelius (1897), and Mach (1886). Specifically, Mach observed that we are able to have an imme-
diate sensation of spatial figures, and of tonal ones like melodies. As is well known, the same
melody can be played in F, G, and so forth, as long as all the relationships of tempo and the tonal
intervals among the notes are respected; even if we replace all of the melody’s sounds, the melody
is still recognizable as the same melody.
Ehrenfels (1890/1988) wrote:
By Gestalt quality we mean a positive content of presentation bound up in consciousness with the
presence of complexes of mutually separable (i.e. independently presentable) elements. That complex
of presentations which is necessary for the existence of a given Gestalt quality we call the foundation
[Grundlage] of that quality.
(Ehrenfels, 1890/1988, § 4).

The most interesting and generally unknown development of the Brentano mereological the-
ory, however, was due to Benussi (Benussi, 1904, 1909, 1922–23). What Benussi experimen-
tally discovered is that there are phases (prototypical durations) in a presentation that allow
dislocations and qualitative reorganization of the stimuli. He identified very short durations
(ca. 90 to 250 msec); short durations (ca. 250 to 600 msec); indeterminate durations
(ca. 600 to 1100 msec); long durations (ca. 1100 to 2000 msec); and extremely long
durations (≥2000 msec).
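Benussi's five prototypical duration classes can be summarized as a simple lookup. The following is a minimal illustrative sketch, not anything from Benussi's own work: the function name and the handling of boundary values (assigned here to the longer class) are my own assumptions; only the approximate millisecond ranges come from the text above.

```python
def benussi_phase(duration_ms: float) -> str:
    """Classify a presentation duration (in msec) into one of the five
    prototypical duration classes Benussi identified.

    The class boundaries follow the approximate ranges reported above;
    boundary values are assigned to the longer class by convention.
    """
    if duration_ms < 90:
        return "below the measured range"
    if duration_ms < 250:
        return "very short"
    if duration_ms < 600:
        return "short"
    if duration_ms < 1100:
        return "indeterminate"
    if duration_ms < 2000:
        return "long"
    return "extremely long"
```

For example, a 150 msec exposure falls in the ‘very short’ class, while anything at or above 2000 msec counts as ‘extremely long’.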
These findings addressed the subjective temporal deployment of a presentation and how mean-
ing is perceptually construed in the duration. The stereokinetic phenomenon of the rotating
ellipse, later developed by Musatti, shows the presence of ‘proto-percepts’ that processually unfold
from the first configuration in movement until the final perceptual stable outcome (Musatti, 1924,
1955, pp. 21–22).
To be noted is that Kanizsa, who first declared his disagreement with the idea of phases in
perceiving (Kanizsa, 1952), later came to reconsider Benussi’s viewpoint (Vicario, 1994). While
Kanizsa distinguished between seeing and thinking, considering them two different processes, at
least heuristically, he never directly addressed the question as to whether there was continuity or
discontinuity between the two processes (Albertazzi, 2003). Benussi’s theory shows the temporal
transition from perceptive to mental presence (i.e. from seeing to thinking) in presentation as the
inner deployment of the part/whole structure of a presentation.
Benussi’s experiments showed that seeing has a temporal extensiveness comprising phases in which
an ordering between the parts occurs; that the parts in perceptive presence are ‘spatialized’ in a simul-
taneous whole given in mental presence; that processes and correlates develop together; and that the
duration has a progressive focus and fringes of anticipation and retention of the parts, as Husserl had
already discussed from a phenomenological viewpoint. Benussi also showed that the dependence
relation among parts is a past-present relation, not a before-after one, occurring in the simultaneity of
the time of presentness; that parts may be reorganized qualitatively (as in cases of temporal and visual
displacement); and that at the level of the microstructure of the act of presentation, the parts can give
rise to different outputs as second-order correlates (which explains the phenomena of plurivocity).
After the initial ‘critical phase’ regarding the actual duration of a presentation, we
take note of the spatial arrangement, the symmetry, the distance of its content-elements, and take
up assertive attitudes or attitudes of persuasion, of fantasy, of fiction, etc. (again a Brentanian legacy,
Brentano PES II). These are all intellective states, concerning the types of the act.

Berlin Versus Graz


The Benussi-Koffka dispute
A turning point in Brentano’s theory and in the development of descriptive psychology can be
exemplified by the controversy between Benussi and Koffka (Koffka and Kenkel, 1913; Benussi,
1912b; Albertazzi, 2001a). In 1912 two articles were published on the perception of stroboscopic
movement (Benussi, 1912a; Wertheimer, 1912/2012). The articles raised the issue of the theoreti-
cal status of so-called illusions.
Benussi designed a vertical, tachistoscopic variant of the Müller-Lyer illusion, and he found that
the subjects saw the vertical line, which was of constant length, as extending or shortening accord-
ing to the position and orientation of the collateral segments. The subjects perceived the apparent
movement of the median point of the line in relation to the change of form of the figure as a whole,
and in the temporal deployment of the various phases of the phenomenon. Benussi highlighted
the presence of two different types of movement, the first resulting from the succession of the
stroboscopic sequence of stimuli (s-Movement), and the second resulting from the apprehension
and subjective production of the whole appearance (S-Movement).
This explanation was bitterly contested by the Berliners. In 1913 Koffka and Kenkel published a
joint article in which they conducted detailed analysis of the results from tachistoscopic presenta-
tions of the Müller-Lyer illusion, results that closely resembled Benussi’s. Kenkel found that with
stroboscopic exposure, objectively equal lines in these figures were seen to expand and contract
(α-movement) in exactly the same manner as two similarly exposed objectively unequal lines
(β-movement). From Koffka and Kenkel’s point of view, the two movements were functionally and
descriptively the same. While acknowledging Benussi’s temporal priority on this type of experi-
ment, Koffka nevertheless criticized his explanation. Benussi maintained that the cause of appar-
ent movement was the diversity of position assumed by the figure in the individual distinct phases
of the process. Koffka instead believed that the vision of movement was a unitary phenomenon,
not an aggregate of parts. Hence, he maintained, even if the phases presented are physically dis-
tinct, they are seen as a unitary, clearly structured complex (Koffka and Kenkel, 1913, p. 445 ff.).
From his viewpoint, it was not possible to derive wholes from their parts, which he evidently
considered to be sensory contents, i.e. individual pieces.
At bottom, therefore, this was a theoretical dispute concerning: (i) the existence or otherwise
of non-detachable components of the Gestalt appearance; (ii) their nature, i.e. whether they
were sensory contents; (iii) their relation with the stimuli; (iv) their mutual inner relations; and
(v) more generally whether or not it was possible to analyse the deployments of the contents in
the presentation.
While insisting that the presence of internal phases did not imply the separateness of the parts
of the phenomenon, Benussi (1914a) in his turn criticized the physiological conception at the
basis of the Berliners’ theory, in that it did not account for the eminently psychological structure
of the event. What the Berliners lacked was a thorough theory of presentation in which stimuli
play only the role of triggers, in the absence of any constancy principle: presentations are not psy-
chophysical structures representing stimuli, as Brentano maintained.
The controversy continued in Koffka (1915/1938), who used the dispute with Benussi as an
occasion to give systematic treatment to the Berlin school’s views on the foundations of the theory
of perception, which he set in sharp contrast to those of the Graz school. The value of the con-
troversy consists in its clear depiction of the different positions taken by the two Gestalt schools
(Albertazzi, 2001b, c). From our present point of view, the controversy was grounded in the ques-
tion as to whether it is possible to test, and consequently explain, the subjective deployment of a
phenomenon at the presentational level, without necessarily having to resort to psychophysical or
brain correlates for their explanation.

Descriptive and genetic inquiries


The Meinongians went further into the object and methodology of a descriptive psychology,
by addressing the processual aspects of the psychic phenomena—the laws of becoming—in
Brentanian terms (Brentano, 1995b, p.  6), although not from a physiological viewpoint. In so
doing, they further distinguished their research and interests from the Berlin approach.
Meinong’s work on assumptions was also the maximum point of development of Brentano’s
descriptive psychology. Brentano, in fact, on distinguishing the task of psychology from that of
physiology, wrote:
My school draws a distinction between psychognosis and genetic psychology . . . The former contains
all the psychic elements which when combined produce the totality of psychic phenomena, in the
same way as the letters of the alphabet produce the totality of words . . . The latter teaches us the laws
which determine how psychic phenomena appear and vanish. Given that―because psychic functions
indubitably depend on the workings of the nervous system―these are in large part physiological
conditions, we see that in this case psychological research must combine with physiological research.
(Brentano, 1895, p. 35; emphasis added).

And he subsequently observed that ‘the perfection of psychognosis [descriptive psychology]
will be one of the most essential steps in preparation for a genuinely scientific genetic psychology’
(Brentano, 1995b, p. 11).
In 1910, in the preface to the second edition of On Assumptions, Meinong wrote:
. . . the theory of assumptions can pride itself on the success of having been chosen as one of the main-
stays for a new theoretical edifice, namely that of genetic psychology—the latest, most arduous, and
most promising of the special psychological disciplines.
(Meinong, 1910/1983, p. 7; emphasis added).

The ‘genetic’ approach to which Meinong refers means neither a reduction to physiology, nor
research conducted in terms of developmental psychology, to use modern terms. The genesis,
i.e. the study of the deployment of a presentation, pioneered by Benussi, to distinguish specific
prototypical micro-durations responsible for the final output, was conducted without resorting
to underlying neurophysiological processes, but merely by analysing the characteristic of the
subjective integrations occurring in the space-time of awareness. Benussi admitted, however,
that at his time the tools available were not such to enable him to slow down the process in
the proper way. Recent research on attention processes, by Rensink (2000, 2002) for example,
has confirmed almost all the five prototypical durations evidenced by Benussi in his experi-
ments (Benussi, 1907, 1913, 1914b; see also Katz, 1906; Calabresi, 1930; Albertazzi, 1999, 2011).
These durations constitute the present and its fringes, i.e. they are the basic components of
presentations.
The theory of production, instead, was understood by the Berliners in terms of a mosaic theory,
as a variation of elementism, grounded on the constancy hypothesis of what, in their view, still
appeared to be ‘sensations’ (Köhler, 1913; Koffka, 1915/1938), interpreting it in inferentialist
terms. As Kanizsa points out, in fact, in the inferentialist viewpoint:
One postulates the existence of a first ‘lower-level’ psychic phase, that of the ‘elementary sensations’.
Acting upon this are then ‘higher-level’ psychic faculties or instances, namely the memory, the judge-
ment, and the reasoning, which, through largely unconscious inferences founded upon specific and
generic past experiences, associate or integrate the elementary sensations, thus generating those
broader perceptual units which are the objects of our experience, with their forms and their meanings.
(Kanizsa, 1980, p. 38).

However, there is almost nothing in the Graz theory that can be traced back to a theory of
atomic sense data, to a Wundtian apperception or to unconscious Helmholtzian inferences: what
the Grazers called the ‘founding elements’ on which higher-order objects (Gestalten) are subjec-
tively grounded are non-detachable parts of the whole and do not depend on probabilistic infer-
ences from past experience. Being partial contents of presentations, they are already phenomenic
materials, i.e. part-processes on their own, influenced, modified, and reorganized in the Gestalt
whole deploying in the time of presentness: for example, they are presented as ‘being past’, which
is a qualitative determination. Moreover, although they are distinguishable parts, they are not
separable. Also set out within this framework are the classic Brentanian notions concerning tem-
poral perception (specifically the difference between perceived succession and the perception of
succession), and the location in subjective space, place, and time of appearances.

Gestalt Phenomenology and Beyond


I have briefly sketched the origin of, and the main concepts that gave rise to, experimental phe-
nomenology, and mainly from the Gestalt point of view in the version of both the Berlin and Graz
schools. The main distinction between the two schools consists in the greater weight given to the
relationships between phenomenology and physiology by the Berliners, and to phenomenology
and the structure of awareness by the Grazers. Simplifying to the extreme, the Meinongians were
somewhat less ‘positivistic’ than their colleagues, notwithstanding Koffka’s claims in his Principles
(Koffka, 1935, pp.  684–5). At the basis of the controversy lay a different idea of the theory of
wholes and parts.
In the 1970s the ideas of Brentano and his school on the theory of wholes and parts were
recast mainly in the analytic field, through the so-called mereological essentialism formu-
lated by Chisholm (1973, 1975). However, if mereological essentialism may prove to be a valid
instrument in analysis of wholes that are aggregates (Grelling and Oppenheim, 1937/8), it is
unable to deal with the dynamic unity of Gestalt wholes, the basics of Brentano’s psychology.
Consequently, this recasting had no impact on the development of the theory of intentional
reference as such.
As to the relationship between phenomenology and neurophysiology, envisaged by the Berliners,
the phenomenological analysis of appearances has furnished inputs to the neurosciences. As
Brentano maintained, a genetic psychologist without descriptive knowledge is like a physiologist
without anatomical knowledge (Brentano, 1995b, p. 10). Not only the phenomena but also the
principles of Gestalt have been subject to neurophysiological investigation. Very rarely, however,
have the results of neurophysiological analyses furnished insights for phenomenological analysis.
Moreover, our current knowledge about neuronal mechanisms does not yet enable us to establish
with precision the relations between the two levels: the qualitative level of perception of visual
appearances and that of the underlying neuronal activity.
By contrast, the Brentano programme in its entirety still awaits completion, and above all a
phenomenological-experimental explanation. Still unaccomplished, for example, is the project
regarding the foundations of a general theory of subjective space-time and its filling-in
(Albertazzi, 1999, 2002a, 2002b; Lappin and van de Grind, 2002; Koenderink et al., 2012), i.e. a
general theory of appearances in awareness.
What experimental phenomenology incontestably entails is the need to devise ‘sharply and pre-
cisely’ (Brentano, 1995b, p. 5) a psychological science per se, which goes beyond current proposals.
Such a science must develop new methods for the investigation, measurement, and mathematical
modelling of qualitative perceiving. One of the starting points, for example, would be conceiving
a geometry of virtual or 'imaginary' spaces closer to the awareness of visual phenomena, which is
what Brentano laid out more than a century ago.

References
Albertazzi, L. (1999). ‘The Time of Presentness. A Chapter in Positivistic and Descriptive Psychology.’
Axiomathes 10: 49–74.
Albertazzi, L. (2001a). ‘Back to the Origins.’ In The Dawn of Cognitive Science. Early European Contributors
1870–1930, edited by L. Albertazzi, pp. 1–27 (Dordrecht: Kluwer).
Albertazzi, L. (2001b). 'Vittorio Benussi.' In The School of Alexius Meinong, edited by L. Albertazzi,
D. Jacquette, and R. Poli, pp. 95–133 (Aldershot: Ashgate).
Albertazzi, L. (2001c). 'The Legacy of the Graz Psychologists.' In The School of Alexius Meinong, edited by
L. Albertazzi, D. Jacquette, and R. Poli, pp. 321–345 (Aldershot: Ashgate).
Albertazzi, L. (2002a). 'Continua.' In Unfolding Perceptual Continua, edited by L. Albertazzi, pp. 1–28
(Amsterdam: Benjamins Publishing Company).
Albertazzi, L. (2002b). ‘Towards a Neo-Aristotelian Theory of Continua: Elements of an Empirical
Geometry.’ In Unfolding Perceptual Continua, edited by L. Albertazzi, pp. 29–79 (Amsterdam: Benjamins
Publishing Company).
Albertazzi, L. (2003). ‘From Kanizsa Back to Benussi: Varieties of Intentional Existence.’ Axiomathes
13: 239–259.
Albertazzi, L. (2004). ‘Stereokinetic Shapes and Their Shadows.’ Perception 33: 1437–1452.
Albertazzi, L. (2006a). Immanent Realism. Introduction to Franz Brentano (Berlin, New York: Springer).
Albertazzi, L. (2006b). ‘Das rein Figurale.’ Gestalt Theory 28(1/2): 123–151.
Albertazzi, L. (2010). 'The Ontology of Perception.' In TAO-Theory and Applications of Ontology. Vol. 1.
Philosophical Perspectives, edited by R. Poli, and J. Seibt, pp. 177–206 (Berlin, New York: Springer).
Albertazzi, L. (2011). 'Renata Calabresi.' History of Psychology 14(1): 53–79.
Albertazzi, L. (2012a). 'Qualitative Perceiving.' Journal of Consciousness Studies 19(11–12): 6–31.
Albertazzi, L. (2013). 'Experimental Phenomenology. An Introduction.' In The Wiley-Blackwell Handbook
of Experimental Phenomenology. Visual Perception of Shape, Space and Appearance, edited by
L. Albertazzi, pp. 1–36 (London: Wiley-Blackwell).
Albertazzi, L., van Tonder, G., and Vishwanath, D. (2010). ‘Information in Perception.’ In Perception
Beyond Inference. The Information Content of Perceptual Processes, edited by L. Albertazzi, G. van
Tonder, and D. Vishwanath, pp. 1–26 (Boston, Mass.: MIT Press).
Allen, E. C., and Guilford, J. P. (1936). ‘Factors Determining the Affective Value of Color Combinations.’
The American Journal of Psychology 48: 643–648.
Ameseder, R. (1904). ‘Über Vorstellungsproduktion, Über absolute Auffälligkeit der Farben.’ In
Untersuchungen zur Gegenstandstheorie und Psychologie, edited by A. Meinong, pp. 509–526
(Leipzig: Barth).
Benussi, V. (1904). 'Zur Psychologie des Gestalterfassens (Die Müller-Lyer Figur).' In Untersuchungen zur
Gegenstandstheorie und Psychologie, edited by A. Meinong, pp. 303–448 (Leipzig: Barth).
Benussi, V. (1907). 'Zur experimentellen Analyse des Zeitvergleichs.' Archiv für die gesamte Psychologie 9:
572–579.
Benussi, V. (1909). ‘Über “Aufmerksamkeitsrichtung” beim Raum- und Zeitvergleich.’ Zeitschrift für
Psychologie 51: 73–107.
Benussi, V. (1912a). ‘Stroboskopische Scheinbewegungen und geometrisch-optische Gestalttäuschungen.’
Archiv für die gesamte Psychologie 24: 31–62.
Benussi, V. (1912b). 'Referat über Koffka-Kenkel'. 'Beiträge zur Psychologie der Gestalt- und
Bewegungserlebnisse I.’ Archiv für die gesamte Psychologie 32: 50ff.
Benussi, V. (1913). Psychologie der Zeitauffassung (Heidelberg: Winter).
Benussi, V. (1914a). ‘Gesetze der inadäquaten Gestalterfassung.’ Archiv für die gesamte Psychologie 32:
50–57.
Benussi, V. (1914b). ‘Versuche zur Bestimmung der Gestaltzeit.’ In Bericht über den 6. Kongress für
experimentelle Psychologie Göttingen, edited by F. Schumann, pp. 71–73 (Leipzig: Barth).
Benussi, V. (1922–23). Introduzione alla psicologia sperimentale. Lezioni tenute nell’anno 1922–23, typescript
by Dr. Cesare Musatti. Fondo Benussi. (Milan: University of Milan Bicocca).
Brentano, F. (1874/1995a). Psychologie vom empirischen Standpunkte (Leipzig: Duncker & Humblot). En.
edition (1995) by L. McAlister (London: Routledge).
Brentano, F. (1895). Meine letzten Wünsche für Österreich (Stuttgart: Cotta).
Brentano, F. (1907/1979). Untersuchungen zur Sinnespsychologie (Leipzig: Duncker & Humblot), edited
(1979) by R. M. Chisholm and R. Fabian (Hamburg: Meiner).
Brentano, F. (1966/1979). Die Abkehr vom Nichtrealen, edited by F. Mayer-Hillebrand (Hamburg: Meiner).
Brentano, F. (1976/1988). Philosophische Untersuchungen zu Raum, Zeit und Kontinuum, edited by
R. M. Chisholm and S. Körner (Hamburg: Meiner). En. tr. (1988) by B. Smith (London: Croom Helm).
Brentano, F. (1982). Deskriptive Psychologie, edited by R. M. Chisholm and W. Baumgartner
(Hamburg: Meiner). En. tr. (1982) by B. Müller (London: Routledge & Kegan Paul).
Brentano, F. (1995b). Deskriptive Psychologie, edited by R. M. Chisholm and W. Baumgartner
(Hamburg: Meiner). En. tr. by B. Müller (London: Routledge).
Calabresi, R. (1930). History of Psychology 14(1), pp. 53–79.
Chisholm, R. M. (1973). ‘Parts as Essential to their Whole.’ Review of Metaphysics 25: 581–603.
Chisholm, R. M. 1975. ‘Mereological Essentialism: Some Further Considerations.’ Review of Metaphysics
27: 477–484.
Cornelius, H. (1897). Psychologie als Erfahrungswissenschaft (Leipzig: B. G. Teubner).
Czerwinski, M. P., Lightfoot, N., and Shiffrin, R. M. (1992). ‘Automatization and Training in Visual Search.’
American Journal of Psychology, special issue on ‘Views and Varieties of Automaticity’ 105: 271–315.
Da Pos, O. (1995). ‘The Pleasantness of Bi-colour Combinations of the Four Unique Hues.’ In Aspects of
Colour, edited by Arnkil, H., and Hämäläinen, E., pp. 164–174 (Helsinki: UIAH The University of Art
and Design).
Dennett, D. C. (1978). Brainstorms. Philosophical Essays on Mind and Beliefs (Brighton: Harvester Press).
Ebbinghaus, H. (1902). Grundzüge der Psychologie, 2 vols. (Leipzig: Veit).
Ehrenstein, W. (1965). Probleme des höheren Seelenlebens (München/Basel: Reinhard Verlag).
Fechner, G. T. (1860/1966). Elemente der Psychophysik (Leipzig: Breitkopf & Härtel). En. tr. (1966)
(New York: Holt, Rinehart & Winston).
Geissler, L. R. (1917). ‘The Affective Tone of Color Combinations.’ Studies in Psychology (Titchener
Commemorative Volume), pp. 150–174 (Worcester: L. N. Wilson).
Gibson, J. J. (1979). The Ecological Approach to Visual Perception (Boston: Houghton Mifflin Co.).
Goldstone, R. (1998). ‘Perceptual Learning.’ Annual Review of Psychology 49: 585–612.
Granger, G. W. (1955). ‘An Experimental Study of Colour Harmony.’ The Journal of General Psychology
52: 21–35.
Gregory, R. L. (1986). Odd Perceptions (London: Methuen).
Grelling, K., and Oppenheim, P. (1937/8). ‘Der Gestaltbegriff in Lichte der neuen Logik.’ Erkenntnis 7:
211–225. En. tr. in Foundations of Gestalt Psychology (1988), edited by B. Smith, pp. 82–117 (München,
Wien: Philosophia Verlag).
Guilford, J. P., and Smith, P. C. (1959). ‘A System of Color-Preferences.’ The American Journal of Psychology
72(4): 487–502.
Gurwitsch, A. (1966). The Field of Consciousness (Pittsburgh: Duquesne University).
Hartmann, L. (1932). ‘Neue Verschmelzungsprobleme.’ Psychologische Forschung 3: 322–323.
Hering, E. (1920/1964). Outlines of a Theory of the Light Sense (Berlin, New York: Springer).
Höfler, A. (1897). Psychologie (Wien: F. Tempsky).
Hume, D. (1739/2007). A Treatise of Human Nature, a critical edition by David Fate Norton and Mary J.
Norton (Oxford: Clarendon Press).
Hurvich, L. M., and Jameson, D. (1955). ‘Some Quantitative Aspects of an Opponent-Colors Theory. II’.
Journal of the Optical Society of America 45: 602–6.
Husserl, E. (1891/2003). Philosophie der Arithmetik: Psychologische und logische Untersuchungen.
Halle: Niemeyer. En. tr. (2003) by D. Willard (Dordrecht: Kluwer).
Husserl, E. (1896/1979). 'Review of Twardowski, Zur Lehre vom Inhalt und Gegenstand der Vorstellungen.'
Husserliana XXII, Aufsätze und Rezensionen (1890–1910), edited by B. Rang, pp. 348–356 (The Hague:
M. Nijhoff).
Husserl, E. (1900–01/1970). Logische Untersuchungen, 2 vols (Halle: Niemeyer). En tr. (1970) by
J. N. Findlay (London: Routledge).
Husserl, E. (1913/1989). Ideen zu einer reinen Phänomenologie und phänomenologische Philosophie, 3 vols.
(Halle: Niemeyer). En tr. (1989) (Dordrecht: Kluwer).
Husserl, E. (1966a/1991). Zur Phänomenologie des inneren Zeitbewusstseins, edited by R. Boehm,
Husserliana X (Den Haag: Nijhoff). En. tr. (1991) by J. Barnett Brough (Dordrecht: Kluwer).
James, W. (1890/1950). Principles of Psychology, 2 vols. (New York: Holt and Co.).
Kanizsa, G. (1952). ‘Legittimità di un’analisi del processo percettivo fondata su una distinzione in ‘fasi’ o
‘stadi’’. Archivio di Psicologia, Neurologia e Psichiatria 13: 292–323.
Kanizsa, G. (1980). La grammatica del vedere (Bologna: Il Mulino).
Kanizsa, G. (1991). Vedere e pensare (Bologna: Il Mulino).
Katz, D. (1906). ‘Experimentelle Beiträge zur Psychologie des Vergleichs im Gebiete des Zeitsinns.’
Zeitschrift für Psychologie 42: 302–340.
Katz, D. (1935). The World of Colour (London: Routledge).
Kenkel, F. (1913). ‘Untersuchungen über den Zusammenhang zwischen Erscheinungsgrösse und
Erscheinungsbewegung bei einigen sogenannten optischen Täuschungen.' Zeitschrift für Psychologie
67: 358–449.
Klages, L. (1933). Vom Wesen des Bewußtseins, 2nd ed. (Leipzig: Barth).
Koenderink, J. J. (1990). Solid Shape (Cambridge, MA: MIT Press).
Koenderink, J. J. (2002). ‘Continua in Vision.’ In Unfolding Perceptual Continua, edited by L. Albertazzi,
pp. 101–118 (Amsterdam: Benjamins Publishing Company).
Koenderink, J. J. (2010). ‘Information in Vision.’ In Perception Beyond Inference. The Information Content of
Perceptual Processes, edited by L. Albertazzi, G. van Tonder, and D. Vishwanath, pp. 27–57 (Cambridge,
Mass.: MIT Press).
Koenderink, J. J. (2013). ‘Surface Shape, the Science and the Look.’ In The Wiley-Blackwell Handbook of
Experimental Phenomenology. Visual Perception of Shape, Space and Appearance, edited by L. Albertazzi,
pp. 165–180. London: Wiley-Blackwell.
Koenderink, J. J., and van Doorn, A. (2006). ‘Pictorial Space, a Modern Reappraisal of Adolf
Hildebrand.’ In Visual Thought. The Depictive Space of Perception, edited by L. Albertazzi, pp. 135–154
(Amsterdam: Benjamins Publishing Company).
Koenderink, J. J., Albertazzi, L., van Doorn, A., van de Grind, W., Lappin, J., Farley, N., Oomes, S.,
te Pas, S., Phillips, F., Pont, S., Richards, W., Todd, J., and de Vries, S. (2010). ‘Does Monocular Visual
Space Contain Planes?.’ Acta Psychologica 134(1): 40–47.
Koenderink, J. J., Richards, W., and van Doorn, A. (2012). ‘Blow up: A Free Lunch?.’ I-Perception
3(2): 141–145. DOI:10.1068/i0489sas
Koffka, K. (1915/1938). ‘Beiträge zur Psychologie der Gestalt und Grundlegung der
Wahrnehmungspsychologie. Eine Auseinandersetzung mit V. Benussi.' Zeitschrift für Psychologie und
Physiologie der Sinnesorgane 73: 11–90. En. tr. (1938) (repr. 1991) in A Source Book of Gestalt Psychology,
edited by W. D. Ellis, pp. 371–378 (London: Kegan Paul).
Koffka, K. (1935). Principles of Gestalt Psychology (London: Routledge & Kegan Paul).
Koffka, K., and Kenkel, F. (1913). ‘Beiträge zur Psychologie der Gestalt- und Bewegungserlebnisse. I.
Untersuchungen über den Zusammenhang zwischen Erscheinungsgrösse und Erscheinungsbewegung
bei einigen sogenannten Täuschungen.' Zeitschrift für Psychologie und Physiologie der Sinnesorgane 67:
353–449.
Köhler, W. (1913). 'Über unbemerkte Empfindungen und Urteilstäuschungen.' Zeitschrift für Psychologie
und Physiologie der Sinnesorgane 66: 51–80.
Kopfermann, H. (1930). 'Psychologische Untersuchungen über die Wirkung zweidimensionaler
Darstellungen körperlicher Gebilde.' Psychologische Forschung 13: 293–364.
Krueger, F. (1953). Zur Philosophie und Psychologie der Ganzheit (Berlin: Springer).
Kubovy, M., and Wagemans, J. (1995). ‘Grouping by Proximity and Multistability in Dot
Lattices: A Quantitative Gestalt Theory.’ Psychological Science 6(4): 225–234.
Lappin, J. S., Bell, H. H., Harm, O. J., and Kottas, B. L. (1975). ‘On the Relation between Time and Space
in the Visual Discrimination of Velocity.’ Journal of Experimental Psychology: Human Perception and
Performance 1(4): 383–94.
Lappin, J. S., and van de Grind, W. A. (2002). ‘Visual Forms in Space-Time.’ In Unfolding Perceptual
Continua, edited by L. Albertazzi, pp. 119–146 (Amsterdam: Benjamins Publishing Company).
Lappin, J. S., Norman, J. F., and Phillips, F. (2011). ‘Fechner, Information, and Shape Perception.’ Attention,
Perception & Psychophysics 73(8): 2353–2378. DOI: 10.3758/s13414-011-0197-4.
Lindemann, E. (1922). ‘Experimentelle Untersuchungen über das Entstehen und Vergehen von Gestalten.’
Psychologische Forschung 2: 5–60.
Mach, E. (1886). Beiträge zur Analyse der Empfindungen (Jena: Fischer). En. tr. (1897) (La Salle: Open Court).
Major, D. R. (1895). ‘On the Affective Tone of Simple Sense Impressions.’ The American Journal of
Psychology 7: 57–77.
Mace, W. M. (1977). ‘James J. Gibson’s Strategy for Perceiving: Ask not What’s Inside your Head, but What
your Head’s Inside of.’ In Perceiving, Acting, and Knowing, edited by R. E. Shaw and J. Bransford, pp. 43–65
(Hillsdale, NJ: Lawrence Erlbaum Associates).
Marr, D. (1982). Vision (San Francisco: Freeman Press).
Mausfeld, R. (2010). ‘The Perception of Phenomenal Material Qualities and the Internal Semantics of the
Perceptual System.’ In Perception beyond Inference. The Information Content of Perceptual Processes,
edited by L. Albertazzi, G. van Tonder, and D. Vishwanath, pp. 159–200 (Cambridge, Mass.: MIT
Press).
Meinong, A. (1877). ‘Hume Studien I: Zur Geschichte und Kritik des modernen Nominalismus.’
Sitzungsberichte der philosophisch-historischen Klasse der Kaiserlichen Akademie der Wissenschaften
87: 185–260. Repr. in Alexius Meinong’s Gesamtausgabe (GA), edited by R. Haller (Wien: Gerold’s Sohn).
Meinong, A. (1882). ‘Hume Studien II: Zur Relationstheorie.’ Sitzungsberichte der philosophisch-historischen
Klasse der Kaiserlichen Akademie der Wissenschaften (Wien) 101: 573–752. Repr. (1882) GA vol. II,
pp. 1–183 (Wien: Carl Gerold’s Sohn).
Meinong, A. (1891). ‘Zur Psychologie der Komplexionen und Relationen.’ Zeitschrift für Psychologie und
Physiologie der Sinnesorgane 2: 245–265. Repr. GA vol. I, pp. 279–303.
Meinong, A. (1899). ‘Über Gegenstände höherer Ordnung und deren Verhältnis zur inneren Wahrnehmung.’
Zeitschrift für Psychologie und Physiologie der Sinnesorgane 21: 182–272. Repr. GA vol. II, pp. 377–480.
Meinong, A. (1910/1983). Über Annahmen (Leipzig: Barth) (1st ed. 1902). Repr. GA vol. IV, pp. 1–389,
517–535. En. tr. (1983) by J. Heanue (Berkeley: University of California Press).
Metzger, W. (1934). ‘Beobachtungen über phänomenale Identität.’ Psychologische Forschung 19: 1–49.
Metzger, W. (1936/2006). Laws of Seeing, tr. by L. Spillmann, S. Lehar, M. Stromeyer, and M. Wertheimer
(Cambridge, Mass.: MIT Press) (1st ed. 1936).
Metzger, W. (1941/1963). Psychologie: die Entwicklung ihrer Grundannahmen seit der Einführung des
Experiments (Dresden: Steinkopf).
Michotte, A. (1950/1991). 'A propos de la permanence phénoménale: Faits et théories.' Acta Psychologica
7: 293–322. Repr. (1991) in Michotte's Experimental Phenomenology of Perception, edited by G. Thinès,
A. Costall, and G. Butterworth, pp. 117–121 (Hillsdale: Erlbaum).
Müller-Lyer, F. C. (1889). ‘Optische Urteilstäuschungen.’ Archiv für Anatomie und Physiologie. Physiologische
Abteilung 2: 263–270.
Musatti, C. L. (1924). ‘Sui fenomeni stereocinetici.’ Archivio Italiano di Psicologia 3: 105–120.
Musatti, C. L. (1955). ‘La stereocinesi e la struttura dello spazio visibile.’ Rivista di Psicologia 49: 3–57.
Noë, A. (2004). Action in Perception (Cambridge, MA: MIT Press).
O’Regan, J. K., and Noë, A. (2001). 'A Sensorimotor Account of Vision and Visual Consciousness.'
Behavioral and Brain Sciences 24(5): 939–1031.
Passmore, J. (1968). A Hundred Years of Philosophy 3rd ed. (London: Penguin Books).
Rensink, R. A. (2000). ‘Seeing, Sensing, Scrutinizing.’ Vision Research 40: 1469–87.
Rensink, R. A. (2002). 'Change Detection'. Annual Review of Psychology 53: 245–77.
Rock, I. (1983). The Logic of Perception (Cambridge, Mass.: MIT Press).
Rubin, E. (1958). ‘Figure and Ground.’ In Readings in Perception, edited by D. C. Beardsley and
M. Wertheimer (New York: Van Nostrand).
Sander, F. (1930). ‘Structures, Totality of Experience and Gestalt.’ In Psychologies of 1930, edited by C.
Murchison (Worcester, Mass.: Clark University Press).
Smith, A. D. (2008). ‘Husserl and Externalism.’ Synthese 160(3): 313–333.
Spiegelberg, H. (1982). The Phenomenological Movement, 2nd ed. (The Hague: Nijhoff).
Spillmann, L. (2009) ‘Phenomenology and Neurophysiological Correlations: Two Approaches to Perception
Research.’ Vision Research 49(12): 1507–1521. http://dx.doi.org/10.1016/j.visres.2009.02.022.
Spillmann, L., and Ehrenstein, W. (2004). ‘Gestalt Factors in the Visual Neurosciences?.’ The Visual
Neurosciences 19: 428–434.
Stumpf, C. (1883). Tonpsychologie, 2 vols. (Leipzig: Hirzel).
Todd, J. T. (2004). ‘The Visual Perception of 3D Shape.’ TRENDS in Cognitive Sciences 8(3): 115–121.
doi:10.1016/j.tics.2004.01.006.
Tse, P. U. (1998). 'Illusory Volumes from Conformation'. Perception 27(8): 977–992.
Twardowski, K. (1894/1977). Zur Lehre vom Inhalt und Gegenstand der Vorstellungen (Wien: Hölder). En. tr.
(1977) by R. Grossman (The Hague: Nijhoff).
Valberg, A. (1971). ‘A Method for the Precise Determination of Achromatic Colours Including White’.
Vision Research 11: 157–160.
Valberg, A. (2001). ‘Unique Hues: An Old Problem for a New Generation.’ Vision Research 41: 1645–1657.
http://dx.doi.org/10.1016/S0042-6989(01)00041-4.
Vicario, G. B. (1994). ‘Gaetano Kanizsa: The Scientist and the Man’. Japanese Psychological Research
36: 126–137.
von Allesch, G. J. (1925a). ‘Die aesthetische Erscheinungsweise der Farben’ (Chapters 1–5). Psychologische
Forschung 6: 1–91.
von Allesch, G. J. (1925b). ‘Die aesthetische Erscheinungsweise der Farben’ (Chapters 6–12). Psychologische
Forschung 6: 215–281.
von Ehrenfels, C. (1890/1988) 'Über Gestaltqualitäten.' Vierteljahrsschrift für wissenschaftliche
Philosophie 14: 242–292. En. tr. in B. Smith ed. (1988), Foundations of Gestalt Psychology, pp. 82–117
(München-Wien: Philosophia Verlag).
Wagemans, J., Elder, J. E., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der
Heydt, R. (2012). ‘A Century of Gestalt Psychology in Visual Perception. I. Perceptual Grouping and
Figure-Ground Organization.' Psychological Bulletin. DOI: 10.1037/a0029333.
Wackermann, J. (2010). ‘Psychophysics as a Science of Primary Experience.’ Philosophical Psychology 23:
189–206.
Warren, W. H. (2005). ‘Direct Perception: The View from here.’ Philosophical Topics 33(1): 335–361.
Warren, W. H. (2006). ‘The Dynamics of Perception and Action.’ Psychological Review 113(2): 358–389.
DOI: 10.1037/0033-295X.113.2.358.
Welham A. K., and Wills, A. J. (2011). ‘Unitization, Similarity, and Overt Attention in Categorization and
Exposure.’ Memory and Cognition 39(8): 1518–1533.
Wellek, A. (1954). Die genetische Ganzheitspsychologie. (München: Beck).
Wertheimer, M. (1912/2012). 'Experimentelle Studien über das Sehen von Bewegung.' Zeitschrift für
Psychologie 61: 161–265. En tr. by M. Wertheimer and K. W. Watkins, in Max Wertheimer, On Perceived
Motion and Figural Organization, edited by L. Spillmann, pp. 1–92 (Cambridge, Mass.: MIT Press).
Wertheimer, M. (1925a/1938). ‘Untersuchungen zur Lehre von der Gestalt. I.’ Psychologische Forschung 4:
47–58. En tr. (1938; repr. 1991) in A Source Book of Gestalt Psychology, edited by W. D. Ellis, pp. 12–16
(London: Kegan Paul).
Wertheimer, M. (1925b/1938). Über Gestalttheorie (Erlangen). En tr. (1938; repr. 1991) in A Source Book of
Gestalt Psychology, edited by W. D. Ellis, pp. 1–11 (London: Kegan Paul).
Witasek, S. (1899). Grundlinien der Psychologie (Leipzig: Dürr).
Zanarini, G. (2001). ‘Hermann von Helmholtz and Ernst Mach on Musical Consonance.’ In The Dawn
of Cognitive Science. Early European Contributors 1870–1930, edited by L. Albertazzi, pp. 135–150
(Dordrecht: Kluwer).
Chapter 3

Methodological background:
Experimental phenomenology
Jan J. Koenderink

Physics, Psychophysics, and Experimental Phenomenology
The human observer deploys various organs of sense as physical or chemical instruments to
monitor the environment. Of the classical five senses (Aristotle, ca. 350 BCE), two are aimed at the
chemical constitution of matter (the olfactory and gustatory senses), whereas the others are aimed
at various physical properties. Vision allows observations in the realm of optics (electromagnetic
radiation in the range of 1.65–2.5 eV photon energy), hearing in the realm of acoustics (air pres-
sure vibrations in the frequency range 10 Hz–20 kHz). ‘Touch’ is a mixed sense that allows a vari-
ety of mechanical and thermal interactions to be monitored. The ‘sense organ’ of touch is diffuse,
and involves the skin and the skeleto-muscular system. Of course, the body contains numerous
sensors that lie outside of Aristotle’s taxonomy. Most of these (e.g. the baroreceptors in the aorta)
have at most a diffuse effect on your immediate awareness, although some (e.g. the vestibular
labyrinthine system) occasionally do influence awareness directly.
In daily life one depends on various multimodal interactions, and it often remains unclear
exactly how one became aware of certain environmental properties. This makes ecological sense,
because important physical properties typically become manifest in many, mutually correlated
ways. For instance, small things tend to be lighter, move faster, sound higher, and—if animate—
live shorter than large things. The definition of physical properties and their operational defini-
tion by way of measurement ultimately derive from such multimodal experiences.
Consider weight as an example. Primitive man must have been keenly aware of weight in an
absolute sense. It is easy enough to classify objects as heavy or light, just by handling or lifting
them. In agricultural societies one develops a notion of relative weight. One adopts certain objects
as standard, and 'measures' weight by comparison with the (common) standard. A frequently
adopted method is the use of 'scales', which offers a sensitive way of judging the equilibrium
state by eye. Notice that this obviates the need for a perception of weight. It is an example
of a perceptual attribute that has been ‘objectified’ as a physical measurement. Similar methods
are also easily developed for pitch, brightness, and so forth. Such methods are called objective,
because the senses are only used to notice the simplest states, such as the coincidence of a mark
with a fiducial marking on a scale. Just consider: you may sweat and strip, whereas I shiver and
put on a sweater! Yet we may both agree on the level of a mercury column in some glass tube, and
declare the 'temperature to be 20°C'. The 20°C has little to do with your feeling of warmth. Physics
has taken over.
Physics allows one to practice a science in which the observer as a sentient being is absent in
the limit. Of course, limits can never be reached. If the interest is in the observer itself, physics
becomes of marginal interest. Consider the case of weight again. A kilogram of feathers by defini-
tion weighs as much as a kilogram of lead, yet they are experienced as ‘somehow different’ by the
human observer (Charpentier 1891).
In 1846 Ernst Heinrich Weber published Tastsinn und Gemeingefühl (Weber 1905). One result
he had found was that the human observer, in comparing weights placed upon the two hands, can
just notice a 5 per cent difference in weight—that is 50 g on a kilogram, or 5 g on 100 g. This law
of proportionality is known as ‘Weber’s Law’ (name due to Fechner). Gustav Theodor Fechner
published Elemente der Psychophysik in 1860 (Fechner 1860). He analytically ‘integrated’ Weber’s
Law, and thus framed what is commonly known as the Weber–Fechner Law: the sensation (in
this case the quantity of the feeling of heaviness) is proportional to the logarithm of the physical
stimulus (in this case weight). Fechner referred to this as 'The Psychophysical Law'. (In all fair-
ness to Fechner, his 'Psychophysical Law' properly applies to arbitrary just-noticeable differences,
Weber's law being just a particular example.)
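Fechner's analytic 'integration' of Weber's Law can be written out in a few lines. The following is the standard textbook reconstruction in modern notation (not Fechner's own symbols):

```latex
% Weber's Law: the just noticeable difference \Delta I is a constant
% fraction k of the stimulus magnitude I (k \approx 0.05 for lifted weights).
\[ \frac{\Delta I}{I} = k \]
% Fechner's step: treat each just noticeable difference as one unit of
% sensation dS, pass to the differential limit, and integrate.
\[ dS = c\,\frac{dI}{I} \quad\Longrightarrow\quad S = c \ln \frac{I}{I_{0}} \]
% Here I_0 is the absolute threshold, at which the sensation S is taken to be zero.
```

On this reading the 5 per cent Weber fraction reproduces the figures in the text: 50 g on a kilogram and 5 g on 100 g each amount to exactly one just noticeable step.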
Notice that we deal with a number of ontologically very different entities here.1 We have at least
to reckon with the magnitude of a physical parameter, the judgment of equality of an environ-
mental property, the notion of the just noticeable difference in some environmental parameter,
and the magnitude of a certain experience. The physical parameter is often assumed to be trivial,
because physics is supposed to be the most elementary of the sciences. Of course, this is not quite
true. For one thing, physics derives from human experience, rather than the other way around, a
fact that is often forgotten. For another thing, the nature of mass in physics is not really that well
understood (does it involve an understanding of the Higgs boson,2 or does it involve a composite
nature of the electron?3). However, I'll let that be, for the elementary notions of detectability and
discriminability are more interesting. If you perform the experiment ‘right’, these notions can be
made very 'objective'. Objectivity implies 'independent of any first-person account'. In the most
highly regarded methods the person making the judgments is largely (or even fully) unaware of experi-
encing anything at all. I will refer to such cases as ‘dry physiology’. Most of classical psychophys-
ics falls in this general ballpark. With methods like EEG-recording the ideal is actually reached.
One may derive signals from the body in response to physical stimuli that the person never (or
only after some time interval) becomes consciously aware of. The ‘magnitude of an experience’ is
in a different ballpark altogether. It is literally like a pain in the ass, in that it involves conscious
personal awareness.
Something like a ‘magnitude of experience’ may be considered mysterious, and perhaps not to
be counted as a scientific fact. One popular account would denote it ‘epiphenomenal to certain
neural events’4,5. This is like saying that ‘pain is the firing of C-fibres’, indeed a popular notion
(Puccetti 1977). The optimistic feeling is that once science prevails people will stop referring to
pre-scientific notions like pain.
A ‘magnitude of experience’ is not even the most mysterious entity around. Many naive observ-
ers actually feel that they experience (are aware of) qualities and meanings—at least that is what

1  On ontological emergence see Silberstein and McGeever (1999).
2  On the Higgs boson, see <http://press.web.cern.ch/press-releases/2012/07/cern-experiments-observe-particle-consistent-long-sought-higgs-boson>.
3  On the origin of mass and the composite nature of the electron, see <http://arxiv.org/pdf/physics/0010050.pdf>.
4  On epiphenomenalism see <http://plato.stanford.edu/entries/epiphenomenalism>.
5  On reductionism see <http://www.disf.org/en/Voci/104.asp>.
they report, whatever that may be construed to mean. For instance, some visual observers, when
confronted with pieces of colored paper, are perfectly happy to grade them as ‘red’, ‘blue’, ‘yellow’,
and so forth. Notice that such observers are grading visual experiences here, not physical objects.
It is easy enough to change the state of the environment (including the observer), such that the
qualities change, relative to the identity of the objects. One may consider numerous confusions
at this point. For instance, it is not uncommon to hear remarks like ‘the red paper looks blue to
the observer’. Of course, that is a confusion of ontological levels. A thing that looks blue is a blue
visual thing. The ‘red paper’ referred to is another thing—here ‘red’ refers apparently to a physical
property. We are discussing visual things here.
I will denote the study of first-person reports such as ‘I see a blue patch’ as a function of the struc-
ture of the physical environment ‘experimental phenomenology’ (Varela, Maturana, and Uribe
1974)6. It is different from ‘dry physiology’, which I will denote ‘psychophysics’. Psychophysics is
again different from 'physics', which I will treat as the level at which 'the buck stops' as far as
inquiry goes. This is in no way necessary; for instance, the physicist will certainly want to carry the
inquiry further indefinitely.

Measurement in Psychophysics
Since I defined psychophysics as ‘dry physiology’, it only makes sense that psychophysics often
makes use of physiological measurements. These are usually physical measurements of an electri-
cal, mechanical, or thermal nature. Historically, reaction times have been very important; later
EEG-recording became a common method; at this time in history various techniques of ‘brain
scanning’ are becoming increasingly popular. Such methods are not essentially different from the
methods of animal physiology. Here I will concentrate upon methods in which the observer has
an active role.
The observer can play various roles. In the simplest cases the observer has to indicate equality
or its absence in a pair of prepared physical environments. The observer is not required to
comment on the nature of the difference. In some cases the observer may have to judge the dif-
ference between something and nothing. The ‘something’ remains undefined. In many cases, the
observer will actually be unaware of the nature of it—that is to say, will be hard-put to describe its
qualities. In such cases the observer acts as a ‘null-detector’. It is much like the case of weighing
with scales in which the person notices equilibrium, but has no experience of the quality of ‘heavi-
ness’, such as happens with objects too heavy to lift.
These are the measurements of ‘absolute thresholds’ and of ‘discrimination thresholds’. One
often assumes that such thresholds in some way ‘exist’, even when not being measured. The
experiment simply tries to measure this pre-existing value as precisely as possible. A plethora
of methods have been developed for that. The reader is referred to the standard literature for
this (Luce 1959; Farell and Pelli 1999; Ehrenstein and Ehrenstein 1999; Treutwein 1995; Pelli and
Farell 1995). Decades of work have resulted in a wealth of basic knowledge in (especially) vision
and audition. The development of modern media like television and high-fidelity sound record-
ing would have been impossible without such data. Yet it is easily possible to question the basic
assumptions. The thresholds are evidently idiosyncratic, and depend upon the present physiologi-
cal state of the observer. It is probably more reasonable to understand thresholds as operationally
defined, than as pre-existing. Indeed, different operationalizations typically yield (at least slightly)

6 On phenomenology see Albertazzi (forthcoming).
44 Koenderink

different values. To discuss the question ‘which value is right’ seems hardly worthwhile. In a few
cases the thresholds can be related to basic physical constraints. For instance, electromagnetic
energy comes as discrete photon events (Bouman 1952), setting physical limits to the thresholds,
and Brownian movement of air molecules causes ‘noise’ that limits the audibility of weak sounds
(Sivian and White 1933). Especially in such cases, the notion of ‘dry physiology’ (essentially a
subfield of physics) appears an apt term.
If you have ever been an observer in a classical threshold experiment yourself, you will understand that I have only indicated the tip of the iceberg. In the best, most objective, methods, the experimenter and the observer are both unaware of what they are doing. Such experiments are called
‘double blind’; these are considered the only ones to be trusted unconditionally. If the method has
been optimized for time, the observer will have a fifty-fifty chance of ‘being right’ at each trial.
‘Being right’ is relative to the notion that there exists a threshold independent of the method of
finding it. This puts the observer in a very unfortunate spot, namely maximum uncertainty. This
is especially unpleasant if you don't know what you are supposed to 'detect', as frequently happens in adaptive multiple forced-choice procedures. The best experiments are like Chinese torture.
The observer often has no clue as to what she is supposed to notice. One trick of the observer is
to respond randomly, in an attempt to have the method raise the stimulus level, so as to be able to
guess at the task. This is an idea that might not occur to actually 'naive' observers, which is perhaps one reason for the popularity of naive observers. Then the observer tries to remember what the task was, while—at
least in the observer’s experience—nothing is perceived at all. Such methods depend blindly on a
number of shaky assumptions, and their claims to objectivity, precision, and efficiency are argu-
able. In my view it remains hard to beat Fechner’s simple ‘method of limits’, ‘method of constant
stimuli’, and ‘method of adjustment’ (Farell and Pelli 1999; Ehrenstein and Ehrenstein 1999; Pelli
and Farell 1995), both conceptually and pragmatically.
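The flavor of such adaptive procedures is easy to convey in code. The following sketch is purely illustrative, not any specific published method: a transformed 2-down/1-up staircase run against a simulated observer. The psychometric function and all parameter values are invented for the demonstration.

```python
import math
import random

def simulated_observer(level, threshold=1.0, slope=4.0):
    """Probability-correct model of a 2AFC observer: chance (0.5) far below
    'threshold', approaching 1.0 well above it (a Weibull-like form)."""
    p = 1.0 - 0.5 * math.exp(-((level / threshold) ** slope))
    return random.random() < p

def staircase(start=2.0, step=0.1, n_trials=200):
    """Transformed 2-down/1-up staircase: two consecutive correct responses
    lower the stimulus level, a single error raises it."""
    level, streak, reversals, last_dir = start, 0, [], 0
    for _ in range(n_trials):
        if simulated_observer(level):
            streak += 1
            if streak == 2:
                streak = 0
                if last_dir == +1:
                    reversals.append(level)   # direction change: downward
                level, last_dir = max(level - step, step), -1
        else:
            streak = 0
            if last_dir == -1:
                reversals.append(level)       # direction change: upward
            level, last_dir = level + step, +1
    tail = reversals[-6:] or [level]
    return sum(tail) / len(tail)              # mean of the last reversals

random.seed(1)
estimate = staircase()
print(round(estimate, 2))  # hovers near the simulated observer's threshold
```

Note that the estimate tracks a criterion level defined by the staircase rule itself, which is one concrete sense in which a threshold is operationally defined rather than pre-existing.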
In my experience, many observers try to ‘cheat’ by aiming at a level somewhat above threshold.
This is often possible because the experimenter will never notice. I can say from (much) experience
as an observer that it feels way better, and from (much) experience as an experimenter that it yields
much better results. Of course, this is bad, for it defeats the purpose. As an observer you are able to
manipulate the threshold. In many cases it is possible to maintain a number of qualitatively different
thresholds. For instance, in the case of the contrast threshold for uniformly translating sine-wave
gratings (about three decades worth of literature!) an observer can easily maintain thresholds for:
•  Seeing anything at all;
•  Seeing movement, but not its direction;
•  Seeing movement in a specific direction;
•  Seeing something spatially articulated moving;
•  Seeing stripes, but being uncertain about their spacing or width;
•  Seeing well-defined stripes moving;
•  and so forth.
What one is aware of will depend upon the physical parameters. Such things have rarely
been recorded in the literature (Koenderink and van Doorn 1979). However, they must be obvious
to anyone who was ever an observer. They must have been obvious to experimenters who occa-
sionally acted as an observer themselves. However, some experimenters never act as an observer,
in fear of losing their status as an objective bystander. Many are reluctant to admit that they did.
The point I am making here is that one should perhaps take the literature with a little grain of salt.
It is hard, maybe impossible, to really understand an experiment you are reading about, unless
you were at least once an observer in it yourself. This perhaps detracts a bit from the apparently
tidy objectivity of such reports. For the hardcore brain scientist this does not pose a problem, for
on the ontological level of physiology the observer’s reports are mere subjective accounts, and do
not count as scientific data. Moreover, visual awareness is epiphenomenal with respect to the real
thing, which is electrochemical activity in the brain. Numerical threshold data are supposed to
carry their own meaning.
Perhaps more interesting cases involve supra-threshold phenomena. These are often more
important from an applications perspective. Such cases also involve the observer's perceptual awareness, though not necessarily the observer's recognition or understanding (in reflective thought)
of the perception. The techniques almost all involve a comparison of two or more perceptual
entities. In case the comparison is between successive cases, memory will also be involved. The
comparison may involve mere identity, in which case we are back in the dry physiology situation,
but more commonly involves some partial aspect of the perceptual awareness. In that case one
draws on the observer’s ability to somehow parse awareness.
An extreme example is the method of intermodal comparison introduced by Stanley Smith Stevens (proud author of the 'Handbook of Experimental Psychology' (Stevens 1951), counting over 1400 pages) in his famous paper 'On the Psychophysical Law' (Stevens 1957). Stevens had
people ‘equate’ anything with anything, like equating brightness of an illuminated patch with force
exerted in a handgrip (or anything you might imagine). What could this mean? Apparently people
are comparing ‘magnitudes of sensation’ in the Fechnerian sense. It is not easy to understand what
is really going on here. Such experiments are simple enough to program on a modern computer,
and it is worthwhile to gain the experience. For instance, you may try to equate brightness with
loudness. Stevens’ Law tells us that all magnitudes of sensation are related by power laws, the argu-
ment being that power laws form a group under concatenation. It is hard to assess how reasonable
this argument is. Perhaps remarkably, in practice it works amazingly well. Moreover, silly as the
task sounds, most observers have no problem with it. They simply do it.
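The formal content of Stevens' Law is compact: magnitude of sensation follows psi = k * phi**beta, so data plot as a straight line in log-log coordinates, and the concatenation of two power laws is again a power law (the exponents multiply), which is the group property just mentioned. A minimal sketch; the 'data' are simulated, and the intensities, exponent, and noise level are invented for the demonstration.

```python
import math
import random

# Simulated magnitude-estimation data: responses follow a power law
# psi = k * phi**beta, perturbed by multiplicative noise.
random.seed(0)
k_true, beta_true = 2.0, 0.33
intensities = [1, 2, 5, 10, 20, 50, 100]
responses = [k_true * phi ** beta_true * math.exp(random.gauss(0, 0.05))
             for phi in intensities]

# In log-log coordinates the power law is a line,
#   log(psi) = log(k) + beta * log(phi),
# so the exponent is the slope of an ordinary least-squares fit on the logs.
xs = [math.log(phi) for phi in intensities]
ys = [math.log(psi) for psi in responses]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
beta_hat = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
print(round(beta_hat, 2))  # recovers a value close to the simulated 0.33
```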
A special case of Stevens’ method of comparison is to let the observer relate a magnitude of sensa-
tion to numbers. One starts with some instance and encourages the observer to call out a number
(any number). Then further instances are supposed to be related to this, the number scale being
considered a ratio scale. This is often called ‘direct magnitude estimation’ (Poulton 1968). It has
often been shown to lead to apparently coherent results. This might perhaps be interpreted as an
indication that the ‘magnitude of sensation’ is a kind of quality that is immediately available to the
observer.
An interesting approach is Thurstone’s method of comparison (Thurstone 1927, 1929). Given
three items, you are required to judge which item is the (relative) outlier. This is evidently a metric
method—at least it purports to be by construction. The observer is not required to know on what
basis the decision is to be made, rendering the method ‘objective’. However, different from the
pairwise comparison, the observer is forced to judge on the basis of some quality (or qualities),
forced by the very choice of stimuli. Moreover, the method yields a clear measure of consistency.
This is what I like best. If the task makes no sense to the observer, the results will be verifiably
inconsistent. If the data are consistent, one obtains a metric. Simple examples appear impressive
at first sight. For instance, using pieces of paper, one obtains a metric that appears to reflect the
structure of the color circle. Does this ‘objectify’ the color circle? Perhaps, but it does not do so in
an interesting way. The same structure can be obtained from judgments of pairwise equality. It has
nothing to do with the quality we know as ‘hue’.
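The logic of the triadic method can be made concrete with a toy simulation. Everything here is invented for the demonstration: five items with hidden one-dimensional 'quality' values, and an observer modeled as picking, in each triad, the item farthest on average from the other two. Counting how often each pair is 'split' by the outlier judgment yields a dissimilarity whose consistency can be checked.

```python
import itertools

# Hypothetical hidden quality values for five items (simulation only).
values = {'A': 0.0, 'B': 1.0, 'C': 2.0, 'D': 3.5, 'E': 5.0}

def outlier(triad):
    """Simulated judgment: the item with the largest summed distance
    to the other two is the (relative) odd one out."""
    return max(triad, key=lambda x: sum(abs(values[x] - values[y])
                                        for y in triad if y != x))

# Dissimilarity of a pair: the fraction of triads containing the pair
# in which one of its members was judged the outlier.
split = {p: 0 for p in itertools.combinations(sorted(values), 2)}
count = dict(split)
for triad in itertools.combinations(sorted(values), 3):
    odd = outlier(triad)
    for p in itertools.combinations(triad, 2):
        count[p] += 1
        if odd in p:
            split[p] += 1
dissim = {p: split[p] / count[p] for p in split}

# Nearby items are rarely split, distant ones almost always.
print(dissim[('B', 'C')], dissim[('A', 'E')])  # → 0.0 1.0
```

With an inconsistent observer (random choices) no such ordering survives, which is the verifiable-inconsistency property praised above.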
In the final analysis, if you want to study ‘hue’ as a quality, all you can do is rely on first-person
accounts of ‘what it is like’ to experience hue (e.g. to ‘have red in mind’ or ‘experience redly’). That
means moving to experimental phenomenology.

Experimental Phenomenology
Consider the instance of hue naming. It is easy enough to check whether observers can perform
this task in a coherent manner. One simply asks for the hues of a large number of objects that dif-
fer only in a few spectral parameters (e.g., the RGB colors of a CRT tube), presenting each object
multiple times. One goes to some length to keep the physical environment stable. For instance,
one shows the objects in open shade at noon on a sunny day, or uses a CRT in a dark room. This
allows one to check reproducibility. One finds that observers do indeed yield coherent results,
inconsistencies being limited to objects that appear very similar. The fuzzy equivalence sets7
appear to be fixed for a given observer. Moreover, there are numerous observers that essentially
agree in their judgments, although occasional dissenters occur. This suggests that the hue names
are not totally idiosyncratic. One might say that there exists something of a ‘shared objectivity’
among a large group of observers (Berlin and Kay 1969).
Such a shared objectivity is by no means the same as the (true) objectivity that is the ideal of the
sciences. In physics the ‘facts’ are supposed to be totally independent of the mind of any individual
observer. On closer analysis the facts of physics are defined by community opinion, the community
being a group of people that recognize each other as professionals (a ‘peer group’). They agree on
the right way to do measurements, to analyze the results, and so forth. There is no doubt that this
has been shown to work remarkably well. However, it is certainly the case that some ‘facts’ are hotly
debated in the community (like tachyonic neutrinos (Reich 2011), or the recent Higgs boson). There
are also cases where the system did not work too well, like the (in)famous case of Schiaparelli’s
Martian canals8, which played an important role in planetary science for decades9, but are now
regarded as non-existent. Thus the ideal of 'true objectivity' is evidently a fiction, at best a virtual limiting case. One should perhaps not too hastily dismiss shared objectivity as totally unscientific. That
so many people are ready to judge blood ‘red’ and grass ‘green’ is hardly entirely meaningless. Nor
is it explained away by the spectral locations of the hemoglobin and chlorophyll absorption bands.
Researchers in the Gestalt tradition10 frequently use the method of ‘compelling visual proof ’.
One prepares an optical scene, and collects the majority community opinion on the structure of
immediate visual awareness in the presence of the scene. In cases of striking majority consensus,
one speaks of an ‘effect’, reified through shared objectivity. An example is the figure–ground struc-
ture of visual awareness. Visual objects are seen against a ground, the contour belonging to the
object, the ground apparently extending behind the object. The phenomenon of figure–ground
reversal proves that this is a purely mental phenomenon, there being no physics of the matter.
Most researchers accept compelling visual proofs as sufficient evidence for the reality of an effect.
The striking visual proof implies shared objectivity over a large group of observers, which goes
some way towards the virtual limit of ‘true objectivity’. However, it is accepted that there might be
a minority group that ‘fails to get the effect’.
Visual proofs are not limited to the psychology of Gestalt. They are actually common in math-
ematics, especially geometry. For instance, several visual proofs of the Pythagorean theorem are
well known11. Many mathematicians consider proofs only useful when they are ‘intuitive’, by which

7 On fuzzy sets see Zadeh (1965).
8 Le Mani su Marte: I diari di G.V. Schiaparelli. Observational diaries, manuscripts, and drawings (Historical Archive of Brera Observatory).
9 Infamous is the book by Sir Percival Lowell (Lowell 1911).
10 On the Gestalt tradition see Wagemans (in press).
11 On proofs of the Pythagorean theorem see <http://www.cut-the-knot.org/pythagoras/index.shtml>.

is meant that they can be broken up in smaller parts that are individually compelling. Such parts
are often visual proofs (Pólya 1957). Other mathematicians abhor visual proofs and only recognize
‘symbol pushing’. Ideally, that would lead to a mathematics that would be fully independent of the
human mind, and be simply the (uninterpreted!) output of a Turing machine. In physics, visual
proofs are also common enough. Famous is the ‘Clootcransbewijs’ of Simon Stevin (Stevin 1586),
which yields an immediate insight into the truth of the vector addition of forces. Again, some physicists would prefer to limit physics to 'symbol pushing' and 'pointer readings', in the interest of true
objectivity. Such would be physics beyond ‘human understanding’ in the usual sense. It could be
the (uninterpreted!) signal transmitted by a NASA Mars explorer. Since ‘true objectivity’ in the sci-
ences would exclude human intuition or understanding, it seems hardly a goal to strive for. Who
might be interested? True objectivity implies zero understanding. Somehow, one has to find the
right balance.
In experimental phenomenology such ‘symbol pushing’ or ‘pointer readings’ are to no avail, as
there are no formal theories with quantitative predictive power, and pointer readings belong to
dry physiology. Perceptual proofs have to be the major tool.

Methodologies in Experimental Phenomenology: The Art of Devising Methods
So far I have given only the simplest and most direct methods used in experimental phenomenol-
ogy, namely hue naming and visual proof. It is not really possible or useful to attempt to sum up
exhaustively the methods to be mined from the literature. Description (like hue naming) is, of
course, a basic method, as is part–whole analysis12. The former is not quantitative, the latter per-
haps of a semi-quantitative nature. Here I mainly concentrate on quantitative methods. They are
too diverse, and depend much on the specific area of endeavor. For instance in acoustics, or music,
one is likely to use different methods from optics or the visual arts. However, there is perhaps
something like a common denominator to be found in the design process of such methods. Issues
that recur again and again in such design processes are:
•  Identification of the aspect to be studied, and possible ways to (hopefully) quantify it. For
instance, one might be interested in local surface shape, and parameterize it by two sectional
curvatures and an orientation. Often alternative parameterizations are possible, differing in
their degree of ‘naturalness’.
•  Ways to address the aspect. In the simplest case one might instruct the observer to name it.
•  Ways to check the consistency of the results. In the simplest case one might check repeatability
and inter-observer consistency; often ‘internal consistency’ checks are possible.
•  Ways to generalize the result over varying states of the environment.
Notice that it is easily possible to attempt to address aspects of the scene that the observer
has no clue how to find in immediate awareness. For instance, the range (distance to the eye) is
totally unavailable. Such aspects are outside the scope of experimental phenomenology. Yet it is
not uncommon to find attempts to measure such parameters in the literature. In order to avoid
such unfortunate choices, the experimenter needs to understand the task of the observer at the
gut level. This equally holds for the ‘naturalness’ of the parameterization. It is easy enough to try
to address ‘the same’ aspect in various parameterizations, leading to very different results. One
method might feel ‘natural’, the other ‘impossible’.

12 On mereology see <http://plato.stanford.edu/entries/mereology/>.

I will draw some illustrative examples from our recent work, stressing the considerations lead-
ing up to the design of the method, and the types of result that were obtained.

Example A: Shape from shading

It is well known to visual artists that one effective way to evoke the awareness of pictorial shape
is artfully applied shading (Baxandall 1995). Various effective techniques of shading were devel-
oped over the centuries. In modern western culture shading also became a topic of optics.
Eventually the artistic techniques were ‘explained’ optically, and taught in the art academies all
over Europe. However, alternative artistic shading techniques, not based upon optical princi-
ples, also remain in widespread use. In experimental phenomenology one has often started from
the optical interpretation. It is important to understand that this is a rather limited approach.
A common optical pattern in this research is a circular disk on a uniform ground, filled with a
linear luminance gradient. This, no doubt, started as an attempt to design the simplest possible
‘elementary stimulus’. The linear gradient is conventionally considered to be the relevant param-
eter. That this is not correct is evident when you substitute a square for the disk: what first looked
spherical now looks cylindrical. Apparently the shape of the contour is every bit as important
as the gradient per se. The fact that the area of the disk appears in visual awareness as spherical,
either concave or (most frequently) convex, is known as ‘shape from shading’ (Wagemans, van
Doorn, and Koenderink 2011).
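The stimulus itself takes only a few lines to construct. The sketch below fills a disk on a uniform ground with a linear top-to-bottom luminance gradient; image size, disk radius, and luminance values are arbitrary choices for illustration.

```python
# Render the classical stimulus: a disk on a uniform ground, filled with
# a linear (top-to-bottom) luminance gradient. All sizes and luminance
# values are arbitrary choices for illustration.
SIZE, GROUND, RADIUS = 101, 0.5, 40
cx = cy = SIZE // 2

image = [[GROUND] * SIZE for _ in range(SIZE)]
for row in range(SIZE):
    for col in range(SIZE):
        if (row - cy) ** 2 + (col - cx) ** 2 <= RADIUS ** 2:
            # Inside the contour, luminance depends linearly on the
            # vertical coordinate alone: bright at the top, dark below.
            image[row][col] = 1.0 - row / (SIZE - 1)

# Replacing the disk test by a square (abs(row - cy) <= RADIUS and
# abs(col - cx) <= RADIUS) changes only the contour, not the gradient.
top, mid, bottom = image[cy - RADIUS][cx], image[cy][cx], image[cy + RADIUS][cx]
print(round(top, 2), round(mid, 2), round(bottom, 2))  # → 0.9 0.5 0.1
```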
The spherical surface is an aspect of visual awareness, a mental thing. A minority of observers
fails to experience this; others only experience convexity; while for many observers convexity
and concavity alternate in apparently random fashion. The distinction ‘flat’ (no pictorial relief),
‘cup’, or ‘cap’ (concave or convex) can be made spontaneously by almost any observer. Most
research has indeed relied on naming, usually offering only the alternatives cup or cap (the fact
that some observers never have the awareness of a spherical surface seems a well-kept secret in
the community). This method is not unlike hue naming. The difference is that for many observ-
ers spontaneous cup–cap (or vice versa) flips occasionally occur (whereas red–green, or yel-
low–blue flips are unknown). The solution is to use a presentation time short enough to render
the number of flips during a presentation much smaller than one.
If several of these stimuli are simultaneously present, one notices that they tend to 'synchronize',
that is to say, they occur in awareness as all cup or all cap. Whether this happens depends upon
the precise configuration. If all gradients are lined up, synchronization is almost universal; if the
gradient directions are randomized synchronization is rare, except for observers who report only
‘flat’ or ‘convex’ in any case. How to probe this effect?
One simple way is to ask for a report ‘all cups’, ‘all caps’, or ‘mixed’ (van Doorn, Koenderink,
and Wagemans 2011; van Doorn, Koenderink, Todd, and Wagemans 2012). Again, one uses a
short enough presentation to avoid flips. The method can be made more discriminative by ask-
ing for the relation of specific pairs (van Doorn, Koenderink, and Wagemans 2011; van Doorn,
Koenderink, Todd, and Wagemans 2012). This can be implemented by marking the members of
the pair, for instance with dots. This introduces a complication, since the markers might conceiv-
ably affect the awareness. These are fairly typical issues met with in such problems. The reader
interested in the details of this specific case (indeed very instructive) should consult the literature.

Example B: Pictorial shape

Consider a simple picture like a portrait, or figure photograph, painting, or drawing. One may
look at the picture, and see a flat piece of paper covered with pigments in some simultaneous
order. One may also look into the picture and be aware of a pictorial space, filled with pictorial
objects. Pictorial objects are volumetric and bounded by surfaces, the pictorial reliefs. Different
from the picture surface, which is a physical object coexisting with the body of the observer in a
single space, the pictorial relief is a mental object without physical existence. It lives in immedi-
ate visual awareness. As such, it is a worthy object for study in experimental phenomenology
(Koenderink, van Doorn, and Wagemans 2011).
Pictorial reliefs are two-dimensional submanifolds of three-dimensional pictorial space. Pictorial
space is quite unlike Euclidean space (the space you move in) in that the depth dimension is not com-
mensurate with the visual field dimensions. Whereas the ontological status of the visual field dimen-
sions is in no way obvious, these dimensions do at least have analogues in the physical scene, namely
the dimensions that span the picture plane. Despite these fundamental differences, it is intuitively evi-
dent that an element (small patch) of pictorial relief can be parameterized by a spatial attitude (that is
to say, it could be seen frontally or obliquely), and by a shape. The attitude can be parameterized by two
angles, a slant (measure of obliqueness) and a tilt (the direction of slanting). Being a two-dimensional
patch, it is geometrically evident that the shape can be parameterized by two curvatures in mutually
orthogonal directions and an orientation. Thus one can parameterize a smallish patch of pictorial
relief by six parameters, its ‘depth’ (one parameter), its spatial attitude (two parameters), and its shape
(three parameters). One might consider it the task of experimental phenomenology to address these.
How to go about that (Koenderink, van Doorn, and Kappers 1992)?
Initially, it might seem easiest to go for the depth first, since it is a simple point property. In
the simplest implementation, one might ask an observer to do raw magnitude estimation. One
puts a mark (think of a red dot placed on a monochrome photograph) on the picture surface and
instructs the observer to call out the depth. One repeats this for many points, say in random order.
The result would be a ‘depth map’, evidently a desirable result of experimental phenomenology.
When you give this a try, you will find that it doesn’t work very well. The observer has no clue as
to absolute depth, only relative depths (depth differences between point pairs, say) appear to make
sense. Such point pair comparisons do indeed work to some extent, but—of course—they yield
depth only up to an arbitrary offset. Moreover, the spread in the result is rather high, and for some
point pairs the task is essentially an impossible one. This is an important insight: ‘depth at a point’
plays no role in visual awareness.
Spatial attitude is apparently a better target since observers can easily point out in which direc-
tion a surface element is slanted. How to measure attitude? The simplest method appears again
to be magnitude estimation. Put a mark on the picture surface, and have the observer call out the
slant and tilt angles in degrees. This experiment was actually performed by James Todd (Todd and
Reichel 1989), but unfortunately the results are not encouraging. Observers take a long time to
arrive at a conclusion, and results are very variable. Moreover, observers hate the task. It just fails
to feel ‘natural’. Are there methods to address spatial attitude that do feel natural?
One approach to the design of more natural methods relies on the method of coincidence. It is a
very general principle, also commonly used in the sciences. Consider how one measures length.
One designates a certain stick as the ‘unit of length’. One uses geometrical methods to produce
sticks of any length. For instance, cutting a unit stick into two equal pieces produces a stick of
one-half unit length. The judgment of equality does not require any length measurement itself,
thus does not introduce circularity. Likewise, putting two unit-length sticks in tandem produces
a stick of two unit lengths. And so forth. Measuring the length of an unknown stick involves
finding a stick of known length (they can be produced of any length) and judging equality. In
practice one produces a yardstick with marked subdivisions, puts the unknown stick next to it,
and notices coincidence of the endpoints of the stick with marks on the yardstick. This is the
gist of the method of coincidence13. The ancients refined it, and the same principle was applied
to weights. Later methods were found to extend the method to luminance, temperature, various
electrical variables, and so forth. Here I will mainly use the paradigm of the yardstick.
Notice what you need in order to apply this method of ‘length measurement’. First you need a
yardstick. Then you have to be able to put the yardstick next to the object to be measured. Finally
you need to be able to judge the coincidence of two fiducial points on your object with marks on
the yardstick. Each of these requirements might fail to be met. For instance, you have no yardstick
that would let you measure the distance to the moon. You are not able to apply the yardstick (use-
fully) to a coiled rope. And so forth. The method of length measurement implies that you succeed
in dealing with the various requirements.
In the case of pictorial surface attitude you have to design a ‘gauge figure’ (your analogue of the
‘yardstick’), you have to be able to place this object in pictorial space, on the pictorial surface, and
you have to be able to manipulate the gauge figure so as to bring about a ‘coincidence’. None of
these design objectives is trivial.
The gauge figure should be a pictorial object, since it should be inserted in pictorial space. This
means designing a picture of the gauge figure, in the expectation that it will produce a pictorial
object. The gauge figure should appear to have well-defined spatial attitude, for that is what we
would like to measure, and as few superfluous ‘frills’ as possible. Inspiration can be found in the art
of drawing. Artists often use ellipses to suggest spatial attitude, for instance in ‘bracelet shading’14,
spreading ripples on water, the shape of water lily leaves, the bottom hem of a dress, and so forth.
An oval makes a good gauge figure for attitude because it tends to look ‘like’ a slanted and tilted
circle.
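Under orthographic projection the geometry of such a gauge figure is straightforward: a circle lying on a patch with a given slant and tilt projects to an ellipse whose axis along the tilt direction is foreshortened by the cosine of the slant. The sketch below is one possible convention for generating the gauge outline, not the specific implementation used in the experiments cited.

```python
import math

def gauge_ellipse(slant_deg, tilt_deg, radius=1.0, n=8):
    """Sample points, in the picture plane, of the orthographic projection
    of a circle on a surface patch with the given slant and tilt."""
    slant, tilt = math.radians(slant_deg), math.radians(tilt_deg)
    rot = tilt + math.pi / 2      # the short axis must lie along the tilt
    points = []
    for k in range(n):
        a = 2 * math.pi * k / n
        # A circle in surface coordinates; the component along the tilt
        # direction is foreshortened by cos(slant).
        u = radius * math.cos(a)
        v = radius * math.sin(a) * math.cos(slant)
        points.append((u * math.cos(rot) - v * math.sin(rot),
                       u * math.sin(rot) + v * math.cos(rot)))
    return points

# A frontal patch (slant 0) yields a circle; a slant of 60 degrees with
# tilt 0 yields an ellipse half as wide (along x, the tilt direction) as
# it is tall -- the cue the observer adjusts to 'paint' the circle onto
# the pictorial surface.
pts = gauge_ellipse(60.0, 0.0)
short = max(abs(x) for x, _ in pts)
long_axis = max(abs(y) for _, y in pts)
print(round(short / long_axis, 2))  # → 0.5, i.e. cos(60 deg)
```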
How to place the gauge figure at the right location? Perhaps surprisingly, this turns out to be
easy. Almost anything you put on the picture surface will travel into depth till it meets a pictorial
surface on which it will stick. Mustaches and black teeth on posters of politicians are a case in
point. However, it is by no means a fail-safe method; some marks stubbornly look like flyspecks
on the pictorial surface. This is an important insight: in experimental phenomenology the aware-
ness of the experimenter is just as important as that of the observer! The ‘objectivity’ of experi-
mental phenomenology is shared subjectivity. Fortunately, the gauge figure tends to work well.
Simply superimposing an elliptical outline on the picture surface is enough to put the gauge on
the pictorial relief.
Finally, bringing about the coincidence is a simple matter. Most ellipses look like they are not
lying upon the surface, but at some angle to it. By changing the orientation and shape of the ellipse
you may bring about an awareness of the gauge figure as ‘a circle painted upon the surface’. This
is a striking visual fact; it looks very different from an ellipse that doesn’t fit. Of course, there is
little one can do in case the observer fails to agree. Such cases appear to be extremely rare though.
The only important design issue left is the interface. The observer somehow has to be able to
manipulate the ellipse. This is very important. If the interface is not ‘natural’ the method is not
going to work. You may gain an appreciation for this fact if you play with a simple kid’s game: writ-
ing your name with a device that uses two knobs controlling the Cartesian coordinates of the

  These are Eddington’s famous ‘pointer readings’ (Eddington 1928).


13

  ‘Bracelet shading’ derives from the way a (circular) bracelet reveals the shape of a cross-section of an arm,
14

leg, or neck. The hatching used in bracelet shading follows the curves obtained by cutting the shape by
planar sections perpendicular to its overall medial axis. The hatching may follow material features, for
instance, folds in sleeves often lend themselves very naturally to this technique.

writing implement. The ‘Etch a Sketch’ toy, a devilish French invention, manufactured by the Ohio
Art Company, does exactly that15. Writing anything, for instance your own name, is nearly impos-
sible, which accounts for the popularity of the device. Using a proper interface, observers bring
about coincidence in a few seconds. Participants consider it easy and generally fun to do. You
easily do hundreds of coincidences in a session of half an hour. In contradistinction, interfaces of
the Etch a Sketch type are a strain on the observer. Moreover, they lead to badly reproducible
results, and take twice or thrice the time. In practice the difference is crucial. Yet from a ‘formal,
conceptual’ perspective the interface should make no difference at all. That’s why this section is
entitled the ‘art’ of devising methods. It is desirable that eventually such ‘art’ should be replaced
with principled methods, of course.
Notice that a natural interface is also crucial because of time constraints. The structure of picto-
rial space is volatile and may change to a noticeable degree over the span of an hour. This limits
the number of surface attitude samples that can be taken to a few hundred, even with a convenient
interface.
Such experiments are usually done on a computer screen because that makes it easy to imple-
ment the interface. Perhaps unfortunately, it also makes it trivial to put as many gauge figures
on the screen as you wish. This has induced people to plaster the surface with gauge figures, and
have the observer control the structure of an extensive gauge figure field. This is generally a bad
idea. Why? The reason is that ellipses are powerful cues (think of bracelet shading and so forth).
Indeed, you may as well remove the picture, for you will still see the pictorial surface, due to the
gauge figures alone. With the picture present it is easily possible to influence the pictorial relief
by adjusting the gauge figure field. Thus, the measurement influences the result. To minimize this
undesirable effect, we never show more than one gauge figure at a time, and do so in random
spatial order. Of course, there are many more possible artifacts of this type. Size, color, line thick-
ness, and so forth of the gauge figure are an important and integral part of the design. Such factors
co-determine the result, and should be considered part of the measurement.
Given a field of local surface attitudes, one may find an integral surface that ‘explains’ them as
well as possible. Some variations of attitude will have to be ignored by such a method, because not
just any field of attitudes admits of an integral surface. Thus, you obtain a very useful measure of
coherency of the result. If the spread in repeated settings accounts for the incoherence, then one
might say that a ‘pictorial surface exists’. This existence proof is a major advantage of these meth-
ods. In case a coherent surface exists, one obtains a depth map modulo an arbitrary offset. This is
an important point of departure for various important lines of experimental phenomenological
research.
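The integration step described above can be illustrated in code. The following is a minimal sketch, not the procedure used in the actual experiments: it assumes the gauge settings are given as slant and tilt angles on a regular grid, converts each setting to a depth gradient under the standard convention p = tan(slant)·cos(tilt), q = tan(slant)·sin(tilt), and finds the depth map that best fits the finite-difference equations in the least-squares sense. The residual misfit plays the role of the coherence measure, and the mean depth is pinned to zero because only a depth map modulo an arbitrary offset is recoverable. The function name and grid layout are mine.

```python
import numpy as np

def integrate_attitudes(slant, tilt):
    """Fit a depth map z(x, y) to a grid of gauge-figure settings.

    slant, tilt : (H, W) arrays of slant/tilt angles in radians,
    sampled on a regular grid.  Returns (z, residual): the
    least-squares surface (mean depth fixed at zero, since only
    relative depth is recoverable) and the RMS misfit between the
    fitted and the measured gradients, which serves as a coherence
    measure for the attitude field.
    """
    H, W = slant.shape
    p = np.tan(slant) * np.cos(tilt)   # dz/dx samples
    q = np.tan(slant) * np.sin(tilt)   # dz/dy samples

    n = H * W
    rows, cols, vals, rhs = [], [], [], []

    def idx(i, j):
        return i * W + j

    eq = 0
    # Horizontal finite differences: z[i, j+1] - z[i, j] = mean p.
    for i in range(H):
        for j in range(W - 1):
            rows += [eq, eq]
            cols += [idx(i, j + 1), idx(i, j)]
            vals += [1.0, -1.0]
            rhs.append(0.5 * (p[i, j] + p[i, j + 1]))
            eq += 1
    # Vertical finite differences: z[i+1, j] - z[i, j] = mean q.
    for i in range(H - 1):
        for j in range(W):
            rows += [eq, eq]
            cols += [idx(i + 1, j), idx(i, j)]
            vals += [1.0, -1.0]
            rhs.append(0.5 * (q[i, j] + q[i + 1, j]))
            eq += 1

    A = np.zeros((eq, n))
    A[rows, cols] = vals
    b = np.array(rhs)
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    z = (z - z.mean()).reshape(H, W)   # arbitrary offset: pin mean to 0
    residual = np.sqrt(np.mean((A @ z.ravel() - b) ** 2))
    return z, residual
```

For a coherent field (e.g., settings generated from a single plane) the residual vanishes; for an incoherent field the residual exceeds what the spread in repeated settings can account for, and no 'pictorial surface' can be said to exist.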
There are a number of very common misunderstandings that may need special mention. I discuss two of these, which have a bearing on the ontological status of the measurements.
One widespread misunderstanding is due to an overly cognitive interpretation of these meth-
ods. As I have argued above, the final task of the observer is to judge a coincidence. The gauge
figure should appear as ‘a circle painted upon the surface’ in immediate visual awareness. This is a
primitive awareness; it does not involve any reasoning. At least, that should be the case, or else the
method cannot be considered to be a method of experimental phenomenology. Neither cognition
proper (noticing the coincidence in no way involves recognition of the pictorial object, and so
forth), nor (a fortiori) reflective thought, should be involved. Yet people frequently interpret the
method in the following way. The observer is supposed to:
15  On 'Etch a Sketch' see <http://en.wikipedia.org/wiki/Etch_A_Sketch>.
52 Koenderink

1 Estimate the spatial attitude of the pictorial surface;
2 Estimate the spatial attitude of the gauge figure (notice that the sequence 1–2 or 2–1 is immaterial);
3 Compare the two spatial attitude judgments. If no difference is apparent a 'coincidence' is obtained.
This is a travesty of the actual process, reasonable as it may sound at first blush. For the awareness of a coincidence in no way involves the separate attitude estimates. Consider an example: in measuring a length you do not first measure the length of the object, then the length of the yardstick, and finally compare the two measurements in the analysis. You simply notice a coincidence.
The ‘double measurement’ method actually leads to infinite regress. That observers do not judge
separate attitudes in the performance of the task is obvious from the results of Todd’s experiment.
Observers are simply unable to do this. Observers notice a coincidence in a fraction of a second,
but take a minute to come up with a spatial attitude estimate. Moreover, the latter estimates are highly variable.
Closely related to this misrepresentation is the notion that the method requires one to ‘calibrate
the spatial attitude of the gauge figure’. The attitude of the gauge figure is specified by its physical
parameters, which are the slant and tilt angles used in the graphics-rendering algorithm. The atti-
tude of the local pictorial relief is then defined as the attitude of the coinciding gauge figure. This
is exactly like the use of the yardstick to measure lengths. There is no further need to ‘calibrate’
the attitude of the gauge figure. The calibration would imply either magnitude estimation (in
that case, why not estimate the spatial attitude of the pictorial surface directly?), or comparison
with another method, such as the spatial attitude of a palm board16 (and so forth), which merely
complicates the original problem with another—similar but different—problem: the idea leads to
infinite regress.

Conclusion
Experimental psychology is a very broad discipline. It encompasses subfields like dry physiology
(or behaviorism), cognitive science, and experimental phenomenology, which operate on mutu-
ally distinct ontological levels. This is unusual among the sciences. It is not intrinsically problem-
atic, but it starts to generate countless problems when one tries to enforce the same requirements
on ‘objectivity’ throughout. This is simply not possible. Of course, it isn’t even possible in physics,
but few people are ready to acknowledge that. Here I pleaded for the notion of ‘shared subjectivity’
as a pragmatic alternative to the virtual notion of scientific ‘objectivity’. At least it admits of graded
degrees of objectivity, instead of a mere binary objective/subjective distinction.
Once one recognizes the various ontological levels for what they are, it is evident that these
various levels require distinct methods. Dry physiology is perhaps the easiest case, because its
methods are essentially those of physics. The problem here is not so much in the methodol-
ogy as in its conceptual approaches: the physiological data are often interpreted in terms of
mental entities (e.g. visual awareness), which amounts to an unfortunate confusion of levels.
The behaviorists were far more consistent in considering speech as amounting to the movement of air molecules.

16  A 'palm board' is a planar surface on which one may rest one's palm, and that may be rotated into any desired spatial attitude. The angles parameterizing the attitude are read out, usually in some electronic way. The palm board is useful as an interface device that may be used to indicate the perceived spatial attitude of some object.

Methodological background 53

Cognitive science approaches perception on the functional level, which is fine; it has developed a large toolbox of very useful methods. The problems are again a
frequent confusion of levels, in this case in two directions. Functional entities are often inter-
preted in both neural and mental terms (qualities and meanings), frequently in ways that are
rather far-fetched. Finally, experimental phenomenology studies the structure (in terms of
qualities and meanings) of perceptual awareness. It has to use its own methodology, in terms
of first-person accounts, mainly based on immediate ‘perceptual proofs’. This, again, is fine as
it goes. Problems occur as the conceptual interpretation crosses ontological levels. A historic
failure of this kind was the interpretation of Gestalt properties in terms of isomorphic neural
activity.
Of course, there is no problem with any one person freely moving back and forth between
researches on distinct ontological levels. On the contrary, such frequent excursions are very much
to the benefit of experimental psychology! However, a serious attempt at the recognition of the
ontological chasms is essential. Overstepping the boundaries should require explicit mention of
the psychophysical ‘bridging hypotheses’. Unfortunately, and to its disadvantage, the scientific
community fails to enforce that.

References
Albertazzi, L. (forthcoming). ‘Philosophical Background: Phenomenology’. In The Oxford Handbook of
Perceptual Organization, ed. J. Wagemans (Oxford: Oxford University Press).
Aristotle (ca. 350 BCE). De Anima. Available for download from the Internet Classics Archive, <http://classics.mit.edu/Aristotle/soul.html>.
Baxandall, Michael (1995). Shadows and Enlightenment (London, New Haven: Yale University Press).
Berlin, B. and P. Kay (1969). Basic Color Terms: Their Universality and Evolution (Berkeley, CA: University
of California Press).
Bouman, M. A. (1952). ‘Mechanisms in Peripheral Dark Adaptation’. JOSA 42: 941–950.
Charpentier, A. (1891). ‘Analyse expérimentale: De quelques élements de la sensation de poids’
[Experimental study of some aspects of weight perception]. Arch Physiol Norm Pathol 3: 122–135.
Eddington, Arthur Stanley (1928). The Nature of the Physical World (New York: Macmillan).
Ehrenstein, W. H. and A. Ehrenstein (1999). ‘Psychophysical Methods.’ In Modern Techniques in
Neuroscience Research, ed. U. Windhorst and H. Johansson, ch. 43 (New York: Springer).
Farell, B. and D. G. Pelli (1999). 'Psychophysical Methods, or How to Measure a Threshold, and Why.' In Vision Research: A Practical Guide to Laboratory Methods, ed. R. H. S. Carpenter and J. G. Robson, pp. 129–136 (New York: Oxford University Press).
Fechner, Gustav Theodor (1860). Elemente der Psychophysik (Leipzig: Breitkopf and Härtel). Available for
download from <http://archive.org/stream/elementederpsych02fech#page/n5/mode/2up>.
Koenderink, J. J. and A. J. van Doorn (1979). ‘Spatiotemporal Contrast Detection Threshold Surface is
Bimodal.’ Optics Letters 4: 32–34.
Koenderink, J. J., A. J. van Doorn, and A. L. M. Kappers (1992). ‘Surface Perception in Pictures.’
Perception & Psychophysics 52: 487–496.
Koenderink, J. J., A. J. van Doorn, and J. Wagemans (2011). ‘Depth.’ i-Perception (special issue on Art &
Perception) 2: 541–564.
Lowell, Percival (1911). Mars and its Canals (New York, London: Macmillan). Available for download from <http://archive.org/details/marsanditscanals033323mbp>. Last accessed 25 Sept 2013.
Luce, R. D. (1959). ‘On the Possible Psychophysical Laws.’ Psychological Review 66(2): 81–95.
Pelli, D. G. and B. Farell (1995). ‘Psychophysical Methods.’ In Handbook of Optics, vol. I, 2nd edn, ed.
M. Bass, E. W. van Stryland, D. R. Williams, and W. L. Wolfe, pp. 29.1–29.13 (New York: McGraw-Hill).
Pólya, George (1957). How to Solve It (Garden City, NY: Doubleday).

Poulton, E. C. (1968). ‘The New Psychophysics: Six Models for Magnitude Estimation.’ Psychological
Bulletin 69: 1–19.
Puccetti, Roland (1977). ‘The Great C-Fiber Myth: A Critical Note.’ Philosophy of Science 44(2): 303–305.
Reich, E. S. (2011). ‘Speedy Neutrinos Challenge Physicists.’ Nature News 477 (27 September): 520.
Silberstein, Michael and John McGeever (1999). ‘The Search for Ontological Emergence.’ The Philosophical
Quarterly 49(195): 201–214.
Sivian, L. J. and S. D. White (1933). ‘On minimal audible sound fields’. J Acoust Soc 4: 288.
Stevens, S. S. (1951). Handbook of Experimental Psychology (New York: Wiley).
Stevens, S. S. (1957). ‘On the Psychophysical Law.’ Psychological Review 64(3): 153–181.
Stevin, Simon (1586). De Beghinselen der Weeghconst. Published in one volume with De Weeghdaet, De
Beghinselen des Waterwichts and an Anhang (appendix) (Leiden: Plantijn).
Thurstone, L. L. (1927). ‘A Law of Comparative Judgment.’ Psychological Review 34: 273–286.
Thurstone, L. L. (1929). ‘The Measurement of Psychological Value.’ In Essays in Philosophy by Seventeen
Doctors of Philosophy of the University of Chicago, ed. T. V. Smith and W. K. Wright, pp. 157–174
(Chicago: Open Court).
Todd, J. T. and F. D. Reichel (1989). ‘Ordinal Structure in the Visual Perception and Cognition of Smooth
Surfaces.’ Psychological Review 96: 643–657.
Treutwein, B. (1995). ‘Adaptive Psychophysical Procedures.’ Vision Research 35(17): 2503–2522.
van Doorn, A. J., J. J. Koenderink, and J. Wagemans (2011). ‘Light Fields and Shape from Shading’. Journal
of Vision 11: 1–21.
van Doorn, A. J., J. J. Koenderink, J. T. Todd, and J. Wagemans (2012). 'Awareness of the Light Field: The Case of Deformation.' i-Perception 3(7): 467–480.
Varela, F., H. Maturana, and R. Uribe (1974). ‘Autopoiesis: The Organization of Living Systems, its
Characterization and a Model.’ Biosystems 5: 187–196.
Wagemans, J., A. J. van Doorn, and J. J. Koenderink (2011). ‘The Shading Cue in Context.’ i-Perception 1:
159–178.
Wagemans, J. (forthcoming) ‘Historical and Conceptual Background: Gestalt Theory.’ In The Oxford
Handbook of Perceptual Organization, ed. J. Wagemans (Oxford: Oxford University Press).
Weber, Ernst Heinrich (1905). Tastsinn und Gemeingefühl, ed. Ewald Hering (orig. 1846), Ostwald’s
Klassiker No. 149 (Leipzig: W. Engelmann). Available for download from <http://archive.org/details/
tastsinnundgeme00unkngoog>.
Zadeh, L. A. (1965). ‘Fuzzy Sets.’ Information and Control 8(3): 338–353.
Section 2

Groups, patterns, textures


Chapter 4

Traditional and new principles of perceptual grouping

Joseph L. Brooks

Within the wider study of perceptual organization, research on perceptual grouping examines how
our visual system determines what regions of an image belong together as objects (or other useful
perceptual units). This is necessary because many objects in real world scenes do not project to
a continuous region of uniform color, texture, and lightness on the retina. Instead, due to occlu-
sion, variations in lighting conditions and surface features, and other factors, different parts of
a single object often result in a mosaic of non-contiguous regions with varying characteristics
and intervening regions associated with other, overlapping objects. These diverse and disparate
image regions must be united (and segregated from those arising from other objects and surfaces)
to form meaningful objects, which one can recognize and direct actions toward. Also, meaning
may appear not only in the shape of individual objects, but in the spatial and temporal relation-
ships between them. For instance, the arrangement of individual objects may form a higher-order
structure, which carries an important meaning, such as pebbles on a beach arranged to form a
word. Perceptual grouping is one process by which disparate parts of an image can be brought
together into higher-order structures and objects.

Classic principles of perceptual grouping


Because perceptual grouping is not indicated directly by the pattern of light falling on the retinae, it
must be derived from the available sensory information. Work by Gestalt psychologists on this prob-
lem in the early twentieth century identified a set of what are now known as principles (or factors)
of perceptual grouping. Many of the classic principles were first articulated as a set of ‘laws’ by Max
Wertheimer (1923). Each classic principle described how grouping amongst a set of elements in a
simple image (e.g., Figure 4.1A) was affected by varying properties of those elements relative to one
another. For instance, when the spatial positions of dots are altered such that pairs of dots are more
proximal to each other than they are to other dots (Figure 4.1B), the entire array tends to be seen
as four groups of two dots, rather than as eight independent dots1. Wertheimer called this effect the
principle of proximity and gave clear demonstrations of its effects on visual perception. Proximity is
not the only factor that Wertheimer proposed as a grouping principle. His paper listed what are now
considered to be some of the other classic Gestalt principles of perceptual grouping. In this section,
I will examine each of these classic principles and describe their origin in Wertheimer’s work as well
as review some modern work that has extended our understanding of how these principles work.

1  Although grouping is often described as the unification of independent perceptual elements, it is also possible to see this as the segmentation of a larger perceptual unit (the linear group of eight dots) into four smaller groups. Regardless of whether it is segmentation or unification, the end result is the same.
Fig. 4.1  Examples of some classic Gestalt image-based grouping principles between elements.
(a) Horizontal array of circular elements with no grouping principles forms a simple line. (b) When
the spatial positions of elements are changed, the elements separate into groups on the basis of
proximity. Elements can also be grouped by their similarity in various dimensions such as (c) color,
(d) shape, (e) size, and (f) orientation. (g) Similarity in the direction of motion (as indicated by
the arrow above or below each element) of elements is referred to as common fate and causes
elements with common motion direction to group together. (h) Curvilinear elements can be grouped
by symmetry or (i) parallelism. (j) Good continuation also plays a role in determining what parts
of a curve go together to form the larger shape. In this case, the edges group on the basis of their continuous path from upper left to lower right and from lower left to upper right. (k) However, closure can
reverse the organization that is suggested by good continuation and cause perception of a bow-tie
shape.
Adapted from Palmer, Stephen E., Vision Science: Photons to Phenomenology, figures 6.1.2, © 1999
Massachusetts Institute of Technology, by permission of The MIT Press.

Proximity: quantitative accounts
Although Wertheimer convincingly demonstrated a role for proximity in grouping, he did not
provide a quantitative account of its influence. Early work on this issue by Oyama (1961) used
simple, rectangular 4 × 4 dot lattices in which the distance along one dimension was constant but
varied (across trials) along the other dimension (Figure 4.2A,B). During a 120-second observa-
tion period, participants continuously reported (by holding down one of two buttons) whether
they saw the lattice as rows or columns at any given time. The results clearly demonstrated that
as the distance in one dimension changed (e.g. horizontal dimension in Figure 4.2A,B) relative to
the other dimension, proximity grouping quickly favored the shortest dimension according to a
power function, a relationship found elsewhere in psychophysics (Luce, 2002; Stevens, 1957) and
other natural laws. Essentially, when inter-dot distances along one dimension are similar to one
another, a small change in inter-dot distance along one dimension can strongly shift perceived
grouping. However, the effect of that same change in inter-dot distance falls off as the initial dif-
ference in inter-dot distance along the two dimensions grows larger.
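Oyama's power-function result can be paraphrased in a few lines. A sketch, not Oyama's own formulation: the odds of seeing the organization along one vector rather than the other are modeled as a power function of the distance ratio. The exponent k is a free parameter that would be fitted to an observer's data; the value 3.0 below is purely illustrative, and the function name is mine.

```python
def grouping_odds(d_a, d_b, k=3.0):
    """Odds of perceiving organization along vector a rather than b.

    Sketch of the power-law relation found by Oyama (1961): the
    ratio of report durations (rows vs. columns) behaves as a power
    function of the ratio of the inter-dot distances,
    odds = (d_a / d_b) ** (-k).  The exponent k is illustrative.
    """
    return (d_a / d_b) ** (-k)

def grouping_probability(d_a, d_b, k=3.0):
    """Convert the odds into a choice probability for organization a."""
    odds = grouping_odds(d_a, d_b, k)
    return odds / (1.0 + odds)
```

At equal distances the odds are 1 (no preference); lengthening one distance by a given fraction shifts the probability sharply when the two distances are similar, but has a diminishing effect once one organization already dominates.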
The above relationship, however, only captures the relative contributions of two (vectors a
and b, Figure 4.2C) of the many possible organizations (e.g., vectors a–d, Figure 4.2C) within the

Fig. 4.2  Dot lattices have been used extensively to study the quantitative laws governing grouping
by proximity. (a) When distances between dots along vectors a and b are the same, participants are
equally likely to see columns and rows. (b) As one distance, b, changes relative to the other, a, the
strength of grouping along the shorter distance is predicted by a negative power function. (c) Dot
lattices have many potential vectors, a–d, along which grouping could be perceived even in a simple
square lattice. (d) Dot lattices can also fall into other classes defined by the relative length of their
two shortest inter-dot distances and the angle between these vectors, γ. In all of these lattices, the
pure distance law determines the strength of grouping.

lattice. Furthermore, the square and rectangular lattices in Figures 4.2A–D are only a subset of the
space of all possible 2D lattices and the power law relationship may not generalize beyond these
cases. In a set of elegant studies, Kubovy and Wagemans (1995), and Kubovy et al. (1998) first
generated a set of stimuli that spanned a large space of dot lattices by varying two basic features:
(1)  The lengths of their shortest inter-dot distances (vectors a and b, Figure 4.2C,D).
(2)  The angle between these vectors, γ.
They then briefly presented these stimuli to participants and asked them to choose which of four
orientations matched that of the lattice. They found that, across the entire range of lattices in all
orientations, grouping depended only on the relative distance between dots in the various pos-
sible orientations, a relationship that they called the pure distance law. Although the space of all
lattices could be categorized into six different classes depending on their symmetry properties,
this global configuration aspect did not affect the grouping in these lattices, leaving distance as
the only factor that affects proximity grouping. More recently though, it has been found that other
factors, such as curvilinear structure, can also play a role in grouping by proximity (Strother and
Kubovy, 2006).
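A toy rendering of the pure distance law: each candidate vector of the lattice receives an 'attraction' that depends only on its length relative to the shortest vector, and choice probabilities are the normalized attractions. The exponential-decay form and the rate parameter s below are illustrative assumptions, not a commitment to the exact function fitted by Kubovy and colleagues.

```python
import numpy as np

def choice_probabilities(lengths, s=4.0):
    """Probability of perceiving each candidate organization.

    lengths : lengths of the candidate inter-dot vectors (a-d).
    Each vector v gets an attraction that decays with its length
    relative to the shortest vector, here exp(-s * (|v|/|v_min| - 1)),
    and the probabilities are the normalized attractions.  Only
    relative distance enters, as the pure distance law requires;
    the decay rate s is an illustrative, observer-dependent parameter.
    """
    L = np.asarray(lengths, dtype=float)
    attraction = np.exp(-s * (L / L.min() - 1.0))
    return attraction / attraction.sum()
```

Note that scaling all lengths by a common factor leaves the probabilities unchanged, which is the signature of a pure distance law.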

Common fate
Wertheimer appreciated the influence of dynamic properties on grouping when he proposed
the well-known principle of common fate (Figure 4.1G). The common fate principle (which
Wertheimer also called ‘uniform destiny’) is the tendency of items that move together to be
grouped. Common fate is usually described with grouped elements having exactly parallel motion
vectors of equal magnitude as in Figure 4.1G. However, other correlated patterns of motion,
such as dots converging on a common point and co-circular motion can also cause grouping
(Ahlström, 1995; Börjesson and Ahlström, 1993). Some of these alternative versions of common
motion are seen as rigid transformations in three-dimensional (3D) space. Although common
fate grouping is often considered to be very strong, to my knowledge, there are no quantitative
comparisons of its strength with other grouping principles. Recently, it has been proposed that
common fate grouping may be explained mechanistically as attentional selection of a direction of
motion (Levinthal and Franconeri, 2011).
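Classic common fate, in its strict form, can be caricatured in a few lines: elements are grouped exactly when their motion vectors agree. The function and its rounding tolerance are illustrative sketches of my own; the correlated-but-unequal motions mentioned above (convergence, co-circular motion) would need a richer similarity measure.

```python
def common_fate_groups(velocities, decimals=3):
    """Partition element indices into groups sharing a motion vector.

    velocities : list of (vx, vy) motion vectors, one per element.
    Elements whose velocity vectors agree after rounding to
    `decimals` places are grouped together, mimicking strict common
    fate with exactly parallel motions of equal magnitude.
    """
    groups = {}
    for i, (vx, vy) in enumerate(velocities):
        key = (round(vx, decimals), round(vy, decimals))
        groups.setdefault(key, []).append(i)
    return list(groups.values())
```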

Similarity grouping
When two elements in the visual field share common properties, there is a chance that these two
elements are parts of the same object or otherwise belong together. This notion forms the basis
for the Gestalt grouping principle of similarity. One version of similarity grouping, and the one
that Wertheimer originally described, involves varying the colors of the elements (Figure 4.1C).
Items that have similar colors appear to group together. However, other features can also be varied
such as the shape (Figure 4.1D), size (Figure 4.1E), or orientation (Figure 4.1F) of the elements.
Although these variations on the principle of similarity are sometimes demonstrated separately
from one another (e.g., Palmer, 1999), Wertheimer appeared to favor the notion of a general prin-
ciple of similarity when he described it as ‘the tendency of like parts to band together.’ Thus, the
list of features given above is not meant to be an exhaustive set of features on which similarity
grouping can occur. Instead, there may be as many variations of the similarity principle as there
are features to be varied (e.g., texture, specularity, blur). However, many of these variations of
similarity grouping have not been studied systematically, if at all. Furthermore, the generality of
the similarity principle may also encompass other known principles as variations of similarity. For

instance, the principle of proximity may be thought of as similarity of position and classic com-
mon fate as similarity of the direction of movement. However, despite the ability to unify these
principles logically, the extent to which they share underlying mechanisms is unclear.

Symmetry
The world does not solely comprise dots aligned in rows or columns. Instead, elements take many
forms and can be arranged in patterns with varying forms of regularity. Mirror symmetry is a par-
ticular type of regularity that is present in a pattern when half of the pattern is the mirror image of
the other half. Such symmetrical patterns have been found to be particularly visually salient. For
instance, symmetry has clear effects on detection of patterns in random dot fields, contours, and
other stimuli (e.g., Machilsen et al., 2009; Norcia et al., 2002; Wagemans, 1995). However, when a
symmetrical pattern is tilted relative to the frontal plane, its features in the image projected to the
retinae are no longer symmetrical. Nonetheless, the detection advantage seems to be robust even
in these cases of skewed symmetry although it is clearest if symmetry is present in several axes
(e.g., Wagemans, 1993; Wagemans et al., 1991). However, not all symmetries are equal. A substan-
tial number of studies have found that symmetry along a vertical axis is more advantageous than
symmetry along other axes (e.g., Kahn and Foster, 1986; Palmer and Hemenway, 1978; Royer,
Symmetry along the horizontal axis, in turn, has been found to be stronger than symmetry along oblique axes (e.g., Fisher and Bornstein, 1982). Symmetry detection is also robust
to small deviations in the corresponding positions of elements in the two halves of the symmetric
pattern (Barlow and Reeves, 1979). The study of symmetry, its effects on detection and factors that
modulate it has been extensive and this is discussed in more detail elsewhere in this volume (van
der Helm, ‘Symmetry Perception’ chapter, this volume). It is important to point out that many
studies of symmetry (including those mentioned above) do not measure perceived grouping
directly, as was often the case for many of the other principles described above. Symmetry group-
ing has tended to be measured by its effect on pattern detection or ability to find a pattern in noise.
The extent to which performance in these tasks reflects perceived grouping, per se, rather than
other task-related changes due to symmetry is unclear. Nonetheless, phenomenological demon-
strations of symmetry grouping are often presented as evidence of the effect (e.g., Figure 4.1H).
One rationale for symmetry grouping and detection mechanisms is that they are designed to
highlight non-accidental properties that are unlikely to have been caused by chance alignment of
independent elements. Alternatively, symmetry may allow particularly efficient mental or neural
representations of patterns (van der Helm, ‘Simplicity in Perceptual Organization’ chapter, this
volume). Symmetry also appears to be a common feature of the visual environment. Many organisms and artefacts are symmetrical (Shubnikov and Koptsik, 1974; Weyl, 1952). However, it is
not clear whether this is a cause of visual sensitivity to symmetry, an effect of it, or whether both
of these are caused by some other adaptive benefit of symmetry.
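As a toy illustration of jitter-tolerant symmetry detection, in the spirit of the Barlow and Reeves result, one can score a dot pattern by reflecting each dot about a candidate axis and asking whether a near neighbour exists in the original pattern. This sketch is mine, not a model from the literature; the axis position and tolerance are assumptions.

```python
import numpy as np

def mirror_symmetry_score(points, tol=0.05):
    """Toy measure of vertical-axis mirror symmetry in a dot pattern.

    points : (N, 2) array of (x, y) positions; the axis is assumed
    to lie at x = 0.  Each dot is reflected about the axis and
    matched to its nearest neighbour in the original pattern; the
    score is the fraction of dots whose match lies within `tol`,
    so small positional jitter between the two halves is tolerated.
    """
    pts = np.asarray(points, dtype=float)
    mirrored = pts * np.array([-1.0, 1.0])   # reflect x about x = 0
    # Distance from every mirrored dot to every original dot.
    d = np.linalg.norm(mirrored[:, None, :] - pts[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= tol))
```

A perfectly symmetric pattern scores 1.0, a pattern with no mirror partners scores 0.0, and a pattern with slightly jittered partners still scores highly as long as the jitter stays within the tolerance.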

Good continuation, relatability, closure, and parallelism


The principle of good continuation is often demonstrated by showing that some line segments
form a ‘better’ continuation of a particular curve. For instance, the line segments in Figure 4.1J
are likely to be seen as two, continuous intersecting curves, one going from upper left to lower
right (segments a + c) and the other from lower left to upper right (segments b + d). Of course,
one could see a + b and d + c or even a + d and b + c, but these are seen as less good continua-
tions and thus less likely to be perceived. What defines a good continuation? Wertheimer (1923)
suggested that good continuations of a segment proceed in a direction that 'carries on the principle logically demanded' from the original element, i.e. a 'factor of direction',2 as he actually called it.
In Figure 4.1J this seems to correspond roughly to collinearity, or minimal change in direction,
because at their junction ac and bd are more collinear than the alternative arrangements. However,
other examples that he used (Figure 4.3B) suggest that this may not be exactly what he meant.
Wertheimer’s definition was not specific, and largely based on intuition and a few demonstrations.
In modern work, good continuation has been largely linked with work on contour integration
and visual interpolation. Contour integration studies largely examine what factors promote group-
ing of separate (not connected) oriented elements (Figure 4.3C) into contours, which are detectable
in a field of otherwise randomly orientated elements. Collinearity, co-circularity, smoothness, and
a few other features play prominent roles in models of good continuation effects on contour inte-
gration (e.g., Fantoni and Gerbino, 2003; Field et al., 1993; Geisler et al., 2001; Hess, May, and Dumoulin, this volume; Pizlo et al., 1997; Yen and Finkel, 1998). Although these definitions of good continuation are clearly specified, the stimuli
and tasks used are very different from those of Wertheimer and may have different mechanisms.
Good continuation is also often invoked in models of interpolation that determine the likelihood
of filling in a contour between two segments on either side of an occluder (e.g., Wouterlood and
Boselie, 1992). One criterion for interpolation is whether two contours are relatable (Kellman and
Shipley, 1991), i.e. whether a smooth monotonic curve could connect them (roughly speaking).
Relatability is another possible formal definition of good continuation, although the two may be related but distinct concepts (Kellman et al., 2010). This is an issue that needs further study. Completion
and its mechanisms are discussed at length elsewhere in this volume (Singh, this volume; van Lier and Gerbino, this volume).
Wertheimer also recognized the role for closure in grouping of contours. This is demonstrated
in the bow-tie shape in Figure 4.1K, which overcomes the grouping by good continuation that was
stronger in Figure 4.1J. Several contour integration studies have also examined the role of closure
in perceptual grouping of contour elements. Many find effects of closure on grouping and contour
detection (e.g., Mathes and Fahle, 2007), although these may be explainable by other mechanisms
(Tversky et al., 2004). Contours can also be grouped by parallelism (Figure 4.1I). However, this
effect does not appear to be particularly strong and contour symmetry seems to be better detected
(e.g., Baylis and Driver, 1994; Corballis and Roldan, 1974).

Ceteris paribus rules


The classic grouping principles described above have stood the test of time and have formed the
basis for a substantial amount of modern research on perceptual grouping. Even from the first
demonstrations by Wertheimer though, it was clear that the principles are not absolute. Rather,
they operate as ceteris paribus rules. This Latin phrase is translated literally as ‘other things being
equal.’ Thus, as long as other factors are equated between two elements, then the factor in question
will affect grouping between the elements. By creating simple displays, which varied one factor
at a time, the Gestalt psychologists were able to provide convincing evidence for their principles.
In any given display though, multiple factors can be present at once and in this case, factors may
reinforce one another or compete against one another. For example, proximity of elements in the
array in Figure 4.4A may favor grouping to form rows. This organization is also supported by
the similarity of the colors. However, Figure 4.4B shows an example of how color similarity and

2  Wertheimer also used the term ‘factor of good curve’ in this section of his manuscript to describe an effect
that seems to be similar to his use of ‘factor of direction’ and the modern use of good continuation. However,
Wertheimer did not explicitly describe any differences between the nature of these two factors.
(a)
b
a

(b)

a c

(c)

Fig. 4.3  (a) Good continuation favors a grouping of ac with b as an appendage. This may be due
to segment c being collinear or continuing the same direction as a. (b) Good continuation may
not always favor the smallest change in direction. Segment c seems to be a better completion of a
than b despite b being tangent to the curve (and thus having minimum difference in direction) at
their point of intersection. (c) A stimulus commonly used in contour integration experiments with a
circular target contour created by good continuation and closure in the alignment of the elements.

(a) (b) (c)

Fig. 4.4  When multiple grouping principles are present in the same display, they may reinforce one
another or compete against one another. (a) When both proximity and color similarity (indicated by
filled versus unfilled dots here) favor organization into rows, they reinforce each other and result in a
clear perception of rows. (b) When proximity grouping favors a rows organization and color similarity
favors columns, the factors compete against one another and this can result in perceptual ambiguity.
(c) With near maximal proximity of elements favoring rows, this factor can overcome the competition
with color similarity and result in a perception of rows.

proximity may work in opposition to one another. In this case, the grouping becomes somewhat
ambiguous. Ultimately, the resulting organization depends on the relative strengths of the two
grouping factors. With proximity at nearly maximum, it gains the upper hand and can overcome
the competing influence of color similarity (Figure 4.4C). Pitting grouping principles against
one another has served as one way to measure the relative strength of grouping principles (e.g.,
Hochberg and Silverstein, 1956; Oyama et al., 1999; Quinlan and Wilton, 1998). However, some
grouping principles may operate faster than others and this may affect their relative effectiveness
against one another in addition to the relative degree to which each principle is present in the
display (Ben-Av and Sagi, 1995).
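The structure of such competition displays can be sketched computationally. The following Python snippet is a minimal, illustrative generator of Figure 4.4-style dot arrays (the function name, spacings, and color coding are our assumptions, not taken from the studies cited):

```python
import numpy as np

def dot_array(rows=4, cols=6, dx=1.0, dy=2.0, colors_by="row"):
    """Generate a Figure 4.4-style dot array (illustrative parameters).

    With dx < dy, within-row neighbors are closer, so proximity favors
    a rows organization.  Colors alternating by row reinforce that
    organization; colors alternating by column compete with it.
    """
    xs, ys = np.meshgrid(np.arange(cols) * dx, np.arange(rows) * dy)
    if colors_by == "row":
        color = np.repeat(np.arange(rows) % 2, cols).reshape(rows, cols)
    else:  # alternate fill color within each row, i.e. by column
        color = np.tile(np.arange(cols) % 2, (rows, 1))
    return xs, ys, color

# colors_by="row" gives the reinforcing case (Figure 4.4A);
# colors_by="col" gives the competing case (Figure 4.4B).
```

Varying `dx` relative to `dy` then manipulates the strength of proximity against the fixed color-similarity organization, as in the near-maximal proximity case of Figure 4.4C.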

Recent principles of perceptual grouping


The classic Gestalt grouping principles dominated the stage for most of the 20th century. However,
within the last 20–30 years, modern vision scientists have begun to articulate new principles of
grouping. Some of these are variations or generalizations of Gestalt principles, but others are com-
pletely new. Several of these involve dynamic properties of stimuli, which are much easier to appre-
ciate given modern computerized methods for generating visual content. Although many of the
new principles can be appreciated by demonstrations, modern vision scientists typically quantify
their data using measures of phenomenological psychophysics (Strother et al., 2002), which quan-
tify the reported perceptual outcomes, as well as indirect measures that reflect effects of grouping
on task performance. For some principles, this has led to a robust understanding of the conditions
under which they occur and factors that affect their functioning. The sections below attempt to
describe most of these recent grouping principles and what we know about their function.

Common region
The principle of common region (Figure 4.5B) recognizes the tendency for elements that lie within
the same bounded region to be grouped together (Palmer, 1992). Elements grouped by common
region lie within a single, continuous, and homogeneously colored or textured region of space or
within the confines of a bounding contour. The ecological rationale for this grouping principle
Traditional and New Principles of Perceptual Grouping 65


Fig. 4.5  Grouping by common region. (a) A set of ungrouped dots. (b) Dots grouped by common
region as indicated by an outline contour. Common region can also be indicated by regions of
common color, texture or other properties. (c) Common region can compete effectively against
grouping by color similarity, as well as against (d) grouping by proximity. (e) In the repetition
discrimination task, the repetition of two shapes in the element array—two circles here—can occur
within the same object or (f) between two different objects (repeated squares in this case).

is clear. If two elements, eyes for instance, are contained within the same image region, such as a head, then
they are likely to belong together as part of that object, rather than accidentally appearing together
within the same region of space. The effects of common region can compete effectively against
other grouping principles such as color similarity (Figure 4.5C) and proximity (Figure 4.5D).
Palmer (1992) also found evidence that the common region principle operates on a 3D represen-
tation of the world. When he placed elements within overlapping regions, there was no basis for
grouping to go one way or the other. However, if the dot elements were placed in the same depth
plane as some of the oval regions (using stereoscopic displays), then the dots tended to be grouped
according to the regions within their same depth plane. These results suggest that grouping by
common region can operate on information that results from computations of depth in images
and thus may not be simply an early, low-level visual process. It is also worth noting that unlike
all of the classic Gestalt principles that are defined around the relative properties of the elements
themselves, grouping by common region depends on a feature of another element (i.e. the bound-
ing edge or enclosing region) separate from the grouped elements themselves. Although common
region can be appreciated through demonstrations like those in Figure 4.5, indirect methods have
provided corroborative evidence for this grouping factor and others. For instance, in the Repetition

Fig. 4.6  Generalized common fate was demonstrated using displays comprising (a) a grid of square
elements, each initially assigned a random luminance that then oscillated over time. (b) For
a subset of these elements, the target (outlined in black here), the luminances oscillated out of
phase with those of the remaining elements. This means that, although the elements within the target had
varying luminances (similar to non-target luminances), they were distinguished by their common
direction of change.

Discrimination Task, abbreviated RDT (Palmer and Beck, 2007), participants see a row of elements
that alternates between circles and squares. One of the elements, either the circle or the square,
repeats at one point, and the participant's task is to report which shape it is. Participants are faster
at this when the repeat occurs within the same group (Figure 4.5E) than when it appears between
two different groups (Figure 4.5F). Because performance on this task is modulated by grouping,
it can be used to quantify grouping effects indirectly and corroborate findings in direct subjective
report tasks. Although such indirect measures may be less susceptible to demand characteristics,
it is important to point out that there is no guarantee that they reflect purely what people actually
see. Indirect measures may also reflect a history of the processing through which a stimulus has
gone even if that history is not reflected in the final percept. Such effects have been demonstrated in
experiments on figure-ground organization in which two cues are competing against one another
to determine which side of an edge is figural. Even though one particular cue always wins the
competition and causes figure to be assigned to its side, the presence of a competing cue suggesting
figural assignment to the other side affects response time in both direct report and other tasks
such as same-different matching (e.g., Brooks and Palmer, 2010; Peterson and Enns, 2005). Even
clearer cases of the dissociation between implicit measures and conscious perception have been
seen in neurological patients. For instance, patients with blindsight can act toward an object even
though they cannot consciously see it (e.g., Goodale et al., 1991).

Generalized common fate


The classic principle of common fate is typically described as the grouping that results from ele-
ments moving with a similar speed and direction. Although Wertheimer described common fate
with reference to motion, it is not clear that he intended the definition to be limited to com-
mon motion. In a section of text that was not included in the well-known English translation
of his work (Wertheimer, 1938), Wertheimer wrote that the common fate principle ‘applies to a
wide range of conditions; how wide, is not discussed here’ (Wertheimer, 2012). Recently, Sekuler
and Bennett (2001) have demonstrated that grouping can also be mediated by common direc-
tion of luminance changes. They presented participants with square grids (Figure 4.6A) in which
the luminance of each square element was initialized at a random value and then modulated
sinusoidally over time around its initial luminance. A subset of the elements (outlined in black,
Figure 4.6B) was designated as the target and modulated out of phase with the rest of the elements.
Participants had to determine the orientation (horizontal or vertical) of this target. To the extent
that elements within the target group together (and segment from the other elements) based on
their common luminance changes, discrimination of the target orientation should be easier. The
results demonstrated a strong effect of generalized common fate by common luminance changes.
Importantly, the authors made significant efforts to control for the effects of static luminance
cue differences between the target and non-target areas of the image to ensure that this is a truly
dynamic cue to grouping. Although this grouping cue has been linked with classic common fate
by name, it is not clear whether it is mediated by related mechanisms.
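The structure of such a stimulus can be sketched as follows. This is a simplified illustration of the kind of display Sekuler and Bennett used, not their actual code; all parameter names and values (grid size, amplitude, frequency) are assumptions:

```python
import numpy as np

def common_fate_frames(grid=8, n_frames=60, target_rows=slice(3, 5),
                       amp=0.1, freq_hz=1.0, fps=30.0, seed=0):
    """Luminance frames in the spirit of Sekuler and Bennett (2001).

    Every element starts at a random base luminance and oscillates
    sinusoidally around it; elements in the target rows oscillate 180
    degrees out of phase with the rest.  All values are illustrative.
    """
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.3, 0.7, size=(grid, grid))  # random initial luminances
    phase = np.zeros((grid, grid))
    phase[target_rows, :] = np.pi                    # target changes out of phase
    t = np.arange(n_frames) / fps
    frames = base + amp * np.sin(2 * np.pi * freq_hz * t[:, None, None] + phase)
    return np.clip(frames, 0.0, 1.0)

# In any single frame, target and background luminances are drawn from the
# same distribution; only the common direction of change over time
# distinguishes the target.
```

The key design property is visible in the code: static luminance cues carry no information about the target, so any grouping must rest on the shared direction of luminance change.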

Synchrony
The common fate principles discussed above capture how commonalities in the direction of
motion or luminance can cause grouping. However, elements which have unrelated directions
of change can group on the basis of their temporal simultaneity alone (Alais et al., 1998; Lee and
Blake, 1999). For instance, consider a matrix of small dots that change color stochastically over
time. If a subset of the elements change in synchrony with one another, regardless of their different
changes of direction, these elements group together to form a detectable shape within the matrix.
Lee and Blake (1999) claimed that in their displays, synchrony grouping cannot be computed
on the basis of static information in each frame of the dynamic sequence. This is because, for
instance, in the color-change example described above, the element colors in each frame are
identically and randomly distributed within both the grouped region and the background. It is only the
temporal synchrony of the changes that distinguishes the grouped elements from the background.
This is in contrast to previous evidence of synchrony grouping which could be computed on the
basis of static image differences at any single moment in time (e.g., Leonards et al., 1996; Usher
and Donnelly, 1998). Lee and Blake argued that purely temporal synchrony requires computing
high order statistics of images across time and is a new form of grouping that cannot be explained
by known visual mechanisms. However, this claim has proved controversial (Farid, 2002; Farid
and Adelson, 2001) and some have argued that temporal structure plays a more important role
than temporal synchrony (Guttman et al., 2007). The rationale for the existence of grouping by
pure synchrony is also controversial. Although it seems reasonable that synchronous changes in
elements of the same object are common in the visual world, it seems unlikely that these are com-
pletely uncorrelated with other aspects of the change (as is required for pure synchrony grouping),
although this appears not to have been formally tested.
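A schematic version of a pure-synchrony display can be sketched as follows. This is a two-state simplification for illustration only; Lee and Blake's actual displays were more carefully controlled, and every parameter here is an assumption:

```python
import numpy as np

def synchrony_sequence(n=12, n_frames=120, p=0.25,
                       target=(slice(4, 8), slice(4, 8)), seed=1):
    """Two-state sketch of a pure-synchrony display (cf. Lee and Blake, 1999).

    Background elements flip between two colors independently on random
    frames; target elements flip on shared, randomly chosen frames.  No
    single frame distinguishes target from background -- only the
    temporal co-occurrence of the changes does.
    """
    rng = np.random.default_rng(seed)
    state = rng.integers(0, 2, size=(n, n)).astype(float)
    target_mask = np.zeros((n, n), dtype=bool)
    target_mask[target] = True
    frames = [state.copy()]
    for _ in range(n_frames - 1):
        flips = rng.random((n, n)) < p           # independent background flips
        flips[target_mask] = rng.random() < p    # target flips together or not at all
        state = np.where(flips, 1 - state, state)
        frames.append(state.copy())
    return np.stack(frames), target_mask
```

Because the initial states are random and all target elements flip on the same frames, each static frame looks like undifferentiated noise; the target region is defined only across time.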

Element connectedness
Distinct elements that are connected by a third element (Figure 4.7B) tend to be seen as part of
the same group (Palmer and Rock, 1994). This effect can compete effectively against some of the
classic grouping principles of proximity and similarity (Figure 4.7C,D), and it does not require
the connecting element to have the same properties as the elements themselves or to form a
continuous unbroken region of homogeneous color or texture (Figure 4.7E). The ecological rationale
for element connectedness is simple. Many real-world objects comprise several parts that have
their own color, texture, and other properties. Nonetheless, the elements of these objects are often
directly connected to one another. The phenomenological demonstration of grouping by element
connectedness has also been corroborated by evidence from the RDT (Palmer and Beck, 2007)
that was used to provide indirect evidence for the common region principle. The powerful effects
of this grouping principle are also evident by how it affects perception of objects by neurological

Fig. 4.7  Grouping by element connectedness. (a) Ungrouped elements. (b) Connecting elements into
pairs unites them into four groups. (c) Element connectedness competes effectively against the classic
principle of proximity. (d) Element connectedness competes effectively against the classic principle
of similarity. (e) Element connectedness does not require the connecting element to have the same
properties or to form a continuous area of the same color or texture.

patients. Patients with Balint’s syndrome suffer from the symptom of simultanagnosia, i.e. they are
unable to perceive more than one object at a time (see Gillebert & Humphreys, this volume). For
instance, when presented with two circles on a computer screen, they are likely to report seeing
only one circle. However, when these two circles are connected by another element to form a bar-
bell shape, the patient can suddenly perceive both of the objects (Humphreys and Riddoch, 1993).
Similar effects of element connectedness have been shown to modulate hemi-spatial neglect
(Tipper and Behrmann, 1996).

Non-accidentalness and regularity


According to the pure distance law of proximity grouping, the relative distance between ele-
ments in two competing organizations is the only driver of grouping strength. This was found
to be the case in rectilinear dot lattices (Kubovy and Wagemans, 1995). However, when different
dot structures were investigated, it became clear that curvilinear grouping patterns (e.g., Figure
4.8A) could be stronger than rectilinear groupings (Strother and Kubovy, 2006) even with dis-
tance between elements held constant. This suggests that proximity alone is not the only factor to
govern grouping in these patterns. Strother and Kubovy (2012) have suggested that this effect is
due to curvilinear arrangements of elements being particularly non-accidental. That is, they claim
that repeated alignment of elements along parallel curves is very unlikely to have occurred by
the chance alignment of independent elements. Therefore, it is more likely that the elements are
somehow related to one another and thus should be seen as grouped rather than independent ele-
ments. In support of this, Strother and Kubovy found evidence that when two curvilinear group-
ing patterns were competing against one another (e.g., Figure 4.8A), the pattern with the stronger

Fig. 4.8  (a) A dot-sampled structured grid with two competing patterns of curvilinear structure.
(b) Curvilinear structure along this dimension in panel A has less curvature and is, therefore, less
likely to be perceived in comparison to structure along the direction shown in (c), which has a
stronger curve and is most likely to be perceived as the direction of curvilinear grouping.

curve was more likely to be perceived than the less curved competitor. For instance, the dot stimu-
lus in Figure 4.8A could be organized along the more shallow curve represented by Figure 4.8B
or along the stronger curve represented by Figure 4.8C. Greater curvature caused grouping even
if the distances between dots along the two curves were equal, ruling out an explanation in terms
of proximity. Parallel curvature is one example of non-accidentalness that could be quantified
and then systematically varied on the basis of previous work (Feldman, 2001). Other types of
feature arrangements can also have this property, but a challenge is to quantify and systematically
vary non-accidentalness more generally. One possible example of this principle is the tendency
to perceive grouping along regular variations in lightness (van den Berg et al., 2011). However, it
remains unclear whether these two aspects of grouping are mediated by similar mechanisms or
fundamentally different ones.

Edge-region grouping
Grouping has traditionally involved elements such as dots or lines grouping with other elements
of the same kind. However, Palmer and Brooks (2008) have proposed that regions of space and their
edges can serve as substrates for grouping processes as well, and that this can be a powerful deter-
minant of figure-ground organization. For example, common fate edge-region grouping can be
demonstrated in a simple bipartite figure (Figure 4.9A). This stimulus has two sparsely textured
(i.e. the dots) regions of different colors that share the contrast boundary between them. If, for
instance, the edge moves in one direction in common fate with the texture of one of the regions
but not in common with the other region (Figure 4.9B; animation in Supplemental Figure 4.S1),
then participants will tend to see the region that is in common fate with the edge as figural. It is
not necessary for the edge and grouped region to be moving. In fact, if one of the textured regions
is moving, whereas the edge and the second region are both static, the edge will group with the
static region and become figural (Figure 4.9C; Figure 4.S2). Palmer and Brooks demonstrated that
proximity, orientation similarity, blur similarity (Figure 4.9D,E), synchrony, and color similarity
can all give rise to edge-region grouping, albeit with a range of strengths. Importantly, they also
showed that the strength of the induced figure-ground effect correlated strongly with the strength
of grouping (between the edge and the region) reported by the participants in a separate group-
ing task. This suggests a tight coupling between grouping processes and figure-ground processes.
However, it is not clear that the grouping mechanisms that mediate edge-region grouping are the
same as those that mediate other types of grouping. Nonetheless, edge-region grouping challenges
the claim that grouping can only occur after figure-ground organization (Palmer and Rock, 1994).

Fig. 4.9  Edge-region grouping occurs between edges and regions. (a) A bipartite display commonly
used in figure-ground paradigms contains two adjacent regions of different color (black and white
here) with a contrast edge between them. The regions here are textured with sparse dots. This can
be seen as either a black object with an edge of sharp spikes in front of a white object or as a white
object with soft, rounded bumps in front of a black object. (b) If the texture dots within one region
(right region here) move in common fate with the edge (edge motion indicated by arrow below
the central vertical edge) then that region will tend to group with the edge and be seen as figural.
The non-grouped region (left here) will be seen as background. (c) A region does not need to be
moving in order to be grouped. It (right region here; lack of movement indicated by ‘X’) can be in
static common fate with an edge if its texture and the edge are both static while the other region
(left region here) is in motion. The region which shares its motion properties with the edge (right
here) becomes figural. (d) Edge-region grouping based on blur similarity between the blurry edge
and a blurry textured region can cause figural assignment to the left in this case. (e) When the blur
of the edge is reduced to match the blur level of the texture elements in the right region then the
edge-region grouping causes assignment to the right.

Induced grouping
The elements in Figure 4.10A have no basis for grouping amongst themselves. However, when these
elements are placed near to other elements which have their own grouping relationships by prox-
imity (Figure 4.10B), color similarity (Figure 4.10C), or element connectedness (Figure 4.10D),
these other groups can cause induced grouping in the otherwise ungrouped elements (Vickery,
2008). For instance, element connectedness in the lower row of Figure 4.10D seems to group
the elements of the upper row into pairs. This impression can be seen phenomenologically, but
it is difficult to determine whether it occurs automatically or because the observer is intention-
ally looking for it (and thus induced by attention). To solve this problem, Vickery (2008) used
the RDT (see Common Region section above) to indirectly measure the effects of grouping and
avoid demand characteristics. The results demonstrated clearly that grouping can be induced by
similarity, proximity, and common fate. Based on demonstrations, other grouping principles also
seem to effectively induce grouping in surrounding elements as well. Induced grouping depends
critically on the relationship between the inducing elements (lower rows in Figures 4.10B–D) and
the elements in which grouping is being induced (top rows in Figures 4.10B–D). For instance, it
can be disrupted by using common region to put the inducing set into a separate region of space
(Figure 4.10E).


Fig. 4.10  Examples of induced grouping. (a) A set of elements with no adjacent elements to induce
grouping. (b) Placing elements grouped by proximity below ungrouped elements can induce
grouping within the otherwise ungrouped upper row. (c) Induced grouping by color similarity.
(d) Induced grouping by element connectedness. (e) Induced grouping can be disrupted by
segmenting the inducers into a separate group as done here by common region grouping.

Uniform connectedness
Grouping principles operate on elements such as lines, dots, regions, and edges. How do these
elements come about in the first place? One hypothesis has been that these elements are gener-
ated by another, early grouping process, which partitions an image to form the substrates for
the further grouping processes that have been described above (Koffka, 1935; Palmer and Rock,
1994). The principle of uniform connectedness (UC) has been proposed to fulfill this role. UC
decomposes an image into continuous regions of uniform image properties, e.g., texture, color,
motion, and depth (e.g., Figure 4.11A–F). This process is very similar to some computer vision
algorithms that have been developed to segment images based on uniform regions of texture
and other properties (e.g., Malik and Perona, 1990; Shi and Malik, 2000). The elements created
by uniform connectedness were proposed to be entry-level units because they were thought of as
the starting point for all subsequent grouping and parsing processes. However, this proposal has
been controversial. Peterson (1994) has argued that the serial ordering of perceptual organiza-
tion suggested by uniform connectedness is not consistent with modern evidence for how these
processes operate. Others have found evidence that other principles such as collinearity and
closure are as important as uniform connectedness for the initial stages of perceptual organiza-
tion (Kimchi, 2000) and that, under some conditions, proximity may operate faster than uniform
connectedness (Han et al., 1999; Han and Humphreys, 2003). Although its place in the hierarchy
of grouping principles is debated, the basic effect of uniform connectedness as a grouping prin-
ciple seems to be clear.
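The partitioning that uniform connectedness describes corresponds closely to connected-component labeling in computer vision. The sketch below is a generic labeling routine, not Palmer and Rock's own algorithm; exact value match and 4-connectivity are simplifying assumptions:

```python
from collections import deque
import numpy as np

def uc_regions(image):
    """Label maximal 4-connected regions of identical pixel value.

    A minimal sketch of the partitioning that uniform connectedness
    describes (cf. Palmer and Rock, 1994), implemented as ordinary
    connected-component labeling.
    """
    labels = np.full(image.shape, -1, dtype=int)
    current = 0
    for start in np.ndindex(image.shape):
        if labels[start] != -1:
            continue                       # already assigned to a region
        labels[start] = current
        queue = deque([start])
        while queue:                       # breadth-first flood fill
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                        and labels[nr, nc] == -1
                        and image[nr, nc] == image[r, c]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
        current += 1
    return labels
```

Run on an image like Figure 4.11C, two same-color circles joined by a same-color bar collapse into a single labeled region, whereas a bar of a different value (Figure 4.11E) leaves three separate regions.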

Grouping in dynamic patterns


Apparent motion arises from displays that are presented in rapid succession with their elements in
different spatial locations from one frame to the next (Wertheimer, 1912). With a single element
the direction of this perceived motion is usually clear. However, when two elements with similar
features are present in the display, the direction of motion can become ambiguous (Figure 4.S3).
For instance, if the patterns in Figure 4.12A,B are alternated, one could perceive the dots mov-
ing either horizontally left and right (Figure 4.12C) or vertically up and down (Figure 4.12D).
This ambiguity highlights the correspondence problem, i.e. how do we know which element in the
second frame corresponds to, for instance, the upper left element in the first frame? Notice that
this sounds like a grouping problem but operating over time rather than space. Early on, it was
clear that varying both the spatial distances between elements and their durations could affect
how motion is perceived (e.g., Bruno & Bertamini, this volume; Burt & Sperling, 1981; Herzog &
Öğmen, this volume; Hock, this volume; Korte, 1915). For instance, shortening the horizontal
distance between the elements in successive frames biases perception toward horizontal motion
(Figure 4.S4). However, spatial groupings within each frame may also have an impact. One way
to study this systematically has been to use the dot lattice stimuli that have been previously used
to study grouping by proximity. Gepshtein and Kubovy (2000) constructed displays with two
lattices, Latticet=1 and Latticet=2, which alternated over time (Figure 4.12E). They found that the
perceived direction of apparent motion within these displays depended primarily on two ratios.
First, the motion ratio, rm = m1/m2, considers the distances from an element in Latticet=1 to its two
closest neighbors in Latticet=2. Similarly to the attraction function for proximity grouping (see
section on proximity grouping), there is a negative linear relationship between the motion ratio
and the probability of perceiving motion along m1. That is, as m1 distance increases relative to m2
the likelihood of seeing motion along m1 decreases. In the case of motion lattices, this pattern has
been called an affinity function. The second ratio, rb = b/m2, captures the spatial grouping factors

Fig. 4.11  Examples of uniform connectedness. (a) Each black circle defines its own unique uniformly
connected (UC) region and the grey background forms another UC region based on color.
(b) Regions of uniform texture also form UC regions. (c) When two circles are joined by a bar of the
same color or (d) texture, then those two circles join together with the connecting bar to form a single
UC region. (e) A bar of a different color or (f) texture from the circles leads to the circles remaining
separate UC regions, with the bar forming yet another UC region.
Adapted from Palmer, Stephen E., Vision Science: Photons to Phenomenology, figures 6.2.1, © 1999
Massachusetts Institute of Technology, by permission of The MIT Press.

because it takes into consideration the relative distance between elements within each single
frame. If the distance b is large (relative to the motion grouping directions) then spatial grouping
by proximity (along the dashed line in Figure 4.12E) is weak and motion grouping can dominate
and cause motion along either direction m1 or m2. However, when b is relatively small, then spatial
grouping by proximity is strong in each frame and it can affect perception of motion. Specifically,
it can cause motion along a direction orthogonal to the grouped line of dots (i.e. orthogonal to
the dashed line, Figure 4.12E), a totally different direction than either m1 or m2. By manipulating
both spatial and motion/temporal grouping parametrically within these displays, Gepshtein and
Kubovy (2000) found clear evidence that these two factors interact rather than operating sepa-
rately and in sequence as had been previously suggested.
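The two ratios can be computed directly from the lattice displacement vectors. A minimal sketch, with vector values chosen for illustration rather than taken from Gepshtein and Kubovy's displays:

```python
import numpy as np

def lattice_ratios(m1_vec, m2_vec, b_vec):
    """Compute the motion ratio r_m = |m1|/|m2| and the spatial ratio
    r_b = |b|/|m2| for a motion lattice (Gepshtein and Kubovy, 2000).
    The vectors are 2-D displacements; the values below are illustrative."""
    m1, m2, b = (np.linalg.norm(v) for v in (m1_vec, m2_vec, b_vec))
    return m1 / m2, b / m2

# A small r_m biases perception toward motion along m1; a large r_b weakens
# within-frame proximity grouping, letting motion grouping dominate.
r_m, r_b = lattice_ratios((1.0, 0.2), (0.0, 1.5), (3.0, 0.0))
```

With these example vectors, r_m is below 1 (motion along m1 favored) and r_b is large (weak within-frame proximity grouping), the regime in which motion grouping should dominate.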
The nature of the interaction between spatial and temporal factors in apparent motion has
been controversial with some results supporting the notion of space-time coupling, whereas others
support space-time trade-off. Coupling is present if, in order to maintain the same perception of
apparent motion (i.e. perceptual equilibrium), increases in the time difference between two ele-
ments must be accompanied by a corresponding increase in the distance between them. In con-
trast, space-time trade-off occurs when increases in distance between elements (from one frame to
the next) must be countered with a decrease in the time between frames in order to maintain the
same perception of apparent motion. Although these two types of behavior seem incompatible,

Fig. 4.12  Apparent motion can occur when elements change position from one point in time (a) to
the next (b). If more than one element is present this can lead to ambiguous motion direction. For
instance, the change from pattern (a) to pattern (b) can occur either because of (c) horizontal motion
of the elements or because of (d) vertical motion of the elements. (e) Two frames of a motion lattice
are shown. Latticet=1 is shown in black and Latticet=2 is shown in gray. Spatial grouping along the
dashed line (not present in displays) is modulated by the distance b. Temporal grouping is modulated
by the ratio of distances m1 and m2 from an element in Latticet=1 to its nearest neighbors in Latticet=2.

they have recently been unified with a single function to explain them. Coupling occurs at slow
motion speeds and trade-off occurs at fast motion speeds (Gepshtein and Kubovy, 2007). This
unification provides a coherent account of the spatiotemporal factors that affect grouping (and
apparent motion) in discrete dynamic patterns.

Top-down/non-image factors
Probability
In the RDT paradigm, participants are faster at detecting two repeated-color (or another
repeated property) targets within an alternating-color array when the targets appear within the
same group than when they appear between two groups as indicated by a grouping principle
such as common region (Palmer and Beck, 2007). In the typical version of this task, targets are
equally likely to appear within groups and between groups across all of the trials of the experi-
ment. In this case, using grouping by proximity, common region, or another factor is equally
likely to help or hinder finding the target. However, in a situation in which targets are between
groups on 75% of trials, the perceptual organization provided by grouping would actively hin-
der performance in the task. In an experiment that varied the probability of the target appearing
within the same group (25%, 50%, or 75%), participants were sensitive to this manipulation and
could even completely eliminate the disadvantage of between-group targets with the knowledge
of what type of target was more likely (Beck and Palmer, 2002). A key question about this effect
is what mechanism mediates it. One interpretation is that the participants can use probabil-
ity as a grouping principle and this can itself compete against other grouping principles and
results in a different perceived grouping in the display. Alternatively, it could be that partici-
pants intentionally change their response strategy or allocate attention differently according to
the probability knowledge. In this case, there may be no actual change in perceived grouping,
but the effects of perceived grouping may be overcome by a compensating strategy. This is a
difficult question that is not easy to answer. However, it is clear that, at the very least, probability
manipulations can overcome and affect the effects of grouping on performance. The extent to
which participants need to be aware of the probability manipulation in order for it to be
effective is also unclear.

Learning, associative grouping, and carryover effects


Grouping principles have generally involved relationships between the image features of ele-
ments at the time grouping is occurring. Very little attention has been paid to how learning
from previous visual experiences can impact visual grouping. Recently, Vickery and Jiang (2009)
investigated this issue. They repeatedly presented participants with pairs of unique shapes
(Figure 4.13A,B) that were grouped within a common region (see Common Region section
above). During this training phase, a given shape always appeared as grouped with the same
other shape. To assess the effectiveness of this grouping during the training phase, the authors
used the RDT (Palmer and Beck, 2007). Participants had to detect a target pair of adjacent
shapes that had the same color. As expected, participants were faster at this when the target pair
occurred within the same group (Figure 4.13A) than when the two elements of the target pair
were in different groups (Figure 4.13B). This confirmed that the participants were perceiving
grouping by common region in the training phase. After 240 trials of training on these shapes,
the participants then saw the same pairs of shapes, but now without the surrounding contours
(Figure 4.13C). Based on image factors alone, these stimuli should not be subject to any group-
ing. Instead, the authors found that participants were significantly faster at detecting the target

Fig. 4.13  Example stimuli from Vickery and Jiang (2009). Participants saw shapes of alternating colors in a row and had to determine the color of a target pair, i.e., a pair of adjacent shapes with the same color (the RDT paradigm). Black is the target color in this example. (a) During the training phase participants saw the shapes grouped into pairs by common region using outline contours. In some cases the target appeared within the common region group. (b) In other cases, the target appeared between two common region groups. (c) After training, participants saw the same stimuli paired as they were during training but without the region outlines. The target could appear within the previously-learned group or (d) between learned groupings.
Reproduced from Attention, Perception, & Psychophysics, 71(4), pp. 896–909, Associative grouping: perceptual grouping of shapes by association, Timothy J. Vickery and Yuhong V. Jiang, DOI: 10.3758/APP.71.4.896, © 2009, Springer-Verlag. With kind permission from Springer Science and Business Media.
76 Brooks

pair when it appeared within one of the previously seen groups (Figure 4.13C) than when the
pair was between two previously learned groups (Figure 4.13D). This suggests that association between shapes, based on their previously observed likelihood of appearing together, can cause grouping of those shapes in later encounters. Importantly, the task did not depend on the shapes themselves and only required participants to attend to their colors. The authors
termed this effect associative grouping. In another study, they found that associative grouping
also caused shapes to appear closer together than shapes that had no association history, an effect
that mimics previously-observed spatial distortions induced by grouping (Coren and Girgus,
1980). Other results have also suggested that previous experience, both short-term and lifelong,
can have effects on the outcome of perceptual grouping processes (Kimchi and Hadad, 2002;
Zemel et al., 2002).
Some effects of previous experience on grouping are much more short-lived and may derive
from the immediately preceding stimuli. Hysteresis and adaptation are well-known carryover
effects on visual perception. Hysteresis is the tendency for a given percept to persist even in con-
tradiction to sensory evidence moving in the opposite direction, i.e., it maintains the status quo.
Adaptation, on the other hand, reduces sensitivity to the stimulus features at hand and thus reduces
their influence on subsequent perceptual decisions. Gepshtein and Kubovy (2005) demonstrated
that both of these processes have effects on perceptual grouping and, moreover, the two influ-
ences operate independently of one another. They showed participants dot lattices (Kubovy and
Wagemans, 1995) with two competing organizations, e.g., along directions a or b (Figure 4.2C).
As with previous work, they varied the proximity along these two dimensions and found the
expected effects of proximity on grouping. In a further analysis, they then split the data into trials
on which the participant perceived grouping along a, for instance, and determined the likelihood
that the participant would group along a in the next stimulus. Participants were significantly more
likely than chance to group along the same direction as the preceding stimulus. This demonstrates
an effect of hysteresis on perceptual grouping. They also found that the probability of perceiving
grouping along one dimension, say a, in a stimulus decreased with stronger perceptual evidence
for it in the preceding stimulus (i.e. greater proximity along a in the previous stimulus). This was
true regardless of whether the participant perceived grouping along a or b in the preceding stimulus. The authors
interpreted this as evidence for adaptation. Essentially, when an observer sees strong evidence for
grouping along one dimension in a stimulus, the visual system adapts to this evidence, making
the system less sensitive to that same evidence for grouping when it appears in the next stimulus.
Although the recent data described above have clarified the nature of these carryover effects, hysteresis was not unknown to Wertheimer, who described it as the factor of objective set (1923).
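The logic of this kind of trial-sequence analysis can be sketched as follows. This is an illustrative sketch, not the authors' actual analysis code: the function name, the 'a'/'b' report coding, and the chance correction are my own choices.

```python
# Illustrative sketch of a hysteresis analysis on a sequence of
# two-alternative grouping reports ('a' or 'b'), one per trial.

def hysteresis_index(percepts):
    """P(same percept as on the previous trial) minus the chance level
    expected if trials were independent."""
    repeats = sum(1 for prev, cur in zip(percepts, percepts[1:]) if prev == cur)
    p_repeat = repeats / (len(percepts) - 1)
    # Chance repetition rate for independent trials: p_a^2 + p_b^2
    p_a = percepts.count('a') / len(percepts)
    chance = p_a ** 2 + (1 - p_a) ** 2
    return p_repeat - chance

# Hypothetical report sequence: runs of repeated percepts inflate the index.
reports = ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'b', 'b']
print(round(hysteresis_index(reports), 3))  # → 0.167
```

A positive index indicates that percepts repeat more often than independent trials would predict, i.e., hysteresis; an adaptation analysis would instead condition the current report on the strength of the grouping evidence (e.g., proximity ratio) in the preceding stimulus.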

Theoretical issues about grouping


In addition to identifying new grouping principles, a significant amount of modern work on per-
ceptual grouping has focused on theoretical issues about grouping. A  major issue has been to
understand how grouping fits amongst all of the other processes of visual perception. Does it
occur very early without any input from later processes (e.g., attention, object recognition) or
does it interact with these processes to determine its results? Alternatively, grouping may occur
throughout visual processing or there may be several fundamentally different types of grouping
which rely on independent mechanisms and have their own time-courses. Alongside the devel-
opment of new principles, modern vision scientists have also worked to address some of these
theoretical issues that place grouping in context and try to reveal the mechanisms that generate
their phenomenal consequences and effects on task performance. Below are three examples of
these theoretical issues.

When does grouping happen?


Information processing approaches to vision have typically tried to determine the sequence
of processing operations that occur within the visual system (e.g., Palmer and Rock, 1994).
Neurophysiological approaches suggest a hierarchy of visual areas (Felleman and Van Essen,
1991), albeit with significant amounts of bi-directional communication between areas. Where
does perceptual grouping occur in these processing structures? Classically, grouping principles
were considered to operate relatively early in models of visual processing because they were
based on simple image characteristics that can be computed directly from the image. However,
‘early’ is not well-defined. To address this issue, Rock and Brosgole (1964) aimed to determine

Fig. 4.14  (a) The array of luminous beads used by Rock and Brosgole (1964), aligned in the frontal plane with support structure. The luminous beads appeared in the dark either in the (b) frontal plane or (c) tilted in depth.
Adapted from Palmer, Stephen E., Vision Science: Photons to Phenomenology, figure 6.1.12, © 1999 Massachusetts Institute of Technology, by permission of The MIT Press.
whether grouping occurred before or after a particular reference point in visual processing, i.e.
the construction of a 3D scene representation. To do this, they constructed a 2D array of luminous beads (Figure 4.14A). In one condition, they presented this array to participants in a dark
room perpendicular to the line of sight (Figure 4.14B). Based on proximity, this array tends to
be perceived as columns. However, in another condition, the array of beads was tilted in depth
(Figure 4.14C). The tilt caused foreshortening, so that in 2D image coordinates the elements became closer together in the horizontal dimension, which should make grouping by proximity more ambiguous. Of course, in 3D scene coordinates, the beads remained closer together
vertically. If grouping is based on a 3D representation, then the participants should see columns
based on the shorter 3D vertical distances between elements. Alternatively, if grouping is based
on the 2D representation, then they may be more likely to see rows. When viewing the arrays
with both eyes opened (and thus full 3D vision), participants grouped according to the 3D
structure of the displays. However, when participants closed one eye and saw only the 2D image
information, they were more likely to group the display into rows based on the 2D proximity
of elements caused by foreshortening. Similar effects have been shown for similarity grouping,
suggesting that grouping by lightness (Rock et  al., 1992) occurs on a post-constancy repre-
sentation of visual information. Other work has shown that grouping can also be affected by
the outcome of interpolation processes, such as modal (Palmer and Nelson, 2000) and amodal
completion (Palmer, Neff, and Beck, 1996). All of these results suggest that grouping occurs on
a representation beyond simple image features. Furthermore, grouping also seems to be able
to affect the results of figure-ground processing (Brooks and Driver, 2010; Palmer and Brooks,
2008), contradicting previous proposals that grouping can only occur after figure-ground
organization (Palmer and Rock, 1994). Although much of the evidence above suggests that
grouping occurs later in visual processing than previously thought, it does not always do so.
Grouping by color similarity is based on a post-constancy representation with long duration
displays, but when presented for very brief periods these displays are grouped by pre-constancy
features (Schulz and Sanocki, 2003).
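The geometric logic of the Rock and Brosgole manipulation can be made concrete with a small projection sketch. The specific spacings and tilt angle here are hypothetical, chosen only to illustrate the effect, and are not taken from the original study.

```python
import math

# Tilting a bead array in depth about its vertical axis compresses horizontal
# spacing in the 2D image by cos(tilt); vertical spacing is unaffected.
def image_spacing(spacing_3d, tilt_deg):
    """Projected (2D image) spacing of a 3D spacing seen at a given tilt."""
    return spacing_3d * math.cos(math.radians(tilt_deg))

# Hypothetical array: beads 4 units apart horizontally, 2 units vertically,
# so frontal viewing favors grouping into columns (smaller vertical spacing).
horizontal_3d, vertical_3d = 4.0, 2.0
for tilt in (0, 40, 60):
    h_2d = image_spacing(horizontal_3d, tilt)
    print(tilt, round(h_2d, 2), vertical_3d)
```

At a 60-degree tilt the projected horizontal spacing equals the vertical spacing, so 2D proximity no longer favors columns, even though the 3D spacings still do; this is the ambiguity the monocular condition exploits.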
Another approach to this question has been to assess whether perceptual grouping occurs
pre-attentively or only within the spotlight of attention. An early study on this issue used an
inattention paradigm (Mack et al., 1992). As with many other studies of grouping, arrays of
shapes that could be seen as arranged either in rows or columns (e.g., see Figure 4.4) were
presented to participants. However, in this case, a large cross was overlaid between the cen-
tral rows and columns, and participants were instructed to focus their attention on it and
judge whether the horizontal or the vertical part of the cross was longer. Despite the array
of elements being in the center of the participants’ visual field during this task, they were
unable to report whether the array was grouped into rows or columns. Presumably, this is because their attention was focused on the task-relevant cross rather than on the grouping array. This was taken as evidence that even if a pattern is at the center of
vision, grouping processes may not operate unless attention is specifically allocated to the
pattern (also see Ben-Av, Sagi, and Braun, 1992). However, since then, others, using different
paradigms, have uncovered evidence, often indirect, that at least some perceptual grouping
may be operating pre-attentively (Kimchi, 2009; Lamy et  al., 2006; Moore and Egeth, 1997;
Russell and Driver, 2005), although this is not the case for all types of grouping (Kimchi and
Razpurker-Apfeld, 2004).
All of these results together have been taken to suggest that grouping may occur at many differ-
ent levels of processing, rather than being a single step that occurs at one point in time (Palmer,
Brooks, and Nelson, 2003). Furthermore, different types of grouping may occur at different levels.
It is also possible that at least some grouping is dependent on recurrent processing between dif-
ferent levels, or brain areas, rather than representing single sequential steps (e.g., Lamme and
Roelfsema, 2000; Roelfsema, 2006). This is an issue that is just starting to be addressed systemati-
cally and may most directly be approached by studying how perceptual grouping is implemented
in neural circuits.

Mechanisms of grouping
One well-known mechanism that may underlie perceptual grouping is suggested by the tem-
poral correlation hypothesis (Singer and Gray, 1995; von der Malsburg, 1981), which holds that
synchrony in neural populations serves as a binding code for information in different parts of
cortex. Grouping may be mediated by synchronization of activity between neurons represent-
ing different elements of a group. Although some neurophysiological recordings in animals
(e.g., Castelo-Branco et al., 2000; Singer and Gray, 1995) and EEG recordings in humans (e.g.,
Tallon-Baudry and Bertrand, 1999; Vidal, Chaumon, O’Regan, and Tallon-Baudry, 2006) have
supported this idea, it remains a controversial hypothesis (e.g., Lamme and Spekreijse, 1998;
Roelfsema et  al., 2004). Much of that evidence applies to limited types of grouping such as
collinearity/continuity (e.g., Singer and Gray, 1995) or formation of illusory contours based
on these features (e.g., Tallon-Baudry and Bertrand, 1999). It is not clear whether synchrony
can serve as a general mechanism to explain a wider array of grouping phenomena, especially
those not based on image features. For more discussion of the role of oscillatory activity in
perceptual organization see Van Leeuwen’s Cortical Dynamics chapter (this volume). Van der
Helm’s Simplicity chapter (this volume) discusses a link between synchrony and perceptual
simplicity.
Even if multiple cues use synchrony as a coding mechanism, it may be that different cues use
different parts of visual cortex or recruit additional mechanisms. However, some fMRI evidence
suggests that proximity and similarity grouping cues, for instance, share a common network
including temporal, parietal, and prefrontal cortices (Seymour et al., 2008). In contrast, some ERP
evidence has shown differences in the time-course of processing of these two grouping cues (e.g.,
Han et al., 2002; Han et al., 2001) and other cues (e.g., Casco et al., 2009). Other work has focused
specifically on interactions between different visual areas with the role of feedback from higher
order areas a critical issue (Murray et al., 2004). A significant amount of computational work has
also generated specific models of perceptual grouping mechanisms. For instance, some of this
work has aimed to explain how grouping effects may emerge from the structure of the laminar
circuits of visual cortex (e.g., Grossberg et al., 1997; Ross et al., 2000). A full review of findings on
neural and computational mechanisms of grouping is beyond the scope of this chapter but it is
clear that even with the simplest Gestalt cues there is evidence of divergence in mechanisms and
many competing proposals.

Prägnanz and simplicity


Wertheimer (1923, 2012) dedicated a relatively large section of his article to discussing and dem-
onstrating that a particular organization of elements may be favored because it is ‘better’ than
other organizations, i.e., a good Gestalt. This idea has been called the law or principle of Prägnanz
(German word meaning ‘conciseness’) and the notion received substantial attention from Gestalt
psychologists other than Wertheimer (Koffka, 1935; Köhler, 1920). For instance, the lines in
Fig. 4.15  The principle of Prägnanz. (a) The four edge sections 1–4 can be seen as arranged into different structures. Edges 1 and 2 may group to form an object separate from 3 and 4, which form another object as represented in panel (b). Alternatively, edges 1 and 3 may join and 2 and 4 join to form better shapes like those depicted in panel (c).

Figure 4.15A could be perceived as edges 1 and 2 forming one object and edges 3 and 4 forming
another object (as shown in Figure 4.15B). However, most people do not see this organization.
Instead, they perceive two symmetrical objects that are overlapping (shown non-overlapping in
Figure 4.15C). Wertheimer claimed that the organization in Figure 4.15B produces ‘senseless’
shapes which are not very good Gestalts or whole forms. Those produced by the organization
represented in Figure 4.15C form better wholes. Notice that in this case, this means that we follow
what seems to be a factor of good continuation in grouping the edge segments together rather
than closure which may have favored the other organization. Wertheimer seemed to suggest that
ultimately all of the factors that he proposed are aimed at determining the best Gestalt possible
given the stimulus available. Furthermore, competitions amongst them may be resolved by deter-
mining which of them produces the best Gestalt.
Although the idea of Prägnanz was relatively easy to demonstrate, a clear, formal definition was
not provided by the Gestaltists. To fill this gap, modern vision scientists have often framed the
problem in terms of information theory. In this framework, organizations of the stimulus that
require less information to encode them are better than those which require more information
(Hochberg and McAlister, 1953). For instance, symmetrical figures (Figure 4.15C) may require
less information to encode than similar non-symmetrical figures (Figure 4.15B) because one half
of each figure is a simple transformation of the other. This could reduce the information needed to encode each figure by nearly one half, if it is encoded as one half plus a single transformation.
There are multiple versions of how stimuli can be encoded, their information measured, and sim-
plicity compared (e.g., Collard and Buffart, 1983; Garner, 1970, 1974; Leeuwenberg, 1969, 1971).
Regardless of how it is computed, if the visual system uses simplicity as a criterion for determining perceptual structure, this criterion presumably helps to construct an evolutionarily useful representation of the physical world. However, there is no guarantee that simple representations
are actually veridical. For a more detailed discussion of these important issues see van der Helm’s
chapter on Simplicity in this volume.
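The information saving offered by symmetry can be illustrated with a toy description-length count. The counting scheme and the example shape are my own constructions for illustration, not a coding model from this literature.

```python
# Toy description-length comparison: a bilaterally symmetric contour can be
# stored as one half of its vertices plus a single 'mirror' instruction.

def full_code_length(points):
    """Cost of listing every vertex: two coordinates per point."""
    return 2 * len(points)

def symmetric_code_length(points):
    """Cost of listing only the left half plus one mirror transformation."""
    half = [(x, y) for x, y in points if x < 0]  # left half; axis at x = 0
    return 2 * len(half) + 1  # +1 symbol for 'mirror about x = 0'

# Hypothetical 8-vertex shape, mirror-symmetric about the vertical axis.
shape = [(-2, 0), (-1, 2), (-1, 3), (-0.5, 4),
         (0.5, 4), (1, 3), (1, 2), (2, 0)]
print(full_code_length(shape), symmetric_code_length(shape))  # → 16 9
```

Listing all eight vertices costs 16 numbers, whereas four vertices plus a mirror instruction costs 9 — close to the "nearly one half" saving described above, which is why the symmetric organization counts as simpler under such codes.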

Summary
The Gestalt psychologists discovered and popularized an enduring set of grouping principles.
Their methods were largely based on demonstrations. To some, this has been seen as a point
of weakness. However, the ability to see clear effects through demonstration alone actually
shows the strength of the effects that they found, especially in comparison to some modern
indirect methods, which only show effects, for instance, on the order of tens of milliseconds.
Modern vision scientists have elaborated some of these principles by studying them quantita-
tively and clarifying the conditions under which they operate. However, some of the original
principles still lack clear formal definitions (e.g., good continuation), and further work is needed here. There has also been significant work on how different principles combine
(Claessens and Wagemans, 2008; Elder and Goldberg, 2002), an important issue given that
natural images often seem to contain many cues simultaneously. A robust set of new principles
has also been articulated. Many of these involve dynamic scene features, and others highlight
the influence of context, learning, and other aspects of cognition. Although all of these principles can be termed grouping on the basis of their phenomenological effects, such a diverse set of image-based and non-image factors is likely to involve a wide range of different neural mechanisms. Identifying the mechanistic overlap between different principles is an issue that, when addressed, will shed greater light on how we might further categorize them. It is also
unlikely that the principles described above form an exhaustive list. The brain probably picks up on many sources of information in visual scenes to drive perceptual grouping, and we have likely only scratched the surface.

References
Ahlström, U. (1995). Perceptual unit formation in simple motion patterns. Scand J Psychol 36(4): 343–354.
Alais, D., Blake, R., and Lee, S. H. (1998). Visual features that vary together over time group together over
space. Nature Neurosci 1(2): 160–164.
Barlow, H. B., and Reeves, B. C. (1979). The versatility and absolute efficiency of detecting mirror
symmetry in random dot displays. Vision Res 19(7): 783–793. Available at: http://www.ncbi.nlm.nih.gov/pubmed/483597
Baylis, G. C., and Driver, J. (1994). Parallel computation of symmetry but not repetition within single
visual shapes. Visual Cognit 1(4): 377–400.
Beck, D. M., and Palmer, S. E. (2002). Top-down influences on perceptual grouping. J Exp Psychol Hum
Percept Perform 28(5): 1071–1084.
Ben-Av, M. B., and Sagi, D. (1995). Perceptual grouping by similarity and proximity: experimental results
can be predicted by intensity autocorrelations. Vision Res 35(6): 853–866.
Ben-Av, M. B., Sagi, D., and Braun, J. (1992). Visual attention and perceptual grouping. Percept Psychophys
52(3): 277–294.
Börjesson, E., and Ahlström, U. (1993). Motion structure in five-dot patterns as a determinant of
perceptual grouping. Percept Psychophys 53(1): 2–12.
Brooks, J. L., and Driver, J. (2010). Grouping puts figure-ground assignment in context by constraining
propagation of edge assignment. Attention, Percept Psychophys 72(4): 1053–1069.
Brooks, J. L., and Palmer, S. E. (2010). Cue competition affects temporal dynamics of edge-assignment in
human visual cortex. J Cogn Neurosci 23(3): 631–44.
Bruno, N., and Bertamini, M. (2014). Perceptual organization and the aperture problem. In Oxford
Handbook of Perceptual Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Burt, P., and Sperling, G. (1981). Time, distance, and feature trade-offs in visual apparent motion. Psychol
Rev 88(2): 171–195.
Casco, C., Campana, G., Han, S., and Guzzon, D. (2009). Psychophysical and electrophysiological evidence
of independent facilitation by collinearity and similarity in texture grouping and segmentation. Vision
Res 49(6): 583–593.
Castelo-Branco, M., Goebel, R., Neuenschwander, S., and Singer, W. (2000). Neural synchrony correlates
with surface segregation rules. Nature 405(6787): 685–689.
Claessens, P. M. E., and Wagemans, J. (2008). A Bayesian framework for cue integration in multistable
grouping: proximity, collinearity, and orientation priors in zigzag lattices. J Vision 8(7): 33.1–23.
Collard, R. F. A., and Buffart, H. F. J. M. (1983). Minimization of structural information: a set-theoretical
approach. Pattern Recogn 16(2): 231–242.
Corballis, M. C., and Roldan, C. E. (1974). On the perception of symmetrical and repeated patterns.
Percept Psychophys 16(1): 136–142.
Coren, S., and Girgus, J. S. (1980). Principles of perceptual organization and spatial distortion: the gestalt
illusions. J Exp Psychol Hum Percept Perform 6(3): 404–412.
Elder, J. H., and Goldberg, R. M. (2002). Ecological statistics of Gestalt laws for the perceptual organization
of contours. J Vision 2(4): 324–353.
Fantoni, C., and Gerbino, W. (2003). Contour interpolation by vector-field combination. J Vision, 3(4): 281–303.
Farid, H. (2002). Temporal synchrony in perceptual grouping: a critique. Trends Cogn Sci 6(7): 284–288.
Farid, H., and Adelson, E. H. (2001). Synchrony does not promote grouping in temporally structured
displays. Nature Neurosci 4(9): 875–876.
Feldman, J. (2001). Bayesian contour integration. Percept Psychophys 63(7): 1171–1182.
Felleman, D. J., and Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral
cortex. Cereb Cortex 1(1): 1–47.
Field, D. J., Hayes, A., and Hess, R. F. (1993). Contour integration by the human visual system: evidence for
a local ‘association field.’ Vision Res 33(2): 173–193.
Fisher, C. B., and Bornstein, M. H. (1982). Identification of symmetry: effects of stimulus orientation and
head position. Percept Psychophys 32(5): 443–448.
Garner, W. R. (1970). Good patterns have few alternatives. Am Scient 58(1): 34–42.
Garner, W. R. (1974). The Processing of Information and Structure. New York: L. Erlbaum Associates.
Geisler, W. S., Perry, J. S., Super, B. J., and Gallogly, D. P. (2001). Edge co-occurrence in natural images
predicts contour grouping performance. Vision Res 41(6): 711–724.
Gepshtein, S., and Kubovy, M. (2000). The emergence of visual objects in space-time. Proc Nat Acad Sci
USA 97(14): 8186–8191.
Gepshtein, S., and Kubovy, M. (2005). Stability and change in perception: spatial organization in temporal
context. Exp Brain Res 160(4): 487–495.
Gepshtein, S., and Kubovy, M. (2007). The lawful perception of apparent motion. J Vision, 7(8): 9.
Gillebert, C. R., and Humphreys, G. W. (2014). Mutual interplay between perceptual organization and
attention: a neuropsychological perspective. In Oxford Handbook of Perceptual Organization, edited by
J. Wagemans. Oxford: Oxford University Press.
Goodale, M. A., Milner, A. D., Jakobson, L. S., and Carey, D. P. (1991). A neurological dissociation
between perceiving objects and grasping them. Nature 349(6305): 154–156.
Grossberg, S., Mingolla, E., and Ross, W. D. (1997). Visual brain and visual perception: how does the
cortex do perceptual grouping? Trends Neurosci 20(3): 106–111.
Guttman, S. E., Gilroy, L. A., and Blake, R. (2007). Spatial grouping in human vision: temporal structure
trumps temporal synchrony. Vision Res 47(2): 219–230.
Han, S., Ding, Y., and Song, Y. (2002). Neural mechanisms of perceptual grouping in humans as revealed
by high density event related potentials. Neurosci Lett 319(1): 29–32.
Han, S., and Humphreys, G. W. (2003). Relationship between uniform connectedness and proximity in
perceptual grouping. Sci China. Ser C, Life Sci 46(2): 113–126.
Han, S., Humphreys, G. W., and Chen, L. (1999). Uniform connectedness and classical Gestalt principles of
perceptual grouping. Percept Psychophys 61(4): 661–674.
Han, S., Song, Y., Ding, Y., Yund, E. W., and Woods, D. L. (2001). Neural substrates for visual perceptual
grouping in humans. Psychophysiology 38(6): 926–935.
Herzog, M. H., and Öğmen, H. (2014). Apparent motion and reference frames. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Hess, R. F., May, K. A., and Dumoulin, S. O. (2014). Contour integration: psychophysical,
neurophysiological and computational perspectives. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. Oxford: Oxford University Press.
Hochberg, J., and McAlister, E. (1953). A quantitative approach to figural ‘goodness.’ J Exp Psychol
46(5): 361.
Hochberg, J., and Silverstein, A. (1956). A quantitative index of stimulus-similarity proximity vs.
differences in brightness. Am J Psychol 69(3): 456–458.
Hock, H. S. (2014). Dynamic grouping motion: a method for determining perceptual organization for
objects with connected surfaces. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans.
Oxford: Oxford University Press.
Humphreys, G. W., and Riddoch, M. J. (1993). Interactions between object and space systems revealed
through neuropsychology. In Attention and Performance, Volume 24, edited by D. E. Meyer and
S. Kornblum, pp. 183–218. Cambridge, MA: MIT Press.
Kahn, J. I., and Foster, D. H. (1986). Horizontal-vertical structure in the visual comparison of rigidly
transformed patterns. J Exp Psychol Hum Percept Perform 12(4): 422–433.
Kellman, P. J., Garrigan, P. B., Kalar, D., and Shipley, T. F. (2010). Good continuation and
relatability: related but distinct principles. J Vision 3(9): 120.
Kellman, P. J., and Shipley, T. F. (1991). A theory of visual interpolation in object perception. Cogn Psychol
23(2): 141–221.
Kimchi, R. (2000). The perceptual organization of visual objects: a microgenetic analysis. Vision Res
40(10–12): 1333–1347.
Kimchi, R. (2009). Perceptual organization and visual attention. Progr Brain Res 176: 15–33.
Kimchi, R., and Hadad, B-S. (2002). Influence of past experience on perceptual grouping. Psychol Sci
13(1): 41–47.
Kimchi, R., and Razpurker-Apfeld, I. (2004). Perceptual grouping and attention: not all groupings are
equal. Psychonom Bull Rev 11(4): 687–696.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace.
Köhler, W. (1920). Die physischen Gestalten in Ruhe und im stationären Zustand [Static and Stationary
Physical Shapes]. Braunschweig, Germany: Vieweg.
Korte, A. (1915). Kinematoskopische Untersuchungen [Kinematoscopic investigations]. Zeitschr Psychol
72: 194–296.
Kubovy, M., Holcombe, A. O., and Wagemans, J. (1998). On the lawfulness of grouping by proximity. Cogn
Psychol 35(1): 71–98.
Kubovy, M., and Wagemans, J. (1995). Grouping by proximity and multistability in dot lattices: a
quantitative Gestalt theory. Psychol Sci 6: 225–234.
Lamme, V. A. F., and Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and
recurrent processing. Trends Neurosci 23(11): 571–579.
Lamme, V. A. F., and Spekreijse, H. (1998). Neuronal synchrony does not represent texture segregation.
Nature 396(6709): 362–366.
Lamy, D., Segal, H., and Ruderman, L. (2006). Grouping does not require attention. Percept Psychophys
68(1): 17–31.
Lee, S. H., and Blake, R. (1999). Visual form created solely from temporal structure. Science
284(5417): 1165–1168.
Leeuwenberg, E. L. (1969). Quantitative specification of information in sequential patterns. Psychol Rev
76(2): 216–220.
Leeuwenberg, E. L. (1971). A perceptual coding language for visual and auditory patterns. Am J Psychol
84(3): 307–349.
Leonards, U., Singer, W., and Fahle, M. (1996). The influence of temporal phase differences on texture
segmentation. Vision Res 36(17): 2689–2697.
Levinthal, B. R., and Franconeri, S. L. (2011). Common-fate grouping as feature selection. Psychol Sci
22(9): 1132–1137.
Luce, R. D. (2002). A psychophysical theory of intensity proportions, joint presentations, and matches.
Psychol Rev 109(3): 520–532.
Machilsen, B., Pauwels, M., and Wagemans, J. (2009). The role of vertical mirror symmetry in visual shape
detection. J Vision 9(12): 11.1–11.11.
Mack, A., Tang, B., Tuma, R., Kahn, S., and Rock, I. (1992). Perceptual organization and attention. Cogn
Psychol 24(4): 475–501.
Malik, J., and Perona, P. (1990). Preattentive texture discrimination with early vision mechanisms. J Opt Soc
Am A, Optics Image Sci 7(5): 923–932.
Mathes, B., and Fahle, M. (2007). Closure facilitates contour integration. Vision Res 47(6): 818–827.
Moore, C. M., and Egeth, H. (1997). Perception without attention: evidence of grouping under conditions
of inattention. J Exp Psychol Hum Percept Perform 23(2): 339–352.
Murray, S. O., Schrater, P., and Kersten, D. (2004). Perceptual grouping and the interactions between visual
cortical areas. Neural Networks 17(5–6): 695–705.
Norcia, A. M., Candy, T. R., Pettet, M. W., Vildavski, V. Y., and Tyler, C. W. (2002). Temporal dynamics of
the human response to symmetry. J Vision 2(2): 132–139.
Oyama, T. (1961). Perceptual grouping as a function of proximity. Percept Motor Skills 13: 305–306.
Oyama, T., Simizu, M., and Tozawa, J. (1999). Effects of similarity on apparent motion and perceptual
grouping. Perception 28(6): 739–748.
Palmer, S. E. (1992). Common region: a new principle of perceptual grouping. Cogn Psychol 24(3): 436–447.
Palmer, S. E. (1999). Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.
Palmer, S. E., and Beck, D. M. (2007). The repetition discrimination task: an objective method for studying
perceptual grouping. Percept Psychophys 69(1): 68–78.
Palmer, S. E., and Brooks, J. L. (2008). Edge-region grouping in figure-ground organization and depth
perception. J Exp Psychol Hum Percept Perform 34(6): 1353–1371.
Palmer, S. E., Brooks, J. L., and Nelson, R. (2003). When does grouping happen? Acta Psychol
114(3): 311–330.
Palmer, S. E., and Hemenway, K. (1978). Orientation and symmetry: effects of multiple, rotational, and
near symmetries. J Exp Psychol Hum Percept Perform 4(4): 691–702.
Palmer, S. E., Neff, J., and Beck, D. (1996). Late influences on perceptual grouping: amodal completion.
Psychonom Bull Rev 3: 75–80.
Palmer, S. E., and Nelson, R. (2000). Late influences on perceptual grouping: illusory figures. Percept
Psychophys 62(7): 1321–1331.
Palmer, S. E., and Rock, I. (1994). Rethinking perceptual organization: the role of uniform connectedness.
Psychonom Bull Rev 1: 29–55.
Peterson, M. A. (1994). The proper placement of uniform connectedness. Psychonom Bull Rev
1(4): 509–514.
Peterson, M. A., and Enns, J. T. (2005). The edge complex: implicit memory for figure assignment in shape
perception. Percept Psychophys 67(4): 727–740.
Pizlo, Z., Salach-Golyska, M., and Rosenfeld, A. (1997). Curve detection in a noisy image. Vision Res
37(9): 1217–1241.
Quinlan, P. T., and Wilton, R. N. (1998). Grouping by proximity or similarity? Competition between the
Gestalt principles in vision. Perception 27(4): 417–430.
Rock, I., and Brosgole, L. (1964). Grouping based on phenomenal proximity. J Exp Psychol 67: 531–538.
Rock, I., Nijhawan, R., Palmer, S. E., and Tudor, L. (1992). Grouping based on phenomenal similarity of
achromatic color. Perception 21(6): 779–789.
Roelfsema, P. R. (2006). Cortical algorithms for perceptual grouping. Ann Rev Neurosci 29: 203–227.
Roelfsema, P. R., Lamme, V. A. F., and Spekreijse, H. (2004). Synchrony and covariation of firing rates in
the primary visual cortex during contour grouping. Nature Neurosci 7(9): 982–991.
Ross, W. D., Grossberg, S., and Mingolla, E. (2000). Visual cortical mechanisms of perceptual
grouping: interacting layers, networks, columns, and maps. Neural Networks 13(6): 571–588.
Royer, F. L. (1981). Detection of symmetry. J Exp Psychol Hum Percept Perform 7(6): 1186–1210.
Russell, C., and Driver, J. (2005). New indirect measures of ‘inattentive’ visual grouping in a
change-detection task. Percept Psychophys 67(4): 606–623.
Schulz, M. F., and Sanocki, T. (2003). Time course of perceptual grouping by color. Psychol Sci
14(1): 26–30.
Sekuler, A. B., and Bennett, P. J. (2001). Generalized common fate: grouping by common luminance
changes. Psychol Sci 12(6): 437–444.
Seymour, K., Karnath, H-O., and Himmelbach, M. (2008). Perceptual grouping in the human
brain: common processing of different cues. NeuroReport 19(18): 1769–1772.
Shi, J., and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans Pattern Anal Machine
Intell 22(8): 888–905.
Shubnikov, A. V., and Koptsik, V. A. (1974). Symmetry in Science and Art. New York: Plenum.
Singer, W., and Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann
Rev Neurosci 18: 555–586.
86 Brooks

Singh, M. (2014). Visual representation of contour geometry. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Stevens, S. S. (1957). On the psychophysical law. Psychol Rev 64(3): 153–181.
Strother, L., and Kubovy, M. (2006). On the surprising salience of curvature in grouping by proximity.
J Exp Psychol Hum Percept Perform 32(2): 226–234.
Strother, L., and Kubovy, M. (2012). Structural salience and the nonaccidentality of a Gestalt. J Exp Psychol
Hum Percept Perform 38(4): 827–832.
Strother, L., Van Valkenburg, D., and Kubovy, M. (2002). Toward a psychophysics of perceptual
organization using multistable stimuli and phenomenal reports. Axiomathes 13(3/4): 283–302.
Tallon-Baudry, C., and Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object
representation. Trends Cogn Sci 3: 151–162.
Tipper, S. P., and Behrmann, M. (1996). Object-centered not scene-based visual neglect. J Exp Psychol Hum
Percept Perform 22(5): 1261–1278.
Tversky, T., Geisler, W. S., and Perry, J. S. (2004). Contour grouping: closure effects are explained by good
continuation and proximity. Vision Res 44(24): 2769–2777.
Usher, M., and Donnelly, N. (1998). Visual synchrony affects binding and segmentation in perception.
Nature 394(6689): 179–182.
Van den Berg, M., Kubovy, M., and Schirillo, J. A. (2011). Grouping by regularity and the perception of
illumination. Vision Res 51(12): 1360–1371.
Van der Helm, P. A. (2014a). Symmetry perception. In Oxford Handbook of Perceptual Organization, edited
by J. Wagemans. Oxford: Oxford University Press.
Van der Helm, P. A. (2014b). Simplicity in perceptual organization. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Van Leeuwen, C. (2014). Cortical dynamics and oscillations: what controls what we see? In Oxford
Handbook of Perceptual Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Van Lier, R., and Gerbino, W. (2014). Perceptual completions. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Vickery, T. J. (2008). Induced perceptual grouping. Psychol Sci 19(7): 693–701.
Vickery, T. J., and Jiang, Y. V. (2009). Associative grouping: perceptual grouping of shapes by association.
Attention, Percept Psychophys 71(4): 896–909.
Vidal, J. R., Chaumon, M., O’Regan, J. K., and Tallon-Baudry, C. (2006). Visual grouping and the focusing
of attention induce gamma-band oscillations at different frequencies in human magnetoencephalogram
signals. J Cogn Neurosci 18(11): 1850–1862.
Von der Malsburg, C. (1981). The Correlation Theory of Brain Function. Internal Report 81–2.
Göttingen, Germany: Max Planck Institute for Biophysical Chemistry.
Wagemans, J. (1993). Skewed symmetry: a nonaccidental property used to perceive visual forms. J Exp
Psychol Hum Percept Perform 19(2): 364–380.
Wagemans, J. (1995). Detection of visual symmetries. Spatial Vision 9(1): 9–32.
Wagemans, J., Van Gool, L., and d’Ydewalle, G. (1991). Detection of symmetry in tachistoscopically
presented dot patterns: effects of multiple axes and skewing. Percept Psychophys 50(5): 413–427.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung [Experimental studies on
the seeing of motion]. Zeitschr Psychol 61: 161–265.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt. Psychol Forsch 4: 301–350.
Wertheimer, M. (1938). Laws of organization in perceptual forms. In A Source Book of Gestalt Psychology,
edited by W. D. Ellis, pp. 71–88. Gouldsboro, ME: Gestalt Journal Press.
Wertheimer, M. (2012). Investigations on Gestalt principles. In On Perceived Motion and Figural
Organization, edited by L. Spillmann, p. 144. Cambridge, MA: MIT Press.
Weyl, H. (1952). Symmetry. Princeton, N.J.: Princeton University Press.


Wouterlood, D., and Boselie, F. (1992). A good-continuation model of some occlusion phenomena. Psychol
Res 54(4): 267–277.
Yen, S. C., and Finkel, L. H. (1998). Extraction of perceptually salient contours by striate cortical networks.
Vision Res 38(5): 719–741.
Zemel, R. S., Behrmann, M., Mozer, M. C., and Bavelier, D. (2002). Experience-dependent perceptual
grouping and object-based attention. J Exp Psychol Hum Percept Perform 28(1): 202–217.
Chapter 5

Emergent features and feature combination

James R. Pomerantz and Anna I. Cragin

Introduction to Emergent Features (EFs)


Emergence
The idea of emergence lies at the heart of perceptual organization. Since the earliest scientific
approaches to perception, the notion has persisted that percepts are composed of sensations as a
wall is made of bricks. If we could determine how those sensations—features, in contemporary
parlance—are detected, we could understand how we perceive the world, namely by adding up or
otherwise integrating those features into wholes. Emergence provides a challenge to this linear,
feedforward view of perception because when certain features are close in time and space, novel,
unexpected, and salient properties may arise. Those properties—emergent features—behave as
though they were elementary themselves, sometimes even being detected far more efficiently than
the nominally more basic features from which they arise. What are these emergent features (EFs),
and how are they detected and employed in perception?

Philosophical issues and reductionism


Most of us are familiar with emergence, although perhaps not by that name. Our first encounter
may come in chemistry when we see two clear liquids poured together to form a dark mixture,
perhaps accompanied by smoke or an explosion. Or when we discover that hydrogen and
oxygen gases may combine to form water, a liquid with a host of properties possessed by neither of
its constituents separately. Chemistry provides examples of the emergence of new phenomena
not present in the descriptions and models from the underlying physics, just as biology provides
examples not present in chemistry. These phenomena form the primary challenge to
reductionism in the physical sciences.
Emergence is also a key concept in philosophy and cognitive science (Stephan 2003), and its
central tenet is not merely quantitative non-additivity, wherein the combination of two parts does
not add up to the resulting whole. Most sensory processes are non-linear above threshold, after
all:  the brightness of two superimposed lights does not equal the sum of the two lights alone.
Emergence also requires novelty, unpredictability, and surprise that make the whole qualitatively
different from the sum of its parts.

Emergence in perception
The Gestalt psychologists’ key claim was that a whole is perceived as something other than the
sum of its parts, a claim still often misquoted as ‘more than the sum of its parts.’ Indeed, the Gestalt
psychologists argued such summing was meaningless (Pomerantz and Kubovy 1986; Wagemans
et al. 2012b). That elusive ‘something other’ they struggled to define can be regarded as
emergence: those properties that appear, or sometimes disappear, when stimulus elements are
perceived as a unitary configuration. To take the example of apparent motion with which Wertheimer
(1912) launched the Gestalt school (Wagemans et al. 2012a, b): if one observes a blinking light
that is then joined by a second blinking light, depending on their timing, one may then see not
two blinking lights but a single light in apparent (beta) motion, or even just pure (phi) motion
itself. What is novel, surprising and super-additive with the arrival of the second light is motion.
What disappears with emergence is one or both of the lights, because when beta motion is seen
we perceive only one light, not two, and with phi we may see only pure, disembodied motion; in
this respect the whole is less than the sum of its parts.

Basic features and feature integration


The reigning general view of perception today derives from a two-stage model best associated
with Neisser (1967) and with Treisman and Gelade (1980) involving so-called basic features (what
in an earlier day Structuralists such as Titchener might have called ‘sensations’) and their
subsequent integration (see also Feldman, in press). For visual perception, in the first stage, basic
features are detected simultaneously and effortlessly, in parallel across the visual field. The
criteria for ‘basic’ are several but include popout, rapid texture segmentation, illusory conjunctions,
and search asymmetry (Treisman and Gelade 1980; Treisman and Gormican 1988; Treisman and
Souther 1985). Considering popout as a prototypical diagnostic, a red square will pop out from a
field of green squares virtually instantaneously, irrespective of the number of green squares; thus,
color (or some particular wavelength combinations) qualifies as a basic feature. Similarly a
blinking light will pop out from a field of non-blinking lights, a large object will pop out from a field
of small objects, a moving object from a field of stationary, a tilted line from a field of verticals, a
near object from a field of far ones, and so on. One current estimate (Wolfe and Horowitz 2004)
holds that there are perhaps 20 such basic features.
In the second stage of the standard two-stage model, basic features detected in the first stage are
combined or integrated. This process is both slow and attention-demanding. Originally, the
second stage was dubbed ‘serial’ in contrast to the ‘parallel’ first stage; but in light of rigorous analyses
by Townsend (1971), this language was replaced by the more process-neutral terms ‘efficient’ and
‘inefficient’. Either way, the combination of basic features is thought to take place within a
‘spotlight’ of attention that covers only a portion of the visual field at one time. This spotlight can be
moved, but that requires time and effort. Thus the time to detect a target defined by a combination
of basic features is long and rises with the number of items in the field: a red diagonal in a field
of mixed green diagonals and red verticals does not pop out but must be searched for attentively.
Among the other diagnostics for basic features is spontaneous texture segregation (Julesz
1981): if a texture field contains vertical elements on its left and diagonal on its right, observers
will detect a ‘seam’ down the middle where the two textures meet. A similar outcome results with
red vs. green or large vs. small. But if the texture contains clockwise spirals on the left and
counterclockwise on the right, observers will not perceive the seam because this feature is not basic.
Regarding search asymmetry, it is easier to find a target containing a basic feature in a field of
distractors lacking it than vice versa; thus it is easier to find an open circle in a field of closed circles
than vice versa, suggesting that terminators may be the basic feature whose presence is detected
in open circles. Finally, basic features may lead to illusory conjunctions, particularly in the visual
periphery when attentional load is high: in a field of red squares and green circles, observers will
sometimes report seeing an illusory red circle, suggesting that both the color and the shape
distinctions are basic features.
Gestalts arise from Emergent Features (EFs)


In the strongest version of the argument we outline here, Gestalts are configurations or
arrangements of elements that possess EFs. Three closely and evenly spaced points arranged in a straight
line will form a salient Gestalt, as with Orion’s Belt in the night sky where three stars group
by virtue of their proximity, symmetry, nearly equal brightness, and linearity. Three stars more
widely and unevenly spaced, varying in brightness, and not forming any regular geometric
arrangement would thus contain no EFs and are unlikely to be seen grouping into a Gestalt.
The parallelism of two lines, the symmetry of a snowflake, and the good continuation of the two
diagonals crossing to form an X are all emergent features, as detailed below. From the viewpoint
of the Theory of Basic Gestalts (Pomerantz and Portillo 2011; Pomerantz and Portillo 2012) and
related approaches, Gestalts, grouping, and EFs are inseparable concepts; when we say that two
elements group, we mean that salient, novel features emerge from their juxtaposition in space or
time. If a collection of elements contains no EFs (using the definition below), that collection is
not a perceptual group.
The essence of Gestalts is their primacy in perception: EFs are perceived more accurately
and rapidly than are the basic features from which they emerge. Below we discuss in detail the
Configural Superiority Effect by which EFs are diagnosed (Pomerantz et al. 1977), but for now
it is illustrated in Figure 5.1. Panel a shows four line segments: three positive diagonals and one
negative diagonal. These line segments differ in the classic basic feature of orientation. Panel
b shows these same diagonals each accompanied by identical horizontal/vertical pairs form-
ing Ls. Subjects are much faster and more accurate at finding the triangle that has emerged



Fig. 5.1  Configural Superiority and Inferiority Effects. Panel (a): Base odd quadrant display of
diagonals; (b): Composite display with L-shaped context elements added, with arrows and triangles
emergent to create configural superiority; (c): Composite display with slightly different Ls added,
yielding forms lacking emergent features and producing configural inferiority; (d): Base display
of parentheses; (e): Composite display with a left parenthesis added to create emergent features
and configural superiority; (f): Composite display with rotated parentheses yielding forms lacking
emergent feature differences and producing configural inferiority.
from a field of arrows in Panel b (as fast as telling black from white) than at finding the
negative diagonal in Panel a, even though the Ls add no discriminative information, rather only
homogeneous ‘noise’ with potential for impairing perception through masking and crowding.
Panels d and e show a similar configural superiority effect involving line curvature rather than
orientation. This configural superiority effect shows better processing of wholes—Gestalts—than
of their parts, and we show below how it may arise from the EFs of closure, terminator count,
and intersection type.
EFs and configural superiority pose challenges for the standard two-stage model of perception.
If the integration of basic features is slow and requires attention, why are Gestalts so salient and so
quickly perceived if they too require feature integration? How can EFs be more basic than the more
elementary features from which they arise? First we review the evidence that Gestalts are in fact
highly salient, and then we consider how their existence can be reconciled with perceptual theory.

Emergent Features are not just perceptual anchors


Because EFs necessarily entail relationships among parts, could configural superiority simply
reflect our superiority at relative judgments over absolute judgments? For example, we can better
judge whether one line is longer than another than identify the length of either, and we can better
tell whether two tones match in pitch than identify either as a middle C. This explanation
cannot work, however, because for every configural superiority effect, there are far more configural
inferiority effects. Panel c of Figure 5.1 shows configural inferiority when the L-shaped context is
shifted relative to the diagonal to eliminate EF differences. This demonstrates that making a
judgment easier merely by providing a comparison, contextual stimulus cannot explain configural
superiority; instead the context must mix with the target to create highly specific EFs for this effect
to arise. Panel f provides another illustration of inferiority with curves.

Not all relational properties qualify as emergent


EFs abound in perception: from a few squiggles on paper, a face emerges; from three Pac-man
figures, a Kanizsa triangle emerges (Kanizsa 1979). Are there constraints on what can and cannot
be regarded as an EF? Certainly there are. One might claim that any arbitrary relationship may
constitute an EF; e.g., the ratio of the diameter of the left eye to the length of the right foot. To
establish this unlikely whole as emerging from those two parts, one must find empirical
confirmation through a configural superiority effect or other converging operation. Below we consider
several possibilities, ranging from whether ‘wordness’ emerges as a salient feature from sequences
of letters to whether topological properties arising from arrangements of geometrical forms are
similarly salient. When the Dalmatian dog first pops out of the famous R. C. James photograph, it
is certainly a surprise for the perceiver, meeting that criterion for a Gestalt. But should we claim
that any and all acts of recognition constitute emergence, or are some of them the result of more
conventional (albeit complex) processes of recognition through parts, as with Feature Integration
Theory? As we shall see, there are as yet only a few hypothesized EFs that have passed the initial
tests to be outlined here, so it seems likely that conventional feature integration may be the norm.

Candidate EFs in human vision


The classic Gestalt ‘laws’
If the human visual system perceives only certain special relationships as Gestalts—if wholes
emerge from only certain configurations of parts—what are the top EF candidates we should
consider? The Gestaltists themselves generated hundreds of ‘laws’ (principles) of grouping,
although some of these are vague, others may be merely confounded with other, genuine grouping
principles, and yet others may simply be minor variants from each other. According to our view,
each of the remaining laws could potentially be linked to a testable EF. Figure 5.2 shows a classic
example of a configuration typically seen as a curvy X: two lines that intersect to form a cross. The
same configuration could be seen instead as two curvy, sideways Vs whose vertices are coincident
(‘kissing fish’), but this is rarely perceived, arguably because of the law of good continuation:
perception favors alternatives that allow contours to continue with minimal changes in direction.
As Figure 5.2 illustrates, candidates for EFs often are tied to non-accidental properties
(Biederman 1987; Rock 1983), i.e., image properties that are unlikely to arise from mere accidents
of viewpoint. Exceptions to this rule will be noted below. For the curvy Vs interpretation to be
correct, not only would the two vertices have to be superimposed perfectly from the given viewing
angle, but both pairs of line segments making up the Vs would have to be oriented perfectly to
continue smoothly into one another. This interpretation is exceptionally unlikely and so
perception rejects it as highly improbable.
Below we identify a number of plausible EFs in vision underlying the classic Gestalt laws.
Historically, support for these EFs, in the form of grouping laws, came largely from
phenomenology. In the subsequent section we consider rigorous methodologies that go beyond
simple phenomenology to confirm the psychological reality of certain of these potential EFs. The
resulting advantage over time-honored Gestalt grouping principles would be a systematic
approach to those principles, not only introducing a single method for confirming their existence
but perhaps a uniform scale on which they can be measured.

Possible EFs in human vision


Figure 5.3 illustrates seventeen potential EFs in vision, properties that emerge from parts that
meet at least the test of phenomenology. We start in Panel A with potential EFs that emerge from
the simplest possible stimuli: dot patterns.

Proximity
If the field of vision contains just a point or dot, as in Panel a’s Base displays, that dot’s only
functional feature is its location (x, y coordinates in the plane). If a second dot is added from the
Context displays to create the Composite display, we have its position too, but new to emerge is
the distance or proximity between the two. (This is separate from Gestalt grouping by proximity,
which we address below.) Note that proximity is affected by viewpoint and thus is a metric rather
than a non-accidental property.

Orientation
In this two-dot stimulus, a second candidate EF is the angle or orientation between the two dots.
Orientation too is an accidental property in that the angle between two locations changes with
perspective and with head tilt.
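These two dot-pair properties reduce to elementary plane geometry. A minimal illustrative sketch (the function names `proximity` and `orientation` are ours, not the chapter's; angles are measured from the positive x-axis):

```python
import math

def proximity(p, q):
    """Euclidean distance between two dots given as (x, y) tuples."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def orientation(p, q):
    """Angle in degrees of the line from p to q, from the positive x-axis."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

print(proximity((0, 0), (3, 4)))    # 5.0
print(orientation((0, 0), (1, 1)))  # approximately 45 degrees
```

Note that both quantities change under perspective projection or head tilt, consistent with their status as metric rather than non-accidental properties.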


Fig. 5.2  Ambiguous figure: crossing lines or kissing fish?


Linearity
Stepping up to 3-dot configurations, all three dots may fall on a straight line, or they may form
a triangle (by contrast, two dots always fall on a straight line). Linearity, as with all the potential
EFs listed below, is a non-accidental property in that if three points fall on a straight line in the
distal stimulus, they will remain linear from any viewpoint.
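The linearity of three dots can be tested with a cross product; a small illustrative sketch (the `tol` threshold is our assumption, added to absorb floating-point error):

```python
def collinear(a, b, c, tol=1e-9):
    """True if three dots fall on one straight line: the cross product of the
    vectors a->b and a->c vanishes. Collinearity is non-accidental in that it
    is preserved under any change of viewpoint."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) <= tol

print(collinear((0, 0), (1, 1), (2, 2)))  # True
print(collinear((0, 0), (1, 1), (2, 3)))  # False
```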

Symmetry (axial)
Three dots may be arranged symmetrically or asymmetrically about an axis (by contrast, two dots
are necessarily symmetric). More will be said about other forms of symmetry in a subsequent
section.
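Axial symmetry of a dot pattern can be tested by reflecting the dots and checking whether the set maps onto itself. The sketch below is ours and simplifies the perceptual problem: it assumes a known vertical axis x = axis_x, whereas the visual system must also discover the axis.

```python
def mirror_symmetric(points, axis_x):
    """True if the dot set maps onto itself under reflection about the
    vertical line x = axis_x (axial symmetry about a given axis).
    Coordinates are rounded to absorb floating-point error."""
    pts = sorted((round(x, 6), round(y, 6)) for x, y in points)
    reflected = sorted((round(2 * axis_x - x, 6), round(y, 6)) for x, y in points)
    return pts == reflected

print(mirror_symmetric([(-1, 0), (0, 2), (1, 0)], axis_x=0))  # True
print(mirror_symmetric([(-1, 0), (0, 2), (2, 0)], axis_x=0))  # False
```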

Surroundedness
With four-dot configurations, one of the dots may fall inside the convex hull (shell) defined by
the other three, or it may fall outside (consider snapping a rubber band around the four dots and
seeing whether any dot falls within the band’s boundary).
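The rubber-band test above amounts to a point-in-convex-hull check; for three surrounding dots the hull is a triangle. An illustrative sketch (our own function names; degenerate, collinear triangles are not handled):

```python
def surrounded(p, others):
    """True if dot p lies inside the triangle formed by the other three dots,
    i.e. inside their convex hull: p must lie on the same side of all three
    directed edges."""
    a, b, c = others
    def side(o, q, r):
        # sign of the cross product of vectors o->q and o->r
        return (q[0] - o[0]) * (r[1] - o[1]) - (q[1] - o[1]) * (r[0] - o[0])
    s1, s2, s3 = side(a, b, p), side(b, c, p), side(c, a, p)
    return (s1 > 0) == (s2 > 0) == (s3 > 0)

print(surrounded((1, 1), [(0, 0), (4, 0), (0, 4)]))  # True
print(surrounded((5, 5), [(0, 0), (4, 0), (0, 4)]))  # False
```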
We now consider the EFs in Panel b, which require parts that are more complex than dots to
emerge. Here we use line segments as primitive parts.

(a) [Base, Context, and Composite displays; rows: Proximity, Orientation, Linearity, Symmetry, Surroundedness]
Fig. 5.3  Potential basic EFs in human vision created from simple configurations of dots (Panel a)
or line segments (b) or more complex parts forming composites resembling 3D objects, faces,
or motion (c). The pair of figures on the left of each row shows a base discrimination with dots
or lines differing in location and/or orientation. The middle pair shows two identical context
elements, one of which is added to each base to form the composite pairs on the right that
contain potential EFs. In actual experiments, these stimulus pairs were placed into odd-quadrant
displays with one copy of one of the two base stimuli and three copies of the other. Note that
many of the rows contain additional EFs besides the primary one labeled at the far right.
(b) [Base, Context, and Composite displays; rows: Parallelism, Collinearity, Connectivity, Intersection, Lateral endpoint offset, Terminator count, Pixel count]

(c) [Base, Context, and Composite displays; rows: Topology, Depth, Motion/flicker, Faces, Kanizsa]

Fig. 5.3  Continued


Parallelism
Two line segments may be parallel or not, but a minimum of two segments is required for
parallelism to appear.
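Parallelism, like linearity, reduces to a cross-product test, here on the segments' direction vectors. A minimal illustrative sketch (our function name and `tol` threshold, not the chapter's):

```python
def parallel(seg1, seg2, tol=1e-9):
    """True if two line segments, each given as a pair of (x, y) endpoints,
    are parallel: the cross product of their direction vectors is near zero."""
    (a, b), (c, d) = seg1, seg2
    cross = (b[0] - a[0]) * (d[1] - c[1]) - (b[1] - a[1]) * (d[0] - c[0])
    return abs(cross) <= tol

print(parallel(((0, 0), (1, 0)), ((0, 1), (1, 1))))  # True
print(parallel(((0, 0), (1, 0)), ((0, 0), (1, 1))))  # False
```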

Collinearity
Again, two line segments are the minimal requirements. Items that are not fully collinear may be
relatable (Kellman and Shipley 1991), or at least show good continuation, which are weaker
versions of the same EF.

Connectivity
Two line segments either do or do not touch.

Intersection
Two line segments either intersect or do not. Two lines can touch without intersecting if they are
collinear and so form a single, longer line segment.

Lateral endpoint offset


If two line segments are parallel, their terminators (endpoints) may lie perpendicular to each
other such that connecting them either would or would not form right angles with the lines (if
not, they may look like shuffling skis).

Terminator count
This is not an emergent feature in the same sense as the others, but when two line segments
configure, their total terminator count is not necessarily four; if the two lines form a T, it drops to
three. This would illustrate an eliminative feature (Kubovy and Van Valkenburg 2002), where the
whole is less than the sum of its parts in some way.

Pixel count
This too is not a standard EF candidate, but the total pixel count (or luminous flux or surface
area) for a configuration of two lines is sometimes less than the sum of all the component lines’
pixel counts; if the lines intersect or if they superimpose on each other, the pixel count will fall,
sometimes sharply.

Finally, Figure 5.3 Panel (c) depicts five other EFs arising from elements more complex than dots
or lines. These EFs can be compelling phenomenally even though their key physical properties
and how they might be detected are less well understood:

Topological properties
When parts are placed in close proximity, novel topological properties may emerge, and these are
often salient to humans and other organisms. Three line segments can be arranged into a triangle,
adding the new property of a hole, a fundamental topological property (Chen 2005) that remains
invariant over so-called rubber sheet transformations. If a dot is added to this triangle, it will fall
either inside or outside that triangle; this inside-outside relationship is another topological property.

Depth
Depth differences often appear as EFs from combinations of elements that are themselves seen as
flat. Enns (1990) demonstrated that a flat Y shape inscribed inside a flat hexagon yields the
perception of a cube. Binocular disparity, as with random dot stereograms, is another classic example
of emergence (Julesz 1971). Ramachandran (1988) presented a noteworthy demonstration of
depth emerging from the combination of shading gradients and the shape of apertures.

Motion and flicker


Wertheimer’s (1912) initial demonstrations may rank motion as the quintessential EF, arising as
it does from static elements arranged properly in time and space. When noninformative
(homogeneous) context elements are delayed in time from a base display such that motion is seen
in the transition composite, huge configural superiority effects (CSEs) result, using the same
method otherwise as described above. Flicker behaves similarly and, as with motion, is so salient
that both are standard methods for attracting attention in visual displays. Higher-order motion
phenomena too suggest further EFs, as with Duncker’s (1929) demonstration of altered perceived
trajectories when lights are attached to the hub and wheel of a moving bicycle.

Faces
A skilled artist can draw just a few lines that viewers will group into a face. We see the same, less
gracefully, in emoticons and smiley faces: ☺. Does ‘faceness’ constitute its own EF, or is it
better regarded as only a concatenation of simpler, lower-level grouping factors at work, including
closure, symmetry, proximity, etc.? This question encounters methodological challenges that will
be considered below.

Subjective (Kanizsa) figures


With the arrangement of three suitably placed Pac-man figures, a subjective triangle emerges that
is convincing enough that viewers believe it is physically present (Kanizsa 1979; Kogo & van Ee,
this volume). Certainly this demonstration passes the phenomenological test for EFs. Remaining
to be resolved is whether the subjective triangle is a unique EF in its own right or whether it results
merely from conventional (non-Gestalt) integration of more primitive EFs; e.g., subjective lines
could emerge from the collinear contours of the Pac-man figures, but the appearance of a whole
triangle from three such emergent lines might not be a proper Gestalt.

Similarity and proximity as special EFs


Two well-known Gestalt principles, grouping by similarity and by proximity, merit further
discussion. Similarity is excluded from this chapter because it often refers to a psychological concept
of how confusable or equivalent two stimuli appear to be rather than to the physical concept of
objective feature overlap or equivalence. The existence of metamers and of multistable stimuli
forms a double dissociation between perceptual and physical similarity that may help clarify this
distinction. Also, the term similarity can be overly broad; proximity, for example, could be seen as
similarity of position; parallelism or collinearity could be viewed as similarity of orientation, etc.
The limiting case of similarity is physical identity. It’s true that the same-different distinction is
highly salient in vision, but it can be regarded as a form of symmetry, viz. translational symmetry
(see below on symmetry).
Above we present proximity as the first on our list of potential EFs in vision, and below we
present evidence confirming this possibility. We believe proximity may be a qualitatively different
property from the others in the sense that it appears to work in conjunction with, or to modulate
the effects of, other principles listed above (like parallelism and symmetry) rather than being a
grouping principle in its own right. For example, collinearity will be salient between two lines if
they are proximal, and thus they will group; but not if they are separated further. Proximity alone
doesn’t force grouping: attaching a door key to a coffee cup does not make them group into a
Emergent features and feature combination 97

single object despite the zero distance separating them. Unrelated objects piled together may form
a heap, but they usually will create no emergence or Gestalt.

A note on symmetry
Symmetry has been a pervasive property underlying Gestalt thinking from its inception (van der
Helm in press A, this volume). From its links with Prägnanz and the minimum principle (van
der Helm in press B, this volume) to its deep involvement with aesthetics, symmetry appears to be
more than just another potential EF in human perception. And well it might be, given the broad
meaning of symmetry in its formal sense in the physical and mathematical sciences. In the present
chapter, we focus on axial (mirror image) symmetry, but rotational symmetry may be considered
along with translational symmetry. Formally, symmetry refers to properties
that remain invariant under transformation, and so its preeminence in Gestalt theory may come
as no surprise. We could expand our list of potential EFs to include the same versus different
distinction as a form of translational symmetry. We have only begun to explore the full status of
symmetry, so defined, using the approaches described here.

Establishing and quantifying emergent features via configural superiority
With this long list of potential EFs in vision, how can we best determine which of them have psy-
chological reality for human perceivers? How can we tell that a Gestalt has emerged from parts, as
opposed to a structure perceived through conventional, attention-demanding feature integration?
A start would be finding wholes that are perceived more quickly than their parts. If people per-
ceive triangles or arrows before perceiving any of their component parts (e.g., three line segments
or their vertices), that suggests the whole shapes are Gestalts; otherwise it would be more prudent
to claim that triangles and arrows are assembled following the detection and integration of their
parts in a conventional feedforward manner.

Configural superiority, the odd quadrant task, and the superposition method
We start with the odd quadrant paradigm: Subjects are presented with displays like those shown
in Figure 5.1 to measure how quickly and accurately they can locate the odd quadrant.¹ No
recognition, identification, description, or naming is required. As noted, people are much faster and
more accurate at finding the arrow in a field of triangles in Panel b than at finding the negative
diagonal in a field of positive diagonals in Panel a. The diagonal’s orientation is the only element
differentiating the arrow from the triangle, so it follows that ‘arrowness vs. triangularity’ must not
be perceived following perception of the diagonals’ orientations. Instead, this whole apparently
registers before the parts, thus displaying configural superiority.
The simplicity of this superposition method—overlaying a context upon a base discrimi-
nation—and its applicability to almost any stimuli are what make it attractive. Returning
to Figure  5.3, we see several base and composite stimuli that have been tested using the odd
quadrant task. The discriminative information in each base is the same as in its matching com-
posite displays: We start with a fixed Base odd quadrant display and place one of the two base

¹ Although we typically use four-quadrant stimuli for convenience, there is nothing special about having four stimuli or about arranging them into a square. In some experiments we use three in a straight line or eight in a circle.
98 Pomerantz and Cragin

stimuli into one quadrant and the other into the remaining three quadrants. We then create the
Composite display by superimposing an identical context element in each of the four quadrants
of the Base. Any context can be tested. In the absence of EFs, the context should act as noise and
make performance worse in the composite. The logic behind this superposition method fol-
lows from the eponymous superposition principle common to physics, engineering, and systems
theory.
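The construction just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the encoding of stimuli as sets of named parts and the response times passed to cse() are hypothetical, chosen only to make the logic concrete.

```python
# Hypothetical sketch of the odd quadrant display and the superposition method.

def make_odd_quadrant_display(odd, distractor, odd_position=0, n=4):
    """Return n quadrant stimuli, each a set of parts, with one odd quadrant."""
    display = [set(distractor) for _ in range(n)]
    display[odd_position] = set(odd)
    return display

def superimpose(display, context):
    """Composite display: the identical context is added to every quadrant."""
    return [quadrant | set(context) for quadrant in display]

# Base discrimination: a negative vs. positive diagonal.
base = make_odd_quadrant_display(odd={"neg_diagonal"}, distractor={"pos_diagonal"})

# Superimposing an identical L (vertical + horizontal) in each quadrant turns
# the diagonals into arrows vs. triangles, while the discriminative signal
# (the one differing diagonal) is unchanged.
composite = superimpose(base, context={"vertical", "horizontal"})
assert sum(q != composite[-1] for q in composite) == 1  # still exactly one odd quadrant

def cse(rt_base_ms, rt_composite_ms):
    """Configural superiority effect: positive when the composite is faster."""
    return rt_base_ms - rt_composite_ms

print(cse(1900, 750))  # hypothetical RTs -> a CSE of 1150 ms
```

In the absence of EFs the superimposed context can only add noise, so any positive difference returned by cse() is evidence that the context created an emergent feature.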
Again, the composite is far superior to the base with the arrow and triangle displays in Figure
5.1, indicating a configural superiority effect (CSE). But it remains unclear which EF is responsible
for this CSE—it could involve any combination of closure, terminator count, or intersection type
because arrows differ from triangles in all three whereas positive diagonals differ from negatives
on none of them. As Panel c shows, shifting the position of the superimposed Ls eliminates all
three potential EFs and eliminates the CSE as well. Panels d and e show another CSE using base
stimuli varying in direction of curvature rather than in orientation. Here again, discriminating
pairs of curves such as (( and () is easier than discriminating single curves, a result that could be
due to any combination of parallelism, symmetry, or implied closure, all of which emerge in the
composite panel. Panel f shows that rotating the context curve eliminates both the EF differences
and the CSE, indicating that it is not just any inter-curve relationship from which a CSE arises but
rather only special ones giving rise to EFs.

Confirmation of proximity, orientation, and linearity as EFs


Figure 5.3 shows a large number of base and composite stimuli, each of which suggests some
potential EF or EF combination that has been evaluated using this criterion of CSEs (Pomerantz
and Portillo 2011). A future goal will be disentangling these CSEs to show what EFs appear with
the simplest stimuli. For now, with the dots in Panel a, observers are faster to find the quadrant
containing dot pairs differing in proximity than to find the single dot oddly placed in its quadrant,
even though that odd placement is solely responsible for the proximity difference. Stated differ-
ently, viewers can tell the distance between the dots better than the positions of the individual
dots, implying that proximity is computed before, not after, determination of the dots’ individual
positions. This in turn indicates that proximity is an EF in its own right, a Gestalt of the most
elementary sort, emerging as it does from just two dots.
The next row in Panel a shows that viewers can similarly tell the orientation or angular differ-
ence between two dots better than the position of either dot. Again, this indicates that orienta-
tion is not derived from those positions but is registered directly as an EF. Subsequent panels
of three-dot patterns similarly show CSEs where the EFs at work appear to be symmetry and
linearity.
The sets in Figure 5.3 Panel b show CSEs for selected EF candidates from two-line stimuli
(Stupina [Cragin] 2010), which allow for additional EF candidates beyond those possible with
just dots. The number of configurations possible from two line segments varying in position and
orientation is huge, but Cragin sampled that stimulus space using the odd quadrant paradigm. Her
results confirmed several candidate EFs working in combination: parallelism, collinearity, con-
nectivity, and others shown in Figure 5.3 Panel b. For example, people are faster to discriminate
parallel line pairs from non-parallel than they are to discriminate a single line of one orientation
from lines of another orientation even though that orientation difference is all that makes the par-
allel pair differ from the non-parallel pair. Stated differently, people apparently know whether two
lines are parallel before they know the orientation of either. This again is a CSE, and it indicates
confirmation of parallelism as an EF.

Although these results confirm EFs arising with two-line stimuli, they do not provide inde-
pendent confirmation for each individual EF because EFs often co-occur, making it hard to isolate
and test them individually. Just as the arrow-triangle (three-line) example showed a confounded
co-occurrence of closure, terminator count, and intersection type, it can be challenging to sepa-
rate individual EFs even with two-line stimuli. For example it is difficult to isolate the feature of
intersection without engaging the feature of connectivity, because lines must be connected to
intersect (albeit not vice versa). Stupina ([Cragin] 2010) has shown that our ability to discriminate
two-line configurations in the odd quadrant task can be predicted well from their aggregate EF
differences. As noted below, however, further work is needed to find independent confirmation
of some of these EF candidates. For now, it is clear there are multiple, potent EFs lurking within
these stimuli.
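The aggregate-EF-difference predictor can be sketched as a simple count. The feature inventory and values below are illustrative only, not the coding scheme Stupina (Cragin) actually used; they merely show how a confounded case like the arrow/triangle pair yields a large aggregate difference.

```python
# Hedged sketch: count how many candidate EFs differ between two configurations;
# larger aggregate differences predict easier odd-quadrant discrimination.

def ef_difference(config_a, config_b):
    """Number of candidate EFs on which two configurations differ."""
    return sum(config_a[ef] != config_b[ef] for ef in config_a)

# The confounded arrow/triangle case: three EFs change at once (values illustrative).
triangle = {"closure": True,  "terminator_count": 0, "intersection_type": "vertex"}
arrow    = {"closure": False, "terminator_count": 3, "intersection_type": "crossing"}

print(ef_difference(triangle, arrow))  # -> 3: a large aggregate EF difference
```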
Panel c of Figure 5.3 shows additional EFs involving a number of topological features (which
often yield very large CSEs), depth cues (Enns 1990), Kanizsa figures, and faces. Yet more can-
not be displayed readily in print because they involve stereoscopic depth, motion, or flicker. To
date, no experiments using the measurements described above have found clear EFs appearing in
cartoon faces or in words, but future work may change that with such stimuli, which seem to have
Gestalt properties.

Converging operations from Garner and Stroop interference


If configural superiority as measured by the odd quadrant task is a good method for detecting EFs,
it is still only a single method. Converging operations (Garner et al. 1956) may help separate EFs
from the particular method used to detect them. Another converging measure is selective atten-
tion as measured by Garner Interference (GI), the interference observed in speeded classification
tasks from variation on a stimulus dimension not relevant to the subject’s task (Garner 1974).
When subjects discriminate an arrow from a triangle differing from it only in the orientation of
its diagonal, they are slower and less accurate if the position of the superimposed L context also var-
ies, even though logically that variation is irrelevant to their task. This interference from irrelevant
variation is called GI, and it indicates subjects are attending to the L even though it is not required.
This in turn suggests the diagonals and Ls are grouping into whole arrows and triangles, and that it
is those wholes, or the EFs they contain, that capture Ss’ attention. Similarly, if subjects discriminate
rapidly between (( and (), logically they need attend only to the right-hand member of each pair. But
if the left-hand member varies from trial to trial, such that they should make one response to either
(( or )( and another response to () or )), they become much slower and more error-prone than when
the left element remains fixed. This indicates again that Ss are attending to both members of the
pair, suggesting the two curves grouped into a single stimulus and Ss were attending to the whole or
EF. If the irrelevant parenthesis is rotated 90 degrees so that no identifiable EFs arise, GI disappears.
Cragin et al. (2012) examined various configurations formed from line segments and found broad
agreement between the CSE and GI measures of grouping, with the latter also being well predicted
by the number of EFs distinguishing the stimuli to be discriminated. These results agree with the
CSE data and so converge on the idea that both CSE and GI reveal the existence of EFs.
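GI is scored as a standard difference between conditions; the sketch below is not code from the chapter, and the response times are hypothetical, but it makes the measure explicit: mean RT when the irrelevant dimension varies from trial to trial (filtering) minus mean RT when it is held fixed (baseline).

```python
# Hedged sketch of Garner interference (GI) scoring with hypothetical RTs.

from statistics import mean

def garner_interference(baseline_rts_ms, filtering_rts_ms):
    """Positive GI means irrelevant variation slowed classification,
    i.e., selective attention to the relevant part failed."""
    return mean(filtering_rts_ms) - mean(baseline_rts_ms)

# Classifying the right-hand curve of a pair while the left-hand curve is
# fixed (baseline) or varies from trial to trial (filtering):
baseline = [520, 540, 530, 510]
filtering = [600, 620, 610, 590]
print(garner_interference(baseline, filtering))  # -> 80
```

A GI of zero would indicate perfect selective attention to the relevant element; the positive value here is the signature of grouping.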
If GI converges well with CSEs, will Stroop Interference (SI) converge as well? Unlike GI, which
taps interference from variation between trials on an irrelevant dimension, SI taps interference
from the content of the irrelevant dimension on any one trial. In classifying pairs of curves such as ((
or () from )( or )), will subjects be faster on the pairs (( and )) because their two curved elements are
congruent, but slower on pairs () and )( where the curves are incongruent, curving in opposite
directions? That too might indicate that the curves had grouped and either both were processed or
neither was. In general, however, little or no SI arises with these stimuli or with most other stimuli
that are known to yield GI (see Pomerantz et al. 1994 for dozens of examples).²
Why might this contradiction exist between GI and SI, two standard methods for assessing
selective attention? In brief, GI occurs for the reason given above: the two elements group, and
Ss attend to the EFs arising between the elements, EFs that necessarily span the irrelevant parts.
However with SI, the same grouping of the elements precludes interference: for any two elements
to conflict or be congruent, there must of course be two elements. If the two elements group into
one unit, there are no longer two elements and thus no longer an opportunity for the two to be
congruent or incongruent. Perceivers are looking at EFs, not elements.
There is an alternative explanation for the lack of SI when parts group. The two elements in the
stimulus (( may seem congruent in that they both curve to the left; but when considered as a whole,
the left element is convex and the right is concave. Thus the two agree in direction of curvature but
disagree in convexity. The conclusion: when Gestalts form, the nature of the coding may change
radically, and a measure like SI that presumes separate coding of elements is no longer appropriate.
In sum, GI provides a strong converging operation for confirming EFs, but SI does not.

Converging operations from redundancy gains and losses


Stimuli can often be discriminated from one another more quickly if they differ redundantly in
two or more dimensions. Thus red versus green traffic lights are made more discriminable by
making them different in their position as well as color; coins are made more discriminable by
differing in diameter, color, thickness, etc. When two configurations are made to differ in multiple
parts rather than just one, do they too become more discriminable? Not necessarily; sometimes
the opposite happens.
Consider a square in Figure 5.4 whose width is increased significantly to create a rectangle. If
that rectangle is increased in height, this may not create even greater discriminability from the
original because the shape goes back to being a square, albeit a larger one. Or consider the triangle
in the lower part that is made into an arrow by changing the orientation of its diagonal. If that
arrow is then changed by moving its vertical from the left to the right side of the figure, will the
result be even more different from the original triangle? No, we will have returned to another tri-
angle, which—while different in orientation from the original triangle—is harder to discriminate
from the original than was the arrow. The conclusion is that just as the arrow and triangle stimuli
show CSEs and GI, they also show ‘redundancy losses’, a third converging operation that taps into
EFs: by changing the diagonal and then the vertical of a triangle, the EFs end up unchanged.

Theory of Basic Gestalts, EF hierarchies, and the Ground-Up Constant Signal Method
Disentangling multiple potential EFs remains a challenge because it is difficult or impossible to
alter any aspect of a form without inadvertently altering others; for example, altering the perim-
eter of a form generally alters its area. As a result, we face the challenge of confounded potential

² Exceptions to this generalization may occur when EFs happen to be correlated with congruent vs. incongruent pairs, e.g. with the four-stimulus set ‘((, (), )(, ))’, congruent stimuli such as (( contain the EF of parallelism but lack symmetry about the vertical axis whereas incongruous stimuli like () contain symmetry but lack parallelism. This set yields Garner but no Stroop. With the stimulus set ‘| |, | |, | |, | |’, however, congruent stimuli such as | | contain symmetry and parallelism whereas incongruous stimuli such as | | lack either. This set yields both Garner and Stroop. The key factor determining whether Stroop arises is the mapping of salient EFs onto responses; configurations by themselves yield no Stroop.


Fig. 5.4  Two progressions in which an original form A is modified in one way to create a different
form B, but a second modification results in a form C that is more similar to the original than is B.

(Figure labels: proximity, linearity, position, orientation, symmetry, surroundedness; length/proximity, collinearity, closure, inside/outside, intersections, terminators, parallelism, inflection points.)
Fig. 5.5  Ground-Up Constant Signal Method for revealing hierarchies of EFs. Top row shows how novel
features emerge as additional dots are added to a stimulus, while the bottom row shows the same for
line segments.
Adapted from James R. Pomerantz and Mary C. Portillo, Grouping and emergent features in vision: Toward
a theory of basic Gestalts, Journal of Experimental Psychology: Human Perception and Performance, 37 (5)
pp. 1331–1349, DOI: 10.1037/a0024330 © 2011, American Psychological Association.

EFs. The Theory of Basic Gestalts (Pomerantz and Portillo 2011) addresses this challenge by com-
bining the Ground-Up Method for constructing configurations from the simplest possible ele-
ments in Figure 5.5 with a Constant Signal Method that minimizes these confounds by adding
context elements incrementally to a fixed base discrimination. This allows EFs to reveal their
presence through new CSEs in the composites.
Figure 5.6 Panel a shows a baseline odd quadrant display containing one dot per quadrant, with
one quadrant’s dot placed differently than in the other three quadrants. In Panel b, a single, identi-
cally located dot is added to each quadrant, which nonetheless makes locating the odd quadrant
much faster. This is a CSE demonstrating the EF of proximity (Pomerantz and Portillo 2011). In
Panel c, another identically located dot is added again to make a total of three per quadrant, and
again we see a CSE in yet faster performance in Panel c than in the baseline Panel a. This second


Fig. 5.6  Building EFs with the Ground-Up Constant Signal method. Panel (a) shows the base signal,
with the upper left quadrant having its dot at the lower left, versus the lower right in the other three
quadrants. Panel (b) adds a first, identical context dot to each quadrant in the upper right, yielding
a composite containing an EF of the orientation between the two dots now in each quadrant, a
diagonal versus vertical angle. Panel (c) adds an identical, third context dot to each quadrant, near
to the center, yielding a composite containing an EF of linearity versus nonlinearity/triangularity.
Speed and accuracy of detecting the odd quadrant improves significantly from Panel (a) to (b) to (c),
although the signal being discriminated remains the same.

CSE could be taken as confirmation of the EF of linearity, in that it is so easy to find the linear
triplet of dots in a field of nonlinear (triangular) configurations. But first we must rule out the
possibility that the CSE in Panel c relative to Panel a is merely the result of the already-demonstrated
EF of proximity in Panel b. Dot triplets do indeed contain the potential EF of linearity vs. triangularity
but they also contain EFs of proximity and/or orientation arising from their component dot pairs,
so the task is to tease these apart.
The first key to dissociating these two is that the identical stimulus difference between the odd
and the remaining three quadrants exists in Panel c just as it does in Panels a and b of Figure 5.6. This
is the unique contribution of the Ground-Up Constant Signal Method: the signal that Ss must
detect remains the same as new context elements are added. The second key is that Panel c shows
a CSE not only with respect to Panel a but also with respect to Panel b. This indicates that the third
dot does indeed create a new EF over and above the EF that already had emerged in Panel b. That
in turn supports linearity’s being an EF in its own right, over and above proximity. It shows how
EFs may exist in a hierarchy, with higher-order EFs like linearity arising in stimuli that contain
more elements.
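The inference pattern of the Constant Signal logic can be sketched compactly. The mean RTs below are hypothetical placeholders, not data from the study; the point is only the comparison structure: each further speedup over the previous composite signals a new, higher-order EF.

```python
# Sketch of the Constant Signal inference with hypothetical mean RTs:
# the same base signal is embedded in progressively richer contexts.

rt_ms = {"a_one_dot": 900, "b_two_dots": 700, "c_three_dots": 600}

def new_ef(rt_richer_context, rt_simpler_context):
    """True when adding context produced a further CSE, i.e., a new EF."""
    return rt_richer_context < rt_simpler_context

# Panel b vs. a: proximity (or orientation) emerges from the second dot.
assert new_ef(rt_ms["b_two_dots"], rt_ms["a_one_dot"])
# Panel c vs. b: linearity emerges over and above proximity.
assert new_ef(rt_ms["c_three_dots"], rt_ms["b_two_dots"])
```

Because the signal to be detected never changes, the speedups at each step are measured on a common scale, which is what licenses the hierarchy claim.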
Pomerantz and Portillo (2011) used this Ground-Up Constant Signal method to demonstrate
that linearity is its own EF with dot triplets whether the underlying signal contained a proximity
or orientation difference with dot pairs. They also showed that the EF of proximity is essentially
identical in salience to the EF of orientation in that the two show comparably sized CSEs com-
pared with the same base stimulus with just one dot per quadrant. Over the past 100 years, it
has been difficult to compare the strengths of different Gestalt principles of grouping because of
‘apples vs. oranges’ comparisons, but because the Ground-Up Constant Signal Method measures
the two on a common scale, their magnitudes may be compared directly and fairly.
To date this method has confirmed that the three most basic or elemental EFs in human vision
are proximity, orientation, and linearity. They are most basic in the sense that they emerge from
the simplest possible stimuli and that their EFs do not appear to be reducible to anything more
elemental (i.e., the CSE for linearity occurs over and above the CSEs for the proximity or orienta-
tion EFs it necessarily contains). Axial symmetry has yielded mixed results; further tests will be

needed to determine whether it is or is not a confirmed EF. The results for surroundedness have
been somewhat less ambiguous: it does not appear to be an EF, although the evidence is not totally
conclusive (Portillo 2009).
Work is ongoing to test additional potential EFs using the same Ground-Up, Constant Signal
Method to ensure fair comparisons and to isolate the unique contribution made by each EF indi-
vidually, given that they often co-occur. As a lead up to that, Stupina ([Cragin] 2010) has explored
several regions of two-line stimulus space using this method, and she has found up to 8 EFs there.

Strengths and limitations of the method


The primary strengths of the Ground-Up Constant Signal Method are allowing an objective meas-
urement of EF (grouping) strength; ensuring this strength can be compared fairly across different
EFs on the same scale of measurement; and ensuring that the EFs it detects cannot be reduced to
more elementary EFs.
The method has limitations, however. It is almost certainly an overly conservative method that
is more likely to miss genuine EFs than to issue false positives. This is because as context ele-
ments are added to the base signal discrimination—added dots or line segments—deleterious
consequences will accumulate, thus making it harder for a CSE to appear. Besides allowing EFs
to arise, the superimposed context elements could mask or crowd the targets (Levi 2008), mak-
ing performance worse. Moreover, because the added context elements are always identical, they
should dilute the dissimilarity of the target to the distracters (Tversky 1977). Adding context ele-
ments also increases the chances that perceivers will attend to the irrelevant and non-informative
contexts rather than to the target signal, and it increases the overall informational load—the total
stimulus ensemble—that must be processed. When CSEs are detected, they occur in spite of these
five factors, not because of them. And with the Ground-Up Constant Signal Method where new
context elements are piled on top of old, it becomes less and less likely that any benefit from
new EFs would suffice to overcome the resulting mountain of negatives. For this reason, efforts
are underway to measure the adverse effects of these five factors separately and to correct our
CSE measurements for them. If this effort succeeds, more CSEs—and thus EFs—may become
apparent.

Other types of emergent features


This review has focused on EFs underlying classic Gestalt demonstrations that have received wide
attention over the last 100 years since their introduction. All of them so far have been in the visual
domain, but EFs likely abound in other modalities. There are other likely EFs in vision too that are
not normally associated with Gestalt phenomena but might as well be.

Color as a Gestalt
Color is usually treated as a property of the stimulus and in fact makes the list of ‘basic features’
underlying human vision (Wolfe and Horowitz 2004). However, color is not a physical feature but
rather a psychological one; wavelength is the corresponding physical feature, and color originates
‘in the head’, from interactions of units that are sensitive to wavelength. Color certainly meets the
criterion of a non-linear, surprising property emerging when wavelengths are mixed: combining
wavelengths seen as red and green on a computer monitor to yield yellow is surely an unexpected
outcome (Pomerantz 2006)! What is more, even color fails to qualify as a basic feature in human

vision, because it is color contrast to which we are most sensitive; colors in a Ganzfeld fade alto-
gether. Moving (non-stabilized) edges providing contrast are required for us to see color.
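The red-plus-green-yields-yellow example is additive mixing on an RGB display; a minimal sketch, with channels clipped to the conventional 0-255 range, shows the non-obvious outcome.

```python
# Additive mixing of two lights by summing RGB channels, clipped at 255.

def additive_mix(c1, c2):
    """Superimpose two lights; each channel sums, saturating at 255."""
    return tuple(min(a + b, 255) for a, b in zip(c1, c2))

red, green = (255, 0, 0), (0, 255, 0)
print(additive_mix(red, green))  # -> (255, 255, 0): yellow, present in neither light
```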

EFs in other sensory modalities


Potential EFs arise in modalities other than vision, possibly in all modalities. In audition, when
two tones of similar but not identical frequency are sounded together, one hears beats or differ-
ence tones, which are so salient that musicians use them to tune their instruments. With other
frequency relationships, one may experience chords if the notes are separated harmonically; low-
ering one of the three tones in a triad of a major chord by a semitone can convert it into a minor
chord that, phenomenally, leads to a vastly different percept. Whether this major-minor distinc-
tion qualifies as an EF by the CSE criterion advanced here remains to be determined; that would
require the major-minor difference to be more salient than the frequency difference separating
the two tones that make a chord sound major versus minor. Other potential EFs with simple tone
combinations might involve dissonance and the octave relationship.
Gestalt grouping arises in the haptic senses, as has been recently demonstrated (Overvliet et al.
2012), suggesting that EFs may be found in that modality. Potential EFs may abound in the chemical
senses as well; after all, a chef’s final creation is clearly different from the mere sum of its
ingredients. Human tasters are notoriously poor at identifying the ingredients in foods, as the
long-held secret of Coca Cola’s formula attests. This suggests that what people perceive through
smell and taste are relational properties that emerge when specific combinations of odorants or
tastants are combined. Future research may identify configural properties in our chemical senses
that lead to superiority effects; if so, this should identify the core EFs that guide our perception of
taste and odors.

Hyper-emergent features?
If novel features can emerge from combinations of more elementary, ‘basic’ features, then can novel
features arise from combinations of EFs too, creating something we may call hyper-emergent fea-
tures? Given that our ultimate goal is to understand how we perceive complex objects and scenes,
these may play an essential role there.

Conclusions
This chapter aims to define EFs, explaining how they are identified and quantified, and enumerat-
ing those that have been confirmed to date. The Gestalt psychologists struggled to define grouping,
likening it variously to a belongingness or to a glue binding parts together, and advancing ambiguous
claims such as, ‘A strong form coheres and resists disintegration by analysis into parts or by
fusion with another form’ (Boring 1942). Working from the Theory of Basic Gestalts (Pomerantz
and Portillo 2011), we view grouping neither as a coherence, as a glue or a belongingness, nor as
a loss of independence when two items form a single perceptual unit. Instead we see grouping as
the creation of novel and salient features—EFs—to which perceivers can and do preferentially
attend. When we view an isolated stimulus such as a dot, we can roughly determine its x and y
coordinates in space, but we are much better at determining the distance and angle between two
dots than we are at determining the position of either dot. This superiority of configurations, even
simple ones, is the defining feature of EFs, and we have uncovered over one dozen that meet this
criterion. The goal of future work is to explore additional EFs meeting this criterion and to ensure

that these new EFs are detectable through other, converging operations such as those derived from
selective attention tasks.

Unresolved issues and challenges


One current challenge to this method is that it may be, and probably is, overly conservative, and
so is more likely to miss a genuine EF than to identify a false positive, as
noted above. Determining a correction for this is an immediate challenge.
A second challenge will be to develop neural and computational models to explain configural
superiority. When perceivers view a triangle, we have a fairly clear idea how its three component
line segments may be detected by the simple and complex cells discovered decades ago by Hubel
and Wiesel (1962). We know less well how a feature such as closure is processed; not only do we
not know how the closure of three lines is detected but also how that occurs more quickly than the
orientation of its three component line segments is detected. A major advance on this problem
was made recently by Kubilius et al. (2011), showing that brain area LOC is best able to tell arrows
from triangles but that V1 is best able to distinguish line orientations. But how is it that people
can respond more quickly to the arrows and triangles, if those are processed in LOC, than they can
respond to oriented line segments that can be processed in V1? A possible explanation is that V1
can detect but cannot compare line orientations; LOC handles the latter, but more slowly with line
segments than with whole arrows and triangles.

References
Biederman, I. (1987). ‘Recognition-by-components: A theory of human image understanding’. Psychological
Review 94, 2: 115–47.
Boring, E. G. (1942). Sensation and Perception in the History of Experimental Psychology.
(New York: Appleton-Century-Crofts).
Chen, L. (2005). ‘The topological approach to perceptual organization’. Visual Cognition 12: 553–637.
Cragin, A. I., Hahn, A. C., and Pomerantz, J. R. (2012). ‘Emergent features predict grouping in search and
classification tasks’. Talk presented at the 2012 Annual Meeting of the Vision Sciences Society, Naples,
FL, USA. In: Journal of Vision 12(9): article 431. doi:10.1167/12.9.431.
Duncker, K. (1929). Über induzierte Bewegung. Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung. [On induced motion. A contribution to the theory of visually perceived motion].
Psychologische Forschung 12: 180–259.
Enns, J. T. (1990). ‘Three dimensional features that pop out in visual search’. In Visual Search, edited by
D. Brogan, pp. 37–45 (London: Taylor and Francis).
Feldman, J. (in press). ‘Bayesian models of perceptual organization’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Garner, W. R. (1974). The Processing of Information and Structure. (Potomac, MD: Erlbaum).
Garner, W. R., Hake, H. W., and Eriksen, C. W. (1956). ‘Operationism and the concept of perception’.
Psychological Review 63, 3: 149–56.
Hubel, D. H. and Wiesel, T. N. (1962). ‘Receptive fields, binocular interaction and functional architecture in
the cat’s visual cortex’. Journal of Physiology 160: 106–54.
Julesz, B. (1971). Foundations of Cyclopean Perception. Chicago: The University of Chicago Press.
Julesz, B. (1981). ‘Textons, the elements of texture perception, and their interaction’. Nature 290 (March 12,
1981): 91–7.
Kanizsa G. (1979). Organization in Vision: Essays on Gestalt Perception. (New York: Praeger Publishers).
106 Pomerantz and Cragin

Kellman, P. J. and Shipley, T. F. (1991). ‘A theory of visual interpolation in object perception’. Cognitive
Psychology, 23: 141–221.
Kogo, N. and van Ee, R. (in press). ‘Neural mechanisms of figure-ground organization: Border-ownership,
competition and perceptual switching’. In Oxford Handbook of Perceptual Organization, edited by J.
Wagemans. (Oxford: Oxford University Press).
Kubilius, J., Wagemans, J., and Op de Beeck, H. P. (2011). ‘Emergence of perceptual Gestalts in the human
visual cortex: The case of the configural superiority effect’. Psychological Science 22: 1296–303.
Kubovy, M. and Van Valkenburg, D. (2002). ‘Auditory and visual objects’. In Objects and Attention,
Scholl, B. J., pp. 97–126 (Cambridge, MA: MIT Press).
Levi, D. M. (2008). ‘Crowding—an essential bottleneck for object recognition: a mini-review’. Vision
Research 48 (5): 635–54.
Neisser, U. (1967). Cognitive Psychology. (New York: Appleton, Century, Crofts).
Overvliet, K. E., Krampe, R.T., and Wagemans, J. (2012). ‘Perceptual Grouping in Haptic Search: The
Influence of Proximity, Similarity, and Good Continuation’. Journal of Experimental Psychology: Human
Perception and Performance 38(4): 817–21.
Pomerantz, J. R. (2006). ‘Color as a Gestalt: Pop out with basic features and with conjunctions’. Visual
Cognition 14: 619–28.
Pomerantz, J. R. and Kubovy, M. (1986). ‘Theoretical approaches to perceptual organization’. In
Handbook of Perception and Human Performance, K. R. Boff, L. Kaufman, and J. Thomas, pp. 36–46.
(New York: John Wiley & Sons).
Pomerantz, J. R. and Portillo, M. C. (2011). ‘Grouping and emergent features in vision: Toward a theory of
basic Gestalts’. Journal of Experimental Psychology: Human Perception and Performance 37: 1331–49.
Pomerantz, J. R. and Portillo, M.C. (2012). ‘Emergent Features, Gestalts, and Feature Integration Theory’.
In Perception to Consciousness: Searching with Anne Treisman, edited by J. Wolfe and L. Robertson, pp.
187–92. (New York: Oxford University Press).
Pomerantz, J. R., Sager, L. C., and Stoever, R. J. (1977). ‘Perception of wholes and their component
parts: Some configural superiority effects’. Journal of Experimental Psychology: Human Perception and
Performance 3: 422–35.
Pomerantz, J. R., Carson, C. E., and Feldman, E. M. (1994). ‘Interference effects in perceptual organization’.
In Cognitive Approaches to Human Perception, edited by S. Ballesteros, pp. 123–52. (Hillsdale,
NJ: Lawrence Erlbaum Associates).
Portillo, M. C. (2009). Grouping and Search Efficiency in Emergent Features and Topological Properties in
Human Vision. Unpublished doctoral dissertation, Rice University, Houston, Texas, USA.
Ramachandran, V. S. (1988). ‘Perception of shape from shading’. Nature 331, 14: 163–66.
Rock, I. (1983). The Logic of Perception. (Cambridge, MA: MIT Press).
Stephan, A. (2003). ‘Emergence’. Encyclopedia of Cognitive Science. (London: Nature Publishing Group/
Macmillan Publishers).
Stupina, A.I. [now Cragin, A.I] (2010). Perceptual Organization in Vision: Emergent Features in Two-Line
Space. Unpublished master’s thesis, Rice University, Houston, Texas, USA.
Townsend, J. T. (1971) ‘A note on the identifiability of parallel and serial processes’. Perception and
Psychophysics 10: 161–3.
Treisman, A. and Gelade, G. (1980). ‘A feature integration theory of attention’. Cognitive Psychology
12: 97–136.
Treisman, A. and Gormican, S. (1988). ‘Feature analysis in early vision: evidence from search asymmetries’.
Psychological Review 95: 15–48.
Treisman, A. and Souther, J. (1985). ‘Search asymmetry: a diagnostic for preattentive processing of
separable features’. Journal of Experimental Psychology: General 114: 285–310.
Emergent features and feature combination 107

Tversky, A. (1977). ‘Features of similarity’. Psychological Review 84(4): 327–52.


Van der Helm, P. A. (in press a). ‘Symmetry perception’. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. (Oxford: Oxford University Press).
Van der Helm, P. A. (in press b). ‘Simplicity in perceptual organization’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt, R
(2012a). ‘A century of Gestalt psychology in visual perception I: Perceptual grouping and figure-Ground
organization’. Psychological Bulletin 138 (6): 1172–217.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R. Pomerantz, J. R., van der Helm, P., and van
Leeuwen (2012b). ‘A century of Gestalt psychology in visual perception II: Conceptual and theoretical
foundations’. Psychological Bulletin 138 (6): 1218–52.
Wertheimer, M. (1912). ‚Experimentelle Studien über das Sehen von Bewegung [Experimental studies on
seeing motion]‘. Zeitschrift für Psychologie 61: 161–265. Translated extract reprinted as ‘Experimental
studies on the seeing of motion’. In (1961). Classics in Psychology, edited by T. Shipley, pp. 1032–89
(New York: Philosophical Library).
Wolfe, J. M. and Horowitz, T.S. (2004). ‚What attributes guide the deployment of visual attention and how
do they do it?’ Nature Reviews: Neuroscience 5: pp. 1–7.
Chapter 6

Symmetry perception
Peter A. van der Helm

Introduction
Mirror symmetry (henceforth, symmetry) is a visual regularity that can be defined by configura-
tions in which one half is the mirror image of the other (see Figure 6.1a)—these halves then are
said to be separated by a symmetry axis.1 Albeit with fluctuating degrees of asymmetry, it is abun-
dantly present in the world. For instance, the genetic blueprint of nearly every organism implies
a symmetrical body—if the mirror plane is vertical, this conveniently yields gravitational stabil-
ity. Furthermore, many organisms tend to organize things in their environment such that they are
symmetrical—think of bird nests and human art and design (Hargittai 1986; Shubnikov and Koptsik
1974; Washburn and Crowe 1988; Weyl 1952; Wynn 2002; van Tonder and Vishwanath, this volume;
Koenderink, this volume). Presumably, for organisms with symmetrical bodies, symmetrical things
are practical to make and to work with (Allen 1879). Think also of the preference which many
organisms have for more symmetrical shapes over less symmetrical ones in mate selection and, by
pollinators, in flower selection (Møller 1992, 1995; Johnstone 1994; Swaddle and Cuthill 1993). This
preference presumably favors mates and flowers with high genetic quality (Møller 1990). Currently
relevant is that it also requires a considerable perceptual sensitivity to symmetry—which many spe-
cies of mammals, birds, fish, and insects indeed are known to have (Barlow and Reeves 1979; Beck
et al. 2005; Giurfa et al. 1996; Horridge 1996; see also Osorio and Cuthill, this volume).
In human perception research, detection of symmetry is in fact assumed to be an integral part of
the perceptual organization process that is applied to every incoming visual stimulus (Tyler 1996;
van der Helm and Leeuwenberg 1996; Wagemans 1997). This assumption has been related to the
idea that extraction of regularities like symmetry can be used to model the outcome of the percep-
tual organization process, because it would allow for efficient mental representations of patterns
(for more details about this idea and its potentially underlying neuro-cognitive mechanisms, see
van der Helm, this volume). It has also been related to the idea that the high perceptual sensitivity
to symmetry arose because the evolution of visual systems selected individual regularities on the
basis of their relevance in the world (Tyler 1996). It may, however, also have arisen because the evo-
lution selected a general regularity-detection mechanism with sufficient survival value (cf. Enquist
and Arak 1994). The latter option suggests a package deal: to survive, a visual system’s detection
mechanism may pick up irrelevant regularities as long as it also picks up relevant regularities.
The foregoing indicates that perceptual organization and evolutionary relevance provide an
appropriate context for an appreciation of symmetry perception. It also indicates that, to this end,

1  This definition reflects the common usage of the word symmetry. In mathematics, the word symmetry is
also used to refer to any configuration that remains invariant under certain transformations; this definition
is suited to classify visual regularities, but another definition is needed to model their perception (see Section
“The scope of formal models of symmetry detection”).
Symmetry perception 109

(a) (b)

(c) (d)

Fig. 6.1  Visual regularity. (a) A symmetry—left and right hand halves are mirror images of each
other. (b) A Glass pattern with coherently-oriented dot dipoles at random positions. (c) A repetition
with four identical subpatterns (the repeats). (d) Multiple symmetries with two and three global
symmetry axes, respectively.

it is expedient to consider symmetry in reference to other visual regularities (i.e., regularities to


which the visual system is sensitive; see Figure 6.1). These starting points reverberate in the next
evaluation of the presumed role of symmetry in perceptual organization, as well as in the subse-
quent review of research on symmetry perception. Notice that it would take too much space to
give a detailed account of this extensive research field in which empirical evidence is based on
many different experimental designs and stimuli. Evidence, however, is always evidence of some-
thing. Therefore, rather than elaborating on details of empirical studies (which readers may look
up using the given references), this review focuses on the conclusions that can be drawn from
them, to look for converging evidence for or against proposed ideas, theories, and models.

The role of symmetry in perceptual organization


Mach (1886) was surely not the first to notice that symmetry is visually salient, but he is to be
credited for his pioneering empirical work on the role of symmetry in visual perception. After
that, for instance, the Gestalt psychologists (Koffka 1935; Köhler 1920; Wertheimer 1912, 1923)
identified symmetry as a factor in perceptual grouping, and Bahnsen (1928) concluded that sym-
metry influences figure-ground segmentation. Such seminal work triggered, in the second half of
the 20th century, an enormous increase in the number of symmetry studies.
Other reasons for that increase were not only that symmetry was recognized as being relevant
in the world (see Section “Introduction”), but also that it is suited to study the mechanisms by
which the visual system picks up information from stimuli. Formal process models of symme-
try detection are discussed later on, but here, it is expedient to briefly address its neural basis.
In this respect, notice that grouping principles seem to be effective throughout the hierarchical
visual process (Palmer et  al. 2003), so that it may not be possible to assign a specific locus to
110 van der Helm

symmetry detection. Indeed, various neuro-scientific studies used symmetry patterns as stimuli,
but thus far, the data are too divergent to draw firm conclusions about locus and timing of sym-
metry detection in the brain. One thing that seems clear, however, is that the lateral occipital com-
plex (LOC) is prominently involved (Beh and Latimer 1997; Sasaki et al. 2005; Tyler and Baseler
1998; Tyler et al. 2005; van der Zwan et al. 1998). The LOC in fact seems a hub where different
perceptual-grouping tendencies interact, which agrees with ideas that it is a shape-selective area
associated with perceptual organization in general (Grill-Spector 2003; Malach et al. 1995; Treder
and van der Helm 2007). Hence, the neuro-scientific evidence may still be scanty, but all in all, it
adds to the above-mentioned idea that symmetry is relevant in perceptual organization.
In cognitive science, behavioral research into this idea yielded evidence that symmetry plays a
role in issues such as object recognition (Pashler 1990; Vetter and Poggio 1994), figure–ground
segregation (Driver et al. 1992; Leeuwenberg and Buffart 1984; Machilsen et al. 2009), and amodal
completion (Kanizsa 1985; van Lier et al. 1995). It further finds elaboration in structural descrip-
tion approaches, that is, formal models which—using some criterion—predict preferred stimu-
lus interpretations on the basis of view-independent specifications of the internal structure of
objects. Some of these approaches work with a-priori fixed perceptual primitives like the volu-
metric building blocks called geons (e.g., Biederman 1987; Binford 1981), which is convenient for
object recognition. Other approaches (e.g., Leeuwenberg 1968, 1969, 1971; Leeuwenberg and van
der Helm 2013) allow primitives to be assessed flexibly, that is, in line with the Gestaltist idea that
the whole determines what the perceived parts are. The latter is more plausible regarding object
perception (Kurbat 1994; Leeuwenberg et al. 1994; Palmer and Rock 1994), but in both cases,
symmetry is taken to be a crucial component of how perception imposes structure on stimuli. In
Leeuwenberg’s approach, for instance, symmetry is one of the regularities exploited to arrive at
simplest stimulus organizations in terms of objects arranged in space (van der Helm, this volume).
Furthermore, in Biederman’s approach, symmetry is taken to define geons because it is a so-called
nonaccidental property: if present in the proximal stimulus, it is also likely to be present in the
distal stimulus (see also Feldman, this volume).
However, the proximal features of symmetry vary with viewpoint, and this drives a wedge
between the perception of symmetry as such and its role in object perception (Schmidt and
Schmidt 2013; Wagemans 1993). That is, symmetry is effective as nonaccidental property only
when viewed orthofrontally—then, as discussed later on, it indeed has many extraordinary detect-
ability properties. Yet, in structural description approaches, it is taken to be effective as group-
ing factor also when viewed non-orthofrontally. This touches upon the more general problem of
viewpoint generalization: how does the visual system arrive at a view-independent representation
of a three-dimensional (3D) scene, starting from a two-dimensional (2D) view of this scene?
Viewpoint generalization has been proposed to involve normalization, that is, a mental rotation
yielding a canonical 2D view of a scene (e.g., Szlyk et al. 1995). This presupposes the generation of can-
didate 3D organizations which, subsequently, are normalized. However, Sawada et al. (2011) not only
showed that any pair of 2D curves is consistent with a 3D symmetry interpretation, but also argued that
it is implausible that every such pair is perceived as being symmetrical. View-dependent coincidences,
for instance, have a strong effect on how a scene is perceptually organized, and may prevent interpreta-
tions involving symmetry (van der Helm, this volume). Likewise, detection of symmetry viewed in
perspective or skewed (i.e., sheared plus rotated, yielding something close to perspective) seems to rely
on proximal features rather than on hypothesized distal features. That is, it deteriorates as its proximal
features are more perturbed (van der Vloed et al. 2005; Wagemans et al. 1991).
Also when viewed orthofrontally, the grouping strength of symmetry is elusive. Symmetry is
often thought to be a cue for the presence of a single object—as opposed to repetition which the
Gestaltists had identified as a grouping factor too (under the umbrella of similarity), but which
Symmetry perception 111

rather is a cue for the presence of multiple objects. However, it seems safer to say that symme-
try is better detectable when it forms one object than when the symmetry halves form separate
objects, and that repetition is less detectable when it forms one object than when the repeats
form separate objects. At least, this is what Corballis and Roldan (1974) found for dot patterns
in which grouping by proximity was responsible for the perceived objects. To tap more directly
into the grouping process, Treder and van der Helm (2007) used stereopsis to assign symmetry
halves and repeats to different perceived depth planes. The process of depth segregation is known
to take a few hundreds of milliseconds, and they found that it interacts hardly with repetition
detection but strongly with symmetry detection. This suggests that the segregation into separate
objects (i.e., the depth planes) agrees with the perceptual structure of repetition but not with
that of symmetry. In a similar vein, Morales and Pashler (2002) found that grouping by color
interferes with symmetry detection, in a way that suggests that individual colors are attended
one at a time.
The foregoing perhaps questions the grouping capability of symmetry, but above all, it shows
the relevance of interactions between different grouping factors. In any case, further investigation
is required to see if firmer conclusions can be drawn regarding the specific role of symmetry in the
build-up of perceptual organizations. Furthermore, notice that the foregoing hardly affects con-
siderations about the functionality of symmetry in the world—after all, this functionality takes
effect once symmetry has been established. It also stands apart from the extraordinary detectabil-
ity properties that are discussed next.

Modulating factors in symmetry detection


Whereas the foregoing sections discussed the context of research on symmetry perception, the
remainder of this chapter focuses on symmetry perception as such. The essence of detecting sym-
metry and other visual regularities in a stimulus is that correlations between stimulus parts are
to be assessed to establish if a stimulus exhibits some form of regularity. The central question
therefore is: which correlations between which parts are to be assessed, and how? This question is
addressed in the next sections by discussing various models and their accounts of observed phe-
nomena. Before that, this section addresses four of the most prominent general factors that can be
said to have a modulating effect on those correlations between parts, namely, absolute orientation,
eccentricity, jitter, and proximity.

Absolute orientation
The absolute orientation of symmetry axes is known to be relevant (for effects of the relative
orientation of symmetry axes, see Section “Representation models of symmetry detection”). The
effect usually found is that vertical symmetry (i.e., with a vertical axis) is more salient than hori-
zontal symmetry which, in turn, is more salient than oblique symmetry (see, e.g., Barlow and
Reeve 1979; Baylis and Driver 1994; Kahn and Foster 1986; Palmer and Hemenway 1978; Rock
and Leaman 1963). This usually found vertical-symmetry advantage has been attributed to the
neural architecture of the brain (Julesz 1971), but the evidence for that is not conclusive (Corballis
et al. 1971; Herbert and Humphrey 1996; Jenkins 1983). Furthermore, other studies did not find
this usual effect or found even an opposite effect (see, e.g., Corballis and Roldan 1975; Fisher and
Bornstein 1982; Jenkins 1983, 1985; Locher and Smets 1992; Pashler 1990; Wagemans et al. 1992).
In any case, notice that horizontal symmetry and vertical symmetry are not different regularities
but are the same regularities in different absolute orientations. Hence, it might well be that effects
of absolute orientation result from visuo-cognitive interactions (e.g., with the vestibular system)
rather than from purely visual processes (cf. Latimer et al. 1994; Wenderoth 1994).
112 van der Helm

Eccentricity
Detection of symmetry deteriorates as it is presented more eccentrically (Saarinen 1988), but if
scaled-up properly, it can maintain the same level of detectability (Tyler 1999). This scaling-up
compensates for the fact that eccentric receptive fields are sensitive to relatively large-scale infor-
mation, as opposed to foveal receptive fields which are sensitive to relatively small-scale informa-
tion. Hence, this is a general property of the visual system and not specific to symmetry which,
apparently, remains equally detectable across the visual field if this factor is taken into account
(see also Sally and Gurnsey 2001).

Jitter
Jitter refers to relatively small, dynamic, displacements of stimulus elements. Then, but also in
case of small static displacements, regularity detection depends on the visual system’s toler-
ance in matching potentially corresponding elements in symmetry halves or repeats. This toler-
ance too is a general property of the visual system and not specific to regularity detection. In
any case, Barlow and Reeves (1979) found that symmetry detection is quite resistant to jitter.
Furthermore, Dry (2008) proposed Voronoi tesselation as a scale-independent mechanism yield-
ing stimulus-dependent tolerance areas. Such a mechanism can, in any model, be adopted to
account for the visual system’s tolerance in matching elements.

Proximity
Proximity effects refer to the fact that stimulus elements that are closer to each other can be
matched more easily (this is not to be confused with the Gestalt law of proximity, which is not
about matching but about grouping). For instance, whereas detection of n-fold repetition (i.e., n
juxtaposed repeats) can only start to be successful by matching elements that are one repeat apart,
symmetry detection can already start to be successful by matching elements near the axis of sym-
metry. Jenkins (1982) in fact proposed that symmetry detection integrates information from only
a limited region about the axis of symmetry: his data suggested that this integration region (IR)
is a strip approximately 1 degree wide, irrespective of the size of the texture at the retina. Dakin
and Herbert (1998) specified this further: their data suggested that the IR has an aspect ratio of
about 2:1, and that its size scales with the spatial frequency content of the pattern. Thus, for homo-
geneous blob patterns for instance, the IR scales with blob size, so that it steadily covers a more or
less constant number of features.
Noticing this scale invariance, however, Rainville and Kingdom (2002) proposed that the size of
the IR is not determined by spatial frequency but by the spatial density of what they called ‘micro-
elements’: their data suggested that the IR covers about 18 such informational units regardless of
their spatial separation. This agrees with studies reporting that the detectability of symmetry does
not vary with the number of elements (i.e., no number effect) for symmetries with more than about
20 elements (e.g., Baylis and Driver 1994; Dakin and Watt 1994; Olivers et al. 2004; Tapiovaara
1990; Wenderoth 1996a). For symmetries with less than about 20 elements, however, these studies
reported opposite effects, and this hints at an explanation that takes into account that symmetry
detection is an integral part of perceptual organization, as follows (see also van der Helm, 2014).
For any stimulus—including symmetry stimuli—a symmetry percept is basically just one of the
possible outcomes of the perceptual organization process; it results only if it is stronger than other
percepts. It is true that a symmetry percept is bound to result for a really otherwise-random sym-
metry stimulus, but such stimuli are rare if not impossible. A symmetry structure with many sym-
metry pairs is usually strong enough to overcome spurious structures, but the smaller the number
Symmetry perception 113

of symmetry pairs is, the harder it is to construct a symmetry stimulus without spurious struc-
tures. This also implies that, in dense stimuli, such spurious structures are more prone to arise in
the area near the axis. In case of small numbers of symmetry pairs, such spurious structures may
have various effects on detection (see below), and in general, they may give the impression that
only the area near the axis is decisive.
In sum, it is true that proximity plays a role in symmetry perception, and the area near the sym-
metry axis is indeed relatively important. Notice, however, that Barlow and Reeves (1979) already
found that also symmetry information in the outer regions of stimuli is picked up quite effec-
tively (see also Tyler et al. 2005; van der Helm and Treder 2009; Wenderoth 1995). Furthermore,
even if symmetry processing would be restricted to a limited stimulus area, then this would not
yet specify which stimulus information in this area is processed, and how. The latter reflects the
fundamental question that formal models of symmetry detection focus on. That is, the factors
discussed here can of course be taken into account in model applications, but are usually not at
the heart of formal models. This is already an indication of their scope, which is discussed next.

The scope of formal models of symmetry detection


Existing formal models of symmetry detection can be divided roughly into representation models
and process models (these are also discussed separately in the next two sections). Whereas pro-
cess models rather focus on performance (how does the detection process proceed?), representa-
tion models rather focus on competence (what is the result?). In other words, whereas process
models rather focus on detection mechanisms, representation models rather focus on detectabil-
ity, or salience, in terms of the strength of symmetry percepts. Of course, eventually, this differ-
ence in scope should be overcome to obtain a unified account, and a possible unification direction
is discussed at the end of this chapter.
Furthermore, as a rule, formal models of symmetry detection start from ideas about the per-
ceptual structure of symmetry, that is, about the parts that are to be correlated somehow to assess
if symmetry is present in a stimulus. Models may differ fundamentally regarding these ideas (see
below), but these ideas usually imply that the models are applicable only to single and nested sym-
metries, possibly perturbed by noise. For instance, if an experimental task involves the detection
of a local symmetry among juxtaposed local symmetries, then humans perform about the same as
when this context were noise (either case is also called crowding, and in either case, symmetry is
known to not pop-out; Nucci and Wagemans 2007; Olivers et al. 2004; Olivers and van der Helm
1998; Roddy and Gurnsey 2011). Indeed, to a particular local symmetry, juxtaposed local sym-
metries actually constitute noise, and this is usually also how such situations are treated by formal
models of symmetry perception.
Moreover, many models are tailored specifically to symmetry (e.g., Chipman 1977; Dakin and
Watt 1994; Dry 2008; Masame 1986, 1987; Yodogawa 1982; Zimmer 1984). Ideally, however, a model
should be equally applicable to other visual regularities (i.e., repetition and Glass patterns; see Figure
6.1b,c). To this end, one might invoke considerations about visual regularity in general. In the 20th
century, this led first to the transformational approach, and later, to the holographic approach. Both
approaches propose a formal criterion for what visual regularity is, and they conclude to more or less
the same visual regularities. However, they rely on fundamentally different mathematical formaliza-
tions of regularity, and as a result, they assign different structures to those visual regularities. The
mathematical details are beyond the scope of this chapter, but the following gives a gist.
According to the transformational approach, visual regularities are configurations that remain
invariant under certain transformations (Palmer 1983). This idea of invariance under motion
114 van der Helm

relies on the same formalization as used in the classification of crystals and regular wall patterns
(Shubnikov and Koptsik 1974; Weyl 1952). It holds that symmetry and repetition are visual regu-
larities because they remain invariant under a 180° 3D rotation about the symmetry axis and a 2D
translation the size of one or more repeats, respectively. Because these transformations identify
entire symmetry halves or entire repeats with each other, they can be said to assign a block struc-
ture to both regularities (see Figure 6.2a).
However, its applicability is unclear for Glass patterns (which are as detectable as symmetry; see
below). Originally, Glass (1969) constructed the patterns named after him by superimposing two
copies of a random dot pattern—one slightly translated or rotated with respect to the other, for
instance. With the transformational approach in mind, this construction method suggests that the
resulting percept too is that of a whole consisting of two overlapping identical substructures (i.e.,
those two copies). This also seems to comply with a grouping over multiple views as needed in
case of binocular disparity and optic flow (Wagemans et al. 1993). However, the actually resulting
percept rather seems to require a framing in terms of relationships between randomly positioned
but coherently oriented dot dipoles (see Section “Representation models of symmetry detection”).
Furthermore, in original rotational Glass patterns, the dipole length increases with the distance
from the center of the pattern, but later, others consistently constructed rotational Glass patterns
by placing identical dot dipoles in coherent orientations at random positions (as in Figure 6.1b).
The two types of Glass patterns do not seem to differ in salience but, by the transformational

(a)

Block structures

(b)

Point structure Block structure Dipole structure


Fig. 6.2  (a) The transformational approach relies on invariance under motion; it assigns a block
structure to both symmetry (at the left) and repetition (in the middle), because entire symmetry
halves and entire repeats are the units that are identified with each other by the shown
transformations. (b) The holographic approach relies on invariance under growth; it assigns a
point structure to symmetry, a block structure to repetition, and a dipole structure to—here,
translational—Glass patterns (at the right), because symmetry pairs, repeats, and dipoles,
respectively, are the units by which these configurations can be expanded preserving the regularity
in them.
Symmetry perception 115

construction above, the latter type would be a perturbed regularity. Because transformational
invariance requires perfect regularity, however, the transformational approach has a problem with
perturbed regularity. A formal solution might be to cross-correlate corresponding parts, but in
symmetry for instance, a simple cross-correlation of the two symmetry halves does not seem to
agree with human performance (Barlow and Reeves 1979; Tapiovaara 1990).
This unclarity regarding Glass patterns adds to the fact the transformational approach does not
account for the key phenomenon—discussed later on in more detail—that symmetries and Glass
patterns are about equally detectable but generally better detectable than 2-fold repetitions (notice
that they all consist transformationally of the same number of corresponding parts; cf. Bruce and
Morgan 1975). Hence, the transformational approach may account for how visual regularities can
be classified, but not for how they are perceived preceding classification.
This drawback does not hold for the holographic approach (van der Helm and Leeuwenberg
1996, 1999, 2004). This approach is also based on a rigorous mathematical formalization of regu-
larity in general (van der Helm and Leeuwenberg 1991), but the difference is that it relies on invar-
iance under growth (which agrees with how mental representations can be built up). To give a
gist, according to this approach, symmetries, repetitions, and Glass patterns are visual regularities
because, preserving the regularity in them, they can be expanded stepwise by adding symmetry
pairs, repeats, and dot dipoles, respectively. This implies that these regularities can be said to be
assigned a point structure, a block structure, and a dipole structure, respectively (see Figure 6.2b).
Thereby, this mathematical formalization supports a structural differentiation that, as discussed
next, seems to underlie detectability differences between visual regularities (see also Attneave
1954; Bruce and Morgan 1975).

Representation models of symmetry detection


As indicated, representation models of symmetry perception focus on detectability, or salience,
in terms of the strength of symmetry percepts. As a rule, such models capitalize on the concept
of weight of evidence (MacKay 1969)—that is, they provide a measure of the weight of evidence
for the presence of symmetry in a stimulus. This typically implies that the somehow quantified
amount of symmetry information in a stimulus is normalized by the somehow quantified total
amount of information in the stimulus. Thereby, such a measure is a metric of the strength of the
symmetry percept, and can be applied to both perfect and perturbed symmetry. This also holds
for the holographic model which is based on considerations about visual regularity in general but
which, for symmetry, is usually not outperformed by models tailored specifically to symmetry.
Therefore, here, this holographic model is taken as a robust representative. It is specified in terms
of multi-element stimuli (like the dot stimuli in Figure 6.2), but notice that such stimuli allow for
straightforward generalizations to other stimulus types.
Next, the predictive power of this holographic model is evaluated for perfect symmetry (in
comparison to repetition and Glass patterns), perturbed symmetry (also in comparison to repeti-
tion and Glass patterns, and focusing on cases of noise added to a perfect regularity), and multiple
or n-fold symmetry (i.e., patterns with n global symmetry axes)—all viewed orthofrontally (some
examples are given in Figure 6.1). To this end, various detectability phenomena are considered,
some of which are put in an evolutionary perspective.

Perfect symmetry
In the holographic model, the support for the presence of a regularity is quantified by the number
of nonredundant relationships (E) between stimulus parts that, according to this model, constitute
a regularity. Thus, for symmetry E equals the number of symmetry pairs; for repetition E equals
the number of repeats minus one; and for Glass patterns E equals the number of dot dipoles minus
one. Furthermore, the total amount of information in a stimulus is given by the total number
of elements in the stimulus (n), so that the holographic weight-of-evidence metric (W) for the
detectability of a regularity is: W = E/n.
A perfect symmetry on n elements is constituted by E=n/2 symmetry pairs, so that it gets W=0.5
no matter the total number of elements—hence, symmetry is predicted to show no number effect,
which agrees with empirical reports (e.g., Baylis and Driver 1994; Dakin and Watt 1994; Olivers et al.
2004; Tapiovaara 1990; Wenderoth 1996a; see also Section “Modulating factors in symmetry detec-
tion”). Furthermore, E=n/2–1 for a Glass pattern on n elements, so that, for large n, it is predicted
to show more or less the same detectability as symmetry—empirical support for this is discussed in
the next subsection. For an m-fold repetition on n elements, however, E=m-1, so that its detectabil-
ity is predicted to depend strongly on the number of elements per repeat—hence, a number effect,
which found empirical support (Csathó et al. 2003). In particular, 2-fold repetition is predicted to
be generally less detectable than symmetry—which also found empirical support (Baylis and Driver
1994, 1995; Bruce and Morgan 1975; Csathó et al. 2003; Corballis and Roldan 1974; Zimmer 1984).
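As a sketch, the bookkeeping behind these predictions can be written out directly (the function name and interface are mine; the E-counts are those stated above):

```python
def holographic_W(regularity, n, m=2):
    """Holographic weight of evidence W = E/n for a perfect regularity
    on n elements; m is the number of repeats for repetition."""
    if regularity == "symmetry":
        E = n // 2        # one relationship per symmetry pair
    elif regularity == "glass":
        E = n // 2 - 1    # number of dot dipoles minus one
    elif regularity == "repetition":
        E = m - 1         # number of repeats minus one
    else:
        raise ValueError(f"unknown regularity: {regularity}")
    return E / n

# Symmetry: W = 0.5 regardless of n -- no number effect.
print(holographic_W("symmetry", 20), holographic_W("symmetry", 200))      # 0.5 0.5
# Glass patterns approach W = 0.5 for large n.
print(holographic_W("glass", 200))                                        # 0.495
# 2-fold repetition: W shrinks as n grows -- a strong number effect.
print(holographic_W("repetition", 20), holographic_W("repetition", 200))  # 0.05 0.005
</imports>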
Hence, the foregoing shows that holographic weight of evidence accounts for the key phenom-
enon that symmetry and Glass patterns are about equally detectable but generally better detect-
able than repetition. This differentiation holds not only for perfect regularities, but as discussed
next, also for perturbed regularities.

Perturbed symmetry
A perfect regularity can be perturbed in many ways, and there are of course limits to the detect-
ability of the remaining regularity. Relevant in this respect is that the percept of an imperfect
regularity results from the perceptual organization process applied to the stimulus. This means
that the percept generally cannot be said to be some original perfect regularity plus some per-
turbation. For instance, if a perfect repetition is perturbed by randomly added noise elements
(which is the form of perturbation considered here), then there may be some remaining repeti-
tiveness depending on the location of the noise. In general, however, repetition seems quite easily
destroyed perceptually—some evidence for this can be found in Rappaport (1957) and in van der
Helm and Leeuwenberg (2004).
Symmetry and Glass patterns, however, are quite resistant to noise, and this is fairly independ-
ent of the location of the noise (e.g., Barlow and Reeves 1979; Maloney et al. 1987; Masame 1986,
1987; Nucci and Wagemans 2007; Olivers and van der Helm 1998; Troscianko 1987; Wenderoth
1995). In fact, both symmetry and Glass patterns exhibit graceful degradation, that is, their detect-
ability decreases gradually with increasing noise proportion (i.e., the proportion of noise elements
relative to the total number of stimulus elements). Their behavior is explicated next in more detail.
By fitting empirical data, Maloney et al. (1987) found that the detectability (d’) of Glass patterns
in the presence of noise follows the psychophysical law
d′ = g / (2 + N/R)

with R the number of dot dipoles that constitute the regularity; N the number of added noise
elements; and g an empirically determined proportionality constant that depends on stimulus
type and that enables more detailed data fits than rank orders. Maloney et al. (1987) arrived at
this on the basis of considerations from signal detection theory, and the holographic model pre-
dicts the same law on the basis of structural considerations. In the holographic model, W=E/n is
proposed to be proportional to the detectability of regularity, and for Glass patterns in the pres-
ence of noise, it implies n=2R+N and E=R-1 or, for large R, approximately E=R. Substitution in
W=E/n then yields the psychophysical law above.
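The substitution is a one-line identity that can be checked numerically (a sketch; the function names are illustrative):

```python
def dprime_law(R, N, g=1.0):
    """Maloney et al.'s (1987) law: d' = g / (2 + N/R)."""
    return g / (2.0 + N / R)

def holographic_W_noisy(R, N):
    """W = E/n with n = 2R + N elements and E ~ R relationships (large R)."""
    return R / (2.0 * R + N)

# For g = 1 the two expressions are algebraically identical:
# R / (2R + N) = 1 / (2 + N/R).
for R, N in [(50, 10), (100, 300), (20, 0)]:
    assert abs(dprime_law(R, N) - holographic_W_noisy(R, N)) < 1e-12
print("substitution checks out")
```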
The holographic model also predicts this psychophysical law for symmetry (with R equal to the
number of symmetry pairs), and it indeed yields a near perfect fit on Barlow and Reeves’ (1979)
symmetry data (van der Helm 2010). In the middle range of noise proportions, this fit is as good
as that for the Weber-Fechner law (Fechner 1860; Weber 1834) if, in the latter, the regularity-to-
noise ratio R/N is taken as signal (cf. Zanker 1995). In both outer ranges, it is even better because,
unlike the Weber-Fechner law, it accounts for floor and ceiling effects. This means that, in both
outer ranges of noise proportions, the sensitivity to variations in R/N is disproportionally lower
than in the middle range, so that disproportionally larger changes in R/N are needed to achieve
the same change in the strength of the percept (which is also supported by Tjan and Liu (2005),
who used morphing to perturb symmetries).
Interestingly, this account of perturbed symmetry also predicts both symmetry and asymme-
try effects, that is, apparent overestimations and underestimations of the symmetry in a stim-
ulus when compared triadically to slightly more and slightly less symmetrical stimuli (Freyd
and Tversky 1984). These effects are context dependent, and the psychophysical law above sug-
gests that they are due not to incorrect estimations of symmetry but to correct estimations of
symmetry-to-noise ratios. For more details on this, see Csathó et al. (2004), but notice that these
effects are evolutionarily relevant for both prey and predators. As discussed in van der Helm and
Leeuwenberg (1996), overestimation by oneself may occur in the case of partly occluded oppo-
nents, for instance, and is helpful to detect them. Furthermore, underestimation by opponents
may occur if oneself is camouflaged, for instance, and is helpful to avoid being detected. The
occurrence of such opposite effects is consistent with the earlier-mentioned idea of a package deal
in the evolutionary selection of a general regularity-detection mechanism. This idea is supported
further by the above-established fact that symmetry and Glass patterns exhibit the same detect-
ability properties, even though symmetry clearly has more evolutionary relevance. A further hint
at such a package deal is discussed at the end of the next subsection.

Multiple symmetry
Regularities can also occur in nested combinations, and in general, additional local regularities in
a global regularity enhance the detectability of this global regularity (e.g., Nucci and Wagemans
2007). To account for this, the holographic model invokes Leeuwenberg’s (1968) structural
description approach, which specifies constraints for hierarchical combinations of global and local
regularities in descriptive codes (which are much like computer programs that produce things by
specifying the internal structure of those things). As a rule, this implies that a compatible local
regularity is one that occurs within a symmetry half of a global symmetry or within a repeat of a
global repetition. The general idea then is that the just-mentioned enhancement occurs only in
case of such combinations. More specifically, however, it implies that local regularity in symmetry
halves adds only once to the detectability of the symmetry, and that local regularity in the repeats
of an m-fold repetition adds m times to the detectability of the repetition (van der Helm and
Leeuwenberg 1996). In other words, repetition is predicted to benefit more from compatible local
regularities than symmetry does—as supported by Corballis and Roldan (1974).
A special case of nested regularities is given by multiple symmetry (see Figure 6.1d). According
to the transformational approach, the detectability of multiple symmetry is predicted to increase
monotonically as a function of the number of symmetry axes—which seems to agree with empirical
data (e.g., Palmer and Hemenway 1978; Wagemans et al. 1991). Notice, however, that these studies
considered 1-fold, 2-fold, and 4-fold symmetries, but not 3-fold symmetries which seem to be odd
ones out: they tend to be less detectable than 2-fold symmetries (Wenderoth and Welsh 1998).
According to the holographic approach, hierarchical-compatibility constraints indeed imply
that 3-fold symmetries—and, likewise, 5-fold symmetries—are not as detectable as might be
expected on the basis of the number of symmetry axes alone. For instance, in a 2-fold symme-
try, each global symmetry half is itself a 1-fold symmetry which, in a descriptive code, can be
described as being nested in that global symmetry half. In 3-fold symmetry, however, each global
symmetry half exhibits two overlapping 1-fold symmetries, and because they overlap, only one
of them can be described as being nested in that global symmetry half. In other words, those
hierarchical-compatibility constraints imply that all symmetry can be captured in 2-fold symme-
tries but not in 3-fold symmetries—and, likewise, in 4-fold symmetries but not in 5-fold symmetries.
This suggests not only that 3-fold and 5-fold symmetries can be said to contain perceptually hidden
regularity—which may increase their aesthetic appeal (cf. Boselie and Leeuwenberg 1985)—but
also that they are less detectable than 2-fold and 4-fold symmetries, respectively.
A study by Treder et al. (2011) into imperfect 2-fold symmetries composed of two superim-
posed perfect 1-fold symmetries (which allows for variation in their relative orientation) showed
that the relative orientation of symmetry axes can indeed have this effect. That is, though equal
in all other respects and controlling for absolute orientation, orthogonal symmetries (as in 2-fold
symmetry) were found to be better detectable than non-orthogonal ones (as in 3-fold symmetry).
This suggests that the constituent single symmetries in a multiple symmetry first are detected
separately and then engage in an orientation-dependent interaction. Notice that this would be a
fine example of the Gestalt motto that the whole is something else than the sum of its parts.
Evolutionarily interesting, 3-fold and 5-fold symmetries are overrepresented in flowers (Heywood
1993). Furthermore, in human designs, they are virtually absent in decorative motifs (Hardonk
1999) but not in mystical motifs (think of triquetas and pentagrams; Forstner 1961; Labat 1988).
This might well be due to a subconsciously attributed special status to them—caused by their
special perceptual status. In flowers, this may have given them a procreation advantage (Giurfa
et al. 1999). In this respect, notice that insect vision evolved 200–275 million years earlier than
flowering plants (Sun et al. 2011), so that such a perceptual effect may have influenced the
distribution of flowers from the start. Furthermore, throughout human history, the special perceptual
status of 3-fold and 5-fold symmetries may have made humans feel that they are more appropriate
for mystical motifs than for decorative motifs (van der Helm 2011). Such considerations are of
course more speculative than those based on psychophysical data, but they do suggest a plausible
two-way interaction between vision and the world: the world determines if a visual system as a
whole has sufficient evolutionary survival value, but subsequently, visual systems also influence
how the world is shaped (see also van der Helm, this volume).

Process models of symmetry detection


To account for the process of symmetry detection, various spatial filtering models have been pro-
posed (e.g., Dakin and Hess 1997; Dakin and Watt 1994; Gurnsey et al. 1998; Kovesi 1997, 1999;
Osorio 1996; Poirier and Wilson 2010; Rainville and Kingdom 2000; Scognamillo et al. 2003).
Whereas representation models usually rely on fairly precise correlations between stimulus ele-
ments to establish symmetry, spatial filtering models usually rely on fairly crude correlations. For
a review, see Treder (2010), but to give an example, Dakin and Watt (1994) proposed a two-stage
model: first, an image is spatially filtered yielding a number of blobs, and then a blob alignment
procedure is applied to measure how well the centroids of the blobs align along a putative symme-
try axis. In the brain, something like spatial filtering occurs in the lateral geniculate nucleus, that
is, before symmetry perception takes place. It is more than just a modulating factor, however. In
Dakin and Watt’s (1994) model, for instance, the chosen spatial filtering scale in fact determines
the elements that are correlated to establish symmetry in a stimulus.
The latter can be exemplified further by considering anti-symmetry, that is, symmetry in which
otherwise perfectly corresponding elements have opposite properties in some dimension. For
instance, in stimuli consisting of monochromatic surfaces, angles may be convex in one contour
but concave in the corresponding contour (this can also be used to define anti-repetition in such
stimuli; Csathó et al. 2003). Such corresponding contours have opposite contrast signs, and detec-
tion seems possible only post-perceptually (van der Helm and Treder 2009). This also holds, in
otherwise symmetrical checkerboard stimuli, for corresponding squares with opposite contrasts
(Mancini et al. 2005). In both cases, contrast interacts with other grouping factors (grouping by
color in particular). It can, however, also be considered in isolation, namely, in dot patterns in
which symmetrically positioned dots can have opposite contrast polarities with respect to the
background (this can also be used to define anti-repetition and anti-Glass patterns in such stim-
uli). This does not seem to have much effect on symmetry detection (Saarinen and Levi 2000;
Tyler and Hardage 1996; Wenderoth 1996b; Zhang and Gerbino 1992). Representational models
cannot account for that, because they rely on precise correspondences. In contrast, there are spa-
tial filters (and maybe neural analogs) that filter out positional information only, thereby cance-
ling the difference between symmetry and antisymmetry in such stimuli (Mancini et al. 2005).
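The polarity-cancelling effect of such filters can be illustrated in a toy signed-dot image (an illustration only, not any specific model; full-wave rectification stands in for a filter that discards contrast polarity):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 16x16 signed image: +1 and -1 dots on a zero background, placed
# mirror-symmetrically about the vertical midline but with opposite
# contrast polarity (anti-symmetry).
img = np.zeros((16, 16))
rows = rng.choice(16, size=6, replace=False)
cols = rng.choice(8, size=6, replace=False)   # left half only
img[rows, cols] = 1.0        # left half: positive contrast
img[rows, 15 - cols] = -1.0  # mirrored positions: negative contrast

# The signed image is not mirror-symmetric (the polarities clash)...
assert not np.array_equal(img, img[:, ::-1])
# ...but discarding polarity (full-wave rectification, a crude stand-in
# for a filter that keeps only positional information) restores exact
# mirror symmetry.
rect = np.abs(img)
assert np.array_equal(rect, rect[:, ::-1])
print("anti-symmetry becomes symmetry after rectification")
```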
In Glass patterns, spatial filtering may also be responsible for identifying the constituent dot
dipoles which, after all, may blur into coherently oriented blobs at coarser scales. A potential
problem here, however, is that this might not work for Glass patterns in the presence of noise given by
randomly added single dots. For instance, in Maloney et al.’s (1987) experiment, each dipole dot had
6–10 noise dots closer to it than its dipole partner. Further research is needed to assess how spatial filtering
might agree with the psychophysical law discussed in Section “Representation models of symmetry
detection”, which is based on precise correspondences and holds for Glass patterns and symmetry.
The foregoing indicates a tension between process models that rely on fairly crude spatial filter-
ing and representation models that rely on fairly precise correlations between stimulus elements.
Neither type of model alone seems able to account for all aspects of symmetry detection. Yet, uni-
fication might be possible starting from Dakin and Watt’s (1994) conclusion that their human data
match the performance of a fairly fine-scale filter. This empirical finding suggests that symmetry
does not benefit from the presence of relatively large blobs. As elaborated in the remainder of this
section, such an effect is in fact predicted by a process model that allows for effects of spatial filtering
even though it relies on fairly precise structural relationships between elements (van der Helm and
Leeuwenberg 1999). This model fits in the holographic approach discussed above, but it also builds
on processing ideas by Jenkins (1983, 1985) and Wagemans et al. (1993). In this respect, it is a nice
example of a stepwise development of ideas—each previous step as important as the next one.

Bootstrapping
Jenkins (1983, 1985) subjected symmetry and repetition to various experimental manipulations
(e.g., jitter), to investigate what the properties are that characterize these regularities perceptually.
He concluded that symmetry and repetition are characterized by properties of what he called vir-
tual lines between corresponding elements. That is, for orthofrontally viewed perfect regularities,
symmetry is characterized by parallel orientation and midpoint colinearity of virtual lines between
corresponding elements in symmetry halves. Likewise, repetition is characterized by parallel
orientation and constant length of virtual lines between corresponding elements in repeats. Thus, both
symmetry and repetition can be said to have a point structure, that is, a structure in which each
element constitutes one substructure. Notice that this idea suggests a detection mechanism which
connects virtual lines to assess regularity in a stimulus (see Figure 6.3ab, top panels).
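For a perfect orthofrontal symmetry, both virtual-line properties are easy to verify in coordinates (a toy check; the pattern is arbitrary and the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# One symmetry half: 8 random dots right of the vertical axis x = 0.
half = rng.uniform([0.1, -1.0], [1.0, 1.0], size=(8, 2))
mirrored = half * np.array([-1.0, 1.0])   # reflect about x = 0

# Virtual lines connect each dot to its mirror partner.
vectors = mirrored - half                 # direction of each virtual line
midpoints = (half + mirrored) / 2.0       # midpoint of each virtual line

# Parallel orientation: every virtual line is horizontal (dy = 0)...
assert np.allclose(vectors[:, 1], 0.0)
# ...and midpoint collinearity: every midpoint lies on the axis (x = 0).
assert np.allclose(midpoints[:, 0], 0.0)
print("parallel orientation and midpoint collinearity hold")
```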
Virtual lines between corresponding points are indeed plausible anchors for a detection mecha-
nism, but this idea seems to be missing something. This was made clear by Wagemans et al. (1991)
who found that the detectability of symmetry in skewed symmetry is hampered, even though
skewing preserves the parallel orientation and midpoint colinearity of virtual lines. Wagemans
et  al. (1993) therefore proposed that the actual detection anchors of symmetry and repetition
(and, likewise, of Glass patterns) are given by virtual trapezoids and virtual parallelograms,
respectively (see Figure 6.3ab, top and middle panels). Notice that skewing is an appropriate
manipulation to assess this for symmetry (because it perturbs the virtual trapezoids), but not for repetition
(because a skewed perfect repetition is still a perfect repetition). Nevertheless, van der Vloed et al.’s
(2005) study on symmetry and repetition in perspective supports the idea that such correlation


Fig. 6.3  (a) Symmetry is characterized by parallel orientation and midpoint colinearity of virtual lines
(indicated in bold in top panel) between corresponding elements in symmetry halves; two such
virtual lines can be combined to form a virtual trapezoid (middle panel), from which detection can
propagate in an exponential fashion (bottom panel). (b) In the original bootstrap model, the same
applies to repetition, which is characterized by parallel orientation and constant length of virtual
lines between corresponding elements in repeats. (c) In the holographic bootstrap model, repetition
involves an intermediate stepwise grouping of elements into blocks, which implies that detection
propagates in a linear fashion.
quadrangles are indeed the detection anchors for both regularities. The detection process can then
be modeled as exploiting these anchors in a bootstrap procedure which starts from correlation
quadrangles to search for additional correlation quadrangles in order to build a representation of
a complete regularity (Wagemans et al. 1993; see Figure 6.3ab, middle and bottom panels).
This bootstrap idea is indeed plausible, but it still seems to be missing something else. That is,
just as Jenkins' idea, it is not sustained by a mathematical formalism (cf. Bruce and Morgan 1975),
and, just as the transformational approach, it does not yet explain detectability differences
between symmetry and repetition. To the latter end, one might resort to modulating factors—in
particular, to proximity. As discussed in Section “Modulating factors in symmetry detection”, such
factors do play a role, but as discussed next, those detectability differences can also be explained
without resorting to such factors.

Holographic bootstrapping
In a reaction to Wagemans (1999) and consistent with the holographic approach, van der Helm
and Leeuwenberg (1999) proposed that symmetry is indeed detected as proposed by Wagemans
et  al. (1993) but that repetition detection involves an additional step. That is, according to the
holographic approach, symmetry pairs are indeed the constituents of symmetry, but repeats—
rather than single element pairs—are the constituents of repetition. This suggests that repetition
detection involves an intermediate step, namely, the grouping of elements into blocks that, even-
tually, correspond to complete repeats (see Figure 6.3c).
This holographic procedure implies that symmetry detection propagates exponentially, but that
repetition detection propagates linearly. For Glass patterns, in which it takes the dot dipoles as
constituents, it also implies that detection propagates exponentially. Thus, it again accounts for
the key phenomenon that symmetry and Glass patterns are about equally detectable but better
detectable than repetition. In addition, it predicts the following.
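The contrast between the two propagation regimes can be caricatured with a step counter (a deliberately crude sketch, not the bootstrap procedure itself; the doubling rule is my simplification of "exponential"):

```python
def steps_to_cover(n_units, propagation):
    """Steps needed to incorporate n_units constituents into the detected
    structure, if each step doubles it (exponential propagation, as for
    symmetry pairs and Glass dipoles) or adds one unit (linear
    propagation, as for repetition blocks)."""
    steps, detected = 0, 1
    while detected < n_units:
        detected = detected * 2 if propagation == "exponential" else detected + 1
        steps += 1
    return steps

print(steps_to_cover(64, "exponential"))  # 6
print(steps_to_cover(64, "linear"))       # 63
```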
Suppose that, for some odd reason, a restricted part of a stimulus is processed before the rest
of the stimulus is processed. Then, exponentially propagating symmetry detection is hampered,
whereas linearly propagating repetition detection is not or hardly hampered (see Figure 6.4). By
way of analogy, one may think of a slow car for which it hardly matters whether there is
much traffic on the road, versus a fast car for which it matters a lot. Such a split-stimulus situation
seems to occur if the restricted part contains relatively large and therefore salient blobs. Such blobs
can plausibly be assumed to be processed first, namely, due to the spatial filtering difference, in the
lateral geniculate nucleus, between the magnocellular pathway (which mediates relatively coarse
structures relatively fast) and the parvocellular pathway (which mediates relatively fine structures
relatively slow). Hence, then, the holographic bootstrap model predicts that symmetry detec-
tion is hampered by such blobs. Furthermore, due to the number effect in repetition (see Section
“Representation models of symmetry detection”), repetition detection is actually predicted to
benefit from such blobs. Both predictions were confirmed empirically by Csathó et  al. (2003).
They are also relevant to the evolutionary biology discussion on whether symmetry or size—of
sexual ornaments and other morphological traits—is the more relevant factor in mate selection
(e.g., Breuker and Brakefield 2002; Goddard and Lawes 2000; Morris 1998). That is, a global sym-
metry may be salient as such but its salience is reduced by salient local traits.

Conclusion
Visual symmetry will probably remain an inexhaustible topic in many research domains. It is
instrumental in ordering processes that counter natural tendencies towards chaos. Thereby, it is

Fig. 6.4  Holographic bootstrapping in case of split stimuli, for symmetry (top) and repetition
(bottom). Going from left to right, suppose that, at a first stage, only the grey areas in the stimuli are
available to the regularity detection process. Then, at first, the propagation proceeds as usual (the
structure detected so far is indicated by black dots). The restriction to the grey areas, however, stops
the exponentially spreading propagation in symmetry sooner than the linearly spreading propagation
in repetition—hence symmetry is hindered more by the split situation than repetition is. When, later,
the rest of the stimulus becomes available, the propagation again proceeds as usual and symmetry
restores its advantage over repetition.

probably also the most important regularity in the interaction between vision and the world. In
vision, there is still unclarity about its exact role in perceptual organization (which depends on
interactions between various grouping factors), but its detectability is extraordinary. The percep-
tual sensitivity to symmetry seems part of an evolutionary package deal, that is, evolution seems
to have yielded a detection mechanism that includes a lower sensitivity to repetition (which is
also evolutionarily less relevant) but an equally high sensitivity to Glass patterns (even though
these are evolutionarily even less relevant). Therefore, rather than focusing on the relevance of
individual regularities in the external world, it seems expedient to focus on internal percep-
tual mechanisms to explain these sensitivities in a unified fashion. As discussed on the basis
of empirical evidence, these mechanisms seem to rely not only on fairly precise correlations
between stimulus elements, but also on spatial filtering to establish what the to-be-correlated
elements might be.

Acknowledgment
Preparation of this chapter was supported by Methusalem grant METH/08/02 awarded to Johan
Wagemans (www.gestaltrevision.be).

References
Allen, G. (1879). ‘The origin of the sense of symmetry’. Mind 4: 301–316.
Attneave, F. (1954). ‘Some informational aspects of visual perception’. Psychological Review 61: 183–193.
Bahnsen, P. (1928). ‘Eine untersuchung über symmetrie und asymmetrie bei visuellen wahrnehmungen’.
Zeitschrift für Psychologie 108: 355–361.
Barlow, H. B., and B. C. Reeves (1979). ‘The versatility and absolute efficiency of detecting mirror
symmetry in random dot displays’. Vision Research 19: 783–793.
Baylis, G. C., and J. Driver (1994). ‘Parallel computation of symmetry but not repetition within single
visual shapes’. Visual Cognition 1: 377–400.
Baylis, G. C., and J. Driver (1995). ‘Obligatory edge assignment in vision: The role of figure and part
segmentation in symmetry detection’. Journal of Experimental Psychology: Human Perception and
Performance 21: 1323–1342.
Beck, D. M., M. A. Pinsk, and S. Kastner (2005). ‘Symmetry perception in humans and macaques’. Trends
in Cognitive Sciences 9: 405–406.
Beh, H. C., and C. R. Latimer (1997). ‘Symmetry detection and orientation perception: Electrocortical
responses to stimuli with real and implicit axes of orientation’. Australian Journal of Psychology
49: 128–133.
Biederman, I. (1987). ‘Recognition-by-components: A theory of human image understanding’. Psychological
Review 94: 115–147.
Binford, T. (1981). ‘Inferring surfaces from images’. Artificial Intelligence 17: 205–244.
Boselie, F., and E. L. J. Leeuwenberg (1985). ‘Birkhoff revisited: Beauty as a function of effect and means’.
American Journal of Psychology 98: 1–39.
Breuker, C. J., and P. M. Brakefield (2002). ‘Female choice depends on size but not symmetry of dorsal
eyespots in the butterfly Bicyclus anynana’. Proceedings of the Royal Society of London B 269: 1233–1239.
Bruce, V. G., and M. J. Morgan (1975). ‘Violations of symmetry and repetition in visual patterns’. Perception
4: 239–249.
Chipman, S. F. (1977). ‘Complexity and structure in visual patterns’. Journal of Experimental
Psychology: General 106: 269–301.
Corballis, M. C., and C. E. Roldan (1974). ‘On the perception of symmetrical and repeated patterns’.
Perception and Psychophysics 16: 136–142.
Corballis, M. C., and C. E. Roldan (1975). ‘Detection of symmetry as a function of angular orientation’.
Journal of Experimental Psychology: Human Perception and Performance 1: 221–230.
Corballis, M. C., G. A. Miller, and M. J. Morgan (1971). ‘The role of left-right orientation in
interhemispheric matching of visual information’. Perception and Psychophysics 10: 385–388.
Csathó, Á., G. van der Vloed, and P. A. van der Helm (2003). ‘Blobs strengthen repetition but weaken
symmetry’. Vision Research 43: 993–1007.
Csathó, Á., G. van der Vloed, and P. A. van der Helm (2004). ‘The force of symmetry
revisited: Symmetry-to-noise ratios regulate (a)symmetry effects’. Acta Psychologica 117: 233–250.
Dakin, S. C., and A. M. Herbert (1998). ‘The spatial region of integration for visual symmetry detection’.
Proceedings of the Royal Society London B 265: 659–664.
Dakin, S. C., and R. F. Hess (1997). ‘The spatial mechanisms mediating symmetry perception’. Vision
Research 37: 2915–2930.
Dakin, S. C., and R. J. Watt (1994). ‘Detection of bilateral symmetry using spatial filters’. Spatial Vision 8:
393–413.
Driver, J., G. C. Baylis, and R. D. Rafal (1992). ‘Preserved figure-ground segregation and symmetry
perception in visual neglect’. Nature 360: 73–75.
Dry, M. (2008). ‘Using relational structure to detect symmetry: A Voronoi tessellation based model of
symmetry perception’. Acta Psychologica 128: 75–90.
Enquist, M., and A. Arak (1994). ‘Symmetry, beauty and evolution’. Nature 372: 169–172.
Fechner, G. T. (1860). Elemente der Psychophysik. (Leipzig: Breitkopf und Härtel).
Feldman, J. (this volume). Probabilistic models of perceptual features. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Fisher, C. B., and M. H. Bornstein (1982). ‘Identification of symmetry: Effects of stimulus orientation and
head position’. Perception and Psychophysics 32: 443–448.
Forstner, D. (1961). Die Welt der Symbole [The world of symbols]. (Innsbruck: Tyriola Verlag).
Freyd, J., and B. Tversky (1984). ‘Force of symmetry in form perception’. American Journal of Psychology
97: 109–126.
Giurfa, M., B. Eichmann, and R. Menzel (1996). ‘Symmetry perception in an insect’. Nature 382: 458–461.
Giurfa, M., A. Dafni, and P. R. Neal (1999). ‘Floral symmetry and its role in plant-pollinator systems’.
International Journal of Plant Sciences 160: S41–S50.
Glass, L. (1969). ‘Moiré effect from random dots’. Nature 223: 578–580.
Goddard, K. W., and M. J. Lawes (2000). ‘Ornament size and symmetry: Is the tail a reliable signal of male
quality in the Red-collared Widowbird?’ The Auk 117: 366–372.
Grill-Spector, K. (2003). ‘The neural basis of object perception’. Current Opinion in Neurobiology 13: 159–166.
Gurnsey, R., A. M. Herbert, and J. Kenemy (1998). ‘Bilateral symmetry embedded in noise is detected
accurately only at fixation’. Vision Research 38: 3795–3803.
Hardonk, M. (1999). Cross-cultural universals of aesthetic appreciation in decorative band patterns.
Ph.D. thesis, Radboud University Nijmegen, The Netherlands.
Hargittai, I. (ed.) (1986). Symmetry: unifying human understanding. (New York: Pergamon).
Herbert, A. M., and G. K. Humphrey (1996). ‘Bilateral symmetry detection: testing a ‘callosal’ hypothesis’.
Perception 25: 463–480.
Heywood, V. H. (ed.) (1993). Flowering plants of the world. (London: Batsford).
Horridge, G. A. (1996). ‘The honeybee (Apis mellifera) detects bilateral symmetry and discriminates its
axis’. Journal of Insect Physiology 42: 755–764.
Jenkins, B. (1982). ‘Redundancy in the perception of bilateral symmetry in dot textures’. Perception and
Psychophysics 32: 171–177.
Jenkins, B. (1983). ‘Component processes in the perception of bilaterally symmetric dot textures’. Perception
and Psychophysics 34: 433–440.
Jenkins, B. (1985). ‘Orientational anisotropy in the human visual system’. Perception and Psychophysics
37: 125–134.
Johnstone, R. A. (1994). ‘Female preferences for symmetrical males as a by-product of selection for mate
recognition’. Nature 372: 172–175.
Julesz, B. (1971). Foundations of Cyclopean Perception. (Chicago: University of Chicago Press).
Kahn, J. I., and D. H. Foster (1986). ‘Horizontal-vertical structure in the visual comparison of rigidly
transformed patterns’. Journal of Experimental Psychology: Human Perception and Performance 12: 422–433.
Kanizsa, G. (1985). ‘Seeing and thinking’. Acta Psychologica 59: 23–33.
Koenderink, J. (this volume). Gestalts as ecological templates. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Koffka, K. (1935). Principles of Gestalt psychology. (London: Routledge and Kegan Paul).
Köhler, W. (1920). Die physischen Gestalten in Ruhe und im stationären Zustand [Static and stationary
physical shapes]. (Braunschweig, Germany: Vieweg).
Kovesi, P. (1997). ‘Symmetry and asymmetry from local phase’. In Proceedings AI’97, Tenth Australian Joint
Conference on Artificial Intelligence, pp. 185–190.
Kovesi, P. (1999). ‘Image features from phase congruency’. Videre: A Journal of Computer Vision Research
1: 1–26.
Kurbat, M. A. (1994). ‘Structural description theories: Is RBC/JIM a general-purpose theory of human
entry-level object recognition?’ Perception 23: 1339–1368.
Labat, R. (1988). Manuel d’épigraphie akkadienne: signes, syllabaire, idéogrammes (6th ed.).
(Paris: Imprimerie Nationale).
Latimer, C. R., W. Joung, and C. Stevens (1994). ‘Modelling symmetry detection with back-propagation
networks’. Spatial Vision 8: 415–431.
Leeuwenberg, E. L. J. (1968). Structural information of visual patterns: an efficient coding system in
perception. (The Hague, Paris: Mouton and Co).
Leeuwenberg, E. L. J. (1969). ‘Quantitative specification of information in sequential patterns’. Psychological
Review 76: 216–220.
Leeuwenberg, E. L. J. (1971). ‘A perceptual coding language for visual and auditory patterns’. American
Journal of Psychology 84: 307–349.
Symmetry perception 125

Leeuwenberg, E. L. J., and H. F. J. M. Buffart (1984). ‘The perception of foreground and background as
derived from structural information theory’. Acta Psychologica 55: 249–272.
Leeuwenberg, E. L. J., and P. A. van der Helm (2013). Structural information theory: The simplicity of visual
form. (Cambridge, UK: Cambridge University Press).
Leeuwenberg, E. L. J., P. A. van der Helm, and R. J. van Lier (1994). ‘From geons to structure: A note on
object classification’. Perception 23: 505–515.
Locher, P., and G. Smets (1992). ‘The influence of stimulus dimensionality and viewing orientation on
detection of symmetry in dot patterns’. Bulletin of the Psychonomic Society 30: 43–46.
Mach, E. (1886). Beiträge zur Analyse der Empfindungen [Contributions to the analysis of sensations]. (Jena,
Germany: Gustav Fisher).
Machilsen, B., M. Pauwels, and J. Wagemans (2009). ‘The role of vertical mirror symmetry in visual shape
detection’. Journal of Vision 9: 1–11.
MacKay, D. (1969). Information, mechanism and meaning. (Boston: MIT Press).
Malach, R., J. B. Reppas, R. R. Benson, K. K. Kwong, H. Jiang, W. A. Kennedy, P. J. Ledden, T. J. Brady,
B. R. Rosen, and R. B. H. Tootell (1995). ‘Object-related activity revealed by functional magnetic
resonance imaging in human occipital cortex’. Proceedings of the National Academy of Sciences USA
92: 8135–8139.
Maloney, R. K., G. J. Mitchison, and H. B. Barlow (1987). ‘Limit to the detection of Glass patterns in the
presence of noise’. Journal of the Optical Society of America A 4: 2336–2341.
Mancini, S., S. L. Sally, and R. Gurnsey (2005). ‘Detection of symmetry and anti-symmetry’. Vision
Research 45: 2145–2160.
Masame, K. (1986). ‘Rating of symmetry as continuum’. Tohoku Psychologica Folia 45: 17–27.
Masame, K. (1987). ‘Judgment of degree of symmetry in block patterns’. Tohoku Psychologica Folia
46: 43–50.
Møller, A. P. (1990). ‘Fluctuating asymmetry in male sexual ornaments may reliably reveal male quality’.
Animal Behaviour 40: 1185–1187.
Møller, A. P. (1992). ‘Female swallow preference for symmetrical male sexual ornaments’. Nature
357: 238–240.
Møller, A. P. (1995). ‘Bumblebee preference for symmetrical flowers’. Proceedings of the National Academy of
Sciences USA 92: 2288–2292.
Morales, D., and H. Pashler (1999). ‘No role for colour in symmetry perception’. Nature 399: 115–116.
Morris, M. R. (1998). ‘Female preference for trait symmetry in addition to trait size in swordtail fish’.
Proceedings of the Royal Society of London B 265: 907–911.
Nucci, M., and J. Wagemans (2007). ‘Goodness of regularity in dot patterns: global symmetry, local
symmetry, and their interactions’. Perception 36: 1305–1319.
Olivers, C. N. L., and P. A. van der Helm (1998). ‘Symmetry and selective attention: A dissociation between
effortless perception and serial search’. Perception and Psychophysics 60: 1101–1116.
Olivers, C. N. L., N. Chater, and D. G. Watson (2004). ‘Holography does not account for
goodness: A critique of van der Helm and Leeuwenberg (1996)’. Psychological Review 111: 261–273.
Osorio, D. (1996). ‘Symmetry detection by categorization of spatial phase, a model’. Proceedings of the Royal
Society of London B 263: 105–110.
Osorio, D., and I. C. Cuthill (this volume). Camouflage and perceptual organization in the animal
kingdom. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Palmer, S. E. (1983). ‘The psychology of perceptual organization: A transformational approach’. In Human and
machine vision, edited by J. Beck, B. Hope, and A. Rosenfeld, pp. 269–339. New York: Academic Press.
Palmer, S. E., J. L. Brooks, and R. Nelson (2003). ‘When does grouping happen?’ Acta Psychologica
114: 311–330.
Palmer, S. E., and K. Hemenway (1978). ‘Orientation and symmetry: Effects of multiple, rotational, and
near symmetries’. Journal of Experimental Psychology: Human Perception and Performance 4: 691–702.
Palmer, S. E., and I. Rock (1994). ‘Rethinking perceptual organization: The role of uniform connectedness’.
Psychonomic Bulletin and Review 1: 29–55.
Pashler, H. (1990). ‘Coordinate frame for symmetry detection and object recognition’. Journal of
Experimental Psychology: Human Perception and Performance 16: 150–163.
Poirier, F. J. A. M. and H. R. Wilson (2010). ‘A biologically plausible model of human shape symmetry
perception’. Journal of Vision 10: 1–16.
Rainville, S. J. M., and F. A. A. Kingdom (2000). ‘The functional role of oriented spatial filters in the
perception of mirror symmetry-psychophysics and modeling’. Vision Research 40: 2621–2644.
Rainville, S. J. M., and F. A. A. Kingdom (2002). ‘Scale invariance is driven by stimulus density’. Vision
Research 42: 351–367.
Rappaport, M. (1957). ‘The role of redundancy in the discrimination of visual forms’. Journal of
Experimental Psychology 53: 3–10.
Rock, I., and R. Leaman (1963). ‘An experimental analysis of visual symmetry’. Acta Psychologica
21: 171–183.
Roddy, G., and R. Gurnsey (2011). ‘Mirror symmetry is subject to crowding’. Symmetry 3: 457–471.
Saarinen, J. (1988). ‘Detection of mirror symmetry in random dot patterns at different eccentricities’. Vision
Research 28: 755–759.
Saarinen, J., and D. M. Levi (2000). ‘Perception of mirror symmetry reveals long-range interactions
between orientation-selective cortical filters’. Neuroreport 11: 2133–2138.
Sally, S., and R. Gurnsey (2001). ‘Symmetry detection across the visual field’. Spatial Vision 14: 217–234.
Sasaki, Y., W. Vanduffel, T. Knutsen, C. Tyler, and R. B. H. Tootell (2005). ‘Symmetry activates extrastriate
visual cortex in human and nonhuman primates’. Proceedings of the National Academy of Sciences USA
102: 3159–3163.
Sawada, T., Y. Li, and Z. Pizlo (2011). ‘Any pair of 2D curves is consistent with a 3D symmetric
interpretation’. Symmetry 3: 365–388.
Schmidt, F., and T. Schmidt (2014). ‘Rapid processing of closure and viewpoint-invariant symmetry:
behavioral criteria for feedforward processing’. Psychological Research 78: 37–54.
Scognamillo, R., G. Rhodes, C. Morrone, and D. Burr (2003). ‘A feature-based model of symmetry
detection’. Proceedings of the Royal Society B: Biological Sciences 270: 1727–1733.
Shubnikov, A. V., and V. A. Koptsik (1974). Symmetry in science and art. (New York: Plenum).
Sun, G., D. L. Dilcher, H. Wang, and Z. Chen (2011). ‘A eudicot from the Early Cretaceous of China’.
Nature 471: 625–628.
Swaddle, J., and I. C. Cuthill (1993). ‘Preference for symmetric males by female zebra finches’. Nature
367: 165–166.
Szlyk, J. P., I. Rock, and C. B. Fisher (1995). ‘Level of processing in the perception of symmetrical forms
viewed from different angles’. Spatial Vision 9: 139–150.
Tapiovaara, M. (1990). ‘Ideal observer and absolute efficiency of detecting mirror symmetry in random
images’. Journal of the Optical Society of America A 7: 2245–2253.
Tjan, B. S., and Z. Liu (2005). ‘Symmetry impedes symmetry discrimination’. Journal of Vision 5: 888–900.
Treder, M. S. (2010). ‘Behind the looking-glass: a review on human symmetry perception’. Symmetry 2:
1510–1543.
Treder, M. S., and P. A. van der Helm (2007). ‘Symmetry versus repetition in cyclopean
vision: A microgenetic analysis’. Vision Research 47: 2956–2967.
Treder, M. S., G. van der Vloed, and P. A. van der Helm (2011). ‘Interactions between constituent single
symmetries in multiple symmetry’. Attention, Perception and Psychophysics 73: 1487–1502.
Troscianko, T. (1987). ‘Perception of random-dot symmetry and apparent movement at and near
isoluminance’. Vision Research 27: 547–554.
Tyler, C. W. (1996). ‘Human symmetry perception’. In Human symmetry perception and its computational
analysis, edited by C. W. Tyler, pp. 3–22. (Zeist, The Netherlands: VSP).
Tyler, C. W. (1999). ‘Human symmetry detection exhibits reverse eccentricity scaling’. Visual Neuroscience
16: 919–922.
Tyler, C. W., and L. Hardage (1996). ‘Mirror symmetry detection: Predominance of second-order pattern
processing throughout the visual field’. In Human symmetry perception and its computational analysis,
edited by C. W. Tyler, pp. 157–172. (Zeist, The Netherlands: VSP).
Tyler, C. W., and H. A. Baseler (1998). ‘fMRI signals from a cortical region specific for multiple pattern
symmetries’. Investigative Ophthalmology and Visual Science 39 (Suppl.): 169.
Tyler, C. W., H. A. Baseler, L. L. Kontsevich, L. T. Likova, A. R. Wade, and B. A. Wandell (2005).
‘Predominantly extra-retinotopic cortical response to pattern symmetry’. NeuroImage 24: 306–314.
van der Helm, P. A. (2010). ‘Weber-Fechner behaviour in symmetry perception?’ Attention, Perception and
Psychophysics 72: 1854–1864.
van der Helm, P. A. (2011). ‘The influence of perception on the distribution of multiple symmetries in
nature and art’. Symmetry 3: 54–71.
van der Helm, P. A. (2014). Simplicity in vision: A multidisciplinary account of perceptual organization.
(Cambridge, UK: Cambridge University Press).
van der Helm, P. A. (this volume). Simplicity in perceptual organization. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
van der Helm, P. A., and E. L. J. Leeuwenberg (1991). ‘Accessibility, a criterion for regularity and hierarchy
in visual pattern codes’. Journal of Mathematical Psychology 35: 151–213.
van der Helm, P. A., and E. L. J. Leeuwenberg (1996). ‘Goodness of visual regularities: A nontransformational
approach’. Psychological Review 103: 429–456.
van der Helm, P. A., and E. L. J. Leeuwenberg (1999). ‘A better approach to goodness: Reply to Wagemans
(1999)’. Psychological Review 106: 622–630.
van der Helm, P. A., and E. L. J. Leeuwenberg (2004). ‘Holographic goodness is not that bad: Reply to
Olivers, Chater, and Watson (2004)’. Psychological Review 111: 261–273.
van der Helm, P. A., and M. S. Treder (2009). ‘Detection of (anti)symmetry and (anti)repetition: Perceptual
mechanisms versus cognitive strategies’. Vision Research 49: 2754–2763.
van der Vloed, G., Á. Csathó, and P. A. van der Helm (2005). ‘Symmetry and repetition in perspective’.
Acta Psychologica 120: 74–92.
van der Zwan, R., E. Leo, W. Joung, C. R. Latimer, and P. Wenderoth (1998). ‘Evidence that both area V1
and extrastriate visual cortex contribute to symmetry perception’. Current Biology 8: 889–892.
van Lier, R. J., P. A. van der Helm, and E. L. J. Leeuwenberg (1995). ‘Competing global and local
completions in visual occlusion’. Journal of Experimental Psychology: Human Perception and Performance
21: 571–583.
van Tonder, G. J., and D. Vishwanath (this volume). Design insights: Gestalt, Bauhaus and Japanese
gardens. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Vetter, T., and T. Poggio (1994). ‘Symmetric 3D objects are an easy case for 2D object recognition’. Spatial
Vision 8: 443–453.
Wagemans, J. (1993). ‘Skewed symmetry: A nonaccidental property used to perceive visual forms’. Journal
of Experimental Psychology: Human Perception and Performance 19: 364–380.
Wagemans, J. (1997). ‘Characteristics and models of human symmetry detection’. Trends in Cognitive
Science 1: 346–352.
Wagemans, J. (1999). ‘Toward a better approach to goodness: Comments on van der Helm and
Leeuwenberg (1996)’. Psychological Review 106: 610–621.
Wagemans, J., L. van Gool, and G. d’Ydewalle (1991). ‘Detection of symmetry in tachistoscopically
presented dot patterns: Effects of multiple axes and skewing’. Perception and Psychophysics 50: 413–427.
Wagemans, J., L. van Gool, and G. d’Ydewalle (1992). ‘Orientational effects and component processes in
symmetry detection’. The Quarterly Journal of Experimental Psychology 44A: 475–508.
Wagemans, J., L. van Gool, V. Swinnen, and J. van Horebeek (1993). ‘Higher-order structure in regularity
detection’. Vision Research 33: 1067–1088.
Washburn, D. K., and D. W. Crowe (1988). Symmetries of culture: Theory and practice of plane pattern
analysis. (Washington, D.C.: University of Washington Press).
Weber, E. H. (1834). De tactu [Concerning touch]. (New York: Academic Press).
Wenderoth, P. (1994). ‘The salience of vertical symmetry’. Perception 23: 221–236.
Wenderoth, P. (1995). ‘The role of pattern outline in bilateral symmetry detection with briefly flashed dot
patterns’. Spatial Vision 9: 57–77.
Wenderoth, P. (1996a). ‘The effects of dot pattern parameters and constraints on the relative salience of
vertical bilateral symmetry’. Vision Research 36: 2311–2320.
Wenderoth, P. (1996b). ‘The effects of the contrast polarity of dot-pair partners on the detection of bilateral
symmetry’. Perception 25: 757–771.
Wenderoth, P., and S. Welsh (1998). ‘Effects of pattern orientation and number of symmetry axes on the
detection of mirror symmetry in dot and solid patterns’. Perception 27: 965–976.
Wertheimer, M. (1912). ‘Experimentelle Studien über das Sehen von Bewegung’ [Experimental study on
the perception of movement]. Zeitschrift für Psychologie 12: 161–265.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt [On Gestalt theory]’. Psychologische
Forschung 4: 301–350.
Weyl, H. (1952). Symmetry. (Princeton, NJ: Princeton University Press).
Wynn, T. (2002). ‘Archaeology and cognitive evolution’. Behavioral and Brain Sciences 25: 389–402,
432–438.
Yodogawa, E. (1982). ‘Symmetropy, an entropy-like measure of visual symmetry’. Perception and
Psychophysics 32: 230–240.
Zanker, J. M. (1995). ‘Does motion perception follow Weber’s law?’ Perception 24: 363–372.
Zhang, L., and W. Gerbino (1992). ‘Symmetry in opposite-contrast dot patterns’. Perception 21
(Supp. 2): 95a.
Zimmer, A. C. (1984). ‘Foundations for the measurement of phenomenal symmetry’. Gestalt Theory
6: 118–157.
Chapter 7

The perception of hierarchical structure


Ruth Kimchi

Introduction
Visual objects are viewed as a prime example of hierarchical structure; they can be defined as “multi-
level hierarchical structure of parts and wholes” (Palmer 1977). For instance, a human body is
composed of parts—head, legs, arms, etc., which in turn are composed of parts—eyes, nose, and so forth.
The perceptual relations between wholes and their component parts have been a controversial
issue for psychologists, and for philosophers before them. In psychology the issue can be traced back to
the controversy between Structuralism and Gestalt. The Structuralists, rooted firmly in British
Empiricism, claimed that perceptions are constructed from atoms of elementary, unrelated local
sensations that are unified by associations due to spatial and temporal contiguity. The Gestalt
theorists rejected both atomism and associationism. According to the doctrine of holism in
traditional Gestalt psychology, a specific sensory whole is qualitatively different from the complex
that one might predict by considering only its individual parts, and the quality of a part depends
upon the whole in which this part is embedded (Köhler 1930/1971; Wertheimer 1923/1938; see
also Wagemans, this volume).
This chapter focuses on some modern attempts to grapple with the issue of part-whole
relationships: global precedence and the primacy of holistic properties. I begin with the presentation of
the global precedence hypothesis and the global-local paradigm, followed by a brief review of the
empirical findings concerning the boundary conditions of the global advantage effect, its source,
and its brain localization. The following sections focus on the microgenesis and the ontogenesis
of the perception of hierarchical structure. I then discuss some issues concerning the interpretation
of the global advantage effect, present a terminological refinement distinguishing global properties
from holistic/configural properties, and review empirical evidence for this distinction and for
the primacy of holistic properties. I close by briefly considering the implications of the empirical
evidence for understanding the perception of hierarchical structure and part-whole
relationships.

Global precedence
The global precedence hypothesis, proposed by Navon (1977), states that perceptual processing
proceeds from the global structure towards analysis of more local details. Viewing a visual object
as represented by a hierarchical network with nested relationships (e.g., Palmer 1977), the
globality of a visual property corresponds to the place it occupies in the hierarchy: Properties at
the top of the hierarchy are more global than those at the bottom, which in turn are more local.
Consider, for example, a human face: The spatial relationship between the facial components (e.g.,
eyes, nose, mouth) is more global than the specific shapes of the components, and in turn, the
relationship between the subparts of a component is more global than the specific properties of
the subparts. The global precedence hypothesis claims that the processing of an object is global to
local; namely, more global properties of a visual object are processed first, followed by analysis of
more local properties.
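Palmer’s hierarchical-network view lends itself to a simple tree representation. The sketch below is my own toy illustration, not Palmer’s formalism: each property is a node, and globality corresponds to depth in the tree, with smaller depth meaning more global.

```python
# Toy sketch (illustrative only): an object as a hierarchical network of
# properties. Properties nearer the root are more global; properties deeper
# in the tree are more local. Node = (property name, list of child nodes).

face = ("face-layout",            # spatial relations among components: most global
        [("eye-shape", [("pupil", []), ("iris", [])]),
         ("nose-shape", []),
         ("mouth-shape", [])])

def levels(node, depth=0):
    """Yield (property, depth) pairs; smaller depth = more global."""
    name, children = node
    yield name, depth
    for child in children:
        yield from levels(child, depth + 1)

for name, depth in levels(face):
    print("  " * depth + f"{name} (level {depth})")
```

On this reading, the global precedence hypothesis is the claim that processing visits shallower nodes before deeper ones.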
The global precedence hypothesis has been tested by studying the perception of hierarchical
patterns in which larger figures are constructed by suitable arrangement of smaller figures (first
introduced by Asch 1962, and later by Kinchla 1974, 1977). An example is a set of large letters
constructed from the same set of smaller letters having either the same identity as the larger letter
or a different identity (see Figure 7.1). These hierarchical patterns satisfy two conditions, which
were considered by Navon (1977, 1981, 2003) to be critical for testing the hypothesis: first, the
global and local structures can be equated in familiarity, complexity, codability, and identifiability,
so they differ only in level of globality, and second, the two structures can be independent so that
one structure cannot be predicted from the other.
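The construction of such hierarchical patterns is easy to sketch in code. The following is a minimal illustration, not Navon’s actual stimulus set: the 5x5 letter bitmaps are my own rough approximations, and each filled cell of the large (global) letter is rendered as a small (local) letter, so global and local identity can be varied independently.

```python
# Minimal sketch of a Navon-style hierarchical letter (illustrative bitmaps,
# not the original stimuli): a large letter drawn on a coarse grid, with each
# filled cell rendered as a small letter.

BITMAPS = {
    "H": ["X...X", "X...X", "XXXXX", "X...X", "X...X"],
    "S": [".XXXX", "X....", ".XXX.", "....X", "XXXX."],
}

def navon(global_letter, local_letter):
    """Render a large letter whose strokes consist of small letters."""
    rows = []
    for row in BITMAPS[global_letter]:
        rows.append("".join(local_letter if c == "X" else " " for c in row))
    return "\n".join(rows)

print(navon("H", "S"))  # a global H built from local S's: a conflicting pair
print(navon("H", "H"))  # global and local identity agree: a consistent pair
```

Choosing `global_letter == local_letter` gives a consistent stimulus, and different letters give a conflicting one, which is exactly the independence condition the paradigm requires.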
In one experimental paradigm, which has become very popular, observers are presented with
such stimuli and are required to identify the larger (global) or the smaller (local) letter in separate
blocks of trials. Findings of global advantage—namely, faster identification of the global letter
than the local letter and disruptive influence from irrelevant global conflicting information on
local identification (global-to-local interference)—are taken as support for the global precedence
hypothesis (e.g., Navon 1977, experiment 3).
Much of the research following Navon’s (1977) seminal work has concentrated on delineating
boundary conditions of the global advantage effect, examining its locus (perceptual or
post-perceptual), and its localization in the brain (see Kimchi 1992, and Navon 2003, for reviews).
Global advantage: boundary conditions. Several studies have pointed out variables that
can moderate or even reverse the effect. Global advantage is not likely to occur when the overall
visual angle of the hierarchical stimulus exceeds 7°–10° (Kinchla and Wolfe 1979; Lamb and
Robertson 1990), but the effect is merely modulated when the eccentricity of both levels is equated (e.g.,
Amirkhiabani and Lovegrove 1999; Navon and Norman 1983). Global advantage is also less likely
to occur with spatial certainty than spatial uncertainty (e.g., Lamb and Robertson 1988), with

[Figure: four hierarchical letter stimuli (large S’s and H’s composed of small S’s and H’s) in consistent and conflicting global-local combinations.]

Fig. 7.1  An example of Navon’s hierarchical letters: large H’s and S’s are composed of small H’s and S’s.
Reprinted from Cognitive Psychology, 9(3), David Navon, Forest before trees: The precedence of global features in
visual perception, pp. 353–83, Copyright (1977), with permission from Elsevier.

central than peripheral presentation (e.g., Grice et al. 1983; Pomerantz 1983; but see, e.g., Luna
et al. 1990; Navon and Norman 1983), with sparse than dense elements (e.g., Martin 1979), with
few relatively large elements than with many relatively small elements (Kimchi 1988; Kimchi and
Palmer 1982, 1985; Yovel et al. 2001), with long than short exposure duration (e.g., Luna 1993;
Paquet and Merikle 1984), and when the goodness or meaningfulness of the local forms is
superior to that of the global form (e.g., LaGasse 1994; Poirel et al. 2006; Sebrechts and Fragala 1985).
The global advantage effect can also be modulated by direct and indirect attentional
manipulations (e.g., Han and Humphreys 2002; Kinchla et al. 1983; Lamb et al. 2000; Robertson 1996;
Ward 1982). For example, Han and Humphreys (2002, experiment 1) showed that when attention
was divided between the local and global levels, the presence of a salient local element, which
presumably captured attention, speeded responses to local targets while slowing responses to global
targets.
The source of global advantage. The source (or the locus) of the global advantage effect is still
disputed. Several investigators concluded that the source of global advantage is perceptual (e.g.,
Andres and Fernandes 2006; Broadbent 1977; Han et al. 1997; Han and Humphreys 1999; Koivisto
and Revonsuo 2004; Miller and Navon 2002; Navon 1977, 1991; Paquet 1999; Paquet and Merikle
1988), possibly as a result of early perceptual-organizational processes (Han and Humphreys 2002;
Kimchi 1998, 2000, 2003b). The involvement of organizational processes in global advantage is
discussed in detail later in the chapter. It has also been suggested that global advantage arises from
a sensory mechanism—faster processing of low spatial frequencies than high spatial frequencies
(e.g., Badcock et al. 1990; Han et al. 2002; Hughes et al. 1990; Shulman et al. 1986; Shulman and
Wilson 1987). Although the differential processing rate of low and high spatial frequencies may
play a role in global and local perception, it cannot account for several findings (e.g., Behrmann
and Kimchi 2003; Kimchi 2000; Navon 2003). For example, it cannot handle the effects of
meaningfulness and goodness of form on global/local advantage (e.g., Poirel et al. 2006; Sebrechts and
Fragala 1985). Also, Behrmann and Kimchi (2003) reported that two individuals with acquired
integrative visual object agnosia exhibited normal spatial frequency thresholds in both the high-
and low-frequency range, yet both were impaired, and differentially so, at deriving the global
shape of multi-element hierarchical stimuli. Other investigators suggested that global advantage
arises in some post-perceptual process (e.g., Boer and Keuss 1982; Miller 1981a, 1981b; Ward
1982). This view is supported by the findings demonstrating that attention typically modulates
the global advantage effect (e.g., Kinchla et al. 1983; Lamb et al. 2000; Robertson 1996), but, as
noted by Navon (2003), attention can magnify biases that originate prior to the focusing of
attention. Similarly, an effect that arises at the perceptual level can be magnified by post-perceptual
processes, such as response-related processes (Miller and Navon 2002).
Global advantage: brain localization. Data from behavioral and functional neuroimaging studies
are taken to suggest functional hemispheric asymmetry in global versus local perception, with
the right hemisphere biased toward global processing and the left hemisphere biased toward
local processing (e.g., Delis et al. 1986; Fink et al. 1997; Kimchi and Merhav 1991; Robertson
et  al. 1993; Weissman and Woldorff 2005). One view suggests that this asymmetry is related
to the relation between spatial frequency processing and global and local perception. Ivry and
Robertson (1998; Robertson and Ivry 2000), proponents of this view, proposed that there are two
stages of spatial frequency filtering, and the two hemispheres differ in the secondary stage that is
sensitive to the relative rather than absolute spatial frequencies. The left hemisphere emphasizes
information from the higher spatial frequencies within the initially selected range, and the right
hemisphere emphasizes the lower spatial frequencies, with the result that the right hemisphere
is preferentially biased to process global information and the left hemisphere local information.
Alternative accounts for the hemispheric asymmetry in global/local processing include the
proposal of hemispheric differences in sensitivity to the saliency of the stimulus, with the right
hemisphere biased toward more salient objects and the left hemisphere biased toward less salient
objects (Mevorach et al. 2006a, 2006b), and the integration hypothesis, which suggests that the
hemispheres are equivalent with respect to shape identification but differ in their capacities for
integrating shape and level information, with the right hemisphere involved in binding shapes to
the global level and the left hemisphere involved in binding shapes to the local level (Hübner and
Volberg 2005).

Microgenesis of the perception of hierarchical structure


One approach to understanding the processes involved in perception is to study its
microgenesis—the time course of the development of the percept in adult observers. Kimchi (1998) studied
the microgenesis of the perception of hierarchical stimuli that vary in number and relative size of
their elements, using a variation of the primed matching paradigm (Beller 1971). In this paradigm
the observer is presented with a prime followed immediately by a pair of test figures to be matched
for identity. Responses to “same” test pairs are faster when the test figures are similar to the prime
than when they are dissimilar to it. This paradigm enables us to assess implicitly the observer’s
perceptual representations, and by varying the duration of the prime and constructing test figures
that are similar to different aspects of the prime, we can probe changes in the representation over
time (e.g., Kimchi 1998, 2000; Sekuler and Palmer 1992).
The priming stimuli were few- and many-element hierarchical patterns presented for various
durations (40–690 ms). There were two types of “same”-response test pairs defined by the
similarity relation between the test figures and the prime. In the element-similarity test pair, the
figures were similar to the prime in their elements but differed in their global configurations. In the
configuration-similarity test pair, the test figures were similar to the prime in their global
configurations but differed in their elements. A neutral prime (X) served as a baseline (control) condition
for the two types of test pairs. An example of priming stimuli and their respective “same”- and
“different”-response test pairs is presented in Figure 7.2a.
The priming measure, calculated for each prime type, indicates how much the prime in
question speeded “same” responses to configuration-similarity test pairs relative to element-similarity
test pairs. The amount of priming is defined by the difference in “same” reaction time (RT) to
an element-similarity test pair versus a configuration-similarity test pair after seeing the prime,
minus the baseline RT difference to these test pairs in the control condition. Priming of the
configuration should produce priming values of greater than zero, and priming of the elements
should produce priming values of less than zero.
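The priming index just described is simple arithmetic and can be sketched as follows; the reaction times below are hypothetical values for illustration, not Kimchi’s data.

```python
# Sketch of the priming index described in the text. Positive values indicate
# configuration priming; negative values indicate element priming.
# All reaction times (ms) below are hypothetical, for illustration only.

def priming_index(rt_element_prime, rt_config_prime,
                  rt_element_baseline, rt_config_baseline):
    """(RT_element - RT_config) after the prime in question, minus the same
    difference after the neutral (X) baseline prime."""
    return ((rt_element_prime - rt_config_prime)
            - (rt_element_baseline - rt_config_baseline))

# Hypothetical many-element pattern at a brief prime duration:
# configuration-similar test pairs are answered relatively faster.
print(priming_index(560, 520, 550, 540))   # 30 -> configuration priming

# Hypothetical few-element pattern at the same duration:
print(priming_index(530, 560, 545, 540))   # -35 -> element priming
```

Subtracting the baseline difference removes any intrinsic speed advantage one test-pair type has over the other, so the index isolates the contribution of the prime itself.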
The results (Figure 7.2b) show that the global configuration of patterns containing many
relatively small elements was primed at brief exposures (see also Razpurker-Apfeld and Kimchi 2007),
whereas the local elements of such patterns were primed only at longer exposures. The global
advantage typically observed with briefly presented many-element patterns (e.g., Navon 1977;
Paquet and Merikle 1984) and before recognition of the local shape (Miller and Navon 2002)
is consistent with this finding. The converse pattern of results was obtained with configurations
composed of few, relatively large elements: The elements were primed at brief exposures, whereas
the global configuration was primed only at longer exposures.
Results concerning the accessibility of the global configuration and local elements of few- and
many-element patterns to rapid search (Kimchi 1998; Kimchi et  al. 2005) converged with the
primed matching results. The global configuration of many-element patterns was accessible to rapid
search, whereas search for the local elements of such patterns was effortful and inefficient. For the

(a) [Figure: priming stimuli for few-element and many-element hierarchical patterns, each shown with “same”-response test pairs (element similarity and configuration similarity) and “different”-response test pairs.]
(b) [Figure: plot of priming (msec) against prime duration (40–690 ms) for few-element and many-element patterns.]
Fig. 7.2  (a) Examples of the priming stimuli and the “same”-response and “different”-response test pairs
for the few-element and many-element hierarchical patterns used by Kimchi (1998). (b) Priming effects
for the few-element and many-element patterns as a function of prime duration. Values greater than zero
indicate configuration priming; values less than zero indicate element priming (see text for details).
Adapted from Ruth Kimchi, Uniform connectedness and grouping in the perceptual organization of hierarchical
patterns, Journal of Experimental Psychology: Human Perception and Performance, 24(4), pp. 1105–18, DOI:
10.1037/0096-1523.24.4.1105 © 1998, American Psychological Association.

few-element patterns, search for local elements was fast and efficient, whereas the global configu-
ration was searched less efficiently (see also Enns and Kingstone 1995).
The results of the microgenetic analysis show that the relative dominance of the global configu-
ration and the local elements varies during the evolution of the percept, presumably as a result of
grouping and individuation processes that operate in early perceptual processing. Many, relatively
small elements are grouped into global configuration rapidly and effortlessly, providing an early
134 Kimchi

Fig. 7.3  Examples of patterns composed of a few, relatively large elements. (a) Open-ended
L elements form a global square. The global square configuration is primed at brief exposure
durations, indicating a rapid grouping of the elements. (b) Closed square elements form a global
square. The global square configuration is primed only at longer prime durations, indicating
time-consuming grouping of the local elements.
Adapted from Vision Research, 40 (10–12), Ruth Kimchi, The perceptual organization of visual objects: a
microgenetic analysis, pp. 1333–47, DOI: 10.1016/S0042-6989(00)00027-4 Copyright (2000), with permission
from Elsevier.

representation of global structure; the individuation of the elements occurs later and appears to be
time consuming and attention demanding. Few, relatively large elements, on the other hand, are
individuated rapidly and effortlessly and their grouping into a global configuration consumes time
and requires attention. Kimchi (1998) suggested that early and rapid grouping of many small ele-
ments on the one hand, and early and rapid individuation of a few large elements on the other hand,
are desirable characteristics for a system one of whose goals is object identification and recognition, because many small elements close to one another are likely to be texture elements of a single
object, whereas a few large elements are likely to be several discrete objects or several distinctive
parts of a complex object.1
Notwithstanding the critical role of number and relative size of the elements in the micro-
genesis of the perception of hierarchical patterns, additional research has suggested that the
“nature” of the elements also plays an important role (Han et  al. 1999; Kimchi 1994, 2000),
further demonstrating the involvement of organizational processes in global advantage. Thus,
when the few, relatively large elements are open-ended line segments as opposed to closed
shapes (Figure 7.3), their configuration, rather than the elements, is available at brief exposure
duration, owing to the presence of collinearity and/or closure (Kimchi 2000). Furthermore,
the advantage of the global level of many-element patterns can be modulated and even van-
ish, depending on how strongly the local elements group and on the presence of strong cues
to segment the local elements, as when closure is present at the local level (Han et  al. 1999;
Kimchi 1994).

1  Note that in these hierarchical patterns the number of elements is correlated with their relative size for strictly
geometrical reasons: increasing the number of elements necessarily results in decreasing their relative size as
long as the overall size of the pattern is kept constant. The effect of relative size can be separated from that of
number by constructing patterns in which there are only a few elements that are relatively small or large, but if
the global size is to be kept constant, other factors, such as relative spacing, may be involved. Furthermore, it is
impossible to completely isolate the effect of number from the effect of size because the complete orthogonal
design combining number and relative size would require a geometrically problematic figure—a pattern com-
posed of many relatively large elements (see Kimchi and Palmer 1982, for discussion).

The development of the perception of hierarchical structure
Studies that examined the perception of hierarchical structure in infancy report that 3- and
4-month-old infants are sensitive to both global and local structures of visual stimuli and demonstrate a processing advantage for global over local information (Freeseman et al. 1993; Frick et al.
2000; Ghim and Eimas 1988; Quinn et al. 1993; Quinn and Eimas 1986; see also Quinn and Bhatt,
this volume).
Studies that examined developmental trends in the processing of hierarchical structure beyond
infancy did not yield consistent results. Kimchi (1990) found that children as young as three years
of age are as sensitive as adults to the number and relative size of the elements of hierarchical
stimuli, demonstrating a local bias for few-element patterns, and a global bias for many-element
patterns. Several studies reported that global processing in hierarchical visual stimuli continues
to develop into late childhood (Burack et al. 2000; Dukette and Stiles 1996, 2001; Enns et al. 2000;
Harrison and Stiles 2009; Poirel et al. 2008; Porporino et al. 2004; Scherf et al. 2009). Enns et al.
(2000; Burack et al. 2000) also suggested a longer developmental progression for grouping than
for individuation abilities. Other studies, on the other hand, showed a longer developmental progression for local processing (e.g., Mondloch et al. 2003).
Kimchi et al. (2005) systematically examined the development of the perception of hierarchi-
cal structure from childhood to young adulthood, by comparing the performance of five- to
fourteen-year-old children and young adults on few- and many-element hierarchical patterns in
visual search and speeded classification tasks. In the visual search task, participants searched for a
globally-defined or locally-defined target (a diamond) in displays of a variable number of few- or
many-element patterns (Figure 7.4a). The primary dependent variable was search rate, defined
as the slope of the best-fitting linear RT function over the number of items in the display. The
results (RT slopes; Figure 7.4b) show different age-related trends in search rates for global and
local targets in the many- versus the few-element displays. The RT slopes for global targets in the
many-element displays and for local targets in the few-element displays were essentially zero in all
age groups, indicating an efficient and effortless search that did not vary with age. The RT slopes
for local targets in the many-element displays and for global targets in the few-element displays
were steeper and decreasing significantly between five and ten years of age, indicating an inef-
ficient and effortful search that improved with age.
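The search-rate measure can be sketched as an ordinary least-squares fit of mean RT on display size; the data below are illustrative, not those of Kimchi et al. (2005):

```python
import numpy as np

def search_rate(display_sizes, mean_rts):
    """Search rate: slope (ms/item) of the best-fitting line
    of mean RT over the number of items in the display."""
    slope, _intercept = np.polyfit(display_sizes, mean_rts, deg=1)
    return slope

# Hypothetical mean RTs (ms) at display sizes 3, 6, and 9.
flat = search_rate([3, 6, 9], [500, 502, 504])   # ≈0.67 ms/item: efficient
steep = search_rate([3, 6, 9], [500, 650, 800])  # 50 ms/item: effortful
```

A slope near zero, as for global targets in many-element displays, indicates efficient, effortless search; a steep slope indicates inefficient, effortful search.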
In the classification task, participants were presented with an array of five columns of few- or
many-element patterns (Figure 7.5a). The patterns in the central column were similar in ele-
ments to the patterns on one side and in configuration to the patterns on the other side (incon-
gruent displays). The task was to indicate whether the central column belonged with the patterns
on the left or right side on the basis of similarity in global configuration (global classification)
or in local elements (local classification). The results (Figure 7.5b) converged with those of the
visual search. Five-year-olds made significantly more errors than older participants in the global
classification of few-element patterns and in the local classification of many-element patterns,
whereas all age groups yielded similar low error rates in the global classification of many-element
patterns and in the local classification of few-element patterns. Similar age trends were evident
in the RT data.
These results suggest that grouping of many small elements and individuation of a few large
elements mature at a relatively early age, while grouping a few large elements and individuat-
ing many small elements develop with age, improving significantly between age five and ten and
reaching adult-like levels between ten and fourteen years of age.
[Figure: (a) example search displays with target (T) and distractors (D) for each pattern–target combination; (b) reaction time slope (ms/item) as a function of age (5, 10, 14, 23 years) for few-global, few-local, many-global, and many-local conditions.]
Fig. 7.4  (a) Examples of displays in the visual search task used by Kimchi et al. (2005). An example is
shown for each combination of pattern (many-elements or few-elements) and target (global or local).
The target (T) and distractors (D) for each example are indicated. All the examples presented illustrate
display size of 6. (b) Search slopes for global and local targets as a function of pattern and age.
Reproduced from Ruth Kimchi, Batsheva Hadad, Marlene Behrmann, and Stephen E. Palmer, Psychological
Science, 16(4), Microgenesis and Ontogenesis of Perceptual Organization: Evidence From Global and Local
Processing of Hierarchical Patterns, pp. 282–90, doi:10.1111/j.0956-7976.2005.01529.x Copyright © 2005 by
SAGE Publications. Reprinted by Permission of SAGE Publications.

These findings may help resolve some of the apparent contradictions in the developmental literature
mentioned earlier. Enns et al. (2000; Burack et al. 2000) used few-element patterns and found age-related
improvements in search rates for globally-defined but not for locally-defined targets. Mondloch et al.
(2003), on the other hand, used many-element patterns and found age-related improvements for local
but not for global processing. Thus, depending on the nature of the stimuli used, the different studies
tapped into different processes that emerge along different developmental trajectories.
[Figure: (a) example few-element and many-element incongruent displays; (b) percentage error as a function of age (5, 10, 14, 22 years) for few-global, few-local, many-global, and many-local conditions.]
Fig. 7.5  (a) Examples of incongruent displays in the few-element and many-element conditions
for the speeded classification task used by Kimchi et al. (2005). (b) Error rates for global and local
classifications in incongruent displays as a function of pattern and age.
Reproduced from Ruth Kimchi, Batsheva Hadad, Marlene Behrmann, and Stephen E. Palmer, Psychological
Science, 16(4), Microgenesis and Ontogenesis of Perceptual Organization: Evidence From Global and Local
Processing of Hierarchical Patterns, pp. 282–90, doi:10.1111/j.0956-7976.2005.01529.x Copyright © 2005 by
SAGE Publications. Reprinted by Permission of SAGE Publications.

Importantly, however, the adult-like grouping of many small elements observed with the
younger children in the visual search and classification tasks (Kimchi et  al. 2005) may not
reflect the same level of functioning as the fast and early grouping observed in adults in the
primed matching task (Kimchi 1998), as suggested by the findings of Scherf et al. (2009). Using
the primed matching task, Scherf et al. (2009) found age-related improvement in the ability to
derive the global shape of the many-element patterns at short prime durations, an improvement that continued through adolescence. It is possible, then, that different tasks tap into different levels of the
organizational abilities. Children are capable of grouping elements into global configuration to a
certain degree, which may suffice to support performance in the visual search and classification
tasks, but when confronted with a more challenging task such as primed matching under brief
exposures, adult-like performance emerged only in adolescence, indicating that the full pro-
cess of integrating local elements into coherent shapes to the extent of facilitating global shape
identification develops late into adolescence. This long developmental trajectory coincides with
what is known about the structural and functional development of the ventral visual pathway
(Bachevalier et al. 1991; Gogtay et al. 2004).
The findings concerning the development of the perception of hierarchical structure converge
with other findings reported in the literature, suggesting that there is a protracted developmental
trajectory for some perceptual organization abilities, even those that appear to emerge during
infancy (see Kimchi 2012, for a review and discussion).

Interpretation of global advantage: Levels of structure and holistic properties
Overall, global advantage is normally observed with the typical hierarchical stimuli (i.e.,
many-element hierarchical patterns) used in the global–local paradigm to the limits of visibility and
visual acuity. A number of issues have been raised, however, concerning the interpretation of global
advantage (Kimchi 1992; Navon, 2003). One issue concerns the hierarchical patterns that are the
cornerstone of the global–local paradigm. Hierarchical patterns provide an elegant control for many
intervening variables while keeping the hierarchical structure transparent, but the local elements of
the hierarchical patterns are not the local properties of the global form, they are not the parts of the
whole (Kimchi 1992, 1994; Navon 2003). The local properties of the large letter H (see Figure 7.1),
for example, are not the local Hs or Ss but, among others, vertical and horizontal lines. Thus, global
advantage is not an advantage of a global property of a visual object over its local properties, but
rather, an advantage of properties of higher level units over the properties of the lower level units
(Kimchi 1992). A somewhat different, albeit related, suggestion has been made by Navon (2003): the
local elements of hierarchical patterns are local constituents of a well-grouped cluster, and global
advantage is an advantage of the cluster over its local constituents. This suggestion is compatible with
the view presented earlier, that perceptual organization processes play a role in global advantage
(Han and Humphreys 1999; Kimchi 1998; Kimchi et al. 2005).
Furthermore, the assumption that the global form and the local elements of hierarchical stim-
uli map directly into two perceptual levels that differ only in their level of globality, has been
questioned. For example, Kimchi and Palmer (1982, 1985) showed that many-element patterns
(like those typically used in the global-local paradigm) are perceived as global form associated
with texture, and the form and texture are perceptually separable. Patterns composed of few, rel-
atively large elements, on the other hand, are perceived as a global form and figural parts, and are
perceptually integral. Pomerantz (1981, 1983) distinguished between patterns in which only the
position of the elements matters for the global form and patterns in which both the position and
the nature of the elements matter, arguing that the local elements in Navon’s hierarchical stimuli
are mere placeholders. If the local elements of many-element patterns serve to define texture or
are mere placeholders, then they may not be represented as figural units, and consequently, faster
identification of the global form than the local form may be accounted for not by its level of glo-
bality but by a qualitative difference in identification of a figural unit versus a textural molecule.
However, this argument is somewhat weakened by the finding that an earlier representation of
the global form of many-element hierarchical stimuli is followed by a spontaneous individua-
tion of the local elements (Kimchi 1998), and the finding that element heterogeneity in many-
element hierarchical stimuli has no effect on global/local advantage (Navon 2003).
Another, not unrelated issue is that the difference between global and local properties, as opera-
tionally defined in the global-local paradigm, may be captured in terms of relative size, and rela-
tive size alone rather than level of globality, may provide a reasonable account for the observed
global advantage with hierarchical patterns (Navon and Norman 1983). Navon (2003, p.  290)
argued that globality is inherently confounded with relative size, that it is a fact of nature that rela-
tive size is “an inherent concomitant of part–whole relationship.” This is indeed the case if global
properties are properties of a higher level unit. For example, the shape of a face is larger than the
shape of its nose. Yet, if global properties are meant to be properties that depend on the relation-
ship between the components, as the theoretical motivation for the global precedence hypothesis
seems to imply (e.g., Navon 1977, 2003), then the essential difference between global proper-
ties and component properties is not captured by their relative size. To distinguish, for example,
squareness from the component vertical and horizontal lines of a square, or faceness from the
facial components of a face, based only on their relative sizes would miss the point.
Thus, a terminological distinction is called for between global properties, which are defined
by the level they occupy within the hierarchical structure of the stimulus, and holistic/configural
properties that arise from the interrelations between the component properties of the stimulus
(Kimchi 1992, 1994). Evidence concerning the primacy of holistic properties and the distinction
between holistic properties and global properties is presented in the next sections.

The primacy of holistic properties


The Gestaltists’ claim that wholes have properties that cannot be derived from the properties of
their components is captured in modern psychology by the notion of holistic or configural prop-
erties. Holistic/configural properties are properties that do not inhere in the component parts,
and cannot be predicted by considering only the individual component parts or their simple sum.
Rather, they arise on the basis of the interrelations and interactions between the parts. Examples are
symmetry, regularity, and closure (Garner 1978; Kimchi 1992, 1994; Pomerantz 1981; Rock 1986;
Wagemans 1995, 1997). Thus, for example, four simple lines that vary in orientation can configure
into a square—with a configural property of closure—or into a cross—with a configural property
of intersection. Holistic properties exist along with, not instead of, component properties, and are
a different aspect of a stimulus (Garner 1978). The Gestaltists’ claim about the primacy of wholes
finds its modern counterpart in the hypothesis about the primacy of holistic properties, which
states that holistic properties dominate component properties in information processing.
Holistic primacy in visual forms. Empirical research pitting holistic against component properties
using visual forms (with proper controls for differences in discriminability) has provided converg-
ing evidence for the primacy of holistic properties (see Kimchi 2003a, for a review). Lasaga (1989)
and Kimchi (1994; Kimchi and Bloch 1998) investigated the relative dominance of component
and holistic properties by examining whether the discriminability of the components predicts
the discrimination of their configurations. They reasoned that if holistic properties dominate
information processing, then, irrespective of the discriminability of the components, the dis-
crimination between stimuli that have dissimilar holistic properties should always be easier than
discrimination between stimuli that have similar holistic properties, and classification by holistic
properties should be easier than classification by the components.
Consider the stimulus sets presented in Figure 7.6. Discrimination and classification perfor-
mance with the four simple lines that vary in orientation (Figure 7.6a) showed that discrimination
between the two oblique lines is more difficult than between any other pair of lines, and the clas-
sification that involves grouping of the horizontal and vertical lines together and the two oblique
lines together is significantly faster and more accurate than the two other possible groupings
(Kimchi 1994; Lasaga and Garner 1983). These simple stimuli were then grouped to form a new
set of four stimuli (Figure 7.6b), which differed in highly discriminable component properties
(e.g., oblique vs. vertical lines) but shared a holistic property (e.g., closure), or shared a component

Fig. 7.6  Examples of the stimulus sets for the discrimination and classification tasks used by Kimchi
(1994) and Kimchi and Bloch (1998). Four simple lines that vary in orientation (a) are grouped into the
stimuli in (b). Four simple lines that vary in curvature (c) are grouped into the stimuli in (d). Note that for
the stimuli in (d), configurations that share holistic properties (e.g., closure) are not, unlike those in (b),
simple rotation of one another.
Parts (a) and (b) are reproduced from Ruth Kimchi, The role of wholistic/configural properties versus global
properties in visual form perception, Perception, 23(5), pp. 489–504, doi:10.1068/p230489 © 1994, Pion. With
permission from Pion Ltd, London www.pion.co.uk and www.envplan.com. Parts (c) and (d) are reproduced from
Psychonomic Bulletin & Review, 5(1), pp. 135–139, Dominance of configural properties in visual form perception,
Ruth Kimchi and Benny Bloch, DOI: 10.3758/BF03209469 Copyright © 1998, Springer-Verlag. With kind
permission from Springer Science and Business Media.

property (e.g., oblique lines) but differed in holistic property (closed vs. open). The pattern of per-
formance with the configurations was not predicted by the discriminability of their components;
rather it confirmed the prediction of the hypothesis about the primacy of holistic properties: the
two most difficult discriminations were between stimuli with dissimilar components but similar
holistic properties (square vs. diamond and plus vs. X). Moreover, the discrimination between a
pair of stimuli that differ in a holistic property was equally easy, regardless of whether they dif-
fered in component properties (e.g., the discrimination between square and plus was as easy as the
discrimination between square and X). Also, the easiest classification was the one that was based
on holistic properties, namely the classification that involved grouping of the square and diamond
together and the plus and X together (Kimchi 1994, see also Lasaga 1989). Similar results were
also observed with stimulus sets in which stimuli that shared a holistic property were not a simple
rotation of each other (Figure 7.6c,d; Kimchi and Bloch 1998).
Thus, when both holistic and component properties are present in the stimuli and can be
used for the task at hand, performance is dominated by holistic properties, regardless of the
discriminability of the component properties. When holistic properties are not effective for the
task at hand, discrimination and classification can be based on component properties, but there is
a significant cost relative to performance based on holistic properties.
The primacy of holistic properties is also manifested in the configural superiority effect
(Pomerantz et al. 1977; see also Pomerantz and Cragin, this volume): the discrimination of two
simple oblique lines can be significantly improved by the addition of a context that creates a tri-
angle and an arrow configuration.
Other studies have provided converging evidence for the early representation of holistic proper-
ties. Thus, Kimchi (2000; Hadad and Kimchi 2008), using primed matching, showed that shapes
grouped by closure were primed at very short exposure durations, suggesting that closure was
effective already early in the perceptual process. Holistic properties were also found to be acces-
sible to rapid search (e.g., Rensink and Enns 1995).
Holistic primacy in faces. The case of faces is an interesting one. The “first-order spatial relations”
between facial components, namely the basic arrangement of the components (i.e., the eyes above
the nose and the mouth below the nose), are distinguished from the “second-order spatial relations”—
the spacing of the facial components relative to each other. Facial configuration, or faceness, is the
consequence of the former, differentiating faces from other object classes. The configural properties
that arise from the latter (e.g., elongation, roundedness) differentiate individual faces (e.g., Diamond
and Carey 1986; Maurer et al. 2002). The dominance of the facial configuration (i.e., faceness) over
the components is easily demonstrated: replacing the components but keeping their spatial arrange-
ment the same does not change the perception of faceness. An example is the “fruit face” painting
by the Renaissance artist Arcimboldo. On the other hand, the relative contribution of configural
properties and component properties to face perception and recognition has been a controversial
issue (e.g., Maurer et al. 2002). Some studies demonstrated that configural properties dominate face
processing (e.g., Bartlett and Searcy 1993; Freire et al. 2000; Leder and Bruce 2000; Murray et al.
2000), and other studies provided evidence that facial features themselves play an important role in
face processing (e.g., Cabeza and Kato 2000; Harris and Nakayama 2008; Schwarzer and Massaro
2001). However, Amishav and Kimchi (2010) demonstrated, using Garner’s (1974) speeded classi-
fication paradigm with proper control of the relative discriminability of the two types of properties,
that perceptual integrality of configural and component properties, rather than relative dominance
of either, is the hallmark of upright face perception (see also Behrmann et al. this volume).

Global versus holistic properties


Although the terms global and holistic properties are often used interchangeably, they can be
distinguished on both theoretical and empirical grounds. As noted earlier, global properties are
defined by the level they occupy within the hierarchical structure of the stimulus. The differ-
ence between global and local properties (as operationally defined in the global–local paradigm)
involves size: Global properties are by definition larger than local properties because the global
configuration is necessarily larger than the local elements of which it is composed. The critical dif-
ference between holistic properties and component properties, however, is not their relative size.
Holistic/configural properties are a consequence of the interrelations between the component
properties of the stimulus.
To examine whether the distinction between global and holistic properties has psychological
reality, we must dissociate level of globality (global vs. local) from type of property (holistic vs.
nonholistic). With hierarchical stimuli, it is possible to construct stimuli in which different types
of properties are present at the global and the local levels. Accordingly, Kimchi (1994) employed
hierarchical stimuli that varied in configural (closure) and nonconfigural (line orientation)
[Figure: stimulus sets arranged by level of structure (global, local) crossed with type of property (closure, line orientation).]

Fig. 7.7  Four sets of four stimuli each, produced by the orthogonal combination of type of property
and level of structure.
Reproduced from Ruth Kimchi, The role of wholistic/configural properties versus global properties in visual form
perception, Perception, 23(5), pp. 489–504, doi:10.1068/p230489 © 1994, Pion. With permission from Pion Ltd,
London www.pion.co.uk and www.envplan.com.

properties at the global or the local levels. The orthogonal combination of type of property and
level of structure produced four sets of four stimuli each (see Figure 7.7). Participants classified
each set of four stimuli on the basis of the variation at either the global or the local level of the
stimuli (global or local classification task). Depending on the stimulus set, classification (global
or local) was based on closure or on line orientation. The results showed that global classification
was faster than local classification only when the local classification was based on line orientation;
no global classification advantage was observed when local classification was based on closure.
Han et  al. (1999) used different stimuli (arrows and triangles) and the typical global-local
task. They found a global advantage (i.e., faster RTs for global than for local identification and
global-to-local interference) for both orientation discrimination and closure discrimination, but
the global advantage was much weaker for the closure discrimination task than for the orientation
discrimination task. Under divided-attention conditions, there was a global advantage for orienta-
tion but not for closure discrimination tasks.
Thus, both Kimchi’s (1994) and Han et al.’s (1999) results indicate that relative global or local
advantage for many-element hierarchical patterns depends on whether discrimination at each
level involves configural or nonconfigural properties. When local discrimination involves a con-
figural property like closure, the global advantage markedly decreases or even disappears relative
to the case in which discrimination at that level involves a nonconfigural property like orientation.
These findings converge with the findings reviewed earlier that show a relative perceptual
dominance of configural properties. They also suggest that configural properties are not necessarily global or larger. Leeuwenberg and van der Helm (1991, 2013), using a different approach,
also claim that holistic properties that dominate classification and discrimination of visual forms
are not always global. According to the descriptive minimum principle approach proposed by
Leeuwenberg and van der Helm (see also van der Helm’s chapter on simplicity, this volume), the
specification of dominant properties can be derived from the simplest pattern representations,
and it is the highest hierarchical level in the simplest pattern-representation, the “superstructure,”
that dominates classification and discrimination of visual forms. The “superstructure” is not nec-
essarily global or larger.

Concluding remarks
The vast majority of the findings reviewed in this chapter support the view of holistic dominance.
This dominance can arise from temporal precedence of the global level of structure, as when the
global configuration of a many-element pattern is represented before the elements are individu-
ated (global precedence), or from dominance in information processing, as when holistic proper-
ties such as closure, dominate component properties in discrimination and classification of visual
forms (holistic primacy).
In light of this evidence, a view that holds that the whole is perceived just by assembling compo-
nents is hardly tenable. However, several findings suggest that positing holistic dominance as a rigid
perceptual law is hardly tenable either. Early relative dominance of either the global structure or the
components has been found, depending on certain stimulus factors (e.g., Kimchi 1998, 2000), con-
figural dominance has been found with certain configurations but not with others (e.g., Pomerantz
1981; see also Pomerantz and Cragin, this volume), and the relative dominance of configural proper-
ties versus component properties has been found to depend on its relevance to the task at hand (e.g.,
Han et al., 1999; Pomerantz and Pristach 1989). It is also important to note that there are different
kinds of wholes with different kinds of parts and part–whole relationships. Consider, for example, a
face with its eyes, nose, and mouth, and a wall of bricks. Both are visual objects—wholes—but the eyes,
nose and mouth of a face are its component parts, whereas the bricks in the wall are mere constitu-
ents. Furthermore, there are weak or strong wholes, mere aggregation of elements or configuration
that preempt the components (see Rock 1986). To complicate things even further (or rather, shed
some light), a distinction has been made between global versus local in terms of relative size and
levels of representation in a hierarchical structure and between holistic/configural versus simple/
component properties (Kimchi 1992, 1994). It is likely, therefore, that global precedence charac-
terizes the course of processing of some wholes but not of others, and that the processing of some
wholes but not of others is dominated by holistic properties; it is also the case that the processing of
some wholes (e.g., faces) is characterized by the integrality of configural and component properties.
On a final note, it is appropriate to comment on holistic dominance and the logical relations
between parts and wholes, or between components and configurations. Components can exist with-
out a global configuration, but a configuration cannot exist without components. Therefore, compo-
nents are logically prior to the configuration of which they are part. Similarly, if holistic/configural
properties do not reside in the component properties but rather emerge from the interrelations
among components, then logic dictates the priority of the components. Holistic dominance is also
not easily reconciled with the classical view of visual hierarchy in the spirit of Hubel and Wiesel
(1968; Maunsell and Newsome 1987). However, the logical structure of the stimulus does not neces-
sarily predict processing consequences at all levels of processing (Garner 1983; Kimchi 1992; Kimchi
and Palmer 1985), and the anatomical, structural aspects of the hierarchy of the visual system can be
distinguished from the temporal, functional aspects of it, taking into account the extended connections within cortical areas and the massive feedback pathways (e.g., Maunsell and Van Essen 1983). It is
possible, for example, as suggested by Hochstein and Ahissar’s (2002) reverse hierarchy theory, that
implicit, nonconscious, fast perceptual processing proceeds from components to configurations,
144 Kimchi

whereas conscious, top-down, task-driven attentional processing begins with configurations and
then descends to components/local details if required by the task.

Acknowledgments
Preparation of this chapter was supported by the Max Wertheimer Minerva Center for Cognitive
Processes and Human Performance, University of Haifa.
Correspondence should be sent to Ruth Kimchi, Department of Psychology, University of
Haifa, Haifa 3498838, Israel; email: rkimchi@research.haifa.ac.il.

References
Amirkhiabani, G. and Lovegrove, W. J. (1999). Do the global advantage and interference effects covary?
Perception & Psychophysics 61(7): 1308–19.
Amishav, R. and Kimchi, R. (2010). Perceptual integrality of componential and configural information in
face processing. Psychonomic Bulletin & Review 17(5): 743–48.
Andres, A. J. D. and Fernandes, M. A. (2006). Effect of short and long exposure duration and dual-tasking
on a global-local task. Acta Psychologica 122(3): 247–66.
Asch, S. E. (1962). A problem in the theory of associations. Psychologische Beiträge 6: 553–63.
Bachevalier, J., Hagger, C., and Mishkin, M. (1991). In N. A. Lassen, D. H. Ingvar, M. E. Raichle, and
L. Friberg (eds.), Brain work and mental activity, Vol. 31, pp. 231–40. Copenhagen: Munksgaard.
Badcock, C. J., Whitworth, F. A., Badcock, D. R., and Lovegrove, W. J. (1990). Low-frequency filtering and
processing of local-global stimuli. Perception 19: 617–29.
Bartlett, J. C. and Searcy, J. (1993). Inversion and configuration of faces. Cognitive Psychology 25(3): 281–316.
Behrmann, M. and Kimchi, R. (2003). What does visual agnosia tell us about perceptual organization
and its relationship to object perception? Journal of Experimental Psychology-Human Perception and
Performance 29(1): 19–42.
Beller, H. K. (1971). Priming: effects of advance information on matching. Journal of Experimental
Psychology 87: 176–82.
Boer, L. C. and Keuss, P. J. G. (1982). Global precedence as a postperceptual effect: An analysis of
speed-accuracy tradeoff functions. Perception & Psychophysics 13: 358–66.
Broadbent, D. E. (1977). The hidden preattentive process. American Psychologist 32(2): 109–18.
Burack, J. A., Enns, J. T., Iarocci, G., and Randolph, B. (2000). Age differences in visual search for
compound patterns: Long- versus short-range grouping. Developmental Psychology 36(6): 731–40.
Cabeza, R. and Kato, T. (2000). Features are also important: Contributions of featural and configural
processing to face recognition. Psychological Science 11(5): 429–33.
Delis, D. C., Robertson, L. C., and Efron, R. (1986). Hemispheric specialization of memory for visual
hierarchical stimuli. Neuropsychologia 24(2): 205–14.
Diamond, R. and Carey, S. (1986). Why faces are and are not special: An effect of expertise. Journal of
Experimental Psychology: General 115(2): 107–17.
Dukette, D. and Stiles, J. (1996). Children’s analysis of hierarchical patterns: Evidence from a similarity
judgment task. Journal of Experimental Child Psychology 63: 103–40.
Dukette, D., and Stiles, J. (2001). The effects of stimulus density on children’s analysis of hierarchical
patterns. Developmental Science 4(2): 233–51.
Enns, J. T. and Kingstone, A. (1995). Access to global and local properties in visual search for compound
stimuli. Psychological Science 6(5): 283–91.
Enns, J. T., Burack, J. A., Iarocci, G., and Randolph, B. (2000). The orthogenetic principle in the perception
of “forests” and “trees”? Journal of Adult Development 7(1): 41–8.
The Perception of Hierarchical Structure 145

Fink, G. R., Halligan, P. W., Marshall, J. C., Frith, C. D., Frackowiak, R. S. J., and Dolan, R. J. (1997).
Neural mechanisms involved in the processing of global and local aspects of hierarchically organized
visual stimuli. Brain 120: 1779–91.
Freeseman, L. J., Colombo, J., and Coldren, J. T. (1993). Individual differences in infant visual
attention: Four-month-olds’ discrimination and generalization of global and local stimulus properties.
Child Development 64(4): 1191–203.
Freire, A., Lee, K., and Symons, L. A. (2000). The face-inversion effect as a deficit in the encoding of
configural information: direct evidence. Perception 29(2): 159–70.
Frick, J. E., Colombo, J., and Allen, J. R. (2000). Temporal sequence of global-local processing in
3-month-old infants. Infancy 1(3): 375–86.
Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
Garner, W. R. (1978). Aspects of a stimulus: Features, dimensions, and configurations. In E. Rosch and
B. B. Lloyd (eds.), Cognition and categorization, pp. 99–133. Hillsdale, NJ: Erlbaum.
Garner, W. R. (1983). Asymmetric interactions of stimulus dimensions in perceptual information
processing. In T. J. Tighe and B. E. Shepp (eds.), Perception, cognition, and development: Interactional
analysis, pp. 1–37. Hillsdale, NJ: Erlbaum.
Ghim, H. R. and Eimas, P. D. (1988). Global and local processing by 3- and 4-month-old infants. Perception
& Psychophysics 43(2): 165–71.
Gogtay, N., Giedd, J. N., Lusk, L., Hayashi, K. M., Greenstein, D., Vaituzis, A. C. et al. (2004). Dynamic
mapping of human cortical development during childhood through early adulthood. Proceedings of the
National Academy of Sciences of the United States of America 101(21): 8174–9.
Grice, G. R., Canham, L., and Boroughs, J. M. (1983). Forest before trees? It depends where you look.
Perception & Psychophysics 33(2): 121–8.
Hadad, B., and Kimchi, R. (2008). Time course of grouping of shape by perceptual closure: Effects of spatial
proximity and collinearity. Perception & Psychophysics 70: 818–27.
Han, S. and Humphreys, G. W. (1999). Interactions between perceptual organization based on Gestalt laws
and those based on hierarchical processing. Perception & Psychophysics 61(7): 1287–98.
Han, S. and Humphreys, G. W. (2002). Segmentation and selection contribute to local processing in hierarchical
analysis. The Quarterly Journal of Experimental Psychology: A, Human Experimental Psychology 55(1): 5–21.
Han, S., Fan, S., Chen, L., and Zhuo, Y. (1997). On the different processing of wholes and
parts: A psychophysiological analysis. Journal of Cognitive Neuroscience 9: 687–98.
Han, S., Humphreys, G. W., and Chen, L. (1999). Parallel and competitive processes in hierarchical
analysis: Perceptual grouping and encoding of closure. Journal of Experimental Psychology: Human
Perception and Performance 25(5): 1411–32.
Han, S., Weaver, J. A., Murray, S. O., Kang, X., Yund, E. W., and Woods, D. L. (2002). Hemispheric
asymmetry in global/local processing: effects of stimulus position and spatial frequency. Neuroimage
17(3): 1290–9.
Harris, A. and Nakayama, K. (2008). Rapid adaptation of the m170 response: importance of face parts.
Cerebral Cortex 18(2): 467–76.
Harrison, T. B. and Stiles, J. (2009). Hierarchical forms processing in adults and children. Journal of
Experimental Child Psychology 103(2): 222–40.
Hochstein, S. and Ahissar, M. (2002). View from the top: hierarchies and reverse hierarchies in the visual
system. Neuron 36(5): 791–804.
Hubel, D. H. and Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate
cortex. Journal of Physiology 195: 215–43.
Hübner, R. and Volberg, G. (2005). The integration of object levels and their content: A theory of global/local
processing and related hemispheric differences. Journal of Experimental Psychology: Human
Perception and Performance 31(3): 520–41.
Hughes, H. C., Fendrich, R., and Reuter-Lorenz, P. (1990). Global versus local processing in the absence of
low spatial frequencies. Journal of Cognitive Neuroscience 2: 272–82.
Ivry, R. and Robertson, L. C. (1998). The two sides of perception. Cambridge, MA: MIT Press.
Kimchi, R. (1988). Selective attention to global and local levels in the comparison of hierarchical patterns.
Perception & Psychophysics 43(2): 189–98.
Kimchi, R. (1990). Children’s perceptual organisation of hierarchical visual patterns. European Journal of
Cognitive Psychology 2(2): 133–49.
Kimchi, R. (1992). Primacy of wholistic processing and global/local paradigm: A critical review.
Psychological Bulletin 112(1): 24–38.
Kimchi, R. (1994). The role of wholistic/configural properties versus global properties in visual form
perception. Perception 23(5): 489–504.
Kimchi, R. (1998). Uniform connectedness and grouping in the perceptual organization of hierarchical
patterns. Journal of Experimental Psychology: Human Perception and Performance 24(4): 1105–18.
Kimchi, R. (2000). The perceptual organization of visual objects: a microgenetic analysis. Vision Research
40(10–12): 1333–47.
Kimchi, R. (2003a). Relative dominance of holistic and component properties in the perceptual
organization of visual objects. In M. A. Peterson and G. Rhodes (eds.), Perception of faces, objects, and
scenes: Analytic and holistic processes, pp. 235–63. New York, NY: Oxford University Press.
Kimchi, R. (2003b). Visual perceptual organization: A microgenetic analysis. In R. Kimchi, M. Behrmann,
and C. R. Olson (eds.), Perceptual organization in vision: Behavioral and neural perspectives, pp. 117–54.
Mahwah, NJ: Lawrence Erlbaum Associates Publishers.
Kimchi, R. (2012). Ontogenesis and microgenesis of visual perceptual organization. In J. A. Burack, J. T.
Enns, and N. A. Fox (eds.), Cognitive Neuroscience, Development, and Psychopathology, pp. 101–31.
New York: Oxford University Press.
Kimchi, R. and Bloch, B. (1998). Dominance of configural properties in visual form perception.
Psychonomic Bulletin & Review 5(1): 135–9.
Kimchi, R. and Merhav, I. (1991). Hemispheric processing of global form, local form, and texture. Acta
Psychologica 76(2): 133–47.
Kimchi, R. and Palmer, S. E. (1982). Form and texture in hierarchically constructed patterns. Journal of
Experimental Psychology: Human Perception and Performance 8(4): 521–35.
Kimchi, R. and Palmer, S. E. (1985). Separability and integrality of global and local levels of hierarchical
patterns. Journal of Experimental Psychology: Human Perception and Performance 11(6): 673–88.
Kimchi, R., Hadad, B., Behrmann, M., and Palmer, S. E. (2005). Microgenesis and ontogenesis
of perceptual organization: Evidence from global and local processing of hierarchical patterns.
Psychological Science 16(4): 282–90.
Kinchla, R. A. (1974). Detecting target elements in multi-element arrays: A confusability model. Perception
& Psychophysics 15: 149–158.
Kinchla, R. A. (1977). The role of structural redundancy in the perception of visual targets. Perception &
Psychophysics 22: 19–30.
Kinchla, R. A., Macias, S.-V., and Hoffman, J. E. (1983). Attending to different levels of structure in a visual
image. Perception & Psychophysics 33: 1–10.
Kinchla, R. A. and Wolfe, J. M. (1979). The order of visual processing: “Top-down,” “bottom-up,” or
“middle-out”. Perception & Psychophysics 25(3): 225–31.
Köhler, W. (1930/1971). Human perception (M. Henle, trans.). In M. Henle (ed.), The selected papers of
Wolfgang Köhler, pp. 142–67. New York: Liveright.
Koivisto, M. and Revonsuo, A. (2004). Preconscious analysis of global structure: Evidence from masked
priming. Visual Cognition 11(1): 105–27.
LaGasse, L. L. (1994). Effects of good form and spatial frequency on global precedence. Perception &
Psychophysics 53: 89–105.
Lamb, M. R. and Robertson, L. (1988). The processing of hierarchical stimuli: Effects of retinal locus,
location uncertainty, and stimulus identity. Perception & Psychophysics 44: 172–81.
Lamb, M. R. and Robertson, L. C. (1990). The effect of visual angle on global and local reaction times
depends on the set of visual angles presented. Perception & Psychophysics 47(5): 489–96.
Lamb, M. R., Pond, H. M., and Zahir, G. (2000). Contributions of automatic and controlled processes
to the analysis of hierarchical structure. Journal of Experimental Psychology: Human Perception and
Performance 26(1): 234–45.
Lasaga, M. I. (1989). Gestalts and their components: Nature of information-precedence. In
B. E. Shepp and S. Ballesteros (eds.), Object perception: Structure and process, pp. 165–202. Hillsdale, NJ: Erlbaum.
Lasaga, M. I. and Garner, W. R. (1983). Effect of line orientation on various information-processing tasks.
Journal of Experimental Psychology: Human Perception and Performance 9(2): 215–25.
Leder, H. and Bruce, V. (2000). When inverted faces are recognized: The role of configural information
in face recognition. Quarterly Journal of Experimental Psychology: Human Experimental Psychology
53A(2): 513–36.
Leeuwenberg, E. and Van der Helm, P. (1991). Unity and variety in visual form. Perception
20(5): 595–622.
Leeuwenberg, E. and Van der Helm, P. A. (2013). Structural Information Theory. Cambridge: Cambridge
University Press.
Luna, D. (1993). Effects of exposure duration and eccentricity of global and local information on processing
dominance. European Journal of Cognitive Psychology 5(2): 183–200.
Luna, D., Merino, J. M., and Marcos-Ruiz, R. (1990). Processing dominance of global and local information
in visual patterns. Acta Psychologica 73(2): 131–43.
Martin, M. (1979). Local and global processing: the role of sparsity. Memory and Cognition 7: 476–84.
Maunsell, J. H. R. and Van Essen, D. C. (1983). The connections of the middle temporal visual area and their
relationship to a cortical hierarchy in macaque monkey. Journal of Neuroscience 3: 2563–86.
Maunsell, J. H. R. and Newsome, W. T. (1987). Visual processing in monkey extrastriate cortex. Annual
Review of Neuroscience 10: 363–401.
Maurer, D., Le Grand, R., and Mondloch, C. J. (2002). The many faces of configural processing. Trends in
Cognitive Sciences 6(6): 255–60.
Mevorach, C., Humphreys, G. W., and Shalev, L. (2006a). Effects of saliency, not global dominance, in
patients with left parietal damage. Neuropsychologia 44(2): 307–319.
Mevorach, C., Humphreys, G. W., and Shalev, L. (2006b). Opposite biases in salience-based selection for
the left and right posterior parietal cortex. Nature Neuroscience 9(6): 740–2.
Miller, J. (1981a). Global precedence in attention and decision. Journal of Experimental Psychology: Human
Perception and Performance 7: 1161–74.
Miller, J. (1981b). Global precedence: Information availability or use? Reply to Navon. Journal of
Experimental Psychology: Human Perception and Performance 7: 1183–5.
Miller, J. and Navon, D. (2002). Global precedence and response activation: evidence from LRPs. The
Quarterly Journal of Experimental Psychology: A, Human Experimental Psychology 55(1): 289–310.
Mondloch, C. J., Geldart, S., Maurer, D., and de Schonen, S. (2003). Developmental changes in the
processing of hierarchical shapes continue into adolescence. Journal of Experimental Child Psychology
84: 20–40.
Murray, J. E., Yong, E., and Rhodes, G. (2000). Revisiting the perception of upside-down faces.
Psychological Science 11(6): 492–6.
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive
Psychology 9: 353–83.
Navon, D. (1981). The forest revisited: More on global precedence. Psychological Research 43: 1–32.
Navon, D. (1991). Testing a queue hypothesis for the processing of global and local information. Journal of
Experimental Psychology: General 120: 173–89.
Navon, D. (2003). What does a compound letter tell the psychologist’s mind? Acta Psychologica 114(3): 273–309.
Navon, D. and Norman, J. (1983). Does global precedence really depend on visual angle? Journal of
Experimental Psychology: Human Perception and Performance 9: 955–65.
Palmer, S. E. (1977). Hierarchical structure in perceptual representation. Cognitive Psychology 9: 441–74.
Paquet, L. (1999). Global dominance outside the focus of attention. Quarterly Journal of Experimental
Psychology: Human Experimental Psychology 52(2): 465–85.
Paquet, L. and Merikle, P. (1984). Global precedence: The effect of exposure duration. Canadian Journal of
Psychology 38: 45–53.
Paquet, L. and Merikle, P. (1988). Global precedence in attended and nonattended objects. Journal of
Experimental Psychology: Human Perception and Performance 14(1): 89–100.
Poirel, N., Pineau, A., and Mellet, E. (2006). Implicit identification of irrelevant local objects interacts with
global/local processing of hierarchical stimuli. Acta Psychologica 122(3): 321–36.
Poirel, N., Mellet, E., Houde, O., and Pineau, A. (2008). First came the trees, then the forest: developmental
changes during childhood in the processing of visual local-global patterns according to the
meaningfulness of the stimuli. Developmental Psychology 44(1): 245–53.
Pomerantz, J. R. (1981). Perceptual organization in information processing. In J. R. Pomerantz and
M. Kubovy (eds.), Perceptual Organization, pp. 141–80. Hillsdale, NJ: Lawrence Erlbaum Associates.
Pomerantz, J. R. (1983). Global and local precedence: Selective attention in form and motion perception.
Journal of Experimental Psychology: General 112(4): 516–40.
Pomerantz, J. R. and Pristach, E. A. (1989). Emergent features, attention, and perceptual glue in visual
form perception. Journal of Experimental Psychology: Human Perception and Performance 15: 635–49.
Pomerantz, J. R., Sager, L. C., and Stoever, R. J. (1977). Perception of wholes and of their component
parts: Some configural superiority effects. Journal of Experimental Psychology: Human Perception and
Performance 3(3): 422–35.
Porporino, M., Shore, D. I., Iarocci, G., and Burack, J. A. (2004). A developmental change in selective
attention and global form perception. International Journal of Behavioral Development 28: 358–64.
Quinn, P. C. and Eimas, P. D. (1986). Pattern-line effects and units of visual processing in infants. Infant
Behavior and Development 9(1): 57–70.
Quinn, P. C., Burke, S., and Rush, A. (1993). Part-whole perception in early infancy: Evidence for
perceptual grouping produced by lightness similarity. Infant Behavior and Development 16(1): 19–42.
Razpurker-Apfeld, I. and Kimchi, R. (2007). The time course of perceptual grouping: The role of
segregation and shape formation. Perception & Psychophysics 69(5): 732–43.
Rensink, R. A. and Enns, J. T. (1995). Preemption effects in visual search: evidence for low-level grouping.
Psychological Review 102: 101–30.
Robertson, L. C. (1996). Attentional persistence for features of hierarchical patterns. Journal of
Experimental Psychology: General 125(3): 227–49.
Robertson, L. C. and Ivry, R. (2000). Hemispheric asymmetries: Attention to visual and auditory primitives.
Current Directions in Psychological Science 9(2): 59–64.
Robertson, L. C., Lamb, M. R., and Zaidel, E. (1993). Interhemispheric relations in processing hierarchical
patterns: Evidence from normal and commissurotomized subjects. Neuropsychology 7(3): 325–42.
Rock, I. (1986). The description and analysis of object and event perception. In K. R. Boff, L. Kaufman and
J. P. Thomas (eds.), Handbook of perception and human performance, Vol. 33, pp. 1–71. New York: Wiley.
Scherf, K. S., Behrmann, M., Kimchi, R., and Luna, B. (2009). Emergence of global shape processing
continues through adolescence. Child Development 80(1): 162–77.
Schwarzer, G. and Massaro, D. W. (2001). Modeling face identification processing in children and adults.
Journal of Experimental Child Psychology 79(2): 139–61.
Sebrechts, M. M. and Fragala, J. J. (1985). Variation on parts and wholes: Information precedence vs. global
precedence. Proceedings of the Seventh Annual Conference of the Cognitive Science Society, pp. 11–18.
Sekuler, A. B. and Palmer, S. E. (1992). Perception of partly occluded objects: A microgenetic analysis.
Journal of Experimental Psychology: General 121(1): 95–111.
Shulman, G. L., Sullivan, M. A., Gish, K., and Sakoda, W. J. (1986). The role of spatial-frequency channels
in the perception of local and global structure. Perception 15: 259–73.
Shulman, G. L. and Wilson, J. (1987). Spatial frequency and selective attention to local and global
information. Neuropsychologia 18: 89–101.
Wagemans, J. (1995). Detection of visual symmetries. Spatial Vision 9(1): 9–32.
Wagemans, J. (1997). Characteristics and models of human symmetry detection. Trends in Cognitive
Sciences 1(9): 346–52.
Ward, L. M. (1982). Determinants of attention to local and global features of visual forms. Journal of
Experimental Psychology: Human Perception and Performance 8: 562–81.
Weissman, D. H. and Woldorff, M. G. (2005). Hemispheric asymmetries for different components of
global/local attention occur in distinct temporo-parietal loci. Cerebral Cortex 15(6): 870–6.
Wertheimer, M. (1923/1938). Laws of organization in perceptual forms. In W. D. Ellis (ed.), A source book of
Gestalt psychology, pp. 71–88. London: Routledge and Kegan Paul.
Yovel, G., Yovel, I., and Levy, J. (2001). Hemispheric asymmetries for global and local visual
perception: Effects of stimulus and task factors. Journal of Experimental Psychology: Human Perception
and Performance 27(6): 1369–85.
Chapter 8

Seeing statistical regularities


Steven Dakin

Introduction: seeing statistics
The human visual system has evolved to guide behaviour effectively within complex natural visual
environments. To achieve this goal, the brain must rapidly distil a massive amount of sensory
data into a compact representation that captures important image structure (Marr 1982). Natural
images are particularly rich, in part because the surfaces that populate them are often covered in
markings or texture. This texture can be richly informative, for example about material composi-
tion (Kass and Witkin 1985), but is intrinsically complex since textures are by their nature com-
posed of a large number of individual features. One way the visual system produces a compact
description of complex textures is to exploit redundancy (i.e. the fact that any one image-patch tends to resemble other patches of the same image) by characterizing attributes of the features making up the
texture (such as orientation) in terms of local statistical properties (e.g. mean orientation). Indeed,
a useful operational definition of ‘visual texture’ is any image for which a statistical representation
is appropriate. To put it another way, texture is less about the image itself and more about the quality of the statistic that can be computed from it (in the context of the task at hand).
Statistics are a sufficient representation of natural texture in the sense that one can synthesize
realistic texture based on statistical descriptions of image features derived from histograms of, for
example, grey levels, local orientation, and spatial frequency structure (Figure 8.1a; Portilla and
Simoncelli 1999). Since they exploit redundancy, these schemes work well on uniform regions
of texture. However, changes in statistics over space also inform our interpretation of natural
scenes. Figure 8.1b is defined by a continuous variation in the average orientation/size and in
the range of orientation/sizes present in the texture. The vivid impression of surface tilt and slant
generated by this image is consistent with the visual system assuming that surface texture is iso-
tropic (i.e. all orientations are equally likely) so that changes in the mean and variance of orien-
tation structure must arise from underlying changes in surface tilt and slant respectively (Malik
and Rosenholtz 1994; Witkin 1981). Furthermore, there is evidence that these statistics drive
a general and active reconstruction process that is used to resolve uncertainty about the local
structure of complex scenes. Texture statistics influence the appearance of elements rendered
uncertain either by visual crowding (Parkes et al. 2001) or by recall within a visual memory task
(Brady and Alvarez 2011).
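The rank-ordering logic behind such statistical synthesis can be sketched in a few lines of Python. This is an illustrative toy that matches only the first-order grey-level histogram (the full Portilla and Simoncelli procedure additionally matches joint wavelet statistics), and the ‘leaf’ grey levels below are invented.

```python
import random

def match_histogram(noise, target):
    # Rank-order matching: the k-th darkest noise pixel takes the k-th
    # darkest target grey level, so the output has exactly the target's
    # grey-level histogram but the spatial layout of the noise.
    order = sorted(range(len(noise)), key=noise.__getitem__)
    levels = sorted(target)
    out = [0.0] * len(noise)
    for rank, idx in enumerate(order):
        out[idx] = levels[rank]
    return out

random.seed(1)
noise = [random.random() for _ in range(8)]        # random pixel-noise
leaves = [0.1, 0.1, 0.2, 0.4, 0.5, 0.7, 0.9, 0.9]  # invented 'leaf' greys
synth = match_histogram(noise, leaves)
print(sorted(synth) == sorted(leaves))  # True: histograms now identical
```

Because the output is just a reassignment of the target’s grey levels, its histogram is exactly that of the target while its spatial arrangement is inherited from the noise.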
For the visual system to make accurate statistical descriptions it must combine information
across space and/or time, and in this chapter I focus exclusively on this integration process. This
contrasts with the traditional view of texture perception that emphasizes its role in the segmen-
tation (Rosenholtz chapter) of the distinct surfaces that populate scenes, i.e. in the signalling of
discontinuity—rather than continuity—of feature properties across space.
Note that there is some confusion in the literature over ‘order’ of texture statistics. Bela Julesz
proposed that humans use so-called first- and second-order statistics to capture differences in
texture, i.e. to achieve texture segmentation. According to this terminology, ‘first-order’ refers to

Fig. 8.1  Statistics convey the (a) appearance and (b) shape of texture. (a) Although this image
appears to be entirely natural, with scrutiny one can see that only the top half shows real leaves. The
lower half started its life as random pixel-noise that had statistical properties of the leaves imposed
upon it (Portilla and Simoncelli 1999). While statistical representations capture important properties
of texture, changes in those statistics are also informative. For example, (b) shows a gradient
defined by simultaneous changes in the mean and variance of both the size of elements and their
orientation. Notice how changes in these statistics convey a vivid sense of surface shape.

all grey-level (i.e. measured from single pixels) statistics and ‘second-order’ refers to all statis-
tics of dipoles (pixel-pairs; Julesz 1981; Julesz et al. 1973). In this chapter, I use ‘order’ in the
more conventional sense, i.e. the order of a histogram statistic where variance (for example) is
a second-order statistic because it is computed on the square of the raw data. Thus, statistics
of varying order can be computed on different image features such as ‘pixel luminance’ or ‘disc
size’, and here I will consider statistical representations on a ‘feature-by-feature’ basis. Such an
approach makes the implicit assumption that these features are appropriate ‘basis functions’ for
further visual processing (see Feldman chapter on probabilistic features). For example, consider
Figure 8.2b showing a texture composed of a ramp controlling the range of grey levels present.
While this information is captured by second-order luminance statistics, it is also captured by
the first-order contrast statistics. Indeed, this is a more meaningful characterization of the struc-
ture in that it is contrast and not luminance that is the currency of visually driven responses in
the primate cortex. More specifically, such a texture will lead to a change in the mean response
(a first-order statistic) of a bank of Gabor filters, which (like V1 neurons) are tuned for contrast
and not luminance. This point is made by Kingdom, Hayes, and Field (2001) who argue that a
basis set of spatial-frequency/orientation band-pass Gabor filters (Daugman 1985) is appropri-
ate because Gabors are not only a reasonable model of receptive field organization in V1 but can
also generate an efficient/sparse code for natural image structure (Olshausen and Field 2005). I
will follow this approach and comment on the appropriateness of a basis function (size, orienta-
tion, etc.) with respect to either specific neural mechanism or the standard Gabor model of V1
receptive fields. Finally note that discrimination of the spatial structure of the pattern in Figure
8.2b cannot be achieved by pooling filter-responses across the whole pattern (which, for example,
could not distinguish a horizontal from a vertical gradient). Instead what is required is integration
across space by mechanisms tuned to (confusingly) the ‘second-order’ (here contrast-defined) spa-
tial structure. Such mechanisms are linked to texture segmentation and are considered in depth
elsewhere (Rosenholtz chapter).
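To make this conventional sense of ‘order’ concrete, the sketch below computes first- to fourth-order histogram statistics of a feature sample in plain Python. The grey levels are invented, and the same recipe applies whatever the basis function happens to be (pixel luminance, local contrast, or Gabor-derived orientation).

```python
import math

def moments(values):
    # Order-n statistics are built from n-th powers of deviations from the
    # mean: mean (1st), standard deviation (2nd), skewness (3rd), kurtosis (4th).
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    var = sum(d * d for d in dev) / n
    sd = math.sqrt(var)
    skew = sum(d ** 3 for d in dev) / (n * sd ** 3)
    kurt = sum(d ** 4 for d in dev) / (n * var ** 2)  # equals 3 for a Gaussian
    return mean, sd, skew, kurt

greys = [0.2, 0.4, 0.5, 0.5, 0.6, 0.8]  # invented grey-level sample
mean, sd, skew, kurt = moments(greys)
print(round(mean, 3), round(sd, 3), round(skew, 3), round(kurt, 3))
```

A symmetric sample such as this one has zero skewness; shifting mass into one tail changes the third-order statistic while leaving the mean untouched.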
Fig. 8.2  Noise textures made up of vertical ‘slices’ varying in (a) first- (b) second-, (c) third- and
(d) fourth-order grey-level statistics. Probability density functions for three ‘slices’ through the
image are given to the right of each texture, with curve-colour coding the slice they correspond to.
Probability density functions are Pearson type VII distributions, which allow one to independently
manipulate these statistical moments (http://en.wikipedia.org/wiki/Kurtosis#The_Pearson_type_
VII_family). Note that the normal distribution (a, b, and green curves in c, d) is a special case of this
distribution.

Luminance statistics
Figure 8.2 shows four textures containing left-to-right variation in their (a) first- to (d) fourth-order luminance (L) statistics. Bauer (2009) reports that elements contribute to average perceived luminance (or brightness) in proportion to their own perceived brightness, i.e. a power law L^0.33 (Stevens 1961). However, Nam and Chubb (2000) have reported that humans are near veridical
at judging the brightness of textures containing variation in luminance, with elements (broadly)
contributing in proportion to their luminance. Furthermore, Nam and Chubb (2000) acknowl-
edge that while much of their data are well fit by a power function, this tends to over- and under-
emphasize the role of the highest and lowest luminance respectively.
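The two accounts can be contrasted as generalized (power-law) means that differ only in their exponent. The sketch below is illustrative only: the luminance values are invented, and the original studies used psychophysical matching rather than direct computation.

```python
def average_brightness(luminances, exponent=0.33):
    # Generalized mean: each element contributes via L**exponent.
    # exponent = 0.33 models brightness-weighted averaging in the spirit of
    # Bauer (2009) and Stevens' power law; exponent = 1.0 gives the
    # near-veridical, luminance-proportional averaging of Nam and Chubb (2000).
    n = len(luminances)
    return (sum(l ** exponent for l in luminances) / n) ** (1 / exponent)

patch = [0.1, 0.9]  # one dark and one bright element (invented values)
print(average_brightness(patch, 1.0))   # linear mean: 0.5
print(average_brightness(patch))        # compressive mean, below 0.5
```

With the compressive exponent the dark element is weighted relatively more heavily, so the power-law average falls below the linear mean.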
Different image statistics have been proposed to capture our sensitivity to the range of lumi-
nances present (contrast; Figure 8.2b), but a good predictor of perceived contrast in complex
images remains the standard deviation of grey levels (Bex and Makous 2002; Moulden, Kingdom,
and Gatley 1990). It should be evident from Figure 8.2 that the most salient changes in these
noise textures are carried by the first- and second-order luminance statistics. However, Chubb
et al. (2007) showed that observers’ sensitivity to modulation of grey levels is determined by ‘tex-
ture filters’ with sensitivity not only to mean grey level and contrast, but also to a specific type of
grey-level skewness: the presence of dark elements embedded in light backgrounds which they call
‘blackshot’ (Chubb, Econopouly, and Landy 1994). Sensitivity to such skewness cannot be mediated by simple contrast-gain control1 since the responses of neurons in the lateral geniculate nucleus (LGN) of the cat are wholly determined by first- and second-order statistics and ignore manipulations of luminance skew and kurtosis (Figure 8.2c, d; Bonin, Mante, and Carandini 2006). Motoyoshi

1  Processes regulating neural responsivity (gain) as a function of prevailing local contrast and thought to max-
imise information transmission in the visual pathway.
Seeing Statistical Regularities 153

et al. (2007) have suggested that grey-level skewness yields information about surface gloss, with
positive skew (left part of Figure 8.2c) being associated with darker and more glossy surfaces
than skew in the opposite direction (right part of Figure 8.2c). However, it has been argued that
specular reflections (that are largely responsible for skewness differences in natural scenes) have
to be appropriately located with respect to underlying surface structure in order for a percept of
gloss to arise (Anderson and Kim 2009; Kim and Anderson 2010). This suggests that perception
of material properties cannot be achieved in the absence of a structural scene analysis. The lack of
any perceptible gloss in Figure 8.2c is consistent with the latter view.
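The grey-level histogram statistics discussed in this section can be computed directly from pixel values. A stdlib-only sketch of the first four moments, with the standard deviation serving as the RMS-contrast predictor and negative skew corresponding to the dark-on-light ‘blackshot’ case:

```python
import math

def grey_level_moments(pixels):
    """First four grey-level statistics of a patch: mean, standard deviation
    (the RMS-contrast predictor), skewness, and excess kurtosis."""
    n = len(pixels)
    mean = sum(pixels) / n
    devs = [p - mean for p in pixels]
    var = sum(d * d for d in devs) / n
    sd = math.sqrt(var)
    skew = sum(d ** 3 for d in devs) / (n * sd ** 3)
    kurt = sum(d ** 4 for d in devs) / (n * sd ** 4) - 3.0   # excess kurtosis
    return mean, sd, skew, kurt

# A mostly light patch with a few dark pixels is negatively skewed
# (dark elements on a light background, as in 'blackshot' textures).
patch = [0.9] * 8 + [0.1] * 2
mean, sd, skew, kurt = grey_level_moments(patch)
```

For this toy patch the skewness is exactly −1.5: the third moment isolates the dark-tail asymmetry that mean and contrast alone cannot capture.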
Kingdom et al. (2001) studied sensitivity to changes in contrast histogram statistics (variance,
skew, and kurtosis) by manipulating the contrast, phase, and density of Gabor elements mak-
ing up their textures. They report that a model observer using the distribution of wavelet/filter
responses does a better job of accounting for human discrimination than raw pixel distributions.

Orientation statistics
In terms of spatial vision, orientation is a critical visual attribute that is made explicit at the earli-
est stages of representation in V1, the primary visual cortex (Hubel and Wiesel 1962). That orien-
tation is a property of a Gabor filter supports it being considered a reasonable basis function for
studying human perception of texture statistics (Kingdom et al. 2001). Furthermore, orientation
is known to be encoded in cortex using a distributed or population code, so that there are natu-
ral comparisons to be made between human coding of orientation statistics and computational
models of orientation coding across neural populations (e.g. Deneve, Latham, and Pouget 1999).
Miller and Sheldon (1969) used magnitude estimation to show that observers could accurately
and precisely judge the average orientation of six lines spanning 20°, with each element con-
tributing in proportion to its physical orientation. Dakin and Watt (1997) had observers classify whether the mean orientation of a spatially unstructured field of elements with orientations drawn from a Gaussian distribution (e.g. Figure 8.3a, b) was clockwise or anti-clockwise of vertical. For elements
with a standard deviation of 6° observers could judge if the mean orientation was clockwise or
anti-clockwise of vertical as precisely as they could for a sine-wave grating (which contains neg-
ligible variation in orientation2). Using textures composed of two populations of elements with
different means, Dakin and Watt (1997) also showed that observers rely on the mean, and not on,
for example, the mode, to represent global orientation, and that observers can discern changes
in the second-order statistics (orientation variance or standard deviation—s.d.) of a texture but
not in a third-order statistic (orientation skew). Morgan, Chubb, and Solomon (2008) went on to
show that discrimination of changes in orientation s.d. as a function of baseline (‘pedestal’) ori-
entation s.d. follows a dipper-shaped function, i.e. best discrimination arises around a low—but
demonstrably non-zero—level of orientation s.d. Such a pattern of results arises naturally from
an observer basing their judgements on a second-order statistic computed over orientation estimates corrupted by internal noise. However, Morgan et al. found that two-thirds of their observers showed more facilitation3 than predicted by the intrinsic-noise model. They speculate that this could arise from a threshold non-linearity in the transduction of orientation variability

2  The range of orientations present in a sine-wave grating (its orientation bandwidth) depends only on the size of the aperture the grating is presented within. In the limit, a grating of infinite size contains only one orientation. For the multi-element textures used in the averaging experiment, orientation bandwidth results from a complex interaction of element-size, element-orientation and arrangement.
3  The extent to which performance improves in the presence of a low-variance pedestal.
154 Dakin

(e.g. as it does for blur), which would serve to reduce the visibility of intrinsic noise/uncertainty
and ‘regularize’ the appearance of arrays of oriented elements.
Such orientation statistics provide information that may support other visual tasks. Orientation
variance provides an index of organization that predicts human performance on structure-vs-
noise tasks (Dakin 1999) and can be used as a criterion for selecting filter size for texture process-
ing (Dakin 1997). Baldassi and Burr (2000) presented evidence that texture-orientation statistics
support orientation ‘pop-out’. They showed that observers presented with an array of noisy ori-
ented elements containing a single ‘orientation outlier’ could identify the tilt of the target ele-
ment even when they couldn’t say which element was the target. Furthermore, target orientation
thresholds show a square-root dependency on the number of distractors present, suggesting that
the cue used was the result of averaging target and distractor information. Observers’ ability to
report the orientation of a single element presented in the periphery, and surrounded by distrac-
tors, depends on feature spacing. When target and flanker are too closely spaced visual crowd-
ing arises—a phenomenon whereby observers can see that a target is present but lose detailed
information about its identity (Levi 2008). Using orientation-pop-out stimuli Parkes et al. (2001)
showed that under crowded conditions observers were still able to report the average orientation
(suggesting that target information was not lost but had been combined with the flankers) and
that orientation averaging does not require resolution of the individual components of the texture.
Collectively, these findings suggest that some simple global statistics computed from a pool of
local orientation estimates support the detection of salient orientation structure across the visual
field. But how does that process work: does pooling operate in parallel, is it spatially restricted,
and is it local estimation or global pooling that limits human performance? A qualitative compari-
son of orientation discrimination thresholds across conditions will not answer these questions;
rather, one needs to compare performance to an ideal observer. An equivalent noise paradigm
(Figure  8.3a–e) involves measuring the smallest discernible change in mean orientation in the
presence of different levels of orientation variability (Figure 8.3a–c). Averaging performance—
the threshold mean orientation offset (θ)—can then be predicted using:

θ = √((σint² + σext²) / n)  (1)

where σint is the internal noise (i.e. the observer’s effective uncertainty about the orientation of any
one element), σext the external noise (i.e. the orientation variability imposed on the stimulus), and
n the effective number of samples averaged. By fitting this model to our data we can read off the
global limits on performance (the effective number of samples being averaged by observers) and
the local limits on performance (the precision of each estimate). This model provides an excel-
lent account of observers’ ability to average orientation and has allowed us to show that experi-
enced observers, confronted with N elements, judge mean orientation using a global pool of ~√N
elements irrespective of spatial arrangement, indicating no areal limit on orientation averaging
(Dakin 2001). Precision of local samples tends to fall as the number of elements increases, at least
in part due to increases in crowding (Dakin 2001; Dakin et al. 2009; Solomon 2010), although
it persists with widely spaced elements (Dakin 2001). Solomon (2010) showed that the number
of estimates pooled for orientation variance discrimination was actually higher than for mean
orientation, a finding that could perhaps arise from a strategy that weighted the contribution of
elements with ‘outlying’ orientations more heavily.
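Equation 1 can be fit to thresholds measured at several external-noise levels to read off the two limits on performance. A coarse grid-search sketch (not the authors' fitting code; grid ranges and the synthetic observer's parameters are arbitrary):

```python
import math

def predicted_threshold(sigma_ext, sigma_int, n):
    # Equation 1: theta = sqrt((sigma_int^2 + sigma_ext^2) / n)
    return math.sqrt((sigma_int ** 2 + sigma_ext ** 2) / n)

def fit_equivalent_noise(noise_levels, thresholds):
    """Least-squares grid search (on log thresholds) for the internal noise
    and effective sample size. A sketch, not a production fitter."""
    best = None
    for sigma_int in [0.1 * i for i in range(1, 201)]:   # 0.1 .. 20 deg
        for n in range(1, 65):                           # 1 .. 64 samples
            err = sum((math.log(predicted_threshold(s, sigma_int, n)) -
                       math.log(t)) ** 2
                      for s, t in zip(noise_levels, thresholds))
            if best is None or err < best[0]:
                best = (err, sigma_int, n)
    return best[1], best[2]

# Synthetic observer: sigma_int = 4 deg of local uncertainty, pooling n = 16.
noise = [0.5, 2.0, 8.0, 32.0]
thresh = [predicted_threshold(s, 4.0, 16) for s in noise]
sigma_int_hat, n_hat = fit_equivalent_noise(noise, thresh)
```

At low external noise thresholds asymptote at σint/√n (internal noise dominates); at high external noise they rise as σext/√n, so the two regimes jointly constrain both parameters.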
This approach assumes that observers’ averaging strategy does not change with the amount of exter-
nal noise added to the stimulus. Recently, Allard and Cavanagh (2012) questioned this notion, reporting
[Figure 8.3 here. Panels: (a) Low variance; (b) High variance; (c) Probability density functions; (d) Averaging task (“Is the overall orientation clockwise or anticlockwise of vertical?”; strategy: average n elements, each with precision σint); (e) Equivalent noise paradigm (threshold offset of mean signal, θthresh, plotted against orientation s.d.; more noise raises thresholds, as do fewer samples); (f) High coherence; (g) Low coherence; (h) Probability density functions.]

Fig. 8.3  Probing the statistical representation of orientation. (a, b) Stimuli from a discrimination
experiment containing differing ranges of orientation (here (a) σ = 6° or (b) σ = 16°), with (c) the
corresponding probability density functions. (d) Observers judge if the average orientation of the
elements is clockwise or anti-clockwise of a reference orientation (here, vertical) and one
experimentally determines the minimum offset of the mean (the mean-orientation threshold)
supporting some criterion level of performance. (e) For an equivalent noise paradigm one measures
mean-orientation thresholds with differing levels of orientation variability and fits the results with a
model that yields estimates of how many samples are being averaged and how noisy each sample is.
(f, g) Stimuli from a detection experiment where observers detect the presence of a subset of
elements at a single orientation (here vertical). (h) In coherence paradigms one establishes the
minimum proportion of elements required, here (f) 50% or (g) 12.5%, to support discrimination
from randomly oriented elements.
that the effective sample size (n) for orientation averaging changed with noise level, which they specu-
late could result from a strategy change whereby observers are less prone to pool orientations that
look the same. These authors estimated sampling by taking ratios of mean-orientation-discrimination
thresholds collected with two different numbers of elements at the same noise level.
Combining Equation 1 with the assumption that internal noise does not change with the num-
ber of elements present, they predicted that threshold ratios should be inversely proportional
to the ratio of sampling rates. However, data from various averaging tasks (Dakin 2001; Dakin,
Mareschal, and Bex 2005a) violate this assumption; estimates of internal/additive noise derived
using Equation 1 change with the number of elements present. For this reason, estimation of sam-
pling efficiency by computing threshold ratios is not reasonable and Allard and Cavanagh’s (2012)
results are equally consistent with rises in additive noise (which Equation 1 attributes to local-
orientation uncertainty) offsetting the benefits of more elements being present. What this study
does do is to highlight the interesting issue of why additive noise should rise with the number of
elements present on screen, especially when crowding is minimized.
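The logic of the threshold-ratio method, and how rising additive noise undermines it, can be illustrated numerically with Equation 1 (all values hypothetical):

```python
import math

def threshold(sigma_ext, sigma_int, n):
    # Equation 1
    return math.sqrt((sigma_int ** 2 + sigma_ext ** 2) / n)

sigma_ext = 8.0

# Constant internal noise: a 4x gain in samples yields a threshold
# ratio of sqrt(16/4) = 2, so the ratio reveals the sampling gain.
t_few  = threshold(sigma_ext, 4.0, n=4)    # few elements, small sample
t_many = threshold(sigma_ext, 4.0, n=16)   # 4x elements, 4x samples
ratio_fixed = t_few / t_many

# Internal noise that rises with element number (e.g. via crowding):
# the same 4x sampling gain now yields a smaller threshold ratio,
# which the ratio method would misread as a smaller sampling gain.
t_many_noisy = threshold(sigma_ext, 8.0, n=16)
ratio_noisy = t_few / t_many_noisy
```

The second case shows why the ratio method is only valid under the fixed-internal-noise assumption that the averaging data appear to violate.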
Girshick, Landy, and Simoncelli (2011) examined observers’ judgement of mean orientation in
terms of their precision (i.e. threshold, variability of observers’ estimate) and accuracy (i.e. bias,
a systematic tendency to misreport the average). Observers compared the means of texture pairs
composed of orientations where (a) both textures had high variability, (b) both textures had low
variability, or (c) one texture had high and one low variability (this ingenious condition being
designed to reveal intrinsic bias which would be matched—and so cancel—when variability lev-
els were matched across comparisons). The authors not only measured the well-known oblique
effect (lower thresholds for cardinal orientations; Appelle 1972) in low-noise conditions but also
a relative bias effect consistent with observers generally over-reporting cardinal orientations. The
idea is then that (within a Bayesian framework; Feldman chapter on Bayesian models) observ-
ers report the most likely mean orientation using not only the data to hand but also their prior
experience of orientation structure (i.e. from natural scenes). Observers’ performance is limited
both by the noise on their readout (the likelihood term) and their prior expectation. Using an
encoder–decoder approach Girshick et al. (2011) then used variability/bias estimates to infer each
observer’s prior and showed that it closely matched the orientation structure of natural scenes.
Consistent with this view, observers are less likely to report oblique orientations as their uncer-
tainty rises when they become increasingly reliant on their prior expectations which are based on
natural scene statistics (Tomassini, Morgan, and Solomon 2010).
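This Bayesian account can be sketched on a discrete orientation grid: a Gaussian likelihood centred on the stimulus orientation is multiplied by a prior peaked at the cardinal, and the posterior mean is drawn further toward the cardinal as sensory noise grows. The prior shape and all widths below are illustrative assumptions, not Girshick et al.'s fitted values:

```python
import math

def gauss(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2)

def posterior_mean(stim_deg, likelihood_sd, prior_sd=20.0):
    """Posterior mean orientation on a grid spanning -90..90 deg,
    with a cardinal prior centred on 0 deg (illustrative widths only)."""
    grid = [o * 0.5 for o in range(-180, 181)]
    post = [gauss(o, stim_deg, likelihood_sd) * gauss(o, 0.0, prior_sd)
            for o in grid]
    z = sum(post)
    return sum(o * p for o, p in zip(grid, post)) / z

# Oblique stimulus at 30 deg: more sensory noise -> stronger pull to cardinal.
low_noise  = posterior_mean(30.0, likelihood_sd=3.0)
high_noise = posterior_mean(30.0, likelihood_sd=15.0)
```

With a reliable likelihood the estimate sits close to the true 30°, but as the likelihood broadens the prior increasingly dominates and the reported orientation shifts toward 0°, mirroring the Tomassini et al. (2010) pattern.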
Using a coherence paradigm (Figure 8.3f–h; Newsome and Pare 1988), Husk, Huang, and Hess
(2012) examined orientation processing by measuring observers’ tolerance to the presence of
random-oriented elements when judging overall orientation. They report that coherence thresh-
olds were largely invariant to the contrast, spatial frequency, and number of elements present
(like motion coherence tasks), but that the task showed more dependency on eccentricity than
motion-processing. They further showed that their data could not reflect a ‘pure’ integration mechanism alone (e.g. one computing a vector average of all signal orientations), but must also reflect the limits set by our ability to segment the signal orientation from the noise (a process they model using overlapping spatial filters tuned to the two orientations, i.e. the signal alternatives).

Motion statistics (direction and speed)


Reliable judgement of mean direction is possible in displays composed of elements taking ran-
dom walks (with some mean direction across frames; Williams and Sekuler 1984) or with each
moving in a single direction drawn from either Gaussian or uniform random distributions
(Watamaniuk, Sekuler, and Williams 1989). Such directional pooling is flexible over a range of
directions (Watamaniuk and Sekuler 1992; Watamaniuk et al. 1989), operates over a large (up to
63 deg²) spatial range (consistent with large MT receptive fields) and over intervals of around 0.5 s
(Watamaniuk and Sekuler 1992).
Interestingly, direction judgements are biased by the luminance content, with brighter elements
contributing more strongly to the perceived direction (Watamaniuk, Sekuler, and McKee 2011).
This is interesting as it suggests that the direction estimates themselves may not reflect the output
of motion-tuned areas like MT which (unlike LGN or V1) exhibit little or no tuning for contrast
once the stimulus is visible (Sclar, Maunsell, and Lennie 1990). This in turn speaks to the appro-
priateness of element direction as a basis function for studying motion averaging. Although it is
widely accepted that the percept of global motion in such dot displays does reflect genuine pooling
of local motion and not the operation of a motion-signalling mechanism operating at a coarse
spatial scale, this is based on evidence that, for example, high-pass filtering the stimuli does not reduce
integration (Smith, Snowden, and Milne 1994). A more sophisticated motion channel that pooled
coarsely across space but across a range of spatial frequencies (Bex and Dakin 2002) might explain
motion pooling without recourse to explicit representation of individual elements. Motion coherence paradigms (analogous to Figure 8.3f–h) not only assume that local motion is an appropriate
level of abstraction of their stimulus but that a motion coherence threshold can be meaningfully
mapped onto mechanism in the absence of an ideal observer. Barlow and Tripathy’s (1997) com-
prehensive effort to model motion coherence tasks suggests the limiting factor tends not to be a
limited sampling capacity (of perfectly registered local motion) but correspondence noise (i.e. on
registration of local motion). This is problematic for the studies that use poor performance on
motion coherence tasks as an indicator of an ‘integration deficit’ in a range of neuropsychiatric
and neurodevelopmental disorders (see also de-Wit & Wagemans chapter).
Adapting the equivalent noise approach described for orientation we have also shown that the
oblique effect for motion (poor discrimination around directions other than horizontal and verti-
cal) is a consequence of poor processing of local motion (not reduced global pooling) and that
the pattern of performance mirrors the statistical properties of motion energy in dynamic natural
scenes (Dakin, Mareschal, and Bex 2005b). Furthermore—like orientation—pooling of direction
is flexible and can operate over large areas with little or no effect on the global sampling or on
local uncertainty.
The standard model of motion averaging (Eqn 1) is vector summation—essentially averaging
of individual (noisy) motions. However, such a model fails badly on motion coherence stimuli
(where it is in the observer’s interest to ignore a subset of ‘noise’ directions; Dakin et al. 2005a).
This flexibility—to both average over estimates and to exclude noise where appropriate—can be
captured by a maximum likelihood estimator (MLE). In this context MLEs work by fitting a series
of Gaussian templates (with profiles matched to a series of channels tuned to different direc-
tions) to simulated neural responses (subject to Poisson noise) evoked by the stimulus (Dakin
et al. 2005a). The preferred direction of the best-fitting channel is the MLE direction estimate.
This model—unlike a simple vector averaging of directions—can also explain observers’ ability to
judge the mean direction of asymmetrical direction distributions (Webb, Ledgeway, and McGraw
2007) better than simple vector averaging of stimulus directions. Furthermore, presence of mul-
tiplicative noise4 explains why sampling rate changes, for example, with the number of elements

4  Random variability of the response of neurons in the visual pathway often rises in proportion to their mean response-level (Dean 1981).



[Figure 8.4 here. Panels: (a) Size: Low variance; (b) Size: High variance; a reference disk appears between the two textures.]
Fig. 8.4  Even though these stimuli contain elements with either (a) low or (b) high levels of size
variability, one can tell that elements are on average (a) bigger or (b) smaller than the reference.

present. The MLE is a population decoder operating on combined neural responses to all of the
elements present. As for any system, the more elements we add, the more information we add
and so we expect the quality of our estimate of direction to improve. However, as the number
of elements rises so does the overall level of neural activity and with it the multiplicative noise. The trade-off between gains (arising from the larger sample size) and losses (because of increased noise) is captured by a power-law dependence of the effective number of elements pooled on the
number of elements present (Dakin et al. 2005a).
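A toy version of such an MLE decoder can be sketched as follows: Gaussian direction-tuned channels, a population response summed over elements, and a template fit that maximizes Poisson likelihood. Channel count, tuning width, and gain are illustrative assumptions, and for brevity the checks below decode noiseless mean responses rather than Poisson samples:

```python
import math

N_CHANNELS = 16
PREFS = [i * 360.0 / N_CHANNELS for i in range(N_CHANNELS)]  # preferred dirs
TUNING_SD = 30.0   # assumed tuning width (deg)
GAIN = 20.0        # assumed peak mean response (spikes)

def circ_diff(a, b):
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def mean_response(pref, direction):
    # Gaussian direction tuning on the circle.
    return GAIN * math.exp(-0.5 * (circ_diff(direction, pref) / TUNING_SD) ** 2)

def population_response(directions):
    # Each channel sums its mean response over all element directions.
    return [sum(mean_response(p, d) for d in directions) for p in PREFS]

def mle_direction(responses, n_elements):
    """Preferred direction of the best-fitting template: the candidate
    maximizing the Poisson log-likelihood of the population response
    (the response-only log-factorial term is dropped)."""
    best_dir, best_ll = 0.0, None
    for cand in range(360):
        ll = 0.0
        for pref, r in zip(PREFS, responses):
            lam = max(n_elements * mean_response(pref, float(cand)), 1e-12)
            ll += r * math.log(lam) - lam
        if best_ll is None or ll > best_ll:
            best_dir, best_ll = float(cand), ll
    return best_dir

# Noiseless sanity checks (in a simulation of the full model, responses
# would be Poisson samples around these means).
est_single = mle_direction(population_response([137.0]), 1)
est_cluster = mle_direction(population_response([40.0, 45.0, 50.0]), 3)
```

Because the decoder fits a whole response profile rather than summing vectors, weak ‘noise’ channels contribute little to the fit, which is what lets this family of models cope with coherence-style stimuli where vector averaging fails.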
With respect to the speed of motion, observers can make an estimate of mean (rather than
modal) speed over multiple elements for displays containing asymmetrical distributions of ele-
ment speed (Watamaniuk and Duchon 1992). Speed discrimination thresholds are not greatly
affected by the addition of substantial speed variation (µ = 7.6, σ = 1.7 deg/sec) consistent with
observers’ having a high level of uncertainty about the speed of any one element of the display
(Watamaniuk and Duchon 1992). Observers can make perceptual discriminations based on the
mean and variance of speed information but on neither its skewness nor its kurtosis (Atchley and Andersen
1995). Anecdotally, displays composed of a broad range of speeds often produce a percept not of
coherent movement but of two transparent surfaces composed of either fast or slow elements.
Thus, performance of a mean speed task could be based on which display contains more fast ele-
ments. This strategy could be supported by the standard model of speed perception (where per-
ceived speed depends on the ratio of outputs from two channels tuned to high and low temporal
frequencies; e.g. Tolhurst, Sharpe, and Hart 1973). Simple temporally tuned channels necessarily
operate on a crude spatial stimulus representation and would predict, for example, that observers
would be unable to individuate elements within moving-dot stimuli (Allik 1992).

Size statistics
Looking at Figure 8.4 one is able to tell that the average element size on the left and right is
respectively greater or less than the size of the reference disk in the centre. However, demonstrat-
ing that such a judgement really involves averaging has taken some time. Like orientation, early
work relied on magnitude estimation to show that observers could estimate average line length
(Miller and Sheldon 1969). Ariely (2001) showed that we are better at judging the mean area of
a set of disks than we are at judging the size of any member of the set. Importantly, Chong and
Treisman (2003) determined what visual attribute of the disk was getting averaged by having
observers adjust the size of a single disc to match the mean of two disks. They found (following
Teghtsoonian 1965) that observers pooled a size estimate about halfway between area (A) and
diameter (D), i.e. A^0.76. Chong and Treisman (2003) went on to show that observers’ mean-size
estimates for displays containing 12 discs were little affected by size heterogeneity (over a ±0.5
octave range), exposure duration, memory delays, or even the shape of the probability density
function for element size. Note that when discriminating stimuli composed of disks with different
mean size there are potential confounds in terms of either overall luminance or contrast of the
display (for disk or Gabor elements, respectively) as well as the density of elements (if the two sets occupy similarly sized regions). Chong and Treisman (2005) showed that judgements of mean
element size were unlikely to be based on such artefacts; neither mismatching density nor inter-
mingling the two sets to be discriminated greatly impacted performance.
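Chong and Treisman's matching result implies that the reported ‘mean size’ is the mean of a compressively transduced area. A sketch (hypothetical disk areas in arbitrary units; function name mine) of the matched disk under the A^0.76 rule, bracketed by pure-area (exponent 1) and pure-diameter (exponent 0.5) pooling:

```python
def matched_size(areas, exponent=0.76):
    """Area of the single disk whose transduced size equals the mean
    transduced size of the set. Exponent ~0.76 (Chong and Treisman 2003)
    lies roughly halfway between area (1.0) and diameter (0.5)."""
    mean_transduced = sum(a ** exponent for a in areas) / len(areas)
    return mean_transduced ** (1.0 / exponent)

areas = [1.0, 4.0]
match_area     = matched_size(areas, 1.0)    # mean area: 2.5
match_diam     = matched_size(areas, 0.5)    # mean-diameter equivalent: 2.25
match_observed = matched_size(areas, 0.76)   # lies between the two
```

The observed exponent pulls the match below the arithmetic mean of area but above the diameter-based match, which is exactly the intermediate behaviour the adjustment data revealed.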
Although they were carefully conducted, it is difficult to draw definitive conclusions about
the mechanism for size averaging based on these early studies because of the qualitative nature
of their data analyses. Quantitative comparison of human data to the performance of an ideal
observer (that averages a series of noiseless size estimates from a subset of the elements present)
led Myczek and Simons (2008) to conclude that the evidence for size averaging was equivocal.
Performance was frequently consistent with observers not averaging but rather, for example,
reporting the largest element in a display. In response Chong, Joo, Emmanouil, and Treisman
(2008) presented results which are intuitively difficult to reconcile with a lack of averaging
(e.g. superior performance with more elements) but what hampered resolution of this debate
was a consistent failure to apply a single plausible ideal observer model to a complete psy-
chophysical data set. The ideal observer used by Myczek and Simons (2008) limited sample
size but not uncertainty about individual disk sizes, and varied its decision rules based on the
condition. To resolve this debate, Solomon, Morgan, and Chubb (2011) used an equivalent
noise approach, measuring mean size and size-variance discrimination in the presence of dif-
ferent levels of size variability, and modelled results using a variant on Equation 1. Their results
indicate that observers can average 62–75% of elements present to judge size variance and that
(most) observers could use at least three elements when judging mean size. Although Solomon
et al. note that performance was not substantially better than that of an ideal observer using the
largest size present, more recent estimates of sampling for size averaging are closer to an effec-
tive sample size of five elements5 (Im and Halberda 2013). This suggests that size averaging does
involve some form of pooling. Note that it is a unique benefit of equivalent noise analysis that—
provided one accepts the assumptions of the ideal observer—one can remain agnostic as to the
underlying psychological/neural reality of how averaging works but still definitively establish
that observers perform in a manner that effectively involves averaging across multiple elements.
Recently, however, Allik et al. (2013) have presented compelling evidence that observers not
only use mean size but that this size averaging is compulsory (i.e. taking place without awareness
of individual sizes).
There has been considerable debate in this field as to whether the number of elements pre-
sent influences the observers’ ability to average size. The majority of studies (Allik et  al. 2013;
Alvarez 2011; Ariely 2001; Chong and Treisman 2005) report little gain from the addition of

5  This is a corrected value based on a reported value of 7, which Allik et al. (2013) point out is an over-estimate (by a factor of √2). This is because the equivalent noise model fit by Im and Halberda (2013) does not allow for a two-interval/two-alternative forced-choice task.
extra elements, which has led some to conclude that this is evidence for a high-capacity parallel
processor of mean size (Alvarez 2011; Ariely 2001). From the point of view of averaging, Allik
et al. (2013) point out that near-constant performance indicates a consistent drop in efficiency
(i.e. sample size divided by number of elements), and propose a variant on the equivalent noise
approach that can account for this pattern of performance.
The development of models of size averaging that link behaviour to neural mechanisms has
been limited by a general lack of knowledge about the neural code for size. As a candidate basis
function for texture averaging, let us once again consider the Gabor model of V1 receptive fields.
Gabors code for spatial frequency (SF) not size. Although SF is likely a central component of
the neural code for size it cannot suffice in isolation (since it confounds size with SF content).
A further complication arises from the finding that the codes for size, number, and density are
intimately interconnected. Randomizing the size or density of elements makes it hard to judge
their number and we have suggested that this is consistent with estimates of magnitude from
texture (element size, density, or number) sharing a common mechanism possibly based on the
relative response of filters tuned to different SFs (Dakin et al. 2011). I note that such a model—
like the notion that a ratio of high to low temporal-frequency-tuned filters could explain speed
averaging—predicts no requirement for individuation of element sizes for successful size averag-
ing (Allik et al. 2013).

Averaging of other dimensions


Observers can discriminate differences in depth between two surfaces containing high levels of
disparity noise (σ = 13.6 arc min), indicating robust depth averaging, albeit at low levels of sampling efficiency compared to other tasks (Wardle et al. 2012). Like motion perception (Mareschal,
Bex, and Dakin 2008), local/internal noise limits depth averaging in the peripheral visual field
(Wardle et al. 2012). De Gardelle and Summerfield (2011) looked at averaging of colour (judging
‘red vs blue’) and shape (‘square vs circle’) as a function of the variability of the attribute and report
that observers apparently assign less weight to outliers. Morgan and Glennerster (1991) showed
that observers represented the location of a cloud of dots by the centroid of their individual posi-
tions with performance improving with increasing numbers of elements. Observers presented
with crowded letter-like stimuli lose information in a manner consistent with a compulsory averaging of the positions of the letters’ constituent features (Greenwood,
Bex, and Dakin 2009). It has been shown that in addition to low-level image properties, observers
are able to make statistical summary representations of facial attributes such as emotion and gen-
der (Haberman and Whitney 2007) and even identity (de Fockert and Wolfenstein 2009). Pooling
of cues relating to human form even extends to pooling of biological motion (Giese chapter);
observers are able to precisely judge the mean heading of crowds of point-light walkers (Sweeny,
Haroz, and Whitney 2013).

Attention
Attneave (1954) argued that statistical characterization of images could provide a compact rep-
resentation of complex visual structure that can distil useful information and so reduce task
demands. In this chapter I have reviewed evidence that the computation of texture statistics pro-
vides one means to achieve this goal. It has been proposed that attention serves essentially the
same purpose, filtering relevant from irrelevant information: ‘it implies withdrawal from some
things in order to deal effectively with others’ (James 1890:  256). How then do attention and
averaging interact? Alvarez and Oliva (2009) used a change-detection task to show that simul-
taneous changes in local and global structure were more detectable, under conditions of high
attentional load, than changes to local features alone. They argue that this is consistent with a
reduction in attention to the background increasing noise in local (but less so in global) representations. However, to perform this task one had only to notice any change in the image, so that
observers could use whatever cue reaches threshold first. Consequently, another interpretation of
these findings is that global judgements are easier so that observers use them when they can. In
order to determine the role of attention in averaging one must have a task where one can quantify
the extent to which observers are relying on local or global information. To this end, an equiva-
lent noise paradigm (see above) has been used to assess the role of attention in averaging and, in
particular, to separate its influence from that of crowding (Dakin et al. 2009). Attentional load and
crowding in an orientation-averaging task have quite distinct effects on observers’ performance.
While crowding effectively made observers uncertain about the orientation of each local element,
attentional restrictions limited global processing, specifically how many elements they could effec-
tively average.

Discussion
My review suggests several commonalities between averaging of various features. Coding seems
to be predominantly limited to first- and second-order statistics (sensitivity to third-order sta-
tistics in the luminance domain likely arises from the cortical basis filters being tuned for con-
trast, itself a second-order statistic). Computation of texture statistics generally exhibits flexibility
about the spatial distribution of elements, and does not require individuation of elements. Many
experimental manipulations of averaging end up influencing the local representation of direction
and orientation (e.g. crowding, eccentricity, absolute direction/orientation) with global pooling/
sampling being influenced only by attention or by the number of elements actually present. The
fact that size averaging benefits only modestly, if at all, from the addition of more elements is
odd—and has been used to call into question whether size averaging is possible at all. However,
recent equivalent noise experiments suggest that size averaging is possible. Further application of
this technique to determine the influence of number of elements on size averaging would allow us
to determine if the lack of effect of element number represents, for example, a trade-off between
sampling improvements and loss of local information that accompanies an increase in the num-
ber of elements.
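The sort of trade-off in question is easy to make concrete in a toy simulation (all parameter choices below are hypothetical, for illustration only): if per-element noise is assumed to grow with the number of elements displayed, the usual square-root benefit of averaging more elements is largely cancelled, producing a near-flat effect of set size.

```python
import numpy as np

rng = np.random.default_rng(4)

def size_averaging_error(n_elements, trials=20000):
    """SD of a toy observer's mean-size estimates. The observer averages
    every element, but its per-element (local) noise is assumed to grow
    with display size -- the hypothesized trade-off."""
    sizes = rng.normal(1.0, 0.2, (trials, n_elements))   # true element sizes
    local_sd = 0.4 * np.sqrt(n_elements / 4.0)           # assumed growth of local noise
    noisy = sizes + rng.normal(0.0, local_sd, sizes.shape)
    return noisy.mean(axis=1).std()

for n in (4, 8, 16, 32):
    print(n, round(float(size_averaging_error(n)), 3))
```

Under these assumptions an eight-fold increase in set size improves precision by under ten per cent, which would look like 'no effect of element number' in a typical experiment.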
I would sound a note of caution about the use of equivalent noise paradigms to study the human
estimation of visual ensemble statistics. The two-parameter model (Equation 1) is a straightfor-
ward means of interpreting discrimination performance in terms of local/global limits on visual
processing. However, this is psychophysics and the parameters such a model yields cannot guar-
antee that the underlying neural mechanism operates in the same manner as the ideal observer.
For example, if your performance on a size-averaging task is best fit by an EN model averaging
three elements, this means you are behaving as though you are averaging a sample of three ele-
ments. In other words, you could not achieve this performance using fewer than three elements.
What it does not say is that you are necessarily averaging a series of estimates at all. As described
above, you could average using all the elements (corrupted by noise) or (if the sampling rate
were low) just a few outlying sizes (i.e. very large or very small). Similarly, estimated internal
noise—which I have termed local noise—reflects the sum of all additive noise to which the system
is prone. Consequently, extra noise terms can be added to the two-parameter model to capture
the influence of late or decisional noise (Solomon 2010). However, wherever noise originates, the
two-parameter form of this expression is still a legitimate means of estimating how much perfor-
mance is being limited by an effective precision on judgements about individual elements and an
effective ability to pool across estimates. I contend that this, like the psychometric function, can be
treated as a compact characterization of performance that is useful for constraining biologically
plausible models of visual processing of texture statistics.
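In practice, the two-parameter expression can be fit to thresholds measured at several external noise levels. The sketch below (my own minimal illustration, using a coarse grid search on synthetic, noiseless data; the function and parameter names are not the chapter's notation) recovers the local noise and effective sample size of a simulated observer.

```python
import numpy as np

def en_model(sigma_ext, sigma_loc, n_samp):
    # Two-parameter equivalent-noise model: observed threshold as a
    # function of external noise, local noise, and effective sample size.
    return np.sqrt((sigma_loc**2 + sigma_ext**2) / n_samp)

def fit_en(sigma_ext, thresholds):
    """Least-squares grid search for (sigma_loc, n_samp) on log thresholds."""
    best, best_err = None, np.inf
    for sigma_loc in np.linspace(0.5, 20.0, 40):
        for n_samp in range(1, 65):
            pred = en_model(sigma_ext, sigma_loc, n_samp)
            err = np.sum((np.log(pred) - np.log(thresholds)) ** 2)
            if err < best_err:
                best, best_err = (float(sigma_loc), n_samp), err
    return best

# Synthetic observer: local noise of 6 deg, effectively averaging 8 elements.
ext = np.array([0.0, 1.0, 4.0, 16.0, 64.0])
obs = en_model(ext, sigma_loc=6.0, n_samp=8)
params = fit_en(ext, obs)
print(params)
```

With real data one would of course fit thresholds that carry measurement error, and a proper optimizer would replace the grid; the point is only that two numbers compactly summarize the whole threshold-versus-noise function.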
I further submit that current psychophysical data on averaging of luminance, motion, orientation,
speed, and perhaps size suggest a rather simple ‘back-pocket’ model of ensemble statistical
encoding: a bank of mechanisms, each pooling a set of input units (with V1-like properties)
distributed over a wide range of spatial locations and spatial frequencies, and with
input sensitivities distributed over a Gaussian range of the attribute of interest. Activity of each
of these channels is limited by (a) effective noise on each input unit and (b) multiplicative noise
on the pool, and is decoded using a maximum-likelihood/template-matching procedure to con-
fer levels of resistance to uncorrelated noise (of the sort used in coherence paradigms) that a
vector-averaging procedure would be unable to produce.
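A toy implementation of this back-pocket scheme (my own sketch, with arbitrary tuning widths and noise levels) pools direction samples into a bank of Gaussian-tuned channels and reads out the peak of the pooled response, a crude stand-in for the maximum-likelihood/template-matching step. Unlike vector averaging of the raw directions, the peak readout keys on the mode of the direction distribution, which is what confers robustness to the uncorrelated noise used in coherence paradigms.

```python
import numpy as np

rng = np.random.default_rng(0)

def circ_gauss(theta, mu, sigma=20.0):
    # Wrapped Gaussian tuning over direction (degrees).
    d = (theta - mu + 180.0) % 360.0 - 180.0
    return np.exp(-0.5 * (d / sigma) ** 2)

# Coherence-type stimulus: 25% of elements share a direction, the rest random.
n_el, coherence, signal_dir = 200, 0.25, 90.0
dirs = np.where(rng.random(n_el) < coherence, signal_dir,
                rng.uniform(0.0, 360.0, n_el))

# Bank of channels with preferences spanning direction space, each pooling
# every element; channel activity is perturbed by noise on the pool.
prefs = np.arange(0.0, 360.0, 5.0)
resp = np.array([circ_gauss(dirs, p).sum() for p in prefs])
resp += rng.normal(0.0, 0.05 * resp.mean(), resp.size)

peak_estimate = prefs[np.argmax(resp)]   # template-like readout: the pool's mode

# Vector average of the raw directions, for comparison.
vx, vy = np.cos(np.radians(dirs)).mean(), np.sin(np.radians(dirs)).mean()
vector_estimate = np.degrees(np.arctan2(vy, vx)) % 360.0
print(peak_estimate, round(float(vector_estimate), 1))
```

At low coherence the channel-peak estimate stays locked to the signal direction, whereas the vector average becomes increasingly variable as the uniform noise directions dominate the resultant.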
The cortical locus for the computation of these statistics is unknown. However, it may be ear-
lier than one might think. As well as the unexpected dependence of motion pooling on signal
luminance (indicating pooling of signals generated pre-MT), note also that while observers can
average orientation signals defined by either luminance or contrast, they are unable to average
across stimulus types. This indicates that averaging happens before assignment of an abstract (i.e.
cue-invariant) orientation label (Allen et  al. 2003). As well as the issue of neural locus, there
are several other open questions around visual computation of summary statistics. First, what is
actually getting averaged? We have seen some effort in this regard for size averaging—something
between diameter and area (a ‘one-and-a-half-dimensional’ representation?) gets averaged—but
no effort has been made to separate out size from (say) spatial frequency. Building better models
requires an understanding of their input. In this vein, can spatially coarse channels of the kind
described above really provide a sufficient description of images? Such a representation would pre-
dict an almost complete loss of information about individual elements under averaging. Although
that does seem to happen in some circumstances, the limits on the local representation have yet to
be firmly established. And finally, how important are natural scenes in driving our representation
of ensemble statistics other than orientation or motion?

References
Allard, R. and P. Cavanagh (2012). ‘Different Processing Strategies Underlie Voluntary Averaging in Low
and High Noise’. Journal of Vision 12(11): 6. doi: 10.1167/12.11.6
Allen, H. A., R. F. Hess, B. Mansouri, and S. C. Dakin (2003). ‘Integration of First- and Second-Order
Orientation’. Journal of the Optical Society of America. A: Optics, Image Science, and Vision
20(6): 974–986.
Allik, J. (1992). ‘Competing Motion Paths in Sequence of Random Dot Patterns’. Vision Research
32(1): 157–165.
Allik, J., M. Toom, A. Raidvee, K. Averin, and K. Kreegipuu (2013). ‘An Almost General Theory of Mean
Size Perception’. Vision Research 83: 25–39. doi: 10.1016/j.visres.2013.02.018
Alvarez, G. A. (2011). ‘Representing Multiple Objects as an Ensemble Enhances Visual Cognition’. Trends
Cogn. Sci. 15(3): 122–131. doi: 10.1016/j.tics.2011.01.003
Alvarez, G. A. and A. Oliva (2009). ‘Spatial Ensemble Statistics are Efficient Codes that Can Be Represented
with Reduced Attention’. Proceedings of the National Academy of Sciences of the United States of America
106(18): 7345–7350. doi: 10.1073/pnas.0808981106
Anderson, B. L. and J. Kim (2009). ‘Image Statistics Do Not Explain the Perception of Gloss and Lightness’.
Journal of Vision 9(11): 10, 1–17. doi: 10.1167/9.11.10
Appelle, S. (1972). ‘Perception and Discrimination as a Function Of Stimulus Orientation: The “Oblique
Effect” in Man and Animals’. Psychol. Bull. 78(4): 266–278.
Ariely, D. (2001). ‘Seeing Sets: Representation by Statistical Properties’. Psychological Science 12(2): 157–162.
Atchley, P. and G. J. Andersen (1995). ‘Discrimination of Speed Distributions: Sensitivity to Statistical
Properties’. Vision Research 35(22): 3131–3144.
Attneave, F. (1954). ‘Some Informational Aspects of Visual Perception’. Psychol. Rev. 61(3): 183–193.
Baldassi, S. and D. C. Burr (2000). ‘Feature-Based Integration of Orientation Signals in Visual Search’.
Vision Research 40(10–12): 1293–1300.
Barlow, H. and S. P. Tripathy (1997). ‘Correspondence Noise and Signal Pooling in the Detection of
Coherent Visual Motion’. Journal of Neuroscience 17(20): 7954–7966.
Bauer, B. (2009). ‘Does Stevens’s Power Law for Brightness Extend to Perceptual Brightness Averaging?’.
Psychological Record 59: 171–186.
Bex, P. J. and S. C. Dakin (2002). ‘Comparison of the Spatial-Frequency Selectivity of Local and Global
Motion Detectors’. Journal of the Optical Society of America. A: Optics, Image Science, and Vision
19(4): 670–677.
Bex, P. J. and W. Makous (2002). ‘Spatial Frequency, Phase, and the Contrast of Natural Images’. Journal of
the Optical Society of America. A: Optics, Image Science, and Vision 19(6): 1096–1106.
Bonin, V., V. Mante, and M. Carandini (2006). ‘The Statistical Computation Underlying Contrast Gain
Control’. Journal of Neuroscience 26(23): 6346–6353. doi: 10.1523/JNEUROSCI.0284-06.2006
Brady, T. F. and G. A. Alvarez (2011). ‘Hierarchical Encoding in Visual Working Memory: Ensemble
Statistics Bias Memory for Individual Items’. Psychological Science 22(3): 384–392.
doi: 10.1177/0956797610397956
Chong, S. C. and A. Treisman (2003). ‘Representation of Statistical Properties’. Vision Research 43(4): 393–404.
Chong, S. C. and A. Treisman (2005). ‘Statistical Processing: Computing the Average Size in Perceptual
Groups’. Vision Research 45(7): 891–900. doi: 10.1016/j.visres.2004.10.004
Chong, S. C., S. J. Joo, T. A. Emmanouil, and A. Treisman (2008). ‘Statistical Processing: Not so
Implausible After All’. Perception and Psychophysics 70(7): 1327–1334; discussion 1335–1336.
doi: 10.3758/PP.70.7.1327
Chubb, C., J. Econopouly, and M. S. Landy (1994). ‘Histogram Contrast Analysis and the Visual
Segregation of IID Textures’. Journal of the Optical Society of America. A: Optics, Image Science, and
Vision 11(9): 2350–2374.
Chubb, C., J. H. Nam, D. R. Bindman, and G. Sperling (2007). ‘The Three Dimensions of Human Visual
Sensitivity to First-Order Contrast Statistics’. Vision Research 47(17): 2237–2248. doi: 10.1016/j.
visres.2007.03.025
Dakin, S. C. (1997). ‘The Detection of Structure in Glass Patterns: Psychophysics and Computational
Models’. Vision Research 37(16): 2227–2246.
Dakin, S. C. and R. J. Watt (1997). ‘The Computation of Orientation Statistics from Visual Texture’. Vision
Research 37(22): 3181–3192.
Dakin, S. C. (1999). ‘Orientation Variance as a Quantifier of Structure in Texture’. Spatial Vision 12(1): 1–30.
Dakin, S. C. (2001). ‘Information Limit on the Spatial Integration of Local Orientation Signals’. Journal of
the Optical Society of America. A: Optics, Image Science, and Vision 18(5): 1016–1026.
Dakin, S. C., I. Mareschal, and P. J. Bex (2005a). ‘Local and Global Limitations on Direction Integration
Assessed Using Equivalent Noise Analysis’. Vision Research 45(24): 3027–3049. doi: 10.1016/j.
visres.2005.07.037
Dakin, S. C., I. Mareschal, and P. J. Bex (2005b). ‘An Oblique Effect for Local Motion: Psychophysics and
Natural Movie Statistics’. Journal of Vision 5(10): 878–887. doi: 10.1167/5.10.9
Dakin, S. C., P. J. Bex, J. R. Cass, and R. J. Watt (2009). ‘Dissociable Effects of Attention and Crowding on
Orientation Averaging’. Journal of Vision 9(11): 28, 1–16. doi: 10.1167/9.11.28
Dakin, S. C., M. S. Tibber, J. A. Greenwood, F. A. Kingdom, and M. J. Morgan (2011). ‘A Common Visual
Metric for Approximate Number and Density’. Proceedings of the National Academy of Sciences of the
United States of America 108(49): 19552–19557. doi: 10.1073/pnas.1113195108
Daugman, J. G. (1985). ‘Uncertainty Relation for Resolution in Space, Spatial-Frequency, and Orientation
Optimized by Two Dimensional Cortical Filters’. Journal of the Optical Society of America. A: Optics,
Image Science, and Vision 2: 1160–1169.
Dean, A. F. (1981). ‘The Variability of Discharge of Simple Cells in the Cat Striate Cortex’. Exp. Brain Res.
44(4): 437–440.
Deneve, S., P. E. Latham, and A. Pouget (1999). ‘Reading Population Codes: A Neural Implementation of
Ideal Observers’. Nat. Neurosci. 2(8): 740–745. doi: 10.1038/11205
de Fockert, J. and C. Wolfenstein (2009). ‘Rapid Extraction of Mean Identity from Sets of Faces’. Q. J. Exp.
Psychol. (Hove) 62(9): 1716–1722. doi: 10.1080/17470210902811249
de Gardelle, V. and C. Summerfield (2011). ‘Robust Averaging during Perceptual Judgment’. Proceedings
of the National Academy of Sciences of the United States of America 108(32): 13341–13346. doi: 10.1073/
pnas.1104517108
Girshick, A. R., M. S. Landy, and E. P. Simoncelli (2011). ‘Cardinal Rules: Visual Orientation
Perception Reflects Knowledge of Environmental Statistics’. Nat. Neurosci. 14(7): 926–932.
doi: 10.1038/nn.2831
Greenwood, J. A., P. J. Bex, and S. C. Dakin (2009). ‘Positional Averaging Explains Crowding with
Letter-Like Stimuli’. Proceedings of the National Academy of Sciences of the United States of America
106(31): 13130–13135. doi: 10.1073/pnas.0901352106
Haberman, J. and D. Whitney (2007). ‘Rapid Extraction of Mean Emotion and Gender from Sets of Faces’.
Curr. Biol. 17(17): R751–753. doi: 10.1016/j.cub.2007.06.039
Hubel, D. H. and T. N. Wiesel (1962). ‘Receptive Fields, Binocular Interaction and Functional Architecture in
the Cat’s Visual Cortex’. Journal of Physiology 160: 106–154.
Husk, J. S., P. C. Huang, and R. F. Hess (2012). ‘Orientation Coherence Sensitivity’. Journal of Vision
12(6): 18. doi: 10.1167/12.6.18
Im, H. Y. and J. Halberda (2013). ‘The Effects of Sampling and Internal Noise on the Representation of
Ensemble Average Size’. Atten. Percept. Psychophys. 75(2): 278–286. doi: 10.3758/s13414-012-0399-4
James, W. (1890). The Principles of Psychology. New York: Henry Holt and Co.
Julesz, B., E. N. Gilbert, L. A. Shepp, and H. L. Frisch (1973). ‘Inability of Humans to Discriminate
between Visual Textures that Agree in Second-Order Statistics—Revisited’. Perception 2(4): 391–405.
Julesz, B. (1981). ‘Textons, the Elements of Texture Perception, and their Interactions’. Nature
290(5802): 91–97.
Kass, M. and A. Witkin (1985). ‘Analyzing Oriented Patterns’. Paper presented at the Ninth International
Joint Conference on Artificial Intelligence.
Kim, J. and B. L. Anderson (2010). ‘Image Statistics and the Perception of Surface Gloss and Lightness’.
Journal of Vision 10(9): 3. doi: 10.1167/10.9.3
Kingdom, F. A., A. Hayes, and D. J. Field (2001). ‘Sensitivity to Contrast Histogram Differences in
Synthetic Wavelet-Textures’. Vision Research 41(5): 585–598.
Levi, D. M. (2008). ‘Crowding—an Essential Bottleneck for Object Recognition: A Mini-Review’. Vision
Research 48(5): 635–654. doi: 10.1016/j.visres.2007.12.009
Malik, J. and R. Rosenholtz (1994). ‘A Computational Model for Shape from Texture’. Ciba Foundation
Symposium 184: 272–283; discussion 283–276, 330–278.
Mareschal, I., P. J. Bex, and S. C. Dakin (2008). ‘Local Motion Processing Limits Fine Direction
Discrimination in the Periphery’. Vision Research 48(16): 1719–1725. doi: 10.1016/j.visres.2008.05.003
Marr, D. (1982). Vision. San Francisco: Freeman.
Miller, A. L. and R. Sheldon (1969). ‘Magnitude Estimation of Average Length and Average Inclination’. J.
Exp. Psychol. 81(1): 16–21.
Morgan, M., C. Chubb, and J. A. Solomon (2008). ‘A “Dipper” Function for Texture Discrimination Based
on Orientation Variance’. Journal of Vision 8(11): 9 1–8. doi: 10.1167/8.11.9
Morgan, M. J. and A. Glennerster (1991). ‘Efficiency of Locating Centres of Dot-Clusters by Human
Observers’. Vision Research 31(12): 2075–2083.
Motoyoshi, I., S. Nishida, L. Sharan, and E. H. Adelson (2007). ‘Image Statistics and the Perception of
Surface Qualities’. Nature 447(7141): 206–209. doi: 10.1038/nature05724
Moulden, B., F. Kingdom, and L. F. Gatley (1990). ‘The Standard Deviation of Luminance as a Metric for
Contrast in Random-Dot Images’. Perception 19(1): 79–101.
Myczek, K. and D. J. Simons (2008). ‘Better than Average: Alternatives to Statistical Summary
Representations for Rapid Judgments of Average Size’. Perception and Psychophysics 70(5): 772–788.
Nam, J. H. and C. Chubb (2000). ‘Texture Luminance Judgments are Approximately Veridical’. Vision
Research 40(13): 1695–1709.
Newsome, W. T. and E. B. Pare (1988). ‘A Selective Impairment of Motion Perception Following Lesions of
the Middle Temporal Visual Area (MT)’. Journal of Neuroscience 8(6): 2201–2211.
Olshausen, B. A. and D. J. Field (2005). ‘How Close Are We to Understanding V1?’ Neural Comput.
17(8): 1665–1699. doi: 10.1162/0899766054026639
Parkes, L., J. Lund, A. Angelucci, J. A. Solomon, and M. Morgan (2001). ‘Compulsory Averaging of
Crowded Orientation Signals in Human Vision’. Nat. Neurosci. 4(7): 739–744. doi: 10.1038/89532
Portilla, J. and E. P. Simoncelli (1999). ‘Texture Modeling and Synthesis Using Joint Statistics of Complex
Wavelet Coefficients’. Paper presented at the IEEE Workshop on Statistical and Computational Theories
of Vision.
Sclar, G., J. H. Maunsell, and P. Lennie (1990). ‘Coding of Image Contrast in Central Visual Pathways of the
Macaque Monkey’. Vision Research 30(1): 1–10.
Smith, A. T., R. J. Snowden, and A. B. Milne (1994). ‘Is Global Motion Really Based on Spatial Integration
of Local Motion Signals?’ Vision Research 34(18): 2425–2430.
Solomon, J. A. (2010). ‘Visual Discrimination of Orientation Statistics in Crowded and Uncrowded Arrays’.
Journal of Vision 10(14): 19. doi: 10.1167/10.14.19
Solomon, J. A., M. Morgan, and C. Chubb (2011). ‘Efficiencies for the Statistics of Size Discrimination’.
Journal of Vision 11(12): 13. doi: 10.1167/11.12.13
Stevens, S. S. (1961). ‘To Honor Fechner and Repeal his Law: A Power Function, Not a Log Function,
Describes the Operating Characteristic of a Sensory System’. Science 133(3446): 80–86. doi: 10.1126/
science.133.3446.80
Sweeny, T. D., S. Haroz, and D. Whitney (2013). ‘Perceiving Group Behavior: Sensitive Ensemble
Coding Mechanisms for Biological Motion of Human Crowds’. J. Exp. Psychol. Hum. Percept. Perform.
39(2): 329–337. doi: 10.1037/a0028712
Teghtsoonian, M. (1965). ‘The Judgment of Size’. American Journal of Psychology 78: 392–402.
Tolhurst, D. J., C. R. Sharpe, and G. Hart (1973). ‘The Analysis of the Drift Rate of Moving Sinusoidal
Gratings’. Vision Research 13(12): 2545–2555.
Tomassini, A., M. J. Morgan, and J. A. Solomon (2010). ‘Orientation Uncertainty Reduces Perceived
Obliquity’. Vision Research 50(5): 541–547. doi: 10.1016/j.visres.2009.12.005
Wardle, S. G., P. J. Bex, J. Cass, and D. Alais (2012). ‘Stereoacuity in the Periphery is Limited by Internal
Noise’. Journal of Vision 12(6): 12. doi: 10.1167/12.6.12
Watamaniuk, S. N., R. Sekuler, and D. W. Williams (1989). ‘Direction Perception in Complex Dynamic
Displays: The Integration of Direction Information’. Vision Research 29(1): 47–59.
Watamaniuk, S. N. and A. Duchon (1992). ‘The Human Visual System Averages Speed Information’. Vision
Research 32(5): 931–941.
Watamaniuk, S. N. and R. Sekuler (1992). ‘Temporal and Spatial Integration in Dynamic Random-Dot
Stimuli’. Vision Research 32(12): 2341–2347.
Watamaniuk, S. N., R. Sekuler, and S. P. McKee (2011). ‘Perceived Global Flow Direction Reveals Local
Vector Weighting by Luminance’. Vision Research 51(10): 1129–1136. doi: 10.1016/j.visres.2011.03.003
Webb, B. S., T. Ledgeway, and P. V. McGraw (2007). ‘Cortical Pooling Algorithms for Judging Global
Motion Direction’. Proceedings of the National Academy of Sciences of the United States of America
104(9): 3532–3537. doi: 10.1073/pnas.0611288104
Williams, D. W. and R. Sekuler (1984). ‘Coherent Global Motion Percepts from Stochastic Local Motions’.
Vision Research 24(1): 55–62.
Witkin, A. (1981). ‘Recovering Surface Shape and Orientation from Texture’. Artificial Intelligence 17: 17–47.
Chapter 9

Texture perception
Ruth Rosenholtz

Introduction: What is texture?
The structure of a surface, say of a rock, leads to a pattern of bumps and dips that we can feel with
our fingers. This applies equally well to the surface of skin, the paint on the wall, the surface of a car-
rot, or the bark of a tree. Similarly, the pattern of blades of grass in a lawn, pebbles on the ground,
or fibers in woven material, all lead to a tactile ‘texture’. The surface variations that lead to texture
we can feel also tend to lead to variations in the intensity of light reaching our eyes, producing what
is known as ‘visual texture’ (or here, simply ‘texture’). Visual texture can also come from variations
that do not lend themselves to tactile texture, such as the variation in composition of a rock (quartz
looks different from mica), waves in water, or patterns of surface color such as paint.
Texture is useful for a variety of tasks. It provides a cue to the shape and orientation of a surface
(Gibson 1950). It aids in identifying the material of which an object or surface is made (Gibson
1986). Most obviously relevant for this Handbook, texture similarity provides one cue to perceiv-
ing coherent groups and regions in an image.
Understanding human texture processing requires the ability to synthesize textures with desired
properties. By and large this was intractable before the wide availability of computers. Gibson
(1950) studied shape-from-texture by photographing wallpaper from different angles. Our under-
standing of texture perception would be quite limited if we were restricted to the small set of
textures found in wallpaper. Attneave (1954) gained significant insight into visual representation
by thinking about perception of a random noise texture, though he had to generate that texture
by hand, filling in each cell according to a table of random numbers. Beck (1966; 1967) formed
micropattern textures out of black tape affixed to white cardboard, restricting the micropatterns
to those made of line segments. Olson and Attneave (1970) had more flexibility, as their micropat-
terns were drawn in india ink. Julesz (1962, 1965) was in the enviable position of having access
to computers and algorithms for generating random textures. More recently, texture synthesis
techniques have gotten far more powerful, allowing us to gain new insights into human vision.
It is instructive to ask why we label the surface variations of tree bark ‘texture’, and the surface
variations of the eyes, nose, and mouth ‘parts’ of a face object, or objects in their own right.
One reason for the distinction may be that textures have different identity-preserving transfor-
mations than objects. Shifting around regions within a texture does not fundamentally change
most textures, whereas swapping the nose and mouth on a face turns it into a new object (see also
Behrmann et al., this volume). Two pieces of the same tree bark will not look exactly the same,
but will seem to be the same ‘stuff ’, and therefore swapping regions has minimal effect on our
perception of the texture. Textures are relatively homogeneous, in a statistical sense, or at least
slowly varying. Fundamentally, texture is statistical in nature, and one could argue that texture
is stuff that is more compactly represented by its statistics—its aggregate properties—than by the
configuration of its parts (Rosenholtz 1999).
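This statistical notion of texture can be made concrete in a few lines (an illustrative sketch, not any published model): swapping regions within a texture changes the configuration of its parts, but leaves any permutation-invariant statistic, such as the pixel histogram, exactly unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
tex = rng.random((64, 64))          # stand-in for a homogeneous texture

# Swap two 16x16 regions: an identity-preserving transformation for texture.
swapped = tex.copy()
swapped[:16, :16], swapped[16:32, :16] = \
    tex[16:32, :16].copy(), tex[:16, :16].copy()

# The configuration changes, but aggregate properties do not: the two
# images contain exactly the same multiset of pixel values.
same_histogram = np.array_equal(np.sort(tex.ravel()), np.sort(swapped.ravel()))
same_image = np.array_equal(tex, swapped)
print(same_histogram, same_image)
```

Performing the same swap on an image of a face would preserve its statistics equally well, yet destroy its identity, which is exactly the asymmetry the text appeals to.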
Fig. 9.1  Texture segmentation pairs. (a)–(d): Micropattern textures. (a) Easily segments, and the two
textures have different 2nd-order pixel statistics; (b) also segments fairly easily, yet the textures have the
same 2nd-order statistics; (c) different 2nd-order statistics, does not easily segment, yet it is easy to tell
apart the two textures; (d) neither segments nor is it easy to tell apart the textures. (e, f) Pairs of natural
textures. The pair in (f) is easier to segment, but all four textures are clearly different in appearance.
That texture and objects have different identity-preserving transformations suggests that one
might want to perform different processing on objects than on texture. In the late 1990s, that
was certainly the case in computer vision and image processing. Object recognition algorithms
differed greatly from texture classification algorithms. Algorithms for determining object shape
and pose were very different from those that found the shape of textured surfaces. In image cod-
ing, regions containing texture might be compressed differently than those dominated by objects
(Popat and Picard 1993). The notion of different processing for textures vs. objects was preva-
lent enough that several researchers developed algorithms to find regions of texture in an image,
though this was hardly a popular idea (Karu et al. 1996; Rosenholtz 1999).
However, exciting recent work (Section “Texture perception is not just for textures”) suggests that
human vision employs texture processing mechanisms even when performing object recognition
tasks in image regions not containing obvious ‘texture’. The phenomena of visual crowding provided
the initial evidence for this hypothesis. However, if true, such mechanisms would influence the
information available for object recognition, scene perception, and diverse tasks in visual cognition.
This chapter reviews texture segmentation, texture classification/appearance, and visual crowd-
ing. It is obviously impossible to fully cover such a diversity of topics in a short chapter. The
material covered will focus on computational issues, on the representation of texture by the visual
system, and on connections between the different topics.

Texture segmentation
Phenomena
An important facet of vision is the ability to perform ‘perceptual organization’, in which the visual
system quickly and seemingly effortlessly transforms individual feature estimates into perception
of coherent regions, structures, and objects. One cue to perceptual organization is texture similar-
ity. The visual system uses this cue in addition to and in conjunction with (Giora and Casco 2007;
Machilsen and Wagemans 2011) grouping by proximity, feature similarity, and good continuation
(see also Brooks, this volume; Elder, this volume).
The dual of grouping by similar texture is important in its own right, and has, in fact, received
more attention. In ‘preattentive’ or ‘effortless’ texture segmentation two texture regions quickly
and easily segregate—in less than 200 milliseconds. Observers may perceive a boundary between
the two. Figure 9.1 shows several examples. Like contour integration and perception of illusory
contours, texture segmentation is a classic Gestalt phenomenon. The whole is different from the
sum of its parts (see also Wagemans, this volume), and we perceive region boundaries which are
not literally present in the image (Figure 9.1a,b).
Researchers have taken performance under rapid presentation, often followed by a mask,
as meaning that texture segmentation is preattentive and occurs in early vision (Julesz 1981;
Treisman 1985). However, the evidence for both claims is somewhat questionable. We do not
really understand in what way rapid presentation limits visual processing. Can higher-level pro-
cessing not continue once the stimulus is removed? Does fast presentation mean preattentive?
(See also Gillebert & Humphreys, this volume.) Empirical results have given conflicting answers.
Mack et al. (1992) showed that texture segmentation was impaired under conditions of inatten-
tion due to the unexpected appearance of a segmentation display during another task. However,
the segmentation boundaries in their stimuli aligned almost completely with the stimulus for
the main task: two lines making up a large ‘+’ sign. This may have made the segmentation task
more difficult. Perhaps judging whether a texture edge occurs at the same location as an actual
line requires attention. Mack et al. (1992) demonstrated good performance at texture segmenta-
tion in a dual-task paradigm. Others (Braun and Sagi 1991; Ben-Av and Sagi 1995) show similar
results for a singleton-detection task they refer to as texture segregation. Certainly performance
with rapid presentation would seem to preclude mechanisms which require serial processing of
the individual micropatterns which make up textures like those in Figure 9.1a–d.
Some pairs of textures segment easily (Figure 9.1a), others with more difficulty (Figure 9.1b).
Some texture pairs are obviously different, even if they do not lead to a clearly perceived segmen-
tation boundary (Figure 9.1c), whereas other texture pairs require a great deal of inspection to tell
the difference (Figure 9.1d). Predicting the difficulty of segmenting any given pair of textures pro-
vides an important benchmark for understanding texture segmentation. Researchers have hoped
that such understanding would provide insight more generally into early vision mechanisms, such
as what features are available preattentively.

Statistics of pixels
When two textures differ sufficiently in their mean luminance, segmentation occurs (Boring 1945;
Julesz 1962). The same seems true for other differences in the luminance histogram (Julesz 1962;
Julesz 1965; Chubb et al. 2007). In other words, a sufficiently large difference between two textures
in their 1st-order luminance statistics leads to effortless segmentation.1 Differences in 1st-order
chrominance statistics also support segmentation (e.g. Julesz 1965).
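A minimal sketch of this idea (the sizes and proportions below are mine, purely illustrative): two random binary textures that differ in the proportion of black pixels differ in their 1st-order luminance statistics, and for binary textures the whole luminance histogram reduces to that single proportion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two binary textures differing in 1st-order (pixel-histogram) statistics:
# roughly 30% vs 60% of pixels are 'black' (coded as 0).
tex_a = (rng.random((64, 64)) >= 0.3).astype(float)
tex_b = (rng.random((64, 64)) >= 0.6).astype(float)

# The 1st-order luminance statistics are just the histogram of pixel values;
# for binary textures this reduces to the fraction of black (or white) dots.
frac_black_a = 1.0 - tex_a.mean()
frac_black_b = 1.0 - tex_b.mean()
print(round(float(frac_black_a), 2), round(float(frac_black_b), 2))
```

A difference of this size in the histogram is the kind of cue that supports effortless segmentation; matching the histograms removes it, which is why the later iso-1st-order constructions are needed.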
However, differences in 1st-order pixel statistics are not necessary for texture segmentation to
occur. Differences in line orientation between two textures are as effective as differences in bright-
ness (Beck 1966; Beck 1967; Olson and Attneave 1970). Consider micropattern textures formed
of line segments (e.g. Figures 9.1a–c). Differences in the orientations of the line segments predict
segmentation better than either the orientation of the micropatterns, or their rated similarity. An
array of upright Ts segments poorly from an array rotated by 90 degrees; the line orientations are
the same in the two patterns. A T appears more similar to a tilted (45˚) T than to an L, but Ts seg-
ment from tilted-Ts more readily than they do from Ls.
Julesz (1965) generated textures defined by Markov processes, in which each pixel depends
probabilistically on its predecessors. He observed that one could often see within these textures
clusters of similar brightness values. For example, such clusters might form horizontal stripes, or
dark triangles. Julesz suggested that early perceptual grouping mechanisms might extract these
clusters, and that: ‘As long as the brightness value, the spatial extent, the orientation and the den-
sity of clusters are kept similar in two patterns, they will be perceived as one.’
It is tempting to observe clusters in Julesz’ examples and conclude that extraction of ‘texture
elements’ (aka texels), underlies texture perception. However, texture perception might also be
mediated by measurement of image statistics, with no intermediate step of identifying clusters.
The stripes and clusters in Julesz’ examples were, after all, produced by random processes. As
Julesz (1975) put it: 
[10 years ago], I was skeptical of statistical considerations in texture discrimination because I did not
see how clusters of similar adjacent dots, which are basic for texture perception, could be controlled
and analyzed by known statistical methods . . . In the intervening decade much work went into finding
statistical methods that would influence cluster formation in desirable ways. The investigation led to
some mathematical insights and to the generation of some interesting textures.

1. Terminology in the field of texture perception stands in a confused state. ‘1st- and 2nd-order’ can refer to
(a) 1st-order histograms of features vs. 2nd-order correlations of those features; (b) statistics involving a
measurement to the first power (e.g. the mean) vs. a measurement to the power of 2 (e.g. the variance)—i.e.
the 1st- and 2nd-moments from mathematics; or (c) a model with only one filtering stage, vs. a model with a
filtering stage, a non-linearity, and then a 2nd filtering stage. This chapter uses the first definition.

The key, for Julesz, was to figure out how to generate textures with desired clusters of dark
and light dots, while controlling their image statistics. With the help of collaborators Gilbert,
Shepp, and Frisch (acknowledged in Julesz 1975), Julesz proposed simple algorithms for gen-
erating pairs of micropattern textures with the same 1st- and 2nd-order pixel statistics. For
Julesz’ black and white textures, 1st-order statistics reduce to the fraction of black dots making
up the texture. 2nd-order or dipole statistics can be measured by dropping ‘needles’ onto a
texture, and observing the frequency with which both ends of the needle land on a black dot,
as a function of needle length and orientation. Such 2nd-order statistics are equivalent to the
power spectrum.
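Julesz's needle-dropping measure is straightforward to compute, and its equivalence to the power spectrum follows from the Wiener–Khinchin theorem: the full set of dipole statistics is the (circular) autocorrelation of the image, whose Fourier transform is the power spectrum. A sketch (illustrative only, with wrap-around needle placement for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
tex = (rng.random((64, 64)) < 0.4).astype(float)   # binary texture, 1 = black dot

def dipole(tex, dy, dx):
    """Fraction of needle drops at offset (dy, dx) whose two ends both
    land on a black dot (needles placed with wrap-around)."""
    shifted = np.roll(tex, (dy, dx), axis=(0, 1))
    return float((tex * shifted).mean())

# The full set of dipole statistics is the circular autocorrelation of the
# image; by the Wiener-Khinchin theorem this is the inverse transform of
# the power spectrum, so the two carry the same information.
power = np.abs(np.fft.fft2(tex)) ** 2
autocorr = np.real(np.fft.ifft2(power)) / tex.size

# The two routes agree, e.g. for a horizontal needle of length 3:
print(dipole(tex, 0, 3), autocorr[0, 3])
```

A needle of length zero simply measures the fraction of black dots, i.e. the 1st-order statistic, so the dipole family strictly subsumes it.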
Examination of texture pairs sharing 1st- and 2nd-order pixel statistics led to the now-famous
‘Julesz conjecture’: ‘Whereas textures that differ in their first- and second-order statistics can be
discriminated from each other, those that differ in their third- or higher-order statistics usu-
ally cannot’ (Julesz 1975). This theory predicted a number of results, for both random noise and
micropattern-based textures. For instance, the textures in Figure 9.1a differ in their 2nd-order
statistics, and readily segment, whereas the textures in Figure 9.1d share 2nd-order statistics, and
do not easily segment.

Statistics of textons
However, researchers soon found counterexamples to the Julesz conjecture (Caelli and Julesz
1978; Caelli et al. 1978; Julesz et al. 1978; Victor and Brodie 1978). For example, the Δ ➔ texture
pair (Figure 9.1b) is relatively easy to segment, yet the two textures have the same 2nd-order
statistics. A difference in 2nd-order pixel statistics appeared neither necessary nor sufficient for
texture segmentation.
Based on the importance of line orientation in texture segmentation (Beck 1966, 1967; Olson
and Attneave 1970), two new classes of theories emerged. The first suggested that texture segmen-
tation was mediated not by 2nd-order pixel statistics, but rather by 1st-order statistics of basic
stimulus features such as orientation and size (Beck et al. 1983). Here ‘1st-order’ refers to histo-
grams of, e.g., orientation, instead of pixel values.
But what of the Δ ➔ texture pair? By construction, it contained no difference in the 1st-order
statistics of line orientation. Notably, however, triangles are closed shapes, whereas arrows are not.
Perhaps emergent features (Pomerantz & Cragin, this volume), like closure, also matter in texture
segmentation. Other iso-2nd order pairs hinted at the relevance of additional higher-level fea-
tures, dubbed textons. Texton theory proposes that segmentation depends upon 1st-order statis-
tics not only of basic features like orientation, but also of textons such as curvature, line endpoints,
and junctions (Julesz 1981; Bergen and Julesz 1983).
While intuitive on the surface, this explanation was somewhat unsatisfying. Proponents were
vague about the set of textons, making the theory difficult to test or falsify. In addition, it was
not obvious how to extract textons, particularly for natural images (Figure 9.1e,f). (Though see
Barth et al. (1998), for both a principled definition of a class of textons, and a way to measure
them in arbitrary images.) Texton theories have typically been based on verbal descriptions of
image features rather than actual measurements (Bergen and Adelson 1988). These ‘word models’
effectively operate on ‘things’ like ‘closure’ and ‘arrow junctions’ which a human experimenter has
labeled (Adelson 2001).
172 Rosenholtz

Image processing-based models


By contrast, another class of ‘image-computable’ theories emerged. These models are based on
simple image processing operations (Knutsson and Granlund 1983; Caelli 1985; Turner 1986;
Bergen and Adelson 1988; Sutter et al. 1989; Fogel and Sagi 1989; Bovik et al. 1990; Malik and
Perona 1990; Bergen and Landy 1991; Rosenholtz 2000). According to these theories, texture seg-
mentation arises as an outcome of mechanisms like those known to exist in early vision.
These models have similar structure: a first linear filtering stage, followed by a non-linear opera-
tor, additional filtering, and a decision stage. They have been termed filter-rectify-filter (e.g. Dakin
et al. 1999), or linear-nonlinear-linear (LNL, Landy and Graham 2004) models. Chubb and Landy
(1991) dubbed the basic structure the ‘back-pocket model’, as it was the model many researchers
would ‘pull out of their back pocket’ to explain segmentation phenomena.
The first stage typically involves multiscale filters, both oriented and unoriented. The stage-two
non-linearity might be a simple squaring, rectification, or energy computation (Knutsson and
Granlund 1983; Turner 1986; Sutter et al. 1989; Bergen and Adelson 1988; Fogel and Sagi 1989;
Bovik et al. 1990), contrast normalization (Landy and Bergen 1991; Rosenholtz 2000), or inhibi-
tion and excitation between neighboring channels and locations (Caelli 1985; Malik and Perona
1990). The final filtering and decision stages often act as a coarse-scale edge detector. Much effort
has gone into uncovering the details of the filters and nonlinearities.
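The three stages can be sketched in a few lines. The code below is a toy illustration rather than any published model: the derivative-of-Gaussian filters, the squaring non-linearity, and the scales are arbitrary choices of mine. A texture boundary defined purely by orientation becomes an ordinary intensity edge in the resulting 'energy' map.

```python
import numpy as np
from scipy import ndimage

def frf_energy(image, theta, sigma1=1.0, sigma2=4.0):
    """Filter-rectify-filter: oriented linear filtering, a pointwise
    non-linearity, then coarse-scale smoothing."""
    # Stage 1: derivative-of-Gaussian filters give an oriented response.
    gx = ndimage.gaussian_filter(image, sigma1, order=(0, 1))
    gy = ndimage.gaussian_filter(image, sigma1, order=(1, 0))
    oriented = np.cos(theta) * gx + np.sin(theta) * gy
    # Stage 2: pointwise non-linearity (here squaring, an 'energy'
    # computation).
    energy = oriented ** 2
    # Stage 3: second, coarser filtering; a texture-defined boundary now
    # appears as an ordinary edge in this map, ready for edge detection.
    return ndimage.gaussian_filter(energy, sigma2)

# Left half: vertical stripes; right half: horizontal stripes.
img = np.zeros((64, 64))
cols, rows = np.arange(64), np.arange(64)
img[:, :32] = ((cols[:32] // 4) % 2)[None, :]
img[:, 32:] = ((rows // 4) % 2)[:, None]

# The channel tuned to horizontal luminance gradients (theta = 0)
# responds strongly only in the vertical-stripe region.
energy = frf_energy(img, theta=0.0)
```

Sweeping `theta` over a bank of orientations and comparing channel outputs across space is, schematically, how these models localize texture boundaries.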
As LNL models employ oriented filters, they naturally predict segmentation of textures that
differ in their component orientations. But what about results thought to require more complex
texton operators? Bergen and Adelson (1988) examined segmentation of an XL texture pair like
that in Figure 9.1a. These textures contain the same distribution of line orientations, and Bergen
and Julesz (1983) had suggested that easy segmentation might be mediated by such features as
terminators and X- vs. L-junctions. Bergen and Adelson (1988) demonstrated the feasibility of a
simpler solution, based on low-level mechanisms. They observed that the Xs appear smaller than
the Ls, even though their component lines are the same length. Beck (1967) similarly observed
that Xs and Ls have a different overall distribution of brightness when viewed out of focus. Bergen
and Adelson demonstrated that if one accentuates the difference in size, by increasing the length
of the Ls’ bars (while compensating the bar intensities so as not to make one texture brighter
than the other), segmentation gets easier. Decrease the length of the Ls’ bars, and segmentation
becomes quite difficult. Furthermore, they showed that in the original stimulus, a simple size-
tuned mechanism—center-surround filtering followed by full-wave rectification—responds more
strongly to one texture than the other. Even though our visual systems can ultimately identify
nameable features like terminators and junctions, those features may not underlie texture seg-
mentation, which may involve lower-level mechanisms.
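Bergen and Adelson's point is easy to reproduce schematically. In the toy sketch below (the stroke geometry and filter scales are my own, not their stimuli), an L and an X are drawn with the same amount of 'ink', yet a center-surround mechanism followed by full-wave rectification responds to the two micropatterns by different amounts, with no junction or terminator detectors anywhere in the pipeline.

```python
import numpy as np
from scipy import ndimage

def dog_response(patch, sigma_c=1.0, sigma_s=3.0):
    """Center-surround (difference-of-Gaussians) filtering followed by
    full-wave rectification, summed over the patch."""
    center = ndimage.gaussian_filter(patch, sigma_c)
    surround = ndimage.gaussian_filter(patch, sigma_s)
    return float(np.abs(center - surround).sum())

n = 21
L = np.zeros((n, n))
X = np.zeros((n, n))
L[2:19, 2] = 1.0       # vertical stroke
L[18, 2:19] = 1.0      # horizontal stroke, meeting at a corner
d = np.arange(2, 19)
X[d, d] = 1.0          # two diagonal strokes crossing at the center
X[d, 20 - d] = 1.0

# Equal 'ink' (33 pixels each), but different size-tuned responses.
r_L, r_X = dog_response(L), dog_response(X)
```

The point is not the particular numbers but that a simple size-tuned 'stuff' mechanism already distinguishes the patterns.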
The LNL models naturally lend themselves to implementation. Nearly all the models cited here
(Section “Image processing-based models”) were implemented at least up to the decision stage.
They operate on arbitrary images. Implementation makes these models testable and falsifiable, in
stark contrast to word models operating on labeled ‘things’ like micropatterns and their features.
Furthermore, the LNL models have performed reasonably well. Malik and Perona’s (1990) model,
one of the most fully specified and successful, made testable predictions of segmentation diffi-
culty for a number of pairs of micropattern textures. They found strong agreement between their
model’s predictions and behavioral results of Kröse (1986) and Gurnsey and Browse (1987). They
also produced meaningful results on a complex piece of abstract art. Image-computable models
naturally make testable predictions about the effects of texture density (Rubenstein and Sagi 1996),
alignment, and sign of contrast (Graham et al. 1992; Beck et al. 1987), for which word models
inherently have trouble making predictions.

Bringing together statistical and image processing-based models


Is texture segmentation, then, a mere artifact of early visual processing, rather than a meaningful
indicator of statistical differences between textures? The visual system should identify boundaries
in an intelligent way, not leave their detection to the caprices of early vision. Making intelligent
decisions in the face of uncertainty is the realm of statistics. Furthermore, statistical models seem
appropriate due to the statistical nature of textures.
Statistical and image processing-based theories are not mutually exclusive. Arguably the first
filtering stage in LNL models extracts basic features, and the later filtering stage computes a sort
of average. Perhaps thinking in terms of intelligent decisions can clarify the role of unknown
parameters in the LNL models, better specify the decision process, and lend intuitions about
which textures segment.
If the mean orientations of two textures differ, should we necessarily perceive a boundary? From
a decision-theory point of view this would be unwise; a small difference in mean might occur
by chance. Perhaps textures segment if their 1st-order feature statistics are significantly different
(Voorhees and Poggio 1988; Puzicha et al. 1997; Rosenholtz 2000). Significant difference takes
into account the variability of the textures; two homogeneous textures with mean orientations
differing by 30 degrees may segment, while two heterogeneous textures with the same difference
in mean may not. Experimental results confirm that texture segmentation shows this depend-
ence upon texture variability. Observers can also segment two textures differing significantly in
the variance of their orientations. However, observers are poor at segmenting two textures with
the same mean and variance, when one is unimodal and the other bimodal (Rosenholtz 2000). It
seems that observers do not use the full 1st-order statistics of orientation.
These results point to the following model of texture segmentation (Rosenholtz 2000). The
observer collects n noisy feature estimates from each side of a hypothesized edge. The number of
samples is limited, as texture segmentation involves local rather than global statistics (Nothdurft
1991). If the two sets of samples differ significantly, with some confidence, α, then the observer
sees a boundary. Rosenholtz (2000) tests for a significant difference in mean orientation, mean
contrast, orientation variance, and contrast variance.
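Viewed this way, boundary detection amounts to a two-sample test on pooled feature estimates. The sketch below is schematic rather than Rosenholtz's actual implementation: the sample size, the noise levels, and the use of a standard t-test (treating orientation as a linear rather than circular variable) are all simplifying assumptions of mine.

```python
import numpy as np
from scipy import stats

def sees_boundary(samples_a, samples_b, alpha=0.05):
    """Report a texture boundary when two sets of noisy feature
    estimates differ significantly in their means."""
    t, p = stats.ttest_ind(samples_a, samples_b)
    return p < alpha

rng = np.random.default_rng(1)
n = 40  # a limited, local sample on each side of a hypothesized edge

# Homogeneous textures, mean orientations 30 degrees apart:
# the difference is significant, so a boundary is seen.
a = rng.normal(0.0, 5.0, n)
b = rng.normal(30.0, 5.0, n)

# Heterogeneous textures with the same 30-degree difference in mean:
# the difference may be swamped by the textures' own variability.
c = rng.normal(0.0, 60.0, n)
d = rng.normal(30.0, 60.0, n)

print(sees_boundary(a, b), sees_boundary(c, d))
```

The same machinery extends to other summary statistics (e.g. an F-test on orientation variances), which is the sense in which the model tests 1st-order feature statistics rather than raw pixels.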
The model can be implemented using biologically plausible image processing operations.
Though the theoretical development came from thinking about statistical tests on discrete
samples, the model extracts no ‘things’ like line elements or texels. Rather it operates on con-
tinuous ‘stuff’ (Adelson 2001). The model has three fairly intuitive free parameters, all of
which can be determined by fitting behavioral data. Two internal noise parameters capture
human contrast and orientation discriminability. The last parameter specifies the radius of
the region over which measurements are pooled to compute the necessary summary statistics
(mean, variance, etc.).
Human performance segmenting orientation-defined textures is well fit by the model (Rosenholtz
2000). The model also predicts the rank ordering of segmentation strength for micropattern tex-
ture pairs (TL, +T, Δ➔, and L+) found by Gurnsey and Browse (1987). Furthermore, Hindi Attar
et al. (2007) related the salience of a texture boundary to the rate of filling-in of the central texture
in stabilized images. They found that the model predicted many of the asymmetries found in
filling-in.

Fig. 9.2  Comparison of the information encoded in different texture descriptors. (a) Original
peas image; (b) texture synthesized to have the same power spectrum as (a), but random phase.
This representation cannot capture the structures visible in many natural and artificial textures,
though it performs adequately for some textures such as the left side of Figure 9.1e. (c) Marginal
statistics of multiscale, oriented and non-oriented filter banks better capture the nature of edges
in natural images. (d) Joint statistics work even better at capturing structure.
Data from D.J. Heeger and J.R. Bergen, Pyramid-based texture analysis/synthesis, Proceedings of the 22nd
annual conference on Computer graphics and interactive techniques (SIGGRAPH ‘95), IEEE Computer Society
Press, Silver Spring, MD, 1995. Data from E.P. Simoncelli and B.A. Olshausen, Natural image statistics and neural
representation, Annual Review of Neuroscience, 24, pp. 1193–216, 2001.

The visual system may do something intelligent, like a statistical test (Voorhees and Poggio 1988;
Puzicha et al. 1997; Rosenholtz 2000), or Bayesian inference (Lee 1995; Feldman, on Bayesian
models, this volume), when detecting texture boundaries within an image. These decisions can
be implemented using biologically plausible image processing operations, thus bringing together
image processing-based and statistical models of texture segmentation.

Texture perception more broadly


Decisions based upon a few summary statistics do a surprisingly good job of predicting existing
texture segmentation phenomena. Are these few statistics all that is required for texture percep-
tion more broadly? This seems unlikely. First, they perhaps do not even suffice to explain texture
segmentation. Simple contrast energy has probably worked in place of more complex features
only because we have tested a very limited set of textures (Barth et al. 1998).
Second, consider Figure 9.1a–d. The mean and variance of contrast and orientation do little to
capture the appearance of the component texels, yet we have a rich percept of their shapes and
arrangement. What measurements, then, might human vision use to represent textures?
Much of the early work in texture classification and discrimination came from computer vision.
It aimed at distinguishing between textured regions in satellite imagery, microscopy, and medical
imagery. As with texture segmentation, early research pinpointed 2nd-order statistics, particularly
the power spectrum, as a possible representation (Bajcsy 1973). Researchers also explored Markov
Random Field representations more broadly. For practical applications, power spectrum and
related measures worked reasonably well. (For a review, see Haralick 1979, and Wechsler 1980.)
However, the power spectrum cannot predict texture segmentation, and texture appearance
likely requires more information rather than less. Furthermore, texture classification provides
a weak test. Performance is highly dependent upon both the diversity of textures in the dataset
and the choice of texture categories. A texture analysis/synthesis method better enables us to get
a sense of the information encoded by a given representation (Tomita et al. 1982; Portilla and
Simoncelli 2000). Texture analysis/synthesis techniques measure a descriptor for a texture, and
then generate new samples of texture that share the same descriptor. Rather than simply synthe-
sizing a texture with given properties, they can measure those properties from an arbitrary input
texture. The ‘analysis’ stage makes the techniques applicable to a far broader array of textures.
Most of the progress in developing models of human texture representation has been made using
texture analysis/synthesis strategies.
One can easily get a sense of the information encoded by the power spectrum by generating a
new image with the same Fourier transform magnitude, but random phase. This representation is
clearly inadequate to capture the appearance (Figure 9.2). The synthesized texture in Figure 9.2b
looks like filtered noise (because it is), rather than like the peas in Figure 9.2a. The synthesized
texture has none of the edges, contours, or other locally oriented structures of a natural image.
Natural images are highly non-Gaussian (Zetzsche et al. 1993). The responses of oriented bandpass
filters applied to natural scenes are kurtotic (sparse) and highly dependent; these statistics cannot
be captured by the power spectrum alone, and are responsible for important aspects of the appear-
ance of natural images (Simoncelli and Olshausen 2001).
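Phase randomization itself takes only a few lines. In the minimal version below (my own sketch; the random array merely stands in for a texture photograph like the peas), the randomized spectrum borrows the phase of a random real-valued image, so it keeps the conjugate symmetry of a real signal while preserving the amplitude spectrum exactly:

```python
import numpy as np

def phase_randomize(image, rng):
    """Synthesize an image with the same Fourier amplitude spectrum as
    `image` but randomized phase."""
    amplitude = np.abs(np.fft.fft2(image))
    # Borrow the phase of a random real-valued image, so the randomized
    # spectrum keeps the conjugate symmetry required of a real signal.
    random_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    synth = np.fft.ifft2(amplitude * np.exp(1j * random_phase))
    return np.real(synth)

rng = np.random.default_rng(0)
img = rng.random((64, 64))       # stand-in for a texture photograph
synth = phase_randomize(img, rng)
# `synth` has the same power spectrum as `img`, but the phase structure
# that carries edges and contours has been destroyed.
```

Applied to a photograph of a natural texture, the output looks like filtered noise, which is exactly the failure illustrated in Figure 9.2b.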
Due to limitations of the power spectrum and related measures, researchers feared that sta-
tistical descriptors could not adequately capture the appearance of textures formed of discrete
elements, or containing complex structures (Tomita et  al. 1982). Some researchers abandoned
purely statistical descriptors in favor of more ‘structural’ approaches, which described texture in
terms of discrete texels and their placement rule (Tomita et al. 1982; Zucker 1976; Haralick 1979).
Implicitly, structural approaches assume that texture processing occurs at later stages of vision, ‘a
cognitive rather than a perceptual approach’ (Wechsler 1980). Some researchers suggested choos-
ing between statistical and structural approaches, depending upon the kind of texture (Zucker
1976; Haralick 1979).
Structural models were less than successful, largely due to difficulty extracting texels. This
worked better when texels were allowed to consist of arbitrary image regions, rather than cor-
respond to recognizable ‘things’ (e.g. Leung and Malik 1996).

The parallels to texture segmentation should be obvious:  researchers rightly skeptical about
the power of simple statistical models abandoned them in favor of models operating on discrete
‘things’. As with texture segmentation, the lack of faith in statistical models proved unfounded.
Sufficiently rich statistical models can capture a lot of structure. Demonstrating this requires
more complex texture synthesis methodologies to find samples of texture with the same statis-
tics. A number of texture synthesis techniques have been developed, with a range of proposed
descriptors.
Heeger and Bergen’s (1995) descriptor, motivated by the success of the LNL segmentation mod-
els, consists of marginal (i.e. 1st-order) statistics of the outputs of multiscale filters, both oriented
and unoriented. Their algorithm synthesizes new samples of texture by beginning with an arbi-
trary image ‘seed’—often a sample of random noise, though this is not required—and iteratively
applying constraints derived from the measured statistics. After a number of iterations, the result
is a new image with (approximately) the same 1st-order statistics as the original. Figure 9.2c shows
an example. Their descriptor captures significantly more structure than the power spectrum;
enough to reproduce the general size of the peas and their dimples. It still does not quite get the
edges right, and misrepresents larger-scale structures.
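The constraint-application step at the heart of this scheme is histogram matching. The sketch below is a simplified, single-band version of my own, omitting the multiscale filter bank and the iteration loop: it imposes a target image's 1st-order statistics on a noise seed while preserving the seed's rank order.

```python
import numpy as np

def match_histogram(source, target):
    """Impose the (1st-order) value histogram of `target` onto `source`
    while keeping the rank order of the source values: the core
    constraint step iterated in Heeger/Bergen-style synthesis."""
    order = np.argsort(source, axis=None)
    result = np.empty_like(source).ravel()
    result[order] = np.sort(target, axis=None)
    return result.reshape(source.shape)

rng = np.random.default_rng(0)
target_tex = rng.gamma(2.0, 1.0, (32, 32))  # stand-in original texture
seed = rng.standard_normal((32, 32))        # arbitrary noise seed

synth = match_histogram(seed, target_tex)
# `synth` now has exactly the target's 1st-order pixel statistics; the
# full algorithm repeats this matching for each subband of a multiscale
# oriented filter bank, re-collapsing the pyramid on every iteration.
```

Because each matching step disturbs the statistics imposed by the previous one, the algorithm only converges approximately, which is why the result has "(approximately) the same 1st-order statistics" as the original.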
Portilla and Simoncelli (2000) extended the Heeger/Bergen methodology, and included in their
texture descriptor the joint (2nd-order) statistics of responses of multiscale V1-like simple and
complex ‘cells’. Figure 9.2d shows an example synthesis. This representation captures much of
the perceived structure, even in micropattern textures (Portilla and Simoncelli 2000; Balas 2006),
though it is not perfect. Some non-parametric synthesis techniques have performed better at
producing new textures that look like the original (e.g. Efros and Leung 1999). However, these
techniques use a texture descriptor that is essentially the entire original image. It is unclear how
biologically plausible such a representation might be, or what the success of such techniques teaches
us about human texture perception.
Portilla and Simoncelli (2000), then, remains a state-of-the-art parametric texture model.
This does not imply that its measurements are literally those made by the visual system, though
they are certainly biologically plausible. A  ‘rotation’ of the texture space would maintain the
same information while changing the representation dramatically. Furthermore, a sufficiently
rich set of 1st-order statistics can encode the same information as higher-order statistics (Zhu
et al. 1996). However, the success of Portilla and Simoncelli’s model demonstrates that a rich and
high-dimensional set of image statistics comes close to capturing the information preserved and
lost in visual representation of a texture.

Texture perception is not just for textures


Researchers have long studied texture perception in the hope that it would lend insight into vision
more generally. Texture segmentation, rather than merely informing us about perceptual organi-
zation, might uncover the basic features available preattentively (Treisman 1985), or the nature of
early nonlinearities in visual processing (Malik and Perona 1990; Graham et al. 1992; Landy and
Graham 2004). However, common wisdom assumed that after the measurement of basic features,
texture and object perception mechanisms diverged (Cant and Goodale 2007). Similarly, work in
computer vision assumed separate processing for texture vs. objects.
More recent work blurs the distinction between texture and object processing. Modern com-
puter vision treats them much more similarly. Recent human vision research demonstrates
that ‘texture processing’ operations underlie vision more generally. The field’s previous suc-
cesses in understanding texture perception may elucidate visual processing for a broad array
of tasks.

Peripheral crowding
Texture processing mechanisms have been associated with visual search (Treisman 1985) and set
perception (Chong and Treisman 2003). One can argue that texture statistics naturally inform
these tasks. Evidence of more general texture processing in vision has come from the study of
peripheral vision, in particular visual crowding.
Peripheral vision is substantially worse than foveal vision. For instance, the eye pairs sharp,
high-resolution vision over a narrow fovea with sparse sampling over the wide surrounding periphery.
If we need finer detail, we move our eyes to bring the fovea to the desired location.
The phenomenon of visual crowding2 illustrates that loss of information in the periphery is not
merely due to reduced acuity. A target such as the letter ‘A’ is easily identified when presented in
the periphery on its own, but becomes difficult to recognize when flanked too closely by other
stimuli, as in the string of letters, ‘BOARD’. An observer might see these crowded letters in the
wrong order, perhaps confusing the word with ‘BORAD’. They might not see an ‘A’ at all, or might
see strange letter-like shapes made up of a mixture of parts from several letters (Lettvin 1976).
Crowding occurs with a broad range of stimuli (see Pelli and Tillman 2008, for a review).
However, not all flankers are equal. When the target and flankers are dissimilar or less grouped
together, target recognition is easier (Andriessen and Bouma 1976; Kooi et al. 1994; Saarela et al.
2009). Strong grouping among the flankers can also make recognition easier (Livne and Sagi 2007;
Sayim et al. 2010; Manassi et al. 2012). Furthermore, crowding need not involve discrete ‘target’
and ‘flankers’; Martelli et al. (2005) argue that ‘self-crowding’ occurs in peripheral perception of
complex objects and scenes.

Texture processing in peripheral vision?


The percept of a crowded letter array contains sharp, letter-like forms, yet they seem lost in a
jumble, as if each letter’s features (e.g., vertical bars and rounded curves) have come unteth-
ered and been incorrectly bound to the features of neighboring letters (Pelli et  al. 2004).
Researchers have associated the phenomena of crowding with the ‘distorted vision’ of stra-
bismic amblyopia (Hess 1982). Lettvin (1976) observed that an isolated letter in the periph-
ery seems to have characteristics which the same letter, flanked, does not. The crowded letter
‘only seems to have a ‘statistical’ existence’. In line with these subjective impressions, research-
ers have proposed that crowding phenomena result from ‘forced texture processing’, involving
excessive feature integration (Pelli et al. 2004), or compulsory averaging (Parkes et al. 2001)
over each local pooling region. Pooling region size grows linearly with eccentricity, i.e. with
distance to the point of fixation (Bouma 1970).
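This linear scaling is often summarized as Bouma's rule of thumb: flankers interfere when they fall within roughly half the target's eccentricity. A minimal sketch (the 0.5 factor is the classic approximation; reported values vary with task and observer):

```python
def critical_spacing(eccentricity_deg, bouma_factor=0.5):
    """Bouma's rule of thumb: the critical target-flanker spacing grows
    linearly with eccentricity, with a proportionality constant of
    roughly 0.5 (values around 0.4-0.7 are reported in the literature)."""
    return bouma_factor * eccentricity_deg

def is_crowded(eccentricity_deg, spacing_deg):
    """True when flankers fall inside the critical spacing."""
    return spacing_deg < critical_spacing(eccentricity_deg)

# A letter 10 degrees into the periphery is crowded by flankers
# 3 degrees away, but not by flankers 8 degrees away.
print(is_crowded(10.0, 3.0), is_crowded(10.0, 8.0))
```

One consequence of the linear scaling is that no fixed flanker distance escapes crowding everywhere: spacing that is comfortable at 5 degrees eccentricity can be crowded at 20.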
Assume for the sake of argument—following Occam’s razor—that the peripheral mechanisms
underlying crowding operate all the time, by default; no mechanism perversely ‘switches on’ to
thwart our recognition of flanked objects. This Default Processing assumption has profound
implications for vision. Peripheral vision is hugely important; very little processing truly occurs
in the fovea. One can easily recognize the cat in Figure 9.3, when fixating on the ‘+’. Yet the cat
may extend a number of degrees beyond the fovea. Could object recognition, perceptual organiza-
tion, scene recognition, face recognition, navigation, and guidance of eye movements all share an
early, local texture processing mechanism? Is it that ‘texture is primitive and textures combine to
produce forms’ (Lettvin 1976)? This seems antithetical to ideas of different processing for textures
and objects. Prior to 2000, it would have seemed surprising to use a texture-like representation
for more general visual tasks.

2  ‘Crowding’ is used inconsistently and confusingly in the field, sometimes as a transitive verb (‘the
flankers crowd the target’), sometimes as a mechanism, and sometimes as the experimental outcome in
which recognizing a target is impaired in the presence of nearby flankers. This chapter predominantly
follows the last definition, though in describing stimuli sometimes refers to the lay sense of ‘a lot of stuff
in a small space.’

Fig. 9.3  Original images (a,c) and images synthesized to have approximately the same local summary
statistics (b,d). Intended (and model) fixation on the ‘+’. The cat can clearly be recognized while
fixating, even though much of the object falls outside the fovea. The summary statistics contain
sufficient information to capture much of its appearance (b). Similarly, the summary statistics contain
sufficient information to recognize the gist of the scene (d), though perhaps not to correctly assess its
details. (e) A patch of search display, containing a tilted target and vertical distractors. (f) The summary
statistics (here, in a single pooling region) are sufficient to decipher the approximate number of items,
much about their appearance, and the presence of the target. (g) A target-absent patch from search for a
white vertical among black vertical and white horizontal bars. (h) The summary statistics are ambiguous
about the presence of a white vertical, perhaps leading to perception of illusory conjunctions.
Parts c-h are reproduced from Ruth Rosenholtz, Jie Huang, and Krista A. Ehinger, Rethinking the role of
top-down attention in vision: effects attributable to a lossy representation in peripheral vision, Frontiers in
Psychology, 3, p. 13, DOI: 10.3389/fpsyg.2012.00013 (c) 2012, Frontiers Media S.A. This work is licensed under
a Creative Commons Attribution 3.0 License.

However, several state-of-the-art computer vision techniques operate upon local texture-like
image descriptors, even when performing object and scene recognition. The image descriptors
include local histograms of gradient directions, and local mean response to oriented multi-scale
filters, among others (Bosch et al. 2006, 2007; Dalal and Triggs 2005; Oliva and Torralba 2006;
Tola et  al. 2010; Fei-Fei and Perona 2005). Such texture descriptors have proven effective for
detection of humans in natural environments (Dalal and Triggs 2005), object recognition in natu-
ral scenes (Bosch et al. 2007; Mutch and Lowe 2008; Zhu et al. 2011), scene classification (Oliva
and Torralba 2001; Renninger and Malik 2004; Fei-Fei and Perona 2005), wide-baseline stereo
(Tola et al. 2010), gender discrimination (Wang et al. 2010), and face recognition (Velardo and
Dugelay 2010). These results represent only a handful of hundreds of recent computer vision
papers utilizing similar methods.
Suppose we take literally the idea that peripheral vision involves early local texture process-
ing. The key questions are whether on the one hand, humans make the sorts of errors one would
expect, and on the other hand whether texture processing preserves enough information to
explain the successes of vision, such as object and scene recognition.
A local texture representation predicts that vision would be locally ambiguous about the phase
and location of features, since texture statistics contain such ambiguities. Do we see evidence of this in
vision? In fact, we do. Observers have difficulty distinguishing 180-degree phase differences
in compound sine wave gratings in the periphery (Bennett and Banks 1991; Rentschler and
Treutwein 1985) and show marked position uncertainty in a bisection task (Levi and Klein 1986).
Furthermore, such ambiguities appear to exist during object and scene processing, though we
rarely have the opportunity to be aware of them. Peripheral vision tolerates considerable image
variation without giving us much sense that something is wrong (Freeman and Simoncelli 2011;
Koenderink et al. 2012). Koenderink et al. (2012) apply a spatial warping to an ordinary image.
It is surprisingly difficult to tell that anything is wrong, unless one fixates near the disarray. (See
<http://i-perception.perceptionweb.com/fulltext/i03/i0490sas>.)
To go beyond qualitative evidence, we need a concrete proposal for what ‘texture process-
ing’ means. This chapter has reviewed much of the relevant work. Texture appearance models
aim to understand texture processing in general, whereas segmentation models attempt only to
predict grouping. Our current best guess as to a model of texture appearance is that of Portilla
and Simoncelli (2000). Perhaps the visual system computes something like 2nd-order statistics
of the responses of V1-like cells, over each local pooling region. We call this the Texture Tiling
Model. This proposal (Balas et al. 2009; Freeman and Simoncelli 2011) is not so different from
standard object recognition models, in which later stages compute more complex features by
measuring co-occurrences of features from the previous layer (Fukushima 1980; Riesenhuber
and Poggio 1999). Second-order correlations are essentially co-occurrences pooled over a sub-
stantially larger area.
Can this representation predict crowded object recognition? Balas et al. (2009) demonstrate that
its inherent confusions and ambiguities predict difficulty recognizing crowded peripheral letters.
Rosenholtz et al. (2012a) further show that this model predicts crowding of other simple symbols.
Visual search employs wide field-of-view, crowded displays. Is the difference between easy and
difficult search due to local texture processing? We can utilize texture synthesis techniques to
visualize the local information available (Figure 9.3). When target and distractor bars differ sig-
nificantly in orientation, the statistics are sufficient to identify a crowded peripheral target. The
model predicts easy ‘popout’ search (Figure 9.3e,f). The model also predicts the phenomenon of
illusory conjunctions (Figure 9.3g,h), and other classic search results (Rosenholtz et al. 2012b;
Rosenholtz et  al. 2012a). Characterizing visual search as limited by peripheral processing rep-
resents a significant departure from earlier interpretations which attributed performance to the
limits of processing in the absence of covert attention (Treisman 1985).
Under the Default Processing assumption, we must also ask whether texture processing might
underlie normal object and scene recognition. We synthesized an image to have the same local
summary statistics as the original (Rosenholtz 2011; Rosenholtz et al. 2012b; see also Freeman
and Simoncelli 2011). A fixated object (Figure 9.3b) is clearly recognizable; it is quite well encoded
by this representation. Glancing at a scene (Figure 9.3d), much information is available to deduce
the gist and guide eye movements; however, precise details are lost, perhaps leading to change
blindness (Oliva and Torralba 2006; Freeman and Simoncelli 2011; Rosenholtz et al. 2012b).
These results and demos indicate the power of the Texture Tiling Model. It is image-computa-
ble, and can make testable predictions for arbitrary stimuli. It predicts on the one hand difficulties
of vision, such as crowded object recognition and hard visual search, while plausibly supporting
normal object and scene recognition.

Parallels between alternative models of crowding and less successful texture models


It is instructive to consider alternative models of crowding, and their parallels to previous work on
texture perception. A number of crowding experiments have been designed to test an overly sim-
ple texture processing model. In this ‘simple pooling’ or ‘faulty-integration’ model, each pooling
region yields the mean of some (often unspecified) feature. To a first approximation, this model
predicts worse performance the more one fills up the pooling region with irrelevant flankers,
as doing so reduces the informativeness of the mean. This impoverished model cannot explain
improved performance with larger flankers (Levi and Carney 2009; Manassi et al. 2012), or when
flankers group with one another (Saarela et al. 2009; Manassi et al. 2012).
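The simple pooling model's central prediction can be made concrete with a toy simulation (a hypothetical sketch, not any published implementation; `pooling_accuracy` is an invented helper). Suppose an observer must report whether a target is tilted left or right, but the pooling region delivers only the mean orientation of target plus flankers: the target's contribution to that mean shrinks as flankers are added, and discrimination accuracy falls.

```python
import random

def pooling_accuracy(n_flankers, trials=5000, seed=1):
    """Fraction of trials on which the sign of the pooled mean orientation
    matches the target's tilt, when the pooling region reports only a mean."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        target = rng.choice([-15.0, 15.0])                    # tilt left or right
        flankers = [rng.uniform(-45, 45) for _ in range(n_flankers)]
        pooled = (target + sum(flankers)) / (n_flankers + 1)  # mean of the region
        correct += (pooled > 0) == (target > 0)
    return correct / trials

for n in (0, 1, 3, 7):
    print(n, round(pooling_accuracy(n), 2))
```

With no flankers the pooled mean is the target itself and accuracy is perfect; each added flanker dilutes the mean further, so accuracy declines monotonically. Note that this toy model has no way to express the improvements with larger or grouped flankers that the text describes, which is exactly its failure.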
Partially in response to failures of the simple pooling model, researchers have suggested
that some grouping might occur prior to the mechanisms underlying crowding (Saarela et
al. 2009). More generally, the field tends to describe crowding mechanisms as operating on
‘things’. Levi and Carney (2009) suggested that a key determinant of whether crowding occurs
is the distance between target and flanker centroids; averaging might operate on discrete fea-
tures of objects within the pooling region (Parkes et al. 2001; Greenwood et al. 2009; Põder
and Wagemans 2007; Greenwood et al. 2012), and/or localization of those discrete features
might be poor (Strasburger 2005; van den Berg et al. 2012); some crowding effects seem to
depend upon target/flanker identities rather than their features (Louie et al. 2007; Dakin et al.
2010), suggesting that they may be due to later, object-level mechanisms. However, as Dakin et
al. (2010) demonstrate, these apparently ‘object-centered’ effects can be explained by lower-
level mechanisms.
This sketch of alternative models should sound familiar. That crowding mechanisms might act
after early operations have split the input into local groups or objects should have obvious paral-
lels to theories of texture perception. Once again, a too-simple ‘stuff’ model has been rejected
in favor of models which operate on ‘things’. These models, typically word models, do not easily
make testable predictions for novel stimuli.

The power of pooling in high dimensions


A ‘simple pooling model’ bears little resemblance to successful texture descriptors. Texture per-
ception requires a high dimensional representation. The Portilla and Simoncelli (2000) texture
Texture Perception 181

model computes 700–1000 image statistics per texture (depending upon choice of parameters).
(The Texture Tiling Model computes this many statistics per local pooling region.) The ‘forced
texture perception’ presumed to underlie crowding must also be high dimensional—after all, it
must at the very least support perception of actual textures.
Unfortunately it is difficult in general to get intuitions about behavior of high-dimensional
models. Low-dimensional models do not simply scale up to higher dimensions. A single mean
feature value captures little information about a stimulus. Additional statistics provide an increas-
ingly good representation of the original patch. Stuff-models, if sufficiently rich, can in fact cap-
ture a great deal of information about the visual input.
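A drastically reduced sketch can convey how a vector of statistics grows informative as dimensions are added (the code below is illustrative only; `summary_stats` is a hypothetical helper, and the Portilla and Simoncelli model uses hundreds of statistics of a multi-scale wavelet decomposition, not the handful computed here). The idea is simply that marginal moments plus means, variances, and covariances of a few oriented filter responses already distinguish textures that a single mean could not.

```python
import numpy as np

def summary_stats(patch):
    """A toy, low-dimensional cousin of a texture-statistic vector: marginal
    moments of the patch plus means, standard deviations, and pairwise
    covariances of four oriented derivative-filter responses."""
    responses = [
        np.diff(patch, axis=0)[:, :-1],     # vertical differences (horizontal structure)
        np.diff(patch, axis=1)[:-1, :],     # horizontal differences (vertical structure)
        patch[1:, 1:] - patch[:-1, :-1],    # one diagonal
        patch[1:, :-1] - patch[:-1, 1:],    # the other diagonal
    ]
    stats = [patch.mean(), patch.std(), float(((patch - patch.mean()) ** 3).mean())]
    flat = [r.ravel() for r in responses]
    for r in flat:
        stats += [float(r.mean()), float(r.std())]
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            # raw covariance between filter responses (correlations in richer models)
            stats.append(float(np.mean(flat[i] * flat[j]) - flat[i].mean() * flat[j].mean()))
    return np.array(stats)

rng = np.random.default_rng(0)
noise = rng.standard_normal((32, 32))
grating = np.tile(np.sin(np.arange(32) / 2.0), (32, 1))   # a vertical grating patch
print(summary_stats(noise).shape)                          # (17,)
print(np.allclose(summary_stats(noise), summary_stats(grating)))   # False: the descriptors differ
```

Even this 17-dimensional vector separates noise from a grating; the full models scale the same idea up to several scales and orientations, which is what makes them hard to reason about intuitively.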
How well a stimulus can be encoded depends upon its complexity relative to the representation.
Flanker grouping can theoretically simplify the stimulus, leading to better representation and
perhaps better performance. In some cases the information preserved is insufficient to perform
a given task, and in common parlance the stimulus is ‘crowded’. In other cases, the information
is sufficient for the task, predicting the ‘relief from crowding’ accompanying, for example, a dis-
similar target and flankers (e.g. Rosenholtz et al. 2012a and Figure 9.3e,f).
A high-dimensional representation can also preserve the information necessary to individu-
ate ‘things’. For instance, it can capture the approximate number of discrete objects in Figure
9.3e,g. In fact, one can represent an arbitrary amount of structure in the input by varying the
size of the regions over which statistics are computed (Koenderink and van Doorn 2000),
and the set of statistics. The structural/statistical distinction is not a dichotomy, but rather a
continuum.
The mechanisms underlying crowding may be ‘later’ than texture perception mechanisms, and
operate on precomputed groups or ‘things’. However, just because we often recognize ‘things’
in our stimuli, as a result of the full visual-cognitive machinery, does not mean that our visual
systems operate upon those things to perform a given task. One should not underestimate the
power of high-dimensional models which operate on continuous ‘stuff’. In texture perception,
such models have explained results for a wider variety of stimuli, and with arguably simpler
mechanisms.

Conclusions
In the last several decades, much progress has been made toward better understanding the mecha-
nisms underlying texture segmentation, classification, and appearance. There exists a rich body
of work on texture segmentation, both behavioral experiments and modeling. Many results can
be explained by intelligent decisions based on some fairly simple image statistics. Researchers
have also developed powerful models of texture appearance. More recent work demonstrates that
similar texture-processing mechanisms may account for the phenomena of visual crowding. The
details remain to be worked out, but if true, the visual system may employ local texture processing
throughout the visual field. This predicts that, rather than being relegated to a narrow set of tasks
and stimuli, texture processing underlies visual processing in general, supporting such diverse
tasks as visual search, object and scene recognition.

References
Adelson, E. H. (2001). ‘On seeing stuff: The perception of materials by humans and machines’. In
Proceedings of the SPIE: HVEI VI, edited by B. E. Rogowitz and T. N. Pappas, Vol. 4299: 1–12.
Andriessen, J. J., and Bouma, H. (1976). ‘Eccentric vision: Adverse interactions between line segments’.
Vision Research 16: 71–8.
Attneave, F. (1954). ‘Some informational aspects of visual perception’. Psychological Review 61(3): 183–93.
Bajcsy, R. (1973). ‘Computer identification of visual surfaces’. Computer Graphics and Image Processing
2(2): 118–30.
Balas, B. J. (2006). ‘Texture synthesis and perception: using computational models to study texture
representations in the human visual system’. Vision Research 46(3): 299–309.
Balas, B., Nakano, L., and Rosenholtz, R. (2009). ‘A summary-statistic representation in peripheral vision
explains visual crowding’. Journal of Vision 9(12): 1–18.
Barth, E., Zetzsche, C., and Rentschler, I. (1998). ‘Intrinsic two-dimensional features as textons’. Journal of
the Optical Society of America A, Optics, Image Science, and Vision 15(7): 1723–32.
Beck, J. (1966). ‘Effect of orientation and of shape similarity on perceptual grouping’. Perception &
Psychophysics 1(1): 300–2.
Beck, J. (1967). ‘Perceptual grouping produced by line figures’. Perception & Psychophysics 2(11): 491–5.
Beck, J., Prazdny, K., and Rosenfeld, A. (1983). ‘A theory of textural segmentation’. In Human and machine
vision, edited by J. Beck, B. Hope, and A. Rosenfeld, pp. 1–38. (New York: Academic Press).
Beck, J., Sutter, A., and Ivry, R. (1987). ‘Spatial frequency channels and perceptual grouping in texture
segregation’. Computer Vision, Graphics, and Image Processing 37(2): 299–325.
Behrmann et al. (this volume). Holistic face perception. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. (Oxford: Oxford University Press).
Ben-av, M. B. and Sagi, D. (1995). ‘Perceptual grouping by similarity and proximity: Experimental results
can be predicted by intensity autocorrelations’. Vision Research 35(6): 853–66.
Bennett, P. J. and Banks, M. S. (1991). ‘The effects of contrast, spatial scale, and orientation on foveal and
peripheral phase discrimination’. Vision Research 31(10): 1759–86.
Bergen, J. R. and Adelson, E. H. (1988). ‘Early vision and texture perception’. Nature 333(6171): 363–4.
Bergen, J. R. and Julesz, B. (1983). ‘Parallel versus serial processing in rapid pattern discrimination’. Nature
303(5919): 696–8.
Bergen, J. R. and Landy, M. S. (1991). ‘Computational modeling of visual texture segregation’. In
Computational models of visual perception, edited by M. S. Landy and J. A. Movshon, pp. 253–71.
(Cambridge, MA: MIT Press).
Boring, E. G. (1945). ‘Color and camouflage’. In Psychology for the armed services, edited by E. G. Boring,
pp. 63–96. (Washington, D.C: The Infantry Journal).
Bosch, A., Zisserman, A., and Munoz, X. (2006). ‘Scene classification via pLSA’. In Proceedings of the 9th
European Conference on Computer Vision (ECCV’06), Springer Lecture Notes in Computer Science
3954: 517–30.
Bosch, A., Zisserman, A., and Munoz, X. (2007). ‘Image classification using random forests and ferns’.
In Proceedings of the 11th International Conference on Computer Vision (ICCV’07) (Rio de Janeiro,
Brazil): 1–8.
Bouma, H. (1970). ‘Interaction effects in parafoveal letter recognition’. Nature 226: 177–8.
Bovik, A. C., Clark, M., and Geisler, W. S. (1990). ‘Multichannel Texture Analysis Using Localized Spatial
Filters’. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(1): 55–73.
Braun, J. and Sagi, D. (1991). ‘Texture-based tasks are little affected by second tasks requiring peripheral or
central attentive fixation’. Perception 20: 483–500.
Brooks (this volume). Traditional and new principles of perceptual grouping. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Caelli, T. (1985). ‘Three processing characteristics of visual texture segmentation’. Spatial Vision 1(1): 19–30.
Caelli, T. M. and Julesz, B. (1978). ‘On perceptual analyzers underlying visual texture discrimination: Part
I’. Biological Cybernetics 28: 167–75.
Caelli, T. M., Julesz, B., and Gilbert, E. N. (1978). ‘On perceptual analyzers underlying visual texture
discrimination: Part II’. Biological Cybernetics 29: 201–14.
Cant, J. S. and Goodale, M. A. (2007). ‘Attention to form or surface properties modulates different regions
of human occipitotemporal cortex’. Cerebral Cortex 17: 713–31.
Chong, S. C. and Treisman, A. (2003). ‘Representation of statistical properties’. Vision Research 43: 393–404.
Chubb, C. and Landy, M. S. (1991). ‘Orthogonal distribution analysis: A new approach to the study
of texture perception’. In Computational Models of Visual Processing, edited by M. S. Landy and
J. A. Movshon, pp. 291–301. (Cambridge, MA: MIT Press).
Chubb, C., Nam, J.-H., Bindman, D. R., and Sperling, G. (2007). ‘The three dimensions of human visual
sensitivity to first-order contrast statistics’. Vision Research 47(17): 2237–48.
Dakin (this volume). In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford:
Oxford University Press).
Dakin, S. C., Williams, C. B., and Hess, R. F. (1999). ‘The interaction of first- and second-order cues to
orientation’. Vision Research 39(17): 2867–84.
Dakin, S. C., Cass, J., Greenwood, J. A., and Bex, P. J. (2010). ‘Probabilistic, positional averaging predicts
object-level crowding effects with letter-like stimuli’. Journal of Vision 10(10): 1–16.
Dalal, N., and Triggs, B. (2005). ‘Histograms of oriented gradients for human detection’. In 2005 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ‘05): 886–93.
Efros, A. A., and Leung, T. K. (1999). ‘Texture synthesis by non-parametric sampling’. In Proceedings of the
Seventh IEEE International Conference on Computer Vision 2: 1033–8.
Elder (this volume). Bridging the dimensional gap: Perceptual organization of contour in two-dimensional
shape. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Fei-Fei, L. and Perona, P. (2005). ‘A Bayesian Hierarchical Model for Learning Natural Scene Categories’.
2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)
2: 524–31.
Feldman (this volume). Bayesian models of perceptual organization. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Fogel, I. and Sagi, D. (1989). ‘Gabor filters as texture discriminator’. Biological Cybernetics 61: 103–13.
Freeman, J. and Simoncelli, E. P. (2011). ‘Metamers of the ventral stream’. Nature Neuroscience
14(9): 1195–201.
Fukushima, K. (1980). ‘Neocognitron: a self-organizing neural network model for a mechanism of pattern
recognition unaffected by shift in position’. Biological Cybernetics 36: 193–202.
Gibson, J. (1950). ‘The perception of visual surfaces’. The American Journal of Psychology 63(3): 367–84.
Gibson, J. J. (1986). The ecological approach to visual perception. (Hillsdale, NJ: Lawrence Erlbaum
Associates).
Gillebert and Humphreys (this volume). Mutual interplay between perceptual organization and attention: a
neuropsychological perspective. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans.
(Oxford: Oxford University Press).
Giora, E. and Casco, C. (2007). ‘Region- and edge-based configurational effects in texture segmentation’.
Vision Research 47(7): 879–86.
Graham, N., Beck, J., and Sutter, A. (1992). ‘Nonlinear processes in spatial-frequency channel models of
perceived texture segregation: Effects of sign and amount of contrast’. Vision Research 32(4): 719–43.
Greenwood, J. A., Bex, P. J., and Dakin, S. C. (2009). ‘Positional averaging explains crowding with
letter-like stimuli’. Proceedings of the National Academy of Sciences of the United States of America
106(31): 13130–5.
Greenwood, J. A., Bex, P. J., and Dakin, S. C. (2012). ‘Crowding follows the binding of relative position and
orientation’. Journal of Vision 12(3): 1–20.
Gurnsey, R. and Browse, R. (1987). ‘Micropattern properties and presentation conditions influencing visual
texture discrimination’. Perception & Psychophysics 41: 239–52.
Haralick, R. M. (1979). ‘Statistical and Structural Approaches to Texture’. Proceedings of the IEEE
67(5): 786–804.
Heeger, D. J. and Bergen, J. R. (1995). ‘Pyramid-based texture analysis/synthesis’. In Proceedings of the 22nd
annual conference on Computer graphics and interactive techniques (SIGGRAPH ‘95), pp. 229–38. (Silver
Spring, MD: IEEE Computer Society Press).
Hess, R. F. (1982). ‘Developmental sensory impairment: Amblyopia or tarachopia?’ Human Neurobiology 1:
17–29.
Hindi Attar, C., Hamburger, K., Rosenholtz, R., Götzl, H., and Spillman, L. (2007). ‘Uniform versus
random orientation in fading and filling-in’. Vision Research 47(24): 3041–51.
Julesz, B. (1962). ‘Visual Pattern Discrimination’. IRE Transactions on Information Theory 8(2): 84–92.
Julesz, B. (1965). ‘Texture and Visual Perception’. Scientific American 212: 38–48.
Julesz, B. (1975). ‘Experiments in the visual perception of texture’. Scientific American 232(4): 34–43.
Julesz, B. (1981). ‘A theory of preattentive texture discrimination based on first-order statistics of textons’.
Biological Cybernetics 41: 131–8.
Julesz, B., Gilbert, E. N., and Victor, J. D. (1978). ‘Visual discrimination of textures with identical
third-order statistics’. Biological Cybernetics 31: 137–40.
Karu, K., Jain, A., and Bolle, R. (1996). ‘Is there any texture in the image?’ Pattern Recognition
29(9): 1437–46.
Kooi, F. L., Toet, A., Tripathy, S. P., and Levi, D. M. (1994). ‘The effect of similarity and duration on spatial
interaction in peripheral vision’. Spatial Vision 8(2): 255–79.
Knutsson, H. and Granlund, G. (1983). ‘Texture analysis using two-dimensional quadrature filters’. In
IEEE Computer Society workshop on computer architecture for pattern analysis and image database
management (CAPAIDM), pp. 206–13 (Silver Spring, MD: IEEE Computer Society Press).
Koenderink, J. J. and van Doorn, A. J. (2000). ‘Blur and disorder’. Journal of visual communication and
image representation 11(2): 237–44.
Koenderink, J. J., Richards, W., and van Doorn, A. J. (2012). ‘Space-time disarray and visual awareness’.
i-Perception 3(3): 159–62.
Kröse, B. (1986). ‘Local structure analyzers as determinants of preattentive pattern discrimination’.
Biological Cybernetics 55: 289–98.
Landy, M. S. and Graham, N. (2004). ‘Visual Perception of Texture’. In The Visual Neurosciences, edited by
L. M. Chalupa and J. S. Werner, pp. 1106–18. (Cambridge, MA: MIT Press).
Lee, T. S. (1995). ‘A Bayesian framework for understanding texture segmentation in the primary visual
cortex’. Vision Research 35(18): 2643–57.
Lettvin, J. Y. (1976). ‘On seeing sidelong’. The Sciences 16: 10–20.
Leung, T. K. and Malik, J. (1996). ‘Detecting, localizing, and grouping repeated scene elements from
an image’. In Proceedings of the 4th European Conf. on Computer Vision (ECVP ‘96), 1, 546–55
(London: Springer-Verlag).
Levi, D. M. and Carney, T. (2009). ‘Crowding in peripheral vision: why bigger is better’. Current Biology
19(23): 1988–93.
Levi, D. M. and Klein, S. A. (1986). ‘Sampling in spatial vision’. Nature 320: 360–2.
Livne, T. and Sagi, D. (2007). ‘Configuration influence on crowding’. Journal of Vision 7(2): 1–12.
Louie, E., Bressler, D., and Whitney, D. (2007). ‘Holistic crowding: Selective interference between
configural representations of faces in crowded scenes’. Journal of Vision 7(2): 24.1–11.
Machilsen, B. and Wagemans, J. (2011). ‘Integration of contour and surface information in shape detection’.
Vision Research 51: 179–86. doi:10.1016/j.visres.2010.11.005.
Mack, A., Tang, B., Tuma, R., Kahn, S., and Rock, I. (1992). ‘Perceptual organization and attention’.
Cognitive Psychology 24: 475–501.
Malik, J. and Perona, P. (1990). ‘Preattentive texture discrimination with early vision mechanisms’. Journal
of the Optical Society of America. A 7(5): 923–32.
Manassi, M., Sayim, B., and Herzog, M. (2012). ‘Grouping, pooling, and when bigger is better in visual
crowding’. Journal of Vision 12(10): 13.1–14.
Martelli, M., Majaj, N., and Pelli, D. (2005). ‘Are faces processed like words? A diagnostic test for
recognition by parts’. Journal of Vision 5: 58–70.
Mutch, J. and Lowe, D. G. (2008). ‘Object class recognition and localization using sparse features within
limited receptive fields’. International Journal of Computer Vision 80: 45–57.
Nothdurft, H. C. (1991). ‘Texture segmentation and pop-out from orientation contrast’. Vision Research
31(6): 1073–8.
Oliva, A. and Torralba, A. (2001). ‘Modeling the shape of the scene: A holistic representation of the spatial
envelope’. International Journal of Computer Vision 42(3): 145–75.
Oliva, A. and Torralba, A. (2006). ‘Building the gist of a scene: the role of global image features in
recognition’. Progress in Brain Research 155: 23–36.
Olson, R. K. and Attneave, F. (1970). ‘What Variables Produce Similarity Grouping?’ American Journal of
Psychology 83(1): 1–21.
Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., and Morgan, M. (2001). ‘Compulsory averaging of
crowded orientation signals in human vision’. Nature Neuroscience 4(7): 739–44.
Pelli, D. G. and Tillman, K. A. (2008). ‘The uncrowded window of object recognition’. Nature Neuroscience
11(10): 1129–35.
Pelli, D. G., Palomares, M., and Majaj, N. (2004). ‘Crowding is unlike ordinary masking: Distinguishing
feature integration from detection’. Journal of Vision 4: 1136–69.
Põder, E. and Wagemans, J. (2007). ‘Crowding with conjunctions of simple features’. Journal of Vision
7(2): 23.1–12.
Pomerantz & Cragin (this volume). Emergent features and feature combination. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Popat, K. and Picard, R. W. (1993). ‘Novel cluster-based probability model for texture synthesis,
classification, and compression’. In Proceedings of the SPIE Visual Communications and Image Processing
‘93, edited by B. G. Haskell and H.-M. Hang 2094: 756–68.
Portilla, J. and Simoncelli, E. P. (2000). ‘A Parametric Texture Model Based on Joint Statistics of Complex
Wavelet Coefficients’. International Journal of Computer Vision 40(1): 49–71.
Puzicha, J., Hofmann, T., and Buhmann, J. M. (1997). ‘Non-parametric Similarity Measures for
Unsupervised Texture Segmentation and Image Retrieval’. In Proceedings of the Computer Vision and
Pattern Recognition, CVPR ‘97, IEEE, 267–72.
Renninger, L. W. and Malik, J. (2004). ‘When is scene identification just texture recognition?’ Vision
Research 44(19): 2301–11.
Rentschler, I. and Treutwein, B. (1985). ‘Loss of spatial phase relationships in extrafoveal vision’. Nature
313: 308–10.
Riesenhuber, M. and Poggio, T. (1999). ‘Hierarchical models of object recognition in cortex’. Nature
Neuroscience 2(11): 1019–25.
Rosenholtz, R. (1999). ‘General-purpose localization of textured image regions’. In Proceedings of
the SPIE, Human Vision and Electronic Imaging IV, edited by M. H. Wu et al., 3644: 454–60.
doi=10.1117/12.348465.
Rosenholtz, R. (2000). ‘Significantly different textures: A computational model of pre-attentive texture
segmentation’. In Proceedings of the European Conference on Computer Vision (ECCV ‘00), LNCS, edited
by D. Vernon 1843: 197–211.
Rosenholtz, R. (2011). ‘What your visual system sees where you are not looking’. In SPIE: Human
Vision and Electronic Imaging, XVI, edited by B. E. Rogowitz and T. N. Pappas, 7865: 786510.
doi=10.1117/12.876659.
Rosenholtz, R., Huang, J., Raj, A., Balas, B. J., and Ilie, L. (2012a). ‘A summary statistic representation in
peripheral vision explains visual search’. Journal of Vision 12(4): 14.1–17. doi: 10.1167/12.4.14.
Rosenholtz, R., Huang, J., and Ehinger, K. A. (2012b). ‘Rethinking the role of top-down attention in
vision: Effects attributable to a lossy representation in peripheral vision’. Frontiers in Psychology 3: 13.
doi:10.3389/fpsyg.2012.00013.
Rubenstein, B. S. and Sagi, D. (1996). ‘Preattentive texture segmentation: the role of line terminations, size,
and filter wavelength’. Perception & Psychophysics 58(4): 489–509.
Saarela, T. P., Sayim, B., Westheimer, G., and Herzog, M. H. (2009). ‘Global stimulus configuration
modulates crowding’. Journal of Vision 9(2): 5.1–11.
Sayim, B., Westheimer G., and Herzog, M. H. (2010). ‘Gestalt Factors Modulate Basic Spatial Vision’.
Psychological Science 21(5): 641–4.
Simoncelli, E. P. and Olshausen, B. A. (2001). ‘Natural image statistics and neural representation’. Annual
Review of Neuroscience 24: 1193–216.
Strasburger, H. (2005). ‘Unfocused spatial attention underlies the crowding effect in indirect form vision’.
Journal of Vision 5(11): 1024–37.
Sutter, A., Beck, J., and Graham, N. (1989). ‘Contrast and spatial variables in texture segregation: Testing a
simple spatial-frequency channels model’. Perception & Psychophysics 46(4): 312–32.
Tola, E., Lepetit, V., and Fua, P. (2010). ‘DAISY: an efficient dense descriptor applied to wide-baseline
stereo’. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(5): 815–30.
Tomita, F., Shirai, Y., and Tsuji, S. (1982). ‘Description of Textures by a Structural Analysis’. IEEE
Transactions on Pattern Analysis and Machine Intelligence PAMI-4(2): 183–91.
Treisman, A. (1985). ‘Preattentive processing in vision’. Computer Vision, Graphics, and Image Processing
31: 156–77.
Turner, M. R. (1986). ‘Texture discrimination by Gabor functions’. Biological Cybernetics 55: 71–82.
van den Berg, R., Johnson, A., Martinez Anton, A., Schepers, A. L., and Cornelissen, F. W. (2012).
‘Comparing crowding in human and ideal observers’. Journal of Vision 12(8): 1–15.
Velardo, C. and Dugelay, J.-L. (2010). ‘Face recognition with DAISY descriptors’. In Proceedings of the 12th
ACM workshop on multimedia and security ACM: 95–100.
Victor, J. D. and Brodie, S. (1978). ‘Discriminable textures with identical Buffon Needle statistics’.
Biological Cybernetics 31: 231–4.
Voorhees, H. and Poggio, T. (1988). ‘Computing texture boundaries from images’. Nature 333: 364–7.
Wagemans (this volume). Historical and conceptual background: Gestalt theory. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Wang, J.-G., Li, J., Yau, W.-Y., and Sung, E. (2010). ‘Boosting dense SIFT descriptors and shape contexts
of face images for gender recognition’. In Proceedings of the Computer Vision and Pattern Recognition
Workshop (CVPRW ‘10) San Francisco, CA, pp. 96–102.
Wechsler, H. (1980). ‘Texture analysis—a survey’. Signal Processing 2: 271–82.
Zetzsche, C., Barth, E., and Wegmann, B. (1993). ‘The importance of intrinsically two-dimensional
image features in biological vision and picture coding’. In Digital images and human vision, edited by
A. B. Watson, pp. 109–38. (Cambridge, MA: MIT Press).
Zhu, S., Wu, Y. N., and Mumford, D. (1996). ‘Filters, random fields and maximum entropy (FRAME)—
Towards the unified theory for texture modeling’. In IEEE Conf. Computer Vision and Pattern
Recognition, pp. 693–6.
Zhu, C., Bichot, C. E., and Chen, L. (2011). ‘Visual object recognition using daisy descriptor’. In Proc. IEEE
Intl. Conf. on Multimedia and Expo (ICME 2011), Barcelona, Spain, pp. 1–6.
Zucker, S. W. (1976). ‘Toward a model of texture’. Computer Graphics and Image Processing 5(2): 190–202.
Section 3

Contours and shapes


Chapter 10

Contour integration: Psychophysical,
neurophysiological, and computational
perspectives
Robert F. Hess, Keith A. May, and Serge O. Dumoulin

A psychophysical perspective
Natural scenes and the visual system
The mammalian visual system has evolved to extract relevant information from natural images that
in turn have specific characteristics, one being edge alignments that define image features. Natural
scenes exhibit consistent statistical properties that distinguish them from random luminance distri-
butions over a large range of global and local image statistics. Edge co-occurrence statistics in natural
images are dominated by aligned structure (Geisler et al. 2001; Sigman et al. 2001; Elder and Goldberg
2002) and parallel structure (Geisler et al. 2001). The aligned edge structure follows from the fact
that pairs of separated local edge segments are most likely to be aligned along a linear or co-circular
path. This pattern occurs at different spatial scales (Sigman et al. 2001). The co-aligned information
represents contour structure in natural images. The parallel information, on the other hand, is most
frequently derived from regions of the same object and arises from surface texture. Edges are an
important and highly informative part of our environment. Edges that trace out a smooth path show
correspondence of position over a wide range of different spatial scales. As edges become more jag-
ged, and indeed more like edges of the kind common in natural images (i.e. fractal), correspondence
in position becomes limited to a smaller band of spatial scales. Although jagged edges have continu-
ous representation over spatial scale, the exact position and orientation of the edge changes from scale
to scale (Field et al. 1993). The contour information is therefore quite different at different spatial
scales so, to capture the full richness of the available information, it is necessary to make use of a range
of contour integration operations that are each selective for a narrow band of scales.
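The co-circularity structure described above has a simple geometric statement: two oriented edge elements lie on a common circle exactly when each tangent makes the same angle with the chord joining them, i.e. θ1 − φ = φ − θ2 (mod π), where φ is the direction of the chord. A small sketch (the helper `cocircular_error` is hypothetical, not from any published toolbox) measures deviation from this condition:

```python
import math

def cocircular_error(p1, theta1, p2, theta2):
    """Deviation (radians) from perfect co-circularity for two oriented edge
    elements at positions p1, p2 with tangent orientations theta1, theta2.
    Perfect co-circularity requires theta1 + theta2 == 2 * phi (mod pi)."""
    phi = math.atan2(p2[1] - p1[1], p2[0] - p1[0])   # chord direction
    err = (theta1 + theta2 - 2 * phi) % math.pi
    return min(err, math.pi - err)

# collinear pair: both tangents lie along the chord
print(round(cocircular_error((0, 0), 0.0, (1, 0), 0.0), 3))                             # 0.0
# co-circular pair: tangents at +/-30 degrees to the chord, as on an arc
print(round(cocircular_error((0, 0), math.radians(30), (1, 0), math.radians(-30)), 3))  # 0.0
# inconsistent pair: one tangent along the chord, one perpendicular to it
print(round(cocircular_error((0, 0), 0.0, (1, 0), math.radians(90)), 3))                # 1.571
```

Collinearity is the special case of co-circularity with zero curvature, which is why aligned and gently curved pairs both score as ‘good continuation’ under this measure.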

Quantifying contour detection


The history of studies on contour integration stretches back to the Gestalt psychologists (Koffka
1935) who formulated rules for perceptually significant image structure, including contour conti-
nuity: the Gestalt ‘law’ of good continuation. More recent attempts to examine these ideas psycho-
physically have used element arrays composed of dots or line segments (Beck et al. 1989; Moulden
1994; Smits and Vos 1987; Uttal 1983). Although these studies were informative, the broadband
nature of the elements used and the lack of control for element density made it difficult to appre-
ciate the relationship between the tuning properties of single cells and the network operations
describing how their outputs might be combined. Contours composed of broadband elements
or strings of more closely spaced elements could always be integrated using a single, broadband
detector without the need for network interactions (relevant to this is Figure 10.2).
190 Hess, May, and Dumoulin

Since local edge alignment in fractal images depends on scale, Field et  al. (1993) addressed
this question using spatial frequency narrowband elements (i.e. Gabors) and ensured that local
density cues could not play a role. We thought there might be specific rules for how the responses
of orientation-selective V1 cells are combined to encode contours in images. A typical stimulus is
seen in Figure 10.1a; it is an array of oriented Gabor micropatterns, a subset of which (frame on
the left) are aligned to make a contour (indicated by arrow).
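Stimuli of this kind can be sketched in a few lines (an illustrative reconstruction, not Field et al.'s actual code; `gabor_patch` and `snake_path` are hypothetical helpers with invented parameter values). Each element is a narrowband Gabor, and the contour is a jagged path whose elements are aligned with the local path direction:

```python
import numpy as np

def gabor_patch(size=32, wavelength=8.0, theta=0.0, sigma=None):
    """A spatial-frequency narrowband element: a sinusoidal carrier at
    orientation theta under an isotropic Gaussian envelope."""
    sigma = sigma or wavelength / 2.0
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    xt = x * np.cos(theta) + y * np.sin(theta)          # coordinate along the carrier
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xt / wavelength)

def snake_path(n_elements=8, spacing=32.0, path_angle_deg=20.0, seed=0):
    """Element centres and orientations for a jagged 'snake' contour: each
    step turns by +/-path_angle with random sign, and each element is
    aligned with its local path direction."""
    rng = np.random.default_rng(seed)
    pos, ori = [np.zeros(2)], []
    heading = 0.0
    for _ in range(n_elements):
        ori.append(heading)
        heading += np.radians(path_angle_deg) * rng.choice([-1.0, 1.0])
        pos.append(pos[-1] + spacing * np.array([np.cos(heading), np.sin(heading)]))
    return np.array(pos[:-1]), np.array(ori)

patch = gabor_patch(theta=np.pi / 4)
pos, ori = snake_path()
print(patch.shape, len(pos), len(ori))
```

Adding 90 degrees to every element orientation turns the snake into a ‘ladder’, and embedding the contour in a field of randomly oriented distractor Gabors at matched density completes the display.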
In the left frame of Figure 10.1a, the contour in the middle of the field running from the
bottom right to the top left is clearly visible, suggesting that elements group together either
because they are aligned or because they share an orientation. On first inspection, the right frame of Figure 10.1a

(a) Aligned contour Orthogonal contour

(b) (c)

100 The association field

90
Percent correct

80

70

60 Contour integration only occurs when:


path angle change is less than ±60°
50 S.D. Spacing between gabors is no greater
than 4-6 gabor wavelengths
0 10 20 30 40 The orientation of gabors is close to
that of the contour
Path angle (deg) Other variables:
The gabor phase is irrelevant
dection improves as the number of gabors increases
Fovea Filter model
up to 12

Fig. 10.1  Contours defined by orientation-linking. In (a), a comparison of a straight contour defined
by elements that are aligned with the contour (left) or orthogonal to it (right). In (b), the visual system’s
performance on detecting orientationally-linked contours of different curvature, compared with that
of a single elongated filter (solid line). In (c), the proposed mechanism, a network interaction called an
‘Association Field’.
Reprinted from Vision Research, 33 (2), David J. Field, Anthony Hayes, and Robert F. Hess, Contour integration by
the human visual system: Evidence for a local “association field”, pp. 173–93, Copyright © 1993, with permission
from Elsevier and Robert F. Hess and Steven C. Dakin, Absence of contour linking in peripheral vision, Nature, 390
(6660), pp. 602–4, DOI: 10.1038/37593 Copyright (c) 1997, Nature Publishing Group.
Contour Integration 191

does not contain an obvious contour, yet there is a similar subset of the elements of the same
orientation and in the same spatial arrangement as in the left frame of Figure 10.1a. These ele-
ments are however not aligned with the contour path, but orthogonal to it, and one of our initial
observations was that although this arrangement did produce visible contours, the contours were
far less detectable than those with elements aligned with the path. This suggested rules imposed
by the visual grouping analysis relating to the alignment of micropatterns, which may reflect the
interactions of adjacent cells with similar orientation preference exploiting the occurrence of
co-oriented structure in natural images.

Snakes, ladders, and ropes


Most experiments on contour integration have used ‘snake’ contours in which the contour ele-
ments are aligned, or nearly aligned, with the path (see Figure 10.1a, top left). Other forms of con-
tours are ‘ladders’ (Bex et al. 2001; Field et al. 1993; Ledgeway et al. 2005; May and Hess 2007a,b;
May and Hess 2008) in which the elements are perpendicular to the path (see figure 10.1a, top
right), and ‘ropes’ (coined by S. Schwartzkopf) (Ledgeway et al. 2005), in which the elements are
all obliquely oriented in the same direction relative to the contour. Snakes are the easiest to detect
and ropes are the hardest (Ledgeway et al. 2005). Since the three types of contour are distinguished
by a group rotation of each contour element, they are identical in their intrinsic detectability (an
ideal observer would perform identically on all three); the difference in performance between the
different contour types therefore reveals something about the mechanisms that the visual system
uses to detect them, i.e. it constrains models of contour integration.
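The group-rotation argument can be sketched in code. The following is a hypothetical stimulus-generation fragment (the function names and parameters are ours, not from the literature): each element's orientation is the local path direction plus a fixed offset of 0° (snake), 90° (ladder), or 45° (rope), so rotating every element by the same amount maps one type onto another and leaves the element-to-element orientation differences, i.e. the information available to an ideal observer, unchanged.

```python
import random

def contour_orientations(n_elements, path_angle_deg, element_offset_deg, seed=0):
    """Illustrative generator of element orientations along a jagged path.

    Each successive path segment turns by +/- path_angle_deg with random
    sign (a jagged path, as in Field et al.'s stimuli).  Each element is
    rotated by a fixed offset relative to the local path direction:
    0 deg -> snake, 90 deg -> ladder, 45 deg -> rope.
    """
    rng = random.Random(seed)
    tangent, orientations = 0.0, []
    for _ in range(n_elements):
        orientations.append((tangent + element_offset_deg) % 180.0)
        tangent += rng.choice([-1.0, 1.0]) * path_angle_deg
    return orientations

def diffs(orientations):
    """Element-to-element orientation differences (mod 180 deg)."""
    return [(b - a) % 180.0 for a, b in zip(orientations, orientations[1:])]

snake = contour_orientations(8, 30.0, 0.0)
ladder = contour_orientations(8, 30.0, 90.0)
# The group rotation leaves the orientation differences, and hence the
# intrinsic detectability, unchanged:
assert diffs(snake) == diffs(ladder)
```

Since the stimuli carry identical information, any difference in human detection performance must come from the linking mechanism, not the stimulus.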
Since ropes are essentially undetectable, models tend to possess mechanisms that can link ele-
ments arranged in a snake or ladder configuration, but not in a rope configuration (May and Hess
2007b; May and Hess 2008; Yen and Finkel 1998). To explain the inferior detection of ladders,
Field et al. (1993) and May and Hess (2007b) proposed weaker binding between ladder elements
than snake elements. Using a model based on Pelli et al.’s (2004) crowding model, May and Hess
(2007b) showed that this single difference between snake and ladder binding was sufficient to
explain their finding that detection of ladder contours was fairly good in the centre of the visual
field, but declined much more rapidly with increasing eccentricity than detection of snakes.

The association field concept


To determine how visual performance varies as a function of the curvature of the contour, the
angular difference between adjacent 1-D Gabors along the contour path is varied. The effect of
this manipulation (unfilled symbols) is shown in Figure 10.1b where psychophysical performance
(per cent correct) is plotted against path angle (degrees). Performance remains relatively good for
paths of intermediate curvature but declines abruptly once the path becomes very curved. These
paths were jagged in that the sign of the orientation change from element to element is random,
in contrast to smooth curves where the angular change always has the same sign. Smooth curves
are easier to detect by a small amount (Dakin and Hess 1998; Hess et al. 2003; Pettet et al. 1996)
but otherwise show the same dependence on curvature. While straight contours could in princi-
ple be detected by an elongated receptive field, avoiding the need for more complex inter-cellular
interactions, this would not be the case for highly curved contours. The solid line in Figure 10.1b
gives the linear filtering prediction (Hess and Dakin 1997) for a single elongated receptive field: its
dependence on curvature is much stronger than that measured psychophysically, adding support
to the idea that contours of this kind are detected by interactions across a cellular array rather than
by spatial summation within an individual cell. This conclusion was further strengthened by the
finding that performance is only marginally affected if the contrast polarity of alternate contour
192 Hess, May, and Dumoulin

elements (and half the background elements) is reversed (Field et al. 1997). This manipulation
would defeat any elongated receptive field that linearly summated across space. This suggests that
even the detection of straight contours may be via the linking of responses of a number of cells
aligned across space but with similar orientation preferences.
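The polarity-reversal logic can be checked with a one-line calculation (a toy illustration, not a model from the chapter): local responses to alternating-polarity elements cancel under linear spatial summation, but survive any polarity-invariant combination such as squaring.

```python
# Toy illustration: responses of aligned local filters to contour
# elements of alternating contrast polarity.
local_responses = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

# An elongated receptive field that sums linearly across space cancels:
linear_sum = sum(local_responses)

# A linking process operating on rectified (polarity-invariant) local
# responses still signals the contour:
rectified_sum = sum(r * r for r in local_responses)

assert abs(linear_sum) < 1e-9   # linear summation is defeated
assert rectified_sum == 6.0     # polarity-invariant linking survives
```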
On the basis of the above observations Field et  al. (1993) suggested that these interactions
could be described in terms of an Association Field, a network of cellular interactions specifically
designed to capitalize on the edge-alignment properties of contours in natural images. Figure
10.1c illustrates the idea and summarizes the properties of the Association Field. The facilitatory
interactions are shown by continuous lines and the inhibitory interactions by dashed lines. The
closer the adjacent cell is in its position and preferred orientation, the stronger the facilitation.
This psychophysically defined ‘Association Field’ matches the joint-statistical relationship that
edge-alignment structure has in natural images (Geisler et al. 2001; Sigman et al. 2001; Elder and
Goldberg 2002; Krüger 1998; for more detail, see Elder, this volume).
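As a toy sketch (an illustrative function of our own, with made-up space constants rather than psychophysically estimated ones), the two properties just described, stronger facilitation for nearer positions and for smaller orientation differences, can be written as a separable weighting:

```python
import math

def association_weight(separation, delta_theta_deg,
                       sep_scale=2.0, theta_scale_deg=30.0):
    """Toy Association Field weight (illustrative parameters only):
    facilitation is strongest for nearby, similarly oriented elements
    and falls off smoothly with separation and orientation difference."""
    return (math.exp(-separation / sep_scale)
            * math.exp(-(delta_theta_deg / theta_scale_deg) ** 2))

# Nearby co-aligned elements are facilitated more strongly than
# distant or misoriented ones:
assert association_weight(1.0, 0.0) > association_weight(3.0, 0.0)
assert association_weight(1.0, 0.0) > association_weight(1.0, 45.0)
```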
So far we have assumed that the detection of contours defined by the alignment of spatial fre-
quency bandpass elements embedded within an array of similar elements of random orientation
is accomplished by a low-level mechanism operating within spatial scale (i.e. V1–V3 receptive
fields) rather than by a high-level mechanism operating across scale. This latter idea would be
more in line with what the Gestalt psychologists envisaged. The question then becomes, are con-
tours integrated within or across spatial scale? Figure 10.2 shows results obtained when the spatial
frequency of alternate micropatterns is varied (Dakin and Hess 1998). The top frames show exam-
ples of curved contours made up of elements of the same spatial scale (b) as opposed to elements
from two spatial scales (a and c). The results in the bottom frames show how the psychophysical
contour detection performance depends on the spatial frequency difference between alternate
contour elements. Contour integration exhibits spatial frequency tuning, more so for curved than
for straight contours, suggesting it is primarily a within-scale operation, providing support for
orientation linking as described by the Association Field operating at a low level in the cortical
hierarchy.

The nature and site of the linking process


The linking code within the Association Field must be conveyed in the firing pattern of cells in
early visual cortex. The typical form of this response as reflected in the post-stimulus time histo-
gram involves an initial burst of firing within the first 50 milliseconds followed by a slow sustained
response declining in amplitude over a 300 millisecond period. In principle, the extent of facilita-
tive inter-cellular interaction reflecting contour integration could be carried by the amplitude of
the initial burst of firing or the later sustained response or the pattern (including synchronicity) of
spikes. The initial burst of spikes is thought to carry the contrast-dependent signal (Lamme 1995;
Lamme et al. 1998; Zipser et al. 1996), and this is unlikely to carry the linking signal because it
has been shown that randomizing the contrasts of the Gabor elements has little effect on contour
integration performance (Hess et al. 1998).
Contour integration (i.e. its curvature dependence) does not depend critically on the element
temporal frequency so long as it is within the temporal window of visibility of individual ele-
ments (Hess et al. 2001), again suggesting a decoupling from contrast processing. However, when
the local orientation of contour elements changes over time, three interesting findings emerge.
First, the dynamics of contour integration are slow compared with contrast integration. Second,
the dynamics are dependent on curvature; the highest temporal frequency of orientation change
that would support linking varied from around 10 Hz for straight contours to around 1–2 Hz for
[Figure 10.2: panels (a)–(c) show example stimuli; the bottom graphs plot percent correct against carrier spatial frequency (cpd) for a straight path (0°) and a curved path (30°).]

Fig. 10.2  Orientational linking occurs within spatial scale. Frames at the top left and right (a) and (c)
show examples of contours defined by the orientation of elements that alternate in spatial scale. The
frame at the top center (b) illustrates a contour defined by the orientation of elements within a single scale.
In the bottom frames, the detectability of contours, be they straight (bottom left) or curved (bottom
right), shows spatial scale tuning (adapted from Dakin and Hess 1998). In this experiment, one set of
Gabors had a carrier spatial frequency of 3.2 cpd, and the other set had a spatial frequency indicated by
the horizontal axis of the graphs.
Adapted from S.C. Dakin and R.F. Hess, Spatial-frequency tuning of visual contour integration, Journal of the
Optical Society of America A: Optics, Image Science, and Vision, 15(6), pp. 1486–99 © 1998, The Optical Society.

curved contours. Third, this does not depend on absolute contrast of elements (Hess et al. 2001).
These dynamics are not what one would expect if either synchrony of cellular firing, which is in
the 1–2 ms range (Singer and Gray 1995; Beaudot 2002; Dakin and Bex 2002), or contrast (Polat
1999; Polat and Sagi 1993, 1994) were involved in the linking process. The sluggish temporal
properties of the linking process may point to the code being carried by the later sustained part
of the spike train (Lamme 1995; Lamme et al. 1998; Zipser et al. 1996).
Contour integration is not a cue-invariant process (Zhou and Baker 1993) in that not all ori-
ented features result in perceptual contours: contours composed of elements alternately defined
by chromaticity and luminance do not link into perceptual contours (McIlhagga and Mullen
1996) and elements defined by texture-orientation do not link together either (Hess et al. 2000).
The rules that define linkable contours provide a psychophysical cue as to the probable site of
these elementary operations. McIlhagga and Mullen (1996) and Mullen et al. (2000) showed that
contours defined purely by chromaticity obey the same linking rules but that elements alternately
defined by luminance and chromaticity do not link together. This suggests that, at the cortical
stage at which this occurs, luminance and chromatic information are processed separately, sug-
gesting a site later than V1, since in V1 cells tuned for orientation process both chromatic and
achromatic information (Johnson et al. 2001). Hess and Field (1995) showed that contour integra-
tion must occur at a level in the cortex where the cells process disparity. They devised a dichoptic
stimulus in which the embedded contour could not be detected monocularly because it oscillated
between two depth planes—it could be detected only if disparity had been computed first. These
contours were easily detected and their detectability did not critically depend on the disparity
range, suggesting the process operated at a cortical stage at or after where relative disparity was
computed. This is believed to be V2 (Parker and Cumming 2001).

A neurophysiological perspective
Cellular physiology
Neurons in primary visual cortex (V1 or striate cortex) respond to a relatively narrow range of
orientations within small (local) regions of the visual field (Hubel and Wiesel 1968). As such, V1
can be thought of as representing the outside world using a bank of oriented filters (De Valois
and De  Valois 1990). These filters form the first stage of contour integration. In line with this
filter notion, the V1 response to visual stimulation is well predicted by the contrast-energy of
the stimulus for synthetic (Boynton et al. 1999; Mante and Carandini 2005) and natural images
(Dumoulin et al. 2008; Kay et al. 2008; Olman et al. 2004).
Even though V1 responses are broadly consistent with the contrast-energy within the images,
a significant contribution of neuronal interactions is present that modulate the neural responses
independent of the overall contrast-energy (Allman et al. 1985; Fitzpatrick 2000). These neuronal
interactions can enhance or suppress neural responses and may also support mechanisms such
as contour integration. The Association Field might be implemented by facilitatory interactions
between cells whose preferred stimuli lie close together on a smooth curve, and inhibitory inter-
actions between cells whose preferred stimuli would be unlikely to coexist on the same physi-
cal edge. There is anatomical evidence for such a hard-wired arrangement within the long-range
intrinsic cortical connections in V1 (Gilbert and Wiesel 1979; Gilbert and Wiesel 1989). Neurons
in different orientation columns preferentially link with neurons with co-oriented, co-axially
aligned receptive fields (Bosking et al. 1997; Kisvárday et al. 1997; Malach et al. 1993; Stettler et al.
2002; Weliky et al. 1995; Schmidt 1997; Pooresmaeili 2010).
Neurophysiological recordings further support these anatomical observations (Gilbert et  al.
1996; Kapadia et  al. 1995; Li et  al. 2006; Nelson and Frost 1985; Polat et  al. 1998). Neuronal
responses to local oriented bars within the classical receptive field are modulated by the pres-
ence of flanking bars outside the classical receptive field, i.e. in the extra-classical receptive field.
Importantly, the elements in the extra-classical receptive field are not able to stimulate the neu-
ron alone, so the response modulation critically depends on an interaction between the elements
placed within the classical receptive field and those placed outside it. Furthermore, the amount of
response modulation is greatly affected by the relative positions and orientations of the stimulus
elements. Co-axial alignment usually increases neural responses whereas orthogonal orientations
usually decrease neural responses (Blakemore and Tobin 1972; Jones et al. 2002; Kastner et al.
1997; Knierim and Van Essen 1992; Nelson and Frost 1978; Nothdurft et al. 1999; Sillito et al.
1995). These neural modulations may partly be explained by the hard-wired intrinsic connectivity
in V1 but may also be supported by feedback or top-down influences from later visual cortex
(Li et al. 2008).
The evidence suggests that the extra-classical receptive field modulations resemble the
proposed contour Association Field. For example, recording in V1, Kapadia and col-
leagues (Kapadia et al. 1995) presented flanking bars in many different configurations in the
extra-classical receptive field while presenting a target bar in the classical receptive field at the
neuron’s preferred orientation. Kapadia and colleagues found that facilitation was generally
highest for small separations and small or zero lateral offsets between the flanker and target
bar. They also varied the orientation of the flanking bar while maintaining good continuation
with the target bar. The distribution of preferred flanker orientations was strongly peaked at
the cell’s preferred orientation, indicating co-axial facilitation. Yet some cells did not have an
obvious preferred flanker orientation or appeared to prefer non-co-axial flanker orientations.
Kapadia and colleagues suggested that the latter neurons might play a part in integrating curved
contours. Tuning to curvature is also highly prevalent in V2 and V4 (Anzai et al. 2007; Hegde
and Van Essen 2000; Ito and Komatsu 2004; Pasupathy and Connor 1999) suggesting a role
for these sites in co-circular integration along curved contours. V4 neurons are also tuned to
simple geometric shapes, further highlighting its role in intermediate shape perception (Gallant
et al. 1993; Gallant et al. 1996).

Functional imaging
Functional MRI studies further highlight the involvement of human extra-striate cortex in con-
tour integration. For example, Dumoulin et al. (2008) contrasted the responses to several natural
and synthetic image categories (Figure 10.3). They found distinct response profiles in V1 and
extra-striate cortex. Contrast-energy captured most of the variance in V1, though some evidence
for increased responses to contour information was found as well. In extra-striate cortex, on the
other hand, the presence of sparse contours captured most of the response variance despite large
variations in contrast-energy. These results provide evidence for an initial representation of natu-
ral images in V1 based on local oriented filters. Later visual cortex (and to a modest degree V1)
incorporates a facilitation of contour-based structure and suppressive interactions that effectively
amplify sparse-contour information within natural images.
Similarly, Kourtzi and colleagues implicated both early and late visual cortex in the process of
contour integration (Altmann et al. 2003; Altmann et al. 2004; Kourtzi and Huberle 2005; Kourtzi
et al. 2003). Using a variety of fMRI paradigms they demonstrated involvement of both V1 and
later visual areas. However, the stimuli in all these fMRI studies contain closed contours. Contour
closure creates simple concentric shapes that may be easier to detect (Kovács and Julesz 1993)
and may involve specialized mechanisms in extra-striate cortex (Altmann et al. 2004; Dumoulin
and Hess 2007; Tanskanen et al. 2008). Furthermore, contour closure may introduce symmetry
for which specialized detection mechanisms exist (Wagemans 1995). Therefore these fMRI results
may reflect a combination of contour integration and shape processing, and may not uniquely
identify the site of the contour integration.
Beyond V2 and V4 lies ventral cortex, which processes shapes. In humans, the cortical region
where intact objects elicit stronger responses than their scrambled counterparts is known as the
lateral occipital complex (LOC) (Malach et al. 1995). It extends from lateral to ventral occipital
cortex. The term ‘complex’ acknowledges that this region consists of several visual areas. Early vis-
ual cortex (V1) is often also modulated by the contrast between intact and scrambled objects but
in an opposite fashion, i.e. fMRI signal amplitudes are higher for scrambled images (Dumoulin
[Figure 10.3: (a, b) image decompositions (‘Full images = Contours + Textures’) with T-value maps (scale 3–6); (c, d) responses on the inflated cortical surface.]

Fig. 10.3  fMRI responses elicited by viewing pseudo-natural (a, c) and synthetic (b, d) images. The
fMRI responses are shown on an inflated cortical surface of the left hemisphere (c, d). The responses
are an average of five subjects and the average visual area borders are identified. Both pseudo-natural
and synthetic images yield similar results. In V1 strongest responses are elicited by viewing of the
‘full images’ (d, bottom inset). This supports the notion that V1 responses are dominated by the
contrast-energy within images. In extra-striate cortex, on the other hand, strongest responses are
elicited by viewing ‘contour’ images (d, top inset). These results suggest that facilitative and suppressive
neural interactions within and beyond V1 highlight contour information in extra-striate visual cortex.
Reproduced from Serge O. Dumoulin, Steven C. Dakin, and Robert F. Hess, Sparsely distributed contours
dominate extra-striate responses to complex scenes, NeuroImage, 42(2), pp. 890–901, DOI: 10.1016/j.
neuroimage.2008.04.266, © 2008, The Wellcome Trust. This work is licensed under a Creative Commons
Attribution 3.0 License.

and Hess 2006; Fang et al. 2008; Grill-Spector et al. 1998; Lerner et al. 2001; Murray et al. 2002;
Rainer et al. 2002). Stronger responses to scrambled objects have been interpreted as feedback
from predictive coding mechanisms (Fang et al. 2008; Murray et al. 2002) or incomplete match
of low-level image statistics including the breakup of contours (Dumoulin and Hess 2006; Rainer
et al. 2002). These results highlight the interaction between early and late visual areas in the pro-
cessing of contour and shape.

A computational perspective
Two main classes of contour integration model
Models of contour integration generally fall into one of two categories: Association Field models or
filter overlap models (although see Watt et al. (2008) for consideration of other models). In con-
trast to the Association Field, in filter overlap models, grouping occurs purely because the filter
responses to adjacent elements overlap.
Association Field models. Field et al. (1993) did not explicitly implement an Association Field
model, but several researchers have done so since. Yen and Finkel (1998) set up a model that had
two sets of facilitatory connections: co-axial excitatory connections between units whose pre-
ferred stimulus elements lay on co-circular paths (for detecting snakes, as in Figure 10.1a, left),
and trans-axial excitatory connections between units whose preferred stimulus elements were
parallel (for detecting ladders, as in Figure 10.1a, right). The two sets of connections competed
with each other, so the set of connections carrying the weaker facilitatory signals was suppressed.
Their model did a fairly good job of quantitatively accounting for a range of data from Field et al.
(1993) and Kovács and Julesz (1993).
Another Association Field model was set up by Li (1998), who took the view that contour
integration is part of the wider task of computing visual saliency. Li’s saliency model was based
firmly on the properties of V1 cells. The same model was able to account for contour integra-
tion phenomena, as well as many other phenomena related to visual search and segmentation in
multi-element arrays (Li 1999; Li 2000; Li 2002; Zhaoping and May 2007). However, Li provided
only qualitative demonstrations of the model’s outputs, rather than quantitative simulations of
psychophysical performance like those of Yen and Finkel.
The models of Li and of Yen and Finkel were recurrent neural networks, which exhibit temporal
oscillations. Both models showed synchrony in oscillations between units responding to elements
within the same contour, but a lack of synchrony between units responding to elements in dif-
ferent contours. Both sets of authors suggested that this might form the basis of segmentation of
one contour from others or from the background. In addition, the units responding to contour
elements responded more strongly than those responding to distractor elements.
The Association Field models described so far used ad hoc weightings on the facilitatory con-
nections. A  different approach is to assume that the connection weights reflect the image sta-
tistics that the observer is using to do the task. In this view, the Association Field is a statistical
distribution that allows the observer to make a principled decision about whether two edge ele-
ments should be grouped into the same contour. Geisler et al. (2001) used this approach and found
that Association Fields derived from edge co-occurrence statistics in natural images accurately
accounted for human data on a contour detection task. Elder and Goldberg (2002) followed with
a similar approach.
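The statistics-based approach can be sketched as follows. Assuming edge elements have been extracted as (x, y, orientation) triples, a crude version of the co-occurrence tabulation is a joint histogram over element-pair separation and orientation difference (the bin sizes here are arbitrary choices of ours, not Geisler et al.'s):

```python
import math
from collections import Counter

def cooccurrence_histogram(edges, dist_bin=1.0, theta_bin=15.0):
    """Tally how often edge-element pairs occur at each (distance,
    orientation-difference) combination -- a crude stand-in for the
    natural-image statistics used to derive an Association Field.
    `edges` is a list of (x, y, orientation_deg) tuples."""
    hist = Counter()
    for i, (x1, y1, t1) in enumerate(edges):
        for x2, y2, t2 in edges[i + 1:]:
            d = math.hypot(x2 - x1, y2 - y1)
            dt = abs(t2 - t1) % 180.0
            dt = min(dt, 180.0 - dt)          # orientation is axial
            hist[(int(d // dist_bin), int(dt // theta_bin))] += 1
    return hist

# Three roughly collinear, co-oriented elements:
edges = [(0, 0, 0.0), (1, 0, 5.0), (2, 0, 0.0)]
h = cooccurrence_histogram(edges)
# Most mass sits at small orientation differences, as in natural images.
assert sum(h.values()) == 3
```

Normalizing such a histogram over many natural images yields the probability that two elements belong to the same contour, which is the statistical reading of the Association Field.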
Watt et al. (2008) have pointed out that many of the patterns of performance found in con-
tour integration experiments may reflect the difficulty of the task, rather than the properties of
the visual mechanism that the observer is using. Traditionally, task difficulty is factored out by
expressing the participant’s performance relative to the performance of the ideal observer for the
task (Banks et al. 1987; Geisler 1984; Geisler 1989). For many simple visual tasks, it is straight-
forward to derive the ideal algorithm, but this is not the case for most contour integration tasks
because of the complexity of the algorithms used for generating the contours. Recently, Ernst et al.
(2012) tackled this problem in an elegant way: they turned the idea of the Association Field on its
head and used it to generate the contours in the first place. The Association Field used to generate
the contours is then the correct, i.e. optimal, statistical distribution for calculating the likelihood
that the stimulus contains the contour. Using this approach, the properties of the contour, such as
curvature, element separation, etc., are determined by the parameters of the Association Field; the
ideal observer, who always uses the Association Field that generated the contour in the first place,
would therefore have an advantage over the human observer in knowing which sort of contour
was being presented on each trial. Not surprisingly, Ernst et  al. found that, although the ideal
observer’s pattern of performance, as a function of contour properties, was qualitatively similar to
human performance, the ideal observer performed much better. They investigated the possibility
that the human observer was using the same Association Field on each trial. This strategy would
be optimal for contours generated using that Association Field, but suboptimal in all other cases.
They generated the single Association Field that fitted best to all the data, but even this subopti-
mal model outperformed the human observers. Ernst et al. ruled out the effect of noise because
the model’s correlation with the human data was the same as the correlations between individual
subjects, so it would seem that their model was simply using a better Association Field for the task
than the human observers.
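The generative trick can be illustrated with a minimal sketch (our own re-implementation of the idea, not Ernst et al.'s code): each element-to-element turn is drawn from a distribution, here a Gaussian over path angle, and that same distribution is then, by construction, the optimal likelihood for deciding whether the stimulus contains the contour.

```python
import math
import random

def sample_contour(n_elements, step, angle_sigma_deg, seed=0):
    """Generate a contour by sampling each element-to-element turn from
    a Gaussian over path angle; the width of that Gaussian is the
    Association Field parameter controlling contour curvature."""
    rng = random.Random(seed)
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for _ in range(n_elements - 1):
        heading += math.radians(rng.gauss(0.0, angle_sigma_deg))
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        points.append((x, y))
    return points

straight = sample_contour(10, 1.0, 5.0)    # low-curvature parameter
wiggly = sample_contour(10, 1.0, 40.0)     # high-curvature parameter
assert len(straight) == len(wiggly) == 10
```

Because the generating distribution is known, the ideal observer's decision rule follows directly, which is what makes the comparison with human performance well defined.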
Although the ideal observer’s performance can provide a useful benchmark against which to
compare human performance, it may be over-optimistic to assume that human observers will be
able to implement a strategy that is optimal for whichever psychophysical task they are set: it is
more likely that the human observer possesses mechanisms that are optimal for solving real-world
tasks, and recruits them to carry out the artificial psychophysical task at hand (McIlhagga and
May 2012). The natural-image-based approach to deriving the Association Field taken by Geisler
et al. and Elder and Goldberg may therefore be more fruitful than a pure ideal-observer approach.
Filter-overlap models. As an alternative to Association Field models, Hess and Dakin (1997)
implemented a model in which the contour linking occurred due to spatial overlap of filter
responses to different elements. Applying a V1-style filter to the image has the effect of blurring
the elements so that they join up. Thresholding the filter output to black and white generates a set
of blobs, or zero-bounded response distributions (ZBRs), and a straight contour will generate a
long ZBR in the orientation channel aligned with the contour. In Hess and Dakin’s model, the for-
mation of ZBRs took place only within orientation channels, and this severely limited its ability to
integrate curved contours. The model’s performance, as a function of contour curvature, is plotted
in Figure 10.1b, which shows that, while the model could successfully detect straight contours, its
performance deteriorated rapidly as the contour became more curved. Hess and Dakin suggested
that this kind of model may reflect contour integration in the periphery, while the Association
Field may reflect processing in the fovea.
The poor performance of Hess and Dakin’s filter-overlap model on detection of highly curved
contours was not a result of the filter-overlap process itself, but a result of the fact that formation
of ZBRs took place within a single orientation channel. May and Hess (2008) lifted this restriction,
and implemented a model that could extend ZBRs across orientation channels as well as space.
Unlike Hess and Dakin’s model, May and Hess’s model can easily integrate curved contours, and
we have recently found that it provides an excellent fit to a large psychophysical data set (Hansen
et al. in submission). May and Hess’s model forms ZBRs within a 3-dimensional space, (x, y, θ),
consisting of the two dimensions of the image (x, y), and a third dimension representing filter
orientation (θ). A straight contour would lie within a plane of constant orientation in this space,
whereas a curved contour would move gradually along the orientation dimension as well as across
the spatial dimensions. This 3-D space is formally known as the tangent bundle, and subsequently
other researchers have confirmed its usefulness in contour-completion tasks (Ben-Yosef and
Ben-Shahar 2012).
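A minimal sketch of this linking stage (a toy version of our own, using a set of supra-threshold cells rather than real filter outputs) groups active cells into connected regions of (x, y, θ), with the orientation axis treated circularly:

```python
from collections import deque

def label_zbrs(active, n_theta):
    """Group supra-threshold responses into zero-bounded response
    regions (ZBRs) in (x, y, theta) space.  `active` is a set of
    (x, y, t) cells; neighbours differ by at most 1 in each coordinate,
    with the orientation axis t wrapping circularly."""
    remaining, groups = set(active), []
    while remaining:
        seed = remaining.pop()
        group, queue = {seed}, deque([seed])
        while queue:
            x, y, t = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for dt in (-1, 0, 1):
                        nb = (x + dx, y + dy, (t + dt) % n_theta)
                        if nb in remaining:
                            remaining.remove(nb)
                            group.add(nb)
                            queue.append(nb)
        groups.append(group)
    return groups

# A curved contour drifts along the orientation axis as it crosses
# space, yet still forms a single ZBR:
curved = {(0, 0, 0), (1, 0, 1), (2, 1, 2), (3, 1, 3)}
assert len(label_zbrs(curved, n_theta=8)) == 1
```

A within-channel scheme like Hess and Dakin's would split the same contour into fragments, one per orientation channel, which is why it fails on high curvature.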

Around the same time that May and Hess (2008) were developing their model of contour inte-
gration, Rosenholtz and colleagues independently had the same idea, but applied it to a much
broader set of grouping tasks (Rosenholtz et al. 2009). To perform grouping on the basis of some
feature dimension, f, you can create a multidimensional space (x, y, f), and then plot the image in
this space. Then image elements with similar feature values and spatial positions will be nearby
and, if you blur the representation, they join up.
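A minimal sketch of this blur-and-join idea (our own illustration; single-linkage merging within a radius stands in for the blur):

```python
import math

def group_by_blur(elements, radius):
    """Rosenholtz-style grouping sketch: plot elements in (x, y, f)
    space and let a blur of width `radius` join whatever lands close
    together (approximated here by union-find single-linkage merging)."""
    parent = list(range(len(elements)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i, a in enumerate(elements):
        for j, b in enumerate(elements[i + 1:], i + 1):
            if math.dist(a, b) <= radius:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(elements))})

# Two spatial clusters sharing similar feature values, e.g. (x, y, hue):
elems = [(0, 0, 0.1), (1, 0, 0.2), (8, 0, 0.1), (9, 0, 0.2)]
assert group_by_blur(elems, radius=1.5) == 2   # a fine blur keeps them apart
assert group_by_blur(elems, radius=10.0) == 1  # a coarse blur merges all
```

The choice of feature dimension f (orientation, colour, motion, etc.) determines which grouping task the same machinery performs.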

Spatial extent of contour linking


Contour integration performance generally declines with increasing distance between the ele-
ments in a contour stimulus (Field et  al. 1993, May and Hess 2008). As with the comparison
between different contour types (snake, ladder, and rope), increasing the separation does not
make the task intrinsically harder, so the effect of increasing the separation tells us about the spa-
tial extent of the linking mechanism.
May and Hess (2008) varied both the element separation and Gabor carrier frequency in a
factorial design and found that the results strongly constrained the architecture of filter-overlap
models of contour integration. They found that performance was largely unaffected by the car-
rier wavelength of the elements; high-frequency elements could be integrated over almost as long
distances as low-frequency ones. This rules out filter-overlap models that use a linear filter to
integrate the elements because, to integrate over a large distance, you need a large-scale filter,
and large-scale filters tend not to respond well to high-frequency elements. To explain this result,
May and Hess proposed a second-order mechanism in which a squaring operation lies between
two linear filters. If we adjust the scale of the first-stage filter (before the nonlinearity) to match
the contour elements, and adjust the scale of the second-stage filter (after the nonlinearity) to be
large enough to bridge the gap between the elements, then we can accommodate pretty much any
combination of element spacing and carrier wavelength. If the first and second stage filters are
parallel, the model detects snakes; if they are orthogonal, the model detects ladders. The very poor
performance on ropes suggests that there is no corresponding mechanism in which the first and
second stages are oriented at 45 degrees to each other.
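The proposed mechanism is a filter-rectify-filter cascade. A 1-D toy version (our own sketch with crude box-like kernels, not the authors' implementation) shows how a small first-stage filter matched to the carrier, followed by squaring and a large second-stage filter, bridges gaps that would defeat a single linear filter:

```python
def convolve(signal, kernel):
    """Full 1-D convolution (pure-Python helper for the sketch)."""
    n, k = len(signal), len(kernel)
    out = [0.0] * (n + k - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out

def second_order_response(signal, first_stage, second_stage):
    """Filter-rectify-filter cascade: a small first-stage filter matched
    to the element carrier, a squaring nonlinearity, then a large
    second-stage filter that bridges the gaps between elements."""
    first = convolve(signal, first_stage)
    rectified = [r * r for r in first]
    return convolve(rectified, second_stage)

# Widely spaced fine-carrier elements, with gaps a single large-scale
# linear filter would struggle to respond to:
signal = [1, -1, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 1, -1]
small = [1, -1]          # matched to the fine carrier
large = [1] * 9          # coarse integrator after the nonlinearity
resp = second_order_response(signal, small, large)
assert max(resp) > 0     # the cascade signals the widely spaced contour
```

Making the two stages parallel or orthogonal gives snake and ladder detectors respectively, as described above.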

Does the same mechanism mediate both contour integration and psychophysical flanker facilitation?


It has often been suggested that the mechanism that mediates contour integration is also respon-
sible for the psychophysical flanker facilitation effect, whereby a low-contrast target is made more
detectable by the presence of spatially separate flanking elements positioned a moderate dis-
tance from the target. This is an attractively parsimonious idea that has been suggested by many
researchers (Gilbert et al. 1996; Kapadia et al. 1995; Li 1996; Li 1998; Pettet et al. 1998; Polat 1999;
Polat and Bonneh 2000; Stemmler et  al. 1995; Yen and Finkel 1998). If the same mechanisms
underlie psychophysical flanker facilitation and contour integration, one would expect both phe-
nomena to be observed in the same range of conditions. This prediction was tested by Williams
and Hess (1998). Firstly, they found that, unlike foveal contour integration, flanker facilitation
requires the elements to have the same phase. Secondly, flanker facilitation was abolished when
co-circular target and flankers differed in orientation by 20 degrees, whereas contours are eas-
ily detectable with larger orientation differences between neighboring elements. Thirdly, flanker
facilitation was abolished or greatly reduced when the stimulus was placed only three degrees into
the periphery, whereas contour integration can be performed easily at much larger eccentricities.
More recently, Huang et  al. (2006) showed that flanker facilitation was disrupted by dichoptic
presentation to a much greater extent than contour integration, suggesting that contour integra-
tion has a more central cortical site than flanker facilitation. The results from Williams and Hess
(1998) and Huang et al. (2006) showed that flanker facilitation occurs in a much more limited
range of conditions than contour integration, so it seems unlikely that contour integration could
be achieved by the mechanisms responsible for psychophysical flanker facilitation. Williams and
Hess argued that the latter effect might arise through a reduction in positional uncertainty due to
the flanking elements, a view subsequently supported by Petrov et al. (2006).

Does the same mechanism mediate both contour integration and crowding?


Crowding is the phenomenon whereby a stimulus (usually presented in the periphery) that is eas-
ily identifiable becomes difficult to identify when flanked by distracting stimuli. One view is that
crowding is caused by excessive integration across space. Pelli et al. (2004) proposed that, at each
point in the visual field, there is a range of integration field sizes, and the observer uses the size of
field that is best for the task at hand; integration fields are used for any task that involves integra-
tion of information from more than one elementary feature detector. Pelli et al. argued that, at
each location in the visual field, the minimum available integration field size scales with eccentric-
ity. This means that, particularly in the periphery, the observer may be forced to use an integration
field that is inappropriately large for the task, and that is when crowding occurs.
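Pelli et al.'s account can be caricatured in a few lines. The sketch below is a toy illustration, not their model: the rule that the smallest available field is a fixed fraction of eccentricity uses Bouma's well-known constant of roughly 0.5, and the function names are invented here.

```python
def min_field_size(eccentricity_deg, bouma=0.5):
    """Smallest integration field available at a given eccentricity,
    assumed to scale linearly with eccentricity (Bouma's rule; the
    0.5 constant is an illustrative choice)."""
    return bouma * eccentricity_deg

def crowded(spacing_deg, eccentricity_deg, bouma=0.5):
    """Crowding is predicted when even the smallest available field is
    too large to exclude flankers at the given target-flanker spacing.
    Note that target size does not enter: critical spacing depends
    only on eccentricity."""
    return spacing_deg < min_field_size(eccentricity_deg, bouma)
```

At 4 deg eccentricity this sketch predicts crowding for a 1 deg target–flanker spacing but not for a 3 deg spacing, irrespective of target size.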
Pelli et al.’s integration field sounds much like Field et al.’s Association Field, and May and Hess
(2007b) argued that the Association Field is in fact an example of the kind of integration field
that Pelli et al. suggested mediates the crowding effect. May and Hess implemented a simple ver-
sion of Pelli et al.’s crowding model and showed that this model could explain data on contour
detection in fovea and periphery, as well as showing Pelli et al.’s three key diagnostic features of
crowding: The critical target-flanker spacing for crowding to occur is independent of the size of
the target, scales with eccentricity, and is greater on the peripheral side of the target. Subsequently,
van den Berg et al. (2010) reported a population code model of feature integration that, like May
and Hess’s (2007b) model, explained both contour integration and crowding.
May and Hess (2007b) first proposed the link between contour integration and crowding and
provided circumstantial evidence in its support. Chakravarthi and Pelli (2011) later directly tested
this proposal by using the same stimuli for both a contour integration task and a crowding task. As
the ‘wiggle’ in the contours increased, the contour integration performance got worse (indicating
less integration), and performance on the crowding task got better (again indicating less integra-
tion). The ‘wiggle threshold’ was the same on both tasks, indicating that the same mechanism
mediated both contour integration and crowding (see also Rosenholtz, this volume).

Conclusion
The visual system groups local edge information into contours that are segmented from the back-
ground clutter in a visual scene. We have outlined two ways that this might be achieved. One is
an Association Field, which explicitly links neurons with different preferred locations and orien-
tations in a way that closely matches edge co-occurrence statistics in natural images. The other
is a simple filter-rectify-filter mechanism that, in the first stage, obtains a response to the con-
tour elements and, in the second stage, blurs this filter response along the contour; contours are
then defined by thresholding the filter output and identifying regions of contiguous response
across filter orientation and 2D image space. Both proposed mechanisms are consistent with
much of the available evidence, and it may be that either or both of these mechanisms play a
Contour Integration 201

role in implementing contour integration in biological vision. Evidence from electrophysiology
and functional imaging suggests that contour integration is implemented in early visual corti-
ces, perhaps V1, V2, and V4, but the exact biological implementation needs further elucidation.
The grouping phenomena discussed here involve local edge information, but similar grouping
processes might also be manifested in other domains. Indeed, Rosenholtz and colleagues (2009)
have shown how May and Hess’s (2008) filter-overlap algorithm for contour integration can be
extended to accommodate a wide variety of grouping tasks. Contour integration may also be
related to other pooling phenomena such as crowding. If this is the case, then the Association
Field that has been proposed as a mechanism for contour integration may be a specific example of
the integration field that is thought to be responsible for crowding.
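The filter-rectify-filter scheme lends itself to a compact sketch. The following is a minimal illustration of the idea for a single orientation channel, not May and Hess's actual implementation; the filter sizes and the half-maximum threshold are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label, rotate

def frf_channel(image, orientation_deg):
    """Filter-rectify-filter sketch for one orientation channel.
    Stage 1 responds to locally oriented structure; the rectified
    response is then blurred along the contour orientation (stage 2)
    and thresholded into contiguous candidate-contour regions."""
    # Rotate so this channel's preferred orientation is horizontal.
    rot = rotate(image.astype(float), -orientation_deg,
                 reshape=False, mode='nearest')
    # Stage 1: derivative-of-Gaussian across the contour (a stand-in
    # for an oriented Gabor-like filter), then full-wave rectification.
    stage1 = np.abs(gaussian_filter(rot, sigma=(1.0, 2.0), order=(1, 0)))
    # Stage 2: elongated blur ALONG the contour direction.
    stage2 = gaussian_filter(stage1, sigma=(1.0, 8.0))
    # Contours: contiguous regions of suprathreshold second-stage output.
    regions, n_regions = label(stage2 > 0.5 * stage2.max())
    mask = rotate((regions > 0).astype(float), orientation_deg,
                  reshape=False, order=0) > 0.5
    return mask, n_regions
```

Fed a horizontal line in an otherwise blank image, the 0-degree channel returns a small number of contiguous regions hugging the line; channels at other orientations respond far less.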

Acknowledgements
This work was supported by CIHR (#mop 53346 & mop10818) and NSERC (#46528-110) grants
to RFH. NWO (#452-08-008 & #433-09-223) grants supported SOD. KAM was supported by
EPSRC grant EP/H033955/1 to Joshua Solomon.

References
Allman, J., Miezin, F., and McGuinness, E. (1985). Stimulus specific responses from beyond the classical
receptive field: neurophysiological mechanisms for local-global comparisons in visual neurons. Ann.
Rev. Neurosci. 8: 407–30.
Altmann, C.F., Bulthoff, H.H., and Kourtzi, Z. (2003). Perceptual organization of local elements into global
shapes in the human visual cortex. Curr. Biol. 13(4): 342–9.
Altmann, C.F., Deubelius, A., and Kourtzi, Z. (2004). Shape saliency modulates contextual processing in
the human lateral occipital complex. J. Cogn. Neurosci. 16(5): 794–804.
Anzai, A., Peng, X., and Van Essen, D.C. (2007). Neurons in monkey visual area V2 encode combinations
of orientations. Nat. Neurosci. 10(10): 1313–21.
Banks, M.S., Geisler, W.S., and Bennett, P.J. (1987). The physical limits of grating visibility. Vision Research
27: 1915–24.
Beaudot, W.H.A. (2002). Role of onset asynchrony in contour integration. Vision Research 42: 1–9.
Beck, J., Rosenfeld, A., and Ivry, R. (1989). Line segregation. Spatial Vision 4(2–3): 75–101.
Ben-Yosef, G. and Ben-Shahar, O. (2012). A tangent bundle theory for visual curve completion. IEEE
Transactions on Pattern Analysis and Machine Intelligence 34: 1263–80.
Bex, P.J., Simmers, A.J., and Dakin, S.C. (2001). Snakes and ladders: the role of temporal modulation in
visual contour integration. Vision Research 41: 3775–82.
Blakemore, C. and Tobin, E.A. (1972). Lateral inhibition between orientation detectors in the cat’s visual
cortex. Experimental Brain Research 15: 439–40.
Bosking, W.H., Zhang, Y., Schofield, B., and Fitzpatrick, D. (1997). Orientation selectivity and the
arrangement of horizontal connections in the tree shrew striate cortex. J. Neurosci. 17: 2112–27.
Boynton, G.M., Demb, J.B., Glover, G.H., and Heeger, D.J. (1999). Neuronal basis of contrast
discrimination. Vision Research 39(2): 257–69.
Chakravarthi, R. and Pelli, D.G. (2011). The same binding in contour integration and crowding. Journal of
Vision 11(8), 10: 1–12.
Dakin, S.C. and Bex, P.J. (2002). Role of synchrony in contour binding: some transient doubts sustained.
J. Opt. Soc. Am. A, Opt. Image Sci. Vis. 19(4): 678–86.
Dakin, S.C. and Hess, R.F. (1998). Spatial-frequency tuning of visual contour integration. J. Opt. Soc. Am. A
15(6): 1486–99.
De Valois, R.L. and De Valois, K.K. (1990). Spatial Vision. Oxford: Oxford University Press.
Dumoulin, S.O. and Hess, R.F. (2006). Modulation of V1 activity by shape: image-statistics or shape-based
perception? J. Neurophysiol. 95(6): 3654–64.
Dumoulin, S.O. and Hess, R.F. (2007). Cortical specialization for concentric shape processing. Vision
Research 47(12): 1608–13.
Dumoulin, S.O., Dakin, S.C., and Hess, R.F. (2008). Sparsely distributed contours dominate extra-striate
responses to complex scenes. Neuroimage 42(2): 890–901.
Elder, J.H. and Goldberg, R.M. (2002). Ecological statistics of Gestalt laws for the perceptual organization
of contours. Journal of Vision 2(4), 5: 324–53.
Ernst, U.A., Mandon, S., Schinkel-Bielefeld, N., Neitzel, S.D., Kreiter, A.K., and Pawelzik, K.R. (2012).
Optimality of human contour integration. PLoS Computational Biology 8(5): e1002520.
Fang, F., Kersten, D., and Murray, S.O. (2008). Perceptual grouping and inverse fMRI activity patterns in
human visual cortex. J. Vis., 8(7), 2: 1–9.
Field, D.J., Hayes, A., and Hess, R.F. (1993). Contour integration by the human visual system: evidence for
a local ‘association field’. Vision Research 33(2): 173–93.
Field, D.J., Hayes, A., and Hess, R.F. (1997). The role of phase and contrast polarity in contour integration.
Investigative Ophthalmology and Visual Science 38: S999.
Fitzpatrick, D. (2000). Seeing beyond the receptive field in primary visual cortex. Curr. Opin. Neurobiol.
10(4): 438–43.
Gallant, J.L., Braun, J., and Van Essen, D.C. (1993). Selectivity for polar, hyperbolic, and Cartesian gratings
in macaque visual cortex. Science 259(5091): 100–3.
Gallant, J.L., Connor, C.E., Rakshit, S., Lewis, J.W., and Van Essen, D.C. (1996). Neural responses to polar,
hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. J. Neurophysiol. 76(4): 2718–39.
Geisler, W.S. (1984). Physical limits of acuity and hyperacuity. J. Opt. Soc. Am. A 1: 775–82.
Geisler, W.S. (1989). Sequential ideal-observer analysis of visual discriminations. Psychological Review
96: 267–314.
Geisler, W.S., Perry, J.S., Super, B.J., and Gallogly, D.P. (2001). Edge co-occurrence in natural images
predicts contour grouping performance. Vision Research 41(6): 711–24.
Gilbert, C.D. and Wiesel, T.N. (1979). Morphology and intracortical connections of functionally
characterised neurones in the cat visual cortex. Nature 280: 120–5.
Gilbert, C.D. and Wiesel, T.N. (1989). Columnar specificity of intrinsic horizontal and corticocortical
connections in cat visual cortex. J. Neurosci. 9(7): 2432–42.
Gilbert, C.D., Das, A., Ito, M., Kapadia, M., and Westheimer, G. (1996). Spatial integration and
cortical dynamics. Proceedings of the National Academy of Sciences of the United States of America
93: 615–22.
Grill-Spector, K., Kushnir, T., Hendler, T., Edelman, S., Itzchak, Y., and Malach, R. (1998). A sequence of
object-processing stages revealed by fMRI in the human occipital lobe. Hum. Brain Mapp. 6(4): 316–28.
Hegde, J. and Van Essen, D.C. (2000). Selectivity for complex shapes in primate visual area V2. J. Neurosci.
20(5): RC61.
Hess, R.F., and Field, D.J. (1995). Contour integration across depth. Vision Research 35(12): 1699–711.
Hansen, B.C., May, K.A., and Hess, R.F. (2014). One ‘shape’ fits all: the orientation bandwidth of
contour integration. J. Vis. (in submission).
Hess, R.F. and Dakin, S.C. (1997). Absence of contour linking in peripheral vision. Nature 390: 602–4.
Hess, R.F., Dakin, S.C., and Field, D.J. (1998). The role of ‘contrast enhancement’ in the detection and
appearance of visual contours. Vision Research 38 (6): 783–7.
Hess, R.F., Beaudot, W.H.A., and Mullen, K.T. (2001). Dynamics of contour integration. Vision Research
41: 1023–37.
Hess, R.F., Ledgeway, T., and Dakin, S.C. (2000). Impoverished second-order input to global linking in
human vision. Vision Research 40: 3309–18.
Hess, R.F., Hayes, A., and Field, D.J. (2003). Contour integration and cortical processing. J. Physiol. Paris
97(2–3): 105–19.
Huang, P.-C., Hess, R.F., and Dakin, S.C. (2006). Flank facilitation and contour integration: Different sites.
Vision Research 46: 3699–706.
Hubel, D.H. and Wiesel, T.N. (1968). Receptive fields and functional architecture of monkey striate cortex.
J. Physiol. 195(1): 215–43.
Ito, M. and Komatsu, H. (2004). Representation of angles embedded within contour stimuli in area V2 of
macaque monkeys. J. Neurosci. 24(13): 3313–24.
Johnson, E.N., Hawken, M.J., and Shapley, R. (2001). The spatial transformation of color in the primary
visual cortex of the macaque monkey. Nat. Neurosci. 4(4): 409–16.
Jones, H.E., Wang, W., and Sillito, A.M. (2002). Spatial organization and magnitude of orientation contrast
interactions in primate V1. J. Neurophysiol. 88: 2796–808.
Kapadia, M.K., Ito, M., Gilbert, C.D., and Westheimer, G. (1995). Improvement in visual sensitivity by
changes in local context: parallel studies in human observers and in V1 of alert monkeys. Neuron
15(4): 843–56.
Kastner, S., Nothdurft, H.C., and Pigarev, I.N. (1997). Neuronal correlates of pop-out in cat striate cortex.
Vision Research 37: 371–76.
Kay, K.N., Naselaris, T., Prenger, R.J., and Gallant, J.L. (2008). Identifying natural images from human
brain activity. Nature 452(7185): 352–5.
Kisvárday, Z.F., Tóth, E., Rausch, M., and Eysel, U.T. (1997). Orientation-specific relationship between
populations of excitatory and inhibitory lateral connections in the visual cortex of the cat. Cerebral
Cortex 7: 605–18.
Knierim, J.J. and Van Essen, D.C. (1992). Neuronal responses to static texture patterns in area V1 of the
alert macaque monkey. J. Neurophysiol. 67: 961–80.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace and World.
Kourtzi, Z. and Huberle, E. (2005). Spatiotemporal characteristics of form analysis in the human visual
cortex revealed by rapid event-related fMRI adaptation. Neuroimage 28(2): 440–52.
Kourtzi, Z., Tolias, A.S., Altmann, C.F., Augath, M., and Logothetis, N.K. (2003). Integration of local
features into global shapes: monkey and human FMRI studies. Neuron 37(2): 333–46.
Kovacs, I. and Julesz, B. (1993). A closed curve is much more than an incomplete one: effect of closure
in figure-ground segmentation. Proceedings of the National Academy of Sciences of the United States of
America 90: 7495–7.
Kruger, N. (1998). Colinearity and parallelism are statistically significant second order relations of complex
cell responses. Neural Processing Letters. 8: 117–29.
Lamme, V.A.F. (1995). The neurophysiology of figure-ground segregation in primary visual cortex.
J. Neurosci. 15(2): 1605–15.
Lamme, V.A.F., Super, H., and Speckreijse, H. (1998). Feedforward, horizontal and feedback processing in
the visual cortex. Curr. Op. Neurobiol. 8: 529–35.
Ledgeway, T., Hess, R.F., and Geisler, W.S. (2005). Grouping local orientation and direction signals to
extract spatial contours: Empirical tests of ‘association field’ models of contour integration. Vision
Research 45: 2511–22.
Lerner, Y., Hendler, T., Ben-Bashat, D., Harel, M., and Malach, R. (2001). A hierarchical axis of object
processing stages in the human visual cortex. Cereb. Cortex 11(4): 287–97.
Li, Z. (1996). A neural model of visual contour integration. Advances in Neural Information Processing
Systems, 9, pp. 69–75. Boston: MIT Press.
Li, Z. (1998). A neural model of contour integration in the primary visual cortex. Neural Computation
10(4): 903–40.
Li, Z. (1999). Contextual influences in V1 as a basis for pop out and asymmetry in visual search. Proceedings
of the National Academy of Sciences of the United States of America 96: 10530–5.
Li, Z. (2000). Pre-attentive segmentation in the primary visual cortex. Spatial Vision 13: 25–50.
Li, Z. (2002). A saliency map in primary visual cortex. Trends in Cognitive Sciences 6: 9–16.
Li, W., Piech, V., and Gilbert, C.D. (2006). Contour saliency in primary visual cortex. Neuron
50(6): 951–62.
Li, W., Piech, V., and Gilbert, C.D. (2008). Learning to link visual contours. Neuron 57(3): 442–51.
Malach, R., Amir, Y., Harel, H., and Grinvald, A. (1993). Relationship between intrinsic connections and
functional architecture revealed by optical imaging and in vivo targeted biocytin injections in primary
striate cortex. Proc. Natl. Acad. Sci. USA 90: 10469–73.
Malach, R., Reppas, J.B., Benson, R.R., Kwong, K.K., Jiang, H., Kennedy, W.A., Ledden, P.J., Brady, T.J.,
Rosen, B.R., and Tootell, R.B. (1995). Object-related activity revealed by functional magnetic resonance
imaging in human occipital cortex. Proc. Natl. Acad. Sci. USA 92(18): 8135–9.
Mante, V. and Carandini, M. (2005). Mapping of stimulus energy in primary visual cortex. J. Neurophysiol.
94(1): 788–98.
May, K.A. and Hess, R.F. (2007a). Dynamics of snakes and ladders. J. Vis. 7(12), 13: 1–9.
May, K.A. and Hess, R.F. (2007b). Ladder contours are undetectable in the periphery: a crowding effect?
J. Vis. 7(13), 9: 1–15.
May, K.A. and Hess, R.F. (2008). Effects of element separation and carrier wavelength on detection of
snakes and ladders: Implications for models of contour integration. J. Vis. 8(13), 4: 1–23.
McIlhagga, W.H. and May, K.A. (2012). Optimal edge filters explain human blur detection. J. Vis. 12(10),
9: 1–13.
McIlhagga, W.H. and Mullen, K.T. (1996). Contour integration with colour and luminance contrast. Vision
Research 36(9): 1265–79.
Moulden, B. (1994). Collator units: second-stage orientational filters. In: M.J. Morgan (ed.) Higher-order
processing in the visual system: CIBA Foundation Symposium 184, pp. 170–84. Chichester: John Wiley
and Sons.
Mullen, K.T., Beaudot, W.H.A., and McIlhagga, W.H. (2000). Contour integration in color vision: a
common process for blue-yellow, red-green and luminance mechanisms? Vision Research 40: 639–55.
Murray, S.O., Kersten, D., Olshausen, B.A., Schrater, P., and Woods, D.L. (2002). Shape perception
reduces activity in human primary visual cortex. Proc. Natl. Acad. Sci. USA, 99(23): 15164–9.
Nelson, J.I., and Frost, B.J. (1978). Orientation-selective inhibition from beyond the classic visual receptive
field. Brain Res. 139(2): 359–65.
Nelson, J.I., and Frost, B.J. (1985). Intracortical facilitation among co-oriented, co-axially aligned simple
cells in cat striate cortex. Exp. Brain Res. 61(1): 54–61.
Nothdurft, H.C., Gallant, J.L., and Van Essen, D.C. (1999). Response modulation by texture surround in
primate area V1: correlates of ‘popout’ under anesthesia. Vis. Neurosci. 16 (1): 15–34.
Olman, C.A., Ugurbil, K., Schrater, P., and Kersten, D. (2004). BOLD fMRI and psychophysical
measurements of contrast response to broadband images. Vision Research 44(7): 669–83.
Parker, A.J. and Cumming, B.G. (2001). Cortical mechanisms of binocular stereoscopic vision. Prog. Brain
Res. 134: 205–16.
Pasupathy, A. and Connor, C.E. (1999). Responses to contour features in macaque area V4. J. Neurophysiol.
82(5): 2490–502.
Pelli, D.G., Palomares, M., and Majaj, N.J. (2004). Crowding is unlike ordinary masking: distinguishing
feature integration from detection. J. Vis. 4(12): 1136–69.
Petrov, Y., Verghese, P., and McKee, S.P. (2006). Collinear facilitation is largely uncertainty reduction. J.Vis.
6(2): 170–8.
Pettet, M.W., McKee, S.P., and Grzywacz, N.M. (1996). Smoothness constrains long-range interactions
mediating contour-detection. Investigative Ophthalmology and Visual Science 37: 4368.
Pettet, M.W., McKee, S.P., and Grzywacz, N.M. (1998). Constraints on long-range interactions mediating
contour-detection. Vision Research 38(6): 865–79.
Polat, U. (1999). Functional architecture of long-range perceptual interactions. Spatial Vision 12: 143–62.
Polat, U. and Bonneh, Y. (2000). Collinear interactions and contour integration. Spatial Vision
13(4): 393–401.
Polat, U. and Sagi, D. (1993). Lateral interactions between spatial channels: suppression and facilitation
revealed by lateral masking experiments. Vision Research 33(7): 993–9.
Polat, U. and Sagi, D. (1994). The architecture of perceptual spatial interactions. Vision Research
34(1): 73–8.
Polat, U., Mizobe, K., Pettet, M.W., Kasamatsu, T., and Norcia, A.M. (1998). Collinear stimuli regulate
visual responses depending on cell’s contrast threshold. Nature 391(6667): 580–4.
Pooresmaeili, A., Herrero, J.L., Self, M.W., Roelfsema, P.R., and Thiele, A. (2010). Suppressive lateral
interactions at parafoveal representations in primary visual cortex. J. Neurosci. 30(38): 12745–58.
Rainer, G., Augath, M., Trinath, T., and Logothetis, N.K. (2002). The effect of image scrambling on visual
cortical BOLD activity in the anesthetized monkey. Neuroimage 16 (3 Pt 1): 607–16.
Rosenholtz, R., Twarog, N.R., Schinkel-Bielefeld, N., and Wattenberg, M. (2009). An intuitive model of
perceptual grouping for HCI design. Proceedings of the 27th international conference on Human factors
in computing systems, pp. 1331–40.
Schmidt, K.E., Goebel, R., Lowel, S., and Singer, W. (1997). The perceptual grouping criterion of
collinearity is reflected by anisotropies of connections in the primary visual cortex. Eur. J. Neurosci.
9: 1083–1089.
Sigman, M., Cecchi, G.A., Gilbert, C.D., and Magnasco, M.O. (2001). On a common circle: natural scenes
and gestalt rules. Proc. Nat. Acad. Sci. USA 98(4): 1935–40.
Sillito, A.M., Grieve, K.L., Jones, H.E., Cudeiro, J., and Davis, J. (1995). Visual cortical mechanisms
detecting focal orientation discontinuities. Nature 378: 492–6.
Singer, W., and Gray, C.M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann.
Rev. Neurosci. 18: 555–86.
Smits, J.T. and Vos, P.G. (1987). The perception of continuous curves in dot stimuli. Perception
16(1): 121–31.
Stemmler, M., Usher, M., and Niebur, E. (1995). Lateral interactions in primary visual cortex: A model
bridging physiology and psychophysics. Science 269: 1877–80.
Stettler, D.D., Das, A., Bennett, J., and Gilbert, C.D. (2002). Lateral connectivity and contextual
interactions in macaque primary visual cortex. Neuron 36: 739–50.
Tanskanen, T., Saarinen, J., Parkkonen, L., and Hari, R. (2008). From local to global: Cortical dynamics of
contour integration. J. Vis. 8(7), 15: 1–12.
Uttal, W.R. (1983). Visual form detection in 3-dimensional space. Hillsdale: Lawrence Erlbaum.
van den Berg, R., Roerdink, J.B.T.M., and Cornelissen, F.W. (2010). A neurophysiologically plausible
population code model for feature integration explains visual crowding. PLoS Computational Biology
6 (1): e1000646.
Wagemans, J. (1995). Detection of visual symmetries. Spat. Vis. 9(1): 9–32.
Watt, R., Ledgeway, T., and Dakin, S.C. (2008). Families of models for Gabor paths demonstrate the
importance of spatial adjacency. J. Vis. 8(7): 1–19.
Weliky, M., Kandler, K., Fitzpatrick, D., and Katz, L.C. (1995). Patterns of excitation and inhibition
evoked by horizontal connections in visual cortex share a common relationship to orientation columns.
Neuron 15: 541–52.
Williams, C.B., and Hess, R.F. (1998). The relationship between facilitation at threshold and suprathreshold
contour integration. J. Opt. Soc. Am. A 15(8): 2046–51.
Yen, S.-C. and Finkel, L.H. (1998). Extraction of perceptually salient contours by striate cortical networks.
Vision Research 38: 719–41.
Zhaoping, L. and May, K.A. (2007). Psychophysical tests of the hypothesis of a bottom-up saliency map in
primary visual cortex. PLoS Computational Biology, 3(4). doi: 10.1371/journal.pcbi.0030062
Zhou, Y.X. and Baker, C.L., Jr. (1993). A processing stream in mammalian visual cortex neurons for
non-Fourier responses. Science 261(5117): 98–101.
Zipser, K., Lamme, V.A.F., and Schiller, P.H. (1996). Contextual modulation in primary visual cortex.
J. Neurosci. 16: 7376–89.
Chapter 11

Bridging the dimensional gap:
Perceptual organization of contour
into two-dimensional shape
James H. Elder

Introduction
The visible surface of a 3D object in the world projects to a 2D region of the retinal image. The rim
of the object, defined to be the set of surface points on the object grazed by the manifold of rays
passing through the optical centre of the eye (Koenderink 1984), projects to the image as a 1D
bounding contour. For a simply connected, unoccluded object, the rim projects as a simple closed
curve in the image, and such contours are sufficient to yield compelling percepts of 2D and even
3D shape (Figure 11.1a).
In the general case, however, even for a smooth object the bounding contour can be fragmented
due to occlusions, including self-occlusions, and the representation of the bounding contour is
further fragmented by the pointillist representations of the early visual system. From the photo-
receptors of the retina through the retinal ganglia, midbrain, and spatiotopic areas of the object
pathway in visual cortex, the image, and hence its contours, are represented piecemeal. A fun-
damental question is how the visual system assembles these pieces into the coherent percepts of
whole objects we experience.
An alternative to grouping the contour fragments of the boundary is to group the points inte-
rior to this contour based on their apparent similarity, a process known as region segmentation
(see Self and Roelfsema, this volume). By the Jordan Curve Theorem (Jordan 1887), for a simple
closed boundary curve the region and its boundary are formally dual (i.e. one can be derived from
the other), so in theory either method should suffice. In addition, an advantage of region grouping
is that one can initialize the solution with the correct topology (e.g. a simply connected region)
and easily maintain this topology as the solution evolves. The downside is the dependence of these
methods upon the homogeneous appearance of the object, which may not apply (Figure 11.1b). In
such cases, the geometric regularity of the boundary may be the only basis for perceptual organi-
zation. This is consistent with psychophysical studies using simple fragmented shapes that reveal
specialized mechanisms for contour grouping, distinct from processes for region grouping (Elder
and Zucker 1994).
One valid concern is that the contour grouping mechanisms revealed with simple artificial
stimuli may not generalize to complex natural scenes. However, a recent study by Elder and
Velisavljević (2009) suggests otherwise. This study used the Berkeley Segmentation Dataset
(BSD, Martin, Fowlkes, and Malik 2004) to explore the dynamics of animal detection in natural
scenes. For each image in the dataset, the BSD provides hand segmentations created by human
subjects, each of which carves up the image into meaningful regions. Elder and Velisavljević
Fig. 11.1  (a) Shape from contour. (b) When surface textures are heterogeneous, geometric
regularities of the object boundaries are the only cues for object segmentation. From Iverson (2012).
Reprinted with permission.

used this dataset to create new images in which luminance, colour, texture, and contour shape
cues were selectively turned on or off (Figure 11.2(a)). They then measured performance for
animal detection using these various modified images over a range of stimulus durations (Figure
11.2(b)). While each condition generally involved multiple cues, assuming additive cue combi-
nation, the contribution of each cue can be estimated using standard regression methods (Figure
11.2(c)).
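Assuming additive cue combination, each condition's sensitivity is modelled as the sum of the contributions of whichever cues it contains, so the per-cue contributions fall out of an ordinary least-squares fit. The design matrix and d′ values below are invented numbers for illustration (chosen so that shape and texture dominate, echoing the qualitative result), not Elder and Velisavljević's data:

```python
import numpy as np

# Columns: Luminance, Colour, Texture, Shape; rows: stimulus conditions.
X = np.array([
    [1, 1, 1, 1],   # LCTS: all four cues present
    [1, 0, 1, 1],   # LTS
    [1, 1, 0, 1],   # LCS
    [0, 0, 0, 1],   # SO: shape outline only
    [1, 1, 0, 0],   # LC
], dtype=float)
d_prime = np.array([1.9, 1.8, 1.3, 1.2, 0.1])   # invented sensitivities

# Additivity: d' ~ X @ w, so the cue weights w are recovered by least squares.
w, *_ = np.linalg.lstsq(X, d_prime, rcond=None)
contributions = dict(zip(['luminance', 'colour', 'texture', 'shape'], w))
```

With these invented numbers the fit attributes essentially all sensitivity to shape (1.2) and texture (0.6) and almost none to luminance or colour, mirroring the pattern in Figure 11.2(c).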
The results show that humans do not use simple luminance or colour cues for animal detection,
but instead rely on contour shape and texture cues. Interestingly, the contour shape cues appear to
be the first available, influencing performance for stimulus durations as short as 10 msec, within a
backward masking paradigm. A control study found only a modest performance decrement when
the hand-drawn outlines were replaced by computer-generated edge maps (Elder and Zucker
1998b). Thus, contour grouping mechanisms appear to underlie rapid object perception for both
simple artificial images and complex natural scenes. (One can speculate on whether animal cam-
ouflage may make colour and texture cues less reliable than shape cues for animal detection in
particular—see Osorio and Cuthill, this volume.)
At the same time, we know from the fifty-year history of computer vision that contour grouping
is computationally difficult, due to fragmentation caused by occlusions as well as sections of con-
tour where figure/ground contrast is low. These two scenarios illustrate the problems of amodal
and modal completion, respectively (Figure 11.3). (A debate persists regarding whether a com-
mon mechanism underlies both amodal and modal completion—see van Lier and Gerbino, this
volume, for details. I will not address this debate here, but rather will consider the more general
problem of grouping fragmented contours, without regard for the cause of the fragmentation. It is
likely that the models discussed here could be productively refined by making this distinction, for
example by switching grouping mechanisms based upon the detection of T-junctions suggestive
of occlusion.)
To further complicate matters, natural images are often highly cluttered, so that for each contour
fragment, there are typically multiple possible fragments that might be the correct continuation
[Figure 11.2: (a) example stimuli for conditions LCTS, LTS, LCS, SO, and LC; (b) the stimulus
sequence, from fixation through a 30–120 ms stimulus to an Animal/Non-Animal response;
(c) d′ as a function of stimulus duration (msec) for the texture, shape, colour, and luminance cues.]
Fig. 11.2  Psychophysical animal detection experiment. (a) Example stimuli. The letters indicate the cues
available: Luminance, Color, Texture, Shape. ‘SO’ stands for ‘Shape Outline’. (b) Stimulus sequence. (c)
Estimated influence of the four individual cues to animal detection.
Reproduced from James H. Elder and Ljiljana Velisavljević, Cue Dynamics Underlying Rapid Detection of Animals
in Natural Scenes, Journal of Vision, 9(7), figure 3, doi: 10.1167/9.7.7 © 2009, Association for Research in Vision
and Ophthalmology.

of the contour. Thus to effectively exploit contours for object segmentation, the visual system must
be able to cope with uncertainty, using a relaxed form of perceptual contour closure that can work
reliably even for fragmented contours (Elder and Zucker 1993). For these reasons, computing the
correct bounding contours of objects in complex natural scenes is generally thought to be one of
the harder computer vision problems, and the state of the art is still quite far from human per-
formance (Arbelaez et al. 2011). So the question remains: how does the brain rapidly and reliably
solve this problem that computer vision algorithms fail to solve?

Computational framework
The standard computational framework for modelling contour grouping consists of three stages:
1 Local orientation coding. Detection of the local oriented elements (edges or line segments) to be
grouped.
2 Pairwise association. Computation of the strength of grouping (ideally expressed as a
probability) between each pair of local elements. This can be represented as a transition matrix.
These local probabilities are typically based on classical local Gestalt cues such as proximity,
good continuation and similarity in brightness, contrast and colour.
3 Global contour extraction. Inference of global contours based upon this transition matrix.
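Stage 2 can be sketched as a scoring function over ordered pairs of oriented elements. The Gaussian weighting of proximity and good continuation below, and the two sigma parameters, are illustrative assumptions rather than a fitted model of the ecological statistics:

```python
import numpy as np

def association_strength(p1, theta1, p2, theta2,
                         sigma_rho=20.0, sigma_theta=0.5):
    """Grouping strength between two oriented elements (positions in
    pixels, orientations in radians). Proximity decays with the
    separation rho; good continuation penalises the turning angle each
    element makes with the line joining them. The Gaussian weighting
    and the sigma values are illustrative choices."""
    d = np.subtract(p2, p1)
    rho = np.hypot(*d)
    phi = np.arctan2(d[1], d[0])          # direction of the joining line

    def wrap(a):
        # Wrap to [-pi/2, pi/2), since orientation is axial.
        return (a + np.pi / 2) % np.pi - np.pi / 2

    t1, t2 = wrap(theta1 - phi), wrap(theta2 - phi)
    proximity = np.exp(-rho**2 / (2 * sigma_rho**2))
    continuation = np.exp(-(t1**2 + t2**2) / (2 * sigma_theta**2))
    return proximity * continuation

# Collinear near neighbours group strongly; a perpendicular element does not:
strong = association_strength((0, 0), 0.0, (10, 0), 0.0)
weak = association_strength((0, 0), 0.0, (10, 0), np.pi / 2)
```

Evaluating this for all ordered pairs of the n detected elements fills in the n × n transition matrix on which the global extraction stage then operates.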
I will review all three of these stages below, but will focus primarily on the last, which in my
view is the hardest. To see this, we must first more clearly articulate the exact goal of the global
contour extraction stage. There are essentially two proposals. One (e.g. Geisler et al. 2001) is to
extract the unordered set of local elements comprising each contour. The second (e.g. Elder and
Goldberg 2002) is to extract the ordered sequence of local elements forming the contour. We
Fig. 11.3  Object boundaries project to the image as fragmented contours, due to occlusions (cyan) and
low figure/ground contrast (red).
Reproduced from Wagemans, J., Elder, J., Kubovy, M., Palmer, S., Peterson, M., Singh, M., & von der Heydt, R.,
A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization.
Psychological Bulletin, 138(6), pp. 1172–1217 (c) 2012, American Psychological Association.

will analyse these two objectives in more detail below, but for now note that in either case the
solution space is exponential in the number of elements comprising each contour. In particular,
given n oriented elements in the image and k elements comprising a particular contour, there are
n!/(k!(n – k)!) possible set solutions and n!/(n – k)! sequence solutions. Thus a key problem is to
identify effective algorithms that only need to explore a small part of this search space to find the
correct contours.
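A quick calculation shows how fast these counts grow; `math.comb` and `math.perm` (Python 3.8+) give the two formulas directly, and the values of n and k here are arbitrary:

```python
from math import comb, perm, factorial

n, k = 100, 10   # e.g. 100 local elements, contours of 10 elements

n_sets = comb(n, k)        # unordered sets:      n! / (k!(n-k)!)
n_sequences = perm(n, k)   # ordered sequences:   n! / (n-k)!

# Every k-element set admits k! orderings, so the two counts differ
# by exactly a factor of k!.
assert n_sequences == n_sets * factorial(k)

print(f'{n_sets:.2e} sets, {n_sequences:.2e} sequences')
```

Even with only 100 elements and 10-element contours there are over 10^13 candidate sets, which is why grouping algorithms must prune the search space rather than enumerate it.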

Local orientation coding
The first stage of contour grouping involves the detection of the oriented elements corresponding
to the local tangents of the underlying contours. This process is normally identified with primary
visual cortex in primate, where orientation selectivity first emerges (Hubel and Wiesel 1968; see
also Hess et al., this volume). Computationally, even this step is non-trivial, as the local contrast
of the image can be faint (as it is for the contour highlighted in red in Figure 11.3), or blurred.
Multiscale filtering methods (Elder and Zucker 1998b; Lindeberg 1998) have been shown to be
computationally effective here, and this matches fairly well with the physiological (Hawken and
Parker 1991; Ringach 2002) and psychophysical (Wilson and Bergen 1979; Watt and Morgan 1984;
Elder and Sachs 2004) evidence for multiscale processing in human and non-human primate.
The orientation bandwidths of these local mechanisms have been estimated psychophysically
Bridging the Dimensional Gap 211


Fig. 11.4  The Gestalt cue of proximity can be expressed as a function of the distance ρ between each
pair of local elements. The cue of good continuation for oriented edges in an image can be expressed to
first order as a function of two angles θ1 and θ2. The cue of similarity can be expressed as a function of
photometric measurements αi, βi on either side of each edge.
Reproduced from James H. Elder and Richard M. Goldberg, Ecological statistics of Gestalt laws for the perceptual
organization of contours, Journal of Vision, 2(4), figure 4, doi: 10.1167/2.4.5 © 2002, Association for Research in
Vision and Ophthalmology.

using grating stimuli (Blakemore and Nachmias 1971; Campbell and Kulikowski 1966; Phillips
and Wilson 1984; Snowden 1992) and orientation fields (e.g. Glass patterns, Maloney, Mitchison,
and Barlow 1987; Dakin 1997, 2001; Or and Elder 2011) to be between 7 and 15 deg (half-width
at half-height), and this corresponds fairly well to the physiology (Hawken and Parker 1991;
Ringach 2002).
Beyond issues of scale and contrast is the problem that for natural scenes, not all contours are
created equally. Contours corresponding to object boundaries may in fact be in the minority, lost in
a sea of contours produced by reflectance changes, shading, and shadows. Computationally, colour
and texture information has been found useful in estimating the relative importance of local edges
(e.g. Martin et al. 2004), but the mapping of these mechanisms to visual cortex remains unclear.

Pairwise association
The study of the strength of association between pairs of local elements is rooted in the early work
of Gestalt psychologists (Wertheimer 1938), who identified three central cues that are relevant
here: proximity, good continuation, and similarity (Figure 11.4). We consider each in turn below.
(See also Feldman, this volume.)

Proximity
The principle of proximity states that the strength of grouping between two elements increases
as these elements are brought nearer to each other. But how exactly does grouping strength vary
as a function of their separation? In an early attempt to answer this question, Oyama (1961)
manipulated the horizontal and vertical spacing of dots arranged in a rectangular array, measur-
ing the duration of time subjects perceived the arrays organized as vertical lines vs horizontal lines
(Figure 11.5a). He found that the ratio of durations th/tv could be accurately related to the ratio of
dot spacing dh/dv through a power law: th/tv = (dh/dv)^−α, with α ≈ 2.89.
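Oyama's power law is easy to state in code; a minimal sketch (the function name is mine):

```python
def duration_ratio(d_h, d_v, alpha=2.89):
    """Oyama's (1961) power law: t_h / t_v = (d_h / d_v) ** (-alpha).

    Predicted ratio of time spent perceiving horizontal vs vertical
    organization, for a dot array with horizontal spacing d_h and
    vertical spacing d_v.
    """
    return (d_h / d_v) ** (-alpha)

# Equal spacing gives no bias; widening the horizontal gaps by just 20%
# already makes the vertical organization dominate.
print(duration_ratio(1.0, 1.0))   # 1.0
print(duration_ratio(1.2, 1.0))   # ≈ 0.59
```

The large exponent is what makes proximity such a decisive cue: small changes in relative spacing produce large changes in perceptual dominance.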
Using an elaboration of this psychophysical technique, Kubovy and colleagues (Kubovy and
Wagemans 1995; Kubovy, Holcombe, and Wagemans 1998) modelled the proximity cue as an
exponential decay, which is consistent with random-walk models of contour formation (Mumford
1992; Williams and Jacobs 1997). However, they also noted that a power law model would fit
their data equally well. Further, they found that the proximity cue was approximately scale invari-
ant: scaling all distances by the same factor did not affect results. Since the power law is the only

Fig. 11.5  (a) Psychophysical stimulus used to measure the proximity cue (Oyama 1961). See text for
details. (b) Ecological statistics of the proximity cue for contour grouping. The data follow a power law
for distances greater than 2 image pixels. For smaller distances, measurement noise dominates.
Adapted from James H. Elder and Richard M. Goldberg, Ecological statistics of Gestalt laws for the perceptual
organization of contours, Journal of Vision, 2(4), figure 7a, doi: 10.1167/2.4.5 © 2002, Association for Research
in Vision and Ophthalmology.

perfectly scale-invariant distribution, this last result adds strength to the power-law model of
proximity.
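The scale-invariance property is easy to verify numerically: rescaling all distances by a common factor leaves the relative likelihood of two gaps unchanged under a power law, but not under an exponential decay. A minimal sketch (function names and parameter values are mine):

```python
import math

def power_law(gap, alpha=2.92):
    """Unnormalized power-law likelihood of a gap between successive elements."""
    return gap ** (-alpha)

def exp_decay(gap, lam=1.0):
    """Unnormalized exponential-decay likelihood of the same gap."""
    return math.exp(-lam * gap)

s = 5.0  # arbitrary rescaling of the whole image
for model in (power_law, exp_decay):
    before = model(2.0) / model(1.0)
    after = model(s * 2.0) / model(s * 1.0)
    print(model.__name__, math.isclose(before, after))
# power_law True  -- relative likelihoods survive rescaling
# exp_decay False -- the exponential model is tied to an absolute scale
```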
Perceptual scale invariance is rational if in fact the proximity of elements along real contours in
natural images is scale invariant, i.e. if the ecological distribution follows a power law. In support
of this idea, Sigman et al. (2001) reported that the spatial correlation in the response of collinearly
oriented filters to natural images does indeed follow a power law, suggesting a correspondence
between perception and the ecological statistics of the proximity cue. Quantitatively, however, the
correspondence is poor: while Oyama estimated the perceptual exponent to be α ≈ 2.89, Sigman
et al. estimated an ecological exponent of only 0.6, reflective of a much weaker cue to grouping.
This discrepancy can be accounted for if we consider that Sigman et al. did not restrict their
measurements to pairs of neighbouring elements on the same contour of the image. In fact, the
measurements were not constrained to be on the same contour, or even on a contour at all. Thus
the estimate mixes measurements made between strongly related and only weakly related image
features. This mixing of measurements on, off, and between contours can be expected to weaken
estimates of the conditional statistical distributions that generate natural images.
Elder and Goldberg (2002) estimated these distributions more directly, using human subjects
to label the sequence of elements forming the contours of natural images, with the aid of an inter-
active image editing tool (Elder and Goldberg 2001). This technique allowed the measurements
to be restricted to successive elements along the same contour, and yielded a clear power law
(Figure 11.5b) with exponent α = 2.92, very close to the perceptual estimate of Oyama.
In summary, the convergence between psychophysics and ecological statistics is compelling.
Ecologically, proximity follows a power law and exhibits scale invariance, and these properties are
mirrored by the psychophysical results. Thus we have a strong indication that the human percep-
tual system for grouping contours is optimally tuned for the ecological statistics of the proximity
cue in natural scenes.

Good continuation
The principle of good continuation refers to the tendency for elements to be grouped to form
smooth contours (Wertheimer 1938). A very nice method for studying the principle of good
continuation in isolation was developed by Field, Hayes, and Hess (1993) (see also Hess et al., this
volume). In this method, a contour formed from localized oriented elements is embedded in a
random field of distractor elements, in such a way that the cue of proximity is roughly eliminated.
Aligning the contour elements to be tangent to the contour makes the contour easily detected,
whereas randomizing the orientation of the elements renders the contour invisible. This clearly
demonstrates the role of good continuation in isolation from other cues.
These findings led Field et al. to suggest the notion of an ‘association field’ that determines the
linking of oriented elements within a local visual neighbourhood (Figure 11.6), a construct that is
closely related to the machinery of cocircularity support neighbourhoods, developed somewhat
earlier for the purpose of contour refinement in computer vision (Parent and Zucker 1989).
Ecological data on good continuation have also begun to emerge. Krüger (1998) and later
Sigman et al. (2001) found evidence for collinearity, cocircularity, and parallelism in the statistics
of natural images. Geisler et al. (2001) found similar results using both labelled and unlabelled
natural image data. Crucially, Geisler et al. also conducted a companion psychophysics experi-
ment that revealed a fairly close correspondence between the tuning of human perception to the
good continuation cue, and the statistics of this cue in natural images.
To be optimal the decision to group two elements should be based on the likelihood ratio, in
this case, the ratio of the probability that two elements from the same contour would generate
the observed geometric configuration, to the probability that a random pair of elements would
generate this configuration. To compute this ratio, Geisler et al. treated contours as unordered
sets of oriented elements, measuring the statistics for pairs of contour elements on a common
object boundary, regardless of whether these element pairs were close together or far apart on the
object contour. In contrast, Elder and Goldberg (2002) modelled contours as ordered sequences
of oriented elements, restricting measurements to adjacent pairs of oriented elements along the
contours. Figure 11.7 shows maps of the likelihood ratios determined using the two methods.
Note that the likelihood ratios are much larger for the sequential statistics, reflecting a stronger
statistical association between neighbouring contour elements.


Fig. 11.6  Models of good continuation. (a) Cocircularity support neighbourhood. (b) Association field.
(a) © 1998 IEEE. Adapted, with permission, from Parent, P.; Zucker, S.W., Trace inference, curvature
consistency, and curve detection, IEEE Transactions on Pattern Analysis and Machine Intelligence. (b)
Adapted from Vision Research, 33(2), David J. Field, Anthony Hayes, and Robert F. Hess, Contour
integration by the human visual system: Evidence for a local “association field”, pp. 173–93, Copyright
(1993), with permission from Elsevier.


Fig. 11.7  Association fields derived from the ecological statistics of contours. (a) Likelihood ratio for two
oriented elements to be on the same object boundary, adapted from Geisler et al. (2001). (b) Likelihood
ratio for two oriented elements to be neighbouring elements on the same object boundary.
Adapted from James H. Elder and Richard M. Goldberg, Ecological statistics of Gestalt laws for the perceptual
organization of contours, Journal of Vision, 2(4), figure 18 b and e, doi: 10.1167/2.4.5 © 2002, Association for
Research in Vision and Ophthalmology.

When defined over pairs of oriented elements, there are various ways to encode the principle
of good continuation. A straight-line interpolation between the elements, either between their
centres or their end-points, induces two interpolation angles (Figure 11.4): small values for these
angles indicate good continuation. However, Elder and Goldberg (2002) observed that these
angles are highly correlated for contours in natural scenes (Figure 11.8a), suggesting a recoding
into the difference and sum of these angles, which are approximately uncorrelated and represent
the cues of cocircularity and parallelism, respectively (Figure 11.8b). Kellman and Shipley (1991)
have used the term ‘relatability’ to refer to a particular constraint on these two angles found to be
predictive of contour completion phenomena.
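The recoding of the two interpolation angles into approximately independent cues is a simple linear transform; a minimal sketch of the sum/difference coding described above (the function name is mine):

```python
def recode_good_continuation(theta_ij, theta_ji):
    """Recode the two interpolation angles (in degrees) into approximately
    uncorrelated good-continuation cues, following Elder and Goldberg (2002):
    parallelism is the sum of the angles, cocircularity their difference."""
    parallelism = theta_ij + theta_ji
    cocircularity = theta_ji - theta_ij
    return parallelism, cocircularity

# Two tangents lying on a common straight line: both cues vanish.
print(recode_good_continuation(0.0, 0.0))  # (0.0, 0.0)
```

Because the raw angles are highly (negatively) correlated in natural scenes, this rotation of the coordinate system yields cues that can be treated as statistically independent to a first approximation.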

Similarity
In the context of contour grouping, the principle of similarity suggests that elements with similar
photometric properties—brightness, contrast, colour, texture—are more likely to group than ele-
ments that differ on these dimensions. Psychophysically, the principle has been demonstrated in
a number of ways with dot patterns. Hochberg and Hardy (1960) showed that proximity ratios of
up to two can be overcome by intensity similarity cues, and contrast similarity is known to affect
the perception of Glass patterns (Earle 1999).
Elder and Goldberg (2002) explored the ecological statistics of similarity in edge grouping,
coding similarity in terms of the difference in brightness (α1 + β1) − (α2 + β2) and the difference in
contrast (α1 − β1) − (α2 − β2) between the edges (see Figure 11.4). They found that while the bright-
ness cue carries useful information for grouping, the contrast cue is relatively weak.
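The brightness and contrast codes are likewise simple linear combinations of the four photometric measurements of Figure 11.4; a minimal sketch (the function name is mine):

```python
def similarity_cues(alpha1, beta1, alpha2, beta2):
    """Brightness and contrast similarity cues between two edges, coded as in
    Elder and Goldberg (2002) from the mean luminances alpha_i, beta_i
    measured on either side of each edge (Figure 11.4)."""
    brightness_diff = (alpha1 + beta1) - (alpha2 + beta2)
    contrast_diff = (alpha1 - beta1) - (alpha2 - beta2)
    return brightness_diff, contrast_diff

# Identical photometry on both edges: both cues are zero, favouring grouping.
print(similarity_cues(0.8, 0.2, 0.8, 0.2))  # (0.0, 0.0)
# A contrast-polarity reversal leaves brightness unchanged but shows up
# as a large contrast difference:
print(similarity_cues(1.0, 0.0, 0.0, 1.0))  # (0.0, 2.0)
```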
The edges shown in Figure 11.4 are consistent in contrast polarity: light matches light and dark
matches dark. However, it has been argued that grouping mechanisms should be insensitive to
contrast polarity (Grossberg and Mingolla 1985; Kellman and Shipley 1991), since polarity can
easily reverse along an object boundary due to variations in the background. On the other hand,


Fig. 11.8  (a) The two angles formed when interpolating between two oriented elements are negatively
correlated. (b) Linear recoding into parallelism and cocircularity cues results in a more independent code.
Adapted from James H. Elder and Richard M. Goldberg, Ecological statistics of Gestalt laws for the perceptual
organization of contours, Journal of Vision, 2(4), figure 8 a and b, doi: 10.1167/2.4.5 © 2002, Association for
Research in Vision and Ophthalmology.

while Elder and Goldberg (2002) restricted their statistical study to pairs of elements of the same
contrast polarity, they observed that fewer than 13% of the associations in their original ground
truth dataset involved a reversal in contrast polarity. This suggests that contrast polarity could in
fact be an important cue for contour grouping. Is there behavioural evidence that humans take
advantage of this cue?
Although the psychophysical record is a bit complex, the simple answer to this question is
yes. For example, contrast reversals are known to essentially eliminate the perception of Glass
patterns (Glass and Switkes 1976), consistent with the use of polarity to disambiguate grouping.
Similarly, Elder and Zucker (1993) found that contrast reversal eliminated the benefit of bound-
ary grouping cues in fragmented contour stimuli, and Field, Hayes, and Hess (2000) found that
contrast reversals reduced the detectability of contours embedded in random-oriented element
distractors. Further, while Rensink and Enns (1995) found that polarity reversal did not appear
to weaken the contour grouping required to elicit the Müller-Lyer illusion, Chan and Hayward
(2009) found that careful control of junction effects does reveal a sensitivity to contrast polarity
in this illusion.
On the other hand, Gilchrist et  al. (1997) found that the effect of contrast on pairwise ele-
ment grouping depends on the shape of the elements, and, using modified forms of the Elder
and Zucker stimuli, Spehar (2002) found that the effect of contrast reversal was greatly reduced
if the reversal does not coincide with an orientation discontinuity. Together, these results suggest
an interesting perceptual interaction between geometric relationships such as good continuation
and similarity cues.
While these behavioural results all involve simple synthetic stimuli, Geisler and Perry (2009)
have more recently reported a joint study of the ecological statistics of contours with a compan-
ion psychophysical investigation modelled on these statistics. This study not only confirmed and
quantified the contrast polarity cue for natural scenes, but showed that humans do in fact take
advantage of this cue, in a way that is consistent with the underlying statistics.

Cue combination
One of the central questions in perceptual organization concerns how the brain combines mul-
tiple cues to determine the association between pairs of local elements. Historically this problem
has often been posed in terms of competitive interactions. In natural scenes, however, disparate
weak cues can often combine synergistically to yield strong evidence for a particular grouping.
It is perhaps this aspect of perceptual organization research that has benefited the most from the
modern probabilistic approach (see also both chapters by Feldman, this volume).
Geisler et al. (2001) used a non-parametric statistical approach, jointly modelling the ecological
statistics of proximity and good continuation cues as a 3D histogram. They showed that human
psychophysical performance on a contour detection task parallels these statistics, suggesting that
the brain combines these two classical Gestalt cues in a near-optimal way. Elder and Goldberg
(2002) demonstrated that the ecological statistics of proximity, good continuation, and similarity
cues can be coded in such a way as to be roughly uncorrelated, so that to a first approximation
the Gestalt laws can be factored: the likelihood of a particular grouping can be computed as the
product of the likelihoods for each individual grouping cue.
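Under this factorization the combined evidence is just the product of per-cue likelihood ratios; a minimal sketch (the function name and the numerical ratios are illustrative only, not estimates from the cited studies):

```python
from math import prod

def grouping_likelihood_ratio(cue_ratios):
    """Combine approximately independent Gestalt cues by multiplying their
    individual likelihood ratios P(cue | same contour) / P(cue | random pair),
    in the spirit of Elder and Goldberg (2002)."""
    return prod(cue_ratios)

# Three individually weak cues can combine into strong evidence for grouping:
ratios = {'proximity': 3.0, 'good_continuation': 2.0, 'similarity': 1.5}
print(grouping_likelihood_ratio(ratios.values()))  # 9.0
```

This multiplicative form is exactly the synergy noted above: each cue alone is equivocal, but their product can decisively favour one grouping.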
Elder and Goldberg’s approach also allowed quantification of the statistical power of each
Gestalt cue, which they quantified as the reduction in the entropy of the grouping decision deriv-
ing from observation of the cue. They found that the cue of proximity was by far the most power-
ful, reducing the entropy by roughly 75%, whereas good continuation and similarity cues, while
important, reduced entropy by roughly 10% each. They further demonstrated that the most accu-
rate grouping decisions are made by combining all of these cues optimally according to the proba-
bilistic model, trained on the ecological statistics of natural images.
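The statistical-power measure can be sketched as the mutual information between a cue and the binary grouping decision, i.e. the expected reduction in decision entropy. A minimal sketch (the toy numbers are illustrative and do not reproduce the published estimates):

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a binary grouping decision with P(group) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def entropy_reduction(prior, posteriors, cue_probs):
    """Expected drop in decision entropy after observing a cue that takes its
    i-th value with probability cue_probs[i], updating P(group) to posteriors[i]."""
    expected_posterior_h = sum(q * entropy(p) for q, p in zip(cue_probs, posteriors))
    return entropy(prior) - expected_posterior_h

# A cue that usually resolves the decision almost completely removes
# roughly 71% of the initial uncertainty:
gain = entropy_reduction(prior=0.5, posteriors=[0.95, 0.05], cue_probs=[0.5, 0.5])
print(round(gain / entropy(0.5), 2))  # 0.71
```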

Global contour extraction


In order to exploit these local Gestalt cues for contour grouping, we must somehow relate the
local pairwise probabilities linking two oriented elements to the probability of a global curve.
Geisler et al. (2001) proposed using a threshold on the local probability and a simple rule of ‘tran-
sitivity’: if element A groups with element B, and element B groups with element C, then declare
that element A must group with element C. This principle matches the set statistics studied by


Fig. 11.9  Common topological errors resulting from feed-forward grouping algorithms. (a) Bifurcations
that can result from a transitivity rule. (b–c) Self-intersections that can also be produced by shortest-path
algorithms. The intersections in (b) have non-unit rotation indices and can thus be weeded out easily;
however the contour in (c) has the correct rotation index and therefore is more difficult to detect.
(a) Reprinted from Vision Research, 41(6), W.S. Geisler, J.S. Perry, B.J. Super, and D.P. Gallogly, Edge co-occurrence
in natural images predicts contour grouping performance, pp. 711–24, Copyright (2001), with permission from
Elsevier. Adapted from James H. Elder and Stephen W. Zucker, ‘Computer Contour Closure’. In Bernard Buxton
and Roberto Cipolla (eds), Proceedings of the 4th European Conference on Computer Vision, pp. 399–412,
DOI: 10.1007/BFb0015553 Copyright © 1996, Springer-Verlag. With kind permission from Springer Science and
Business Media.

Geisler et al. (2001), which do not discriminate the sequencing of elements along the contour.
However, as a consequence, this transitivity principle does not discriminate between simple (i.e.
non-intersecting) curves and more complex topologies, including contours with bifurcations
and intersections (Figure 11.9), and generally yields ‘textures’ of oriented elements as opposed
to bounding contours. For this reason, we will focus here on a common probabilistic approach,
which is to model contours as first-order Markov chains.
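Before turning to the Markov approach, note that the transitivity rule amounts to computing connected components of the above-threshold association graph, which a union-find structure does directly; a minimal sketch (my own illustration, not the authors' implementation):

```python
class UnionFind:
    """Disjoint-set structure with path compression."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def transitive_grouping(n_elements, strong_pairs):
    """Transitivity rule (Geisler et al. 2001): elements joined by any chain
    of above-threshold pairwise associations end up in the same group."""
    uf = UnionFind(n_elements)
    for a, b in strong_pairs:
        uf.union(a, b)
    groups = {}
    for e in range(n_elements):
        groups.setdefault(uf.find(e), []).append(e)
    return list(groups.values())

# Elements 0-1 and 1-2 each group pairwise, so 0 and 2 are grouped too:
print(transitive_grouping(4, [(0, 1), (1, 2)]))
```

The sketch also makes the topological weakness visible: nothing prevents a component from containing bifurcations or crossings, since element order along the contour is never represented.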

The Markov assumption


A general probabilistic model for a contour as a discrete sequence of k-oriented elements involves
a joint distribution of dimensionality k: far too much to learn for any biological or machine vision
system without some additional assumptions. A  common assumption is that this joint distri-
bution factors along the sequence, so that the likelihood that a specific sequence of edges cor-
responds to a real contour in the image can be expressed as the product of the probabilities of
each local pairwise association between adjacent edges in the sequence (Elder and Zucker 1996;
Elder and Goldberg 2001; Cohen and Deschamps 2001; Elder, Krupnik, and Johnston 2003). This
assumption greatly simplifies the probabilistic model: the local pairwise grouping probabilities
are now sufficient statistics for computing maximum probability contours, and it becomes natural
to represent the grouping problem as a graph, where the vertices of the graph represent the ori-
ented elements in the image and the edges of the graph represent sequential grouping hypotheses
between pairs of elements. Simple contours are then represented as acyclic paths in this graph,
and the maximum probability contour connecting two elements in the image is represented as the
most probable path in this graph connecting the two corresponding vertices.
Critically, the Markov property also confers an optimal substructure property: any piece of a
maximum probability contour must itself have maximum probability. This property allows maxi-
mum probability contours to be computed progressively in polynomial time, via shortest-path
methods such as Dijkstra’s algorithm or dynamic programming (Elder and Zucker 1996; Elder
et al. 2003).
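Under the Markov factorization, maximizing a product of pairwise probabilities is equivalent to minimizing the sum of their negative logarithms, so standard shortest-path machinery applies directly. A minimal sketch using Dijkstra's algorithm (the toy graph and probabilities are illustrative; the function assumes the goal is reachable):

```python
import heapq
import math

def most_probable_path(pairwise_prob, start, goal):
    """Dijkstra's algorithm over -log probabilities: the shortest path in this
    graph is the maximum-probability contour connecting start to goal under a
    first-order Markov model.

    pairwise_prob: dict mapping (element_a, element_b) -> grouping probability.
    """
    graph = {}
    for (a, b), p in pairwise_prob.items():
        graph.setdefault(a, []).append((b, -math.log(p)))
    dist, prev = {start: 0.0}, {}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(queue, (nd, nbr))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], math.exp(-dist[goal])

# The chain A-B-C (0.9 * 0.9 = 0.81) beats the direct link A-C (0.5):
probs = {('A', 'B'): 0.9, ('B', 'C'): 0.9, ('A', 'C'): 0.5}
path, prob = most_probable_path(probs, 'A', 'C')
print(path, round(prob, 2))  # ['A', 'B', 'C'] 0.81
```

The optimal substructure property is what licenses this greedy expansion: every prefix of the returned path is itself a maximum-probability path.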
In mapping this model to visual cortex, one might be concerned about the computation time
if inferring a contour requires sequentially passing probabilities between individual neurons rep-
resenting oriented elements. However the optimal substructure property raises the possibility of
a hierarchical computation: earlier visual areas could compute optimal fragments which are then
stitched together by later visual areas to infer optimal global contours, leading to a logarithmic
improvement in computation time (for related hierarchical algorithms for perceptual organiza-
tion, see Joo et al., this volume).
Many models and computer vision algorithms exploit local Gestalt cues using such a Markov
assumption, either explicitly or implicitly (e.g. Lowe 1985; Sha’ashua and Ullman 1988; Jacobs
1996; Elder and Zucker 1996; Mahamud, Thornber, and Williams 1999; Cohen and Deschamps
2001; Elder et al. 2003; Wang and Siskind 2003; Estrada and Elder 2006). For example, the shortest
path from each edge back to itself can be computed (Elder and Zucker 1996) in order to find the
maximum probability closed contours in an image, presumed to correspond to the boundaries of
the major objects in the scene. In interactive applications, users can specify starting and ending
edges, and the maximum probability contours connecting them can be computed (Mortensen
and Barrett 1995, 1998; Elder and Goldberg 2001; Cohen and Deschamps 2001).
A significant advantage of the probabilistic approach is that the parameters of the model can be
learned in a straightforward way from the ecological statistics of contour grouping (Geisler et al.
2001; Elder and Goldberg 2002), avoiding the ad hoc selection of algorithm parameters and opti-
mizing performance on natural scenes (Elder et al. 2003; Estrada and Elder 2006).

Limitations of the Markov assumption


Unfortunately, these first-order Markov models generally do not perform well on natural scenes
unless augmented by additional problem-domain knowledge (Elder et al. 2003) or user interac-
tion (Mortensen and Barrett 1995, 1998; Elder and Goldberg 2001; Cohen and Deschamps 2001).
There are a number of reasons for this. One is the problem of topology. Unlike the transitiv-
ity assumption, shortest path algorithms based upon the Markov assumption enforce the ordi-
nality constraint, and thus eliminate incorrect topologies caused by bifurcation (Figure 11.9a).
Unfortunately, these algorithms are still not guaranteed to extract a contour of the correct topology
as embedded in the image plane (Elder and Zucker 1996). Filtering the output of the algorithm to
retain only those curves with unit rotation index does eliminate some incorrect topologies (Figure
11.9b), but this breaks the optimality of the algorithm, and other incorrect topologies will still
exist that cannot be filtered out as easily (Figure 11.9c).
A second major problem is that the Markov property restricts the prior over contour length
to have an exponential form, and this prior cannot be changed within the constraints of
polynomial-time shortest-path algorithms. This induces a prior bias towards small contours, so
that algorithms tend to extract only small parts of a shape rather than an entire shape.
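The length bias follows directly from the factorization: if the chain extends with some fixed probability at each step and terminates otherwise, the implied prior over contour length is geometric, hence monotonically decreasing, whatever the continuation probability. A minimal sketch (my own illustration):

```python
def markov_length_prior(p_continue, k):
    """Prior probability of a contour of k links under a first-order Markov
    chain that extends with probability p_continue and stops otherwise.
    This geometric distribution is monotonically decreasing in k, so the
    model is a priori biased toward short contours."""
    return (1 - p_continue) * p_continue ** k

priors = [markov_length_prior(0.9, k) for k in (0, 10, 50)]
print([round(p, 4) for p in priors])  # [0.1, 0.0349, 0.0005]
```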
Finally, it has been shown that real object boundaries are not in fact strictly Markov (Ren,
Fowlkes, and Malik 2008), signalling that higher-order statistical properties of shape may be
important in distinguishing correct contours. Yet the Markov restriction means that these
higher-order, more global properties of object shape cannot be used to help the algorithm distin-
guish real object boundaries from conjunctions of fragments that should not be grouped together.
(See Feldman, ‘Probabilistic Models of Features and Objects’, this volume, for a more complete
discussion of local vs global features.)
An alternative is to explicitly incorporate into the probabilistic model a realistic prior over con-
tour length, and to explicitly detect and filter out topological errors as they occur. Unfortunately,
these modifications cannot be accommodated within the framework of efficient polynomial-time
shortest-path algorithms. Instead, one can apply breadth-first search techniques with pruning
that monotonically extend current contour hypotheses by selecting the most probable continua-
tions, but such approximate methods are not guaranteed to find the most probable contours and
in practice do not work that well without further constraints. An example from Elder et al. (2003)
is shown in Figure 11.10 (right column). The algorithm proceeds by greedy search over the expo-
nential space of possible contours, monotonically increasing the length of the contour hypotheses,
and pruning those of lower probability. As can be seen in this example, closed contours cor-
responding to parts of objects can sometimes be computed in this way, but for complex scenes
it is rare that the entire object boundary is recovered exactly, unless additional domain-specific
constraints are brought to bear. (The remainder of Figure 11.10 will be discussed in Section 7.2.)
These limitations can sometimes be managed if there are additional contextual constraints that
can be used to narrow the problem. For example, in interactive applications users can guide the algo-
rithm to connect a small number of specified points on the boundary of interest, effectively break-
ing the problem down into more manageable subproblems (Mortensen and Barrett 1995, 1998;
Elder and Goldberg 2001; Cohen and Deschamps 2001). In search applications, where something is
known about the objects of interest, appearance constraints can be incorporated into the local prob-
abilities to reduce the effects of clutter. Figure 11.11 shows an example where the goal is to extract
bounding contours of skin regions (Elder et al. 2003). Here the hue of the skin is a sufficiently strong
constraint to yield the correct global contours. On the other hand, humans seem able to organize
contours in cluttered natural scenes even without such strong constraints. This suggests that we must
be able to exploit more general global cues not captured by the first-order Markov model.

Fig. 11.10  Contour grouping algorithms. Right column: single scale. Left three columns: multi-scale,
with coarse-to-fine feedback.
© 2006 IEEE. Reprinted, with permission, from Estrada, F.J., Elder, J.H., Multi-Scale Contour Extraction Based on
Natural Image Statistics, IEEE Conference on Computer Vision and Pattern Recognition Workshop.

Fig. 11.11  Using the first-order Markov model with a strong prior for skin hue.
© 2006 IEEE. Reprinted, with permission, from Johnston, L., & Elder, J. H., Efficient Computation of Closed
Contours using Modified Baum-Welch Updating. IEEE Workshop on Perceptual Organization in Computer Vision.

Going global: beyond the first-order model


Among the many possible global shape cues that might drive the perceptual organization of con-
tours, there are four that have been studied in some detail:  closure, convexity, symmetry, and
parallelism. I will review what is known about each below.

Closure
The classical Gestalt demonstration shown in Figure 11.12 is often taken to demonstrate a princi-
ple of closure overcoming the principle of proximity to determine the perceptual organization of
contours (Koffka 1935). Note, however, that the percept here can potentially be explained as the
result of a principle of good continuation, without requiring the invention of a separate factor of
closure. This close relationship between good continuation and closure has continued to confound
interpretation in more recent work. Using the methodology of Field et al. (1993), Kovács and Julesz (1993) found
superior detection performance for closed, roughly circular contours, compared to open curvilin-
ear controls. However, the good continuation cues between the open and closed stimuli were not
perfectly equated in these experiments. For example, the open controls contained many inflections
in curvature, whereas the closed contours were nearly circular. These differences are important, as
it has been shown that changes in curvature sign can greatly reduce the detectability of contours
(Pettet 1999).
Tversky, Geisler, and Perry (2004) addressed this question directly, using the Field et al. (1993)
methodology to compare detection for circular contours and S-shaped contours matching the
circular contours exactly in curvature, save for a single inflection point. They found a small advan-
tage for closed contours, but argued that this advantage could potentially be due to probabil-
ity summation over smaller groups of elements. Thus, despite its long history in the perceptual
organization literature, recent findings suggest that closure may play at most a minor role in the
detection of contours.
Does this mean that the Gestaltists were wrong? Not necessarily. Koffka’s observations were not
that closure is a grouping cue per se, but rather that closure somehow profoundly determines the
final percept of form:
Ordinary lines, whether straight or curved, appear as lines and not as areas. They have shape, but they
lack the difference between an inside and an outside . . . If a line forms a closed, or almost closed, figure,
we see no longer merely a line on a homogeneous background, but a surface figure bounded by the line.
(Koffka 1935, p. 150)

The Gestaltists thus believed that closure, above and beyond the cue of good continuation,
determines the percept of solid form. In this spirit, Elder and Zucker (1993, 1994, 1998a)
argued for closure as a perceptual bridge from 1D contour to 2D shape, i.e. as a perceptual
form of the Jordan Curve Theorem (see ‘Introduction’). They investigated this idea through
a series of 2D shape discrimination experiments in which they manipulated the degree of

Fig. 11.12  The role of closure in perceptual organization. One perceives four large rectangles even
though this requires grouping together more distant pairs of contour fragments.
Reproduced from Kurt Koffka, Principles of Gestalt Psychology, Harcourt, Brace, and World, New York, Copyright
© 1935, Harcourt, Brace, and World.

closure, but held the shape information constant. They showed that small changes in good
continuation and closure could yield large changes in shape discriminability (Figures 11.13a–
b). Moreover, the task seems to remain fairly difficult when good continuation is restored
without closure (Figure 11.13c), suggesting that the property of closure contributes something
above and beyond good continuation cues. In support of this, Garrigan (2012) has recently
shown that contour shape is more effectively encoded in memory when the contour is closed
than when it is open.
Some models for global contour extraction based on the first-order Markov assumption incor-
porate closure by explicitly searching for closed cycles of local elements (Elder and Zucker 1996;
Elder et al. 2003), but these first-order Markov models still suffer from the problems discussed
above. Moreover, the statistical structure of a cycle is profoundly different from that of a Markov
chain, as closure induces more global statistical dependencies between local elements. In this
sense there is a mismatch between the first-order Markov model used by these methods and the
goal of recovering closed contours. Future work will hopefully reveal more principled ways to
incorporate closure into models of global contour extraction: in ‘Generative Models of Shape’ we
discuss one promising direction.
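This mismatch can be made concrete with a minimal sketch (a toy, proximity-only affinity of our own devising, not any of the cited models): under a first-order Markov model, the score of a contour is a product of purely local pairwise affinities, so closing the contour contributes only one additional local link and no term ever couples non-adjacent elements.

```python
import numpy as np

def pairwise_affinity(p, q, sigma_d=10.0):
    """Toy local grouping affinity based on proximity alone (a real model
    would also include good continuation and similarity terms)."""
    return np.exp(-np.sum((np.asarray(p) - np.asarray(q)) ** 2) / (2 * sigma_d ** 2))

def chain_log_prob(points, closed=False):
    """First-order Markov score: a sum of purely local log-affinities over
    successive pairs. Closing the contour contributes only one extra local
    link (last -> first); no term couples non-adjacent elements."""
    pairs = list(zip(points, points[1:]))
    if closed:
        pairs.append((points[-1], points[0]))
    return sum(np.log(pairwise_affinity(p, q)) for p, q in pairs)

# Four corners of a square, side 10.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
open_score = chain_log_prob(square, closed=False)   # 3 local links
closed_score = chain_log_prob(square, closed=True)  # 4 local links
```

Global regularities of cycles, such as the constraint that the tangent direction must rotate through a net 360 degrees, are invisible to this purely local score.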

Convexity
Convexity has long been known as a figure/ground cue (Rubin 1927) (see also the chapters by
Peterson, by Fowlkes and Malik, and by Kogo and van Ee in this volume). In the computer vision
literature, Jacobs (1996) demonstrated its utility for grouping contour fragments that can then be
used as features for object recognition, and Liu, Jacobs, and Basri (1999) subsequently developed
a novel psychophysical method to demonstrate that the human visual system also uses a convex-
ity cue for grouping contours. Their method relies on the finding of Mitchison and Westheimer
(1984) that judging the relative stereoscopic depth of two contour fragments becomes more dif-
ficult when the fragments are arranged to form a configuration with good continuation and
closure. Using an elaboration of this method, they showed that stereoscopic thresholds are sub-
stantially higher for occluded contour fragments that can be completed to form a convex shape,
relative to fragments whose completion induces one or more concavities. This suggests that the
visual system is using convexity as a grouping cue. A more recent computer vision algorithm


Fig. 11.13  Closure as a bridge from 1D to 2D shape. (a) Shape discrimination is easy when good
continuation and closure are strong. (b) Discrimination becomes hard when good continuation and
closure are weak. (c) Discrimination is of intermediate difficulty when good continuation is strong but
closure is weak.
Reprinted from Vision Research, 33 (7), James Elder and Steven Zucker, The effect of contour closure on the rapid
discrimination of two-dimensional shapes, pp. 981–91, Copyright © 1993, with permission from Elsevier.
222 Elder

that uses convexity as a soft cue, allowing contours that are highly but not perfectly convex, has
been shown to outperform Jacobs’s original algorithm on a standard dataset (Corcoran, Mooney,
and Tilton 2011).
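The notion of convexity as a soft, graded cue can be illustrated with a simple turning-sign measure (a sketch under our own conventions, not the algorithm of Corcoran et al.): score a closed polygon by the fraction of its vertices that turn in the dominant direction.

```python
import numpy as np

def convexity_score(polygon):
    """Fraction of vertices turning in the dominant direction: 1.0 for a
    perfectly convex polygon, decreasing as concavities are introduced."""
    P = np.asarray(polygon, dtype=float)
    n = len(P)
    turns = []
    for i in range(n):
        a, b, c = P[i - 1], P[i], P[(i + 1) % n]
        v1, v2 = b - a, c - b
        # Sign of the 2D cross product gives the turning direction at b.
        turns.append(np.sign(v1[0] * v2[1] - v1[1] * v2[0]))
    turns = np.asarray(turns)
    dominant = 1.0 if (turns > 0).sum() >= (turns < 0).sum() else -1.0
    return float((turns == dominant).mean())

square = [(0, 0), (1, 0), (1, 1), (0, 1)]              # perfectly convex
dented = [(0, 0), (1, 0), (0.5, 0.5), (1, 1), (0, 1)]  # one concavity
```

A strict convexity test would demand a score of exactly 1.0; treating the score as a graded quantity admits contours that are highly but not perfectly convex.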

Symmetry and parallelism


The Gestaltists identified symmetry as a factor of ‘good shape’, and a determinant of figure/ground
organization (Koffka 1935) (see also Peterson, this volume, and van der Helm, this volume). In the
computer vision literature, symmetry has been used in numerous contour grouping algorithms
(e.g. Mohan and Nevatia 1992; Zisserman et al. 1995; Stahl and Wang 2008). Kanizsa (1979), how-
ever, has observed that symmetry appears easily overruled when pitted against principles of good
continuation and convexity. Parallelism has been identified as a factor determining the perceptual
simplicity of line configurations (Arnheim 1967), and as a grouping cue in computer vision algo-
rithms (Lowe 1985; Jepson, Richards, and Knill 1996; Jacobs 2003).
Despite this relatively long history, definitive psychophysical evidence for the role of symmetry
and parallelism in contour grouping has come relatively recently. Using psychophysical meth-
ods derived from the attention literature (Behrmann, Zemel, and Mozer 1998), Feldman (2007)
showed that comparison of features lying on pairs of line segments is significantly faster if the
segments are parallel or mirror-symmetric, suggesting a fast grouping of the segments based
upon these cues. Using the paradigm of Field et al. (1993), Machilsen, Pauwels, and Wagemans
(2009) have recently demonstrated enhanced detectability of bilaterally symmetric vs asymmetric
closed forms, suggesting a role for more complex, global symmetry processing in contour group-
ing. Physiologically, it is known that bilaterally symmetric patterns differentially activate human
extrastriate visual areas V3, V4, V7, and LO, and homologous areas in macaque cortex (Sasaki
2007).

Feedback
We have seen the importance of both local cues and global cues in the perceptual organization
of contours. How could these most effectively be brought together, given what is known of the
functional architecture of primate visual cortex?
In contrast to V1, many neurons in extrastriate visual area V2 of macaque are selective for
both real and illusory contours (von der Heydt, Peterhans, and Baumgartner 1984; see also van
Lier and Gerbino, and Kogo and van Ee, this volume). Illusory contours are the result of modal
completion processes (see ‘Introduction’) that generate percepts of contours in the absence of
local contrast, by extrapolating from nearby, geometrically aligned inducers—see Figure 11.15
(bottom right) for an example. Illusory contours are thus a direct manifestation of contour
grouping processes, in this case the result of grouping together contour fragments on spatially
separated inducers. The selectivity of neurons in V2 for illusory contours suggests that the
transformation of the visual input from V1 to V2 involves the grouping of contour fragments
based upon Gestalt principles of proximity and good continuation. This computation may be
supported by long-range horizontal connections that, at least in areas 17 and 18 of cat, are
known to run between cortical columns with similar orientation specificity (Gilbert and Wiesel
1989), although input from later visual areas may be equally or even more important in this
computation.
Indeed, while physiological models for contour integration based upon good continuation
principles have been based primarily upon these cortical networks in area V1 and V2 (Li 1998;
Yen and Finkel 1998), fMRI data in both human and macaque implicate not only V1 and V2 but
other extrastriate visual areas (VP, V4, LOC) in contour grouping. Although sketches of a more

complete physiological model for contour grouping have begun to emerge (e.g. Roelfsema 2006),
the overall computational architecture is still largely unknown.
One possibility is that the computation is feedforward. For example, progressively more global
and selective representations may be computed in V1, V2, V4, culminating in a neurally local-
ized representation of entire objects in TE/TEO (Thorpe 2002; see also Joo et al., this volume).
However, the functional architecture of visual cortex suggests that recurrent feedback might also
be involved. Figure 11.14(b) shows the known connectivity of visual areas in the object pathway
of primate brain. In addition to the feedforward sequence V1 → V2 → V4 → TE/TEO emphasized
in prior work (Thorpe 2002), there are feedback connections from each of the later areas to each
of the earlier areas, as well as additional feedforward connections. How can we determine empiri-
cally if these feedback connections play a role in the perceptual organization of contours into
representations of global shape?

Timing
One way to test the plausibility of computational architectures for perceptual organization is to
examine the timing of stimulus-driven perceptual and neural events relative to the stimulus onset
and to each other. Here I will review a range of results using varied methodological paradigms
that together suggest a strong role for feedback in the perceptual organization of contours.

Animal detection
Some models of contour formation have been based upon recurrent interactions within and
between areas V1 and V2 (e.g. Neumann and Sepp 1999; Gintautas et al. 2011). However, psycho-
physical results on the animal detection task (Figure 11.2) show that humans can perform above
chance using contour shape alone for stimulus presentations as short as 10 msec, even with strong


Fig. 11.14  Feedback in the human object pathway. (a) Feedback of global shape hypotheses may be
used to condition grouping in earlier visual areas. (b) Connectivity in primate object pathway. Solid
arrowheads indicate feedforward connections, open arrowheads indicate feedback connections.
From Leslie G. Ungerleider, Functional Brain Imaging Studies of Cortical Mechanisms for Memory, Science 270
(5237), pp. 769–775, Copyright © 1995, The American Association for the Advancement of Science. Reprinted
with permission from AAAS.

backward masking (Elder and Velisavljević 2009). While inferring underlying mechanisms from
these results is complicated by the unknown degree of temporal blurring in the cortical network,
roughly speaking this result suggests that at least on some trials, recurrences involving delays
much greater than 10 msec may not be involved, and this constrains the class of computations
that might underlie performance on these specific trials. For example, Gintautas et al. (2011) have
modelled contour detection based upon a lateral connection network in V1, estimating that each
iteration of the network should take on the order of 37.5 msec. This appears to be too long to
explain the most rapid trials in the animal detection task.
On the other hand, Elder and Velisavljević (2009) also found that performance on the animal
task improves continuously up to at least 120-msec stimulus duration, leaving open the pos-
sibility of recurrence for harder trials. Similarly, in animal detection experiments measuring
reaction time (e.g. Thorpe, Fize, and Marlot 1996), most attention has focused on the fastest
trials, where evoked potentials correlated with the stimulus emerge as soon as 150 msec after
stimulus onset, leaving little time for recurrence or feedback. Average reaction times, however,
are much longer, closer to 500 msec, and the distribution has a long positive tail with many
reaction times greater than 600 msec, leaving ample time for recurrence and/or feedback for
most trials. Further, more recent evidence suggests that visual signals may arrive in higher
areas much faster than previously thought (Foxe and Simpson 2002), allowing sufficient time
for feedback even on the faster trials (see also Self and Roelfsema, this volume, on the limits of
feed-forward processing).

Border ownership
Physiologically, it is known that selective response to higher-order contour properties depend-
ent upon contour grouping emerges later in time. For example, in V2, while edge signals emerge
within 30 msec of stimulus onset and peak roughly 100 msec post-stimulus, border-ownership
signals emerge roughly 80 msec after stimulus onset, peaking 130–180 msec post-stimulus.
Importantly, this delay does not appear to depend upon the spatial extent of the contour, arguing
against lateral recurrence and suggesting instead a role for feedback from higher visual areas with
a round-trip time delay of 30–80 msec (Craft et al. 2007; see also Kogo and van Ee, this volume).

Illusory contours and TMS


Another window on the cortical mechanisms underlying contour grouping is provided by experi-
ments employing transcranial magnetic stimulation (TMS). Applied to early visual areas, TMS
blocks the perception of briefly presented stimuli when applied 30 msec prior to stimulus onset
and up to 50 msec after stimulus onset (Corthout et al. 1999). Intriguingly, TMS is also effective
in blocking stimulus perception when applied during a second time window, 80–120 msec after
stimulus onset (Walsh and Cowey 1998; Lamme and Roelfsema 2000), again suggesting a role for
feedback, this time with a round-trip time delay of 30–150 msec.
Numerous studies have suggested an involvement of feedback from temporal areas to V1
and V2 in the formation of illusory contour percepts (Halgren et al. 2003; Murray, Bennett,
and Sekuler 2002; Yoshino et al. 2006), but a more recent TMS study (Wokke et al. 2013) pro-
vides perhaps the most direct evidence for the causal role of feedback in bridging the gap from
one-dimensional contour fragments to the perception of global shape. Human observers were
shown pairs of illusory shape stimuli (Figure 11.15, lower right). In one stimulus the inducers
were aligned to form an illusory square, while in the other the inducers were rotated slightly to
create a curved illusory shape. Observers were asked to judge which of the stimuli more closely
resembled a square. On some trials TMS was applied, either at the occipital pole to disrupt
processing in V1/V2, or in the lateral occipital lobe to disrupt processing in LO. Application of

[Figure panels: V1/V2 and LO; correct responses (%) as a function of TMS time window (ms).]

Fig. 11.15  Evidence for the role of feedback in bridging the dimensional gap. TMS was found to disrupt
illusory contour shape judgments later when applied to V1/V2 than when applied to LO – see text for
details.
Reproduced from Martijn E. Wokke, Annelinde R.E. Vandenbroucke, H. Steven Scholte, Victor A.F. Lamme,
Psychological Science, Confuse Your Illusion: Feedback to Early Visual Cortex Contributes to Perceptual
Completion, 24 (9), pp. 63–71, © 2013, SAGE Publications. Reprinted by Permission of SAGE Publications.

TMS was found to disrupt performance at both locations, but interestingly, the effect depended
critically on the timing. In LO, TMS disrupted processing when the pulse occurred 100–122
msec after stimulus onset, whereas in V1/V2, processing was disrupted when the pulse was
applied later, 160–182 msec after stimulus onset. This is strongly suggestive of a feedback pro-
cess in the grouping of inducer contour fragments to form shape percepts, with a one-way
feedback time constant (LO to V1/V2) of 40–80 msec.

In summary, numerous behavioural and physiological results suggest a role for feedback in
bridging the gap from contour to shape. One purpose of this feedback might be to allow global
features computed and available first in higher visual areas to condition the local associations
computed in V1/V2. In order to further develop this idea, a more formal computational theory
is called for.

Computational models
Using local Gestalt cues alone to drive shortest-path or approximate search algorithms based on
the first-order Markov assumption fails in the general case. However, Estrada and Elder (2006)
have demonstrated that a relatively simple elaboration of the approximate search scheme can sub-
stantially improve performance. The idea is to place the Markov model within a coarse-to-fine
scale-space framework (Figure 11.10—left three columns). Specifically, the image is represented
at multiple scales (i.e. levels of resolution) by progressive smoothing with a Gaussian filter, and
breadth-first search is first initiated at the coarsest scale. Since the number of features at this scale
is greatly reduced, the search space is much smaller and the algorithm generally finds good, coarse
blob hypotheses that code the rough location and shape of the salient objects in the scene. These
hypotheses are then fed back to the next finer level of resolution, where they serve as probabil-
istic priors, conditioning the likelihoods and effectively shrinking the search space to promising
regions of the image.
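A minimal sketch of this coarse-to-fine scheme (using crude box smoothing and a single bright blob in place of real contour features; this is an illustration, not the published implementation): locate the strongest hypothesis at the coarse scale, then search at the fine scale only within a window around it.

```python
import numpy as np

def smooth(img, scale):
    """Crude separable box smoothing, standing in for Gaussian filtering."""
    k = 2 * scale + 1
    kernel = np.ones(k) / k
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 0, rows)

def coarse_to_fine_peak(img, scales=(8, 2)):
    """Find the strongest blob at the coarsest scale, then refine at the
    finer scale only within a window around the coarse hypothesis: the
    coarse result acts as a prior that shrinks the fine-scale search space."""
    y, x = np.unravel_index(np.argmax(smooth(img, scales[0])), img.shape)
    w = 2 * scales[0]
    fine = smooth(img, scales[1])
    y0, x0 = max(0, y - w), max(0, x - w)
    window = fine[y0:y + w + 1, x0:x + w + 1]
    dy, dx = np.unravel_index(np.argmax(window), window.shape)
    return y0 + dy, x0 + dx

rng = np.random.default_rng(0)
img = rng.normal(0.0, 0.1, (64, 64))
img[30:34, 40:44] += 1.0  # one small bright "object" in noise
peak = coarse_to_fine_peak(img)
```

Because the coarse scale has far fewer distinct features, its argmax is cheap and reliable; the fine-scale search then touches only a small fraction of the image.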
This is a very specific kind of feedback model that does not incorporate any sophisticated
global features or probabilistic model over shapes, and is not really recurrent, but it does dem-
onstrate the potential performance advantages of feedback. A number of more general models
for incorporating feedback into perceptual organization have been advanced (Grossberg 1976;
Cavanagh 1991; Hochstein and Ahissar 2002; Lee and Mumford 2003; Tu et al. 2005; Yuille and
Kersten 2006; also Self and Roelfsema, and van Leeuwen, this volume). Figure 11.14a sketches a
conceptual model that is broadly consistent with these prior ideas. For concreteness, let us sup-
pose that earlier areas (e.g. V1, V2) in the visual pathway compute and encode specific partial
grouping hypotheses corresponding to fragments of contours. These fragment hypotheses are
communicated to higher-order areas (e.g. V4 or TEO), which use them and more global princi-
ples to generate complete hypotheses of object shape. These global hypotheses are then fed back
to earlier visual areas to sharpen selectivity for other fragments that might support these global
hypotheses.
Neurons in higher areas of the object pathway in primate visual cortex encode shape informa-
tion using a more global representation than neurons in early visual areas (Pasupathy and Connor
1999; Connor, Brincat, and Pasupathy 2007; see also van Leeuwen, this volume). In order to feed
back useful information, the brain must be able to convert this global representation to the more
local, spatiotopic representation native to these earlier areas. Because there will always be uncer-
tainty about the shapes being represented (due to grouping ambiguity, for example), this mapping
is probabilistic. A probabilistic model capable of randomly generating observed data consistent
with an internal representation is known as a generative model. One of the great strengths of a
generative model of shape is its capacity to produce probable global shape hypotheses given even
partial shape information, thus contributing to the grouping process. In the final part of this chap-
ter we consider what form such a generative model might take.

Generative models of shape


While there are many computational theories and algorithms for shape representation, few are
truly generative, and those that are have generally not been fully developed and tested (e.g. Leyton

1988). A  key problem in establishing a generative model of shape is to guarantee that gener-
ated shape hypotheses have valid topology. For example, if the goal is to recover a simple closed
contour, the model should only generate simple, closed curve hypotheses. While this has been a
major limitation of prior contour-based models (e.g. Dubinskiy and Zhu 2003), a recently pro-
posed alternative approach based on spatial perturbations of perceptual space called formlets can
provide this guarantee (Grenander, Srivastava, and Saini 2007; Oleskiw, Elder, and Peyré 2010;
Elder et al. 2013).
The formlet approach involves the application of coordinate transformations of the planar space
in which a shape is embedded. This idea can be traced back at least to D’Arcy Thompson, who
considered specific classes of global coordinate transformations to model the relationship between
the shapes of different animal species (Thompson 1917). Coordinate transformation methods for
representing shape have been explored more recently in the field of computer vision (e.g. Jain,
Zhong, and Lakshmanan 1996; Sharon and Mumford 2006) and for developmental studies of
human shape selectivity and categorization (Ons and Wagemans 2011, 2012), but these methods
do not in general preserve the topology of embedded contours.
Formlets are based on the key insight that, while general smooth coordinate transformations
of the plane will not preserve the topology of an embedded curve, it is straightforward to design
a specific family of diffeomorphic transformations (i.e. smooth 1:1 mappings) that will. It then
follows immediately by induction that a generative model based upon arbitrary sequences of dif-
feomorphisms will preserve topology.
Specifically, a formlet is defined to be a simple, isotropic, radial deformation of planar space that
is localized within a circular region around a selected point in the plane. The formlet family
comprises formlets over all locations and spatial scales. While the gain of the deformation is also
a free parameter, it is constrained to satisfy a simple criterion that guarantees that the formlet is
a diffeomorphism. Since topological changes in an embedded figure can only occur if the defor-
mation mapping is either discontinuous or non-injective, these diffeomorphic deformations are
guaranteed to preserve the topology of embedded figures. Figure 11.16 shows some examples.
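As an illustrative sketch (the Gaussian windowing profile below is an assumption of this sketch, not the published formlet profile or gain criterion), the following applies one localized radial deformation to a discretized curve and checks monotonicity of the radial profile, which for an isotropic radial map is exactly the condition for injectivity:

```python
import numpy as np

def formlet(points, center, sigma, alpha):
    """One isotropic radial deformation localized around `center`.
    Each point moves along its ray from the center by a Gaussian-windowed
    gain (the Gaussian profile is this sketch's assumption)."""
    d = points - center
    r = np.linalg.norm(d, axis=1, keepdims=True)
    gain = 1.0 + alpha * np.exp(-(r ** 2) / sigma ** 2)
    return center + gain * d

def is_diffeomorphic(sigma, alpha, r_max=10.0, n=10000):
    """An isotropic radial map is injective iff its radial profile
    r -> r * (1 + alpha * exp(-r^2 / sigma^2)) is strictly increasing;
    check that monotonicity numerically."""
    r = np.linspace(0.0, r_max, n)
    f = r * (1.0 + alpha * np.exp(-(r ** 2) / sigma ** 2))
    return bool(np.all(np.diff(f) > 0))

# Deform a discretized unit circle; a gain satisfying the constraint
# yields a deformed curve that is still a simple closed contour.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
shape = formlet(circle, center=np.array([0.5, 0.0]), sigma=0.7, alpha=0.4)
```

With the gain bounded so that the radial profile stays monotone, each application is a diffeomorphism, and by induction any composition of such deformations preserves the topology of the embedded curve.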

Evaluation
One way to evaluate and compare generative shape models is to take advantage of their ability to
generate complete shape hypotheses given only partial data. Specifically, one can use the models

Fig. 11.16  Shapes generated by random formlet composition over the unit circle. Top row: shapes
resulting from a sequence of five random formlets. The red dot and circle indicate formlet location and
scale, respectively. Bottom row: example shapes produced from the composition of many random
formlets.
© 2010, IEEE. Adapted with permission, from T.D. Oleskiw, J.H Elder, and G. Peyré, On growth and formlets:
Sparse multi-scale coding of planar shape, IEEE Conference on Computer Vision and Pattern Recognition.

to address the problem of contour completion (Figure 11.3), using an animal shape dataset, based
on the conceptual model illustrated in Figure 11.14.
Elder et al. (2013) used this method to compare the formlet model with a contour-based shape-
let model (Dubinskiy and Zhu 2003) that is not guaranteed to preserve topology. For each shape
in the dataset, they simulated the occlusion of a single random section of the contour, and used
each model and a variation of matching pursuit (Mallat and Zhang 1993) to approximate the
animal shapes, allowing the models to see only the visible portions of the shapes. (Note that these
models could in principle handle more than one occlusion.) They then measured the residual
error between the model and target for both the visible and occluded portions of the shapes, as a
function of the number of model basis functions (shapelets or formlets) employed. Performance
on the occluded portions, where the model is under-constrained by the data, reveals how well the
structure of the model captures properties of natural shapes.
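The logic of this evaluation can be sketched in one dimension (a generic masked matching pursuit on a sinusoidal "shape signal"; the actual experiments used shapelet and formlet models on 2D animal contours): fit only the visible samples, then measure residual error separately on the visible and occluded portions.

```python
import numpy as np

def masked_matching_pursuit(signal, mask, dictionary, k):
    """Greedy matching pursuit fit to the visible samples only (mask==True).
    The reconstruction is returned over the full domain, so residual error
    can be measured separately on visible and occluded portions."""
    recon = np.zeros_like(signal)
    residual = signal.copy()
    for _ in range(k):
        # Least-squares coefficient of each atom against the visible residual.
        coeffs = dictionary[:, mask] @ residual[mask]
        norms = (dictionary[:, mask] ** 2).sum(axis=1)
        scores = coeffs ** 2 / np.maximum(norms, 1e-12)
        best = int(np.argmax(scores))
        recon = recon + (coeffs[best] / max(norms[best], 1e-12)) * dictionary[best]
        residual = signal - recon
    return recon

n = 256
x = np.linspace(0.0, 2.0 * np.pi, n)
signal = np.sin(3 * x) + 0.5 * np.cos(5 * x)
mask = np.ones(n, dtype=bool)
mask[80:130] = False  # simulate one occluded section
dictionary = np.stack([np.sin(f * x) for f in range(1, 9)]
                      + [np.cos(f * x) for f in range(1, 9)])
recon = masked_matching_pursuit(signal, mask, dictionary, k=4)
vis_rms = np.sqrt(np.mean((recon - signal)[mask] ** 2))
occ_rms = np.sqrt(np.mean((recon - signal)[~mask] ** 2))
```

Error on the occluded section, where the fit is unconstrained by data, measures how well the dictionary's structure captures the regularities of the signal family, which is precisely the comparison made between shapelets and formlets.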
Figure 11.17 shows an example result for this experiment. While shapelet pursuit intro-
duces topological errors in both visible and occluded regions, formlet pursuit remains topo-
logically valid, as predicted. Figure 11.18 shows quantitative results on a database of animal
shapes. While the shapelet and formlet models achieve comparable error on the visible por-
tions of the boundaries, on the occluded portions the error is substantially lower for the
formlet representation. This suggests that the structure of the formlet model better captures
regularities in the shapes of natural objects.

(Feed)back to the future


Human perceptual organization relies profoundly on contour-grouping mechanisms to recover
the boundaries of objects in the scene, and to infer their 2D and 3D shapes. Although classi-
cal local Gestalt cues such as proximity, good continuation, and similarity are very powerful, by
themselves they appear to be insufficient to support reliable global contour extraction in com-
plex natural scenes. This suggests that the human perceptual organization system is capable of
exploiting more global cues that are not easily accommodated by standard first-order Markov
models. The demonstrated performance advantages of coarse-to-fine methods for contour group-
ing (Estrada and Elder 2006), together with the massive feedback connections that are known to
pervade the primate object pathway (Van Essen et al. 1991; Ungerleider 1995), suggest that the human
brain may employ a recurrent computation to bring these global features to bear, allowing efficient

Fig. 11.17  Example of 30% occlusion pursuit with shapelets (red) and formlets (blue) for k = 0, 2, 4, 8,
16, 32 basis functions. Solid lines indicate visible contour, dashed lines indicate occluded contour.
Reprinted from Image and Vision Computing, 31(1), James H. Elder, Timothy D. Oleskiw, Alex Yakubovich, and
Gabriel Peyré, On growth and formlets: Sparse multi-scale coding of planar shape, pp. 1–13, Copyright © 2013,
with permission from Elsevier.

[Figure panels: 10% and 30% occlusion; normalized RMS error as a function of the number of components, for shapelet and formlet models, on visible and occluded portions.]
Fig. 11.18  Results of occlusion pursuit evaluation. The formlet model is substantially more accurate than the
shapelet model on the occluded portions of the shapes. Black denotes error for the initial affine-fit ellipse.
Reprinted from Image and Vision Computing, 31(1), James H. Elder, Timothy D. Oleskiw, Alex Yakubovich, and
Gabriel Peyré, On growth and formlets: Sparse multi-scale coding of planar shape, pp. 1–13, Copyright © 2013,
with permission from Elsevier.

and reliable global contour extraction in complex natural scenes. This idea is supported by recent
physiological results (Wokke et al. 2013).
While global cues such as closure, convexity, symmetry, and parallelism could potentially be
computed in higher areas of the object pathway and combined with local cues using standard cue
combination mechanisms, a more general theory identifies these higher areas with generative
shape representations capable of producing global shape ‘hallucinations’ based on contour frag-
ments computed in early visual cortex. These global shape hypotheses can then be fed back to
early visual areas to refine the segmentation.
The main problem in establishing such a generative model has been topology: prior models do
not guarantee that sampled shapes are simple closed contours. However, a recent novel framework
for shape representation provides this guarantee. The theory (Grenander et al. 2007; Oleskiw et al.
2010; Elder et  al. 2013), based upon localized diffeomorphic deformations of the image called
formlets, has its roots in early investigations of biological shape transformation (Thompson 1917).
The formlet representation is seen to yield more accurate shape completion than an alternative
contour-based generative model of shape, which should make it more effective at generating
global shape hypotheses to guide feedforward contour grouping processes.
While the nature of the computations underlying the perceptual organization of con-
tours into representations of shape is becoming clearer, there are still many unknowns. These
include: (1) What are the key statistical properties of shapes not captured by the first-order Markov
model? (2) To what degree is the human visual system tuned to these higher-order properties?
(3) How can a generative model like the formlet model be elaborated to accurately embody these
statistics? (4) How exactly do generated hypotheses condition selectivity in earlier visual areas?
We do not know exactly when these questions will be answered, but it seems certain that the
answer will come from the kind of closely coupled computational, behavioural and physiological
investigation that has led to recent progress in this field.

References
Arbelaez, P., M. Maire, C. Fowlkes, and J. Malik (2011). ‘Contour Detection and Hierarchical Image
Segmentation’. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(5): 898–916.
Arnheim, R. (1967). Art and Visual Perception. Berkeley, CA: University of California Press.
Behrmann, M., R. S. Zemel, and M. C. Mozer (1998). ‘Object-Based Attention and Occlusion: Evidence
from Normal Participants and a Computational Model’. Journal of Experimental Psychology: Human
Perception and Performance 24: 1011–1036.
Blakemore, C., and J. Nachmias (1971). ‘The Orientation Specificity of Two Visual After-Effects’. Journal of
Physiology 213: 157–174.
Campbell, F., and J. Kulikowski (1966). ‘Orientation Selectivity of the Human Visual System’. Journal of
Physiology 187: 437–445.
Cavanagh, P. (1991). ‘What’s Up in Top-Down Processing?’ In Representations of Vision: Trends and Tacit
Assumptions in Vision Research, edited by A. Gorea, pp. 295–304. Cambridge: Cambridge University Press.
Chan, L. K. H. and W. G. Hayward (2009). ‘Sensitivity to Attachment, Alignment, and Contrast Polarity
Variation in Local Perceptual Grouping’. Attention, Perception and Psychophysics 71(7): 1534–1552.
Cohen, L. and T. Deschamps (2001). ‘Multiple Contour Finding and Perceptual Grouping as a Set of
Energy Minimizing Paths’. In Energy Minimization Methods in Computer Vision and Pattern Recognition,
Lecture Notes in Computer Science 2134, pp. 560–575. Los Alamitos, CA: IEEE.
Connor, C., S. Brincat, and A. Pasupathy (2007). ‘Transformation of Shape Information in the Ventral
Pathway’. Current Opinion in Neurobiology 17: 140–147.
Corcoran, P., P. Mooney, and J. Tilton (2011). ‘Convexity Grouping of Salient Contours’. In Proceedings of
the International Workshop on Graph Based Representations in Pattern Recognition, Vol. 6658 of Lecture
Notes in Computer Science, edited by X. Jiang, M. Ferrer, and A. Torsello, pp. 235–244.
Corthout, E., B. Uttl, V. Walsh, M. Hallett, and A. Cowey (1999). ‘Timing of Activity in Early Visual
Cortex as Revealed by Transcranial Magnetic Stimulation’. NeuroReport 10: 2631–2634.
Craft, E., H. Schutze, E. Niebur, and R. von der Heydt (2007). ‘A Neural Model of Figure-Ground
Organization’. Journal of Neurophysiology 97: 4310–4326.
Dakin, S. (1997). ‘The Detection of Structure in Glass Patterns: Psychophysics and Computational Models’.
Vision Research 37: 2227–2246.
Dakin, S. (2001). ‘Information Limit on the Spatial Integration of Local Orientation Signals’. Journal of the
Optical Society of America A—Optics, Image Science, and Vision 18: 1016–1026.
Dubinskiy, A. and S. C. Zhu (2003). ‘A Multi-Scale Generative Model for Animate Shapes and Parts’. In
Proceedings of the 9th IEEE International Conference on Computer Vision, Vol. 1, pp. 249–256. Los
Alamitos, CA: IEEE.
Earle, D. C. (1999). ‘Glass Patterns: Grouping by Contrast Similarity’. Perception 28(11): 1373–1382.
Elder, J. H. and S. W. Zucker (1993). ‘The Effect of Contour Closure on the Rapid Discrimination of
Two-Dimensional Shapes’. Vision Research 33(7): 981–991.
Elder, J. H. and S. W. Zucker (1994). ‘A Measure of Closure’. Vision Research 34(24): 3361–3370.
Elder, J. H. and S. W. Zucker (1996). ‘Computing Contour Closure’. In Proceedings of the 4th European
Conference on Computer Vision, pp. 399–412. New York: Springer.
Elder, J. H. and S. W. Zucker (1998a). ‘Evidence for Boundary-Specific Grouping’. Vision Research
38(1): 143–152.
Elder, J. H. and S. W. Zucker (1998b). ‘Local Scale Control for Edge Detection and Blur Estimation’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 20(7): 699–716.
Elder, J. H. and R. M. Goldberg (2001). ‘Image Editing in the Contour Domain’. IEEE Transactions on
Pattern Analysis and Machine Intelligence 23(3): 291–296.

Elder, J. H. and R. M. Goldberg (2002). ‘Ecological Statistics of Gestalt Laws for the Perceptual
Organization of Contours’. Journal of Vision 2(4): 324–353.
Elder, J. H., A. Krupnik, and L. A. Johnston (2003). ‘Contour Grouping with Prior Models’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 25(25): 661–674.
Elder, J. H. and A. J. Sachs (2004). ‘Psychophysical Receptive Fields of Edge Detection Mechanisms’. Vision
Research 44(8): 795–813.
Elder, J. H. and L. Velisavljević (2009). ‘Cue Dynamics Underlying Rapid Detection of Animals in Natural
Scenes’. Journal of Vision 9(7): 1–20.
Elder, J. H., T. D. Oleskiw, A. Yakubovich, and G. Peyré (2013). ‘On Growth and Formlets: Sparse
Multi-Scale Coding of Planar Shape’. Image and Vision Computing 31: 1–13.
Estrada, F. and J. H. Elder (2006). ‘Multi-Scale Contour Extraction Based on Natural Image Statistics’.
In IEEE Conference on Computer Vision and Pattern Recognition Workshop. Washington, DC: IEEE.
Feldman, J. (2007). ‘Formation of Visual “Objects” in the Early Computation of Spatial Relations’. Perception
and Psychophysics 69(5): 816–827.
Field, D., A. Hayes, and R. F. Hess (1993). ‘Contour Integration by the Human Visual System: Evidence for
a Local “Association Field”’. Vision Research 33(2): 173–193.
Field, D., A. Hayes, and R. Hess (2000). ‘The Roles of Polarity and Symmetry in the Perceptual Grouping of
Contour Fragments’. Spatial Vision 13(1): 51–66.
Foxe, J. and G. Simpson (2002). ‘Flow of Activation from V1 to Frontal Cortex in Humans’. Experimental
Brain Research 142: 139–150.
Garrigan, P. (2012). ‘The Effect of Contour Closure on Shape Recognition’. Perception 41: 221–235.
Geisler, W. S., J. S. Perry, B. J. Super, and D. P. Gallogly (2001). ‘Edge Co-Occurrence in Natural Images
Predicts Contour Grouping Performance’. Vision Research 41(6): 711–724.
Geisler, W. S. and J. S. Perry (2009). ‘Contour Statistics in Natural Images: Grouping across Occlusions’.
Visual Neuroscience 26(1): 109–121.
Gilbert, C. D and T. N. Wiesel (1989). ‘Columnar Specificity of Intrinsic Horizontal and Corticocortical
Connections in Cat Visual Cortex’. Journal of Neuroscience 9(7): 2432–2443.
Gilchrist, I., G. Humphreys, M. Riddoch, and H. Neumann (1997). ‘Luminance and Edge Information
in Grouping: A Study Using Visual Search’. Journal of Experimental Psychology: Human Perception and
Performance 23: 464–480.
Gintautas, V., M. Ham, B. Kunsberg, S. Barr, S. Brumby, C. Rasmussen, J. George, I. Nemenman,
L. Bettencourt, and G. Kenyon (2011). ‘Model Cortical Association Fields Account for the Time Course
and Dependence on Target Complexity of Human Contour Perception’. PLOS Computational Biology
7(10): 1–16.
Glass, L. and E. Switkes (1976). ‘Pattern Recognition in Humans: Correlations which Cannot Be Perceived’.
Perception 5: 67–72.
Grenander, U., A. Srivastava, and S. Saini (2007). ‘A Pattern-Theoretic Characterization of Biological
Growth’. IEEE Transactions on Medical Imaging 26(2): 648–659.
Grossberg, S. (1976). ‘Adaptive Pattern Classification and Universal Recoding: I. Parallel Development and
Coding of Neural Feature Detectors’. Biological Cybernetics 23: 121–134.
Grossberg, S. and E. Mingolla (1985). ‘Neural Dynamics of Form Perception: Boundary Completion,
Illusory Figures, and Neon Color Spreading’. Psychological Review 92: 173–211.
Halgren, E., J. Mendola, C. Chong, and A. Dale (2003). ‘Cortical Activation to Illusory Shapes as Measured
with Magnetoencephalography’. NeuroImage 18: 1001–1009.
Hawken, M. J. and A. J. Parker (1991). ‘Spatial Receptive Field Organization in Monkey V1 and its
Relationship to the Cone Mosaic’. In Computational Models of Visual Processing, edited by M. S. Landy
and J. A. Movshon, chap. 6, pp. 84–93. Cambridge, MA: MIT Press.
232 Elder

von der Heydt, R., E. Peterhans, and G. Baumgartner (1984). ‘Illusory Contours and Cortical Neuron
Responses’. Science 224: 1260–1262.
Hochberg, J. and D. Hardy (1960). ‘Brightness and Proximity Factors in Grouping’. Perceptual and Motor
Skills 10: 22.
Hochstein, S. and M. Ahissar (2002). ‘View from the Top: Hierarchies and Reverse Hierarchies in the Visual
System’. Neuron 36(5): 791–804.
Hubel, D. H. and T. N. Wiesel (1968). ‘Receptive Fields and Functional Architecture of Monkey Striate
Cortex’. Journal of Physiology 195: 215–243.
Jacobs, D. (1996). ‘Robust and Efficient Detection of Salient Convex Groups’. IEEE Transactions on Pattern
Analysis and Machine Intelligence 18(1): 23–37.
Jacobs, D. (2003). ‘What Makes Viewpoint-Invariant Properties Perceptually Salient?’ Journal of the Optical
Society of America A 20(7): 1304–1320.
Jain, A., Y. Zhong, and S. Lakshmanan (1996). ‘Object Matching Using Deformable Templates’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 18(3): 267–278.
Jepson, A., W. Richards, and D. Knill (1996). ‘Modal Structure and Reliable Inference’. In Perception as
Bayesian Inference, edited by D. Knill and W. Richards, pp. 63–92. Cambridge: Cambridge University Press.
Johnston, L. and J. H. Elder (2004). ‘Efficient Computation of Closed Contours using Modified
Baum-Welch Updating’. In Proceedings of IEEE Workshop on Perceptual Organization in Computer
Vision, Los Alamitos, CA: IEEE Computer Society Press.
Jordan, C. (1887). Cours d’analyse, Vol. 3. Paris: Gauthier-Villars.
Kanizsa, G. (1979). Organization in Vision. New York: Praeger.
Kellman, P. and T. Shipley (1991). ‘A Theory of Visual Interpolation in Object Perception’. Cognitive
Psychology 23: 142–221.
Koenderink, J. J. (1984). ‘What Does the Occluding Contour Tell us About Solid Shape?’ Perception
13: 321–330.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace and World.
Kovacs, I. and B. Julesz (1993). ‘A Closed Curve Is Much More than an Incomplete One: Effect of
Closure in Figure-Ground Discrimination’. Proceedings of the National Academy of Sciences of the USA
90: 7495–7497.
Krüger, N. (1998). ‘Collinearity and Parallelism are Statistically Significant Second Order Relations of
Complex Cell Responses’. Neural Processing Letters 8: 117–129.
Kubovy, M. and J. Wagemans (1995). ‘Grouping by Proximity and Multistability in Dot
Lattices: A Quantitative Gestalt Theory’. Psychological Science 6(4): 225–234.
Kubovy, M., A. O. Holcombe, and J. Wagemans (1998). ‘On the Lawfulness of Grouping by Proximity’.
Cognitive Psychology 35: 71–98.
Lamme, V. A. and P. R. Roelfsema (2000). ‘The Distinct Modes of Vision Offered by Feedforward and
Recurrent Processing’. Trends in Neuroscience 23(11): 571–579.
Lee, T. and D. Mumford (2003). ‘Hierarchical Bayesian Inference in the Visual Cortex’. Journal of the
Optical Society of America A 20(7): 1434–1448.
Leyton, M. (1988). ‘A Process-Grammar for Shape’. Artificial Intelligence 34: 213–247.
Li, Z. (1998). ‘A Neural Model of Contour Integration in the Primary Visual Cortex’. Neural Computation
10(4): 903–940.
Lindeberg, T. (1998). ‘Edge Detection and Ridge Detection with Automatic Scale Selection’. International
Journal of Computer Vision 30(2): 117–154.
Liu, Z., D. W. Jacobs, and R. Basri (1999). ‘The Role of Convexity in Perceptual Completion’. Vision
Research 39(25): 4244–4257.
Lowe, D. G. (1985). Perceptual Organization and Visual Recognition. Boston: Kluwer.
Machilsen, B., M. Pauwels, and J. Wagemans (2009). ‘The Role of Vertical Mirror Symmetry in Visual
Shape Detection’. Journal of Vision 9(12).
Mahamud, S., K. K. Thornber, and L. R. Williams (1999). ‘Segmentation of Salient Closed Contours
from Real Images’. In IEEE International Conference on Computer Vision, pp. 891–897. Los Alamitos,
CA: IEEE Computer Society.
Mallat, S. and Z. Zhang (1993). ‘Matching Pursuits with Time-Frequency Dictionaries’. IEEE
Transactions on Signal Processing 41(12): 3397–3415.
Maloney, R., G. Mitchison, and H. Barlow (1987). ‘Limit to the Detection of Glass Patterns in the Presence
of Noise’. Journal of the Optical Society of America A—Optics and Image Science 4: 2336–2341.
Martin, D., C. Fowlkes, and J. Malik (2004). ‘Learning to Detect Natural Image Boundaries Using Local
Brightness, Color and Texture Cues’. IEEE Transactions on Pattern Analysis and Machine Intelligence
26(5): 530–549.
Mitchison, G. J. and G. Westheimer (1984). ‘The Perception of Depth in Simple Figures’. Vision Research
24(9): 1063–1073.
Mohan, R. and R. Nevatia (1992). ‘Perceptual Organization for Scene Segmentation and Description’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 14(6): 616–635.
Mortensen, E. N. and W. A. Barrett (1995). ‘Intelligent Scissors for Image Composition’. In SIGGRAPH’95
Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, pp. 191–
198. Los Angeles, CA: SIGGRAPH.
Mortensen, E. N. and W. A. Barrett (1998). ‘Interactive Segmentation with Intelligent Scissors’. Graphical
Models and Image Processing 60(5): 349–384.
Mumford, D. (1992). ‘Elastica and Computer Vision’. In Algebraic Geometry and Applications, edited by
C. Bajaj. Heidelberg: Springer.
Murray, R. F., P. Bennett, and A. Sekuler (2002). ‘Optimal Methods for Calculating Classification
Images: Weighted Sums’. Journal of Vision 2: 79–104.
Neumann, H. and W. Sepp (1999). ‘Recurrent V1–V2 Interaction in Early Visual Boundary Processing’.
Biological Cybernetics 81(5–6): 425–444.
Oleskiw, T., J. Elder, and G. Peyré (2010). ‘On Growth and Formlets’. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA: IEEE Computer Society.
Ons, B. and J. Wagemans (2011). ‘Development of Differential Sensitivity for Shape Changes Resulting
from Linear and Nonlinear Planar Transformations’. i-Perception 2: 121–136. Doi: 10.1068/i0407.
Ons, B. and J. Wagemans (2012). ‘A Developmental Difference in Shape Processing and Word–Shape
Associations between 4 and 6.5 Year Olds’. i-Perception 3: 481–494. Doi: 10.1068/i0481.
Or, C. and J. Elder (2011). ‘Oriented Texture Detection: Ideal Observer Modeling and Classification Image
Analysis’. Journal of Vision 11(8): 1–19.
Oyama, T. (1961). ‘Perceptual Grouping as a Function of Proximity’. Perceptual and Motor Skills
13: 305–306.
Parent, P. and S. W. Zucker (1989). ‘Trace Inference, Curvature Consistency, and Curve Detection’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 11: 823–839.
Pasupathy, A. and C. E. Connor (1999). ‘Responses to Contour Features in Macaque Area V4’. Journal of
Neurophysiology 82: 2490–2502.
Pettet, M. W. (1999). ‘Shape and Contour Detection’. Vision Research 39: 551–557.
Phillips, G. and H. Wilson (1984). ‘Orientation Bandwidths of Spatial Mechanisms Measured by Masking’.
Journal of the Optical Society of America A—Optics and Image Science 1: 226–232.
Ren, X., C. Fowlkes, and J. Malik (2008). ‘Learning Probabilistic Models for Contour Completion in
Natural Images’. International Journal of Computer Vision 77: 47–63.
Rensink, R. A. and J. T. Enns (1995). ‘Preemption Effects in Visual Search: Evidence for Low-Level
Grouping’. Psychological Review 102(1): 101–130.
Ringach, D. L. (2002). ‘Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque
Primary Visual Cortex’. Journal of Neurophysiology 88: 455–463.
Roelfsema, P. R. (2006). ‘Cortical Algorithms for Perceptual Grouping’. Annual Review of Neuroscience
29: 203–227.
Rubin, E. (1927). ‘Visuell wahrgenommene wirkliche Bewegungen’. Zeitschrift für Psychologie 103: 354–384.
Sasaki, Y. (2007). ‘Processing Local Signals into Global Patterns’. Current Opinion in Neurobiology
17(2): 132–139.
Sha’ashua, A. and S. Ullman (1988). ‘Structural Saliency: The Detection of Globally Salient Structures Using
a Locally Connected Network’. In Proceedings of the 2nd International Conference on Computer Vision,
pp. 321–327. Los Alamitos, CA: IEEE.
Sharon, E. and D. Mumford (2006). ‘2D-Shape Analysis Using Conformal Mapping’. International Journal
of Computer Vision 70(1): 55–75.
Sigman, M., G. A. Cecchi, C. D. Gilbert, and M. O. Magnasco (2001). ‘On a Common Circle: Natural
Scenes and Gestalt Rules’. Proceedings of the National Academy of Sciences 98(4): 1935–1940.
Snowden, R. (1992). ‘Orientation Bandwidth: The Effect of Spatial and Temporal Frequency’. Vision
Research 32: 1965–1974.
Spehar, B. (2002). ‘The Role of Contrast Polarity in Perceptual Closure’. Vision Research 42(3): 343–350.
Stahl, J. and S. Wang (2008). ‘Globally Optimal Grouping for Symmetric Closed Boundaries by Combining
Boundary and Region Information’. IEEE Transactions on Pattern Analysis and Machine Intelligence
30(3): 395–411.
Thompson, D. (1917). On Growth and Form. Cambridge: Cambridge University Press.
Thorpe, S. (2002). ‘Ultra-Rapid Scene Categorization with a Wave of Spikes’. In Proceedings of the
Biologically-Motivated Computer Vision Conference, Vol. LNCS 2525, pp. 1–15.
Thorpe, S., D. Fize, and C. Marlot (1996). ‘Speed of Processing in the Human Visual System’. Nature
381: 520–522.
Tu, Z., X. Chen, A. Yuille, and S. Zhu (2005). ‘Image Parsing: Unifying Segmentation, Detection, and
Recognition’. International Journal of Computer Vision 63(2): 113–140.
Tversky, T., W. S. Geisler, and J. S. Perry (2004). ‘Contour Grouping: Closure Effects are Explained by Good
Continuation and Proximity’. Vision Research 44: 2769–2777.
Ungerleider, L. (1995). ‘Functional Brain Imaging Studies of Cortical Mechanisms for Memory’. Science
270(5237): 769–775.
Van Essen, D. C., B. Olshausen, C. H. Anderson, and J. L. Gallant (1991). ‘Pattern Recognition, Attention,
and Information Processing Bottlenecks in the Primate Visual System’. SPIE 1473: 17–28.
Wagemans, J., J. Elder, M. Kubovy, S. Palmer, M. Peterson, M. Singh, and R. von der Heydt (2012).
‘A Century of Gestalt Psychology in Visual Perception: I. Perceptual Grouping and Figure-Ground
Organization’. Psychological Bulletin 138(6): 1172–1217. Doi: 10.1037/a0029333.
Walsh, V. and A. Cowey (1998). ‘Magnetic Stimulation Studies of Visual Cognition’. Trends in Cognitive
Science 2: 103–110.
Wang, S. and J. M. Siskind (2003). ‘Image Segmentation with Ratio Cut’. IEEE Transactions on Pattern
Analysis and Machine Intelligence 25(6): 675–690.
Watt, R. J. and M. J. Morgan (1984). ‘Spatial Filters and the Localization of Luminance Changes in Human
Vision’. Vision Research 24(10): 1387–1397.
Wertheimer, M. (1938). ‘Laws of Organization in Perceptual Forms’. In A Sourcebook of Gestalt Psychology,
edited by W. D. Ellis, pp. 71–88. London: Routledge and Kegan Paul.
Williams, L. R. and D. W. Jacobs (1997). ‘Stochastic Completion Fields: A Neural Model of Illusory
Contour Shape and Salience’. Neural Computation 9(4): 837–858.
Wilson, H. R. and J. R. Bergen (1979). ‘A Four Mechanism Model for Threshold Spatial Vision’. Vision
Research 19: 19–32.
Wokke, M. E., A. R. E. Vandenbroucke, H. S. Scholte, and V. A. F. Lamme (2013). ‘Confuse your
Illusion: Feedback to Early Visual Cortex Contributes to Perceptual Completion’. Psychological Science
24(1): 63–71.
Yen, S. and L. Finkel (1998). ‘Extraction of Perceptually Salient Contours by Striate Cortical Networks’.
Vision Research 38(5): 719–741.
Yoshino, A., M. Kawamoto, T. Yoshida, N. Kobayashi, and J. Shigemura (2006). ‘Activation Time Course
of Responses to Illusory Contours and Salient Region: A High-Density Electrical Mapping Comparison’.
Brain Research 1071(1): 137–144.
Yuille, A. and D. Kersten (2006). ‘Vision as Bayesian Inference: Analysis by Synthesis?’ Trends in Cognitive
Sciences 10(7): 301–308.
Zisserman, A., J. Mundy, D. Forsyth, J. Liu, N. Pillow, C. Rothwell, and S. Utcke (1995). ‘Class-Based
Grouping in Perspective Images’. In Proceedings of the 5th International Conference on Computer Vision,
pp. 183–188. Los Alamitos, CA: IEEE.
Chapter 12

Visual representation of contour and shape
Manish Singh

Contours and information


Images are far from uniform in their information content. Rather, information tends to be con-
centrated in regions around contours. This makes good sense: the presence of a contour signals
some physically significant ‘event’ in the world—whether it be the occluding boundary of an
object, a reflectance change, or something else. Indeed, human observers are just as good at scene
recognition with line drawings as they are with full-colour photographs (e.g. Walther et al. 2011).
Similarly, object recognition (e.g. Biederman and Ju 1988) and 3D shape perception (e.g. Cole et
al. 2009) are often just as good with line drawings as they are with shaded images. It is therefore
not surprising that line drawings have a long history—having been used by humans as an effective
mode of visual depiction and communication since prehistoric times (as evidenced, for example,
by the Chauvet cave paintings; see e.g. Clottes 2003).
In his seminal article, Attneave (1954) noted not only the high-information content of contours
in images, but also argued that along contours points of maximal curvature carry the greatest infor-
mation. In support of this latter claim, Attneave provided two lines of evidence. First, he briefly
reported the results of a study in which participants were asked to approximate a shape as closely
as possible with only a limited number of points, and then to indicate the locations corresponding
to those points on the original shape. Histograms of locations selected by the participants exhibited
sharp peaks at local maxima of curvature—pointing to their importance in shape representation.
Second, Attneave made a line drawing of a sleeping cat using only local curvature maxima that were
then connected with straight-line segments. The resulting drawing was readily recognizable as a cat
(now famously known as ‘Attneave’s cat’), suggesting that not much information had been lost.
Attneave’s second line of evidence has been the subject of further discussion and some contro-
versy; the precise result appears to depend on the geometry of the contour (whether or not it has
large variations in curvature and salient maxima) and the presence of other types of competing
candidate points (e.g. Kennedy and Domander 1985; De Winter and Wagemans 2008a, 2008b;
Panis et  al. 2008). His first experimental finding has been uncontroversial, however. Indeed,
Norman, Phillips, and Ross (2001) conducted a study along the lines described briefly in Attneave
(1954) using silhouettes cast by natural 3D objects (sweet potatoes), and replicated his findings
(see Figure 12.1a for sample results).1 Similarly, De Winter and Wagemans (2008b) found that
when participants are asked simply to mark ‘salient points’ along the bounding contours of 2D
shapes—without being required to replicate the shape—they are again most likely to pick local
maxima of curvature. As we will see, curvature extrema play an important role in modern theories

1 A detailed report of Attneave’s original experiment was apparently never published. His 1954 article cites only a ‘mimeographed note’.
Fig. 12.1  (a) Generative model of open contours, expressed as a probability distribution on turning angle
from the current contour orientation. The distribution is centred on 0, meaning that going ‘straight’
(i.e. zero turning) is most likely, with the probability decreasing monotonically with turning angle in
either direction. This empirically motivated generative model explains why information along a contour
increases monotonically with curvature. (b) Sample results from Norman et al.’s (2001) replication of
Attneave’s experiment. Histograms of points selected by subjects show peaks at maxima of curvature.
(a) Reproduced from Jacob Feldman and Manish Singh, Information Along Contours and Object Boundaries,
Psychological Review, 112(1), pp. 243-252, DOI: 10.1037/0033-295X.112.1.243 (c) 2005, American Psychological
Association. (b) Reproduced from J. Farley Norman, Flip Phillips, Heather E. Ross, Information concentration along
the boundary contours of naturally shaped solid objects, Perception 30(11) pp. 1285 – 1294, doi:10.1068/p3272,
Copyright (c) 2001, Pion. With kind permission from Pion Ltd, London www.pion.co.uk and www.envplan.com

of shape representation as well (Hoffman and Richards 1984; Richards, Dawson, and Whittington
1986; Leyton 1989; Hoffman and Singh 1997; Singh and Hoffman 2001; De Winter and Wagemans
2006, 2008a; Cohen and Singh 2007).
But why should curvature maxima be the most informative points along a contour? The link
between contour curvature and information content follows fairly directly from Shannon’s
theory of information (in particular, from the definition of surprisal as u = –log(p)), once one
adopts a simple and empirically motivated generative model of contours (Feldman and Singh
2005; Singh and Feldman 2012).2 Specifically, one may ask, as one moves along a contour, where
is the contour likely to go ‘next’ at any given point? A  great deal of psychophysical work on
contour integration and contour detection has shown that the visual system implicitly expects
that a contour is most likely to go ‘straight’ (i.e. to continue along its current tangent direction),
and that the probability of ‘turning’ away from the current tangent direction decreases mono-
tonically with the magnitude of the turning angle (Field, Hayes, and Hess 1993; Feldman 1997;
Geisler et al. 2001; Geisler and Perry, 2009; Elder and Goldberg 2002; Yuille et al. 2004). The
visual system’s local probabilistic expectations about contours may thus be summarized as a

2 Note that the formula for the surprisal is consistent with the simple everyday intuition that improbable events, when they occur, are cause for greater surprise—and hence are more informative—than when a highly probable, or expected, event occurs. As they say, ‘man bites dog’ is news; ‘dog bites man’ is not.
von Mises (or circular normal) distribution on turning angles, centered on 0 (see Figure 12.1b;
Feldman and Singh 2005; Singh and Feldman 2012). Indeed, even the assumption of a specific
distributional form is not necessary to derive Attneave’s claim; all that is needed is that the
distribution on turning angles peak at 0 degrees, and decrease monotonically on both sides. It
then follows directly from this that the surprisal, u = –log(p), increases monotonically with the
magnitude of the turning angle. And turning angle, of course, is simply the discrete analogue
of curvature. Hence maxima of curvature are also maxima of contour information—which is
precisely Attneave’s claim.
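This derivation can be made concrete with a minimal numerical sketch (in Python). It assumes a von Mises distribution on turning angles centred on 0; the concentration parameter kappa = 4 is an illustrative value, not one fitted to psychophysical data. Surprisal, u = –log(p), then rises monotonically with the magnitude of the turning angle:

```python
import numpy as np

def von_mises_pdf(theta, mu=0.0, kappa=4.0):
    # von Mises (circular normal) density on turning angle theta (radians);
    # np.i0 is the modified Bessel function of order 0 (the normalizer).
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

def surprisal(theta, mu=0.0, kappa=4.0):
    # Shannon surprisal u = -log(p): improbable turns are more informative.
    return -np.log(von_mises_pdf(theta, mu, kappa))

turns = np.linspace(0.0, 1.5, 7)   # turning-angle magnitudes (radians)
u = surprisal(turns)
print(np.all(np.diff(u) > 0))      # True: surprisal rises with |turning angle|
```

Because turning angle is the discrete analogue of curvature, the monotone rise in surprisal is exactly the claim that curvature maxima are information maxima. Note that the distributional form is inessential: any distribution peaked at 0 and decreasing on both sides gives the same monotonicity.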
One can go further, however. Attneave (1954) treated curvature only as an unsigned quantity,
i.e. simply as a magnitude. For a closed contour (such as the outline of an object), however, it is not
only meaningful but also more appropriate to treat curvature as a signed quantity—specifically,
as having positive sign in convex sections of the contour, and negative sign in concave sections.
Indeed, there are principled reasons to expect that the visual system should treat convex and con-
cave portions of a shape quite differently (Koenderink and van Doorn 1982; Koenderink 1984;
Hoffman and Richards 1984). From the point of view of information content of contours, however,
the key observation is that on closed contours, the probability distribution on turning angles is
not centred on 0, but rather is biased such that positive turning angles (involving turns toward
the shape, or figural side of the contour) are more likely than negative turning angles. Indeed, this
must be the case if the contour is to eventually close in on itself. And it entails, via the –log(p) rela-
tion, an asymmetry in surprisal, such that negative curvature is more ‘surprising’—and hence more
informative—than corresponding magnitudes of positive curvature (see Feldman and Singh 2005
for details). This asymmetry in information content is supported by empirical findings showing
that changes at concavities are easier to detect visually than corresponding changes at convexities
(Barenholtz et al. 2003; Cohen et al. 2005), although there are nonlocal influences as well—based
on, for example, whether a shape change alters qualitative part structure (e.g. Bertamini and Farrant
2005; Vandekerckhove, Panis, and Wagemans 2008). (See also ‘Interactions between Contour and
Region Geometry’ for more on nonlocal influences in shape perception.)
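The convexity–concavity asymmetry follows from the same surprisal relation once the turning-angle distribution is given a figural bias. In the sketch below, the positive mode mu = 0.2 rad is an illustrative bias, not an empirical estimate; for any such bias, a concave turn of a given magnitude is farther from the mode than the corresponding convex turn, hence less probable and more informative:

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa=4.0):
    # von Mises density; np.i0 is the modified Bessel function of order 0.
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

# On a closed contour, turns toward the figural side (positive sign) must
# predominate for the contour to close, so the mode sits at a positive angle.
mu = 0.2      # illustrative figural bias (radians)
turn = 0.5    # a fixed turning-angle magnitude

u_convex = -np.log(von_mises_pdf(+turn, mu))    # positive (convex) turn
u_concave = -np.log(von_mises_pdf(-turn, mu))   # negative (concave) turn
print(u_concave > u_convex)   # True: concavities carry more information
```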
In summary, Attneave’s claim about curvature and information follows from a simple and
empirically motivated generative model of contours. And, as noted above, Attneave’s theoretical
claim can also be extended to closed contours, with the result that negative curvature segments
carry more information than corresponding positive curvature segments.3 The stochastic gen-
erative model of contours may also be extended to incorporate the role of co-circularity, i.e. the
visual expectation that contours tend to maintain their curvature (Singh and Feldman 2012).
Psychophysical evidence for this expectation by the visual system comes from studies of contour
integration (Feldman 1997; Pizlo, Salach-Goyska, and Rosenfeld 1997) as well as visual extrapola-
tion of contours (Singh and Fulvio 2005, 2007).

Contour extrapolation and interpolation


A natural way to investigate the visual representation of contours is by examining how the visual
system ‘fills in’ the shape of contour segments that are missing in the image—for example, due
to partial occlusion or camouflage (or insufficient image contrast). Shape completion is a highly
under-constrained problem, a form of the problem of induction (Hume 1748/1993). Given any
pair of inducing contour segments, there are always infinitely many smooth contours that can

3 It is important to note that, since the generative models of contours considered in this section were entirely local, these claims follow simply from local expectations about contour behaviour.
fill in the missing intervening portion of the shape. Because visually completed contours are, by
definition, generated by the visual system (being absent in the retinal images themselves), detailed
measurement of their shape provides a unique window on the shape constraints embodied in the
visual processing of contours.

Contour extrapolation
Perhaps the simplest context for examining visual shape completion is that of contour extrapola-
tion: if a curved contour disappears behind an occluder, how does the visual system ‘expect’ it will
proceed behind the occluder? In other words, what shape will it take—not just in the immediate
vicinity of the point of occlusion, but also further away? A precise answer to this question would
serve to characterize the commonly (though often loosely) used notion of ‘good continuation’.4
Indeed, Wertheimer (1923) originally proposed the principle of good continuation as a way of
choosing between different possible extensions of a contour segment (e.g. see his Figures 16–19).
However, a mathematically precise characterization has been elusive. Some formal questions con-
cerning the meaning of good continuation include:
1 Which geometric variables of the contour does the visual system use in extrapolating its shape,
e.g. its tangent direction, curvature, rate of change of curvature, higher derivatives?
2 How does the visual system combine the contributions of these variables to actually generate
the extended shape of the extrapolated contour?
In addition, contour extrapolation is also a critical component of the general problem of shape
completion—since a visually interpolated contour must both smoothly extend each inducing con-
tour, as well as smoothly connect the two individual extrapolants (e.g. Ullman 1976; Fantoni and
Gerbino 2003). Therefore, a full understanding of visual shape completion requires an under-
standing of how the visual system extrapolates each curved inducing contour.
Singh and Fulvio (2005, 2007) used an experimental method they called location-and-gradient
mapping to measure the shape of visually extrapolated contours. This method obtains paired
measurements of extrapolation position and orientation at multiple distances from the point of
occlusion in order to build up an extended representation of a visually extrapolated contour. In
their stimuli, a curved contour disappears behind the straight edge of a half-disk occluder (see
Figure 12.2a). Observers iteratively adjust the (angular) position of a short line probe on the oppo-
site (curved) side of the occluder, and its orientation, in order to optimize the percept of smooth
continuation. Measurements are taken at multiple distances from the point of occlusion by using
half-disk occluders of different sizes (see Figure 12.2b).
In their first study, Singh and Fulvio (2005) used arcs of circles and parabolas as inducing con-
tours. By fitting various shape models to the extrapolation data, they found that:
1 The visual system makes systematic use of contour curvature in extrapolating contours—in
other words, extrapolation curvature increases systematically with the curvature of the inducing
contour. Although this result makes perfect intuitive sense, it is noteworthy that current models
of shape completion (in both human and computer vision) do not use the curvature of the
inducer—only its position and tangent direction at the point of occlusion. This empirical result
thus underscores the need for models of shape completion to incorporate the role of inducer
curvature as well.

4 This question is of course intimately related to the generative models of contours considered in ‘Contours and Information’. The main difference is that the previously considered models focused on where a contour is likely to go ‘next’—i.e. in the immediate vicinity of the current location—whereas the question we are now posing includes the extended behaviour of the contour.
Fig. 12.2  (a) Stimulus used by Singh and Fulvio (2005, 2007) to study the visual extrapolation of
contours behind an occluder. A curved inducing contour disappears behind the straight edge of a half-
disk occluder. Observers adjust the angular position as well as the orientation of a line probe around
the curved edge of the occluder to optimize the percept of smooth continuation. (b) Measurements
are obtained at multiple distances from the point of occlusion to build a detailed representation of an
observer’s visually extrapolated contour.
Reproduced from Manish Singh and Jacqueline M. Fulvio, Visual Extrapolation of Contour Geometry, Proceedings
of the National Academy of Sciences, USA 102(3), pp. 939–944, doi: 10.1073/pnas.0408444102, Copyright
(2005) National Academy of Sciences, U.S.A.

2 Visually extrapolated contours are characterized by decaying curvature with increasing distance
from the point of occlusion. Specifically, fits of spiral shape models (i.e. models that include
both a curvature term and a rate of change of curvature term) to extrapolation data consistently
yielded negative values for the rate of change of curvature.5
3 The precision of subjects’ visually extrapolated contours decreases systematically with the
curvature of the inducing contour:  the higher the inducing curvature, the less precisely the
visually extrapolated contour is localized. This result is consistent with findings from contour
interpolation studies using dot-sampled contours, which have also found a ‘cost of curvature’
in human performance (Warren, Maloney, and Landy 2002).
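The decaying-curvature behaviour in point 2 can be sketched by tracing a discrete extrapolant whose curvature falls off linearly with arc length, in the manner of an Euler spiral with a negative rate of change of curvature. The parameter values here (k0, gamma, the step size) are illustrative, not fits to the Singh and Fulvio data:

```python
import numpy as np

def extrapolate(k0, gamma=-0.3, step=0.05, n=60, theta0=0.0):
    # Trace a contour outward from the point of occlusion. At arc length s
    # the curvature is k(s) = k0 + gamma * s, clamped at 0 so the contour
    # straightens out; gamma < 0 gives the decaying-curvature behaviour.
    x, y, theta, s = 0.0, 0.0, theta0, 0.0
    pts = [(x, y)]
    for _ in range(n):
        k = max(k0 + gamma * s, 0.0)
        theta += k * step          # turning angle = curvature * arc step
        s += step
        x += step * np.cos(theta)
        y += step * np.sin(theta)
        pts.append((x, y))
    return np.array(pts)

curve = extrapolate(k0=1.0)
segs = np.diff(curve, axis=0)
headings = np.arctan2(segs[:, 1], segs[:, 0])
turns = np.diff(headings)          # per-step turning angles
print(turns[0] > turns[-1] > 0)    # True: turning decays with distance
```

With higher k0 the initial turning is larger but the same decay toward straightness applies, mirroring the finding that extrapolation curvature tracks inducer curvature while fading with distance from the occluder.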
In a subsequent study, Singh and Fulvio (2007) tested whether observers make use of the rate of
change of curvature of an inducing contour in visually extrapolating its shape. This study used
arcs of Euler spirals as inducing contours—characterized by linearly increasing or decreasing cur-
vature as a function of arc length (i.e. length measured along the contour)—and manipulated
their rate of change of curvature (both in the positive and negative directions). In fitting a two-
parameter Euler-spiral model to the extrapolation settings, they found no systematic relationship
between the rate of change of curvature of the inducing contour and the rate of change of cur-
vature of the fitted Euler spiral to the extrapolation data. Thus observers appear not to take into
account rate of change of curvature in visually extrapolating contours behind occluders. Indeed,
visually extrapolated contours continued to exhibit a decaying-curvature behaviour even when

5 The decaying curvature behaviour explains the (initially surprising) finding that a parabolic shape model better explained observers’ extrapolation data than a circular shape model—irrespective of whether the inducing contour itself was a circular or parabolic arc (see Singh and Fulvio 2005 for details).
Visual Representation of Contour and Shape 241

the inducing contours had monotonically increasing curvature as they approached the occluder.
Importantly, this failure to use inducer rate of change of curvature was not simply due to a fail-
ure to detect it. A control experiment confirmed that observers could indeed reliably distinguish
between inducing contours with monotonically increasing vs decreasing curvature.
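Euler-spiral inducers of this kind are straightforward to construct numerically: because curvature is linear in arc length, the tangent angle has a closed form, and positions follow by integrating the unit tangent. The discretization below is an illustrative sketch; the function name and sampling choices are not from the original studies.

```python
import numpy as np

def euler_spiral_arc(k0, gamma, length, n=1000):
    """Polyline approximation of an Euler-spiral arc whose curvature is
    linear in arc length: k(s) = k0 + gamma * s.

    k0     -- curvature at the starting point
    gamma  -- rate of change of curvature with arc length
    length -- total arc length of the generated arc
    Starts at the origin with a horizontal tangent; gamma = 0 reduces
    to a circular arc (or a straight line if k0 = 0 as well).
    """
    ds = length / n
    s = np.arange(n + 1) * ds
    # Tangent angle is the integral of curvature over arc length
    # (closed form because curvature is linear in s):
    theta = k0 * s + 0.5 * gamma * s**2
    # Position is the running integral of the unit tangent:
    x = np.concatenate([[0.0], np.cumsum(np.cos(theta[:-1]) * ds)])
    y = np.concatenate([[0.0], np.cumsum(np.sin(theta[:-1]) * ds)])
    return np.column_stack([x, y])

# Positive gamma gives increasing curvature along the arc; negative
# gamma gives decreasing curvature.
arc = euler_spiral_arc(k0=0.5, gamma=0.2, length=4.0)
```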
Taken together, these results may be viewed as providing a formal characterization of ‘good
continuation’. Specifically, they show that the visual system uses tangent direction as well as curva-
ture—but not rate of change of curvature—in visually extrapolating a curved contour. Moreover,
the influence of inducer curvature on visually extrapolated contours decays with distance from the
point of occlusion. Singh and Fulvio (2005, 2007) modelled these characteristics using a Bayesian
model involving two probabilistically expressed constraints: a likelihood constraint to maintain
the curvature of the inducing contour (i.e. a bias toward ‘co-circularity’; Parent and Zucker 1989),
and a prior constraint to minimize curvature (i.e. a bias toward ‘straightness’; e.g. Field et al.
1993; Feldman 1997, 2001; Geisler et al. 2001; Elder and Goldberg 2002). Both constraints were
expressed as probability distributions on curvature. The prior was expressed as a Gaussian dis-
tribution centred on 0 curvature with fixed variance, whereas the likelihood was centred on the
estimated inducer curvature at the point of occlusion, with a (Weber-like) linearly increasing
standard deviation with distance from the point of occlusion. Near the point of occlusion, the like-
lihood is very precise (low variance) and thus tends to dominate the prior.6 With increasing dis-
tance from the point of occlusion, however, the likelihood becomes less reliable (larger variance),
and so gradually the prior comes to dominate the likelihood. This shift in relative reliabilities leads
to the decaying curvature behaviour (see Singh and Fulvio 2007 for details).
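The model's qualitative behaviour can be sketched in a few lines. The numerical values below (the prior's spread, and the linear growth rate of the likelihood's spread) are illustrative assumptions, not the parameters fitted by Singh and Fulvio.

```python
import numpy as np

def extrapolated_curvature(k_inducer, s, sigma_prior=0.2, a=0.05, b=0.1):
    """Posterior mean curvature at distance s past the point of occlusion.

    Prior: Gaussian on curvature centred on 0 (bias toward straightness)
    with fixed spread sigma_prior.
    Likelihood: Gaussian centred on the estimated inducer curvature
    k_inducer, with a Weber-like, linearly growing spread a + b * s.
    The posterior mean is the precision-weighted average of the two means.
    """
    sigma_lik = a + b * s
    w_prior = 1.0 / sigma_prior**2   # precision of the prior
    w_lik = 1.0 / sigma_lik**2       # precision of the likelihood
    return (w_lik * k_inducer + w_prior * 0.0) / (w_lik + w_prior)

# Near the occluder the likelihood dominates (inducer curvature is
# maintained); with distance the prior takes over, so the extrapolated
# curvature decays toward 0:
ks = [extrapolated_curvature(1.0, s) for s in np.linspace(0.0, 5.0, 6)]
assert all(k1 > k2 for k1, k2 in zip(ks, ks[1:]))
```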

Contour interpolation
Fulvio, Singh, and Maloney (2008) extended the location-and-gradient mapping method to study
contour interpolation. Their stimulus displays contained a contour whose middle portion was
occluded by a rectangular surface. On each trial, a vertical interpolation window was opened at
one of six possible locations through which a short linear probe was visible (see Figure 12.3a).
Observers iteratively adjusted the location (height) and orientation of the line probe in order to
optimize the percept of smooth continuation of a single contour behind the occluder. The per-
ceived interpolated contours were thus mapped out by taking measurements at six evenly spaced
locations along the width of the occlusion region. The experiments manipulated the geometry of
the two inducing segments—specifically, the turning angle between them (Figure 12.3b) and their
relative vertical offset (Figure 12.3c).
A basic question was: for a given pair of inducing contours, are observers’ settings of position and orientation through the six interpolation windows globally consistent—i.e. consistent with a single, stable, smooth interpolating contour? Using two measures of global consistency—a parametric one
and a non-parametric one—Fulvio et al. (2008) found that although increasing the turning angle
between inducers adversely affected the precision of interpolation settings, it did not adversely
affect their internal consistency. By contrast, increasing the relative offset between the two inducing
contours did disrupt the internal consistency of observers’ interpolation settings. In other words,
observers made their settings using simple heuristics (they were largely influenced by the closest
inducing contour), and their local settings of height and orientation at various locations no longer
‘hung together’ into any actual extended contour. A natural way to understand this difference is

6  Under the assumption of Gaussian distributions for the prior and likelihood, the Bayesian posterior is also
a Gaussian distribution whose mean is a weighted average of the prior mean and likelihood mean, with the
relative weights inversely proportional to their respective variances (see e.g. Box and Tiao 1992).
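In symbols, writing the prior as N(μp, σp²) and the likelihood as N(μl, σl²), this standard conjugate-Gaussian result reads:

```latex
\mu_{\text{post}}
  = \frac{\sigma_l^{2}\,\mu_p + \sigma_p^{2}\,\mu_l}{\sigma_p^{2} + \sigma_l^{2}}
  = \frac{(1/\sigma_p^{2})\,\mu_p + (1/\sigma_l^{2})\,\mu_l}{1/\sigma_p^{2} + 1/\sigma_l^{2}},
\qquad
\frac{1}{\sigma_{\text{post}}^{2}} = \frac{1}{\sigma_p^{2}} + \frac{1}{\sigma_l^{2}}
```

Each mean is thus weighted by its precision (inverse variance), which is why the low-variance likelihood dominates near the occluder and the fixed-variance prior dominates far from it.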
242 Singh


Fig. 12.3  (a) Stimulus used by Fulvio, Singh, and Maloney (2008, 2009) to study contour interpolation.
For a given pair of inducing edges, an interpolation window is opened at one of six possible locations
along the width of the occluder. Observers adjust the height as well as the orientation of a line probe
visible through the interpolation window in order to optimize the percept of smooth interpolation. The
inducer geometry was manipulated by varying the turning angle (shown in (b)) and the relative offset
(shown in (c)) between the two inducers.
Reprinted from Vision Research, 48(6), Jacqueline M. Fulvio, Manish Singh, and Laurence T. Maloney, Precision and
consistency of contour interpolation, pp. 831–49, Copyright (2008), with permission from Elsevier.

that increasing the relative offset between inducer pairs leads eventually to a geometric context
where the interpolating contour must be inflected—i.e. contain a point of inflection (or change in
the sign of curvature) somewhere along its path—which is a factor that is known to disrupt visual
completion (Takeichi et al. 1995; Singh and Hoffman 1999). On the other hand, simply increasing
the turning angle between the two inducers does not necessitate inflected interpolating contours—
it only requires interpolating contours with greater curvature in a single direction.
These two factors—turning angle and relative offset between inducers—are often combined
conjunctively to define the strength of grouping between pairs of inducing edges. For example,
Kellman and Shipley’s (1991) definition of edge relatability requires that both the relative offset
between inducers, as well as the turning angle between them, be within specific ranges in order for
them to be considered ‘relatable’. This conjunctive combination, however, ignores the qualitatively
different effects that these two factors have on contour interpolation. Specifically, although both
factors lead to an increase in imprecision, only relative offset leads to a failure of internal consist-
ency. In a subsequent study, Fulvio, Singh, and Maloney (2009) developed a purely experimental
criterion to test for internal consistency of interpolation measurements—one that relied solely on
observers’ own interpolation performance rather than on any experimenter-defined measures.
The results independently verified and extended their earlier findings.

Part-based representations of shape


A great deal of evidence—both psychophysical (see below) and physiological (e.g. Pasupathy
and Connor 2002)—indicates that the human visual system represents contours and shapes in a
piecewise manner. In other words, it segments contours and shapes into simpler ‘parts’ and organ-
izes shape representation using these parts and their spatial relationships. Far from being arbitrary
subsets, these perceptual parts are highly systematic, and segmented using predictable geometric
‘rules’. Moreover, these segmented parts tend to correspond, in high-level vision, to psychologi-
cally meaningful subunits of objects (such as head, leg, branch, etc.) that are highly relevant to a
number of cognitive processes, including categorization, naming, and object recognition.
Although in Attneave’s (1954) usage, the phrase ‘maxima of curvature’ along a contour does
not distinguish between positive (convex) and negative (concave) curvature, the sign of curva-
ture actually plays a fundamental role in modern theories of shape representation—and especially
in theories of part segmentation. Once one treats curvature as a signed quantity (which can be
done whenever the distinction between convex and concave is well defined), one can differentiate
between positive maxima of curvature (marked by M+ in Figure 12.4a) and negative minima of
curvature (marked by m– in Figure 12.4a). Both of these extrema types have locally maximal mag-
nitude of curvature, and are hence ‘maxima of curvature’ by Attneave’s nomenclature. However, by
definition, positive maxima lie in convex segments of a shape’s bounding contour, whereas negative
minima lie in concave segments. Apart from these two extrema types, another important class of
points is defined by inflections, which are zero crossings of curvature—i.e. points where curvature
crosses from positive (convex) to negative (concave), or vice versa (marked by o in Figure 12.4a).
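For a contour sampled as a closed polygon, these three feature types can be detected from a discrete estimate of signed curvature; the finite-difference scheme below is one simple illustrative choice, not a standard reference implementation.

```python
import numpy as np

def classify_curvature_features(pts):
    """Classify the vertices of a closed polyline (listed counterclockwise)
    into positive maxima of curvature (M+), negative minima (m-), and
    inflections (o).

    Signed curvature at a vertex is approximated as the turning angle
    there divided by the mean length of its two incident edges; with a
    counterclockwise outline, convex vertices get positive curvature.
    """
    n = len(pts)
    prev, nxt = np.roll(pts, 1, axis=0), np.roll(pts, -1, axis=0)
    v1, v2 = pts - prev, nxt - pts
    cross = v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0]
    dot = (v1 * v2).sum(axis=1)
    turn = np.arctan2(cross, dot)
    seg = 0.5 * (np.linalg.norm(v1, axis=1) + np.linalg.norm(v2, axis=1))
    k = turn / seg
    feats = {'M+': [], 'm-': [], 'o': []}
    for i in range(n):
        kp, kc, kn = k[i - 1], k[i], k[(i + 1) % n]
        if kc > 0 and kc >= kp and kc >= kn:
            feats['M+'].append(i)        # local max on a convex stretch
        elif kc < 0 and kc <= kp and kc <= kn:
            feats['m-'].append(i)        # local min on a concave stretch
        if kc * kn < 0:
            feats['o'].append(i)         # curvature changes sign after i
    return k, feats

# A convex outline has no negative minima and no inflections:
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ellipse = np.column_stack([2.0 * np.cos(t), np.sin(t)])
k, feats = classify_curvature_features(ellipse)
assert feats['m-'] == [] and feats['o'] == [] and np.all(k > 0)
```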
The distinction between positive maxima and negative minima of curvature is critical for part
segmentation—where negative minima of curvature play a special role. According to Hoffman
and Richards’ (1984) ‘minima rule’, the visual system uses negative minima of curvature to seg-
ment shapes into parts. This rule is motivated by the principle of transversality, according to which
when two smooth objects are joined to form a composite object, their intersection generically


Fig. 12.4  (a) Illustrating different types of curvature-based features along the outline of a
shape: Positive maxima of curvature (marked by M+), negative minima of curvature (marked by m–),
and inflection points (marked by o). (b) Motivation behind the minima rule: Joining two smooth
objects generically produces negative minima of curvature on the composite object. (c) Similarly,
when a branch grows out of a trunk (or a limb out of an embryo), negative minima are created at
the loci of protrusion.

produces a concave crease (i.e. a discontinuity in the tangent plane of the composite surface; see
Figure 12.4b). And a concave crease is simply an extreme—i.e. ‘sharp’—form of a negative mini-
mum of curvature. (More precisely, a generic application of smoothing to a concave crease yields
a smooth negative minimum.) Similarly, when a new branch grows out of a trunk (or a limb out
of an embryo), negative minima of curvature are created between the sprouting branch and the
trunk (see Figure 12.4c; Leyton 1989). Hence, when faced with a complex object with unknown
part structure, it is a reasonable strategy for the visual system to use the presence of negative
minima of curvature as a cue to identifying separate parts.
A great deal of psychophysical evidence indicates that negative minima of curvature do indeed
play an important role in visually segmenting shapes into parts. For example, when subjects are
asked to draw cuts on line drawings of various objects to demarcate their natural parts, a large
proportion of their cuts pass through or near negative minima of curvature (Siddiqi, Tresness,
and Kimia 1996; De Winter and Wagemans 2006). Similar results have also been obtained with
3D models of objects (Chen, Golivinskiy, and Funkhouser 2009). Furthermore, even when unfa-
miliar, randomly generated shapes are used (hence lacking any high-level cues from recognition
or category knowledge), and subjects are simply asked to indicate whether or not a given contour
segment belongs to a particular shape (i.e. in a performance-based task where the instructions to
participants involve no mention of ‘parts’), their identification performance is substantially bet-
ter for segments delineated by negative minima of curvature than for those delineated by other
extrema types (Cohen and Singh 2007). This result indicates that part segmentation is a relatively
low-level geometry-driven process that operates automatically without relying on familiarity with
the shape, or any task requirement involving naming or recognition.7
Part segmentation using negative minima of curvature has been shown to explain a number
of visual phenomena, including the perception of figure and ground (Baylis and Driver 1994,
1995; Hoffman and Singh 1997); the perception of shape similarity (Hoffman and Richards 1984;
Bertamini and Farrant 2005; Vandekerckhove et al. 2008); object recognition in contour-deleted
images (Biederman 1987; Biederman and Cooper 1991); perception of transparency (Singh
and Hoffman 1998); visual search for shapes (Wolfe and Bennett 1997; Hulleman, te Winkel
and Boselie 2000; Xu and Singh 2002); the visual estimation of the ‘centre’ of a two-part shape
(Denisova, Singh, and Kowler 2006); the visual estimation of the orientation of a two-part shape
(Cohen and Singh 2006); and the allocation of visual attention to multi-part objects (Vecera,
Behrmann, and Filapek 2001; Barenholtz and Feldman 2003).
Although the minima rule provides an important cue for part segmentation, it is not suf-
ficient to divide a shape into parts—which of course requires segmenting the interior region
of a shape, not simply its bounding contour. Specifically, although the minima rule provides
a number of candidate part boundaries (namely, the negative minima of curvature), it does
not indicate how these boundaries should be paired to form part cuts that segment the shape.
Furthermore, even in shapes containing exactly two negative minima, simply connecting these
two minima does not necessarily yield intuitive part segmentations (see e.g. Singh, Seyranian,
and Hoffman 1999; Singh and Hoffman 2001 for examples). The basic limitation of the minima
rule stems from the fact that localizing negative minima of curvature involves only the local
geometry of the bounding contour of the shape, but not the nonlocal geometry of its interior
region (see ‘Interactions between Contour and Region Geometry’ for more on this important

7  This does not mean, of course, that high-level cognitive factors do not also exert an influence when present;
they clearly do (see e.g. De Winter and Wagemans 2006). The point is simply that cognitive factors are not
necessary for part segmentation; low-level geometry-driven mechanisms of part segmentation can and do
operate in their absence.

distinction). Because of the contributions of such nonlocal region-based factors, it is possible to have negative minima on a shape that do not correspond to perceptual part boundaries (Figure 12.5a) and, conversely, to have perceptual part boundaries that do not correspond to negative minima (Figure 12.5b).
In order to address such limitations, researchers have proposed a number of additional geo-
metric factors for segmenting objects into parts: limbs and necks (Siddiqi et al. 1996), convexity
(Latecki and Lakamper 1999; Rosin 2000), a preference for shorter cuts (Singh et al. 1999), local
symmetry, good continuation (Singh and Hoffman 2001), as well as cognitive factors based on
object knowledge (De Winter and Wagemans 2006). And each of these factors has indeed been
shown to play a role in part segmentation. However, with a large number of such factors (in
addition to the minima rule), it becomes increasingly difficult to model the various complex
interactions between them—the way in which they cooperate and compete with each other in
various geometric contexts—and therefore to have a unifying theory of part segmentation.
A different approach to part segmentation is to use an axial, or skeleton-based, representation of
the interior region of a shape in order to segment it into parts. Specifically, each axial branch of the
shape skeleton can be used to identify a natural part of the shape (see Figure 12.5c)—assuming, of
course, that the skeleton-computation procedure can yield a one-to-one correspondence between
parts and axial branches. The desirability of such a correspondence was in fact articulated in Blum’s
original papers that introduced his Medial-Axis Transform (MAT) as a representation of animal


Fig. 12.5  Two examples of failure of the minima rule: (a) A negative minimum that does not correspond
to a part boundary; and (b) a part boundary that does not correspond to a negative minimum. These
failures arise because the minima rule uses only local contour geometry, not region-based geometry.
(c) A different approach to part segmentation involves establishing a one-to-one correspondence
between axial branches and parts. Such a correspondence is achieved by a Bayesian approach to skeleton
computation.
Data from Jacob Feldman and Manish Singh, Bayesian estimation of the shape skeleton, Proceedings of the
National Academy of Sciences of the United States of America 103(47), pp. 18014–18019, doi: 10.1073/
pnas.0608811103, 2006.

and plant morphology (e.g. Blum 1973).8 However, as recognized subsequently by Blum and Nagel
(1978; see their Figure 2), the MAT does not achieve this one-to-one correspondence. Although
modern techniques for computing the medial axis and related transforms have become increas-
ingly sophisticated, they nevertheless largely inherit the intrinsic limitations of the MAT—which
stem from the basic conception of skeleton computation as a deterministic process involving the
application of a fixed geometric ‘transform’ to any given shape. Specifically, a geometric-transform
approach does not attempt to separate the shape ‘signal’ from any contributions of noise. Every
feature along the contour is effectively treated as being ‘intrinsic’ to the shape. One consequence of
this is a high degree of sensitivity of the skeleton to noise, such that the smallest perturbation to the
contour can dramatically alter the branching topology of the shape skeleton.
In order to address these concerns, Feldman and Singh (2006) used an inverse-probability
approach to estimate the skeleton that ‘best explains’ a given shape. The key idea in this approach
is to treat object shapes as resulting from a combination of generative factors and noise. The skel-
etal shape representation must then model the generative (or ‘intrinsic’) factors, while factoring
out the noise. Specifically, shapes are assumed to ‘grow’ from a skeleton via a stochastic generative
process. The estimated skeleton of a given shape is then one’s best inference of the skeleton that
generated it. Skeletons with more branches, and more highly curved branches, can of course pro-
vide a better fit to the shape (i.e. lead to a higher likelihood), but they are also penalized for their
added complexity (i.e. they have a lower prior). Thus one’s ‘best’ estimate of the skeleton involves
a Bayesian trade-off between fit to the shape and the complexity of the skeleton.
This trade-off leads to a pruning criterion for ‘spurious’ branches of the shape skeleton: a candi-
date axial branch is included in the final shape skeleton only if it improves the fit to the shape suf-
ficiently to warrant the increase in skeletal complexity that it entails. More precisely, the posterior
of the skeleton that includes the test branch must be larger than the posterior of the skeleton that
excludes it (recall that the posterior includes both the contribution of the fit to the shape, via the
likelihood term, as well as of skeleton complexity, via the prior). Axial branches that do not meet
this criterion are effectively treated as ‘noise’ and pruned. As a result, this probabilistic computa-
tion is able to establish a one-to-one correspondence between axial branches and perceptual parts
(see Figure 12.5c for an example). Importantly, it can predict both the successes of the minima
rule (cases where negative minima are perceived as part boundaries) and its failures (cases where
negative minima are not perceived as part boundaries, or where part boundaries do not corre-
spond to negative minima; recall Figures 12.5a and 12.5b)—despite the fact that in this approach
contour curvature is never explicitly computed. Thus, it yields a single axial branch for the curved
shape in Figure 12.5a; but a skeleton with two axial branches for the shape in Figure 12.5b. Indeed,
the contributions of other known factors influencing part segmentation can all be understood in
terms of this more fundamental process of probabilistic estimation of the shape skeleton, indicat-
ing that this may provide a unifying theory of part segmentation. See Singh, Feldman, and Froyen
(in preparation) and Feldman et al. (2013) for more on this probabilistic approach to skeletons and
parts, and its application to various visual problems.
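The branch-pruning criterion can be illustrated with a toy version of the posterior comparison. The cost constants and error terms below are stand-ins for the actual likelihood and complexity-prior terms of Feldman and Singh (2006), chosen only to make the trade-off concrete.

```python
def log_posterior(fit_error, n_branches, branch_turning=0.0,
                  sigma=1.0, branch_cost=5.0, turn_cost=2.0):
    """Toy unnormalized log posterior for a candidate skeleton.

    Log-likelihood: a Gaussian error model penalizing misfit between the
    actual contour and the contour 'grown' from the skeleton.
    Log-prior: penalizes complexity, i.e. the number of axial branches
    and how much they curve. All constants here are illustrative.
    """
    log_lik = -fit_error / (2.0 * sigma ** 2)
    log_prior = -(branch_cost * n_branches + turn_cost * branch_turning)
    return log_lik + log_prior

def keep_branch(err_without, err_with, n_branches, extra_turning=0.0):
    """Include a candidate branch only if the skeleton containing it has
    the higher posterior, i.e. if the improvement in fit outweighs the
    added complexity (the pruning criterion described in the text)."""
    return (log_posterior(err_with, n_branches + 1, extra_turning)
            > log_posterior(err_without, n_branches))

# A branch that explains a genuine part (big drop in fit error) survives;
# one that merely chases contour noise (small drop) is pruned:
assert keep_branch(err_without=30.0, err_with=5.0, n_branches=1)
assert not keep_branch(err_without=30.0, err_with=28.0, n_branches=1)
```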

Interactions between contour and region geometry


The Gestaltists noted early on that a closed contour is perceptually much more than an open
one (Koffka 1935). This claim has been corroborated in a number of experimental contexts

8  In the MAT conception, a shape is viewed as the union of maximally inscribed circles, and its skeleton—the
MAT—is taken to be the locus of the centres of these circles.

(e.g. Elder and Zucker 1993; Kovacs and Julesz 1993; Garrigan 2012). However, because closed
contours automatically define an enclosed region, it is less clear whether this advantage of
closure obtains at the level of contour geometry (see Tversky, Geisler, and Perry 2004), or at
the level of region-based geometry, i.e. the geometry of the region enclosed by the contour.
We have seen in the context of part segmentation that there is more to the representation of
a shape than simply the geometry of its bounding contour. To motivate the distinction between
contour geometry and region (or surface) geometry further, consider the simple shape shown in
Figure 12.6a. This shape may be conceptualized in two different ways:
1 It could be viewed as a rubber band lying on a table (the ‘rubber-band representation’).
Mathematically, we would define it as a closed one-dimensional contour embedded in
two-dimensional space. In this case, a natural way to represent its geometry would be in terms
of some contour property—say, curvature—expressed as a function of arc length (resulting in
a curvature plot such as in Figure 12.6b). The relevant notions of distance and neighbourhood
relations would then also be defined along the contour. As a result, although points A and B
on the shape are close to each other in the Euclidean plane, they would not be considered
‘neighbouring’ points because they are quite far from each other when distances are measured
along the contour.
2 Alternatively, it could be viewed as a piece of cardboard cut out into a particular shape (the
‘cardboard-cutout representation’). Mathematically, we may define it as a connected and compact
two-dimensional subset of the Euclidean plane (namely, the region enclosed by the contour).
Under this conceptualization, points A and B on the shape would indeed be considered quite
close to each other (because the intervening region is now also part of the shape).
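The two conceptualizations assign very different distances to the same pair of boundary points. A discrete sketch, assuming the contour is sampled as a closed polygon (the 'finger' example is illustrative):

```python
import numpy as np

def contour_vs_euclidean_distance(pts, i, j):
    """Distance between vertices i and j of a closed polyline under the
    two conceptualizations in the text: straight-line distance in the
    plane vs arc length measured along the contour (taking the shorter
    of the two routes around)."""
    n = len(pts)
    edges = np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)
    i, j = sorted((i % n, j % n))
    one_way = edges[i:j].sum()                   # arc length from i forward to j
    along = min(one_way, edges.sum() - one_way)  # shorter route around
    straight = np.linalg.norm(pts[i] - pts[j])
    return straight, along

# A long thin 'finger' (10 x 1 rectangle, with the midpoints of its long
# sides as vertices): the two midpoints are 1 unit apart in the plane,
# with shape material between them, but 11 units apart along the contour.
finger = np.array([[0., 0.], [5., 0.], [10., 0.],
                   [10., 1.], [5., 1.], [0., 1.]])
straight, along = contour_vs_euclidean_distance(finger, 1, 4)
assert straight == 1.0 and along == 11.0
```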

Fig. 12.6  Illustrating the limitations of a contour-based representation of shape. (a) Although the
two points A and B are very close to each other on the shape, they are very distant on the curvature
plot of its bounding contour, as shown in (b). (c) Similarly, although the two highlighted sections of
the contour belong to the same ‘bend’ in the shape, this fact is not reflected in any obvious way in
the curvature plot in (d).

The distinction between region-based and contour-based notions of shape has a number of
other implications as well. In Figure 12.6c, for example, the two highlighted sections of the con-
tour belong to the same ‘bend’ in the shape. A  purely contour-based representation, however,
would have difficulty in explicitly representing this fact. In the curvature plot in Figure 12.6d,
for instance, the two contour sections do not appear to be related in any obvious way. What a
contour-based representation misses here is the locally parallel structure of the two highlighted
contour segments. It is clear that such structure can be extracted only by examining relationships
across (i.e. on ‘opposite’ sides of) the shape—not just along the contour. For the same reason,
bilateral symmetry or local symmetry in shapes is relatively easy to capture using region-based
representations, but difficult using purely contour-based representations. As an example, even
though the two shapes shown in Figure 12.7 have very similar curvature profiles, their global
region-based geometries are entirely different (Sebastian and Kimia 2005).
We should note that, in the examples above, we assumed that ‘material’ surface was on the inside
of the closed contour—not an unreasonable assumption for closed contours if we know we are
viewing solid, bounded, objects (the alternative would be an extended surface containing a shaped
hole). In the general case, however, the visual system faces the problem of border-ownership or
figure-ground assignment—determining whether the material object or surface lies on one side
of the contour or the other—a problem that is particularly acute when only a small portion of
an object’s outline is visible. An interesting interaction occurs between contour geometry and
region-based geometry in solving this problem, such that the side with the ‘simpler’ region-based
description tends to be assigned figural status. In more formal terms, the relevant geometric fac-
tors have been characterized in terms of part salience (Hoffman and Singh 1997) and stronger
axiality (Froyen, Feldman, and Singh 2010).
A natural way to capture region-based geometry is in terms of skeletal, or axial, representations
(introduced briefly in ‘Part-Based Representations of Shape’)—compact ‘stick-figure’ representations that capture essential aspects of a shape’s morphology (see, e.g., Kimia 2003). A well-known figure by Marr and Nishihara (1978) shows 3D models of various animals made out of pipe cleaners.
A striking aspect of these models is how easily they are recognized as specific animals, despite the
absence of surface geometry—or indeed any surface characteristics. The demonstration suggests
that the axial information preserved in these pipe-cleaner models is an important component of
human shape representation. It should be borne in mind, however, that a skeletal representation
actually includes not just an estimate of the shape’s axes (which are shown in Marr and Nishihara’s
pipe-cleaner models), but also an estimate of the shape’s ‘width’ at each point on each axis (which
is not). In Blum’s MAT, for instance, this local ‘width’ is captured by the size of the maximally

Fig. 12.7  Although the two shapes have similar curvature profiles—differing only in the presence of a
zero-curvature segment in the shape on the right—their region-based geometries are entirely different.
Example based on Sebastian and Kimia (2005).
Adapted from Signal Processing, 85(2), Thomas B. Sebastian and Benjamin B. Kimia, Curves vs. skeletons in object
recognition, pp. 247–63, Copyright © 2005, with permission from Elsevier.


Fig. 12.8  Illustrating the distinction between contour and region (or surface) geometry. The same
contour segment, visible through an aperture in (a), could belong to surfaces with very different
geometries. First, the contour segment could correspond to a protuberance on the shape, as in (b),
or to an indentation, as in (c). Second, the curvature of the contour could arise due to variation in
the width of the shape about a straight axis (as in (b) and (c)), or due to curvature of the axis itself,
with the local width function being constant (as in (d) and (e)).

inscribed circle at any given point. In Feldman and Singh’s (2006) Bayesian skeleton model, it
is approximately twice the length of the ‘ribs’ along which the shape is assumed to have ‘grown’
from the axis. Each such measure of local width of the shape implicitly defines a point-to-point
correspondence across the shape. In other words, it specifies for any given point on the shape’s
bounding contour which point on the ‘opposite’ side of the shape is locally symmetric to it.9
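For a polygonal shape, Blum's local 'width' at an axis point can be computed directly as twice the radius of the maximally inscribed circle centred there, i.e. twice the minimum distance to the boundary. A minimal sketch (the rectangle example is illustrative):

```python
import numpy as np

def inscribed_radius(pts, c):
    """Radius of the largest circle centred at interior point c that fits
    inside the polygon with vertices pts: the minimum distance from c to
    the boundary. In Blum's MAT, twice this value at a skeleton point is
    the local 'width' of the shape there, and the circle's points of
    contact give the point-to-point correspondence across the shape."""
    r = np.inf
    n = len(pts)
    for i in range(n):
        a, b = pts[i], pts[(i + 1) % n]
        ab = b - a
        # Closest point to c on the edge segment a-b:
        t = np.clip(np.dot(c - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        r = min(r, np.linalg.norm(c - (a + t * ab)))
    return r

# For a 10 x 2 rectangle, a point on the midline has local width 2:
rect = np.array([[0., 0.], [10., 0.], [10., 2.], [0., 2.]])
assert 2.0 * inscribed_radius(rect, np.array([5., 1.])) == 2.0
```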
What are the perceptual implications of the difference between contour-based geometry and
region-based geometry? Consider the local contour segment in Figure 12.8a, shown through an
aperture. The same contour segment could belong to shapes with very different region-based geom-
etries. First, the contour segment could correspond either to a convex protuberance on the shape, or
to a concave indentation (Figures 12.8b vs. 12.8c). This distinction is based simply on a figure-ground
reversal (or change in border ownership)—whether the shape lies either on one, or the other, side
of the contour. This has been shown to be an important factor in predicting perceptual grouping in
the context of both amodal (Liu, Jacobs, and Basri 1999) and modal (Kogo et al. 2010) completion.
The second distinction we consider, however, does not depend on a figure-ground rever-
sal: assuming a locally convex region (say), the curvature on the contour could arise either from
variation in the width of the shape about a straight axis (as in Figures 12.8b and 12.8c), or from
curvature of the axis itself, with the local width of the shape being constant (Figures 12.8d and
12.8e). It is clear that these two cases actually represent two extremes of a continuum—where all
of the contour curvature can be attributed entirely to either the width function alone, or to axis
curvature alone. A continuous family of intermediate cases is of course possible—where the con-
tour’s curvature arises partly due to the curvature of the shape’s axis, and partly due to variations
in the shape’s width (Siddiqi et al. 2001; Fulvio and Singh 2006).
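This continuum is easy to exhibit with a generative axis-plus-width sketch: the same kind of curved bounding contour can be produced either by a straight axis with a varying width function, or by a curved axis with constant width. The rib construction below is an illustrative simplification of such skeletal growth models, not the stimulus-generation code of the studies cited.

```python
import numpy as np

def sides_from_axis(axis_pts, half_width):
    """Generate the two bounding contours of a shape from a sampled axis
    and a local half-width at each sample, by extending 'ribs'
    perpendicular to the axis on both sides."""
    d = np.gradient(axis_pts, axis=0)
    t = d / np.linalg.norm(d, axis=1, keepdims=True)  # unit tangents
    normal = np.column_stack([-t[:, 1], t[:, 0]])     # unit normals
    w = np.asarray(half_width)[:, None]
    return axis_pts + w * normal, axis_pts - w * normal

s = np.linspace(0.0, np.pi, 50)
# Curved sides from a *straight* axis with varying width
# (the 'diamond'/'bowtie' cases):
straight_axis = np.column_stack([s, np.zeros_like(s)])
side_a, _ = sides_from_axis(straight_axis, 0.5 + 0.3 * np.sin(s))
# Curved sides from a *curved* axis (a circular arc) with constant width
# (the 'bending tube' cases):
curved_axis = np.column_stack([np.cos(s), np.sin(s)])
side_b, _ = sides_from_axis(curved_axis, np.full_like(s, 0.2))
```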
In order to examine the perceptual consequences of such region-based differences in shape, Fulvio
and Singh (2006) examined visual shape interpolation in stereoscopic illusory-contour displays. Their
displays varied systematically in their region-based geometry, while preserving the contour-based
geometry of the inducing edges (see Figure 12.9). Using two different experimental methods, they
probed the perceived shape of the illusory contours in the ‘missing’ region. The results exhibited large
influences of region-based geometry on perceived illusory-contour shape. First, illusory contours

9  One way to think about local symmetry is as follows: imagine placing a mirror at a point along the shape’s axis, with its orientation matching the local orientation of the axis. If the axis is defined appropriately, this mirror will reflect the tangent of the contour on one side of the shape to the tangent of the contour on the opposite side of the shape (Leyton 1989).


Fig. 12.9  (a) Stereoscopic stimuli used by Fulvio and Singh (2006) to study the influence of region-based
geometry on illusory-contour shape. In these stimuli, region-based geometry was manipulated while
keeping local contour geometry fixed (as in Figure 12.8). A schematic of the binocular percept is shown
in (b). The results showed significant differences in perceived illusory-contour shape as a function of
region-based geometry.
Reprinted from Acta Psychologica, 123 (1–2), Jacqueline M. Fulvio and Manish Singh, Surface geometry influences
the shape of illusory contours, pp. 20–40, Copyright © 2006 with permission from Elsevier.

enclosing locally concave shapes were found to be systematically more angular (closer to the inter-
section point of the linear extrapolations of the two inducers) than those enclosing locally convex
shapes. This influence of local convexity is consistent with results obtained with partly occluded
shapes (Fantoni, Bertamini, and Gerbino 2005). Beyond the influence of local sign of curvature,
however, this influence of local convexity also exhibited an interaction with two skeleton-based vari-
ables: shape width and axis curvature. Specifically, the influence of local convexity on illusory-contour
shape was found to be: (1) greater for narrower shapes than for wider ones; and (2) greater for shapes
with a straight axis and symmetric contours (‘diamonds’ and ‘bowties’; Figures 12.8b and 12.8c) than
for shapes with a curved axis and locally parallel contours (‘bending tubes’; see Figures 12.8d and
12.8e). These results indicate that, even at the level of illusory ‘contours’, an important role is played by
nonlocal region-based geometry involving skeleton-based parameters.
The influence of region-based geometry manifests itself in object recognition and classification
as well. Sebastian and Kimia (2005) compared the shape-matching performance of two
algorithms—one based on matching the shapes’ bounding contours, the other based on matching
axis-based graphs derived from them. They found that when small variations were introduced
to the shapes (e.g. involving partial
occlusion, rearrangement of parts, or addition or deletion of a part), the contour-based matching
scheme produced many spurious matches, leading to a substantial deterioration in performance.
By contrast, the axis-based matching scheme was highly robust to such variations. They concluded
that, even though axis-based representations are more complex and take more time to
compute, the additional cost is well worth it.
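As a toy illustration of why a local edit hurts contour matching more than axis-graph matching—deliberately much simpler than Sebastian and Kimia’s actual algorithm, with representations and distance measures that are illustrative assumptions only:

```python
from collections import Counter

def contour_mismatch(a, b):
    """Naive pointwise distance between two curvature profiles
    (zero-padded to equal length). A local insertion shifts every
    later sample, so the mismatch accumulates along the whole tail."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))

def branch_mismatch(a, b):
    """Axis-based view: a shape reduced to a multiset of branch lengths.
    Count the branches present in one shape but not the other."""
    ca, cb = Counter(a), Counter(b)
    return sum(((ca - cb) + (cb - ca)).values())

shape = [1, 2, 3, 4, 5, 6]                 # curvature samples along the contour
variant = [1, 2, 9, 9, 3, 4, 5, 6]         # one part inserted mid-contour
print(contour_mismatch(shape, variant))    # large: the whole tail misaligns

branches = [3, 3, 5]                       # branch lengths of the axis graph
branches_variant = [3, 3, 5, 2]            # the added part contributes one branch
print(branch_mismatch(branches, branches_variant))  # small: one branch differs
```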
Do human observers make use of parameters of the shape skeleton in classifying shapes?
Different classes of shape—e.g. animals and leaves—differ not only in their means along various
skeleton-based parameters (e.g. number of branches, axis curvature, etc.), but also in their dis-
tributional forms. For example, the distribution of number of branches tends to be Gaussian for
animals with a mean of around 5 (reflecting the typical number of body parts in an animal body
plan), whereas the distribution tends to be exponential for leaves (consistent with a recursively
[Figure 12.10: panel (a) plots the probability distribution of the number of axial branches (0–25) for animals (n = 424) and leaves (n = 341); panel (b) shows morphed shapes at 70, 50, and 30 per cent mixing proportions.]
Fig. 12.10  Different categories of shape, such as animals and leaves, differ in the statistics of various
skeleton-based parameters. (a) Shows the distribution of number of axial branches computed from
databases of animal and leaf shapes. Note that the two categories differ in both the mean and the
distributional form of this variable. (b) To address the question of whether human observers rely on
skeleton-based statistics to classify shapes, Wilder, Feldman, and Singh (2011) created morphed shapes
by mixing animals and leaves in different proportions. Subjects were asked whether each morphed
shape looked ‘more like’ an animal or leaf. The results showed that a naive Bayesian classifier based on
the distribution of a small number of axis-based parameters provided an excellent predictor of human
shape classification.
Reprinted from Cognition 119(3), John Wilder, Jacob Feldman, and Manish Singh, Superordinate shape
classification using natural shape statistics, pp. 325–40, Copyright © 2011 with permission from Elsevier.
branching process); see Figure 12.10a. Do human subjects rely on such statistical differences in
skeletal parameters when performing shape classification? Wilder, Feldman, and Singh (2011)
used morphed shapes created by combining animal and leaf shapes in different proportions (e.g.
60% animal and 40% leaf; see Figure 12.10b). Subjects indicated whether each shape looked more
like an animal or more like a leaf. (The morphing proportions ranged between 30% and 70% so
the shapes were typically not recognizable as any particular animal or leaf.) They then compared
subjects’ performance with that of a naive Bayesian classifier based on a small number of skeletal
parameters, and found a close match between the two. By contrast, classification based only on
contour-based variables (such as contour curvature) and other traditional shape measures (such
as compactness and aspect ratio) did not provide good predictors of human classification perfor-
mance. These comparisons provide strong evidence for the use of a skeleton-based representation
of shape by the human visual system. More recent work also provides evidence for the role of
region-based representation of shape in contour-detection tasks, i.e. detecting a closed contour in
background noise (Wilder, Singh, and Feldman 2013).
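The kind of classifier described above can be sketched as follows. This is a minimal single-parameter version; the Gaussian and exponential parameter values (and the function names) are illustrative assumptions, not the values fitted by Wilder, Feldman, and Singh (2011):

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def exponential_pdf(x, rate):
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def p_animal(n_branches, prior_animal=0.5):
    """Posterior probability that a shape is an animal rather than a leaf,
    given one skeletal parameter: the number of axial branches. Animals are
    modelled with a Gaussian branch count (mean around 5); leaves with an
    exponential one, as suggested by Figure 12.10a."""
    like_animal = gaussian_pdf(n_branches, mean=5.0, sd=2.0)   # assumed sd
    like_leaf = exponential_pdf(n_branches, rate=0.5)          # assumed rate
    post_animal = like_animal * prior_animal
    evidence = post_animal + like_leaf * (1.0 - prior_animal)
    return post_animal / evidence

# A shape with about five branches is classified as an animal;
# a shape with a single branch is classified as a leaf.
```

A naive Bayesian classifier over several skeletal parameters would simply multiply such class-conditional likelihoods across parameters before normalizing.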

Conclusions
Contours constitute an essential source of information about shape, and, along contours, points
with the greatest magnitude of curvature tend to be most informative. This concentration of
information is closely tied to generative models of contours assumed by the visual system—i.e. its
internal models about how contours tend to be generated (and hence its expectations about how
contours tend to behave locally). Therefore, visual expectations about contour continuity (‘good
continuation’) and the information content of contours are naturally viewed as two sides of the
same coin. In going from open to closed contours—such as the outlines of objects—the influence
of sign of curvature (convex vs concave) becomes critical, with concave sections of a contour
carrying more information, and playing a special role in part segmentation. The visual system
represents complex shapes by automatically segmenting them into simpler parts—‘simpler’
because these parts are closer to being convex (they contain less negative curvature). One type
of curvature extrema—negative minima of curvature—provides a particularly important cue
for part segmentation. However, sign of curvature (local convexity) and curvature extrema are
entirely contour-based notions, and this fact likely explains why the minima rule cannot fully
predict part segmentation. The visual system employs not only a contour-based representation of
shape, but also a region-based one—namely, a representation of the interior region enclosed by
the contour—making explicit properties such as the local width of the shape, the curvature of its
axis, and more generally, locally parallel and locally symmetric structure. Psychophysical results
from a variety of domains—shape classification, amodal and modal grouping, visual shape com-
pletion—provide clear evidence for the representation of region geometry based on skeleton or
axis models. Even at the level of so-called ‘illusory contours’, nonlocal region-based geometry
exerts a strong influence.
We conclude that, as far as the human visual representation of shape is concerned, contour
geometry cannot ultimately be viewed in isolation, but must be considered in tandem with
region-based geometry.

References
Attneave, F. (1954). ‘Some Informational Aspects of Visual Perception’. Psychological Review 61: 183–193.
Barenholtz, E., E. H. Cohen, J. Feldman, and M. Singh (2003). ‘Detection of Change in Shape: An
Advantage for Concavities’. Cognition 89(1): 1–9.
Visual Representation of Contour and Shape 253

Barenholtz, E. and J. Feldman (2003). ‘Visual Comparisons within and between Object Parts: Evidence for
a Single-part Superiority Effect’. Vision Research 43(15): 1655–1666.
Baylis, G. C. and J. Driver (1994). ‘Parallel Computation of Symmetry but not Repetition in Single Visual
Objects’. Visual Cognition 1: 377–400.
Baylis, G. C. and J. Driver (1995). ‘Obligatory Edge Assignment in Vision: The Role of Figure and Part
Segmentation in Symmetry Detection’. Journal of Experimental Psychology: Human Perception and
Performance 21(6): 1323–1342.
Bertamini, M. and T. Farrant (2005). ‘Detection of Change in Shape and its Relation to Part Structure’. Acta
Psychologica 120: 35–54.
Biederman, I. (1987). ‘Recognition by Components: A Theory of Human Image Understanding’.
Psychological Review 94: 115–147.
Biederman, I. and G. Ju (1988). ‘Surface vs. Edge-Based Determinants of Visual Recognition’. Cognitive
Psychology 20: 38–64.
Biederman, I. and E. E. Cooper (1991). ‘Priming Contour-Deleted Images: Evidence for Intermediate
Representations in Visual Object Recognition’. Cognitive Psychology 23: 393–419.
Blum, H. (1973). ‘Biological Shape and Visual Science (Part I)’. Journal of Theoretical Biology 38: 205–287.
Blum, H. and R. N. Nagel (1978). ‘Shape Description Using Weighted Symmetric Axis Features’. Pattern
Recognition 10: 167–180.
Box, G. E. P. and G. C. Tiao (1992). Bayesian Inference in Statistical Analysis. New York: Wiley.
Chen, X., A. Golovinskiy, and T. A. Funkhouser (2009). ‘A Benchmark for 3D Mesh Segmentation’. ACM
Transactions on Graphics 28(3): 1–12.
Clottes, J. (2003). Chauvet Cave: The Art of Earliest Times. Translated by Paul G. Bahn. Salt Lake
City: University of Utah Press.
Cohen, E. H., E. Barenholtz, M. Singh, and J. Feldman (2005). ‘What Change Detection Tells Us about the
Visual Representation of Shape’. Journal of Vision 5(4): 313–321.
Cohen, E. H. and M. Singh (2006). ‘Perceived Orientation of Complex Shape Reflects Graded Part
Decomposition’. Journal of Vision 6(8): 805–821.
Cohen, E. H. and M. Singh (2007). ‘Geometric Determinants of Shape Segmentation: Tests Using Segment
Identification’. Vision Research 47: 2825–2840.
Cole, F., K. Sanik, D. DeCarlo, A. Finkelstein, T. Funkhouser, S. Rusinkiewicz, and M. Singh (2009).
‘How Well Do Line Drawings Depict Shape?’. ACM Transactions on Graphics (Proc. SIGGRAPH) 28(3).
De Winter, J. and J. Wagemans (2006). ‘Segmentation of Object Outlines into Parts: A Large-scale
Integrative Study’. Cognition 99: 275–325.
De Winter, J. and J. Wagemans (2008a). ‘The Awakening of Attneave’s Sleeping Cat: Identification of
Everyday Objects on the Basis of Straight-line Versions of Outlines’. Perception 37: 245–270.
De Winter, J. and J. Wagemans (2008b). ‘Perceptual Saliency of Points along the Contour of Everyday
Objects: A Large-scale Study’. Perception and Psychophysics 70(1): 50–64.
Denisova, K., M. Singh, and E. Kowler (2006). ‘The Role of Part Structure in the Perceptual Localization of
a Shape’. Perception 35: 1073–1087.
Elder, J. H. and S. W. Zucker (1993). ‘Contour Closure and the Perception of Shape’. Vision Research
33(7): 981–991.
Elder, J. H. and R. M. Goldberg (2002). ‘Ecological Statistics of Gestalt Laws for the Perceptual
Organization of Contours’. Journal of Vision 2(4): 324–353.
Fantoni, C. and W. Gerbino (2003). ‘Contour Interpolation by Vector-field Combination’. Journal of Vision
3(4): 281–303.
Fantoni, C., M. Bertamini, and W. Gerbino (2005). ‘Contour Curvature Polarity and Surface Interpolation’.
Vision Research 45: 1047–1062.
Feldman, J. (1997). ‘Curvilinearity, Covariance, and Regularity in Perceptual Groups’. Vision Research
37(20): 2835–2848.
Feldman, J. (2001). ‘Bayesian Contour Integration’. Perception and Psychophysics 63(7): 1171–1182.
Feldman, J. and M. Singh (2005). ‘Information along Contours and Object Boundaries’. Psychological
Review 112(1): 243–252.
Feldman, J. and M. Singh (2006). ‘Bayesian Estimation of the Shape Skeleton’. Proceedings of the National
Academy of Sciences 103(47): 18014–18019.
Feldman, J., M. Singh, E. Briscoe, V. Froyen, S. Kim, and J. Wilder (2013). ‘An Integrated Bayesian
Approach to Shape Representation and Perceptual Organization’. In Shape Perception in Human and
Computer Vision: An Interdisciplinary Perspective, edited by S. Dickinson and Z. Pizlo, pp. 55–70.
London: Springer.
Field, D. J., A. Hayes, and R. F. Hess (1993). ‘Contour Integration by the Human Visual System: Evidence
for a Local “Association Field”’. Vision Research 33(2): 173–193.
Froyen, V., J. Feldman, and M. Singh (2010). ‘A Bayesian Framework for Figure-ground Interpretation’.
In Advances in Neural Information Processing Systems, edited by J. Lafferty, C. K. I. Williams,
J. Shawe-Taylor, R. Zemel, and A. Culotta, pp. 631–639. La Jolla, CA: The NIPS Foundation.
Fulvio, J. M. and M. Singh (2006). ‘Surface Geometry Influences the Shape of Illusory Contours’. Acta
Psychologica 123: 20–40.
Fulvio, J. M., M Singh, and L. T. Maloney (2008). ‘Precision and Consistency of Contour Interpolation’.
Vision Research 48: 831–849.
Fulvio, J. M., M. Singh, and L. T. Maloney (2009). ‘An Experimental Criterion for Consistency in
Interpolation of Partially-occluded Contours’. Journal of Vision 9(4): 5, 1–19.
Garrigan, P. (2012). ‘The Effect of Contour Closure on Shape Recognition’. Perception 41(2): 221–235.
Geisler, W. S., J. S. Perry, B. J. Super, and D. P. Gallogly (2001). ‘Edge Co-occurrence in Natural Images
Predicts Contour Grouping Performance’. Vision Research 41: 711–724.
Geisler, W. S. and J. S. Perry (2009). ‘Contour Statistics in Natural Images: Grouping across Occlusions’.
Visual Neuroscience 26: 109–121.
Hoffman, D. D. and W. A. Richards (1984). ‘Parts of Recognition’. Cognition 18: 65–96.
Hoffman, D. D. and M. Singh (1997). ‘Salience of Visual Parts’. Cognition 63: 29–78.
Hulleman, J., W. te Winkel, and F. Boselie (2000). ‘Concavities as Basic Features in Visual Search: Evidence
from Search Asymmetries’. Perception and Psychophysics 62: 162–174.
Hume, D. (1748/1993). An Enquiry concerning Human Understanding. Indianapolis, IN: Hackett.
Kellman, P. and T. Shipley (1991). ‘A Theory of Visual Interpolation in Object Perception’. Cognitive
Psychology 23: 141–221.
Kennedy, J. M. and R. Domander (1985). ‘Shape and Contour: The Points of Maximum Change Are Least
Useful for Recognition’. Perception 14: 367–370.
Kimia, B. (2003). ‘On the Role of Medial Geometry in Human Vision’. Journal of Physiology 97: 155–190.
Koenderink, J. J. and A. van Doorn (1982). ‘The Shape of Smooth Objects and the Way Contours End’.
Perception 11: 129–137.
Koenderink, J. J. (1984). ‘What Does the Occluding Contour Tell us about Solid Shape?’ Perception
13: 321–330.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace and World.
Kogo, N., C. Strecha, L. Van Gool, and J. Wagemans (2010). ‘Surface Construction by a 2-D
Differentiation-Integration Process: A Neurocomputational Model for Perceived Border Ownership,
Depth, and Lightness in Kanizsa Figures’. Psychological Review 117(2): 406–439.
Kovacs, I. and B. Julesz (1993). ‘A Closed Curve Is Much More than an Incomplete One: Effect
of Closure in Figure-ground Segmentation’. Proceedings of the National Academy of Sciences
90: 7495–7497.
Latecki, L. and R. Lakamper (1999). ‘Convexity Rule for Shape Decomposition Based on Discrete Contour
Evolution’. Computer Vision and Image Understanding 73: 441–454.
Leyton, M. (1989). ‘Inferring Causal History from Shape’. Cognitive Science 13: 357–387.
Liu, Z., D. Jacobs, and R. Basri (1999). ‘The Role of Convexity in Perceptual Completion: Beyond Good
Continuation’. Vision Research 39: 4244–4257.
Marr, D. and H. K. Nishihara (1978). ‘Representation and Recognition of the Spatial Organization of
Three-dimensional Shapes’. Proceedings of the Royal Society of London B 200: 269–294.
Norman, J. F., F. Phillips, and H. E. Ross (2001). ‘Information Concentration along the Boundary Contours
of Naturally Shaped Solid Objects’. Perception 30: 1285–1294.
Panis, S., J. de Winter, J. Vandekerckhove, and J. Wagemans (2008). ‘Identification of Everyday Objects on
the Basis of Fragmented Versions of Outlines’. Perception 37: 271–289.
Parent, P. and S. W. Zucker (1989). ‘Trace Inference, Curvature Consistency, and Curve Detection’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 11(8): 823–839.
Pasupathy, A. and C. E. Connor (2002). ‘Population Coding of Shape in Area V4’. Nature Neuroscience
5(12): 1332–1338.
Pizlo, Z., M. Salach-Goyska, and A. Rosenfeld (1997). ‘Curve Detection in a Noisy Image’. Vision Research
37(9): 1217–1241.
Richards, W., B. Dawson, and D. Whittington (1986). ‘Encoding Contour Shape by Curvature Extrema’.
Journal of the Optical Society of America A 3: 1483–1491.
Rosin, P. L. (2000). ‘Shape Partitioning by Convexity’. IEEE Transactions on Systems, Man, and Cybernetics,
Part A 30: 202–210.
Sebastian, T. and B. Kimia (2005). ‘Curves vs. Skeletons in Object Recognition’. Signal Processing 85 (2): 247–263.
Siddiqi, K., B. Kimia, A. Tannenbaum, and S. Zucker (2001). ‘On the Psychophysics of the Shape Triangle’.
Vision Research 41(9): 1153–1178.
Siddiqi, K., K. Tresness, and B. Kimia (1996). ‘Parts of Visual Form: Psychophysical Aspects’. Perception
25: 399–424.
Singh, M. and D. D. Hoffman (1998). ‘Part Boundaries Alter the Perception of Transparency’. Psychological
Science 9: 370–378.
Singh, M. and D. D. Hoffman (1999). ‘Completing Visual Contours: The Relationship between Relatability
and Minimizing Inflections’. Perception and Psychophysics 61: 636–660.
Singh, M., G. D. Seyranian, and D. D. Hoffman (1999). ‘Parsing Silhouettes: The Short-cut Rule’. Perception
and Psychophysics 61(4): 636–660.
Singh, M. and D. D. Hoffman (2001). ‘Part-based Representations of Visual Shape and Implications for
Visual Cognition’. In From Fragments to Objects: Segmentation and Grouping in Vision: Advances in
Psychology, edited by T. Shipley and P. Kellman, vol. 130, pp. 401–459. New York: Elsevier.
Singh, M. and J. M. Fulvio (2005). ‘Visual Extrapolation of Contour Geometry’. Proceedings of the National
Academy of Sciences, USA 102(3): 939–944.
Singh, M. and J. M. Fulvio (2007). ‘Bayesian Contour Extrapolation: Geometric Determinants of Good
Continuation’. Vision Research 47: 783–798.
Singh, M. and J. Feldman (2012). ‘Principles of Contour Information: A Response to Lim and Leek (2012)’.
Psychological Review 119(3): 678–683.
Singh, M., J. Feldman, and V. Froyen (in preparation). ‘Unifying Parts and Skeletons: A Bayesian Approach
to Part Segmentation’. In Handbook of Computational Perceptual Organization, edited by S. Gepshtein, L.
T. Maloney, and M. Singh. Oxford: Oxford University Press.
Takeichi, H., H. Nakazawa, I. Murakami, and S. Shimojo (1995). ‘The Theory of the Curvature-constraint
Line for Amodal Completion’. Perception 24: 373–389.
Tversky, T., W. Geisler, and J. Perry (2004). ‘Contour Grouping: Closure Effects are Explained by Good
Continuation and Proximity’. Vision Research 44(24): 2769–2777.
Ullman, S. (1976). ‘Filling-in the Gaps: The Shape of Subjective Contours and a Model for their Generation’.
Biological Cybernetics 25: 1–6.
Vandekerckhove, J., S. Panis, and J. Wagemans (2008). ‘The Concavity Effect is a Compound of Local and
Global Effects’. Perception and Psychophysics 69: 1253–1260.
Vecera, S. P., M. Behrmann, and J. C. Filapek (2001). ‘Attending to the Parts of a Single Object: Part-based
Selection Limitations’. Perception and Psychophysics 63: 308–321.
Walther, D., B. Chai, E. Caddigan, D. Beck, and Li Fei-Fei (2011). ‘Simple Line Drawings Suffice for
Functional MRI Decoding of Natural Scene Categories’. Proceedings of the National Academy of Sciences
of the USA, 108(23): 9661–9666.
Warren, P. A., L. T. Maloney, and M. S. Landy (2002). ‘Interpolating Sampled Contours in 3D: Analyses of
Variability and Bias’. Vision Research 42: 2431–2446.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt II’. Psychologische Forschung 4: 301–350.
Translation published in W. Ellis (1938). A Source Book of Gestalt Psychology. London: Routledge and
Kegan Paul, pp. 71–88.
Wilder, J., J. Feldman, and M. Singh (2011). ‘Superordinate Shape Classification Using Natural Shape
Statistics’. Cognition 119: 325–340.
Wilder, J., M. Singh, and J. Feldman (2013). ‘Detecting Shapes in Noise: The Role of Contour-based and
Region-based Representations’. Poster presented at the Annual Meeting of the Vision Sciences Society
(VSS 2013).
Wolfe, J. M. and S. C. Bennett (1997). ‘Preattentive Object Files: Shapeless Bundles of Basic Features’.
Vision Research 37: 25–43.
Xu, Y. and M. Singh (2002). ‘Early Computation of Part Structure: Evidence from Visual Search’. Perception
and Psychophysics 64: 1039–1054.
Yuille, A. L., F. Fang, P. Schrater, and D. Kersten (2004). ‘Human and Ideal Observers for Detecting
Image Curves’. In Advances in Neural Information Processing Systems, edited by S. Thrun, L. Saul, and
B. Schoelkopf, vol. 16, pp. 59–70. Cambridge, MA: MIT Press.
Section 4

Figure-ground organization
Chapter 13

Low-level and high-level contributions to figure-ground organization
Mary A. Peterson

Background
Investigators of visual perception have yet to find a completely satisfactory answer to the funda-
mental question, ‘How do we segregate a complex scene into individual objects?’. For the most
part we seem to accomplish this task readily, but the apparent ease of object perception can lead
us astray as we try to understand how it is done.
At one level we can describe the segregation of a scene into objects (or ‘figures’) as follows.
When two regions of the visual input share a border, visual processes determine whether one of
them has a definite shape bounded by the shared border. In this case, the shaped region is per-
ceived as the figure (the object) and the border is perceived as its bounding contour. The region on
the opposite side of the border appears to simply continue behind the figure/object; it is perceived
as a shapeless ground to the figure/object at their shared border. This figure–ground interpretation
is a local one; regions can be perceived as grounds along one portion of their border and as figures
along other portions (Hochberg 1980; Peterson 2003a; Kim and Feldman 2009). Note that the
figure appears to be closer to the viewer than the ground at their shared border; thus the border is
perceived as a depth edge. Figure 13.1(A) illustrates the distinction between figures and grounds.
Our understanding of the processes involved in arriving at these percepts has progressed over the
last 100 years, but it remains far from complete.
In attempting to understand how object perception occurs, many theorists have taken figure–
ground assignment to occur at an early stage of processing, one that happens at a low level in the
visual hierarchy before object memories stored at higher levels are accessed and before attention
operates. The assumption is that figures must be defined at this low/early stage in order to provide
a substrate for those higher-level processes. This is the classic view of figure-ground assignment,
and is discussed in the next section ‘The Traditional View of Figure–Ground Perception’. On the
classic view of figure–ground assignment, only properties that can be computed on the image can
influence the first figure assignment; properties that require access to memory may affect later
interpretations but not the first one (Wertheimer 1923/1938). A number of such image-based fac-
tors have been identified; those factors are reviewed in ‘The Traditional View of Figure–Ground
Perception’. Modern research suggests that the classic low-level stage view of figure assignment is
not correct. Instead, research shows that high-level representations of object structure and seman-
tics and subjective factors like attention and intention influence figure assignment. This research
is reviewed in ‘Challenges to the Classic View: High-level Influences on Figure Assignment’. In the
modern approach figure assignment is viewed as resulting from interactions between high and
low levels of the visual hierarchy. In ‘Modern Theoretical Views of Figure–Ground Perception’, we

Fig. 13.1  (a) A black region shares borders with three white regions. It shares borders with two of
these white regions on the bottom and right side. There, the white regions are the near, shaped
entities (the figures)—they depict a cat and a tree—and the black region is perceived as a locally
shapeless ground. The black region shares borders with a third white region on the left and top.
There, the black region is perceived as the shaped entity—a woman—and the white side is perceived
as a locally shapeless ground. (b), (c) Displays with eight alternating black and white regions of equal
area. The black regions are critical regions in that they possess Gestalt configural properties of (local)
convexity (b) and symmetry (c). Participants tend to report that they perceive the critical regions as
figures under conditions where the critical regions are black and white equally often. (d) The black
region is smaller than, and enclosed by, the white region.
This material has been reprinted from Mary A. Peterson, ‘Overlapping partial configuration in object memory: an
alternative solution to classic problems in perception and recognition’, in Mary A. Peterson and Gillian Rhodes
(eds), Perception of Faces, Objects, and Scenes: Analytic and Holistic Processes, p. 270, figure 10.1a © 2003,
Oxford University Press, and has been reproduced by permission of Oxford University Press
(http://ukcatalogue.oup.com/product/9780195313659.do). For permission to reuse this material, please visit
http://www.oup.co.uk/academic/rights/permissions.

discuss these models and review recent evidence consistent with this highly interactive alternative
to the classic view. Finally we give our Conclusion.

The traditional view of figure–ground perception


Early in the twentieth century, the Structuralists and the Gestalt psychologists debated the role
of past experience in organizing the visual input. The Structuralists (e.g., Wundt and Titchener)
argued that past experience was solely responsible for perceptual organization. On this view, one
perceives objects in the present scene because those objects had been seen previously. The Gestalt
psychologists (e.g., Wertheimer and Koffka) raised questions highlighting the weaknesses of the
Structuralist position, such as: How are novel objects perceived? How does one find the appro-
priate memory to use to organize the present display from myriad memories? As an alternative,
the Gestalt psychologists proposed that before memories of past experiences are accessed, the
visual input is organized into figures and grounds based on factors readily apparent in the image.
Subsequently, the figures served as the substrates on which higher-level processes like attention
and memory access operate; the grounds were not analyzed by high-level processes.
To account for figure–ground organization without recourse to past experience in the form of
object memories, the Gestalt psychologists held that there were inborn tendencies to see regions
with certain properties as figures. Those ‘configural’ properties included convexity, symmetry,
small area, and enclosure. In principle, the configural properties identified by the Gestalt psy-
chologists can be calculated on the image without calling upon memory.1 The Gestalt psycholo-
gists and others demonstrated that observers were likely to perceive regions with these classic
properties as figures more often than abutting regions that were concave, asymmetric, larger in
area, and enclosing (e.g., Bahnsen 1928; Rubin 1915/1958; Kanizsa and Gerbino 1976; for review,
see Hochberg 1971; Pomerantz and Kubovy 1986; Peterson 2001).
Results demonstrating the effectiveness of many of the configural properties were obtained in
experiments in which observers viewed stimuli with abutting black and white regions sharing
borders, and reported whether the black region(s) or the white region(s) appeared to be figures.
The regions of one color possessed the property under consideration whereas the regions of the
other color did not, and no other properties known to be relevant to figure-ground perception2
distinguished the two regions. Many sample displays were presented so that the property being
tested was paired with the black and white regions equally often. Figures 13.1(B)–(D) show
sample displays used to test the role of convexity, symmetry, enclosure, and small area. Observers
tended to report perceiving regions with the tested properties as figures on a large proportion of
trials, as much as 90 per cent for convexity (Kanizsa and Gerbino 1976).
The Gestalt psychologists demonstrated that properties such as convexity, symmetry, enclosure,
and small area—properties that could be calculated on the input image and did not seem to
demand past experience—can account for figure assignment, and hence that past experience is not
necessary. These results contradicted the Structuralists’ claim that past experience alone segregates
objects from one another, at least on the assumption that there is an inborn tendency to use the
Gestalt configural properties for figure assignment. The Gestalt view that figure–ground segre-
gation preceded access to object memories took hold. Many theorists still hold the classic view
today (e.g., see Craft et al. 2007 for a recent statement of this view), and it remains quite common
for theorists to conceive of figure–ground segregation as an early process or stage of processing
(e.g., Zhou et al. 2000). But note that evidence indicating that the Gestalt configural properties
are relevant to figure assignment does not entail that past experience is not also relevant. We
discuss evidence showing that past experience plays a role in figure assignment in ‘Challenges
to the Classic View: High-level Influences on Figure Assignment’. First, we review other recently
identified configural properties that can in principle be calculated on the image.

New image-based configural properties


Additional image-based properties relevant to figure assignment have been discovered during the
twentieth century and the early 2000s. These new properties are discussed here and are illustrated
in Figures 13.2(A)–(G).

A note about methods


The investigators who demonstrated the relevance of new image-based properties did so using
a variety of methods, including both the traditional method of showing observers test displays
and asking them to report which region they perceived as figure (direct reports) and new indirect
methods in which observers perform matching tasks or search tasks and experimenters use the

1  They might instead be extracted during an individual’s lifetime from statistical regularities of the environment.
2  At the time, investigators did not know that using displays with multiple regions inflated estimates of the effectiveness of the properties of convexity and symmetry (see Peterson and Salvagio 2008; Mojica and Peterson 2014).
response time (RT) data from these other tasks to infer how observers had organized the test
displays.
One benefit of indirect methods is that they do not require instructions regarding figure
assignment; hence, according to their proponents, they may be less likely to induce certain types of
response biases based on hypotheses about what the experimenter expects (Driver and Baylis
1996; Hulleman and Humphreys 2004; Vecera et al. 2002; for review, see Wagemans et al. 2012;
Peterson and Kimchi 2013). Note, however, that in all cases where indirect measures have been
employed they supported the same conclusions as direct reports. Thus, where indirect meas-
ures have been used they have not uncovered evidence that direct reports were contaminated by
response bias, an important contribution.
Another benefit of indirect measures is that whereas an individual’s reports regarding what
he or she perceives as figure cannot be scored as ‘correct’ or ‘incorrect’, there is a correct answer
on the indirect tasks that are employed; RTs on correct trials can be compared across various
conditions, and the RT differences may provide insight into various aspects of figure–ground
perception. For instance, indirect methods have been enormously useful in attempts to learn
about figure–ground-relevant processing taking place outside of awareness (see ‘Challenges to the
Classic View: High-level Influences on Figure Assignment’).
Despite the benefits of indirect methods, direct measures remain important. To date, only
direct reports allow one to measure the probability that a region with a certain property will be
perceived as figure in a briefly exposed display. Given that configural properties operate proba-
bilistically and their effectiveness is influenced by context (Zhou et al. 2000; Jehee et al. 2007;
Peterson and Salvagio 2008; Goldreich and Peterson 2012), probability measures have been
very useful in elucidating the mechanisms of figure assignment. Moreover, although indirect
methods sometimes assay perceived organization, at other times they convey information
about the process of arriving at a percept rather than the percept itself. For instance, rather
than using response times to index which region is perceived as the figure, Peterson and
Lampignano (2003) and Peterson and Enns (2005) used them to assay competition for figural
status between cues/properties that favor assigning the figure on opposite sides of a border.
Observers were aware of the figures they perceived, but they were unaware of the competition
that led to their percepts. Thus, in this case, indirect methods informed about process rather
than about the percept. In-depth discussions of the methods can be found elsewhere (e.g.,
Wagemans et al. 2012; Peterson and Kimchi 2013). In the remainder of this section we simply
indicate whether direct or indirect methods were used in experiments supporting a role for var-
ious properties in figure assignment. In ‘Challenges to the Classic View: High-level Influences
on Figure Assignment’ and ‘Modern Theoretical Views of Figure–Ground Perception’ we also
point out how indirect measures have been useful in attempts to understand the mechanisms
of figure assignment.

New static and dynamic image-based properties


The new image-based properties include both static and dynamic properties. We review new
static properties first, and then new dynamic properties.

Part salience
Using direct reports, Hoffman and Singh (1997) showed that the figure is more likely to be per-
ceived on the side of a border where the parts are more ‘salient’. Part salience (Figure 13.2A) is
determined by a number of geometric factors, including the curvature (‘sharpness’) of the part’s
Low-level and High-level Contributions to Figure–Ground Organization 263

[Figure 13.2 appears here: panels (a)–(g), with in-figure labels ‘EE’/‘Non-EE’ (b), ‘edge motion’/‘dot
motion’ (e), and ‘Frame 1’/‘Frame 2’ (f, g).]
Fig. 13.2  (a) The black region with a salient part tends to be perceived as the figure. (b) An
extremal edge (EE) cues the left side of the central border as the figure. (This illustration
was originally published as Figure 13.1(b) on p. 78 of ‘Extremal edges: a powerful cue to
depth perception and figure-ground organization’ by Stephen E. Palmer and Tandra Ghose,
Psychological Science, 19(1): 77–84. Copyright © 2008 Association for Psychological Science.
Reprinted by Permission of SAGE Publications.) (c) The black, lower, region tends to be perceived
as the figure. (d) The black regions are wider at the base than at the top, and tend to be
perceived as figures. (e) When the white dots on the black region and the border between the
black and white regions move synchronously in the same direction (say to the right as indicated
by the arrows above and below the display) and the black dots on the white region remain
stationary, the black region is perceived as the figure. (f) Two frames side by side indicate two
sequential frames. The dashed lines are overlaid on the figures to help the reader understand
how the displays transformed from frame 1 to frame 2. Observers perceived the black region
as the deforming figure because the convex parts delimited from the black side of the border
were perceived to move hinged on the concave cusps between them. (g) Two frames side by
side indicate two sequential frames.  The black region is perceived as the moving figure, as if it is
advancing on the white region. The dashed vertical lines are added to aid the appreciation of the
advancing movement in the static display.
Reproduced from Stephen E. Palmer and Joseph L. Brooks, Edge-region grouping in figure-ground organization and
depth perception, Journal of Experimental Psychology: Human Perception and Performance, 34 (6), p. 1356,
figure 13.1a © 2008, American Psychological Association.

boundaries and the degree to which it ‘sticks out’, measured as perimeter/cut length. Part salience
is related to convexity, but it allows quantification of other geometric factors.
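The perimeter/cut-length measure of protrusion is easy to make concrete. The sketch below is a minimal illustration only: the `protrusion` function and its polygonal-part input format are hypothetical conveniences, not Hoffman and Singh's implementation, and real part salience also weighs the curvature of the part's boundary.

```python
import math

def protrusion(part_vertices):
    """Protrusion of a part: outer-boundary length / cut length.

    `part_vertices` lists the vertices along the part's outer
    boundary in order; the 'cut' is the straight segment joining
    the first and last vertices (the base of the part).
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    perimeter = sum(dist(part_vertices[i], part_vertices[i + 1])
                    for i in range(len(part_vertices) - 1))
    cut = dist(part_vertices[0], part_vertices[-1])
    return perimeter / cut

# A shallow bump barely 'sticks out'; a tall spike on the same
# base protrudes strongly, so it is a more salient part.
shallow = [(0, 0), (1, 0.2), (2, 0)]
spike = [(0, 0), (1, 3), (2, 0)]
print(protrusion(shallow))  # just above 1
print(protrusion(spike))    # about 3.16
```

On this measure a part that does not stick out at all scores 1, and the score grows without bound as the part narrows at its base, which is one way of quantifying the degree to which a part ‘sticks out’.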

Extremal edges and gradient cuts


An extremal edge (EE) in an image is a projection of a viewpoint-specific horizon of self-occlusion
on a smooth convex surface; the straight side of a cylinder is an example of an EE (Figure 13.2B).
Using direct reports, Palmer and Ghose (2008) showed that the figure tends to be assigned on the
side of a border with an EE gradient; this is true even when the EE is placed in conflict with other
factors (Ghose and Palmer 2010).

Lower region
Using both direct and indirect measures, Vecera et al. (2002) showed that regions below a hori-
zontally oriented border are more likely than regions above the border to be perceived as figure
(Figure 13.2C). In principle, the lower region can be calculated on the input image, so we list it
here, although we note that this cue could be derived from past experience. Vecera and Palmer
(2006) proposed that the configural property of the lower region derives from the ecological sta-
tistics of objects in the earth’s gravitational field. Note that ecological statistics can in principle
underlie many of the image-based configural cues; hence, these properties may have become rel-
evant over the course of evolution, as assumed by the Gestalt psychologists, or during an indi-
vidual’s lifetime.

Top–bottom polarity
Using both direct and indirect measures, Hulleman and Humphreys (2004) showed that regions
that are wider at the bottom and narrower at the top are more likely to be perceived as figures than
regions that are wider at the top and narrower at the bottom (Figure 13.2D). Like the lower region
property, top–bottom polarity can be calculated on the input image. Inasmuch as it accords with
gravitational stability, it might have evolved as a figure cue or it might be extracted from ecological
statistics during an individual’s lifetime.

Edge-region grouping
Palmer and Brooks (2008) showed that properties that group a border with the region on one
side but not the other can affect figure assignment (Figure 13.2E). Six different grouping fac-
tors (common fate, proximity, flicker synchrony, and three varieties of similarity—blur similar-
ity, color similarity, and orientation similarity) affected figure assignment, as assessed by direct
reports and confidence estimates, albeit to widely varying degrees. Figure 13.2(E) is a static
display illustrating the effect of common fate in a bipartite display comprising two equal-area
regions, one black and one white, covered with dots of the opposite contrast. When the dots
on one region and the border between the two regions move synchronously in the same direc-
tion, the region on which the dots lie is perceived as the figure. For instance, in Figure 13.2(E),
if the white dots on the black region move to the right at the same time as the central border
moves to the right (as indicated by the arrow below the display) and the black dots on the white
region remain stationary, the common fate of the white dots on the black region and the border
increases the probability that the black region will be perceived as the figure. Similar effects were
found for flicker (Weisstein and Wong 1987), blur similarity (Marshall et al. 1996; Mather and
Smith 2002), and a different common fate display (Yonas et al. 1987). Some of the properties
that group borders with regions involve dynamic changes (common fate and flicker synchrony),
whereas others are static (e.g., proximity and similarity). We next discuss two new configural
properties that involve dynamic changes.

Articulating motion
Barenholtz and Feldman (2006) showed that when a contour deforms dynamically, observers
tend to assign figure and ground in such a way that the articulating vertex is concave rather
than convex (Figure 13.2F). They used bipartite displays in which a central border separated the
display into two equal-area regions. One region had convex parts delimited by concave cusps
whereas the other region had concave parts. They deformed the central border between succes-
sive frames (‘Frame 1’ and ‘Frame 2’ in Figure 13.2F) and asked observers to report which side of
the display appeared to be the deforming figure. Observers perceived the convex parts as moving
as if they were hinged on the concave cusps between them, an effect that depended on the con-
cavity of the cusps separating the convex parts (Barenholtz and Feldman 2006), consistent with
the hypothesis that a concave vertex is the joint between the convex parts of a figure (Hoffman
and Richards 1984). Later, Kim and Feldman (2009) asked observers to report which side of the
border appeared to be moving rather than which side appeared to be the figure, thereby using
reports about motion to assay figure assignment indirectly. This is a valuable indirect measure
because few assumptions are required to translate observers’ moving side reports into figure side
reports, although stimuli must be exposed for relatively long durations so that the motion can
be perceived.

Advancing region motion


Barenholtz and Tarr (2009) showed that when a border is moved such that the bounded area
grows on one side and shrinks on the other side, as in Figure 13.2(G), observers report perceiving
the figure on the growing side, such that the figure appears to be advancing onto the other region.
Thus, advancing region overpowers the classic Gestalt configural property of small area.

Image-based ground properties


Peterson and Salvagio (2008) found that the likelihood that convex regions are perceived as
figures varies with the color homogeneity of the concave regions alternating with the convex
regions: when the concave regions are homogeneously colored, as in Figure 13.3(A), the convex
regions are highly likely to be perceived as figures, but when the concave regions are heterogene-
ously colored, as in Figure 13.3(B), the convex regions are not perceived as figures much more
often than expected on the basis of chance.
Goldreich and Peterson (2012) pointed out that single objects (or single surfaces) tend to be a
single color, or at least tend not to change color only when out of sight behind other objects. Thus,
when homogeneously colored regions alternate with regions endowed with object properties (e.g.,
convexity), the best interpretation for the display is that the homogeneously colored regions are
portions of a single surface behind the convex objects, i.e., they are the ground regions.3
Peterson and Salvagio (2008) also found that effects of convexity were reduced when the num-
ber of alternating convex and concave regions decreased from eight to two in displays with homo-
geneously colored concave regions (Figure 13.3C). Goldreich and Peterson (2012) claimed that
when there were four homogeneously colored concave regions (as in the eight-region displays)
there was strong support for the interpretation that the concave regions were disparate views onto
a single ground surface. This support became systematically weaker as the number of concave
regions decreased. Peterson and Salvagio’s results shown below the displays in Figure 13.3(C)
demonstrated that a previously unacknowledged ground cue enhanced the Gestalt configural cue
of convexity in the classic Gestalt demonstrations. Mojica and Peterson (2014) observed a similar
effect for symmetry, another classic Gestalt configural property.
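Goldreich and Peterson's (2012) argument is probabilistic, and its qualitative shape can be sketched in a few lines. The function name and the probabilities below are invented for illustration and this is not their Bayesian observer model; the sketch only shows why support for the single-ground interpretation grows with the number of homogeneously colored concave regions.

```python
def posterior_single_ground(n_concave, p_color_match=0.3, prior=0.5):
    """Toy Bayesian sketch with invented numbers.

    H1: the concave regions are views onto one ground surface,
    so they share a color with certainty.  H0: they are unrelated
    regions, each matching the shared color only with probability
    `p_color_match`.  More same-colored concave regions therefore
    yield stronger support for the single-ground interpretation.
    """
    like_h1 = 1.0                          # same surface, same color
    like_h0 = p_color_match ** (n_concave - 1)
    odds = (prior / (1 - prior)) * like_h1 / like_h0
    return odds / (1 + odds)

# Posterior support for 'concave regions are one ground surface'
# rises with the number of same-colored concave regions
# (1, 2, 3, 4 concave regions ~ 2-, 4-, 6-, 8-region displays).
for n in (1, 2, 3, 4):
    print(n, round(posterior_single_ground(n), 3))
```

On this toy analysis the single-ground posterior rises monotonically from chance as same-colored concave regions are added, mirroring the rise in convex-figure reports from two- to eight-region displays.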

3  This ground cue operates only in the presence of figure cues (Peterson and Salvagio 2008; Goldreich and
Peterson 2012).

[Figure 13.3 appears here. Percentages reported under the displays: (a) 82%, (b) 61%; (c) two-,
four-, six-, and eight-region displays: 57%, 66%, 77%, 89%.]


Fig. 13.3  Displays used to investigate effects of convexity that revealed an image-based ground
property. The average percentage of trials on which observers reported perceiving the convex
regions as figure [averaged over observers (n = ~30) and stimuli (n = ~60)] is shown under each
display. (a), (b) Eight-region displays with alternating convex and concave regions. In (a) the concave
regions are homogeneously colored (and the convex regions are heterogeneously colored). In (b) the
concave regions are heterogeneously colored (the convex regions are homogeneously colored). The
convex regions have a higher luminance than the concave regions in (a) and a lower luminance than
the concave regions in (b). In the experiments, the luminance of the convex and concave regions
was balanced across displays. (c) Black and white displays with two, four, six, and eight regions.
Here convex regions are shown in black. In the experiments, the black/white color and the left/
right location of the convex regions were balanced across displays. (In black and white displays, both
concave and convex regions are homogeneously colored.)

Gillam and Grove (2011) pointed out that near surfaces are not necessarily located in front of
a single surface; rather they are often interposed in front of multiple objects at different distances
from the viewer. In the latter case, the contours of the occluded far objects abut the contour of the
near object in the visual field, but they are otherwise unrelated. Gillam and Grove hypothesized
that the presence of unrelated contour alignments near a border serves as a ground cue because
the unrelated contours are improbable except under conditions of occlusion. Their results sup-
ported their hypothesis, providing additional evidence that properties of grounds, as well as prop-
erties of figures, are critical to figure assignment.

Summary
Dating back to the early twentieth century and continuing to the present day myriad image-based
configural properties have been shown to affect figure assignment. Recently, ground properties
have been discovered as well. Given that object perception, which entails figure assignment, is a
critical function of vision, it is not surprising that many factors exert an influence. An analogy can
be made to depth perception, where numerous cues signal depth, including monocular, binocular,
and movement-based cues.4

4  Note that the functions served by depth cues and configural cues overlap somewhat but not completely.
Configural cues determine where objects lie with respect to a border; they signal border assignment. In contrast

Challenges to the classic view: high-level influences on figure assignment
There have long been questions regarding whether the only factors that contribute to figure
assignment are image-based factors that can in principle be used without ontogenetic experi-
ence, as the Gestaltists claimed, or whether factors that vary with an individual’s experience
or subjective state can exert an influence as well (for review, see Peterson 1999a). In the last
25 years substantial evidence has accumulated showing that high-level factors such as atten-
tion, intention, and past experience influence figure assignment. We review that evidence in
this section and then go on to consider the implications for theory in ‘Modern Theoretical
Views of Figure–Ground Perception’.

Attention and intention


Kienker et al. (1986) and Sejnowski and Hinton (1987) used attention to bias figure–ground per-
ception in a computational model of figure assignment published before there was any empirical
evidence that subjective factors like attention play a role. Their model introduced the ideas that
(1) borders activate border assignment units facing in opposite directions; (2) that opposite-facing
border assignment units engage in inhibitory competition; and (3) that the figure is perceived on
the side bordered by the winning units. Much later, Zhou et al. (2000) found neurophysiological
evidence of border assignment units (see Kogo and van Ee, this volume, for discussion of modern
models building on these ideas and Alais and Blake, this volume, for discussion of competitive
models in binocular rivalry). Although the Zeitgeist at the time did not acknowledge that
attention or other high-level subjective factors could influence figure assignment, Kienker and
colleagues used attention to seed the activation of the figure units on one side of the competing
border assignment units; those highly activated figure units boosted the activation of the border
assignment units facing toward them, and consequently increased the likelihood that those bor-
der assignment units would win the competition and would appear to bound a figure lying on the
attended side.
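The three ideas can be caricatured in a few lines of Python. This is a toy sketch with invented parameter values and a deliberately tiny unit count, not the Kienker et al. (1986) implementation, which used a large network of interacting units; it only shows how attentional seeding can tip an otherwise balanced inhibitory competition.

```python
def compete(figure_left, figure_right, attention_left=0.0,
            attention_right=0.0, inhibition=0.8, steps=50):
    """Inhibitory competition between two opposite-facing
    border-assignment units.  Each unit is excited by the figure
    units on the side it faces (plus any attentional seeding) and
    inhibited by its rival; the winner's side is seen as figure.
    All parameter values here are illustrative only.
    """
    left = right = 0.0
    for _ in range(steps):
        left += 0.1 * (figure_left + attention_left
                       - inhibition * right - left)
        right += 0.1 * (figure_right + attention_right
                        - inhibition * left - right)
        left, right = max(left, 0.0), max(right, 0.0)
    return 'left' if left > right else 'right'

# With equal image support on both sides, attention to one side
# tips the competition so that side is perceived as figure.
print(compete(1.0, 1.0, attention_left=0.5))   # 'left'
print(compete(1.0, 1.0, attention_right=0.5))  # 'right'
```

With equal stimulus support on the two sides, whichever side receives the attentional seed wins the competition, just as attending to one region biased figure assignment in the model.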
Empirical data showing that subjective factors influence figure assignment came first as evi-
dence that the viewer’s perceptual set (‘intention’) to perceive the figure on one side of a border
increased the likelihood of seeing that side as figure (under conditions where response bias was
unlikely; Peterson et al. 1991; Peterson and Gibson 1994a). Next, Peterson and Gibson (1994a)
showed that fixated regions are more likely than unfixated regions to be perceived as figures, and
that effects of fixation combined additively with intention and with other figure-relevant proper-
ties. Since attention and fixation are often coupled, these results were consistent with the predic-
tions made by Kienker and colleagues. Baylis and Driver (1995) and Driver and Baylis (1996)
separated fixation and attention by instructing their observers to attend to one of two regions
sharing a border. Their observers remembered the shape of the region to which they had directed
their attention better than the shape of the complementary region. Because previous research by
Rubin (1958/1915) (see also Hoffman and Richards 1984) had shown that observers remember
the shape of the figure but not that of the ground, Baylis and Driver reasoned that the attended

many depth cues are irrelevant to border assignment, and hence, to object perception (binocular disparity,
accretion/deletion, and motion parallax excepted). Some research has begun to investigate how configural
cues and depth cues combine (Peterson and Gibson 1993; Peterson 2003b; Burge et al. 2005; Qiu et al. 2005;
Burge et al. 2010; but see Gillam et al. 2009). Further research on this topic is needed.

region had been perceived as the figure and that endogenously (volitionally) allocated attention
can affect figure assignment.
Attention can also be allocated exogenously in that it can be drawn to a region by a flash of
light. Baylis and Driver failed to find evidence that exogenously allocated attention affected figure
assignment, but their failure was probably due to the use of an insensitive test. In 2004 Vecera
et al. performed a more sensitive test and, using the same indirect measure as Baylis and Driver,
showed that exogenous attention can also affect figure assignment. Moreover, Vecera et al. found
that attention effects added to those of convexity, complementing the similar additive effect
Peterson and Gibson observed for fixation. Thus, there is now ample evidence that high-level
factors like intention, fixation, and attention (both endogenously and exogenously oriented) can
affect figure assignment. Moreover, neurophysiological evidence shows that attention enhances
neural responses to figures (Qiu et al. 2007; Poort et al. 2012).

Past experience
The Gestalt psychologists did not conduct systematic tests of whether, in addition to the low-level
factors they identified, high-level representations of previously seen objects can affect figure
assignment. There were a few demonstrations that past experience could exert an influence on
figure assignment (e.g., Rubin, 1958/1915; Schafer and Murphy 1943) but these demonstrations
were not above criticism and were dismissed because they were inconsistent with the Zeitgeist (see
Peterson 1999a for review and discussion).
In 1991, Peterson, Harvey, and Weidenbacher obtained results that strongly suggested that past
experience with particular objects influences figure assignment (Peterson et al. 1991). They exam-
ined reversals of figure–ground perception using center-surround displays modeled on the Rubin
vase-faces display. In their displays the factors of symmetry, small area, enclosure, fixation, and
sometimes the depth cue of overlap favored the interpretation that the center region was the
figure. However, past experience favored the interpretation that the surrounding regions were
the figures in that a portion of a familiar object was sketched on the outside of the border shared
by the center and surrounding regions. They showed these displays to observers such that the
familiar object was depicted in its upright orientation on some trials and in an inverted orienta-
tion on other trials, and asked observers to report figure–ground reversals over the course of
30-second trials viewing both upright and inverted displays (for samples see Figure 13.4A and B,
respectively).
Peterson et al. (1991) found that when the familiar object suggested in the surround was pre-
sented in its upright orientation rather than an inverted orientation, observers both maintained
the surround as figure longer and obtained it as figure faster by reversal out of the center-as-
figure percept. The latter finding—that surrounds were obtained as figure by reversal out of the
center-as-figure interpretation faster when they depicted upright rather than inverted familiar
objects—led Peterson et al. to hypothesize that, contrary to the traditional view, access to memo-
ries of previously seen objects occurred outside of awareness prior to figure assignment. (Peterson
and Gibson (1994a) replicated this pattern of results with a set of stimuli designed to isolate effects
of object familiarity.)5

5  Top-down set can amplify effects of a familiar configuration (Peterson et al. 1991; Peterson and Gibson
1994a).
[Figure 13.4 appears here: panels (a)–(e).]

Fig. 13.4  (a) Two portions of standing women are suggested on the left and right sides in the
white regions surrounding the small, symmetric black central region. (b) An upside down (inverted)
version of (a). (c) The same parts are suggested on the left and right sides in the white regions as
in (a), but here the parts have been spatially rearranged such that the configuration is no longer
familiar. (d) A bipartite display with equal-area regions to the right and left of the central border.
The black region depicts a portion of a familiar object. These displays were viewed both upright
and inverted. (e) A bipartite display with equal-area regions to the right and left of the central
border. The black region depicts a portion of a familiar object—a seahorse. The white region is a
novel symmetric shape. Hence, past experience and symmetry compete for figural status in this
stimulus.
(a) Reproduced from Mary A. Peterson, Erin H. Harvey, and Hollis L. Weidenbacher, Shape recognition inputs to
figure-ground organization: which route counts?, Journal of Experimental Psychology: Human Perception and
Performance, 17 (4), p. 1356, figure 13.2a © 1991, American Psychological Association. (c) Reproduced from Mary
A. Peterson, Erin H. Harvey, and Hollis L. Weidenbacher, Shape recognition inputs to figure-ground organization:
which route counts?, Journal of Experimental Psychology: Human Perception and Performance, 17 (4), p. 1356,
figure 13.2c © 1991, American Psychological Association. (d) This material has been reprinted from Mary A.
Peterson and Emily Skow-Grant, ‘Memory and learning in figure-ground perception’, in B. Ross and D. Irwin
(eds), Cognitive Vision. Psychology of Learning and Motivation Vol. 42, p. 5, figure 13.4a Copyright © 2003,
Elsevier. (e) Reproduced from Mary A. Peterson and Bradley S. Gibson, Must Figure-Ground Organization Precede
Object Recognition? An Assumption in Peril, Psychological Science 5(5), p. 254, Figure 13.1 Copyright © 1994 by
Association for Psychological Science. Reprinted by Permission of SAGE Publications.

Peterson et al. (1991) observed the effects of past experience on figure assignment only when the
parts were arranged into familiar configurations; when the same parts were rearranged into novel
configurations, as in Figure 13.4(C), no such effects were observed. Thus, these were effects of
familiar configuration and not familiar parts. Moreover, instruction-delivered knowledge that the
inverted displays depicted inverted familiar objects or that the part-rearranged displays were con-
structed by rearranging the parts of well-known, familiar objects was not sufficient to allow past
experience to affect figure assignment with those stimuli; upright displays were necessary. That
instruction-delivered knowledge was insufficient to change the pattern of results obtained with
inverted and part-rearranged displays indicated that fast, bottom-up access to the relevant object
representations afforded only by upright displays was necessary for effects of past experience on
figure assignment. These results led Peterson and colleagues to hypothesize that high-level memo-
ries of familiar objects can influence figure assignment, provided that they are accessed quickly.
Inverting the displays slowed access to memories of familiar objects, and therefore removed their
influence on figure assignment.
Peterson and her colleagues then created a set of displays designed to isolate effects of famil-
iar configuration in order to investigate whether past experience exerts an influence on the
first perceived figure assignment. In these displays, vertically elongated rectangles were divided
into two equal-area black and white regions by an articulated central border. The region on
one side of the central border depicted a portion of a familiar object, whereas the region on
the other side did not (an example is shown in Figure 13.4D). The right/left location and black/
white color of the familiar regions were balanced across the set of displays. The displays were
exposed for brief durations (e.g., 86 ms) and masked; each display was viewed twice only, once
in an upright orientation and once in an inverted orientation. Observers reported whether they
perceived the region on the right or the left of the central border as figure. Observers’ reports
regarding the first perceived figure–ground organization indicated that the figure was more
likely to be perceived on the side of the border where the familiar configuration lay when the
displays were upright rather than inverted (Gibson and Peterson 1994). Peterson and Gibson
(1994b) also pitted a familiar configuration against the image-based configural cue of symme-
try (e.g., Figure 13.4E) and found that effects of both cues were evident in observers’ reports
regarding the first-perceived figure–ground organization in displays exposed for as little as
28 ms. Moreover, these results showed that past experience does not always dominate other
cues; instead past experience operates as one of many cues to figural status (cf., Peterson 1994).
Furthermore, these results suggested that the cues of symmetry and past experience compete
to determine the percept.
The results discussed above were obtained with direct reports regarding figural status. Some
scientists expressed concern that these direct reports might not indicate the first perceived fig-
ure assignment, that participants might reverse the displays in search of familiar objects before
they reported figure assignment. A variety of findings argued against that alternative view. First,
familiar configuration did not always determine where the figure was perceived. Second, the
same conclusions were supported by reversal data as well as by reports of the first perceived
figure assignment (Peterson et al. 1991; Peterson and Gibson 1994a). Third, Vecera and Farah
(1997) reported converging evidence using indirect measures, as did Peterson and Lampignano
(2003), Peterson and Enns (2005), Peterson and Skow (2008), and Navon (2011). For instance,
Peterson and Enns (2005) showed participants a novel border twice, first as the border of a prime
object, on its left, say, as in Figure 13.5(A) and later as the border of a test object on either the
same or the opposite side (Figure 13.5B, left and right columns, respectively). In the test the
participants’ task was to report whether two test objects were the same as or different from each

[Figure 13.5 appears here: (a) the prime display; (b) experimental and (c) control test-display pairs,
arranged in columns labeled ‘Same side’ and ‘Opposite side’.]

Fig. 13.5  Displays used by Peterson and Enns (2005). (A) The prime display showing a figure on the
left of a stepped border. (B), (C) Four pairs of same/different test displays. All four samples show
trials on which the correct response was ‘different’. (B) In experimental test displays the prime
border was repeated in one or both of the two test displays (one on ‘different’ trials, as illustrated;
both on ‘same’ trials). When repeated, the prime border was either shown as the boundary of a
figure on the same side as in the prime (left column, top stimulus), or on the opposite side, the
side that was perceived as the ground in the prime (right column, top stimulus). (C) Control test
displays that did not share a border with the prime. Half the control test displays faced in the same
direction as the prime figure, half faced in the opposite direction (as in the left and right columns,
respectively), to serve as controls for the experimental same direction and opposite direction
displays.
Reproduced from Perception and Psychophysics, 67(4), The edge complex: Implicit memory for figure assignment
in shape perception, Mary A. Peterson, p. 731, Figure 13.3, DOI: 10.3758/BF03193528 Copyright © 2005,
Springer-Verlag. With kind permission from Springer Science and Business Media.

other, with no reference back to the prime object. (This is a variant of Driver and Baylis’ (1996)
indirect measure.) When the border repeated from the prime was assigned to an object on the
opposite side at test, participants’ response times were longer than they were either when it was
assigned to an object on the same side, or when the test objects were control objects with novel
borders, as in Figure 13.5(C). These results showed that a memory of the side to which a border
was previously assigned enters into the determination of where a figure lies when the border is

encountered again, slowing the decision when cues in the current display favor assigning the
border to a different side.6
The results of Peterson and Enns (2005) (and other results using indirect measures) can best be
understood within a competitive architecture in which candidate objects on both sides of borders
compete for figure assignment outside of awareness. On this view, response times were longer
when the border was assigned to an object on the opposite side at test because a memory that the
object was previously located on the prime side competes with the properties that favor perceiving
the object on the opposite side of the border in the test display.7 Recall that Kienker et al. (1986)
(see also McClelland and Rumelhart 1987; Vecera and O’Reilly 1998; Vecera and O’Reilly 2000)
had introduced the idea that figure assignment entails competition. Modern views of competi-
tion are discussed in more detail in the section ‘Modern Theoretical Views of Figure–Ground
Perception’.

Summary
Research in the late twentieth and early twenty-first centuries has firmly established that, in addi-
tion to image-based factors, high-level factors like attention, intention, and past experience influ-
ence figure assignment. This research also suggested that competition is a mechanism of figure
assignment. Accordingly, modern theoretical views of figure assignment involve competition and
take into consideration influences from both high- and low-level factors, as we will now discuss.

Modern theoretical views of figure–ground perception


Competition
Modern views of figure–ground perception involving competition arose both from modeling
approaches (e.g., see the previous discussion of Kienker et al. 1986) and from neural evidence
(Desimone and Duncan 1995). The computational models assume that inhibitory competition
occurs between feature units or border assignment units, similar to those proposed by Kienker
et al. (1986). Current models are more sophisticated, and allow context and past experience to
exert an influence. Kogo and van Ee (Kogo and van Ee, this volume) provide an up-to-date review
of these models. Accordingly, in the present chapter the discussion focuses on neural models
involving competition between objects or object properties rather than between border assign-
ment units or feature units.
Desimone and Duncan (1995) proposed that objects, or object properties, compete for rep-
resentation by populations of neurons. The competition is evident in the reduction of a neu-
ron’s response when more than one stimulus is present in its receptive field, even when one
of the stimuli is a good stimulus in that it elicits a vigorous response when presented alone
and the other stimulus is a poor stimulus in that it elicits little or no response when presented

6  Driver and Baylis (1996) had initially used displays like these to argue against the idea that past experience
exerts an influence on figure assignment. They obtained the same pattern of results on experimental trials as
Peterson and Enns (2005) did. However, their research design lacked a critical control condition. Peterson
and Enns (2005) included a control condition and were able to demonstrate that the longer reaction times
obtained on probes with the figure assigned on the opposite side at test were due to effects of past experience
on figure assignment.
7  Treisman and DeSchepper (1996) interpreted similar results in terms of negative priming. Peterson and
Lampignano (2003) and Peterson (2012) argue that competition is a better explanation.
Low-level and High-level Contributions to Figure–Ground Organization 273

alone (e.g., Moran and Desimone 1985; Miller et al. 1993; Rolls and Tovee 1995). This com-
petition has become known as biased competition because it can be biased or overcome by
contrast or attention. For instance, if an animal attends to one of two stimuli within a neuron’s
receptive field, the neuron’s response pattern changes to resemble the pattern obtained when
only the attended stimulus is present. Critically, if the attended stimulus is the poor stimulus,
the response to the good stimulus is suppressed (Chelazzi et  al. 1993; Duncan et  al. 1997;
Reynolds et al. 1999; see Reynolds and Chelazzi 2004 for a review). Likewise, if one shape
is higher in contrast than the other, the neuron’s response pattern resembles the response to
the high-contrast stimulus alone, and the response to the other stimulus is suppressed. Thus,
the biased competition model entails competition at high levels between objects that might
be perceived, and it predicts suppression of objects that lose the competition. Note that the
biased competition model does not rule out competition between border assignment/edge
units as well. Competition has been shown to occur at many levels in the visual hierarchy
(e.g., Craft et al. 2007).
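The weighted-average behavior of biased competition can be illustrated with a toy computation, loosely following the kind of model described by Reynolds et al. (1999): a neuron's response to a pair of stimuli in its receptive field is a weighted average of the responses each elicits alone, and attention (or higher contrast) raises one stimulus's weight. The firing rates and weight values below are hypothetical numbers chosen purely for illustration, not values from the cited studies.

```python
def pair_response(r_good, r_poor, w_good=1.0, w_poor=1.0):
    """Response to two stimuli in one receptive field: a weighted
    average of the responses each stimulus elicits alone. Attention
    (or higher contrast) is modeled as an increase in that
    stimulus's weight."""
    return (w_good * r_good + w_poor * r_poor) / (w_good + w_poor)

# Hypothetical firing rates (spikes/s) to each stimulus presented alone.
r_good, r_poor = 60.0, 10.0

baseline = pair_response(r_good, r_poor)                  # neither stimulus attended
attend_poor = pair_response(r_good, r_poor, w_poor=5.0)   # attention on the poor stimulus
attend_good = pair_response(r_good, r_poor, w_good=5.0)   # attention on the good stimulus

print(baseline)     # 35.0 -- below r_good: the good stimulus is suppressed by the pair
print(attend_poor)  # ~18.3 -- response shifts toward the poor stimulus alone
print(attend_good)  # ~51.7 -- response shifts toward the good stimulus alone
```

The sketch reproduces the two signatures discussed above: adding a poor stimulus suppresses the response relative to the good stimulus alone, and attending one stimulus drives the pair response toward the response to that stimulus alone.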
Peterson and Skow (2008) noted that the two objects that might be perceived on opposite sides
of a border necessarily fall within the same receptive field, and reasoned that the biased competi-
tion model might account for figure–ground perception, with the winner perceived as the object/
figure and the loser perceived as the shapeless ground (see Peterson et al. 2000 for a similar pro-
posal). They reasoned that if the region perceived as ground lost the cross-border competition
for figure assignment, then responses to an object that was potentially present there would be
suppressed. To test this hypothesis they used displays in which many properties favored the inter-
pretation that the object/figure lay on the inside of a closed silhouette border, whereas familiar
configuration favored the interpretation that the object/figure lay on the outside of the silhouette’s
border (e.g., Figure 13.6). In other words, the silhouettes were designed so that the inside would
win the competition and be perceived as the figure, whereas the outside would lose the competi-
tion and be perceived as a shapeless ground. Indeed, subjects perceived the figure on the inside
and were unaware of the familiar configuration suggested on the outside of the silhouettes, as
predicted if it lost the competition for figural status. (The familiar configuration suggested on the
outside of the left and right contours of the silhouettes in Figure 13.6 is a portion of a house with
a pitched roof and a chimney.)
To assess whether responses to the loser were suppressed, Peterson and Skow (2008) showed
a line drawing of either a real-world object or a novel object shortly after a brief exposure of
one of these silhouettes. Participants made a speeded object decision regarding the line draw-
ing (i.e., they reported whether the line drawing depicted a real-world object or a novel object).
Half the objects were of each type. The real-world objects were mostly from the Snodgrass and
Vanderwart (1980) set; the novel objects were drawn from the Kroll and Potter (1984) set. The
critical manipulation concerned the line drawings of real-world objects:8 they depicted objects
that were either from the same basic-level category or a different category to the familiar config-
uration that was suggested on the groundside of the silhouette border (Figure 13.6A and 13.6B,
respectively). Peterson and Skow predicted that if assigning the figure on the inside of the bor-
der entailed suppression of a competitor on the outside, participants’ response times should be
longer to correctly classify a real-world object from the same rather than a different basic-level

8  The line drawings of novel objects were included because the task required participants to decide whether they
were viewing a line drawing of a real-world object or a novel object. To observe effects of competition-induced
suppression, some sort of discrimination at test was necessary.
(a) Same category (b) Different category

Fig. 13.6  Trial sequence used by Peterson and Skow (2008). Time is shown vertically. A silhouette
with a house suggested on the ground side of its left and right borders was shown centered on
fixation for 50 ms. The silhouette disappeared and 33 ms later a line drawing was displayed, also
centered on fixation. The line drawing depicted either a real-world object or a novel object. When it
was a real-world object, it was either from the same basic-level category (A) or a different category
(B) as the object suggested on the groundside of the preceding silhouette. (Novel objects are not
shown.)
Reproduced from Mary A. Peterson and Emily Skow, Suppression of shape properties on the ground side of an edge:
evidence for a competitive model of figure assignment, Journal of Experimental Psychology: Human Perception
and Performance, 34(2), p. 255, figure 13.3 © 2008, American Psychological Association.

category from the familiar object suggested on the outside of the silhouette borders. (Note that
this is the opposite of what would be expected if the familiar configuration in the prime was on
the figure side of the border, and that is because the inhibitory competition account predicts
that a competing object on the losing side, i.e., the groundside, is suppressed.) Peterson and
Skow observed the predicted pattern of results. Importantly, the borders of the line drawings
were not the same as those of the silhouettes, ruling out an interpretation in terms of border
units alone. Thus, Peterson and Skow’s results implied that competition occurs between objects
that might be perceived on opposite sides of borders. Note that evidence for high-level com-
petition does not rule out the existence of competition at lower levels, e.g., between border
assignment units.
The evidence for high-level influences on figure assignment and for competition between
objects that might be perceived on opposite sides of a border raises questions regarding how
high the processing of objects competing for figure assignment goes, both functionally and struc-
turally. The answers to these questions favor interpreting figure assignment within a dynami-
cal interactive model in which a fast non-selective feedforward sweep of activation occurs first,
competition occurs at many levels, and feedback integrates the outcome of the competition across
all levels, as discussed next.

A dynamical interactive view of figure assignment with non-selective feedforward activation, competition, and feedback
Dynamical interactive models of perception were proposed in the mid-1980s by McClelland and
Rumelhart (1987). These early dynamical models deviated from serial hierarchical models in pro-
posing that processing at a lower level need not be completed before processing at a higher level
began, and that feedback from ongoing processing at a higher level could influence processing at
lower levels. To account for the extant evidence that past experience affects figure assignment, Vecera
and O’Reilly (1998, 2000) (see Peterson 1999b for commentary) proposed a dynamical interactive
variant of Kienker et al.’s model in which feedback from template-like memory representations of
objects seeded the feature units on one side of a border thereby affecting the competition between
border assignment units in the same way that attention had an influence in the original model.
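The seeding idea can be caricatured with a toy settling network: two mutually inhibitory border-assignment units, one per side of a border, receive equal bottom-up input, and a small top-down bias (standing in for feedback from an object memory, or for attention in the original Kienker et al. model) determines which side wins. This is a minimal sketch of the competitive dynamics under invented constants, not a reimplementation of the Vecera and O'Reilly model.

```python
import math

def settle(bias_a=0.0, bias_b=0.0, steps=50, inhibition=1.5):
    """Iteratively settle two mutually inhibitory border-assignment
    units. Each unit receives a fixed bottom-up input (1.0), an
    optional top-down bias, and inhibition from the other unit.
    All constants are illustrative."""
    a = b = 0.5
    for _ in range(steps):
        a = 1.0 / (1.0 + math.exp(-(1.0 + bias_a - inhibition * b)))
        b = 1.0 / (1.0 + math.exp(-(1.0 + bias_b - inhibition * a)))
    return a, b

a, b = settle(bias_a=0.5)  # memory feedback 'seeds' side A
print(a > b)               # True: the border is assigned to side A
```

With no bias the units settle near a stand-off; seeding either side tips the competition toward that side, which is the qualitative behavior the interactive account requires.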
Bullier (2001) and Lamme and Roelfsema (2000) proposed a more sweeping change to serial
hierarchical models. They proposed that input was processed to the highest levels in a first, fast,
non-selective feedforward pass of processing but that even high-level processing in this first pass
was not sufficient for perceptual organization, which required a subsequent feedback pass of pro-
cessing (see also Dehaene et al. 2006). These authors did not discuss cross-border competition
(but see Peterson et al. 2000).
There is ample evidence that high-level processing can be achieved rapidly, perhaps too rapidly
for feedback to be involved: for instance, categorizing a stimulus as an animal or a vehicle is thought
to require processing at high levels in the visual hierarchy (levels higher than those where objects
are represented), perhaps at levels beyond traditional visual areas. Thorpe and colleagues (Thorpe
et al. 1996; Joubert et al. 2008; Crouzet et al. 2010) demonstrated that observers can initiate a cat-
egorization response within 100–150 ms of stimulus onset. These results alone could indicate that
a fast feedforward pass of processing is sufficient for perceptual awareness of an object, and indeed
some theorists reached that conclusion (e.g., Serre et al. 2007). However, Peterson and colleagues
(Peterson et al. 2012a; Cacciamani et al. 2014; Sanguinetti et al. 2014) recently investigated whether
semantic access occurs only for objects that are ultimately determined to be figures, or whether
semantic access occurs also for objects that compete for figural status but are ultimately determined
to be grounds. They found that semantic access occurred for objects that are suggested on the side
of a border that is ultimately determined to be ground to an object/figure on the opposite side of
the border. Their results are consistent with the interpretation that a first non-selective pass of pro-
cessing occurs for objects that might be perceived on both sides of a border, and that subsequent
processing (e.g., competition and feedback) is necessary for object perception. Using multivoxel
pattern analysis with stimuli rendered invisible by binocular rivalry, Fahrenfort et al. (2012) also
showed that semantic access was not sufficient for perceptual awareness.
Fahrenfort et  al. (2012) also reported evidence consistent with the hypothesis that interac-
tive processing among a large number of brain regions is required for perceptual awareness of
an object: they observed long-range activations between brain regions (primarily measured as
gamma range oscillatory power) only for stimuli of which observers were consciously aware, not
for stimuli present in one eye’s view but not perceived because of rivalry. Fahrenfort et al. (2012)
found evidence of categorization at a relatively high level in the brain—the right ventral occipito-
temporal cortex. Barense et al. (2012) showed that an even higher-level brain region, the perirhinal
cortex of the medial temporal lobe (long thought to be a declarative memory structure only), was
involved in effects of familiar configuration on figure assignment. These data are consistent with
the hypothesis that before figure assignment occurs, a non-selective first pass of processing pro-
ceeds to the highest levels of processing, as per the hypotheses of Lamme and Roelfsema (2000)
and Bullier (2001).
Barense et al.’s (2012) behavioral data led them to hypothesize that the perirhinal cortex of the
medial temporal lobe sends modulatory feedback to the visual cortex. Peterson et  al. (2012b)
found evidence of the predicted feedback for regions perceived as figures, consistent with the
hypothesis that perceptual awareness requires additional interactive processing beyond the first
feedforward pass, as predicted by Lamme and Roelfsema (2000) and Bullier (2001). In addition,
Salvagio et al. (2012) showed that suppression applied to one side of a border, as a result of com-
petition for figural status taking place at high levels where receptive fields are large, is relayed to
levels as low as V1, where receptive fields are much smaller. Likova and Tyler (2008) also found
that activity is suppressed in V1 on the groundside of a border in conditions where a figure is
differentiated from the ground only at a global scale. These recent results are consistent with the
hypothesis that competition for figural status occurring at high structural levels generates feed-
back to lower-level visual areas. As such, they are consistent with current dynamical interactive
views of figure assignment involving (a)  a first fast pass of non-selective feedforward process-
ing that identifies both low-level and high-level attributes of objects that might be perceived on
opposite sides of borders, (b) competition between those object candidates, and (c) feedback that
integrates the signals across the hierarchy of brain regions (Peterson and Cacciamani, 2013; for
related discussion see van Leeuwen, this volume).
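The three components (a), (b), and (c) of this dynamical interactive view can be caricatured in a few lines of code. The evidence scores, the additive combination rule, and the suppression factor below are all invented for illustration and carry no empirical weight; the point is only the ordering of operations: non-selective feedforward scoring of both candidates, cross-border competition, then feedback that suppresses the loser at lower levels.

```python
def figure_assignment(low_a, high_a, low_b, high_b, suppression=0.5):
    """Schematic of the three-stage dynamical account:
    (a) a non-selective feedforward pass scores the candidate object
        on each side of a border at low and high levels;
    (b) the candidates compete across the border (winner-take-all
        here, for brevity);
    (c) feedback suppresses the losing (ground) side's low-level
        signal, as reported for V1 by Salvagio et al. (2012)."""
    score_a = low_a + high_a          # (a) feedforward evidence, side A
    score_b = low_b + high_b          # (a) feedforward evidence, side B
    figure = "A" if score_a > score_b else "B"   # (b) competition
    if figure == "A":                 # (c) feedback suppression of the ground side
        low_b *= suppression
    else:
        low_a *= suppression
    return figure, low_a, low_b

fig, low_a, low_b = figure_assignment(0.8, 0.6, 0.4, 0.7)
print(fig)    # 'A': side A wins (total evidence 1.4 vs 1.1)
print(low_b)  # 0.2: the ground side's low-level activity is halved
```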

Conclusion
One hundred years after Gestalt views first took hold, our understanding of scene segmentation
has progressed substantially. We now know that in addition to the configural properties iden-
tified by the Gestalt psychologists, figure assignment is affected by past experience, attention,
and intentions, as well as by other image-based factors identified during the twentieth century.
Figure assignment is also affected by ground properties. Recent use of indirect measures and brain
imaging techniques has revealed that there is much more processing of the regions ultimately
perceived as grounds than was supposed in traditional approaches, and that competition and
feedback are involved in figure assignment. These new methods offer the promise of uncovering
the mechanisms that organize the visual field into figures and grounds.

Acknowledgements
Much of the research reported in this chapter was conducted while the author was supported by
grants from the NSF, most recently by NSF BCS 0960529. Thanks to Laura Cacciamani for help
with the figures.

References
Bahnsen, P. (1928). Eine Untersuchung über Symmetrie und Asymmetrie bei visuellen Wahrnehmungen.
Z Psychol 108: 129–154.
Baylis, G.C. and Driver, J. (1995). One-sided edge assignment in vision: 1. Figure-ground segmentation
and attention to objects. Curr Direct Psychol Sci 4: 140–146.
Barenholtz, E. and Feldman, J. (2006). Determination of visual figure and ground in dynamically
deforming shapes. Cognition 101(3): 530–544.
Barenholtz, E. and Tarr, M. J. (2009). Figure–ground assignment to a translating contour: a preference for
advancing vs. receding motion. J Vision 9(5): 27, doi: 10.1167/9.5.27
Barense, M. D., Ngo, J., Hung, L., and Peterson, M. A. (2012). Interactions of memory and perception in
amnesia: the figure–ground perspective. Cereb Cortex 22(11): 2680–2691.
Bullier, J. (2001). Integrated model of visual processing. Brain Res Rev 36: 96–107.
Burge, J., Peterson, M. A., and Palmer, S. E. (2005). Ordinal configural cues combine with metric disparity
in depth perception. J Vision 5(6): 534–542.
Burge, J., Fowlkes, C., and Banks, M. S. (2010). Natural-scene statistics predict how the figure–ground cue
of convexity affects human depth perception. J Neurosci 30(21): 7269–7280.
Cacciamani, L., Mojica, A. J., Sanguinetti, J. L., and Peterson, M. A. (2014). Semantic access occurs outside
of awareness for the groundside of a figure. Unpublished manuscript.
Chelazzi, L., Miller, E. K., Duncan, J., and Desimone, R. (1993). A neural basis for visual search in inferior
temporal cortex. Nature 363: 345–347.
Craft, E., Schütze, H., Niebur, E., and von der Heydt, R. (2007). A neural model of figure-ground
organization. J Neurophysiol 97(6): 4310–4326.
Crouzet, S. M., Kirchner, H., and Thorpe, S. J. (2010). Fast saccades towards faces: face detection in just
100 ms. J Vision 10(4): 16, doi: 10.1167/10.4.16.
Dehaene, S., Changeux, J.-P., Naccache, L., Sackur, J., and Sergent, C. (2006). Conscious, preconscious,
and subliminal processing: a testable taxonomy. Trends Cogn Sci 10: 204–211.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Ann Rev Neurosci
18(1): 193–222.
Duncan, J., Humphreys, G. W., and Ward, R. (1997). Competitive brain activity in visual attention. Curr
Opin Neurobiol 7: 255–261.
Driver, J. and Baylis, G. C. (1996). Figure-ground segmentation and edge assignment in short-term visual
matching. Cogn Psychol 31: 248–306.
Fahrenfort, J. J., Snijders, T. M., Heinen, K., van Gaal, S., Scholte, H. S., and Lamme, V. A. (2012).
Neuronal integration in visual cortex elevates face category tuning to conscious face perception. Proc
Natl Acad Sci USA 109(52): 21504–21509.
Ghose, T. and Palmer, S. E. (2010). Extremal edges versus other principles of figure-ground organization.
J Vision 10(8): 3, doi: 10.1167/10.8.3
Gibson, B. S. and Peterson, M. A. (1994). Does orientation-independent object recognition precede
orientation-dependent recognition? Evidence from a cueing paradigm. J Exp Psychol: Hum Percept
Perform 20: 299–316.
Gillam, B. J., Anderson, B. L., and Rizwi, F. (2009). Failure of facial configural cues to alter metric
stereoscopic depth. J Vision 9(1): 3, doi: 10.1167/9.1.3
Gillam, B. J. and Grove, P. M. (2011). Contour entropy: a new determinant of perceiving ground or a hole.
J Exp Psychol: Hum Percept Perform 37(3): 750–757.
Goldreich, D. and Peterson, M. A. (2012). A Bayesian observer replicates convexity context effects. Seeing
Perceiving 25: 365–395.
Hochberg, J. (1971). Perception 1. Color and shape. In: Woodworth and Schlosberg’s Experimental Psychology,
3rd edn, edited by J. W. Kling and L. A. Riggs, pp. 395–474 (New York: Holt, Rinehart and Winston).
Hochberg, J. (1980). Pictorial functions and perceptual structures. In: The Perception of Pictures, Vol. 2,
edited by M. A. Hagen, pp. 47–93 (New York: Academic Press).
Hoffman, D. D. and Richards, W. (1984). Parts of recognition. Cognition 18(1–3): 65–96.
Hoffman, D. D. and Singh, M. (1997). Salience of visual parts. Cognition 63: 29–78.
Hulleman, J. and Humphreys, G. W. (2004). A new cue to figure–ground coding: top–bottom polarity. Vis
Res 44(24): 2779–2791.
Jehee, J. F. M., Lamme, V. A. F., and Roelfsema, P. R. (2007). Boundary assignment in a recurrent network
architecture. Vis Res 47: 1153–1165.
Joubert, O. R., Fize, D., Rousselet, G. A., and Fabre-Thorpe, M. (2008). Early interference of context
congruence on object processing in rapid visual categorization of natural scenes. J Vision 8(13): 11,
doi: 10.1167/8.13.11.
Kanizsa, G. and Gerbino, W. (1976). Convexity and symmetry in figure-ground organization. In: Vision
and Artifact, edited by M. Henle, pp. 25–32 (New York: Springer).
Kienker, P. K., Sejnowski, T. J., Hinton, G. E., and Schumacher, L. E. (1986). Separating figure from
ground with a parallel network. Perception 15: 197–216.
Kim, S.-H. and Feldman, J. (2009). Globally inconsistent figure/ground relations induced by a negative
part. J Vision 9(10): 8, doi:10.1167/9.10.8.
Kroll, J. F. and Potter, M. C. (1984). Recognizing words, pictures, and concepts: a comparison of lexical,
object, and reality decisions. J Verbal Learn Verbal Behav 23: 39–66.
Lamme, V. A. F. and Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and
recurrent processing. Trends Neurosci 23(11): 571–579.
Likova, L. T. and Tyler, C. W. (2008). Occipital network for figure/ ground organization. Exp Brain Res
189: 257–267.
McClelland, J. L. and Rumelhart, D. E. (1987). Parallel Distributed Processing, Volume 2. Explorations in the
Microstructure of Cognition: Psychological and Biological Models. (Cambridge, MA: MIT Press).
Marshall, J. A., Burbeck, C. A., Ariely, D., Rolland, J. P., and Martin, K. E. (1996). Occlusion edge blur:
a cue to relative visual depth. J Opt Soc Am A 13: 681–688.
Mather, G. and Smith, D. R. R. (2002). Blur discrimination and its relation to blur-mediated depth
perception. Perception 31(10): 1211–1219.
Miller, E. K., Gochin, P. M., and Gross, C. G. (1993). Suppression of visual responses of neurons in inferior
temporal cortex of the awake macaque by addition of a second stimulus. Brain Res 616: 25–29.
Mojica, A. J. and Peterson, M. A. (2014). Display-wide influences on figure-ground perception: the case
of symmetry. Atten Percept Psychophys, doi: 10.3758/s13414-014-0646-y.
Moran, J. and Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex.
Science 229: 782–784.
Navon, D. (2011). The effect of recognizability on figure-ground processing: does it affect parsing or only
figure selection? Q J Exp Psychol 64(3): 608–624.
Palmer, S. E. and Brooks, J. L. (2008). Edge-region grouping in figure-ground organization and depth
perception. J Exp Psychol: Hum Percept Perform 34(6): 1353–1371.
Palmer S. E. and Ghose T. (2008). Extremal edges: a powerful cue to depth perception and figure-ground
organization. Psychol Sci 19(1): 77–84.
Peterson, M. A. (1994). The proper placement of uniform connectedness. Psychonom Bull Rev 1: 509–514.
Peterson, M. A. (1999a). Organization, segregation, and recognition. Intellectica 28: 37–51.
Peterson, M. A. (1999b). What’s in a stage name? J Exp Psychol: Hum Percept Perform 25: 276–286.
Peterson, M. A. (2001). Object perception. In: Blackwell Handbook of Perception, edited by E. B. Goldstein,
pp. 168–203 (Oxford: Blackwell).
Peterson, M. A. (2003a). Overlapping partial configurations in object memory: an alternative solution to classic
problems in perception and recognition. In: Perception of Faces, Objects, and Scenes: Analytic and Holistic
Processes, edited by M. A. Peterson and G. Rhodes, pp. 269–294 (New York: Oxford University Press).
Peterson, M. A. (2003b). On figures, grounds, and varieties of amodal surface completion. In: Perceptual
Organization in Vision: Behavioral and Neural Perspectives, edited by R. Kimchi, M. Behrmann, and
C. Olson, pp. 87–116 (Mahwah, NJ: LEA).
Peterson, M. A. (2012). Plasticity, competition, and task effects in object perception. In: From Perception to
Consciousness: Searching with Anne Treisman, Ch. 11, edited by J. M. Wolfe and L. Robertson, pp. 253–262.
Peterson, M. A. and Cacciamani, L. (2013). Toward a dynamical view of object perception. In: Shape
Perception in Human and Computer Vision: an Interdisciplinary Perspective, edited by S. Dickinson and
Z. Pizlo, pp. 445–459 (Berlin: Springer).
Peterson, M. A. and Enns, J. T. (2005). The edge complex: Implicit perceptual memory for cross-edge
competition leading to figure assignment. Percept Psychophys 4: 727–740.
Peterson, M. A. and Gibson, B. S. (1993). Shape recognition contributions to figure-ground organization in
three-dimensional displays. Cogn Psychol 25: 383–429.
Peterson, M. A. and Gibson, B. S. (1994a). Object recognition contributions to figure-ground
organization: operations on outlines and subjective contours. Percept Psychophys 56: 551–564.
Peterson, M. A. and Gibson, B. S. (1994b). Must figure-ground organization precede object recognition?
An assumption in peril. Psychol Sci 5: 253–259.
Peterson, M. A. and Kimchi, R. (2013). Perceptual organization. In: Handbook of Cognitive Psychology,
edited by D. Reisberg, pp. 9–31 (Oxford: Oxford University Press).
Peterson, M. A. and Lampignano, D. L. (2003). Implicit memory for novel figure–ground displays includes
a history of border competition. J Exp Psychol: Hum Percept Perform 29: 808–822.
Peterson, M. A. and Salvagio, E. (2008). Inhibitory competition in figure-ground perception: context and
convexity. J Vision 8(16): 4, doi:10.1167/8.16.4.
Peterson, M. A. and Skow, E. (2008). Suppression of shape properties on the ground side of an
edge: evidence for a competitive model of figure assignment. J Exp Psychol: Hum Percept Perform
34(2): 251–267.
Peterson, M. A., Harvey, E. H., and Weidenbacher, H. L. (1991). Shape recognition inputs to figure-ground
organization: which route counts? J Exp Psychol: Hum Percept Perform 17: 1075–1089.
Peterson, M. A., de Gelder, B., Rapcsak, S. Z., Gerhardstein, P. C., and Bachoud-Lévi, A.-C. (2000).
Object memory effects on figure assignment: conscious object recognition is not necessary or sufficient.
Vision Res 40: 1549–1567.
Peterson, M. A., Cacciamani, L., Mojica, A. J., and Sanguinetti, J. L. (2012a). The ground side of a
figure: shapeless but not meaningless. Gestalt Theory 34(3/4): 297–314.
Peterson, M. A., Cacciamani, L., Barense, M. D., and Scalf, P. E. (2012b). The perirhinal cortex modulates
V2 activity in response to the agreement between part familiarity and configuration familiarity.
Hippocampus 22: 1965–1977.
Pomerantz, J. R. and Kubovy, M. (1986). Theoretical approaches to perceptual organization. In: Handbook
of Perception and Human Performance, Vol. II, edited by K. R. Boff, L. Kaufman, and J. P. Thomas, pp.
36:1–46 (New York: John Wiley and Sons).
Poort, J., Raudies, F., Wannig, A., Lamme, V. A., Neumann, H., and Roelfsema, P. R. (2012). The role of
attention in figure-ground segregation in areas V1 and V4 of the visual cortex. Neuron 75(1): 143–156.
Qiu, F. T. and von der Heydt, R. (2005). Figure and ground in the visual cortex: V2 combines stereoscopic
cues with Gestalt rules. Neuron 47: 155–166.
Qiu, F. T., Sugihara, T., and von der Heydt, R. (2007). Figure-ground mechanisms provide structure for
selective attention. Nat Neurosci 10(11): 1492–1499.
Reynolds, J. H. and Chelazzi, L. (2004). Attentional modulation of visual processing. Ann Rev Neurosci
27: 611–647.
Reynolds, J. H., Chelazzi, L., and Desimone, R. (1999). Competitive mechanisms subserve attention in
macaque areas V2 and V4. J Neurosci 19: 1736–1753.
Rolls, E. T. and Tovee, M. J. (1995). The responses of single neurons in the temporal visual cortical areas
of the macaque when more than one stimulus is present in the receptive-field. Exp Brain Res
103: 409–420.
Rubin, E. (1958/1915). Figure and ground. In: Readings in Perception, edited by D. C. Beardslee and
M. Wertheimer, pp. 194–203 (Princeton, NJ: Van Nostrand) (original work published 1915).
Salvagio, E. M., Cacciamani, L., and Peterson, M. A. (2012). Competition-strength-dependent ground
suppression in figure-ground perception. Atten Percept Psychophys 74(5): 964–978.
Sanguinetti, J. L., Allen, J. J. B., and Peterson, M. A. (2014). The ground side of an object: perceived as
shapeless yet processed for semantics. Psychol Sci, 25(1), 256–264.
Schafer, R. and Murphy, G. (1943). The role of autism in a visual figure–ground relationship. J Exp
Psychol 2: 335–343.
Sejnowski, T. J. and Hinton, G. E. (1987). Separating figure from ground with a Boltzmann machine.
In: Vision, brain, and cooperative computation, edited by M. A. Arbib and A. Hanson, pp. 703–724
(Cambridge, MA: MIT Press).
Serre, T., Oliva, A. and Poggio, T. A. (2007). A feedforward architecture accounts for rapid categorization.
Proc Natl Acad Sci USA 104(15): 6424–6429.
Snodgrass, J. G. and Vanderwart, M. (1980). A standardized set of 260 pictures: norms for name
agreement, image agreement, familiarity, and visual complexity. J Exp Psychol: Hum Learning Memory
6(2): 174–215.
Thorpe, S., Fize, D., and Marlot, C. (1996). Speed of processing in the human visual system. Nature
381: 520–522.
Treisman, A. and DeSchepper, B. (1996). Object tokens, attention, and visual memory. In: Attention and
Performance XVI: Information Integration in Perception and Communication, edited by T. Inui and
J. McClelland, pp. 15–46 (Cambridge, MA: MIT Press).
Vecera, S. P. and Farah, M. J. (1997). Is visual image segmentation a bottom-up or an interactive process?
Percept Psychophys 59: 1280–1296.
Vecera, S. P., Flevaris, A. V., and Filapek, J. C. (2004). Exogenous spatial attention influences figure–ground
assignment. Psychol Sci 15: 20–26.
Vecera, S. P. and O’Reilly, R. C. (1998). Figure–ground organization and object recognition processes: an
interactive account. J Exp Psychol: Hum Percept Perform 24: 441–462.
Vecera, S. P. and O’Reilly, R. C. (2000). Graded effects in hierarchical figure–ground organization: a reply
to Peterson (1999). J Exp Psychol: Hum Percept Perform 26: 1221–1231.
Vecera, S. P. and Palmer, S. E. (2006). Grounding the figure: contextual effects of depth planes on
figure-ground organization. Psychonom Bull Rev 13: 563–569.
Vecera, S. P., Vogel, E. K., and Woodman, G. F. (2002). Lower-region: a new cue for figure–ground
assignment. J Exp Psychol: Gen 131: 194–205.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt, R.
(2012). A century of Gestalt psychology in visual perception I. Perceptual grouping and figure–ground
organization. Psychol Bull 138(6): 1172–1217.
Weisstein, N. and Wong, E. (1987). Figure-ground organization affects the early processing of information.
In: Vision, Brain, and Cooperative Computation, edited by M. A. Arbib and A. R. Hanson, pp. 209–230
(Cambridge, MA: MIT Press).
Wertheimer, M. (1923/1938). Laws of organization in perceptual forms. In: A Source Book of Gestalt
Psychology, edited by W. D. Ellis, pp. 71–94 (London: Routledge and Kegan Paul) (original work
published 1923).
Yonas, A., Craton, L. G., and Thompson, W. B. (1987). Relative motion: kinetic information for the order
of depth at an edge. Percept Psychophys 41(1): 53–59.
Zhou, H., Friedman, H. S., and von der Heydt, R. (2000). Coding of border ownership in monkey visual
cortex. J Neurosci 20: 6594–6611.
Chapter 14

Figures and holes


Marco Bertamini and Roberto Casati

Holes have special ontological, topological, and visual properties. Perhaps because of these they
have attracted great interest from many scholars. In this chapter, we discuss these properties,
and highlight their interactions. For instance, holes are not concrete objects; their existence in
perception is therefore an exception to the general principle, grounded in evolution, that the
visual system parses a scene into regions corresponding to concrete objects. In 1948, Rudolf
Arnheim discussed the role of holes in the sculptures of Henry Moore. Arnheim’s analysis was
informed by Gestalt principles of figure-ground. In the case of holes within sculptures, given
their relative closure and compactness, Arnheim detected a sense of presence. It is worth report-
ing his words here as this ambiguity is precisely the issue that has been central to much later
work: ‘Psychologically speaking, these statues […] do not consist entirely of bulging convexi-
ties, which would invade space aggressively, but reserve an important role to dells and caves and
pocket-shaped holes. Whenever convexity is handed over to space, partial “figure”-character is
assumed by the enclosed air-bodies, which consequently appear semi-substantial’ (Arnheim,
1948, p. 33).
This chapter starts with a discussion of the ontology and topology of holes. In the last part of
the chapter, the focus will be on the role of holes in the study of figure-ground organization and
perception of shape.

Ontology
In philosophy, ontology is the study of the nature of being, and of the basic categories of being and
their relationships. The ontology of holes starts from the prima facie linguistic evidence that we
make statements about holes, thus presupposing their extra-mental existence. At the same time,
holes appear to be absences, thus non-existing items. Therefore, if they exist, they are sui generis
objects. Within the debate on the nature of holes, materialism maintains that nothing exists in the
world but concrete material objects; thus, holes should be explained away by reference to proper-
ties of objects (Lewis & Lewis, 1983). Others, by contrast, maintain that holes exist, even though
they are not material (Casati & Varzi, 1994; 1996). If we accept that holes exist, further problems
must be addressed. For example, whether holes exist independently of the object in which they
find themselves, whether they should be equated with the hole linings (and thus be considered as
material parts of material objects), and whether one can destroy a hole by filling it up (as opposed
to ending up with a filled hole).
To consider holes as existing extra-mentally is no trivial assumption. There are some advan-
tages, such as the possibility of describing the shape of a holed object by referring to the shape of
the hole in it. For example, we can describe a star-shaped hole in a square-shaped object. If holes
282 Bertamini and Casati

could not be referred to directly, the description of the same configuration would be awkward
(Figure 14.1a).
However, if holes exist, they are not material objects. Yet they possess geometric properties,
and therefore there are some entities with geometric properties that are not objects. This would
entail that Gestalt rules can fail in parsing the visual scene into objects. However, if holes have
shape, like figures do, this does not prevent the visual area corresponding to their shape from being
seen as ground. Therefore, the same area can behave as figure and ground at the same time, which
is, prima facie, problematic for theories of figure-ground segmentation and for the principle of
unidirectional contour ownership (Koffka, 1935). Border ownership is covered in detail in Kogo
and van Ee, this volume.
Various solutions exist. Some may wonder whether ontology is relevant for the study of visual
perception. There may exist a property such that anything that is a hole has that property, but this
does not entail that to have the impression of seeing a hole one must visually represent that very
property—holes can be immaterial bodies or negative parts of objects (Hoffman and Richards,

Fig. 14.1  (a) The cognitive advantage of holes: the object is easily described as a blue square with a
star-shaped hole. A description of the shape of the object that does not mention the shape of the
hole would be more difficult. (b) Evidence for naïve topology: two solids that mathematical topology
cannot distinguish, but that appear quite different to common-sense classifications.
Reproduced from Casati, Roberto, and Achille C. Varzi, Holes and Other Superficialities, figure: “Cognitive
advantage of holes”, © 1994 Massachusetts Institute of Technology, by permission of The MIT Press.

1984), or portions of object boundaries, and perception may be blind to their real nature, although
still delivering the impression of perceiving a hole (Siegel, 2009). Alternatively, one may suggest
that the process of figure-ground organization misfires in the case of holes, whose Gestalt proper-
ties erroneously trigger the ‘figure’ response. That is, holes are (rare) exceptions. Another solution
is to say that holes have a special ‘tag’ as the missing part of an object (Nelson et al., 2009). The solu-
tion that requires fewer changes to Gestalt principles, however, is to say that the shape properties of
the hole are a property of the object-with-hole, just like the large concavity in a letter C. These prop-
erties do not make the hole or the concavity of the letter C into a figure in the sense of foreground.
What is meant by figure in figure-ground organization is not just something that has shape, but
something that is more specific and is closely linked to surface stratification. In all these cases,
the visual system makes important decisions about whether holes exist, and about their nature as
objects or quasi-objects. Some developmental findings support this hypothesis. Giralt and Bloom
(2000) found that 3-year-old children can already classify, track, and count holes. Therefore, there
is good evidence that the human perceptual system takes holes seriously.

Topology
Holes play an important part in topology, a branch of mathematics dealing with spatial prop-
erties. Topological shape-invariance is intuitively understood by imagining that objects are made
of rubber. In particular, the concept of homotopy classification is used to describe the differ-
ence between shapes. Two objects are topologically equivalent if it is possible to transform one
of them into the other by just stretching it, without cutting or gluing at any place. Thus, a cube is
topologically equivalent to a sphere, but neither is equivalent to a doughnut. This classification,
in non-technical terms, measures the number of holes in an object. For instance, all letters of the
alphabet used in this chapter belong to one of three classes, with zero holes (the capital L),
one hole (capital A), or two holes (capital B), respectively. Capital L is topologically equivalent to capital I, Y,
and V. This explains the joke that says that a topologist cannot distinguish a mug from a doughnut
(assuming the mug has a handle, they both have just one hole).
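For planar shapes such as these letters, the hole count can be computed directly from a binary image: flood-fill the background from the image border, and every background component that remains enclosed is a hole. The sketch below is illustrative only; the crude bitmap letters and the choice of 4-connectivity are our own assumptions.

```python
from collections import deque

def count_holes(grid):
    """Count enclosed background regions ('holes') in a binary grid.

    A hole is a 4-connected component of background cells (0) that
    cannot be reached from the grid border. On bitmap letters this
    recovers the three classes: L -> 0, A or O -> 1, B -> 2.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r, c):
        queue = deque([(r, c)])
        seen[r][c] = True
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols \
                        and not seen[ny][nx] and grid[ny][nx] == 0:
                    seen[ny][nx] = True
                    queue.append((ny, nx))

    # Remove all background reachable from the border.
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) \
                    and grid[r][c] == 0 and not seen[r][c]:
                flood(r, c)

    # Each remaining background component is one hole.
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and not seen[r][c]:
                holes += 1
                flood(r, c)
    return holes

def from_ascii(art):
    return [[1 if ch == '#' else 0 for ch in line] for line in art]

# Crude bitmap letters: O has one hole, B two, L none.
O = from_ascii(["#####",
                "#...#",
                "#...#",
                "#####"])
B = from_ascii(["#####",
                "#.#.#",
                "#####"])
L = from_ascii(["#....",
                "#....",
                "#####"])
```

On these bitmaps the three classes come out as expected: the L has no hole, the O one, and the B two.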
The joke about topologists hints at a psychologically interesting distinction. Intuitive classifications
of objects are not well aligned with mathematical topological classifications. As there is a naïve
physics that departs from standard physics, there appears to be a naïve topology that does not
coincide with mathematical topology. For instance, a cube perforated with a Y-shaped hole is
topologically equivalent to a cube perforated with two parallel I-shaped holes, surprising as this
may appear (Figure 14.1b). Moreover, a knot in a hole is invisible to mathematical topology. Naïve
topology uses both objects and holes to classify shapes.
Within vision science, Chen has argued that extraction of topological properties is a fundamen-
tal function of the visual system, and that topological perception is prior to the perception of other
featural properties (for a review, see Chen, 2005; see Casati, 2009, for a criticism). There is some
empirical evidence in support of this claim. In particular, Chen has shown that human observers
are better at discriminating pairs of shapes that are topologically different than pairs that are topo-
logically the same (Chen, 1982) and Todd et al. (1998) have found that in a match-to-sample task
performance was highest for topological properties, intermediate for affine properties, and lowest
for Euclidean properties. More recently, Wang et al. (2007) reported that sensitivity to topological
properties is greater in the left hemisphere, and Zhou et al. (2010) have found that topological
changes disrupt multiple object tracking.
Holes play an important role in studies of topology, and topology is useful in explaining some
perceptual phenomena. However, in this context, holes are defined as an image property. In other



Fig. 14.2  The configural superiority effect: target detection improves with the addition of a context.
In this example the closed region is easier to find than a difference in orientation.

words, the letter O is an example of a hole whether or not this is perceived as a black object in
front of a white background. The depth order of the white and black regions is irrelevant, and the
experiments cited above did not try to establish whether observers perceived the region inside the
hole as showing a surface at greater depth than the object itself.
Let us take the phenomenon of configural superiority (Figure 14.2) studied by Pomerantz
(2003; Pomerantz, Sager, & Stoever, 1977; see also Pomerantz chapter, this volume) and
discussed also in Chen (2005). This effect may be taken to demonstrate the salience of perception
of a hole over individual sloped lines. However, ‘closure’ may be a better term for this configu-
ral property. That is, because depth order is not important, this concept of hole is closer to the
concept of closure. This is consistent with the literature, because closure is a factor that enhances
shape detection (Elder & Zucker, 1993) and modulates shape adaptation (Bell et al., 2010). Note
that closure is on a continuum: even contours that are not closed in a strict image sense can be
more or less closed perceptually (Elder & Zucker, 1994). This quantitative aspect of closure is
important for the concept of hole, because it makes a hole simply the extreme of a continuum
of enclosed regions and not something unique. Moreover, if closure is sufficient to define holes
then any closed contour creates a hole, which makes holes very common, whereas true holes (i.e.
apertures) are relatively rare.

Holes as ground regions


We have briefly discussed the ontology and topology of holes; holes are especially interesting in the
study of perceptual organization, that is, when a hole is defined in terms of figure-ground organi-
zation (see Peterson chapter, this volume) and perception of surface layout. A general definition
of a visual hole is a region surrounded by a closed contour, but perceived as an aperture (a miss-
ing piece of surface) through which a further (and farther) surface is visible. This is a definition
specific to visual holes, rather than the more general concept of physical holes, as not all physical
holes may be visible (Palmer et al., 2008). This usage of the term ‘hole’ within the literature dealing
with perceptual organization critically relies on ordinal depth information. Holes would not exist
in a two-dimensional world, but they only require ordinal rather than metric depth.
Bertamini (2006) argued that visual holes are ideal stimuli to study the effect of figure-ground
reversal on perception of shape:  a closed region perceived as object or hole provides a direct
comparison between a figure (object) and a ground (hole) that are otherwise identical in shape
(congruent). However, Palmer et al. (2008) argued that contour ownership and ordinal depth can
be dissociated in figure-ground organization. More specifically, in the case of a visual hole the
outside object (the object-with-hole) is foreground and, therefore, nearer in depth than the back-
ground, but the contour can also describe the ground region inside the hole, contrary to what uni-
directional contour ownership would suggest. If holes are special in that they have one property of

background (depth order), but also a property of the foreground (contour ownership) then they
are not useful in the study of general figure-ground effects, as these would not generalize to other
ground regions. We will return to this problem after the discussion of the empirical evidence.
It is informative to attempt to draw on a piece of paper something that will be perceived imme-
diately as a visual hole. In so doing, one discovers that this is a difficult task, and for good reasons.
A finite and enclosed region of an image, such as a circle, tends to be perceived as foreground
because of factors such as closure and relative size (the closed contour is smaller relative to the
page). Therefore, other factors must be present to reverse this interpretation.

Factors that make a region appear as a hole


In 1954 Arnheim provided a demonstration of the role of convexity in figure ground organization
using a hole (see also Arnheim, 1948). As shown in Figure 14.3a, the shape on top is more likely
to be seen as a hole compared to the shape on the bottom. Note that here convexity is used in a
piecewise sense as a global property of a complex shape (Bertamini & Wagemans, 2012). This role
of convexity in figure-ground was later confirmed by Kanizsa and Gerbino (1976).
Arnheim’s demonstration is elegant because of its simplicity, as the two shapes can be made
the same in area or in contour length, and in Figure 14.3a they are not the shapes of any specific
familiar object. The difference between the two regions is thus something about the shape itself.


Fig. 14.3  Figural factors affecting the perception of holes: the hole percept is stronger in the top
element of each pair. (a) Arnheim (1954) claimed that globally concave shapes tend to be seen as holes.
This figure shows an extreme version of his demonstration in which the smooth contour segments
are identical in both cases (they are just arranged differently) and have, therefore, the same curvature
and the same total length. For a version with equal area see Bertamini (2006). Most observers, when
forced to choose, select the shape on the top as a better candidate for being a hole. (b) Bozzi (1975)
used the example of a square within a square to show the role of the relationship between contours: a
hole is perceived when edges are parallel. (c) Effect of grouping factors, such as similarity of texture or
color (Nelson and Palmer, 2001). (d) Effect of high entropy (lines with random orientation).
Reproduced from Barbara Gillam and Philip M. Grove, Contour entropy: A new determinant of perceiving ground or a
hole, Journal of Experimental Psychology: Human Perception and Performance, 37(3), 750–757 © 2011, American
Psychological Association.

However, neither of the two is unambiguously perceived as a hole, so the key to the demonstra-
tion is to ask a relative judgment: which one of the two appears more like a hole. Bertamini (2006)
found that when asked this question most observers chose the concave shape, as predicted by
Arnheim.
Bozzi (1975) made phenomenological observations on the conditions necessary for the per-
ception of holes. The figure that contains the hole should have a visible outer boundary (unlike
the Arnheim examples), there should be evidence that the background visible inside the hole
is the same as the background outside, and the boundary of the hole should be related to the
outer boundary of the object, for instance when contours are parallel as in the frame of a window
(Figure 14.3b).
An early empirical study on the conditions necessary for perception of holes was conducted by
Cavedon (1980). She found that observers did not report seeing a hole even when a physical hole was
present if there were no detectable depth cues. In a more recent list of factors that affect the perception
of a hole, Nelson and Palmer (2001) reported that in addition to depth information grouping factors
are also important because they make the region visible inside a hole appear as a continuation of the
larger background (for instance because both have the same texture, Figure 14.3c). Another impor-
tant contribution to the perception of a hole is information that makes the relationship between the
shape of the hole and the shape of the object appear non-accidental. The evidence from Nelson and
Palmer (2001) confirmed the observation by Bozzi (1975). If a white region is centred inside a black
region it is more likely to be perceived as a hole than if it is slightly crooked.
Gillam and Grove (2011) have shown that properties of the ground itself may be important
to generate the percept of a hole. Specifically, they found that a simple rectangle appears more
hole-like when the entropy of the enclosed contours is greater. This can be seen by comparing a
region with multiple lines of different orientations (high entropy) and a region with parallel lines
(low entropy) (Figure 14.3d). A  final factor that strongly affects figure-ground stratification is
shading. For instance, Bertamini and Helmy (2012) used shading to create the perception of holes
(described later, see also Figure 14.6).
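The entropy contrast used by Gillam and Grove can be quantified, for illustration, as the Shannon entropy of a histogram of line orientations: parallel lines fall into a single bin and yield zero entropy, while randomly oriented lines spread over many bins. This is our own simplified statistic under an arbitrary binning choice, not Gillam and Grove's exact measure.

```python
import math
import random

def orientation_entropy(angles_deg, n_bins=12):
    """Shannon entropy (in bits) of a histogram of line orientations.

    Orientations are taken modulo 180 degrees and binned; the result
    ranges from 0 (all lines parallel) up to log2(n_bins).
    """
    counts = [0] * n_bins
    for a in angles_deg:
        counts[int((a % 180) / (180 / n_bins))] += 1
    total = len(angles_deg)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

parallel = [30.0] * 20                                   # low-entropy texture
random.seed(1)
scattered = [random.uniform(0, 180) for _ in range(20)]  # high-entropy texture

low = orientation_entropy(parallel)    # 0.0
high = orientation_entropy(scattered)  # positive; bounded by log2(12)
```

On Gillam and Grove's account, the enclosed region with the higher value of this kind of statistic is the one more likely to be seen as a hole.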
Bertamini and Hulleman (2006) explored the appearance of surfaces seen through holes. In par-
ticular, they tested whether the surface seen under multiple holes is a single amodally-completed
surface or whether the background takes on the shape of the complement of the hole (i.e. the
contour of the hole itself). Observers found it difficult to judge the extension of these amodal
surfaces, and were affected by the context (flanking objects). It is interesting that a hole can show
a surface without any information about the bounding contours of that surface. Therefore, the
shape of this object is not specified by any form of contour extrapolation (see chapter on percep-
tual completions). The shape of the hole may still constrain what is hidden in terms of probabili-
ties (Figure 14.4). For example, given a few basic assumptions, underneath a vertically-orientated
hole the value of the posterior probability is greater for a vertically-orientated rectangle than a
horizontal one (Bertamini & Hulleman, 2006).
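The flavour of this posterior argument can be captured by a toy model in which the hidden surface is a rectangle whose centre is positioned uniformly, and the likelihood of each candidate orientation is proportional to the area of centre positions for which the rectangle fully covers the hole. All sizes below, and the uniform-position assumption, are our own illustrative choices, not values from Bertamini and Hulleman (2006).

```python
def coverage_likelihood(rect_w, rect_h, hole_w, hole_h):
    """Relative likelihood that a uniformly positioned opaque rectangle
    fully covers a hole of the given size: proportional to the area of
    centre positions for which the hole is covered (zero if the
    rectangle is too small in either dimension).
    """
    dx, dy = rect_w - hole_w, rect_h - hole_h
    return dx * dy if dx > 0 and dy > 0 else 0.0

# A vertically orientated hole (1 wide, 3 tall) and two equally probable
# candidate surfaces: the same rectangle in vertical (4 x 8) or
# horizontal (8 x 4) orientation. Sizes are arbitrary illustrative choices.
hole_w, hole_h = 1, 3
lik_vertical = coverage_likelihood(4, 8, hole_w, hole_h)    # (4-1)*(8-3) = 15
lik_horizontal = coverage_likelihood(8, 4, hole_w, hole_h)  # (8-1)*(4-3) = 7

# With equal priors, Bayes' rule reduces to normalizing the likelihoods.
total = lik_vertical + lik_horizontal
post_vertical = lik_vertical / total
post_horizontal = lik_horizontal / total
```

Under these assumptions the posterior favours the vertically orientated rectangle (15/22 against 7/22), in line with the intuition reported in the text.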
In another set of observations, Bertamini and Hulleman (2006) used stereograms to test holes
that were moving. If a visual hole has an existence independent of the object-with-hole, perhaps it
can move independently from that very object. However, a substantial proportion of participants
perceived a lens in the aperture of the hole. Also, for objects in which texture changed as they
moved (as it would within a hole), the percept was that of detachment of the contour from the
texture inside the contour. In all cases where there was accretion/deletion of texture on the figural
side, this resulted in detachment of texture, and introduction of a lens-like/spotlight-like appear-
ance. With respect to visual holes, the most important finding was the strong resistance
to perceiving holes as moving independently from the object-with-hole.

Fig. 14.4  Assuming that the three grey regions are perceived as holes, what is the shape of the
underlying grey surface? Unlike other completion phenomena there is no contour continuation. One
solution is a single grey object underneath all three holes, a second solution is three shapeless blobs,
and finally, as shown by the dashed lines, the contour of the holes, albeit perceived on a different
depth plane, can constrain the possible hidden objects.

Remembering the shape of a hole


In his classic book, Palmer (1999) discusses the issue of holes in terms of a paradox. An important
principle from Gestalt says that ground regions are shapeless (Koffka, 1935; Rubin, 1921). This fol-
lows from the fact that contours are assigned only to the foreground and can only provide infor-
mation about the shape of the foreground. However, we have defined a visual hole as a ground
region. Therefore, will the hole be shapeless like all other ground regions? If so, observers should
not be able to describe a hole or remember its shape in a memory task.
Although Rubin did not set out to study holes, he did use a set of figures in a study about
shape memory, and asked observers to perceive each of them as either figure or ground (1921).
When the instructions changed between study phase and test phase, memory performance
was very poor. However, in a better-controlled set of experiments, Palmer et al. (2008) found
that memory for the interior shapes of regions initially perceived as holes was as good as the
memory for those regions perceived as solid objects. In another set of studies, Nelson et al.
(2009) noted that memory was good for holes as long as they were located in a single sur-
face. Memory was poor for regions that were enclosed within multiple surfaces, i.e. accidental
regions. This is consistent with the definition that says that the hole is a region with a closed
contour, and is also consistent with most people’s intuition that a hole has to exist within a
single object-with-hole.
Because memory for holes is as good as memory for objects, Palmer argued that regions can
be represented as having a shape even when they are not figures, and that in the case of holes,
although they are not figures and are not material, they are ‘figures for purposes of describing
shape’ (p. 287). The idea that hole boundaries are used to describe shape was also in Casati and
Varzi (1994, pp. 162–163), who claimed that ‘in addition to figural boundaries there are topical
boundaries, which confer a figural role on some portion of the visual field . . . without at the same
time suggesting that such a role is played by figures in the old sense’.
Other authors have subscribed to this position. Feldman and Singh (2005) worked on an analy-
sis of convexity and concavity information along contours. There are important differences in how
the visual system treats the two, but what is coded as convex or concave depends on figure-ground
and, therefore, for a given closed contour, the coding is reversed if the contour is perceived as
a figure or a hole. Feldman and Singh suggested that perhaps this does not happen because, as

suggested by Palmer, holes may have ‘a quasi-figural status, as far as shape analysis is concerned’
(Feldman & Singh, 2005, p. 248).

Visual search and holes


Some interesting evidence about perception of holes comes from studies that used the vis-
ual search paradigm. In a study focused on pre-attentive accessibility to stereoscopic depth,
O’Toole and Walker (1997) tested visual search for items defined by crossed or uncrossed dis-
parity. Within a random dot stereogram this manipulation created some conditions in which
holes were perceived (behind the background at fixation). O’Toole and Walker found some
evidence for an advantage for targets in front, relative to targets behind. Interpretation was
difficult because of the presence of nonlinear trends in the search slopes, but in general terms
O’Toole and Walker suggested that their results are consistent with the emergence of global
surface percepts.
Bertamini and Lawson (2006) conducted a series of visual search studies using similar random
dot stereograms, but focusing more directly on the comparison between a search for a simple cir-
cular figure and a search for a simple circular hole. Note that for contours such as a circle this type
of figure ground reversal means that in one case the target is strictly convex and, in the other case,
the target is strictly concave. One manipulation that Bertamini and Lawson added, compared
with O’Toole and Walker (1997), was that, in some cases, the background surface was
available for preview before the items appeared. Bertamini and Lawson (2006) found that pro-
viding a preview benefited search for concavities (holes) more than it did search for convexities
(figures) and that for convex figures, nearer targets were responded to more quickly. The effect of
background preview is important. The best explanation comes from the observation that when a
hole appears on a background that was already present, the shape of that surface changes; by con-
trast, adding a figure in front of the background does not cause a change of shape of a pre-existing
object. On the key comparison between convexity and concavity, however, there was no evidence
that concave targets (holes) were inherently more salient.
Hulleman and Humphreys (2005) studied the difference between searching among objects
and searching among holes. The target was a ‘C’ and the distractor was an ‘O’. It was easier to
search among objects than to search among holes, although it should be noted that stimuli were
always more complex, for instance in terms of additional contours, in the hole conditions. The
authors conclude that their results support the idea that the shape of a hole is only available
indirectly.
Taking the studies about memory and those about visual search one could say that observers
must be able to see holes given that they can remember them and find them in a search task.
However, it is also possible that observers knew about the properties of the holes only through
the shape of the host surface, given that holes are always properties of an object. To know more
about how holes are processed we will describe studies in which observers had to respond as fast
as possible to specific local or global aspects of the hole.

Attention and visual holes


Let us consider the shapes in Figure 14.5. It is easy to notice that the hexagon is irregular and a
pair of vertices is not aligned. In the examples of Figure 14.5 the vertex on the left is lower than
the one on the right, vertically. If observers have to judge which vertex is lower the task difficulty
will vary with vertical offset. Using irregular hexagons like those on the left side of Figure 14.5,

Fig. 14.5  Colour and shading are powerful ways to affect figure-ground. On the left we perceive
surfaces on top of other surfaces but on the right we perceive holes. The convexity (+) and concavity
(–) of the vertices is labeled to highlight the complete reversal that takes place with a figure-ground
reversal. The hexagon on the top row has only one type of vertices, these are convex (figure) or
concave (hole). The hexagon on the bottom row has both types, and they all reverse as we move
from figure to hole.

Baylis and Driver (1993) have shown that closure of the shape improves performance, i.e. there is
a within-object advantage. However, as pointed out by Gibson (1994), one has to be careful when
comparing vertices that can be perceived as convex or concave. In particular, the object on top has
convex vertices and the one at the bottom has concave vertices.
To manipulate the coding of convexity while retaining the same hexagonal shapes, Bertamini and
Croucher (2003) compared figures and holes. This is the manipulation illustrated in Figure 14.5,
although color and texture were used as figural factors rather than shading. Note that this can be
seen as a 2×2 design in which the convexity of the critical vertices varies independently of the
overall shape of the hexagon. Results confirmed that figure-ground reversal had an effect on task
difficulty: performance was better when the vertices were perceived as convex. In other words,
the coding of the vertices as convex or concave was more important than the overall shape of the
hexagon. The reason it is easier to judge the position of convex vertices is likely to be that there is
an explicit representation of position for visual parts, and convexities specify parts (Koenderink,
1990; Hoffman & Richards, 1984). Therefore, the different convexity coding for figures and holes
implies a different part structure in the two cases.
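The labelling of vertices as convex (+) or concave (−) in Figure 14.5, and its wholesale reversal under figure-ground reversal, can be sketched with the standard cross-product test for polygon vertices: traversing the same closed contour with the opposite region taken as figure flips the sign at every vertex. The hexagon coordinates below are our own illustrative choice, not the stimuli of Bertamini and Croucher (2003).

```python
def vertex_convexity(polygon):
    """Label each vertex of a simple polygon as convex (+1) or
    concave (-1) from the sign of the cross product of the two edges
    meeting there (edges assumed non-collinear). With counter-clockwise
    vertex order the labels describe the enclosed region as figure;
    reversing the traversal, as when the same contour bounds a hole,
    flips every label.
    """
    n = len(polygon)
    labels = []
    for i in range(n):
        x0, y0 = polygon[i - 1]          # previous vertex (wraps around)
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]    # next vertex
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        labels.append(1 if cross > 0 else -1)
    return labels

# An irregular hexagon with one re-entrant vertex at (2, 3),
# listed counter-clockwise.
hexagon = [(0, 0), (2, -1), (4, 0), (4, 4), (2, 3), (0, 4)]

as_figure = vertex_convexity(hexagon)        # region enclosed is figure
as_hole = vertex_convexity(hexagon[::-1])    # same contour bounding a hole
```

With the region as figure, only the re-entrant vertex is labelled concave; traversing the identical contour the other way, as for a hole, every label reverses, which is the complete part-structure reversal described above.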

The advantage for judging the position of convex vertices (as opposed to concave) is supported
by evidence that does not rely on holes (Bertamini, 2001), but holes do provide the most direct
test of the role of convexity. Holes have been used in subsequent studies by Bertamini and Mosca
(2004) and Bertamini and Farrant (2006). Using random dot stereograms, Bertamini and
Mosca (2004) could ensure that there was no ambiguity in figure-ground relations. In a random
dot stereogram, no shape information is available until images have been binocularly fused and,
therefore, depth order is established at the same time as shape information. In this sense, unlike
texture, shading, and other factors that can create a hole percept, random dot stereograms create
holes that cannot be perceived any other way. Bertamini and Mosca’s (2004) experiments con-
firmed that the critical factor in affecting relative speed on this task was whether the region was
seen as foreground or background, thus changing contour ownership.
The explanation of the effect relies on the assumption that the contour of a silhouette is per-
ceived as the rim of an opaque object. To test this Bertamini and Farrant (2006) compared objects
and holes to a third case, that of thin (wire-like) objects. As a thin line tends to be perceived as
the contour of a surface, these thin objects, which are both objects and holes, can only be cre-
ated within random dot stereograms. Bertamini and Farrant confirmed that holes created by thin
objects are different in terms of performance from both objects and holes. They concluded that
thin wire-like objects have a different perceived part structure, which is intermediate between that
of objects and that of holes.
Albrecht et al. (2008) studied holes with a cueing paradigm. It is known that responses to uncued
locations are faster for probes that are located on the cued surface compared with the uncued sur-
face (Egly et al., 1994). This is taken as evidence of object-based attention. Albrecht et al. (2008)
compared surfaces with identical rectangular regions perceived as holes. Stereograms were used
to ensure that holes were perceived as such. The object-based advantage was not found for holes
when the background surface visible through the holes was shared by the two holes, but the effect
was present when this background was split, so that different objects were visible through different
holes. The findings show clearly that the important factor in deployment of attention is not just the
closure of the contours, as this was the same for the rectangles perceived as objects and as holes,
but the perceptual organization of the regions as different surfaces in depth. The region cued inside
a hole is the background surface, consistent with the idea that a hole is a ground region. That is,
what is seen inside the hole belongs to a surface that extends beyond the contour of the aperture.
Another paradigm that has been used to study attention is that of multiple object tracking, in
which observers track moving items among identical moving distractors (Pylyshyn & Storm, 1988;
Scholl, 2009). Horowitz and Kuzmova (2011) compared performance when tracking figures and when
tracking holes. Holes were as easy to track as figures. Therefore, Horowitz and Kuzmova concluded
that holes are proto-objects, that is, bundles that serve as tokens to which attention can be deployed.
The results from multiple object tracking are consistent with the results from visual search tasks.
Observers can find and attend to locations where a hole is present. How far can we go in perceiv-
ing holes and their shape as if they were the same as objects? To answer that question Bertamini
and Helmy (2012) used a shape interference task. Observers were presented with simple shapes
and had to discriminate a circle from a square (see Figure 14.6). However, there was also an irrel-
evant surrounding contour that could be either a circle or a square. Different (incongruent) inside
and outside contours produced interference, but the effect was stronger when they formed an
object-with-hole, as compared with a hierarchical set of surfaces or a single hole separating differ-
ent surfaces (a trench). This result supports the hypothesis that the interference is constrained by
which surface owns the contour, and that the shape of a hole cannot be processed independently
of the shape of the object-with-hole.
Figures and Holes 291

Fig. 14.6  In the top row there is a square contour surrounded by another square contour. This is
true for both the object and the hole. In the bottom row there is a square contour surrounded by a
circular contour. Therefore, these are examples in which the two contours are congruent (same) or
incongruent (different). What is different between objects and holes is that in the case of holes the
surrounding contour is part of the same surface that also defines the hole.

Conclusions
This chapter has shown the surprisingly large range and diversity of the studies of holes. Some
authors have focused on the nature of holes. We have seen the implications of this characteriza-
tion for accounts of the perception of holes. Can they act as objects or at least as proto-objects?
Other authors have used holes because they are convenient stimuli to manipulate key variables, in
particular figure-ground and contour ownership.
We can confidently say that humans are not blind to holes. Observers can remember the shape
of holes, they can search among holes, and they can perform multiple object tracking of holes. For some
tasks there is little difference between holes and objects. Therefore, the more difficult question to
answer is to what extent holes are treated by vision on a par with objects, and conversely to what
extent they are different from other ground regions.
In terms of local coding of convexity, it appears that holes are not similar to objects and that
convexity is assigned relative to the foreground surface (Bertamini & Mosca, 2004). In terms of
global shape analysis, likewise, the shape of a hole cannot be treated independently of the shape of
the foreground surface that is the object-with-hole (Bertamini & Helmy, 2012). On the one hand,
this makes holes less of a curiosity in the sense that they are not an exception to the principles
of figure-ground, and in particular they are not an exception to the principle of unidirectional
contour ownership (Bertamini, 2006). On the other hand, holes as ground regions provide the
292 Bertamini and Casati

ideal comparison for their complements. We can compare congruent contours perceived as either
objects (foreground) or holes (background) to test the role of a change in figure-ground relation-
ships while at the same time factors such as shape, size, and closure are fixed.

References
Albrecht, A. R., List, A., & Robertson, L. C. (2008). Attentional selection and the representation of holes
and objects. J Vision 8(13): 1–10.
Arnheim, R. (1948). The holes of Henry Moore: on the function of space in sculpture. J Aesthet Art
Criticism 7(1): 29–38.
Arnheim, R. (1954). Art and Visual Perception: A Psychology of the Creative Eye (Berkeley: University of
California Press).
Baylis, G. C., & Driver, J. (1993). Visual attention and objects: evidence for hierarchical coding of location.
J Exp Psychol Hum Percept Perform 19(3): 451–470.
Bell, J., Hancock, S., Kingdom, F. A. A., & Peirce, J. W. (2010). Global shape processing: which parts form
the whole? J Vision 10(6): 16.
Bertamini, M. (2001). The importance of being convex: An advantage for convexity when judging position.
Perception 30: 1295–1310.
Bertamini, M. (2006). Who owns the contour of a hole? Perception 35: 883–894.
Bertamini, M., & Croucher, C. J. (2003). The shape of holes. Cognition 87(1): 33–54.
Bertamini, M., & Farrant, T. (2006). The perceived structural shape of thin (wire-like) objects is different
from that of silhouettes. Perception 35: 1265–1288.
Bertamini, M., & Helmy, M. S. (2012). The shape of a hole and that of the surface-with-hole cannot be
analysed separately. Psychonom Bull Rev 19: 608–616.
Bertamini, M., & Hulleman, J. (2006). Amodal completion and visual holes (static and moving). Acta
Psychol 123: 55–72.
Bertamini, M., & Lawson, R. (2006). Visual search for a figure among holes and for a hole among figures.
Percept Psychophys 68: 776–791.
Bertamini, M., & Mosca, F. (2004). Early computation of contour curvature and part structure: Evidence
from holes. Perception 33: 35–48.
Bertamini, M., & Wagemans, J. (2012). Processing convexity and concavity along a 2D
contour: figure-ground, structural shape, and attention. Psychonom Bull Rev 20(2): 197–207.
Bozzi, P. (1975). Osservazione su alcuni casi di trasparenza fenomenica realizzabili con figure a tratto.
In Studies in Perception: Festschrift for Fabio Metelli, edited by G. d’Arcais, pp. 88–110. Milan/
Florence: Martelli- Giunti.
Casati, R. (2009). Does topological perception rest on a misconception about topology? Philosoph Psychol
22(1): 77–81.
Casati, R., & Varzi, A. C. (1994). Holes and Other Superficialities. Cambridge, MA: MIT Press.
Casati, R., & Varzi, A. C. (1996). Holes. In The Stanford Encyclopedia of Philosophy, edited by
Edward N. Zalta. Available at: http://plato.stanford.edu/
Cavedon, A. (1980). Contorno e disparazione retinica come determinanti della localizzazione in profondità: le
condizioni della percezione di un foro. Università di Padova Istituto di Psicologia Report 12.
Chen, L. (1982). Topological structure in visual perception. Science 218: 699–700.
Chen, L. (2005). The topological approach to perceptual organization. Visual Cogn 12(4): 553–637.
Egly R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and locations: evidence
from normal and parietal lesion subjects. J Exp Psychol Gen 123: 161–177.
Elder, J. H., & Zucker, S. W. (1993). The effect of contour closure on the rapid discrimination of
two-dimensional shapes. Vision Res 33(7): 981–991.
Elder, J. H., & Zucker, S. W. (1994). A measure of closure. Vision Res 34(24): 3361–3369.
Feldman, J., & Singh, M. (2005). Information along contours and object boundaries. Psychol Rev
112: 243–252.
Gibson, B. S. (1994). Visual attention and objects: one versus two or convex versus concave? J Exp Psychol
Hum Percept Perform 20(1): 203–207.
Gillam, B. J., & Grove, P. M. (2011). Contour entropy: a new determinant of perceiving ground or a hole.
J Exp Psychol Hum Percept Perform 37(3): 750–757.
Giralt, N., & Bloom, P. (2000). How special are objects? Children’s reasoning about objects, parts, and
holes. Psychol Sci 11(6): 497–501.
Hoffman, D. D., & Richards, W. (1984). Parts of recognition. Cognition 18: 65–96.
Horowitz, T. S., & Kuzmova, Y. (2011). Can we track holes? Vision Res 51(9): 1013–1021.
Hulleman, J. & Humphreys, G. W. (2005). The difference between searching amongst objects and searching
amongst holes. Percept Psychophys 67: 469–482.
Kanizsa, G., & Gerbino, W. (1976). Convexity and symmetry in figure-ground organization. In Vision and
Artifact, edited by M. Henle, pp. 25–32. New York: Springer.
Koenderink, J. J. (1990). Solid Shape. Cambridge, MA: MIT Press.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt.
Lewis, D., & Lewis, S. (1983). Holes. In Philosophical Papers, edited by D. Lewis, Vol. 1, pp. 3–9.
New York: Oxford University Press.
Nelson, R., & Palmer, S. E. (2001). Of holes and wholes: the perception of surrounded regions. Perception
30: 1213–1226.
Nelson, R., Thierman, J., & Palmer, S. E. (2009). Shape memory for intrinsic versus accidental holes.
Atten Percept Psychophys 71: 200–206.
O’Toole, A. J., & Walker, C. L. (1997). On the preattentive accessibility of stereoscopic disparity: Evidence
from visual search. Percept Psychophys 59: 202–218.
Palmer, S. E. (1999). Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.
Palmer, S. E., Davis, J., Nelson, R., & Rock, I. (2008). Figure-ground effects on shape memory for objects
versus holes. Perception 37: 1569–1586.
Pomerantz, J. R. (2003). Wholes, holes, and basic features in vision. Trends Cogn Sci 7(11): 471–473.
Pomerantz, J. R., Sager, L. C., & Stoever, R. J. (1977). Perception of wholes and of their component
parts: some configural superiority effects. J Exp Psychol Hum Percept Perform 3(3): 422–435.
Pylyshyn, Z. W., & Storm, R. W. (1988). Tracking multiple independent targets: evidence for a parallel
tracking mechanism. Spatial Vision 3(3): 179–197.
Rubin, E. (1921). Visuell wahrgenommene Figuren. Copenhagen: Gyldendals.
Scholl, B. J. (2009). What have we learned about attention from multiple object tracking (and vice versa)?
In Computation, Cognition, and Pylyshyn, edited by D. Dedrick & L. Trick, pp. 49–78. Cambridge,
MA: MIT Press.
Siegel, S. (2009). The visual experience of causation. Philosoph Q 59(236): 519–540.
Todd, J., Chen, L., & Norman, F. (1998). On the relative salience of Euclidean, affine, and topological
structure for 3-D form discrimination. Perception 27: 273–282.
Wang, B., Zhou, T. G., Zhuo, Y., & Chen, L. (2007). Global topological dominance in the left hemisphere.
Proc Nat Acad Sci USA 104: 21014–21019.
Zhou, K., Luo, H., Zhou, T., Zhuo, Y., & Chen, L. (2010). Topological change disturbs object continuity
in attentive tracking. Proc Nat Acad Sci USA 107(50): 21920–21924.
Chapter 15

Perceptual completions
Rob van Lier and Walter Gerbino

History and definitions


Perceptual completions demonstrate that organizational principles predict not only the belong-
ingness of stimulus-specified parts to functional wholes (Wertheimer 1923/2012) but also the
production of parts devoid of local stimulus counterparts. In vision, completions overcome gaps
in the optic input and reveal the creative side of perception.1
To clarify the distinction between amodal and modal completions (Michotte and Burke 1951;
Michotte et al. 1964; Wagemans et al. 2006) let us refer to the Kanizsa triangle (Figure 15.1a),
an icon of vision science first published in a congress report (Kanizsa 1954) and then in a paper
(Kanizsa 1955/1987) rich in demonstrations that paved the way to decades of research. According
to the standard explanation, each 300 degree black sector becomes a complete disk by the addition
of a 60 degree amodal sector, while the three open corners become a single outlined triangle by
the addition of amodal rectilinear segments that complete its partially defined sides. The tendency
to form improvement “requires” an occluding surface bounded by modal contours made of a
stimulus-specified portion (the rectilinear borders of the black sectors, which take an occlusion
polarity opposite to that of the arcs) and an illusory portion.
By describing amodal completion as a process instantiated by stimulus-defined incompleteness,
driven by a tendency to regularization, and leading to the modal presence of entities without a
counterpart in the local stimulation, Kanizsa (1954, 1955) went beyond the phenomenological
notions of unsichtbar vorhanden (invisibly present, Metzger 1936/2006, chapter 8) and donnée
amodal (amodal datum, Michotte and Burke 1951).2
Figure 15.1b (Kanizsa 1955, figure 5) illustrates another configuration involving a different
interplay between amodal and modal completions. Instead of being perceived as a unitary but
complex shape, the black image region splits into two overlapping shapes (an instance of “duo
organization,” Koffka 1935, p. 153), with a preference for the modal completion of the fuzzy
contours of a cross occluding a square bounded by sharp amodal contours. The competition
between processes supporting amodal vs. modal completions is involved in the apparent trans-
parency effect studied by Rosenbach (1902; Metzger 1936, ­figure 141; Kitaoka et al. 2001) and

1 Our chapter covers completions of fragmentary proximal stimuli, like those observed during the free
viewing of “incomplete” images. It does not cover the filling-in of sensory holes like the blind spot and
scotomas (for such cases see Pessoa et al. 1998; Pessoa and De Weerd 2003).
2 The French expression “compléments amodaux” (which appears for instance in the title of Michotte et al.
1964) has been occasionally translated into English as “amodal complements” (Jackendoff 1992, pp. 163–164),
but the prevalent contemporary usage is “amodal completion.” The difference between complement and
completion points to the contrast between the phenomenological notion discussed by Michotte and Burke
(1951) and the idea that amodal complements are the product of an active process of completion, already
present for instance in Glynn (1954), who worked on the Rosenbach phenomenon under Michotte’s guidance.

Fig. 15.1  Demonstrations from Kanizsa (1955). (a) Illusory triangle induced by line endings and black
sectors with a 1/3 support ratio. (b) Scission of a black region into a foreground cross with modal fuzzy
margins over an amodally completed square with sharp margins. (c) An illusory rectangle induced by
truncated octagons with concave notches. (d) Four crosses holding the same collinear contours available
in the truncated octagons.
Reproduced from ‘Quasi-Perceptual Margins in Homogeneously Stimulated Fields’, Gaetano Kanizsa, in Susan Petry
and Glenn E. Meyer (eds) The Perception of Illusory Contours, pp. 40–49, DOI: 10.1007/978-1-4612-4760-9_4
Copyright © 1987, Springer-Verlag New York. With kind permission from Springer Science and Business Media.

was analyzed by Petter (1956), who examined several determining factors, including relative
length.
The Michotte school credited Helmholtz for the definition of the amodal vs. modal dichotomy
(Burke 1952, p. 405). Amodal data are experienced without the modal property of the sense that
conveys the information on which they depend (typically, color in the case of vision). Koffka
(1935) used the expression “representation without color” (p. 178) to qualify the amodal presence
of the ground portion behind the figure, and discussed the one-sided function of borders (p. 183)
introduced by Rubin (1915/1921) as a key aspect of perceptual organization, connected with the
“double representation” (p. 178) of image regions that split into a foreground modal surface and
an amodal background.3

3 Amodal completion has much in common with the so-called “interposition cue to depth” (Helmholtz 1867;
English translation, 1924, 3rd volume, pp. 283–284), a notion that, despite having been strongly criticized
(Ratoosh 1949; Chapanis and McCleary 1953; Dinnerstein and Wertheimer 1957), often appears in the
contemporary depth literature without any proper reference to unification and stratification factors, which
are at the core of completion phenomena.

The contrast between configurations c and d in Figure 15.1 (Kanizsa 1955, figures 20 and 21)
demonstrates the role of figural incompleteness in co-determining amodal and, consequently,
modal completions. Kanizsa (1987) criticized the tendency to maximize structural regularity as
an explanatory factor but this organizational principle remains at the heart of perceptual comple-
tion theories.
Amodal and modal completions are linked by (i)  the causal hypothesis (the first causes the
second); and (ii) the identity hypothesis (they share common geometric constraints, as suggested
by Kellman and Shipley 1991; Shipley and Kellman 1992). Much research has been devoted to
clarifying such issues.

Amodal completion
Let us distinguish local vs. global completions. Local completions depend on features at or near the
occlusion boundary, whereas global completions depend on properties of the whole visual pattern.

Local completions
According to local completion models the shape completed behind the occluder depends on the
properties of the incoming, partly occluded, contours. The local features par excellence signaling
occlusion and triggering amodal completion are T-junctions; they arise at intersections where one
contour continues and another contour ends at that intersection. The continuous contour most
of the time belongs to the occluding object (closer to the observer), whereas the other contour
belongs to the partly occluded object (farther away from the observer; Helmholtz 1867/1924;
Ratoosh 1949). The issue of border ownership has been elaborated further in various studies
(Nakayama et al. 1989; see also Singh, this volume).
While T-junctions provide a powerful local cue for occlusion (although there are exceptions;
Buffart et al. 1981; Chapanis and McCleary 1953), the form of the occluded shape is a matter of
quite some debate and varies from linear continuations (Kanizsa 1979, 1985; Wouterlood and
Boselie 1992) to inflected curved contours (Takeichi et al. 1995). In an influential paper Kellman
and Shipley (1991) advocated the so-called relatability criterion. This criterion predicts comple-
tions by a smooth curve when linear extensions would meet behind the occluding surface at
angles of 90 degrees or larger. When linear extensions would meet at smaller angles no amodal
completion is predicted. In response, Wouterlood and Boselie (1992) argued that edges could
be relatable without triggering amodal completion, and also that edges could be nonrelatable
but still trigger amodal completion. Subsequently, Tse (1999a,b), Singh (2004), and Anderson
(2007a) also questioned the effectiveness of the relatability criterion. There is no doubt, however, that the ideas
of Kellman and Shipley had great impact on thinking about perceptual completions. Fantoni and
Gerbino (2003), for example, modeled contour completion by a so-called vector field combina-
tion. Here, interpolated trajectories result from an algorithm that computes the vectors represent-
ing good continuation and minimal path. The field model is sensitive to both the local geometry
of contour fragments and shape characteristics such as symmetry. The latter can be implemented
by weighting the relative influence of good continuation versus minimal path. Besides these
geometrical properties, retinal distances are also taken into account.
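The geometric core of relatability lends itself to a small sketch. The function below is our own illustrative toy, not Kellman and Shipley’s formal model (nor Fantoni and Gerbino’s field model): each inducing edge is represented by its endpoint at the occluder plus a unit tangent pointing into the occluded region, and two edges count as relatable when their linear extensions meet at an angle of 90 degrees or more.

```python
def relatable(p1, d1, p2, d2):
    """Toy 2D relatability check (illustrative sketch, not the 1991 model).

    p1, p2 -- endpoints of the two visible edges at the occluder boundary.
    d1, d2 -- unit tangents pointing from each endpoint into the occluded region.
    Relatable: the linear extensions meet (for positive extension lengths)
    at an angle of 90 degrees or more, i.e. the contour turns by at most 90 deg.
    """
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]

    def cross(a, b):
        return a[0] * b[1] - a[1] * b[0]

    gap = (p2[0] - p1[0], p2[1] - p1[1])
    det = cross(d2, d1)  # determinant of the system p1 + t*d1 = p2 + s*d2
    if abs(det) < 1e-9:  # parallel extensions:
        # relatable only when collinear and facing each other (straight join)
        return abs(cross(d1, gap)) < 1e-9 and dot(d1, gap) > 0 and dot(d1, d2) < 0
    t = cross(d2, gap) / det  # extension lengths via Cramer's rule
    s = cross(d1, gap) / det
    if t <= 0 or s <= 0:      # extensions diverge: no meeting point
        return False
    # The contour turns from d1 to -d2 at the meeting point; a meeting
    # angle of at least 90 degrees is equivalent to dot(d1, d2) <= 0.
    return dot(d1, d2) <= 0
```

With collinear facing edges the function returns True (straight continuation), with a right-angle meeting it returns the borderline True, and with extensions that would meet only after an acute turn it returns False, mirroring the criterion’s 90-degree cutoff.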

Global completions
Global completions depend on shape regularities like symmetry (Buffart et  al. 1981; Sekuler
1994; Sekuler et al. 1994; van Lier et al. 1994, 1995a, 1995b). The preferred completion can be
the result of converging local and global completion tendencies, as in Figure 15.2a, where the

Fig. 15.2  (a) An occlusion pattern for which local and global completion tendencies converge to the
same shape. (b) Occlusion pattern with diverging local (left) and global (right) completions. (c) Local and
global completions of self-occluding parts; given the perceived indented cube on the left, the upper right
preserves most symmetry and can be regarded as the global completion. (d) The two blobs at both sides
of the pillar are readily perceived as connected with each other.
(c) Reproduced from Rob van Lier and Johan Wagemans, From images to objects: Global and local completions of self-
occluded parts, Journal of Experimental Psychology: Human Perception and Performance, 25 (6), pp. 1721–1741,
http://dx.doi.org/10.1037/0096-1523.25.6.1721 © 1999, American Psychological Association. (d) Reprinted from
Cognitive Psychology, 39(1), Peter Ulric Tse, Volume Completion, pp. 37–68, Copyright © 1999, with permission
from Elsevier.

preferred completion results from good continuation of the partly occluded contours and also
reveals a highly regular shape. The local and global tendencies may also diverge into different
shapes (Figure 15.2b).
The Structural Information Theory (SIT) initiated by Leeuwenberg (1969, 1971) and further
developed since then (van der Helm and Leeuwenberg 1991, 1996; see also van der Helm, this
volume) provides an account of global regularities by means of regularity-based coding rules and
combines it with the minimum principle (Hochberg and McAlister 1953). Buffart et al. (1981)
applied SIT to occlusion patterns and demonstrated that preferred completions yielded the
simplest codes. However, other studies showed that observers do not always perceive the most
regular shapes (Boselie 1988; Wouterlood and Boselie 1992; Kanizsa 1985; Rock 1983).
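SIT’s minimum principle can be illustrated with a deliberately crude sketch. The toy below is our own simplification (the real theory applies a full hierarchy of iteration, symmetry, and alternation operators recursively): it scores a contour description, given as a symbol string such as turn labels along a completed contour, by the cheapest of a literal code, a one-level run-length (iteration) code, and a one-level bilateral-symmetry code, and prefers the completion with the lowest load.

```python
from itertools import groupby

def information_load(code):
    """Crude stand-in for SIT's information load on a symbol string.

    Only one level of iteration and bilateral symmetry is tried -- a
    sketch of the minimum principle, not SIT's actual coding language.
    """
    literal = len(code)
    # Iteration: each run "aaaa" collapses to one chunk, so count the runs.
    iteration = sum(1 for _ in groupby(code))
    # Symmetry: a palindrome is fully described by half of its symbols.
    symmetry = (len(code) + 1) // 2 if code == code[::-1] else literal
    return min(literal, iteration, symmetry)

def preferred_completion(candidates):
    """Minimum principle: pick the candidate with the simplest description."""
    return min(candidates, key=information_load)
```

For example, `preferred_completion(["abcd", "abba", "aaaa"])` returns `"aaaa"` (load 1, versus 2 and 4), reflecting the predicted preference for the most regular completed shape.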
Sekuler (1994; Sekuler et al. 1994) investigated the tendencies toward local and global comple-
tions and showed that for partly occluded shapes with abundant regularity (e.g., comprising both
vertical and horizontal axes of symmetry after completion), global completions prevailed. Sekuler
(1994) proposed a completion model in which local and global strategies act independently and
are weighted against each other (e.g., depending on the occurrence of symmetry axes). The diverg-
ing completion tendencies were also investigated by van Lier et al. (1994, 1995a, 1995b). They
provided an integrative account within SIT in which the perceptual complexity of an interpreta-
tion is not only determined by the regularity of the perceived shapes but also by the positional
regularities between the shapes, and by the degree of occlusion (van Lier et  al. 1994; van Lier
2001). Crucially, the shape regularities increase the plausibility of an interpretation, whereas the
positional regularities (i.e., coincidental regularities; Rock 1983) decrease an interpretation’s plau-
sibility. van der Helm (2000) additionally argued that, within a Bayesian framework, the shape
and positional complexities can be related to priors and conditionals, respectively.
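This Bayesian reading can be made concrete with a small numeric sketch (the load values below are invented for illustration, not taken from any fitted model): a complexity of I units maps onto a probability of 2^-I, with shape load playing the role of the prior and positional load the role of the conditional.

```python
def posterior_score(shape_load, position_load):
    """Complexity-as-probability sketch: simpler shapes get higher priors,
    and coincidental (positionally regular) arrangements get lower
    conditionals. Loads are in arbitrary complexity units.
    """
    prior = 2.0 ** -shape_load           # from the regularity of the shapes
    conditional = 2.0 ** -position_load  # from the regularity of their positions
    return prior * conditional

# Two hypothetical readings of one occlusion pattern: a very regular shape
# whose alignment with the occluder would be coincidental, versus a less
# regular shape in a generic position. The generic reading wins.
coincidental = posterior_score(shape_load=3, position_load=5)  # 2**-8
generic = posterior_score(shape_load=5, position_load=1)       # 2**-6
```

The comparison shows how a highly regular completion can still lose to a less regular one when its position relative to the occluder would be too coincidental.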
The influence of regularities on amodal completion is a frequently discussed issue in the litera-
ture (Anderson 2007a,b; Kanizsa 1979; Kellman et al. 2007; Sekuler 1994; van Lier 1999, 2001;
van der Helm 2011; Wagemans et al. 2012) and has also led to various pragmatic and theoretical
stances that more or less rule out their effects. For example, to avoid influences of global
regularities, Wouterlood and Boselie's (1992) local completion model was set up only for irregular
patterns (implicitly acknowledging the influence of regularities), whereas Kellman and Shipley
(1991) excluded the effect of global regularities on amodal completion by asserting that global
completions result from cognitive inferences. In the general discussion we will briefly come
back to this issue.

2D versus 3D in amodal completion


Within the domain of amodal completion the experimental studies mainly dealt with 2D pat-
terns in which two coplanar surfaces are perceived, one partly occluding the other. In the
past decades various attempts have been made to extend the research on amodal completion
towards more veridical 3D layouts of the visual scene. For example, Kellman et al. (2005a,b)
extended their initial relatability account (Kellman and Shipley 1991) to three dimensions. In
fact, the authors proposed that the relatability criterion operates in all directions. Relatable
contour elements are thought to be roughly coplanar and within the plane they must meet the
2D relatability criteria. The authors tested their 3D predictions by means of a variety of stereo
displays and confirmed their ideas. A further extension toward 3D was provided by Fantoni
et al. (2008). These authors reported experiments on geometric constraints for 3D interpola-
tion between surface patches when no contour edges were visible and concluded that their
results provided evidence that for textured 3D displays the inducing slant can constrain surface
interpolation in the absence of explicit edge information. In particular, they stated that
3D contour and surface interpolation processes share common geometric constraints as for-
malized by 3D relatability.
Besides amodal completion between spatially separated parts, 3D object interpretations follow-
ing a specific retinal image comprise amodal parts as well. Michotte et al. (1964/1991) termed
this “amodal completion without cover”. The influence of symmetry on 3D object completion
was investigated by van Lier and Wagemans (1999) who studied completions of the object’s
non-visible rear. Similar to 2D shapes they found a preference for symmetrical shapes; skewed
symmetries in 2D projections of 3D volumes trigger preferences for symmetrical completions
of those volumes (see Figure 15.2c). Amodal 3D completions were also studied by Tse (1998,
1999a,b) who introduced the concept of “complete mergeability” stating that completion is not
triggered by contour relatability but by intermediate representations such as volumes. Roughly,
the principle of complete mergeability entails that separated volumes are amodally connected
behind an occluder along a trajectory defined by their visible surfaces such that they completely
merge (Figure 15.2d). In a follow-up, Tse (2002) proposed a contour propagation approach to
surface filling-in for projections of 3D objects. These ideas connect strongly with various 3D
shape perception notions (Koenderink 1990) that already had great impact on our general
understanding of the relation between 2D projections and 3D shape perception.
With the inclusion of 3D objects, and even more complex scenes, the domain of amodal
completion has expanded further. One may question whether these completions are all part of one
and the same underlying completion process or whether the generation of the amodal parts is dis-
tributed along different stages between retinal input and object/scene representation. Answering
such questions is obviously in need of further experimental research.

Experimental paradigms in amodal completion studies


In the past decades a number of different paradigms have been employed to investigate the form
of the amodally completed shapes. Evidence for the relatability criterion was obtained by means
of tasks in which observers had to rate the perceived unity between segments (Kellman and
Shipley 1991), or by means of depth discrimination (Yin et  al. 2000) where observers had to
judge the perceived depth relation of two spatially separated spots. In other experiments the
perceived shape of an amodally completed contour was to be indicated by means of probing
specific locations (Fantoni and Gerbino 2003; Takeichi et al. 1995). To test perceived shapes,
drawing tasks have also been employed (Buffart et al. 1981; Boselie 1988) in which participants drew
their preferred completion. Gerbino and Salmaso (1987) designed a more objective paradigm,
the simultaneous matching task, in which the observer was instructed to decide
whether a particular shape fits with a simultaneously presented occlusion pattern (Figure 15.3a).
The authors noticed a response time advantage for matches based on interpretations as com-
pared to literal matches.
Other tasks comprise shape discrimination (Ringach and Shapley 1996), mental rotation (van
Lier and Wagemans 1999; Koning and van Lier 2004), primed matching (Sekuler and Palmer
1992; van Lier et al. 1995b), and visual search (de Wit et al. 2005; Rauschenberger and Yantis
2001; Rauschenberger et al. 2004; Rensink and Enns 1998). Various studies revealed early amodal
completion tendencies. For example, Sekuler and Palmer (1992) adopted the so-called primed
matching paradigm in completion research to study the microgenesis of completions. By varying
the duration of an occlusion prime presented before a pair of test shapes (which could comprise
the preferred completion), snapshots of the completion process could be taken. Sekuler and
Palmer (1992) showed that after 200 ms a partly occluded disk has the same
facilitating effect as a completely visible disk on a subsequent comparison task in which the simi-
larity of two disks had to be judged (see also Figure 15.3b). This does not necessarily imply that
initial interpretations always start with a “mosaic stage”. Bruno et al. (1997) found no mosaic
stage for stimuli specified by binocular parallax and concluded that the occurrence of a mosaic
stage depends on various presentation constraints. They further argued that the relatively slow time
course might be due to conflicting cues in pictorial displays in which, for example, T-junctions
favor completions while other cues favor a 2D percept. The primed matching paradigm has also
been used to establish differential effects with regard to local and global completions (Sekuler
1994; Sekuler et al. 1994; van Lier et al. 1995b, de Wit and van Lier 2002). It turned out that,
Fig. 15.3  (a) A display comprising a few stimulus combinations in the study of Gerbino and Salmaso
(1987). In a simultaneous matching task, matches could be topographical (T), phenomenal (P),
categorical (C), or different (D; i.e., a nonmatch). The phenomenal matches always involved amodal
completions. Matching times involving amodal completions (PC) were similar to matching times on
topographical matches (TPC), and both were faster than categorical matches. (b) A few prime/test
pair combinations in the primed matching task. When prime durations were larger than 200 ms, the
matching times following the occluded disks (third row) were similar to the matching times of the
complete disks (first row) and differed from matching times following the truncated disks (second row),
suggesting that the occluded disk in the prime has been amodally completed to a full disk.
(a) Reprinted from Acta Psychologica, 65 (1), W. Gerbino and D. Salmaso, The effect of amodal completion on
visual matching, pp. 22–25, Copyright © 1987, with permission from Elsevier. (b) Adapted from Allison B. Sekuler
and Stephen E. Palmer, Perception of partly occluded objects: A microgenetic analysis, Journal of Experimental
Psychology: General, 121(1), pp. 95–111, http://dx.doi.org/10.1037/0096-3445.121.1.95 © 1992, American
Psychological Association.

depending on particular shape properties, global completions often lead to larger facilitating
effects as compared to local completions.

Neural correlates of amodal completion


Behavioral experiments have shown that amodal completion is established within a time window
of a few hundred milliseconds. Using fMRI, Kourtzi and Kanwisher (2001) investigated which
cortical areas are involved in the process of amodal completion. As an experimental method they
used a sequential presentation paradigm to measure the so-called repetition suppression effect;
repetition of similar items leads to a reduction in BOLD activation. Kourtzi and Kanwisher (2001)
found such a suppression in the Lateral Occipital Complex (LOC) for the subsequent presenta-
tion of two patterns with reversed depth orders. In the latter patterns the physical contours were
different, due to occlusion, while the perceived shapes were identical. In a second experiment the
authors additionally showed that depth order reversal revealing the same contours but different
shapes did not induce repetition suppression. The suppression effect for the depth order reversal
when the same shapes are perceived shows that the LOC comprises representations of occluded
parts, exceeding the actual retinal input (see also Weigelt et al. 2007). Note that this does not imply
that these interpretations are actually established within the LOC.
Rauschenberger et al. (2006) also applied the repetition suppression paradigm and focused on
the time course of completion, showing BOLD response modulation due to the literal shape after
100 ms exposures of an occlusion prime (a notched disk adjacent to a square) and BOLD response
modulations due to the amodally completed shape (a full disk) after 250 ms exposures; they even
found such modulations in early visual areas V1 and V2. Further support for an initial mosaic
stage has been provided by Plomp et al. (2006) using MEG measurements. Also using MEG, de Wit
et al. (2006) found support for the prevalence of global as compared to local completions for a
set of highly regular shapes. Besides brain imaging research, single-cell recordings in primates
have also revealed the impact of occlusion. For example, Sugita (1999) showed that neurons as early as
V1 and V2 responded to amodally completed bars under disparity conditions in which the central
part of a bar was perceived to be behind a partly occluding patch. In a more recent study Bushnell
et al. (2011) found that single neurons in V4 responded differently to real object contours as com-
pared to accidental contours caused by interposition of two partially overlapping surfaces.
Although there are still a number of open questions it is clear by now that amodal completion is
triggered relatively early in the visual process. It also appears to be early in an ontogenetic sense,
to be discussed next.

Infant research on amodal completion


Amodal completion has been a core topic in infant research and appears to play a
decisive role in early developmental stages. Infant research on visual completion requires alterna-
tive research methodologies such as the habituation paradigm (see also Quinn and Bhatt, this vol-
ume). In a typical infant research set-up, infants are exposed to a habituation display comprising
an occlusion stimulus, such as the rod-and-box display in which two pieces of what could be one
single rod are moving back and forth behind an occluding box (Kellman and Spelke 1983; Kellman
et al. 1986). For infants of three to four months of age the complete rod prevails. Spatiotemporal
continuity is an important condition for young infants (Jusczyk et al. 1999; Kavšek 2004; Kellman
1984), and even for infants of two months (Johnson and Aslin 1995; Kawabata et al. 1999),
although amodal completion does not necessarily occur (Carter et al. 2003; Johnson and Aslin
1996). It has been shown that at 4 months of age, good form may play a role as well (Johnson et al.
2002). Nevertheless, de Wit et al. (2008) showed that for certain ambiguous occlusion displays
four-month-old infants preferred local completions over global completions. Unsurprisingly, there is
also considerable divergence in results for 3D completion. For example,
Soska and Johnson (2008) did find object completion of the rear side of a geometric object (like
prisms) at six but not four months of age, whereas Vrins et al. (2011) did find completion effects in
four and a half-month-old infants as long as there were enough depth cues in the displays. Vrins
et al. (2011) also showed that four and a half-month-old infants may have certain expectations
302 van Lier and Gerbino

about the rear of relatively complex multi-object scenes such as Tse’s wrapped ghost figures (see
Figure 15.2d), in which the two blobs on either side are preferentially perceived as connected. Apparently,
the results depend strongly on the specific stimulus that is presented but also on the specific abilities
of the infant; age is but one of the crucial factors, and the developmental stage of perceptual-motor
abilities is important as well. A highly active baby has a more integrated view of her surrounding
world, including the ability to amodally complete hidden parts of objects (Soska et al. 2010). All
in all, care has to be taken not to over-generalize the experimental results.

More amodal completion phenomena: tunnels, animals


We close this section by briefly mentioning two additional research domains related to amodal
completion. The first deals with the so-called tunnel effect in dynamic occlusion displays (Burke
1952; Michotte et al. 1964/1991; Michotte 1946/1963). In this dynamic occlusion variant, mov-
ing objects are temporarily occluded, but persist representationally. The perceived continuity of
movement has triggered a wealth of research on perceptual causality and related phenomena like
apparent motion (Yantis 1995), change detection (Flombaum and Scholl 2006), and object track-
ing (Feldman and Tremoulet 2006). The second research domain covers studies on a wide range of
non-primate animals that further demonstrate the fundamental nature of amodal completion. Amodal
completion has been found in mice (Kanizsa et al. 1993), chicks (Regolin and Vallortigara 1995),
and fish (Sovrano and Bisazza 2008), just to mention a few studies.

Modal completion
Modal completions like the triangle in Figure 15.1a are often called illusory surfaces (or surfaces
bounded by illusory contours) to stress that—contrary to real surfaces—their boundaries cross
a broad region of homogeneous luminance. When the background is white they appear even
whiter, which is taken as the signature of modal completion.
Several types of illusory contours exist. Some fit in the category of perceptual completions eas-
ily, since they are conceivable as extrapolations or interpolations of image contours; others do not.
Configurations in Figure 15.4 involve lines and dots that act as inducers or modifiers of illusory
contours not aligned with explicit image contours. Ehrenstein (1941/1987) devised the pattern
in Figure 15.4a to demonstrate that brightness contrast does not explain blobs induced by line
endings. Blobs of increased brightness are clearly visible when line inducers are thin (four upper
rows), but disappear when the inducers are so thick that the central blob is totally or almost totally
enclosed (two bottom rows), contrary to the expectation that contrast should increase with the
amount of black surrounding the target region. In b–c panels of Figure 15.4 the so-called Koffka
cross (used to discuss completion in the blind spot by Koffka 1935, p. 145, figure 20) induces a
rounded square when the arms are large (b) but a circle when the arms are narrow (c).4
Even more intriguing is the way dots gracefully modify the illusory shape (Figure 15.4d),
becoming part of it instead of acting as partially occluded elements (like conventional inducers
do), and turning the illusory boundaries concave against the preference for convexity observed
in several figure/ground phenomena (Barenholtz 2010; Bertamini 2001; Bertamini and Lawson
2008; Fantoni et al. 2005; Kanizsa and Gerbino 1976). The incorporation of dots in blobs induced
by line endings of the Ehrenstein grid, the Koffka cross, and similar patterns has been discussed

4 The effect of line-ending separation on the illusory shape may be informative for computational theories of
completion (Thornber and Williams 1997).




Fig. 15.4  Illusory figures induced by line patterns. (a) The Ehrenstein illusion in a variant of a
demonstration devised by Ehrenstein (1941, Figure 3; see also 1987); bright illusory blobs appear at
line endings in the four upper rows but not in the two lower rows, where the target white region is
totally or almost totally surrounded by black. (b) A broad-arm Koffka cross induces an illusory square
with rounded corners. (c) A narrow-arm Koffka cross induces an illusory disk. (d) Adding four dots to
the narrow-arm Koffka cross makes the illusory blob concave. (e) Past experience with the capital letter
E supports the illusory brightening of the letter body, consistent with top-left illumination; rotating the
page by 90 or 180 degrees impairs recognition of the letter E and destroys the illusory brightening.
Reproduced from ‘Can We See Constructs?’, Walter Gerbino and Gaetano Kanizsa, in Susan Petry and Glenn
E. Meyer (eds) The Perception of Illusory Contours, pp. 246–252, DOI: 10.1007/978-1-4612-4760-9_4 Copyright
© 1987, Springer-Verlag New York. With kind permission from Springer Science and Business Media.

by several authors (Day 1987; Day and Jory 1980; Gerbino and Kanizsa 1987; Kennedy 1987;
Minguzzi 1987; Sambin 1974) but still awaits a satisfactory explanation (Fantoni and Gerbino
2013; Vezzani 1999).
Figure 15.4e illustrates a category of illusory effects occurring when some two-tone images are
perceived as 3D objects under directional illumination, with sharp cast and attached shadows
(Ishikawa and Mogi 2011; Moore and Cavanagh 1998). Often, the emergence of the 3D struc-
ture takes the character of a visual discovery, involving a complex and irreversible figure/ground
switch favoured by past experience, like in pictures of the Gestalt completion test (Street 1931),
Mooney faces (Mooney 1957), and the dalmatian dog (for a discussion see Rock 1984). After the
reorganization that allows observers to overcome the initial camouflage, the discovered object
typically includes illusory surfaces classifiable as modal completions, similar to those used by Tse
(1998, 1999a; Tse and Albert 1998) to claim that illusory volumes can occur without the tangent
discontinuities that play such a crucial role in Kanizsa-like displays (Figure 15.5).
Illusory surfaces are perceived in a variety of conditions (broader than illustrated in our fig-
ures, which depict only some members of the family), against the idea that extrapolation and
interpolation of image contour fragments are the only mechanisms involved in their formation.
Therefore, the expression “modal completion” cannot be taken as denoting a hypothetical process
of joining input fragments by means of illusory additions, according to what Kogo and Wagemans


Fig. 15.5  Illusory volumes constrained by global geometry. (a) The visible parts of the “sea monster”
are bounded by contours without tangent discontinuities; nevertheless they support an illusory surface
oriented in depth, occluding the amodal parts of the monster. (b) The amodally completed black
“worm” supports an illusory pole. (c) Partially occluded black rings surround a cylindrical illusory pole.
Reproduced from Peter U. Tse, Illusory volumes from conformation, Perception 27(8) pp. 977–92, doi:10.1068/
p270977, Copyright © 1998, Pion. With kind permission from Pion Ltd, London www.pion.co.uk and
www.envplan.com.

(2013) consider a common misinterpretation found in the literature on mid-level vision. Rather,
it should be taken as denoting the phenomenal presence of parts devoid of an obvious local coun-
terpart (a luminance difference, in the case of surface contours) but supported by global stimulus
information and functional to the overall organization of the perceptual world.
Halko et al. (2008), who evaluated different theories of modal completion, pointed out that
extrapolation and interpolation mechanisms are insufficient to account for all aspects of illusory
contours, claimed that surface/figural processes are necessary, and experimentally supported the
general view that several mechanisms cooperate in the formation of illusory contours and (more
importantly) in the modulation of their vividness. Their conclusions agree with the central role of
illusory contours in vision science. Converging evidence indicates that they can be conceived as
a powerful effect of mid-level mechanisms constrained by image properties but oriented towards
scene analysis; i.e., they provide an ideal domain for testing propositions that link low-level repre-
sentations anchored to retinotopic properties and representations at the level of occluding objects
and 3D surfaces, available for recognition.

Incompleteness as a local cue


A key issue in explaining illusory contours is the possibility that their occurrence and vividness
totally depend on bottom-up mechanisms instantiated by local cues (i.e., features of input frag-
ments definable as inducers). Consider Figure 15.1a and Kanizsa’s original hypothesis that the
formation of the modal occluding triangle is functional to the amodal completion of incomplete
elements that, thanks to amodal parts, would achieve a better form—relative to the literal form
strictly correspondent to retinal topography—as expected from the minimum principle (Hubbard
2011; Leeuwenberg and van der Helm 2013; Palmer 1999).
Helmholtzian explanations of illusory contours (Gregory 1972; Rock 1987) refer to other prin-
ciples but treat local incompleteness as a prototypical condition. Several authors questioned local
incompleteness as a necessary and/or sufficient condition for the formation of illusory contours
(Pinna et al. 2004; Pinna and Grossberg 2006; Purghé and Katsaras 1991). However, it is generally
agreed (Albert and Hoffman 2000) that an image region with a local concavity between tangent
discontinuities (i.e., a generic pacman) both looks incomplete, when shown in isolation, and acts
as an effective inducer, when combined with analogous regions. A good demonstration that local


Fig. 15.6  Non-trivial effects of inducers. (a) Perceived incompleteness of inducers is unnecessary. Aligned
contour fragments suffice to elicit an illusory triangle. (b,c) Regularly arranged rectilinear segments
lead to illusory contours that are much weaker than those induced by segments randomly varying in
orientation and length. (d,e,f) Convex inducers can support an illusory square, whose vividness in much
higher when they are irregular than regular.
(a) Reproduced from I. Rock and R. Anson, Illusory contours as the solution to a problem, Perception 8(6)
pp. 665–681, doi:10.1068/p080665, Copyright © 1979, Pion. With kind permission from Pion Ltd, London
www.pion.co.uk and www.envplan.com. (b and c) Reproduced from ‘Perceptual Grouping and Subjective
Contours’, Barbara Gillam, in Susan Petry and Glenn E. Meyer (eds) The Perception of Illusory Contours, pp. 268–
273, DOI: 10.1007/978-1-4612-4760-9_30 Copyright © 1987, Springer-Verlag New York. With kind permission
from Springer Science and Business Media. (d,e,f) Reproduced from M.K. Albert, Parallelism and the perception
of illusory contours, Perception 22(5) pp. 589–595, doi:10.1068/p220589, Copyright © 1993, Pion. With kind
permission from Pion Ltd, London www.pion.co.uk and www.envplan.com.

completeness/incompleteness matters was provided by van Lier et al. (2006), who discovered that
background contours are misaligned by an illusory square induced by pacmen but not by an
equivalent hole between crosses (following the same logic of c-d panels in Figure 15.1).
Rock (1983, p. 107; 1987, p. 64; Rock and Anson 1979) criticized perceived incompleteness as a
necessary condition on the basis of demonstrations like the one in Figure 15.6a. Each of the three
black regions looks like an irregular shape with a boundary that includes convexities and concavities
but does not convey a specific sense of incompleteness. Nevertheless, alignment of contour fragments
along a closed and regular boundary suffices for most observers to perceive an illusory surface. The
crucial role of alignment is confirmed by the reduced proportion of naïve observers who perceive an
illusory shape when the three relevant concavities cover a narrower angle, so that the interpolation of
distant contour fragments must be curvilinear and concave (not shown in Figure 15.6).
As emphasized by Rock (1987, p. 63), suboptimal patterns can support the perception of an
illusory shape after a figure/ground reorganization that entails the reversal of the occlusion polar-
ity of some contour fragments (in Figure 15.6a, those corresponding to the concave corners uni-
fied by the illusory triangle). This process can be influenced by set and knowledge, consistently
with the idea that inducer incompleteness cannot be taken as a pre-existing determinant of the

formation of an illusory occluder. Nevertheless, when an illusory surface emerges in Figure 15.6a,
amodal completion—or at least, amodal continuation (Anderson 2007b; Gillam 2003; Minguzzi
1987)—becomes possible. In such cases amodal continuation follows, rather than precedes, the
reorganization that brings to modal life the illusory occluding surface. The causal relationship
between amodal and modal parts does not always hold.
Compare now Figure 15.6b and Figure 15.6c (Gillam 1987; Gillam and Chan 2002; Gillam and
Grove 2011). The illusory surface is more vivid when the inducing lines vary in orientation and
length (c) than when they group together in a regular array (b). Configural order acts as a global
factor affecting modal completion, which suggests that the degree of modal presence could be
taken as a measure of the amount of structural improvement involved in the mapping of a given
input into an organized pattern.
Another case in which the vividness of the illusory shape seems to be inversely related to induc-
ers’ regularity is illustrated in the d–f panels of Figure 15.6 (Albert 1993). Parallelism of sides or,
more accurately, Ebenbreite (constant width; Morinaga 1941; Metzger 1953) is a powerful fac-
tor of figure/ground organization. When inducers are convex regions bounded by parallel sides
(rectangles in Figure 15.6d) the illusory square is barely visible, if it exists at all; one can easily
perceive only an orderly arrangement of rectangles sitting along a square perimeter. The illu-
sory square becomes visible (thanks to the pathognomonic lightness enhancement) when each
inducer is trapezoidal and can be locally improved by amodal continuation in the direction of a
parallelogram (Figure 15.6e), or triangular and can easily look like a small visible protrusion of an
indeterminate but clearly occluded shape (Figure 15.6f).

Kanizsa-type vs. Petter-type modal completions


Modal completions involving contours without gradient—to use the label preferred by Kanizsa
(1979)—come in two types: the Kanizsa-type (Figure 15.1a), in which the modal contour without
gradient divides the illusory figure from the ground, with amodally completed inducers lying in
between; and the Petter-type (Figure 15.1b) in which the modal contour without gradient divides
the front figure from the back figure, and both are divided from the ground by real contours.
Following the computer vision terminology (Waltz 1975), modal completions involve L-junctions
conceived as degenerate T-junctions with a missing edge due to the coincidental equivalence of
adjacent luminances; i.e., they depend on the assignment of edges of relevant L-junctions to dif-
ferent surfaces (rather than to the same surface), with a depth order dependent on the overall
figural context (Nakayama et al. 1989).
In Kanizsa-type completions only one edge of each convex L-junction of a pacman (Figure 15.1a)
becomes the intrinsic occlusion boundary of the amodally completed pacman, while the other edge is
assigned to the illusory occluding figure (becoming extrinsic to the pacman) and is extrapolated as an
occlusion boundary intrinsic to the illusory figure and separating it from the ground. In Petter-type
completions one edge of a concave L-junction (Figure 15.1b) becomes the intrinsic occlusion bound-
ary of the back figure (separating it from the ground), while the other edge is extrapolated as the
occlusion boundary intrinsic to the front figure and separating it from the back figure.
Describing L-junctions as degenerate T-junctions is geometrically correct (given the coinciden-
tal nature of the missing edge), but does not convey the idea that perceptual organization maps
L-junctions into X-junctions, by extrapolating one edge of the L-junction as an illusory modal
contour and the other as an occluded amodal contour. This idea strengthens the amodal-modal
link and makes clear that, in general, both completion phenomena should be considered as joint
products of organizing processes that strive for simplicity (i.e., driven by the tendency to mini-
mize the complexity of the representation). Let us make this hypothesis explicit for both types of
modal completion.

Kanizsa-type completions induced by pacmen and other extended regions depend on their ten-
dency towards amodal completion or, at least, amodal continuation. Processes activated by local
concavities and asymmetries are constrained by alignment and distance (to mention only the main
factors) and achieve a stable state by generating amodal parts that complete the input regions, but
also require the formation of an occluding surface partially bounded by illusory contours. Kanizsa
(1979) admitted that such aspects of perceptual organization are almost indistinguishable from
the generation and acceptance of object hypotheses, postulated by Gregory (1972) to account
for input gaps. The crucial difference regards “gaps”. According to Kanizsa the object-hypothesis
explanation fails to recognize that the very notion of “gap” is problematic. Rather than taking its
meaning for granted one should use illusory figures as an operational way of defining gaps and—
more generally—partial occlusions.
Petter-type completions occur when a single homogeneous region splits into two figures whose
stratification is, in optimal conditions, fully predictable on the basis of figural parameters. Petter
(1956) described several factors supporting a perceptual preference for a specific stratification
order in self-splitting figures. One is movement (if the black region deforms in a way consistent
with movement of one figure while the other remains stationary, the moving figure appears in
front). But static regions split as well, according to two figural factors: a preference for the order
that minimizes modal contours (a vs. b in Figure 15.7); and a preference for the modal completion


Fig. 15.7  Minimization of modal contours. (a) The bar is preferentially perceived in front of the disk
because such ordering, rather than the opposite, requires a modal contour shorter than the amodal
contour. (b) The disk is preferentially perceived in front of the bar because the two modal arcs are
shorter than the amodal rectilinear segments. (c) Petter (1956, p. 219) also hypothesized that the
preference for perceiving the larger shape in front depends on the higher support ratio (i.e., the modal
contour should be proportionally shorter), when modal and amodal contours have the same absolute
length. (d) Phenomenal undulation depends on the dominance of Petter’s rule over interposition, which
does not propagate from the unambiguous T-junctions joining the thin frame and the grey horizontal
bar towards the ambiguous L-junctions joining the thin frame and the black vertical bar.
(a) Reproduced from G. Petter, Nuove ricerche sperimentali sulla totalizzazione percettiva, Rivista di Psicologia, 50,
pp. 213–27, figure 9. (d) Reprinted from Acta Psychologica, 59(1), G. Kanizsa, Seeing and thinking, pp. 23–33,
Copyright © 1985, with permission from Elsevier.

of contours with a higher support ratio (those in which the modal extrapolation is proportionally
shorter, relative to the length of the image-specified contour; Figure 15.7c). In static self-splitting
figures the tendency towards the minimization of modal contours agrees with the assumption that
representation costs are higher for modal than amodal contours of the same length, given that
modal contours are phenomenally visible though unsupported by local input evidence.
Kanizsa (1968/1979) referred to the first static factor (known as Petter’s rule) to explain strik-
ing demonstrations in which the perceived stratification order violates cognitive expectations.
Figure 15.7d displays a pattern modified from Kanizsa (1985) that illustrates a remarkable failure
of unambiguous T-junction information to propagate the stratification order over the whole thin
frame, because of the local dominance of Petter’s rule.
Tommasi et al. (1995) confirmed that the minimization of modal contours acts independently
of the empirical depth cue of relative size. Singh et al. (1999) established that Petter’s rule actu-
ally overcomes support ratio as a determinant of stratification of self-splitting figures, when the
two principles come into conflict, but also confirmed Petter’s intuition that support ratio matters,
when modal and amodal contour lengths are equal.
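The two figural quantities at play here, modal contour length and support ratio, are simple enough to sketch in code. The toy Python illustration below is our own (not taken from Petter 1956 or Singh et al. 1999; the function names, labels, and lengths are hypothetical). It compares candidate depth orderings by the length of the modal contour each would require and breaks ties by support ratio:

```python
# Toy sketch of two figural rules for self-splitting figures (illustrative
# only; labels and numbers are hypothetical, not from the cited studies).

def support_ratio(specified, extrapolated):
    """Fraction of a contour specified by the image; a higher value means
    the modal extrapolation is proportionally shorter."""
    return specified / (specified + extrapolated)

def preferred_ordering(orderings):
    """Petter's rule: prefer the depth ordering requiring the shortest
    modal (visible but unsupported) contour; when modal lengths are equal,
    break the tie by the higher support ratio.

    `orderings` maps a label to (modal_length, specified_length)."""
    return min(
        orderings,
        key=lambda k: (orderings[k][0],
                       -support_ratio(orderings[k][1], orderings[k][0])),
    )

# Figure 15.7a-style case: putting the bar in front requires a short modal
# contour (the bar's width); putting the disk in front requires two longer
# modal arcs. Petter's rule picks the bar.
choice = preferred_ordering({
    "bar in front": (1.0, 6.0),
    "disk in front": (1.6, 6.0),
})
print(choice)  # -> bar in front
```

The lexicographic key encodes exactly the dominance relation established by Singh et al. (1999): minimal modal length wins whenever the two principles conflict, and support ratio matters only when modal lengths are equal.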

Modal completion in stereopsis


Research on stereopsis and amodal/modal completions reinforces the conclusion that amodal
completion of background surfaces can provide the driving force for the generation of modal
illusory occluders.
After von Szily’s neglected work (Ehrenstein and Gillam 1998) the “anomalous contour” observed
in dot matrix stereograms by Lawson and Gulick (1967; Lawson and Mount 1967) was the first dis-
covery of the power of monocular occlusions to generate illusory foreground surfaces (for a review
see Anderson and Julesz 1995). The emergence of cyclopean occluders bounded by modal contours
invisible to each eye does not require dense-texture Julesz-type stereograms (Julesz 1971). Patterns
containing minimal information about monocular occlusion are sufficient for the occurrence of Da
Vinci stereopsis with phantom occluders (Gillam and Grove 2004; Gillam and Nakayama 1999).
On the other hand, zero-disparity static textures can be captured by stereoscopic illusory contours
and illusory contours set in apparent motion (Ramachandran 1986).
Binocular disparity provides a powerful way of manipulating occlusion polarity and controlling
the shift between Kanizsa-type and Petter-type disambiguation of L-junctions, corresponding to
the perception of pacmen as partially occluded disks or portions of a background seen through
holes, respectively (Anderson et al. 2002; Anderson 2009; Ramachandran 1986). In an influential
paper, Nakayama et al. (1990) discussed connections between disparity, amodal/modal comple-
tions, illusory contours and transparency (Nakayama 2009).

Kinetic illusory contours


In kinetic occlusion displays the segmentation between a figure bounded by modal contours and
a partially occluded ground is supported only by motion; namely, by specific transformations of
extended shapes, like in Sampaio’s screen effect (Leyssen 2011; Michotte et al. 1962; Sampaio 1943),
or by the accretion/deletion of texture elements (Gibson 1979; Gibson et al. 1969; Kaplan 1969).
Kinetic illusory contours result from the process of spatiotemporal boundary formation (Bruno 2001;
Shipley and Kellman 1994), which is supported by the effectiveness of depth-from-motion mecha-
nisms (Hegdé et al. 2004; Yonas et al. 1987) and constrained by inducer properties (e.g., speed and
dot density in accretion/deletion displays; Andersen and Cortese 1989; Barraza and Chen 2006).

Kinetic illusory figures depend on relative motion between their implicit boundary and
an appropriate set of inducers (appearing/disappearing texture elements, lines changing in
length, deforming shapes). However, Bruno and Gerbino (1991) showed that their shape is
modulated by factors beyond relative motion. When radial lines rotate behind an implicit
triangle, the illusory figure is triangular and rigid; when the radial lines keep their absolute
orientation constant and change their length consistently with the occlusion of a rotating tri-
angle, the illusory figure appears as a deforming blob with a specific shape heavily dependent
on the number of inducing lines. Orientation affects the connectability of line endings and,
consequently, the modally completed shape (Fantoni and Gerbino 2013). A theory of illusory
object formation in dynamic displays, consistent with the identity hypothesis, has been formu-
lated by Palmer et al. (2006).

Neural correlates of modal completion


Electrophysiological recordings in alert monkeys by von der Heydt et al. (1984) showed that about
one third of cells in V2 (but none in V1) respond to illusory contours induced by line endings only
slightly less strongly than to real contours. Results for Kanizsa-type displays were similar (see also von der
Heydt and Peterhans 1989). Since this pioneering work, systematic efforts have been devoted to
clarify how the brain processes illusory contours.
In a PET study Ffytche and Zeki (1996) found that perception of illusory contours in a variant
of the Kanizsa triangle was associated with increased activity in early visual areas only (nota-
bly V2) and concluded that it occurred without cognitive influences. With respect to V1 results
are controversial. Ramsden et al. (2001) reported that illusory contour orientations were nega-
tively signalled in V1 and argued that such “de-emphasis” in V1, together with V2 activation,
could provide the unique signature of illusory contours. In a study on moving Kanizsa-type dis-
plays, Seghier et al. (2000) found activation in V5 but also clear activation in V1. Lee and Nguyen
(2001) reported that V1 neurons do respond to static Kanizsa figures, but under the feedback
modulation from V2.
A review of neuroimaging studies by Seghier and Vuilleumier (2006) reached the conclusion
that illusory contours may involve more than a single brain locus or a single perceptual pro-
cess, and engage early, intermediate, and late stages in the hierarchy of brain processing. They
proposed two distinct illusory contour mechanisms, each with a different time-course, involving
both feedforward signals from low-level areas and feedback signals from higher processing stages.
Komatsu (2006) reviewed research on the neural basis of filling-in and also reached the conclu-
sion that the modal character of the Kanizsa triangle may require higher cortical areas (as shown
by Mendola et al. 1999), but is correlated to the activation of V1 by feedback connections. Stanley
and Rubin (2003) demonstrated that fMRI activity in the human LOC was elevated for both sharp
illusory shapes (bounded by well-defined illusory contours) and vague salient regions (illusory
blobs without sharp contours).
Using fMRI adaptation data, Montaser-Kouhsari et al. (2007) were able to detect
orientation-selective responses to illusory contours in multiple visual areas. Pan et  al. (2012)
combined intrinsic optical imaging in anesthetized rhesus macaques with single-cell recordings
in awake ones, and found a complete overlap of orientation domains in V4 for processing real
contours and illusory contours induced by line endings, whereas the orientation domains mapped
in V1 and V2 mainly encode the local features of the inducers. Their results indicate that real and
illusory contours are represented equivalently in V4, which seems to be a good candidate for the
integration of local features into global contours.

Modal completion in infants


Research on infant perception of illusory figures focused on the amount of experience neces-
sary for adult-like performance (Condry et  al. 2000; Kellman and Arterberry 1998). Contrary
to behavioral and EEG studies showing that illusory contours do not emerge before the seventh
month (Bertenthal et al. 1980; Csibra 2001; Csibra et al. 2000), Bremner et al. (2012) demonstrated
that four-month-olds do perceive a Kanizsa rectangle as an occluding surface when the gap in the
horizontal trajectory of the deleting/accreting object is about 4.4 degrees wide, but not when it is
5.9 degrees wide.
Using a carefully chosen, quite underspecified kinetic display, Valenza and Bulf (2011) demon-
strated that modal completion and illusory contours are available at birth, in the absence of any
visual experience. First, newborns were tested for their ability to detect a rod-and-box display;
then, they were habituated to an illusory rod-and-box display, to a control display without illusory
contours, or to a real rod-and-box display. The rod was perceived as a unit in both illusory and real
conditions, consistently with the idea that experiencing objects as bounded and spatiotemporally
continuous is part of innate knowledge (Kellman and Spelke 1983). Valenza and Bulf’s results
confirm the importance of kinetic information in modal completion (Johnson and Aslin 1998;
Otsuka and Yamaguchi 2003) and suggest that the formation of illusory occluders depends on a
basic visual capability already present at birth, although constrained by newborns’ perceptual and
attentional limitations.
As regards the effectiveness of Kanizsa-type illusory figures in capturing attention, Bulf et al.
(2009) found a pop-out effect in six-month-old infants for real but not illusory targets (contrary to
adults who exhibited a pop-out effect for both), despite the ability to perceive the Kanizsa triangle,
as established in a preferential-looking task.

Modal completion in animals


Nieder (2002) reviewed comparative evidence on the perception of illusory contours and con-
cluded that various animal species are able to perform such perceptual completions and see
contours without luminance contrast gradients, thanks to processes that take place at early levels
of the visual system and are largely independent of top-down influences.
Behavioral and neural evidence shows that—among others—honeybees (van Hateren et al.
1990), chicks (Zanforlin 1981), cats (Bravo et al. 1988), and monkeys (von der Heydt et al. 1984)
perceive Kanizsa-type illusory contours. Zylinski et al. (2012) examined the dynamic camouflage
responses of the cuttlefish and found evidence of modal completion of contour fragments.
perceive Kanizsa-type illusory contours. Zylinski et al. (2012) examined the dynamic camouflage
responses of the cuttlefish and found evidence of modal completion of contour fragments.
Using Petter-type displays, Forkman and Vallortigara (1999) demonstrated that hens, like
humans, are sensitive to the minimization of modal contours in self-splitting figures according to
Petter’s rule. Vallortigara and Tommasi (2001) discussed this result as an example of evolutionary
convergence toward a perceptual universal (Shepard 2001).
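The core of Petter's rule can be sketched as a simple comparison: in a chromatically homogeneous self-splitting figure, the surface that would need the shorter modal (illusory) contour across the intersection tends to be seen in front. The following toy sketch uses hypothetical contour lengths purely for illustration; it is not a model from the literature.

```python
# Toy illustration of Petter's rule (hypothetical values, not from any study):
# when two homogeneous surfaces self-split at a crossing, the surface that
# requires the SHORTER modal contour across the intersection is predicted
# to appear in front.

def front_surface(modal_contour_lengths: dict) -> str:
    """Return the surface whose required modal contour is shortest.

    modal_contour_lengths maps a surface label to the length of illusory
    contour that surface would need if it were seen as the occluder.
    """
    return min(modal_contour_lengths, key=modal_contour_lengths.get)

# A wide bar crossing a thin bar: the wide bar needs only a short modal
# contour across the thin bar, so it is predicted to appear in front.
print(front_surface({"wide_bar": 1.0, "thin_bar": 4.0}))  # wide_bar
```

This mirrors the minimization principle discussed by Forkman and Vallortigara (1999): depth stratification follows whichever interpretation minimizes the total length of modally completed contour.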

General discussion
Modal and amodal completion both deal with percepts that go beyond the retinal input. Kellman
and Shipley (1991) coined the identity hypothesis, which states that modal and amodal comple-
tion share the same underlying mechanisms and identical representations at some processing
stage. This elegant idea has been much debated in recent years (Albert 2007; Anderson
et al. 2002; Anderson 2007a, 2007b; Kellman et al. 2007; Singh 2004). One prediction of the
identity hypothesis is that modally and amodally completed contours should be
the same when the geometric properties of the shapes are the same. Anderson et al. (2002) and
Singh (2004) argued that this prediction is incorrect. In particular, they argued that modal
and amodal completion generate different percepts and that neurophysiological data are not in
line with the identity hypothesis (see also Anderson 2007a). Differential percepts also occur when
shape regularities such as symmetry are involved; such regularities seem to affect amodal completion
more than modal completion. Kellman et al. (2005a) argued that in such cases the amodal presence
is due to a process they referred to as Recognition from Partial Information (RPI), which would
then overrule the completion processes. Anderson (2007a) responded that splitting amodal
completion into two different processes (one identical to modal completion, based on
relatability criteria, and one sensitive to global regularities) lacks experimental support. So far, the
controversy continues; further investigations may shed more light on this issue.
A fruitful direction for pushing research forward lies in the development of neurally plau-
sible computational models of perceptual grouping. Here we refer to the DISC
(Differentiation-Integration for Surface Completion) model by Kogo et al. (2010), which accounts
for the depth ordering of surfaces in 2D patterns. The model is built on the notion of border owner-
ship: by means of appropriate feedback mechanisms, image borders are assigned to surfaces and,
with that, more or less stable interpretations of an ambiguous pattern can be reached. The percep-
tion of modal completion, for example, is (re)produced when such border-ownership signals arise
at the location of illusory contours. The DISC model is sensitive to certain global stimulus
properties and bridges amodal and modal completion (see also Kogo and van Ee, this
volume).
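The differentiation-integration idea behind DISC can be illustrated with a deliberately simplified one-dimensional toy, not the actual implementation of Kogo et al. (2010): local border signals carry an ownership polarity indicating which side is nearer, and integrating these signed depth edges reconstructs a relative depth profile for the surfaces. The input values and the ownership assignment below are illustrative assumptions only.

```python
import numpy as np

# Toy 1-D sketch of differentiation followed by integration.
# A bright bar on a dark background along a single image row:
luminance = np.array([0, 0, 1, 1, 1, 0, 0], dtype=float)

# "Differentiation": local edge detection along the row.
edges = np.diff(luminance)  # +1 at the bar's left border, -1 at its right

# Border ownership: here each border is simply assigned to the brighter
# region, giving signed depth steps; in the DISC model this assignment
# emerges from feedback between border and surface representations.
depth_steps = edges

# "Integration": cumulative summation of the signed steps yields a
# relative depth profile, i.e., a depth ordering of the surfaces.
depth = np.concatenate([[0], np.cumsum(depth_steps)])
print(depth)  # [0. 0. 1. 1. 1. 0. 0.] -> the bright bar stands in front
```

The sketch shows only the arithmetic skeleton; the explanatory power of DISC lies in how border-ownership signals, including those at illusory contours, arise from global feedback rather than being stipulated as they are here.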
The role of shape regularities also touches upon the seeing-thinking issue in amodal comple-
tion, as raised by Kanizsa (1979, 1985; Kanizsa and Gerbino 1982; but see also Michotte et al.
1964), who demonstrated different completion tendencies due to perception versus knowledge.
According to Kanizsa, perception runs its own course even if knowledge would predict a different
outcome. The influence of knowledge on amodal completion is an issue that deserves more atten-
tion in future research (see also Gerbino and Zabai 2003; Vrins et al. 2009; Hazenberg et al. 2014).
For example, Vrins et al. (2009) showed that object-related knowledge, such as the hardness of
materials (after Gerbino and Zabai 2003), may influence the perceptual outcome relatively early
in the perceptual process. Obviously, interpretations of occlusion scenes depend on both bottom-up
and top-down streams, revealing a complex interplay between sensory input and world
knowledge. A clearer picture of the processes involved in amodal completion is needed. In the end,
however, it might turn out to be a hazardous enterprise to draw a firm line
between perception and cognition, certainly at the cortical level.
Finally, we would like to remark that the scope of this chapter was restricted to a selection of
completion issues within the visual modality that we consider relevant. There are filling-in effects
in other sensory modalities as well, such as the auditory domain (Bregman 1990; Riecke et al. 2012)
and the tactile domain (Flach and Haggard 2006; Geldard and Sherrick 1972). In all sensory
modalities, the study of processes that overcome interruptions of the ongoing input opens a
window onto the underlying representational processes. Given the outcomes of behavioral and
neurocognitive research in adults, infants, and animals, it has become clear that completion
processes are fundamental to the perception of the surrounding world.

References
Albert, M. K. (1993). Parallelism and the perception of illusory contours. Perception 22: 589–95.
Albert, M. K. (2007). Mechanisms of amodal completion. Psychological Review 114: 455–69.
Albert, M. K. and Hoffman, D. D. (2000). The generic-viewpoint assumption and illusory contours.
Perception 29: 303–12.
Anderson, B. L. (2007a). The demise of the identity hypothesis and the insufficiency and nonnecessity of
contour relatability in predicting object interpolation: Comment on Kellman, Garrigan, and Shipley
(2005). Psychological Review 114: 470–87.
Anderson, B. L. (2007b). Filling-in models of completion: Rejoinder to Kellman, Garrigan, Shipley, and
Keane (2007) and Albert (2007). Psychological Review 114: 509–27.
Anderson, B. L. (2009) Revisiting the relationship between transparency, subjective contours, luminance,
and color spreading. Perception 38: 869–71.
Anderson, B. L. and Julesz, B. (1995). A theoretical analysis of illusory contour formation in stereopsis.
Psychological Review 102: 705–43.
Anderson, B. L., Singh, M. and Fleming, R. W. (2002). The interpolation of object and surface structure.
Cognitive Psychology 44: 148–90.
Andersen, G. J. and Cortese, J. M. (1989). 2-D contour perception resulting from kinetic occlusion.
Perception and Psychophysics 46: 49–55.
Barenholtz, E. (2010). Convexities move because they contain matter. Journal of Vision 10: 1–12.
Barraza, J. F. and Chen, V. J. (2006). Vernier acuity of illusory contours defined by motion. Journal of Vision
14: 923–32.
Bertamini, M. (2001). The importance of being convex: An advantage for convexity when judging position.
Perception 30: 1295–310.
Bertamini, M. and Lawson, R. (2008). Rapid figure-ground responses to stereograms reveal an advantage
for a convex foreground. Perception 37: 483–94.
Bertenthal, B. I., Campos, J. J., and Haith, M. M. (1980). Development of visual organization: The
perception of subjective contours. Child Development 51: 1072–80.
Boselie, F. (1988). Local versus global minima in visual pattern completion. Perception and Psychophysics
43: 431–45.
Bravo, M., Blake, R., and Morrison, S. (1988). Cats see subjective contours. Vision Research 28: 861–5.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge: MIT
Press.
Bremner, J. G., Slater, A. M., Johnson, S. P., Mason, U. C., and Spring, J. (2012). Illusory contour figures
are perceived as occluding contours by 4-month-old infants. Developmental Psychology 48: 398–405.
Bruno, N. (2001). Breathing illusions and boundary formation in space-time. In: T. Shipley and P.
J. Kellman (eds.), From Fragments to Objects: Segmentation and Grouping in Vision, pp. 402–27.
New York: Elsevier.
Bruno, N. and Gerbino, W. (1991). Illusory figures based on local kinematics. Perception 20: 259–73.
Bruno, N., Bertamini, M., and Domini, F. (1997). Amodal completion of partly occluded surfaces: Is there
a mosaic stage? Journal of Experimental Psychology 23: 1412–26.
Buffart, H., Leeuwenberg, E., and Restle, F. (1981). Coding theory of visual pattern completion. Journal of
Experimental Psychology: Human Perception and Performance 7: 241–74.
Bulf, H., Valenza, E., and Simion, F. (2009). The visual search of an illusory figure: A comparison between
6-month-old infants and adults. Perception 38: 1313–27.
Burke, L. (1952). On the tunnel effect. The Quarterly Journal of Experimental Psychology, 4: 121–38.
Reprinted in A. Michotte et collaborateurs (eds.) (1962), Causalité, permanence et réalité phénoménales,
pp. 374–406. Louvain: Publications Universitaires.
Bushnell, B., Harding, P., Kosai, Y., and Pasupathy, A. (2011). Partial occlusion modulates contour-based
shape encoding in primate area V4. Journal of Neuroscience 31: 4012–24.
Chapanis, A. and McCleary, R. A. (1953). Interposition as a cue for the perception of relative distance.
Journal of General Psychology 48: 113–32.
Condry, K. F., Smith, W. C., and Spelke, E. S. (2000). Development of perceptual organization. In:
F. Lacerda and M. Heiman (eds.), Emerging Cognitive Abilities in Early Infancy, pp. 1–28. Hillsdale,
NJ: Erlbaum.
Csibra, G. (2001). Illusory contour figures are perceived as occluding surfaces by 8-month-old infants.
Developmental Science 4: F7–F11.
Csibra, G., Davis, G., Spratling, M. W., and Johnson, M. H. (2000). Gamma oscillations and object
processing in the infant brain. Science 290: 1582–5.
Day, R. H. (1987). Cues for edge and the origin of illusory contours: an alternative approach. In: S. Petry
and G. E. Meyer (eds.). The Perception of Illusory Contours, pp. 53–61. New York: Springer.
Day, R. H. and Jory, M. K. (1980). A note on a second stage in the formation of illusory contours.
Perception and Psychophysics 27: 89–91.
de Wit, T. and van Lier, R. (2002). Global visual completion of quasi-regular shapes. Perception
31: 969–84.
de Wit, T. C. J., Mol, K. R., and van Lier, R. (2005). Investigating metrical and structural aspects of visual
completion: Priming versus searching. Visual Cognition 12: 409–28.
de Wit, T., Bauer, M., Oostenveld, R., Fries, P., and van Lier, R. (2006). Cortical responses to contextual
influences in amodal completion. Neuroimage 32: 1815–25.
de Wit, T. C. J., Vrins, S., DeJonckheere, P. J. N., and van Lier, R. (2008). Form perception of partly
occluded objects in 4-month-old infants. Infancy 13: 660–74.
Dinnerstein, D. and Wertheimer, M. (1957). Some determinants of phenomenal overlapping. The American
Journal of Psychology 70: 21–37.
Ehrenstein, W. (1941). Über Abwandlungen der L. Hermannschen Helligkeitserscheinung. Zeitschrift für
Psychologie, 150, 83–91. English translation, Modifications of the brightness phenomenon of L. Hermann.
In: S. Petry and G. E. Meyer (eds.) (1987). The Perception of Illusory Contours, pp. 246–52. New York: Springer.
Ehrenstein, W. H. and Gillam, B. J. (1998). Early demonstrations of subjective contours, amodal
completion, and depth from half-occlusions: “stereoscopic experiments with silhouettes” by Adolf von
Szily (1921). Perception 27: 1407–16.
Fantoni, C. and Gerbino, W. (2003). Contour interpolation by vector-field combination. Journal of Vision
3: 281–303.
Fantoni, C. and Gerbino, W. (2013). “Connectability” matters too: Completion theories need to be
complete. Cognitive Neuroscience 4: 47–8.
Fantoni, C., Bertamini, M., and Gerbino W. (2005). Contour curvature polarity and surface interpolation.
Vision Research 45: 1047–62.
Fantoni, C., Hilger, J. D., Gerbino, W., and Kellman, P. J. (2008). Surface interpolation and 3D relatability.
Journal of Vision 8: 1–19.
Feldman, J. and Tremoulet, P. (2006). Individuation of visual objects over time. Cognition 99: 131–65.
Ffytche, D. H. and Zeki, S. (1996). Brain activity related to the perception of illusory contours. Neuroimage
3: 104–8.
Flach, R. and Haggard, P. (2006). The cutaneous rabbit revisited. Journal of Experimental
Psychology: Human Perception and Performance 32: 717–32.
Flombaum, J. I. and Scholl, B. J. (2006). A temporal same-object advantage in the tunnel effect: Facilitated
change detection for persisting objects. Journal of Experimental Psychology: Human Perception and
Performance 32: 840–53.
Forkman, B. and Vallortigara, G. (1999). Minimization of modal contours: an essential cross-species
strategy in disambiguating relative depth. Animal Cognition 2: 181–5.
Geldard, F. and Sherrick, C. (1972). The cutaneous “rabbit”: A perceptual illusion. Science 178: 178–9.
Gerbino, W. and Kanizsa, G. (1987). Can we see constructs? In: S. Petry and G. E. Meyer (eds.). The
Perception of Illusory Contours, pp. 246–52. New York: Springer.
Gerbino, W. and Salmaso, D. (1987). The effect of amodal completion on visual matching. Acta
Psychologica 65: 25–46.
Gerbino, W. and Zabai, C. (2003). The joint. Acta Psychologica 114: 331–53.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gibson, J. J., Kaplan, G. A., Reynolds, H. N., and Wheeler, K. (1969). The change from visible to
invisible: A study of optical transitions. Perception and Psychophysics 5: 113–16.
Gillam, B. J. (1987). Perceptual grouping and subjective contours. In: S. Petry and G. E. Meyer (eds.). The
Perception of Illusory Contours, pp. 268–73. New York: Springer.
Gillam, B. J. (2003). Amodal completion—A term stretched too far: The role of amodal continuation.
Perception 32 (Suppl.): 27.
Gillam, B. J. and Chan, W. M. (2002). Grouping has a negative effect on both subjective contours and
perceived occlusion at T-junctions. Psychological Science 13: 279–83.
Gillam, B. J. and Grove, P.M. (2004). Slant or occlusion: global factors resolve stereoscopic ambiguity in
sets of horizontal lines. Vision Research 44: 2359–66.
Gillam, B. J. and Grove, P. M. (2011). Contour entropy: A new determinant of perceiving ground or a hole.
Journal of Experimental Psychology: Human Perception and Performance 37: 750–7.
Gillam, B. J. and Nakayama, K. (1999). Quantitative depth for a phantom surface can be based on
cyclopean occlusion cues alone. Vision Research 39: 109–12.
Glynn, A. J. (1954). Apparent transparency and the tunnel effect. Quarterly Journal of Experimental
Psychology 6: 125–39. Reprinted in A. Michotte et collaborateurs (eds.) (1962), Causalité, permanence et
réalité phénoménales, pp. 422–32. Louvain: Publications Universitaires.
Gregory, R. (1972). Cognitive contours. Nature 238: 51–2.
Halko, M. A., Mingolla, E., and Somers, D. C. (2008). Multiple mechanisms of illusory contour perception.
Journal of Vision 8: 1–17.
Hateren, J. H. van, Srinivasan, M. V., and Wait, P. B. (1990). Pattern recognition in bees: orientation
discrimination. Journal of Comparative Physiology A 167: 649–54.
Hazenberg, S. J., Jongsma, M., Koning, A., and van Lier, R. (2014). Differential familiarity effects in amodal
completion: Support from behavioral and electrophysiological measurements. Journal of Experimental
Psychology: Human Perception and Performance 40: 669–84.
Hegdé, J., Albright, T. D., and Stoner, G. R. (2004). Second-order motion conveys depth-order
information. Journal of Vision 4: 838–42.
Helmholtz, H. von (1867). Handbuch der physiologischen Optik. Leipzig: Voss. English translation by J.
P. C. Southall of the third German edition (1910): Treatise on Physiological Optics. New York: Dover,
1924. Available at: <http://poseidon.sunyopt.edu/BackusLab/Helmholtz/>.
Hochberg, J. E. and McAlister, E. (1953). A quantitative approach to figural “goodness”. Journal of
Experimental Psychology 46: 361–4.
Hubbard, T. L. (2011). Extending Prägnanz: Dynamic aspects of mental representation and Gestalt
principles. In: L. Albertazzi, G. van Tonder, and, D. Vishwanath (eds.), Perception Beyond Inference: The
Information Content of Visual Processes, pp. 75–108. Cambridge, MA: MIT Press.
Ishikawa, T. and Mogi, K. (2011). Visual one-shot learning as an “anti-camouflage device”: a novel
morphing paradigm. Cognitive Neurodynamics 5: 231–9.
Jackendoff, R. S. (1992). Languages of the Mind: Essays on Mental Representation. Cambridge: MIT Press.
Johnson, S. P. and Aslin, R. N. (1995). Perception of object unity in 2-month-old infants. Developmental
Psychology 31: 739–45.
Johnson, S. P. and Aslin, R. N. (1996). Perception of object unity in young infants: The roles of motion,
depth, and orientation. Cognitive Development 11: 161–80.
Johnson, S. P. and Aslin, R. N. (1998). Young infants’ perception of illusory contours in dynamic displays.
Perception 27: 341–53.
Johnson, S. P., Bremner, J. G., Slater, A. M., Mason, U. C., and Foster, K. (2002). Young infants’ perception
of unity and form in occlusion displays. Journal of Experimental Child Psychology 81: 358–74.
Julesz, B. (1971). Foundations of Cyclopean Perception. Chicago: University of Chicago Press.
Jusczyk, P. W., Johnson, S. P., Spelke, E. S., and Kennedy, L. J. (1999). Synchronous change and perception
of object unity: evidence from adults and infants. Cognition 71: 257–88.
Kanizsa, G. (1954). Linee virtuali e margini fenomenici in assenza di discontinuità di stimolazione. Atti
del X convegno degli psicologi italiani, Chianciano Terme—Siena, 10–14 ottobre. Firenze: Editrice
Universitaria.
Kanizsa, G. (1955). Margini quasi-percettivi in campi con stimolazione omogenea. Rivista di Psicologia
49: 7–30. English translation, Quasi-perceptual margins in homogeneously stimulated fields. In S. Petry
and G. E. Meyer (eds.) (1987), The Perception of Illusory Contours, pp. 40–9. New York: Springer.
Kanizsa, G. (1968). Percezione attuale, esperienza passata e l’ “esperimento impossibile”. In: G. Kanizsa and
G. Vicario (eds.) Ricerche sperimentali sulla percezione. Trieste: Edizioni Università degli Studi, pp. 9–48.
English translation in: Kanizsa, G. (1979). Organization in Vision. New York: Praeger.
Kanizsa, G. (1979). Organization in Vision. New York: Praeger.
Kanizsa, G. (1985). Seeing and thinking. Acta Psychologica 59: 23–33.
Kanizsa, G. (1987). 1986 Addendum. In: S. Petry and G. E. Meyer (eds.) The Perception of Illusory Contours,
p. 49. New York: Springer.
Kanizsa, G. and Gerbino, W. (1976). Convexity and symmetry in figure-ground organization. In M. Henle
(ed.), Vision and Artifact, pp. 25–32. New York: Springer.
Kanizsa, G. and Gerbino, W. (1982). Amodal completion: Seeing or thinking? In: J. Beck (ed.),
Organization and Representation in Perception, pp. 167–190. Hillsdale, NJ: LEA.
Kanizsa, G., Renzi, P., Conte, S., Compostela, C., and Guerani, L. (1993). Amodal completion in mouse
vision. Perception 22: 713–21.
Kaplan, G. A. (1969). Kinetic disruption of optical texture: The perception of depth at an edge. Perception
and Psychophysics 6: 193–8.
Kavšek, M. (2004). Infant perception of object unity in static displays. International Journal of Behavioural
Development 28: 538–45.
Kawabata, H., Gyoba, J., Inoue, H., and Ohtsubo, J. (1999). Visual completion of partly occluded grating in
infants under 1 month of age. Vision Research 39: 3586–91.
Kellman, P. J. (1984). Perception of three-dimensional form by human infants. Perception and Psychophysics
36: 353–8.
Kellman, P. J. and Arterberry, M. E. (1998). The Cradle of Knowledge. Cambridge: MIT Press.
Kellman, P. J., and Shipley, T. F. (1991). A theory of visual interpolation in object perception. Cognitive
Psychology 23: 141–221.
Kellman, P. J., and Spelke, E. S. (1983). Perception of partly occluded objects in infancy. Cognitive
Psychology 15: 483–524.
Kellman, P. J., Spelke, E. S., and Short, K. R. (1986). Infant perception of object unity from translatory
motion in depth and vertical translation. Child Development 57: 72–86.
Kellman, P. J., Garrigan, P., and Shipley, T. F. (2005a). Object interpolation in three dimensions.
Psychological Review 112: 586–609.
Kellman, P. J., Garrigan, P., Shipley, T. F., Yin, C., and Machado, L. (2005b). 3-D interpolation in object
perception: Evidence from an objective performance paradigm. Journal of Experimental Psychology
31: 558–83.
Kellman, P. J., Garrigan, P., Shipley, T., and Keane, B. (2007). Interpolation processes in object
perception: Reply to Anderson (2007). Psychological Review 114: 488–502.
Kennedy, J. M. (1987). Lo, perception abhors not a contradiction. In: S. Petry and G. E. Meyer (eds.). The
Perception of Illusory Contours, pp. 253–61. New York: Springer.
Kitaoka, A., Gyoba, J., Sakurai, K., and Kawabata, H. (2001). Similarity between Petter’s effect and visual
phantoms. Perception 30: 519–22.
Koenderink, J. (1990). Solid shape. Cambridge: MIT Press.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt Brace.
Kogo, N. and Wagemans, J. (2013). The “side” matters: How configurality is reflected in completion.
Cognitive Neuroscience 4: 31–45.
Kogo, N., Strecha, C., van Gool, L., and Wagemans, J. (2010). Surface construction by a 2-D
differentiation–integration process: A neurocomputational model for perceived border ownership,
depth, and lightness in Kanizsa figures. Psychological Review 117: 406–39.
Komatsu, H. (2006). The neural mechanisms of perceptual filling-in. Nature Reviews Neuroscience 7: 220–31.
Koning, A. and van Lier, R. (2004). Mental rotation depends on the number of objects rather than on the
number of image fragments. Acta Psychologica 117: 65–77.
Kourtzi, Z. and Kanwisher, N. (2001). Representation of perceived object shape by the human lateral
occipital complex. Science 293: 1506–9.
Lawson, R. B. and Gulick, W. L. (1967). Stereopsis and anomalous contour. Vision Research 7: 271–97.
Lawson, R. B. and Mount, D. C. (1967). Minimum condition for stereopsis and anomalous contour. Science
158: 804–6.
Lee, T. S. and Nguyen, M. (2001). Dynamics of subjective contour formation in the early visual cortex.
Proceedings of the National Academy of Sciences of the United States of America 98: 1907–11.
Leeuwenberg, E. L. J. (1969). Quantitative specification of information in sequential patterns. Psychological
Review 76: 216–20.
Leeuwenberg, E. L. J. (1971). A perceptual coding language for visual and auditory patterns. The American
Journal of Psychology 84: 307–49.
Leeuwenberg, E. L. J. and van der Helm, P. A. (2013). Structural Information Theory: The Simplicity of
Visual Form. Cambridge: Cambridge University Press.
Leyssen, S. (2011). “B moves farther than it should have done”: Perceived boundaries in Albert Michotte’s
experimental phenomenology of perception. In: M. Grote and M. Stadler (eds.) Membranes Surfaces
Boundaries: Interstices in the History of Science, Technology and Culture. Preprint 420, pp. 85–104.
Berlin: Max Planck Institute for the History of Science.
Mendola, J., Dale, A., Fischl, B., Liu, A., and Tootell, R. (1999). The representation of illusory and real
contours in human cortical visual areas revealed by functional magnetic resonance imaging. Journal of
Neuroscience 19: 8560–72.
Metzger, W. (1936). Gesetze des Sehens. Frankfurt: Kramer. English translation by L. Spillmann, S. Lehar, M.
Stromeyer, and M. Wertheimer (2006) The Laws of Seeing. Cambridge, MA: MIT Press.
Metzger, W. (1953). Gesetze des Sehens, 2nd edition. Frankfurt: Kramer.
Michotte, A. (1946/1963). The Perception of Causality. New York: Basic Books.
Michotte, A. and Burke, L. (1951). Une nouvelle énigme de la psychologie de la perception: le “donnée
amodal” dans l’experience sensorielle. Actes du XIII Congrés Internationale de Psychologie, Stockholm,
Proceedings and papers, pp. 179–80. Reprinted in: A. Michotte et collaborateurs (eds.) (1962), Causalité,
permanence et réalité phénoménales, pp. 347–71. Louvain: Publications Universitaires.
Michotte, A., Thinès, G., and Crabbé, G. (1964). Les Compléments Amodaux des Structures Perceptives.
Louvain: Publications Universitaires. English translation, Amodal completion of perceptual structures.
In: G. Thinès, A. Costall, and G. Butterworth (eds.) (1991), Michotte’s Experimental Phenomenology of
Perception, pp. 140–67. Hillsdale, NJ: Erlbaum.
Minguzzi, G. F. (1987). Anomalous figures and the tendency to continuation. In: S. Petry and G. E. Meyer
(eds.). The Perception of Illusory Contours, pp. 71–5. New York: Springer.
Montaser-Kouhsari, L., Landy, M. S., Heeger, D. J., and Larsson, J. (2007). Orientation-selective adaptation
to illusory contours in human visual cortex. Journal of Neuroscience 27: 2186–95.
Mooney, C. M. (1957). Age in the development of closure ability in children. Canadian Journal of
Psychology 11: 219–26.
Moore, C. and Cavanagh, P. (1998). Recovery of 3D volume from 2-tone images of novel objects. Cognition
67: 45–71.
Morinaga, S. (1941). Beobachtungen über Grundlagen und Wirkungen anschaulich gleichmässiger Breite.
Archiv für die gesamte Psychologie 110: 310–48.
Nakayama, K. (2009). Nakayama, Shimojo, and Ramachandran’s 1990 paper. Perception 38: 859–77.
Nakayama, K., Shimojo, S., and Silverman, G. H. (1989). Stereoscopic depth: Its relation to image
fragmentation, grouping, and the recognition of occluded objects. Perception 18: 55–68.
Nakayama, K., Shimojo, S., and Ramachandran, V. S. (1990). Transparency: relation to depth, subjective
contours, luminance, and neon color spreading. Perception 19: 497–513.
Nieder, A. (2002). Seeing more than meets the eye: processing of illusory contours in animals. Journal of
Comparative Physiology A 188: 249–60.
Otsuka, Y., and Yamaguchi, M. K. (2003). Infants’ perception of illusory contours in static and moving
figures. Journal of Experimental Child Psychology 86: 244–51.
Palmer, S. E. (1999). Gestalt perception. In: R. A. Wilson and F. C. Keil (eds.). The MIT Encyclopedia of
Cognitive Science, pp. 344–6. Cambridge: MIT Press.
Palmer, E. M., Kellman, P. J., and Shipley, T. F. (2006). A theory of dynamic occluded and illusory object
perception. Journal of Experimental Psychology: General 135: 513–41.
Pan, Y., Chen, M., Yin, J., An, X., Zhang, X., Lu, Y., Gong, H., Li, W., and Wang, W. (2012). Equivalent
representation of real and illusory contours in macaque V4. Journal of Neuroscience 32: 6760–70.
Pessoa, L. and De Weerd, P. (eds.) (2003). Filling-in: From Perceptual Completion to Cortical Reorganization.
New York: Oxford University Press.
Pessoa, L., Thompson, E. and Noë, A. (1998). Finding out about filling-in: a guide to perceptual completion
for visual science and the philosophy of perception. Behavioral and Brain Sciences 21: 723–48
(discussion 748–802).
Petter, G. (1956). Nuove ricerche sperimentali sulla totalizzazione percettiva. Rivista di Psicologia
50: 213–27.
Pinna, B. and Grossberg, S. (2006). Logic and phenomenology of incompleteness in illusory figures: New
cases and hypotheses. Psychofenia 9: 93–135.
Pinna, B., Ehrenstein, W. H., and Spillmann, L. (2004). Illusory contours and surfaces without amodal
completion and depth stratification. Vision Research 44: 1851–5.
Plomp, G., Liu, L., van Leeuwen, C., and Ioannides, A. (2006). The mosaic stage in amodal completion as
characterized by magnetoencephalography responses. Journal of Cognitive Neuroscience 18: 1394–405.
Purghé, F. and Katsaras, P. (1991). Figural conditions affecting the formation of anomalous surfaces: overall
configuration versus single stimulus part. Perception 20: 193–206.
Ramachandran, V. S. (1986). Capture of stereopsis and apparent motion by illusory contours. Perception
and Psychophysics 39: 361–73.
Ramsden, B., Hung, C., and Roe, A. (2001). Real and illusory contour processing in area V1 of the
primate: a cortical balancing act. Cerebral Cortex 11: 648–65.
Ratoosh, P. (1949). On interposition as a cue for the perception of distance. Proceedings of the National
Academy of Sciences USA 35: 257–9.
Rauschenberger, R. and Yantis, S. (2001). Masking unveils pre-amodal completion representation in visual
search. Nature 410: 369–72.
Rauschenberger, R., Liu, T., Slotnick, S. D., and Yantis, S. (2006). Temporally unfolding neural
representation of pictorial occlusion. Psychological Science 17: 358–64.
Rauschenberger, R., Peterson, M. A., Mosca, F., and Bruno, N. (2004). Amodal completion in visual
search: Preemption or context effects? Psychological Science 15: 351–5.
Regolin, L. and Vallortigara, G. (1995). Perception of partly occluded objects by young chicks. Perception
and Psychophysics 57: 971–6.
Rensink, R. A. and Enns, J. T. (1998). Early completion of occluded objects. Vision Research 38: 2489–505.
Riecke, L., Micheyl, C., and Oxenham, A. (2012). Global not local masker features govern the auditory
continuity illusion. Journal of Neuroscience 32: 4660–64.
Ringach, D. and Shapley, R. (1996). Spatial and temporal properties of illusory contours and amodal
boundary completion. Vision Research 36: 3037–50.
Rock, I. (1983). The Logic of Perception. Cambridge: MIT Press.
Rock, I. (1984). Perception. New York: Freeman.
Rock, I. (1987). A problem-solving approach to illusory contours. In: S. Petry and G. E. Meyer (eds.). The
Perception of Illusory Contours, pp. 62–70. New York: Springer.
Rock, I. and Anson, R. (1979). Illusory contours as the solution to a problem. Perception 8: 665–81.
Rosenbach, O. (1902). Zur Lehre von den Urtheilstäuschungen. Zeitschrift für Psychologie 29: 434–48.
Rubin, E. (1915). Synsoplevede Figurer. Copenhagen: Gyldendal. German translation (1921), Visuell
wahrgenommene Figuren. Berlin: Gyldendal.
Sambin, M. (1974). Angular margins without gradient. Italian Journal of Psychology 1: 355–61.
Sampaio, A. C. (1943). La translation des objets comme facteur de leur permanence phénoménale
[The translation of objects as a factor in their phenomenal permanence]. Louvain: Éditions de
l’Institut Supérieur de Philosophie. Reprinted in: A. Michotte et collaborateurs (eds.) (1962),
Causalité, permanence et réalité phénoménales, pp. 33–90. Louvain: Publications Universitaires.
Seghier, M., Dojat, M., Delon-Martin, C., Rubin, C., Warnking, J., Segebarth, C., and Bullier, J.
(2000). Moving illusory contours activate primary visual cortex: an fMRI study. Cerebral Cortex
10: 663–70.
Seghier, M. L. and Vuilleumier, P. (2006). Functional neuroimaging findings on the human perception of
illusory contours. Neuroscience and Biobehavioral Reviews 30: 595–612.
Sekuler, A. (1994). Local and global minima in visual completion: effects of symmetry and orientation.
Perception 23: 529–45.
Sekuler, A. and Palmer, S. (1992). Perception of partly occluded objects: A microgenesis analysis. Journal of
Experimental Psychology: General 121: 95–111.
Sekuler, A., Palmer, S., and Flynn, C. (1994). Local and global processes in visual completion. Psychological
Science 5: 260–7.
Shepard, R. N. (2001). Perceptual-cognitive universals as reflections of the world. Behavioral and Brain
Sciences 24: 581–601.
Shipley, T. F. and Kellman, P. J. (1992). Perception of partly occluded objects and illusory figures: Evidence
for an identity hypothesis. Journal of Experimental Psychology: Human Perception and Performance
18: 106–20.
Shipley, T. F. and Kellman, P. J. (1994). Spatiotemporal boundary formation: Boundary, form, and motion
perception from transformations of surface elements. Journal of Experimental Psychology: General 123:
3–20.
Singh, M. (2004). Modal and amodal completion generate different shapes. Psychological Science 15: 454–9.
Singh, M., Hoffman, D. D., and Albert, M. K. (1999). Contour completion and relative depth: Petter’s rule
and support ratio. Psychological Science 10: 423–8.
Smith, W. C., Johnson, S. P., and Spelke, E. S. (2003). Motion and edge sensitivity in perception of object
unity. Cognitive Psychology 46: 31–64.
Soska, K. C. and Johnson, S. P. (2008). Development of three-dimensional object completion in infancy.
Child Development 79: 1230–6.
Soska, K. C., Adolph, K. E., and Johnson, S. P. (2010). Systems in development: Motor skill acquisition
facilitates three-dimensional object completion. Developmental Psychology 46: 129–38.
Perceptual completions 319

Sovrano, V. and Bisazza, A. (2008). Recognition of partly occluded objects by fish. Animal Cognition
11: 161–6.
Stanley, D. A. and Rubin, N. (2003). fMRI activation in response to illusory contours and salient regions in
the human Lateral Occipital Complex. Neuron 37: 323–31.
Street, R. F. (1931). A Gestalt Completion Test. New York: Teachers College, Columbia University.
Sugita, Y. (1999). Grouping of image fragments in primary visual cortex. Nature 401: 269–72.
Takeichi, H., Nakazawa, H., Murakami, I., and Shimojo, S. (1995). The theory of the curvature-constraint
line for amodal completion. Perception 24: 373–89.
Thornber, K. K. and Williams, L. R. (1997). Characterizing the distribution of completion shapes
with corners using a mixture of random processes. In: M. Pelillo and E. R. Hancock (eds.), Energy
Minimization Methods in Computer Vision and Pattern Recognition: Lecture Notes in Computer Science
Vol. 1223, pp. 19–34. Berlin: Springer.
Tommasi, L., Bressan, P., and Vallortigara, G. (1995). Solving occlusion indeterminacy in chromatically
homogeneous patterns. Perception 24: 391–403.
Tse, P. U. (1998). Illusory volumes from conformation. Perception 27: 977–92.
Tse, P. U. (1999a). Volume completion. Cognitive Psychology 39: 37–68.
Tse, P. U. (1999b). Complete mergeability and amodal completion. Acta Psychologica 102: 165–201.
Tse, P. U. (2002). A contour propagation approach to surface filling-in and volume formation. Psychological
Review 109: 91–115.
Tse, P. U. and Albert, M. K. (1998). Amodal completion in the absence of image tangent discontinuities.
Perception 27: 455–64
Valenza, E. and Bulf, H. (2011). Early development of object unity: evidence for perceptual completion in
newborns. Developmental Science 14: 1–10.
Vallortigara, G. and Tommasi, L. (2001). Minimization of modal contours: An instance of an evolutionary
internalized geometric regularity?. Behavioral and Brain Sciences 24: 706–7.
van der Helm, P. A. (2000). Simplicity versus likelihood in visual perception: From surprisals to precisals.
Psychological Bulletin, 126: 770–800.
van der Helm, P. A. (2011). Bayesian confusions surrounding simplicity and likelihood in perceptual
organization. Acta Psychologica 138: 337–46.
van der Helm, P. A. and Leeuwenberg, E. L. J. (1991). Accessibility, a criterion for regularity and hierarchy
in visual pattern codes. Journal of Mathematical Psychology 35: 151–213.
van der Helm, P. A. and Leeuwenberg, E. L. J. (1996). Goodness of visual
regularities: A nontransformational approach. Psychological Review 103: 429–56.
van Lier, R. (1999). Investigating global effects in visual occlusion: from a partly occluded square to the
back of a tree-trunk. Acta Psychologica 102: 203–20.
van Lier, R. (2001). Simplicity, regularity, and perceptual interpretations: A structural information
approach. In: T. Shipley and P. Kellman (eds.), From Fragments to Objects: Segmentation in Vision,
pp. 331–52. New York: Elsevier.
van Lier, R. and Wagemans, J. (1999). From images to objects: Global and local completions of self-occluded
parts. Journal of Experimental Psychology: Human Perception and Performance 25: 1721–41.
van Lier, R., van der Helm, P., and Leeuwenberg, E. (1994). Integrating global and local aspects of visual
occlusion. Perception 23:, 883–903.
van Lier, R., van der Helm, P., and Leeuwenberg, E. (1995a). Competing global and local completions in
visual occlusion. Journal of Experimental Psychology: Human Perception and Performance 21: 571–83.
van Lier, R., Leeuwenberg, E., and van der Helm, P. (1995b). Multiple completions primed by occlusion
patterns. Perception, 24: 727–40.
van Lier, R., de Wit, Tessa C. J., and Koning, A. (2006). Con-fusing contours and pieces of glass. Acta
Psychologica 123: 41–54.
320 van Lier and Gerbino

Vezzani, S. (1999). A note on the influence of grouping on illusory contours. Psychonomic Bulletin and
Review 6: 289–91.
von der Heydt, R. and Peterhans, E. (1989) Mechanisms of contour perception in monkey visual cortex.
I. Lines of pattern discontinuity. Journal of Neuroscience 9: 1731–48.
von der Heydt, R., Peterhans, E., and Baumgartner, G. (1984) Illusory contours and cortical neuron
responses. Science 224: 1260–2.
Vrins, S., De Wit, T., and van Lier, R. (2009). Bricks, butter, and slices of cucumber: Investigating semantic
influences in amodal completion. Perception 38: 17–29.
Vrins, S., Hunnius, S., and van Lier, R. (2011). Volume completion in 4.5-month-old infants. Acta
Psychologica 138: 92–9.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P. A., and
van Leeuwen, C. (2012). A century of Gestalt psychology in visual perception: II. Conceptual and
theoretical foundations. Psychological Bulletin 138: 1218–52.
Wagemans, J., van Lier, R., and Scholl, B.J. (2006). Introduction to Michotte’s heritage in perception and
cognition research. Acta Psychologica 123: 1–19.
Waltz, D. (1975). Understanding line drawings of scenes with shadows. In: P. H. Winston (ed.). The
Psychology of Computer Vision, pp. 19–91. New York: McGraw Hill.
Weigelt, S., Singer, W., and Muckli, L., (2007). Separate cortical stages in amodal completion revealed by
functional magnetic resonance adaptation. BMC Neuroscience, 8:70 doi:10.1186/1471-2202-8-70.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II. Psychologische Forschung, 4, 301–50.
English translation in: L. Spillmann (ed.) (2012). On Perceived Motion and Figural Organization.
Cambridge: MIT Press.
Wouterlood, D. and Boselie, F. (1992). A critical discussion of Kellman and Shipley’s (1991) theory of
occlusion phenomena. Psychological Research 54: 278–85.
Yantis, S. (1995). Perceived continuity of occluded visual objects. Psychological Science 6: 182–6.
Yin, C., Kellman, P. J., and Shipley, T. (2000). Surface integration influences depth discrimination. Vision
Research 40: 1969–78.
Yonas, A., Craton, L., and Thompson, W. B. (1987). Relative motion: Kinetic information for the order of
depth at an edge. Perception and Psychophysics 41: 53–9.
Zanforlin, M. (1981). Visual perception of complex forms (anomalous surfaces) in chicks. Italian Journal of
Psychology 1: 1–16.
Zylinski, S., Darmaillacq, A.-S., and Shashar, N. (2012). Visual interpolation for contour completion by
the European cuttlefish (Sepia officinalis) and its use in dynamic camouflage. Proceedings of the Royal
Society B 279: 1–5.
Chapter 16

The neural mechanisms of figure-ground segregation

Matthew W. Self and Pieter R. Roelfsema
Introduction
Vision appears to be simple. We open our eyes and perceive a well-organized world full of recognizable objects without any feeling of effort. The apparent ease with which we perceive the world disguises the immense computational effort necessary to segregate, localize, and recognize objects. The difficulty of this task stems from the fact that (daytime) vision is based on the distributed pattern of activity across the millions of cones in the retina. This point-like representation must be transformed by the neural circuitry of the visual system to produce our coherent percept. The ultimate goal of this circuitry is to localize and recognize objects and to guide visually driven behavior. To achieve this goal it is necessary to group together the activity patterns that are produced by one object (or figure) and to segregate these from patterns produced by other objects or background regions.
The neuronal mechanisms by which the visual system segregates a figure from its background and groups together the elements belonging to the figure have been studied using a texture-segmentation task. In the original version of this paradigm (Lamme 1995) a macaque monkey was required to fixate on a central fixation dot. Then a full-screen texture composed of thousands of oriented lines was presented. The texture contained a small square region made from lines of the orthogonal orientation (Figure 16.1a) (a version using motion-defined textures was also used and produced similar results). This region is perceived as a figure in front of, and therefore occluding, the background. The monkey’s task was to make an eye movement towards the figure after the presentation of a go-cue. In some experiments (Self et al. 2012; Supèr et al. 2001) there were also catch-trials with a uniform texture without a figure. On these trials the monkeys were rewarded for maintaining fixation at the center after the presentation of the go-cue. Monkeys generally perform very well on this task, with performance levels of greater than ninety per cent correct. The virtue of this paradigm is that it is possible to vary the position of the figure relative to the receptive field(s) of the neuron(s) under study, while keeping the bottom-up activation of the neurons constant (Figure 16.1b). If the figure is placed in the receptive field then the response of the neuron to the figure can be tested (red condition in Figure 16.1b). If the figure is moved elsewhere then the response to the background can be measured (blue condition). Importantly, the orientation of the textures is always counterbalanced so that on average exactly the same line elements fall into the RF in both the figure and ground conditions. This creates conditions in which the visual information present in the RF is identical but the visual context is different. On figure trials the RF falls on the behaviorally relevant texture, whereas on ground trials it falls on the irrelevant background region.
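The counterbalancing logic of this design can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors’ actual stimulus code: the function and parameter names are our own, and a grid of orientation values stands in for the real line-element display.

```python
def make_texture(width, height, fig_x, fig_y, fig_size, bg_ori=45, fig_ori=135):
    """Return a 2D grid of line-element orientations (degrees): a uniform
    background of bg_ori containing a square figure patch of fig_ori."""
    grid = [[bg_ori for _ in range(width)] for _ in range(height)]
    for y in range(fig_y, fig_y + fig_size):
        for x in range(fig_x, fig_x + fig_size):
            grid[y][x] = fig_ori
    return grid

# "Figure" condition: the square is centred on the recorded cell's RF at (10, 10).
rf_x, rf_y = 10, 10
fig_cond = make_texture(32, 32, rf_x - 2, rf_y - 2, 4, bg_ori=45, fig_ori=135)

# "Ground" condition: the square is moved away from the RF AND the orientations
# are swapped, so the RF sees exactly the same 135-degree elements either way.
gnd_cond = make_texture(32, 32, 24, 24, 4, bg_ori=135, fig_ori=45)

# Identical local input to the RF; only the visual context differs.
assert fig_cond[rf_y][rf_x] == gnd_cond[rf_y][rf_x] == 135
```

The swap of `bg_ori` and `fig_ori` between the two conditions is the crucial step: it guarantees that any response difference must be due to context, not to the elements inside the RF.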
Fig. 16.1  (a) An example stimulus used in the texture-segmentation task. The background texture
covers the entire screen and the monkey’s task is to make a saccade towards the small square
figure region. (b) In the figure condition the figure is centered on the RF of the recorded cell (red
condition). In the ground condition the figure is moved so that the RF falls on the background (blue
condition). Note that the orientation is also reversed so that identical line elements are present inside
the RF. The graph to the right illustrates the typical response of V1 cells. The early response (<100ms
after stimulus onset) is the same regardless of whether the RF was on the figure or background. In
the later time-period (>100ms) the responses to the figure (red line) are significantly higher than
those to the ground (blue line). The shaded grey region represents the modulation in firing and is
referred to as figure-ground modulation (FGM). (c) Boundaries can be detected through mutual
inhibition between cells tuned for the same orientation. Here cells on either side of the boundary
(the pink dashed line) have stronger responses than cells in the middle of the texture as they only
receive inhibition (the black bars) from one side. (d) Models of region growing suggest that the
figure-region becomes perceptually grouped through excitatory feedback from neurons in higher
visual areas tuned to the figural orientation (red cone). This leads to enhanced firing-rates across the
entire figure region.

The responses of neurons in V1 are modulated by the visual context (Figure 16.1b). In the previous studies responses for the large majority of neurons were stronger when the RF fell on a figure compared to the background, on average by around forty per cent of the activity produced by the background. We will refer to this modulation in firing-rate as figure-ground modulation (FGM).
Most notably this modulation did not begin until around 100ms after the onset of the texture
(40-50ms after the initial visual response in V1). The initial response was identical regardless of
the visual context showing that the input into V1 from the thalamus did not discriminate between
figure and ground. A follow-up study showed that figures defined by other cues (color, motion,
luminance, depth) produced similar levels of FGM in V1 (Zipser et al. 1996).
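As a concrete illustration, this measure can be expressed as the firing-rate difference normalized by the background-driven response. The function name and the rates below are our own, made-up examples, not data from the studies cited.

```python
def fgm_index(fig_rate, gnd_rate):
    """Figure-ground modulation as a fraction of background-driven activity:
    (figure response - ground response) / ground response."""
    return (fig_rate - gnd_rate) / gnd_rate

# Hypothetical late-window (>100ms) firing rates for one V1 site (spikes/s):
print(fgm_index(fig_rate=42.0, gnd_rate=30.0))  # 0.4, i.e. ~40% modulation
```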
How does the visual system segregate such a texture? Psychophysical studies (Mumford et al. 1987; Wolfson and Landy 1998) have suggested that there are two complementary mechanisms at work to segment the scene. The first is boundary detection, the enhancement of the borders of the object (Figure 16.1c). We will propose that boundary detection is achieved through a mixture of center-surround interactions mediated by feedforward anatomical connections and mutual inhibition between neurons tuned for similar features mediated by horizontal connections within visual cortex. These processes rapidly enhance neural firing-rates at locations in the visual scene where there are local changes in feature values. The second process is region growing, which groups together regions of the scene with similar features (Figure 16.1d). We will discuss evidence for a region growing process in which a surface label (also enhanced neuronal activity) simultaneously arises across regions of similar feature values. We hypothesize that both processes exist in visual cortex and work together to rapidly and accurately segment the visual scene. The neural connection schemes for these processes are, however, quite different, and their timing differs too.

Boundary detection
Theory of boundary detection
A fundamental processing strategy in the visual system is to contrast feature information from
nearby regions of space. This strategy has the dual effect of making the visual system relatively
insensitive to uniform regions of the scene and enhancing the responses to regions in which
feature-values change. A well-known example of the neural implementation of this strategy is the
retinal ganglion cell. These cells have a center-surround receptive field organization; they respond
strongly to an increase or decrease in luminance restricted in size so that it selectively activates
the center mechanism. They are less driven however by uniform regions of luminance which
simultaneously activate the center and surround mechanism. This organization makes these cells
more responsive to luminance-defined edges if the edge is correctly aligned with the receptive field. A retinal ganglion cell would not, however, be able to signal the presence of the boundaries in Figure 16.1a. These boundaries are defined by orientation and the luminance on each side of the boundary is the same. Such orientation-defined edges cannot be detected in the retina or thalamus of primates because these structures lack cells that are selective for orientation; a cortical
mechanism is required.
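A one-dimensional caricature of this center-surround computation shows why a luminance step drives such a cell while a uniform field does not. This is an illustrative sketch under our own simplifying assumptions (a difference-of-means filter; the function name is ours), not a model of real ganglion-cell physiology.

```python
def center_surround_response(image, pos, c_rad=1, s_rad=3):
    """1D caricature of an ON-center ganglion cell: excitatory centre mean
    minus inhibitory surround mean (a difference-of-means)."""
    center = image[max(0, pos - c_rad): pos + c_rad + 1]
    surround = image[max(0, pos - s_rad): pos + s_rad + 1]
    return sum(center) / len(center) - sum(surround) / len(surround)

uniform = [1.0] * 21             # uniform luminance: centre and surround cancel
edge = [0.0] * 10 + [1.0] * 11   # luminance step at position 10

print(center_surround_response(uniform, 10))           # 0.0: no response
print(center_surround_response(edge, 10) > 0)          # True: the edge drives the cell

# An orientation-defined texture boundary leaves mean luminance unchanged on
# both sides, so this cell is blind to it: a cortical mechanism is required.
```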
In theory, orientation-defined texture boundaries could be detected by “orientation-opponent” cells driven by one orientation in their center and the orthogonal orientation in their surround. Such cells have, however, yet to be found in visual cortex. Instead it has been proposed that these edges are detected through mutual inhibition between neurons tuned for the same
orientation (Grossberg and Mingolla 1985; Knierim and Van Essen 1992; Li 1999; Marr and Hildreth 1980; Sillito et al. 1995). In such an iso-orientation inhibition scheme, the activity of neurons that code image regions with a homogeneous orientation is suppressed, whereas the amount of inhibition is smaller for neurons with RFs near a boundary so that their firing rate is higher (Figure 16.1c). There is a good deal of evidence that iso-orientation suppression exists in visual cortex. Cells in V1 that are well-driven by a line element of their preferred orientation are suppressed by placing line elements with a similar orientation in the nearby surround (Knierim and Van Essen 1992). These surrounding elements do not drive the cell to fire themselves and are therefore demonstrably outside the classical receptive field of the V1 cells, yet they strongly suppress the response of the cell to the center element. Importantly, this suppression is greatly reduced if the line elements outside the RF are rotated so that they are orthogonal to the preferred orientation of the cell. This result supports the idea that V1 neurons receive an orientation-tuned form of suppression coming from regions surrounding the RF (Allman et al. 1985; Jones et al. 2001; Kastner et al. 1999; Levitt and Lund 1997; Nelson and Frost 1978; Sillito et al. 1995). The time-course of this suppression is very rapid. Studies using grating stimuli have determined that iso-orientation suppression can be observed within 25ms of the onset of the visual response (Li et al. 2001; Nothdurft et al. 1999). One study which examined the latency of this effect at the level of individual cells found even shorter latencies of around 7–10ms (Bair et al. 2003). Thus, representations of the boundaries of objects in natural scenes are enhanced and projected forwards to higher visual areas as part of (for luminance-defined boundaries), or closely following (for texture-defined boundaries), the initial feedforward sweep of visual activity. Indeed, studies of the neuronal responses to the boundaries of texture-defined figures in V1 (Lamme et al. 1999) and also in higher visual area V4 (Poort et al. 2012) find enhanced activity at around 70ms after stimulus onset.
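The iso-orientation inhibition scheme can be sketched in one dimension. This is a toy version under our own simplifying assumptions (unit feedforward drive, a fixed inhibition weight, and an all-or-none orientation match); the names are ours.

```python
def boundary_enhancement(orientations, base=1.0, w_inh=0.1, radius=3):
    """Each unit is driven by its own texture element but is suppressed by
    nearby elements of the SAME orientation (iso-orientation inhibition)."""
    responses = []
    n = len(orientations)
    for i, ori in enumerate(orientations):
        neighbours = range(max(0, i - radius), min(n, i + radius + 1))
        same = sum(1 for j in neighbours if j != i and orientations[j] == ori)
        responses.append(base - w_inh * same)
    return responses

# One row of the texture: a 45-degree background with a 135-degree figure patch.
row = [45] * 8 + [135] * 6 + [45] * 8
resp = boundary_enhancement(row)

# Units flanking the orientation boundary receive less iso-orientation
# inhibition than units deep inside a uniform region, so they fire more.
assert resp[7] > resp[3] and resp[8] > resp[11]
```

Because a unit at the boundary has fewer same-orientation neighbours, its suppression is weaker and its response stands out, which is exactly the boundary-enhancement effect described above.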

Rapid detection performance and the limits of feedforward processing
This rapid enhancement of neuronal activity at the edges of the figure may be sufficient to perform
rapid detection tasks. The figures used in early studies of texture-segmentation were rather simple
square forms (Lamme 1995; Zipser et al. 1996) and it is likely that detectors exist in higher visual
areas which are activated by such simple and regular forms. The activity of such a detector would
signal to the rest of the brain the presence or absence of the square-region in the scene, implicitly
grouping together the boundaries of the object. Indeed primates show a remarkable ability to make
rapid present or absent judgments when viewing rapidly presented sequences of natural images.
For example, we are able to very rapidly determine if a stream of images contains an animal or not,
even when the presentation time of each image is reduced to 20ms per image (Thorpe et al. 1996).
These ultra-rapid abilities may rely on the activation of cells in higher visual areas that are tuned for
characteristic diagnostic features (e.g. a cell tuned for the presence of an eye in the image would be
sufficient to solve the above task). However, there are limits to the abilities of cells in higher visual areas to group together the detected boundaries. For example, neurons in inferotemporal cortex (IT)
have RFs that cover almost an entire hemi-field. This is extremely useful for determining whether
a particular grouping of features is present in the visual scene (Brincat and Connor 2004; Kayaert
et al. 2005; Tanaka 1993), but information about the precise spatial location of the object is lost.
Furthermore, the use of specialized feature-detectors at high levels is limited to situations in which
familiar objects are presented (Sheinberg and Logothetis 2001). It is highly unlikely that detectors
exist for objects that have never been seen. Also, the early studies which examined responses in
higher visual areas did so using anesthetized preparations and usually presented one object on the
screen at any one time. Studies in awake-behaving animals using multi-object scenes have revealed
that there are very strong inhibitory interactions which control the flow of information through
this feedforward network (Miller et al. 1993; Sheinberg and Logothetis 2001). Stimulus representations compete with one another so that at the level of IT there may only be active representations
for one or a few objects at a time (Desimone and Duncan 1995). This competition is strongly biased
by behavioral relevance so that relevant objects tend to win the representational battle (Luck et al.
1997; Reynolds et al. 1999). In natural images that typically contain many overlapping objects this
may mean that very few objects are represented at high levels of the visual system, placing a severe
limit on the number of objects that can be grouped by fast feedforward processes.
In summary, feedforward grouping of elements using complex receptive fields has many advantages, such as its speed. It is unlikely, however, that feedforward processing would be able to
correctly group scenes containing novel objects and determine their location with high spatial
resolution. Furthermore, the inhibitory interactions that curtail the flow of information towards
higher visual areas imply that feedforward processes are not sufficient to group scenes containing multiple, overlapping or ambiguous objects. In these situations extra grouping processes are
required which are more flexible, but this additional flexibility may come at the cost of taking
more time.

Region growing
What is region growing?
How is the rest of the object grouped together once its boundaries have been detected? One mechanism that has been used in computational models is region growing. Region growing is the counterpart to the boundary detection process described above. Whereas boundary detection enhances responses at the borders of an object, region growing has been proposed to begin in regions of uniform feature-value and to spread outwards until encountering a feature-boundary (Grossberg
and Mingolla 1985), although we will later suggest that region growing proceeds simultaneously
across large regions of uniform texture. Region growing relies on statistical similarities between
features (Grossberg and Mingolla 1985; Mumford et al. 1987; Wolfson and Landy 1998). Regions
with similar features are grouped together and thereby segregated from regions with different feature values. Psychophysical studies have demonstrated that the performance of human observers
on shape discrimination tasks is best explained by models which use mechanisms for boundary
detection as well as for region growing (Mumford et al. 1987). Indeed, humans can discriminate
between textures which are physically separated from one another so that the boundary detection
process cannot be used (Wolfson and Landy 1998). Computational models of texture segmentation
stipulated that region growing requires an entirely different connection scheme than boundary
detection (Bhatt et al. 2007; Grossberg and Mingolla 1985; Poort et al. 2012; Roelfsema et al. 2002).
Whereas boundary detection requires iso-orientation inhibition, i.e. cells encoding the same feature
should inhibit one another (as was discussed above), region growing requires iso-orientation excitation, which means that cells that represent similar features enhance each other’s activity.

A computational model of region growing


How is it possible that the visual system implements these opposing connection schemes? One solution would be that the different schemes are implemented during different phases of processing. The boundary detection process has a relatively short latency of <20ms after the initial visual
response in V1 and V4 for texture-defined boundaries.1 In contrast, figure-ground modulation at the center of a figure-region has a longer latency of >50ms after the initial visual response.
However, a difference in timing is unlikely to be the only explanation. It would require that the
connection schemes of visual cortex switch from iso-orientation suppression to iso-orientation
enhancement within 20–30ms! Such a dramatic and rapid reorganization of connectivity is highly
unlikely. It is more likely that these two processes make use of different sets of cortico-cortical
connections. We have previously suggested that boundary detection algorithms use feedforward and horizontal connections, whereas region growing processes use feedback from higher to lower visual areas (Poort et al. 2012; Roelfsema et al. 2002). The implication is that feedforward and horizontal projections can implement center-surround comparisons within the RF and
iso-orientation suppression over small spatial scales in early visual areas and over larger scales in
higher areas. Feedback connections would then propagate region filling signals from the higher
areas back to the lower areas (Figure 16.2).
This division was made explicit in a computational model of texture-segmentation (Poort et al.
2012; Roelfsema et al. 2002). In this model feature-maps were present at multiple spatial scales in
a multilayer visual hierarchy. At each level of the hierarchy there was iso-orientation inhibition
for the detection of edges. This architecture has the result that for any given figure size there will
be a level in the model hierarchy at which the figure appears as a singleton amongst distracters,
i.e. a form of ‘pop-out’ (V4 in Figure 16.2a; TE in Figure 16.2b). Iso-orientation excitation for
region growing is implemented in the feedback pathway. Neurons at the higher level where pop-out
occurred then send a feature-specific feedback signal back to earlier visual areas to enhance the
response of neurons encoding the same feature and suppress the responses of neurons encoding
the opposite feature (Figure 16.2c). For example, a figure composed of leftwards-oriented line elements strongly activates leftwards-preferring cells in a high-level area (e.g. IT). These cells send
feedback to earlier processing levels and ultimately also to V1 to activate only those cells that prefer
leftwards-oriented line elements and to suppress those that prefer rightwards. One further computational rule is required by the model to restrict the enhanced activity to the interior of the figure.
The feedback connections have to be gated by feedforward activity, so that only those cells that
were well activated by the feedforward sweep of activity are modulated by the feedback signal. This
ensures that feedback only excites cells that are activated by an orientation close to their preferred
orientation. In the example given here this ensures that feedback does not excite cells that are tuned
for the leftward orientation with RFs outside the boundaries of the figure (where the orientation
of the line elements is rightwards) and that the region growing signal stays focused on the representation of the figure. The final result of this model is that the figure-region becomes grouped
through enhanced firing-rates in early visual areas compared to the background (Figure 16.2d).
The model is able to reproduce the firing-rate modulations observed in the texture-segmentation
tasks described above (Poort et al. 2012; Roelfsema et al. 2002). Furthermore the model is able to
correctly segregate more complex figures such as N or U shapes or figures with holes which contain potentially confusing interior convex regions which might mistakenly be segregated as figures
(Roelfsema et al. 2002). While the model initially incorrectly assigns figure status to the interior of

1 It should be noted that this latency only applies to texture-defined boundaries. Luminance-defined boundaries can be detected through center-surround processes such as the receptive field of the retinal ganglion cell described above. Enhanced activity at luminance-defined boundaries can be seen in the feedforward input into V1 (Sugihara et al. 2011) and does not require the kinds of interactions that we discuss here.

Fig. 16.2  (a) A model of figure-ground segmentation (Roelfsema et al. 2002; Roelfsema and
Houtkamp 2011). Neurons encoding the edges of the figure have enhanced activity as they receive
less horizontal inhibition (orange arrows) from their neighbors. (b) The input stimulus produces
increased activity throughout the visual hierarchy (averaged across orientation maps), the edges of
the figure merge together in the large RFs of high-level areas such as TEO. (c) Neurons in higher
visual areas send feedback back to neurons in lower areas. This feedback is gated by the activity
of neurons in lower visual areas and enhances responses throughout the figure (region growing).
(d) The result of the model is that early responses are enhanced at the boundaries of the figure
whereas at later time-points the response enhancement also spreads to the center of the figure.
Reproduced from Attention, Perception, and Psychophysics, 7(8), pp. 2542–2572, Incremental grouping of image
elements in vision, Pieter R. Roelfsema, Copyright © 2011, Springer-Verlag. With kind permission from Springer
Science and Business Media.

the N/U or the hole, this is later overruled by feedback from higher areas which do not extract the
interior of these figures due to the poor spatial resolution of their RFs.
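The two stages of the model described above — pop-out at a higher level, followed by orientation-specific feedback gated by feedforward activity — can be caricatured as follows. This is our own minimal sketch, not the published model: a single “higher area” simply reads out the minority orientation, and feedforward drive is taken as uniform.

```python
def detect_figure_orientation(orientations):
    """Caricature of pop-out in a higher area with a large RF: the minority
    orientation in the display is taken to be the figure orientation."""
    counts = {}
    for ori in orientations:
        counts[ori] = counts.get(ori, 0) + 1
    return min(counts, key=counts.get)

def gated_feedback(orientations, feedforward, fig_ori, gain=0.5):
    """Feedback enhances only units whose feedforward response was driven by
    the figure orientation: the feedback signal is gated by feedforward
    activity, so it cannot spill onto background units."""
    return [r * (1 + gain) if ori == fig_ori else r
            for r, ori in zip(feedforward, orientations)]

row = [45] * 8 + [135] * 6 + [45] * 8
early = [1.0] * len(row)                 # identical initial feedforward sweep
late = gated_feedback(row, early, detect_figure_orientation(row))

# The enhancement now covers the whole figure region, not just its edges,
# while background responses are unchanged: figure-ground modulation.
assert late[8:14] == [1.5] * 6 and late[:8] == [1.0] * 8
```

The gating condition (`ori == fig_ori`) is the key rule from the text: feedback only modulates cells that were already activated by the figure’s feature, which keeps the region growing signal confined to the figure.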

Alternative explanations for figure-ground modulation


The computational model described above would predict that the enhanced activity observed at the boundaries of the figure relies on mechanisms that differ from those for FGM at the center of the figure. This prediction has been debated by other groups which have suggested that figure-ground modulation is strongly related to the mechanisms that underlie boundary detection. Zhaoping Li has presented a model (Li 1999) where FGM mainly arises through iso-orientation inhibition. This mechanism, which according to our aforementioned model is responsible for boundary detection, was able to reproduce some results of earlier studies of FGM, but it cannot explain the FGM in the center of larger figures. Another group (Rossi et al. 2001) has suggested that FGM could only be observed with very small figures (up to 2° in diameter) and did not observe FGM in the center of larger figures. They suggested that FGM is in fact a boundary detection signal and becomes greatly reduced as one moves away from the boundary. Both of these viewpoints suggest that there is no region growing signal present in V1 and that neural activity in V1 does not reflect surface perception, but rather the presence of nearby boundaries. Poort et al. (2012) reconciled these apparently conflicting findings by showing that region growing is only pronounced for behaviorally relevant objects (see below).

A relationship to border ownership?


Is the FGM signal observed by Lamme (1995) simply a boundary detection signal? If so, it is unclear why this signal would be restricted to the figure and not also spread out from the boundary into the background. Lamme (1995) showed that FGM is completely absent, or even slightly negative, on background regions close to the figure boundary, whereas the modulation was at a similar level throughout the figure region. This result demonstrates that if boundary detection signals spread from the borders of an object then this is mediated by a system which has
access to which side of the border is object and which side is background. Border-ownership
cells provide a possible neural substrate for this mechanism. The concept of border-ownership
is dealt with in more detail in Kogo and van Ee's chapter in this volume; for our purposes here
it is sufficient to know that cells in visual cortex represent border ownership in modulations of
their firing-rate (Zhou et al. 2000). For example, a rightwards tuned border-ownership cell will
give a greater response when an edge is owned by an object to the right of its RF than when it is
owned by an object to the left. In this way border-ownership cells can give a spatial signal as to
which direction to start spreading a boundary-signal. Border-ownership cells are found in small
numbers in V1, and in much greater numbers in V2 and V4. In fact most orientation-selective
V2 and V4 neurons are also border-ownership selective, highlighting the fundamental nature of
border-ownership coding (Zhou et al. 2000). The mechanisms by which border-ownership tuning
might arise in these cells were recently discussed by Craft et al. (2007). Their theory (see also Jehee
et al. 2007) relies on the presence of as-yet hypothetical grouping cells in higher visual areas
(V4 and above). Grouping cells are activated by the presence of convex, enclosed contours and
send feedback to BO-cells in lower areas which are aligned with the contour. This elegant theory
can explain how BO-tuning arises, although experimental evidence for grouping cells remains to
be found. Computational models suggest that firing-rate modulations shown by BO-tuned cells
in V2 could be used as a “seed” to spread a label in the correct direction within the object, and not
outwards into the background (Kogo et al. 2010).
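The "seeding" idea can be made concrete with a one-dimensional toy sketch (our illustration, not the implementation of Kogo et al. 2010): border-ownership signals at the figure's edges seed a label that spreads only between neighbouring units sharing the figure's orientation, so it fills the figure without invading the background.

```python
# Toy sketch: BO-seeded label spreading confined by feature similarity.
# Texture values and seed positions are invented for illustration.
import numpy as np

def spread_label(features, seeds, steps=10):
    """Spread a binary figure label from seed positions to
    feature-matching neighbours only."""
    label = np.zeros(len(features), dtype=bool)
    label[list(seeds)] = True
    for _ in range(steps):
        for i in range(len(features)):
            if not label[i]:
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < len(features) and features[j] == features[i]:
                    label[j] = True
    return label

# 45-degree figure strip on a 135-degree background; BO cells at the two
# borders (positions 8 and 12) point inwards and provide the seeds:
tex = np.array([135] * 8 + [45] * 5 + [135] * 8)
lab = spread_label(tex, seeds=(8, 12))
# The label fills indices 8-12 (the figure) and never crosses into the
# background, because background units carry a different feature.
```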
The models described above share some similarities with our model in that recurrent pro-
cessing between neurons with small RFs at low levels of the visual system and those with large
RFs at high levels in the visual system is used to determine border ownership. Our model dif-
fers in that it specifies a mechanism by which the entire figure region can be labeled simulta-
neously with enhanced neural firing. The models of Craft et al. (2007) and Jehee et al. (2007)
are concerned with correctly assigning border-ownership and do not make predictions about
how FGM arises in V1, whereas the model of Kogo et al. (2010) suggests that FGM would arise
first at the boundaries of an object and spread towards the center. Nevertheless, these models,
and those of Grossberg (Bhatt et al. 2007; Grossberg and Mingolla 1985), all suggest
that feedback to lower visual areas is essential in grouping together the figure region. These
models are therefore very different from those of Zhaoping Li who proposes that intra-areal
horizontal connections are sufficient to assign figure-ground status and that FGM is simply a
The Neural Mechanisms of Figure-ground Segregation 329

spreading of boundary detection signals from the borders of the object (Li 1999; Rossi et al.
2001; Zhaoping 2005).
We have carried out two recent studies which directly investigated the contribution of feedfor-
ward, lateral, and feedback connections to boundary detection and region growing. In the first
(Poort et al. 2012) we studied the effect of task-relevance on the enhanced firing at the bounda-
ries and the center of a figure. We found that FGM at the center of the figure (region filling)
depends strongly on the task that the monkey is doing, whereas boundary detection has only a
weak dependence. This result indicates that the processes that underlie boundary detection are
largely stimulus-driven, in accordance with a strong contribution from lateral and feedforward
inhibition, and that region-filling indeed depends more strongly on feedback connections from
higher visual areas.
In the second study (Self et al. 2013) we made laminar recordings of activity in V1 while mon-
keys performed a figure-ground task. Importantly, these laminar recordings provide unique
information about the neural circuitry underlying FGM as they allow us to examine the synaptic
currents and spiking changes that are produced at the borders and center of a perceptual figure.
We found that boundary detection engages different laminar circuits than region-filling. Taken
together these studies suggest that FGM observed at the center of the figure is not an extension of
a boundary detection signal at the edges.

The neural mechanisms of figure-ground modulation


The effect of attention on figure-ground modulation
We have hypothesized that the detection of the boundaries of a figure relies on different neu-
ral mechanisms than the FGM at the center of the figure. If this is the case then these two
processes may be affected differently by the task-relevance of the figures. In this study (Poort
et al. 2012) we recorded neural activity from V1 and V4 while monkeys made eye movements
towards a texture-defined figure or ignored it. We varied the animals’ attention by presenting
two possible tasks. The upper half of the screen contained two luminance-defined curves for
the first curve-tracing task where the monkey was trained to make an eye movement towards
the end of the curve that was connected to the fixation point. In the lower half of the screen a
texture-defined figure was present for a texture-segregation task where the animal had to make
an eye movement towards the center of the figure (Figure 16.3a). The animals performed only
one task per day, so that an animal performing the curve-tracing task would ignore the figure
and vice versa. We shifted the location of the figure so that the neural responses to the figure
edge or figure center could be recorded along with intermediate locations and responses to the
background (Figure 16.3b).
When the animal was performing the figure detection task we observed that neuronal
responses to the figure were enhanced relative to responses evoked by the background, just as
in Figure 16.1b. We isolated the FGM signal (grey regions in Figure 16.1b) by subtracting back-
ground responses from responses evoked by the figure. In the figure detection task, FGM in V1
neurons was similar regardless of whether their RF was located on the figure or on the boundary
(Figure 16.3c). The level of FGM was similar to that obtained in previous studies. However,
when the animal was performing the curve-tracing task we observed a drop in responses to the
figure center whereas responses to the boundaries were relatively unaffected (Figure 16.3d). These
results show that the detection of the boundary, which we have linked to iso-orientation
suppression, proceeds equally well in the presence or absence of attention. Previous studies have also

[Figure 16.3 appears here. Panels: (a) the task sequence (fixation 300 ms, stimulus 600 ms, saccade), with the curve-tracing and figure-detection conditions; (b) RF positions on the background, edge, and center of the figure; (c, d) FGM as a function of figure position (deg) and time (ms). See caption below.]
Fig. 16.3  (a) The paradigm used to study the effect of attention on FGM. The monkeys were always
presented with two curves in the upper-half of the screen and a texture-defined figure in the bottom
half (shown in plain colors here for simplicity). On different days the monkey performed different
tasks. On curve-tracing days the monkey had to make an eye-movement towards the target circle
that was connected to the fixation-point by a curve. On figure-detection days he had to make a
saccade towards the figure. (b) The position of the figure relative to the RF was varied on each trial
to map out responses to the background, edge, and center of the figure. (c) The 3D color-plot shows
the amount of FGM according to position of the figure during the figure-detection task. The plot on
the left-hand side shows the response at the edge of the figure (red) vs. the center (blue). (d) FGM
during the curve-tracing task. When attention is directed to the curve-tracing task the level of FGM
is reduced in the center of the figure. The response at the edges was relatively unaffected.
Reprinted from Neuron, 75(1), Jasper Poort, Florian Raudies, Aurel Wannig, Victor A.F. Lamme, Heiko Neumann,
and Pieter R. Roelfsema, The Role of Attention in Figure-Ground Segregation in Areas V1 and V4 of the Visual
Cortex, pp. 143–56, Copyright © 2012, with permission from Elsevier.

demonstrated enhanced edge-responses when animals ignore a stimulus (Marcus and Van Essen
2002) or even when animals are anesthetized (Kastner et al. 1997; Nothdurft et al. 1999; Nothdurft
et al. 2000). In contrast, our results show that the responses at the figure center depend on the
task-relevance of the figure. When the figure is behaviorally relevant then responses at the center
of the figure are similar to those at the edge, but when attention was directed to the other task
the responses fell to approximately halfway between the edge responses and the response to the
background. This result leads us to draw two conclusions. Firstly, that the process responsible for
boundary-enhancement is different to the process responsible for FGM at the center of the figure.
Secondly, while FGM at the figure-center is influenced by attention, it still arises in the absence
of attention. These results are in good agreement with a study that examined the effect of atten-
tion on border-ownership cells (Qiu et al. 2007), which found that the border-ownership signal can
also be observed outside the focus of attention, but that attention can amplify the coding of border
ownership. These results are consistent with our hypothesis that boundary detection, which is
thought to rely on iso-orientation inhibition, depends on an early process that may rely on feed-
forward or lateral connections (Figure 16.2a), whereas the FGM at the figure center depends on
iso-orientation excitation, which is mediated by feedback from higher visual areas (Figure 16.2c).
A process that depends on the activity in higher visual areas is expected to depend more strongly
on the task-relevance of the figure.
What then is the advantage of enhancing neural activity on figures compared to background?
One possibility is that by increasing the responses of neurons in early visual areas, which have
small RFs providing excellent spatial resolution, the visual system can more accurately localize
the figure to guide behavior. The neuronal processes that are responsible for making a saccade
to the center of the figure might take advantage of the FGM, because it selectively labels all the
image elements of the figure. The spatial profile of FGM can therefore be read out by the saccadic
system to determine the center of gravity of the image elements that belong to the figure. We
assessed this possibility by examining the relationship between the level of FGM in V1 and the
spatial accuracy of the saccade. The animals in this study were required to make very accurate
saccades to a 2.5° window centered within the 4° figure. We found that the spatial profile of FGM
in V1 indeed predicted the landing-point of the saccade on the figure. On trials where FGM was
strongest on the left-hand side of the figure the animal tended to make saccades that landed to
the left of center. The opposite was observed on trials with strong FGM on the right-hand side.
Trials with modulation spread evenly through the figure were associated with the most accurate
saccades. This result suggests that the FGM signal in V1 is used by the motor-system to plan sac-
cades to the center of gravity of the image elements that belong to the figure, possibly through the
direct projections from V1 to the superior colliculus (Fries and Distel 1983; Wurtz and Albano
1980). These and previous results, taken together, show that the activity in V1 is closely associ-
ated with both the perception of the animal (Supèr et al. 2001) and the spatial accuracy of the
behavioral output.
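The center-of-gravity readout described above can be sketched in a few lines (an illustration only; the positions and FGM values below are invented numbers, not data from the study):

```python
# Illustrative sketch: the predicted saccade landing point as the
# FGM-weighted centre of gravity of the figure's image elements.
import numpy as np

def fgm_centroid(positions_deg, fgm):
    """Centre of gravity of image elements, weighted by FGM strength."""
    fgm = np.asarray(fgm, dtype=float)
    return float(np.dot(positions_deg, fgm) / fgm.sum())

pos = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # element positions (deg)
even = fgm_centroid(pos, [1, 1, 1, 1, 1])    # even FGM: accurate saccade
left = fgm_centroid(pos, [2, 2, 1, 1, 1])    # stronger FGM on the left
# 'left' is negative: the predicted landing point shifts left of centre,
# as observed on trials with left-biased FGM.
```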

The laminar circuitry of figure-ground segregation


We have suggested above that increased firing-rates at the boundaries of a figure might be medi-
ated by feedforward and horizontal connections within V1 whereas FGM at the center of the fig-
ure could be due to feedback projections. These different projections target different layers of V1.
Feedforward connections predominantly target layer 4c and layer 6, horizontal connections are
present in all layers but are particularly dense in upper layer 4 and the superficial layers (Gilbert
and Wiesel 1983; Rockland and Pandya 1979) and feedback connections (from object process-
ing areas of the ventral stream) target layers 1 and 5 most strongly (Anderson and Martin 2009;
Rockland and Pandya 1979; Rockland and Van Hoesen 1994; Rockland and Virga 1989), and in
general tend to avoid layer 4c (Douglas and Martin 2004; Felleman and Van Essen 1991; Nassi
and Callaway 2009). We therefore recorded simultaneously from all the layers of V1 while two
macaque monkeys performed a texture-segregation task that had been used previously (Supèr
et  al. 2001). We used a multi-contact laminar electrode (Plexon “U-probe”) that allowed us to
measure multi-unit spiking activity (MUA) and the local field potential from twenty-four linearly
spaced contacts. The advantage of these electrodes is that they also allow the application of cur-
rent source density (CSD) analysis to the local field potential (Mitzdorf 1985; Schroeder et al.
1991; Schroeder et al. 1998). This analysis reveals the laminar locations of current sinks (currents
flowing into neurons) and current sources (mostly passive current return to the extracellular
space). We recorded MUA and CSD responses evoked by the center and edge of the figure, as well
as to the background texture.
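CSD analysis of this kind can be sketched as the standard one-dimensional second-spatial-derivative estimate across equally spaced contacts (a minimal sketch; conductivity is set to 1 so units are arbitrary, and the toy LFP below is invented):

```python
# Minimal 1-D CSD estimate. Negative CSD values mark current sinks
# (current flowing into neurons); positive values mark sources.
import numpy as np

def csd(lfp, spacing_mm):
    """lfp: array of shape (n_contacts, n_samples) from equally spaced
    laminar contacts; returns CSD at the n_contacts - 2 inner contacts."""
    lfp = np.asarray(lfp, dtype=float)
    return -(lfp[2:] - 2.0 * lfp[1:-1] + lfp[:-2]) / spacing_mm ** 2

# A localized negative LFP deflection at the middle of five contacts
# yields a sink there, flanked by passive return sources:
lfp = np.zeros((5, 1))
lfp[2, 0] = -1.0
out = csd(lfp, spacing_mm=0.1)
```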
The results of this study were very revealing. Firstly we found strong laminar variations in the
strength of FGM at the center of the figure (Figure 16.4a). FGM was strongest in the superficial
and deep layers and significantly weaker in layer 4. The latency of modulation was relatively con-
stant across the layers, beginning at around 100ms after stimulus onset, so from latency analyses

[Figure 16.4 appears here. Panels: (a, c) laminar profiles of MUA-based FGM for the figure center and edge; (b, d) the corresponding current source density differences; axes show laminar depth (mm, superficial to deep, with layer 4 marked) against time from stimulus onset (ms). See caption below.]

Fig. 16.4  FGM in the center of the figure (a) and at the edge (c) averaged across a number of
penetrations. The color-plots show the laminar profile of FGM—the difference in MUA evoked
by figure and background. The edge specifically causes early FGM in the superficial layers (white
arrow in c). The panels above show the MUA-response averaged across all laminae; panels to the
right show MUA response averaged across time. (b) Difference in the CSD evoked by the figure
center and background. Warm colors show stronger sinks in the figure condition (and/or stronger
sources in the ground condition) and cooler colors stronger sources. The black arrows indicate the
first sinks that differentiate between figure and background at a latency of ~100ms in layer 5 and
layer 1. (d) The difference in CSD between the figure edge and the background. The earliest sinks
occur in upper layer 4/layer 3 and then in layer 2 (black arrows).

it was difficult to determine the source of this increase in spiking. Even more revealing was the
difference in current flow between the figure and ground conditions. In the figure condition we
observed extra current sinks flowing very superficially in layer 1 and/or upper layer 2 as well as
in layer 5 (Figure 16.4b). These layers are well-known to be the targets of feedback projections
from V2 to V1 (Anderson and Martin 2009; Rockland and Pandya 1979). These results therefore
support the idea that feedback projections, targeting layers 1 and 5, are the source of the increased
spiking in V1 for the center of the figure.
When we placed the boundary of the figure in the RF we observed an extra component to the
FGM signal that started at approximately 70ms after stimulus onset (arrow in Figure 16.4c). This
early boundary-FGM has also been observed in previous studies of texture-segregation (Lamme
et al. 1999; Nothdurft et al. 2000; Poort et al. 2012), but interestingly in our study the modula-
tion was confined entirely to the superficial layers of cortex. At later time-points (>100ms) this
modulation was followed by a pattern of spiking activity very similar to that observed at the figure
center. CSD analysis revealed an extra current sink in the edge condition compared to the center
at around 70ms beginning in upper layer 4 and extending into the superficial layers at the same
time as the increase in spiking in these layers (arrows in Figure 16.4d). It is clear from both the
pattern of MUA and CSD that the mechanisms underlying early FGM at the edge of the figure
differ from the mechanisms responsible for the FGM at the center. On the other hand, at later
time-points (>100ms) the MUA and CSD modulation at the edge resembled quite closely the
FGM at the center. We therefore suggest that the early edge FGM is the result of horizontal pro-
jections which are densest in upper layer 4 and superficial layers, whereas the later FGM at the
edge might reflect a feedback-signal targeting the entire figure-region. This study therefore pro-
vides good evidence that both boundary detection processes (mediated by local connections) and
region-filling processes (mediated by feedback connections) play a role in segregating textures
and that these processes occur in different layers of cortex, and at different times.

Feature-specific feedback signals


An important requirement for the region growing signal is that it should respect the boundaries
of the figure and should not grow beyond them. In the computational model described above
this is partially achieved by using a feature-specific signal. The orientation of the figure is repre-
sented by orientation-tuned cells in higher visual areas, which send back a spatially-imprecise, but
feature-selective signal to lower visual areas. The feature-specificity of the feedback signal ensures
that the FGM does not spread onto cells that code the background orientation. This mechanism is
effective in the computational model, but the feature-specificity of feedback signals in visual cortex
is not yet completely resolved.
There are several lines of evidence to support feature selective feedback. The first stems from
studies of feature-based attention. It is well documented that primates can be cued to attend to a
particular feature (e.g. the red items in a multicolor display). This can be extremely useful in visual
search tasks in which the subject has to locate a target object amongst multiple distracters. Indeed
a feature-specific modulation of activity of early visual areas forms a key part of theories of visual
search such as feature-integration theory and guided search (Treisman and Gelade 1980; Wolfe
et al. 1989). Neurophysiological studies of feature-based attention have found that the responses of
neurons encoding the cued feature are enhanced throughout the visual scene (Martinez-Trujillo
and Treue 2004; Roelfsema et al. 2003; Treue and Martinez-Trujillo 1999; Wannig et al. 2011).
These observations suggest that top-down attentional systems can select neurons based on their
feature-tuning.
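This kind of feature-based selection is often summarized as a feature-similarity gain (in the spirit of Treue and Martinez-Trujillo 1999); the gain value and tuning function below are arbitrary illustrative choices, not the published model's parameters:

```python
# Sketch: attending to an orientation scales each neuron's response by
# how closely the cued orientation matches its preference, at every
# location in the visual field.
import math

def attended_response(base_rate, pref_deg, cued_deg, gain=0.3):
    # Orientation has a 180-degree period, hence the factor of 2.
    similarity = math.cos(math.radians(2.0 * (pref_deg - cued_deg)))
    return base_rate * (1.0 + gain * similarity)

matched = attended_response(1.0, pref_deg=45, cued_deg=45)   # boosted
opposed = attended_response(1.0, pref_deg=135, cued_deg=45)  # suppressed
```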

In spite of these feature-selective feedback effects on neuronal firing-rates, the anatomical
evidence for feature-specific feedback is mixed. Early studies examined the spatial extent of
neurons that send feedback projections back to V1 by injecting retrograde tracers into V1 of
cats (Salin et al. 1989; Salin et al. 1995) and monkeys (Perkel et al. 1986). These studies found
a good match between the size of the region in V2 that projects to a column in V1 and the
size of the region of V2 that receives feedforward projections from that column (Salin et  al.
1995). However, as V2 RFs represent much larger regions of space than those in V1, this means that
a V1 column receives feedback from neurons encoding a much larger region of visual space
than they themselves represent (Salin and Bullier 1995). These results raised the question of
whether feedback projections would be able to provide a signal of sufficient spatial resolution
to mediate FGM. Furthermore, these projections were described as producing relatively diffuse
patterns of terminal arborizations, suggesting that they would not be able to form the basis for
a feature-specific signal (Maunsell and Van Essen 1983; Rockland and Pandya 1979). In accordance
with this view, Stettler et al. (2002) reported that feedback projections from V2 to V1
in monkey visual cortex are not specific for orientation. However, more recent studies using
more specific tracers have found instead that feedback projections are more specific than pre-
viously described. The terminal arborizations of feedback-axons have a patchy appearance in
V1, suggesting that they target specific orientation columns (Angelucci et al. 2002; Angelucci
and Bullier 2003; Shmuel et  al. 2005). Thus, although there is clear functional evidence for
feature-specific feedback signals in early visual cortex, the anatomical substrate of these effects
remains to be fully elucidated.

Gating of feedback effects by feedforward activity


Feature-specific feedback would ensure that modulation does not spill over onto neurons
activated by the background texture. However, this mechanism, by itself, does not prevent feedback
connections from activating cells tuned for the orientation of the line elements inside the figure, but
with a RF located on the background. To prevent these cells from becoming modulated it is neces-
sary to gate feedback effects using feedforward activity (Roelfsema 2006). Are feedback effects in
visual cortex indeed gated by feedforward activation?
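The gating requirement can be made concrete by contrasting additive with multiplicative feedback (a toy sketch of our own, with invented numbers): multiplicative feedback cannot create activity in a cell that receives no feedforward drive, which is exactly the gating the question above asks about.

```python
# Additive vs. multiplicative (gated) feedback on three cells.
def additive(ff, fb, gain=0.5):
    return [f + gain * b for f, b in zip(ff, fb)]

def multiplicative(ff, fb, gain=0.5):
    return [f * (1.0 + gain * b) for f, b in zip(ff, fb)]

ff = [1.0, 1.0, 0.0]  # third cell prefers the figure orientation but its
                      # RF lies on the background: no feedforward input
fb = [1.0, 1.0, 1.0]  # feature-specific feedback reaches all three cells
r_add = additive(ff, fb)       # the unstimulated cell is wrongly modulated
r_mul = multiplicative(ff, fb) # the unstimulated cell stays silent
```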
There is substantial evidence that feedback-based effects are strongest for cells that are
well-activated by the visual stimulus (Ekstrom et al. 2008; Treue and Martinez-Trujillo 1999), but
it is unclear how this arises. Long-range cortico-cortical connections are known to use glutamate
as their neurotransmitter (Johnson and Burkhalter 1994) and, in principle, feedback projections
might be able to drive their target neurons, even if these neurons are not in an active state. Crick
and Koch (1998) argued that this would be an undesirable situation because it might lead to
strong feedforward-feedback loops which could drive activity towards deleterious, even epilepto-
genic levels of activity (Crick and Koch 1998).
The question why feedback only modulates neural activity whereas feedforward projections
drive neural responses is not entirely resolved (Sherman and Guillery 1998). One possibility
raised by computational models is that feedforward and feedback projections utilize different glu-
tamate receptors (Dehaene et al. 2003; Lumer et al. 1997). A main ionotropic glutamate receptor
in cortex is the AMPA receptor (AMPA-R) which is a rapidly activated channel, well-suited to
drive a neuron’s membrane potential above threshold. The other principal glutamate receptor is
the NMDA receptor (NMDA-R) with a more slowly opening channel. The current passed by this
receptor shows a non-linear relationship with membrane voltage (Daw et al. 1993). At strongly
negative membrane potentials the channel does not pass current as it is blocked by the presence
of a magnesium ion in the channel pore. At the more depolarized levels that occur if a cell receives
other sources of input, the magnesium block is removed and the channel begins to pass current.
This mechanism implies that NMDA-Rs can act as coincidence detectors that are only active if
the neuron is depolarized by AMPA-R activation (Daw et al. 1993). NMDA-Rs would therefore
be well-placed to mediate the gating of a feedback-based modulatory signal, as these receptors
are unable to activate neurons that are not receiving synaptic input from other sources. There is
some evidence to suggest that NMDA-Rs may be more strongly involved in feedback process-
ing than in feedforward transmission. For example, responses in thalamo-cortical recipient layers
are unaffected by APV, a drug that blocks all NMDA-Rs (Fox et al. 1990; Hagihara et al. 1988).
Furthermore, NMDA has been found to produce multiplicative effects on firing in the superficial and
deep layers of visual cortex (Fox et al. 1990) and NMDA-Rs therefore provide a possible mech-
anism for the gating of feedback by feedforward activity. It is unlikely, however, that feedback
connections target synapses that only possess NMDA-Rs, as synapses without AMPA-Rs are not
functional. It is possible, though, that feedback connections target synapses that are particularly
rich in NMDA-Rs. An alternative possibility has been raised by the work of Matthew
Larkum who has shown that NMDA-Rs are required to integrate the inputs to the apical dendrites
of layer 5 neurons (Larkum et al. 2009). These dendrites are found in layer 1, the layer which is
the predominant target of feedback connections. It may be possible therefore that feedback con-
nections target layer 1, but cannot effectively modulate the firing-rate of cells unless NMDA-Rs
are activated.
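The voltage dependence of the magnesium block described above is commonly captured by the Jahr-Stevens expression; the constants below are the standard published values (Jahr and Stevens 1990), used here as a textbook sketch rather than as a model of the recorded data:

```python
# Fraction of NMDA-R conductance free of the Mg2+ block as a function of
# membrane potential (Jahr-Stevens formulation, 1 mM external Mg2+).
import math

def nmda_unblocked_fraction(v_mv, mg_mM=1.0):
    return 1.0 / (1.0 + (mg_mM / 3.57) * math.exp(-0.062 * v_mv))

near_rest = nmda_unblocked_fraction(-70.0)    # mostly blocked
depolarized = nmda_unblocked_fraction(-20.0)  # block largely relieved
# The receptor thus passes current only when depolarization (e.g. from
# AMPA-R input) coincides with glutamate release: a coincidence detector.
```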

The pharmacology of figure-ground modulation


We recently investigated the role that different glutamate receptors play in the texture-seg-
mentation task described earlier (Self et al. 2012). Our hypothesis was that FGM would pre-
dominantly rely on NMDA-R activation and would be blocked by the application of NMDA-R
antagonists. In contrast we suggested that feedforward processing of the signal would rely on
AMPA-R activation, but that these receptors would play no role in producing FGM. To address
this hypothesis we made laminar recordings from V1 in the same manner as described above
with one slight modification. The laminar electrodes now contained a fluid-line that allowed
us to inject pharmacological substances into different layers of cortex. We used CNQX, an
AMPA-R antagonist and APV and ifenprodil, which both block NMDA-Rs but with different
subunit specificity. APV is a broad-spectrum NMDA-R antagonist which blocks all NMDA-Rs
whereas ifenprodil is much more (>100x) specific for NMDA receptors containing the NR2B
subunit. In the texture-segregation task, the effects of the AMPA-R antagonist differed mark-
edly from those of the NMDA-R antagonists. CNQX strongly reduced responses in an early
response window (50–100ms after stimulus onset). Activity in this time-period is mostly
related to feedforward activation. Remarkably though, this drug had little effect on the level
of figure-ground modulation (Figure 16.5a). Indeed the level of modulation measured after
injections of CNQX was not significantly different from pre-injection levels. In contrast, both
NMDA-R antagonists strongly reduced FGM, whilst having opposing effects on the initial neu-
ral response. APV reduced responses during the early time window, though not to the extent
seen when using CNQX (Figure 16.5b). In contrast, ifenprodil actually increased responses in
this period (Figure 16.5c). Both NMDA-blockers reduced figure-ground modulation, and by
similar amounts. These results support our initial hypothesis that feedforward processing relies
predominantly on AMPA-R activity whereas figure-ground modulation is carried mostly by
NMDA-Rs.
[Figure 16.5 appears here. Panels: (a) CNQX, (b) APV, (c) ifenprodil; each shows normalised MUA over time from figure onset (ms) for the figure and ground conditions, together with pre- and post-drug FGM traces and pre- versus post-drug modulation indices. See caption below.]
Fig. 16.5  (a) An example of the effect of an injection of CNQX (an AMPA receptor antagonist). The
blue curves show the pre-drug response, the red curves show the response recorded immediately
after the pressure injection of CNQX. The drug strongly reduced the initial response but had no
significant effect on the level of FGM. The right-hand graph shows a pre- and post-drug modulation
index score which is independent of the overall activity level (calculated as (Fig-Gnd)/(Fig+Gnd) using
the average activity from 0–200ms post-stimulus). (b) An example of the effect of APV, a broadband
NMDA-R antagonist. The drug has a minor effect on the initial activity level, but strongly reduces
FGM. (c) Ifenprodil blocks NMDA-Rs containing the NR2B subunit. This drug paradoxically increases
responses in general, but also causes a strong reduction in the level of FGM.

The effect of ifenprodil in this experiment was particularly interesting. Ifenprodil blocks
NMDA-Rs which contain the NR2B subunit (Williams 1993). This drug would therefore be
expected to generally reduce neural activity. In contrast we found that ifenprodil increases neu-
ral activity, while at the same time reducing figure-ground modulation. This combination of
effects suggests that NMDA-Rs containing the NR2B-subunit may be situated predominantly on
interneurons involved in inhibiting neural responses. It is not possible to determine from these
data whether the general effect of ifenprodil on excitability involves the same mechanisms that
produce the reduction in FGM. It may be possible to determine more precisely the role of the
different receptor subtypes by examining the distribution of different NMDA subunits on the dif-
ferent cell-types of V1 in future studies.

Towards a neural theory of figure-ground segmentation


In the previous sections we have outlined evidence from recent studies that supports a
two-process theory of figure-ground segmentation. In this theory the texture-defined bounda-
ries of objects are first detected through mutual inhibition between neurons tuned for simi-
lar features. We have observed how the boundaries of orientation-defined figures produce
enhanced neural firing in V1 and higher visual areas at short latencies in the superficial layers
of cortex. The second process that contributes to scene segmentation is a region growing pro-
cess. In our model, region growing begins with the detection of feature singletons by neurons
at multiple spatial scales throughout the visual system. These neurons then provide feedback
to neurons in early visual areas. We have also discussed evidence from other groups about
border-ownership signals, which are likely to play a complementary role in figure-ground seg-
regation, although the precise relationship between border-ownership coding, boundary detec-
tion and region growing remains to be determined. In particular it will be of great interest to
see how future models combine border-ownership coding with feedback-driven labeling of the
interior of figures to solve even the most complex figure-ground segregation tasks involving 3D
vision and overlapping surfaces (Kogo et al. 2010). We presented evidence that region-filling
leads to an enhanced neural representation for figure regions compared to backgrounds in V1.
Anatomical studies and our own pharmacological studies suggest that this signal is restricted
to the figure representation through a combination of feedback connections targeting the den-
drites of deep and superficial layer neurons in layer 1 and in layer 5, and the use of NMDA
receptors to confine the modulatory signal to the most active neurons. By implementing these
two mechanisms the visual system enhances the representation of figure surfaces in comparison
to the background to permit figure perception and to enable accurate saccades to the center of
such a figure. Figure-ground segregation may be one of the first visual tasks where we start to
understand the relative contributions of feedforward, lateral and feedback processing to per-
ceptual organization.
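The two processes outlined above can be caricatured in a few lines of code. The sketch below is a didactic simplification (a one-dimensional texture, binary orientation values, and an explicit seed standing in for feature-singleton detection), not the neural model described in this chapter:

```python
# Toy illustration of the two-process account (a didactic
# simplification, not the published model): boundary detection by
# release from iso-orientation inhibition, then region growing that
# spreads a figure label from a seed and halts at the boundaries.

def detect_boundaries(orientations):
    """Process 1: a unit escapes inhibition (response 1) wherever a
    neighbour carries a different orientation."""
    n = len(orientations)
    return [
        1 if ((i > 0 and orientations[i - 1] != orientations[i]) or
              (i + 1 < n and orientations[i + 1] != orientations[i]))
        else 0
        for i in range(n)
    ]

def grow_region(boundaries, seed):
    """Process 2: spread a modulatory figure label outward from the
    seed, stopping at (but including) boundary elements."""
    n = len(boundaries)
    label = [0] * n
    frontier = [seed]
    while frontier:
        i = frontier.pop()
        if label[i]:
            continue
        label[i] = 1
        if not boundaries[i]:          # do not grow past a boundary
            frontier.extend(j for j in (i - 1, i + 1) if 0 <= j < n)
    return label

# A 90-degree figure embedded in a 0-degree background:
texture = [0, 0, 0, 90, 90, 90, 0, 0, 0]
edges = detect_boundaries(texture)     # [0, 0, 1, 1, 0, 1, 1, 0, 0]
figure = grow_region(edges, seed=4)    # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

In the cortical account, the seed would correspond to singleton detection by neurons with large receptive fields, and the spreading label to feedback modulation of early visual neurons rather than an explicit flood fill.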

Acknowledgements
The research leading to these results has received funding from the European Union Sixth and
Seventh Framework Programmes (EU IST Cognitive Systems, project 027198 ‘Decisions in
Motion’ and project 269921 ‘BrainScaleS’) and a NWO-VICI grant awarded to P.R.R.
338 Self and Roelfsema

References
Allman, J., Miezin, F., and McGuinness, E. (1985). Stimulus specific responses from beyond the classical
receptive field: neurophysiological mechanisms for local-global comparisons in visual neurons. Annu
Rev Neurosci 8: 407–30.
Anderson, J.C. and Martin, K.A. (2009). The synaptic connections between cortical areas V1 and V2 in
macaque monkey. J Neurosci 29: 11283–93.
Angelucci, A. and Bullier, J. (2003). Reaching beyond the classical receptive field of V1 neurons: horizontal
or feedback axons? J Physiol Paris 97: 141–54.
Angelucci, A., Levitt, J.B., Walton, E.J., Hupe, J.M., Bullier, J., and Lund, J.S. (2002). Circuits for local and
global signal integration in primary visual cortex. J Neurosci 22: 8633–46.
Bair, W., Cavanaugh, J.R., and Movshon, J.A. (2003). Time course and time-distance relationships for
surround suppression in macaque V1 neurons. J Neurosci 23: 7690–701.
Bhatt, R., Carpenter, G.A., and Grossberg, S. (2007). Texture segregation by visual cortex: perceptual
grouping, attention, and learning. Vision Res 47: 3173–211.
Brincat, S.L. and Connor, C.E. (2004). Underlying principles of visual shape selectivity in posterior
inferotemporal cortex. Nat Neurosci 7: 880–6.
Craft, E., Schutze, H., Niebur, E., and von der Heydt, R. (2007). A neural model of figure-ground
organization. J Neurophysiol 97: 4310–26.
Crick, F. and Koch, C. (1998). Constraints on cortical and thalamic projections: the no-strong-loops
hypothesis. Nature 391: 245–50.
Daw, N.W., Stein, P.S., and Fox, K. (1993). The role of NMDA receptors in information processing. Annu
Rev Neurosci 16: 207–22.
Dehaene, S., Sergent, C., and Changeux, J.P. (2003). A neuronal network model linking subjective reports
and objective physiological data during conscious perception. Proc Natl Acad Sci USA 100: 8520–5.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annu Rev Neurosci
18: 193–222.
Douglas, R.J. and Martin, K.A. (2004). Neuronal circuits of the neocortex. Annu Rev Neurosci 27: 419–51.
Ekstrom, L.B., Roelfsema, P.R., Arsenault, J.T., Bonmassar, G., and Vanduffel, W. (2008). Bottom-up
dependent gating of frontal signals in early visual cortex. Science 321: 414–17.
Felleman, D.J. and Van Essen, D.C. (1991). Distributed hierarchical processing in the primate cerebral
cortex. Cereb Cortex 1: 1–47.
Fox, K., Sato, H., and Daw, N. (1990). The effect of varying stimulus intensity on NMDA-receptor activity
in cat visual cortex. J Neurophysiol 64: 1413–28.
Fries, W. and Distel, H. (1983). Large layer VI neurons of monkey striate cortex (Meynert cells) project to
the superior colliculus. Proc R Soc Lond B Biol Sci 219: 53–9.
Gilbert, C.D. and Wiesel, T.N. (1983). Clustered intrinsic connections in cat visual cortex. J Neurosci 3:
1116–33.
Grossberg, S. and Mingolla, E. (1985). Neural dynamics of form perception: boundary completion, illusory
figures, and neon color spreading. Psychol Rev 92: 173–211.
Hagihara, K., Tsumoto, T., Sato, H., and Hata, Y. (1988). Actions of excitatory amino acid antagonists on
geniculo-cortical transmission in the cat’s visual cortex. Exp Brain Res 69: 407–16.
Jehee, J.F., Lamme, V.A., and Roelfsema, P.R. (2007). Boundary assignment in a recurrent network
architecture. Vision Res 47: 1153–65.
Johnson, R.R. and Burkhalter, A. (1994). Evidence for excitatory amino acid neurotransmitters in forward
and feedback corticocortical pathways within rat visual cortex. Eur J Neurosci 6: 272–86.
Jones, H.E., Grieve, K.L., Wang, W., and Sillito, A.M. (2001). Surround suppression in primate V1.
J Neurophysiol 86: 2011–28.
The Neural Mechanisms of Figure-ground Segregation 339

Kastner, S., Nothdurft, H.C., and Pigarev, I.N. (1997). Neuronal correlates of pop-out in cat striate cortex.
Vision Res 37: 371–6.
Kastner, S., Nothdurft, H.C., and Pigarev, I.N. (1999). Neuronal responses to orientation and motion
contrast in cat striate cortex. Vis Neurosci 16: 587–600.
Kayaert, G., Biederman, I., Op de Beeck, H.P., and Vogels, R. (2005). Tuning for shape dimensions in
macaque inferior temporal cortex. Eur J Neurosci 22: 212–24.
Knierim, J.J. and Van Essen, D.C. (1992). Neuronal responses to static texture patterns in area V1 of the
alert macaque monkey. J Neurophysiol 67: 961–80.
Kogo, N., Strecha, C., Van Gool, L., and Wagemans, J. (2010). Surface construction by a 2-D
differentiation-integration process: a neurocomputational model for perceived border ownership, depth,
and lightness in Kanizsa figures. Psychol Rev 117: 406–39.
Lamme, V.A. (1995). The neurophysiology of figure-ground segregation in primary visual cortex. J Neurosci
15: 1605–15.
Lamme, V.A., Rodriguez-Rodriguez, V., and Spekreijse, H. (1999). Separate processing dynamics for
texture elements, boundaries and surfaces in primary visual cortex of the macaque monkey. Cereb
Cortex 9: 406–13.
Larkum, M.E., Nevian, T., Sandler, M., Polsky, A., and Schiller, J. (2009). Synaptic integration in tuft
dendrites of layer 5 pyramidal neurons: a new unifying principle. Science 325: 756–60.
Levitt, J.B. and Lund, J.S. (1997). Contrast dependence of contextual effects in primate visual cortex. Nature
387: 73–6.
Li, W., Thier, P., and Wehrhahn, C. (2001). Neuronal responses from beyond the classic receptive field in
V1 of alert monkeys. Exp Brain Res 139: 359–71.
Li, Z. (1999). Visual segmentation by contextual influences via intra-cortical interactions in the primary
visual cortex. Network 10: 187–212.
Luck, S.J., Chelazzi, L., Hillyard, S.A., and Desimone, R. (1997). Neural mechanisms of spatial selective
attention in areas V1, V2, and V4 of macaque visual cortex. J Neurophysiol 77: 24–42.
Lumer, E.D., Edelman, G.M., and Tononi, G. (1997). Neural dynamics in a model of the thalamocortical
system. I. Layers, loops and the emergence of fast synchronous rhythms. Cereb Cortex 7: 207–27.
Marcus, D.S. and Van Essen, D.C. (2002). Scene segmentation and attention in primate cortical areas
V1 and V2. J Neurophysiol 88: 2648–58.
Marr, D. and Hildreth, E. (1980). Theory of edge detection. Proc R Soc Lond B Biol Sci 207: 187–217.
Martinez-Trujillo, J.C. and Treue, S. (2004). Feature-based attention increases the selectivity of population
responses in primate visual cortex. Curr Biol 14: 744–51.
Maunsell, J.H. and Van Essen, D.C. (1983). The connections of the middle temporal visual area (MT) and
their relationship to a cortical hierarchy in the macaque monkey. J Neurosci 3: 2563–86.
Miller, E.K., Gochin, P.M., and Gross, C.G. (1993). Suppression of visual responses of neurons in inferior
temporal cortex of the awake macaque by addition of a second stimulus. Brain Res 616: 25–9.
Mitzdorf, U. (1985). Current source-density method and application in cat cerebral cortex: investigation of
evoked potentials and EEG phenomena. Physiol Rev 65: 37–100.
Mumford, D., Kosslyn, S.M., Hillger, L.A., and Herrnstein, R.J. (1987). Discriminating figure from
ground: the role of edge detection and region growing. Proc Natl Acad Sci USA 84: 7354–8.
Nassi, J.J. and Callaway, E.M. (2009). Parallel processing strategies of the primate visual system. Nat Rev
Neurosci 10: 360–72.
Nelson, J.I. and Frost, B.J. (1978). Orientation-selective inhibition from beyond the classic visual receptive
field. Brain Res 139: 359–65.
Nothdurft, H.C., Gallant, J.L., and Van Essen, D.C. (1999). Response modulation by texture surround in
primate area V1: correlates of “popout” under anesthesia. Vis Neurosci 16: 15–34.

Nothdurft, H.C., Gallant, J.L., and Van Essen, D.C. (2000). Response profiles to texture border patterns in
area V1. Vis Neurosci 17: 421–36.
Perkel, D.J., Bullier, J., and Kennedy, H. (1986). Topography of the afferent connectivity of area 17 in the
macaque monkey: a double-labelling study. J Comp Neurol 253: 374–402.
Poort, J., Raudies, F., Wannig, A., Lamme, V.A., Neumann, H., and Roelfsema, P.R. (2012). The role of
attention in figure-ground segregation in areas V1 and V4 of the visual cortex. Neuron 75: 143–56.
Qiu, F.T., Sugihara, T., and von der Heydt, R. (2007). Figure-ground mechanisms provide structure for selective
attention. Nat Neurosci 10: 1492–9.
Reynolds, J.H., Chelazzi, L., and Desimone, R. (1999). Competitive mechanisms subserve attention in
macaque areas V2 and V4. J Neurosci 19: 1736–53.
Rockland, K.S. and Pandya, D.N. (1979). Laminar origins and terminations of cortical connections of the
occipital lobe in the rhesus monkey. Brain Res 179: 3–20.
Rockland, K.S. and Van Hoesen, G.W. (1994). Direct temporal-occipital feedback connections to striate
cortex (V1) in the macaque monkey. Cereb Cortex 4: 300–13.
Rockland, K.S. and Virga, A. (1989). Terminal arbors of individual “feedback” axons projecting from area
V2 to V1 in the macaque monkey: a study using immunohistochemistry of anterogradely transported
Phaseolus vulgaris-leucoagglutinin. J Comp Neurol 285: 54–72.
Roelfsema, P.R. (2006). Cortical algorithms for perceptual grouping. Annu Rev Neurosci 29: 203–27.
Roelfsema, P.R. and Houtkamp, R. (2011). Incremental grouping of image elements in vision. Atten Percept
Psychophys 73: 2542–72.
Roelfsema, P.R., Lamme, V.A., Spekreijse, H., and Bosch, H. (2002). Figure-ground segregation in a
recurrent network architecture. J Cogn Neurosci 14: 525–37.
Roelfsema, P.R., Khayat, P.S., and Spekreijse, H. (2003). Subtask sequencing in the primary visual cortex.
Proc Natl Acad Sci USA 100: 5467–72.
Rossi, A.F., Desimone, R., and Ungerleider, L.G. (2001). Contextual modulation in primary visual cortex
of macaques. J Neurosci 21: 1698–709.
Salin, P.A. and Bullier, J. (1995). Corticocortical connections in the visual system: structure and function.
Physiol Rev 75: 107–54.
Salin, P.A., Bullier, J., and Kennedy, H. (1989). Convergence and divergence in the afferent projections to
cat area 17. J Comp Neurol 283: 486–512.
Salin, P.A., Kennedy, H., and Bullier, J. (1995). Spatial reciprocity of connections between areas 17 and 18
in the cat. Can J Physiol Pharmacol 73: 1339–47.
Schroeder, C.E., Tenke, C.E., Givre, S.J., Arezzo, J.C., and Vaughan, H.G., Jr. (1991). Striate cortical
contribution to the surface-recorded pattern-reversal VEP in the alert monkey. Vision Res 31: 1143–57.
Schroeder, C.E., Mehta, A.D., and Givre, S.J. (1998). A spatiotemporal profile of visual system activation
revealed by current source density analysis in the awake macaque. Cereb Cortex 8: 575–92.
Self, M.W., Kooijmans, R.N., Super, H., Lamme, V.A., and Roelfsema, P.R. (2012). Different glutamate
receptors convey feedforward and recurrent processing in macaque V1. Proc Natl Acad Sci USA
109: 11031–6.
Self, M.W., van Kerkoerle, T., Supèr, H., and Roelfsema, P.R. (2013). Distinct roles of the cortical layers of
area V1 in figure-ground segregation. Curr Biol 23: 2121–9.
Sheinberg, D.L. and Logothetis, N.K. (2001). Noticing familiar objects in real world scenes: the role of
temporal cortical neurons in natural vision. J Neurosci 21: 1340–50.
Sherman, S.M. and Guillery, R.W. (1998). On the actions that one nerve cell can have on
another: distinguishing ‘drivers’ from ‘modulators’. Proc Natl Acad Sci USA 95: 7121–6.
Shmuel, A., Korman, M., Sterkin, A., Harel, M., Ullman, S., Malach, R., and Grinvald, A. (2005).
Retinotopic axis specificity and selective clustering of feedback projections from V2 to V1 in the owl
monkey. J Neurosci 25: 2117–31.

Sillito, A.M., Grieve, K.L., Jones, H.E., Cudeiro, J., and Davis, J. (1995). Visual cortical mechanisms
detecting focal orientation discontinuities. Nature 378: 492–6.
Stettler, D.D., Das, A., Bennett, J., and Gilbert, C.D. (2002). Lateral connectivity and contextual
interactions in macaque primary visual cortex. Neuron 36: 739–50.
Sugihara, T., Qiu, F.T., and von der Heydt, R. (2011). The speed of context integration in the visual cortex.
J Neurophysiol 106: 374–85.
Supèr, H., Spekreijse, H., and Lamme, V.A. (2001). Two distinct modes of sensory processing observed in
monkey primary visual cortex (V1). Nat Neurosci 4: 304–10.
Tanaka, K. (1993). Neuronal mechanisms of object recognition. Science 262: 685–8.
Thorpe, S., Fize, D., and Marlot, C. (1996). Speed of processing in the human visual system. Nature
381: 520–2.
Treisman, A.M. and Gelade, G. (1980). A feature-integration theory of attention. Cogn Psychol 12: 97–136.
Treue, S. and Martinez-Trujillo, J.C. (1999). Feature-based attention influences motion processing gain in
macaque visual cortex. Nature 399: 575–9.
Wannig, A., Stanisor, L., and Roelfsema, P.R. (2011). Automatic spread of attentional response modulation
along Gestalt criteria in primary visual cortex. Nat Neurosci 14: 1243–4.
Williams, K. (1993). Ifenprodil discriminates subtypes of the N-methyl-D-aspartate receptor: selectivity
and mechanisms at recombinant heteromeric receptors. Mol Pharmacol 44: 851–9.
Wolfe, J.M., Cave, K.R., and Franzel, S.L. (1989). Guided search: an alternative to the feature integration
model for visual search. J Exp Psychol Hum Percept Perform 15: 419–33.
Wolfson, S.S. and Landy, M.S. (1998). Examining edge- and region-based texture analysis mechanisms.
Vision Res 38: 439–46.
Wurtz, R.H. and Albano, J.E. (1980). Visual-motor function of the primate superior colliculus.
Annu Rev Neurosci 3: 189–226.
Zhaoping, L. (2005). Border ownership from intracortical interactions in visual area v2. Neuron 47: 143–53.
Zhou, H., Friedman, H.S., and von der Heydt, R. (2000). Coding of border ownership in monkey visual cortex.
J Neurosci 20: 6594–611.
Zipser, K., Lamme, V.A., and Schiller, P.H. (1996). Contextual modulation in primary visual cortex.
J Neurosci 16: 7376–89.
Chapter 17

Neural mechanisms of figure-ground organization: Border-ownership,
competition and perceptual switching

Naoki Kogo and Raymond van Ee

Introduction
Perception of depth order in a natural visual scene, with multiple overlapping surfaces, is a highly
non-trivial task for our visual system. To interpret the visual input—in fact a 2D image containing
a collection of borders between abutting image regions—the visual system must determine how
the borders are being created: which of two overlapping surfaces is closer (‘figure’) and which
continues behind (‘ground’). This so-called ‘figure-ground’ determination involves integration
of contextual visual signals. In this chapter, we review the neural mechanisms of figure-ground
organization.

The properties of border-ownership


The computation of depth order at a border between regions involves assignment of the ‘owner
side’ of the border, i.e. its border-ownership (BOWN): at each location of a border there are two
possible owner sides competing for ownership. Border-ownership is assigned to the surface that
is closer to the viewer, consistent with the border being perceived as the edge of that surface (Nakayama, Shimojo, and
Silverman 1989). When, for example, the visual system is exposed to an image such as shown in
Figure 17.1A, we perceive the orange rectangle to be on top of the green background: the border
is ‘owned’ by the orange side (Figure 17.1B–D).
Border signals and BOWN signals have fundamentally different properties: the border signal indi-
cates solely the existence of the border; the BOWN signal specifies polarity associated with the owner
side of the border. When there are multiple surfaces, BOWN has to be assigned at each location of the
boundary (Figure 17.1E). For example, the orange oval owns the border with the brown square (*),
but the border between the orange oval and the blue square (**) is owned by the blue square. In some
cases, the depth order cannot be determined (Figure 17.1F). Ownership of a border may gradually
shift from one side to the other side (Figure 17.1G). Due to occlusion, BOWN of the vertical border
between the orange and the green surfaces appears to be on the left at the lower part and on the right
at the upper part. As shown in Figure 17.1H, there is an apparent preference for border-ownership by
surfaces with convex shape (Koffka 1935; Peterson and Salvagio 2008; see also Peterson, this volume).
A geometrical layout of borders is not always sufficient to determine the ownership (Figure 17.1I).
Even though the two images have exactly the same borders, the ownership of the border is reversed
because the small oval region is perceived as a hole due to the consistency of its texture with the back-
ground (compare the ownerships at * in left and right, see also Bertamini and Casati, this volume).

BOWN is computed in a context-sensitive manner. The image in Figure 17.1J is
perceived as a green disk on top of an orange rectangle, meaning that the part of the border within
the black circle is owned by the left side, the green disk. When the image is modified such as in
Figure 17.1K, it is perceived as an orange object on top of the large green rectangle and the same
part of the border within the circle is the edge of the orange object. The reversal of BOWN also
happens in Figures 17.1L and 17.1M even though the local properties within the circles are exactly
the same. This clearly indicates that BOWN cannot be determined by the local properties alone.

Neural mechanisms of border-ownership computation


Discovery of border-ownership-sensitive neurons
The laboratory of von der Heydt has produced seminal results, demonstrating that neural activity
associated with border ownership is present in macaque visual cortex (Zhou, Friedman, and von
der Heydt 2000). With single-unit recording, they first specified the receptive field size, as well as
the orientation tuning of neurons in V1, V2, and V4. Subsequently, they presented images such
as shown in Figure 17.2 so that a region border covered the receptive field and matched the pre-
ferred orientation of the neuron. While they kept the geometrical properties within the receptive
field (black ovals) exactly the same, they modified the global context (Figure 17.2Aa and 17.2Ab).
In Figure 17.2Aa1, for example, when the grey square is present on the left side of the border, we
perceive the square as a figure on top of the white background. In Figure 17.2Ab1, on the other
hand, the white square on the right is perceived as being on top of the grey background. In other
words, while the local properties within the receptive field are kept the same, the perceived owner-
ship of the border is reversed.
The responses of the neurons were consistently associated with the preference of the perceived
‘owner side’ of the border. For example, the responses of a neuron shown in Figure 17.2Ac were
stronger when the figural surface was present at the right side. In most of the cases (Figure 17.2A),
the responses were stronger when the visual stimulus implied that the right side surface was closer
to the viewer. Note that the proportion of BOWN-sensitive neurons varied across the visual
cortex: 18% in V1, 59% in V2, and 53% in V4 of all orientation-sensitive neurons, suggesting
hierarchical processing.
Are these neurons truly the neuronal entities involved in BOWN computations? If so, these
neurons must be strongly involved in depth perception. Qiu and von der Heydt (2005), from the
same laboratory, investigated the involvement of these neurons in depth computation. They found
that 21% of neurons in V2 (and 3% in V1) exhibited responses tuned consistently to both the
depth order based on the figure-ground cues and the stereo-disparity cues.

Extra fast processing mechanism of border-ownership computation
The onset latency of the BOWN-sensitive component of the responses is extremely short: 75 ms
from the onset of the input and 27 ms from the onset of the first arriving signals (Figure 17.2B).
Interestingly, the difference between onset latency for a small rectangle and onset latency for
a large rectangle appears to be relatively small (Sugihara, Qiu, and von der Heydt 2011). The
context-sensitive nature of BOWN indicates that the underlying neural mechanisms involve global
interactions, implying that the signals travel a long distance within an extremely short period.
These aspects turn out to be important constraints for developing neural models because the
fast signal processing in the BOWN computation cannot be explained by horizontal connections

(Craft et al. 2007; Sugihara et al. 2011; Zhang and von der Heydt 2010; Zhou et al. 2000). In
macaques, the horizontal connections extend in the range of 2–4 mm in V2 (Amir, Harel, and
Malach 1993; Levitt, Kiper, and Movshon 1994) (note that one degree corresponds to 4–6 mm in
macaques; see, for example, Polimeni, Balasubramanian, and Schwartz 2006). Reaching distal
parts in cortical space using horizontal connections would require polysynaptic connections at
the cost of an increased processing period. Furthermore, the unmyelinated axons of these hori-
zontal connections have low conduction velocities (0.3 m/s; Girard, Hupe, and Bullier 2001).
Based on this analysis, as well as on the fact that the latencies in response were relatively invariant
under different figure sizes, Zhou et al. (2000) suggested that the global interactions in the BOWN
computation are achieved by feedforward-feedback loops. Such loops are physiologically realistic
because feedforward and feedback connections have been shown to involve myelinated axons
with conduction velocities of about 3.5 m/s (Girard et al. 2001), roughly ten times faster than the
horizontal connections. In addition, if the signals are conducted ‘vertically’ between layers, the size of
the figural surfaces would have less influence on the conduction distances. They proposed that the
collective BOWN signals activate a ‘grouping cell’ at a higher processing level, and that the group-
ing cell’s output is fed back to the BOWN-sensitive neurons (Figure 17.2C; Craft et al. 2007).
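The timing argument can be made concrete with the conduction velocities quoted above. The arithmetic below is only a back-of-the-envelope sketch; the chosen cortical distance (a border four degrees away, at five millimetres per degree) is an illustrative assumption:

```python
# Back-of-the-envelope version of the timing argument above, using the
# numbers quoted in the text: unmyelinated horizontal axons conduct at
# ~0.3 m/s, myelinated feedforward/feedback axons at ~3.5 m/s (Girard
# et al. 2001), and one degree of visual angle maps onto roughly
# 4-6 mm of macaque cortex.

def conduction_time_ms(distance_mm, velocity_m_per_s):
    """Conduction time in ms; note that 1 m/s equals 1 mm/ms."""
    return distance_mm / velocity_m_per_s

# Illustrative assumption: a border 4 degrees away, at ~5 mm/degree,
# lies ~20 mm away in cortical space.
cortical_distance_mm = 4 * 5

t_horizontal = conduction_time_ms(cortical_distance_mm, 0.3)  # ~67 ms
t_feedback = conduction_time_ms(cortical_distance_mm, 3.5)    # ~5.7 ms

# The horizontal route is an order of magnitude slower, even before
# adding the synaptic delays of the polysynaptic relays it would need.
```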

Competitive signal processing


For each location and orientation of a border throughout the visual field, there may exist a pair
of BOWN-sensitive neurons with opposite preferred owner sides. This is schematically
drawn in Figure 17.3A for eight different orientations at one single location. The pair matching
the orientation of the border may then initiate the border-ownership competition through which
one of the competing signals becomes more dominant (Figure 17.3B).
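A minimal way to picture this pairwise competition is two units with mutual inhibition. The following sketch is generic winner-take-all dynamics with arbitrary parameter values, not a model from any of the studies cited here:

```python
# Generic winner-take-all toy for a BOWN pair (illustrative parameter
# values, not from any cited study): two units preferring opposite
# owner sides inhibit each other, so a modest contextual bias in the
# input is amplified into outright dominance.

def bown_competition(bias_left, bias_right, w=1.5, dt=0.2, steps=200):
    """Leaky units relaxing towards rectified, mutually inhibited
    drives; returns the final (left, right) activities."""
    left = right = 0.0
    for _ in range(steps):
        d_left = -left + max(0.0, bias_left - w * right)
        d_right = -right + max(0.0, bias_right - w * left)
        left += dt * d_left
        right += dt * d_right
    return left, right

# A 25% input advantage for the left owner side ends in dominance:
left, right = bown_competition(1.0, 0.8)   # left -> ~1.0, right -> ~0.0
```

Because the inhibition is strong enough to destabilize the shared-activity state, even a small contextual bias drives the pair to complete dominance of one owner side.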

Fig. 17.1  The concept of border-ownership (BOWN) and its properties. (a) When an image on the left
is presented, it is interpreted as an orange rectangle on top of a green surface (right). (b) A symbol
of BOWN signal used in this chapter. The straight line is aligned to the boundary and the side arrow
indicates the owner side. (c) At each location of boundaries, there are two possible ownerships that
compete. (d) After establishing the interpretation of an image, one of the competing sides becomes
the owner: inside of the rectangle in this example. (e) There could be multiple surfaces overlapping.
BOWN has to be determined for individual boundary sections between different pairs of surfaces.
Here, the orange oval owns the boundary with the brown square (asterisk), but the boundary
between the orange oval and the blue square is owned by the blue square (double asterisks).
(f) In some cases BOWN cannot be determined such as in this example. There are no cues to favour
one of the two owner sides of the middle boundary. (g) BOWN can be reversed along a single
boundary section. The vertical boundary is perceived to be owned by the orange rectangle near the
bottom but owned by the green surface near the top. (h) The convexity preference of BOWN. The
white regions are associated with more convex shapes than the black regions, and hence subjects
often report the white regions as lying on top of the black background. (https://dl.dropboxusercontent.
com/u/47189685/Convexity%20Context%20Stimuli.zip). (i) Convexity is not a deterministic
factor. On the left, the central disk may be perceived as on top of the oval, but on the right, with
texture consistent with the background, the enclosed area is perceived as a hole through which
part of the background is seen. (j) and (k) In (j), the ownership of the boundary between the orange disk
and the green rectangle belongs to the left while, in (k), it belongs to the right. The local properties
around the boundary are exactly the same in the two images (compare the local properties within the
black circles). Only the rest of the image, the global configuration, differs. (l) and (m) The owner
side is reversed without changing the local properties within the black circles.
[Figure 17.2 appears here. Recoverable panel information: (a) stimulus configurations 1–6 with
responses in spikes/sec; (b) response time courses (0–800 ms) for preferred vs non-preferred
owner side in V1 (n=7), V2 (n=38), and V4 (n=17); (c) grouping-cell scheme with feedforward
and feedback connections.]

While the competition within a BOWN pair concerns the assignment of local depth order, there is also
competition between global interpretations. A stimulus such as shown in Figure 17.3C—the famous
face-vase illusion by Rubin (1921)—evokes two competing perceptual interpretations (two faces vs one
vase). When the two faces are perceived as ‘figures’, the vase is perceived as part of ‘background’. When
perception switches, this relationship is reversed. Hence, this is a bistable figure-ground stimulus. The
perceptual switch corresponds to the reversal of the ownership of the borders. In Figure 17.3D, the
BOWN signal associated with the face side, B1, indicates that the face is closer to the viewer and the
competing BOWN signal, B2, indicates that the vase is closer. The associated depth map for each of the
interpretations specifies either the face or the vase as figural surface, while the locally assigned BOWN
signals coherently indicate the owner side (Figure 17.3E and 17.3F). Bistable figure-ground perception
is a key phenomenon to investigate how global aspects of figure-ground organization and local com-
petitive BOWN computations are being integrated. Moreover, it reveals the temporal dynamics of the
underlying mechanisms (see ‘Computation of bistable figure-ground perception’).
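These temporal dynamics are commonly captured by adding slow adaptation to a mutually inhibiting pair, as in standard models of perceptual rivalry. The sketch below is such a generic toy with arbitrary parameters; it is not a model from the studies discussed here:

```python
# Standard rivalry-style toy (illustrative parameters): giving each
# unit of a mutually inhibiting pair a slow adaptation variable makes
# dominance alternate spontaneously, mimicking face-vase reversals,
# even though both interpretations receive equal input.

def simulate_switches(steps=4000, dt=0.05, w=2.0, g_adapt=2.0,
                      tau_adapt=40.0):
    """Two leaky units with equal drive, mutual inhibition, and slow
    adaptation; returns the number of dominance switches observed."""
    x = [0.6, 0.4]              # slightly asymmetric starting state
    a = [0.0, 0.0]              # adaptation variables
    dominant, switches = 0, 0
    for _ in range(steps):
        drive = [1.0 - w * x[1] - g_adapt * a[0],
                 1.0 - w * x[0] - g_adapt * a[1]]
        for i in range(2):
            x[i] += dt * (-x[i] + max(0.0, drive[i]))
            a[i] += dt / tau_adapt * (x[i] - a[i])
        now = 0 if x[0] > x[1] else 1
        if now != dominant:
            switches += 1
            dominant = now
    return switches
```

Adaptation gradually weakens the dominant interpretation until the suppressed one escapes, producing the alternation; with adaptation switched off (`g_adapt=0`), the initially dominant percept wins forever.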

Brain activity correlated to figure-ground organization and involvement of feedback
In a series of papers, Lamme and colleagues examined neural responses in macaque V1 when a
textured area changed from background to figure (Lamme 1995; Lamme, Rodriguez-Rodriguez,
and Spekreijse 1999; Lamme et al. 2000; Lamme, Zipser, and Spekreijse 1998; Scholte et al. 2008;
Supèr, Spekreijse, and Lamme 2001; Supèr et al. 2003; Supèr and Lamme 2007). They presented a
textured image consisting of a central area whose line orientation was perpendicular to the sur-
rounding line orientation, creating a figure segmented from the background. V1 neurons showed
enhancement in activity only when the classic receptive field was located within the segmented
surface, indicating filling in of the enclosed area. They showed that this enhancement of the neural
activity starts later and is clearly distinguished from the early-onset responses. This long-onset
latency suggests involvement of a feedback mechanism and they proposed that figure-ground
organization is computed at a higher level, and that filling-in signals are fed back to V1 neurons.
They developed a neurocomputational model to reproduce this phenomenon (Roelfsema et al.

Fig. 17.2  BOWN-sensitive neurons in macaque visual cortex. (a) The images shown here are presented so
that the boundary between the surfaces matches the orientation and the position of the classic receptive
fields (black oval) of the recorded neuron. Perceived owner side is reversed between the six figures on
the top (a1–6) and the ones on the bottom (b1–6). In columns 1, 2, 5, and 6, the figures in the top
row create BOWN on the left side, while those in the bottom row create it on the right side. In
columns 3 and 4, BOWN is on the right in the top row and on the left in the bottom row. As shown
in (c), the neural responses reflect the reversal of ownership, showing, in this example, a preference
for the right side. (b) The
time course of the neural response to BOWN. The BOWN-sensitive component (the difference between
the responses to the preferred and non-preferred owner side) emerges quickly after the stimulus onset.
(c) Because of the short onset latency of the BOWN-sensitive component and its minimal dependence
on figure size, Craft et al. (2007) hypothesized that BOWN is computed by feedback connections.
A ‘grouping cell’ at a higher level collects the BOWN signals through feedforward connections and
quickly distributes the signal to the congruent BOWN-sensitive neurons through feedback connections.
(a) Reproduced from Hong Zhou, Howard S. Friedman, and Rüdiger von der Heydt, Coding of Border Ownership
in Monkey Visual Cortex, The Journal of Neuroscience, 20(17), pp. 6594–6611 Copyright © 2000, The Society
for Neuroscience. (b) Reproduced from Hong Zhou, Howard S. Friedman, and Rüdiger von der Heydt, Coding of
Border Ownership in Monkey Visual Cortex, The Journal of Neuroscience, 20(17), pp. 6594–6611 Copyright ©
2000, The Society for Neuroscience. (c) Data from Edward Craft, Hartmut Schütze, Ernst Niebur, and Rüdiger von
der Heydt, A Neural Model of Figure–Ground Organization, Journal of Neurophysiology, 97(6), pp. 4310–4326
DOI: 10.1152/jn.00203.2007, 2007.

[Figure 17.3 appears here: panels (a)–(f), including BOWN signals B1 and B2 in (d) and depth
maps in (e) and (f).]
Fig. 17.3  (a) BOWN-sensitive neurons may be distributed to cover the whole visual field (grey square)
and, at each location (e.g. black dot), there is a bank of neurons assigned for different orientations and
for opposite ownership sides. (b) At the end of the computation, one of the competing signals may
become more dominant than the other. (c–f) When a ‘face or vase’ image (c) is presented, bistable
figure-ground perception is created. The perceptual switch of figure-ground corresponds to the coherent
reversal of BOWN at each location. For example, at the boundary on the ‘nose’ (d), the ownerships are
constantly reversing (B1 and B2) corresponding to the perception of ‘face’ (e) or ‘vase’ (f).

2002; see also Self and Roelfsema, this volume). In this model multiple layers were hierarchically
organized through feedforward and feedback connections, and increasing receptive field size with
higher levels of processing accounted for the filling in of segmented areas.
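The two response components, an early feedforward transient shared by figure and ground and a late modulation confined to the figure, can be caricatured as follows; the rates and latencies are illustrative assumptions, not fitted values:

```python
# Caricature of the two response components described above (rates and
# latencies are illustrative assumptions): every V1 unit shows an
# early feedforward transient, while only units whose receptive field
# lies on the figure receive a delayed feedback enhancement.

def v1_response(t_ms, on_figure, feedback_delay_ms=100.0):
    """Firing rate (spikes/s) at time t_ms after stimulus onset."""
    rate = 0.0
    if t_ms >= 40.0:        # assumed feedforward onset latency
        rate += 20.0        # early component, figure and ground alike
    if on_figure and t_ms >= feedback_delay_ms:
        rate += 10.0        # late, feedback-driven figure enhancement
    return rate

# Early on, figure and ground units are indistinguishable; the
# figure-ground signal emerges only after the feedback delay.
assert v1_response(60, True) == v1_response(60, False) == 20.0
assert v1_response(150, True) > v1_response(150, False)
```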
Qiu, Sugihara, and von der Heydt (2007) demonstrated the effect of attention on BOWN-
sensitive activity and they argued that grouping cells (integrating the BOWN signals)
constitute an efficient platform to implement selective attention (Craft et al. 2007; Mihalas
et al. 2011). fMRI results by Fang, Boyaci, and Kersten (2009) demonstrated that area V2 in
humans is sensitive to BOWN and that this BOWN sensitivity can be modified by attention. A
recent study by Poort et al. (2012) reported that a characteristic late component in the neural
responses—reflecting the perception of figure-ground—can also be modified by attention.
Neural correlates of figure-ground organization have also been investigated using other experi-
mental paradigms. Appelbaum et al. (2006, 2008) exposed observers to a homogeneous texture in
which figure and background differed only in their flicker frequencies. Using steady-state EEG in
combination with fMRI, they reported that the ‘frequency tagged’ signals from the figure resided
in the lateral cortex, while the ones for the background resided in the dorsal cortex. Likova and
Tyler (2008), using a different random-dot refresh rate for figure and background, reported that
fMRI signals in V1 and V2 were associated with a suppression of the background. They suggested
that the suppression reflected feedback from higher processing levels.
Using MEG, Parkkonen et al. (2008) investigated neural activity corresponding to a perceptual
switch during bistable figure-ground perception. They used a modified face or vase image in which
noise was superimposed. The noise was updated with distinct frequency tags for the face region and
the vase region. They reported activity modulations in the early visual cortex including primary
visual cortex corresponding to the perceptual switches. Because the perceptual switches are linked
to the way the image is interpreted at a higher level (by coherently integrating the lower-level sig-
nals), they suggested that top-down influences modify low-level neural activity. Other studies using
face or vase images also reported the involvement of top-down feedback in perceptual switching:
patients with lesions in the prefrontal cortex were less able to exert voluntary control over percep-
tual switching than normal subjects (Windmann et al. 2006), suggesting that the prefrontal cortex is
capable of controlling perceptual switching by sending feedback signals to the lower level. In
addition, variation of the fMRI activity in the fusiform face area correlates with the subsequent perception
of a face, indicating that the ongoing level of face-sensitive neural activity influences the lower-level
activity involved in the switching (Hesselmann and Malach 2011). Pitts et al. (2007; 2011) reported
that the P1 and N1 components in EEG signals correlated with a perceptual face-vase switch and
they suggested that the perceptual switch was modulated by attention. These empirical data suggest
dynamic interactions between lower-level processing and higher-level processing.

Hierarchical organization and involvement of top-down feedback projections
The possible involvement of feedback in figure-ground organization necessitated a new way to
view its underlying computational mechanism. Unlike the conventional feedforward-only view, in
which the sequence of signal processing corresponds to the order of the hierarchical organization,
the causal relationships between different perceptual properties in a feedback system with mutual
interactions have to be analysed with caution. The involvement of
a feedback process may entail the possibility that BOWN/figure-ground computation is being
influenced by image properties such as familiarity or anticipation of a surface shape, and even
other higher-level factors such as emotion. The exact computational mechanism for the feedback
modulation of such higher-level properties is, however, not known. Furthermore, it is also pos-
sible that there is a dissociation between the lower-level activity such as BOWN-sensitive neurons
and cognitive figure-ground perception. As explained in this section, this is an issue that is still
under debate, and a clear picture of the dynamic computational processes awaits future research.

Fig. 17.4  (a) The familiarity of shape influences figure-ground perception. When an image with
the silhouette of a girl on both sides is presented, subjects tend to choose the ‘girl’ areas as
figures. When the same image is presented upside down (right), this bias disappears. Note that the
geometrical properties of the boundaries are the same in both images; only on the left is the familiar
shape recognized. (Reproduced from Mary A. Peterson, Erin M. Harvey, and Hollis J. Weidenbacher,
Shape recognition contributions to figure-ground reversal: Which route counts? Journal of
Experimental Psychology: Human Perception and Performance, 17(4), pp. 1075–1089. http://dx.doi.
org/10.1037/0096-1523.17.4.1075, Copyright © 1991, American Psychological Association) (b)
A model proposed by Vecera and O’Reilly. The ‘boundary’ unit (corresponding to BOWN signals),
‘figure’ unit (for figure-ground organization, red asterisk), and ‘object’ unit (shape/object detection)
are hierarchically organized with mutual connections between layers. (Reproduced from Shaun
P. Vecera and Randall C. O'Reilly, Figure-ground organization and object recognition processes:

In behavioural studies, Peterson (Peterson, Harvey, and Weidenbacher 1991) reported that
when an image is segmented into several competing shapes, the one that has a familiar shape
tends to be chosen as a figure. In Figure 17.4A left, the two black areas are perceived as a silhouette
of a woman. Subjects selected these black areas as a figure more often than the white area. This
is not due to the local properties, such as the curvature of the borders, because when the image is
shown upside down (Figure 17.4A right), the subjects’ preference for choosing the black areas as
figure was significantly reduced. This result suggests that information of competing areas is ana-
lysed at a higher level, and that the familiarity of the shapes can influence which area becomes the
figure through feedback projections (see also Peterson, this volume).
Using hierarchical layers that are interconnected by feedforward-feedback connections,
Kienker et al. (1986) incorporated the effect of attention on figure-ground organization. Vecera
and O’Reilly (1998) further elaborated on this work (Figure 17.4B). This model, with hierar-
chical layers that are mutually connected, includes a figure-ground layer (‘figure unit’) and an
object-detection layer (‘object unit’). The figure-ground layer is situated before the object-detection
process but they interact with one another through mutual connections. Vecera and O’Reilly
noted that the results by Peterson et al., on the influence of familiarity on figure-ground organi-
zation, could be explained this way, but Peterson pointed out that the model can reproduce the
effect of familiarity only when the low-level figure-ground cues are ambiguous (Peterson 1999,
but see the counter-argument by Vecera and O’Reilly 2000). Using examples in which the unam-
biguous low-level cues can be superseded by the familiarity cues (Peterson et al. 1991; Peterson
and Gibson 1993), Peterson argued that the figure-ground-first approach is limited and offered
a different model (Figure 17.4C). Note that, in Vecera’s model, a layer is connected only to the
immediately higher and the immediately lower layer: the connections do not go beyond them to
connect to the two (or more) layers forward or backward directly (Figure 17.4D left). In contrast,
Peterson’s model has a bypass that connects the sensory signals (low-level properties before
figure-ground) directly to the object-detection layer (Figure 17.4C). In other words, the key ele-
ment in Peterson’s model involves mutual connections between multiple layers (Figure 17.4D,
right, see Felleman and Van Essen 1991 for multi-level mutual connections).
Some neurophysiological studies investigated the relationship between depth order percep-
tion and neural activity in the lateral occipital complex (LOC) in humans, and inferior-temporal
region (IT) in monkeys. When a surface is presented repeatedly, the brain areas that are activated
in response to the shape of the surface adapt, and neural activity declines. Using fMRI, Kourtzi
and Kanwisher (2001) found the same amount of adaptation in area LOC both when a surface is
presented behind bars and in front of bars (Figure 17.5A). Note that when the surface is behind
the bars, the surface is segmented into several subregions. If depth order had not been computed,
these subregions would not have been recognized as parts of a single surface. This result suggests
that the shape of the object is established after the depth computation, causing adaptation in object
area LOC. Furthermore, they showed that when an image is divided into two areas, and stereo

An interactive account, Journal of Experimental Psychology: Human Perception and Performance,


24(2), pp. 441–462. http://dx.doi.org/10.1037/0096-1523.24.2.441, Copyright (c) 1998, American
Psychological Association) (c) The model proposed by Peterson. Note that there is a route from the
input to the object detection unit (blue asterisk) bypassing the figure-ground unit (red asterisk).
(Reproduced from Mary A. Peterson, What’s in a stage name? Comment on Vecera and O'Reilly,
Journal of Experimental Psychology: Human Perception and Performance, 25(1), pp. 276–286. http://
dx.doi.org/10.1037/0096-1523.25.1.276, Copyright (c) 1999, American Psychological Association)
(d) In general, a hierarchical organization may have mutual connections only between the next layers
(left) or between all layers with bypassing connections (right).

Fig. 17.5  Neurophysiological studies showing the relationship between the depth order of surfaces
and the neural activity reflecting their shapes. A. From Kourtzi and Kanwisher (2001). a. The ‘same
shape’ condition with reversed depth order. An object is perceived to be behind the bars (left) or
in front of the bars (right). b. The ‘same contour’ condition with reversed depth order. Using a
stereoscope, the depth order of the two halves in the image can be reversed, the figure (F) could
be the left half (left) or the right half (right). c. FMRI recording from LOC (lateral occipital complex
in human) showing the equivalent amount of adaptation when the same shapes are presented
in sequence, irrespective of the reversal of the depth order (orange: same shape with reversed
depth order, red: same shape without the reversal). (Reprinted with permission from Kourtzi and
Kanwisher, 2001) B. From Baylis and Driver (2001). a. Stimuli used. Note that in the contrast reversal
and the mirror reversal, the shape of the surface that is perceived to be a figure is the same. Only
in the figure-ground reversal, the other side of the central boundary becomes the figure (hence the
shape of the perceived figure changes). b. A representative pattern of responses from a single cell in
IT (inferior temporal cortex in macaque). The numbers 1~4 correspond to the different shapes and
the letters a~h correspond to the figural surfaces indicated inside the figure in a. The overall pattern
of the plot does not change significantly under contrast reversal or mirror reversal, but it does under
the figure-ground reversal.
Reprinted by permission from Macmillan Publishers Ltd: Nature Neuroscience, 4(9), Gordon C. Baylis and Jon
Driver, Shape-coding in IT cells generalizes over contrast and mirror reversal, but not figure-ground reversal,
pp. 937–942, doi:10.1038/nn0901-937, Copyright © 2001, Nature Publishing Group.

disparity specifies that one of the two regions is figure (Figure 17.5Ab), adaptation is observed only
when the same region is presented as a figure in the second presentation (Figure 17.5Ad). Based
on these results, Kourtzi and Kanwisher suggested that figure-ground processing occurs prior to
shape perception. Baylis and Driver (2001) used specially constructed images (Figure 17.5Ba) in
combination with single-unit recordings from monkeys. In these images, the central border was either kept
constant or mirror-reversed and contrast polarity was reversed. In addition, by creating borders
to enclose one of the two divided regions, they created eight different images. In these images,
the ‘mirror-reversal’ condition and the ‘contrast-reversal’ condition create the perception that the
figures have the same shape. In the ‘figure-reversal’ condition, on the other hand (the opposite side
is enclosed and perceived as the figure), the shape of the figure changes. The neural responses
in IT neurons showed clear correlation in the mirror-reversal and the contrast-reversal conditions
but not in the figure-reversal condition. Because the shape of the figure was kept constant in the
former two conditions while in the latter condition it changed, Baylis and Driver suggested that the
figure-ground organization influences the shape detection process in IT.
Although these neurophysiological data suggest an apparent sequence of the signal processing
with the figure-ground analysis first and the shape analysis later, they do not exclude the possibility
that information about both of the areas competing for depth order is analysed at the higher
level. It is possible that the two competing BOWN signals for opposite owner sides are sent to the
higher level to analyse the shape information at both sides that then, in turn, influence the BOWN
computation. It is also possible that the borders between the competing areas are ‘parsed’ and
sent to the higher level via a bypass route, as suggested by Peterson (1999, Figure 17.4C). This
transient phase of signal processing may not be reflected in the long time-scale fMRI recordings of
Kourtzi and Kanwisher, and it may not be detected in the correlation analysis of Baylis and Driver.
However, it should be noted that, so far, there is no evidence for the influence of the neural activ-
ity in IT (or LOC) on the lower-level BOWN signals. Moreover, even if this feedback occurs, the
shape-detection mechanism has to overcome the longer latency of the computation: the latency
of IT responses is much longer than that of the BOWN-sensitive responses, and additional conduction
time is required for the feedback (see Brincat and Connor 2006; Bullier 2001). Therefore, two
possibilities still remain:  either the dynamic mutual interaction between the BOWN-sensitive
area and the shape-sensitive area indeed occurs, or there is a dissociation between low-level
‘BOWN-sensitive’ neural activity and the cognitive level of figure-ground organization. In a
dynamically organized visual system with a multi-level mutual connection (Figure 17.4D right),
the apparent sequence of the signal processing may depend on the context of each given image as
well as the state of the brain. Future research is needed to provide clearer descriptions of mecha-
nisms underlying such a dynamic system.

Computational models
The early figure-ground computational modelling work of Kienker et al. (1986) implemented
an ‘edge unit’ that was excited when a surface was present at its preferred side, and inhib-
ited when it was not. Such edge-assignment computation is in fact equivalent to BOWN
computation. Ever since this pioneering work, several computational models have been
developed for figure-ground organization (Domijan and Setic 2008; Finkel and Sajda 1992;
Grossberg 1993; Kelly and Grossberg 2000; Kumaran, Geiger, and Gurvits 1996; Peterhans and
Heitger 2001; Roelfsema et  al. 2002; Sajda and Finkel 1995; Thielscher and Neumann 2008;
Vecera and O’Reilly 1998; Williams and Hanson 1996). More relevantly, after the discovery of
BOWN-sensitive neurons (Zhou et al. 2000, see ‘Discovery of border-ownership-sensitive neu-
rons’), recent computational models particularly focus on modelling the responses of these
BOWN-sensitive neurons (Baek and Sajda 2005; Craft et al. 2007; Froyen, Feldman, and Singh

Fig. 17.6  (a) and (b) To reproduce the opposite perceived depth order of images in Figure 17.1J
and K, the global relationships between the BOWN signals need to be reflected. The computational
models have to implement an algorithm for the global interaction so that the ownership at the
location indicated by the black dot, for example, is on the left in (a) and on the right in (b). Note that
the dashed lines here indicate the interactions and do not indicate direct axonal connections. (c) To
create the convexity preference, an algorithm must enhance the BOWN signals that are ‘facing’ each
other as shown left. In this way, BOWN signals with inward preference would be the winner, making
the interior of the enclosed boundary as the figure (right). (d) If the algorithm works in favour of the
BOWN pairs directing outward, the outside of the boundary would be the figure (foreground), and
the interior would become a hole (concavity preference). (e)~(g). BOWN computation and complex
shapes. (e) When a surface with a complex shape is presented, a rule of ‘consistency’ in BOWN
signals by detecting the convexity relationship may be violated. In the algorithm, the pair of BOWN

2010; Jehee, Lamme, and Roelfsema 2007; Kikuchi and Akashi 2001; Kikuchi and Fukushima
2003; Kogo et  al. 2010; Layton, Mingoll, and Yazdanbakhsh 2012; Mihalas et  al. 2011; Sakai
and Nishimura 2006; Sakai et al. 2012; Zhaoping 2005). As described above, one of the promi-
nent properties of figure-ground perception is its context sensitivity. While BOWN signals are
assigned locally, their activity reflects the global configuration. How does the brain process such
global information?

Computational models of BOWN


In essence, the BOWN computation creates a biased response at each location of the border for
the two competing signals associated with opposite preferred owner sides. Models differ in
their implementation of the global comparison algorithm that assigns the ‘consistency’ of the
owner side (Figure 17.6A and 17.6B).
In Zhaoping’s model (2005), the BOWN signals of the line segments are compared so that con-
sistent pairs are excited and inconsistent ones are inhibited. This comparison propagates along
the borderlines. In Craft’s model (Craft et al. 2007), the ‘grouping cell’ at a higher level collects the
vector components of BOWN signals matching the inward direction of the annulus. The result is
fed back to the BOWN-sensitive neurons. The BOWN signals that prefer the inside of the annulus
as owner side are enhanced and the ones that prefer the opposite owner side are inhibited (Figure
17.2C). In Jehee’s model (Jehee et al. 2007), BOWN-sensitive cells are activated by the signals
from contour-detection cells. The contour elements forming the arm of the L-junction excite the
BOWN signals that prefer the inner area of the junction. The model is constructed hierarchically
with increasing size of receptive fields. The BOWN-sensitive cells at each layer send top-down
connections to the ones at the lower layer, thereby exciting the BOWN-sensitive cells with the
same preferred owner side and inhibiting the others.
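The grouping-cell scheme described above can be sketched in a few lines. In this toy version (function names, coordinates, and weighting are our own illustrative choices, not any published implementation), a grouping cell sums, over border locations, the component of each BOWN signal's preferred ownership direction that points toward the cell's centre, so inward-pointing (convex) configurations drive it most strongly.

```python
import math

def grouping_support(center, bowns):
    """Toy 'grouping cell': each BOWN signal votes with the component of
    its preferred ownership direction that points toward the cell centre."""
    total = 0.0
    for (px, py), (dx, dy) in bowns:
        vx, vy = center[0] - px, center[1] - py   # border point -> centre
        norm = math.hypot(vx, vy) or 1.0
        total += (dx * vx + dy * vy) / norm       # cosine-weighted vote
    return total

# BOWN signals on a square border: inward-pointing (figure inside) versus
# outward-pointing (figure outside) candidates at the same four locations.
border = [(0.0, 1.0), (0.0, -1.0), (1.0, 0.0), (-1.0, 0.0)]
inward = [((x, y), (-x, -y)) for x, y in border]
outward = [((x, y), (x, y)) for x, y in border]
```

A grouping cell centred inside the square receives maximal support from the inward candidates and negative support from the outward ones; feeding this back (enhancing the matching BOWN signals and suppressing their competitors) is the loop the text describes.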
Kogo’s model, called DISC (differentiation integration for surface completion), detects pairs
of BOWN signals that point to one another by implementing a specific geometric rule. In this
way, the model specifies whether a pair of BOWN signals is in ‘agreement’ or in ‘disagreement’
(Kogo et al. 2010; Kogo, Galli, and Wagemans 2011). The pair in agreement excite one another’s
activity and the pair in disagreement inhibit activity. All possible combinations of BOWN signals
are being compared. The integration of BOWN signals creates a depth map. In addition, there
is mutual interaction between BOWN and the depth map (see Section ‘Computation of bistable
figure-ground perception’).
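A pairwise ‘agreement’ rule of this kind can be sketched as follows (a deliberate simplification of our own, not the actual DISC geometry): two BOWN signals agree when each one's preferred ownership direction has a positive component toward the other's location, i.e. when they ‘face’ each other as on the border of a convex figure.

```python
def agreement(p1, d1, p2, d2):
    """+1 if two BOWN signals 'face' each other (convex, figure-inside
    configuration), -1 otherwise. A crude stand-in for the geometric
    rule used in DISC-style models."""
    vx, vy = p2[0] - p1[0], p2[1] - p1[1]       # vector from signal 1 to 2
    toward_2 = d1[0] * vx + d1[1] * vy          # does d1 point toward p2?
    toward_1 = -(d2[0] * vx + d2[1] * vy)       # does d2 point toward p1?
    return 1 if toward_2 > 0 and toward_1 > 0 else -1

def support(bowns):
    """Net excitation a set of BOWN candidates gives one another when
    all possible ordered pairs are compared."""
    return sum(agreement(*bowns[i], *bowns[j])
               for i in range(len(bowns))
               for j in range(len(bowns)) if i != j)

# Border points of a square with inward (convex) vs outward ownership.
pts = [(0.0, 1.0), (0.0, -1.0), (1.0, 0.0), (-1.0, 0.0)]
inward = [(p, (-p[0], -p[1])) for p in pts]
outward = [(p, p) for p in pts]
```

When all pairs are compared, the inward set receives maximal mutual support and the outward set maximal inhibition, which is one way the convexity preference discussed later in this section can arise.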
In another recent model (Froyen et al. 2010), BOWN signals are not directly compared, but
instead interact with a skeleton signal consisting of an object’s medial axis, representing the basic
structure of its shape (Blum 1973; Feldman and Singh 2006). In this model, the BOWN and
the skeleton signals are quantified in a Bayesian framework in terms of posteriors, resulting in
dynamical development of the BOWN signals and the skeleton signals.
All of the algorithms reviewed above were developed to create a bias at each location in the
competition of the BOWN signals with the opposite preferred owner side. These algorithms
share a preference for convexity. The pair of BOWN signals shown in Figure 17.6C constitute

signals, B0 and B1, are considered to be ‘in agreement’ while B0 and B2 are not. (f) The grouping cells
group coherent BOWN signals within the relatively compact parts of the complex shape but may not
group distal but consistent pairs (e.g. B0 and B2) in a complex shape. (g) The model that implemented
the dynamic interaction between the skeleton signals and BOWN signals detects the ‘consistency’ of
BOWN signals such as B0, B1, and B2, based on their association to the same skeleton.

the BOWN signals of a convex region (inside being the figure). The pair in Figure 17.6D, on the
other hand, indicate the relationship of BOWN signals for a concave surface (outside being the
figure, inside being a hole). To reproduce the convexity preference, the BOWN pairs for convexity
have to gain stronger mutual excitation than the BOWN pairs for concavity. The mutual excita-
tion and inhibition rules in Zhaoping’s model, the inner side preference in Jehee’s model, as well
as the geometric definition of agreement in Kogo’s model, all work in favour of the BOWN pairs
in the convex configurations. In Craft’s model, the BOWN signals’ vector components matching
the inward direction of the annulus enable grouping of BOWN signals that point to one another.
Hence, it also favours convex configurations. The convexity preference of the visual system, and its
possible origin in ecological factors, was already emphasized in Gestalt psychology (Kanizsa and Gerbino 1976;
Koffka 1935; Rubin 1958). It is possible that the enclosure of the contours of individual objects
and the general tendency of finding convex shapes in the environment may have caused the visual
system to develop such biased processing.
BOWN is not just about the computation of figure-ground organization with only one figural
surface present in the image. The model should be able to assign depth order for multiple surfaces
(Figure 17.1E). For this, the local configuration of a T-junction plays a key role. A  T-junction
is created when three surfaces with different surface properties overlap. The existence of a
T-junction strongly suggests that the surface above the top of the T is the occluder and the stem of
the T belongs to one of the surfaces that are occluded. Depth order can be modelled by process-
ing the consistency of the occluder side according to this rule (Thielscher and Neumann 2008).
Zhaoping, Craft, Kogo, and Froyen’s models, mentioned above, implemented an algorithm to
reflect the configuration of T-junctions and are capable of computing depth order for overlapping
surfaces. A different model developed by Roelfsema et al. (2002) computes filling in of textured
surfaces by reflecting the increasing size of receptive fields in the hierarchy of the visual cortex,
but it is unknown how this model incorporates depth order implied by T-junctions (note that the
configuration of T-junctions is independent of surface size).
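The T-junction rule above can be illustrated with a toy classifier (entirely schematic; as discussed later, cortex may rely on end-stopped cells rather than explicit junction detectors): of the three edges radiating from the junction, the two collinear ones form the top of the T and belong to the occluding surface, while the stem belongs to an occluded surface.

```python
import math

def classify_t_junction(angles, tol=0.1):
    """Given the directions (radians) of three edges radiating from a
    junction, return (indices of the collinear pair, index of the stem).
    The collinear pair is the occluder's border; the stem edge belongs
    to an occluded surface. Returns (None, None) if no T is found."""
    for i in range(3):
        for j in range(i + 1, 3):
            # Collinear edges radiate in opposite directions (pi apart).
            d = abs(angles[i] - angles[j]) % (2 * math.pi)
            if abs(d - math.pi) < tol:
                return (i, j), 3 - i - j
    return None, None

# A horizontal border (edges at 0 and pi) with a stem going down (-pi/2):
top, stem = classify_t_junction([0.0, math.pi, -math.pi / 2])
# top == (0, 1) -> the occluder's border; stem == 2 -> the occluded surface.
```

A depth-order model would then propagate this local ‘occluder side’ assignment along the border and check it for consistency with the other junctions of the image.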
One of the challenges of the current theories of BOWN computation is how to create BOWN
signals properly in complex shapes. This demands further elaboration of current computa-
tional models. When an object such as shown in Figure 17.6E is presented, the figure-ground
organization is immediately clear. However, the consistency-detection algorithm implemented in,
for example, Kogo’s DISC model does not yield coherent BOWN along the border of complex
shapes. The BOWN signal at the black dot (B0) is in agreement with the one that points to it,
e.g. B1. On the other hand, the BOWN signals far from it, e.g. B2, violate the ‘consistency’ rule,
while it is perceptually evident that they are in agreement. In Craft’s model, the grouping cells
with the annulus-shaped receptive field may detect the consistency of BOWN signals at close
distances within a complex shape (e.g. B0 and B1); nevertheless, the BOWN signals far apart
such as B0 and B2 would not be grouped by the grouping cells (Figure 17.6F). To detect con-
sistency of BOWN signals it may be necessary to group the grouping cells along the surface.
Although iterative computation of current models exhibits robustness to a certain extent, it
is unknown if their responses fully match human perception. The approach taken by Froyen
using the dynamic interactions of the BOWN signals and the skeleton signals may give a hint
as to how to solve this problem. As shown in Figure 17.6G, if BOWN signals belong to the
same skeleton, they are considered to be consistent (B0, B1, and B2 are all in agreement with the
skeleton of the surface).
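The skeleton-association idea can be sketched as follows (a toy construction of our own, not the Bayesian formulation of Froyen et al. 2010): cast a short ray from each border point along its preferred ownership direction, and treat BOWN signals as mutually consistent when every ray reaches the same object's medial axis.

```python
def axis_hit(p, d, skeleton, tol=0.3, reach=3.0, steps=60):
    """March from border point p along ownership direction d and return
    the first skeleton point approached within tol, or None."""
    for k in range(1, steps + 1):
        x = p[0] + d[0] * reach * k / steps
        y = p[1] + d[1] * reach * k / steps
        for s in skeleton:
            if (x - s[0]) ** 2 + (y - s[1]) ** 2 <= tol ** 2:
                return s
    return None

def consistent(bowns, skeleton):
    """BOWN signals count as consistent when each one points into the
    same skeleton, however far apart they lie on the border."""
    return all(axis_hit(p, d, skeleton) is not None for p, d in bowns)

# Medial axis of a horizontal bar (y = 0, x in [0, 9.5]) and three BOWN
# signals on its border, all preferring the bar's interior as owner.
axis = [(0.5 * k, 0.0) for k in range(20)]
inward = [((3.0, 1.0), (0.0, -1.0)),    # top border, pointing down
          ((8.0, -1.0), (0.0, 1.0)),    # bottom border, pointing up
          ((10.0, 0.0), (-1.0, 0.0))]   # right end, pointing left
```

All three rays meet the same axis, so even border locations far apart (compare B0 and B2 in Figure 17.6G) are classified as consistent, while flipping any ownership direction outward makes its ray miss the skeleton.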
The analysis of the onset latencies of BOWN-sensitive neural activity led von der Heydt’s
group to conclude that the BOWN signals are being grouped at a higher level with ‘grouping
cells’. Coincidentally, the research on shape recognition led to the development of the concept of
skeleton. Note that grouping cells are activated along the medial axis of the surface. This means
that the requirement of BOWN signal grouping and the requirement of shape representation
have in fact converged on the same concept. It is interesting to investigate whether the neural
activity that corresponds to the grouping and medial axis signals actually exists in the visual neu-
ral system. Lee et al. (1998) reported that the late modulation of neural activity in V1 (see ‘Brain
activity correlated to figure-ground organization and involvement of feedback’) shows a peak,
possibly reflecting the increased neural activity at the higher level associated with the centre of
the surface. They suggested that this corresponds to the medial axis computation. In more recent
work, Hung, Carlson, and Connor (2012) reported that neurons in macaque inferior temporal
cortex (IT) are tuned to the medial axis of a given object and Lescroart and Biederman (2013)
reported that fMRI signals become more and more tuned to the medial axis starting from V3
to higher processing levels in the visual cortex. The current insights concerning neural mecha-
nisms may suggest that we are now approaching an increasingly integrated view of the underlying
mechanisms.

Computation of bistable figure-ground perception


As described, border-ownership competition likely plays a key role in bistable figure-ground per-
ception, such as for the face-vase illusion (Rubin 1921). Investigation of bistable perception may
shed light on the underlying mechanisms of the figure-ground organization.
To model bistable figure-ground perception Kogo and colleagues further elaborated on the
DISC model (Kogo et al. 2011, Figure 17.7). The depth map that is being created as the results of
integration of BOWN is fed back to the lower level to influence the BOWN computation. Those
top-down feedback connections enhance the BOWN signals at each location that are in agree-
ment with the depth order, and inhibit the ones that are competing. The modified BOWN signals
are, in turn, being integrated to renew the depth map. Hence, the depth signal is enhanced by this
positive feedback at first. Due to neural adaptation, however, the depth signals gradually decay.
Due to the combination of this decay and noisy fluctuation of BOWN signals, the depth order gets
reversed. Consider an example for face-vase bistability. If at one moment in time an area, say the
face area, happens to be higher in the depth map than the other area (the vase area), the positive
feedback loop enhances the face percept at first. However, due to adaptation the depth signals
decay gradually. The noisy decaying depth signals lead to a switch in perception and the vase
becomes figure. After the switch, the face signals recover from adaptation. In this way, the depth
order of the face and the vase reverse stochastically (Figure 17.7C).
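The switching dynamics described above can be mimicked with a deliberately small two-state simulation (parameter values and variable names are illustrative only, not those of the published model): positive feedback pulls the depth difference toward the current winner, the winner's adaptation gradually cancels that pull, and noise eventually flips the sign.

```python
import random

def simulate_switches(iters=5000, gain=1.0, rate=0.1,
                      decay=0.02, recover=0.01, noise=0.3, seed=1):
    """Count perceptual switches in a toy feedback loop: `depth` is the
    face-minus-vase depth difference; the dominant interpretation is
    reinforced, adapts, and is eventually overturned by noise."""
    rng = random.Random(seed)
    depth, adapt_face, adapt_vase = 0.0, 0.0, 0.0
    switches, prev_sign = 0, 1
    for _ in range(iters):
        face_wins = depth >= 0
        drive = gain if face_wins else -gain          # feedback favours winner
        adapt = adapt_face if face_wins else -adapt_vase
        depth += rate * (drive - adapt - depth) + rng.gauss(0.0, noise)
        if face_wins:                                  # winner adapts,
            adapt_face += decay * (gain - adapt_face)  # loser recovers
            adapt_vase -= recover * adapt_vase
        else:
            adapt_vase += decay * (gain - adapt_vase)
            adapt_face -= recover * adapt_face
        sign = 1 if depth >= 0 else -1
        if sign != prev_sign:
            switches, prev_sign = switches + 1, sign
    return switches
```

Because the winner's drive is steadily eroded by its own adaptation, the depth difference hovers near zero and noise triggers sign reversals, giving the kind of stochastic alternation plotted in Figure 17.7C; with the noise removed, the simulation settles on one interpretation and never switches.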

Discussion
This chapter commenced by describing the importance of assigning depth order at borders to
establish figure-ground organization. We then described that neurons in visual cortex show
responses corresponding to the perceived depth order at borders. Thus, the concept of edge
assignment, developed by behavioural studies, has a neural counterpart:  the BOWN-sensitive
neurons. Insight into the underpinning neural activity and how this activity leads to figure-ground
perception is still developing.
BOWN signals may be considered to be binary signals in the sense that occlusion cues only
indicate depth order but not quantitative depth (unlike stereo disparity). Nevertheless, consider
configurations such as in Figure 17.1E and 17.1G. In Figure 17.1E, multiple surfaces overlap. The
perceived depth between the blue rectangle and the orange oval is smaller than the perceived
depth between the blue rectangle and the green rectangle. Furthermore, Figure 17.1G indicates

Fig. 17.7  A computational model of bistable figure-ground perception. (a) It is assumed that BOWN
signal at each location is computed by the global interaction. (b) The BOWN signals are sent, through
the feedforward connections (FF), to the higher level, and are integrated to create the depth map. The
result is then sent back, through the feedback connections (FB), to the BOWN computation layer. (c)
The response of the model plotted as the depth difference between the face area and the vase area.
The positive values indicate that the face perception is dominant and the negative values indicate the
vase perception. In the model, noise is added to the BOWN signals and hence the depth values fluctuate.
Furthermore, the adaptation process and its recovery are implemented in the feedback signals. The
iteration of the feedback system creates a strong ‘face’ response at first in this example. Due to
adaptation, the response gradually weakens and the fluctuating response eventually reverses to the
‘vase’ response. Adaptation of the vase response then weakens it in turn, while the face signals
recover from adaptation; this causes another perceptual switch. Over a long time course, the model
shows stochastic perceptual switching between the face and the vase responses.
Reprinted from Vision Research, 51(18), Naoki Kogo, Alessandra Galli, and Johan Wagemans, Switching dynamics
of border ownership: A stochastic model for bi-stable perception, pp. 2085–98, Copyright (2011), with permission
from Elsevier.

that, when there are inconsistent occlusion cues along a border, the depth difference along the
border gradually changes. Whether the BOWN-sensitive signals in visual cortex reflect these
quantitative differences or whether these differences emerge after the BOWN signals have been
integrated into the depth map needs to be answered by future research.
As described above, current computational models reflect the convexity bias that is also present
in perception. However, as shown in Figure 17.1I, this convexity preference can be overcome
Neural Mechanisms of Figure-ground Organization 359

by the consistency of surface properties such as texture. Does the BOWN-sensitive neural
activity reflect this reversal of ownership to create the perception of holes? In more general terms,
the fact that some BOWN-sensitive neurons are also sensitive to luminance contrast (Zhou et al.
2000) suggests that they are capable of reflecting surface properties. For future research, it would
be important to study the role of surface properties in the BOWN computation.
Neurons tuned as T-junction detectors have not been found in the visual cortex. It has been
suggested that end-stopped cells play a key role (Craft et al. 2007). Yazdanbakhsh and Livingstone
(2006) reported that end-stopped cells in macaque V1 are sensitive to the contrast of abutting
surfaces that create junctions. Whether these contrast-sensitive end-stopped cells act as
T-junction detectors connected to the depth-order computation process should be answered by
future research.
Although electrophysiological studies have shown that lower-level visual cortex is involved
in face-vase perceptual bistability, no direct recordings of neural activity have been reported
that can be correlated with the perceptual switch. While the input signals are kept constant for the
face-vase stimulus, the ownership keeps changing. It is known that higher-level functions, such as
attention and familiarity of shape, can influence the switch. Examining the role of feedback
modification of BOWN signals in perceptual bistability would give important insight into this mechanistic
organization (see also Alais and Blake, this volume, for more discussion of bistable perception).
To explain the short latency of the BOWN-sensitive components in neural responses, it has
been argued that BOWN signals must be grouped at a higher level. This opens up a new possibility
in which higher-level functions dynamically influence the BOWN signals. Whether such
grouping can be found, and where it is accomplished, remains to be answered. It is crucial
now, more than ever, to investigate how border detection, BOWN, depth order, shape detection,
and other higher-level functions are organized through a dynamic feedback system.
The context sensitivity of figure-ground organization is a hallmark of Gestalt psychology.
We have discussed how figure-ground perception emerges from the global configuration of the image.
This view invites future investigation of the neural mechanisms underlying the BOWN
computations.

References
Amir, Y., M. Harel, and R. Malach (1993). ‘Cortical Hierarchy Reflected in the Organization of Intrinsic
Connections in Macaque Monkey Visual Cortex’. Journal of Comparative Neurology 334(1): 19–46.
Appelbaum, L. G., A. Wade, V. Vildavski, M. Pettet, and A. Norcia (2006). ‘Cue-Invariant Networks
for Figure and Background Processing in Human Visual Cortex’. Journal of Neuroscience
26(45): 11695–11708.
Appelbaum, L. G., A. Wade, V. Vildavski, M. Pettet, and A. Norcia (2008). ‘Figure-Ground Interaction in
the Human Visual Cortex’. Journal of Vision 8(9).
Baek, K. and P. Sajda (2005). ‘Inferring Figure-Ground Using a Recurrent Integrate-and-Fire Neural
Circuit’. IEEE Transactions on Neural Systems and Rehabilitation Engineering 13(2): 125–130.
Baylis, G. C. and J. Driver (2001). ‘Shape-Coding in IT Cells Generalizes over Contrast and Mirror
Reversal, but not Figure-Ground Reversal’. Nature Neuroscience 4(9): 937–942.
Blum, H. (1973). ‘Biological Shape and Visual Science. I’. Journal of Theoretical Biology 38(2): 205–287.
Brincat, S. L. and C. E. Connor (2006). ‘Dynamic Shape Synthesis in Posterior Inferotemporal Cortex’.
Neuron 49(1): 17–24.
Bullier, J. (2001). ‘Integrated Model of Visual Processing’. Brain Research Reviews 36(2–3): 96–107.
Craft, E., H. Schutze, E. Niebur, and R. von der Heydt (2007). ‘A Neural Model of Figure-Ground
Organization’. Journal of Neurophysiology 97(6): 4310–4326.
360 Kogo and van Ee

Domijan, D. and M. Setic (2008). ‘A Feedback Model of Figure-Ground Assignment’. Journal of Vision
8(7): 1–27.
Fang, F., H. Boyaci, and D. Kersten (2009). ‘Border Ownership Selectivity in Human Early Visual Cortex
and its Modulation by Attention’. Journal of Neuroscience 29(2): 460–465.
Feldman, J. and M. Singh (2006). ‘Bayesian Estimation of the Shape Skeleton’. Proceedings of the National
Academy of Sciences 103(47): 18014–18019.
Felleman, D. J. and D. C. Van Essen (1991). ‘Distributed Hierarchical Processing in the Primate Cerebral
Cortex’. Cerebral Cortex 1(1): 1–47.
Finkel, L. H. and P. Sajda (1992). ‘Object Discrimination Based on Depth-from-Occlusion’. Neural
Computation 4(6): 901–921.
Froyen, V., J. Feldman, and M. Singh (2010). ‘A Bayesian Framework for Figure-Ground Interpretation’.
Advances in Neural Information Processing Systems 23: 631–639.
Girard, P., J. M. Hupé, and J. Bullier (2001). ‘Feedforward and Feedback Connections between Areas
V1 and V2 of the Monkey Have Similar Rapid Conduction Velocities’. Journal of Neurophysiology
85(3): 1328–1331.
Grossberg, S. (1993). ‘A Solution of the Figure-Ground Problem for Biological Vision’. Neural Networks
6(4): 463–483.
Hesselmann, G. and R. Malach (2011). ‘The Link between fMRI-BOLD Activation and Perceptual
Awareness is “Stream-Invariant” in the Human Visual System’. Cerebral Cortex 21(12): 2829–2837.
Hung, C.-C., E. T. Carlson, and C. E. Connor (2012). ‘Medial Axis Shape Coding in Macaque
Inferotemporal Cortex’. Neuron 74(6): 1099–1113.
Jehee, J. F., V. A. Lamme, and P. R. Roelfsema (2007). ‘Boundary Assignment in a Recurrent Network
Architecture’. Vision Research 47(9): 1153–1165.
Kanizsa, G. and W. Gerbino (1976). ‘Convexity and Symmetry in Figure-Ground Organization’. In Vision
and Artifact, edited by M. Henle, pp. 25–32. New York: Springer.
Kelly, F. and S. Grossberg (2000). ‘Neural Dynamics of 3-D Surface Perception: Figure-Ground Separation
and Lightness Perception’. Perception & Psychophysics 62(8): 1596–1618.
Kienker, P. K., T. J. Sejnowski, G. E. Hinton, and L. E. Schumacher (1986). ‘Separating Figure from
Ground with a Parallel Network’. Perception 15(2): 197–216.
Kikuchi, M. and Y. Akashi (2001). ‘A Model of Border-Ownership Coding in Early Vision’. In Artificial
Neural Networks—ICANN 2001, 2130, edited by G. Dorffner, H. Bischof, and K. Hornik, pp. 1069–1074.
Berlin, Heidelberg: Springer.
Kikuchi, M. and K. Fukushima (2003). ‘Assignment of Figural Side to Contours Based on Symmetry,
Parallelism, and Convexity’. In Knowledge-Based Intelligent Information and Engineering Systems, 2774,
edited by V. Palade, R. J. Howlett, and L. Jain, pp. 123–130. Berlin, Heidelberg: Springer.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt Brace & World.
Kogo, N., C. Strecha, L. van Gool, and J. Wagemans (2010). ‘Surface Construction by a 2-D
Differentiation-Integration Process: A Neurocomputational Model for Perceived Border Ownership,
Depth, and Lightness in Kanizsa Figures’. Psychological Review 117(2): 406–439.
Kogo, N., A. Galli, and J. Wagemans (2011). ‘Switching Dynamics of Border Ownership: A Stochastic
Model for Bi-Stable Perception’. Vision Research 51(18): 2085–2098.
Kourtzi, Z. and N. Kanwisher (2001). ‘Representation of Perceived Object Shape by the Human Lateral
Occipital Complex’. Science 293(5534): 1506–1509.
Kumaran, K., D. Geiger, and L. Gurvits (1996). ‘Illusory Surface Perception and Visual Organization’.
Network-Computation in Neural Systems 7(1): 33–60.
Lamme, V. A. (1995). ‘The Neurophysiology of Figure-Ground Segregation In Primary Visual Cortex’.
Journal of Neuroscience 15(2): 1605–1615.
Lamme, V. A., K. Zipser, and H. Spekreijse (1998). ‘Figure-Ground Activity in Primary Visual Cortex
is Suppressed by Anesthesia’. Proceedings of the National Academy of Sciences of the United States of
America 95(6): 3263–3268.
Lamme, V. A., V. Rodriguez-Rodriguez, and H. Spekreijse (1999). ‘Separate Processing Dynamics for
Texture Elements, Boundaries and Surfaces In Primary Visual Cortex of the Macaque Monkey’. Cerebral
Cortex 9(4): 406–413.
Lamme, V. A., H. Super, R. Landman, P. R. Roelfsema, and H. Spekreijse (2000). ‘The Role of Primary
Visual Cortex (V1) in Visual Awareness’. Vision Research 40(10–12): 1507–1521.
Layton, O. W., E. Mingolla, and A. Yazdanbakhsh (2012). ‘Dynamic Coding of Border-Ownership in
Visual Cortex’. Journal of Vision 12(13): 8, 1–21.
Lee, T. S., D. Mumford, R. Romero, and V. A. Lamme (1998). ‘The Role of the Primary Visual Cortex in
Higher Level Vision’. Vision Research 38(15–16): 2429–2454.
Lescroart, M. D. and I. Biederman (2013). ‘Cortical Representation of Medial Axis Structure’. Cerebral
Cortex 23(3): 629–637.
Levitt, J. B., D. C. Kiper, and J. A. Movshon (1994). ‘Receptive Fields and Functional Architecture of
Macaque V2’. Journal of Neurophysiology 71(6): 2517–2542.
Likova, L. T. and C. W. Tyler (2008). ‘Occipital Network for Figure/Ground Organization’. Experimental
Brain Research 189(3): 257–267.
Mihalas, S., Y. Dong, R. von der Heydt, and E. Niebur (2011). ‘Mechanisms of Perceptual Organization
Provide Auto-Zoom and Auto-Localization for Attention to Objects’. Proceedings of the National
Academy of Sciences of the United States of America 108(18): 7583–7588.
Nakayama, K., S. Shimojo, and G. H. Silverman (1989). ‘Stereoscopic Depth: Its Relation to Image
Segmentation, Grouping, and the Recognition of Occluded Objects’. Perception 18(1): 55–68.
Parkkonen, L., J. Andersson, M. Hämäläinen, and R. Hari (2008). ‘Early Visual Brain Areas Reflect the
Percept of an Ambiguous Scene’. Proceedings of the National Academy of Sciences of the United States of
America 105(51): 20500–20504.
Peterhans, E. and F. Heitger (2001). ‘Simulation of Neuronal Responses Defining Depth Order and
Contrast Polarity at Illusory Contours in Monkey Area V2’. Journal of Computational Neuroscience
10(2): 195–211.
Peterson, M. A., E. M. Harvey, and H. J. Weidenbacher (1991). ‘Shape Recognition Contributions to
Figure-Ground Reversal: Which Route Counts?’ Journal of Experimental Psychology: Human Perception
and Performance 17(4): 1075–1089.
Peterson, M. A. and B. S. Gibson (1993). ‘Shape Recognition Inputs to Figure-Ground Organization in
Three-Dimensional Displays’. Cognitive Psychology 25(3): 383–429.
Peterson, M. A. (1999). ‘What’s in a Stage Name? Comment on Vecera and O’Reilly (1998)’. Journal of
Experimental Psychology: Human Perception and Performance 25(1): 276–286.
Peterson, M. A. and E. Salvagio (2008). ‘Inhibitory Competition in Figure-Ground Perception: Context and
Convexity’. Journal of Vision 8(16): 1–13.
Pitts, M. A., A. Martínez, J. B. Brewer, and S. A. Hillyard (2011). ‘Early Stages of Figure-Ground
Segregation during Perception of the Face-Vase’. Journal of Cognitive Neuroscience 23(4): 880–895.
Pitts, M. A., J. L. Nerger, and T. J. R. Davis (2007). ‘Electrophysiological Correlates of Perceptual
Reversals for Three Different Types of Multistable Images’. Journal of Vision 7(1): 6, 1–14.
Polimeni, J. R., M. Balasubramanian, and E. L. Schwartz (2006). ‘Multi-Area Visuotopic Map Complexes
in Macaque Striate and Extra-Striate Cortex’. Vision Research 46(20): 3336–3359.
Poort, J., F. Raudies, A. Wannig, V. A. Lamme, H. Neumann, and P. R. Roelfsema (2012). ‘The Role
of Attention in Figure-Ground Segregation in Areas V1 and V4 of the Visual Cortex’. Neuron
75(1): 143–156.
Qiu, F. T. and R. von der Heydt (2005). ‘Figure and Ground in the Visual Cortex: V2 Combines
Stereoscopic Cues with Gestalt Rules’. Neuron 47(1): 155–166.
Qiu, F. T., T. Sugihara, and R. von der Heydt (2007). ‘Figure-Ground Mechanisms Provide Structure for
Selective Attention’. Nature Neuroscience 10(11): 1492–1499.
Roelfsema, P. R., V. A. Lamme, H. Spekreijse, and H. Bosch (2002). ‘Figure-Ground Segregation in a
Recurrent Network Architecture’. Journal of Cognitive Neuroscience 14(4): 525–537.
Rubin, E. (1921). Visuell wahrgenommene Figuren. Copenhagen: Gyldendalske Boghandel.
Rubin, E. (1958). ‘Figure and Ground’. In Readings in Perception, edited by D. Beardslee, pp. 35–101.
Princeton: Van Nostrand.
Sajda, P. and L. H. Finkel (1995). ‘Intermediate-Level Visual Representations and the Construction of
Surface Perception’. Journal of Cognitive Neuroscience 7(2): 267–291.
Sakai, K. and H. Nishimura (2006). ‘Surrounding Suppression and Facilitation in the Determination of
Border Ownership’. Journal of Cognitive Neuroscience 18(4): 562–579.
Sakai, K., H. Nishimura, R. Shimizu, and K. Kondo (2012). ‘Consistent and Robust Determination of
Border Ownership Based on Asymmetric Surrounding Contrast’. Neural Networks 33: 257–274.
Scholte, S., J. Jolij, J. Fahrenfort, and V. Lamme (2008). ‘Feedforward and Recurrent Processing in Scene
Segmentation: Electroencephalography and Functional Magnetic Resonance Imaging’. Journal of
Cognitive Neuroscience 20(11): 2097–2109.
Sugihara, T., F. T. Qiu, and R. von der Heydt (2011). ‘The Speed of Context Integration in the Visual
Cortex’. Journal of Neurophysiology 106(1): 374–385.
Supèr, H., H. Spekreijse, and V. A. Lamme (2001). ‘Two Distinct Modes of Sensory Processing Observed in
Monkey Primary Visual Cortex (V1)’. Nature Neuroscience 4(3): 304–310.
Supèr, H., C. van der Togt, H. Spekreijse, and V. A. Lamme (2003). ‘Internal State of Monkey Primary
Visual Cortex (V1) Predicts Figure-Ground Perception’. Journal of Neuroscience 23(8): 3407–3414.
Supèr, H. and V. A. Lamme (2007). ‘Altered Figure-Ground Perception in Monkeys with an Extra-Striate
Lesion’. Neuropsychologia 45(14): 3329–3334.
Thielscher, A. and H. Neumann (2008). ‘Globally Consistent Depth Sorting of Overlapping 2D Surfaces in
a Model Using Local Recurrent Interactions’. Biological Cybernetics 98(4): 305–337.
Vecera, S. P. and R. C. O’Reilly (1998). ‘Figure-Ground Organization and Object Recognition Processes: An
Interactive Account’. Journal of Experimental Psychology: Human Perception and Performance
24(2): 441–462.
Vecera, S. P. and R. C. O’Reilly (2000). ‘Graded Effects in Hierarchical Figure-Ground Organization: Reply
to Peterson (1999)’. Journal of Experimental Psychology: Human Perception and Performance
26(3): 1221–1231.
Williams, L. R. and A. R. Hanson (1996). ‘Perceptual Completion of Occluded Surfaces’. Computer Vision
and Image Understanding 64(1): 1–20.
Windmann, S., M. Wehrmann, P. Calabrese, and O. Gunturkun (2006). ‘Role of the Prefrontal Cortex in
Attentional Control over Bistable Vision’. Journal of Cognitive Neuroscience 18(3): 456–471.
Yazdanbakhsh, A. and M. S. Livingstone (2006). ‘End Stopping in V1 is Sensitive to Contrast’. Nature
Neuroscience 9(5): 697–702.
Zhang, N. and R. von der Heydt (2010). ‘Analysis of the Context Integration Mechanisms Underlying
Figure-Ground Organization in the Visual Cortex’. Journal of Neuroscience 30(19): 6482–6496.
Zhaoping, L. (2005). ‘Border Ownership from Intracortical Interactions in Visual Area V2’. Neuron
47(1): 143–153.
Zhou, H., H. S. Friedman, and R. von der Heydt (2000). ‘Coding of Border Ownership in Monkey Visual
Cortex’. Journal of Neuroscience 20(17): 6594–6611.
Chapter 18

Border inference and border ownership: The challenge of integrating geometry and topology

Steven W. Zucker

Introduction
A little over a century ago Sherrington (1906) established the concept of the receptive field in
neurophysiology. This was taken into the visual system by Hartline (1938) and Kuffler (1953),
elaborated into simple, complex, and other classes of neurons by Hubel and Wiesel (1977), and elevated
into a neural doctrine by Barlow (1972). Central among the properties that emerged from studying
receptive fields is orientation selectivity. This became an organizing principle for explaining
boundary perception, among other visual features (Hubel and Wiesel 1979), and much of modern
visual neurophysiology is built on these foundations. So are substantial parts of computational
neuroscience. Computationally, networks of these neurons, whose properties are defined by
receptive fields, are taken to define the machinery that supports boundary inference.
A little less than a century ago Gestalt psychologists discovered a very different aspect of
boundary perception. Rubin (1915) produced a striking example of a reversible figure (Figure 18.1a). It
consists of black and white regions: in one organization the goblet becomes the figure and the dark
regions the background; in the other organization the dark faces become figure(s) and the white
region background. Figure and ground provided one part of the foundation for the Gestalt laws
of perceptual organization.
Rubin’s figure opened the door into a subtle property of boundaries: border ownership (Koffka
1935). In words, boundaries belong to the figure and not the ground. As the Rubin figure alternates,
so do the regions perceived as figure and ground, and so does the property of border ownership.
The entire process seems automatic, fast, and effortless. Paradoxically, while the figure/ground
and border ownership are alternating, the boundary remains fixed in retinal position: regardless
of which figural organization is perceived, the boundary contour passes through the same image
locations. It may, however, vary in apparent depth.
Understanding border ownership is important for understanding vision. At the top level is
the integration of the phenomenology with neural computation. But looking deeper reveals a
kind of catch-22 inherent in these computations: while borders define the figures they enclose,
border ownership depends on the figure. Cells with orientation-selective receptive fields signal
local information; border ownership requires global (figural) information. This observation has
enormous implications for the definition of a visual receptive field and for understanding visual
computations more generally.
The challenge for understanding border ownership is to break this mutual dependence.
Figure 18.1b illustrates how subtle this can be. The concept of figure is a difficult one to pin down,

Fig. 18.1  Different “sides” of border and figural phenomena in perceptual organization. (a) Rubin’s vase:
the fixed border is perceived as belonging to the figure, not the background. Border ownership switches
with the figure/ground reversal, as does the position of an apparent light source.
(Reprinted from Computer Vision and Image Understanding, 85(2), Michael S. Langer and Steven W. Zucker,
Casting Light on Illumination: A Computational Model and Dimensional Analysis of Sources, pp. 322–35.
Copyright © 1997 with permission from Elsevier).
(b) Borders can induce apparent shape from shading, although the disc is constant in brightness.
I thank R. Shapley for this figure.
(Reproduced from Perception and Psychophysics, 37(1), pp 84–88, Nonlinearity in the perception of form, Robert
Shapley and James Gordon, Copyright © 1985, Springer-Verlag. With kind permission from Springer Science and
Business Media).
(c,d) In some cases borders can be too complicated to induce global figures.

and often it is related to surfaces and the many different facets of objects (Nakayama and Shimojo
1992). This example (Gordon and Shapley 1985) shows how adept we are at perceiving smooth
surfaces (and their shading) even when none is present! In a related observation, the apparent
position of the light source shifts in Figure 18.1a (Langer and Zucker 1997).

Perceptual organization across levels


. . . For some concepts of physics and of biology must be clearly
understood if serious errors are to be avoided.
(Köhler 1969, p. 62)
Perceptual organization and emergent Gestalt effects have fascinated and preoccupied research-
ers for more than a century (Wagemans et al. 2012). This handbook attests to the richness and
variety of the phenomena plus the experimental and theoretical approaches to studying them.
But this richness also points to a difficulty: At which level should explanations be put forth:
phenomenological or conceptual or psychological or computational or neurophysiological? Or all?
(See Figure 18.2). We adopt a neurogeometric perspective. The concept of figure is perhaps at the
highest level while the machinery of neural computation is defined at the molecular and cellular
levels. Somewhere in the middle is the network level, and this is the type of abstraction normally
employed in building models.
What is most compelling about the Gestalt phenomena is how they demand integration between
levels. But in practice this integration is rarely attempted. Rather, two heuristics are commonly
employed. (1) Decomposition into functional tasks by association with a visual area, for example
claiming that V4 is the site of color constancy (Zeki and Shipp 1988). Although anatomical
constraints relaxed this decomposition into streams, such as the form, color, and stereopsis
pathways (Hubel and Livingstone 1987), should form be separated from stereopsis? (2) Marr’s (1982)
separation of computational levels asserted that the problem definition should not depend on the
algorithm to solve it nor on its implementation. Although there may be many algorithms that
solve a given problem, and many ways to implement a particular algorithm, it may be precisely
the details of “implementation” (Figure 18.2) that provide the clue to understanding the problem.
Intuition from one level can inform modeling at another.
The challenge for understanding border ownership, in particular, is that any explanation must
in principle span all of these levels. The question is how to use them to help define the problem.
To make these general claims concrete, this chapter contrasts two lines of investigation. The first
abstracts neural computation in geometric terms. We start with finding those contours that
comprise borders, and build the ideas into surface inference via stereo and shading analysis. Although
the circuit models (and mathematics) become more complex, the path through these different
inference tasks displays a common thread. In effect, (all) different possibilities are present in a kind
of distributed code, and local conditions select from among them. The principle of good
continuation dominates, and global configurations are built from local ones. This defines one of the major
aspects of visual processing.
Border ownership, we argue, is different. Whether a figure is indicated (at a boundary position)
or not is a choice driven not by geometrical good continuation but rather by whether a border
exists that could enclose something. The details do not matter (very much), and global
considerations drive local ones. Instead of geometry the question is more one of topology, but in a softer
way than this notion is considered in mathematics. This can be thought of as a different aspect of
visual processing.
While distinct, these two aspects of visual modeling are not uncoupled, and therein, I believe,
lies the real challenge of border ownership. It is not just a question of integrating top-down with
bottom-up (Ullman et al. 2002); it is a question of how to do this without getting lost in the myriad
combinatorial possibilities that arise.
Our goal in this chapter is to help the reader find a path through these different possibilities. In
the end we develop a conjecture about border ownership, neural networks, and local fields that

Fig. 18.2  Biological levels of explanation for perception vary with scale. (a) At the most macroscopic
scale, the visual system involves nearly half of the primate cortex plus sub-cortical and retinal
structures. (b) The first two cortical visual areas, V1 and V2, are shown. The existence of feedforward
and feedback connections between them establishes the networked nature of visual processing.
(c) Within each visual area are layers of neural networks, with neural projections between cells
in a layer and between layers. We shall abstract such networks into a columnar organization.
(d) Networks among neurons are established at synapses. Rarely considered in neural modeling is the
presence of glia (a portion of one of which, an astrocyte, is shown). These non-neuronal cells will be
important when we consider models for border ownership. (e) Finally, there are neurotransmitters,
modulators and other mechanisms at the biophysical level. The tradition in modeling is to
concentrate at (c), the neural networks level, but thinking about all levels can inspire theories.
could provide a principled approach to doing this. But it is only one way of putting the different
ingredients together. As we hope becomes clear, border ownership is a challenge and a goal that
drives one to consider: What are the general themes that guide perceptual organization, and at
what level should they be described? We start with a review of the border ownership problem.

The border ownership problem


. . . I have embarked on something which must lead somewhere.
So now I feel almost on top of the world. Edgar Rubin in letter
to Niels Bohr, May 1912.
(Quoted in Pind 2012, p. 90)
Border ownership establishes that there is more to orientation-selective responses in early visual
neurons than their contour context. von der Heydt and colleagues (Zhou, Friedman, and von der
Heydt 2000) discovered neurons early in the primate visual system that respond according to
what appear to be border ownership configurations (Figure 18.3). Although the local pattern
“seen” by the receptive field remains identical, some neurons respond more vigorously when the
edge defines, e.g., a dark square; for others the opposite holds (Zhou et al. 2000). The interpretation is that
such a neuron prefers, e.g., “dark” figures against a light background and is signaling that it is part of
a dark figure.
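The catch inherent in these responses can be made concrete with a toy stimulus pair in the spirit of Figure 18.3a,b (an illustrative sketch; the image size and the 4x4 “receptive field” window are made up, not the actual Zhou et al. displays): the two images are pixel-for-pixel identical within a local window straddling the border, yet the figure lies on opposite sides of it.

```python
def make_image(square_on_left):
    """16x16 image with a vertical dark/light border at column 8.
    square_on_left: dark square (rows 2-13, cols 2-7) on a white field;
    otherwise:      white square (rows 2-13, cols 8-13) on a dark field."""
    bg = 1.0 if square_on_left else 0.0
    fg = 0.0 if square_on_left else 1.0
    lo, hi = (2, 8) if square_on_left else (8, 14)
    img = [[bg] * 16 for _ in range(16)]
    for i in range(2, 14):
        for j in range(lo, hi):
            img[i][j] = fg
    return img

A = make_image(True)    # dark figure owns the border
B = make_image(False)   # light figure owns the same border

def patch(img):
    """A small 'receptive field' window straddling the border at column 8."""
    return [row[6:10] for row in img[6:10]]

print(patch(A) == patch(B))  # True: identical local stimulation
print(A == B)                # False: the global arrangement differs
```

Any purely local unit sees the same input in both cases; distinguishing them requires information about the global figure.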
Not all cells show a border-ownership response. Many in the first cortical visual area, V1,
do not; they respond mainly to the edge brightness configuration. However, the fraction of
border-ownership responsive cells increases significantly in the next higher visual areas (V2
and V4); it is for this reason that intermediate-level effects are implicated.
A subtle aspect of border ownership is that sometimes the details matter, and sometimes they
do not. Distant completions are a case in point (Figure 18.3c, d): whether the figure is a circular
disc or a wavy square does not matter; only that it is a figure. This is in contrast to border
inference, where the details do matter. Determining whether putative edge elements fit together
formally depends on the curvature. Along the sides of the circle the curvature is constant; along an
ellipse it changes in a slow but regular fashion. The curvature is zero along the sides of a square,
and undefined at the corners. This distinction—whether the details matter or not—illustrates a
major difference between the two aspects of visual processing laid out in the Introduction.
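These curvature facts are easy to verify numerically with the standard parametric curvature formula, kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2); this is a generic sketch (the radius and the ellipse axes are arbitrary choices, not values from the chapter):

```python
import math

def curvature(x, y, t0, h=1e-4):
    """Curvature of a parametric curve (x(t), y(t)) at t0, estimated with
    central finite differences."""
    dx = (x(t0 + h) - x(t0 - h)) / (2 * h)
    dy = (y(t0 + h) - y(t0 - h)) / (2 * h)
    ddx = (x(t0 + h) - 2 * x(t0) + x(t0 - h)) / h**2
    ddy = (y(t0 + h) - 2 * y(t0) + y(t0 - h)) / h**2
    return abs(dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5

# Circle of radius 2: curvature is the same (1/r = 0.5) at every point.
circle = [curvature(lambda t: 2 * math.cos(t), lambda t: 2 * math.sin(t), t)
          for t in (0.0, 1.0, 2.5)]

# Ellipse (a = 2, b = 1): curvature changes slowly and regularly,
# from b/a^2 = 0.25 at the flattest points to a/b^2 = 2.0 at the sharpest.
ellipse = [curvature(lambda t: 2 * math.cos(t), lambda t: math.sin(t), t)
           for t in (0.0, math.pi / 2)]

print(circle)   # all ~0.5
print(ellipse)  # ~[2.0, 0.25]
```

A square’s sides give zero everywhere, while at its corners the derivatives are undefined, matching the distinction drawn above.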
If one were to draw the circle on a sheet of rubber, the sheet could be stretched (without tearing)
until the disc became an ellipse. Such rubber-sheet distortions are the heart of topology, where
the key invariant is closure. There remains a well-defined inside and a well-defined outside on the
sheet.
This mathematical distinction also runs through this chapter. The Gestalt notion of good
continuation, we maintain, can be viewed fruitfully from the perspective of (differential) geometry,
while the notion of border ownership involves closure. As with much of biology, however, these
ideas have to be developed carefully before they can be applied to perception. The sheet cannot be
stretched in all the ways available to a mathematician without challenging our visual system’s ability
to deal with complexity (Dubuc and Zucker 2001; see Figure 18.1). These are classically global
computations (Minsky and Papert 1969); how to relax them is discussed later.
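Closure as a rubber-sheet invariant can itself be computed: the standard even-odd (ray casting) test classifies a point as inside or outside a closed curve, and that classification survives a smooth stretching of the curve. A sketch (the particular distortion map is made up for illustration):

```python
import math

def inside(pt, poly):
    """Even-odd test: count crossings of a rightward ray from pt with the
    polygon's edges; an odd count means 'inside'."""
    x, y = pt
    n, crossings = len(poly), 0
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                crossings += 1
    return crossings % 2 == 1

# A closed curve sampled as a polygon: a circle ...
circle = [(math.cos(2 * math.pi * k / 100), math.sin(2 * math.pi * k / 100))
          for k in range(100)]
# ... and a "rubber-sheet" distortion of it (stretched and wobbled, not torn).
blob = [(2.0 * x + 0.3 * y * y, y + 0.2 * math.sin(3 * x)) for x, y in circle]

print(inside((0, 0), circle), inside((0, 0), blob))  # True True
print(inside((5, 5), circle), inside((5, 5), blob))  # False False
```

The details of the boundary change completely under the distortion, but inside stays inside and outside stays outside, which is exactly the invariance border ownership seems to exploit.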

The geometry of good continuation


Perhaps the most basic of the principles of perceptual organization is the concept of good
continuation. While it is normally considered mainly along boundaries (discussed next), this is just the

Fig. 18.3  The combinatorial complexity relating receptive fields and border ownership. (a) A dark figure
on a white background and (b) a white figure on a dark background present identical local patterns
to a neuron (small ellipse denotes receptive field). The border ownership response (Zhou et al. 2000):
those neurons preferring a dark figure, for example, would respond more vigorously to pattern (a) than
to (b); others might prefer light figures; and still others might not be border-ownership selective at
all. The light-dark pattern within the receptive field does not change, only the global arrangement of
which it is a part. (c,d) Other variations should respond similarly. The difficulty is to develop a circuit that
not only provides a border ownership response, but does so in a manner that is invariant to the global
completion.
Data from Hong Zhou, Howard S. Friedman, and Rüdiger von der Heydt, Coding of Border Ownership in Monkey
Visual Cortex, The Journal of Neuroscience, 20(17), pp. 6594–6611, 2000.

beginning. Viewing good continuation geometrically provides very powerful tools for analysis,
which can be extended onto surfaces, thus opening the door to areas such as stereo correspondence
and even shape-from-shading. Thinking of these tasks from the perspective of perceptual organiza-
tion provides a refreshing relationship among them. We review briefly three steps along this path.

Boundary inference from contour geometry


We may generalize thus: any curve will proceed in its own
natural way, a circle as a circle, an ellipse as an ellipse, and
so forth.
(Koffka 1935, p. 153).
Only where there is no straight (or otherwise smooth)
continuation at the corners does a break occur by itself.
(Metzger 2006, p. 18).

Boundary detection seems straightforward. It is known that visual cortex contains neurons
selective for different orientations, with each position covered by cells tuned to each orientation (Figure
18.4a,b). This suggests a classical approach: simply convolve an operator modeling an orientation-
selective receptive field against the image, simulating the neurons’ responses, and choose those with
high values. Unfortunately these purely local approaches simply do not work. Noise, additional
microstructure in the image, and the properties of object reflectance conspire to alter the responses
from the ideal. Some additional interactions are required, and this becomes our first view of local
and global interactions in boundary inference. (Later, when considering the units comprising a
border-ownership model, we shall be forced to question this filtering view of receptive fields as well.)
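The convolve-and-threshold approach can be sketched in a few lines (an illustrative toy: Sobel-like odd filters stand in for oriented Gabor receptive fields, and the image and noise level are invented):

```python
import random

def correlate(img, kern):
    """Valid-mode 2-D correlation of an image with a kernel (plain lists)."""
    kh, kw = len(kern), len(kern[0])
    return [[sum(img[i + a][j + b] * kern[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

random.seed(0)
# A noisy 16x16 image containing a vertical luminance edge at column 8.
img = [[(1.0 if j >= 8 else 0.0) + random.gauss(0.0, 0.2) for j in range(16)]
       for i in range(16)]

# Crude odd-symmetric "receptive fields": one tuned to vertical edges,
# one to horizontal edges (stand-ins for a Gabor pair at 0 and 90 degrees).
vert = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
horz = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

peak_v = max(abs(r) for row in correlate(img, vert) for r in row)
peak_h = max(abs(r) for row in correlate(img, horz) for r in row)
print(peak_v > peak_h)  # the vertically tuned unit responds more strongly
```

The vertically tuned unit wins at the edge, but notice that noise alone already drives the horizontally tuned unit to sizable responses; with realistic images, thresholding such local responses is exactly what “simply does not work” without further interactions.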
Exploiting the functional organization of visual cortex, those neurons whose classical receptive
field centers overlap yields a columnar model for the superficial (upper) layers of visual cortex, V1

Fig. 18.4  Detection of local boundary signals. (a) Individual neurons in visual cortex are selective to dark/
bright pattern differences in the visual field; this is depicted by the (b) Gabor model of a receptive field.
Since such local measurements are noisy, contextual consistency along a boundary can be developed
geometrically. This involves circuits of neurons (c) that possess both local and long-range horizontal
connections. (d) Orientation columns abstract the superficial layers of V1. Rearranging the anatomy
yields groups of neurons (a column) selective for every possible orientation at each position in the visual
array. These columns are denoted by the vertical lines, indicating that at each retinotopic (x, y)-position
all (θ)-orientations are represented. Long-range horizontal connections define circuits among these
neurons, enforcing consistent firing among those (e) representing the orientations along a putative
contour. Geometry enters when we interpret an orientationally-selective cell’s response as signaling
the tangent to a curve. This tangent can in effect be transported along an approximation to the curve
(indicated as the osculating circle) to a nearby position. Compatible tangents agree in position and
orientation. (f) The transport operation can be “hardwired” in the long range connections, shown as
the “lift” of an arc of (osculating) circle in the (x, y)-plane into a length of helix in (x, y, θ) coordinates.
The result is a model for connection patterns in visual cortex indicating (g) straight, (h) small curvature,
or (i) high curvature.
Reproduced from Steven Zucker and Ohad Ben-Shahar, Geometrical computations explain projection patterns
of long-range horizontal connections in visual cortex, Neural Computation, 16:3 (March 2004), pp. 445–476
© 2004 Massachusetts Institute of Technology.
Border Inference and Border Ownership 369

Although a mathematical simplification, this columnar model is useful for organizing computations. In Figure 18.4d such orientation columns are denoted by vertical lines, indicating that
at each (x,y)-position in the retinotopic array (a discrete sampling of) all (θ) orientations are
represented.
We concentrate on these upper layers, and sketch several of the anatomical projections to and
from them. This, of course, is only a rough sampling (Casagrande and Kaas 1994, Douglas and
Martin 2004) of the many layers of visual processing (Felleman and Van Essen 1991).
1 Feedforward projections from layer 4 to layers 2/3 build up the local response properties.
These are likely supported by local circuits within layers 4 and layers 2/3 as well (Miller 2003;
Sompolinsky and Shapley 1997). Superficial V1 also has an organization into cytochrome
oxidase blobs and interblob areas, a distinction we shall not pursue in this chapter.
2 Long-range horizontal connections (LRHCs) (Rockland and Lund 1982; Bosking et al. 1997;
Angelucci et  al. 2002; Figure 18.4c) define circuits among layer 2/3 neurons. Anatomical
studies reveal that these intrinsic connections are clustered (Gilbert and Wiesel 1983) and
orientation-dependent (Bosking et  al. 1997), leading many to believe that consistent firing
among neurons in such circuits specifies the orientations along a putative contour (Kapadia
et al. 1995; Zucker et al. 1989; Field et al. 1993). This, in effect, uses context (along the contour)
to remove noisy responses that are inconsistent with their neighbors’ responses. It could also
reinforce weak or missing responses blocked by image structure.
3 Feedforward projections from layers 2/3 in V1 to higher visual areas (Salin and Bullier 1995;
Angelucci et al. 2002). V2, for example, has an elaborate organization into subzones as well,
including the thin, thick, and pale stripe areas (Roe and Ts’o 1997).
4 Feedback projections from higher visual areas to earlier visual areas (Rockland and Virga
1989; Angelucci et al. 2002). The structure of these feedback signals will be a significant
feature of models for border ownership, and is discussed in more detail later. For now we
emphasize that these feedback connections are patchy rather than targeted (Shmuel et al.
2005; Muir et al. 2011).
We now discuss the LRHCs, because these are so naturally associated with boundary processing (Adini et al. 1997). We concentrate on geometric properties to emphasize the connection to good continuation. For a discussion of psychophysical properties, see Elder and Singh, this volume. A model is sketched for V1 (Ben-Shahar and Zucker 2003) that predicts the first and second order statistics of LRHCs (Bosking et al. 1997). It could also subserve contrast integration (Bonneh and Sagi 1998) and, over a larger scale, model (some of) the projections to V2 (Zucker et al. 1989). As we show, however, these are insufficient for the border ownership problem, which will require us to think more carefully about feedback projections.
Differential geometry provides a formalization of good continuation over short distance scales.
It specifies how orientations align along a contour. Interpreting the orientationally-selective cell’s
response as signaling the tangent to a curve, this tangent can be transported along an approxima-
tion to the curve (indicated as the osculating circle) to a nearby position.
Compatible tangents are those that agree with sufficient accuracy in position and orientation
following transport; this is co-circularity. The transport operation can be embedded in the long-range connections, and realized both geometrically (Figure 18.4f) and in the retinotopic plane (Figure 18.4g,h,i). As we shall describe, many models of border ownership are based on similar ideas,
although it is the topological orientation (toward inside or outside of the figure) that is communi-
cated via the long-range horizontal projections.
370 Zucker

Sometimes complexity can reveal simplicity, and by lifting contours from the image into cor-
tical coordinates we show how Wertheimer’s (1923) original demonstration of the Principle of
Good Continuation simplifies. Crossing curves become simple in cortical coordinates (Figure
18.5). The intuition is that, like inertial motion of an object, things tend to keep going in the direc-
tion they were going. Only now it is in a geometric space (Parent and Zucker 1989; Sarti et al.
2008). At a discontinuity there are multiple orientations at the same position. They signal what
often amounts to a monocular occlusion event (Zucker et al. 1989); a contour ending can signal a
cusp (Lawlor et al. 2009).
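The lift itself is elementary to compute from a sampled curve (a numpy sketch, not the chapter's implementation):

```python
import numpy as np

def lift(curve_xy):
    """Lift a sampled planar curve into (x, y, theta) 'cortical' coordinates,
    estimating the (undirected) tangent orientation by finite differences."""
    d = np.diff(curve_xy, axis=0)
    theta = np.arctan2(d[:, 1], d[:, 0]) % np.pi
    return np.column_stack([curve_xy[:-1], theta])
```

Two straight strokes crossing at the origin share an image position there, but their lifted copies sit a quarter turn apart in θ, so the lifted curves never meet; this is the sense in which the crossing becomes simple in cortical coordinates.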
It is important to note that not all discontinuities are visible, especially when individual con-
tours combine into a texture. Figure 18.5d shows what appears as a wavy surface behind occlud-
ers. Classical amodal completion (Kanizsa 1979) works to suggest a smooth surface even when
there are different numbers of stripes in each zone. (Such dense patterns will be relevant for shad-
ing analysis, shortly.)


Fig. 18.5  Good continuation in (x, y, θ)-space explains why the “figure 8” in (a) is not seen as (b)
two “just touching” closed contours. The lift separates the crossing point into two distinct levels
(c), one corresponding to the lower orientation and the other to the higher value of orientation.
The lift further provides an early representation of corners and junctions, for example at points
of monocular occlusion. (d) For textures there is completion across occluders, even though there
are different numbers of contours in each segment; this is relevant to texture and shading flow
continuations.

Good continuation for stereo correspondence


. . . a perspective drawing, even when viewed monocularly,
does not give the same vivid impression of depth as the same
drawing if viewed through a stereoscope with binocular
parallax . . . for in the stereoscope the tri-dimensional force of
the parallax co-operates with the other tri-dimensional forces
of organization; instead of conflict between forces, stereoscopic
vision introduces mutual reinforcement.
(Koffka (1935, pp. 161–162))
What are the tri-dimensional forces of perceptual organization, especially good continuation,
and how might they be used to solve the stereo correspondence problem? Normally stereo is
approached via spatial disparity. But working with the geometrical idea of good continuation, the
question becomes: which edge (or tangent) in the left image goes with which edge (tangent) in
the right image? In biological terms, how are responses of cells in the left/right ocular dominance
columns related to one another in V1 and V2 (Poggio and Fischer 1977; Roe and Ts’o 1997)?
The geometry builds upon the 2D setup for curves in an image (Figure 18.4e). There good con-
tinuation came from transporting an edge via co-circularity: when the transported tangent agreed
with a measured one (at the new position), both were reinforced. Now consider a curve meander-
ing through space, e.g., a tree branch. Instead of studying good continuation in the image, we shall
study good continuation in the 3D world. But this is not what is given; it is what is sought. The
givens are a pair of images, one to the left eye and one to the right, each of which contains a 2D
curve (Figure 18.6). The problem is to determine which local edge from the left-image 2D curve
agrees with an edge from the right 2D image.
To answer this, we have to consider good continuation in 3D (Li and Zucker 2006). Rephrasing: a
short segment of the 3D curve, say its tangent, projects to a tangent in the left image and another
in the right image. Moving slightly along the 3D space curve leads to another 3D tangent, which
projects to another pair in 2D. Grouping pairs with pairs again requires an approximation; in this
case, a short length of a helix in 3D generalizes the circle in 2D co-circularity (Figure 18.6). Thus
the stereo problem is solved by asking: which tangent pairs, when transported along a helix, match which other pairs? This is how the results in Figure 18.6e,f were obtained.
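The givens of the problem are easy to simulate with a rectified pinhole pair (focal length F, baseline B; the names and values here are illustrative, and this is a schematic setup, not the Li–Zucker machinery itself):

```python
import numpy as np

F, B = 1.0, 0.2   # focal length and interocular baseline, arbitrary units

def project(P, eye_x):
    """Pinhole projection of a 3D point P into the camera at (eye_x, 0, 0)."""
    X, Y, Z = P
    return np.array([F * (X - eye_x) / Z, F * Y / Z])

def image_tangent(P, T, eye_x, h=1e-4):
    """Unit image tangent: direction of the projected curve through P."""
    d = project(P + h * T, eye_x) - project(P, eye_x)
    return d / np.linalg.norm(d)
```

A point at depth Z appears with horizontal disparity FB/Z, and a 3D tangent slanted in depth projects to two different image orientations, one per eye. This is why correspondence must couple (left, right) tangent pairs rather than isolated positions, and why transporting such pairs along a helix provides the grouping constraint.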
The machinery that results can again be formulated as a set of connections that generalize those
for co-circularity. They could potentially be realized in the V1  → V2 projection, within V2, or
perhaps in higher areas. There exists evidence that such responses are available in V4 (Hinkle and
Connor 2002) and psychophysics supports (at least) co-linear facilitation in depth (Huang et al.
2012). Moreover, rivalry results when non-matching oriented patterns are used (Hunt et al. 2012).
Much more needs to be done regarding good continuation in depth.
As with 2D curves, the good continuation approach to solving stereo correspondence for
space curves relies on curvatures. Another leap is required when stereo for surfaces is considered
(Figure 18.6). Now, instead of a tangent to a curve there is a tangent plane to the surface, and it rotates depending on which direction it is transported. In other words, the curvature varies in every direction
for a general surface. To build intuition, consider slicing an apple: for every direction in which the
knife is pointed (the direction of movement) a different cut (surface curve) is made. Each cut has a
curvature. Thus it is easier to work with the surface normal and how this varies as it is transported
in different directions along the surface. Details regarding how to solve the stereo problem for
surfaces can be found in Li and Zucker (2010); for now we turn to shading analysis.


Fig. 18.6  The stereo problem for space curves. (a, b) Tree branches meander through depth and
may appear in different ordering when projected into the left and right eyes (highlighted box).
(c) Color-coded depth along the branches. In early visual areas the boundaries of these branches
are complicated arrangements of short line segments (tangents) inferred from the left and right
images. Notice the smooth variation of depth along the branches, even though they occasionally
cross one another. (d) Geometry of stereo correspondence: pairs of projected image tangents need
to be coupled to reveal a tangent in space. Good continuation (in space) then amounts to good
continuation among pairs of (left, right) tangents. (e) The stereo problem for surfaces can be posed in
similar terms, except now the surface normal drives the computation.
Reproduced from International Journal of Computer Vision, 69(1), pp 59–75, Contextual Inference in Contour-
Based Stereo Correspondence, Gang Li and Steven W. Zucker, Copyright ©2006, Kluwer Academic Publishers.
With kind permission from Springer Science and Business Media.

Good continuation for shape-from-shading


The emergence of depth from shading cues is no more
miraculous than the emergence from two flat retinal images of
the perceived world that extends in depth as well as in height
and width.
(Metzger (2006, p. 106)).

The curvature of the body is the betrayer, light and shadow are
its accomplices.
(Metzger (2006, p. 107)).
Although the Gestalt psychologists realized intuitively that the inference of shape from shad-
ing information involved some of the same ideas as good continuation, to our knowledge it is
rarely approached in that fashion. Instead the stage was set initially by Ernst Mach in the 1860s (see Ratliff 1965) and taken up with enthusiasm in computer vision (Horn and Brooks 1989).
However, none of these approaches involved perceptual organization; they were based either
on a first-order differential equation or on regularization techniques. We now sketch a percep-
tual organization approach to inferring shape from shading information, based on the model in
Kunsberg and Zucker (2014) and Kunsberg and Zucker (2013), to provide a flavor of how general
geometric good continuation can be.
In each of the previous problems good continuation was used to provide constraints between
nearby possible interpretations—e.g., how nearby orientations behave along a curve with each
interpretation deriving from an image measurement. For the inference of shape from shading
information, we start with the cortical representation of the shading (Figure 18.7a). Ideally, cells
tuned to low spatial frequencies will respond maximally when, e.g., the excitatory receptive field
domain is aligned with the brighter pixels; the inhibitory domain of an oriented receptive field will
then align with the darker regions. These maximal-responding cells define the shading flow field
in cortical space (Breton and Zucker 1996).
Corresponding to this shading flow is an illuminated surface, and therein lies the heart of the
difficulty: the surface is situated in 3D space, the light source is situated in 3D space (relative to the
surface and the viewer) but the image is only 2D. Solving this inverse problem will require both
assumptions about how images are formed and what types of surfaces exist in the world.
The trick is to think about what happens on the surface when you move through the shading
flow field. Taking a step in the direction signaled by a cell amounts to taking a step along an iso-
phote on the surface. For Lambertian reflectance, this implies that the tangent plane (to the sur-
face) has to rotate precisely so the brightness remains constant. Or, moving normal to the shading
flow implies the brightness gradient must be changing in another measurable fashion (contrast).
Together these constraints on the flow changes correspond to changes in the surface curvatures,
revealing a family of possible surface patches for each patch of shading flow (Figure 18.7). This
provides the “column” of possible local surface patches, analogous to the column of possible ori-
entations at a position for contours. Boundary and interior conditions could then select from
among these, just as the induced boundary contrast yielded a shape percept in Figure 18.1b.
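The isophote constraint can be checked on a toy example. The sketch below renders a Lambertian paraboloid (the surface, light direction, and resolution are arbitrary choices for illustration) and extracts the shading flow as the direction perpendicular to the brightness gradient:

```python
import numpy as np

def lambertian_image(light, n=128):
    """Render the Lambertian surface z = -(x^2 + y^2) under a distant light."""
    xs = np.linspace(-0.5, 0.5, n)
    x, y = np.meshgrid(xs, xs)
    nx_, ny_, nz_ = 2 * x, 2 * y, np.ones_like(x)   # unnormalized surface normal
    norm = np.sqrt(nx_**2 + ny_**2 + nz_**2)
    l = np.asarray(light) / np.linalg.norm(light)
    return np.clip((nx_ * l[0] + ny_ * l[1] + nz_ * l[2]) / norm, 0.0, None)

def shading_flow(I):
    """Unit directions tangent to the isophotes: the gradient rotated 90 deg."""
    gy, gx = np.gradient(I)              # numpy returns (d/drow, d/dcol)
    mag = np.hypot(gx, gy) + 1e-12
    return -gy / mag, gx / mag
```

With the light overhead the isophotes are circles about the brightness peak, so on the positive x-axis the flow runs vertically; stepping along it leaves image brightness unchanged, and it is this constraint that ties flow changes to surface curvatures.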
Fascinatingly, understanding shape-from-shading also illuminates other aspects of boundaries
that we enjoy in art and drawings (see DeCarlo et al. 2003).

Closure and border ownership


If a line forms a closed, or almost closed, figure, we see no
longer merely a line on a homogeneous background, but a
surface figure bounded by a line. This fact is so familiar . . .
(Koffka (1935, p. 150)).
The perspective on good continuation was geometric. In all cases there was a space of local ele-
ments: the column of possible 2D boundary tangents; the column of possible 3D space tangents;
and the column of possible surface patches. Given some initial or boundary conditions, good


Fig. 18.7  The inference of shape from shading information as a problem in perceptual organization. (a)
Locally, shading information could be represented by the response of oriented cells tuned to low spatial
frequencies. (b) For each patch of the shading flow field there is a family of possible surfaces; this family
is a kind of column of possibilities analogous to the orientation column in early visual cortex. It may
correspond to the manner in which shape is represented in higher areas of visual cortex (Pasupathy and
Connor 2002). Selecting from among these families according to boundary and interior conditions reveals
a surface just as selecting orientations reveals a contour. Good continuation now operates at two levels:
shading flow and surface patches.
Reprinted by permission from Macmillan Publishers Ltd: Nature Neuroscience, 5(12), Anitha Pasupathy and Charles
E. Connor, Population coding of shape in area V4, pp. 1332–1338, doi:10.1038/972, Copyright © 2002, Nature
Publishing Group.

continuation could be thought of as selecting from among these possibilities according to linking
constraints. For contours it was co-circularity; for stereo it was pairs of (left, right) pairs of ori-
ented binocular responses; and finally the shading flow and surface patches. Curvature provided
the constraint in each case, dictating how the pieces could be glued together. The whole, in effect,
is built up by assembling the pieces in concert with their neighbors. Things fit together like a jig-
saw puzzle; and the different puzzles fit together at a higher level; it is all beautifully coupled into
one large network.
Border ownership, we assert, is different. It requires feedback from beyond geometric neigh-
bors and includes whole assemblies of cells. Neural action-at-a-distance affects local decisions,
and this action has to do with the global arrangement of boundary fragments; that is, with figural
properties.

We now speculate on which aspects of neural systems could play a fundamental role in the border ownership computation. We discuss two main classes of models: those in which global information is obtained by a propagation process, and those in which global information is conveyed back to local decisions by downward propagation from higher visual areas to lower ones. Both classes raise interesting theoretical questions that can be related to topology. The first class deals with the question of whether a contour is orientable; the second with whether a surface is contained. For reasons developed below, we believe the second class is more appropriate to border ownership computations.
A combinatorial problem arises at the heart of these “topological” computations, and this
demands special consideration. It was already hinted at in Figure 18.3: how can the feedback connections be “wired up” so that the many possible completions all support the same border-ownership neuron consistently? Trying to learn all possible connections seems wasteful, if not
infeasible; that level of detail seems inappropriate. Rather, some type of generalized shape feed-
back seems more suitable, one that provides a figure signal without details.
A conjecture about this general figure problem is the final topic covered. It involves a local field
potential whose value signals certain key properties of distant boundaries. While this breaks the
central paradox of border ownership, it is highly speculative. It is included in the spirit of trying to
start a discussion about whether “standard” approaches to neural computation, such as those just
discussed for good continuation, suffice. Among the questions raised are the following: how are
feedforward, feedback, and lateral connections coordinated? Does neural computation involve only neurons, or should the surrounding substrate be included as well? And finally, given this larger picture, should the classical—or even the extra-classical—version of the receptive field give way to more general computational structures? This is where we confront the levels issue raised
in Figure 18.2.

Network propagation models


Classical models for border ownership are built entirely from networks of neurons. Instead of
good-continuation along contours, tangents can be rotated perpendicularly to become normals.
These point away from curves, instead of along them; we shall choose the sign so that, for a
circle, all normals point toward the center. Now, by drawing the circle on a rubber sheet so that
it can be stretched but not torn, geometry becomes topology. And, no matter how the circle is
distorted, the normals will point inward. Because this holds even for extreme distortions (Figure
18.1c,d), the computational challenge is to determine this inward direction for each normal and
whether they are directed consistently inward. For this it is necessary to travel all the way around
the boundary.
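The geometry of this construction can be sketched directly; note that in the code below the "travel all the way around" step is hidden in the signed-area computation, which is precisely a global quantity (function names are illustrative):

```python
import numpy as np

def inward_normals(poly):
    """Rotate each edge tangent of a simple closed polygon by 90 degrees so
    that every normal points toward the interior. The rotation sign comes
    from the signed area, i.e. from the curve's global orientation."""
    d = np.roll(poly, -1, axis=0) - poly                  # edge tangents
    area2 = np.sum(poly[:, 0] * np.roll(poly[:, 1], -1)
                   - np.roll(poly[:, 0], -1) * poly[:, 1])
    s = 1.0 if area2 > 0 else -1.0                        # +1 if counterclockwise
    n = np.column_stack([-s * d[:, 1], s * d[:, 0]])
    return n / np.linalg.norm(n, axis=1, keepdims=True)
```

However the curve is traversed or smoothly distorted, the recovered normals point inward; the computational question in the text is how a network could decide the sign s without an omniscient global sum.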
On the assumption that the brighter side of an edge indicates the inside, Figure 18.8a,b shows
that groups of neurons could reinforce others with a similar brightness orientation. Such topolog-
ical consistency has been developed for border ownership (Zhaoping 2005; Sakai and Nishimura
2004; Kogo and van Ee, this volume) and, earlier, for cluster analysis (Zucker and Hummel 1979).
A wide range of experiments (Lamme 1995; Zipser et al. 1996; Lee et al. 1998; Zhou et al. 2000; Orban 2008) supports these models, at least insofar as they indicate that border ownership is computed early in visual processing.
Topological consistency has a mathematical connection to the geometric view developed earlier.
The fiber of different possibilities at each position—from boundary tangents to surface patches—
can be thought of as a mathematical space attached to each retinotopic point. This space estab-
lishes coordinates on tangent vectors, for example, so that we can operate with them. Establishing
coordinates requires a basis, in the manner that the x-axis and the y-axis define retinotopic coor-
dinates. They are consistent in the following sense: choose a point on a circle and hold an arrow

Fig. 18.8  Neural models for computing border ownership. (a) Topological indicators or their proxy (e.g.,
the bright side of a boundary) could be propagated along a contour by utilizing long-range horizontal
connections (b) within an area. To establish closure it is necessary to go “all the way around” the
figure, however, which takes too long in neural terms. (c) Feedback integrating boundary information
from higher areas (d) could provide information about the existence of a figure, for example when
a circular arrangement of edge detectors feeds back to a single integrating “grouping” neuron G to
approximately signal the square figure (Craft et al. 2007). (e) To specify the correct grouping neurons is combinatorially difficult for complex shapes; there are many interior “balls” that could
provide feedback. (f) The distance map (here shown in the negative) is the foundation for such shape
descriptions. Peaks (or valleys in this case) are the points most distant from the boundary; their locations
define the skeleton of the shape.
Data from Edward Craft, Hartmut Schütze, Ernst Niebur, and Rüdiger von der Heydt, A Neural Model of Figure–
Ground Organization, Journal of Neurophysiology, 97(6), pp. 4310–4326, DOI: 10.1152/jn.00203.2007, 2007.

pointing in the y-direction. Now, holding tight, after walking around the circle completely the orientation of the arrow would be the same. But doing this on a Möbius strip is different: after walking around once the arrow points in the opposite direction; a second circuit is required to realign it. Formally, topological consistency is the question of whether the local bases for each
fiber can be glued together so that the arrow does not reverse. Clearly, for general boundaries, to
guarantee consistency it is necessary to propagate information all the way around; the circle in the
image is orientable; the Möbius strip is not (Arnold 1962).
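The Möbius comparison can be made concrete by transporting the cross-segment (the "arrow") around the center circle; the parametrization below is an illustrative one:

```python
import numpy as np

def ruling(t, twist=0.5):
    """Direction of the strip's cross-segment after traveling angle t around
    the center circle; twist=0.5 gives a Mobius strip, twist=0 an annulus."""
    return np.array([np.cos(twist * t) * np.cos(t),
                     np.cos(twist * t) * np.sin(t),
                     np.sin(twist * t)])
```

One circuit of the Möbius strip reverses the arrow and a second realigns it, while on the annulus a single circuit returns it unchanged; that reversal is exactly the failure of orientability.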

Although this approach is beautiful in its mathematical simplicity, the global requirement for
orientability makes timing an issue for this class of models. For large figures it could take a long
time for information to propagate all the way around, but the evidence is that there is simply not
enough time for the signal to propagate that far (Craft et al. 2007).
A more plausible class of models involves feedback from higher visual areas (Felleman and Van Essen 1991). Prominent projections exist from V1 to V2, V2 to V4, and V4 to inferotemporal (IT)
cortex, where much of high-level visual shape analysis is thought to reside (Hung et al. 2012).
There is a corresponding feedback projection for each of these forward projections. Since this carries the integrated, higher-level information about shape back to lower areas, it seems a natural component of border ownership models. After all, it is this global, shape-based feedback that could support border ownership (Section 1.2); supporting physiological evidence exists (e.g., Super and Lamme 2007; Self and Roelfsema, this volume) and a number of models have been developed (Craft et al. 2007; Sajda and Finkel 1995; Super and Romeo 2011).
Feedback is important because a 2D shape is an area surrounded by its boundary, and it is this feature of boundaries that could be fed back (Figure 18.8). The logic for accomplishing this is shown in Figure 18.8c,d and is based on the idea that, briefly, shapes can be approximated by circular arrangements of border-selective cells at the right positions. For certain simple shapes it is this arrangement of boundary responses that could be fed back and integrated into a border-ownership response. One way to do this is by a putative “grouping neuron” (Craft et al. 2007), but therein lies the problem: since there are many different circles contained in a general figure (e.g., Figure 18.8e), how should these be integrated together into a single entity? When is a shape simple enough for this to work? Does the distant completion matter (Figure 18.3c,d)?
This is the first part of the combinatorial problem faced by early border ownership models
and is related to certain figural representations. It suggests how shape models could inform the
border ownership computation. To build up a construct that we shall need shortly, imagine that
the shape were made of paper, and that it were ignited at every boundary point simultaneously.
The fire would burn inward and extinguish itself at distinguished points—the skeleton of the
shape (Blum 1973; Kimia et al. 1995). At the root of such algorithms is the distance map, or a
plot of the (shortest) distance to the boundary from any interior point (the negative of the distance map is shown in Figure 18.8f); it gives the time for the fire to reach that point. Maximal
values are the locus of maximal enclosed circles that touch the shape in (at least) two points
and are singularities of its gradient (Siddiqi et al. 2002). The Blum fire propagation solves the
issue of selecting the maximal enclosed circles by physics; we shall shortly suggest how a brain
might do this.
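The grassfire construction maps directly onto a breadth-first computation of the distance map; the sketch below works on a pixel grid and uses 4-neighbor "city-block" distance for simplicity:

```python
import numpy as np
from collections import deque

def distance_map(mask):
    """Grassfire: shortest 4-neighbor distance from each figure pixel to the
    boundary, computed by breadth-first propagation of the 'fire'."""
    ny, nx = mask.shape
    dist = np.full((ny, nx), -1, dtype=int)
    frontier = deque()
    for i in range(ny):                      # ignite every boundary pixel
        for j in range(nx):
            if mask[i, j]:
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ii, jj = i + di, j + dj
                    if not (0 <= ii < ny and 0 <= jj < nx and mask[ii, jj]):
                        dist[i, j] = 1
                        frontier.append((i, j))
                        break
    while frontier:                          # the fire burns inward
        i, j = frontier.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < ny and 0 <= jj < nx and mask[ii, jj] and dist[ii, jj] < 0:
                dist[ii, jj] = dist[i, j] + 1
                frontier.append((ii, jj))
    return dist
```

Local maxima of the map, where fronts collide and the fire extinguishes, trace the Blum skeleton; they are also the centers of the maximal enclosed circles mentioned above.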
The second difficulty faced by border ownership models is that borders need not be closed
topologically. This is illustrated by visual search tasks (Figure 18.9) in which the time to find the
target among a group of distractors is a surrogate for how similar their cognitive representations
might be. Somehow, for broken contours or occluded figures we do not require the exact distance
map but only certain of its key features.
Generative models (Hinton and Ghahramani 1997; Hinton et al. 2006; Rao et al. 2002) pro-
vide for top-down feedback motivated by the question of how neural activity in higher areas
could generate patterns of activity in earlier areas resembling those from the bottom-up stimu-
lus. But the problem with border ownership is combinatorial: many patterns should evoke the
same relevant back projection. One possibility involves a probabilistic interpretation of the skel-
eton (Froyen et al. 2010), although this provides no connection to neurophysiology. We suggest
another approach.

Fig. 18.9  In visual search one seeks an example figure among a field of distractors as rapidly as possible.
(a) Examples of two displays with a figure embedded among distractors. Notice how much easier the
task is for the closed rather than the open figures. This suggests the power of closure. (b) Data showing
that nearly closed figures are effectively the same as closed figures, and that the arrangement of
contour fragments is key to the effect.
Reprinted from Vision Research, 33(7), James Elder and Steven Zucker, The effect of contour closure on the rapid
discrimination of two-dimensional shapes, pp. 981–91, Copyright © 1993. With permission from Elsevier.

Enclosure fields
Once in a conversation, the late Karl Lashley, one of the most
important psychologists of the time, told me quietly: “Mr.
Kohler, the work done by the Gestalt psychologists is surely
most interesting. But sometimes I cannot help feeling that you
have religion up your sleeves.”
(Köhler (1969, p. 48)).
Border ownership is about action-at-a-distance:  how distant edges influence local boundary
decisions. Such phenomena occur not only in neuroscience but in developmental biology more
widely. In this section we build up the idea of an enclosure field, a relaxation of the topological
definition of closure, and show that it carries information about borders at a distance in a manner

that integrates over incompletions and shape variations. In the next section we develop it into a
conceptual circuit model.
To build intuition, we start with what, at first, seems like a completely different situation:  a
growing plant. We ask: how are new veins signaled in a juvenile leaf? Somehow the cell furthest
from existing veins must signal them to send a new shoot in that direction. The hormone auxin
is involved in the process, a simple model for which can be developed along the following lines
(Dimitrov and Zucker 2006; see Figure 18.10a). Imagine that each cell in a rectangular areole (or
patch of tissue surrounded by existing veins) produces auxin at a constant rate, that it diffuses
across cell membranes, and that existing vasculature clears it away. Abstractly this implies a simple
reaction–diffusion equation: the change in concentration at a point is proportional to the amount
that is produced there plus the relative amount that diffuses in and away. A  boundary condi-
tion—zero concentration at the veins—lets us calculate the solution. The steady state equilibrium
(Figure 18.10b) has a “hot spot” in the center and drops off to zero. Note that although it could


Fig. 18.10  Two ways to build the enclosure field concept. The left column is relevant to biology
(interior production) and the other to neuroscience (boundary feedback). The illustration shows
a rectangular figure. (a) Interior production has each “cell” (i.e. pixel) producing, with diffusion
between neighboring cells and zero concentration at the existing veins (boundary). (c) The equilibrium
concentration along the central black line shows a peak at the center, while the magnitude of the
gradient (e) shows a peak at the boundary. This peak gradient is proportional to the distance to the
concentration “hot spot.” (b) Production from existing veins has only the boundary cells (pixels)
producing. Diffusion leads to spreading and catalysis leads to destruction. (d) Notice that now there is
a concentration minimum, but the gradient magnitude (f) still peaks at the boundary in proportion to distance.
380 Zucker

appear that the hot spot developed from overproduction, say due to a lack of nutrient, this specialization
is not necessary. But it is even more important to look at the boundary, where the concentration
gradient (magnitude) is maximal. This is where the signal is most useful, because it is where cells
need to start differentiating from ground type to vein type. Structurally, the main point is this: the
magnitude of the gradient is proportional to the distance to the hot spot (Figure 18.10e).
While the actual biology is more complex (Dimitrov and Zucker 2009a; Dimitrov and Zucker
2009b), action-at-a-distance has been achieved: a signal is available to control vascular growth.
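The steady-state construction can be made concrete with a small numerical sketch. The following one-dimensional version (the grid size and all rate parameters are illustrative assumptions, not quantities from the model above) relaxes the discrete equation with uniform interior production and zero concentration at the two "veins"; the hot spot appears at the center, and the boundary gradient comes out proportional to the distance to it.

```python
# One-dimensional sketch of the interior-production model: each interior
# "cell" produces auxin at rate p, it diffuses with coefficient D, and the
# boundary (the existing veins at x = 0 and x = 1) clears it to zero.
# The steady state solves D*c'' + p = 0, i.e. c(x) = (p/2D) * x * (1 - x),
# with its hot spot at x = 1/2 and a boundary gradient of (p/2D) * 1.

def steady_state(n_cells, production=1.0, diffusion=1.0, n_iter=20000):
    """Jacobi relaxation of D*(c[i-1] - 2c[i] + c[i+1])/h^2 + p = 0."""
    h = 1.0 / (n_cells + 1)          # grid spacing on the unit interval
    c = [0.0] * (n_cells + 2)        # interior cells plus zero boundaries
    for _ in range(n_iter):
        new = c[:]
        for i in range(1, n_cells + 1):
            new[i] = 0.5 * (c[i - 1] + c[i + 1]
                            + production * h * h / diffusion)
        c = new
    return c, h

c, h = steady_state(39)
peak = max(range(len(c)), key=lambda i: c[i])
print(peak * h)               # hot spot at the center, x = 0.5
print((c[1] - c[0]) / h)      # boundary gradient, about 0.49 on this grid
```

On this grid the one-sided boundary gradient approaches the continuous value 0.5 as the mesh is refined; halving the domain halves it, which is exactly the distance information described above.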
There is a mathematical dual to this result that amounts to letting the system run in the oppo-
site direction. Instead of having the tissue produce auxin and the veins clear it, auxin could be
produced by the existing veins and could then diffuse inwards. Adding a destruction term to the
equation (so that the change in concentration at a point is proportional to the amount that dif-
fuses in minus the amount catabolized away) prevents the concentration from increasing beyond
bound (Dimitrov and Zucker 2009a; Dimitrov and Zucker 2009b) but the logic remains the
same: the value of the auxin field contains information about the distance map. This is precisely
what is required for border ownership. See Figure 18.10 (right column).
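The dual construction has a closed form in one dimension. In the following sketch (the decay constant and domain lengths are illustrative assumptions), production is clamped at the two boundaries and destruction is uniform inside; the steady state of D c″ = k c is a hyperbolic-cosine profile with a minimum at the center, and in the weak-destruction regime the boundary gradient scales with the half-width of the domain, the distance to that minimum.

```python
import math

def boundary_gradient(L, c0=1.0, kappa=0.02):
    """Boundary gradient of the steady state of D*c'' = k*c on [0, L]
    with c(0) = c(L) = c0 (production at the veins, destruction inside).
    The solution is c(x) = c0 * cosh(kappa*(x - L/2)) / cosh(kappa*L/2),
    kappa = sqrt(k/D); its gradient magnitude at x = 0 is returned."""
    return c0 * kappa * math.tanh(kappa * L / 2)

# In the weak-destruction regime tanh(kappa*L/2) ~ kappa*L/2, so the
# boundary gradient is ~ c0 * kappa**2 * (L/2): proportional to the
# distance from the boundary to the concentration minimum at the center.
g1 = boundary_gradient(10.0)
g2 = boundary_gradient(20.0)
print(g2 / g1)    # close to 2, the ratio of the two half-widths
```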
It is this dual result that is relevant to neurobiology because there is a different way to pro-
duce it than by hormones. To appreciate it, consider the feedback from higher areas about border
segments (and possibly their arrangement) as analogous to the existing vasculature: instead of
signaling the areole’s boundary, as veins could in plants, the feedback signals information about
the figural boundary. What is relevant for border ownership is not that there is a hotspot of auxin
at the center, but rather that there exists a “center” to some figure plus the side on which it lies.
Certain properties of this enclosure field are illustrated in Figure 18.11. As we describe next, the
relevant signal could be in the form of a local field potential instead of auxin. But the mathematics
remains qualitatively the same.

Feedback via LFP: global influences on local decisions


To realize the enclosure field concept and how it might influence the border ownership computa-
tion, we return to the different levels illustrated in Figure 18.2. Neurons are situated in a conductive
substrate, not in isolation, and there are many different sources of transmembrane ionic currents.
The result is an environmental local field potential (LFP) that contains information at many temporal
scales (Buzsáki et al. 2012). Some of this reflects local spiking activity about orientation (Katzner et al.
2009) and contrast (Henrie and Shapley 2005), although others have shown a richer connection to
the extra-classical components of a neuron’s discharge field (Chavane et al. 2000). Given the impor-
tance of membrane potential for spiking activity, the LFP could play a role in neural computation.
We suggest a way to make this role concrete: that the LFP carries information like that in the
enclosure field (Zucker 2012). Although there are differences between the calculations discussed
in the previous section and the local field, in particular that the enclosure field reaction–diffusion
equation is related to the Gaussian kernel while the LFP is Poisson, these differences are technical:
the previous calculation would hold if the extra-neuronal substrate were a linear resistive medium.
We shall therefore work with the concept at this level of idealization.
To review, the criteria that must be met for the border ownership computation include the
following:
1. Border ownership involves global to local feedback (Section 1.2), but
•  feedback projections are patchy (Section 2.1); and
•  border ownership breaks down if the figure is too complex (Figure 18.1).

[Figure 18.11: panels (a)–(c) boundary segments of increasing length; (d) figures like those used in the search task; (e) the resulting enclosure field.]

Fig. 18.11  Illustrations of the enclosure field. (a,b,c) Increasing segment length shows the field as
more of the “enclosing boundary” is available. It increases with convexity and integrates over gaps.
(d) Figures like those used in the search task. (e) The enclosure field. Notice how the target emerges
in concentration whether or not the boundary is complete.

2. The global information derives from figural properties, but:


•  figural boundaries need not be complete; only suggestive (Figure 18.9), and
•  different figural completions should be equivalent (Figure 18.3).
3. Neural circuits must integrate the feedback to the boundary signal in a manner that
•  combines the bottom-up, top-down and (perhaps) lateral signals; and
•  the system must be able to learn to integrate the feedback signal.
The enclosure field construct clearly satisfies criteria 1 and 2. It is driven by boundary segments,
so that when they become too complex the field will break down, and the diffusion term clearly
integrates over boundary incompletions and geometric variations. So we turn now to item 3.

Figure 18.12 illustrates how an enclosure field model could work. The LFP is built up from cur-
rents that derive from both intrinsic neuronal activity and feedback connections. Most importantly,
there is accumulating evidence that physiological fluctuations in the LFP can control when neurons
spike (Frohlich and McCormick 2010); the composite is called a phase-of-firing code (Montemurro
et al. 2008; Panzeri et al. 2010). Although in vivo research in visual cortex is lacking, it is known that
such codes can coordinate activity in different brain areas (e.g., Brockmann et al. 2011); we assert
that they provide the coupling between the local field and the border-selective neurons.
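A toy simulation makes the phase-of-firing idea concrete. All numbers below (oscillation frequency, firing rate, coupling depth) are illustrative assumptions rather than measured values; the point is only that a neuron whose firing probability is modulated by the phase of an ongoing field oscillation leaves a decodable phase signature in its spike train.

```python
import math
import random

random.seed(0)
lfp_freq = 8.0                      # Hz; an assumed slow LFP fluctuation
dt, duration = 0.001, 60.0          # 1 ms time steps, 60 s of activity
base_rate, depth = 20.0, 0.9        # mean rate (sp/s) and coupling depth

spike_phases = []
for step in range(int(duration / dt)):
    phase = (2 * math.pi * lfp_freq * step * dt) % (2 * math.pi)
    # Firing is most likely when the LFP is "depolarized" (phase near 0).
    rate = base_rate * (1 + depth * math.cos(phase))
    if random.random() < rate * dt:
        spike_phases.append(phase)

# The mean resultant vector of the spike phases recovers the coupling:
# its angle is the preferred phase, its length the coupling strength.
mx = sum(math.cos(p) for p in spike_phases) / len(spike_phases)
my = sum(math.sin(p) for p in spike_phases) / len(spike_phases)
print(round(math.atan2(my, mx), 2))   # preferred phase, near 0
print(round(math.hypot(mx, my), 2))   # resultant length, near depth/2
```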
Finally, it must be stressed that there are other cell types in the neuronal surround, primarily
glia, and we here focus on one of these, the astrocytes (Figure 18.12d). It has recently been con-
jectured that glia could play a role in neuronal function (Araque and Navarrete 2010). Although
astrocytes are non-spiking, they do have channels and glial transmitters (e.g., glutamate), and they
provide a gap-junction-coupled tessellation of extra-neuronal space (Nedergaard et al. 2003). And they
play a role in synaptic development (Araque et al. 1999). In summary, it seems increasingly likely
that glia could be playing a significant role in controlling the LFP and its neuronal interaction,
and in integrating it with neuronal activity. The enclosure field model suggests a concrete way in
which they could be involved.
The model is clearly radical. If correct (even in part) it suggests that neural modeling must
extend beyond neurons to include the substrate in which neurons are embedded plus other cell
types. Synaptic interaction must extend beyond classical second order: local field potentials mat-
ter as well as spike timing and synaptic arrangement.

[Figure 18.12: panels (a)–(d), showing neurons Vi and Vj within cortical layers I, II–III, IV, and V.]

Fig. 18.12  The enclosure field model for border ownership involves feedback from higher areas and
integration via local field potentials. (a) The LFP is shown (gray) emanating from neuronal processes; it
also derives (b) from feedback projections. The composite field controlling border ownership derives
from their superposition. (c) The LFP can control neuronal spiking activity. Shown are action potentials
on top of local field fluctuations. This particular neuron prefers to fire when the LFP is depolarized. (d)
Astrocytes tessellate the volume surrounding large numbers of neurons. Each blob in the tessellation
suggests a single astrocyte domain.
Reprinted from Trends in Neurosciences, 26(10), Maiken Nedergaard, Bruce Ransom, and Steven A. Goldman,
New roles for astrocytes: Redefining the functional architecture of the brain, pp. 523–30, Copyright © 2003, with
permission from Elsevier.

The implications of ascribing an information-processing role to glia are wide ranging but cannot
be ignored. First, in a striking experiment, human glia have been shown to greatly enhance learning
and synaptic plasticity in adult mice (Han et al. 2013). Second, glia may play a role in disease. It is
known, for example, that there is an increase in glia among autistic individuals. Since this holds
even in visual cortex (Tetreault et al. 2012), perhaps it explains the perceptual organization differ-
ences that are expressed in autism (Simmons et al. 2009).
Finally, the consideration of border ownership as part of what causes a neuron’s activity greatly
complicates the notion of receptive field. As described above (Figure 18.4b), receptive fields are
normally characterized as, e.g., Gabor patches with even/odd symmetry, plus an orientation and a
scale. When the border ownership component is included, the locus of retinotopic positions that
can influence firing becomes very large. Receptive fields in early vision no longer have the crisp
interpretation of a Gabor patch and can be a very complicated function of the stimulus. Receptive
fields become a network property, in short, and not a convolution filter.

Conclusions
A science . . . gains in value and significance not by the number
of individual facts it collects but by the generality and power of
its theories . . .
(Koffka 1935, p. 9).
Border ownership in particular, and Gestalt phenomena in general, have provided a long-term
challenge to visual modelers. While the phenomena are easy to demonstrate, explaining them has
required an integration of many different theoretical constructs. Here we tried to lay out a logical
basis for this, by contrasting the geometric ideas underlying borders, stereo, and shading analysis
on the way to surface inferences against the topological ideas underlying border ownership. The
chapter took a neurogeometric tone and, in the end, we explored both traditional style models
of neuron-to-neuron computation plus extensions to them. The topological challenge of border
ownership revealed an association to field-theoretic models, which in turn broadened the scope
of modeling to include local field potentials and glia as well as neurons. The end was a model
enlarged drastically in scope. The chapter opened with a brief review of the receptive field concept
in neurophysiology and closed with a radically enlarged view from Gestalt psychology. While this
is certainly not the last word in border ownership, we hope it is indicative of the types of intel-
lectual debate that modeling must face.

Acknowledgements
Supported by AFOSR, ARO, NIH and NSF. I  thank J.  Wagemans, N.  Kogo, and reviewers for
comments on the manuscript; and B. Kunsberg, D. Holtmann-Rice, M. Lawlor, and P. Dimitrov
for discussion.

References
Adini, Y., Sagi, D., and Tsodyks, M. (1997). Excitatory-inhibitory network in the visual
cortex: Psychophysical evidence. Proceedings of the National Academy of Sciences (USA) 94: 10426–31.
Angelucci, A., Levitt, J. B., Walton, E. J. S., Hupe, J.-M., Bullier, J., and Lund, J. S. (2002). Circuits for local
and global signal integration in primary visual cortex. The Journal of Neuroscience 22(19): 8633–46.
Araque, A. and Navarrete, M. (2010). Glial cells in neuronal network function. Philosophical Transactions
of the Royal Society, Series B 365: 2375–81.
Araque, A., Parpura, V., Sanzgiri, R., and Haydon, P. (1999). Tripartite synapses: glia, the unacknowledged
partner. Trends in Neurosciences 22: 208–15.
Arnold, B. H. (1962). Intuitive concepts in elementary topology. Englewood Cliffs: Prentice Hall.
Barlow, H. B. (1972). Single units and sensation: A neuron doctrine for perceptual psychology. Perception
1(4): 371–94.
Ben-Shahar, O. and Zucker, S. W. (2003). Geometrical computations explain projection patterns of
long-range horizontal connections in visual cortex. Neural Computation 16: 445–76.
Blum, H. (1973). Biological shape and visual science (Part I). Journal of Theoretical Biology 38: 205–87.
Bonneh, Y. and Sagi, D. (1998). Effects of spatial configuration on contrast detection. Vision Research
38: 3541–53.
Bosking, W., Zhang, Y., Schofield, B., and Fitzpatrick, D. (1997). Orientation selectivity and the
arrangement of horizontal connections in the tree shrew striate cortex. The Journal of Neuroscience
17(6): 2112–27.
Breton, P. and Zucker, S. (1996). Shadows and shading flow fields. In Proceedings of Computer Vision and
Pattern Recognition (CVPR), pp. 782–789.
Brockmann, M., Pöschel, B., Cichon, N., and Hanganu-Opatz, I. (2011). Coupled oscillations mediate
directed interactions between prefrontal cortex and hippocampus of the neonatal rat. Neuron
71(2): 332–47.
Buzsáki, G., Anastassiou, C. A., and Koch, C. (2012). The origin of extracellular fields and currents: EEG,
ECoG, LFP and spikes. Nature Reviews Neuroscience 13: 407–20.
Casagrande, V., and Kaas, J. (1994). The afferent, intrinsic, and efferent connections of primary visual
cortex in primates. In: A. Peters, and K. Rockland (eds.) Cerebral cortex: Primary visual cortex in
primates, Vol. 10, pp. 201–259. New York: Plenum Press.
Chavane, F., Monier, C., Bringuier, V., Baudot, P., Borg-Graham, L., Lorenceau, J., and Fregnac, Y. (2000).
The visual cortical association field: A Gestalt concept or a psychophysiological entity? Journal of
Physiology (Paris) 94: 333–42.
Craft, E., Schutze, H., Niebur, E., and von der Heydt, R. (2007). A neural model of figure-ground
organization. Journal of Neurophysiology 97(6): 4310–26.
DeCarlo, D., Finkelstein, A., Rusinkiewicz, S., and Santella, A. (2003). Suggestive contours for conveying
shape. ACM Transactions on Graphics 22(3): 848–55.
Dimitrov, P. and Zucker, S. W. (2006). A constant production hypothesis that predicts the dynamics of leaf
venation patterning. Proceedings of the National Academy of Sciences (USA) 103(24): 9363–8.
Dimitrov, P. and Zucker, S. W. (2009a). Distance maps and plant development #1: Uniform production and
proportional destruction. arXiv.org, arXiv:0905.4446v1 [q-bio.QM], 1–39.
Dimitrov, P. and Zucker, S. W. (2009b). Distance maps and plant development #2: Facilitated transport and
uniform gradient. arXiv.org, arXiv:0905.4662v1 [q-bio.QM](24), 1–46.
Douglas, R. J. and Martin, K. A. C. (2004). Neuronal circuits of the neocortex. Annual Review of
Neuroscience 27: 419–51.
Dubuc, B. and Zucker, S. W. (2001). Complexity, confusion, and perceptual grouping. Part II. Mapping
complexity. International Journal of Computer Vision 42(1/2): 83–115.
Elder, J. and Zucker, S. W. (1993). Contour closure and the perception of shape. Vision Research
33(7): 981–91.
Felleman, D. J. and Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex.
Cerebral Cortex 1: 1–47.
Field, D., Hayes, A., and Hess, R. (1993). Contour integration by the human visual system: evidence for a
local association field. Vision Research 33: 173–93.
Frohlich, F. and McCormick, D. (2010). Endogenous electric fields may guide neocortical network activity.
Neuron 67: 129–43.
Froyen, V., Feldman, J., and Singh, M. (2010). A Bayesian framework for figure-ground interpretation.
In: J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta (eds.) Advances in Neural
Information Processing Systems, Vol. 23, pp. 631–9. Available online at: http://papers.nips.cc/book/
advances-in-neural-information-processing-systems-23-2010
Gilbert, C. and Wiesel, T. (1983). Clustered intrinsic connections in cat visual cortex. The Journal of
Neuroscience 3(5): 1116–33.
Gordon, J. and Shapley, R. (1985). Nonlinearity in the perception of form. Perception & Psychophysics
37: 84–8.
Han, X., Chen, M., Wang, F., Windrem, M., Wang, S., Shanz, S. et al. (2013). Forebrain engraftment by
human glial progenitor cells enhances synaptic plasticity and learning in adult mice. Cell Stem Cell
12(3): 342–53.
Hartline, H. K. (1938). The response of single optic nerve fibers of the vertebrate eye to illumination of the
retina. American Journal of Physiology 121: 400–15.
Henrie, J. and Shapley, R. (2005). LFP power spectra in V1 cortex: The graded effect of stimulus contrast.
Journal of Neurophysiology 94(1): 479–90.
Hinkle, D. A. and Connor, C. E. (2002). Three-dimensional orientation tuning in macaque area V4. Nature
Neuroscience 5(7): 665–70.
Hinton, G. and Ghahramani, Z. (1997). Generative models for discovering sparse distributed
representations. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences
352: 1177–90.
Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural
Computation 18: 1527–54.
Horn, B. K. P. and Brooks, M. J. (eds.) (1989). Shape from shading. Cambridge, MA: MIT Press.
Huang, P.-C., Chen, C.-C., and Tyler, C. W. (2012). Collinear facilitation over space and depth. Journal of
Vision 12(2): 1–9.
Hubel, D. H. and Livingstone, M. S. (1987). Segregation of form, color, and stereopsis in primate area 18.
The Journal of Neuroscience 7(11): 3378–415.
Hubel, D. H. and Wiesel, T. N. (1977). Functional architecture of macaque monkey visual cortex.
Proceedings of the Royal Society of London, Series B 198: 1–59.
Hubel, D. H. and Wiesel, T. N. (1979). Brain mechanisms of vision. Scientific American 241: 150–62.
Hung, C.-C., Carlson, E. T., and Connor, C. E. (2012). Medial axis shape coding in macaque
inferotemporal cortex. Neuron 74(6): 1099–113.
Hunt, J. J., Mattingley, J. B., and Goodhill, G. J. (2012). Randomly oriented edge arrangements dominate
naturalistic arrangements in binocular rivalry. Vision Research 64: 49–55.
Kanizsa, G. (1979). Organization in vision: Essays on Gestalt perception. New York: Praeger.
Kapadia, M., Ito, M., Gilbert, C., and Westheimer, G. (1995). Improvement in visual sensitivity by
changes in local context: Parallel studies in human observers and in V1 of alert monkeys. Neuron
15: 843–56.
Katzner, S., Nauhaus, I., Benucci, A., Bonin, V., Ringach, D., and Carandini, M. (2009). Local origin of
field potentials in visual cortex. Neuron 61: 35–41.
Kimia, B., Tannenbaum, A., and Zucker, S. W. (1995). Shapes, shocks, and deformations. Part I. The
components of two-dimensional space and the reaction-diffusion space. International Journal of
Computer Vision 15: 189–224.
Koenderink, J. J., van Doorn, A., and Wagemans, J. (2013). SFS? Not likely! i-Perception 4:
299–302.
Koffka, K. (1935). Principles of Gestalt psychology. New York: Harcourt, Brace and World.
Köhler, W. (1969). The task of Gestalt psychology. Princeton: Princeton University Press.
Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. Journal of
Neurophysiology 16(1): 37–68.
Kunsberg, B. and Zucker, S. W. (2013). Characterizing ambiguity in light source invariant shape from
shading. Available at: <http://arxiv.org/abs/1306.5480>.
Kunsberg, B. and Zucker, S. (2014). How shading constrains surface patches without knowledge of light
sources. SIAM Journal on Imaging Sciences 7(2): 641–88.
Lamme, V. (1995). The neurophysiology of figure ground segregation in primary visual cortex. The Journal
of Neuroscience 15: 1605–15.
Langer, M. and Zucker, S. W. (1997). Casting light on illumination: A computational model and
dimensional analysis of sources. Computer Vision and Image Understanding 65(2): 322–35.
Lawlor, M., Holtmann-Rice, D., Huggins, P., Ben-Shahar, O., and Zucker, S. W. (2009). Boundaries,
shading, and border ownership: A cusp at their interaction. Journal of Physiology (Paris) 103: 18–36.
Lee, T. S., Mumford, D., Romeo, R., and Lamme, V. A. F. (1998). The role of the primary visual cortex in
higher level vision. Vision Research 38: 2429–54.
Li, G. and Zucker, S. W. (2006). Contour-based binocular stereo: Inferencing coherence in stereo tangent
space. International Journal of Computer Vision 69(1): 59–75.
Li, G., and Zucker, S. W. (2010). Differential geometric inference in surface stereo. IEEE Transactions on
Pattern Analysis and Machine Intelligence 32(1): 72–86.
Marr, D. (1982). Vision. San Francisco: W.H. Freeman.
Metzger, W. (2006). Laws of seeing. Cambridge, MA: MIT Press.
Miller, K. D. (2003). Understanding layer 4 of the cortical circuit: A model based on cat V1. Cerebral Cortex
13: 73–82.
Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge,
MA: MIT Press.
Montemurro, M. A., Rasch, M. J., Murayama, Y., Logothetis, N. K., and Panzeri, S. (2008). Phase-of-firing
coding of natural visual stimuli in primary visual cortex. Current Biology 18(5): 375–80.
Muir, D. R., Costa, N. M. A. D., Girardin, C. C., Naaman, S., Omer, D. B., Ruesch, E., Grinvald, A., and
Douglas, R. J. (2011). Embedding of cortical representations by the superficial patch system. Cerebral
Cortex 21(10): 2244–60.
Nakayama, K. and Shimojo, S. (1992). Experiencing and perceiving visual surfaces. Science
257(5075): 1357–63.
Nedergaard, M., Ransom, B., and Goldman, S. (2003). New roles for astrocytes: Redefining the functional
architecture of the brain. Trends in Neurosciences 26(10): 523–30.
Orban, G. (2008). Higher order visual processing in macaque extrastriate cortex. Physiology Reviews
88(1): 59–89.
Panzeri, S., Brunel, N., Logothetis, N. K., and Kayser, C. (2010). Sensory neural codes using multiplexed
temporal scales. Trends in Neurosciences 33(3): 111–20.
Parent, P. and Zucker, S. W. (1989). Trace inference, curvature consistency and curve detection. IEEE
Transactions on Pattern Analysis and Machine Intelligence 11(8): 823–39.
Pasupathy, A. and Connor, C. (2002). Population coding of shape in area V4. Nature Neuroscience
5(12): 1332–8.
Pind, J. L. (2012). Figure and ground at 100. The Psychologist 25(1): 90–1.
Poggio, G. F. and Fisher, B. (1977). Binocular interaction and depth sensitivity of striate and pre-striate
cortical neurons of the behaving rhesus monkey. Journal of Neurophysiology 40(1): 392–405.
Rao, R., Olshausen, B. and Lewicki, M. (Eds.) (2002). Probabilistic models of the brain: Perception and
neural function. Cambridge, MA: MIT Press.
Ratliff, F. (1965). Mach bands: Quantitative studies on neural networks in the retina. San
Francisco: Holden-Day.
Rockland, K. and Lund, J. (1982). Widespread periodic intrinsic connections in the tree shrew visual
cortex. Science 215: 1532–4.
Rockland, K. and Virga, A. (1989). Terminal arbors of individual feedback axons projecting from area
V2 to V1 in the macaque monkey: a study using immunohistochemistry of anterogradely transported
phaseolus vulgaris-leucoagglutinin. Journal of Comparative Neurology 285: 54–72.
Roe, A. W. and Ts’o, D. Y. (1997). The functional architecture of area V2 in the macaque monkey.
In: K. Rockland, J. Kaas and A. Peters (eds.) Extrastriate cortex in primates, Vol. 12, pp. 295–333.
New York: Plenum.
Rubin, E. (1915). Synsoplevede Figurer: Studier i psykologisk Analyse. Første Del [Visually experienced
figures: Studies in psychological analysis. Part one]. Copenhagen: Gyldendalske Boghandel, Nordisk Forlag.
Sajda, P. and Finkel, L. (1995). Intermediate-level visual representations and the construction of surface
perception. Journal of Cognitive Neuroscience 7: 267–91.
Sakai, K. and Nishimura, H. (2004). Determination of border ownership based on the surround context of
contrast. Neurocomputing 58: 843–8.
Salin, P. A. and Bullier, J. (1995). Corticocortical connections in the visual system: structure and function.
Physiological Reviews 75: 107–54.
Sarti, A., Citti, G., and Petitot, J. (2008). The symplectic structure of the primary visual cortex. Biological
Cybernetics 98(1): 33–48.
Sherrington, C. S. (1906). The integrative action of the nervous system. New York: C. Scribner and Sons.
Shmuel, A., Korman, M., Sterkin, A., Harel, M., Ullman, S., Malach, R., and Grinvald, A. (2005).
Retinotopic axis specificity and selective clustering of feedback projections from v2 to v1 in the owl
monkey. The Journal of Neuroscience 25: 2117–31.
Siddiqi, K., Bouix, S., Tannenbaum, A. R., and Zucker, S. W. (2002). Hamilton-Jacobi skeletons.
International Journal of Computer Vision 48: 215–31.
Simmons, D. R., Robertson, A. E., McKay, L. S., Toal, E., McAleer, P., and Pollick, F. E. (2009). Vision in
autism spectrum disorders. Vision Research 49: 2705–39.
Sincich, L. and Horton, J. (2002). Divided by cytochrome oxidase: a map of the projections from V1 to V2
in macaques. Science 295: 1734–7.
Sompolinsky, H. and Shapley, R. (1997). New perspectives on the mechanisms for orientation selectivity.
Current Opinion in Neurobiology 7: 514–22.
Super, H. and Lamme, V. A. (2007). Altered figure-ground perception in monkeys with an extra-striate
lesion. Neuropsychologia 45(14): 3329–34.
Super, H. and Romeo, A. (2011). Feedback enhances feedforward figure-ground segmentation by changing
firing mode. PLoS ONE 6(6): e21641.
Tetreault, N. A., Hakeem, A. Y., Jiang, S., Williams, B. A., Allman, E., Wold, B. J., and Allman, J. M.
(2012). Microglia in the cerebral cortex in autism. Journal of Autism and Developmental Disorders
42(12): 2569–84.
Ullman, S., Vidal-Naquet, M., and Sali, E. (2002). Visual features of intermediate complexity and their use
in classification. Nature Neuroscience 5: 682–7.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt, R.
(2012). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground
organization. Psychological Bulletin 138(6): 1172–217.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt (Part II). Psychologische Forschung 4: 301–50.
Zeki, S. and Shipp, S. (1988). The functional logic of cortical connections. Nature 335: 311–17.
Zhaoping, L. (2005). Border ownership from intracortical interactions in visual area V2. Neuron
47: 143–53.
Zhou, H., Friedman, H., and von der Heydt, R. (2000). Coding of border ownership in monkey visual
cortex. The Journal of Neuroscience 20: 6594–611.
Zipser, K., Lamme, V. A. F., and Schiller, P. H. (1996). Contextual modulation in primary visual cortex. The
Journal of Neuroscience 16(22): 7376–89.
Zucker, S. W. (2012). Local field potentials and border ownership: a conjecture about computation in visual
cortex. Journal of Physiology (Paris) 106: 297–315.
Zucker, S. W., Dobbins, A., and Iverson, L. (1989). Two stages of curve detection suggest two styles of
visual computation. Neural Computation 1: 68–81.
Zucker, S. W. and Hummel, R. A. (1979). Toward a low-level description of dot clusters: labeling edge,
interior, and noise points. Computer Graphics and Image Processing 9: 213–33.
Section 5

Surface and color perception


Chapter 19

Perceptual organization in lightness


Alan Gilchrist

Lightness
Lightness refers to the perceived white/gray/black dimension of a surface. The physical property
that corresponds to lightness is reflectance, that is, the percentage of light a surface reflects. White
surfaces reflect about 90% of the light they receive while black surfaces reflect only about 3%.
Thus, lightness refers to the perception of a concrete property of an object. (Lightness should not
be confused with brightness, which concerns perception of the raw intensity of light reflected by
the object, which is not a property of the object itself.)

Early Structure-blind Conceptions


The indispensable role of perceptual organization for a theory of lightness, as with other percep-
tual qualities, was not recognized initially. This is not surprising. If white reflects more light to
the eye than black, and if the retina contains photoreceptors that respond in proportion to the
intensity of light striking them, what is the problem? Early theories of perception, as seen in the
doctrine of sensations, assumed that the perceptual experience at any point in the visual field cor-
responds to the local stimulation at that point. This is the quintessential example of what Gilchrist
(2006) has called a structure-blind approach. The Gestaltists criticized this kind of reductionist
assumption. They labeled it the constancy hypothesis because it assumed a constant relationship
between local stimulation and local percept. ‘In its consistent form,’ Koffka wrote (1935, p. 96),
‘the constancy hypothesis treats of sensations, each aroused by the local stimulation of one retinal
point. Thus the constancy hypothesis maintains that the result of a local stimulation is constant,
provided that the physiological condition of the stimulated receptor is constant (e.g., adaptation).’
Unfortunately, the term constancy hypothesis has become confusing because, in the intervening
years, the term constancy has come to be used in an almost opposite way. This linguistic confu-
sion is unfortunate because the assumption of a one-to-one relationship between stimulation and
experience, while wrong, is an important concept that is badly in need of a name. For example, it
might be called the doctrine of local determination.
Even though no one would defend such a reductionist assumption today, Gilchrist (1994, p. 17)
argues that it continues to lurk just beneath the surface, especially in lightness perception, where
he has called it the photometer metaphor.

The Ambiguity of Luminance


The photometer metaphor fails because any shade of gray can reflect any intensity of light (called
luminance). This state of affairs arises from the fact that the luminance reaching the eye from a
surface is a joint product of both the reflectance of the surface and the intensity of illumination
striking the surface. For example, a black surface in sunlight can easily reflect more light than a
white surface in shadow. Indeed, any luminance can come from any shade of gray. This implies
that the light reflected from a surface to your eye, by itself, cannot reveal the reflectance of that
surface. In principle lightness can only be determined using the surrounding context. The exact
role of context is the focus of many theoretical disputes, but the indispensable role of perceptual
structure cannot be doubted.
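The point can be made with two lines of arithmetic. In this sketch the reflectances (90 per cent and 3 per cent) are the approximate values given above, while the illumination levels are invented purely for illustration.

```python
# Luminance reaching the eye is the product of surface reflectance and
# incident illumination (units here are arbitrary and illustrative).
def luminance(reflectance, illumination):
    return reflectance * illumination

black_in_sunlight = luminance(0.03, 10000.0)   # black paper in direct sun
white_in_shadow = luminance(0.90, 100.0)       # white paper in deep shade
print(black_in_sunlight)   # 300.0
print(white_in_shadow)     # 90.0: the black surface sends more light
```

Since the same luminance of 300 could equally have come from a white surface under an illumination of about 333, the luminance value alone cannot fix the shade of gray.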
The central problem of lightness is that of lightness constancy. The perceived lightness of an
object remains approximately (but not entirely) constant even when the illumination level changes.
In view of the spoiling role played by variations in illumination, von Helmholtz (1866/1924) logi-
cally suggested that lightness could be recovered by dividing the luminance of a surface by an
unconscious estimate of its incident illumination, but without a clear idea of how illumination can
be estimated, his suggestion remains little more than a promissory note.

The Appeal to Relative Luminance


A more concrete approach is contained in the intuitive idea that lightness depends on relative,
not absolute luminance. The dependence of lightness on relative luminance is a fundamental fact.
Indeed, the perception of a surface in the first place requires the simultaneous, adjacent presence
of at least two luminance values. If you stand in the center of a large sphere of homogeneous pig-
ment, you cannot even see the surface. You experience only an infinite fog (Gelb, 1932; Metzger,
1930). The perception of a surface requires at least one edge or luminance boundary.
The physical definition of reflectance involves a comparison between the amount of light
incident upon a surface and the amount the surface reflects. Thus, it is not surprising that
von Helmholtz, as a physicist, assumed that the visual system must estimate the illumination
level, and compare this with the luminance of a surface. However, there is a very different way
to compute something like reflectance, and that is to compare the amount of light reflected by
one surface with the amount reflected by neighboring surfaces. The Helmholtzian approach
is very demanding computationally. It has never been clear how the illumination level could
be estimated. Comparing the luminance values of neighboring surfaces, however, seems
much more tractable.

Wallach Experiment
In 1948, Hans Wallach published an elegant experiment that soon became a classic. He presented
a disk of homogeneous luminance surrounded by a fat annulus also of homogeneous luminance.
Holding the luminance of the disk constant, he showed that it could, nevertheless, be made to
appear as any shade of gray between black and white simply by varying the luminance of the
annulus. He then presented observers with two disk/annulus displays and asked them to adjust
the luminance of one disk to make it appear as the same shade of gray as the other disk. The set-
tings made by the observers showed that the disks appear as equal shades of gray not when they
have the same luminance value, but when the disk/annulus luminance ratios are equal. This find-
ing led Wallach to propose the simple idea that the lightness of an object is a direct function of the
ratio between the luminance of the object and the luminance of its adjacent region.
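Wallach's finding can be sketched in a few lines of code; the luminance values and the matching tolerance are illustrative assumptions:

```python
# Sketch of Wallach's ratio principle: two disks appear the same shade of gray
# when their disk/annulus luminance ratios are equal, not when their absolute
# luminances are equal. (Values and tolerance are illustrative.)
def matches(disk1, annulus1, disk2, annulus2, tol=1e-9):
    return abs(disk1 / annulus1 - disk2 / annulus2) < tol

# Equal disk luminances, different annuli: the disks do not match.
assert not matches(50, 100, 50, 200)

# Different absolute luminances, equal 1:2 ratios: the disks match.
assert matches(50, 100, 120, 240)
```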

Explains constancy
Wallach’s paper was celebrated for several reasons. First, when the illumination level changes,
although the luminance of an object changes, the luminance ratio between the object and its
Perceptual Organization in Lightness 393

immediate background does not. Wallach noted that this is exactly what would be expected if
lightness were a function of the object/surround luminance ratio.
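The algebra behind this observation is a one-liner, sketched here with invented numbers (the illumination factors are powers of two so the floating-point check is exact):

```python
# A change of illumination scales every luminance by the same factor k, so the
# object/surround ratio, and hence predicted lightness, is unchanged.
def ratio(obj, surround):
    return obj / surround

obj, surround = 12.0, 48.0
for k in (0.5, 1.0, 32.0):   # dim, unchanged, and bright illumination
    assert ratio(obj * k, surround * k) == ratio(obj, surround)
```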

Explains simultaneous contrast


Secondly, the ratio idea seemed to explain another lightness puzzle, called simultaneous lightness
contrast. In this classic illusion, a gray square centered on a black background appears somewhat
lighter than an identical gray square on an adjacent white background. Wallach argued that
this is because the two squares have different luminance ratios.

Supporting evidence
Wallach’s results were consistent with Weber’s law, and with a great deal of evidence, from various
senses, of a logarithmic relationship between physical energy and perceived magnitude. Later
findings from stabilized images and physiological work implied that the luminance ratio at each
edge is just what is encoded at the retina (Barlow and Levick, 1969; Troy and Enroth-Cugell, 1993;
Whittle and Challands, 1969).

Consistent with lateral inhibition


Finally, Wallach’s ratio theory seemed to mesh perfectly with a then-recently discovered physiological
mechanism called lateral inhibition. Such a mechanism had been proposed as early as 1865 by
Ernst Mach, and later by Ewald Hering (1874), who called it ‘reciprocal interaction in the somatic
visual field.’ Experiments on the horseshoe crab (limulus) had shown that the rate of firing of a
constantly-illuminated photoreceptor in the crab’s eye is reduced when the light shining on neighboring receptors is increased
(Hartline et al., 1956). The parallel between this finding and Wallach’s psychophysical results was
obvious, and most researchers concluded that Wallach’s ratio results were a manifestation of lat-
eral inhibition. This was an exciting development, potentially marking the first time a basic per-
ceptual property could be explained at the cellular level.

Limitations of Ratio Theory


Luminance ratios at edges have continued to play an essential role in subsequent theories of light-
ness. However, work published since the Wallach experiment has shown that his simple ratio idea:
(1)  does not explain lightness constancy;
(2)  does not explain simultaneous contrast;
(3)  is not explained by lateral inhibition.
Indeed, these same points had been made earlier, both theoretically and empirically, by the
Gestaltists. The basic problem is that the ratio principle captures the structure of the visual field
in only the most minimal way. Compared with the view that sensory experience is locally deter-
mined, the ratio principle is a step in the right direction. However, the response of the visual
system to the structure of the image is far more extensive than Wallach imagined.

Lightness and 3D structure


Although Wallach himself (1963) did not believe that his results were explained by lateral inhib-
ition at the retina, most other theorists did (Cornsweet, 1970; Jameson and Hurvich, 1964). This
is not surprising, especially given the retinotopic nature of Wallach’s ratio concept, which implies
that lightness does not depend on the 3D structure of the visual field, an unlikely position for a
student of the Gestaltists. However, this point was not essential to Wallach’s thinking; it merely
came from his empirical finding that the lightness of a disk does not change when the disk and
annulus are separated in depth, but for the contrast theorists who attributed lightness to lateral
inhibition, any finding that lightness depends on perceived depth would represent a fundamental
challenge.
Von Helmholtz’s claim that lightness depends on taking the illumination into account implies
a close depth/lightness linkage, but empirical support was scarce. Mach (1922/1959, p. 209) had
observed that if a white card is folded in half, placed on a table like a tent or roof, and illuminated
primarily from one side, both sides of the roof appear white, although one side appears shadowed.
However, when the card can be perceptually reversed so that it appears concave, as an open book,
then ‘the light and the shade stand out as if painted thereon.’ The lightness of the shadowed side
changes even though the retinal image (and with it any inhibitory effect) has remained constant.
However, attempts to capture Mach’s depth effect in the laboratory showed little or no success
(Beck, 1965; Epstein, 1961; Flock and Freedberg, 1970; Hochberg and Beck, 1954). Experiments
by Gilchrist (1977, 1980), using a greater luminance range, and a richer context that allowed the
target to form a different luminance ratio in each of two perceived spatial positions, showed that a
change in depth could cause the lightness of a target surface to change almost from one end of the
black/white scale to the other, with no essential change in the retinal image.
Once again, however, we see that these findings were anticipated by the Gestaltists, who clearly
sketched an intimate relationship between depth and lightness. Koffka (1935, p. 246) had empha-
sized the importance of coplanarity. After noting that lightness is a product of luminance ratios
between image patches that belong together, he wrote, ‘Which field parts belong together, and
how strong the degree of this belonging together is, depends upon factors of space organization.
Clearly, two parts at the same apparent distance will, ceteris paribus, belong more closely together
than field parts organized in different planes.’ Gelb (1932), Wolff (1933), and Kardos (1934) had all
demonstrated an effect of depth on lightness. Radonjić et al. (2010) replicated one of the Kardos
experiments and found that a change in perceived depth changed the perceived lightness of a tar-
get disk by 4.4 Munsell steps, with no change in the retinal image.
The idea that lightness crucially depends on the perceived 3D structure of the visual field is by
now firmly established. Empirical findings supporting a strong dependence of lightness on perceived
depth have been reported by Adelson (1993, 2000), Knill and Kersten (1991), Logvinenko
and Menshikova (1994), Pessoa et al. (1996), Schirillo et al. (1990), Spehar et al. (1995), Taya et al.
(1995), and others.

Different kinds of edges: reflectance versus illuminance edges


Wallach’s suggestion that the luminance ratio at an edge in the image remains constant under
a change in illumination level presupposes that all the edges in the image are reflectance edges.
However, they are not. If everything in a scene were painted the same homogeneous shade of gray,
the scene would not disappear. Many visible edges would remain, but these would all be illumina-
tion edges (Gilchrist and Jacobsen, 1984). These would include cast edges at the boundaries of
cast shadows, attached edges at corners, and edges at occlusion boundaries. When the illumination level
changes, the luminance ratio at these illumination edges often changes.
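A toy two-component model of illumination (a direct source plus ambient light, with invented numbers) shows why:

```python
# A cast-shadow boundary on a homogeneous surface: the lit side receives
# direct plus ambient light, the shadowed side only ambient. All values are
# illustrative assumptions.
def shadow_edge_ratio(direct, ambient, reflectance=0.5):
    lit = reflectance * (direct + ambient)
    shaded = reflectance * ambient
    return lit / shaded

# Reflectance cancels, so the ratio at this illumination edge depends only on
# the light levels, and it changes when the source changes intensity.
assert shadow_edge_ratio(direct=90, ambient=10) == 10.0
assert shadow_edge_ratio(direct=40, ambient=10) == 5.0

# At a reflectance edge under common illumination, by contrast, the ratio is
# just the reflectance ratio, invariant under illumination changes.
def reflectance_edge_ratio(r1, r2, illuminance):
    return (r1 * illuminance) / (r2 * illuminance)

assert abs(reflectance_edge_ratio(0.9, 0.3, 100) -
           reflectance_edge_ratio(0.9, 0.3, 5000)) < 1e-9
```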
How could Wallach have neglected the ubiquity of illumination edges? I believe there is a his-
torical answer. The problem of lightness constancy manifests itself both temporally and spatially.
That is, the illumination level in the world varies both over time and over space. From the begin-
ning of research on lightness, investigation was focused on the temporal version of the constancy
problem. The spatial version of the problem was, with a few exceptions, ignored, as can easily
be seen in the theories. All three of Hering’s physiological factors invoked to account for con-
stancy ignore the problem of spatial illumination edges. Pupil size may be relevant to an over-
all shift in illumination level, but is hardly helpful when viewing a complex scene with multiple
regions of light and shadow. The same can be said for adaptation of the photoreceptors. As for
‘reciprocal interaction in the somatic visual field,’ later called lateral inhibition, when two identi-
cally gray papers lie under different illuminations, they produce different neural excitations at
the retina. Hering argued that the neural exaggeration of the difference at the edge between
each gray paper and its background (a reflectance boundary) can mitigate that difference (Hering,
1874/1964, p. 141). However, he failed to recognize that if the difference in excitation on the two
sides of an illumination boundary (cast across a surface of homogeneous reflectance) is exagger-
ated, the problem of bringing neural excitation levels into line with perceived lightness levels is
made worse, not better. Hering was not stupid. We must conclude that he simply did not consider
the implications for lightness constancy of applying lateral inhibition to an illumination boundary.
Von Helmholtz (1866/1924), Hering (1874/1964), and Katz (1935, p. 279) all suggested that perceived
illumination level was determined by the average luminance in the scene. This suggestion
makes sense only if you are thinking about a change of illumination (over the whole scene) from
time 1 to time 2. It makes no sense when a scene is divided into two adjacent regions of high and
low illumination. It is ironic that Katz also fell into this trap, given that the method of asymmetri-
cal matching he used so extensively in his early studies of lightness constancy featured exactly this
spatial version of the constancy problem: side-by-side regions of illumination and shadow.
In this sense, Wallach took a very traditional approach. This neglect of illumination edges is
very natural. In one study, Kardos (1934) asked his subjects to describe the entire laboratory
scene. They faithfully described the room and all its contents, but did not spontaneously mention
any of the shadows. When he asked them whether they see any shadows they replied that yes, of
course, they see the shadows, but they had not thought to mention them. This makes some sense.
While reflectance is an intrinsic property of a surface or object, the level of illumination on it is
not. Likewise, in spatial perception, the size of an object is an essential property, but its distance
from the observer is not. The visual system is tuned primarily to the intrinsic properties of objects,
much less to an accidental, temporary property like illumination level (see also Anderson, this
volume). The shading on a sculpture is instantly absorbed in the creation of a 3D percept such
that the luminance gradients across the object are scarcely noticed. It is natural that our percep-
tual system homes in on the essential features of the environment, not on the fleeting and fickle
variations in illumination. Ironically, however, this truth-seeking aspect of visual functioning may
have blinded both Wallach and the classic theorists to the important problem posed by spatial
illumination edges.
The preoccupation among students of lightness constancy with the temporal version of the problem
for so long allowed relatively simplistic solutions to obscure the thornier aspects of the
problem. As Arend (1994, p. 160) has clearly noted, ‘Lightness constancy over multiple-illuminants in
a single scene places much greater demands on candidate constancy models than does constancy
in single-illuminant scenes.’
To summarize, Wallach’s ratio principle works fine when applied to reflectance edges, but fails
when applied to illuminance edges. Here, we see one of several reasons why his ratio principle
cannot be reduced to lateral inhibition – that neural mechanism is blind to the kind of edge. The
visual system as a whole, however, cannot be blind to this distinction. If it were, lightness con-
stancy would fail catastrophically. The problem of edge classification, then, cannot be ignored.
Koffka clearly recognized that luminance ratios at edges (which he called gradients) were criti-
cal to lightness, as can be seen in the first of two propositions he offered (Koffka, 1935, p. 248): ‘(a)
the qualities of perceived objects depend upon gradients of stimulation . . .’ But his appreciation
of the edge classification problem can be seen in his second proposition: ‘(b) not all gradients
are equally effective as regards the appearance of a particular field part . . .’ On the same page he
presents the problem of edge classification in concrete terms: ‘. . . given two adjoining retinal areas
of different stimulation, under what conditions will the corresponding parts of the behavioral
(perceptual) field appear of different whiteness but equal [perceived illumination], when of dif-
ferent [perceived illumination] but equal whiteness? A complete answer to this question would
probably supply the key to the complete theory of color perception in the broadest sense.’ (As
before I have substituted the modern term ‘perceived illumination’ for Koffka’s equivalent term
‘brightness.’) Although J. J. Gibson never worked substantially in lightness, Koffka’s influence on
him (presumably due to their decade of overlap at Smith College) can be seen in Gibson’s (1966,
p. 215) question, ‘Why is a change in color not regularly confused with a change in illumination?’
If the discrimination of reflectance and illumination edges is so fundamental to lightness percep-
tion, how is it done? Although a complete answer has not yet been achieved, we can cite many reveal-
ing empirical findings. The first factor often mentioned is edge sharpness. Illumination boundaries
typically contain a penumbra, while reflectance boundaries are more typically sharp, stepwise changes.
In his famous spot-shadow experiment, Hering (1874/1964, p. 8) created a cast shadow by sus-
pending an object in front of a piece of white paper. The shadow was perceived as such, presum-
ably due to its penumbra. However, when Hering painted a thick black line along the penumbra,
the shadow was perceived as a dark gray stain or a painted region. His thick black line obscured
the penumbra. The same phenomenon can be demonstrated without the black line, using a slide
projector. If a glass slide containing a small opaque disk glued to its center is placed in a slide pro-
jector and projected onto a large white wall, the disk will appear as a shadow when the projector
is somewhat out of focus, but it will appear as a darker surface color when the projector is brought
into focus. In the checker-block image by Adelson (2000), shown in Figure 19.1, however, the
edges within the two circles are equally sharp. Yet one is perceived as a reflectance edge, while the
other is perceived as an illuminance edge.
If luminance edges contain crucial information about lightness and illumination, intersec-
tions where edges cross one another are especially informative. In terms of the relative luminance
values in the four quadrants of an intersection, we find two basic patterns: ratio-invariant and
difference-invariant (Gilchrist et al., 1983). When an illumination boundary crosses a reflectance
boundary, a common pattern, the result is ratio-invariance. Although the change in illumination
changes absolute values, it does not change the luminance ratio along the reflectance edge. The
same is true along the illumination boundary; the luminance ratio is constant regardless of the
reflectance on which it is projected.
However, when two illumination edges cross each other, as when there are two or more light
sources, the intersections show difference-invariance, not ratio invariance. Difference-invariance
is also found when the boundary of a veiling luminance intersects a more distant edge, regardless
of its type.
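Both junction patterns follow from simple arithmetic. In this sketch the reflectance and illuminance values are arbitrary illustrative units, not data from the cited experiments:

```python
# An X-junction where an illumination boundary (I1 vs I2) crosses a
# reflectance boundary (R1 vs R2); luminance = reflectance * illuminance.
R1, R2 = 9, 3        # two paints, arbitrary reflectance units
I1, I2 = 100, 20     # lit vs shadowed illumination
q = {(r, i): r * i for r in (R1, R2) for i in (I1, I2)}

# Ratio-invariance: the ratio across the reflectance edge is the same on the
# lit and the shadowed side of the illumination boundary.
assert q[(R1, I1)] / q[(R2, I1)] == q[(R1, I2)] / q[(R2, I2)]

# A veiling luminance V adds light to everything seen through it, so such a
# boundary is difference-invariant: differences survive, ratios do not.
V = 40
a, b = q[(R1, I1)], q[(R2, I1)]      # an edge in plain view: 900 vs 300
assert (a + V) - (b + V) == a - b    # the difference is preserved
assert (a + V) / (b + V) != a / b    # the ratio is not
```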

Local versus remote ratios


A simple ratio theory puts the lightness of a target surface far too much at the mercy of its retinally
adjacent (and perhaps accidental) neighbor. Several studies have demonstrated that the lightness
of a target can change dramatically, even when the target/background luminance ratio remains
Fig. 19.1  These two edges are locally identical, although one is perceived as a reflectance change
and the other as an illumination change.
Reproduced from Pentti I. Laurinen, Lynn A. Olzak, and Tarja L. Peromaa, Psychological Science, 8(5), pp. 386–
390, doi:10.1111/j.1467-9280.1997.tb00430.x, Copyright © 1997 by SAGE Publications. Reprinted by Permission
of SAGE Publications.

constant. Yarbus (1967) used a display similar to the simultaneous contrast pattern. Two red target
disks were placed on adjacent black and white backgrounds. As expected, the two disks appeared
slightly different in lightness. He then made the boundaries of the black and white backgrounds
disappear by retinally stabilizing them, causing the targets to appear to lie on a single homogeneous
field. This made the targets appear far more different in lightness, even though the luminance
ratio at the disk border did not change. The implication is that the lightness of the disk depends
not only on the luminance ratio between the disk and its immediate background, but also upon
the luminance ratio at the edge of the background.
In the famous Gelb (1929) effect, a black paper appears white when it is suspended in midair and
illuminated by a spotlight. However, it appears black as soon as a (real) white background is placed
immediately behind the black paper within the spotlight. These phenomena seem ideally consist-
ent with Wallach’s ratio principle. However, in 1995 Cataliotti and Gilchrist published experiments
on the Gelb effect in which they broke the perceptual change into a series of steps. They started
with a black square in a spotlight. It appeared white. Then, they added a dark gray square next to
it, also in the spotlight. The new square (having a higher luminance) appeared completely white,
but caused the original square to darken to light gray. Then a middle gray square was added, and
so on, until the display contained a row of 5 squares, all standing in the spotlight. Each time a new
(and brighter) square was added it appeared white and caused the other squares to appear darker.
The goal was to test whether the darkening effect caused by the addition of a brighter mem-
ber was a contrast effect based on lateral inhibition, or (as they suspected) an anchoring effect.
Their test relied on the well-known fact that lateral inhibitory effects drop off precipitously with
distance across the retina. The question was thus, when each brighter square is added, does it
darken the adjacent square more than it darkens the others? In other words, as the novel brighter
square moves farther away from the original square does its darkening effect on the original
square weaken? The answer turned out to be ‘no.’ The darkening effect depended only on the
degree to which each novel square raised the highest luminance in the row, not on its location.
This implies that the darkening effect they found, in what has come to be called the staircase Gelb
effect, is an anchoring phenomenon.
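The logic of the staircase Gelb effect can be sketched as a highest-luminance anchoring rule, scaled by the 90% reflectance of white as in the anchoring account; the luminance values below are illustrative:

```python
# Highest-luminance anchoring: within the spotlight framework, the highest
# luminance is seen as white (90% reflectance) and every other surface is
# scaled by its ratio to that anchor. Luminance values are illustrative.
WHITE = 90.0

def anchored_lightness(luminances):
    anchor = max(luminances)
    return [WHITE * lum / anchor for lum in luminances]

# A lone black square in the spotlight is the highest luminance: seen as white.
assert anchored_lightness([30.0]) == [90.0]

# Add brighter squares: each new maximum darkens all earlier squares, and the
# computed values depend only on the new anchor, not on spatial position.
row = anchored_lightness([30.0, 60.0, 120.0, 240.0, 480.0])
assert row[-1] == 90.0    # the brightest square appears white
assert row[0] < 90.0      # the original square has darkened
```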
These results also demonstrate that luminance ratios between non-adjacent surfaces can deter-
mine lightness just as much as those between adjacent surfaces. This is intuitively reasonable.
Land and McCann (1971), and Arend (1973) suggested that, if the retina encodes luminance
ratios at edges, ratios between remote surfaces can be computed by mathematically integrating the
series of edge ratios that lie along any path between the remote surfaces. Such an edge-integration
would be consistent with the results reported by Yarbus (1967), Arend et al. (1971), Gilchrist et al.
(1983), and Cataliotti and Gilchrist (1995).
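Such an integration can be sketched directly; the path of luminances below is an invented example:

```python
import math

# Edge integration in the spirit of Land and McCann (1971) and Arend (1973):
# if only the luminance ratio at each edge is encoded, the ratio between two
# remote surfaces is the product of edge ratios along any connecting path.
def remote_ratio(luminances):
    product = 1.0
    for near, far in zip(luminances, luminances[1:]):
        product *= far / near   # one encoded edge ratio per adjacent pair
    return product

path = [20.0, 5.0, 40.0, 80.0]   # luminances of surfaces along some path
# The product of edge ratios recovers the direct endpoint ratio.
assert math.isclose(remote_ratio(path), path[-1] / path[0])
```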
Once again, an analysis by Koffka (1935, p. 248) shows his understanding of the role of remote
luminance ratios, and an experiment by Koffka and Harrower (1931) demonstrated it empirically.
In light of subsequent physiological work, it seems likely that such an integration is achieved
through spatial filtering – that is, through the integration of information from center-surround
receptive fields of varying location and scale (Blakeslee and McCourt, 1999).

Gestalt Theory
The concept of perceptual organization is intimately associated with the Gestalt theorists (see
Wagemans, this volume). They were the first to recognize the fundamental importance of this
problem. Different theories had sought to explain the perceived size of an object, but Wertheimer
(1923) realized that the very perception of an object at all is a perceptual achievement.
Long before the emergence of Gestalt theory, it had become obvious that perception could
not be explained by sensations associated with local stimulation. Hering (1874/1964, p. 23) had
written, ‘Seeing is not a matter of looking at light-waves as such, but of looking at external things
mediated by these waves; the eye has to instruct us, not about the intensity or quality of the light
coming from external objects at any one time, but about these objects themselves.’ However, that
shortcoming was conventionally addressed by assuming a cognitive modification of those sensa-
tions, typically based on prior experience. The Gestaltists forcefully rejected this duality of raw
sensations and cognitive modification, arguing that perception is the product of a unitary pro-
cess. Gelb (1929, excerpted in Ellis, 1938, p. 207) wrote: ‘Our visual world is not constructed by
‘accessory’ higher (central, psychological) processes from a stimulus-conditioned raw material of
‘primary sensations’ and sensation-complexes . . . ‘ Köhler (1947, p. 103) wrote, ‘Our view will be
that, instead of reacting to local stimuli by local and mutually independent events, the organism
responds to the pattern of stimuli to which it is exposed; and that this answer is a unitary pro-
cess, a functional whole which gives, in experience, a sensory scene rather than a mosaic of local
sensations.’
These Gestalt ideas did not fail on their own merits. Nor were they superseded by superior
ideas. Rather, they were eclipsed by external factors, specifically the tragic events surround-
ing World War II. The Gestaltists were forced to flee. The center of the scientific world shifted
to the United States, and its behaviorist hegemony. Gestalt thinking was seen as embarrass-
ingly metaphysical, especially when compared with the promises of the new, non-mentalistic
reductionism. However, for the question of lightness perception, the decades that followed
could be called the dark ages because the experiments were done in dark rooms and very little
progress was made. It was in this context that Wallach presented his ratio theory, but while
ratio theory may have been celebrated by the reductionists, it failed to reflect the rich insights
that had been offered by the Gestaltists.
Illumination came only with the cognitive revolution of the late 1960s, which legalized discus-
sion of internal processes. Influenced by David Marr (1982), artificial intelligence, and machine
vision, lightness theorists began to think in terms of inverse optics. Perhaps the decomposition of
the retinal image by the visual system is the mirror inverse of the manner in which the image is
initially composed by the multiplication of reflectance and illumination.
Various image decomposition models were proposed. Bergström (1977) suggested that the pat-
tern of reflected light is analyzed into common and relative components, analogous to Johansson’s
ingenious vector analysis of motion (see Giese, this volume; Herzog and Ögmen, this volume).
Thus, luminance variations in the image are attributed to changes in reflectance, illumination, and
planarity. Adelson and Pentland (1996) offered a similar approach couched in a vivid metaphor,
whereby painters, lighting designers, and metal benders cooperate to produce any given image in
the most economical way. Ekroll et al (2004) have provided additional evidence for an analysis
into common and relative components in the chromatic domain.
Barrow and Tenenbaum (1978) suggested that the retinal image can be treated as a multiple
image composed of separate layers, which they called intrinsic images. Gilchrist proposed an
intrinsic image approach in which luminance ratios at edges are encoded, classified as due to
reflectance or illuminance, and integrated within each class to produce separate reflectance and
illuminance maps (Gilchrist, 1979; Gilchrist et al., 1983). Arend (1994) and Blake (1985) offered
similar approaches.

Decomposition models as Gestalt


Certainly by comparison with the sensory and cognitive theories that preceded them, the decompos-
ition models were consistent with the spirit of Gestalt theory. There was no initial raw sensory stage.
The structure of the image, in particular, the 3D structure, was recognized. There was a place for every-
thing and everything was in its place. If a gradient of luminance was used for shape-from-shading
in one map, it was not available to the reflectance map and reflectance was seen as homogeneous at
that location. This kind of complementarity had been proposed earlier by Koffka (1935, p. 244) who
suggested ‘the possibility that a combination of whiteness and [perceived illumination], possibly their
product, is an invariant for a given local stimulation under a definite set of total conditions. If two equal
proximal stimulations produce two surfaces of different whiteness, then these surfaces will also have
different [perceived illuminations], the whiter one will be less, the blacker one more [brightly illumi-
nated]’ (substituting the modern phrase ‘perceived illumination’ for Koffka’s equivalent term ‘bright-
ness’). Later this was called the lightness-illumination invariance hypothesis by Japanese researchers
working in the Gestalt tradition (Kozaki and Noguchi, 1976; Noguchi and Kozaki, 1985). This view of
lightness and perceived illumination as complementary can also be seen in Gelb’s (1929, taken from
Ellis, 1938, p. 276) comment that, ‘Severance of illumination and that which is illuminated and per-
ception of a resistant and definitely colored surface are two different expressions of one and the same
fundamental process.’
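Numerically, the invariance hypothesis is a simple trade-off; the luminance and reflectance values here are invented for illustration:

```python
# For one fixed local luminance, perceived reflectance and perceived
# illumination can only trade off against each other, since (in this sketch)
# their product must reproduce the luminance.
def perceived_illumination(luminance, perceived_reflectance):
    return luminance / perceived_reflectance

LUM = 45.0   # one fixed local luminance, arbitrary units

# Seen as white (90% reflectance) the surface must look dimly lit; seen as a
# darker gray (50%) the same luminance must look more brightly lit.
assert perceived_illumination(LUM, 0.90) < perceived_illumination(LUM, 0.50)
```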
Two of the earliest inverse-optics theories were presented by Johansson (1950) and Metelli
(1970), both Gestalt theorists. Johansson proposed that retinal motions are decomposed into
common and relative components, an analysis that is the mirror image of the initial synthesis
of eye movements and hierarchically nested distal motions (see Giese, this volume; Herzog &
Ögmen, this volume). Not surprisingly perhaps, the essential elements in Johansson’s vector ana-
lysis can be found in Duncker’s (1929) earlier concept of separation of systems. Musatti (1953)
presented an account of color perception analogous to Johansson’s model. Metelli proposed that
color scission is just the inverse of color fusion (see Gerbino, this volume).

A new type of Gestalt theory based on frameworks and groups


In the 1990s, a new approach to lightness began to emerge, based on frameworks and percep-
tual grouping. Two authors of decomposition models, Adelson (2000) and Gilchrist (Gilchrist
et al., 1999), began to move away from the inverse-optics approach. Adelson began to speak
in terms of adaptive windows, sub-regions of the retinal image within which lightness is com-
puted by comparing luminance values. He noted that these regions need to be large enough
for the highest luminance value to be assumed to be white with reasonable probability, but
small enough that the window does not include regions of very different illumination level.
He also spoke about atmospheres, which incorporate not only high and low levels of illumi-
nation, but also regions of fog, and both veil (additive light) and filter components of trans-
parent regions.
Gilchrist’s anchoring theory (Gilchrist, 2006; Gilchrist et al., 1999) was couched in terms of
frameworks. The term framework, short for frame of reference, owes the most to the thinking of
Duncker (1929) and Koffka (1935), who invoked the concept so persuasively, especially in motion
perception. Just as the perception of any absolute motion in the visual field depends on the per-
ceptual frame of reference to which the motion belongs, so the lightness of a given surface lumi-
nance depends on the frame of reference within which it is embedded. Intuitively, a framework
is a field of illumination, as used by Katz (1935). However, a framework need not coincide with a
field of illumination, as we will see.
Within each framework, the lightness of a target is computed by multiplying the luminance
ratio between that target and the highest luminance in the framework by the reflectance of white
(90%). However, in complex images, any target surface is a member of at least one such local
framework and a global framework composed of the entire visual field. The final perceived value is
based on a weighted average of local and global values. This weighted average is closely related to
the earlier concept of co-determination, proposed by Kardos (1934) who suggested that lightness
is computed in relation to both relevant and foreign fields of illumination.
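A minimal sketch of this computation follows; the luminances and the local/global weight are illustrative assumptions, not parameters fixed by the theory:

```python
# Anchoring with co-determination: lightness within a framework is the ratio
# of the target to the framework's highest luminance, scaled by white (90%);
# the final value is a weighted average of local and global computations.
WHITE = 90.0

def framework_lightness(target, framework_luminances):
    return WHITE * target / max(framework_luminances)

def codetermined(target, local, global_, local_weight=0.7):
    return (local_weight * framework_lightness(target, local)
            + (1 - local_weight) * framework_lightness(target, global_))

# A target of luminance 30 that is the maximum of its shadowed local framework,
# inside a scene whose global maximum is 300:
assert framework_lightness(30.0, [10.0, 30.0]) == 90.0   # locally white
value = codetermined(30.0, local=[10.0, 30.0], global_=[10.0, 30.0, 300.0])
assert value < 90.0   # the global framework pulls the target darker
```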
Subsequently, Bressan (2001, 2006a,b, 2007) published a modified anchoring theory, which
she calls double-anchoring theory. Accepting the concept of co-determination and the notion of
anchoring to the highest luminance, Bressan adds a second anchoring principle by which the sur-
round of any target is treated as white.

The rise of mid-level theories


This shift from layers to frameworks, in turn, was part of a larger trend – the emergence of
mid-level models. Modern theories of lightness can be classified as low-, high-, or mid-level.
Low-level theories emphasize the role of peripheral sensory mechanisms. These theories go
back to Hering (1874/1964), who attributed what he called ‘approximate constancy’ to pupil size,
sensory adaptation, and lateral inhibition. Theories in this tradition (Cornsweet, 1970; Jameson
and Hurvich, 1964) have primarily focused on lateral inhibition. These theories have been called
‘structure blind’ because they rely on local processes. Photoreceptors that engage in mutual inhi-
bition, for example, are not conditioned by whether they exist at a reflectance edge or an illumi-
nance edge. High-level theories generally derive from von Helmholtz (1866/1924). They portray
lightness processing as cognitive, or thought-like.
Mid-level theories respond to the structure of the visual field without a high-level cognitive
component. The world is represented more sparsely than in the decomposition models, consist-
ent with change blindness work that began to appear about the same time (Simons and Levine,
1997). Mid-level models are rough and ready. They feature shortcuts. As Adelson (2000, p. 344)
Perceptual Organization in Lightness 401

has commented, the Helmholtzian approach is overkill (see also Koenderink, this volume, chapter
on Gestalts as ecological templates). Whereas the decomposition models are concerned primarily
with constancy, mid-level models give substantial attention to lightness illusions and failures of
constancy. In the same spirit, Singh and Anderson (2002) offered a mid-level account of perceived
transparency that has proven to account for the empirical data better than Metelli’s (1974) classic
inverse-optics approach.
It is debatable whether the decomposition models should be considered high-level or mid-level.
Although they are often treated as high-level, the decomposition models do not require a cogni-
tive component. There are no raw sensations and there is no appeal to past experience. On the
other hand, the decomposition models posit a very complete representation of the world.

Frameworks as Perceptual Groups


A framework can be thought of as a perceptual group, and it is subject to the usual Gestalt laws of
grouping. However, in this grouping, regions of the image are grouped by common illumination.
This use of the term grouping is somewhat unusual and requires some background.

Two kinds of grouping


Typically, Gestalt grouping principles have been invoked to organize the retinal mosaic into dis-
crete objects (see Brooks, this volume). In the famous words of Wertheimer (in Ellis, 1938, p. 71):
‘I stand at the window and see a house, trees, sky. And I could, then, on theoretical grounds, try to
sum up: there are 327 brightnesses (and tones of colour). (Have I “327”? No: sky, house, trees; and
no one can realize the having of the “327” as such.)’
Thus, as Bressan (2001, 2007) has noted, we can make a distinction between two kinds of
grouping:
(1)  The traditional kind which involves the segregation of objects out of an indifferent retinal
mosaic.
(2)  The grouping of surfaces standing in the same illumination level.
The first might roughly be called grouping by reflectance, the second, grouping by illumination.
These are illustrated in Figure 19.2. Grouping regions A and C together supports the perception of
a square white napkin, while grouping regions A and B (and also C and D) supports the computa-
tion of surface lightness values.

Grouping by illumination
In fact, Koffka (1935, p. 246) hinted at just such a grouping by illumination. Using the term ‘appur-
tenance’ as a synonym for belongingness, Koffka wrote, ‘a field part x is determined in its appear-
ance by its “appurtenance” to other field parts. The more x belongs to the field part y, the more
will its whiteness be determined by the gradient xy, and the less it belongs to the part z, the less
will its whiteness depend on the gradient xz.’ When Koffka suggests that the whiteness (lightness)
of a surface depends on the luminance ratio between that surface and other surfaces to which it
belongs, he is talking about surfaces that lie in the same field of illumination.

Grouping by planarity
Gilchrist’s findings on coplanar ratios can be thought of as grouping by planarity. In a chapter
called ‘In defense of unconscious inference’ Irvin Rock (1977) sought to offer a Helmholtzian
402 Gilchrist

Fig. 19.2  Grouping by illumination (A & B; C & D) and grouping by reflectance (A & C; B & D).

account of those findings, writing, ‘When regions of differing luminance are phenomenally local-
ized in one plane, the perceptual system operates on the assumption that they are receiving equal
illumination’ (Rock 1977, p. 359).
This, too, was anticipated by Koffka (1935, p. 246) who wrote, ‘Which field parts belong together,
and how strong the degree of this belonging together is, depends upon factors of space organiza-
tion. Clearly, two parts at the same apparent distance will, ceteris paribus, belong more closely
together than field parts organized in different planes.’
In the Gilchrist (1980) experiments, depth perception allowed the visual system to organize
retinal patches into perceived planes. The surfaces within each plane, as is often the case, shared a
common illumination level. However, for purposes of lightness computation, which is more fun-
damental, grouping by planarity or grouping by illumination? Radonjić and Gilchrist (2013) have
recently teased these factors apart. They replicated Gilchrist’s (1980) earlier experiments involving
dihedral planes, but with one change. One of the two planes was further divided into two fields
of illumination by an illumination boundary. In this case, the lightness of the critical target was
determined, not by the highest luminance in that plane, but by the highest luminance within the
same region of illumination (which comprised only part of that plane).
Grouping by illumination makes sense. Von Helmholtz had glibly suggested that, to com-
pute lightness, the visual system must take the illumination level into account, but specifying
how this might be done is another matter. Von Helmholtz never did. Boyaci et al. (2003) and
Ripamonti et al. (2004) have proposed that the visual system takes into account the direction
and intensity of the light source, using cues like cast shadows, attached shadows, and glossy
highlights (Boyaci et al., 2006). Such a hypothesized process, however, would be computa-
tionally very expensive and perhaps impossible in the real world. There is virtually never only
a single light source. Consider your immediate environment as you read this. How many light
sources are there? Remember that you must include any windows, and remember that every
surface reflects light onto other surfaces.

Illumination level not needed


It turns out that there is a much simpler approach. The visual system does not need to know the
actual amount of illumination; it only needs to know which patches are getting the same level of
illumination. Comparing the luminances of retinal patches grouped by illumination level is not
only simpler computationally than comparing the luminance of a patch with some estimate of
illumination level, but it is also more consistent with the empirical data (Gilchrist, 2006). This is
where the grouping principles prove their worth.
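The simplification claimed here can be made concrete with a toy calculation. Assuming the standard idealization that luminance is reflectance times illuminance, the sketch below (all values hypothetical) shows that anchoring within a group of patches under one common illumination yields the same lightness whatever that illumination level is: the unknown illuminance cancels out of the luminance ratio, so it never needs to be estimated.

```python
# Toy demonstration that lightness can be computed without estimating
# the illumination level. All numbers are hypothetical. Idealization:
# luminance = reflectance * illuminance, so within a group of patches
# under one common illuminance E, the ratio to the group's highest
# luminance depends only on reflectance -- E cancels.

def luminance(reflectance, illuminance):
    return reflectance * illuminance

def anchored_lightness(target_lum, group_lums, white=90.0):
    # Anchoring rule: the highest luminance in the group is treated as white.
    return target_lum / max(group_lums) * white

reflectances = [0.90, 0.30, 0.10]  # white, mid-gray, and dark-gray patches

for E in (10.0, 1000.0):  # the same surfaces in dim and in bright light
    lums = [luminance(r, E) for r in reflectances]
    print(anchored_lightness(lums[1], lums))  # mid-gray patch: same value at both levels
```

Comparing a patch's luminance with an explicit illumination estimate would require first recovering E from cues; grouping patches by common illumination sidesteps that step entirely.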

Grouping Principles Work for Both Types of Grouping


The parallel between the classic notion of grouping (for object formation) and this more novel
kind of grouping by illumination is striking. Most of the classic grouping principles have already
been shown to be effective in grouping by illumination, although the authors of those experiments
did not think about their results in this way.

Grouping by proximity
Studies of the so-called brightness induction effect of a brighter ‘inducing field’ on a darker ‘test
field’ were reported by Cole and Diamond (1971), Dunn and Leibowitz (1961), Fry and Alpern
(1953), and Leibowitz et al. (1953). All found that, with luminances held constant, the perceived
brightness (and presumably lightness) of the darker test field decreases as the separation between
the two is reduced. Although they attributed this result to the spatial function of lateral inhibition,
it perfectly satisfies Koffka’s claim that ‘The more x belongs to the field part y, the more will its
whiteness be determined by the gradient xy . . .’ McCann and Savoy (1991) and Newson (1958)
found the same results testing lightness explicitly, but without attribution to lateral inhibition.
Gogel and Mershon (1969) showed that changes in depth proximity (rather than lateral prox-
imity) produce the same effect on lightness. Their result cannot be attributed to lateral inhibition.
It is important to note that these test and inducing fields were either floating in mid-air, or pre-
sented against a totally dark background. When the fields are connected by a continuous series of
coplanar patches (as in Cataliotti and Gilchrist, 1995), little or no such proximity effect is found,
presumably because they are already strongly organized as a group of patches.

Grouping by similarity
Laurinen et al. (1997) superimposed shallow luminance modulations on each of the four parts
of the simultaneous contrast display, as shown in Figure 19.3. They found that the contrast effect
is substantially weakened if the modulation frequency on each target is different from that of its
background. Bonato et al. (2003) also found this result by varying the type of texture, rather than
the scale. Conversely, the contrast effect can be strengthened by giving one target and its back-
ground one frequency (or texture), while giving the other target and its background a different
frequency. Color can also be used to modulate similarity among regions of the contrast display
without altering relative luminance. Olkkonen et al. (2002) found that when both targets share a
common color and the two backgrounds share a different color, the illusion is reduced. In group-
ing terms, increasing the belongingness of each target and its immediate surround by giving them
a common color, while simultaneously decreasing the belongingness between the two surrounds
by giving them different colors, tends to produce local lightness computations within each surround, thus enhancing the perceived difference between targets. However, increasing the belongingness between the two surrounds, as Olkkonen et al. did, promotes a more global computation
within the whole pattern, and this reduces the contrast effect.

Grouping by common fate


Agostini and Proffitt (1993) have shown that a gray disk that moves together with a group of
white disks appears darker than an identical gray disk that moves together with a group of black
disks, even though all disks are seen against a common blue background. Bressan (2007) argues
that, while common fate is a strong grouping principle for object formation, it is a weak factor for
grouping by illumination.

Simultaneous lightness contrast as a grouping phenomenon


There is by now a good deal of evidence that a gray target on a black background appears lighter
than an identical gray target on a white background, not because of retinal adjacency, but because
of belongingness. This was first shown by Benary in 1924, using the image shown in Figure 19.3.
Even though the two triangles have identical adjacent luminances, the upper triangle appears

Fig. 19.3  (Left side) Depending on which regions are grouped by spatial frequency similarity, the
contrast effect can be weakened (top two examples) or strengthened (bottom example). (Upper
right) Benary effect. (Lower right) White’s illusion.
Reproduced from Pentti I. Laurinen, Lynn A. Olzak, and Tarja L. Peromaa, Psychological Science, 8(5),
pp. 386–390, doi:10.1111/j.1467-9280.1997.tb00430.x, Copyright © 1997 by SAGE Publications. Reprinted by
Permission of SAGE Publications.
slightly darker, presumably because it appears to belong to the white background. The lower tri-
angle appears lighter because it appears to belong to the black cross.
In 1979, Michael White introduced an illusion that now bears his name. While the Benary effect
is weaker than the standard simultaneous contrast effect, White’s illusion is much stronger (see
Figure 19.3). Moreover, the effect is counter to that suggested by adjacency, given that the gray
bars that appear lighter actually share more boundary length with white than with black. This
asymmetry is pushed even farther in the Todorović illusion (Todorović, 1997).

The role of T-junctions


These illusions not only suggest that simultaneous contrast should be viewed as a grouping phe-
nomenon, but they further reveal the critical grouping function of T-junctions. T-junctions
appear to strengthen the perceptual grouping of the two regions that meet across the stem of the T,
while weakening the grouping between those regions and the third region above the top of the T.

Reverse Contrast Illusions


The divergence of adjacency and belongingness reaches its logical conclusion in the three reverse
contrast illusions shown in Figure 19.4 (Bressan, 2001, 2006; Agostini and Galmonte, 2002;
Economou et al., 2007). In each case, the lightness difference between the identical gray targets
runs exactly counter to what should happen according to the traditional inhibition explanation,
and the illusion is produced by creating a perceptual group that rivals the immediate background
of each of the targets.
Economou and Gilchrist reasoned that if the grouping interpretation of this reverse contrast effect is correct, it should be possible to vary the strength of the lightness illusion merely by varying the grouping factors that support the perception of the group of bars. Thus, in a forthcoming
paper, Economou and Gilchrist report that illusion strength does, indeed, vary predictably with
variations in:
(1)  proximity of the flanking bars;
(2)  shape similarity of target and flanking bars;
(3)  orientation similarity of target and flanking bars;
(4)  good continuation of the flanking bar ends.
In a further set of experiments Economou and Gilchrist varied the depth position of the various
elements in order to vary the depth proximity between the target bars and their would-be part-
ners – the flanking bars, and the white and black backgrounds. The reverse contrast illusion was
strongest when the target and flanking bars were perceived to lie in one plane, while the white
and black backgrounds were perceived to lie in a more distant plane. Conversely, the illusion was
weakest when the target bars, and white and black backgrounds were perceived to lie in the same
plane while the flanking bars were perceived to lie in a separate, nearer plane.

Segmentation versus grouping


Organizing retinal patches into regions of common illumination is the equivalent of segmenting
the retinal image by illumination level. Segmentation is thus the flip-side of grouping, and it is
equivalent to edge classification. According to Kardos (1934) the main factors in segmentation are
depth boundaries (corners and occlusion boundaries) and penumbrae.
Fig. 19.4  Three reverse contrast illusions.
(Top) Reproduced from M. White, The effect of the nature of the surround on the perceived lightness of grey
bars within square-wave test gratings, Perception 10(2), pp. 215–230, doi:10.1068/p100215, Copyright © 1981,
Pion. With kind permission from Pion Ltd, London www.pion.co.uk and www.envplan.com. (Middle) Reproduced
from Tiziano Agostini and Alessandra Galmonte, Psychological Science, 13(1), Perceptual Organization Overcomes
the Effects of Local Surround in Determining Simultaneous Lightness Contrast: pp. 89–93, doi:10.1111/1467-
9280.00417, copyright © 2002 by SAGE Publications. Reprinted by Permission of SAGE Publications. (Bottom)
Dungeon illusion. Adapted from Paola Bressan, The place of white in a world of grays: A double-anchoring
theory of lightness perception, Psychological Review, 113(3), pp. 526–553, http://dx.doi.org/10.1037/0033-
295X.113.3.526 © 2006, American Psychological Association.

Frameworks that Create Illusions


Although the framework concept in lightness goes back to the Katz notion of field of illumi-
nation, many frameworks do not coincide with regions of illumination. The black and white
backgrounds of the simultaneous contrast display, for example, do not represent two levels of
illumination. Yet they seem to function like frameworks of illumination, to a limited degree.
Does this make sense?
In fact, it may be inevitable. Fields of illumination are not perceived that way just because they
are actually fields of illumination. The perception of a field of illumination must be based on cer-
tain cues, such as penumbra. However, those cues can occur in the absence of a field of illumina-
tion. When that happens, it appears that those cues create weak frameworks. The white and black
backgrounds in simultaneous contrast have perimeters of consistent, continuous sign, much like
spotlights and shadows. Perhaps for this reason they function as weak frameworks, approximately
six times weaker than regions of equal size and luminance that are actually perceived to differ in
illumination, according to edge substitution experiments (Gilchrist et al., 1983; Gilchrist, 1988).
Thus, when the boundary between the black and white backgrounds is replaced by a luminance
ramp (penumbra), the contrast illusion is significantly enhanced (Shapley, 1986).

Is reverse contrast an example of assimilation?


White’s illusion is often presented as an example of assimilation. However, the examples shown in Figure 19.5, created by Bart Anderson (1997), show that this account does not work. The
inequality signs indicate whether the target bars on the left should appear lighter or darker than
those on the right, according to an assimilation account. Mere inspection shows that these assimi-
lation predictions are falsified.

Contrast versus assimilation: not Gestalt concepts


There have been repeated attempts to organize these various lightness illusions by treating
contrast and assimilation as opposing processes. First, it should be noted that contrast and
assimilation are not Gestalt concepts. So-called contrast effects, as I have tried to show, were
interpreted by the Gestaltists as matters of belongingness. Indeed, Koffka (1935, p. 245) explic-
itly rejected Hering’s contrast theory because it ‘… implies an explanation not in terms of
gradient, but in terms of absolute amounts of light.’ Nor was assimilation proposed by the
Gestaltists. While Musatti (1953), clearly a Gestaltist, did employ the term assimilation, it
appears that he meant by it something analogous to Bergström’s (1977) notion of a common
component.
Secondly, attempts to define the conditions under which either contrast or assimilation occurs
have been made by Agostini and Galmonte (2000), Beck (1966), Bindman and Chubb (2004),
Festinger et al. (1970), Helson (1964), Jameson and Hurvich (1989), and Shapley and Reid (1985).
There is a total lack of consensus; each of these suggestions is different from all the others.

Frameworks versus Layers: Two Gestalt Approaches


In the modern era of lightness research, the challenge of perceptual organization has primarily
been confronted by two classes of lightness theory: decomposition models and anchoring mod-
els. Decomposition models include those of Barrow and Tenenbaum, Gilchrist, Bergström, and
Adelson and Pentland. The central idea is that the retinal image is parsed into two overlapping
layers: a pattern of illumination superimposed over a pattern of surface reflectance. According to
the anchoring model of Gilchrist (2006), following Kardos (1934) and Koffka (1935), the image
Assimilation predictions

Fig. 19.5  The inequality signs show on which side the shorter target bars are predicted to appear
lighter, according to assimilation. Perceived lightness contradicts these predictions.
Adapted from B.L. Anderson, A theory of illusory lightness and transparency in monocular and binocular images:
the role of contour junctions, Perception, 26(4), pp. 419–53, doi:10.1068/p260419, Copyright © 1997, Pion.
With kind permission from Pion Ltd, London www.pion.co.uk and www.envplan.com.

is parsed into frameworks of illumination that are typically adjacent, like countries on a map.
Empirical support for both frameworks and layers exists. Although the relative merits of frame-
works and layers are debated (see Anderson and Winawer, 2008), these contending approaches
may ultimately turn out to be aspects of a single Gestalt account. But the outlines of such an inte-
gration are not obvious at present because the components into which the image is parsed, lay-
ers versus frameworks, seem mutually exclusive. Nevertheless, Bressan (2006a) has proposed the
concept of the overlay framework, in which a layer is also a framework. But this use of the term
framework departs substantially from that of Koffka or Kardos.

Conclusions
There is as yet no consensus on how surface lightness is computed by the brain. The fundamental
problem is that any luminance can come from any reflectance. Thus, the problem can be solved
only by using the surrounding context. Simply using the luminance ratio between a target surface
and its background is woefully inadequate. The lightness of a surface has been shown to depend
on many aspects of the perceptual structure of the image, including perceived 3D arrangement,
classification of edges, and long-distance luminance relationships. These problems of perceptual
organization have been confronted mainly by either parsing the image into overlapping layers
representing illumination and reflectance or into frameworks within which lightness is computed
by comparing luminances. It is hoped that further research will lead to models that incorporate
the strengths of both approaches.

References
Adelson, E. H. (1993). Perceptual organization and the judgment of brightness. Science 262, 2042–2044.
Adelson, E. H. (2000). Lightness perception and lightness illusions. In The New Cognitive Neuroscience, 2nd
edn, edited by M. Gazzaniga, pp. 339–351. Cambridge, MA: MIT Press.
Adelson, E. H., and Pentland, A. P. (1996). The perception of shading and reflectance. In Perception as
Bayesian Inference, edited by D. Knill and W. Richards, pp. 409–423. New York: Cambridge University
Press.
Agostini, T., and Galmonte, A. (2000). Contrast and assimilation: the belongingness paradox. Rev Psychol
7(1-2): 3–7.
Agostini, T., and Galmonte, A. (2002). Perceptual organization overcomes the effect of local surround in
determining simultaneous lightness contrast. Psychol Sci 13(1): 89–93.
Agostini, T., and Proffitt, D. R. (1993). Perceptual organization evokes simultaneous lightness contrast.
Perception 22(3): 263–272.
Anderson, B. (1997). A theory of illusory lightness and transparency in monocular and binocular
images: the role of contour junctions. Perception 26: 419–453.
Anderson, B., and Winawer, J. (2008). Layered image representations and the computation of surface
lightness. J Vision 8(7): 1–22.
Arend, L. (1994). Surface colors, illumination, and surface geometry: intrinsic-image models of human
color perception. In Lightness, Brightness, and Transparency, edited by A. Gilchrist, pp. 159–213.
Hillsdale: Erlbaum.
Arend, L. E. (1973). Spatial differential and integral operations in human vision: implications of stabilized
retinal image fading. Psychol Rev 80, 374–395.
Arend, L. E., Buehler, J. N., and Lockhead, G. R. (1971). Difference information in brightness perception.
Percept Psychophys 9: 367–370.
Barlow, H. B., and Levick, W. R. (1969). Three factors limiting the reliable detection of light by retinal
ganglion cells of the cat. J Physiol 200: 1–24.
Barrow, H. G., and Tenenbaum, J. (1978). Recovering intrinsic scene characteristics from images. In
Computer Vision Systems, edited by A. R. Hanson and E. M. Riseman, pp. 3–26. Orlando: Academic Press.
Beck, J. (1965). Apparent spatial position and the perception of lightness. J Exp Psychol 69: 170–179.
Beck, J. (1966). Contrast and assimilation in lightness judgements. Percept Psychophy 1: 342–344.
Benary, W. (1924). Beobachtungen zu einem Experiment über Helligkeitskontrast (Observations
concerning an experiment on brightness contrast). Psychol Forsch 5: 131–142.
Bergström, S. S. (1977). Common and relative components of reflected light as information about the
illumination, colour, and three-dimensional form of objects. Scand J Psychol 18: 180–186.
Bindman, D., and Chubb, C. (2004). Brightness assimilation in bullseye displays. Vision Res 44(3): 309–319.
Blake, A. (1985). Boundary conditions for lightness computation in Mondrian world. Comp Vision Graphics
Image 32: 314–327.
Blakeslee, B., and McCourt, M. E. (1999). A multiscale spatial filtering account of the White effect,
simultaneous brightness contrast and grating induction. Vision Res 39: 4361–4377.
Bonato, F., Cataliotti, J., Manente, M., and Delnero, K. (2003). T-junctions, apparent depth, and perceived
lightness contrast. Percept Psychophys 65(1): 20–30.
Boyaci, H., Doerschner, K., and Maloney, L. (2006). Cues to an equivalent lighting model. J Vision
6: 106–118.
Boyaci, H., Maloney, L., and Hersh, S. (2003). The effect of perceived surface orientation on perceived
surface albedo in binocularly viewed scenes. J Vision 3: 541–553.
Bressan, P. (2001). Explaining lightness illusions. Perception 30: 1031–1046.
Bressan, P. (2006a). Inhomogeneous surrounds, conflicting frameworks, and the double-anchoring theory
of lightness. Psychonom Bull Rev 13: 22–32.
Bressan, P. (2006b). The place of white in a world of grays: a double-anchoring theory of lightness
perception. Psychol Rev 113(3): 526–553.
Bressan, P. (2007). Dungeons, gratings, and black rooms: a defense of the double-anchoring theory of
lightness and a reply to Howe et al. Psychol Rev 114: 1111–1114.
Cataliotti, J., and Gilchrist, A. L. (1995). Local and global processes in lightness perception. Percept
Psychophys 57(2), 125–135.
Cole, R. E., and Diamond, A. L. (1971). Amount of surround and test inducing separation in simultaneous
brightness contrast. Percept Psychophys 9: 125–128.
Cornsweet, T. N. (1970). Visual Perception. New York: Academic Press.
Duncker, K. (1929). Über induzierte Bewegung (Ein Beitrag zur Theorie optisch wahrgenommener Bewegung). Psychol Forsch 12: 180–259.
Dunn, B., and Leibowitz, H. (1961). The effect of separation between test and inducing fields on brightness
constancy. J Exp Psychol 61(6): 505–507.
Economou, E., Zdravkovic, S., and Gilchrist, A. (2007). Anchoring versus spatial filtering accounts of
simultaneous lightness contrast. J Vision 7(12), 1–15.
Ekroll, V., Faul, F., and Niederee, R. (2004). The peculiar nature of simultaneous colour contrast in uniform
surrounds. Vision Res 44: 1756–1786.
Ellis, W. D. (Ed.). (1938). A Source Book of Gestalt Psychology. New York: Humanities Press.
Epstein, W. (1961). Phenomenal orientation and perceived achromatic color. J Psychol 52: 51–53.
Festinger, L., Coren, S., and Rivers, G. (1970). The effect of attention on brightness contrast and
assimilation. Am J Psychol 83: 189–207.
Flock, H. R., and Freedberg, E. (1970). Perceived angle of incidence and achromatic surface color. Percept
Psychophys 8: 251–256.
Fry, G. A., and Alpern, M. (1953). The effect of a peripheral glare source upon the apparent brightness of an
object. J Opt Soc Am 43: 189–195.
Gelb, A. (1929). Die ‘Farbenkonstanz’ der Sehdinge (The ‘color constancy’ of seen things). In Handbuch der normalen
und pathologischen Physiologie, Vol. 12, edited by W. A. von Bethe, pp. 594–678. Berlin: Julius Springer.
Gelb, A. (1932). Die Erscheinungen des simultanen Kontrastes und der Eindruck der Feldbeleuchtung.
Zeitschr Psychol 127: 42–59.
Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Gilchrist, A. (1979). The perception of surface blacks and whites. Scient Am 240: 112–123.
Gilchrist, A. (1988). Lightness contrast and failures of constancy: a common explanation. Percept
Psychophys 43(5): 415–424.
Gilchrist, A. (1994). Absolute versus relative theories of lightness perception. In Lightness, Brightness, and
Transparency, edited by A. Gilchrist, pp. 1–33. Hillsdale: Erlbaum.
Gilchrist, A. (2006). Seeing Black and White. New York: Oxford University Press.
Gilchrist, A., Delman, S., and Jacobsen, A. (1983). The classification and integration of edges as critical to
the perception of reflectance and illumination. Percept Psychophys 33(5): 425–436.
Gilchrist, A., and Jacobsen, A. (1984). Perception of lightness and illumination in a world of one
reflectance. Perception 13, 5–19.
Gilchrist, A., Kossyfidis, C., Bonato, F., Agostini, T., Cataliotti, J., Li, X., et al. (1999). An anchoring theory
of lightness perception. Psychol Rev 106(4): 795–834.
Gilchrist, A. L. (1977). Perceived lightness depends on perceived spatial arrangement. Science 195: 185–187.
Gilchrist, A. L. (1980). When does perceived lightness depend on perceived spatial arrangement? Percept
Psychophys 28(6): 527–538.
Gogel, W. C., and Mershon, D. H. (1969). Depth adjacency in simultaneous contrast. Percept Psychophys
5(1): 13–17.
Hartline, H., Wagner, H., and Ratliff, F. (1956). Inhibition in the eye of Limulus. J Gen Physiol 39: 651–673.
Helmholtz, H., von (1866/1924). Helmholtz’s Treatise on Physiological Optics. New York: Optical Society of
America.
Helson, H. (1964). Adaptation-Level Theory. New York: Harper & Row.
Hering, E. (1874/1964). Outlines of a Theory of the Light Sense, translated by L. M. Hurvich and D. Jameson. Cambridge, MA: Harvard University Press.
Hochberg, J. E., and Beck, J. (1954). Apparent spatial arrangement and perceived brightness. J Exp Psychol
47: 263–266.
Jameson, D., and Hurvich, L. M. (1964). Theory of brightness and color contrast in human vision. Vision
Res 4: 135–154.
Jameson, D., and Hurvich, L. M. (1989). Essay concerning color constancy. Ann Rev Psychol 40: 1–22.
Johansson, G. (1950). Configurations in Event Perception. Uppsala: Almqvist & Wiksell.
Kardos, L. (1934). Ding und Schatten [Object and Shadow]. Zeitschr Psychol Erg bd 23.
Katz, D. (1935). The World of Colour. London: Kegan Paul, Trench, Trubner & Co.
Knill, D., and Kersten, D. (1991). Apparent surface curvature affects lightness perception. Nature
351(May): 228–230.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace, and World.
Koffka, K., and Harrower, M. R. (1931). Colour and Organization II. Psychol Forsch 15: 193–275.
Köhler, W. (1947). Gestalt Psychology. New York: Liveright.
Kozaki, A., and Noguchi, K. (1976). The relationship between perceived surface-lightness and perceived
illumination. Psychol Res 39: 1–16.
Land, E. H., and McCann, J. J. (1971). Lightness and retinex theory. J Opt Soc Am 61: 1–11.
Laurinen, P. I., Olzak, L. A., and Peromaa, T. (1997). Early cortical influences in object segregation and the
perception of surface lightness. Psychol Sci 8(5): 386–390.
Leibowitz, H., Mote, F. A., and Thurlow, W. R. (1953). Simultaneous contrast as a function of separation
between test and inducing fields. J Exp Psychol 46: 453–456.
Logvinenko, A., and Menshikova, G. (1994). Trade-off between achromatic colour and perceived
illumination as revealed by the use of pseudoscopic inversion of apparent depth. Perception
23(9): 1007–1024.
Mach, E. (1865). Über die Wirkung der räumlichen Vertheilung des Lichtreizes auf die Netzhaut. Sitzungsberichte der mathematisch-naturwissenschaftlichen Classe der kaiserlichen Akademie der Wissenschaften 52(2): 303–322.
Mach, E. (1922/1959). The Analysis of Sensations (English translation of Die Analyse der Empfindungen, 1922). New York: Dover.
Marr, D. (1982). Vision. San Francisco: Freeman.
McCann, J. J., and Savoy, R. L. (1991). Measurements of lightness: dependence on the position of a white in
the field of view. Proc SPIE 1453: 402–411.
Metelli, F. (1970). An algebraic development of the theory of perceptual transparency. Ergonomics 13: 59–66.
Metelli, F. (1974). The perception of transparency. Scient Am 230: 90–98.
Metzger, W. (1930). Optische Untersuchungen am Ganzfeld. II. Zur Phänomenologie des homogenen Ganzfelds. Psychol Forsch 13: 6–29.
Musatti, C. (1953). Luce e colore nei fenomeni del contrasto simultaneo, della costanza e dell’eguagliamento
[Experimental research on chromatic perception: light and color constancy, contrast, and illumination
phenomena]. Arch Psicol Neurol Psichiat 5: 544–577.
Newson, L. J. (1958). Some principles governing changes in the apparent lightness of test surfaces isolated
from their normal backgrounds. Q J Exp Psychol 10: 82–95.
Noguchi, K., and Kozaki, A. (1985). Perceptual scission of surface-lightness and illumination: An
examination of the Gelb effect. Psychol Res 47: 19–25.
Olkkonen, K., Saarela, T., Peromaa, T., and Laurinen, P. I. (2002). Effects of chromatic contrast on
brightness perception. Perception 31(Supplement): 184d.
Pessoa, L., Mingolla, E., and Arend, L. (1996). The perception of lightness in 3D curved objects. Percept
Psychophys 58: 1293–1305.
Radonjić, A., and Gilchrist, A. (2013). Depth effect on lightness revisited: the role of articulation, proximity
and fields of illumination. i-Perception 4(6): 437–455.
Radonjić, A., Todorović, D., and Gilchrist, A. (2010). Adjacency and surroundedness in the depth effect on
lightness. J Vision 10: 1–16.
Ripamonti, C., Bloj, M., Hauck, R., Mitha, K., Greenwald, S., Maloney, S., et al. (2004). Measurements of
the effect of surface slant on perceived lightness. J Vision 4: 747–763.
Rock, I. (1977). In defense of unconscious inference. In Stability and Constancy in Visual
Perception: Mechanisms and Processes, edited by W. Epstein, pp. 321–373. New York: Wiley.
Schirillo, J. A., Reeves, A., and Arend, L. (1990). Perceived lightness, but not brightness, of achromatic
surfaces depends on perceived depth information. Percept Psychophys 48(1): 82–90.
Shapley, R. (1986). The importance of contrast for the activity of single neurons, the VEP and perception.
Vision Res 26(1): 45–61.
Shapley, R., and Reid, R. C. (1985). Contrast and assimilation in the perception of brightness. Proc Nat
Acad Sci USA 82: 5983–5986.
Simons, D. J., and Levin, D. T. (1997). Change blindness. Trends Cogn Sci 1: 261–267.
Singh, M., and Anderson, B. L. (2002). Toward a perceptual theory of transparency. Psychol Rev
109: 492–519.
Spehar, B., Gilchrist, A., and Arend, L. (1995). White’s illusion and brightness induction: the critical role of
luminance relations. Vision Res 35: 2603–2614.
Taya, R., Ehrenstein, W., and Cavonius, C. (1995). Varying the strength of the Munker–White effect by
stereoscopic viewing. Perception 24: 685–694.
Todorović, D. (1997). Lightness and junctions. Perception 26(4): 379–394.
Troy, J., and Enroth-Cugell, C. (1993). X and Y ganglion cells inform the cat’s brain about contrast in the
retinal image. Exp Brain Res 93: 383–390.
Wallach, H. (1948). Brightness constancy and the nature of achromatic colors. J Exp Psychol 38: 310–324.
Wallach, H. (1963). The perception of neutral colors. Scient Am 208: 107–116.
White, M. (1979). A new effect of pattern on perceived lightness. Perception 8(4): 413–416.
Whittle, P., and Challands, P. D. C. (1969). The effect of background luminance on the brightness of
flashes. Vision Res 9: 1095–1110.
Wolff, W. (1933). Concerning the contrast-causing effect of transformed colors. Psychol Forsch 18: 90–97.
Yarbus, A. L. (1967). Eye Movements and Vision. New York: Plenum Press.
Chapter 20

Achromatic transparency
Walter Gerbino

History and definitions


Phenomenal transparency is a key property of perceptual organization, emerging under appropri-
ate stimulus conditions and often coupled with other aspects of experienced wholes. In the frame-
work of percept-percept coupling (Epstein 1982; Hochberg 1974; Savardi and Bianchi 2012),
transparency may be both an effect and a cause, as evidenced in the title of a seminal paper by
Kanizsa (1955) and argued by Nakayama et al. (1990).
Broadly speaking, transparency is a good label for any instance of experiencing something
through something else. In vision, we can see an object—sometimes vividly, sometimes vaguely—
through a piece of glass, a medium like smoke, or an image reflected on the surface of a pond;
a double experience that has intrigued vision theorists (Arnheim 1974, p. 253; Gibson 1975, 1979; Koffka 1935, pp. 260–264), painters like Paul Klee (1961; Rosenthal 1993), and designers and architects (Kepes 1944; Rowe and Slutzky 1963), and that plays a crucial role in visualization techniques (Chuang et al. 2009; Stone and Bartram 2008). In audition, Bregman (1996, 2008; Denham and
Winkler, Chapter 29, this volume) emphasized that perceiving sounds through other sounds is
ordinary in auditory scene analysis. In touch, transparency has been analyzed by Katz (1925/1989;
Krueger 1982) and constitutes a relevant aspect of product design and experience (Sonneveld and
Schifferstein 2008, p. 60).1
In the present chapter transparency qualifies the phenomenal possibility of seeing something
through something else and shifting attention from what is in front to what is behind, along the
same line of sight. With respect to perceptual organization, transparency supports the modal
completion of partially occluded contours, while occlusion requires their amodal completion (van
Lier and Gerbino, Chapter 15, this volume). To a first approximation, the physical counterpart
of phenomenal transparency is transmittance; i.e., the fraction of light that a layer allows to pass
through without modifying its structure.
The chapter is focused on vision in a grey world. Independently of an explicit grey-world
assumption (i.e., without assuming that the average spectral reflectance curve of environmental
surfaces is flat) a great deal of research has been devoted to the achromatic case, for the good
reason that the visual system seems well adapted to process the patterns of intensive changes gen-
erated by the interposition of transparent layers; patterns that differ in achromatic and chromatic
cases (Da Pos 1999; Kramer and Bressan 2009, 2010).2 The generalizability of any model devel-
oped in achromatic conditions is important (Faul and Ekroll 2012); but perceptual organization
issues are better analyzed in the grey world.

1  Transparency experienced in sensory perception provides a basis for the transparency metaphor, frequently encountered in fields as diverse as philosophy of mind (Hatfield 2011), linguistics (Libben 1998), and politics.
2  Chuang et al. (2009) discuss the dominance of achromatic constraints in visualization.

Fig. 20.1  Apparent transparency. The abpq pattern in panel a is usually perceived as a dark bar on
top of a white cross (though an alternative perceptual solution is possible) and not as the mosaic
of irregular shapes shown in panel b. The pattern in panel c is a control for the effect of figural
organization on perceived color: the adjacencies are kept constant, while good continuation of
contours at junctions is eliminated. According to Metzger, transparency is not perceived in panel d
because both black and white regions have a good shape and the addition of the grey region would
not generate figures with a better shape.
Adapted from Wolfgang Metzger, Laws of Seeing, translated by Lothar Spillmann, figure 131, modified, © 2006 Massachusetts Institute of Technology, by permission of The MIT Press.

Achromatic transparency plays a special role in perceptual organization for the following
reasons:
•  it provides an ideal case for the application of the tendency to Prägnanz, which may be taken as
the distinctive trait of the Gestalt theory of perception;
•  under optimal conditions it appears as an organized outcome strongly constrained by geometric
and photometric information, and highly functional, being formally equivalent to the solution
of a pervasive inverse-optics problem;
•  under suboptimal conditions it reveals the links between color and form (a leitmotif of Gestalt
psychology; Koffka 1935, pp. 260–264; see Section “Transparency and motion”).
Consider how Metzger (1936/2006) set up the problem in Chapter  8 of Gesetze des Sehens,
discussing a demonstration from Fuchs (1923). Figure 20.1a is normally perceived as a dark trans-
parent bar on top of a white cross, not as the mosaic in Figure 20.1b.3 The bar and the cross inter-
sect in such a way that each ‘claims as its own’ the superposition region, requiring the scission of

3  The pattern in Figure 20.1a supports two transparency solutions. See Figure 20.7 for an analysis of bivalent 4-region patterns.

its grey substance into two components that perceptual organization makes as similar as possi-
ble to bar and cross lightnesses. The double-belongingness of the superposition region depends,
locally, on the good continuation of contours meeting at X-junctions and, more globally, on the
improvement of form regularity. Metzger (1936/2006) referred to his Fig.  27 to claim that the
strength of such factors is well established by classical demonstrations with intertwined outline
patterns (Köhler 1929; Wertheimer 1923/2012).4
Figure 20.1c (not in Metzger 1936/2006; drawn following Kanizsa 1955) is a control. All adja-
cencies in Figure 20.1a are maintained, but contours of neither the bar nor the cross keep a con-
stant trajectory at X-junctions. The dark bar survives as a unit, being supported by the topological
condition (see Section “Topological and figural conditions”); but the sense of transparency is
weakened, and the color appearance of the superposition region is different from the one in
Figure 20.1a.
Figure 20.1d displays a counterexample in which the same greys of Figure 20.1a are combined
in a pattern that is perceived as a mosaic of three adjacent squares, though compatible—in prin-
ciple—with the overlapping of two homogeneous rectangles, with the same front/back ambiguity
and alternating transparency observable in the cross/bar display of Figure 20.1a.
Much of the theoretical weight of transparency depends on the colors seen when the inter-
section region belongs to both the dark bar and the light cross (panel a), rather than appearing
as an isolated surface (panel b). Figural belongingness modulates the scission of the sensation
(Spaltung der Empfindung; Hering 1879) and impacts on perceived intensity and color appear-
ance. Helmholtz (1910/1924, originally published in 1867) framed real transparency as a problem
of recognizing the components of a light mixture, using knowledge acquired in ordinary environ-
ments in which at least the mixture of illumination and reflectance components is pervasive. In
the Helmholtzian view, the same ratiomorphic process supports the discounting of illumination
associated with the approximate constancy of opaque surface colors, the perception of shadows,
the separation of filter properties from background properties, and analogous recovery problems.
‘Just as we are accustomed and trained to form a judgment of colours of bodies by eliminating
the different brightness of illumination by which we see them, we eliminate the colour of the
illumination also. [. . .] Thus too when we view an object through a coloured mantle, we are not
embarrassed in deciding what colour belongs to the mantle and what to the object.’ (Helmholtz
1924, p. 287.)
Helmholtz’s emphasis on observers’ ability to evaluate light mixture components conflicts with
the plain argument developed in Figure 20.1. The same light mixture sometimes is phenomenally
split into components, sometimes not, depending on stimulus conditions. The discovery of condi-
tions for the occurrence of phenomenal transparency (independent of its veridicality) is the goal
of a long tradition of research oriented by Gestalt ideas (Fuchs 1923; Kanizsa 1955, 1979; Koffka
1935; Metelli 1970, 1974, 1975; Moore-Heider 1933; Tudor-Hart 1928), among which a special
place is held by the idea that double-belongingness is a peculiar organization producing charac-
teristic effects on perceived color (Kanizsa 1955; Musatti 1953; Wallach 1935/1996).
Since transparency can be observed in line-drawing displays (Bozzi 1975), without specific
photometric information, let us consider geometric conditions first.

4  In the Gestalt tradition the 'apparent/real' dichotomy is used to stress that real transparency (i.e., a layer with non-zero transmittance) is neither necessary nor sufficient to support a transparency percept; apparent transparency is perceived in mosaics of opaque surfaces. Like for motion, the apparent/real dichotomy stimulates the search for the proximal conditions supporting the perception of transparency, independent of its veridicality.

Topological and figural conditions


Take the prototypical 4-region pattern in Figure 20.1a. To support perceived transparency, p and
q regions should group together and form the layer; furthermore, each of them should group with
the other adjacent region (a and b, respectively) and form a background surface partially occluded
by the layer. That is, both p and q should belong to two units, subordinate to the whole configura-
tion but superordinate to input regions, according to the intertwined pattern (a[p)(q]b).5
As suggested in the title of this section, the double-belongingness of two of the four regions
depends on geometric constraints that have been articulated into topological and figural condi-
tions (Kanizsa 1955, 1979; Metelli 1974, 1975, 1985b).

Topological condition
The topological condition has been formulated as follows (Kanizsa 1955). To belong to two subu-
nits each candidate region must be in contact with the other (reciprocal contact constraint) and
with only one of the remaining regions (Figure 20.2). At the level of regions, the condition is
satisfied when contours meet at a generic 4-side junction, even without good continuation at the
contour level (Figure 20.1c).
Kanizsa (1955, 1979) and Metelli (1975, 1985b) discussed various controversial configurations
connected to the topological condition. Kanizsa (but not Metelli) concluded that the topological
condition is necessary, though not sufficient. Panels b–d in Figure 20.2 depict violations that lead
to the loss of the compelling transparency percept observed in Figure 20.2a. However, the bro-
ken layer depicted in Figure 20.2c does not completely forbid transparency, being consistent with
common observations of shadows falling over a 3D step, with non-coplanar background regions.
Arguing that the topological condition is necessary, Kanizsa (1979, Fig. 8.9) claimed that trans-
parency is hardly seen in Figure 20.3a.6
Apart from being necessary or not, what is the meaning of the topological condition? Does
it capture a figural constraint at the level of regions or does it relate to photometric conditions
described in Section “Photometric conditions”? The second hypothesis is supported by a manipu-
lation of borders done by Metelli (1985b). Transparency of the oblique square in Figure 20.3b dis-
appears if one eliminates the adjacency of to-be-grouped regions by superposing a thick outline
on the borders of the intersection region (Figure 20.3c). Transparency is not blocked, however, if
all regions are bounded by thick outlines that can become part of the transparency solution, with
the upright square perceived on top of the oblique square (Figure 20.3d). The isolation effect in
Figure 20.3c is reminiscent of the loss of the film appearance in a shadow whose penumbra is sup-
pressed by a thick outline.7

Figural conditions
Figural aspects play a major role in transparency and, when strengthened by motion, can overcome
contradictory photometric information. Kanizsa (1955, 1979) and Metelli (1974) emphasized the role

5  An extended notation for the double-belongingness of p and q regions would be (ap)(pq)(qb). In the compact notation above the subunit corresponding to the transparent layer is marked by square brackets, while the background subunits are marked by round brackets.
6  You may disagree.
7  See discussions of Hering's shadow/spot demonstration in Metzger (1936/2006, Fig. 132) and Gilchrist (2006, p. 21).


Fig. 20.2  Topological condition. (a) Canonical 4-region display fulfilling all geometric and photometric
requirements. Panels b–d illustrate three ways in which the topological condition can be violated. (b)
Regions that should be unified into a single layer are not in reciprocal contact, while touching both
background regions. (c) The reciprocal contact constraint is fulfilled, but both candidate layer regions are
in contact also with both background regions. (d) The topological condition is violated also when the
inner contour of a unitary layer (i.e., the one that divides the two constituent regions) is not aligned with
the contour that divides the background regions.
Data from G. Kanizsa, Condizioni ed effetti della trasparenza fenomenica, Rivista di Psicologia, 49, pp. 3–19,
1955.

of good continuation at X-junctions as the critical local factor supporting vivid impressions of trans-
parency, other things being equal (i.e., once the topological condition is fulfilled and keeping the
intensity pattern constant). However, they considered also more global figural factors, like the shape
of regions.
Figural conditions for the double-belongingness of regions to be grouped into a layer agree with
those that govern the segmentation of outline patterns and have been studied within a research
tradition that goes from Wertheimer (1923/2012) to the most recent developments of Structural
Information Theory (SIT; Leeuwenberg and van der Helm 2013). Wertheimer (1923/2012), com-
menting on his Figs. 33 and 34, observed that Fuchs (1923) utilized the same laws of unification/
segregation when studying transparent surfaces in the period 1911–1914 and found they strongly
affect color. Wertheimer’s Fig. 33 is an outline version of Figure 20.3b, while Wertheimer’s Fig. 34
is similar to Figure 20.1d. These and other famous outline patterns (like the pair of intertwined
hexagons) support the idea that figural segmentation crucially depends on the tendency towards
the ‘good whole Gestalt’ (Wertheimer 1923, p. 327; Wagemans, Chapter 1, Section “Wertheimer’s
“Gestalt laws” (1923)”, this volume).


Fig. 20.3  According to Kanizsa (1979) the pattern in panel a shows that the topological condition
cannot be violated without destroying perceived transparency. Adapted from G. Kanizsa, Organization
in Vision, Figure 9.6, p. 160, Praeger, Santa Barbara, USA, Copyright © 1979, Praeger. Panels b–d (from
Metelli 1985b) show the effect of thick outlines. The transparency perceived in panel b is destroyed by a
thick outline surrounding the superposition region (panel c). A thick outline surrounding all regions can
be integrated in the transparency percept (panel d).

In an early application of SIT to visual and auditory domains, Leeuwenberg (1976, 1982;
Leeuwenberg and van der Helm 2013; see also van der Helm, Chapter 50, this volume) computed
a measure of preference for pattern segmentation based on the ratio between the complexity of the
mosaic solution and the complexity of the transparency solution. Using patterns like those in Figure
20.4 and coding only figural complexity (independently of photometric conditions), he obtained a
high correlation between the theoretical preference measure and transparency judgments.
Singh and Hoffman (1998) provided a major contribution to the idea that figural conditions go
beyond the local good continuation at X-junctions. They used displays with X-junctions that pre-
served the local good continuation of background and layer contours, and asked observers to rate
perceived transparency on a 1–7 scale. Observers were more sensitive to the size of turning angles at the extrema of curvature of the layer boundary when these were negative minima than when they were positive maxima. Average ratings ranged from 1.5 (close to perfect mosaic) to 6 for negative minima, and
from 4 to 6 for positive maxima. Furthermore, Singh and Hoffman (1998) found that the prox-
imity of the extrema of curvature to the background boundary increased the detrimental effect
on transparency ratings. Their results show that the competition between mosaic and double-
belongingness solutions depends on properties like negative extrema, which are relevant for the
parsing of shapes into parts (Singh, Chapter 12, this volume).
All geometric factors known to affect relative depth may be effective in making the transpar-
ent layer more salient and in modulating the preference for one transparency solution when


Fig. 20.4  According to Leeuwenberg’s coding approach (1976, 1982) perceived transparency is
predicted by a preference measure, with a value of 1 for the balance between mosaic and transparency
solutions. Preference values are 11.90 in panel a and 0.56 in panel b. This preference measure takes into
account only figural (not photometric) aspects.
Reproduced from Emanuel Leeuwenberg and Peter A. van der Helm, Structural Information Theory: The Simplicity
of Visual Form, Cambridge University Press, Cambridge, UK, Copyright © 2012, Cambridge University Press, with
permission.

photometric conditions are ambivalent (see Section "Reflectances or luminances?"). Delogu et al. (2010) demonstrated that relative size can affect the depth stratification of transparent configurations. Binocular disparity (Nakayama et al. 1990; Anderson and Schmid 2012) and motion
parallax (see Vezzani et al., Chapter 25, this volume) interact with transparency in complex ways.

Transparency in outline patterns


As regards intertwined outline patterns of the Wertheimer type (Brooks, Chapter 4, this volume;
Elder, Chapter 11, this volume), one may wonder whether phenomenal transparency—in a gen-
eric sense—is involved in all cases in which a pattern of intersecting contours, in the absence
of information carried by adjacent grey regions, is perceptually parsed into overlapping shapes.
Double-belongingness of some enclosed regions is observed in both grey-region mosaics and
outline patterns, but the transparency label would probably appear as stretched too far, if applied
to all intertwined outlines.
Rock and Gutman (1981) used overlapping shapes involving the segmentation of contours and
regions to relate attention and form perception, and made a point opposite to double-awareness,
showing that perception of one figure may occur without perception of the other, despite the pres-
ence of all lines around the center of fixation. Object attention is based on segmentation (Scholl
2001; Driver et al. 2001) and can be limited in the number of overlapping planes the observer can
be simultaneously aware of (Tyler and Kontsevich 1995; Fazl et al. 2008).8
However, phenomenal transparency should be qualified as something more than the simple
experience of seeing overlapping figures or surfaces in depth. This type of stratification (supported
by contour or texture information, motion parallax, or binocular disparity) might be a necessary

8  Based on evidence from texture segmentation in motion transparency, Glass patterns, and stereopsis, such a number has been evaluated as equal to two (Edwards and Greenwood 2005; Gerbino and Bernetti 1984; Kanai et al. 2004; Mulligan 1992; Prazdny 1986), three (Weinshall 1991), four (Hiris 2001), and dependent on the cueing of attention (Felisberti and Zanker 2005).


Fig. 20.5  Transparency in outline patterns (Bozzi 1975). In panel a thinning all lines included within
the oblique rectangle makes it appear foggy. In panel b the misalignment is perceived as the effect
of a distorting superposed layer.

condition for transparency, but phenomenal transparency should involve a characteristic color
appearance, different from the appearance of the same region when seen as part of a mosaic.
This is the case in patterns like those in Figure 20.5, devised by Bozzi (1975) to demonstrate that
the experience of an interposed layer or substance, capable of modifying the appearance of the
background, can be obtained also in the limited and artifactual world of line drawings. Taken as a
whole, Bozzi’s demonstrations suggest that the perception of an interposed layer—at least in some
conditions—amounts to the recovery of the causal history of shapes (Leyton 1992). The milky
layer perceived in panel a accounts for the thinning of vertical lines, while the distorting glass
perceived in panel b accounts for their lateral shift. Bozzi was well aware of the possibility that
line thinning (panel a) may be equivalent to an intensity change, which would make at least some
of his line drawings not less interesting, but similar to other effects involving assimilation and
filling in. The degree of connection between Bozzi’s outline displays portraying transparency and
phenomena like achromatic neon spreading and flank transparency is debatable (Wollschläger
et al. 2001, 2002; Roncato 2012). However, this objection does not apply to Figure 20.5b and other
displays that depict a background transformation more complex than a simple change of inten-
sity due to layer superposition. Line drawings are highly symbolic and transparency mediated by
the specific transformations they can afford might go beyond the domain covered in this chapter.

Photometric conditions
To support transparency, the pattern of intensities of adjacent regions must satisfy a requirement
that, at an abstract level, complements the good continuation of contour trajectories. The equiva-
lent of a discontinuity in contour trajectory is an abrupt change of surface values (apparent trans-
mittance, lightness, or others to be defined).
Consider contour trajectories in the neighborhood of an X-junction originated by layer
superposition. In general, background regions are divided by a continuous reflectance edge
(R-edge), while the superposed layer and background regions are divided by a continuous
transmittance-reflectance-illumination edge (TRI-edge). Following Nakayama et  al. (1989) the
latter edge is intrinsic to layer regions (it belongs to them) but extrinsic to regions seen as unoc-
cluded background (it does not belong to them). Topological and figural conditions tell that both
edges should be smoothly continuous at the X-junction.
Consider now intensities in the neighborhood of the X-junction. Photometric conditions tell
when one of the two crossing edges can be classified as a TRI-edge; i.e., when the intensity of each

double-function region is consistent with the mixing of photometric properties of the adjacent
background region and those of an ideally homogeneous layer resulting from the grouping of two
adjacent double-function regions. Notions such as scission (Metelli 1970; Anderson 1997), vector
analysis in the photometric domain (Bergström 1977, 1982, 1994), atmospheric transfer func-
tion (Adelson 2000) capture the same idea. A rather general term is layer decomposition, used by
Kingdom (2011) to qualify brightness, lightness, and transparency models—alternative to image
filtering—that explain achromatic phenomena as a consequence of extracting components from
each stimulus intensity (the invariant of alternative partitioning solutions). For historical and
conceptual reasons let us illustrate the algebraic model proposed by Metelli (1970, 1974, 1975)
which—despite limitations that will be pointed out—provides an effective frame of reference for
the whole discussion on photometric conditions of transparency.9

Metelli’s model
Metelli’s model is derived from a simplistic case of real transparency, the episcotister setting uti-
lized to manipulate light mixtures (Fuchs 1923; Koffka 1935; Moore-Heider 1933; Tudor-Hart
1928). The episcotister model is representative of a broad class of ecological settings, which in
principle should consider more parameters (Richards et al. 2009), but—more importantly—has
the virtue of being a simple and essential decomposition-and-grouping model.
As shown in Figure 20.1, a layer appears transparent only if partially superposed on a back-
ground that includes at least two regions of different reflectance.10 Metelli’s model provides a way
of evaluating the amount of photometric information carried by a generic X-junction in which
an R-edge intersects a TRI-edge. The R-edge is the simple boundary between two adjacent back-
ground regions, differing in reflectance but equally illuminated; while the TRI-edge is a complex
boundary arising from the superposition of a layer of variable transmittance and reflectance, and/
or a change in illumination.
In the original model the input variables are the four reflectances that, in a cardboard display,
mimic the light coming from two adjacent background surfaces a and b, and from the light mix-
tures p and q, obtained by rotating an episcotister (spinning disk with apertures and opaque sec-
tors of variable reflectance) in front of background surfaces a and b, under the critical assumption
that the episcotister and background surfaces are equally illuminated.11 The fact that the situation
referred to in the episcotister model does not involve physically transparent materials should not
be seen as a problem. When an episcotister rotates faster than fusion speed, its effects on p and
q intensities are equivalent to those generated by static layers as a thin veil or an optical filter.
Neither the temporal (episcotister) nor the spatial (veil, filter) light mixtures follow the equations
known as the episcotister model if the constraint of uniform illumination is not fulfilled; both

9  Kanizsa (1955, 1979) sometimes used the label 'chromatic conditions' as a synonym of photometric conditions, discussing achromatic displays. To avoid confusions that would obviously arise in a chapter entitled 'Achromatic transparency,' conditions related to region intensities (expressed as either reflectances or luminances) will be called 'photometric.'
10  This formulation covers transparency perceived in the 3-region display, studied for instance by Masin (1984). His observers perceived as transparent a real filter suspended in front of a background that included a square projectively enclosed by the filter. However, the objective separation in depth was large enough to provide valid disparity information.
11  In this chapter small letters are used for dimensionless numbers (reflectances abpq and other coefficients with meaningful values between 0 and 1) and capital letters for luminances (in Section "Reflectances or luminances?"). For further details see Gerbino et al. (1990) and Gerbino (1994). The transparency literature is full of different symbols for the same entities. I apologize for possible confusions.

should be described by the so-called filter model if the layer is very close or in contact with the
background, as it actually looks in the flatland of impoverished 4-region displays (Beck et al. 1984;
Gerbino 1994; Richards et al. 2009).12
Basically, the episcotister model takes regions grouped as (a[p)(q]b) according to figural con-
straints and verifies if p and q intensities are compatible with the constrained sum of two compo-
nents described by the following equations:

p = ta + f (1)

q = tb + f  (2)

Equations 1 and 2 make clear that the episcotister model is a straightforward decomposition-
and-grouping model. Each intensity of a region to be grouped into the layer is reduced to the sum of
a multiplicative component and an additive component (the scission aspect): the first is the constant
fraction t of the corresponding background region; the second is a common component that—what-
ever the t value between 0 and 1—attenuates the background contrast a/b.
Equations 1 and 2 describe how a and b intensities are modified by a rotating episcotister with
an open sector of size t and an effective reflectance f, equal to the product of the size of the
complementary solid sector (1−t) and its reflectance r. Since both t and r are proper fractions (t is the
relative size of the opening of the episcotister and r is a reflectance), neither can be smaller than
zero or larger than 1.
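As an illustrative sketch (not part of the original text), the direct composition in Equations 1 and 2 can be written in a few lines; the sample values of a, b, t, and r are arbitrary.

```python
def episcotister(a, b, t, r):
    """Direct optics (Equations 1 and 2): compose background reflectances
    a and b with an episcotister whose open sector has relative size t and
    whose solid sector has reflectance r."""
    if not (0 <= t <= 1 and 0 <= r <= 1):
        raise ValueError("t and r must be proper fractions")
    f = (1 - t) * r       # effective reflectance of the layer
    p = t * a + f         # Equation 1
    q = t * b + f         # Equation 2
    return p, q

# Arbitrary sample values: high-contrast background, mid-grey episcotister.
p, q = episcotister(a=0.90, b=0.10, t=0.50, r=0.40)
print(p, q)               # p = 0.65, q = 0.25
# The layer attenuates the background contrast: p/q = 2.6 < a/b = 9.
```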
Equations 1 and 2 refer to direct optics. For instance, knowing background reflectance a,
filter transmittance t and filter reflectance r, one can derive the effective reflectance of the
superposition area p. However, such a system of two equations becomes a useful psychophys-
ical model if one realizes (as Metelli did) that it provides unique solutions for both t and r,
constituting a plausible inverse-optics model for the recovery of layer properties (not explicit
in the stimulus) from the pattern of input values (Marr 1982, pp. 89–90). Relevant solutions
are as follows:

t = (p − q) / (a − b)  (3)

r = (aq − bp) / [(a + q) − (b + p)]  (4)

f = (aq − bp) / (a − b)  (5)

Taking the episcotister as a physical model of real transparency, Metelli proposed that layer
transmittance and reflectance are perceived in the same way in which the reflectance of an opaque
background surface is perceived as its lightness. Layer transparency (perceived transmittance,
increasing with t) and layer lightness (perceived reflectance, increasing with r) are derived from
the pattern of stimulation.
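As a hedged sketch (not in the original), the inverse-optics recovery can be coded directly from Equations 3 and 5, computing r as f/(1 − t), which is algebraically equivalent to Equation 4; the sample intensities are arbitrary and were generated with t = 0.5 and r = 0.4.

```python
def recover_layer(a, b, p, q):
    """Inverse optics: recover the layer parameters t, r, f from the four
    region intensities (Equations 3-5) and test Metelli's constraints."""
    t = (p - q) / (a - b)           # Equation 3
    f = (a * q - b * p) / (a - b)   # Equation 5
    r = f / (1 - t)                 # algebraically equivalent to Equation 4
    plausible = 0 <= t <= 1 and 0 <= r <= 1
    return t, r, f, plausible

# Arbitrary intensities consistent with t = 0.5, r = 0.4 (so f = 0.2):
t, r, f, ok = recover_layer(a=0.90, b=0.10, p=0.65, q=0.25)
print(round(t, 3), round(r, 3), round(f, 3), ok)   # 0.5 0.4 0.2 True
```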

12 In the transparency literature, expressions like ‘episcotister model’ and ‘filter model,’ or ‘episcotister equations’ and ‘filter equations,’ should not be taken as referring to a specific device (a spinning disk with open sectors vs. a piece of smoked glass), but to two extreme types of background illumination: in the so-called episcotister model the background is illuminated exactly like the layer (a condition easily obtained if the layer is suspended in mid air, far away from the background); in the so-called filter model the background is illuminated only through the layer (a condition which quite frequently occurs when a filter is in contact with the ground).
Achromatic Transparency 423

[Figure 20.6 shows four panels: (a) t = 0.27, r = 0.20; (b) t = 0.43, r = 0.40; (c) t = 0.53, r = 0.60; (d) t = 0.60, r = 0.80.]

Fig. 20.6  The four panels illustrate that, keeping background intensities constant (a = 0.90;
b = 0.10), approximately the same attenuation of background contrast (p/q = 0.25 a/b) is compatible
with different pairs of t and r values (shown in each panel). Intensities of p and q regions are as
follows: (a) p = 0.12; q = 0.05; (b) p = 0.39; q = 0.17; (c) p = 0.61; q = 0.27; (d) p = 0.76; q = 0.34.

The hypothesis that perceptual dimensions of transparency parallel the physical properties of
the layer is quite controversial (Albert 2006, 2008; Anderson 2008; Anderson, Chapter 22, this
volume; Anderson et al. 2006, 2008a, b; Masin 2006; Singh and Anderson 2002, 2006). According
to Kingdom (2011, Section 9) further research is needed to identify the appropriate perceptual
dimensions and the best methods for obtaining valid data from observers. However, as remarked
by Anderson et al. (2008a, p. 1150), researchers should not expect that all variables included in
generative physical models like Equations 1 and 2 have a perceptual meaning. Furthermore, they
should consider the possibility that perception is sensitive to other variables. For instance,
solutions for t, r, f (Equations 3, 4, 5) are more complex than the simple intensity ratio available at
each image boundary, whereas attenuation of border contrast is probably the most salient physical
consequence of layer superposition.13 Note that, counter to intuition, t and r values are not related to
contrast attenuation in a simple way (Figure 20.6). For a theory of transparency based on contrast
attenuation see Anderson (2003).
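The point made by Figure 20.6 is easy to check numerically. In this sketch (added for illustration), the four (t, r) pairs from the panels are pushed through Equations 1 and 2; the resulting contrast ratios p/q all fall close to 0.25 a/b = 2.25 despite the very different layer parameters.

```python
a, b = 0.90, 0.10                   # background intensities from Figure 20.6
panels = [(0.27, 0.20), (0.43, 0.40), (0.53, 0.60), (0.60, 0.80)]  # (t, r)

for t, r in panels:
    f = (1 - t) * r                 # additive component, Equations 1-2
    p, q = t * a + f, t * b + f
    print(f"t = {t:.2f}, r = {r:.2f}  ->  p/q = {p / q:.3f}")
# Every ratio is within about 1% of 0.25 * (a / b) = 2.25.
```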

13 The attenuation of border contrast is also behind the notion of veiling luminance, a hybrid term that combines the phenomenal transparency of a metaphorical veil with a physical measure of input intensity (Gilchrist, 2006, pp. 196–197). When spontaneously perceived as a veil, added light is experienced as the cause of the reduced visibility of otherwise well-contrasted borders (a case of real transparency without X-junctions).

Reflectances or luminances?
Clearly, the choice of reflectances as input variables is controversial and has raised several discussions
(Beck 1985; Beck et al. 1984; Gerbino 1994; Metelli 1985a; Masin 2006). Reflectances are distal
values, and a model should express perceptual values as a function of proximal, not distal, values.
On the other hand, under homogeneous illumination reflectances can be taken as luminances
in arbitrary units, making the distinction irrelevant. Another type of criticism refers, instead, to
the possibility of taking lightnesses (i.e., perceived reflectances derived from a transformation of
luminances) as the input for the model. This approach is theoretically consistent with the exist-
ence of a stage in which all four regions of the canonical display are represented as opaque sur-
faces, each with its own lightness, and of a subsequent stage in which a better solution is achieved
(Rock 1983, pp. 138–139).
An unfortunate implication of the use of reflectances is Metelli’s idea that r = 1 constitutes an
effective upper boundary for transparency. Reformulating the episcotister model in terms of lumi-
nances (Gerbino 1988, 1994; Gerbino et al. 1990) helps to understand that this constraint can be
relaxed. Using luminances as input values, Equations 1 and 2 change as follows:

P = tA + F  (6)

Q = tB + F  (7)

In Equations 6 and 7 the additive component F is also a luminance, equal to (1−t) r Ie, where
Ie is the illumination falling on the episcotister, in principle different from the illumination Ib
falling on background regions whose reflectances are a and b.14 Following the inverse-optics
logic, there is no reason to reject values of the additive component F larger than (1−t) Ib
(i.e., the value obtained with r = 1), since they are compatible with more illumination falling on the layer than on the
background. In principle one could decompose even smaller F values as involving an increase
of the illumination on a layer with r < 1. But this solution would be against the minimum prin-
ciple (which leads to a decomposition with uniform illumination, unless required by specific
stimulus information).
Photometric conditions of the episcotister luminance model are conveniently represented in
the diagram devised by Remondino (1975). Figure 20.7 includes two diagrams, to represent two
transparency solutions, one for each of the two edges crossing at the X-junction, for two 4-region
patterns having in common two luminances (30 and 80, in arbitrary units). In general, photomet-
ric conditions for the TRI-edge can be satisfied for both edges, only one, or none. In the pattern
at the bottom the two solutions correspond to the following APQB orderings: (80, 40, 20, 30) and
(80, 30, 20, 40), with t = 0.40 and 0.25, respectively, and r = 0.13 in both cases. Both transparency
solutions of the pattern at the top violate the r ≤ 1 constraint, but can be interpreted as cases in
which a layer made of perfectly white particles is more illuminated than the background (Ie = 1.3
Ib, if r = 1). The aspect of the diagram with the most prominent theoretical meaning is the shaded
region representing the set of PQ values compatible with a given AB pair and with the constraints
of the episcotister luminance model.
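Both patterns of Figure 20.7 can be recomputed from the luminance model, Equations 6 and 7. In this sketch (added for illustration) the background illumination is assumed to be Ib = 100 in the same arbitrary units as the luminances, which is what makes r = 1 correspond to F = (1 − t) · 100.

```python
def solve(A, P, Q, B, Ib):
    """Recover t and the additive luminance F for an (A)(P)(Q)(B) grouping
    (the luminance analogues of Equations 3 and 5), plus the reflectance r
    that F would imply under uniform illumination (Ie = Ib)."""
    t = (P - Q) / (A - B)
    F = (A * Q - B * P) / (A - B)
    r_uniform = F / ((1 - t) * Ib)
    return t, F, r_uniform

Ib = 100  # assumed background illumination, arbitrary units

# Bottom pattern: the same four luminances admit two orderings.
print(solve(80, 40, 20, 30, Ib))   # t = 0.40, F = 8, r close to 0.13
print(solve(80, 30, 20, 40, Ib))   # t = 0.25, F = 10, r close to 0.13

# Top pattern: r_uniform exceeds 1, yet the decomposition survives if the
# layer receives more light than the background: with r = 1, Ie = F / (1 - t).
t, F, r_uniform = solve(80, 95, 60, 30, Ib)
print(t, F, r_uniform)  # t = 0.7, F = 39; r_uniform close to 1.3, i.e. Ie = 1.3 Ib
```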

14 As anticipated in Footnote 11, capital letters are used for luminances and light intensities, while small letters indicate dimensionless numbers (reflectance and transmittance coefficients).



[Figure 20.7: two Remondino diagrams. Top pattern (luminances 30, 60, 80, 95): solutions t2 = 0.7 (TRI-edge) and t2 = 0.5 (R-edge), both with r = 1.0 and Ie = 1.3 Ib. Bottom pattern (luminances 20, 30, 40, 80): solutions t1 = 0.4 and t1 = 0.25, both with r = 0.13 and Ie = Ib.]
Fig. 20.7  A convenient visualization of transparency solutions in 4-region patterns is the diagram
proposed by Remondino (1975). Coordinates represent luminances in arbitrary units. Two 4-region
patterns are considered here, both compatible with two transparency solutions, corresponding to
two different t values. The component r has a low value (r = 0.13) in both solutions for the bottom
pattern; while it exceeds the r = 1 boundary (dashed line) in both solutions for the top pattern. Each
shaded trapezoidal region in the two diagrams represents the space of valid PQ luminance pairs for
a given AB pair (square symbol). Such a space is actually open in the direction of higher PQ values,
since the additive component (visualized by the projection of the oblique arrow on each axis) can take
any positive value, if constraints on illumination are relaxed. PQ pairs are shown in the two diagrams
as circular symbols, filled for the pattern at the bottom and empty for the pattern at the top.

Are X-junctions and four regions indispensable?


These are two different questions, of course. An X-junction implies four regions, but four
regions can be effectively arranged without X-junctions (for instance, as stripes in a row; Da
Pos 1999). Furthermore, transparency can be obtained in double-inclusion patterns of three
regions, without X-junctions, though stereo and relative motion help a lot in such a limiting
case (Masin 1984). At low contrast, transparency can be perceived also in 2-region displays
(Masin and Idone 1981).

As regards the indispensability of X-junctions, Masin (2006) found that transparency in a
striped pattern APQB can be vivid, if supported by coherent motion of AP and QB boundaries,
and that transparency ratings did not differ from those obtained in a classic 4-region display with
X-junctions. This piece of evidence is consistent with the fact that, given four intensity values
around an X-junction, any of the four ratios of adjacent luminances is redundant and can be
derived from a suitable product of the others. In the case of the APQB pattern the A/B ratio of
non-adjacent luminances could be obtained as a product of ratios A/P, P/Q, Q/B (following the
product of sequential ratios approach applied in Retinex; Land and McCann 1971).
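This redundancy is a one-line identity, since the intermediate luminances cancel in the product of sequential ratios. A minimal sketch with arbitrary luminance values:

```python
# For a striped APQB pattern, the ratio of the non-adjacent luminances A and B
# equals the product of the adjacent ratios, because P and Q cancel:
# (A/P) * (P/Q) * (Q/B) = A/B.
A, P, Q, B = 80.0, 40.0, 20.0, 30.0   # arbitrary luminances
product = (A / P) * (P / Q) * (Q / B)
print(round(product, 6), round(A / B, 6))   # 2.666667 2.666667
```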

Shadows, transparency, and constancy


As stressed by Adelson (2000) in his notion of atmospheric transfer function, a decomposition
model like Metelli’s makes clear the continuity between shadows and transparency. In a less opti-
mistic way, one might say that the model cannot discriminate between a shadow and a transparent
layer with zero reflectance or without illumination falling on it. In all three cases the additive com-
ponent is zero. Perceptually, the distinction between a shadow and a transparent layer is not sharp
at all.15 If the essence of phenomenal transparency is the sense of ‘seeing through’, shadows (like
episcotisters with a black opaque sector; Koffka 1935; Tudor-Hart 1928) are the best transparent
layers one can experience. Particularly when their boundary is sharp, shadows have a clear shape
that intersects background shapes and can be easily segmented (Mamassian et al. 1998).
Shadows and layers share the problem of constancy; i.e., the perceptual invariance of object
properties despite stimulus change. Perfect decomposition of layer regions (including shadows as
a limiting case) should lead to complete color constancy of surfaces seen through the layer, as well
as to complete constancy of the transparent layer. The phenomenon that probably better embod-
ies the interplay between shadows, transparency, and constancy is the illusion by Anderson and
Winawer (2005; Gilchrist 2005). An important implication of constancy of surface color seen in
a cast shadow or through a transparent layer was studied by Rock et al. (1992), who found that
similarity grouping is not based on luminances but on lightness values, consistent with early layer
decomposition. So far, research on transparent layer constancy (Faul and Ekroll 2012; Gerbino
et al. 1990) has provided good support for the layer decomposition approach, despite the meth-
odological limitations of some studies pointed out by Kingdom (2011). However, more experi-
ments considering both types of constancy in comparable conditions are necessary.

Effects of transparency
Transparency can be conceived of as the effect of appropriate stimulus conditions, but also as the
cause of specific changes in other perceptual properties. Kanizsa (1955) articulated this logic refer-
ring to Figure 20.8a, an ambiguous pattern supporting either an occlusion solution (a light lamina
with holes in front of an oblique opaque bar) or a transparency solution (a milky rectangular filter
in front of a rectangle with holes). The dominance of one solution over the other depends on the
relative intensities of the three regions (Ripamonti and Gerbino 2001); but when conditions are such
that both solutions are easily perceived, a clear effect of form organization on color is observed. In

15 Metelli (1985b, p. 304) reminded us that the devil—notoriously an excellent observer—treats Peter Schlemihl’s shadow as a thin mantle lying on the terrain: ‘He shook my hand, knelt down in front of me without delay, and I beheld him, with admirable dexterity, gently free my shadow, from the head down to the feet, from the grass, lift it up, roll it together, fold it, and finally tuck it into his pocket.’ (Chamisso, The Wonderful History of Peter Schlemihl).

(a) (b) (c)

Fig. 20.8  The ambiguous three-intensity pattern in panel a (Kanizsa 1955) can be perceived as a light
lamina with four holes in front of an oblique rectangle (like in panel b) or as a transparent oblique
rectangle in front of a lamina with holes (like in panel c). The addition of a thin outline disambiguates
the transparent layer, which takes on a definite milky appearance. The same color appearance is
observed in panel a, when the oblique rectangle appears in front.
Reproduced from G. Kanizsa, Condizioni ed effetti della trasparenza fenomenica, Rivista di Psicologia, 49,
pp. 3–19, Figure 12, Copyright © 1955, The Author.

the occlusion solution (that may be primed by panel b, where intensity conditions do not favor trans-
parency) the oblique bar is amodally completed but its modal parts have a hard surface color. In the
transparency solution the oblique bar is similar to the one in panel c, where the white outline makes
the bar unambiguously in front. Coming in front is associated with a distinctive change in color
appearance. The bar appears modally completed in front by the addition of illusory contours and all
its surface acquires a milky appearance (van Lier and Gerbino, Chapter 15, this volume).
There are two theoretically important points. First, the specific color appearance of transparent
surfaces cannot be explained by image properties only, given that the image remains the same dur-
ing occlusion/transparency reversals. Second, changes are consistent with scission: an invariant
stimulus-specified quantity splits into a layer component and a background component. Kanizsa
(1955) remarked that the measurement of such components is made difficult by opposite tenden-
cies in different observers: some focus their attention on the transparent layer in front, some on
surfaces seen through the layer.
As regards other effects (or at least, other couplings involving transparency) Kersten et  al.
(1992) provided a nice demonstration of the interplay between transparency and rotation in
depth. Gerbino (1975) found that shrinkage by amodal completion extends to rectangles partially
occluded by a layer of variable transparency, and its amount correlates with the perceived opacity
of the layer. Sigman and Rock (1974; Rock 1983, p. 171) demonstrated that an opaque occluder,
but not a transparent object, vetoes the perception of stroboscopic motion, according to the idea
that this type of apparent motion is mediated by perceptual intelligence. Moving from the obser-
vation that transparency can be perceived in low-contrast disk-surround displays (Masin and
Idone, 1981), Ekroll and Faul (2012a, 2012b, 2013) argued that the perception of transparency can
provide a unifying account of simultaneous color contrast phenomena.16

16 Musatti (1953) articulated a theory of simultaneous color contrast, based on scission of the proximal color, in which the ‘equalizing’ common component was primary.



Transparency and motion


There are at least two logical intersections between transparency and motion. First, some motion
configurations are perceptually segregated into different entities (typically, overlapping planes)
that involve the fundamental feature of phenomenal transparency; i.e., perception of one sur-
face through another. In this case photometric information is not critical. Second, transparency
in grey-level images can be instantiated or enhanced by motion of the TRI-edge relative to the
R-edge. The point of contact between the two research lines is represented by the effect of lumi-
nance constraints on motion segmentation in plaid patterns (Stoner et al. 1990; Trueswell and
Hayhoe 1993).

Motion transparency
In random dot kinematograms (RDK), grouping by common fate (Brooks, Chapter 4, this volume)
leads to the segmentation of textured overlapping surfaces. This phenomenon is usually called
motion transparency and has been intensively utilized to study motion mechanisms (Braddick
and Qian 2001; Curran et al. 2007; Durant et al. 2006; Meso and Zanker, 2009; van Doorn and
Koenderink 1982a, b), the maximum number of independent planes that the visual system can
effectively segregate (Edwards and Greenwood 2005; Gerbino and Bernetti 1984; Mulligan 1992),
depth ordering (Schütz 2011), global vs. local motion (Kanai et al. 2004), and directional biases
(Mamassian and Wallace 2010).
Transparency perceived in RDK is a by-product of grouping by motion and does not involve
layer decomposition with color changes. However, figure/ground stratification is correlated
with small but reliable effects on lightness and perceived contrast. As noted since Rubin
(1915/1921) and demonstrated by Wolff (1934; Gilchrist 2006) the figure appears more con-
trasted than the ground; and perceived contrast within the figure is higher than perceived
contrast within the ground (Kanizsa 1979). Since attention is normally directed towards the
figure, one should also consider that attention can enhance contrast, as postulated by James
(1890) and demonstrated in several studies (Barbot et al. 2012; Carrasco et al. 2000; Prinzmetal
et al. 2008; Treue 2004).

Kinetic transparency in grey-level patterns


The emergence of perceived transparency can be facilitated by relative motion, even in grey-level
patterns that otherwise would be perceived as mosaics. Masin (2006) used motion to support
transparency in 4-region patterns without X-junctions. The basic effect was observed by Wallach
(1935; English translation in Wuerger et al. 1996) in his pioneering analysis of the aperture
problem (Bruno and Bertamini, Chapter 24, this volume) and Musatti (1953; Kanizsa 1955).17
Transparency effects induced by motion and clearly involving color changes occur in kinetic neon
color spreading (Bressan and Vallortigara 1991; Bressan et al. 1997), in the so-called ‘flank trans-
parency’ (Wollschläger et al. 2001, 2002), and in various stereokinetic phenomena (Vezzani et al.,
Chapter 25, this volume; Zanforlin 2006; Zanforlin and Vallortigara 1990).

17 Musatti (1953, p. 555) attributed to Metzger the honor of first observing transparency in stereokinetic displays. Metzger mentioned the effect in the second edition of Gesetze des Sehens (1953) and discussed (1955) the paradoxical fact that stereokinesis can make a disk appear transparent and slide over another even when the color of the superposition region is physically implausible, as later reported by Hupé and Rubin (2000).

Conclusion
Principles of perceptual organization prove to be an important source of inspiration for the under-
standing of phenomenal transparency. Concern for the physical plausibility of transparency models
has sometimes obscured the fundamental fact that notions like scission and layer decomposition,
combined with grouping by surface color similarity and contour good continuation, satisfactorily
account for perception. Interested readers will find extensive treatments of other aspects of
phenomenal transparency in recent empirical and theoretical papers (Anderson, Chapter 22, this
volume; Faul and Ekroll 2011, 2012; Kingdom 2011; Kitaoka 2005; Koenderink et al. 2008, 2010;
Richards et al. 2009). Important evidence on the neural mechanisms related to the assignment of
border ownership in transparency patterns has been found by Qiu and von der Heydt (2007).

References
Adelson, E. H. (2000). ‘Lightness perception and lightness illusions’. In The New Cognitive Neurosciences,
edited by M. Gazzaniga, 2nd ed., pp. 339–51 (Cambridge, MA: MIT Press).
Albert, M. K. (2006). ‘Lightness and perceptual transparency’. Perception 35: 433–43.
Albert, M. K. (2008). ‘The role of contrast in the perception of achromatic transparency: Comment on
Singh and Anderson (2002) and Anderson (2003)’. Psychological Review 115: 1127–43.
Anderson, B. L. (1997). ‘A theory of illusory lightness and transparency in monocular and binocular
images: the role of contour junctions’. Perception 26: 419–53.
Anderson, B. L. (2003). ‘The role of occlusion in the perception of depth, lightness, and opacity’.
Psychological Review 110: 785–801.
Anderson, B. L. (2008). ‘Transparency and occlusion’. In The Senses: A Comprehensive Reference, edited by
A. I. Basbaum, A. Kaneko, G. M. Shepherd, and G. Westheimer, Vol. 2, Vision II, T. D. Albright and R.
H. Masland (Volume eds.), pp. 239–44 (San Diego: Academic Press).
Anderson, B. L. (2014). ‘The perceptual representation of transparency, lightness, and gloss’. In Oxford
Handbook of Perceptual Organization, edited by J. Wagemans, Chapter 22 (Oxford: Oxford University
Press).
Anderson, B. L. and Schmid, A. C. (2012). ‘The role of amodal surface completion in stereoscopic
transparency’. Frontiers in Psychology 3: 1–11.
Anderson, B. L. and Winawer, J. (2005). ‘Image segmentation and lightness perception’. Nature 434: 79–83.
Anderson, B. L., Singh, M., and Meng, J. (2006). ‘The perceived transmittance of inhomogeneous surfaces
and media’. Vision Research 46: 1982–95.
Anderson, B. L., Singh, M., and O’Vari, J. (2008a). ‘Natural psychological decompositions of perceived
transparency: Reply to Albert’. Psychological Review 115: 144–51.
Anderson, B. L., Singh, M., and O’Vari, J. (2008b). ‘Postscript: Qualifying and quantifying constraints on
transparency’. Psychological Review 115: 151–3.
Arnheim, R. (1974). Art and Visual Perception. [1954] (Berkeley: University of California Press).
Barbot, A., Landy, M. S., and Carrasco, M. (2012). ‘Differential effects of exogenous and endogenous
attention on second-order texture contrast sensitivity’. Journal of Vision 12: 1–15.
Beck, J. (1985). ‘Perception of transparency in man and machine’. Computer Vision, Graphics, and Image
Processing 31: 127–38.
Beck, J., Prazdny, K. and Ivry, R. (1984). ‘The perception of transparency with achromatic colors’.
Perception and Psychophysics 35: 407–22.
Bergström, S. S. (1977). ‘Common and relative components of reflected light as information about the
illumination, colour, and three-dimensional form of objects’. Scandinavian Journal of Psychology
18: 180–6.

Bergström, S. S. (1982). ‘Illumination, color, and three-dimensional form’. In Organization and
Representation in Perception, edited by J. Beck, pp. 365–78 (Hillsdale, NJ: Erlbaum).
Bergström, S. S. (1994). ‘Color constancy: Arguments for a vector model for the perception of illumination,
color, and depth’. In Lightness, Brightness, and Transparency, edited by A. L. Gilchrist, pp. 257–86
(Hillsdale, NJ: Erlbaum).
Bozzi, P. (1975). ‘Osservazioni su alcuni casi di trasparenza fenomenica realizzabili con figure a tratto’.
In Studies in Perception: Festschrift for Fabio Metelli, edited by G. B. Flores D’Arcais, pp. 177–97
(Firenze: Martello-Giunti).
Braddick, O. and Qian, N. (2001). ‘The organization of global motion and transparency’. In Motion
Vision: Computational, Neural, and Ecological Constraints, edited by J. M. Zanker and J. Zeil, pp. 85–112
(New York: Springer).
Bregman, A. S. (1996). ‘Perceptual interpretation and the neurobiology of perception’. In The Mind-Brain
Continuum: Sensory Processes, edited by R. Llinás and P. S. Churchland, pp. 203–17 (Cambridge,
MA: MIT Press).
Bregman, A. S. (2008). ‘Auditory scene analysis’. In The Senses: A Comprehensive Reference, edited by A.
I. Basbaum, A. Kaneko, G. M. Shepherd, and G. Westheimer, Vol. 3, Audition, P. Dallos and D. Oertel
(Volume eds.), pp. 861–70 (San Diego: Academic Press).
Bressan, P. and Vallortigara, G. (1991). ‘Illusory depth from moving subjective figures and neon color
spreading’. Perception 20: 637–44.
Bressan, P., Mingolla, E., Spillmann, L., and Watanabe, T. (1997). ‘Neon color spreading: a review’.
Perception 26: 1353–66.
Brooks, J. L. (2014). ‘Traditional and new principles of perceptual grouping’. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans, Chapter 4 (Oxford: Oxford University Press).
Bruno, N. and Bertamini, M. (2014). ‘Perceptual organization and the aperture problem’. In Oxford
Handbook of Perceptual Organization, edited by J. Wagemans, Chapter 24 (Oxford: Oxford University
Press).
Carrasco, M., Penpeci-Talgar, C., and Eckstein, M. (2000). ‘Spatial attention increases contrast sensitivity
across the CSF: Support for signal enhancement’. Vision Research 40: 1203–15.
Chuang, J., Weiskopf, D., and Möller, T. (2009). ‘Hue-preserving color blending’. IEEE Transactions on
Visualization and Computer Graphics 15: 1275–82.
Curran, W., Hibbard, P. B., and Johnston A. (2007). ‘The visual processing of motion-defined
transparency’. Proceedings of the Royal Society, Biological Sciences 274: 1049–57.
Da Pos, O. (1999). ‘The perception of transparency with chromatic colours’. In Research in Perception, edited
by M. Zanforlin and L. Tommasi, pp. 47–68 (Padova: Logos).
Delogu, F., Fedorov, G., Olivetti Belardinelli, M., and van Leeuwen, C. (2010). ‘Perceptual preferences in
depth stratification of transparent layers: Photometric and non-photometric factors’. Journal of Vision
10: 1–13.
Denham, S. L. and Winkler, I. (2014). ‘Auditory perceptual organization’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans, Chapter 29 (Oxford: Oxford University Press).
Driver, J., Davis, G., Russell, C., Turatto, M., and Freeman, E. (2001). ‘Segmentation, attention and
phenomenal visual objects’. Cognition 80: 61–95.
Durant, S., Donoso-Barrera, A., Tan, S., and Johnston, A. (2006). ‘Moving from spatially segregated to
transparent motion: a modelling approach’. Biology Letters 2: 101–5.
Edwards, M. and Greenwood, J. A. (2005). ‘The perception of motion transparency: A signal-to-noise
limit’. Vision Research 45: 1877–84.
Ekroll, V. and Faul, F. (2012a). ‘New laws of simultaneous contrast?’ Seeing and Perceiving 25: 107–41.
Ekroll, V. and Faul, F. (2012b). ‘Basic characteristics of simultaneous color contrast revisited’. Psychological
Science 23: 1246–55.

Ekroll, V. and Faul, F. (2013). ‘Transparency perception: the key to understanding simultaneous color
contrast’. Journal of the Optical Society of America A 30: 342–52.
Elder, J. H. (2014). ‘Bridging the dimensional gap: Perceptual organization of contour in two-dimensional
shape’. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans, Chapter 11 (Oxford:
Oxford University Press).
Epstein, W. (1982). ‘Percept-percept couplings’. Perception 11: 75–83. Reprinted in I. Rock (ed.) (1997).
Indirect Perception, pp. 17–29 (Cambridge, MA: MIT Press).
Faul, F., and Ekroll, V. (2011). ‘On the filter approach to perceptual transparency’. Journal of Vision 11: 1–33.
Faul, F. and Ekroll, V. (2012). ‘Transparent layer constancy’. Journal of Vision 12: 1–26.
Fazl, A., Grossberg, S., and Mingolla, E. (2008). ‘View-invariant object category learning, recognition,
and search: How spatial and object attention are coordinated using surface-based attentional shrouds’.
Cognitive Psychology 58: 1–48.
Felisberti, F. and Zanker, J. M. (2005). ‘Attention modulates perception of transparent motion’. Vision
Research 45: 2587–99.
Fuchs, W. (1923).’ Experimentelle Untersuchungen über das simultane Hintereinandersehen auf derselben
Sehrichtung’. Zeitschrift für Psychologie 91: 145–235.
Gerbino, W. (1975). ‘Perceptual transparency and phenomenal shrinkage of visual objects’. Italian Journal of
Psychology 2: 403–15.
Gerbino, W. (1988). ‘Models of achromatic transparency: A theoretical analysis’. Gestalt Theory 10: 5–20.
Gerbino, W. (1994). ‘Achromatic transparency’. In Lightness, Brightness, and Transparency, edited by A. L.
Gilchrist, pp. 215–55 (Hillsdale, NJ: Erlbaum).
Gerbino, W. and Bernetti, L. (1984). ‘One, two, many: textural segregation on the basis of motion’.
Perception 13: A38–A39.
Gerbino, W., Stultiens, C., Troost, J., and de Weert, C. (1990). ‘Transparent layer constancy’. Journal of
Experimental Psychology: Human Perception and Performance 16: 3–20.
Gibson, J. J. (1975). ‘Three kinds of distance that can be seen, or how Bishop Berkeley went wrong’.
In Studies in Perception: Festschrift for Fabio Metelli, edited by G. B. Flores D’Arcais, pp. 83–7
(Firenze: Martello-Giunti).
Gibson, J. J. (1979). The Ecological Approach to Visual Perception (Boston: Houghton Mifflin).
Gilchrist, A. L. (2005). ‘Lightness perception: Seeing one color through another’. Current Biology 15,
9: 330–2.
Gilchrist, A. L. (2006). Seeing Black and White (New York: Oxford University Press).
Hatfield, G. (2011). ‘Transparency of mind: The contributions of Descartes, Leibniz, and Berkeley to
the genesis of the modern subject’. In Departure for Modern Europe: A Handbook of Early Modern
Philosophy (1400–1700), edited by H. Busche, pp. 361–75 (Hamburg: Felix Meiner Verlag).
Helmholtz, H. von (1867). Handbuch der physiologischen Optik (Leipzig: Voss). English translation by
J. P. C. Southall (ed.) of the third [1910] German edition (1924). Treatise on Physiological Optics.
(New York: Dover). Available at <http://poseidon.sunyopt.edu/BackusLab/Helmholtz/>
Hering, E. (1879). ‘Der Raumsinn und die Bewegungen des Auges’. In Handbuch der Physiologie der
Sinnesorgane, edited by L. Hermann, 3(1), S343-601 (Leipzig: Vogel).
Hiris, E. (2001). ‘Limits on the perception of transparency from motion’. Journal of Vision 1: 377a.
Hochberg, J. (1974). ‘Higher-order stimuli and inter-response coupling in the perception of the visual
world’. In Perception: Essays in Honor of James J. Gibson, edited by R. B. McLeod and H. L. Pick, Jr., pp.
17–39 (Ithaca, NY: Cornell University Press).
Hupé, J.-M., and Rubin, N. (2000). ‘Perceived motion transparency can override luminance / color cues
which are inconsistent with transparency’. Investigative Ophthalmology and Visual Science Supplement
41: 721.
James, W. (1890). The Principles of Psychology (New York: Holt).

Kanai, R., Paffen, C. L., Gerbino, W., and Verstraten, F. A. (2004). ‘Blindness to inconsistent local signals
in motion transparency from oscillating dots’. Vision Research 44: 2207–12.
Kanizsa, G. (1955). ‘Condizioni ed effetti della trasparenza fenomenica’. Rivista di Psicologia 49: 3–19.
Kanizsa, G. (1979). Organization in Vision (New York: Praeger).
Katz, D. (1925). Der Aufbau der Tastwelt (Leipzig: Barth). English translation by L. E. Krueger (ed.) (1989).
The World of Touch (Hillsdale, NJ: Erlbaum).
Kepes, G. (1944). Language of Vision (Chicago: Paul Theobald). Reissued 1995 (New York: Dover
Publications).
Kersten, D., Bülthoff, H. H., Schwartz, B., and Kurtz, K. (1992). ‘Interaction between transparency and
structure from motion’. Neural Computation 4: 573–89.
Kingdom, F. A. A. (2011). ‘Lightness, brightness and transparency: A quarter century of new ideas,
captivating demonstrations and unrelenting controversy’. Vision Research 51: 652–73.
Kitaoka, A. (2005). ‘A new explanation of perceptual transparency connecting the X-junction
contrast-polarity model with the luminance-based arithmetic model’. Japanese Psychological Research
47: 175–87.
Klee, P. (1961). The Thinking Eye, edited by J. Spiller (London: Lund Humphries).
Koenderink, J., van Doorn, A., Pont, S., and Richards, W. (2008). ‘Gestalt and phenomenal transparency’.
Journal of the Optical Society of America A 25: 190–202.
Koenderink, J., van Doorn, A., Pont, S., and Wijntjes, M. (2010). ‘Phenomenal transparency at
X-junctions’. Perception 39: 872–83.
Koffka, K. (1935). Principles of Gestalt Psychology (New York: Harcourt Brace).
Köhler, W. (1929). Gestalt Psychology (New York: Liveright).
Kramer, P. and Bressan, P. (2009). ‘Clear waters, murky waters: why transparency perception is good for
you and underconstrained’. Perception 38: 871–2, discussion 877.
Kramer, P. and Bressan, P. (2010). ‘Ignoring color in transparency perception’. Rivista di Estetica 43: 147–59.
Krueger, L. E. (1982). ‘Tactual perception in historical perspective: David Katz’s world of touch’. In
Tactual Perception: A Sourcebook, edited by W. Schiff and E. Foulke, pp. 1–54 (Cambridge: Cambridge
University Press).
Land, E. H. and McCann, J. J. (1971). ‘Lightness and retinex theory’. Journal of the Optical Society of
America 61: 1–11.
Leeuwenberg, E. L. J. (1976). ‘Figure-ground specification in terms of structural information’. In Advances
in Psychophysics, edited by H. G. Geissler and Y. M. Zabrodin, pp. 325–37 (Berlin: Deutscher Verlag der
Wissenschaften).
Leeuwenberg, E. L. J. (1982). ‘The perception of assimilation and brightness contrast’. Perception and
Psychophysics 32: 345–52.
Leeuwenberg, E. L. J. and van der Helm, P. A. (2013). Structural Information Theory: The Simplicity of
Visual Form (Cambridge: Cambridge University Press).
Leyton, M. (1992). Symmetry, Causality, Mind (Cambridge, MA: MIT Press, Bradford Books).
Libben, G. (1998). ‘Semantic transparency in the processing of compounds: Consequences for
representation, processing, and impairment’. Brain and Language 61: 30–44.
Mamassian, P. and Wallace, J. M. (2010). ‘Sustained directional biases in motion transparency’. Journal of
Vision 10: 1–12.
Mamassian, P., Knill, D. C., and Kersten, D. (1998). ‘The perception of cast shadows’. Trends in Cognitive
Sciences 2: 288–95.
Marr, D. (1982). Vision (San Francisco, CA: Freeman).
Masin, S. C. (1984). ‘An experimental comparison of three- versus four-surface phenomenal transparency’.
Perception and Psychophysics 35: 325–32.
Achromatic Transparency 433

Masin, S. C. (2006). ‘Test of models of achromatic transparency’. Perception 35: 1611–24.
Masin, S. C. and Idone, A. M. (1981). ‘Studio sperimentale sulla percezione della trasparenza con figura e
sfondo acromatici e omogenei’. Giornale Italiano di Psicologia 8: 265–77.
Meso, A. I. and Zanker, J. M. (2009). ‘Perceiving motion transparency in the absence of component
direction differences’. Vision Research 49: 2187–200.
Metelli, F. (1970). ‘An algebraic development of the theory of perceptual transparency’. Ergonomics
13: 59–66.
Metelli, F. (1974). ‘The perception of transparency’. Scientific American 230: 90–8.
Metelli, F. (1975). ‘On the visual perception of transparency’. In Studies in Perception: Festschrift for Fabio
Metelli, edited by G. B. Flores D’Arcais, pp. 445–87 (Firenze: Martello-Giunti).
Metelli, F. (1985a). ‘Stimulation and perception of transparency’. Psychological Research 47: 185–202.
Metelli, F. (1985b). ‘Su alcune condizioni spazio-figurali della trasparenza’. In Conoscenza e Struttura, edited
by W. Gerbino, pp. 303–31. (Bologna: Il Mulino).
Metzger, W. (1936). Gesetze des Sehens. (Frankfurt: Kramer). English translation by L. Spillmann, S. Lehar,
M. Stromeyer, and M. Wertheimer (2006). The Laws of Seeing (Cambridge, MA: MIT Press).
Metzger, W. (1953). Gesetze des Sehens, 2nd edition (Frankfurt: Kramer).
Metzger, W. (1955). ‘Über Durchsichtigkeits-Erscheinungen (Vorläufige Mitteilung)’. Rivista di Psicologia
49: 187–9.
Moore-Heider, G. (1933). ‘New studies in transparency, form, and colour’. Psychologische Forschung
17: 13–55.
Mulligan, J. B. (1992). ‘Motion transparency is restricted to two planes’. Investigative Ophthalmology and
Visual Science Supplement 33: 1049.
Musatti, C. L. (1953). ‘Ricerche sperimentali sopra la percezione cromatica’. Archivio di Psicologia,
Neurologia e Psichiatria 14: 542–77.
Nakayama, K., Shimojo, S., and Silverman, G. H. (1989). ‘Stereoscopic depth: its relation to image
segmentation, grouping and recognition of partially occluded objects’. Perception 18: 55–68.
Nakayama, K., Shimojo, S., and Ramachandran, V. S. (1990). ‘Transparency: relation to depth, subjective
contours, luminance, and neon color spreading’. Perception 19: 497–513.
Prazdny, K. (1986). ‘Some new phenomena in the perception of Glass patterns’. Biological Cybernetics
53: 153–8.
Prinzmetal, W., Long, V., and Leonhardt, J. (2008). ‘Involuntary attention and brightness contrast’.
Perception and Psychophysics 70: 1139–50.
Qiu, F. T. and von der Heydt, R. (2007). ‘Neural representation of transparent overlay’. Nature Neuroscience
10: 283–4.
Remondino, C. (1975). ‘Achromatic color conditions in the perception of transparency: The development of
an analytical model’. In Studies in Perception. Festschrift for Fabio Metelli, edited by G. B. Flores d’Arcais,
pp. 111–38 (Firenze: Martello-Giunti).
Richards, W., Koenderink, J. J., and van Doorn, A. (2009). ‘Transparency and imaginary colors’. Journal of
the Optical Society of America A 26: 1119–28.
Ripamonti, C. and Gerbino, W. (2001). ‘Classical and inverted White’s effect’. Perception 30: 467–88.
Rock, I. (1983). The Logic of Perception (Cambridge, MA: MIT Press).
Rock, I. and Gutman, D. (1981). ‘The effect of inattention on form perception’. Journal of Experimental
Psychology: Human Perception and Performance 7: 275–85.
Rock, I., Nijhawan, R., Palmer, S., and Tudor, L. (1992). ‘Grouping based on phenomenal similarity of
achromatic color’, Perception 21: 779–89.
Roncato, S. (2012). ‘Brightness alteration with interweaving contours’. i-Perception 3: 786–803.
Rosenthal, D. (1993). ‘A transparent world: the notebooks of Paul Klee’. The New Criterion 11: 33–8.
Rowe, C. and Slutzky, R. (1963). ‘Transparency: literal and phenomenal’. Perspecta 8: 45–54.
Rubin, E. (1915). Synsoplevede Figurer (Copenhagen: Gyldendal). German translation (1921). Visuell
wahrgenommene Figuren (Berlin: Gyldendal).
Savardi, U. and Bianchi, I. (2012). ‘Coupling Epstein’s and Bozzi’s “Percept-Percept Coupling” ’. Gestalt
Theory 34: 191–200.
Scholl, B. J. (2001). ‘Objects and attention: the state of the art’. Cognition 80: 1–46.
Schütz, A. C. (2011). ‘Motion transparency: Depth ordering and smooth pursuit eye movements’. Journal of
Vision, 11(14): 21, 1–19.
Sigman, E. and Rock, I. (1974). ‘Stroboscopic movement based on perceptual intelligence’. Perception
3: 9–28.
Singh, M. (2014). ‘Visual representation of contour and shape’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans, Chapter 12 (Oxford: Oxford University Press).
Singh, M. and Anderson, B. L. (2002). ‘Toward a perceptual theory of transparency’. Psychological Review
109: 492–519.
Singh, M. and Anderson, B. L. (2006) ‘Photometric determinants of perceived transparency’. Vision
Research 46: 879–94.
Singh, M. and Hoffman, D. D. (1998). ‘Part boundaries alter the perception of transparency’. Psychological
Science 9: 370–8.
Sonneveld, M. H. and Schifferstein, H. H. J. (2008). ‘The tactual experience of objects’. In Product
Experience, edited by H. H. J. Schifferstein and P. Hekkert (Amsterdam: Elsevier).
Stone, M. and Bartram, L. (2008). ‘Alpha, contrast and the perception of visual metadata’. Proceedings of the
16th IS&T/SID Color Imaging Conference, 355–59.
Stoner, G. R., Albright, T. D., and Ramachandran, V. S. (1990). ‘Transparency and coherence in human
motion perception’. Nature 344: 153–5.
Treue, S. (2004). ‘Perceptual enhancement of contrast by attention’. Trends in Cognitive Sciences 8: 435–7.
Trueswell, J. C. and Hayhoe, M. M. (1993). ‘Surface segmentation mechanisms and motion perception’.
Vision Research 33: 313–28.
Tudor-Hart, B. (1928). ‘Studies in transparency, form, and color’. Psychologische Forschung 10: 255–98.
Tyler, C. W. and Kontsevich, L. L. (1995). ‘Mechanisms of stereoscopic processing: stereoattention and
surface perception in depth reconstruction’. Perception 24: 127–53.
van Doorn, A. J. and Koenderink, J. J. (1982a). ‘Temporal properties of the visual detectability of moving
spatial white noise’. Experimental Brain Research 45: 179–88.
van Doorn, A. J. and Koenderink, J. J. (1982b). ‘Spatial properties of the visual detectability of moving
spatial white noise’. Experimental Brain Research 45: 189–95.
van der Helm, P. A. (2014). ‘Simplicity in perceptual organization’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans, Chapter 50 (Oxford: Oxford University Press).
van Lier, R. J. and Gerbino, W. (2014). ‘Perceptual completions’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans, Chapter 15 (Oxford: Oxford University Press).
Vezzani, S., Kramer, P., and Bressan, P. (2014). ‘Stereokinetic effect, kinetic depth effect, and structure from
motion’. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans, Chapter 25 (Oxford:
Oxford University Press).
Wagemans, J. (2014). ‘Historical and conceptual background: Gestalt theory’. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans, Chapter 1 (Oxford: Oxford University Press).
Wallach, H. (1935). ‘Über visuell wahrgenommene Bewegungsrichtung’. Psychologische Forschung 20: 325–
80. English translation in S. Wuerger, R. Shapley, and N. Rubin (1996). On the visually perceived
direction of motion by Hans Wallach: 60 years later. Perception 25: 1319–68.
Weinshall, D. (1991). ‘Seeing “ghost” planes in stereo vision’. Vision Research 31: 1731–48.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt, II’. Psychologische Forschung 4: 301–
50. English translation in L. Spillmann (ed.) (2012). On Perceived Motion and Figural Organization
(Cambridge, MA: MIT Press).
Wolff, W. (1934). ‘Induzierte Helligkeitsveränderung’. Psychologische Forschung 20: 159–94.
Wollschläger, D., Rodriguez, A. M., and Hoffman, D. D. (2001). ‘Flank transparency: transparent filters
seen in dynamic two-color displays’. Perception 30: 1423–6.
Wollschläger, D., Rodriguez, A. M., and Hoffman, D. D. (2002). ‘Flank transparency: The effects of gaps,
line spacing, and apparent motion’. Perception 31: 1073–92.
Wuerger, S., Shapley, R., and Rubin, N. (1996). ‘On the visually perceived direction of motion by Hans
Wallach: 60 years later’. Perception 25: 1317–68.
Zanforlin, M. (2006). ‘Illusory space and paradoxical transparency in stereokinetic objects’.
In Visual Thought: The Depictive Space of Perception, edited by L. Albertazzi, pp. 99–104.
(Amsterdam: Benjamins).
Zanforlin, M. and Vallortigara, G. (1990). ‘The magic wand: a new stereokinetic anomalous surface’.
Perception 19: 447–57.
Chapter 21

Perceptual organization of color


Hannah E. Smithson

Trichromacy and Human Color Perception


Overview
Human perception of color starts with the comparison of signals from three classes of cone photo-
receptor, with peak sensitivities in the long-, middle- and short-wavelength regions of the vis-
ible spectrum. Colorimetry—the measurement and specification of color—allows prediction of
metameric matches in which two lights with different spectral energy distributions are indis-
criminable, at least under well-controlled viewing conditions, because they offer the same triplet
of cone signals. The success of these predictions, however, belies the difficulties of predicting color
appearance. In this chapter we discuss the perceptual space in which color resides. We start by
considering the perceptual organization of color in terms of the structure of color spaces designed
to represent relationships between colors. We then consider the dependence of perceived color
on the spatial and temporal context in which colors are seen, and on the perception of lights and
surfaces.
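The logic of metameric matching can be made concrete with a little linear algebra: two physically different spectra that differ only by a component in the null space of the matrix of cone sensitivities must deliver the same cone triplet. The sketch below is a toy model—the Gaussian ‘cone fundamentals’ and the spectra are invented for illustration, not measured data.

```python
import numpy as np

# Illustrative sketch of metamerism, not measured data: three Gaussian
# "cone fundamentals" stand in for the L-, M-, and S-cone sensitivities.
wavelengths = np.arange(400, 701, 5)  # nm

def gaussian(peak, width=40.0):
    return np.exp(-0.5 * ((wavelengths - peak) / width) ** 2)

cones = np.stack([gaussian(565), gaussian(535), gaussian(440)])  # L, M, S

def cone_signals(spectrum):
    # Each cone signal is the inner product of the spectral energy
    # distribution with that cone class's spectral sensitivity.
    return cones @ spectrum

# Adding a component from the null space of the 3xN matrix of cone
# sensitivities changes the physical light but not the cone triplet.
s1 = 1.0 + 0.5 * np.sin(wavelengths / 50.0)   # an arbitrary spectrum
null_basis = np.linalg.svd(cones)[2][3:]       # rows spanning the null space
s2 = s1 + 0.3 * null_basis[0]                  # a metameric partner

assert not np.allclose(s1, s2)                           # different lights
assert np.allclose(cone_signals(s1), cone_signals(s2))   # same triplet
```

Lights constructed this way are indiscriminable in principle, whatever downstream stages do with the three signals.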

Background
Trichromacy suggests a three-dimensional space for the organization of color. In his Bakerian
Lecture to the Royal Society, Thomas Young (1802) made the explicit connection between the
three-dimensionality of human color vision—that any spectral light can be matched by a combin-
ation of just three independent lights—and the existence of three types of physiological receptor,
distinguished by the wavelengths of light to which they respond most vigorously. At the start of
the eighteenth century, trichromacy had been exploited extensively for the practical purpose of
color reproduction for which only three primaries are needed; and indeed, by the late eighteenth
century, George Palmer (1777) and John Elliot (1780) had also made explicit early statements of
biological trichromacy (see Mollon 2003 for review).
In a remarkable short treatise from the thirteenth century, Robert Grosseteste sets out a three-
dimensional space of color in which three bipolar qualities—specifically the Latin pairings multa–
pauca, clara–obscura, and purum–impurum—are used in combination to account for all possible
colors (Dinkova-Bruun et al. 2013). The qualities multa–pauca and clara–obscura are considered
as properties of the light, and purum–impurum is considered as a property of the ‘diaphanous
medium’ in which light is incorporated. According to Grosseteste, whiteness is associated with
multa–clara–purum; and blackness with pauca–obscura–impurum. But Grosseteste moves away
from the Aristotelian one-dimensional scale of seven colors between white and black, instead
defining seven colors close to whiteness that are generated by diminishing the three bipolar quali-
ties one at a time (to give three different colors), or two at once (to give a further three), or all three
at once (to give the seventh). A further seven colors are produced by increasing the qualities from
Perceptual Organization of Color 437

blackness. By allowing infinite degrees of intensification and diminution of the bipolar qualities, he
describes a continuous three-dimensional space of color (Smithson et al. 2012).
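The combinatorial bookkeeping behind the two sets of seven colors is just the count of non-empty subsets of three qualities, 2³ − 1 = 7, as a short enumeration confirms (the Latin labels follow the treatise; the subset encoding is purely illustrative):

```python
from itertools import combinations

# The Latin bipolar pairs named in the treatise; encoding Grosseteste's
# scheme as subsets is an illustration, not his notation.
qualities = ['multa-pauca', 'clara-obscura', 'purum-impurum']

# Diminishing the qualities one at a time, two at once, or all three
# gives every non-empty subset: C(3,1) + C(3,2) + C(3,3) = 7 colors.
near_white = [set(c) for r in (1, 2, 3) for c in combinations(qualities, r)]
assert len(near_white) == 7
# Intensifying instead of diminishing yields the seven colors near blackness.
```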
Without wanting to over-interpret this particular text, it is worth noting several important
points that it raises about the perceptual organization of color. First, for Grosseteste, the
perceptual experience of color resides in a three-dimensional space, which can be conveniently
navigated via a combinatorial system. Second, the space of colors is continuous, but some
directions in this space have a special status, for they identify discrete categories of color.
Third, the interaction of light and materials is fundamental to our experience of color—an
observation reiterated throughout the treatise and summarized in the opening statement, ‘Colour
is light embodied in a diaphanous medium.’ These three themes, albeit recast rather differently
from the thirteenth-century account, form the basis of the present chapter.

The Dimensionality of the Perceptual Experience of Color


Lights in a Void
Trichromatic color space describes the signals that are available to downstream stages of the vis-
ual system; it in no way describes the sensations that those signals evoke. Multidimensional scal-
ing methods have been applied to similarity judgments of pairs of color samples in an attempt to
extract the fundamental dimensions that best capture these relationships (Indow and Kanazawa
1960; Indow and Uchizono 1960). Such analyses have suggested that the perceptual qualities of an
isolated light, seen as if through an aperture and unrelated to other lights, are usefully described
in terms of the dimensions of hue, brightness, and saturation (although note that, as described by
Wyszecki and Stiles (1982), the technically correct terms are hue, lightness, and chroma). Using
these qualities to navigate the perceptual space of color requires a test of whether these qualities are
truly independent perceptual dimensions. It is clear that the physical variables that correlate strongly
with one perceptual quality do not modify that quality independently of other perceptual qualities.
Two striking examples are the Bezold-Brücke effect, in which a change in intensity is accompanied
by a shift in hue (see Boynton and Gordon 1965 for review), and the Abney effect, in which lines of
constant hue are curved when plotted in a color space that would show a change in spectral purity
(the physical quality that correlates strongly with saturation) as a straight line from white to a point
on the spectral locus (Burns et al. 1984). Burns and Shepp (1988) have provided an explicit test of
the independence of subjective dimensions of color, asking whether the organizing principles of one
particular set of experiences are independent of experiences along a second subjective dimension.
They used dissimilarity judgments and both spontaneous- and instructed-classification tasks. Like
other researchers before them (Garner 1974; Shepard 1964), they argue that color experiences are
generally integral or unitary—processed as homogeneous wholes—rather than analysable or separ-
able (Townsend and Wenger, this volume)—processed according to their component dimensions
of hue, brightness, and saturation. A subset of participants with considerable skill and training was
able to identify shared levels of value or of chroma in the presence of variation in hue, but could not
identify shared levels of hue in the context of variation in the other two dimensions.
Multidimensional scaling is not a good method by which to test the underlying geometry
of color space (Indow 1980), for the analysis itself rests on evaluation of distance according to
some chosen metric (e.g. Euclidean or city-block distance). Wuerger, Maloney, and Krauskopf
(1995) explicitly tested whether human judgments on three different color-proximity tasks were
consistent with a Euclidean geometry on a trichromatic color-matching space. They tested for
additivity of angles and for increased variability of judgments with increased color-separation
438 Smithson

between test and comparison stimuli. All three color-proximity tasks failed these tests, suggesting
that observers do not employ a Euclidean distance measure when judging the similarity of colored
lights. The growth of the variability of judgments was consistent with the assumption that observ-
ers use a city-block metric.
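The difference between the candidate metrics is easy to state numerically. In the toy example below (arbitrary coordinates, purely illustrative, not the stimuli of Wuerger et al.), a comparison stimulus that is strictly closer to the reference under a Euclidean measure is exactly equidistant under a city-block measure, so the two metrics can order similarity judgments differently.

```python
import numpy as np

# Toy coordinates in a three-dimensional color space: the two metrics
# can disagree about which of two stimuli is closer to a reference.
a = np.array([0.0, 0.0, 0.0])   # reference
b = np.array([1.0, 1.0, 1.0])   # differs a little on every dimension
c = np.array([3.0, 0.0, 0.0])   # differs a lot on one dimension

def euclidean(p, q):
    return float(np.sqrt(np.sum((p - q) ** 2)))

def city_block(p, q):
    return float(np.sum(np.abs(p - q)))

assert euclidean(a, b) < euclidean(a, c)     # sqrt(3) < 3: b is closer
assert city_block(a, b) == city_block(a, c)  # 3 == 3: equidistant
```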

Lights in Context
Metamerism—in which two lights with different spectral energy distributions are indiscriminable
because they offer the same triplet of cone signals—implies that the three-dimensional space of
cone signals is exhaustive in describing the gamut of color experience. This is true under certain
limited conditions of observation, for example when a small patch of light is seen in isolation
against a black surround, as if through an aperture. However, if we consider regions of extended
spatial extent, descriptions of color perception become more complex.
For extended spatial regions that are nonhomogeneous in chromaticity and luminance, the dom-
inant mode of perception is that of illuminated surfaces. The spectral composition of light reach-
ing the eye from a point in a scene of illuminated surfaces is a function of the spectrally selective
reflectances of the surfaces, and the spectral composition of the illumination. The extent to which
observers compensate for changes in the illumination to extract a stable representation of the color
properties of a surface is known as color constancy, and will be discussed later (see ‘Objects and
Illumination’). The tendency for human observers to exhibit at least partial color constancy means
that color perception of objects, and of the materials from which they are made, is categorically dif-
ferent from the perception of isolated lights, or of surfaces viewed through an aperture.
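The structure of the constancy problem can be sketched in a few lines: the light reaching the eye from a surface is the wavelength-by-wavelength product of illuminant and reflectance, so the cone triplet elicited by a fixed surface shifts when the illuminant changes. All spectra and sensitivities below are invented for illustration, not measured data.

```python
import numpy as np

# Toy spectra: a fixed surface reflectance viewed under two illuminants.
wavelengths = np.arange(400, 701, 10)
reflectance = 0.2 + 0.6 * (wavelengths > 550)  # long-wave-reflecting surface
flat_light = np.ones(len(wavelengths))         # spectrally flat illuminant
warm_light = wavelengths / 700.0               # long-wave-biased illuminant

def gaussian(peak, width=40.0):
    return np.exp(-0.5 * ((wavelengths - peak) / width) ** 2)

cones = np.stack([gaussian(565), gaussian(535), gaussian(440)])  # L, M, S

# Light at the eye = illuminant x reflectance, pointwise in wavelength.
triplet_flat = cones @ (flat_light * reflectance)
triplet_warm = cones @ (warm_light * reflectance)

# Same surface, different cone triplets: the retinal signal confounds
# surface and illuminant, which is what constancy must untangle.
assert not np.allclose(triplet_flat, triplet_warm)
```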
Furthermore, object-colors have additional qualitative dimensions: for example they can appear
glossy or matte; rough or smooth; cloudy or transparent. These qualities are associated with par-
ticular signatures of chromaticity and luminance variation across space. Katz (1911) dedicates the
first chapter of his book on color to classifying modes of appearance of color and the phenomen-
ology of illumination. He draws distinctions between ‘film colors and surface colors’; ‘transparent
film, surface and volume colors’; ‘mirrored color and lustre’ and ‘luminosity and glow’. These terms
all refer to how colors appear in space. Katz’s examples frequently refer to material dimensions
of color, such as metallic lustre or the lustre of silk or of graphite, yet he is careful to distinguish
between the phenomena and the conditions of their production. One hundred years on, the cor-
respondences between the physical and perceptual variables associated with these higher qualities
remain relatively poorly understood (for reviews see Adelson 2001; Anderson 2011; Anderson,
this volume). With advances in computer graphics, it has become possible to generate physic-
ally accurate renders of materials and their interaction with the light that illuminates them, thus
allowing carefully controlled experiments on perception of object-colors. It is clear that percep-
tual qualities associated with color variation across space provide systematic information about
the stuff from which objects are made (Fleming, Wiebel, and Gegenfurtner 2013). It is also clear
that these judgments are often based on a range of simple but imperfect image measurements
that correlate with material properties, rather than physically ‘correct’ inverse-optics computa-
tions (see section, ‘Perceptual correlates of material properties’).

When Human Color Perception is Not Trichromatic


With signals from three univariant photoreceptor mechanisms, metamerism is a strict limit that
downstream visual stages can do nothing to overcome. Adaptation, for example, may change the
appearance of colored lights, but cannot render metamers distinct (Rushton 1972). However, if
the effective spectral sensitivity of the underlying mechanisms is changed, Grassmann’s (1853) laws
of proportionality and additivity of metameric matches can fail (see Koenderink 2010 for review).
These subtleties in colorimetry impose important constraints on the perceptual organization of
color across the visual field, and across the lifetime. The extent to which color appearance is main-
tained despite such changes suggests the operation of sophisticated recalibration or constancy
mechanisms (Webster et al. 2010; Werner and Schefrin 1993), discussed in more detail below (see
‘Organization imposed by environmental factors’).
Individuals who are missing one of the three classes of cone are described as having dichro-
matic color vision. A subset of the dichromat’s color matches will fail to match for the normal
trichromat, but all of the normal trichromat’s matches will be acceptable to the dichromat. In this
way, dichromacy is a reduction, rather than an alteration, of trichromatic color vision. However,
individuals who are described as anomalous trichromats, by virtue of possessing a cone class
with spectral sensitivity shifted from that of the normal trichromat, will require different ratios
of matching lights in a color matching experiment. There will therefore be pairs of lights with
different spectral power distributions that are metamers for the normal trichromat but that are
discriminable to the anomalous trichromat. Deuteranomalous individuals—about 6 per cent of
men—rely on signals from S-cones and two forms of long-wavelength cone (L′ and L). The spectral
sensitivities of the L′- and L-cones are similar, but sufficiently different that comparison of
their signals yields a useful chromatic signal. By designing a set of stimuli that were separated
along this deuteranomalous dimension (but intermingled along the standard L versus M oppo-
nent dimension) Bosten et al. (2005) obtained multidimensional scaling data that revealed a color
dimension unique to these so-called ‘color deficient’ observers.
A female carrier of anomalous trichromacy has the potential to exhibit tetrachromatic vision,
since she expresses in her retina four cone classes that differ in their spectral selectivity—the
standard S, M, and L cones, plus cones expressing the anomalous M′ or L′ pigment. However,
merely expressing four classes of cone photoreceptors does not imply that the signals from these
photoreceptors can be neurally compared to support tetrachromatic perception. From a targeted
search for tetrachromatic women, in which seventeen obligate carriers of deuteranomaly and
seven obligate carriers of protanomaly were tested, Jordan et al. (2010) found only one participant
who could make reliable discriminations along the fourth dimension of color space—the color
dimension she shares with her deuteranomalous son.

The Special Status of Some Colors: Cardinal Axes and Unique Hues

Opponent Color Processing
Most observers agree that some hues—red, green, yellow, and blue—appear phenomenologically unmixed,
and as such cannot be broken down into component hues (although see Saunders and van Brakel 1997
for critical discussion of the existence of unique hues). These so-called unique hues have been adopted
in opponent-process theory (Hurvich and Jameson 1957) as the end-points of two color channels, one
encoding the opposed directions of redness and greenness and the other encoding the opposed directions
of yellowness and blueness. While cone opponency—broadly defined as drawing inputs of opposed sign
from different cone classes—is a prerequisite for the extraction of a signal that disentangles changes in
wavelength from changes in radiance, the psychophysical evidence for just two chromatically opponent
mechanisms is subtle, and the color-tuning of these mechanisms does not align with the unique hues.
After viewing a colored light, the appearance of a broadband light that previously appeared
achromatic is shifted towards the color associated with the complement of the adapting light.
The ‘opposite’ nature of these colored after-effects does not require that the sensitivity adjustment
occurs at an opponent site. Since complementary colored after-effects can be obtained with any
colored adapting light, they are consistent either with a reduction in sensitivity of the three cone
classes by an amount that depends on the extent to which each class was stimulated by the adapt-
ing light, or with a rebound response at an opponent post-receptoral site.
With intense adapting lights, the resulting sensitivity adjustments show independence between
cone classes (Williams and MacLeod 1979), but at these levels the photochemical process of
bleaching within the cones dominates over neural adjustments. Below bleaching levels colored
after-effects may still be obtained, and independent adjustments of neural gain within cone
classes—as suggested by von Kries (1878)—are likely to contribute to color appearance. To a first
approximation, Weber’s law holds independently for the three cone classes, but two significant
failures—transient tritanopia (Mollon and Polden 1975; Stiles 1949) and combinative euchro-
matopsia (Polden and Mollon 1980)—provide evidence for sensitivity adjustments at a post-
receptoral opponent site.
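The independent gain adjustments suggested by von Kries amount to a diagonal transform of the cone triplet. A minimal sketch, with numbers invented for illustration rather than fitted to any dataset:

```python
import numpy as np

# Von Kries adaptation as a diagonal transform: each cone class's gain
# is set independently, here as the reciprocal of that class's response
# to the prevailing (adapting) light. All numbers are illustrative.
adapt_lms = np.array([8.0, 6.0, 2.0])  # cone triplet of the adapting light
test_lms = np.array([4.0, 3.0, 1.0])   # cone triplet of a test light

gains = 1.0 / adapt_lms                # one gain per cone class
adapted = gains * test_lms             # diagonal (von Kries) scaling

# A test light with the same cone ratios as the adapting light maps to
# an equal-response (achromatic-looking) triplet.
assert np.allclose(adapted, 0.5)
```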
Slow temporal modulations of colored lights—from achromatic to saturated and back to achro-
matic—produce time-varying sensations. If the modulated region forms a figure against an achro-
matic surround, the figure merges with the background before figure and ground are objectively
equal, and a figure with the complementary color is apparent when there is no physical difference
between the figure and ground. The temporal signature of these after-effects, measured psycho-
physically, matches the time-varying response and rebound-response of retinal ganglion cells,
suggesting that the afterimage signals are generated in the retina, though they may subsequently
be modified by cortical processing (Zaidi et al. 2012).

The Physiology of Early Post-Receptoral Processing


Looking to the physiology gives some help with understanding the post-receptoral organiza-
tion of color. Early in the visual pathway, retinal ganglion cells compare and combine cone sig-
nals. The so-called midget ganglion cells are silent to lights that modulate only the signal in the
S-cones, but they exhibit strong responses to lights that change the ratio of L- to M-cone signals
whilst holding their sum constant. The small-bistratified ganglion cells show the opposite pat-
tern: they respond strongly to S-cone isolating stimuli but not to exchanges of L- and M-cone
excitations (Dacey and Lee 1994). Chromatic tuning in the lateral geniculate nucleus (LGN)
duplicates this pattern of comparisons, such that the null planes of chromatic responses of LGN
neurons cluster along the constant-S and constant-(L and M) directions (Derrington, Krauskopf,
and Lennie 1984). These results suggest that there is a physiological basis for some directions
in color space having a special status. However, the appearance of the lights that correspond to
these directions in color space does not correspond to the phenomenologically unique hues.
Starting from white, an increase (or decrease) in the S-cone signal corresponds to moving in a
violet (or lime-green) direction, whilst exchanging L- and M-signals moves along an axis that
varies between cherry red (high L, low M) and teal (high M, low L). The relative independence
of the effects of adaptation to modulations along the constant-S or constant-(L and M) axes
on detection thresholds has been used to define these axes as the cardinal axes of color space
(Krauskopf, Williams, and Heeley 1982).
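The recombination of cone signals into these two channel types can be written out schematically. The weights below are illustrative, not the calibrated scaling of the physiological or psychophysical literature; the point is only that one channel is silent to S-cone-isolating modulations and the other to exchanges of L- and M-cone signals at constant sum.

```python
import numpy as np

# Schematic recombination of a cone triplet (L, M, S) into two
# chromatic channels: one comparing L with M, one comparing S with
# the sum of L and M. Unit weights are an illustrative simplification.
def opponent_channels(lms):
    L, M, S = lms
    return np.array([L - M,          # modulated along the constant-S axis
                     S - (L + M)])   # modulated along the constant-(L and M) axis

base = np.array([10.0, 10.0, 5.0])
s_step = base + np.array([0.0, 0.0, 1.0])       # S-cone-isolating step
lm_exchange = base + np.array([1.0, -1.0, 0.0])  # L/M exchange, constant sum

# The L-M channel ignores the S step; the S-(L+M) channel ignores the exchange.
assert opponent_channels(s_step)[0] == opponent_channels(base)[0]
assert opponent_channels(lm_exchange)[1] == opponent_channels(base)[1]
```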

Asymmetries in the Trichromatic Scheme


Asymmetries in the organization of color processing could arise from the differences between
the S-cones and the M- and L-cones. The S-cones comprise less than 10 per cent of cones in the
retina and can be identified as morphologically distinct from the other cones (Curcio et al. 1991).
The S-cone pigment is coded on chromosome seven whereas both the M- and L-cone pigment
genes are carried on the X-chromosome and are 96 per cent homologous (Nathans, Thomas, and
Hogness 1986). The dichromatic system shared by most mammals achieves a two-dimensional
color discrimination by comparing the outputs of a short-wave sensitive receptor and a receptor
in the middle- to long-wavelength region of the spectrum. It is thought that the L- and M-cone
pigment genes diverged only fifty million years ago in our evolutionary history, perhaps confer-
ring a behavioural advantage to our primate ancestors in selecting ripe fruit against a background
of young leaves at a distance (Bompas, Kendall, and Sumner 2013; Regan et al. 2001; Sumner and
Mollon 2000a, 2000b) or at arm’s reach (Parraga, Troscianko, and Tolhurst 2002), and piggybacking
on the machinery of spatial vision that operated with the longer wavelength receptor
(Martin et al. 2011).
There is some evidence that the S-cone signal, the basis of the ancient color vision system,
remains distinct from the machinery dedicated to the main business of photopic vision. The
S-cones, for example, show minimal projections to the subcortical pathways, and S-cone stim-
uli are processed differently from M- and L-cone stimuli in saccadic (but not attentional) tasks
(Sumner et al. 2002). This asymmetry suggests a further way in which not all ‘colors’ are equal in
specifying and shaping our perceptual world. S-cone isolating stimuli additionally elicit longer
reaction times than L/M-opponent stimuli (Smithson and Mollon 2004) and their signals are
delayed before combination with L- and M-cone signals (Lee et al. 2009). Within the color vision
system this presents a specific temporal binding problem (Blake, Land, and Mollon 2008).

The Physiology of Later Color Processing


The chromatic tuning of cells in primary and secondary visual cortex (V1 and V2) shows narrower
tuning of individual units and a more uniform distribution of preferred directions around the hue
circle (Solomon and Lennie 2005) than LGN units. While the color sensitivities of neurons in V1
are substantially invariant to changes in spatial structure and contrast, the color sensitivities of
neurons in V2 are modified by surrounding context (Solomon, Peirce, and Lennie 2004). Those
characteristics that are associated with mid-level vision—concerned with the color of surfaces and
the identification of regions that go together—have traditionally been associated with distinctive
properties of neurons in macaque V4 (and its presumed homologue in humans). Indeed, lesions
in this area are associated with cerebral achromatopsia, and a particular impairment in perceiving
the color of surfaces. On the basis of behavioural and neuroimaging data from normal partici-
pants and neuropsychological patients, Cavina-Pratesi et al. (2010a, 2010b) argue that geometric
and surface properties are dealt with separately within the lateral occipital cortex (LOC) and the
collateral sulcus (CoS) respectively, and that the medial occipitotemporal cortex houses separate
foci for color (within anterior CoS and lingual gyrus) and texture (caudally within posterior CoS).
The visual recognition of real objects depends on more than shape, size, and orientation. Surface
properties such as color and texture are equally important sources of information, and may be
particularly useful in judging what an object is made of, and how it should be handled. Functional
separation of cortical regions for extracting color and texture might indicate differences in the
nature of the computations required to extract these characteristics (see also ‘Perceptual correlates
of material properties’).
Globs—regions of posterior inferior temporal cortex (including V4, PITd, and posterior TEO)
that show higher fMRI responses to equiluminant color than to black-and-white—have been
identified as candidates for the explicit encoding of unique hues (Stoughton and Conway 2008).
442 Smithson

Over-representation of units tuned to particular directions would provide a physiological basis for
the special status of some hues. However, there is a practical difficulty with testing this hypoth-
esis. For a meaningful discussion of the density with which cell-tuning samples the hue contin-
uum, we need to know how to scale the hue and saturation axes. Clumping of neurons’ preferred
directions in one region of hue-space is to be expected if the scaling of the underlying variable
is non-uniform or if some color directions are stimulated more strongly. One candidate scale is
the wavelength scale, but wavelength discrimination thresholds follow a ‘w’-shaped function of
wavelength (Pokorny and Smith 1970), so this is far from a perceptually uniform space. Stoughton
and Conway instead used test stimuli that were linear mixtures of the outputs of an RGB display
(i.e. R-G, G-B, and B-R). But this in itself may have meant that the strongest modulations of early
opponent cells were aligned with the unique hue directions, so that the responses of downstream
neurons inevitably showed a tuning preference for these directions (Mollon 2009).

Organization Imposed by Environmental Factors


It is clear that the locations of the unique hues are not predicted in any simple way from the
underlying physiology of early color vision mechanisms. An alternative is to look to regulari-
ties in the external world. One signature of a material with uniform spectral reflectance is that
it will exhibit no difference between the wavelengths reflected from the body of the material and
specular reflections from the glossy surface; whereas materials whose pigment selectively absorbs
some wavelengths will necessarily show a difference in wavelength content between these two
components. Gaspard Monge outlined this process in a lecture in 1789 (Mollon 2006), thereby
identifying a characteristic of materials that might appear unbiased in their color, perceptually
white (see Figure 21.1).
Other unique hues might similarly be determined by characteristics of the environment. If that
were true, observers should be less variable in judging colored papers than colored lights (Mollon
2006). A curious quirk of unique green settings with monochromatic lights is that they correlate
with iris color. This is understandable if observers agree on the broadband stimulus that is green
and then differ when tested with narrowband lights (Jordan and Mollon 1997). Similar compen-
sations for spectrally selective pre-retinal filtering occur with age, as the physical light associated
with the percept of white remains relatively constant despite the yellowing of the eye’s lens, reset-
ting over the course of months following lens replacement as part of cataract surgery (Delahunt et
al. 2004), and with retinal eccentricity, as the perceived color of both narrowband and broadband
stimuli remains similar at 0° and 8° loci, despite the distribution of yellowish macular pigment
in the central visual field (Webster et al. 2010). However, this compensation is not complete, and
although differences between central and peripheral vision imposed by filtering by macular pig-
ment are relatively stable across the lifetime, and impose systematic chromaticity shifts for a range
of natural and man-made stimuli, the visual system fails to correct as well as it might (Bompas,
Powell, and Sumner 2013).
The locus of lights that appear neither red nor green, and that stretches between blue and yellow,
may similarly be set by properties of our environment. Shepard (1991) has suggested, for example,
that this line is constrained by the two predominant illuminants in the world—skylight and sun-
light (see also Mollon 2006 for relevant measurements).
It seems odd that such regularities in the external world would not be reflected in the underly-
ing organization of our perceptual systems. It would seem prudent to remember the many retinal
ganglion cell types and early retinal circuits whose function is as yet unknown before abandoning
the notion of a physiological correlate of constraints imposed by the organization of our visual
environment. Some evidence for the special status of the skylight-sunlight locus in shaping our
perceptual apparatus is provided by the very low thresholds for chromatic discrimination of lights
in this region (Danilova and Mollon 2012).

Fig. 21.1  Illuminated glossy objects that illustrate several points about the interaction of light and
surfaces. The light reflected to the camera comes either from (i) direct specular reflections from
the surface in which the spectral content of the reflected light matches that of the illuminant, or
(ii) reflections from the body of the material in which the spectral content of the reflected light is
given by the illuminant modified by the spectral reflectance of the surface. Monge’s observation
is clear in the parts of the scene dominated by a single source of illumination, such as the front
of the purple mug. Significant chromatic variation is apparent across the purple-colored surface,
fading from purple to desaturated purple (mixed with white); whereas little chromatic variation is
apparent across the white-colored surface of the same mug.
Image: uncommongoods.com with permission.

Organization Imposed by Cultural and Linguistic Factors


It is possible that non-uniformities in the perceptual organization of hue stem from cultural
and linguistic roots. Interaction between color and language again exercised Katz (1911), par-
ticularly in relation to Goldstein and Gelb’s analysis of the color experience of a patient amnesic
for color names (Goldstein and Gelb 1925). More recent analyses have emphasized the distinction
between the continuous nature of the physical parameters underlying color variation, and
linguistic labels for color that must be discrete. According to the Sapir-Whorf hypothesis, the
perception of stimuli depends on the names we give them, and the perception of color has provided
an important test case for the hypothesis. In a seminal study of the color terms used in twenty
unrelated languages, Berlin and Kay (1969) put forward two hypotheses: (1) there is a restricted
universal inventory of such categories; (2) a language adds basic color terms in a constrained
order. They have argued for an underlying structure to the lexicalization of color, which is based
on a universal neurobiological substrate (Kay and Berlin 1997; Kay and McDaniel 1978), but
which leaves scope for Whorfian effects to ‘distort’ perception (Kay and Kempton 1984). Their
thesis has become something of a ‘classic’ but has not achieved universal acclaim, being roundly
criticized by Saunders (2000) on both scientific and anthropological grounds.
If our perceptual space of color were dependent on linguistic labels we might expect several
(testable) consequences: (1) stimuli within categories (given the same name) should look more
similar than those between categories (given different names), and this similarity should have
measurable effects on perceptual judgments (Kay and Kempton 1984); (2) these category-based
effects should be associated with different physical stimuli, depending on the native language
of the participant (Roberson and Hanley 2007; Winawer et al. 2007); (3) pre-language children
should show different perceptual judgments from post-language children (Daoutis et al. 2006);
and (4) training to use new color terms may influence perception (Zhou et al. 2010).
One study in particular has sparked significant research effort in this area. Gilbert et al. (2006)
claimed that between-category visual search is faster than within-category search (by 24 ms), but
only for stimuli presented in the right visual field, a result that they interpret as suggesting the
language centres in the left hemisphere are important in mediating the reaction-time benefit. Such
experiments, however, are riddled with difficulties. As discussed above, there are significant inter-
observer differences in factors that influence the very first stages of color perception (pre-recepto-
ral filtering by lens and macular pigment, differences in receptor sensitivities), and the observer’s
adaptation state has a strong influence on perceived color difference. Witzel and Gegenfurtner
(2011) ran several different versions of the Gilbert et al. study and related studies, but in each
case they included individual specification of color categories, and implemented careful control of
color rendering and of the adaptation state. They found that naming patterns were less clear-cut
than original studies suggested, and for some stimulus sets reaction times were better predicted by
JNDs than by category effects. As we saw with the search for the neural encoding of unique hues,
a recurrent difficulty is the choice of an appropriate space from within which to select test stimuli.
Brown, Lindsey, and Guckes (2011) identified this need for an appropriate null hypothesis—if lin-
guistic category effects do not predict reaction times for visual search, what are they predicted by?
They replicated the Gilbert et al. study, making methodological improvements that were similar to
those introduced by Witzel and Gegenfurtner (2011), but added an independent measurement of
the perceived difference between stimuli (assessed via Maximum Likelihood Difference Scaling,
MLDS). They were unable to replicate Gilbert et al.’s result, and reaction times were simply pre-
dicted by the reciprocal of the scaled perceived difference between colors.

Color and Form


Processing of Color- and Luminance-Defined Contours
It is widely held that the primary signals for form perception are carried in variations of lumi-
nance. But empirical evidence for the strong segregation of color and form responses in cortex is
weak. Staining with the mitochondrial enzyme cytochrome oxidase (CO) reveals CO-rich blobs
in V1 and thin bands in V2. Although these anatomical subregions have been shown by several
labs to contain a high proportion of cells that are selective for color and a high proportion of
cells that are not selective for orientation (see Gegenfurtner 2003 for review), it cannot be con-
cluded from these measurements that it is, for example, the color-selective cells in the thin stripes
that are not orientation selective. Within-cell measurements of color- and form-selectivity in a
large number of neurons in V1 and V2 of awake behaving monkeys show no correlation between
color and form responses (Friedman, Zhou, and von der Heydt 2003), providing no evidence for
segregation.
Sumner et al. (2008) tested fMRI responses to orientation signals that were defined by lumi-
nance, or by L/M-opponent or S-opponent chromatic modulation. On arrival in V1, S-cone
information is segregated from the pathways carrying form information, while L/M-opponent
information is not. Nevertheless Sumner et al. found successful orientation discrimination, in V1
and in V2 and V3, for luminance and for both color dimensions, suggesting that a proportion of
cells shows joint selectivity to both color and orientation.
Friedman et al. (2003) have explicitly tested the contributions of color-selective cells to the
analysis of edges and surfaces. They found no difference in edge-enhancement between color- and
luminance-selective cells. This contradicts the ‘coloring book’ notion that the form of an object is
processed through achromatic channels, with color being filled-in later, and by separate mecha-
nisms. Instead we see color, orientation, and edge-polarity multiplexed in cortical signals.

Availability of Color- and Luminance-Defined Contours


This is not to say that there are not important differences in the constraints on the informa-
tion that can be extracted about color and luminance variation across space. Certainly, the L-M
opponent cells in the parvocellular layers of the LGN are bandpass for luminance and lowpass
for equiluminant chromatic stimuli (Derrington et al. 1984; Lee et al. 2012). For spatial forms
that are defined only by chromatic variation in the S-cone signal the situation is particularly
marked. The S-cones constitute only 5 to 10 per cent of human cones. They are absent from a
central region of about 0.4° with a ring of relatively high S-cone density just outside this region,
and are otherwise fairly evenly distributed across the retina (Curcio et al. 1991). So the S-cones
necessarily sample the visual image rather sparsely and convey correspondingly coarse spatial
information.
For most real stimulus displays, the relative strength of luminance- and chromaticity-defined
contours is further biased in favour of luminance by the maximal achievable chromatic contrast
in equiluminant stimuli: the substantial overlap between the L- and M-cone sensitivities limits
the L- or M-cone Weber contrast to about 0.3. Psychophysical studies reinforce the argument that
the processing of form defined by color is limited mainly by the contrast in the cones and not by
subsequent processing (Webster, De Valois, and Switkes 1990).
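A toy calculation illustrates the point. The "cone sensitivities" below are invented numbers chosen only to overlap heavily, not real cone fundamentals; with real fundamentals the same arithmetic caps the achievable L- or M-cone Weber contrast at about 0.3:

```python
# Toy demonstration that overlapping L- and M-cone sensitivities limit
# the cone contrast available in an equiluminant red-green exchange.
# Sensitivity numbers are invented for illustration only.

def weber_contrast(test, background):
    return (test - background) / background

L_SENS = {"red": 0.7, "green": 0.5}   # L-cone responds to both primaries
M_SENS = {"red": 0.5, "green": 0.7}   # M-cone overlaps heavily with L

def excitation(sens, red_power, green_power):
    return sens["red"] * red_power + sens["green"] * green_power

# Background: equal mixture of primaries; test: trade all red for green
# at constant total power (the analogue of an equiluminant modulation).
bg_L = excitation(L_SENS, 1.0, 1.0)       # 1.2
test_L = excitation(L_SENS, 0.0, 2.0)     # full exchange gives 1.0
print(round(weber_contrast(test_L, bg_L), 3))   # → -0.167
```

Even this maximal exchange yields an L-cone Weber contrast far below what a luminance modulation can deliver, because both cones "see" both primaries.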

Organization Imposed by Luminance-Defined Contours


Capture of color contours by luminance contours can lead to striking displays. In a demonstra-
tion attributed to Boynton (Stockman and Brainard 2009), weak color contours appear to follow
spatial forms defined by high-contrast luminance contours (see Figure 21.2a), an effect exploited
by watercolour artists (Pinna, Brelstaff, and Spillmann 2001). The propensity for colors to melt
into one another (and see Koffka and Harrower 1931 for discussion of ‘soft’ versus ‘hard’ colors;
Liebmann 1927) is particularly pronounced for color borders that are defined only by the modu-
lation they offer to the S-cones (Tansley and Boynton 1976).

Fig. 21.2  (a) The Boynton Illusion. The wavy color contour between yellow and grey in the left-hand
image is captured by the smooth black contour. The wavy luminance contour between dark and
light grey in the right-hand image is robust to capture. (b) A plaid constructed by adding a vertical
LM-opponent grating and a horizontal S-opponent grating (left) appears to be dominated by violet-
lime variation when horizontal black contours are applied (middle); and dominated by cherry-teal when
vertical black contours are applied (right).
Data from Stuart Anstis, Mark Vergeer, and Rob van Lier, Luminance contours can gate afterimage colours and
‘real’ colours, Journal of Vision, 12(10), pp. 1–13, doi: 10.1167/12.10.2, 2012.

Contrast sensitivity for low-frequency L-M square-wave gratings can be facilitated by the add-
ition of luminance variation, but the facilitation is abolished at a relative phase of 90° (Gowdy,
Stromeyer, and Kronauer 1999). The result is consistent with integration of color between lumi-
nance edges and comparison across edges. Anstis, Vergeer, and van Lier (2012) have further
investigated the ‘gating’ of color by contours. For a colored plaid constructed by superimposing
a blue-yellow vertical sinusoidal grating on a red-green horizontal sinusoidal grating, they used
contours defined by a combination of thick black lines and regions of random-dot motion. When
the contours were horizontal and aligned with the zero-crossings of the horizontal grating, the
plaid appeared red-green; when the contours were vertical and aligned with the zero-crossings of
the vertical grating, the plaid appeared blue-yellow (see Figure 21.2b).

Organization Imposed by Color


Color similarity is sufficient to impose a perceptual organization when spatial proximity is
matched, and indeed such effects have been used to measure the relative salience of color dif-
ferences along cardinal axes in normal and anomalous trichromats (Regan and Mollon 1997).
McIlhagga and Mullen (1996) tested contour integration for color- and luminance-defined stim-
uli, and found that color alone is sufficient to delineate a contour, provided that contrast is suf-
ficiently high. If contrast is first scaled according to discrimination thresholds for orientation,
equivalent performance is obtained for color- and luminance-defined contours if the color-
defined contours are presented with a further two-fold increase in contrast. When contours are
defined by alternating elements of color and luminance, performance declines significantly, but
not as much as would be expected from entirely independent processing of color and luminance
edges.
Texture gradients provide a strong monocular cue to depth. Zaidi and Li (2006) showed that
chromatic orientation flows are sufficient for accurate perception of 3D shape. The cone-contrast
required to convey shape in chromatic flows is less than the cone-contrast required in achromatic
flows, indicating that sufficient signal is present in orientation-tuned mechanisms that are also
color-selective. Identification of shape from chromatic flows is masked by luminance modula-
tions, indicating either joint processing of color and luminance in orientation tuned neurons, or
competing organizations imposed by color and luminance.
Troscianko et al. (1991) had previously shown that estimates of the slant of a surface defined
by texture gradients are the same for textures defined by chromaticity and those defined by chro-
maticity and luminance. These authors also find that gradients of brightness and saturation (in
the absence of texture gradients, or in addition to texture gradients) can modify perceived depth,
consistent with the gradual changes in luminance or saturation that are produced as a result of
the increase in atmospheric scattering with distance. Luminance gradients are important in con-
veying 3D shape, through a process described as shape-from-shading, and interactions between
luminance and color gradients have been interpreted with respect to the correspondence between
luminance and color gradients in the natural environment of illuminated surfaces (Kingdom
2003), which we discuss in ‘Configural effects’.
Color can facilitate object segmentation. For example, color vision can reveal objects that are
camouflaged in a greyscale image. Random chromatic variations can also hamper segmentation
of luminance-defined texture boundaries—a phenomenon that is exploited in both natural and
man-made camouflage (Osorio and Cuthill 2013, this volume). Interestingly this presents an
opportunity for dichromatic observers to break such camouflage, since they do not perceive the
chromatic variation (Morgan, Adam, and Mollon 1992).
In the classical random-dot stereogram, the arrays presented to left and right eyes are composed
of binary luminance noise. If the random-dot pattern is made equiluminant, such that the cor-
respondence of matching elements is defined only by their chromaticity, stereopsis fails (Gregory
1977). However, introducing color similarity to matching elements improves stereopsis (Jordan,
Geisler, and Bovik 1990), and in global motion the introduction of a color difference between
target and distractor elements reduces the number of target dots required to identify the direction
of motion (Croner and Albright 1997).
Improvement in thresholds for luminance-defined global motion in the presence of color simi-
larity between target elements suggests that color may be a useful cue for grouping elements
that would otherwise be camouflaged. This color advantage, however, is dependent on select-
ive attention, and disappears in displays that are designed to render selective attention useless
(Li and Kingdom 2001). The ‘Colour Wagon Wheel’ illusion (Shapiro, Kistler, and Rose-Henig
2012) lends further support to the idea that color provides a feature-based motion signal that can
become perceptually uncoupled from the motion-energy signal.

Combination of Color-Defined Features


A recurrent finding in the integration and combination of features defined by color is the rela-
tive selectivity of responses to stimuli defined along cardinal directions in color space (see ‘The
Physiology of Early Post-Receptoral Processing’). Contour-shape mechanisms, which show
after-effects for shape-frequency and shape-amplitude, are selective for contours defined for the
S-opponent and L/M-opponent cardinal axes (Gheorghiu and Kingdom 2007). Contrast-contrast
effects, in which a region of fixed contrast appears to have a lower contrast when surrounded
by a region of high contrast, are selective for contrast within a cardinal mechanism (Singer and
D’Zmura 1994). Plaids comprised of drifting gratings modulated along different cardinal direc-
tions appear to slip with respect to one another, whereas gratings modulated along intermediate
directions in color space tend to cohere (Krauskopf and Farell 1990).
McKeefry, Laviers, and McGraw (2006) present a more nuanced account of the separability
of color inputs to motion processing. They found that the traditional motion after-effect, where
prolonged viewing of a stimulus moving in one direction causes a stationary stimulus to appear to
move in the opposite direction, exhibited a high degree of chromatic selectivity. However, biases
in the perceived position of a stationary stimulus following motion adaptation were insensitive to
chromatic composition. The dissociation between the two types of after-effect suggests that chro-
matic inputs remain segregated at early stages of motion analysis, while at later processing stages
there is integration across chromatic and achromatic inputs.
Grouping of elements that are similar in terms of the underlying physiological mechanisms
that process them is a recurrent theme in several modern accounts of perceptual organization.
For example, Gilchrist (this volume) shows how simultaneous contrast can be strengthened or
diminished by manipulating the relative spatial frequencies of the figure and ground of the stand-
ard display. Anderson (this volume) presents a strong argument for analysing scenes in terms
of physiologically relevant parameters, such as contrast ratios rather than luminance-difference
ratios. Whilst the Gestalt psychologists were critical of analyses that carve perception into under-
lying channels or modules, the organization of the underlying physiology may still be used to
inform us about the emergence of structure in perceptual experience. For it is likely that the
organization of our neural systems at least in part reflects the organization of our sensory world.

Color and Form in After-Effects


From a sequence of short experiments, Daw (1962) argues that colored afterimages do not gen-
erally trouble us in day-to-day visual experience simply because they are inhibited except in the
special situation where the (luminance-defined) scene is in geometric registry with the afterim-
age. Powell, Bompas, and Sumner (2012) concur, additionally presenting evidence that luminance
edges enhance afterimages more than they enhance physical stimuli of similar appearance.
Anstis et al. (2012) show conditions in which the same adapting pattern can generate two differ-
ent afterimage patterns, depending on the luminance contours that are presented during the test
phase. Their adapting stimulus is a four-color plaid constructed by adding a vertical blue-yellow
grating and a horizontal red-green grating. When tested with vertical achromatic contours, the
after-effect is yellow-blue; when tested with horizontal achromatic contours, the after-effect is
green-red. The effect is consistent with spatial averaging of afterimage colors within contours, but
not across contours—a result that echoes the result for the appearance of real plaids with super-
imposed contours (see ‘Organization imposed by luminance-defined contours’).
Orientation-dependent colored after-effects have been described by McCollough (1965).
Adaptation to, for example, red-black vertical gratings and green-black horizontal gratings causes
white-black vertical and horizontal gratings to appear tinged with green and with red respect-
ively. The effect is particularly long-lasting, documented to last days at least (Jones and Holding
1975). Such contingent after-effects have been demonstrated for several combinations of features,
and their long-lasting effects may simply reflect the rarity in the natural world of those stimulus
combinations that would be required to re-adapt the observer to a different norm (Vul, Krizay,
and MacLeod 2008).
Under conditions of binocular rivalry, it is possible for a pink-grey vertical grating presented to
the left eye and a green-grey horizontal grating presented to the right eye to be perceived as either
a horizontal or vertical pink-green grating—a perceptual misbinding of color from one eye into
a spatially selective part of the form defined in the other eye (Hong and Shevell 2006). It is also
possible to obtain afterimages of the misbound percept. Importantly, Shevell, St Clair, and Hong
(2008) argue that the afterimage is derived from a central representation of the misbound percept,
rather than as a result of resolution of rivalrous monocular afterimages. They showed that when
adapting stimuli were pulsed, simultaneously or in alternation to the two eyes, misbound after-
images were obtained only in the simultaneous condition. Since it is only this condition that has
rivalrous dichoptic stimuli, their results imply adaptation of a cortical mechanism that encodes
the observer’s (misbound) percept.

Color Induction and Perceptual Grouping


When one colored light is presented in close spatial and temporal proximity to another, its appear-
ance may change. Such color induction may shift the appearance of the test light towards the
appearance of the inducing light (an assimilation effect), or away from the appearance of the
inducing light (a contrast effect). Some authors consider color induction and perceptual grouping
as inherently linked, for example by interpreting assimilation as a by-product of the integration of
parts into one whole (Fuchs 1923; Musatti 1931) and by interpreting contrast as a result of main-
taining separate wholes (e.g. King 1988, 2001).
Empirical studies that connect color induction and perceptual grouping are relatively rare. Xian
and Shevell (2004) have shown how the color appearance of a test patch depends on the color
appearance of other elements of the display with which it is grouped. In their experiment, the
test patch was a small square that was grouped with a set of horizontal bars of different lengths
arranged in an hour-glass configuration above and below the test. They modified the appearance
of the grouped elements by local induction from a striped background (rather than by a physical
change in the elements themselves), and they found that the measured influences on the appear-
ance of the test are consistent with the hypothesis that chromatic assimilation occurs among ele-
ments belonging to the same group.
However, this experiment is a rather indirect test of the influence of grouping on assimilation,
since it is the color appearance of the grouped elements that is manipulated, and not the strength
of the grouping per se. In a coherent set of follow-up experiments Xian and Shevell have per-
formed multiple tests of the hypothesis that the stronger the perceptual grouping, the larger the
shift in appearance toward the co-grouped elements (Xian 2004). In particular, they showed that
weaker color shifts were obtained when (1) motion of the test and inducing bars was in opposite
directions rather than the same direction; (2) the test and inducing bars were dissimilar in their
chromaticity or luminance; and (3) binocular disparity was introduced such that the inducing
bars were perceived in a single-depth plane in front of the test, but not when the test and inducing
bars were perceived as belonging to a three-dimensional ‘V’-shaped hour-glass structure. These
findings provide strong evidence that perceptual grouping causes chromatic assimilation among
components that are grouped together. Since any effect of binocular disparity must be due to bin-
ocularly driven cortical cells, the last experiment points to involvement of a central neural mech-
anism in color assimilation. A similar conclusion was reached by de Weert and van Kruysbergen
(1997) on the basis that assimilation occurs after the figure-ground segregation has taken place.

Objects and Illumination


A Segmentation Problem
Our sensory experience is of a world comprised of objects of particular shapes and sizes, which
are made of particular stuff and illuminated by particular light sources. As such, our perception
is the result of a process of segmentation in which sensory stimulation is interpreted as coming
from discrete sets of causal sources in the world. The light imaged at a particular location on the
retina does not contain separable information about the reflectance characteristics of materials,
the spectral energy distributions of the lights that illuminate them, and the spectral transmit-
tance of any intervening filters. So color perception for any of these constituents must rely on
geometric and chromatic relationships across an extended spatial area, and on how these change
over time.
Anderson (this volume) discusses transparency, lightness, and gloss within a similar conceptual
framework. In lightness perception, we can identify scission models in which the illuminant and
surface reflectance are explicitly segmented; equivalent illumination models in which an estimate
of the illuminant is derived and then used to recover reflectance properties from the image data;
anchoring theory in which luminance ratios are used to derive information about relative light-
ness and the resultant scale is anchored by mapping one image luminance (e.g. the highest) onto
a fixed lightness value (e.g. white); and filtering or filling-in models in which percepts are simply
the outputs of local image filters applied directly to the image.
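As an illustrative sketch (not a model from this chapter), the anchoring step can be written out directly: relative lightness is carried by luminance ratios, and the absolute scale is fixed by mapping the highest image luminance to white. All values here are hypothetical:

```python
def anchored_lightness(luminances, white=1.0):
    """Toy anchoring rule: relative lightness is given by luminance ratios,
    and the scale is anchored by mapping the highest luminance to 'white'."""
    anchor = max(luminances)
    return [white * lum / anchor for lum in luminances]

# Doubling the illuminant doubles every luminance, but the anchored
# lightness values are unchanged, because ratios are invariant.
scene = [0.05, 0.20, 0.80]           # hypothetical luminances
bright = [2 * lum for lum in scene]  # same surfaces, doubled illumination
print(anchored_lightness(scene))     # [0.0625, 0.25, 1.0]
print(anchored_lightness(bright))    # [0.0625, 0.25, 1.0]
```

The anchoring choice (highest luminance onto white) is exactly what makes the output insensitive to overall illumination strength.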
Lightness constancy (in an achromatic world in which surface reflectance and illumination are
specified by scalar values) and color constancy (in a chromatic world in which surface reflectance
and illumination are functions of wavelength) share many of the same computational problems.
Indeed, many models of lightness and color constancy share similar computational tricks. The
well-known retinex algorithms of Land (1986) and Land and McCann (1971) rely heavily on
relational coding, making assumptions about the mean color of a scene (e.g. grey world) or about
the brightest elements in a scene (e.g. brightest is white) to anchor the relational code. While
relational coding is a central notion from Gestalt psychology, it is also the Achilles’ heel of the
retinex models: the normalization performed in retinex depends heavily on the set of surfaces
available in the scene (Brainard and Wandell 1986). Human vision, on the other hand, maintains
approximate color constancy despite variation both in the spectral composition of the illuminant
and in the spectral reflectances of nearby surfaces (an issue to which we return in
‘Configural effects’).
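This dependence on the surface ensemble is easy to demonstrate with a minimal grey-world-style normalization — a toy sketch, not the full retinex algorithm, with hypothetical cone signals:

```python
def grey_world_normalize(signals):
    """Divide each of three channels by its scene mean (grey-world assumption).
    'signals' is a list of [L, M, S] triplets, one per surface."""
    n = len(signals)
    means = [sum(s[ch] for s in signals) / n for ch in range(3)]
    return [[s[ch] / means[ch] for ch in range(3)] for s in signals]

# The same test surface embedded among neutral vs reddish co-surfaces:
test = [0.4, 0.4, 0.4]  # hypothetical L, M, S signals
neutral_scene = [test, [0.2, 0.2, 0.2], [0.6, 0.6, 0.6]]
reddish_scene = [test, [0.7, 0.2, 0.1], [0.9, 0.3, 0.2]]

print(grey_world_normalize(neutral_scene)[0])  # test comes out neutral
print(grey_world_normalize(reddish_scene)[0])  # same surface, now biased
```

The physically identical test surface is ‘corrected’ differently depending on what else is in the scene — the instability that human vision largely avoids.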
Equivalent illumination models have been particularly successful in providing a compact
description of the effect of changing illumination on color appearance (see Brainard and Maloney
2011 for review and detailed discussion). One powerful feature of these models is that they sep-
arate the modelling problem into two parts. First, what is the parametric form of the transform-
ation imposed on the raw image signals by a change in illumination, and second, how are the
parameters of this transformation determined from the image data? For lightness constancy, the
physical parameters of reflectance and illumination allow the transformation to be fully described
by a multiplicative scaling of the luminance values in the image. In this case there is no question of
how well a multiplicative transformation accounts for the physical situation, though there may be
uncertainty as to whether the visual system uses such a transformation to derive perceived light-
ness from the raw luminance signals, and indeed how the appropriate scale factor is determined.
For color constancy, the parametric form of the transformation is not immediately obvious, as we
shall discuss next.
Perceptual Organization of Color 451

Color Conversions with Spectral Filters and Illuminant Changes


A set of surfaces with particular spectral reflectances, viewed under a particular illumination
(or through a thin filter with a particular transmittance), is associated with a spatial distribu-
tion of cone signals (see Figure 21.3). The cone signals at any point can be calculated from the

Fig. 21.3  The light that reaches the eye from a surface depends on the spectral reflectance of the
surface and the spectral energy content of the illuminant (e.g. sunlight or skylight). Example spectral
energy distributions or reflectances are shown in the inset panels. The scatter plots show the L-, M-, and
S-cone signals for a set of 100 surfaces under skylight (x-axis) or sunlight (y-axis). The effect of changing
illumination is approximately described by a multiplicative scaling of the signals in the three cone classes.
The multiplicative constant for each cone class, and the gradients of the lines on which the points fall,
depend on the illuminants that are compared. The red symbols represent the cone signals from a
surface with uniform spectral reflectance, which correspond to the signals from the relevant illuminant.

wavelength-by-wavelength multiplication of reflectance, transmittance, and illumination, integrated
over the wavelength sensitivity of each cone-type. A change of illumination, or a change
in filter, changes these signals, imposing what can usefully be described as a ‘color conversion’
(Smithson 2005).
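A minimal sketch of this calculation (with made-up, coarsely sampled spectra; real computations use finely sampled data and calibrated cone fundamentals):

```python
def cone_signal(reflectance, illuminant, sensitivity, transmittance=None):
    """Wavelength-by-wavelength product of reflectance, illumination and
    (optionally) filter transmittance, summed over a cone's spectral
    sensitivity. All arguments are equal-length samples on one grid."""
    signal = 0.0
    for i in range(len(reflectance)):
        radiance = reflectance[i] * illuminant[i]
        if transmittance is not None:
            radiance *= transmittance[i]
        signal += radiance * sensitivity[i]
    return signal

# Hypothetical 5-sample spectra on a common wavelength grid:
surface = [0.2, 0.4, 0.8, 0.6, 0.3]   # spectral reflectance
sunlight = [0.9, 1.0, 1.0, 1.0, 0.9]  # spectral energy of the illuminant
l_cone = [0.0, 0.1, 0.4, 1.0, 0.6]    # cone spectral sensitivity
print(cone_signal(surface, sunlight, l_cone))
```

Changing `sunlight` to a differently shaped spectrum changes the returned signal — the ‘color conversion’ described in the text.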
In principle, with arbitrary lights, surfaces, and filters, these color conversions can be com-
plex. For example, surfaces that offered high L-cone signals under one illumination might offer
relatively low L-cone signals under another. However, empirical measurements of environmental
spectra suggest that for the vast majority of natural surfaces and illuminants, color conversions
imposed by illuminant exchanges are well summarized by multiplicative scaling of the L-cone
signals, the M-cone signals, and the S-cone signals, where the relative scaling for each cone class
depends on the particular illuminant exchange (Foster and Nascimento 1994).
Do observers exploit these regularities in the statistics of the natural world? If, for each cone
class, the visual system encoded the spatial ratios of signals from different surfaces, this code
could be used by observers to discriminate between scenes that changed in illumination and
scenes that changed in reflectance: the code would be virtually unchanged by a change in illu-
mination but would be disturbed by a change in the surfaces comprising the scene. It has been
suggested that this signal might support operational color constancy, i.e. the ability to distin-
guish between a change in illumination and a change in surface reflectance (Craven and Foster
1992). Observers are certainly highly sensitive to violations of the invariance of spatial cone-
excitation ratios, at least when the two images are presented in quick succession (Linnell and
Foster 1996). When asked to detect changes in surface reflectance that are made to accompany a
fast illuminant change, multiple simultaneous surface changes can be detected almost indepen-
dently of the number of surfaces. This performance suggests that violations of the invariance of
spatial cone-excitation ratios are detected pre-attentively, via a spatially parallel process (Foster
et al. 2001).
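A small sketch illustrates the proposed ratio code (hypothetical cone excitations): under a diagonal, multiplicative model of an illuminant exchange, the spatial ratios of cone excitations between surfaces are preserved, whereas a change in a surface’s reflectance would disturb them:

```python
def spatial_ratios(cones_a, cones_b):
    """Ratio of cone excitations between two surfaces, per cone class."""
    return [a / b for a, b in zip(cones_a, cones_b)]

# Hypothetical (L, M, S) excitations for two surfaces under illuminant 1:
surf1 = [0.30, 0.20, 0.10]
surf2 = [0.60, 0.50, 0.40]

# Illuminant exchange: multiplicative scaling per cone class
scale = [1.4, 1.1, 0.7]
surf1_new = [s * k for s, k in zip(surf1, scale)]
surf2_new = [s * k for s, k in zip(surf2, scale)]

print(spatial_ratios(surf1, surf2))          # ratios under illuminant 1
print(spatial_ratios(surf1_new, surf2_new))  # same ratios: scaling cancels
# Altering surf2's reflectance instead would disturb these ratios,
# signalling a material change rather than an illuminant change.
```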
Westland and Ripamonti (2000) have additionally argued that invariance of cone-excitation
ratios may also be a necessary condition for the perception of transparency (see Figure 21.4), and
indeed, when asked to discriminate between sequences that preserved the spatial cone-excitation


Fig. 21.4  (a) A strong impression of transparency is generated by spatio-chromatic arrangements
that preserve cone-ratios across a boundary. (b) The impression of transparency is abolished in static
displays by rotating the filtered region and disrupting the associated X-junctions.
Data from Stephen Westland and Caterina Ripamonti, Invariant cone-excitation ratios may predict transparency,
Journal of the Optical Society of America A, 17(2), pp. 255–264, Figure 1, 2000.

ratios for filtered and unfiltered regions and sequences that did not, observers identified the stable
cone-ratios with the transparent filter (Ripamonti and Westland 2003).
Faul and Ekroll (2002), however, contest the claim that invariance of cone-excitation ratios is
necessary for transparency. Westland and Ripamonti’s (2000) analysis was based on a simplified
model of transparency in which the effective reflectance R′(λ) of a surface covered by a filter
was given by a wavelength-by-wavelength multiplication of the reflectance spectrum of the
surface, R(λ), with the transmittance spectrum of the filter, T(λ), reduced by the internal
reflectance of the filter, r, and observed in double-pass, such that R′(λ) = R(λ)[T(λ)(1 − r)²]².
Starting from a
more complete model of physical filtering—in which the filter is specified by its absorption spec-
trum, thickness, and refractive index—Faul and Ekroll (2002) derive a psychophysical model of
perceptual transparency that uses a three-element scaling vector (operating on the cone signals)
to characterize the color and thickness of the filter (corresponding to the direction and magni-
tude respectively of the scaling vector) and an additional parameter to characterize the perceived
‘haziness’ of the filter. For the special case when the refractive index of the filter is equal to one,
and thus close to that of air, Faul and Ekroll’s model matches Westland and Ripamonti’s model, and
predicts constant cone-excitation ratios. For filters with higher refractive indices, the prediction
does not hold, and Faul and Ekroll’s model provides a better description of their perceptual data.
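The simplified double-pass model above can be sketched directly (symbols as in the text; the spectra and the value of r are illustrative):

```python
def filtered_reflectance(R, T, r):
    """Westland and Ripamonti's simplified double-pass filter model:
    R'(lambda) = R(lambda) * [T(lambda) * (1 - r)**2]**2
    R: surface reflectance samples, T: filter transmittance samples,
    r: internal reflectance of the filter (scalar)."""
    return [R_i * (T_i * (1 - r) ** 2) ** 2 for R_i, T_i in zip(R, T)]

# Hypothetical spectra sampled at four wavelengths:
surface = [0.8, 0.5, 0.3, 0.2]
filt = [0.9, 0.7, 0.6, 0.4]
print(filtered_reflectance(surface, filt, r=0.04))
```

Because the filter factor [T(λ)(1 − r)²]² multiplies every surface’s reflectance by the same wavelength-dependent amount, filtered and unfiltered regions differ by a fixed spectral scaling — which is why, under this model, spatial cone-excitation ratios are preserved across the filter’s boundary.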

Perceptual Correlates of Material Properties


These experiments highlight the way in which structured changes of color—namely the consistent
remapping of cone-signals under changes in the spectral content of the illumination or the spec-
tral transmittance of a filter—provide strong cues about perceptual organization. Interestingly,
chromatic transparency reveals perceptual heuristics that are hidden in the achromatic case. With
achromatic transparency, additive color mixture, encompassed by variants of Metelli’s episcotister
model, provides a reasonably accurate account of our perception (see Gerbino, this volume).
Yet, for chromatic transparency, our perception is dominated by subtractive color mixture, as
described by filter models.
Perception, considered as the estimation of the intrinsic properties of objects in the world, can-
not depend on a full characterization of the physical interactions between light and matter, not
least because our perceptual apparatus is limited by the sensory data available. One alternative
suggestion is that human vision relies on a number of image statistics that correlate, albeit imper-
fectly, with object attributes (e.g. Fleming, Dror, and Adelson 2003; Ho, Landy, and Maloney 2008).
A second alternative is that the visual system ‘corrects’ the image data by estimating and discount-
ing the contribution of incidental factors, such as illumination (e.g. D’Zmura and Iverson 1993;
Maloney and Wandell 1986). Signatures of both suggestions can be found in perceptual data, and
it is likely that their relative strengths depend on the information available under the particular
viewing circumstance. The ‘recovery’ of physical parameters of the scene from perceptual informa-
tion is necessarily under-constrained, and our task is not to evaluate perception against veridical
extraction of these physical parameters but to understand the relationship between sensory input
and perceptual experience (see Anderson, this volume for discussion of this approach). Research
on material perception is a growing field, particularly as physically accurate computer rendering
of surface properties, such as gloss (Olkkonen and Brainard 2010), and volume properties, such as
transparency and translucency (Fleming and Bülthoff 2005; Fleming, Jakel, and Maloney 2011), is
becoming possible. Wavelength-dependent signatures of the interaction between light and matter
may well be important in constraining our perceptions in previously unrecognized ways.

Dimensionality of Color Experience in a World of Illuminated


Objects
A distinction can usefully be made here between performance- and appearance-based measures
(Koenderink, this volume). The ability perceptually to identify particular surfaces across condi-
tions of observing, such as a change in the spectral content of the illumination, does not imply that
these objects remain unchanging in their appearance. Such associations can often be made des-
pite large changes in appearance. The asymmetric matching task, in which the observer is asked
to adjust the light from a surface under a reference illuminant until it matches the appearance of
a test surface under a test illuminant, typically permits only imperfect ‘matches’. Brainard, Brunt,
and Speigle (1997) comment, ‘At this match point, however, the test and the match surfaces looked
different, and the observers felt as if further adjustments of the match surface should produce a
better correspondence. Yet turning any of the knobs or combinations of knobs only increased the
perceptual difference’ (p. 2098). Lichtenberg raised just this issue. In a letter to Goethe (7 October
1793), he writes, ‘In ordinary life we call white, not what looks white, but what would look white
if it was set out in pure sunlight . . . we believe at every moment that we sense something which we
really only conclude’ (Joost, Lee, and Zaidi 2002).
An interesting issue is the extent to which observers can represent simultaneously the color of
a surface and that of the light illuminating it (MacLeod 2003). In addition to extracting a percep-
tual signal associated with the unchanging property of a material’s surface reflectance, would it
not also be useful to retain information about the properties of different illuminants (cf. Jaensch
1921; Katz 1911)? Tokunaga and Logvinenko (2010) used multidimensional scaling to show that
the perceptual distance between papers that were uniformly illuminated could be accommodated
within a three-dimensional configuration, while under variegated illumination three further
dimensions emerged. They describe their results as revealing ‘lighting dimensions’ of object color
that can be distinguished from the traditional three dimensions referred to as ‘material dimen-
sions’. The distinction is one that echoes discussion by Katz and by Koffka on the more-than-one
dimensionality of neutral colors (Koffka 1936).
We can also ask about observers’ explicit judgments of the illuminant on a scene. In a strong
version of the illuminant estimation hypothesis, the illuminant estimate is associated with the
explicitly perceived illuminant, but there is also the intriguing possibility that the same physical
quantity has multiple psychological representations (Rutherford and Brainard 2002). In the lim-
ited number of studies that have obtained explicit estimates of scene illuminant, the estimates are
not consistent with the equivalent illuminant parameters required to account for surface percep-
tion in the same scene (Brainard and Maloney 2011).

The Relationship Between Color Contrast and Color Constancy


The standard simultaneous color contrast situation has been likened to a color constancy task,
in which the chromatic bias in the surround is attributed to a bias in the spectrum of illumina-
tion. Compensation for this bias shifts the appearance of the test region away from the surround.
Koffka (1931) compares two observations: a small grey patch on a yellow background, and a
small area reflecting neutral light within a room under yellow illumination. In both cases, an
objectively neutral region appears blue when it is surrounded by a yellow environment. But in
the first example the yellow background appears saturated while the effect on the neutral region
is weak, whereas in the second example the yellow background appears close to white while
the effect on the neutral region is strong. Koffka identifies factors that might account for the
difference, such as the full spatial extent of the scene and the likely spectral composition of

natural illuminants—explanations that might now sit comfortably within a Bayesian framework
(Feldman, Chapter 45, this volume).
Simple figure-ground displays are compatible with many different perceptual organizations.
The central disc may be an opaque surface lying on a colored background both illuminated by a
neutral light; the central disc may be an opaque surface lying on a neutral background both under
spectrally biased illumination; or the central disc may be transparent so that the light reaching the
eye is a mixture of the properties of the transparent layer and of the underlying surface.
Ekroll and Faul have argued for transparency-based interpretations of classical demonstrations
of simultaneous color contrast (Ekroll and Faul 2013). Whilst it is true that the simple displays
typically used to show simultaneous color contrast do not include the multiple surfaces that are
required to parse appropriately the contributions from a transparent layer and from the back-
ground or illumination, ambiguous arrangements may also be perceived in terms of surfaces,
filters, and illuminants. A transparency-based interpretation suggests new laws of simultaneous
contrast that have some empirical support, particularly when temporal von Kries adaptation is
taken into account (Ekroll and Faul 2012). Bosten and Mollon (2012) provide a detailed discus-
sion of different theories of simultaneous contrast.

Configural Effects
Color constancy is often cast as the problem of perceiving stable color appearance of a sur-
face under changes in the illumination of the surface. We might also consider positional
color constancy, which describes the invariance of surface color under changes in pos-
ition (von Helmholtz 1867; Young 1807). Illuminant color constancy requires the chro-
matic context of the surface to be taken into account, since for isolated matte surfaces there is
no way to disentangle illuminant and reflectance. Positional color constancy requires the chro-
matic context to be discounted, since color perception would otherwise be an accident of loca-
tion (Whittle and Challands 1969). Amano and Foster (2004) obtained surface color matches in
Mondrian displays in which they were able to change the simulated illuminant and the position
of the test surface. Accuracy was almost as good for positional and illuminant constancy as for
illuminant constancy alone. A reliable cue in these cases was provided by the ratios of cone excita-
tions between the test surfaces and a spatial average over the whole pattern.
In natural viewing, shadows or multiple light sources mean that it is common for scenes to
include multiple regions of illumination. If a perceptual system is to ‘discount’ the illumination
in such scenes, elements that share the same illumination must be grouped together to allow the
appropriate corrections to be applied. Gilchrist’s anchoring theory of lightness (Gilchrist et  al.
1999) adopts the term ‘framework’ to specify the frame of reference within which the target stim-
ulus belongs (see also Duncker 1929; Koffka 1935; and Herzog and Öğmen 2013, this volume, for
their discussion of the perceived motion of a target within a frame of reference which may itself
be in motion). The principles that promote grouping according to common illumination are dis-
cussed in detail by Gilchrist (this volume).
Schirillo and Shevell (2000) tested the relationship between color appearance of a small test
patch and the spatial organization of surrounding patches. They used a small set of chromatic
stimuli and varied only the spatial arrangement in different conditions of the experiment, whilst
keeping constant the immediate surround of the test patch, the space-average chromaticity of
the whole scene, and the range and ensemble of chromaticities present. Strong color appearance
effects were found with spatial arrangements that allowed the left and right halves of the display to
be interpreted as areas with identical objects under different illuminations. In achromatic cases,

Schirillo and Shevell (2002) showed that arranging grey-level patches to be consistent with sur-
faces covered by a luminance edge (i.e. one with a constant contrast ratio) caused shifts in bright-
ness that were in the direction predicted by a change in a real illuminant. Perceptual judgments
of color that are specific to the illuminant simulated in particular regions of the display can be
maintained even when eye-movements cause images of different regions to be interleaved on the
retina, implying that the regional specificity does not derive from peripheral sensory mechanisms
(Lee and Smithson 2012).
Geometric cues, such as X-junctions formed by the continuation of underlying contours across
the edges of a transparency, are vital for the perception of transparency in static scenes (see Figure
21.4). However, whilst X-junctions can promote perceptual scission, they are not necessarily
beneficial in identifying perceptual correlates of the spectral transmittance of the transparent
region, at least in cases where scission is supported by other cues, such as common motion. With
simulations of transparent overlays moving over a pattern of surface reflectances, rotating the
image region corresponding to the transparency by 180° disrupts X-junctions but does not impair
performance in the task of identifying identical overlays across different illuminant regions and
over different surfaces (Khang and Zaidi 2002). It seems that the identification of spectrally select-
ive transparencies in these conditions is well predicted by a process of color matching that oper-
ates with parameters estimated from the mean values in relevant image regions (Khang and Zaidi
2002; Zaidi 1998).
Geometric configuration is particularly important for the perception of three-dimensional sur-
faces and their interaction with illumination. Bloj, Kersten, and Hurlbert (1999) showed that color
perception is strongly influenced by three-dimensional shape perception. A concave folded card
with trapezoidal sides can be perceived correctly as an inward-pointing corner, or can be mis-
perceived as a ‘roof ’ if viewed through a pseudoscope which reverses the binocular disparities
between the two eyes. Bloj et al. painted the left side of the folded card magenta, and the right
side white. The light reflected from the left side illuminated the right side, generating a strong
chromatic gradient across the white-painted area. Switching viewing mode from ‘corner’ to ‘roof ’
caused large changes in color-appearance matches to the white-painted side, from a desaturated
pink to a more saturated magenta.
Kingdom (2003) has shown that the perception of shape-from-shading is strong when chro-
matic and luminance variations are not aligned or are out of phase, and suppressed when they are
aligned and in-phase (see Figure 21.5). One interpretation is that spatially corresponding changes
of chromaticity and luminance are most likely to originate from changes in surface reflectance.
Harding, Harris, and Bloj (2012), however, have shown that the use of illumination gradients as a
cue to three-dimensional shape can be flexibly learned, leading to the acquisition of assumptions
about lighting and scene parameters that subsequently allow gradients to be used as a reliable
shape cue.

Concluding Remarks
The perceptual attribute of color has its own inherent structure. Colors can be ordered and
grouped according to their perceptual similarities. For lights in a void, color resides in a three-
dimensional space, constrained by the spectral sensitivities of the three, univariant cone mecha-
nisms and conveniently described by the perceptual qualities of hue, saturation, and brightness.
However, once placed in a spatial and temporal context, and related to other lights, the same
spectral distribution of light reaching the retina can change dramatically in appearance.
Additionally, some hues or color directions have a special status, and the relative influences

Fig. 21.5  When chromatic gratings (left-hand column) and luminance gratings (middle column) are
spatially aligned, their combination appears flat (right-hand column, (a) and (c)); but when they are
spatially misaligned, the luminance component readily contributes ‘shape from shading’ (right-hand
column, (b) and (d)).
Data from Frederick A. A. Kingdom, Color brings relief to human vision, Nature Neuroscience 6(6), pp. 641–644,
Figures 2a-4, 3a, and 6a-b, 2003.

of physiological, environmental, and linguistic factors in conferring this status remain fiercely
debated.
Color has a strong organizational influence on scenes. Color can be used to impose spatial
structure, for example when pitted against spatial proximity in conferring rival perceptual organi-
zations or in supporting contour integration. It allows grouping of elements that aid extraction of
depth from random-dot-stereograms, motion from global-motion stimuli, and form from cam-
ouflage. Although color has traditionally been studied in isolation from other perceptual attrib-
utes, and has often been considered as secondary to form perception, there is increasing evidence
that color and form processing interact in subtle and flexible ways.
Color perception is strongly influenced by scene organization, particularly when the spatial
arrangement of surfaces introduces spatio-chromatic signatures that are consistent with the

chromatic transformations imposed by changes in illumination or by spectrally selective filtering.


Many stimulus arrangements are ambiguous in that they could have been produced by multiple
different arrangements of surfaces, filters, and illuminants, and perhaps some of the differences
between the color percepts elicited by simple stimulus arrangements stem from the observers’
relative willingness to adopt different interpretations of the scene.
A large body of work has considered surface color perception for arrays of flat, matte
surfaces. As with all perceptual constancies, when there are more cues to the real-world
arrangement of lights and objects, constancy improves. High levels of performance-based or
operational constancy can be achieved, however, without the need for constancy of appear-
ance across different conditions of observing. More recently, it has become possible to use
computer-rendered images to study the perception of three-dimensional objects formed from
glossy or translucent materials. The interaction of light and the materials from which objects
are made provides a rich source of spatio-chromatic variation. Understanding the constraints
that these interactions impose on the pattern of cone signals across the retina will be impor-
tant in unravelling competing perceptual organizations as they relate to stimuli in the exter-
nal world.
In Gelb’s words, ‘from the very beginning, the functioning of our sensory apparatus depends
upon conditions in such a way that, in accordance with external stimulus constellations and
internal attitudes we find ourselves confronted by a world of “things” . . .’ (Gelb 1938, p. 207).
With our increased understanding of the physiology of color vision, and the sophistication with
which we are now able to manipulate stimuli according to the optical physics of light-mate-
rial interactions, the world of color remains a rich testing-ground for principles of perceptual
organization.

References
Adelson, E. H. (2001). ‘On Seeing Stuff: The Perception of Materials by Humans and Machines’. Human
Vision and Electronic Imaging 6(4299): 1–12.
Amano, K. and D. H. Foster (2004). ‘Colour Constancy under Simultaneous Changes in Surface Position
and Illuminant’. Proceedings of the Royal Society B–Biological Sciences 271(1555): 2319–2326.
Anderson, B. L. (2011). ‘Visual Perception of Materials and Surfaces’. Current Biology 21(24): R978–R983.
Anstis, S., M. Vergeer, and R. Van Lier (2012). ‘Luminance Contours can Gate Afterimage Colors and
“Real” Colors’. Journal of Vision 12(10): 1–13.
Berlin, B. and P. Kay (1969). Basic Color Terms: Their Universality and Evolution. Berkeley: University of
California Press.
Blake, Z., T. Land, and J. Mollon (2008). ‘Relative Latencies of Cone Signals Measured by a Moving Vernier
Task’. Journal of Vision 8(16): 1–11.
Bloj, M. G., D. Kersten, and A. C. Hurlbert (1999). ‘Perception of Three-dimensional Shape Influences
Colour Perception through Mutual Illumination’. Nature 402(6764): 877–879.
Bompas, A., G. Kendall, and P. Sumner (2013). ‘Spotting Fruit versus Picking Fruit as the Selective
Advantage of Human Colour Vision’. i-Perception 4(2): 84–94.
Bompas, A., G. Powell, and P. Sumner (2013). ‘Systematic Biases in Adult Color Perception Persist Despite
Lifelong Information Sufficient to Calibrate Them’. Journal of Vision 13(1): 19, 1–19.
Bosten, J. M., J. D. Robinson, G. Jordan, and J. D. Mollon (2005). ‘Multidimensional Scaling Reveals a
Color Dimension Unique to “Color Deficient” Observers’. Current Biology 15(23): R950–R952.
Bosten, J. M. and J. D. Mollon (2012). ‘Kirschmann’s Fourth Law’. Vision Research 53(1): 40–46.
Boynton, R. M. and J. Gordon (1965). ‘Bezold-Brucke Hue Shift Measured by Color-naming Technique’.
Journal of the Optical Society of America 55(1): 78–86.

Brainard, D. H. and B. A. Wandell (1986). ‘Analysis of the Retinex Theory of Color-vision’. Journal of the
Optical Society of America A: Optics Image Science and Vision 3(10): 1651–1661.
Brainard, D. H., W. A. Brunt, and J. M. Speigle (1997). ‘Color Constancy in the Nearly Natural Image. 1.
Asymmetric Matches’. Journal of the Optical Society of America A: Optics Image Science and Vision
14(9): 2091–2110.
Brainard, D. H. and L. T. Maloney (2011). ‘Surface Color Perception and Equivalent Illumination Models’.
Journal of Vision 11(5): 1, 1–18.
Brown, A. M., D. T. Lindsey, and K. M. Guckes (2011). ‘Color Names, Color Categories, and Color-cued Visual
Search: Sometimes, Color Perception is Not Categorical’. Journal of Vision 11(12): 2, 1–21.
Burns, B. and B. E. Shepp (1988). ‘Dimensional Interactions and the Structure of Psychological Space—the
Representation of Hue, Saturation, and Brightness’. Perception & Psychophysics 43(5): 494–507.
Burns, S. A., A. E. Elsner, J. Pokorny, and V. C. Smith (1984). ‘The Abney Effect—Chromaticity
Coordinates of Unique and Other Constant Hues’. Vision Research 24(5): 479–489.
Cavina-Pratesi, C., R. Kentridge, C. A. Heywood, and A. D. Milner (2010a). ‘Separate Channels for
Processing Form, Texture, and Color: Evidence from fMRI Adaptation and Visual Object Agnosia’.
Cerebral Cortex 20(10): 2319–2332.
Cavina-Pratesi, C., R. Kentridge, C. A. Heywood, and A. D. Milner (2010b). ‘Separate Processing of
Texture and Form in the Ventral Stream: Evidence from fMRI and Visual Agnosia’. Cerebral Cortex
20(2): 433–446.
Craven, B. J. and D. H. Foster (1992). ‘An Operational Approach to Color Constancy’. Vision Research
32(7): 1359–1366.
Croner, L. J. and T. D. Albright (1997). ‘Image Segmentation Enhances Discrimination of Motion in Visual
Noise’. Vision Research 37(11): 1415–1427.
Curcio, C. A., K. A. Allen, K. R. Sloan, C. L. Lerea, J. B. Hurley, et al. (1991). ‘Distribution and
Morphology of Human Cone Photoreceptors Stained with Anti-blue Opsin’. Journal of Comparative
Neurology 312(4): 610–624.
Dacey, D. M. and B. B. Lee (1994). ‘The Blue-on Opponent Pathway in Primate Retina Originates from a
Distinct Bistratified Ganglion-cell Type’. Nature 367(6465): 731–735.
Danilova, M. V. and J. D. Mollon (2012). ‘Foveal Color Perception: Minimal Thresholds at a Boundary
between Perceptual Categories’. Vision Research 62: 162–172.
Daoutis, C. A., A. Franklin, A. Riddett, A. Clifford and I. R. L. Davies (2006). ‘Categorical Effects In
Children’s Colour Search: A Cross-linguistic Comparison’. British Journal of Developmental Psychology
24: 373–400.
Daw, N. W. (1962). ‘Why After-images Are Not Seen in Normal Circumstances’. Nature
196(4860): 1143–1145.
Delahunt, P. B., M. A. Webster, L. Ma, and J. S. Werner (2004). ‘Long-term Renormalization of Chromatic
Mechanisms Following Cataract Surgery’. Visual Neuroscience 21(3): 301–307.
Derrington, A. M., J. Krauskopf, and P. Lennie (1984). ‘Chromatic Mechanisms in Lateral Geniculate
Nucleus of Macaque’. Journal of Physiology (London) 357: 241–265.
de Weert, C. M. M. and N. A. W. H. van Kruysbergen (1997). ‘Assimilation: Central and Peripheral Effects’.
Perception 26: 1217–1224.
Dinkova-Bruun, G., G. E. M. Gasper, M. Huxtable, T. C. B. McLeish, C. Panti, and H. Smithson (2013).
The Dimensions of Colour: Robert Grosseteste’s De colore (Edition, Translation and Interdisciplinary
Analysis). Toronto, Canada: PIMS.
Duncker, D. K. (1929). ‚Uber induzierte Bewegung (Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung)’. Psychologische Forschung 12: 180–259.
D’Zmura, M. and G. Iverson (1993). ‘Color Constancy.1. Basic Theory of 2-Stage Linear Recovery of
Spectral Descriptions for Lights and Surfaces’. Journal of the Optical Society of America A: Optics Image
Science and Vision 10(10): 2148–2163.
460 Smithson

Ekroll, V. and F. Faul (2012). ‘New Laws of Simultaneous Contrast?’ Seeing and Perceiving 25(2): 107–141.
Ekroll, V. and F. Faul (2013). ‘Transparency Perception: The Key to Understanding Simultaneous
Color Contrast’. Journal of the Optical Society of America A: Optics Image Science and Vision
30(3): 342–352.
Elliot, J. (1780). Philosophical Observations on the Senses of Vision and Hearing. London: J. Murry.
Faul, F. and V. Ekroll (2002). ‘Psychophysical Model of Chromatic Perceptual Transparency Based on
Substractive Color Mixture’. Journal of the Optical Society of America A: Optics Image Science and Vision
19(6): 1084–1095.
Fleming, R. W., R. O. Dror, and E. H. Adelson (2003). ‘Real-World Illumination and the Perception of
Surface Reflectance Properties’. Journal of Vision 3(5): 347–368.
Fleming, R. W. and H. H. Bülthoff (2005). ‘Low-level Image Cues in the Perception of Translucent
Materials’. ACM Transactions on Applied Perception 2(3): 346–382.
Fleming, R. W., F. Jakel, and L. T. Maloney (2011). ‘Visual Perception of Thick Transparent Materials’.
Psychological Science 22(6): 812–820.
Fleming, R. W., C. Wiebel, and K. Gegenfurtner (2013). ‘Perceptual Qualities and Material Classes’. Journal
of Vision 13(8):9, 1–20.
Foster, D. H. and S. M. C. Nascimento (1994). ‘Relational Color Constancy from Invariant Cone-Excitation
Ratios’. Proceedings of the Royal Society B-Biological Sciences, 257(1349): 115–121.
Foster, D. H., S. M. C. Nascimento, K. Amano, L. Arend, K. J. Linnell, et al. (2001). ‘Parallel Detection of
Violations of Color Constancy’. Proceedings of the National Academy of Sciences of the United States of
America 98(14): 8151–8156.
Friedman, H. S., H. Zhou and R. von der Heydt (2003). ‘The Coding of Uniform Colour Figures in
Monkey Visual Cortex’. Journal of Physiology (London) 548(2): 593–613.
Fuchs, W. (1923). ‘Experimentelle Untersuchungen über die Änderung von Farben unter dem Einfluss von
Gestalten (Angleichungserscheinungen) [Experimental investigations on the alteration of color under
the influence of Gestalten]’. Zeitschrift für Psychologie 92: 249–325.
Garner, W. R. (1974). The Processing of Information and Structure. Potomac, MD: Erlbaum.
Gegenfurtner, K. R. (2003). ‘Cortical Mechanisms of Colour Vision’. Nature Reviews Neuroscience
4(7): 563–572.
Gelb, A. (1938). ‘Colour Constancy’. In A Source Book of Gestalt Psychology, edited by D. Willis, pp. 196–209.
London: Kegan Paul, Trench, Trubner and Co.
Gheorghiu, E. and F. A. A. Kingdom (2007). ‘Chromatic Tuning of Contour-shape Mechanisms
Revealed through the Shape-frequency and Shape-amplitude After-effects’. Vision Research
47(14): 1935–1949.
Gilbert, A. L., T. Regier, P. Kay, and R. B. Ivry (2006). ‘Whorf Hypothesis is Supported in the Right Visual
Field but not the Left’. Proceedings of the National Academy of Sciences of the United States of America
103(2): 489–494.
Gilchrist, A., C. Kossyfidis, F. Bonato, T. Agostini, J. Cataliotti, et al. (1999). ‘An Anchoring Theory of
Lightness Perception’. Psychological Review 106(4): 795–834.
Goldstein, K. and A. Gelb (1925). ‘Über Farbennamenamnesie’. Psychologische Forschung 6: 127–186.
Gowdy, P. D., C. F. Stromeyer, and R. E. Kronauer (1999). ‘Facilitation between the Luminance and
Red-green Detection Mechanisms: Enhancing Contrast Differences across Edges’. Vision Research
39(24): 4098–4112.
Grassmann, H. (1853). ‘Zur Theorie der Farbenmischung’. Annalen der Physik und Chemie 89: 60–84.
Gregory, R. L. (1977). ‘Vision with Isoluminant Colour Contrast. 1. A Projection Technique and
Observations’. Perception 6(1): 113–119.
Harding, G., J. M. Harris, and M. Bloj (2012). ‘Learning to Use Illumination Gradients as an Unambiguous
Cue to Three Dimensional Shape’. PLoS ONE 7(4): e35950.
Perceptual Organization of Color 461

Ho, Y. X., M. S. Landy, and L. T. Maloney (2008). ‘Conjoint Measurement of Gloss and Surface Texture’.
Psychological Science 19(2): 196–204.
Hong, S. W. and S. K. Shevell (2006). ‘Resolution Of Binocular Rivalry: Perceptual Misbinding of Color’.
Visual Neuroscience 23(3–4): 561–566.
Hurvich, L. M. and D. Jameson (1957). ‘An Opponent-process Theory of Color Vision’. Psychological Review
64(6): 384–404.
Indow, T. and K. Kanazawa (1960). ‘Multidimensional Mapping of Munsell Colors Varying in Hue,
Chroma, and Value’. Journal of Experimental Psychology 59(5): 330–336.
Indow, T. and T. Uchizono (1960). ‘Multidimensional Mapping of Munsell Colors Varying in Hue and
Chroma’. Journal of Experimental Psychology 59(5): 321–329.
Indow, T. (1980). ‘Global Color Metrics and Color-appearance Systems’. Color Research and Application
5(1): 5–12.
Jansch, E. R. (1921). ‘Über den Farbenkontrast und die so genannte Berücksichtigung der farbigen
Beleuchtung’. Zeitsschrift für Sinnesphysiologie 52: 165–180.
Jones, P. D. and D. H. Holding (1975). ‘Extremely Long-term Persistence of the McCollough Effect’. Journal
of Experimental Psychology—Human Perception and Performance 1(4): 323–327.
Joost, U., B. B. Lee, and Q. Zaidi (2002). ‘Lichtenberg’s letter to Goethe on “Farbige Schatten”—
Commentary’. Color Research and Application 27(4): 300–301.
Jordan, J. R., W. S. Geisler, and A. C. Bovik (1990). ‘Color as a Source of Information in the Stereo
Correspondence Process’. Vision Research 30(12): 1955–1970.
Jordan, G. and J. D. Mollon (1997). ‘Unique Hues in Heterozygotes for Protan and Deutan Deficiencies’.
Colour Vision Deficiencies XIII 59: 67–76.
Jordan, G., S. S. Deeb, J. M. Bosten, and J. D. Mollon (2010). ‘The dimensionality of color vision in carriers
of anomalous trichromacy’. Journal of Vision 10(8):12, 1–19.
Katz, D. (1911). The World of Colour, trans. R. B. MacLeod, C. W. Fox. London: Kegan Paul, Trench,
Trubner and Co.
Kay, P. and C. K. McDaniel (1978). ‘Linguistic Significance of Meanings of Basic Color Terms’. Language
54(3): 610–646.
Kay, P. and W. Kempton (1984). ‘What Is the Sapir-Whorf Hypothesis’. American Anthropologist 86(1): 65–79.
Kay, P. and B. Berlin (1997). ‘Science not Equal Imperialism: There Are Nontrivial Constraints on Color
Naming’. Behavioral and Brain Sciences 20(2): 196–201.
Khang, B. G. and Q. Zaidi (2002). ‘Cues and Strategies for Color Constancy: Perceptual Scission, Image
Junctions and Transformational Color Matching’. Vision Research 42(2): 211–226.
King, D. L. (1988). ‘Assimilation Is Due to One Perceived Whole and Contrast Is Due to Two Perceived
Wholes’. New Ideas in Psychology 6(3): 277–288.
King, D. L. (2001). ‘Grouping and Assimilation in Perception, Memory, and Conditioning’. Review of
General Psychology 5(1): 23–43.
Kingdom, F. A. A. (2003). ‘Color Brings Relief to Human Vision’. Nature Neuroscience 6(6): 641–644.
Koenderink, J. (2010). Color for the Sciences. Cambridge, MA: MIT Press.
Koffka, K. (1931). ‘Some Remarks on the Theory of Colour Constancy’. Psychologische Forschung
16: 329–345.
Koffka, K. and M. R. Harrower (1931). ‘Colour and Organization II’. Psychologische Forschung 15: 193–275.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace, and World.
Koffka, K. (1936). ‘On Problems of Colour-perception’. Acta Psychologica, 1, 129–134.
Krauskopf, J., D. R. Williams, and D. W. Heeley (1982). ‘Cardinal Directions of Color Space’. Vision
Research 22(9): 1123–1131.
Krauskopf, J. and B. Farell (1990). ‘Influence of Color on the Perception of Coherent Motion’. Nature
348(6299): 328–331.
462 Smithson

Land, E. H. and J. J. McCann (1971). ‘Lightness and Retinex Theory’. Journal of the Optical Society of
America 61(1): 1–11.
Land, E. H. (1986). ‘Recent Advances in Retinex Theory’. Vision Research 26(1): 7–21.
Lee, B. B., R. M. Shapley, M. J. Hawken, and H. Sun (2012). ‘Spatial Distributions of Cone Inputs to Cells
of the Parvocellular Pathway Investigated with Cone-isolating Gratings’. Journal of the Optical Society of
America A: Optics Image Science and Vision 29(2): A223–A232.
Lee, R. J., J. D. Mollon, Q. Zaidi, and H. E. Smithson (2009). ‘Latency Characteristics of the
Short-wavelength-sensitive Cones and their Associated Pathways’. Journal of Vision 9(12): 5, 1–17.
Lee, R. J. and H. E. Smithson (2012). ‘Context-dependent Judgments of Color that Might Allow Color
Constancy in Scenes with Multiple Regions of Illumination’. Journal of the Optical Society of America
A: Optics Image Science and Vision 29(2): A247–A257.
Li, H. C. O. and F. A. A. Kingdom (2001). ‘Segregation by Color/Luminance Does Not Necessarily
Facilitate Motion Discrimination in the Presence of Motion Distractors’. Perception & Psychophysics
63(4): 660–675.
Liebmann, S. (1927). ‘Über das Verhalten farbiger Formen bei Helligkeitsgleichheit von Figur und Grund’.
Psychologische Forschung 9: 300–353.
Linnell, K. J., and Foster, D. H. (1996). ‘Dependence of Relational Colour Constancy on the Extraction of a
Transient Signal’. Perception 25(2): 221–228.
McCollough, C. (1965). ‘Color Adaptation of Edge-detectors in the Human Visual System’. Science
149(3688): 1115–1116.
McIlhagga, W. H. and K. T. Mullen (1996). ‘Contour Integration with Colour and Luminance Contrast’.
Vision Research 36(9): 1265–1279.
McKeefry, D. J., E. G. Laviers, and P. V. McGraw (2006). ‘The Segregation and Integration of Colour in
Motion Processing Revealed by Motion After-effects’. Proceedings of the Royal Society B—Biological
Sciences 273(1582): 91–99.
MacLeod, D. I. A. (2003). ‘New Dimensions in Color Perception’. Trends in Cognitive Sciences 7(3): 97–99.
Maloney, L. T. and B. A. Wandell (1986). ‘Color Constancy—a Method for Recovering Surface Spectral
Reflectance’. Journal of the Optical Society of America A: Optics Image Science and Vision 3(1): 29–33.
Martin, P. R., E. M. Blessing, P. Buzas, B. A. Szmajda, and J. D. Forte (2011). ‘Transmission of Colour
and Acuity Signals by Parvocellular Cells in Marmoset Monkeys’. Journal of Physiology (London)
589(11): 2795–2812.
Mollon, J. D. and P. G. Polden (1975). ‘Colour Illusion and Evidence for Interaction between Colour
Mechanisms’. Nature 258: 421–422.
Mollon, J. D. (2003). ‘The Origins of Modern Color Science’. In Color Science, edited by S. Shevell.
Washington: Optical Society of America.
Mollon, J. D. (2006). ‘Monge—The Verriest Lecture, Lyon, July 2005’. Visual Neuroscience 23(3–4):
297–309.
Mollon, J. D. (2009). ‘A Neural Basis for Unique Hues?’ Current Biology 19(11): R441–R442.
Morgan, M. J., A. Adam, and J. D. Mollon (1992). ‘Dichromates Detect Color-camouflaged Objects
that Are Not Detected by Trichromates’. Proceedings of the Royal Society B—Biological Sciences
248(1323): 291–295.
Musatti, C. (1931). ‘Forma e assimilazione’ [Form and assimilation]. Archivo Italiano di Psicologica
9: 213–269.
Nathans, J., D. Thomas, and D. S. Hogness (1986). ‘Molecular Genetics of Human Color Vision—the
Genes Encoding Blue, Green, and Red Pigments’. Science 232(4747): 193–202.
Olkkonen, M. and D. H. Brainard (2010). ‘Perceived Glossiness and Lightness under Real-world
Illumination’. Journal of Vision 10(9): 5, 1–19.
Palmer, G. (1777). Theory of Colours and Vision. London: S. Leacroft.
Perceptual Organization of Color 463

Parraga, C. A., T. Troscianko, and D. J. Tolhurst (2002). ‘Spatiochromatic Properties of Natural Images and
Human Vision’. Current Biology 12(6): 483–487.
Pinna, B., G. Brelstaff, and L. Spillmann (2001). ‘Surface Color from Boundaries: A New “Watercolor”
Illusion’. Vision Research 41(20): 2669–2676.
Pokorny, J. and V. C. Smith (1970). ‘Wavelength Discrimination in the Presence of Added Chromatic
Fields’. Journal of the Optical Society of America 60(4): 562–569.
Polden, P. G. and J. D. Mollon (1980). ‘Reversed Effect of Adapting Stimuli on Visual Sensitivity’.
Proceedings of the Royal Society B—Biological Sciences 210(1179): 235–272.
Powell, G., A. Bompas, and P. Sumner (2012). ‘Making the Incredible Credible: Afterimages Are
Modulated by Contextual Edges More than Real Stimuli’. Journal of Vision 12(10): 17, 1–13.
Regan, B. C. and J. D. Mollon (1997). ‘The Relative Salience of the Cardinal Axes of Colour Space in
Normal and Anomalous Trichromats’. Colour Vision Deficiencies XIII 59: 261–270.
Regan, B. C., C. Julliot, B. Simmen, F. Vienot, P. Charles-Dominique, et al. (2001). ‘Fruits, Foliage and
the Evolution of Primate Colour Vision’. Philosophical Transactions of the Royal Society B—Biological
Sciences 356(1407): 229–283.
Ripamonti, C. and S. Westland (2003). ‘Prediction of Transparency Perception Based on Cone-excitation
Ratios’. Journal of the Optical Society of America A: Optics Image Science and Vision 20(9): 1673–1680.
Roberson, D. and J. R. Hanley (2007). ‘Color Vision: Color Categories Vary With Language After All’.
Current Biology 17(15): R605–R607.
Rushton, W. A. H. (1972). ‘Pigments and Signals in Color Vision’. Journal of Physiology (London)
220(3): 1–31P.
Rutherford, M. D. and D. H. Brainard (2002). ‘Lightness Constancy: A Direct Test of the
Illumination-estimation Hypothesis’. Psychological Science 13(2): 142–149.
Saunders, B. and J. van Brakel (1997). ‘Are There Nontrivial Constraints on Colour Categorization?’
Behavioral and Brain Sciences 20(2): 167–228.
Saunders, B. (2000). ‘Revisiting Basic Color Terms’. Journal of the Royal Anthropological Institute
6(1): 81–99.
Schirillo, J. A. and S. K. Shevell (2000). ‘Role of Perceptual Organization in Chromatic Induction’. Journal
of the Optical Society of America A—Optics Image Science and Vision 17(2): 244–254.
Schirillo, J. A. and S. K. Shevell (2002). ‘Articulation: Brightness, Apparent Illumination, and Contrast
Ratios’. Perception 31(2): 161–169.
Shapiro, A., W. Kistler, and A. Rose-Henig (2012). Color Wagon-Wheel (3rd place, Best Illusion of the
Year). http://illusionoftheyear.com/2012/color-wagon-wheel/.
Shepard, R. N. (1964). ‘Attention and the Metric Structure of the Stimulus Space’. Journal of Mathematical
Psychology 1(1): 54–87.
Shepard, R. N. (1991). ‘The Perceptual Organization of Colors: An Adaptation to Regularities of the
Terrestrial World?’ In J. Barkow, L. Cosmides, and J. Tooby (eds.), The Adapted Mind: Evolutionary
Psychology and the Generation of Culture. Oxford: Oxford University Press.
Shevell, S. K., R. St Clair, and S. W. Hong (2008). ‘Misbinding of Color to Form in Afterimages’. Visual
Neuroscience 25(3): 355–360.
Singer, B. and M. D’Zmura (1994). ‘Color Contrast Induction’. Vision Research 34(23): 3111–3126.
Smithson, H. E. and J. D. Mollon (2004). ‘Is the S-Opponent Chromatic Sub-System Sluggish?’ Vision
Research 44(25): 2919–2929.
Smithson, H. E. (2005). ‘Sensory, Computational and Cognitive Components of Human Colour Constancy’.
Philosophical Transactions of the Royal Society B—Biological Sciences 360(1458): 1329–1346.
Smithson, H. E., G. Dinkova-Bruun, G. E. M. Gasper, M. Huxtable, T. C. B. McLeish, et al. (2012).
‘A Three-dimensional Color Space from the 13th Century’. Journal of the Optical Society of America
A: Optics Image Science and Vision 29(2): A346–A352.
464 Smithson

Solomon, S. G. and P. Lennie (2005). ‘Chromatic Gain Controls in Visual Cortical Neurons’. Journal of
Neuroscience 25(19): 4779–4792.
Solomon, S. G., J. W. Peirce, and P. Lennie (2004). ‘The Impact of Suppressive Surrounds on Chromatic
Properties of Cortical Neurons’. Journal of Neuroscience 24(1): 148–160.
Stiles, W. S. (1949). ‘Increment Thresholds and the Mechanisms of Colour Vision’. Documenta
Ophthalmologica 3(1): 138–165.
Stockman, A. and D. H. Brainard (2009). ‘Color Vision Mechanisms’. In Vision and Vision Optics: The
Optical Society of America Handbook of Optics (3rd edn, Vol. 3), edited by Bass M., C. DeCusatis,
J. Enoch, V. Lakshminarayanan, G. Li, C. Macdonald, et al. New York: McGraw Hill.
Stoughton, C. M. and B. R. Conway (2008). ‘Neural Basis for Unique Hues’. Current Biology
18(16): R698–R699.
Sumner, P. and J. D. Mollon (2000a). ‘Catarrhine Photopigments are Optimized for Detecting Targets
against a Foliage Background’. Journal of Experimental Biology 203(13): 1963–1986.
Sumner, P. and J. D. Mollon (2000b). ‘Chromaticity as a Signal of Ripeness in Fruits Taken by Primates’.
Journal of Experimental Biology 203(13): 1987–2000.
Sumner, P., T. Adamjee, and J. D. Mollon (2002). ‘Signals Invisible to the Collicular and Magnocellular
Pathways can Capture Visual Attention’. Current Biology 12(15): 1312–1316.
Sumner, P., E. J. Anderson, R. Sylvester, J. D. Haynes, and G. Rees (2008). ‘Combined Orientation
and Colour Information in Human V1 for both L-M and S-cone Chromatic Axes’. Neuroimage
39(2): 814–824.
Tansley, B. W. and R. M. Boynton (1976). ‘A Line, Not a Space, Represents Visual Distinctness of Borders
Formed by Different Colors’. Science 191(4230): 954–957.
Tokunaga, R. and A. D. Logvinenko (2010). ‘Material and Lighting Dimensions of Object Colour’. Vision
Research 50(17): 1740–1747.
Troscianko, T., R. Montagnon, J. Leclerc, E. Malbert, and P. L. Chanteau (1991). ‘The Role of Color as a
Monocular Depth Cue’. Vision Research 31(11): 1923–1929.
von Helmholtz, H. (1867). Handbuch der physiologischen Optik (1st edn, Vol. 2). Leipzig: Leopold Voss.
Translation of 3rd edn, Helmholtz’s Treatise on Physiological Optics, 1909, edited by J. P. C. Southall, pp.
286–287. Washington, DC: Optical Society of America, 1924.
von Kries, J. (1878). ‘Beitrag zur Physiologie der Gesichtsempfindungen’ [ Physiology of Visual
Sensations]. In Sources of Color Science, ed. D. L. MacAdam, pp. 101–108. Cambridge, MA: MIT Press.
Vul, E., E. Krizay, and D. I. A. MacLeod (2008). ‘The McCollough Effect Reflects Permanent and Transient
Adaptation in Early Visual Cortex’. Journal of Vision 8(12):4, 1–12.
Webster, M. A., K. K. Devalois, and E. Switkes (1990). ‘Orientation and Spatial-Frequency Discrimination
for Luminance and Chromatic Gratings’. Journal of the Optical Society of America A: Optics Image
Science and Vision 7(6): 1034–1049.
Webster, M. A., K. Halen, A. J. Meyers, P. Winkler, and J. S. Werner (2010). ‘Colour Appearance
and Compensation in the Near Periphery’. Proceedings of the Royal Society B: Biological Sciences
277(1689): 1817–1825.
Werner, J. S. and B. E. Schefrin (1993). ‘Loci of Achromatic Points throughout the Life Span’. Journal of the
Optical Society of America A: Optics Image Science and Vision 10(7): 1509–1516.
Westland, S. and C. Ripamonti (2000). ‘Invariant Cone-Excitation Ratios May Predict Transparency’.
Journal of the Optical Society of America A: Optics Image Science and Vision 17(2): 255–264.
Whittle, P. and P. D. C. Challands (1969). ‘Effect of Background Luminance on Brightness of Flashes’.
Vision Research 9(9): 1095–1110.
Williams, D. R. and D. I. A. MacLeod (1979). ‘Interchangeable Backgrounds for Cone Afterimages’. Vision
Research 19(8): 867–877.
Perceptual Organization of Color 465

Winawer, J., N. Witthoft, M. C. Frank, L. Wu, A. R. Wade, et al. (2007). ‘Russian Blues Reveal Effects of
Language on Color Discrimination’. Proceedings of the National Academy of Sciences of the United States
of America 104(19): 7780–7785.
Witzel, C. and K. R. Gegenfurtner (2011). ‘Is There a Lateralized Category Effect for Color?’ Journal of
Vision 11(12):16, 1–25.
Wuerger, S. M., L. T. Maloney, and J. Krauskopf (1995). ‘Proximity Judgments in Color Space—Tests of a
Euclidean Color Geometry’. Vision Research 35(6): 827–835.
Wyszecki, G. and W. S. Stiles (1982). Color Science: Concepts and methods. Quantitative data and Formulae.
New York: Wiley.
Xian, S. X. (2004). ‘Perceptual Grouping in Colour Perception’. PhD, University of Chicago, Illinois.
Xian, S. X. and S. K. Shevell (2004). ‘Changes in Color Appearance Caused by Perceptual Grouping’. Visual
Neuroscience 21(3): 383–388.
Young, T. (1802). ‘The Bakerian Lecture. On the Theory of Light and Colours’. Philosophical Transactions of
the Royal Society of London 92: 12–48.
Young, T. (1807). A Course of Lectures on Natural Philosophy and the Mechanical Arts (Vol. I, lecture
XXXVIII). London: Joseph Johnson.
Zaidi, Q. (1998). ‘Identification of Illuminant and Object Colors: Heuristic-Based Algorithms’. Journal of the
Optical Society of America A: Optics Image Science and Vision 15(7): 1767–1776.
Zaidi, Q. and A. Li (2006). ‘Three-Dimensional Shape Perception from Chromatic Orientation Flows’.
Visual Neuroscience 23(3–4): 323–330.
Zaidi, Q., R. Ennis, D. C. Cao, and B. Lee (2012). ‘Neural Locus of Color Afterimages’. Current Biology
22(3): 220–224.
Zhou, K., L. Mo, P. Kay, V. P. Y. Kwok, T. N. M. Ip, et al. (2010). ‘Newly Trained Lexical Categories Produce
Lateralized Categorical Perception of Color’. Proceedings of the National Academy of Sciences of the
United States of America 107(22): 9974–9978.
Chapter 22

The perceptual representation of transparency, lightness, and gloss

Barton L. Anderson

1  Theoretical preliminaries
The adaptive role of vision is to provide information about the behaviorally relevant properties
of our visual environment. Our evolutionary success relies on recovering sufficient information
about the world to fulfill our biological and reproductive needs while avoiding environmental
dangers. The attempt to understand vision as a collection of adaptations to specific computational
problems has shaped a growing body of research that treats vision as a decomposable collection of
‘recovery’ problems. In this view, perceptual outputs are understood as approximately ideal solu-
tions to specific recovery problems, which have been dubbed the ‘natural tasks’ of vision (Geisler
and Ringach 2009). From this perspective, the science of understanding visual processing pro-
ceeds by identifying an organism’s natural tasks, evaluating the information available to perform
each task, developing models of how to perform a task optimally, and discovering the mechanisms
that implement these solutions.
The first aspect of this method of approach—the identification of ‘natural tasks’—is arguably
the most important because it defines the problem that needs to be solved. It is also the least con-
strained. Any environmental property can be hypothesized to be something that could have adap-
tive value and therefore something that might provide a selective advantage to anyone equipped to
recover it. Presumably, however, only some aspects of our environment were involved in directly
shaping the evolution of our senses. The scientific challenge is to differentiate properties that
actually exerted selective pressure in shaping the design of our senses from those that merely
came along for the ‘evolutionary ride’ (perceptual ‘spandrels’). But there is currently no principled
means of making such distinctions. For example, a general argument could be (and has been)
made that the computation of surface lightness would be useful because it provides information
about an intrinsic property of the external world, but it is much harder to fashion a clear argument
about how the recovery of surface albedo provides a specific adaptive benefit, or that any such
benefit played a role in natural selection.
The second aspect of the adaptationist approach—identifying the information available for a
computation—is in principle more constrained. Natural scenes are replete with information that
could be used to sense a particular world property. Once a recovery problem has been identified,
it is possible to inventory the sources of information that exist in the natural world that can be
used to sense it. However, most recovery problems in vision (such as shape, depth, color, lightness,
etc.) are considered in isolation, often in informationally impoverished laboratory settings. This
approach has led to the nearly universal acceptance of a belief in the poverty of the stimulus: the
presumption that the images do not contain sufficient information to recover the aspects of the
world that we experience. This view is typically defended by demonstrating that it is impossible
to derive a unique solution for a specific recovery problem based on the information available in
the images. Perception is construed as the outputs of a collection of under-constrained problems
of probabilistic inference, which are solved with the aid of additional information, assumptions,
or constraints. So construed, it is natural to turn to probability theory for guidance on how to
solve such inference problems ideally, which typically entails the application of Bayes’ theorem
(see Feldman’s chapter, this volume).
The third aspect of the adaptationist program is ostensibly the easiest, and is where theory
meets data. Percepts or perceptual performance of observers is compared to that of the Bayesian
ideal, constructed on a set of priors and likelihoods. When data and the Bayesian ideal are deemed
sufficiently similar, the explanatory circle is considered closed: the fit between model and data is
upheld as evidential support for the specification of the natural tasks, the selection of priors and
likelihoods needed to perform the inference, and the claim that perception instantiates a form
of Bayesian inference. All that remains is the discovery of the mechanisms that instantiate such
computations.
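The logic of comparing observers to a Bayesian ideal can be made concrete with a toy example (not one from this chapter; all numbers and names here are arbitrary assumptions for the sketch). A single luminance value l = r × i confounds surface reflectance r and illuminant intensity i, so r is under-constrained by the image alone; a prior over illuminants yields a well-defined posterior over r:

```python
import numpy as np

# Toy 'recovery problem': a single image luminance l = r * i confounds
# surface reflectance r and illuminant intensity i.
l_obs = 0.3                       # observed luminance
r = np.linspace(0.05, 1.0, 200)   # candidate reflectances
i = np.linspace(0.05, 1.0, 200)   # candidate illuminants
R, I = np.meshgrid(r, i)

# Prior: bright illuminants are more probable (arbitrary choice).
prior = np.exp(-((I - 0.8) ** 2) / (2 * 0.1 ** 2))
# Likelihood: Gaussian rendering noise around the generative model r * i.
likelihood = np.exp(-((R * I - l_obs) ** 2) / (2 * 0.01 ** 2))

posterior = prior * likelihood
posterior /= posterior.sum()               # normalize (Bayes' theorem)
post_r = posterior.sum(axis=0)             # marginalize out the illuminant
r_map = r[np.argmax(post_r)]               # 'ideal observer' estimate of r
```

With these numbers the posterior mode lands near l_obs / 0.8, because the prior concentrates illuminants around 0.8. An observer's lightness matches would then be compared against this estimate; the critique above is that the fit of such a model depends entirely on the chosen priors and likelihoods.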
The preceding describes what may currently be considered one (if not the) dominant view on
how to approach the study and modeling of visual processes. My own view departs in a num-
ber of significant ways from this approach, which shapes both my selection of problems and the
theoretical approach taken to account for data. One of the main goals of this chapter is to provide
an overview of how my approach has shaped work in three areas of surface and material percep-
tion: transparency, lightness, and gloss. The gist of my approach may be articulated as follows.
First, I assume that the attempt to identify the ‘natural tasks’ of vision—i.e., the computational
‘problems’ that visual systems putatively evolved to solve—is at best a guessing game, and at worst
a theoretical fiction. Some of the ‘problems’ our visual systems seem to solve may be epiphenom-
enal outputs, not explicit adaptations. Second, the claim that vision is an ill-posed inference prob-
lem is a logical consequence of treating vision as a collection of recovery problems, for which it
can be shown that there is no closed form solution that can be derived from the information that
is currently available. But if the putative ‘recovery problem’ is misidentified, or the ‘information
available for solving it’ is artificially restricted (such as typically occurs in laboratory environ-
ments), then it may not be vision that is ill-posed, but our particular understanding of visual
processing that is misconstrued.
An alternative approach is to begin with what we visually experience about the world, and attempt
to determine what image properties modulate these experiences. The question is not whether there
is sufficient information in the images to specify the true states of the world, but rather, whether
there is sufficient information to explain what we experience about the world. This approach is
neutral as to the ‘computational goals’ of the visual system, or even whether the idea of a computational goal has any real meaning for biological systems. Whereas the recovery of a world property
can be shown to be under-constrained by argument, the question whether there is sufficient infor-
mation available to explain what we experience about the world is an empirical question.

2  Disentangling images into causal sources


We experience the world as a collection of 3D objects, surfaces, and materials that possess a vari-
ety of different phenomenological qualities. The reflectance and transmittance properties of a
material, together with its 3D geometry, structure light in ways that modulate our experience of
shape, lightness, color, gloss, texture, and translucency. Some image structure also arises from the
idiosyncratic distribution of light sources in a scene—the illumination field. To a first approximation, these surface and material properties tend to be experienced as separate sources of image
structure, despite the fact that they are conflated in the image. Much research into perceptual
organization has focused on how the visual system fills in missing information or groups image
fragments into a global structure or pattern. While such phenomena are an extremely important
aspect of our visual experience, one of the other fundamental organizational problems involves
understanding how the visual system disentangles different sources of image structure into the
distinct surface and material qualities that we experience. In what follows, I consider a variety of
segmentation problems in the perception of surface and material attributes, and the insights that
such problems shed on the broader theoretical issues raised above.

2.1  Transparency
One of the perceptually most explicit and theoretically challenging forms of image segmenta-
tion occurs in the perception of transparency. Historically, the study of transparency focused on
achromatic surfaces, which was largely due to the seminal influence of Metelli’s model of transparency (Metelli 1970, 1974a, 1974b, 1985; see also Gerbino’s chapter, this volume). The perception
of (achromatic) transparent surfaces generates two distinct impressions: its perceived lightness
and its perceived opacity or ‘hiding power’. Metelli’s model was based on a simple physical device
known as an episcotister: a rapidly rotating disc with a missing sector. The proportion of the disc
that is ‘missing’ determines the amount of light transmitted from the underlying surfaces through
the episcotister blades, which is the physical correlate of a transparent surface’s transmittance.
The lightness (or albedo) of the transparent surface corresponded to the color of the paint used
on the front surface of the episcotister, which determines the color of the transparent layer (or for
achromatic paints, its lightness). Metelli’s model was restricted to ‘balanced’ transparency, which
referred to conditions where the episcotister had a uniform reflectance and transmittance, reduc-
ing each to a single scalar (number). For the simple bipartite fields Metelli used as backgrounds,
this allowed equations for the total reflected light in the regions of overlay to be written as a sum of
two components: a multiplicative transmittance term, which determined the weight for the contribution of the underlying surface; and an additive term, which corresponds to the light reflected
by the episcotister surface. By construction, Metelli considered displays containing two uniformly
colored background regions, which gave him a system of two equations and two unknowns that
could be solved in closed form. A significant body of work showed that the perception of trans-
parency is often well predicted by Metelli’s episcotister model: balanced transparency is perceived
when displays are consistent with the episcotister equations, but generally not otherwise. Note
that Metelli’s model served double duty as both a physical model of transparency and a psychologi-
cal model of the conditions that elicit percepts of transparency.
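The system of two equations and two unknowns just described can be written out and solved in a few lines. The symbol names below (alpha for transmittance, t for the episcotister reflectance, a and b for the background reflectances, p and q for the regions of overlay) are notational conventions adopted here for illustration rather than Metelli's own:

```python
def solve_metelli(a, b, p, q):
    """Solve Metelli's balanced-transparency equations.

    Model:  p = alpha * a + (1 - alpha) * t
            q = alpha * b + (1 - alpha) * t
    where a, b are the two background reflectances, p, q the
    reflectances in the regions of overlay, alpha the transmittance
    (multiplicative term), and t the episcotister reflectance
    (additive term).
    """
    alpha = (p - q) / (a - b)          # subtract the two equations
    t = (p - alpha * a) / (1 - alpha)  # back-substitute
    return alpha, t

# A layer with alpha = 0.5, t = 0.2 over backgrounds 0.9 and 0.3 yields
# p = 0.5*0.9 + 0.5*0.2 = 0.55 and q = 0.5*0.3 + 0.5*0.2 = 0.25, so
# solving should recover alpha = 0.5 and t = 0.2.
alpha, t = solve_metelli(0.9, 0.3, 0.55, 0.25)
```

The closed-form solution exists only because balanced transparency reduces transmittance and reflectance to single scalars; unbalanced (spatially varying) transparency does not admit this simple inversion.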
Despite these successes, Metelli himself noted a curious discrepancy between the predictions of
the episcotister model and perception: a light episcotister looks less transmissive than a dark episcotister (Metelli 1974a). From a ‘recovery’ point of view, this constitutes a perceptual error, and
hence non-ideal performance, but almost no experimental work was conducted to understand
this deviation from the predictions of Metelli’s model. We therefore performed a series of experi-
ments to test whether the physical independence of opacity and lightness is observed psychophys-
ically (Singh and Anderson 2002). Observers matched the transmittance of simulated surfaces
that varied in lightness, and the lightness of transparent filters that varied in transmittance. We
found that lightness judgments were modulated by simulated transmittance, and transmittance
judgments were modulated by simulated variations in lightness. Thus, although the transmittance
and reflectance of transparent layers are physically independent parameters in Metelli’s model,
they are not experienced as being independent perceptually.
The perceptual representation of transparency, lightness, and gloss 469

What theoretical conclusions can be drawn from these results? Metelli’s model treated a physical
model of transparency as a perceptual model of transparency. Our findings of mutual ‘contamination’ of the transmittance and lightness of the transparent filter imply one of two possibilities: (1)
there is no simple correspondence between the dimensions of a physical model and a perceptual
model, or (2) that Metelli’s model is the wrong physical model on which to base theories of per-
ceived transparency. With respect to (1), Metelli’s model equates the perceived opacity of an epis-
cotister with its physical transmittance, and hence cannot explain why light episcotisters look more
opaque than dark episcotisters. The dependence of perceived opacity on lightness can be readily
understood, however, if the visual system relied on image contrast to assess the hiding power of
transparent surfaces. A light episcotister reduces the contrast of underlying surface structure more
than an otherwise identical dark episcotister, and hence, should appear more opaque if the visual
system uses image contrast to assess perceived opacity.1 Indeed, it seems almost inevitable that the
visual system utilizes contrast to judge the perceived opacity of transparent filters, since contrast
determines the visibility of image structure in general. But this implies that the visual system is
using the ‘wrong’ image properties to generate our experience of a world property, and hence will
almost always result in the ‘wrong’ answer. From the perspective of explaining our experience, such
issues are largely irrelevant; the only issue is whether there is sufficient information in the image to
explain what it is we experience about the world, not whether such percepts are veridical.
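The contrast account can be made concrete with a small numerical sketch. Michelson contrast is used below purely for convenience (see footnote 1 on the choice of contrast definition), and the reflectance values are arbitrary illustrations:

```python
def michelson(lum_max, lum_min):
    """Michelson contrast of a two-luminance pattern."""
    return (lum_max - lum_min) / (lum_max + lum_min)

def through_filter(bg, alpha, t):
    """Value of background bg seen through a Metelli-style layer
    with transmittance alpha and layer reflectance t."""
    return alpha * bg + (1 - alpha) * t

a, b = 0.9, 0.1              # light and dark background regions
plain = michelson(a, b)      # contrast in plain view (0.8)

# Identical transmittance (0.5), different layer lightness:
dark_layer = michelson(through_filter(a, 0.5, 0.1),
                       through_filter(b, 0.5, 0.1))
light_layer = michelson(through_filter(a, 0.5, 0.9),
                        through_filter(b, 0.5, 0.9))
# light_layer < dark_layer < plain: the light layer reduces the
# contrast of the underlying pattern more, and so should look more
# opaque if opacity is assessed from contrast.
```

Although both layers transmit the same proportion of the background, the additive term of the light layer raises the mean of the overlaid region, which divisively reduces contrast more than the dark layer does.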
Alternatively, it could be (and has been) argued that the discrepancy between perception and
Metelli’s model merely provides evidence that there is something wrong with Metelli’s model, and
does not impact on the more general claim that perception can be identified with the recovery
of some physical model. Faul and Ekroll (2011) have made precisely this argument. They con-
tend that a subtractive filter model better captures the perception of chromatic transparency, and
hence may be a more appropriate model of achromatic transparency as well. Although there is
currently insufficient data to determine which of these alternatives is ultimately correct for achro-
matic stimuli, Faul and Ekroll reported substantial discrepancies between their filter model and
perceived transparency when the chromatic content of the illuminant was varied, despite demon-
strating that there was theoretically sufficient information for a much better level of performance
(Faul and Ekroll 2012). At this juncture, there is currently no physical model that maps directly
onto our experience of transparent surfaces, and it is largely a matter of scientific faith that such a
model may ultimately be discovered.

2.2  Lightness
The perception of lightness also has been treated as a kind of segmentation problem. For ach-
romatic surfaces, the term lightness (or albedo) refers to a surface’s diffuse reflectance. The light
returned to the eye is a conflated mixture of the illuminant, surface reflectance, and 3D pose.
There is currently extensive debate over the computations, mechanisms, and/or assumptions that
are responsible for generating our experience of lightness (see Gilchrist’s chapter, this volume).
There are four general theoretical approaches to the problem of lightness:  scission (or layers

1  This reduction in contrast occurs for almost any definition of contrast that includes a divisive normalization term that is a function of integrated or mean luminance in the region over which contrast is defined.
Unfortunately, there is currently no general definition of contrast that adequately captures perceived contrast
in arbitrary images, so the precise way in which contrast is reduced depends on the definition of contrast used
in a particular context.

models), equivalent illuminant models, anchoring models, and filter or filling-in models. I con-
sider each model class in turn.

2.2.1  Models and theories of lightness


Scission models
Scission models assert that the visual system derives lightness by explicitly segmenting the illu-
minant from surface reflectance in a manner analogous to the decomposition that occurs in con-
ditions of transparency. Such models have been dubbed layers, scission, or intrinsic image models
(Adelson 1999; Anderson 1997; Anderson and Winawer, 2005, 2008; Barrow et al. 1978; Gilchrist
1979). In models of lightness, scission models assert that the visual system teases apart the contri-
butions of reflectance, the illuminant, and 3D pose. Although some authors associate scission (or
intrinsic image) models with veridical perception (Gilchrist et al. 1999), there is nothing inherent
in scission models that mandates this association. The concept of scission entails a claim about a
particular representational format or process of image decomposition that is presumed to under-
lie our experience of lightness. The hypothesized segmentation processes responsible for gener-
ating the putative layered representation may or may not result in veridical lightness percepts
depending on how (and how well) the visual system performs the hypothesized decomposition.
Equivalent illumination
One model that is conceptually related to layers models is the equivalent illumination model
(EIM) developed by Brainard and Maloney (2011). As with layers models, the EIM assumes that
the visual system recovers surface reflectance by factoring the image into two components: an
estimate of the illuminant (which they term an ‘equivalent illuminant’) and surface reflectance.
Whereas layers models have assumed that there is an explicit representation of both the illumin-
ant and surface reflectance, the same is not necessarily true for the EIM. The EIM is a two-stage
model which asserts that the visual system begins by generating an estimate of the illuminant,
and uses this information in a second stage to derive surface reflectance properties from the
image data. This model remains mute as to how the visual system estimates the parameters of the
estimated illuminant from images and also remains uncommitted as to any representational
format the EI may take. The main experimentally assessable claim is that the parametric structure of color or lightness matches can be described by some EIM. The approach of
the EIM can be understood as follows: Given a set of reflectance matches, is it possible to find a
model of the illuminant that is consistent with the matches? Note that there is no presumption
that the particular EIM that putatively shapes observers’ matches is veridical; the only claim is
that observers’ lightness matches are shaped by some EIM. Indeed, the benefit of this class of
model is that it can in principle account for both veridical matches and/or the specific pattern of
failures in veridicality.
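The two-stage logic of the EIM can be sketched as follows. Because the model itself is uncommitted about how the illuminant estimate is formed, the grey-world-style estimator below is purely an assumption introduced for illustration:

```python
def eim_reflectance_matches(luminances, estimate_illuminant):
    """Two-stage sketch of an equivalent illuminant model:
    (1) form an illuminant estimate from the image data,
    (2) derive reflectance as luminance relative to that estimate."""
    e = estimate_illuminant(luminances)
    return [lum / e for lum in luminances]

# Toy estimator (an assumption, not part of the EIM itself):
# grey-world style, equivalent illuminant ~ twice the mean luminance.
est = lambda lums: 2 * sum(lums) / len(lums)
matches = eim_reflectance_matches([10.0, 20.0, 30.0], est)
```

The experimental question is then run in reverse: given observers' reflectance matches, is there some estimator (some equivalent illuminant) under which the matches are consistent?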
Anchoring theory
A third theoretical approach to lightness is captured by anchoring theory, which was developed
in an attempt to account for a variety of systematic errors in the perception of lightness (Gilchrist
et al. 1999). Unlike layers or EIM models, there is no explicit factorization of the illuminant and
reflectance in anchoring theory. Rather, anchoring theory asserts that perceived lightness is
derived through a set of heuristic rules that the visual system uses to map luminance onto per-
ceived lightness. There are two main components to anchoring theory (see Gilchrist’s chapter, this
volume). First, following Wallach (1948), luminance ratios are used to derive information about
relative lightness. When the full 30:1 range of physically realizable reflectances is present in a
common illuminant, the true reflectance of surfaces can be derived on the basis of these ratios
alone. However, in scenes containing less than this full 30:1 range, some additional information
or rule is needed to transform ambiguous information about relative lightness into an estimate of
absolute surface reflectance. For example, an image containing a 2:1 range of luminances could
be generated by surfaces with reflectances of three per cent and six per cent, five per cent and 10 per cent, or 40 per cent and 80 per cent, ad infinitum. Anchoring theory asserts that this ambiguity must be
resolved with an anchoring rule, such that a specific relative image luminance (such as the high-
est) is mapped onto a fixed lightness value (such as white). All other lightness values in a scene are
putatively derived by computing ratios relative to this anchor value. A number of fixed points are
possible (e.g., the average luminance could be grey, the highest luminance could be white, or the
lowest luminance could be black), but a variety of experiments, especially those from Gilchrist’s
lab, have suggested that in many contexts, the highest luminance is perceived as white.
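The highest-luminance rule can be sketched as a simple mapping. The value of 0.9 for ‘white’ below is a conventional reflectance chosen here for illustration, not a parameter fixed by the theory:

```python
def anchored_lightness(luminances, anchor_reflectance=0.9):
    """Highest-luminance-as-white anchoring rule: the maximum
    luminance is assigned the anchor reflectance ('white'), and
    every other lightness is a luminance ratio relative to that
    anchor."""
    anchor = max(luminances)
    return [anchor_reflectance * lum / anchor for lum in luminances]

# The same luminance ratios yield the same lightnesses regardless of
# absolute luminance level (i.e., of illumination), which is what
# anchoring predicts:
dim = anchored_lightness([5.0, 10.0, 20.0])
bright = anchored_lightness([50.0, 100.0, 200.0])
```

Note that this rule discards all information about absolute luminance, which is precisely the property challenged by the Mondrian experiments described below.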
Filtering and filling-in models
A fourth approach to lightness treats lightness percepts as the outputs of local image filters applied
directly to the images (Blakeslee and McCourt 2004; Dakin and Bex 2003; Kingdom and Moulden
1988, 1992; Shapiro and Lu 2011). Such approaches typically do not distinguish between per-
ceived lightness (perceived surface reflectance) and brightness (perceived luminance), at least not
explicitly in the construction of the model. Rather, a new image is generated from a set of transfor-
mations applied to the input image. In a strict sense, filter models are not truly lightness models,
since they simply transform one image into another image. Such models are more appropriately
construed as models of brightness than lightness, since there is no explicit attempt to represent
surface reflectance, or distinguish reflectance from luminance. Their relevance to understanding
lightness depends on the extent to which the distinction between brightness and lightness makes
biological or psychological sense for a given image or experimental procedure. Like anchoring
models, filter approaches to lightness do not explicitly segment image luminance into separate
components of reflectance and illumination.
In a related manner, a variety of filling-in models have been proposed that do not explicitly dis-
tinguish lightness and brightness (Grossberg and Mingolla 1985; Paradiso and Nakayama 1991;
Rudd and Arrington 2001). Such models invoke a two-stage process: one that responds to the
magnitude and orientation of ‘edges’ (oriented contrast) and/or gradients, and a second process
that propagates information between such localized ‘edge’ responses to generate a fully ‘filled-in’
or interpolated percept of brightness or color.

2.2.2  Evaluating theories of lightness


As noted in a recent article, the topic of lightness and brightness has historically been quite div-
isive (Kingdom 2011). One source of disagreement involves the very distinction between bright-
ness and lightness. Although such constructs are easily distinguished from each other with regard
to their intended physical referents, it is not clear that (or when) such distinctions have psycho-
logical meaning. The distinction between lightness and brightness is particularly problematic for
the kinds of displays that are typically studied in either lightness or brightness studies. In almost
all cases, the targets of interest have a single, uniform luminance (or approximately so), and are
embedded in highly simplified geometric and illumination contexts. For scenes depicting real or
simulated surfaces, the surfaces of interest are typically flat, matte, and arranged at a single depth and/or under a single illuminant. They typically lack information about the light field, such as that provided
by specular reflections, 3D structure, shading, and inter-reflections. It is perhaps not surprising,
then, that the field remains divided as to the proper way to understand how such impoverished
displays are experienced, since it is unclear whether the distinction between lightness and bright-
ness is psychologically meaningful in many of these displays. In what follows, I will consider some
recent evidence relevant for each of the theories of lightness described above.

The core claim of scission models is that our experience of lightness involves the decompos-
ition of the input into separable causes. One of the difficulties in assessing scission models is that
it is not always clear whether (or when) such separation occurs, or what criteria should be
applied to determine whether such decomposition occurs. One can begin by posing a question
of sufficiency: Can scission induce transformations in perceived lightness when it is phenomen-
ally apparent? The most phenomenologically compelling sense of scission occurs in conditions
of transparency, which requires the satisfaction of both geometric and photometric conditions.
One technique for inducing scission involves manipulating the relative depth and photometric
relationships of stereoscopic Kanizsa figures such as those depicted in Figure 22.1. When the grey,
wedge-shaped segments of the Kanizsa figure’s inducing elements in Figure 22.1 are decomposed
into a transparent layer overlying a white disk (second and fourth rows of Figure 22.1), they appear
substantially darker than when the same grey segment appears to overlie a dark disk (first and third
rows of Figure 22.1). Note that the color of the underlying circular inducing element appears to be

Fig. 22.1  Stereoscopic Kanizsa figure demonstrating the role of scission on perceived lightness for
two different grey values. The small pie shaped inducing sectors are the same shade of dark grey
in the top two rows, and the same shade of light grey in the bottom two rows. When the left two
images are cross-fused, or the right two images divergently fused, an illusory diamond is experienced.
Note that the diamonds in the first and third rows appear much lighter than their corresponding
figures in the second and fourth rows.
Adapted from Trends in Cognitive Sciences, 2(6), Richard A Andersen and David C Bradley, Perception of three-
dimensional structure from motion, pp. 222–8, Copyright (1998), with permission from Elsevier.
‘removed’ from the grey wedge-shaped segments and attributed to the more distant layer, which
putatively transforms the perceived lightness of the transparent layer. Note also that the direction of
the lightness transformation depends on which layer observers are asked to report. If observers are
asked to report the color of the far layer underneath the grey sectors of the top image, they report
it as appearing quite dark (nearly black), since this is the color of the interpolated disc. But if they
are asked to report the near layer of the transparent region, they report it as appearing quite light.
In order to provide more conclusive evidence for the effects of scission on perceived light-
ness, I  constructed stereoscopic variants of Figure 22.1 using random noise textures. The goal
was to induce transparency in a texture such that the light and dark ‘components’ of the texture
would perceptually segregate into different depth planes. An example is presented in Figure 22.2.
When the left two columns are cross-fused, vivid percepts of inhomogeneous transparency can
be observed: The top image appears as dark clouds overlying light disks, and the bottom appears
as light clouds overlying dark disks. Note that the lightest components of the texture in the top
image appear as portions of the underlying disc in plain view, whereas the same regions in the
bottom image appear as the most opaque regions of the light clouds in the bottom image (and vice
versa for the dark regions). We subsequently showed that similar phenomena could be observed
in non-stereoscopic displays. In these images, scission was induced by embedding targets in sur-
rounds that contain textures that selectively group with either the light or dark ‘components’ of the
textures within the targets (Figure 22.3). As with their stereoscopic analogues, the white and black
chess pieces are actually physically identical (i.e., contain identical patterns of texture). Note that
the luminance variations within the texture of the chess piece figures are experienced as variations
in the opacity of a transparent layer that overlie a uniformly colored surface. The opacity of the

Fig. 22.2  Stereoscopic noise patterns can also be decomposed into layers in ways that induce large
transformations in perceived lightness. If the left two images are cross fused or the right two images
divergently fused, the top image appears to split into a pattern of dark clouds overlying light discs
(top), or light clouds overlying dark disks (bottom). The textures in the top and bottom are physically
identical.
Adapted from Neuron, 24(4), Barton L. Anderson, Stereoscopic Surface Perception, pp. 919–28, Copyright (1999),
with permission from Elsevier.

Fig. 22.3  Scission can also be induced by selectively grouping the light and dark components of the texture of the targets (chess pieces) with the surround. The textures within the chess pieces in the top and bottom images are identical, but appear as dark clouds overlying light chess pieces on the top, and light clouds overlying dark chess pieces on the bottom.
Reprinted by permission from Macmillan Publishers Ltd: Nature, 434, Barton L. Anderson and Jonathan Winawer,
Image segmentation and lightness perception, pp. 79–83, doi: 10.1038/nature03271 Copyright © 2005, Nature
Publishing Group.

transparent surface is greatest for luminance values that most closely match the surround along the
borders of the chess pieces (dark on top, light on the bottom), and least opaque for luminance values that are most different from the surround (light on top, dark on the bottom). Note that
the lightest regions within the targets on the dark surround appear in plain view, and the darkest
regions within the targets appear in plain view on the light surround. This bias is evident for essen-
tially all ranges of target luminance tested, although this perceptual fact is in no way mandated by
the physics of transparency, particularly for underlying surfaces that do not appear black or white.
These phenomena demonstrate that scission can induce striking transformations in perceived
lightness in conditions of transparency, but it does not address the broader question of whether
scission plays a role in generating our experience of lightness in conditions that do not generate
explicit percepts of multiple layers or transparency.
EIMs also assert that the perception of surface color and lightness is derived by decomposing
the image into estimates of the illuminant and surface reflectance. The evidence in support of this
model is, however, phenomenologically indirect. Work from Brainard’s and Maloney’s labs has
demonstrated that the parametric structure of a variety of matching data can be explained with a
two-stage model in which the first stage involves an estimation of the illuminant (an ‘equivalent
illuminant’), which is then used to derive observers’ reflectance matches from the input images
(Brainard and Maloney, 2011).
Unlike scission models or EIMs, anchoring theory asserts that lightness is derived without
explicitly decomposing the images into an explicit representation of illumination and reflectance.
The central premise of anchoring theory is that the visual system solves the ambiguity of lightness
by treating a particular relative luminance as a fixed (anchor) point on the lightness scale (namely,
treating the highest luminance as white), independent of the level of illumination or absolute luminance values in a scene. To test this claim, we constructed both paper Mondrians displayed in an
otherwise uniformly black laboratory, and simulated Mondrians displayed on a CRT in a dark
black lab room (Anderson et al. 2008; Anderson et al. 2014). In all cases, the highest luminance
in the room was the central target patch of the Mondrian display. We varied both the reflectance
range and illumination level of the former (i.e., paper Mondrians), and the simulated reflectance
range and simulated illuminant levels of the latter simulated Mondrians. For restricted reflectance
ranges (3:1 or less), we found that the highest luminance could vary in perceived lightness as a
function of illumination. For our simulated illuminants and Mondrian displays, observers’ light-
ness matches (expressed as a percentage of reflectance) were a logarithmic function of (simulated)
illuminant, rather than an invariant ‘white’ as predicted by anchoring theory. These results suggest
that the apparent ‘anchoring’ of luminance to ‘white’ is a consequence of the particular experimen-
tal conditions that have been used to assess this model, rather than reflecting an invariant ‘anchor
point’ used to scale other lightness values.
Some recent data have provided strong evidence against an explicit illumination estimation model and, more generally, against any model that relies on luminance ratios to compute perceived
lightness (such as anchoring theory). Radonjić et al. (2011) presented checkerboard displays on an apparatus capable of an extremely large dynamic range, and
found that observers mapped a very high dynamic range (~10,000:1) onto an extended lightness
range of 100:1, which spanned from ‘white’ to ‘dark black’ (the darkest values were obtained using
glossy papers). Such behavior would not be expected for any model that attempts to infer a phys-
ically realizable illuminant, or any realizable reflectance ratios of real surfaces, as embraced by
anchoring theory or the EIM.
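The scale of the compression reported by Radonjić et al. (2011) can be made explicit with a back-of-the-envelope calculation. The power-law form below is an assumption used only to express the range mapping, not a model proposed by the authors:

```python
import math

# Mapping a ~10,000:1 luminance range onto a ~100:1 lightness range
# implies a compressive exponent of log(100) / log(10,000) = 0.5.
exponent = math.log(100) / math.log(10_000)

def compressed_lightness(lum, lum_max):
    """Relative lightness under the assumed power-law compression:
    the lowest luminance (1/10,000 of the maximum) maps onto 1/100
    of the lightness of 'white'."""
    return (lum / lum_max) ** exponent
```

No single illuminant over real surfaces can generate a 10,000:1 luminance range from reflectances alone, which is why this mapping is incompatible with models that infer a physically realizable illuminant or reflectance ratio.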
One common assumption of anchoring theory and the EIM is that the visual system expli-
citly attempts to extract an estimate of lightness that corresponds to the physical dimension of
surface albedo. The results of Radonjić et al. (2011) provide compelling evidence against this
view. Just as our experience of transparency may not have any direct correspondence to the
physical dimensions that modulate perceived transparency (such as transmittance), the per-
ception of lightness may not represent an approximation of the physical dimension of surface
albedo. The results of Radonjić et al. provide evidence that directly challenges any attempt to
interpret the visual response as a ‘best guess’ as to the environmental sources that produced
their stimuli, since there is no combination of surface reflectance and illuminant that can
produce such stimuli (at least in a common illuminant). I will return to this general point in
the general discussion below.

3  Gloss
The experience of gloss is another aspect of our experience of surface reflectance that has received a
growing amount of experimental attention. Whereas the concept of surface lightness has been cast
as the problem of understanding how we experience the diffuse reflectance of a surface, the percep-
tion of gloss is typically cast as the problem of understanding how we experience the specular ‘com-
ponent’ of reflectance. From a generative point of view, the diffuse and specular ‘components’ of
reflectance are treated as computationally separable. So construed, the problem of gloss perception
involves understanding how the visual system segments the image structure generated by specular
reflectance from diffuse reflectance (and all other sources of image structure).
The apparent intractability of this problem has inspired attempts to find computational
short-cuts to avoid the complexity of this decomposition problem. One approach asserts that the
visual system uses simple image statistics that do not require any explicit decomposition of the
images into distinct components of reflectance to derive our experience of gloss. Motoyoshi et al.
(2007) argued that perceived gloss was well predicted by an image’s histogram or sub-band skew: measures of the asymmetry of the pixel histogram or of the responses of center-surround filters, respectively. This claim was evaluated for a class of stucco surfaces with a statistically fixed level of surface
relief that were viewed in fixed illumination field. In these conditions, glossy surfaces generated
images with a strong positive skew, whereas matte surfaces generated images with negative skew.
The attractive feature of this kind of model is that it potentially reduces a complex mid-level vision
problem into a comparatively simple problem of detecting low-level image properties.
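Histogram skew itself is straightforward to compute as the third standardized moment of the pixel distribution; the toy pixel values below are illustrative only:

```python
def histogram_skew(pixels):
    """Skewness (third standardized moment) of a pixel histogram,
    the statistic Motoyoshi et al. (2007) proposed as a gloss cue."""
    n = len(pixels)
    mean = sum(pixels) / n
    sd = (sum((p - mean) ** 2 for p in pixels) / n) ** 0.5
    return sum(((p - mean) / sd) ** 3 for p in pixels) / n

# A few bright 'specular' pixels on a mostly mid-grey surface give a
# strong positive skew; a symmetric, matte-like histogram gives a
# skew near zero.
glossy_like = [0.4] * 95 + [1.0] * 5
matte_like = [0.3] * 50 + [0.5] * 50
```

The appeal of such a statistic is that it can be computed with no segmentation at all; its weakness, described next, is that it is blind to where in the image the bright pixels fall.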
However, subsequent work has shown that our experience of gloss cannot be understood so
easily (Anderson and Kim 2009; Kim and Anderson 2010; Kim et al. 2011; Marlow et al. 2011;
Olkkonen and Brainard 2010, 2011). One of the main problems with the proposed image statistics
is that they fail to take into account the kind of image structure that predicts when gloss will or

(a) (b)

Fig. 22.4  The perception of gloss depends critically on highlights appearing in the ‘right places’ of
a surface’s diffuse shading profile. In A, the highlights appear near the luminance maxima of the
diffuse shading profile and have similar orientations, and the surface appears relatively glossy. In B,
the highlights have been rotated so that they appear with random positions and orientations relative
to the diffuse shading profile, and do not appear glossy.
Reproduced from Barton L. Anderson and Juno Kim, Image statistics do not explain the perception of gloss and
lightness, Journal of Vision, 9(11), pp. 1–17, figure 3, doi: 10.1167/9.11.10 © 2009, Association for Research in
Vision and Ophthalmology.
won’t be perceived. Specular highlights, and specular reflections more generally, must appear in the
‘right places’ on surfaces to elicit a percept of gloss (see Figure 22.4). From a physical perspective,
specular highlights cling to regions of high surface curvature. Highlights must also appear in specific places and have orientations consistent with the surface shading for a surface to appear glossy, a geometric constraint that is not captured by histogram or sub-band skew.
Although these results suggest that the visual system in some sense ‘understands’ the physics of
specular reflection, there are other findings that reveal that the extent of any such understanding
is limited. The perception of gloss has been shown to interact with a surface’s 3D shape and its
lighting conditions, which are physically independent sources of image variability (Ho et al. 2008;
Marlow et al. 2012; Olkkonen and Brainard 2011). These interactions have been observed by a var-
iety of authors and have resisted explanation. Indeed, these interactions are difficult to understand
from a physical perspective, since gloss and 3D shape are independent sources of image structure.
However, we recently presented evidence that these interactions can be understood as a conse-
quence of a simple set of image cues that the visual system uses to generate our experience of gloss,
which are only roughly correlated with a surface’s physical gloss level (Marlow et al. 2012). Some of
the intuition shaping this theoretical proposal can be gained by considering the surfaces depicted
in Figure 22.5. All of the surfaces in these images have the same physical gloss level, yet appear
to vary appreciably in perceived gloss. Each column contains surfaces with a common degree of

(Row labels: Oblique illumination; Frontal illumination. Column labels: Low relief; High relief.)


Fig. 22.5  Interactions between 3D shape and perceived gloss as a function of the illumination field.
All of the surfaces in this figure have the same physical gloss level, but do not appear equally glossy.
The images in the top row were rendered in an illumination field where the primary light sources
were oriented obliquely to the surface, and the images in the second row were illuminated in the
same illumination field with the primary light sources oriented towards the surface.
Reprinted from Current Biology, 22(20), Phillip J. Marlow, Juno Kim, and Barton L. Anderson, The Perception and
Misperception of Specular Surface Reflectance, pp. 1909–13, figure 2, Copyright (2012), with permission from Elsevier.

relief, and each row contains images that were placed in an illumination field with the same dir-
ection of the primary light sources. We varied the structure of the light field, the direction of the
primary light sources, and 3D surface relief. Observers performed paired comparison judgments
of the perceived gloss of all surfaces, where they chose which of a pair of surfaces was perceived as
glossier. The data revealed complex interactions between the light field and surface shape on gloss
judgments. As can be seen in Figure 22.6, the variation of the illumination field and shape had a sig-
nificant impact on the sharpness, size, and contrast of specular highlights in these images. We rea-
soned that if observers were basing their gloss judgments on these cues, then it should be possible

(Figure 22.6: the left panels show judgments of the depth, coverage, contrast, and sharpness cues, together with image-computed skew; the right panels show perceived gloss in the disparity and no-disparity conditions along with the weighted-average model fit, with cue weights of 33%, 31%, 16%, 20%, and 0%. The curves correspond to the Grace (frontal), Grace (oblique), and Grove (oblique) illumination fields, plotted against relief height.)

Fig. 22.6  Data and model fits for the experiments we performed on the interactions between perceived
gloss, 3D shape (as captured by a measure of surface relief), and the illumination field. The stimuli were
viewed either with or without stereoscopic depth (the ‘disparity’ and ‘no disparity’ conditions respectively).
The different colored curves in each graph correspond to a different illumination direction of a particular
illumination field (called ‘Grace’). The gloss judgments are in the two top right panels. The panels on the
left represent the judgments of a separate group of observers of four different cues to gloss: the depth,
coverage, contrast, and sharpness of specular reflections. The panel labeled ‘skew’ was computed directly
from images. The dotted lines in the two graphs on the top right correspond to the best fitting linear
combination of the cues on the left, which account for 94 per cent of the variance of gloss judgments. The
weights are denoted in the boxes adjacent to the small arrows in the center of the graphs.
Reprinted from Current Biology, 22 (20), Phillip J. Marlow, Juno Kim, and Barton L. Anderson, The Perception and
Misperception of Specular Surface Reflectance, pp. 1909–13, figure 3, Copyright (2012), with permission from Elsevier.
The perceptual representation of transparency, lightness, and gloss 479

to model observers’ gloss judgments with a weighted combination of these image cues. However,
there is currently no known method for computing these cues directly from images. We therefore
had independent sets of observers judge each of these cues, and tested whether it was possible to
predict gloss judgments with a weighted sum of these cues. We found that a simple weighted sum
model was capable of predicting over 94 per cent of the variance of the other observers’ gloss judg-
ments. Thus, although the perception of surfaces with the same physical gloss level can appear to
vary significantly in perceived gloss, these effects can be understood with a set of relatively simple,
albeit imperfect, ‘cues’ that the visual system uses to generate our experience of gloss.
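The weighted-sum analysis described above can be sketched as an ordinary least-squares fit of cue ratings to gloss ratings. The numbers below are hypothetical placeholders, not the published data, and `numpy.linalg.lstsq` simply stands in for whichever fitting procedure was actually used.

```python
import numpy as np

# Hypothetical cue ratings, one row per stimulus:
# coverage, depth, contrast, sharpness, skew (illustrative values only).
cues = np.array([
    [10., 12., 15., 20., 1.0],
    [25., 30., 22., 35., 1.4],
    [40., 45., 38., 55., 1.9],
    [60., 58., 50., 70., 2.3],
    [80., 76., 64., 85., 2.8],
    [15., 20., 18., 25., 1.2],
    [50., 52., 44., 60., 2.0],
    [70., 68., 58., 78., 2.5],
])

# Hypothetical mean gloss ratings for the same eight stimuli.
gloss = np.array([14., 28., 44., 60., 77., 19., 51., 69.])

# Fit gloss = cues @ w by least squares (no intercept, as a sketch).
w, *_ = np.linalg.lstsq(cues, gloss, rcond=None)

# Proportion of variance in the gloss ratings explained by the weighted sum.
pred = cues @ w
r2 = 1 - np.sum((gloss - pred) ** 2) / np.sum((gloss - gloss.mean()) ** 2)
```

With ratings that are even roughly a linear combination of the cues, the explained variance is high, which is the logic behind the 94 per cent figure reported in the text.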

4  The perceptual organization of surfaces and materials


The last few decades have witnessed an explosive increase in models that have treated visual pro-
cesses as a collection of approximately ideal ‘solutions’ to particular computational problems.
Such models are explicitly teleological: they treat a desired outcome, goal, or task as the organizing
force that shapes the perceptual abilities they are attempting to model. Evolutionary theory serves
as the engineering force that putatively drives biological systems toward optimal solutions. This
modeling process hinges critically on the ability to specify the ‘natural tasks’ that were putatively
shaped by evolution. The justification for the adaptive importance of a particular ‘natural task’
typically takes a generic form: an environmental property is treated as having evolutionary sig-
nificance because it is an intrinsic property of the world. Thus, any animal capable of accurately
recovering that property would gain an adaptive advantage. The properties to be recovered—the
‘tasks’ of vision—are defined with respect to particular physical sources of variability. Our experience
of lightness is treated as the visual system’s solution to the problem of recovering the albedo
of a surface. Our experience of transparency is treated as the perceptual solution to a particular
generative model of transparency (such as Metelli’s episcotister model or Faul and Ekroll’s filter
model). And our experience of gloss is understood as the visual system’s attempt to estimate the
specular component of surface reflectance.
One of the assumptions of this approach is that the dimensions of psychological variation
mirror the sources of physical variation. This assumption is explicit in both Metelli’s
model, which treated the episcotister as both a physical and psychological model of transparency,
and the EIM of Brainard and Maloney, which asserts that the visual system generates a ‘virtual’
model of the illuminant to recover color and lightness. The perception of gloss has also been studied
as a kind of ‘constancy’ problem, which involves recovering the specular ‘component’ of reflectance.
A main theme of this chapter is to question the adequacy of this conceptualization of vision.
Rather than attempting to guess the ‘natural tasks’ of an animal, I view the goal of perceptual
theory as discovering the ‘natural decompositions’ of representational space, i.e., the psychological
dimensions that capture the space of our experiences. The preceding sections focused on our
experience of transparency, lightness, and gloss. Each of these attributes can be identified with a
particular physical property of surfaces and materials, which can be described in physical terms
independently of any perceptual system. Such descriptions assume that the visual system plays no
part in defining the attributes that it putatively represents; the dimensions are given by identifiable
sources of variation in the world, which the visual system is attempting to recover, not by intrin-
sic properties of the visual system. We are left discussing how well the visual system encodes or
recovers a particular world property, rather than how the visual system contributes to shaping the
dimensions of our visual experience.
The preceding discussion suggests that this general approach fails to explain a number of different
phenomena in surface and material perception. The perception of surface opacity does not follow Metelli’s
480 Anderson

model of transmittance. We argued that one of the main reasons for this failure was that Metelli’s
model is based on a ratio of luminance differences, which are not available to a visual system that
transforms retinal luminance into local contrast signals. We showed that our matching data were
well predicted by a model in which observers matched contrast ratios, rather than luminance dif-
ference ratios. One of the key points of our model was to define transmittance in a way that was
consistent with intrinsic coding properties of the visual system, even if this results in the failure to
compute a physically accurate measure of surface opacity. This general approach of physiologically
motivated modelling has also been pursued by Vladusich, who proposed an alternative account of
our transmittance matching data (Vladusich 2013). He shows that these data can be captured with a
modified version of Metelli’s model in which log luminance values are used instead of luminance
values. Like our model, the choice to use log luminance values cannot be derived from the physics
of transparent surfaces; it is derived from intrinsic response properties of the visual system.
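The contrast between the two accounts can be made concrete with a toy computation. Metelli’s transmittance is a ratio of luminance differences, whereas the contrast-ratio account (Singh and Anderson 2002) compares the Michelson contrast seen through the filter with the contrast in plain view. The luminance values below are arbitrary illustrative numbers, and the two functions are minimal sketches of the two formulations, not the full models.

```python
def metelli_alpha(a, b, p, q):
    """Metelli's transmittance: a ratio of luminance differences.

    a, b: luminances of two background regions seen in plain view;
    p, q: luminances of the same regions seen through the filter.
    """
    return (p - q) / (a - b)

def michelson(hi, lo):
    """Michelson contrast of a pair of luminances."""
    return (hi - lo) / (hi + lo)

def contrast_ratio_alpha(a, b, p, q):
    """Perceived transmittance as a ratio of Michelson contrasts,
    in the spirit of Singh and Anderson (2002)."""
    return michelson(p, q) / michelson(a, b)

# Arbitrary example: a high-contrast background whose luminances are
# compressed and shifted by a transparent filter.
a, b = 90.0, 10.0   # background luminances in plain view
p, q = 50.0, 30.0   # the same regions seen through the filter

alpha_metelli = metelli_alpha(a, b, p, q)          # (50-30)/(90-10) = 0.25
alpha_contrast = contrast_ratio_alpha(a, b, p, q)  # 0.25 / 0.8 = 0.3125
```

The two formulations generally disagree, as here, which is why matching data can discriminate between them.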
The different theories of lightness perception are even more contentious and diverse than those
found in the transparency literature. One of the basic issues involves the distinction between
lightness and brightness. Lightness is defined as the perception of diffuse (achromatic) surface
reflectance, whereas brightness is defined as the perception of image luminance.
The presumption is that these physical distinctions have psychological meaning. But this is far from
self-evident. The majority of work on lightness has used 2D (flat) matte displays of surfaces with
uniform albedos, for which the distinction between lightness and brightness is arguably least valid
(or meaningful) perceptually. For some experimental conditions, observers’ matching data will dif-
fer substantially if instructed to match either brightness or lightness. But in others, a difference in
instructions may make little or no difference. Consider, for example, the problem of matching the
‘brightness’ versus the ‘lightness’ of the checker-shadow illusion. A given patch appears a particular
shade of grey, and there is no evidence that observers could distinguish its brightness and lightness.
In support of this view, we found that perceived lightness increased as a function of a patch’s
luminance in both simulated and ‘real’ Mondrian displays. Moreover, the data of Radonjić et al.
(2011) demonstrate that observers will readily map a physically unrealized set of luminances, span-
ning 4 orders of magnitude, onto a lightness scale two orders smaller. These results are impossible
to reconcile with models that treat the problem of lightness as a recovery problem, since the range
of reflectances in a natural scene can only span a range of ~30:1.
In the perception of gloss, we found that observers’ experience of gloss can be well predicted by
a set of simple cues that are only imperfectly correlated with the physical gloss of a surface. Gloss
is not defined with respect to some physically specified dimension of surface optics, but with
respect to a set of cues the visual system uses as a proxy for an objectively defined surface property.
What general understanding can be gleaned from these patterns of results? All of these results
reveal the insufficiency of attempting to identify psychological dimensions of our experience with
physical sources of image variability. The fact that we have a particular experience of lightness,
gloss, or transparency does not imply that the dimensions of our experience map onto a particu-
lar physical dimension and/or its parameterization. The general argument used to justify ‘natural
tasks’ takes the generic form that ‘getting an environmental property right increases adaptive fit-
ness.’ The presumed identification of fitness with veridical perception is actually fallacious (see
Hoffman 2009; cf. Lewontin 1996), but even if such views were accepted, they are incapable of
distinguishing perceptual abilities that were actually shaped by natural selection from the ‘span-
drels’ that came along for the evolutionary ride. The fact that human observers will readily map
an ecologically unobtainable range of luminance values (in a single illuminant) onto lightness
estimates suggests that lightness may be one example of a perceptual spandrel. Although human
observers can usually distinguish reflectance differences from other sources of image variation, the

perception of absolute lightness may simply be the result of low-level processes of adaptation that
allow the visual system to encode a particular range of luminance values. Indeed, I am aware of
no compelling evidence or argument about why lightness constancy per se provided an adaptive
advantage, or is something that the visual system is explicitly ‘designed’ to compute. A similar
argument holds for the perception of transparency and gloss. We can readily distinguish
surfaces or media that transmit light from those that do not, or surfaces that
reflect light specularly from those that do not. But the data also suggest that we do not scale these
dimensions in a way that is physically correct for any of these properties.
Although it is difficult to craft a compelling argument for the specific adaptive utility of develop-
ing a physically accurate model of lightness, gloss, and transparency, the fact that we experience
these different sources of variation as different underlying causes implies that the visual system is
capable of at least qualitatively distinguishing different sources of image structure. This ‘source
segmentation’ is arguably one of the most important general properties of our visual system. The
visual system may, in fact, be quite poor in estimating lightness in arbitrary contexts, but it is
nonetheless typically quite good at distinguishing image structure generated by lightness differ-
ences from illumination changes, or variations in the opacity of a transparent surface, or from
specular reflections. The identification of specular reflections as specular reflections depends on
their compatibility with diffuse surface shading and 3D surface geometry, and is modulated by the
structure, intensity, and distribution of image structure so identified, even if it does not accurately
capture the ‘true’ gloss level of a surface. And although the physical transmittance (or opacity) of a
surface does not vary as a function of its albedo or color, the psychological analog of opacity—its
‘hiding power’—will for a visual system that uses contrast to determine the visibility of image
structure. The visual system may not determine the ‘true’ opacity of a surface, but nonetheless
is effective at performing a segmentation that captures the presence or absence of transmissive
surfaces and media.

5  Summary and conclusions


In this chapter, I have considered a number of topics in the area of surface and material percep-
tion: transparency, lightness, and gloss. The organization of these topics was largely shaped by
my historical progression in conducting research into each of these domains; many alternative
organizations are possible. In all of these areas of inquiry, there has been a striking tendency to
treat physical models of image formation as some kind of approximation to a perceptual model
of their apprehension. The precise way that a physical model ‘counts’ as a psychological model is
typically left unspecified. It appears to be based on some intuition that the visual system ‘knows’
or ‘understands’ the physics of a particular surface or material attribute. I contend that one of
the main goals of vision science should be to discover the dimensions of perceptual experience,
and the image variables that modulate our response to them. Whereas the dimensions of physical
variables can be specified independently of any perceptual system, the dimensions of perceptual
experience are inherently relational, and must consider the intrinsic properties of the visual
system as well as the environments in which it operates.

References
Adelson, E. H. (1999). ‘Lightness perception and lightness illusions’. In The new cognitive neurosciences, 2nd
ed., pp. 339–51. (Cambridge, MA: MIT Press).
Anderson, B. L. (1997). ‘A theory of illusory lightness and transparency in monocular and binocular
images: the role of contour junctions’. Perception 26(4): 419–53.

Anderson, B. L. (1998). ‘Stereovision: beyond disparity computations’. Trends in Cognitive Sciences
2: 222–8.
Anderson, B. L. (1999). ‘Stereoscopic surface perception’. Neuron 24: 919–28.
Anderson, B. L., and Kim, J. (2009). ‘Image statistics do not explain the perception of gloss and lightness’.
Journal of Vision 9(11): 1–17.
Anderson, B. L., and Winawer, J. (2005). ‘Image segmentation and lightness perception’. Nature
434(7029): 79–83. doi: 10.1038/nature03271.
Anderson, B. L., and Winawer, J. (2008). ‘Layered image representations and the computation of surface
lightness’. Journal of Vision 8(7): 18, 11–22. doi: 10.1167/8.7.18.
Anderson, B. L., de Silva, C., and Whitbread, M. (2008). ‘Lightness perception has no anchor’. Journal of
Vision 8(6): 284.
Anderson, B. L., Whitbread, M., and de Silva, C. (2014). ‘Lightness, brightness, and anchoring’. Journal of
Vision 14(9): 7, 1–13. doi: 10.1167/14.9.7.
Barrow, H. G., Tenenbaum, J. M., Hanson, A., and Riseman, R. (1978). ‘Recovering intrinsic scene
characteristics from images’. Computer Vision Systems, pp. 3–26. (New York: Academic Press).
Blakeslee, B., and McCourt, M. E. (2004). ‘A unified theory of brightness contrast and assimilation
incorporating oriented multiscale spatial filtering and contrast normalization’. Vision Research
44(21): 2483–503. doi: 10.1016/j.visres.2004.05.015.
Brainard, D. H., and Maloney, L. T. (2011). ‘Surface color perception and equivalent illumination models’.
Journal of Vision 11(5), doi: 10.1167/11.5.1.
Dakin, S. C., and Bex, P. J. (2003). ‘Natural image statistics mediate brightness “filling in” ’. Proc Biol Sci
270(1531): 2341–8. doi: 10.1098/rspb.2003.2528.
Faul, F., and Ekroll, V. (2011). ‘On the filter approach to perceptual transparency’. Journal of Vision
11(7): doi: 10.1167/11.7.7.
Faul, F., and Ekroll, V. (2012). ‘Transparent layer constancy’. Journal of Vision 12(12): 1–26.
doi: 10.1167/12.12.7.
Feldman, J. (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans.
(Oxford: Oxford University Press).
Geisler, W. S., and Ringach, D. (2009). ‘Natural systems analysis. Introduction’. Vis Neurosci 26(1): 1–3.
Gerbino, W. (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans.
(Oxford: Oxford University Press).
Gilchrist, A. L. (1979). ‘The perception of surface blacks and whites’. Sci Am 240(3): 112–24.
Gilchrist, A., Kossyfidis, C., Bonato, F., Agostini, T., Cataliotti, J., Li, X. J., . . . Economou, E. (1999). ‘An
anchoring theory of lightness perception’. Psychological Review 106(4): 795–834.
Gilchrist, A. (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans.
(Oxford: Oxford University Press).
Grossberg, S., and Mingolla, E. (1985). ‘Neural dynamics of form perception: boundary completion,
illusory figures, and neon color spreading’. Psychol Rev 92(2): 173–211.
Ho, Y. X. et al. (2008). ‘Conjoint measurement of gloss and surface texture’. Psychol Sci 19(2): 196–204.
Hoffman, D. (2009). ‘The interface theory of perception: Natural selection drives true perception to swift
extinction’. In Object categorization: Computer and human vision perspectives, edited by S. Dickinson, M.
Tarr, A. Leonardis, B. Schiele, pp. 148–65. (Cambridge: Cambridge University Press).
Kim, J., and Anderson, B. L. (2010). ‘Image statistics and the perception of surface gloss and lightness’.
Journal of Vision 10(9): 1–17.
Kim, J., Marlow, P., and Anderson, B. L. (2011). ‘The perception of gloss depends on highlight congruence
with surface shading’. Journal of Vision 11(9), 1–19. doi: 10.1167/11.9.4.

Kingdom, F. A. (2011). ‘Lightness, brightness and transparency: a quarter century of new ideas,
captivating demonstrations and unrelenting controversy’. Vision Res 51(7): 652–73. doi: 10.1016/j.
visres.2010.09.012.
Kingdom, F., and Moulden, B. (1988). ‘Border effects on brightness: a review of findings, models and
issues’. Spat Vis 3(4): 225–62.
Kingdom, F., and Moulden, B. (1992). ‘A multi-channel approach to brightness coding’. Vision Res
32(8): 1565–82.
Lewontin, R.C. (1996). ‘Evolution as Engineering’. In Integrative Approaches to Molecular Biology, edited by
J. Collado et al. (Cambridge, MA: MIT Press).
Marlow, P., Kim, J., and Anderson, B. L. (2011). ‘The role of brightness and orientation congruence in the
perception of surface gloss’. Journal of Vision 11(9): 1–12. doi: 10.1167/11.9.16
Marlow, P. J., Kim, J., and Anderson, B. L. (2012). ‘The perception and misperception of specular surface
reflectance’. Curr Biol 22(20): 1909–13. doi: 10.1016/j.cub.2012.08.009.
Metelli, F. (1970). ‘An algebraic development of the theory of perceptual transparency’. Ergonomics
13: 59–66.
Metelli, F. (1974a). ‘Achromatic color conditions in the perception of transparency’. In Perception: Essays in
honor of J.J. Gibson, edited by R. B. MacLeod and H. L. Pick, pp. 95–116. (Ithaca, NY: Cornell University
Press).
Metelli, F. (1974b). ‘The perception of transparency’. Scientific American 230: 90–8.
Metelli, F. (1985). ‘Stimulation and perception of transparency’. Psychol Res 47(4): 185–202.
Motoyoshi, I., Nishida, S., Sharan, L., and Adelson, E. H. (2007). ‘Image statistics and the perception of
surface qualities’. Nature 447(7141): 206–9. doi: 10.1038/nature05724.
Olkkonen, M., and Brainard, D. H. (2010). ‘Perceived glossiness and lightness under real-world
illumination’. Journal of Vision 10(9): 5. doi: 10.1167/10.9.5.
Olkkonen, M., and Brainard, D. H. (2011). ‘Joint effects of illumination geometry and object shape in the
perception of surface reflectance’. Iperception 2(9): 1014–34. doi: 10.1068/i0480.
Paradiso, M. A., and Nakayama, K. (1991). ‘Brightness perception and filling-in’. Vision Res
31(7–8): 1221–36.
Radonjić, A., Allred, S. R., Gilchrist, A. L., and Brainard, D. H. (2011). ‘The dynamic range of human
lightness perception’. Curr Biol 21(22): 1931–6. doi: 10.1016/j.cub.2011.10.013.
Rudd, M. E., and Arrington, K. F. (2001). ‘Darkness filling-in: a neural model of darkness induction’. Vision
Res 41(27): 3649–62.
Shapiro, A., and Lu, Z. L. (2011). ‘Relative brightness in natural images can be accounted for by removing
blurry content’. Psychol Sci 22(11): 1452–9. doi: 10.1177/0956797611417453.
Singh, M., and Anderson, B. L. (2002). ‘Toward a perceptual theory of transparency’. Psychological Review
109(3): 492–519. doi: 10.1037//0033–295x.109.3.492.
Vladusich, T. (2013). ‘Gamut relativity: A new computational approach to brightness and lightness
perception’. Journal of Vision 13(1): 1–21 doi: 10.1167/13.1.14.
Wallach, H. (1948) ‘Brightness constancy and the nature of achromatic colors’. Journal of Experimental
Psychology 38: 310–24.
Section 6

Motion and event perception


Chapter 23

Apparent motion and reference frames


Haluk Öğmen and Michael H. Herzog

The History of Apparent Motion and its Role in Gestalt Psychology
Mathematical foundations of space and time,
Zeno’s paradoxes, and the implied
psychological theory
By definition, motion is change of position over time. To understand motion from a psychological
perspective, one needs to appeal to the concepts whereby space and time are defined from the per-
spective of physics (to express the stimulus) and from the perspective of psychology (to express
the percept). Around 450 BC, Zeno studied how motion can be expressed using the concepts of
space and time available at that time (Kolers 1972). Zeno’s analysis of physical motion led him
to paradoxes that he could solve by suggesting that motion is a purely psychological construct. In
one of these paradoxes, Achilles is trying to catch up with a tortoise in a race where the tortoise starts
with an initial advantage. Zeno argues that Achilles will never be able to catch up with the tortoise
because by the time Achilles reaches the tortoise’s starting point, the tortoise will have advanced
to a new position; by the time Achilles reaches this new position, the tortoise will be at yet another
position further down the road, and so on . . . Zeno thought that even if Achilles moves faster
than the tortoise and reduces his distance at every iteration, he will still have to do this infinitely
many times. Lacking the concept of infinity and convergent series, he concluded that Achilles
would never be able to catch the tortoise. A similar paradox arises if one wants to move from
point A to point B. Zeno reasoned that infinitely many points need to be crossed and that one can
never move between two points. When time is conceived as a continuous variable composed of
infinitely short (i.e. duration-less) instants, one cannot be in motion because, by definition, the
instant has no duration to allow change in position. If motion is not physically possible, what then
explains our percepts of moving objects? Zeno thought that objects exist at different locations at
different time instants. These percepts are stored in the memory and compared over time. When
a disparity in spatial position is detected, we create an illusion of motion to resolve this dispar-
ity. Progress in mathematics (the development of the concept of convergent series) removed the
conceptual barriers to expressing motion as a physical stimulus. Armed with this new mathematics,
naïve realistic approaches focused on how real motion can be perceived as a veridical, as
opposed to an illusory, percept. Nevertheless, the psychological implications of Zeno’s analysis
have been enduring.
488 ÖĞMEN AND HERZOG

Exner’s and Wertheimer’s contributions, types of apparent motion, and Korte’s laws
About 2500  years later an important advance occurred when Exner (1875) created a stimulus
consisting of two brief flashes presented at two spatially neighbouring locations. With proper
selection of timing and separation parameters, this stimulus generated the perception of motion,
the first flash appearing to move smoothly to the location of the second flash. Since there was no
stimulation of the points intermediate between the two flashes, this was indeed an illusion created
by the perceptual system. More generally, Exner found that when the interstimulus interval (ISI)
between the flashes was 10 ms or less, the two flashes were perceived as simultaneous; subjects
could not reliably report their temporal order. When the ISI was increased, the perception was
that of a single object moving from one position to the other. At longer ISIs, the stimuli appeared
as two temporally successive flashes without the perception of motion. The finding that the per-
ception of motion occurred at ISIs at which the temporal order of stimuli cannot be resolved led
Exner to reject Zeno’s memory explanation. Since the temporal order of the two stimuli cannot
be determined, the contents of memory should appear simultaneous and no motion should be
perceived. Hence, Exner defended the view that motion is not an indirect property inferred from
the analyses of objects over time, but instead it is a basic dimension of perception.
The experimental technique developed by Exner was essential to Max Wertheimer’s influential
study that led to the development of Gestalt psychology (Wertheimer, 1912; for a review of the devel-
opment of Gestalt psychology see Wagemans, this volume). Using a borrowed tachistoscope, and
with Wolfgang Köhler and Kurt Koffka as his subjects, Wertheimer extended Exner’s study by creat-
ing a richer and more nuanced phenomenology. Exner’s three stages (simultaneity, motion, succes-
sion) were refined further by describing different types of perceived motion: one type of perceived
motion was smooth movement of the object as described by Exner. This was called beta motion.
A second type is partial movement, i.e. the object appears to move up to a certain point along the
trajectory between the flashes, disappears, and reappears in movement again at a further point along
the trajectory. Finally, a third type of movement, called phi motion, corresponded to the percept of
movement without any specific form, i.e. ‘figureless movement’. Wertheimer used phi motion to
argue that the perception of motion does not emerge from the comparison of objects in memory but
is a fundamental dimension of perception in its own right, separate from the perception of form.
The following terminology is used: the perception of motion generated by two flashes is called
apparent motion. Phi and beta motions are subtypes of apparent motion. They are distinguished
from real motion, which refers to the perception of motion generated by a smoothly moving
object.1 Following Wertheimer’s study, the Gestalt psychologists Korte and Neuhaus explored fur-
ther the effect of various stimulus parameters leading to the so-called ‘Korte’s laws’ (Korte 1915;
Neuhaus 1930). These ‘laws’ can rather be viewed as rules of thumb, since the relationship of the
percept to the parameters is rather complex (e.g. Kolers 1972; Gepshtein and Kubovy 2007). In
short, Korte’s laws state that to obtain the percept of apparent motion between flashes: (1) larger
separations require higher intensities, (2)  slower presentation rates require higher intensities,
and (3) larger separations require slower presentation rates (see the demos “AM different shapes”,
“AM intermediate ISI apparent motion”, “AM Long ISI”, “AM Short ISI”).

1  Note that the terms apparent/real motion may refer to the stimulus or to the percept generated by the stimu-
lus, depending on the context. Stroboscopic motion and sampled motion are synonymous terms for apparent
motion; the former derived from the equipment used to generate it (a stroboscope), while the latter term
highlights its relation to real motion (see Section Motion detection as orientation detection in space-time).
Apparent Motion and Reference Frames 489

Since this early work, there have been a large number of studies investigating systematically
the dependence of motion perception on a broader range of stimulus parameters. Around the
1980s, the focus of research shifted from explaining the complex phenomenology of motion to
the more basic question of how we detect motion. Several computational models have been pro-
posed and were eventually united under a broad umbrella. In The Computational Basis of Motion
Detection we briefly review these models, after which we will return to the main theme of our
chapter, namely phenomenal and organizational aspects of motion.

The Computational Basis of Motion Detection


Motion detection as orientation detection in space–time
As shown in Figure 23.1(A), the real (continuous) motion of an object with a constant speed can
be described by an oriented line in a space–time diagram. An apparent motion stimulus is a sam-
pled version of this stimulus consisting of two (or more) discrete points on the pathway (Figure
23.1B). Mechanisms for detecting motion have been described as filters tuned to orientation in
space–time. Among the earliest models, the Barlow–Levick model (Barlow and Levick 1965)
takes its input from one point in space, delays it, and compares it (with Boolean ‘AND’ opera-
tion) with the input from another point in space. The Hassenstein–Reichardt correlation model
(Hassenstein and Reichardt 1956) works on a similar principle but the comparison is carried out
by the correlation integral (Figure 23.1C). Since these models sample space at two discrete spatial
and temporal positions, they respond to apparent and real motion in the same way. More elabo-
rate versions of these models include denser sampling to build a space–time receptive field, as
shown in Figure 23.1(D). These spatiotemporal models have been further extended by introduc-
ing nonlinearities at early stages so that they can respond to second-order stimuli (i.e. defined by
stimulus dimensions other than luminance, such as texture). Finally, a third-order motion system
has been proposed that requires attention (for review see Lu and Sperling 2001). Salient features
are detected and tracked over time. One implication of spatiotemporally localized receptive fields
is that each motion-detecting neuron ‘views’ a small part of the space via its receptive field which
acts as an ‘aperture’. When a uniform surface or edge moves across the viewing aperture, only
the motion component perpendicular to the edge can be measured by a local motion detector, a
problem known as the aperture problem (for a review see Bruno and Bertamini, this volume). The
solution of the aperture problem requires integration of motion signals across space. The motion
integration problem will be discussed in the following sections within a broader context, namely
the integration of motion signals even when each local measurement is accurate.
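A minimal discrete-time sketch of the opponent Hassenstein–Reichardt scheme described above: each of two spatially offset inputs is delayed and multiplied with the undelayed signal from the other location, and the two mirror-image products are subtracted to give a signed, direction-selective output. The two-flash stimulus and all numeric values are illustrative, not taken from any particular study.

```python
def reichardt_response(s_left, s_right, delay):
    """Opponent Hassenstein-Reichardt correlator (discrete-time sketch).

    s_left, s_right: luminance time series at two neighbouring points.
    The delayed signal from one point is multiplied with the undelayed
    signal from the other; subtracting the mirror-image product yields
    a response that is positive for left-to-right motion and negative
    for right-to-left motion (when the delay matches the stimulus).
    """
    total = 0.0
    for t in range(delay, len(s_left)):
        total += s_left[t - delay] * s_right[t]   # prefers left-to-right
        total -= s_right[t - delay] * s_left[t]   # prefers right-to-left
    return total

# Two-flash apparent motion: a flash at the left point at t = 2,
# then a flash at the right point at t = 5 (matching delay = 3).
n = 10
s_left = [0.0] * n
s_right = [0.0] * n
s_left[2] = 1.0
s_right[5] = 1.0

rightward = reichardt_response(s_left, s_right, delay=3)
leftward = reichardt_response(s_right, s_left, delay=3)
```

Because the detector only samples two discrete positions, it responds to this sampled (apparent motion) stimulus exactly as it would to continuous motion along the same space–time orientation, which is the point made in the text.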

Is motion an independent perceptual dimension?


Given this background, we can now return to one of the original questions about motion percep-
tion: is it derived from comparisons of an object over time through memory or is it a fundamental
dimension of perception? At first glance, all the models already discussed involve memory (e.g.
delay or temporal filtering operations) and carry out comparisons (e.g. AND gate or correlation).
However, first- and second-order models compare relatively raw inputs without prior computa-
tion of form. As such, they constitute models that represent motion as an independent dimension.
The third-order motion system, however, identifies and tracks features; this system is, at least
partially, built on form analysers.
From the neurophysiological perspective, motion-sensitive neurons have been found in many
cortical areas. In particular, visual areas MT and MST are highly specialized in motion processing

[Figure 23.1 appears here: space–time diagrams (a, b), a delay-and-compare motion detector (c), and an oriented space–time receptive field (d).]

Fig. 23.1  (a) The trajectory of a stimulus moving with a constant speed can be described as an
oriented line in a space–time diagram. (b) Apparent motion stimulus is a sampled version of
continuous motion. (c) A motion detector samples the input at two spatial locations and carries out
a delay-and-compare operation. (d) The denser sampling in space–time yields an oriented receptive
field for the motion detector. This detector will become maximally active when the space–time
orientation of the motion stimulus matches the orientation of its receptive field.

(for a review see Albright and Stoner 1995). These areas are located in the dorsal stream as opposed
to the form-related areas located in the ventral stream. In sum, there is a broad range of evidence
for the existence of different systems dedicated to the processing of motion and form and that
motion constitutes an independent perceptual dimension. However, there is also evidence that
these systems are not strictly independent, but rather interact.

The Problem of Phenomenal Identity and the Correspondence Problem
After Wertheimer’s pioneering work on apparent motion the major focus of Gestalt psychology
shifted to static images, but there was still a strong emphasis on motion. In his 1925 dissertation,
with Wertheimer as his second reader, Joseph Ternus took up the task of studying how grouping
Apparent Motion and Reference Frames 491

Fig. 23.2  (a) A simple Ternus–Pikler display. (b) An apparent motion stimulus with two different
shapes. (c) The influence of shape is strong in correspondence matching when there is overlap
between stimuli (left) and becomes weaker as the overlap is eliminated (right). (d) A stimulus
configuration used by Ternus to investigate the relationship between local motion matches and
global shape configurations.

principles can be applied to stimuli in motion. The fundamental question he posed was what he
termed the problem of phenomenal identity: ‘Experience consists far less in haphazard multiplicity
than in the temporal sequence of self-identical objects. We see a moving object, and we say that
‘this object moves’ even though our retinal images are changing at each instant of time and for
each place it occupies in space. Phenomenally the object retains its identity’ (Ternus 1926). He
adopted a stimulus previously used by Pikler (1917), shown in Figure 23.2(A).
The first frame of this stimulus contains three identical elements. In the second frame, these elements
are displaced so that some of them overlap spatially with the elements in the previous frame. In the
example of Figure 23.2(A), the three discs are shifted by one interdisc distance so that two of the discs
overlap across the two frames. Given that all elements in the two frames are identical, one can
ask how the elements will be grouped across the two frames. This question was later termed the
‘motion correspondence’ problem. If we consider the central disc in frame 2 (Figure 23.2A), will this disc be grouped
with the rightmost disc of the first frame based on their common absolute spatial location, i.e. the same
retinal position, or will it be grouped with the central disc of frame 1 based on their relative position as
the central elements of spatial groups of three elements? The answer to this question turned out to be
quite complex, with several variables influencing the outcome. For example, when the ISI between the
two frames is short, the leftmost element in the first frame appears to move to the rightmost element
in the second frame while the spatially overlapping elements in the centre appear stationary (i.e. they
are grouped together). For longer ISIs, a completely different organization emerges: the three elements
appear to move in tandem as a group, i.e. their relative spatial organization prevails in the spatiotempo-
ral organization. These two distinct percepts are called element and group motion, respectively. Many
other variables, such as interelement separation, element size, spatial frequency, contrast, ISI, lumi-
nance, frame duration, eccentricity, and attention influence which specific organization emerges as
the prevailing percept (e.g. Pantle and Picciano 1976; Pantle and Petersik 1980; Breitmeyer and Ritter
1986a, 1986b; Casco and Spinelli 1988; Dawson et al. 1994; He and Ooi 1999; Alais and Lorenceau
2002; Ma-Wyatt et al. 2005; Aydin et al. 2011; Hein and Moore 2012). Like many other Gestalt grouping
phenomena, spatiotemporal grouping is governed by multivariate complex processes (see the demos
TP Feature Bias, TP Element Motion, TP Group Motion, TP Complex Configuration Long ISI, TP
Complex Configuration Short ISI).
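The two competing organizations can be made concrete as explicit correspondence matchings. The following toy sketch (positions and function names are ours, chosen for illustration) pairs each disc of frame 2 with a disc of frame 1 under the two criteria described above:

```python
# Two candidate organizations of a Ternus-Pikler display, written as
# correspondence matchings. Frame 1 shows three discs at positions 0, 1, 2;
# frame 2 shifts them by one inter-disc distance to 1, 2, 3, so two discs
# overlap across frames. Units are arbitrary.

frame1 = [0, 1, 2]
frame2 = [1, 2, 3]

def element_motion(f1, f2):
    """Retinotopic matching (short ISIs): spatially overlapping discs pair
    up and appear stationary; the leftover leftmost disc appears to jump
    to the leftover rightmost position."""
    overlap = [(p, p) for p in f1 if p in f2]
    return overlap + [(min(f1), max(f2))]

def group_motion(f1, f2):
    """Relative-position matching (long ISIs): the whole triplet moves in
    tandem, i-th disc to i-th disc."""
    return list(zip(f1, f2))

element_motion(frame1, frame2)  # [(1, 1), (2, 2), (0, 3)]
group_motion(frame1, frame2)    # [(0, 1), (1, 2), (2, 3)]
```

The real phenomenon is of course graded, depending on the many variables listed above; the sketch only fixes the two limiting organizations.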

Form–Motion Interactions
How local form information influences the perception of motion
The apparent motion stimulus lends itself nicely to the study of form–motion interactions (for other
examples of form–motion interactions see Blair et al., this volume). Remember that Zeno claimed
that motion is an illusion created by the observer in order to reconcile the existence of an object
at two different spatial locations at two different instants of time. The observer would compare the
two stimuli from memory and if a suitable match is found a phenomenal identity will be attributed
to these two stimuli as two instances of the same object. Perceived motion from one object to the
other would signal the conclusion that these two objects are one and the same. Thus, according to
this view, form analysis is a precursor of motion perception and the match of the form of the two
objects is a prerequisite for motion perception. This can be tested directly by creating an apparent
motion stimulus where the shapes presented in the two frames are different (Figure 23.2B; see also
the demo ‘AM—different shapes’). Many such experiments have been carried out showing that
form has little effect on the perception of apparent motion, i.e. motion percepts between the two
stimuli are strong (Kolers 1972). In the example of Figure 23.2(B), one perceives the square morphing
into a circle along the path of apparent motion. That the shape of an object in apparent motion
should remain constant can, in general, be expected to hold only for small displacements. This is
because the proximal stimulus is a two-dimensional projection of a three-dimensional object, and
during motion one experiences perspective changes resulting in different views of the object. It is
this very fact that Ternus used in defining the problem of phenomenal identity.
In the case of the example shown in Figure 23.2(B) there is no motion ambiguity and the interpre-
tation of an object whose form changes (presumably due to perspective change) appears to be a nat-
ural solution. What happens, however, if the correspondences in the display are more complex and
represent ambiguities such as the ones shown in Figure 23.2(C)? Results indicate that form informa-
tion (or in general feature information such as colour or texture) can be used to resolve ambigui-
ties in the case where there is physical overlap between elements of the two frames (Ternus–Pikler
displays; see for example the demo ‘TP—feature bias’) but this influence becomes weaker when the
overlap is reduced and the distance between the elements is increased (Hein and Cavanagh 2012).
Taken together, all these results indicate that motion and form are separate but interacting systems.

How local motion information influences the perception of form


Having answered the question of how local form information can influence motion perception,
one can ask the converse question: how can local motion information influence form perception?
Figure 23.2(D) shows one of Ternus’ displays in which each static frame consists of dots
grouped into global shapes. One can see a vertical line and a diamond shape which are moved left
to right and right to left, respectively. However, the strength of the static groups cannot predict the
perceived forms in motion; i.e. the percept in Figure 23.2(D) does not correspond to a line mov-
ing right and a diamond moving left. Instead, at short ISIs, the three horizontally aligned central
dots appear stationary while the outer dots appear to move rightwards. For longer ISIs, the percept
Fig. 23.3  (a) Two stimulus configurations studied by Duncker. The top diagrams represent the
stimuli and the bottom ones depict the corresponding percepts. Left panels: induced motion.
Right panels: rolling wheel illusion. (b) An example illustrating Johansson’s vector decomposition
principles: a, the stimulus; b, the decomposition of the motion of the central dot so as to identify
common vector components for all three dots; c, the resulting percept.

appears to be that of a single object rotating 180 degrees in three dimensions (Ternus 1926). Note
that in these complex displays, multiple possible correspondences of motion exist (e.g. Dawson and
Wright 1994; Otto et al. 2008) and the percept may vary from subject to subject, or even from trial
to trial for the same subject. The reader can experiment with the demo ‘TP complex configuration’.
Having established that form and motion information interact, the next question is to under-
stand how. Combining signals from form and motion systems requires a common basis upon
which they can be expressed. In other words, what is the reference frame that allows interac-
tions between these two systems? We will proceed first by discussing reference frames within the
motion system and then by extending these reference frames to form computations.

Reference Frames
Relativity of motion and reference frames
The work of Gestalt psychologist Karl Duncker was instrumental in highlighting the importance
of reference frames in perception (Duncker 1929; for review see Wallach 1959; Mack 1986). In
one of his experiments, he presented a small stimulus embedded in a larger one (Figure 23.3A,
left panel). He moved the large surrounding stimulus while keeping the smaller one stationary.
Observers perceived the smaller stimulus as moving in the direction opposite to the physical
motion of the surrounding stimulus (for a recent paper with demos see Anstis and Casco 2006).
To account for this illusory induced motion, he proposed that the larger surrounding stimulus
served as the reference frame against which the position of the embedded stimulus is computed.
The right panel of Figure 23.3(A) shows another configuration studied by Duncker, the ‘rolling
wheel’. If a light dot stimulus is placed on the rim of a wheel rolling in the dark, the perceived
trajectory of this dot is cycloidal. If a second dot at the centre of the wheel is added to the display,
one perceives the central dot to move in a linear trajectory and the dot on the rim is perceived
to rotate around the central dot. In other words, the central dot serves as a reference against
which the motion of the second dot is computed (for demos on the relativity of motion using the
Ternus–Pikler paradigm, the reader is referred to Boi et al. 2009).
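The rolling-wheel decomposition can be checked numerically: in scene coordinates the rim dot traces a cycloid, but expressed relative to the hub it is simple circular motion at a fixed distance. A minimal sketch (function name and parameter values are illustrative):

```python
# Rolling-wheel sketch: a dot on the rim of a wheel rolling along the
# x-axis traces a cycloid in scene (retinotopic) coordinates, yet relative
# to the hub it simply rotates at a constant distance.
import math

def rim_dot(t, radius=1.0, speed=1.0):
    """Scene coordinates of the hub and of a dot on the rim at time t."""
    phase = speed * t
    hub = (radius * phase, radius)              # hub translates linearly
    rim = (hub[0] - radius * math.sin(phase),   # cycloid trajectory
           hub[1] - radius * math.cos(phase))
    return hub, rim

# In the hub-centred reference frame the rim dot stays at distance `radius`:
for t in (0.0, 0.5, 1.0, 2.0):
    hub, rim = rim_dot(t)
    dist = math.hypot(rim[0] - hub[0], rim[1] - hub[1])
    assert abs(dist - 1.0) < 1e-9
```

This is exactly Duncker's observation: adding the hub dot gives the visual system the reference frame in which the rim dot's motion is simple.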
To explain these effects, Johansson (1973) proposed a theory of vector analysis based on three
principles. The first principle states that elements in motion are always perceptually related to each
other. According to his second principle, simultaneous motions in a series of proximal elements
perceptually connect these elements into rigid perceptual units. Finally, when the motion vectors
of proximal elements can be decomposed to produce equal and simultaneous motion compo-
nents, per the second principle, these components will be perceptually united into the percept of
common motion. Figure 23.3(B) illustrates these concepts. Figure 23.3(B-a) shows the stimulus. By
the first principle, the movements of these dots are not perceived in isolation but are related to
each other. By the second principle, the top and bottom dots are connected together as a single
rigid unit moving together horizontally. By the third principle, a horizontal component equal to
and simultaneous with the horizontal motion of the top and bottom dots is extracted from the
motion of the central dot (Figure 23.3(B-b)). The resulting percept is the horizontal movement
of three dots during which the central dot moves up and down between the two flanking dots
(Figure 23.3(B-c)) (Johansson 1973).
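Johansson's decomposition amounts to subtracting a common vector component from each element's retinal motion and keeping the residual as the within-frame motion. A toy sketch of this arithmetic (the velocity values are illustrative, not measurements from the chapter):

```python
# Johansson-style vector decomposition sketch: extract the component
# common to all elements (the shared horizontal motion of the rigid
# flanking pair) and express each element's motion as common + residual.
# Velocities are (vx, vy) pairs in arbitrary units.

dots = {
    "top":    (2.0, 0.0),
    "bottom": (2.0, 0.0),
    "centre": (2.0, 1.5),   # moves obliquely in retinal coordinates
}

# Common component: the motion of the rigid unit formed by top and bottom.
common = dots["top"]

residuals = {name: (vx - common[0], vy - common[1])
             for name, (vx, vy) in dots.items()}

residuals["centre"]  # (0.0, 1.5): pure vertical motion within the frame
residuals["top"]     # (0.0, 0.0): carried entirely by the common motion
```

The percept corresponds to the residuals read against the moving common frame: three dots translating together while the centre dot oscillates vertically between the flankers.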
In a more natural setting, the distal stimulus generates a complex optic flow pattern on the
retina. For example, while watching a street scene, one perceives the background (shops, houses,
etc.) as stationary, the cars and pedestrians as moving with respect to this stationary background,
and the legs and arms of pedestrians as undergoing periodic motion with respect to their body,
their hands moving with respect to the moving arms, etc. Thus, the stimulus can be analysed as
a hierarchical series of moving reference frames, and motions are perceived with respect to the
appropriate reference frame in the hierarchy (e.g. the hand with respect to the arm, the arm with
respect to the body). While powerful and intuitively appealing, the basic principles of this theory
are not sufficient to specify unambiguously how vectors will be decomposed in complex natu-
ralistic stimuli. In fact, a vector can be expressed as the sum of infinitely many pairs of vectors,
and it is not clear a priori how to predict which combination will prevail for complex stimuli.
The difficulty faced here is similar to the one encountered when we attempt to apply the Gestalt
‘laws’ derived from simple stimuli to complex stimuli. To address this issue, Gestaltists proposed
the ‘law of Prägnanz’ (or the law of good Gestalt) which states that among the different possible
organizations, the one that is the ‘simplest’ is the one that will prevail (Koffka 1935; Cutting and
Proffitt 1982; for a review see van der Helm, this volume). However, the criterion for ‘simplest’
remains arbitrary and elusive. The same concept has been adopted by other researchers who
tried to quantify the simplicity of organizations. For example, Restle (1979) adopted coding
theory, in which different solutions are expressed as quantifiable ‘codes’. A stimulus undergoing
circular motion can be described by three parameters: amplitude, phase, and wavelength.
Restle used the number of parameters describing a configuration as the ‘information load’ and
predicted that the configuration with the lowest information load would be the preferred (i.e.
perceived) configuration. Dawson (1991) used a neural network to combine three heuristics in
solving the correspondence problem. However, these approaches all suffer from the same general
problems: as acknowledged by Restle, the method does not have an automatic way to generate all
possible interpretations. Moreover, the choice of parametrization and its generality, the heuristics,
their benefits and costs, as well as the optimization criteria, remain arbitrary.
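Restle's "information load" idea can be rendered as a toy comparison of candidate codes. The codes and parameter counts below are illustrative stand-ins, not Restle's actual codings:

```python
# Sketch of Restle's (1979) information-load criterion: each candidate
# organization is a code whose cost is its parameter count, and the
# lowest-load code is predicted to be the perceived organization.

candidates = {
    # common linear translation (2 params) plus one relative circular
    # component coded by amplitude, phase, and wavelength (3 params)
    "hierarchical (common + relative)": 2 + 3,
    # each of two elements coded independently with a full motion code
    "independent elements": (2 + 3) + (2 + 3),
}

predicted = min(candidates, key=candidates.get)
# predicted == "hierarchical (common + relative)"
```

The general objection in the text applies directly to the sketch: nothing in the procedure generates the candidate set itself, and the parameter counts assigned to each code are a modelling choice.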

Object file theory


Kahneman and colleagues addressed the problem of phenomenal identity by adapting two
concepts from computer science, namely addresses and files (Kahneman et al. 1992). The fun-
damental building blocks of their theory are ‘object files’, each containing information about
a given object. These files establish and maintain the identities of objects. According to their
theory, an object file is addressed not by its contents but by the location of the object at a
given time.2 This location-based index is a type of reference frame discussed in the section on
Relativity of motion and reference frames. However, by restricting the file addressing mecha-
nism to a spatial location, this theory faces many shortcomings. In the object file theory, fea-
tures are available on an instant-by-instant basis and get inserted into appropriate files. On
the other hand, feature processing takes time. Without specifying the dynamics of feature
processing, the theory ends up in a bootstrapping vicious circle. When and how is the opening
of an object file triggered? Since an object is defined by features, initial evidence for opening a
file for an object necessitates that at least some of the relevant features of the object are already
processed; however, the processing of features for a specific object requires that a file for that
object is already opened.
Typical experiments used within the context of the object file theory include static preview
conditions whose ‘main end product [...] is a set of object files’ (Kahneman et al. 1992). However,
under normal viewing conditions objects often appear from our peripheral field or behind occlu-
sions, necessitating mechanisms that can operate in the absence of static preview conditions.
Another problem with object file theory is that while vision has geometry, ‘files’ do not specify
a geometric structure. Objects have a spatial extent and thus the location of an object cannot be
abstracted from its features. Assume that the centroid of an object is used as its location index.
To put features in the file indexed by this location, one needs to know not just one location
index but the retinotopic extent of the object, which in turn necessitates surface and bound-
ary features. Moreover, as we will discuss below (Feature attribution and occlusion problems),
objects may occlude each other. The insertion of the correct features into the correct object files cannot be
accomplished by location indices alone; information on spatial extent and occlusion needs to be
represented as well.
In sum, while all this work highlights the importance of motion grouping and motion-based
reference frames, a deeper understanding of why the visual system needs reference frames may
provide the constraints necessary to determine how and why reference frames are established.

The Need for Reference Frames


The problems of motion blur and moving ghosts
In order to appreciate why reference frames are needed, consider first the fact that humans are
mobile explorers and interact constantly with other moving objects. The input to our visual system

is conveyed following the optics of the eye. The mechanism of image formation can be described
by projective geometry. Neighbouring points in the environment are imaged on neighbouring
photoreceptors in the retina. The projections from retina to early visual cortical areas preserve
these neighbourhood relationships, creating a retinotopic representation of the environment. To
analyse the impact of motion on these representations we need to consider the dynamical
properties of the visual system.

2  A similar concept was also proposed by Pylyshyn in his FINST theory (Pylyshyn 1989). Several extensions and variants of the object file theory have been proposed, including the detailed analysis of object updating (Moore and Enns 2004; Moore et al. 2007) and hierarchies in object structures (Lin and He 2012).
A fundamental dynamical property of vision is visible persistence: under normal viewing
conditions, a briefly presented stationary stimulus remains visible for approximately 120 ms after
its physical offset (e.g. Haber and Standing 1970; Coltheart 1980). Based on this duration of
visible persistence, we would expect moving objects to appear highly blurred. For example, a
target moving at 10 degrees per second would generate a trailing smear of 1.2 degrees. The situ-
ation is similar to taking pictures of moving objects with a film camera at an exposure dura-
tion that mimics visible persistence. Not only do the moving objects exhibit extensive motion
smear, they also have a ghost-like appearance without any significant form information. This is
because static objects remain for long enough on a fixed region of the film to expose the chemi-
cals sufficiently while moving objects expose each part of the film only briefly, thus failing to
provide sufficient exposure to any specific part of the film. Similarly, in retinotopic representa-
tions, a moving object will stimulate each retinotopically localized receptive field briefly, and
incompletely processed form information would spread across the retinotopic space just like the
ghost-like appearances in photographs (Öğmen 2007). Unlike photographic images, however,
in human vision objects in motion typically appear relatively sharp and clear (Ramachandran
et al. 1974; Burr 1980; Burr et al. 1986; Bex et al. 1995; Westerink and Teunissen 1995; Burr and
Morgan 1997; Hammett 1997).
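The smear figure quoted above is simple arithmetic: predicted smear extent equals speed multiplied by the persistence duration. A one-line helper (the function name is ours) makes the calculation explicit:

```python
# Back-of-the-envelope smear prediction from the chapter: with ~120 ms of
# visible persistence, a moving target should leave a trailing smear whose
# extent is speed x persistence.

def predicted_smear_deg(speed_deg_per_s, persistence_s=0.120):
    """Expected extent of motion smear, in degrees of visual angle."""
    return speed_deg_per_s * persistence_s

predicted_smear_deg(10.0)  # roughly 1.2 degrees, the value given in the text
```

The puzzle, of course, is that perceived smear is far smaller than this prediction, which is what motivates the deblurring mechanisms discussed next.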
In normal viewing, we tend to track moving stimuli with pursuit eye movements and thereby
stabilize them on the retina. While pursuit eye movements can help reduce the perceived blur of
a moving object (Bedell and Lott 1996), the problem of motion blur remains for other objects
present in the scene, since we can pursue only one object at a time. Eye movements also cause
a retinotopic movement for the stationary background, creating the blur problem for the back-
ground. Furthermore, the initiation of an eye movement can take about 150–200 ms during which
a moving object can generate considerable blur. How does the visual system solve the problems
of motion blur and moving ghosts? A potential solution to the motion blur problem is the use of
mechanisms that inhibit motion smear in retinotopic representations (Öğmen 1993, 2007; Chen
et al. 1995; Purushothaman et al. 1998). A potential solution to the moving ghosts problem is the
use of reference frames that move along with moving objects rather than being anchored in reti-
notopic coordinates (Öğmen 2007).

The problems of dynamic occlusions and feature attribution


When an object moves, a variety of dynamic occlusions occur. The object occludes different parts
of the background and, depending on depth relations, either occludes or gets occluded by other
objects in the scene. Moreover, as its perspective view changes with respect to the observer, its vis-
ible features also change due to self-occlusion. All these dynamic considerations lead to two inter-
related questions: First, as highlighted by Ternus, how does the object maintain its identity despite
the changes in its features? Second, due to these occlusions, features of different objects become
dynamically entangled. How does the visual system attribute features to the various objects in a
consistent manner? As discussed in the previous sections, a possible solution to maintain object
identities is to establish motion correspondences and to arrange the resulting motion vectors as
Fig. 23.4  Stimulus arrangements used by (a) McDougall (1904) corresponding to metacontrast,
(b) Piéron (1935) corresponding to sequential metacontrast, and (c) Otto et al. (2006) to analyse
feature attribution in sequential metacontrast.

a hierarchical set of reference frames. These exo-centred reference frames3 establish and maintain
the identity of objects in space and time. As we discuss in the section Non-retinotopic Feature
Attribution, these reference frames can also provide the basis for feature attribution.

Non-retinotopic Feature Attribution


Sequential metacontrast and non-retinotopic feature attribution
The earliest studies of motion blur and deblurring can be traced back to McDougall (1904) and
Piéron (1935). Figure 23.4 depicts the stimulus arrangements used by these researchers. As
mentioned in the section on the problems of motion blur and moving ghosts, the motion blur generated
by a moving stimulus can be ‘deblurred’ by inhibitory mechanisms in retinotopic representations.
In fact, McDougall reported that the blur generated by the leading stimulus ‘a’ in Figure 23.4(A)
could be curtailed by adding a second stimulus, labelled ‘b’ in Figure 23.4(A) in spatiotemporal
proximity. The specific type of masking where the visibility of a target stimulus is suppressed by
a spatially non-overlapping and temporally lagging stimulus is called metacontrast (Bachmann
1994; Breitmeyer and Öğmen 2006).

3  Reference frames can be broadly classified into two types: ego-centred reference frames are those centred on
the observer (e.g. eye-centred, head-centred, limb-centred); exo-centred reference frames are those centred
outside the observer (e.g. centred on an object in a scene).

Piéron (1935) modified McDougall’s stimulus to devise a ‘sequential’ version as shown in
Figure 23.4(B). This sequential stimulus provides a temporally extended apparent motion and
metacontrast stimulus that can be used to illustrate the phenomenon of motion deblurring.
It can also be used to study the feature attribution problem. Figure 23.4(C) shows a version of
sequential metacontrast where the central line contains a form feature: a small Vernier offset is
introduced by shifting the upper segment of the line horizontally with respect to the lower seg-
ment (Otto et al. 2006). In this stimulus, the central line containing the Vernier offset is invisible
to the observer because it is masked by the two flanking lines. One perceives two streams of
motion, one to the left and one to the right. The question of feature attribution is the follow-
ing: what happens to the feature presented in the central invisible element of the display? Will
it also be invisible, or will it be attributed to motion streams? The results of experiments using
various versions of this sequential metacontrast stimulus show that features of the invisible
stimuli are attributed to motion streams and integrated with other features presented within
each individual motion stream. In other words, features are processed according to reference
frames that move according to the motion vector of each stream (Otto et al. 2006, 2008, 2009,
2010a, 2010b).

Ternus–Pikler displays and non-retinotopic feature attribution in the presence of retinotopic conflict
Ternus–Pikler displays are designed to directly pit retinotopic relations against non-retinotopic
grouping relations. This property offers the advantage of directly assessing whether features are
processed according to retinotopic or grouping relations (Öğmen et al. 2006). Figure 23.5 shows
an example of how the Ternus–Pikler display is used for studying feature attribution. As a feature,
a Vernier offset, called the ‘probe Vernier’, is inserted into the central element of the first
frame (Figure 23.5). Observers were asked to report the perceived offset direction for elements
in the second frame, numbered 1, 2, and 3 in the left-hand part of Figure 23.5(D). None of these
elements contained a Vernier offset and naïve observers did not know where the probe Vernier
was located. Consider first the control condition in Figure 23.5(E), obtained by removing the
flanking elements from the two frames. In this case no motion is perceived. Based on retinotopic
relations, the probe Vernier should be integrated with element 1 in the second frame and the
agreement of observers’ responses with the direction of probe-Vernier offset should be high for
element 1 and low for element 2. If processing of the Vernier were to occur according to reti-
notopic relations, one would predict the same outcome for the Ternus–Pikler display regardless
of whether element or group motion is perceived. On the other hand, if feature processing and
integration take place according to motion grouping relations (Figure 23.5B, C), instead of reti-
notopic relations, one would expect the probe Vernier to integrate with element 1 in the case of
element motion (Figure 23.5B) and with element 2 in the case of group motion (Figure 23.5C).
The results of this experiment along with those conducted with a more complex combination of
features show that form features are computed according to motion grouping relations, in other
words, according to a reference frame that moves according to prevailing motion groupings in
the display (Öğmen et al. 2006).
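The logic of the experiment can be condensed into two prediction functions, one per hypothesis, mapping the perceived organization to the element of frame 2 that should inherit the probe Vernier (element labels as in Fig. 23.5; the function names are ours):

```python
# The two hypotheses tested with the probe vernier in the Ternus-Pikler
# display, written as explicit prediction functions.

def retinotopic_prediction(percept):
    """Features integrate at the same retinal location: always element 1,
    regardless of the perceived organization."""
    return 1

def grouping_prediction(percept):
    """Features follow the motion correspondences of the prevailing
    organization: element 1 under element motion, element 2 under
    group motion."""
    return {"element motion": 1, "group motion": 2}[percept]

# The two accounts agree for element motion but diverge for group motion,
# which is what makes the display diagnostic:
grouping_prediction("group motion")     # 2 (the pattern the data follow)
retinotopic_prediction("group motion")  # 1
```

As the text notes, the results follow the grouping predictions, supporting feature processing in a reference frame that moves with the prevailing motion grouping.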
In follow-up studies, this paradigm has been applied to other visual computations and it has been
shown that form, motion, visual search, attention, and binocular rivalry all have non-retinotopic
bases (Boi et al. 2009, 2011b). Non-retinotopic computation of various stimulus features has also
been supported by other paradigms using motion stimuli (Shimozaki et al. 1999; Nishida 2004;
Nishida et al. 2007; Kawabe 2008) or attentional tracking (Cavanagh et al. 2008). On the other
[Figure 23.5 panels: (a) Ternus–Pikler display; (b) element motion (ISI = 0 ms); (c) group motion (ISI = 100 ms); (d) stimulus with the probe vernier and corresponding results; (e) control stimulus and corresponding results. In panels (d) and (e), responses in agreement with the probe vernier (%) are plotted against the label of the attended line, for ISI = 0 ms and ISI = 100 ms.]
Fig. 23.5  (a) The Ternus–Pikler display and the associated percepts of ‘element motion’ (b) and
‘group motion’ (c). The dashed arrows in panels B and C depict the perceived motion correspondences
between the elements in the two frames. (d) Experimental results for the Ternus–Pikler stimulus
and (e) the control stimulus. Reprinted from Vision Research, 46 (19), Haluk Öğmen, Thomas U. Otto,
and Michael H. Herzog, Perceptual grouping induces non-retinotopic feature attribution in human
vision, pp. 3234–42, Figures 1a–c, 2a, and 2c, Copyright (2006), with permission from Elsevier.
hand, not all processes are non-retinotopic; motion and tilt adaptation have been found to be
retinotopic (Wenderoth and Wiese 2008; Knapen et al. 2009; Boi et al. 2011a) indicating that they
are by-products of computations occurring prior to the transfer of information from retinotopic
to non-retinotopic representations.

Concluding Remarks
Motion is ubiquitous in the ecological environment and most biological systems devote extensive
neural processing to its analysis. This importance has been recognized by philosophers and sci-
entists who have carried out extensive studies on how motion is processed and perceived. While
there has been convergence in the types of computational models that can detect motion, the
broader issue of how motion is organized as a spatiotemporal Gestalt remains a challenging ques-
tion. The discovery of the relativity of motion led to the introduction of hierarchical reference
frames according to which part–whole relations can be constructed. This chapter has provided a
review of why reference frames are needed from ecological and neurophysiological (retinotopic
organization) perspectives. These analyses show that reference frames are needed not just for
motion computation but for all stimulus attributes. We expect future research to develop in more
depth the properties of these reference frames which will provide a common geometry wherein
all stimulus attributes can be processed jointly.

References
Alais, D. and J. Lorenceau (2002). ‘Perceptual grouping in the Ternus display: Evidence for an ‘association
field’ in apparent motion’. Vision Res 42: 1005–1016.
Albright, T. D. and G. R. Stoner (1995). ‘Visual motion perception’. Proc Natl Acad Sci USA 92: 2433–2440.
Anstis, S. and C. Casco (2006). ‘Induced movement: the flying bluebottle illusion’. J Vision
10(8): 1087–1092.
Aydin, M., M. H. Herzog, and H. Öğmen (2011). ‘Attention modulates spatio-temporal grouping’. Vision
Res 51: 435–446.
Bachmann, T. (1994). Psychophysiology of Visual Masking: the Fine Structure of Conscious Experience
(New York: Nova Science Publishers).
Barlow H. B. and W. R. Levick (1965). ‘The mechanism of directionally selective units in rabbit’s retina’.
J Physiol 178: 477–504.
Bedell, H. E. and L. A. Lott (1996). ‘Suppression of motion-produced smear during smooth-pursuit
eye-movements’. Curr Biol 6: 1032–1034.
Bex, P. J., G. K. Edgar, and A. T. Smith (1995). ‘Sharpening of blurred drifting images’. Vision Res 35: 2539–2546.
Boi, M., H. Öğmen, J. Krummenacher, T. U. Otto, and M. H. Herzog (2009). ‘A (fascinating) litmus test for
human retino- vs. non-retinotopic processing’. J Vision 9(13): 5.1–11; doi: 10.1167/9.13.5.
Boi, M., H. Öğmen, and M. H. Herzog (2011a). ‘Motion and tilt aftereffects occur largely in retinal, not in
object coordinates, in the Ternus–Pikler display’. J Vision 11(3): 7.1–11; doi: 10.1167/11.3.7.
Boi, M., M. Vergeer, H. Öğmen, and M. H. Herzog (2011b). ‘Nonretinotopic exogenous attention’. Curr Biol
21: 1732–1737.
Breitmeyer, B. G. and H. Öğmen (2006). Visual Masking: Time Slices through Conscious and Unconscious
Vision, 2nd edn (Oxford: Oxford University Press).
Breitmeyer, B. G. and A. Ritter (1986a). ‘The role of visual pattern persistence in bistable stroboscopic
motion’. Vision Res 26: 1801–1806.
Breitmeyer, B. G. and A. Ritter (1986b). ‘Visual persistence and the effect of eccentric viewing, element
size, and frame duration on bistable stroboscopic motion percepts’. Percept Psychophys 39: 275–280.
Apparent Motion and Reference Frames 501

Burr, D. (1980). ‘Motion smear’. Nature 284: 164–165.
Burr, D. C. and M. J. Morgan (1997). ‘Motion deblurring in human vision’. Proc R Soc Lond B
264: 431–436.
Burr, D. C., J. Ross, and M. C. Morrone (1986). ‘Seeing objects in motion’. Proc R Soc Lond B
227: 249–265.
Casco, C. and D. Spinelli (1988). ‘Left-right visual field asymmetry in bistable motion perception’.
Perception 17: 721–727.
Cavanagh, P., A. O. Holcombe, and W. Chou (2008). ‘Mobile computation: spatiotemporal integration of
the properties of objects in motion’. J Vision 8(12): article 1; doi: 10.1167/8.12.1.
Chen, S., H. E. Bedell, and H. Öğmen (1995). ‘A target in real motion appears blurred in the absence of
other proximal moving targets’. Vision Res 35: 2315–2328.
Coltheart, M. (1980). ‘Iconic memory and visible persistence’. Percept Psychophys 27: 183–228.
Cutting, J. E. and D. R. Proffitt (1982). ‘The minimum principle and the perception of absolute, common,
and relative motions’. Cogn Psychol 14: 211–246.
Dawson, M. R. W. (1991). ‘The how and why of what went where in apparent motion: modeling solutions
to the motion correspondence problem’. Psychol Rev 98: 569–603.
Dawson, M. R. W. and R. D. Wright (1994). ‘Simultaneity in the Ternus configuration: psychophysical data
and a computer model’. Vision Res 34: 397–407.
Dawson, M. R. W., N. Nevin-Meadows, and R. D. Wright (1994). ‘Polarity matching in the Ternus
configuration’. Vision Res 34: 3347–3359.
Duncker, K. (1929). ‘Über induzierte Bewegung (Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung)’. Psychol Forsch 12: 180–259.
Exner, S. (1875). ‘Experimentelle Untersuchungen der einfachsten psychischen Prozesse’. Pflugers Arch
Gesamte Physiol 11: 403–432.
Gepshtein, S. and M. Kubovy (2007). ‘The lawful perception of apparent motion’. J Vision 7(8): 9.1–15.
Haber, R. N. and L. Standing (1970). ‘Direct estimates of the apparent duration of a flash’. Can J Psychol 24:
216–229.
Hammett, S. T. (1997). ‘Motion blur and motion sharpening in the human visual system’. Vision Res
37: 2505–2510.
Hassenstein, B. and W. Reichardt (1956). ‘Systemtheoretische Analyse der Zeit-, Reihenfolgen- und
Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus’. Z Naturforsch
11b: 513–524.
He, Z. J. and T. L. Ooi (1999). ‘Perceptual organization of apparent motion in the Ternus display’. Perception
28: 877–892.
Hein, E. and P. Cavanagh (2012). ‘Motion correspondence in the Ternus display shows feature bias in
spatiotopic coordinates’. J Vision 12(7): pii: 16; doi: 10.1167/12.7.16.
Hein, E. and C. M. Moore (2012). ‘Spatio-temporal priority revisited: the role of feature identity and
similarity for object correspondence in apparent motion’. J Exp Psychol: Human Percept Perform
38: 975–988.
Johansson, G. (1973). ‘Visual perception of biological motion and a model for its analysis’. Percept
Psychophys 14: 201–211.
Johansson, G. (1975). ‘Visual motion perception’. Sci Am 232: 76–88.
Johansson, G. (1976). ‘Spatio-temporal differentiation and integration in visual motion perception’. Psychol
Res 38: 379–393.
Kahneman, D., A. Treisman, and B. J. Gibbs (1992). ‘The reviewing of object files: object-specific
integration of information’. Cogn Psychol 24: 174–219.
Kawabe, T. (2008). ‘Spatiotemporal feature attribution for the perception of visual size’. J Vision 8(8): 7.1–9;
doi: 10.1167/8.8.7.
502 ÖĞMEN AND HERZOG

Knapen, T., M. Rolfs, and P. Cavanagh (2009). ‘The reference frame of the motion aftereffect is retinotopic’.
J Vision 9(5): 16.1–7.
Koffka, K. (1935). Principles of Gestalt Psychology (New York: Harcourt).
Kolers, P. A. (1972). Aspects of Motion Perception (Oxford: Pergamon Press).
Korte, A. (1915). ‘Kinematoskopische Untersuchungen’. Z Psychol 72: 194–296.
Lin, Z. and S. He (2012). ‘Automatic frame-centered object representation and integration revealed by
iconic memory, visual priming, and backward masking’. J Vision 12(11): pii: 24; doi: 10.1167/12.11.24.
Lu, Z.-L. and G. Sperling (2001). ‘Three-systems theory of human visual motion perception: review and
update’. J Opt Soc Am A 18: 2331–2370.
Ma-Wyatt, A., C. W. G. Clifford, and P. Wenderoth (2005). ‘Contrast configuration influences grouping in
apparent motion’. Perception 34: 669–685.
Mack, A. (1986). ‘Perceptual aspects of motion in the frontal plane’. In Handbook of Perception and Human
Performance, edited by K. R. Boff, L. Kaufman, and J. P. Thomas (New York: Wiley), pp. 17-1–17-38.
McDougall, W. (1904). ‘The sensations excited by a single momentary stimulation of the eye’. Br J Psychol
1: 78–113.
Moore, C. M. and J. T. Enns (2004). ‘Object updating and the flash-lag effect’. Psychol Sci 15: 866–871.
Moore, C. M., J. T. Mordkoff, and J. T. Enns (2007). ‘The path of least persistence: object status mediates
visual updating’. Vision Res 47: 1624–1630.
Neuhaus, W. (1930). ‘Experimentelle Untersuchung der Scheinbewegung’. Arch Gesamte Psychol 75:
315–458.
Nishida, S. (2004). ‘Motion-based analysis of spatial patterns by the human visual system’. Curr Biol
14: 830–839.
Nishida, S., J. Watanabe, I. Kuriki, and T. Tokimoto (2007). ‘Human visual system integrates color signals
along a motion trajectory’. Curr Biol 17: 366–372.
Öğmen, H. (1993). ‘A neural theory of retino-cortical dynamics’. Neural Networks 6: 245–273.
Öğmen, H. (2007). ‘A theory of moving form perception: synergy between masking, perceptual grouping,
and motion computation in retinotopic and non-retinotopic representations’. Advances in Cognitive
Psychology 3: 67–84.
Öğmen, H., T. Otto, and M. H. Herzog (2006). ‘Perceptual grouping induces non-retinotopic feature
attribution in human vision’. Vision Res 46: 3234–3242.
Otto, T. U., H. Öğmen, and M. H. Herzog (2006). ‘The flight path of the phoenix-the visible trace of
invisible elements in human vision’. J Vision 6: 1079–1086.
Otto, T. U., H. Öğmen, and M. H. Herzog (2008). ‘Assessing the microstructure of motion correspondences
with non-retinotopic feature attribution’. J Vision 8(7): 16.1–15; doi: 10.1167/8.7.16.
Otto, T. U., H. Öğmen, and M. H. Herzog (2009). ‘Feature integration across space, time, and orientation’.
J Exp Psychol: Human Percept Perform 35: 1670–1686.
Otto, T. U., H. Öğmen, and M. H. Herzog (2010a). ‘Attention and non-retinotopic feature integration’.
J Vision 10: 8.1–13; doi: 10.1167/10.12.8.
Otto, T. U., H. Öğmen, and M. H. Herzog (2010b). ‘Perceptual learning in a nonretinotopic frame of
reference’. Psychol Sci 21(8): 1058–1063.
Pantle, A. J. and J. T. Petersik (1980). ‘Effects of spatial parameters on the perceptual organization of a
bistable motion display’. Percept Psychophys 27: 307–312.
Pantle, A. and L. Picciano (1976). ‘A multistable movement display: evidence for two separate motion
systems in human vision’. Science 193: 500–502.
Piéron, H. (1935). ‘Le processus du métacontraste’. J Psychol Normale Pathol 32: 1–24.
Pikler, J. (1917). Sinnesphysiologische Untersuchungen (Leipzig: Barth).
Purushothaman, G., H. Öğmen, S. Chen, and H. E. Bedell (1998). ‘Motion deblurring in a neural network
model of retino-cortical dynamics’. Vision Res 38: 1827–1842.
Pylyshyn, Z. (1989). ‘The role of location indexes in spatial perception: a sketch of the FINST spatial-index
model’. Cognition 32: 65–97.
Ramachandran, V. S., V. M. Rao, and T. R. Vidyasagar (1974). ‘Sharpness constancy during movement
perception’. Perception 3: 97–98.
Restle, F. (1979). ‘Coding theory of the perception of motion configurations’. Psychol Rev 86: 1–24.
Shimozaki, S. S., M. P. Eckstein, and J. P. Thomas (1999). ‘The maintenance of apparent luminance of an
object’. J Exp Psychol: Human Percept Perform 25: 1433–1453.
Ternus, J. (1926). ‘Experimentelle Untersuchung über phänomenale Identität’. Psychol Forsch 7: 81–136.
Wallach, H. (1959). ‘The perception of motion’. Sci Am 201: 56–60.
Wenderoth, P. and M. Wiese (2008). ‘Retinotopic encoding of the direction aftereffect’. Vision Res
48: 1949–1954.
Wertheimer, M. (1912). ‘Experimentelle Studien über das Sehen von Bewegung’. Z Psychol 61: 161–265.
Westerink, J. H. D. M. and K. Teunissen (1995). ‘Perceived sharpness in complex moving images’. Displays
16: 89–97.
Chapter 24

Perceptual organization and the aperture problem

Nicola Bruno and Marco Bertamini

Introduction: the ambiguity of local motion signals
We live in a world of objects that move. To perceive them, the visual system must use information
in the motion signals available in the spatiotemporal structure of the optic array. These motion
signals, however, are inherently ambiguous. Thus, to perceive moving objects human perception
cannot simply record sensory signals. To overcome ambiguity (underdeterminacy) and to achieve
a coherent global interpretation, sensory motion signals must be combined across space and time.
In this chapter, we review strategies for performing such combination. We argue that the combi-
nation of motion signals cannot be reduced to relatively simple vector operations, such as aver-
aging or intersecting constraints in velocity space, but is instead a complex form of perceptual
organization, which dynamically takes into account the spatial structure of the stimulus. To set
the stage for our discussion of motion organization, we begin with a brief account of the two main
sources of local ambiguity in motion signals: the aperture problem (AP) and the edge classifica-
tion problem (ECP).

The Aperture Problem


Pleikart Stumpf is credited with first describing the AP in motion perception (see Todorović
1996). However, the first analysis of the many facets of the problem was provided by Hans
Wallach (Wuerger et al. 1996). The AP refers to the fundamental ambiguity of the signals that
are available locally from a moving homogeneous straight contour. Consider an infinitely long
contour translating within the visual field. For any point on the contour, any motion signal can
be thought of as the sum of two component vectors: a component in the direction orthogonal
to the orientation of the contour, and a second component along the contour itself. Because
the contour is locally featureless, this second component will not be available as spatiotem-
poral change in the optic array. This has two consequences. First, only the component in the
direction orthogonal to the contour will be available (Figure 24.1a). Second, an infinite set of
physical motions will map onto one, and the same, motion signal at local points on the contour
(Figure 24.1b). The argument can be readily generalized to curved contours or curved trajec-
tories. In this case, the local curvilinear motion can be decomposed into a component along
the tangent to the curve and a component orthogonal to the tangent (see Hildreth 1983). The
argument can also be generalized to multiple local signals in natural images (Kane et al. 2011)
and to other sensory channels. For instance, the AP holds for tactile motion passively perceived
on the skin (Pei et al. 2008).
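The ambiguity described above can be made concrete with a minimal numerical sketch (the 30° contour orientation and the particular velocities are arbitrary illustrative choices): adding any motion along the locally featureless contour leaves the observable signal unchanged, so infinitely many physical motions map onto one motion signal.

```python
import numpy as np

# A contour oriented 30 degrees from horizontal (arbitrary illustrative choice).
theta = np.deg2rad(30)
u = np.array([np.cos(theta), np.sin(theta)])   # unit vector along the contour
n = np.array([-np.sin(theta), np.cos(theta)])  # unit normal to the contour

def observable(v):
    """Only the component of v orthogonal to the contour is measurable."""
    return np.dot(v, n) * n

v = np.array([1.0, 0.0])  # one possible physical motion of the contour
for t in (-2.0, 0.0, 3.5):
    # Motion t*u along the featureless contour produces no spatiotemporal
    # change, so the observable signal is identical for every t.
    assert np.allclose(observable(v + t * u), observable(v))
```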
Perceptual Organization and the Aperture Problem 505


Fig. 24.1  The ambiguity of local motion signals. (a) Consider two contours moving in different
directions relative to the environment (e.g. horizontally and vertically, see black vectors). The
physical motions are the sum of components along the direction of the contour and in the direction
orthogonal to the contour (grey vectors). Because the contour is locally featureless, the component
along the contour cannot be recorded. Thus only the component orthogonal to the contour will
be available and the two physical motions will be indistinguishable (apright.mov, apdown.mov).
(b) In fact, an infinite class of physical motions having different speed and direction (dashed) will
be available as the same motion signal (black orthogonal vector). The orientation of the contour
defines a constraint line (CL) in velocity space. (c) An additional ambiguity arises when the contour
is interpreted as the border of a surface. Consider an orthogonal motion signal at a local point on a
contour. The signal could be due to the left surface progressively covering the background (visible to
its right), to a right surface progressively uncovering a background (visible to its left), or to a circular
hole moving over a stationary edge in the opposite direction. (d) Finally, when two borders meet to
form a T-junction, the local motion signal at the junction is along the hat of the T rather than in the
direction orthogonal to the moving contour.

The Edge Classification Problem


What we call the ECP stems from the need to map local signals on contours to a representation in
terms of oriented surfaces bounded by those contours. This is a deeper problem (Hildreth 1983),
inasmuch as it connects local motion ambiguity to other issues in motion perception. One such
issue is the computation of the global optical velocity field (‘optic flow’) due to motion of the
506 Bruno and Bertamini

viewpoint in the three-dimensional environment. Other issues include the perception of struc-
ture from motion (see Vezzani et al., this volume), and the analysis of moving edges in shadows,
shading, and highlights. In this chapter, we limit our discussion to organization in 2D and to the
segmentation of the scene into figures and grounds. When applied to this domain, the ECP refers
to the fact that the same local motion signal can be attributed to a leading surface edge (progres-
sively covering a background) or to a trailing edge (progressively revealing a background). This
distinction implies a classification of the edge in relation to the surface that owns it within the glo-
bal segmentation of the scene into figure and ground. In the example of Figure 24.1c, the leading
edge interpretation implies that the left surface is the figure and the edge belongs to it; the trailing
edge interpretation, conversely, implies that the right surface is the figure.
Edge classification in turn has consequences for the organization of local motions in relation
to a hierarchy of frames of reference, a topic that we address later in this chapter. Referring again
to the example, the leading edge interpretation implies that the left surface is moving relative to a
background to its right; the trailing edge interpretation, conversely, that the right surface is mov-
ing relative to a background to its left. Additionally, in both interpretations the edge is moving
relative to a stationary aperture. As an alternative, the edge (either belonging to the left or to the
right surface) could be interpreted as stationary, and the aperture itself could be interpreted as
moving relative to the edge and the two surfaces. Thus the same motion signal can be attributed
to either surface or to neither, depending on which region of the scene is interpreted as figure and
which as ground. Contemporary research has begun to reveal constraints and biases that may play
a role in solving this form of the ECP (Barenholtz and Tarr 2009).
An important aspect of the ECP is related to surface edges that meet other edges to form a
T-junction (Figure 24.1d). In these cases, the motion signal at the junction is not orthogonal to
the contour forming the stem of the T but moves along the contour forming the hat of the T. As we
shall see in Section 3, these local ‘terminator’ motion signals play an important part in the global
perception of the movement of contours, and are themselves weighted differently depending on
their classification as ‘intrinsic’ to the line (true endings of a moving object) or ‘extrinsic’ (acci-
dental alignments due to occlusion).

Two stages of motion processing in the brain


An appreciation of the extent to which the AP and the ECP constrain theorizing on the percep-
tual organization of motion can also be achieved by considering motion-processing mechanisms
in the brain. Beginning with the pioneering work of Hubel and Wiesel (1968), it has long been
known that a large proportion of neurons in primary visual area V1 respond best to contours
moving through their receptive fields in a particular direction, whereas their responses are inhib-
ited when contours move in the opposite direction. Different neurons respond best to different
directions, and all directions are represented. Thus, the ensemble of direction-tuned neurons in
V1 may be thought of as a neural network recording motion signals from spatiotemporal changes
in the optic array. Each individual neuron in the ensemble, however, has its own spatially limited
receptive field. These receptive fields can be construed as local apertures, and within these aper-
tures direction-selective neurons will respond most strongly in the direction orthogonal to the
moving contour, independent of its actual direction.
Beyond V1, it is generally recognized that a key role in motion processing is played by neu-
rons in V5, the human homologue of the monkey middle temporal area MT (Tootell et al. 1995).
Albright (1984) compared direction selectivity of neurons in V1 and in area MT of the macaque.
In area MT orientation-tuning is broader, and orientation preference is orthogonal to motion
preference, but in some cases it is parallel to it. In striate and extrastriate areas motion selectiv-
ity is secondary to direction selectivity (Gizzi et al. 1990). By contrast, in temporal areas there is
selectivity for global motion, defined as the motion of a whole pattern. When contours form a
pattern, neurons do not respond to the motion per se, but to the motion of the configuration as a
whole. Finally, several other visual areas are known to receive MT output, including areas coding
complex motions such as expansion and rotation (Tanaka and Saito 1989) and eye movements
(Schall 2000).
Although the functional interpretation of these networks remains the object of empirical inves-
tigation and theoretical debate (see Grossberg and Mingolla 1993; Grossberg 2011), it is clear
that higher-level motion processing in the human brain involves long-range, integrative interac-
tions. These interactions are thus quite consistent with the notion that global motion perception
involves sophisticated processes of organization and interpretation of the local signals to solve the
AP and ECP. In the following sections, we review some of these processes.

Structure-blind strategies for overcoming the AP


Several computational models have proposed strategies to solve the AP. The term ‘strategy’ of
course refers to computational rules in neural networks, not to explicit or conscious decisions. An
important strength of these models is that they are based only on bottom-up operations on local
motion signals. In other words, they do not require contributions from other bottom-up visual
mechanisms that code aspects of the global stimulus structure, such as those that achieve unit
formation and figure-ground stratification, process three-dimensional form, and hierarchically
organize motions in relation to multiple frames of reference. For this reason we refer to the strate-
gies adopted in these models as structure-blind strategies.

IOC, FT, and VA


Three structure-blind strategies for solving the AP have been proposed (Figure 24.2). The first and
earliest is the intersection of constraints (IOC) strategy (Adelson and Movshon 1982; Fennema
and Thompson 1979). Because of the AP, for each moving contour the direction of the orthogonal
component vector defines a line of constraints in velocity space for the corresponding physical
motions (see Figure 24.2a). The set of physical motion vectors that are consistent with the con-
straint line identifies the possible solutions for the AP. In a pattern with two contours, the intersec-
tion of the constraint lines of both contours identifies a unique vector common to both solutions
sets. This vector is the veridical motion of the pattern, assuming rigidity. The second strategy is
the feature-tracking (FT) strategy, which consists in tracking identifiable features of a moving
contour or contours (Alais et al. 1997). In a pattern consisting of the superposition of two gratings,
for instance, one such feature is the ‘X’ junction at the intersection of each contour. The motion
of these features also corresponds to the veridical motion of the pattern (Figure 24.2a). The third
one, finally, is the vector average (VA) strategy (Wilson et al. 1992). This consists in determining
the vector that lies halfway between the two component vectors (Figure 24.2b). This vector often
has the same direction (although not necessarily the same magnitude) as the IOC or FT
solutions. However, in some critical cases the VA solution can differ from the IOC–FT solutions.
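The IOC and VA strategies can be sketched numerically. The contour orientations below are arbitrary illustrative choices, picked so that both component vectors fall on the same side of the true motion (a Type-2-like configuration) and the two solutions disagree:

```python
import numpy as np

def normal_component(v, contour_angle):
    """Observable signal for a contour at `contour_angle` (radians): the
    projection of the true pattern velocity v onto the contour's unit normal."""
    n = np.array([-np.sin(contour_angle), np.cos(contour_angle)])
    return np.dot(v, n) * n

def ioc(c1, c2):
    """Intersection of constraints: the unique velocity whose projection onto
    each component's direction equals that component's length."""
    n1, n2 = c1 / np.linalg.norm(c1), c2 / np.linalg.norm(c2)
    A = np.vstack([n1, n2])
    b = np.array([np.linalg.norm(c1), np.linalg.norm(c2)])
    return np.linalg.solve(A, b)

def vector_average(c1, c2):
    """VA: the vector lying halfway between the two component vectors."""
    return (c1 + c2) / 2.0

v_true = np.array([1.0, 0.0])                  # rightward pattern motion
c1 = normal_component(v_true, np.deg2rad(60))  # two contour orientations whose
c2 = normal_component(v_true, np.deg2rad(80))  # components both fall below v_true

v_ioc = ioc(c1, c2)            # recovers v_true, assuming rigidity
v_va = vector_average(c1, c2)  # biased downward, away from v_true
```

Feature tracking is not simulated here: in a real plaid the ‘X’ intersections physically move at the IOC velocity, which is why the FT and IOC predictions coincide.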

Evidence from plaids


A large literature has put these three strategies to test using so-called ‘plaid’ patterns that result
from the sum of two sinusoidal gratings at different orientations. The critical evidence has come

Fig. 24.2  Three proposed solutions to the AP in plaid patterns. The intersection of constraints (IOC)
strategy consists in determining the unique vector that is consistent with both constraint lines of
the component motions. The feature tracking (FT) strategy consists in attributing to the global
pattern the motion of identifiable features such as the intersections between the component edges.
The vector average (VA) solution consists in computing the vector lying halfway between the two
components. (a) The IOC and FT strategies always yield the true pattern motion in a plaid, assuming
rigidity. (b) In Type-2 plaids, the VA solution can differ markedly from the IOC or FT solutions.

from the study of Type-2 plaids. Type-2 plaids have both component vectors lying on the same
side of the IOC resultant, such that the VA predictions differ markedly from those of the IOC–FT.
Perceived motion direction in Type-2 plaids has been reported to be biased toward the VA solu-
tion with short presentation times but to approach the IOC solution after a contrast-dependent
time lag (Yo and Wilson 1992). Similar results have been reported in plaids involving second-order
(i.e., texture boundary) motion signals (Wilson and Kim 1994; Cropper et al. 1994).
Type-2 plaids have also been used to assess the FT strategy. Alais et al. (1994) adapted partici-
pants to a translating Type-2 plaid (simultaneous adaptation condition) or to its alternately pre-
sented components (alternating adaptation). They found that perceived direction in the motion
after-effect reflected more the VA predictions after alternating adaptation, whereas it reflected
more the IOC–FT prediction after simultaneous adaptation. Because feature motion signals
were available when components were simultaneous, but not when they were alternated, these
results are consistent with a mechanism that retrieves the true plaid motion using FT. Follow-up
experiments (Alais et al. 1997) have provided support for this conclusion by demonstrating
that both feature size and feature number modulate the bias in the FT direction. Overall, there-
fore, it seems that two mechanisms are involved in the perception of pattern motion in plaids, an
earlier integration mechanism that employs the VA strategy, and a slower and presumably more
global mechanism that employs the FT strategy. The interaction between these two mechanisms
can be captured by models that diffuse motion signals from the local to the global scale by parallel
excitatory connections weighted by distance (Loffler and Orbach 2003) or by motion-based pre-
dictive coding (Perrinet and Masson 2012).

Structure-blind strategies are not truly structure-blind


Thus structure-blind strategies have proved successful in predicting perceived motion in relatively
simple patterns such as plaids. Even in such simple patterns, however, further analysis suggests
that underlying these strategies are in fact specific assumptions about organizational processes,
that is, these models are not truly structure-blind. This is equally true of the earlier integration
of plaid component motions based on VA and of the later pattern motion perception based on
FT. Concerning the earlier VA integration, it is known that component motions in a plaid do not
always result in unitary pattern motion (coherence) but can, under a variety of circumstances,
be perceived as one grating sliding above the other (transparency; see Wright and Gurney 1997;
Hedges et al. 2011). Thus before integration can take place, the system in some way decides that
the components are to be integrated. For instance, when component gratings have different spatial
frequencies a critical factor is their difference in orientation (Kim and Wilson 1993; for a related
finding see also Nakayama and Silverman 1988). In addition, luminance relations consistent with
transparency are important (Stoner et al. 1990). These results suggest that integration is gated by
organizational processes such as grouping by similarity or figure-ground layering. Within recent
Bayesian approaches, such organizational principles can be modeled formally as prior probabili-
ties. An organizational minimum principle, for instance, can be modeled as a prior bias for slower
motions (Montagnini et al. 2007; Weiss et al. 2002); or a principle of good continuation as a facili-
tation for connections coding collinear signals (Loffler and Orbach 2003).
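A minimal sketch of how a slow-motion prior can be formalized, assuming a Gaussian likelihood for each component signal and a Gaussian prior centred on zero velocity (in the spirit of Weiss et al. 2002; the noise and prior weights below are illustrative, not fitted values):

```python
import numpy as np

def map_velocity(normals, components, sigma=0.1, lam=1.0):
    """MAP velocity under a Gaussian slow-motion prior: minimizes
    sum_i (v . n_i - c_i)**2 / sigma**2 + lam * |v|**2, where n_i is the unit
    normal of contour i and c_i its measured orthogonal speed."""
    A = lam * np.eye(2)
    b = np.zeros(2)
    for n, c in zip(normals, components):
        n = np.asarray(n, dtype=float)
        A += np.outer(n, n) / sigma**2
        b += c * n / sigma**2
    return np.linalg.solve(A, b)

# A single contour: the prior selects the slowest velocity on the constraint
# line, i.e. motion orthogonal to the contour (slightly shrunk toward zero).
v_one = map_velocity([(0.0, 1.0)], [1.0])
# Two contours: the estimate approaches the IOC solution.
v_two = map_velocity([(1.0, 0.0), (0.0, 1.0)], [1.0, 0.5])
```

With a single contour the prior thus reproduces the percept of motion orthogonal to a line; with two informative components the estimate approaches the IOC solution, slightly shrunk toward zero speed.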

The barberpole effect


The barberpole effect refers to a class of motion phenomena involving contours moving within
stationary frames (often also referred to as apertures, but note that we are now referring to phys-
ical apertures, like a hole, not to theoretical apertures as discussed in Section 1). The effect refers
to the strong influence that the shape of a surrounding frame has on the perceived motion of a
contour (Figure 24.3a). As such, the term is a bit of a misnomer. It derives from old-time barbershop
signs, which consisted of staffs or poles with a revolving helix of colored stripes. When observing
these signs, one perceives motion along the vertical orientation of the pole. Because the stripes
are subject to the aperture problem, it would be expected that they would move in the direction
orthogonal to their orientation. However, in the proximal stimulus the terminators of the stripes
move vertically along the edges of a rectangle. In a variety of conditions, it is the proximal motion
of these terminators that determines the perceived motion of the grating.

Psychophysics of orthogonal and terminator signals


Thus the barberpole effect actually refers to frames of any orientation and shape, not just to proper
barberpole shapes. The effect of the frame shape on the direction of motion is consistent with
the idea that although local motion detectors respond maximally to the orthogonal component,
additional motion computations go beyond this limitation by combining local orthogonal motion
signals with local signals from contour terminators (Kooi 1993; Lorenceau et al. 1993; Mussap and
Te Grotenhuis 1997). Careful psychophysical measurements have shown that the perceived speed

Fig. 24.3  (a) The perceived direction of a translating grating depends on the shape of the
surrounding frame (barber-pole.mov). Suppose that for all gratings true motion is horizontal and
to the right (central grey vector). The grating within the circular frame will appear to move diagonally
in the direction orthogonal to the orientation of the contour. The grating within the vertical frame,
vertically downwards. That within the horizontal frame, horizontally and to the right. The grating
within the square will alternate between vertical and horizontal motion. The grating within the
narrower bent frame, finally, will appear to change direction as the aperture changes orientation
(perceived motions are represented by black vectors). (b) If a diamond shape is translated behind
three vertical bars without revealing the corners, each visible segment actually moves vertically as
shown on the left. These vertical motions are readily seen when only the segments are presented,
but become invisible after adding the occluding bars. In this case, observers perceive the true
motion of the diamond (shiffrar.mov, shiffrar-ill.mov). Without the occluding bars, the segment
terminators are perceived as intrinsic to the lines and their vertical motion overcomes the orthogonal
components. With the occluding bars, the segment terminators are perceived as extrinsic or
accidental (due to the occlusion interpretation). The vector average of the orthogonal components
determines the correctly perceived translation.

of oblique translating lines is underestimated compared to that of vertical lines. This bias increases
with the tilt and length of the line, as would be expected if the orthogonal and terminator signals
were weighted according to their perceptual salience (Castet et al. 1993). This in turn is consist-
ent with a wealth of physiological data. For instance, there is evidence that MT is implicated in
integrating not only local signals along multiple contours (Movshon et al. 1986), but also signals
along contours and at contour terminators (Pack 2001; Pack et al. 2003; Pack et al. 2004), and
with temporal dynamics consistent with the hypothesis that the integration stage occurs later in
processing than the coding of local motions (Pack and Born 2001).
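One illustrative way to express such a weighting (a toy sketch under an assumed fixed weight, not the model fitted by Castet et al. 1993): for a line tilted from vertical and translating horizontally, the terminators move at the true speed while the orthogonal component is slower by the cosine of the tilt, so blending the two signals predicts underestimation that grows with tilt.

```python
import numpy as np

def perceived_speed(v_true, tilt_deg, w_term=0.5):
    """Blend of terminator speed (v_true) and orthogonal-component speed
    (v_true * cos(tilt)) for a line tilted `tilt_deg` from vertical and
    translating horizontally. The fixed weight w_term is an assumption."""
    v_orth = v_true * np.cos(np.deg2rad(tilt_deg))
    return w_term * v_true + (1.0 - w_term) * v_orth

# Predicted underestimation grows with tilt, as reported for oblique lines.
speeds = [perceived_speed(1.0, t) for t in (0, 20, 40, 60)]
```

In the psychophysical data the weights themselves vary with salience (e.g. line length), which this fixed-weight sketch does not capture.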

Edge classification and occlusion


The barberpole effect has inspired the creation of stimuli that have been used to test the role of
various factors. For example, perceptual factors affect whether a region is perceived as an aperture
or as a foreground. This change in the figure-ground interpretation of the scene in turn affects the
perceived motion; Wallach had already pointed out this important aspect of the interaction
between motion and form perception. An interesting case in point is that of contours having mark-
ers on them. Imagine lines changing from black to red along one dividing line visible inside an
aperture. This additional motion information (from the locations where color changes) can drive
the perceived direction of motion. However, over time, the shape of the aperture and its terminators
become dominant and individuals perceive lines moving in a different direction and that change
color as they move, i.e. they appear to move underneath a ‘queer transparent veil’ (Wallach 1935).
In a seminal paper, Shimojo et al. (1989) have shown that these figure-ground effects can be
conceptualized as different ways to solve the ECP, that is, as a form of classification process that
treats the terminator motions as belonging to the moving object (intrinsic terminators, that must
be integrated with the orthogonal components to estimate the object’s motion), or as accidental
terminators that do not belong to the object because they are due to occlusion (extrinsic termina-
tors, that must be ignored). They manipulated the stereoscopic disparity of striped patterns trans-
lating within rectangular frames. Their results showed that if the striped pattern had uncrossed
disparity relative to the frame plane, such that the pattern was seen through a rectangular hole, the
barberpole effect was abolished and the pattern appeared to move in the orthogonal direction. If
the pattern had crossed disparity, conversely, the pattern appeared to lie above a solid rectangular
surface and the stripe terminators determined its direction, consistent with the barberpole effect
(shin-dav.mov, shin-die.mov).
If terminator signals affect the solution to the AP only when the terminators are classified as
intrinsic, one would expect that in an ambiguous motion display having both intrinsic and extrin-
sic terminators, the pattern motion would be in the direction of the former. This prediction turns
out to be correct in ambiguous ‘barber-diamond’ displays (Duncan et al. 2000). In these displays,
gratings translate within diamond-shaped apertures that are divided into four equal quadrants.
Two of these quadrants are stereoscopically placed in front of the grating, whereas the other two
are placed behind the grating. Thus, half of the terminator signals are classified as intrinsic and
the other half as extrinsic. Remarkably, the perceived direction of motion is dominated by the
signal coming from the intrinsic terminators. In addition, many neurons in area MT respond pre-
cisely to this motion direction. The fact that extrinsic terminators created by occlusion are treated
differently from intrinsic terminators suggests that the visual system solves the AP and the ECP
jointly. This general principle is consistent with a number of other observations (see for instance
Anderson and Sinha 1997; Castet et al. 1999).
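As a rough illustration of this joint solution of the AP and the ECP, the intrinsic/extrinsic classification can be sketched as a filter applied to terminator signals before integration. The depth-sign convention and the simple averaging rule below are our assumptions, not the actual model of Shimojo et al. (1989).

```python
import numpy as np

def integrate_motion(orthogonal_components, terminators):
    """Classify terminator signals as intrinsic (owned by the moving pattern)
    or extrinsic (created by an occluding surface) and integrate only the
    intrinsic ones with the orthogonal components.

    terminators: list of (velocity, relative_depth) pairs; by convention here,
    relative_depth < 0 means the terminator's edge lies in front of the
    pattern (occlusion), so the terminator is extrinsic and is discarded."""
    signals = [np.asarray(v, float) for v in orthogonal_components]
    signals += [np.asarray(v, float) for v, depth in terminators if depth >= 0]
    return np.mean(signals, axis=0)

# Barberpole grating drifting up-right inside a horizontal rectangle:
orth = [np.array([0.5, 0.5])]                  # orthogonal (1-D) signal
intrinsic = [(np.array([1.0, 0.0]), 0.0)]      # terminators on the aperture edges
extrinsic = [(np.array([0.0, 1.0]), -1.0)]     # terminators behind an occluder

print(integrate_motion(orth, intrinsic))       # horizontal bias: barberpole effect
print(integrate_motion(orth, extrinsic))       # occluded: only the orthogonal signal
```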

Edge classification beyond disparities


Several studies have shown that the effect of the classification of terminators as intrinsic or extrin-
sic on the solution to the AP is not simply due to an interaction of motion and stereoscopic
occlusion mechanisms, but extends to other organizational factors that affect figure-ground strati-
fication. In an elegant study, for instance, Vallortigara and Bressan (1991; see also Bressan et al.
1993) used Petter figures (Petter 1956) to manipulate the figure-ground stratification of moving
stripes and their rectangular frame.
They observed that when the stripes were thinner than the frame, such that by Petter's effect the stripes appeared in front, the bars moved perpendicularly to their orientation, as if the
512 Bruno and Bertamini

visual system disregarded the motion of their terminators (vallobres-sottile.mov). When the
stripes and the frame were the same width, such that they formed a single perceptual unit, the bars
tended to move in the direction of the terminators (vallobres-spesso.mov). Related effects have
been demonstrated using illusory-surface frames (Bertamini et al. 2004) and by several manipula-
tions aimed at making the motion of contour terminators less salient or reliable (Lorenceau and
Shiffrar 1992). Consider, for instance, an outline diamond translating horizontally behind three
occluding bars (see Figure 24.3b). Suppose that the movement stops and reverses direction before
revealing the corners of the diamond, such that only the diagonal contours are visible in any given
frame. Participants will perceive the motion of the diamond correctly, as one would expect if the
orthogonal components were averaged to compute the motion of the whole. The terminators of
the diamond contours, however, bear a motion signal in the vertical direction as can be easily seen
by removing the occluding bars as in Figure 24.3b, right. Presumably, the visual system interprets
the up-down motion of the line terminators as being due to occlusion, and discards it from the
integration process.
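A small numerical example shows why averaging the orthogonal components of the diamond's two oblique sides yields the correct (horizontal) direction once the vertical terminator signals are discarded. The vector-average rule is a simplification: the direction comes out right, but the speed is underestimated.

```python
import numpy as np

def orthogonal_component(v, edge_angle_deg):
    """Component of velocity v normal to an edge of the given orientation:
    the only signal locally available through an aperture (the AP)."""
    theta = np.radians(edge_angle_deg)
    n = np.array([np.sin(theta), -np.cos(theta)])
    return (np.asarray(v, float) @ n) * n

# Outline diamond translating rightward; only its +/-45 degree sides are visible.
v_object = np.array([1.0, 0.0])
components = [orthogonal_component(v_object, a) for a in (45.0, -45.0)]
v_avg = np.mean(components, axis=0)
print(v_avg)  # horizontal direction recovered, at half the true speed
```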

Hierarchical organization and frames of reference


The role of figure-ground perceptual organization in the solutions to the AP is not limited to the
classification of edge terminators into intrinsic and extrinsic, but can be shown to involve the global organization of the scene into a hierarchy of figure-ground relationships and of corresponding frames of reference for motion. We have already seen (Section 2) how assumptions
about the organization of the scene are implicit even in models that implement relatively simple
integration schemes such as the IOC or VA strategies. By considering moving stimuli with just
slightly more complex spatial structures, we will now show that explicitly including such organi-
zational processes into accounts of the AP becomes unavoidable. We will start by considering
what might be considered the smallest possible structural complication, adding a simple feature
to a barberpole display.
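For concreteness, the two simple integration schemes mentioned above can be sketched in a few lines: the IOC solves the system of velocity-space constraint lines contributed by the edges, whereas the VA simply averages the orthogonal component vectors. This is an illustrative sketch under noiseless assumptions, not a model of either strategy's actual implementation.

```python
import numpy as np

def normal_measurements(edge_angles_deg, v_object):
    """Simulate the 1-D measurements: each edge yields only the speed along
    its unit normal (the aperture problem)."""
    thetas = np.radians(edge_angles_deg)
    N = np.column_stack([np.sin(thetas), -np.cos(thetas)])  # one normal per row
    return N, N @ np.asarray(v_object, float)

def ioc(N, c):
    """Intersection of constraints: solve v . n_i = c_i for the common velocity."""
    v, *_ = np.linalg.lstsq(N, c, rcond=None)
    return v

def vector_average(N, c):
    """Vector average: mean of the orthogonal component vectors c_i * n_i."""
    return np.mean(c[:, None] * N, axis=0)

v_true = np.array([2.0, 1.0])
N, c = normal_measurements([30.0, 100.0], v_true)
print(ioc(N, c))             # recovers the true velocity exactly
print(vector_average(N, c))  # generally biased toward the normals
```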

Sliding effect
In his pioneering observations, Wallach (1935) was the first to note that adding a visible feature,
such as a dot, to a contour moving within an aperture fails to abolish the barberpole effect. He
justly noted that this is surprising, as the dot provides an unambiguous signal potentially specify-
ing the true motion of the contour. This unambiguous signal, however, does not typically affect
the motion of the contour. In most cases, instead, the moving contour continues to move in the
same direction as the corresponding contour without the feature (i.e., it shows the barberpole
effect). At the same time, the feature appears to move obliquely along the contour. This ‘sliding’
effect is quite robust (sliding.mov). For instance, it remains visible if several features are placed
on the line (Wallach 1935), and if the orientation of the aperture or the duration of the motion
are varied (Castet and Wuerger 1997). Critically, the sliding remains visible even with very brief
durations, which argues against an explanation in terms of retinal slip during smooth pursuit of
the line (Castet and Wuerger 1997). Thus, the sliding effect seems to be consistent with a hierar-
chical organization of the motion signals into separate frames of reference (separation of systems,
Duncker 1938). The motion of the feature is perceived in relation to the moving line, which in
turn is perceived in relation to the aperture. Consistent with this account, it has been shown that
the sliding effect is abolished when a conspicuous static frame of reference is placed outside the
aperture (Castet and Wuerger 1997).
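The hierarchical account can be made concrete with simple vector arithmetic: each motion is coded relative to its enclosing frame. The velocity values below are hypothetical, chosen only to illustrate the subtraction.

```python
import numpy as np

# Hypothetical retinal velocities (deg/s) for a dot on a line in an aperture.
v_dot = np.array([1.0, 0.0])       # true motion of the feature
v_line = np.array([0.5, 0.5])      # perceived line motion (barberpole direction)
v_aperture = np.array([0.0, 0.0])  # the static enclosing frame

v_line_in_aperture = v_line - v_aperture  # line coded relative to the aperture
v_dot_on_line = v_dot - v_line            # feature coded relative to the line
print(v_dot_on_line)                      # oblique 'sliding' along the line
```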

Chopsticks and resting circles


Also consistent with a role of hierarchical figure-ground organization within separate frames of
reference are the chopstick illusion (Anstis 1990) and the apparent rest phenomenon.
In the chopstick illusion (Figure 24.4a), two intersecting segments, one vertical and one hori-
zontal, appear to rotate counterclockwise in counterphase. However, the + feature at the intersec-
tion actually moves in the clockwise direction, although this trajectory is never perceived. The
counterclockwise motion is in fact the relative movement of each of the two segments with respect
to the other. Thus this perceptual solution fits the notion of hierarchical organization, as well as the idea that accidental, or extrinsic, features due to occlusion are disregarded by the system.
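The dissociation between the intersection's trajectory and the segments' own motions is easy to verify geometrically. The sketch below uses a simplified counterphase oscillation, not Anstis's exact stimulus; it only shows that the '+' can trace a clockwise path that belongs to neither segment's motion.

```python
import numpy as np

# The '+' where a vertical and a horizontal line cross sits at (x(t), y(t)),
# where x(t) is the vertical line's position and y(t) the horizontal line's.
# With counterphase oscillations the intersection traces a clockwise circle
# even though each line only oscillates along one axis.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
x = np.cos(t)        # vertical line oscillates left-right
y = -np.sin(t)       # horizontal line oscillates up-down, in counterphase
# Signed (shoelace) area swept by the intersection path: negative => clockwise.
signed_area = 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)
print("clockwise" if signed_area < 0 else "counterclockwise")
```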
In the apparent rest phenomenon (Metelli 1940; see also Gerbino and Bruno 1997, pararest.
mov) a circle is rotated around its center. The circle is arranged in a pattern that includes other
structures such as, for instance, three segments forming a Y having the intersection at the center
of rotation (Figure 24.4b, left) or a rectangle that occludes part of the circle (Figure 24.4b, right).
When patterns such as these are rotated, a surprising percept is experienced. The circles do not
move at all, although there is an abundance of local motion signals, both at contour terminators
and along contours. Specifically, in the first pattern reproduced in the figure, the circle appears as
a static frame, and the Y only appears to rotate relative to it. This implies that the motion signals
at the contour junctions between the segments and the circle are classified as belonging to the

Fig. 24.4  Selected demonstrations of hierarchical organization affecting the solutions to the AP.
(a) In the chopstick illusion, two chopsticks appear to rotate counterclockwise in counterphase (top,
chopstick.mov). Isolating the ‘+’ at the cross-over by a circular aperture reveals that this central
feature is actually rotating clockwise (bottom, chopstick-occl.mov). However, clockwise rotation
is never perceived in the unoccluded chopsticks. (b) In the apparent rest demonstration (metelli2.
mov), a circle is rotated around its center. Other visual structures are presented within the circle (left) or partly occluding it (right). This generates moving features at the intersections with the circular contour. However, the circle appears completely stationary and the other structures appear to rotate relative to it. (c) In the so-called 'breathing illusions', an illusory figure is rotated relative to stationary
elements. The movement is rigid but various deformations are perceived. For instance, with a square
rotating over four stationary disks, the figure appears to expand and shrink cyclically during the
rotation like a breathing lung (expansion.mov). With a triangle rotating over a spoke pattern, the
figure appears to deform, growing suddenly in one direction while shrinking in another during the
rotation. Interestingly, no comparable deformations are visible when the background elements are
rotated relative to the figure, although the relative motions are identical (nickeffect.mov).

segments and therefore fail to capture the circle. A plausible reason for this outcome, given that
the pattern contains no disparity or figural information for figure-ground organization, is that the
circle itself remains stable relative to the observer and for this reason tends to become a reference for the
Y figure. In the second pattern reproduced in the figure, as in other variants studied by Metelli,
the circle is completed amodally behind the occluder and the rectangle appears to rotate above it.
Given that terminator signals are present at the T-junctions between the circle and the rectangle,
it could be argued that these terminators ought to be classified as extrinsic and therefore should
have no role in determining the circle movement. Presumably, this organization is further rein-
forced by the stability of the amodally completed circle relative to the observer, which makes it a
strong candidate frame of reference for the motion of the rectangle.

Breathing illusions
The role of the self as a frame of reference for the interpretation of visual motion is also apparent
in the so-called breathing illusions (for a review see Bruno 2001). These are cases where a figure,
such as for instance a square or a triangle (see Figure 24.4c), is rotated rigidly over other surround
elements. In typical demonstrations, the figures are illusory but equivalent configurations can
be obtained by reversing the depth order such that the elements become holes and the figure is
seen through them (note that this implies that the same optical transformations occur within, for
instance, the disks of the left figure). Although the rotation is perfectly rigid, the rotating figure
appears to deform in various ways. The square over the disks, for instance, appears to breathe, that
is, to shrink and expand cyclically during each cycle of rotation.
Shiffrar and Pavel (1991) suggested that the breathing percept arises because the motion of the
square is perceived in relation to different frames of reference when the corners are visible and
when they are not. According to their proposal, when the corners of the square are not visible
within one of the disks, because of the AP the center of rotation for each of the visible contours
is misperceived and placed near to, or at, the local center of the rotating side. As a consequence,
local motion signals that are oriented toward or away from the actual center of rotation become
available. Such signals specify a change in size, and this causes the apparent breathing. However,
the deformations are never perceived when the background elements are rotated relative to a sta-
tionary figure (Bruno and Gerbino, 1991).
Given that in this modification all relative motions are exactly equivalent to the case where
the figure rotates, one might find this asymmetry surprising. However, considering what
structure acts as a frame of reference for the perceived motion reveals an obvious difference.
When the figure rotates, the disks or lines have the role of a stable frame of reference relative
to the observer, and the figure moves relative to these. When the disks rotate, conversely, it is
the figure that remains stable relative to the self. Thus all motion signals are coded in relation
to this frame of reference. Bruno and Gerbino (1991) and Bruno and Bertamini (1990) have
argued that the local motion signals that are coded in this fashion are critical to the bound-
ary formation process that reconstructs partly invisible edges from sparse spatiotemporal
information.
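Shiffrar and Pavel's account can be illustrated by comparing the radial component of a contour point's true velocity with that of the aperture-filtered (edge-orthogonal) signal. The geometry below is a hypothetical sketch of the idea, not a reconstruction of their model.

```python
import numpy as np

def ap_radial_speed(p, edge_angle_deg, omega=1.0):
    """Radial component, about the true rotation center (the origin), of the
    aperture-filtered velocity of point p on an edge of a rigidly rotating
    figure. Only the component orthogonal to the edge survives the AP."""
    p = np.asarray(p, float)
    v = omega * np.array([-p[1], p[0]])            # rigid rotation: purely tangential
    theta = np.radians(edge_angle_deg)
    n = np.array([np.sin(theta), -np.cos(theta)])  # unit normal to the edge
    v_perp = (v @ n) * n                           # what the AP leaves
    return float(v_perp @ p) / np.linalg.norm(p)

p = np.array([0.5, 1.0])                   # a point on the horizontal top side
v = np.array([-1.0, 0.5])                  # its true velocity for omega = 1
print(float(v @ p) / np.linalg.norm(p))    # true radial speed: 0 (rigid rotation)
print(ap_radial_speed(p, edge_angle_deg=0.0))  # nonzero: apparent size change
```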

Recent results
Recent results have provided evidence that contributions to the solution of the AP in visual motion
perception may also come, surprisingly, from non-visual sources of information. These results are
in line with the currently increasing interest toward multisensory processes in perception (Calvert
et al. 2004). It has been long known that multisensory interactions bias the preferred percept in
multistable motion displays. For instance, adding an auditory signal switches the perception of

two dots moving in phase along an X pattern from streaming (one dot crosses over on top of the
other) to bouncing (the dots collide at the intersection of the X and bounce back; Sekuler et al.
1997). Tactile information about direction of rotation disambiguates the visual three-dimensional
structure of a computer-generated random-dot globe (Blake et al. 2004). During dichoptic view-
ing of dynamic rival stimuli, moving a computer mouse extends dominance durations and abbre-
viates suppression durations for the one rival stimulus moving in the same direction as the hand
movement (Maruya et al. 2007). The perceived direction of motion of an ambiguous visual dis-
play is biased by several aspects of preceding actions (Wohlschlager 2000). Finally, pursuit eye
movements promote coherent motion of four line segments that are ambiguous during fixation
(Hafed and Krauzlis 2006). These findings suggest that multisensory contributions as well as other
top-down non visual factors may affect the solution to the AP.

Kinesthetic information and the AP


Additional constraints for solving the AP may come from information about one’s movement
(kinesthesis) during purposive action. To test this expectation, an elegant experiment by Hu
and Knill (2010) independently presented a tactile movable cube, a visual rendering of the same
cube, and a sinusoidal grating translating within an aperture on the upper face of this visual cube
(see Figure 24.5). With a circular aperture, participants reported that they perceived the grating
to move always in the direction of the hand movement. With a square aperture, the perceived
motions were more variable. They were often in the direction of the hand movement, but they were
also often in one of the directions of the aperture sides (terminator motions), and occasionally

[Figure 24.5, panels (a) and (b): monitor, mirror, and cube manipulandum; visual and kinesthetic motion signals]
Fig. 24.5  Schematics of an apparatus for assessing the role of kinesthetic motion signals in the
solution of the AP. (a) A CRT monitor is suspended upon a mirror. Behind the mirror is a cube
manipulandum connected to a motion-tracking device. The participant moves the cube with one
hand while an image of the cube in its current position is rendered on the monitor. (b) On top of
the rendered cube experimental software presents a sinewave grating within a circular aperture.
Two motion signals are potentially available: a visual signal, which because of the AP is always in the
direction orthogonal to the orientation of the sinewave, and a kinesthetic signal that is a function of
the hand movement.
Reprinted from Current Biology, 20(10), Bo Hu and David C. Knill, Kinesthetic information disambiguates visual
motion signals, pp. R436–37, Figures 1a and 1b, Copyright (2010), with permission from Elsevier.

also in the direction orthogonal to the orientation of the grating. Finally, when the aperture was
circular but a 200 ms delay was imposed between the visual and kinesthetic signals, almost all
reports were in the direction orthogonal to the grating orientation. These results are consistent
with a multisensory interaction of kinesthetic and visual signals occurring for simultaneous, but
not delayed stimulation (see Stein and Meredith 1993). These results also suggest that the weight
of the kinesthetic component is highest when visual information is most ambiguous (circular
aperture) and becomes less strong when unambiguous motion signals from terminators are pro-
vided (square aperture). Thus, this pattern can also be interpreted in terms of optimal Bayesian
integration (Ernst and Banks 2002) for visual and kinesthetic signals.
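The optimal-integration interpretation can be sketched with the standard reliability-weighted cue-combination rule of Ernst and Banks (2002), in which each cue is weighted by its reliability (the inverse of its variance). The numerical values for the visual and kinesthetic direction estimates below are hypothetical.

```python
def fuse(mu_v, sigma_v, mu_k, sigma_k):
    """Maximum-likelihood fusion of a visual and a kinesthetic direction
    estimate (in degrees), weighting each cue by its reliability 1/sigma^2."""
    w_v = sigma_v ** -2 / (sigma_v ** -2 + sigma_k ** -2)
    mu = w_v * mu_v + (1.0 - w_v) * mu_k
    sigma = (sigma_v ** -2 + sigma_k ** -2) ** -0.5  # fused estimate is more precise
    return mu, sigma

# Circular aperture: the visual direction estimate is very unreliable
# (large sigma_v), so the kinesthetic estimate dominates, as in Hu and
# Knill (2010). All values are hypothetical.
mu, sigma = fuse(mu_v=90.0, sigma_v=60.0, mu_k=45.0, sigma_k=10.0)
print(mu, sigma)
```

With a square aperture, reliable terminator signals would shrink sigma_v, shifting the fused estimate back toward the visual direction.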
In a related experiment, DeLucia and Ott (2011) presented lines that translated within circular
or rectangular moving or stationary apertures. In one condition, participants passively viewed the
lines. In a second condition, they actively moved a joystick that controlled the direction of the
translating line. In accord with the barberpole effect, they found that with rectangular apertures
participants tended to report movement in the direction of the orientation of the aperture. With
circular apertures, conversely, they tended to report movement orthogonal to the orientation of
the line. For both apertures, however, active control of the line movement biased perceived move-
ments away from the orthogonal direction and in the direction of the joystick movement. Thus,
although the reported effects were smaller than those of Hu and Knill (2010), these results provide
converging evidence that kinesthetic signals contribute to the solution of the AP.

Top-down factors
In a second experiment, DeLucia and Ott (2011) also manipulated attentional load by asking
participants to report the motion of the line (no load condition) or both the motion of the line
and that of the aperture (load condition). While it is not clear how this manipulation affected the
spatial distribution of attention, results provided some evidence that this manipulation affects
the relative weighting of orthogonal and terminator motions in solving the AP. This result is in
line with previous reports that voluntary attentional control can influence contextual integration
processes in motion perception (Freeman and Driver 2008) and can modulate the spatial extent
over which local motion signals are integrated (Burr et al. 2009). It seems likely, therefore, that top-down processes may also have a role in solutions to the AP. Related studies suggest that these are
not limited to attention but can include expectations learned through perceptual (Graf et al. 2004)
or sensorimotor individual experience (Yabe et al. 2011), as well as high-level knowledge about
the visibility of surfaces during occlusion and disocclusion (McDermott et al. 2001).

Conclusions
We have reviewed strategies for solving the local ambiguities of motion signals (the AP) and for
perceiving coherent object motion. This is arguably one of the greatest challenges faced by the
human visual system. We have argued that the solution cannot be reduced to relatively simple vec-
tor operations, such as averaging or intersecting constraints in velocity space. Solutions to the AP
reflect complex processes of perceptual organization, which dynamically take into account visual
stimulus structure as well as additional constraints from nonvisual sensory channels. We believe
that studies on effects of perceptual organization on the solution to the AP will continue to be a
fertile and active area of research. In this area, key findings may come from studies of dynamic
grouping of connected surfaces (see Hock, this volume) and of interactions between motion and
form (see Blair et al., this volume).

References
Adelson, E. H. and Movshon, J. A. (1982). ‘Phenomenal coherence of moving visual patterns’. Nature
300(5892): 523–5.
Alais, D. M., Wenderoth, P. M., and Burke, D. C. (1994). ‘The contribution of 1-D motion mechanisms to
the perceived direction of drifting plaids and their aftereffects'. Vision Research 34: 1823–34.
Alais, D., Wenderoth, P., and Burke, D. (1997). ‘The size and number of plaid blobs mediate the
misperception of type-II plaid direction'. Vision Research 37(1): 143–50.
Albright, T. D. (1984). ‘Direction and orientation selectivity of neurons in visual area MT of the macaque’.
Journal of Neurophysiology 52(6): 1106–30.
Anderson, B. L. and Sinha, P. (1997). ‘Reciprocal interactions between occlusion and motion computations’.
Proc Natl Acad Sci USA 94(7): 3477–80.
Anstis, S. (1990). 'Imperceptible Intersections: The Chopstick Illusion'. In AI and the Eye, edited by A. Blake and T. Troscianko, pp. 105–17. (Chichester: John Wiley).
Barenholz, E. and Tarr, M. J. (2007). ‘Reconsidering the role of structure in vision’. In Categories in use: The
Psychology of Learning and Motivation, edited by M. Markman and B. Ross vol. 47, pp. 157–180.
(Orlando, FL: Academic Press).
Bertamini, M., Bruno, N., and Mosca, F. (2004). ‘Illusory surfaces affect the integration of local motion
signals’. Vision Research 44(3): 297–308.
Blake, R., Sobel, K. V., and James, T. W. (2004). ‘Neural synergy between kinetic vision and touch’. Psychol
Sci 15(6): 397–402.
Blair (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Bressan, P., Ganis, G., and Vallortigara, G. (1993). ‘The role of depth stratification in the solution of the
aperture problem’. Perception 22(2): 215–28.
Bruno, N. (2001). ‘Breathing illusions and boundary formation in space-time’. In From Fragments to
Objects: Segmentation and Grouping in Vision (Advances in Psychology 130), edited by T. F. Shipley and
P. J. Kellman, pp. 531–56. (North-Holland).
Bruno, N. and Bertamini, M. (1990). ‘Identifying contours from occlusion events’. Perception and
Psychophysics 48(4): 331–42.
Bruno, N. and Gerbino, W. (1991). 'Illusory figures based on local kinematics'. Perception 20: 259–74.
Burr, D. C., Baldassi, S., Morrone, M. C., and Verghese, P. (2009). ‘Pooling and segmenting motion signals’.
Vision Research 49(10): 1065–72.
Calvert, G. A., Spence, C., and Stein, B. E. (2004). The Handbook of Multisensory Processes. (Cambridge,
MA: MIT Press).
Castet, E. and Wuerger, S. (1997). ‘Perception of moving lines: interactions between local perpendicular
signals and 2D motion signals’. Vision Research 37(6): 705–20.
Castet, E., Lorenceau, J., Shiffrar, M., and Bonnet, C. (1993). ‘Perceived speed of moving lines depends on
orientation, length, speed and luminance’. Vision Research 33(14): 1921–36.
Castet, E., Charton, V., and Dufour, A. (1999). ‘The extrinsic/intrinsic classification of two-dimensional
motion signals with barber-pole stimuli’. Vision Research 39(5): 915–32.
Cropper, S. J., Badcock, D. R., and Hayes, A. (1994). 'On the role of second-order signals in the perceived
direction of motion of type II plaid patterns’. Vision Research 34(19): 2609–12.
DeLucia, P. R. and Ott, T. E. (2011). ‘Action and attentional load can influence aperture effects on motion
perception’. Exp Brain Research 209(2): 215–24.
Duncan, R. O., Albright, T. D., and Stoner, G. R. (2000). ‘Occlusion and the interpretation of visual
motion: perceptual and neuronal effects of context’. J Neurosci 20(15): 5885–97.

Duncker, K. (1938). 'Über induzierte Bewegung [Concerning induced movement]'. In A Source Book of Gestalt Psychology, edited and translated by W. D. Ellis, pp. 161–72. (London: Routledge and Kegan Paul). Reprinted from Psychologische Forschung (1929), 12: 180–259.
Ernst, M. O. and Banks, M. S. (2002). ‘Humans integrate visual and haptic information in a statistically
optimal fashion’. Nature 415(6870): 429–33.
Fennema, C. L. and Thompson, W. B. (1979). ‘Velocity determination in scenes containing several moving
objects’. Computer Graphics and Image Processing 9: 310–15.
Freeman, E. and Driver, J. (2008). ‘Voluntary control of long-range motion integration via selective
attention to context’. Journal of Vision 8(11): 18.1–18.22.
Gerbino, W. and Bruno, N. (1997). ‘Paradoxical rest’. Perception 26: 1549–54.
Gizzi, M. S., Katz, E., Schumer, R. A., and Movshon, J. A. (1990). ‘Selectivity for orientation and direction
of motion of single neurons in cat striate and extrastriate visual cortex’. J Neurophysiol 63(6): 1529–43.
Graf, E. W., Adams, W. J., and Lages, M. (2004). ‘Prior depth information can bias motion perception’.
Journal of Vision 4(6): 427–33.
Grossberg, S. (2011). ‘Visual motion perception’. In Encyclopedia of Human Behavior, edited by V. S.
Ramachandran, second edn. (Oxford: Elsevier).
Grossberg, S. and Mingolla, E. (1993). ‘Neural dynamics of motion perception: direction fields, apertures,
and resonant grouping’. Percept Psychophys 53(3): 243–78.
Hafed, Z. M. and Krauzlis, R. J. (2006). ‘Ongoing eye movements constrain visual perception’. Nat Neurosci
9(11): 1449–57.
Hedges, J. H., Stocker, A. A., and Simoncelli, E. P. (2011). ‘Optimal inference explains the perceptual
coherence of visual motion stimuli’. Journal of Vision 11(6): 14, 1–16.
Hildreth, E. C. (1983). The Measurement of Visual Motion. (Cambridge, MA: MIT Press).
Hock (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Hu, B. and Knill, D. C. (2010). ‘Kinesthetic information disambiguates visual motion signals’. Curr Biol
20(10): R436–7.
Hubel, D. H. and Wiesel, T. N. (1968). ‘Receptive fields and functional architecture of monkey striate
cortex’. The Journal of Physiology 195(1), 215–43.
Kane, D., Bex, P., and Dakin, S. (2011). ‘Quantifying “the aperture problem” for judgments of motion
direction in natural scenes’. Journal of Vision 11(3): 25, 1–20.
Kim, J. and Wilson, H. R. (1993). ‘Dependence of plaid motion coherence on component grating
directions’. Vision Research 33(17): 2479–89.
Kooi, F. L. (1993). ‘Local direction of edge motion causes and abolishes the barberpole illusion’. Vision
Research 33(16): 2347–51.
Loffler, G. and Orbach, H. S. (2003). ‘Modeling the integration of motion signals across space’. J Opt Soc
Am A Opt Image Sci Vis 20(8): 1472–89.
Lorenceau, J. and Shiffrar, M. (1992). ‘The influence of terminators on motion integration across space’.
Vision Research 32(2): 263–73.
Lorenceau, J., Shiffrar, M., Wells, N., and Castet, E. (1993). ‘Different motion sensitive units are involved in
recovering the direction of moving lines’. Vision Research 33(9): 1207–17.
Maruya, K., Yang, E., and Blake, R. (2007). ‘Voluntary action influences visual competition’. Psychol Sci
18(12): 1090–8.
McDermott, J., Weiss, Y., and Adelson, E. H. (2001). ‘Beyond junctions: nonlocal form constraints on
motion interpretation’. Perception 30(8): 905–23.
Metelli, F. (1940). 'Ricerche sperimentali sulla percezione del movimento'. Rivista di psicologia 36: 319–60.

Movshon, J. A., Adelson, E. H., Gizzi, M. S., and Newsome, W. T. (1986). 'The analysis of moving visual patterns'. In Pattern Recognition Mechanisms, edited by C. Chagas, R. Gattass, and C. Gross, pp. 117–51. (Vatican City: Vatican Press).
Montagnini, A., Mamassian, P., Perrinet, L., Castet, E., and Masson, G. S. (2007). ‘Bayesian modeling of
dynamic motion integration’. J Physiol Paris 101(1–3): 64–77.
Mussap, A. J. and Te Grotenhuis, K. (1997). ‘The influence of aperture surfaces on the barber-pole illusion’.
Perception 26(2): 141–52.
Nakayama, K. and Silverman, G. H. (1988). ‘The aperture problem—II. Spatial integration of velocity
information along contours’. Vision Research 28(6): 747–53.
Pack, C. C. (2001). ‘The aperture problem for visual motion and its solution in primate cortex’. Sci Prog
84(Pt 4): 255–66.
Pack, C. C. and Born, R. T. (2001). ‘Temporal dynamics of a neural solution to the aperture problem in
visual area MT of macaque brain’. Nature 409(6823): 1040–2.
Pack, C. C., Gartland, A. J., and Born, R. T. (2004). ‘Integration of Contour and Terminator Signals in
Visual Area MT of Alert Macaque'. J Neurosci 24(13): 3268–80.
Pack, C. C., Livingstone, M. S., Duffy, K. R., and Born, R. T. (2003). 'End-stopping and the aperture
problem: two-dimensional motion signals in macaque V1’. Neuron 39(4): 671–80.
Pei, Y. C., Hsiao, S. S., and Bensmaia, S. J. (2008). 'The tactile integration of local motion cues is analogous
to its visual counterpart’. Proc Natl Acad Sci USA 105(23): 8130–5.
Perrinet, L. U. and Masson, G. S. (2012). ‘Motion-Based Prediction is Sufficient to Solve the Aperture
Problem’. Neural Computation 24(10): 2726–50.
Petter, G. (1956). 'Nuove ricerche sperimentali sulla totalizzazione percettiva'. Rivista di psicologia
50: 213–27.
Schall J. D. (2000). ‘Decision making: From sensory evidence to a motor command’. Current Biology
10(11): R404–R406.
Sekuler, R., Sekuler, A. B., and Lau, R. (1997). 'Sound alters visual motion perception'. Nature 385: 308.
Shiffrar, M. and Pavel, M. (1991). ‘Percepts of rigid motion within and across apertures’. JEPHPP
17(3): 749–61.
Shimojo, S., Silverman, G. H., and Nakayama, K. (1989). ‘Occlusion and the solution to the aperture
problem for motion’. Vision Research 29(5): 619–26.
Stein, B. E. and Meredith, M. A. (1993). The Merging of the Senses. (Cambridge, MA: MIT Press).
Stoner, G., Albright, T., and Ramachandran, V. (1990). ‘Transparency and coherence in human motion
perception’. Nature 344(6262): 153–5.
Tanaka, K. and Saito, H. A. (1989). ‘Analysis of motion of the visual field by direction, expansion/
contraction, and rotation cells clustered in the dorsal part of the medial superior temporal area of the
macaque monkey’. Journal of Neurophysiology 62(3): 626–41.
Todorovic, D. (1996). 'A gem from the past: Pleikart Stumpf's (1911) anticipation of the aperture problem, Reichardt detectors, and perceived motion loss at equiluminance'. Perception 25(10): 1235–42.
Tootell, R. B. H., Reppas, J. B., Kwong, K. K., Malach, R., Born, R. T., Brady, T. J., et al. (1995). ‘Functional
analysis of human MT and related visual cortical areas using magnetic resonance imaging’. Journal of
Neuroscience 15(4): 3215.
Vallortigara, G. and Bressan, P. (1991). ‘Occlusion and the perception of coherent motion’. Vision Research
31(11): 1967–78.
Vezzani et al. (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Wallach, H. (1935). 'Über visuell wahrgenommene Bewegungsrichtung'. Psychologische Forschung 20: 325–80.

Weiss, Y., Simoncelli, E. P., and Adelson, E. H. (2002). ‘Motion illusions as optimal percepts’. Nat Neurosci
5(6): 598–604.
Wilson, H. R. and Kim, J. (1994). ‘Perceived motion in the vector sum direction’. Vision Research
34(14): 1835–42.
Wilson, H. R., Ferrera, V. P., and Yo, C. (1992). ‘A psychophysically motivated model for two-dimensional
motion perception’. Visual Neuroscience 9(1): 79–97.
Wohlschläger, A. (2000). ‘Visual motion priming by invisible actions’. Vision Research 40(8): 925–30.
Wright, M. J. and Gurney, K. N. (1997). ‘Coherence and motion transparency in rigid and nonrigid plaids’.
Perception 26(5): 553–67.
Wuerger, S., Shapley, R., and Rubin, N. (1996). ‘“On the visually perceived direction of motion” by Hans
Wallach: 60 years later’. Perception 25: 1317–67.
Yabe, Y., Watanabe, H., and Taga, G. (2011). ‘Treadmill experience alters treadmill effects on perceived
visual motion’. PLoS One 6(7): e21642.
Yo, C. and Wilson, H. R. (1992). ‘Perceived direction of moving two-dimensional patterns depends on
duration, contrast and eccentricity’. Vision Research 32(1): 135–47.
Chapter 25

Stereokinetic effect, kinetic depth effect, and structure from motion
Stefano Vezzani, Peter Kramer, and Paola Bressan

Introduction
Relative motion is one of the phylogenetically oldest and most compelling sources of information
about distance from one’s viewpoint (depth). Disparities between the left and right eye’s perspec-
tives are quite informative too, and stereopsis (depth perception on the basis of such disparities) is
of great help in breaking camouflage (Wardle et al. 2010). Oddly, though, the prerequisite orbital
convergence of the eyes from a lateral to a frontal position seems to have evolved, in primates,
only after the use of vision for reaching and grasping (Isbell 2006). It thus seems that, in order to
see depth, we were getting by just fine without stereopsis—relying only on monocular depth cues
like relative motion.
In part due to us moving about, the projection of the world on our retinae is constantly in
motion. Even when proprioceptive and motor information is unavailable to help us distinguish
between motion generated by the environment and motion generated by ourselves, and even in
the face of conflicting binocular disparity and other depth cues, motion generates strong impres-
sions of depth. Here we review this particular kind of depth perception that depends solely on
relative motion.
The oldest studies in this field concern the phenomenon of stereokinesis, which we discuss first.
Most of the more recent studies focus, instead, on the kinetic-depth effect (KDE), also known as
structure from motion (SfM), which we discuss afterwards.

Stereokinetic effect
Early work
Mach
Ernst Mach (1868, 1886) was the first to report a depth effect created by a figure moving in the
frontoparallel plane. He writes: “A flat linear drawing, monocularly observed, often seems flat. But
if the angles are made variable and motion is introduced, any such drawing immediately stretches
out in depth. One then usually sees a rigid body in rotation”1 (Mach 1886, pp. 99–100). (What
“angles” Mach refers to here remains unclear.)
Mach (1886, p. 102; 1897, p. 108) also discovered an unusual percept induced by either of two
kinds of motion. In the first case, an egg is rolled over a table in such a way that it performs jolting

1  Our translation.


Fig. 25.1  (a) An ellipse on a rotating turntable (here represented by the circle) becomes, at the
stereokinetic stage, a rigid disc. (b) A circle with an eccentric dot on a rotating turntable (here
partially represented by the arc) becomes, at the stereokinetic stage, a rigid cone, either pointing
outward or receding inward.
Reproduced from V. Benussi, Introduzione alla psicologia sperimentale, Lezioni tenute nell’anno 1922–23, Bicocca
University: Milan, 1922–1923.

movements, rather than smooth rotation. In the second case, the egg is placed horizontally on
the table and is rotated smoothly around a vertical axis. If viewed from a particular angle, in both
cases but more strikingly in the latter, the egg is perceived as a liquid body or large oscillating
drop. The effect disappears immediately if trackable spots are added to the egg’s surface.

Benussi
Peculiarly, the investigation of stereokinesis has been dominated by researchers from the Italian
University of Padua: Benussi, Musatti, Zanforlin, Beghi, Xausa, Vallortigara, and Bressan. In 1921,
Vittorio Benussi noted that some flat stimuli in slow rotation in the frontal plane appear to trans-
form into solid, cyclically moving 3-D objects (Musatti 1924; see also Benussi 1922–1923, 1925,
1927). Because the perceived corporeity of these illusory objects is similar to that of stereoscopi-
cally perceived ones, Benussi called the phenomenon stereokinetic. He thought the illusion arises
because of past experience with solid objects.
Benussi observed that, while watching an ellipse on a rotating turntable (Figure 25.1a)2, three
separate percepts arise in order. First, the ellipse appears to rotate rigidly around both the turntable’s
centre and its own. Second, the ellipse becomes an elastic, constantly deforming ring or disc that still
rotates around the turntable’s centre, but no longer around its own centre (best effects are obtained
if the ellipse’s axes have a 3:2 ratio; Wallach et al. 1956). At this stage, the percept is similar to Mach’s
rotating egg, but still 2-D, and therefore strictly speaking not stereokinetic; nevertheless, it has since
been studied in its own right (e.g., Weiss and Adelson 2000). Third, the ellipse suddenly appears
to disconnect from the turntable and becomes a rigid ring or disc slanted in depth, that while still
rotating around the turntable’s centre, also oscillates about its own centre. It is perceived to repeat-
edly reverse in depth, with its farthest edge becoming its closest and vice versa (Benussi 1922–1923).
Bressan and Vallortigara (1986a) later reported that, if observation continues, the third percept
is followed by a fourth—an elongated egg whose ends are located at different distances from the
observer and rotate in the frontal plane (see also Mefferd’s “cigar”: Mefferd 1968a, 1968b; Wieland
and Mefferd 1968). The disc and the egg alternate in time, separated by brief intervals in which
either a rotating rigid ellipse or a distorting elastic one are perceived (Vallortigara et al. 1988; see also

2  For other stereokinetic stimuli used by Benussi, see <www.archiviapsychologica.org/index.php?id=581>.

Mefferd 1968a). Benussi and his student Musatti (1924) basically only studied contour ellipses, but
all the percepts described above, including the fourth, obtain with both contour and filled ellipses.
Benussi (1927) described stereokinetic solids as “moving with astounding grace, smoothness,
elasticity, and ease, rhythmically and adroitly.”3 No surprise they attracted the attention of artists. In
the early 1920s, artist Marcel Duchamp created a series of Rotoreliefs: discs depicting circles and spi-
rals that, when rotating, produce percepts of depth. His stereokinetic displays were basically com-
plex versions of Benussi’s, and were created later. However, Duchamp had already used rotation in
previous art works (<www.marcelduchamp.net/ecatalogue.htm>). Quite possibly, therefore, he dis-
covered the stereokinetic effect independently from Benussi. In 1926, Duchamp portrayed ten of his
Rotoreliefs in the six-minute film Anémic Cinéma (D’Aversa 2007; note the illusory-contour rings
at 1:50 minutes into the film). Some Rotoreliefs were also used in Hans Richter’s 1947 surrealist film
Dreams that Money can Buy (<www.youtube.com/watch?feature=player_embedded&v=mJ5Cl30_
KvE>). More recently, the psychologist and artist Frederick S. Duncan (1975) has created remark-
ably powerful stereokinetic discs he called psychokinematic objects.

Musatti
Benussi’s assistant at the University of Padua, Cesare Musatti, authored the first published paper
on stereokinesis (Musatti 1924), followed by several others (e.g., Musatti 1928, 1975). He general-
ized to other stereokinetic stimuli Benussi’s three perceptual stages. First, rigid veridical motion is
perceived on a plane. Second, either relative motion between different parts of the stimulus or an
“ameboid” deformation is seen. And third, a stereokinetic solid emerges. Musatti argued that, with
few exceptions (such as inhomogeneously colored ellipses, e.g. Musatti 1929; for an English transla-
tion of some of Musatti’s observations, see Albertazzi 2004), the relative-motion or ameboid stage
is a necessary precursor to the stereokinetic stage. He proposed two completely different explana-
tions for the second and third stages (Musatti 1924). He explained the third, like Benussi, with past
experience with rotating solids, and the second with what he called “orientation stability.”

Orientation stability
Before turning to perception Musatti had studied mathematics, and in 1928 he was the first to use
vector analysis to describe perceptual phenomena—a particularly helpful approach subsequently
adopted by others (e.g., Johansson 1950; Wallach 1935; see also Giese chapter, this volume).
Musatti suggested considering, for example, a rotating turntable with two nested circles and two
virtual points, one on each circle (Figure 25.2a). During a 90° rotation, the two points maintain
the same position relative to each other (compare Figure 25.2a to Figure 25.2b). However, if the
two points are not marked, it is impossible to keep track of them, and the rotation goes unno-
ticed: a phenomenon called orientation stability (Musatti 1924) or identity imposition (Wallach
and Centrella 1990). If the rotational component of the stimulus’ motion is removed, only a translatory
component remains, and this is what is observed. That is, during the 90° rotation, the virtual
points on the two circles appear neither to take part in this rotation, nor to remain fixed relative
to one another, but to translate relative to one another (Figure 25.2c).
If, instead of two circles, only a single ellipse is presented, then this relative translation is not
seen between virtual points on different shapes, but between different virtual points on the same
shape. In this case, the ellipse is perceived to continually deform.
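The decomposition behind orientation stability is easy to make concrete numerically. The fragment below is an illustrative sketch, not Musatti’s own analysis; all values are made up (a unit circle whose centre sits two units from the turntable’s centre, and a 90° rotation). It compares the physical motion of two virtual contour points with their orientation-stable interpretation, in which each point keeps a fixed offset from the circle’s centre:

```python
import math

def rot(p, ang):
    """Rotate point p = (x, y) about the origin by ang radians."""
    x, y = p
    return (x * math.cos(ang) - y * math.sin(ang),
            x * math.sin(ang) + y * math.cos(ang))

# Made-up layout: a circle whose centre sits 2 units from the turntable's
# centre c (the origin); two virtual points on its contour.
centre = (2.0, 0.0)
offsets = [(1.0, 0.0), (0.0, 1.0)]
ang = math.pi / 2  # a 90-degree turntable rotation

physical, perceived = [], []
for off in offsets:
    start = (centre[0] + off[0], centre[1] + off[1])
    # Physical motion: the whole figure rotates rigidly about c.
    end_phys = rot(start, ang)
    physical.append((end_phys[0] - start[0], end_phys[1] - start[1]))
    # Orientation-stable percept: the point keeps its offset from the
    # circle's centre, which alone is carried around c.
    pc = rot(centre, ang)
    perceived.append((pc[0] + off[0] - start[0], pc[1] + off[1] - start[1]))
```

Both perceived displacements come out identical, i.e. a pure translation of the circle, whereas the physical displacements differ from point to point.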
The phenomenon of orientation stability also occurs with some figures whose contours are not
uniform and should therefore not produce it (Musatti 1924, 1955, 1975; Proffitt et al. 1992). For

3  Translation by Todorović 1993.


Fig. 25.2  After a 90° clockwise rotation, the two points marked by grey triangles in (a) will have
moved as in (b), but due to orientation stability they seem to have moved as in (c).
Adapted from Dennis R. Proffitt, Irvin Rock, Heiko Hecht, and Jim Schubert, Stereokinetic effect and its relation to
the kinetic depth effect, Journal of Experimental Psychology: Human Perception and Performance,
18(1), pp. 3–21, http://dx.doi.org/10.1037/0096-1523.18.1.3 © 1992, American Psychological Association.

example, if the contours of the two circles in Figure 25.2 are dashed rather than solid, one still
does not see the circles rotate together, as they physically do, but translate relative to each other.
Meanwhile, the dashes are perceived to slide along the circles’ contours—an effect that Musatti
recognized but never reconciled with his theory.

Stereokinesis on inadequate basis


If the relative-motion or ameboid stage is necessary to reach the stereokinetic stage, then there
should be no stereokinesis with rectilinear figures, for example a wireframe triangle or cube. Such
figures contain angles, which render any rotation clearly visible and, hence, cannot support the
illusion of orientation stability. Yet, Musatti (1929) found that stereokinetic effects could arise
with such stimuli (see also, e.g., Mefferd 1968a; Piggins et al. 1984; Zanforlin 2003; Zanforlin and
Vallortigara 1990). Whereas 88 per cent of Musatti’s (1955) naïve observers saw stereokinesis with
curvilinear figures, only 18 per cent saw it with rectilinear ones; but this number rose to 30 per
cent if observers had previously watched curvilinear stimuli, and to an impressive 77 per cent if
they were explicitly told what they might see. Musatti called the effect generated by these figures
“stereokinesis on inadequate basis.” The impression of corporeity is ephemeral and the stimulus
does not appear to extend in depth as much as in ordinary stereokinesis (Musatti 1975; see also
Wilson et al. 1983). Nonetheless, stereokinesis on inadequate basis is inconsistent with Musatti’s
theory, and Musatti himself (1955) did admit as much.

The height of the stereokinetic cone


On a rotating turntable, a circle containing an eccentric dot produces the stereokinetic percept of
a cone pointing outward, with the dot becoming the cone’s apex—or, less often, of a funnel receding
inward (Musatti 1924; see Figure 25.1b, where the peripheral circle is replaced by a central
filled ellipse). For geometric reasons, the more tilted the cone, the shorter it should be. Still, in
principle, the same stimulus is consistent with an infinite number of possible tilt-and-height pairs
(Musatti 1975). Thus, the fact that the stereokinetic cone is typically perceived to have only one
specific tilt and height requires an explanation.
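One way to see the ambiguity is a geometric sketch under the simplifying assumption of orthographic projection (the numbers below are arbitrary): a rigid cone of a given height, tilted away from the line of sight, projects its apex at height × sin(tilt) from the centre of its base, so the dot’s eccentricity fixes only that product and leaves a one-parameter family of tilt-and-height pairs:

```python
import math

def apex_eccentricity(height, tilt):
    """Projected distance of the apex from the base's centre, assuming an
    orthographic projection of a cone tilted by `tilt` radians."""
    return height * math.sin(tilt)

ecc = 0.5  # eccentricity of the dot in the stimulus (arbitrary units)

# Every tilt in (0, pi/2] admits a height that reproduces the same image,
# so the image alone cannot fix the tilt-and-height pair.
candidates = [(tilt, ecc / math.sin(tilt)) for tilt in (0.3, 0.6, 1.2)]
```

Consistent with the geometric observation above, the recovered height falls as the tilt grows.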
The perceived height of the cone depends on various factors. For example, the cone is taller
under monocular than under binocular observation (Fischer 1956) and is shorter for a textured
base than for a plain one (Zanforlin 1988a). More importantly, the cone becomes taller with both
its base’s increasing diameter and the dot’s increasing eccentricity (Musatti 1924, 1955, 1975; see
also Fischer 1956; Robinson et al. 1985; Wieland and Mefferd 1968; Zanforlin 1988a). The more

concentric circles the stimulus contains, the more compelling the stereokinetic effect, but whether
this also affects the height of the cone is unclear: some reported that it does (e.g., Wallach and
Centrella 1990), others that it does not (e.g., Robinson et al. 1985; Zanforlin 1988a).
Musatti (1924, 1928–1929, 1955, 1975) reasoned that the cone could appear rigid only if its
base were physically slanted relative to the observer, and the base does indeed look slanted. But,
if the base were physically slanted, its retinal projection would be an ellipse; instead, it is a circle.
To solve this “geometrical paradox,” Musatti (1955, 1975) proposed that, because of a general ten-
dency of all points on the stimulus to appear equally far from the observer, (a) the eccentric dot
that becomes the cone’s apex “resists” coming closer to the observer, and (b) the circle “resists”
becoming slanted. Whereas the first kind of “resistance” should decrease the cone’s height and
increase its slant, the second should do the opposite. Some compromise between the two might
then determine how the cone is perceived. However, because the two “resistances” cannot be
quantified, this hypothesis is untestable (Zanforlin 1988b).

The explanation of stereokinesis


The Gestaltist Pentti Renvall (1929) accepted Musatti’s explanation of how the percept of rigid
veridical motion on a plane gives way to that of deformation, and accepted that the latter was nec-
essary for the emergence of stereokinesis. However, he rejected Benussi’s belief, shared by Musatti
(1924), that stereokinesis could be explained on the basis of past experience. According to Renvall,
the stereokinetic solid is the most stable, regular, and symmetrical shape that is consistent with the
retinal image. Renvall showed that even more complex stimuli, such as sets of partly overlapping
circles, invariably produce stereokinetic percepts that, while remaining consistent with the stimu-
lus, minimize the number of objects and maximize the regularity of motion.
Following Renvall’s work, Musatti (1937, 1955, 1975) further emphasized the role of the Gestalt
laws of organization, that he regarded as special cases of an overarching principle of minimum dif-
ferences or maximal homogeneity (Musatti 1930, 1931, 1937). According to this principle, a stimulus
is preferentially perceived in such a way that its elements differ as little as possible in color, position,
and so on. Applied to time, maximal homogeneity means that the stimulus should remain as similar
to itself as possible, that is, it should change the least—which implies that it should remain as rigid
as possible. In the case of stereokinetic stimuli, the first, veridical percept consists of flat shapes that
rotate rigidly. Due to orientation stability, rigidity is lost at the relative motion or ameboid stage, but
finally recovered when the stereokinetic transformation brings about the solid object.

Recent work
The minimum-relative-motion principle
Zanforlin (1988a,b; see also related work by Beghi et  al. 1991a,b; Beghi et  al. 2008; Liu 2003)
proposed a new model, based on a version of the Gestalt “minimum principle” (see van der Helm
chapter, this volume), which includes the minimization of relative velocity differences within a
percept. When this minimization eliminates them all, the percept is rigid, but this rigidity is a
mere byproduct.
In the case of the stereokinetic cone, the model of Zanforlin and colleagues involves two sep-
arate minimizations of relative velocity differences:  the first explains orientation stability, the
second the emergence of the stereokinetic solid. The process is illustrated in Figure 25.3. First
minimization: the farther away each point of the circle is from the turntable’s centre c, the longer
the physical trajectory it covers during rotation and, thus, the faster it moves (Figure 25.3a). When
orientation stability is reached, however, all these differences in velocity disappear (Figure 25.3b).
Second minimization: the velocity of the eccentric dot e is different from that of the points on the

Fig. 25.3  (a) When the circle rotates around the turntable’s centre c, its points move at different
velocities. For example, the trajectory a-a’ is longer than the trajectory b-b’, and a moves therefore
faster than b. (b) When stability of orientation is reached, all points cover equally long trajectories
and therefore have the same velocity. The trajectory and velocity of the eccentric dot e, however,
are unaffected by the orientation stability of the circle, and remain different from those of a
and b. (c) The bar ab moves (solid arrows) around the turntable’s centre c. After a 90° rotation of the
turntable, it ends up as a’’b’’. What is perceived before the stereokinetic transformation, however,
is that the bar ab rotates clockwise around its own centre, which concurrently moves from o to o’
along a clockwise circular path. The two components into which the linear velocity of a and b can
be subdivided occur simultaneously, but their description may be simplified by imagining them as
consecutive: in this case, ab would move to a’b’ (dashed arrows) and a’b’ would move to a’’b’’
(dotted arrows).

circle, and by the addition of a depth component, another minimization of velocity differences
takes place. It results in a rigid cone whose points, including e, all have the same velocity (for a
complete geometrical analysis, see Zanforlin 1988a,b).
The minimum-relative-motion explanation can be extended to the rotating ellipse and the
rotating bar (Beghi et al. 2008; Zanforlin 1988b, 2000; Zanforlin and Vallortigara 1988). Here we
will describe how it applies to the latter, which is a case of stereokinesis on inadequate basis.
At first, a bar drawn radially on a rotating turntable is simply perceived to move around
the turntable’s centre, like a rotating clock hand. After a while, it seems to rotate around its

own centre as well (Figure 25.3c), and finally, all of a sudden, it looks slanted into 3-D space
(Mefferd and Wieland 1967; Musatti 1955; Renvall 1929). The bar end that is farther away from
the centre of rotation appears closer to the observer. The bar never becomes elastic; hence, its
stereokinetic transformation cannot be explained as a rigid interpretation of a non-rigidity.
It can, however, be explained within the minimum-relative-motion model (Zanforlin and
Vallortigara 1988).
Again, two separate minimizations of relative velocity differences are involved. The first explains
the rotation of the bar around its own centre, the second the bar’s dislocation in depth. In Figure
25.3c, a moves faster than o and o moves faster than b. The linear velocity of a and b can be
subdivided into a common component, identical to that of o, and a residual one. If only the first
component were present, the points a, b, and o would be motionless relative to one another, and
would move at the same velocity with respect to the turntable’s centre c. Once this component is
subtracted from the motion of a and b, a second component remains: a and b appear to rotate
around o, at the same speed but in opposite directions. This corresponds to the apparent rotation
of the bar around its own centre.
The speed difference between a and b disappears as a result of the first minimization. However,
because of the residual motion component, the velocities of a and b are still different from the
velocity of o. According to Zanforlin and Vallortigara (1988; for a geometrical demonstration see
also Beghi et al. 2008; Zanforlin 2000), the second minimization makes the three velocities iden-
tical by slanting the bar in depth.
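The first of these minimizations can be illustrated numerically. In the sketch below (illustrative values only: angular speed 1 rad/s, the bar’s centre o at radius 2, half-length 1, the bar momentarily lying along the x-axis), the linear velocities of a, o, and b are split into the common component, the velocity of o, and the residuals that describe the apparent spin of the bar about its own centre:

```python
# Made-up values: the turntable turns at omega rad/s; the bar's centre o
# lies at radius r from the turntable's centre c (the origin), with the
# endpoints a and b half a bar-length further out and further in.
omega, r, half = 1.0, 2.0, 1.0
positions = {'a': (r + half, 0.0), 'o': (r, 0.0), 'b': (r - half, 0.0)}

def velocity(p):
    """Linear velocity of a point p rotating about the origin at omega rad/s."""
    x, y = p
    return (-omega * y, omega * x)

v = {k: velocity(p) for k, p in positions.items()}

# First minimization: subtract the component common to all three points,
# i.e. the velocity of the bar's own centre o.
common = v['o']
residual = {k: (vx - common[0], vy - common[1]) for k, (vx, vy) in v.items()}
# The residuals of a and b have equal speed but opposite direction: the
# perceived rotation of the bar about its own centre.
```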

Stereokinesis with, and from, illusory contours


Ellipses delimited by illusory contours produce stereokinetic rings and cones that are as vivid and
impressive as their real-contour equivalents (Bressan and Vallortigara 1986b). Conversely, illusory
contours can emerge as a byproduct of stereokinesis. An especially convincing case is the Saturn illu-
sion (Vallortigara et al. 1986), evoked by the slow rotation on a turntable of a filled ellipse with two
symmetrically attached semi-rings (Figure 25.4a). This stimulus produces a series of partially rigid
percepts that culminate in a compelling 3-D impression. The latter consists in an egg-shaped object
surrounded by a ring, similar to an elongated planet Saturn; egg and ring move solidly in space.
Whereas inexperienced observers take five to seven minutes on average to see the Saturn-like
percept, this incubation time progressively decreases with repeated exposures, down to an asymp-
totic value of about 15–20 seconds (Bressan and Vallortigara 1987b). Interestingly, experience
does not compress every stage of the stereokinetic transformation equally, but selectively elimi-
nates locally rigid solutions (such as the combination of a slanted rigid disc and an elastic ellipse).
Thus, experienced observers proceed directly from impressions of deformations of the flat con-
figuration to the Saturn-like percept. Bressan and Vallortigara argued that the residual 15 seconds,
which could not be further reduced, are the fixed time needed to compute a rigid 3-D solution
from 2-D deformations.
In the Saturn illusion, the ring appears completed amodally behind the egg and modally in
front of it. Importantly, the illusory section in front of the egg (reminiscent of Tynan and Sekuler’s
[1975] moving visual phantoms) emerges concomitantly with the egg itself, never before. Some
variants of the Saturn stimulus produce stereokinesis-dependent moving phantoms that can be
extraordinarily articulated. Upon rotation, for example, Figure 25.4c creates a “diadem-like” illu-
sory ring (Figure 25.4d), whereas Figure 25.4b does not. Locally, where the illusory ring completes
modally in front of the egg, Figure 25.4b and Figure 25.4c are identical. The latter’s 3-D diadem
must therefore be the result of a global, rather than local, interpretation (Bressan and Vallortigara
1987a). (Musatti (1955) described a related phenomenon occurring during the rotation of two


Fig. 25.4  The stimulus (a), in rotation, produces the Saturn illusion, which includes a (partially)
illusory ring. The stimulus (b) produces the Saturn illusion, but no moving phantoms connecting
the three bottom bars to the illusory ring. The stimulus (c) produces the Saturn illusion with a
“diadem-like” illusory ring in which the three bottom bars, although locally identical to (b), are
connected to the ring by moving phantoms, as depicted in (d).
Reproduced from P. Bressan and G. Vallortigara, Stereokinesis with moving visual phantoms, Perception
16(1), pp. 73–8, Figures 25.1, 25.3, and 25.4 Copyright © 1987, Pion. With kind permission from Pion Ltd,
London www.pion.co.uk and www.envplan.com.

nested dashed circles: occasionally, the gaps between the dashes on one circle appeared to link up
with the gaps on the other, fleetingly forming illusory contours. For details, see Albertazzi 2004.)
Stereokinesis can also affect perceived color, by creating 3-D perceptual objects that are then
filled-in with the color of nearby elements (neon color spreading: for a review, see Bressan et al.
1997). For example, after some observation time, two small red discs on a rotating turntable
give rise to a slightly reddish cylinder spanning between them (Figure 25.5a; see Zanforlin 2003;
Zanforlin and Vallortigara 1990). If the two red discs are replaced by red circles, neon color
spreading does not occur (Figure 25.5b), unless at least one of the circles has a gap that is oriented
towards the other (Figure 25.5c). (For a separate demonstration of neon color spreading in ste-
reokinesis, see Bressan and Vallortigara 1991.)

Kinetic depth effect and structure from motion


Metzger
Relying on a method by Miles (1931), Metzger (1934, 1935) appears to have been the first to
explore what Wallach and O’Connell (1953) later called “kinetic depth effect”—the illusion of 3-D
structure from a moving 2-D projection. Since the 18th century (Smith 1738, p. 61), it had been


Fig. 25.5  Rotation of each of the stimuli (a), (b), and (c) produces an illusory cylinder. The inducing
elements are red (here shown in grey) and the cylinder is reddish in (a) and (c), and colorless in
(b). Similar stereokinetic effects can also be obtained with black inducers, but in this case only the
illusory-contour cylinder in (a) is tinged.
Reproduced from M. Zanforlin and G. Vallortigara, The magic wand: a new stereokinetic anomalous surface,
Perception 19(4), pp. 447–57, Copyright © 1990, Pion. With kind permission from Pion Ltd, London www.pion.
co.uk and www.envplan.com.

Fig. 25.6  The device used by Metzger (1934). The turntable b with the vertical rods is set in rotation.
The rods are illuminated by the light source c and their shadows are projected onto a translucent
screen a.
Reproduced from Psychologische Forschung, 19(1), pp. 1–60, Beobachtungen über phänomenale Identität,
Wolfgang Metzger, © 1934, Springer-Verlag. With kind permission from Springer Science and Business Media.

known that the blades of a windmill silhouetted against the sky often reverse their apparent direc-
tion of motion. To investigate this phenomenon, Miles (1931) projected on a screen the shadow
of a two-bladed rotating fan. His observers reported, among other things, a rotary motion that
often reversed. As Musatti (1955) had already noticed in stereokinesis, what the observers saw was
affected by the experimenter’s suggestions.
Metzger used a method similar to Miles’s, but with the device illustrated in Figure 25.6. A set of
thin rods stood on a rotating horizontal turntable; the rods’ shadows were cast onto a translucent
screen. The relatively large distance between the light source and the turntable (five meters) and
the relatively small distance between the turntable and the screen (as small as possible) ensured
that the projection was approximately orthographic rather than perspective. Whereas in a per-
spective projection all imaginary projection lines meet at one point, in orthographic projection
they are (a)  parallel to one another (parallel projection) and (b)  orthogonal to the projection


Fig. 25.7  If stimulus (a) is set in rotation behind aperture (b), observers see a solid pyramid (c).
Data from Wolfgang Metzger, Laws of Seeing, translated by Lothar Spillmann, The MIT Press, 2006.

plane. Thus, in orthographic projection, unlike in perspective projection, identical objects at dif-
ferent distances all cast identical images onto the projection plane. In this way, orthographic pro-
jections allow the removal of perspective cues to depth. To ensure that indeed all perspective cues
to depth were eliminated, Metzger also blocked the ends of the rods from view; on the screen, they
all had the same height. The shadows of the rods moved horizontally over the screen, with con-
stantly changing distances between them. The velocity of the turntable was uniform, and hence,
each shadow performed a simple harmonic motion.
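That the shadows move harmonically follows directly from the geometry: under parallel projection, a rod at radius r and phase φ on a turntable turning at angular speed ω projects to x(t) = r cos(ωt + φ). A small numerical check (illustrative values only) confirms the defining property of simple harmonic motion, acceleration proportional to −x:

```python
import math

# Illustrative values only: a rod at radius r with phase phi on a turntable
# spinning at omega radians per second.
r, phi, omega = 1.5, 0.7, 2.0

def shadow_x(t):
    """Horizontal screen position of the rod's shadow under parallel projection."""
    return r * math.cos(omega * t + phi)

def accel(t, h=1e-4):
    """Numerical second derivative of the shadow's position."""
    return (shadow_x(t + h) - 2 * shadow_x(t) + shadow_x(t - h)) / h ** 2

# Simple harmonic motion: acceleration equals -omega**2 times position.
errors = [abs(accel(t) + omega ** 2 * shadow_x(t)) for t in (0.0, 0.3, 1.1)]
```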
With this device, observers initially see the shadows move horizontally in 2-D. When they overlap,
the shadows can be seen to either stream (that is, to continue in the same direction) or bounce. For
individuals who tend to see streaming rather than bouncing, the 2-D percept is eventually replaced by
one of circular motion in 3-D: the kinetic depth effect (KDE). While the variable (harmonic) motion
of each shadow becomes perceptually uniform, the relative motion between them disappears and
they unite into a rigid whole. The shadows then appear as edges and no longer as independent lines.
Metzger’s explanation is that, in accordance with Gestalt theory (e.g., Wertheimer 1923; for reviews,
see Wagemans et al. 2012a,b; also Wagemans, this volume; van der Helm, this volume), the visual
system appears to adopt the simplest and most stable (least changing) interpretation of the stimulus.
Metzger noted that the initial 2-D percept might be due to the thin rods’ shadows appearing, at
first, as figures (e.g., Metzger 1935, section 19). At this stage there would be no deforming surfaces
because the space between the shadows is seen as background, and backgrounds have no shape of
their own (Rubin 1921). Later, the rods’ shadows appear as borders of continually deforming sur-
faces. Only then can a tendency to minimize deformations arise—producing the rigid 3-D percept.
This idea was put to the test by Giorgio Tampieri (1956, 1968), who used stimuli composed of colored
areas that could only be perceived as surfaces (Figure 25.7a). If the hypothesis were correct, the 3-D
percept should emerge virtually right away. For example, Tampieri rotated Figure 25.7a’s polygon
around its centre, behind a screen with a wedge-shaped aperture whose apex coincided with the
polygon’s centre (Figure 25.7b). What observers saw was one face after another of a solid rotating
pyramid (Figure 25.7c). Tampieri reported that the impression of depth was more compelling than
in Benussi and Musatti’s stimuli and indistinguishable from that produced by a real pyramid. More
importantly, the depth percept emerged instantaneously, confirming the hypothesis.

Wallach
According to Wallach and colleagues (Wallach and O’Connell 1953; Wallach et al. 1953), any 3-D
percept of a monocular, static stimulus is based on a learned association between a 2-D retinal
projection and a 3-D structure. Wallach and colleagues argued that, initially, it is the KDE that
allows the 3-D structure of an object to be perceived. Because such a structure becomes associated

with the object’s retinal projection, this projection will subsequently evoke the 3-D structure even
when the object does not move.
To test this hypothesis Wallach and colleagues investigated, using Metzger’s technique, various
simple wire objects, whose orthographic 2-D projections are interpreted as 3-D only when they
move. They presented stationary projections up to seven days after subjects had viewed the moving
ones. Nearly all subjects perceived the stationary projections as coming from 3-D objects, whereas
before exposure to the KDE, they did not. (For a related modern study, see Sinha and Poggio 1996.)
Wallach and O’Connell (1953) thought they had demonstrated the necessary and sufficient
conditions of the KDE:  the projected contours had to change in both length and orientation.
Although Metzger had shown that changes in length (of the spaces between contours) were
enough, Wallach and O’Connell doubted whether the phenomenon described by Metzger could
be experienced by naïve observers—unless prompted about what they should see. However,
White and Mueser (1960) confirmed Metzger’s findings, and actually extended them to displays
with two rods only. Later studies showed that whereas the KDE is stronger with both length
and orientation changes, the former is sufficient (e.g., Börjesson and von Hofsten 1972, 1973;
Johansson and Jansson 1968).
Wallach and colleagues also proposed that stereokinesis could be explained by simultaneous
changes in the length and orientation of virtual, rather than real, lines. Consider, for example,
a rotating disc with two nested, non-concentric circles and a virtual line that connects them.
Because of orientation stability, the two circles appear to move relative to each other and this
causes the virtual line to change in both length and orientation. Thus, at least some stereokinetic
stimuli could be seen as forms of KDE (Wallach and Centrella 1990; Wallach et al. 1956).

Ullman
The rigidity assumption
Wallach and O’Connell (1953) investigated, but did not explain, the KDE. Ullman (1977; 1979a,b), who called the same phenomenon structure from motion (SfM), did explain it; his use of a computational approach, the first applied to this problem, proved very influential.
Ullman studied the orthographic projection of two transparent virtual cylinders with a
common vertical axis (Figure 25.8; for a related demonstration, see <www.youtube.com/watch?v=RdwU28bghbQ>). Each cylinder was defined by 100 points, scattered across its virtual
surface. The cylinders were perceived as such when rotating, but appeared flat when stationary.
The perception of SfM with this type of stimulus allowed the exclusion of an explanation (based
on Gestalt grouping by common fate) in which points must be grouped into objects before any
depth is recovered. In fact, even though the points sitting on each cylinder move at the same speed in
3-D space, their 2-D projections span an ample range of velocities. In the stimulus of Figure 25.8,
various points belonging to the same cylinder move at different speeds, whereas various points
belonging to different cylinders move at the same speed.
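The geometry behind this point can be sketched numerically. The following Python fragment (with arbitrary radii and a common angular velocity, not Ullman's actual stimulus parameters) computes the projected speeds of dots on two coaxial rotating cylinders:

```python
import math

def projected_speed(r, theta, omega):
    # A dot at angle theta on a cylinder of radius r rotating about a
    # vertical axis sits at (r*cos(theta), height, r*sin(theta)) in 3-D.
    # Orthographic projection keeps (x, height), so the projected
    # horizontal speed is |dx/dt| = |r * omega * sin(theta)|.
    return abs(r * omega * math.sin(theta))

omega = 1.0  # common angular velocity (illustrative value)
inner = [projected_speed(1.0, 2 * math.pi * k / 100, omega) for k in range(100)]
outer = [projected_speed(2.0, 2 * math.pi * k / 100, omega) for k in range(100)]

# Every dot on a given cylinder moves at the same 3-D speed (r * omega),
# yet its projected speed ranges from 0 (near the cylinder's silhouette
# edge) up to r * omega (near the image centre); the two ranges overlap,
# so common fate alone cannot group the dots into the two cylinders.
print(f"inner: 0 .. {max(inner):.2f}, outer: 0 .. {max(outer):.2f}")
```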
In principle, the 2-D projections can be produced by an infinite number of rotating 3-D objects
(Eriksson 1973). Like others before him (e.g., Johansson 1975), Ullman assumed that 3-D objects
are perceived as rigid. His structure-from-motion theorem states that, given this rigidity assumption,
three distinct orthographic or perspective views of just four non-coplanar points4 suffice to nar-
row the possibilities down to just one correct solution. It follows that an object cannot possibly be

4 How the points in one view are correctly matched to those in another view is called the correspondence problem. Because this is typically studied as a separate topic we will not discuss it here; see Herzog and Ogmen, this volume.
532 Vezzani, Kramer, and Bressan

Fig. 25.8  A side view of two nested cylinders exclusively defined by dots (outlines were not
presented), illuminated from the right and projected orthographically onto a screen on the left.
Adapted from Ullman, Shimon, The Interpretation of Visual Motion, figure 4.1, page 135, © 1979 Massachusetts
Institute of Technology, by permission of The MIT Press.

perceived as rigid when it is not, and that incorrect “phantom structures” cannot emerge either; “the
interpretation scheme is virtually immune to misinterpretation” (Ullman 1979b, p. 411). However,
2-D orthographic projection determines a 3-D object only up to a reflection about the frontal plane.
That is, the perceived 3-D object can reverse in depth, while simultaneously inverting its apparent
direction of rotation, a bistability that is unavoidable with orthographically projected stimuli.
Braunstein and Andersen (1984) presented evidence against the rigidity assumption. However,
Ullman (1979a,b; 1984a) was already aware that 2-D projections could lead not only to rigid, but
also to non-rigid, SfM percepts (e.g., Braunstein 1962; Green 1961; Wallach and O’Connell 1953;
Wallach et al. 1956; White and Mueser 1960). He claimed that non-rigid SfM only occurs if the
2-D projection (a) looks 3-D even when stationary—as in the case of a distorting Necker cube—or
(b) is misperceived—as in the case of smooth contours lacking distinguishable, traceable features.

The incremental rigidity scheme


Ullman (1984b) attempted to overcome two important drawbacks of his earlier work: (a) the fail-
ure to deal with non-rigid SfM involving, for example, bending and stretching (e.g., Jansson and
Johansson 1973), and (b) the failure to account for improvement in SfM perception with observa-
tion time (e.g., Green 1961; White and Mueser 1960). To this end, he proposed the incremental
rigidity scheme. In this scheme, an internal model of a 3-D object is maintained that consists of
a set of 3-D coordinates and is compared with each frame of a discrete sequence of 2-D projec-
tions of a moving 3-D object. Each frame consists of a set of 2-D coordinates. Initially, the model
is based on stationary 3-D cues, like stereopsis, texture, or shading—which allows their integra-
tion with dynamic cues. If these stationary cues are unavailable, then the model is initially flat.
After each comparison between the 3-D model and a 2-D frame, the depth values of the model
are updated. During this update, the model is maintained as rigid as possible while rendering it
consistent with the frame. That is, across a sequence of frames the model is, in Ullman’s words,
incrementally rigid. As such, it can explain some nonrigidity during each update and substantial
nonrigidity in the sequence of frames as a whole.

Because it tends to be initially inaccurate and to improve with each update, the internal
model accounts at least qualitatively for the fact that human SfM perception improves with
observation time. Yet, Ullman (1984b) admitted that the model had an important draw-
back:  even after a long exposure time, the recovered model of a rigid 3-D object still con-
tains residual non-rigid distortions. (For an elaboration of Ullman’s ideas, see Grzywacz and
Hildreth 1987; Hildreth et al. 1995.)
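A toy version of one update step of the incremental rigidity scheme might look as follows: the depth values attached to the newly observed 2-D frame are nudged by a numerical gradient step on a "departure from rigidity" cost. This captures the spirit of the scheme rather than Ullman's exact formulation, and all parameter values are illustrative:

```python
import numpy as np

def rigidity_cost(z, frame_xy, model):
    """Total squared change in inter-point 3-D distances if the observed
    2-D frame is assigned the candidate depth values z."""
    new = np.column_stack([frame_xy, z])
    cost = 0.0
    for i in range(len(new)):
        for j in range(i + 1, len(new)):
            d_new = np.linalg.norm(new[i] - new[j])
            d_old = np.linalg.norm(model[i] - model[j])
            cost += (d_new - d_old) ** 2
    return cost

def update_model(model, frame_xy, lr=0.01, eps=1e-6):
    """One update: keep the frame's observed (x, y) values, and nudge the
    depths to keep the model as rigid as possible (numerical gradient)."""
    z = model[:, 2].copy()
    grad = np.zeros_like(z)
    base = rigidity_cost(z, frame_xy, model)
    for k in range(len(z)):
        zp = z.copy()
        zp[k] += eps
        grad[k] = (rigidity_cost(zp, frame_xy, model) - base) / eps
    return np.column_stack([frame_xy, z - lr * grad])

# Current 3-D model, and a new 2-D frame from a slightly rotated view.
model = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.0, 1.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
frame_xy = (model @ np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]]))[:, :2]

before = rigidity_cost(model[:, 2], frame_xy, model)
updated = update_model(model, frame_xy)
after = rigidity_cost(updated[:, 2], frame_xy, model)
print(before, after)  # the updated depths depart less from rigidity
```

Iterating such updates over a sequence of frames is what lets the model improve with observation time while tolerating some nonrigidity at each step.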

Euclidean vs. affine space


Ullman’s SfM algorithm aims to recover the structure of objects in Euclidean space, the space so
familiar to us that it has become our default one. Assuming that space in SfM is Euclidean, the
recovery of a rigid rotating object from its projection requires an analysis of the relations between
at least three distinct views of four non-coplanar points (see previous section). Two views suf-
fice to estimate velocity (assuming smoothness of motion). At least three are necessary to esti-
mate acceleration: the first and second views can provide one velocity estimate, the second and
the third another, and the estimate of the change between them is an estimate of acceleration.
However, with each of these evaluations subject to noise, acceleration estimates are necessarily
noisier than velocity estimates. Put differently, because acceleration is a derivative of velocity, its
estimate amplifies noise present in velocity estimates.
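This amplification can be made explicit with a standard finite-difference calculation (a textbook derivation, not one given in the chapter). With views separated by an interval $\Delta t$ and each measured position carrying independent noise of variance $\sigma^2$:

```latex
\hat{v}_1 = \frac{x_2 - x_1}{\Delta t}, \qquad
\hat{v}_2 = \frac{x_3 - x_2}{\Delta t}, \qquad
\hat{a} = \frac{\hat{v}_2 - \hat{v}_1}{\Delta t} = \frac{x_3 - 2x_2 + x_1}{\Delta t^2},

\operatorname{Var}(\hat{v}) = \frac{2\sigma^2}{\Delta t^2}, \qquad
\operatorname{Var}(\hat{a}) = \frac{(1 + 4 + 1)\,\sigma^2}{\Delta t^4} = \frac{6\sigma^2}{\Delta t^4}.
```

The acceleration estimate's standard deviation thus exceeds the velocity estimate's by a factor of $\sqrt{3}/\Delta t$, which grows as the views come closer together in time.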
Indeed, whereas human sensitivity to velocity is relatively good, sensitivity to acceleration is
poor (for a review, see Todd 1998). Consistent with these findings, acceleration—or any com-
parison between more than two views—does not appear to play a major role in SfM; this implies
that Ullman’s algorithm, which relies on comparisons between three views, does not appear to
describe our visual system’s behaviour well. Moreover, our visual system turns out not to recover
the Euclidean properties of 3-D objects at all (Domini and Braunstein 1998; Todd and Bressan
1990; for reviews, see Domini and Caudek 2003; Todd 1998).
From an analysis of just two, rather than three, distinct views of four non-coplanar points, it
is possible to recover objects in affine, rather than Euclidean, space—even when these objects
are largely non-rigid: the affine structure-from-motion theorem (e.g., Koenderink and van Doorn
1991). Affine space is a less constrained version of Euclidean space (i.e., it is based on fewer
axioms). In affine space, it is still possible to establish whether two points on an object are copla-
nar or not, and whether two lines connecting pairs of points on the object are parallel or not, but
only the depth order between pairs of points can be obtained, and not the interval-scale distances
between them (Domini and Caudek 2003; Todd et  al. 2001). If, from a projection, the visual
system were at best only able to recover an object in affine space, then this object should be per-
ceptually indistinguishable from another one with identical affine, but different Euclidean, prop-
erties. This does indeed appear to be the case (Todd and Bressan 1990; for reviews, see Domini
and Caudek 2003; Todd 1998).
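What "affine but not Euclidean" buys can be sketched with a hypothetical depth-stretching map (the parameter values are arbitrary):

```python
import numpy as np

def stretch_depth(points, a=2.0, b=0.5):
    """Depth-only affine map (x, y, z) -> (x, y, a*z + b), one of the
    transformations that leave an object's affine structure intact.
    The values of a and b are arbitrary."""
    out = points.copy()
    out[:, 2] = a * out[:, 2] + b
    return out

def coplanar(p):
    # Four points are coplanar iff the three edge vectors from p[0]
    # span zero volume.
    return abs(np.linalg.det(p[1:] - p[0])) < 1e-9

# Four coplanar points (the last edge vector is the sum of the first two).
p = np.array([[0, 0, 0], [1, 0, 1], [0, 1, 2], [1, 1, 3]], dtype=float)
q = stretch_depth(p)

# Affine properties survive: coplanarity, parallelism, and depth order...
assert coplanar(p) and coplanar(q)
assert np.allclose(np.cross(q[1] - q[0], q[3] - q[2]), 0)  # still parallel
assert (np.argsort(p[:, 2]) == np.argsort(q[:, 2])).all()  # same depth order
# ...but interval-scale (Euclidean) distances do not:
print(np.linalg.norm(p[1] - p[0]), np.linalg.norm(q[1] - q[0]))
```

An observer limited to affine recovery could not tell `p` and `q` apart, exactly as the cross-comparison studies cited above report.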

Optic-flow components and projection types


Optic flow—the total movement in a 2-D projection of 3-D motion—has four separate compo-
nents: translation, curl, divergence, and shear or deformation (for a review, see Koenderink 1986).
Translation is the uniform motion of the optic flow along a linear path, curl is its uniform rotation,
and divergence its uniform expansion or contraction. Deformation is a contraction in one direc-
tion and expansion in the orthogonal direction, while preserving area. Deformation is the only
component of optic flow that contains information about the original object’s shape.
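These four components can be read directly off the local 2 x 2 velocity-gradient matrix of the flow field (u, v); the decomposition is standard, and the function below is an illustrative sketch rather than anything from the chapter:

```python
def flow_components(du_dx, du_dy, dv_dx, dv_dy):
    """Decompose the local velocity-gradient matrix of an optic-flow
    field (u, v) into divergence, curl, and deformation. Translation is
    the zeroth-order term: the flow value itself at the point."""
    divergence = du_dx + dv_dy          # uniform expansion/contraction
    curl = dv_dx - du_dy                # uniform rotation
    def_1 = du_dx - dv_dy               # stretch along the axes
    def_2 = du_dy + dv_dx               # stretch along the diagonals
    deformation = (def_1 ** 2 + def_2 ** 2) ** 0.5  # area-preserving part
    return divergence, curl, deformation

# A rigid image rotation has curl but no divergence or deformation:
assert flow_components(0, -1, 1, 0) == (0, 2, 0.0)
# Looming (uniform expansion) has divergence only:
assert flow_components(1, 0, 0, 1) == (2, 0, 0.0)
# A pure deformation (expansion in x, matching contraction in y):
print(flow_components(1, 0, 0, -1))
```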
It is unlikely that SfM is based on an analysis of optic flow as a whole. Internal inconsistencies
between different depth estimates in the same SfM percept suggest that SfM is computed locally
rather than globally (Domini and Braunstein 1998; for a review, see Domini and Caudek 2003).
Locally computed optic-flow deformation does suffice to recover the local affine properties of objects
(Koenderink 1986; Koenderink and van Doorn 1991). By itself, though, the recovery of these affine
properties still leaves room for an infinite number of interpretations of a particular projection. Figure
25.9, for example, shows two doors. The first is narrow and swings open fast (Figure 25.9a). The sec-
ond is wide, already partially open, but swings further open more slowly (Figure 25.9b). In both cases
the projected widths of the doors shrink; and, for particular widths and rotational velocities, the two
doors produce exactly the same optic flow. In fact, the number of doors that can produce this optic
flow is infinite. Yet, at any one time, our visual system chooses only one of them as its SfM solution.
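Seen from above and projected orthographically, a door of width w open by angle theta projects to width w cos(theta), shrinking at rate w sin(theta) times the angular velocity. The sketch below, with illustrative values chosen to match, shows a narrow, barely open, fast door and a wide, well-open, slow door whose instantaneous projections coincide:

```python
import math

def door_projection(width, angle, omega):
    """Orthographic projection (viewed from above) of a door of the
    given width, open by `angle` radians and opening at angular velocity
    `omega`: returns the projected width and its rate of change."""
    return width * math.cos(angle), -width * math.sin(angle) * omega

# A narrow door, barely open, swinging fast...
p1, rate1 = door_projection(width=1.0, angle=math.radians(30), omega=1.0)
# ...and a wide door, already well open, swinging three times slower.
p2, rate2 = door_projection(width=math.sqrt(3), angle=math.radians(60),
                            omega=1.0 / 3)

# At this instant both the projected widths and their shrink rates agree,
# so the first-order optic flow cannot tell the two doors apart.
assert abs(p1 - p2) < 1e-9 and abs(rate1 - rate2) < 1e-9
print(p1, rate1)
```

The match shown here is instantaneous; a whole family of (width, angular velocity) pairs can be matched in this way at any given moment.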
It has been proposed that, even if other depth cues are ignored, the visual system need not be constrained by optic flow alone. In all likelihood, it is also constrained by noise
within the visual system. If it is assumed that deformation values are subject to Gaussian noise,
then it turns out that, given the observed 2-D deformation, different 3-D interpretations have a
different posterior probability of being correct (Domini and Caudek 2003). As its SfM solution,
the visual system might therefore adopt the particular 3-D interpretation that maximizes this
posterior probability. In the example of Figure 25.9, it will thus adopt one particular pair of slant
and rotational velocity values to arrive at one unambiguous SfM solution. The authors suggest,
though, that in order to assess posterior probabilities some learning may be required. With this
observation, we thus seem to have come full circle in this chapter; one of the first conjectures we


Fig. 25.9  Projections of two opening doors viewed from above. In each panel, the solid bar on the
left represents a door that opens until it reaches the position indicated by the dashed bar. The solid
bar on the right represents a 2-D projection screen. The dotted lines represent projection lines from
the door onto the 2-D screen. The door is relatively narrow and initially closed in (a) and relatively wide and initially already partially open in (b). Notice, however, that although the doors differ in width, their projections on the screen are identical.
Reprinted from Trends in Cognitive Sciences, 7(10), Fulvio Domini and Corrado Caudek, 3-D structure perceived
from dynamic information: a new theory, pp. 444–9, Copyright (2003), with permission from Elsevier.

reported here about how 3-D percepts might arise from 2-D stimuli involved this very idea that
learning from past experience would be essential.
Until now, we have only considered orthographic projections of dynamic stimuli. The projec-
tion of the world onto our retinae, however, is perspective, not orthographic. In orthographic
projections, the projected distance between two points in a frontal plane does not depend on this
plane’s depth (i.e., its distance along the z-axis). In perspective projections, in contrast, it does; it
decreases with depth until it approaches zero at the vanishing point. Consequently, in perspective
projections, the further away a point is that moves a particular distance, the smaller its projected
traversed distance—and thus, the smaller its projected velocity. Stated more generally, in perspec-
tive projections, unlike orthographic ones, projected velocity is inversely proportional to depth.
This motion perspective is indeed used by our visual system (Jain and Zaidi 2011). Still, when
objects are fairly shallow, or not very close to the observer, their perspective projection approxi-
mates an orthographic one. At this point, the use of motion perspective becomes impossible. For
this reason, even though strictly speaking it is unwarranted, it is often reasonable to assume that
the projection of an object onto our retinae is orthographic.
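The depth dependence of projected velocity can be sketched under the standard pinhole model x' = f x / z (the parameter values below are illustrative):

```python
def image_speed(lateral_speed, depth, focal=1.0):
    """Perspective projection x' = focal * x / z: a point translating
    laterally at `lateral_speed` at distance `depth` moves in the image
    at a speed inversely proportional to that depth."""
    return focal * lateral_speed / depth

near, far = image_speed(1.0, 10.0), image_speed(1.0, 20.0)
assert abs(near - 2 * far) < 1e-12  # doubling depth halves projected speed

# Across a shallow or distant object the depth differences are tiny, so
# the speed differences motion perspective could exploit all but vanish:
# the perspective projection approximates an orthographic one.
shallow = image_speed(1.0, 100.0) / image_speed(1.0, 101.0)
print(f"speed ratio across a shallow object: {shallow:.3f}")
```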

Integration with other cues


SfM involves situations in which an object moves relative to the observer. Motion parallax involves
situations in which the observer moves relative to the object. Under particular conditions, the
two can produce exactly the same optic flow. The resulting percepts, however, need not be the
same. Motion parallax is effectively SfM, integrated with information from proprioception and
(efference-copy) information from the motor system. To enable this integration, the visual system
might adopt 3-D interpretations of optic flow that minimize the motion of the scene (the station-
arity assumption) and, when possible, assume self motion rather than motion in the scene (for a
review, see Wexler and van Boxtel 2005).
Apart from proprioceptive and motor information, there is other information that is integrated
into SfM perception. As discussed in the previous two sections, an analysis of just two distinct
2-D views suffices to recover the affine 3-D properties of an object. In stereopsis, it is also an
analysis of just two distinct 2-D views (one from the left eye and one from the right eye) that suffices to recover those properties. Cross-adaptation studies have shown that adaptation to stereoscopic stimuli affects the perception of monocularly viewed motion-parallax stimuli and vice versa
(Nawrot and Blake 1989, 1991; Rogers and Graham 1984). These results suggest a tight integration
of SfM not only with proprioceptive and motor information, but with other depth cues too (see
also Domini et al. 2006; Landy et al. 1995).
In fact, recent psychophysical evidence suggests that stereoscopic and relative-motion depth cues are integrated in the dorsal visual cortex (areas V3B and KO; Ban et al. 2012): sensitivity to them deteriorates when the cues are inconsistent and improves more than quadratically when they are consistent. Earlier evidence indicates that some integration of stereoscopic and motion information also takes place in area V5/MT (Andersen and Bradley 1998; Nadler et al. 2008).
Moreover, in addition to stereoscopic and relative-motion ones, depth cues obtained from texture,
illumination, and shading are integrated as well (Landy et al. 1995; Norman et al. 2004).

Conclusion
There is a consensus that the recovered structure in structure from motion (a)  depends on
local, rather than global, computations, (b)  is—under most conditions—at best affine, rather
than Euclidean, and (c) need not be rigid. A recurring idea, in both structure from motion and
stereokinesis, is that the visual system favours interpretations—whether 3-D or not—of 2-D
motion that contain as little motion as possible. Finally, an idea that has been around almost
since the beginning, but has attracted little systematic study, is that past experience may play a
key role.
Among others, studies on patients who were congenitally blind for many years before gaining their sight suggest that past experience may, in fact, be more important for perception than has previously been thought (Ostrovsky et al. 2006; Ostrovsky et al. 2009). These patients, for example, have
difficulty parsing a simple stimulus consisting of a circle and a square that overlap; to them, the
stimulus appears to contain three non-overlapping shapes rather than just two overlapping ones.
However, if the circle and square are set in motion relative to each other, the patients suddenly
perceive what remains invariant:  not the three non-overlapping shapes, but the circle and the
square. Even more importantly, although a critical period for the development of visual perception has presumably long passed, this experience subsequently helps the patients to parse stationary stimuli in a normal way too. It has been argued that the processing of invariants is critical to the
perception of optic flow as well (e.g., Gibson 1979; Marr 1982). If so, uncovering how this percep-
tual learning unfolds over time could be a particularly fruitful way forward in the study of both
stereokinesis and structure from motion.

References
Albertazzi, L. (2004). Stereokinetic shapes and their shadows. Perception 33: 1437–52.
Andersen, R. A. and Bradley, D. C. (1998). Perception of three-dimensional structure from motion. Trends
in Cognitive Sciences 2: 222–8.
Ban, H., Preston, T. J., Meeson, A., and Welchman, A. E. (2012). The integration of motion and disparity
cues to depth in dorsal visual cortex. Nature Neuroscience 15: 636–43.
Beghi, L., Xausa, E., and Zanforlin, M. (2008). Modelling stereokinetic phenomena by a minimum relative
motion assumption: The tilted disk, the ellipsoid and the tilted bar. Biological Cybernetics 99: 115–23.
Beghi, L., Xausa, E., De Biasio, C., and Zanforlin, M. (1991a). Quantitative determination of the
three-dimensional appearances of a rotating ellipse without a rigidity assumption. Biological Cybernetics
65: 433–40.
Beghi, L., Xausa, E., and Zanforlin, M. (1991b). Analytic determination of the depth effect in stereokinetic
phenomena without a rigidity assumption. Biological Cybernetics 65: 425–32.
Benussi, V. (1922–1923). Introduzione alla psicologia sperimentale. Lezioni tenute nell’anno 1922–23.
Typescript by Dr. C. Musatti, Fondo Benussi. Milan: Bicocca University.
Benussi, V. (1925). La suggestione e l’ipnosi come mezzi di analisi psichica reale. Bologna: Zanichelli.
Benussi, V. (1927). Zur experimentellen Grundlegung hypnosuggestiver Methoden psychischer Analyse.
Psychologische Forschung 9: 197–274.
Börjesson, E. and von Hofsten, C. (1972). Spatial determinants of depth perception in two dot patterns.
Perception & Psychophysics 11: 263–8.
Börjesson, E. and von Hofsten, C. (1973). Visual perception of motion in depth: Application of vector
model to three-dot motion patterns. Perception & Psychophysics 13: 169–79.
Braunstein, M. L. (1962). Depth perception in rotating dot patterns: Effects of numerosity and perspective.
Journal of Experimental Psychology 64: 415–20.
Braunstein, M. L. and Andersen, G. J. (1984). A counterexample to the rigidity assumption in the visual
perception of structure from motion. Perception 13: 213–17.
Bressan, P. and Vallortigara, G. (1986a). Multiple 3-D interpretations in a classic stereokinetic effect.
Perception 15: 405–8.
Bressan, P. and Vallortigara, G. (1986b). Subjective contours can produce stereokinetic effects. Perception
15: 409–12.
Bressan, P. and Vallortigara, G. (1987a). Stereokinesis with moving visual phantoms. Perception 16: 73–8.
Bressan, P. and Vallortigara, G. (1987b). Learning to see stereokinetic effects. Perception 16: 187–92.
Bressan, P. and Vallortigara, G. (1991). Illusory depth from moving subjective figures and neon colour
spreading. Perception 20: 637–44.
Bressan, P., Mingolla, E., Spillmann, L., and Watanabe T. (1997). Neon colour spreading: A review.
Perception 26: 1353–66.
D’Aversa, A. S. [Lottedyskolia] (2007, April 20). Marcel Duchamp—Anemic Cinema [Video file]. Retrieved from <http://www.youtube.com/watch?v=dXINTf8kXCc&list=UU4CDskGLhCGq0jYuHRTR81g&index=18>.
Domini, F. and Braunstein, M. L. (1998). Recovery of 3-D structure from motion is neither Euclidean nor
affine. Journal of Experimental Psychology: Human Perception and Performance 24: 1273–95.
Domini, F. and Caudek, C. (2003). 3-D structure perceived from dynamic information: A new theory.
Trends in Cognitive Sciences 7: 444–9.
Domini F., Caudek, C., and Tassinari, H. (2006). Stereo and motion information are not independently
processed by the visual system. Vision Research 46: 1707–23.
Duncan, F. S. (1975). Kinetic art: On my psychokinematic objects. Leonardo 8: 97–101.
Eriksson, E. S. (1973). Distance perception and the ambiguity of visual stimulation: A theoretical note.
Perception & Psychophysics 13: 379–81.
Fischer, G. J. (1956). Factors affecting estimation of depth with variations of the stereokinetic effect.
American Journal of Psychology 69: 252–7.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Green, B. F., Jr. (1961). Figure coherence in the kinetic depth effect. Journal of Experimental Psychology
62: 272–82.
Grzywacz, N. M. and Hildreth, E. C. (1987). Incremental rigidity scheme for recovering structure from
motion: Position-based versus velocity-based formulations. Journal of the Optical Society of America A
4: 503–18.
Hildreth, E. C., Ando, H., Andersen, R. A., and Treue, S. (1995). Recovering three-dimensional structure
with surface reconstruction. Vision Research 35: 117–35.
Isbell, L. A. (2006). Snakes as agents of evolutionary change in primate brains. Journal of Human Evolution
51: 1–35.
Jain, A. and Zaidi, Q. (2011). Discerning non-rigid 3-D shapes from motion cues. Proceedings of the
National Academy of Sciences 108: 1663–8.
Jansson, G. and Johansson, G. (1973). Visual perception of bending motion. Perception 2: 321–6.
Johansson, G. (1950). Configurations in event perception. Uppsala: Almkvist and Wiksell.
Johansson, G. (1975). Visual motion perception. Scientific American 232: 76–88.
Johansson, G. and Jansson, G. (1968). Perceived rotary motion from changes in a straight line. Perception &
Psychophysics 6: 193–8.
Koenderink, J. J. (1986). Optic flow. Vision Research 26: 161–80.
Koenderink, J. J. and van Doorn, A. J. (1991). Affine structure from motion. Journal of the Optical Society
of America A—Optics Image Science and Vision 8: 377–85.
Landy, M. S., Maloney, L. T., Johnston, E. B., and Young, M. (1995). Measurement and modeling of depth
cue combination: In defense of weak fusion. Vision Research 35: 389–412.
Liu, Z. (2003). On the principle of minimal relative motion—the bar, the circle with a dot, and the ellipse.
Journal of Vision 3: 625–9.
Mach, E. (1868). Beobachtungen über monokulare Stereoskopie. Sitzungsberichte der Wiener Akademie 58.
Mach, E. (1886). Beiträge zur Analyse der Empfindungen. Jena: Gustav Fischer. English
translation: Contributions to the analysis of the sensations, C. M. Williams (trans.), 1897. Chicago: The
Open Court.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual
information. New York: W.H. Freeman and Company.
Mefferd, R. B., Jr. (1968a). Perception of depth in rotating objects: 4. Fluctuating stereokinetic perceptual
variants. Perceptual and Motor Skills 27: 255–76.
Mefferd, R. B., Jr. (1968b). Perception of depth in rotating objects: 7. Influence of attributes of depth on
stereokinetic percepts. Perceptual and Motor Skills 27: 1179–93.
Mefferd, R. B., Jr. and Wieland, B. A. (1967). Perception of depth in rotating objects: 1. Stereokinesis and
the vertical-horizontal illusion. Perceptual and Motor Skills 25: 93–100.
Metzger, W. (1934). Beobachtungen über phänomenale Identität. Psychologische Forschung 19: 1–60.
Metzger, W. (1935). Tiefenerscheinungen in optischen Bewegungsfeldern. Psychologische Forschung
20: 195–260.
Metzger, W. (1975). Gesetze des Sehens. Eschborn: Klotz.
Miles, W. R. (1931). Movement interpretations of the silhouette of a rotating fan. American Journal of
Psychology 48: 392–405.
Musatti, C. L. (1924). Sui fenomeni stereocinetici. Archivio Italiano di Psicologia 3: 105–20.
Musatti, C. L. (1928). Sui movimenti apparenti dovuti ad illusione di identità di figura. Archivio Italiano di
Psicologia 6: 205–19.
Musatti, C. L. (1928–1929). Sulla percezione di forme di figura oblique rispetto al piano frontale. Rivista di
Psicologia 25: 1–14.
Musatti, C. L. (1929). Sulla plasticità reale, stereocinetica e cinematografica. Archivio Italiano di Psicologia
7: 122–37.
Musatti, C. L. (1930). I fattori empirici della percezione e la teoria della forma. Rivista di Psicologia 26: 259–64.
Musatti, C. L. (1931). Forma e assimilazione. Archivio Italiano di Psicologia 9: 61–156.
Musatti, C. L. (1937). Forma e movimento. Atti del Reale Istituto Veneto di Scienze, Lettere e Arti 97: 1–35.
Musatti, C. L. (1955). La stereocinesi e il problema della struttura dello spazio visibile. Rivista di Psicologia
49: 3–57.
Musatti, C. L. (1975). On stereokinetic phenomena and their interpretation. In: G.B. Flores D’Arcais (ed.),
Studies in Perception. Festschrift for Fabio Metelli, pp. 166–89. Milan-Florence: Martello-Giunti.
Nadler, J. W., Angelaki, D. E., and DeAngelis, G. C. (2008). A neural representation of depth from motion
parallax in macaque visual cortex. Nature 452: 642–5.
Nawrot, M. and Blake, R. (1989). Neural integration of information specifying structure from stereopsis
and motion. Science 244: 716–18.
Nawrot, M. and Blake, R. (1991). The interplay between stereopsis and structure from motion. Perception
& Psychophysics 49: 230–44.
Norman, J. F., Todd, J. T., and Orban, G. A. (2004). Perception of three-dimensional shape from specular
highlights, deformations of shading, and other types of visual information. Psychological Science
15: 565–70.
Ostrovsky, Y., Andalman, A., and Sinha, P. (2006). Vision following extended congenital blindness.
Psychological Science 17: 1009–14.
Ostrovsky, Y., Meyers, E., Ganesh, S., Mathur, U., and Sinha, P. (2009). Parsing images via dynamic cues.
Psychological Science 20: 1484–91.
Piggins, D., Robinson, J., and Wilson, J. (1984). Illusory depth from slowly rotating 2-D figures: The
stereokinetic effect. In: W. N. Charman (ed.), Transactions of the First International Congress, “The
Frontiers of Optometry”. London: British College of Ophthalmic Opticians [Optometrists], Vol. 1,
pp. 171–82.
Proffitt, D. R., Rock, I., Hecht, H., and Schubert, J. (1992). Stereokinetic effect and its relation to the
kinetic depth effect. Journal of Experimental Psychology: Human Perception and Performance 18: 3–21.
Renvall, P. (1929). Zur Theorie der stereokinetischen Phänomene, in E. Kaila (ed.) Annales Universitatis
Aboensis, Series B, 10.
Robinson, J. O., Piggins, D. J., and Wilson, J. A. (1985). Shape, height and angular movement in
stereokinesis. Perception 14: 677–83.
Rogers, B. J. and Graham, M. E. (1984). After effects from motion parallax and stereoscopic
depth: Similarities and interactions. In: L. Spillmann and B. R. Wooten (eds.), Sensory experience,
adaptation, and perception: Festschrift for Ivo Kohler, pp. 603–19. Hillsdale: Lawrence Erlbaum and
Associates.
Rubin, E. (1921). Visuell wahrgenommene Figuren. Copenhagen: Gyldendalske.
Sinha, P. and Poggio, T. (1996). Role of learning in three-dimensional form perception. Nature 384: 460–3.
Smith, R. (1738). A Complete System of Optics in Four Books. Cambridge: Printed for the author.
Tampieri, G. (1956). Contributo sperimentale all’analisi dei fenomeni stereocinetici. Rivista di Psicologia
50: 83–92.
Tampieri, G. (1968). Sulle condizioni del movimento stereocinetico. In: G. Kanizsa, G. Vicario (eds.),
Ricerche sperimentali sulla percezione, pp. 199–217. Trieste: Università degli Studi di Trieste.
Todd, J. T. (1998). Theoretical and biological limitations on the visual perception of three-dimensional
structure from motion. In: T. Watanabe (ed.), High-level motion processing: Computational, neurophysiological and psychophysical perspectives, pp. 359–80. Cambridge: MIT Press.
Todd, J. T. and Bressan, P. (1990). The perception of 3-dimensional affine structure from minimal apparent
motion sequences. Perception & Psychophysics 48: 419–30.
Todd, J. T., Oomes, A. H. J., Koenderink, J. J., and Kappers, A. M. L. (2001). On the affine structure of
perceptual space. Psychological Science 12: 191–6.
Todorović, D. (1993). Analysis of two- and three-dimensional rigid and nonrigid motions in the
stereokinetic effect. Journal of the Optical Society of America A 10: 804–26.
Tynan, P. and Sekuler, R. (1975). Moving visual phantoms: A new contour completion effect. Science
188: 951–2.
Ullman, S. (1977). The interpretation of visual motion (Unpublished doctoral dissertation). MIT,
Cambridge, MA.
Ullman, S. (1979a). The interpretation of visual motion. Cambridge: MIT Press.
Ullman, S. (1979b). The interpretation of structure from motion. Proceedings of the Royal Society of London.
Series B, Biological Sciences 203: 405–26.
Ullman, S. (1984a). Rigidity and misperceived motion. Perception 13: 219–20.
Ullman, S. (1984b). Maximizing rigidity: The incremental recovery of 3-D structure from rigid and
nonrigid motion. Perception 13: 255–74.
Vallortigara, G., Bressan, P., and Bertamini, M. (1988). Perceptual alternations in stereokinesis. Perception
17: 31–4.
Vallortigara, G., Bressan, P., and Zanforlin, M. (1986). The Saturn illusion: A new stereokinetic effect.
Vision Research 26: 811–13.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der
Heydt, R. (2012a). A Century of Gestalt Psychology in Visual Perception: I. Perceptual Grouping and
Figure-Ground Organization. Psychological Bulletin 138: 1172–217.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P.A., and van
Leeuwen, C. (2012b). A Century of Gestalt Psychology in Visual Perception: II. Conceptual and
Theoretical Foundations. Psychological Bulletin 138: 1218–52.
Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychologische Forschung
20: 325–80.
540 Vezzani, Kramer, and Bressan

Wallach, H. and Centrella N. M. (1990). Identity imposition and its role in a stereokinetic effect. Perception
& Psychophysics 48: 535–42.
Wallach, H. and O’Connell, D. N. (1953). The kinetic depth effect. Journal of Experimental Psychology
45: 205–17.
Wallach, H., O’Connell, D. N., and Neisser, U. (1953). The memory effect of visual perception of
three-dimensional form. Journal of Experimental Psychology 45: 360–8.
Wallach, H., Weisz, A., and Adams, P. A. (1956). Circles and derived figures in rotation. American Journal
of Psychology 69: 48–59.
Wardle, S. G., Cass, J., Brooks, K. R., and Alais, D. (2010). Breaking camouflage: Binocular disparity
reduces contrast masking in natural images. Journal of Vision 10(14): 38, 1–12.
Weiss, Y. and Adelson, E. H. (2000). Adventures with gelatinous ellipses—constraints on models of human
motion analysis. Perception 29: 543–66.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt. II. Psychologische Forschung 4: 301–50.
English translation in: L. Spillmann (ed.), On perceived motion and figural organization, pp. 127–82.
Cambridge: MIT Press.
Wexler, M. and van Boxtel, J. A. (2005). Depth perception by the active observer. Trends in Cognitive
Sciences 9: 431–8.
White, B. W. and Mueser, G. E. (1960). Accuracy in reconstructing the arrangement of elements generating
kinetic depth displays. Journal of Experimental Psychology 60: 1–11.
Wieland, B. A. and Mefferd, R. B., Jr. (1968). Perception of depth in rotating objects: 3. Asymmetry and
velocity as the determinants of the stereokinetic effect. Perceptual and Motor Skills 26: 671–81.
Wilson, J. A., Robinson, J. O., and Piggins, D. J. (1983). Wobble cones and wobble holes—the stereokinetic
effect revisited. Perception 12: 187–93.
Zanforlin, M. (1988a). The height of a stereokinetic cone: A quantitative determination of a 3-D effect from
a 2-D moving pattern without a “rigidity assumption.” Psychological Research 50: 162–72.
Zanforlin, M. (1988b). Stereokinetic phenomena as good gestalts. The minimum principle applied to circles
and ellipses in rotation: A quantitative analysis and a theoretical discussion. Gestalt Theory 10: 187–214.
Zanforlin, M. (1999). La visione tridimensionale dal movimento o stereocinesi. In: F. Purghé, N. Stucchi,
A. Olivero (eds.), La percezione visiva, pp. 438–59. Turin: UTET.
Zanforlin, M. (2000). The various appearances of a rotating ellipse and the minimum principle: A review
and an experimental test with non-ambiguous percepts. Gestalt Theory 22: 157–84.
Zanforlin, M. (2003). Stereokinetic anomalous contours: Demonstrations. Axiomathes 13: 389–98.
Zanforlin, M. and Vallortigara, G. (1988). Depth effect from a rotating line of constant length. Perception
& Psychophysics 44: 493–9.
Zanforlin, M. and Vallortigara, G. (1990). The magic wand: A new stereokinetic anomalous surface.
Perception 19: 447–57.
Chapter 26

Interactions of form and motion in the perception of moving objects
Christopher D. Blair, Peter U. Tse,
and Gideon P. Caplovitz

Introduction
This chapter covers a few highlights from the past 20 years of research demonstrating that there is
‘motion from form’ processing. It has long been known that the visual system can construct ‘form
from motion.’ For example, appropriate dot motions on a two-dimensional computer screen can
lead to a percept of, say, a rotating three-dimensional cylinder or sphere. Less appreciated has
been the degree to which perceived motion follows from processes that rely upon rapid analyses
of form cues. Percepts that depend on such form-motion interactions reveal that form informa-
tion can be processed and integrated with motion information to determine both the perceived
velocity and shape of a moving object. These integration processes must be rapid enough to occur
in the brief period, probably less than a quarter of a second, between retinal activation and visual
experience.
Data suggest that global form analyses subserve motion processing in at least five ways (Porter
et al., 2011). Here, we describe three examples in which the analysis of form significantly influ-
ences our experience of moving objects. The following examples have been chosen not only for
their distinctiveness, but also to complement other examples described in detail within other
chapters of this book (Bruno & Bertamini; Herzog & Öğmen; Hock; Vezzani et al.). First, we
describe Transformational Apparent Motion, a phenomenon that reveals how form analyses
permit the figural segmentation dedicated to solving the problem of figure-to-figure match-
ing over time (Hsieh and Tse, 2006; Tse, 2006; Tse & Caplovitz, 2006; Tse & Logothetis, 2002).
Secondly, we describe how the size and shape of an object can influence how fast it is perceived
to rotate. These interactions reveal the way in which form analyses permit the definition of
trackable features whose unambiguous motion signals can be generalized to ambiguously mov-
ing portions of an object to solve the aperture problem (Caplovitz et al., 2006; Caplovitz & Tse,
2007a,b). Finally, we describe a number of peculiar ways in which the motions of individual
elements can interact with the perceived shape and motion of a global object constructed by
the grouping of these elements. These phenomena reveal that the form analyses that underlie
various types of perceptual grouping can lead to the generation of emergent motion signals
belonging to the perceptually grouped object that appear to underlie the conscious experience
of motion (Caplovitz & Tse, 2006, 2007b; Hsieh & Tse, 2007; Kohler et al., 2010; Kohler et al.,
2009).

Fig. 26.1  (a) Transformational Apparent Motion (TAM). Two abutting shapes are flashed in sequence, as shown on the left. The resulting percept is of
one shape smoothly extending from, and retracting back into the other, as depicted on the right. (b) TAM v. Translational Apparent Motion. In TAM
displays (top), when two frames are flashed in sequence, if the shapes in the second frame abut those in the first frame the percept is of smooth
deformation that is based on the figural parsing of the objects in both frames. However, in translational apparent motion displays (bottom), when the
shapes in the second frame do not abut those in the first frame, rigid motion to the nearest neighbor is perceived independent of any figural parsing.
Interactions of Form and Motion in the Perception of Moving Objects 543

Transformational Apparent Motion


Background
A phenomenon known as Transformational Apparent Motion (TAM) has received much atten-
tion over the past 20 years and sparked a renewed examination of the role of form analyses in
high-level motion processing. TAM occurs when two shapes, overlapping in space, are pre-
sented at different points in time, giving the illusion that one shape smoothly transforms into
the other (Tse et al., 1998). Precursors to TAM included ‘polarized gamma motion’ and ‘illusory
line motion,’ with the latter being a rediscovery and re-examination of the former (Hikosaka et al.,
1991, 1993a,b; Kanizsa, 1951, 1979). A classical demonstration of polarized gamma motion and
illusory line motion is illustrated in Figure 26.1A. Illusory line motion arises when a horizontal
bar is presented shortly after a transient cue located at one end of the bar. When this occurs, the
bar appears to extend out from the cue, rather than appearing all at once. Thus, rather than the
sudden appearance of a stationary object, a motion percept is observed in which an object appears
to morph from one shape to another.
An initial hypothesis for why these phenomena occur posited a primary role for attention.
Specifically, the sudden onset of the cue stimulus possibly draws attention and establishes an atten-
tional gradient that extends outward from the cue location. Because information at attended loca-
tions was presumed to be processed faster than at unattended locations, the target stimulus would
be processed asynchronously, leading locations closer to the center of the attentional gradient to
reach conscious awareness prior to those located more distally. This would thereby lead to the illu-
sory percept that the horizontal bar successively extends out from the point of attention (Faubert
and von Grünau, 1995; Stelmach and Herdman, 1991; Stelmach et al., 1994; Sternberg and Knoll,
1973; Titchener, 1908; von Grünau and Faubert, 1994). While attentional gradients may, indeed,
play some role in the illusory percept, subsequent experimentation suggested a dominant contri-
bution of other factors. For example, TAM can be observed even when attention is allocated away
from the cue. Also, if two cues – a red and a green dot – are presented simultaneously, some dis-
tance apart, when a red line appears abutting each cue and between them, the line always appears
to extend from the red dot, regardless of which cue is originally attended (Downing and Treisman,
1995, 1997; Hsieh et al., 2005; Tse and Cavanagh, 1995; Tse et al., 1996, 1998).
To account for these non-attentional effects, it has been argued that the illusory motion
observed in these stimuli arises from figural parsing (Tse et al., 1998; Tse and Logothetis, 2002).
Figural parsing occurs when contour and surface relationships are compared across successive
scenes. Thus, based on their relative surface and contour relationships, the visual system deter-
mines which shapes viewed at one time point correspond to which shapes viewed at a subsequent
time point. In the case of TAM, the visual system infers that an existing figure has changed its
shape into that of the new figure, leading to the perception of continuous deformation. Implicit in
this hypothesis is a fundamental role for form processes that extract information about the shape
and surface characteristics of objects. Moreover, as the motion percept in TAM displays depends
upon the output of these processes, this processing must occur either prior to, or coincident with
motion processing. In this view, processes that represent form information help solve the ‘what
went where?’ question of object movement. This occurs in two steps. First, individual objects are
identified or ‘parsed’ in a scene. The second step involves matching these parsed objects to the
objects present in the preceding scene.
The processes underlying TAM can be contrasted to those underlying classical translational
apparent motion. In classical translational apparent motion, when there are multiple objects in both
544 Blair, Tse, and Caplovitz

the first and second scene, motion correspondences tend to be formed between spatially-proximal
objects. This is true even if the proximal objects have dramatically dissimilar shape and surface
characteristics. As with TAM, this would imply that the object had grossly deformed from one
scene to the next. However, this deformation is determined not on the basis of object parsing and
figural matching, but rather on the basis of spatiotemporal proximity (Ullman, 1979). Observations
such as these led, in the past, to the discounting of the importance of form features in determining
object motion (Baro and Levinson, 1988; Burt and Sperling, 1981; Cavanagh and
Mather, 1989; Dawson, 1991; Kolers and Pomerantz, 1971; Kolers and von Grünau, 1976; Navon,
1976; Ramachandran et al., 1983; Victor and Conte, 1990). However, as illustrated in Figure 26.1B,
TAM can still be observed in cases where the nearest neighbor principle may be violated in favor
of matching shapes across scenes that actually comprise more distant figures. This has been dem-
onstrated to result from a set of parsing and matching principles involving the analysis of contour
relationships among successive and abutting figures (Tse et al., 1998; Tse and Logothetis, 2002).
This appears to result largely from an analysis of good contour continuity, which indicates main-
tained figural identity, and contour discontinuity, which implies figural differences. Given the
lack of figural overlap in most translational apparent motion displays, this parsing is generally
unnecessary in determining ‘what went where?’
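The nearest-neighbor matching rule described above can be written as a toy computation. This is an illustrative sketch of the correspondence principle only, not a model proposed in the literature reviewed here; the function and example object names are our own:

```python
import math

def nearest_neighbor_matches(frame1, frame2):
    """Match each frame-1 object to its spatially nearest frame-2 object,
    ignoring shape and surface attributes entirely (spatiotemporal
    proximity alone determines correspondence)."""
    matches = {}
    for name1, pos1 in frame1.items():
        best = min(frame2.items(),
                   key=lambda item: math.dist(pos1, item[1]))
        matches[name1] = best[0]
    return matches

# A square at (0, 0) is matched to a nearby disk rather than to the
# distant square: proximity overrides figural identity.
frame1 = {"square": (0.0, 0.0)}
frame2 = {"disk": (1.0, 0.0), "square": (10.0, 0.0)}
print(nearest_neighbor_matches(frame1, frame2))  # {'square': 'disk'}
```

The rule deliberately ignores figural identity, which is precisely the behavior that TAM displays violate when matching follows parsed figures rather than the nearest neighbor.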

Neural correlates
Functional magnetic resonance imaging has determined which areas of the brain show the great-
est blood oxygen level dependent (BOLD) activity in response to TAM displays, as compared
with control stimuli (Tse, 2006). Using a region of interest analysis, this study found greater activ-
ity in response to TAM than control displays in V1, V2, V3, V4, V3A/B, hMT+, and the Lateral
Occipital Complex (LOC). An additional whole-brain analysis identified an area in the posterior
fusiform gyrus that was also found to be more active during the perception of TAM than control
stimuli. The recruitment of early retinotopically organized areas highlights the importance of the
basic visual processes (i.e. spatially specific detection of edges and contour features) that underlie
the perception of TAM. The recruitment of higher-level areas likely reflects the more global pro-
cessing that must underlie figural parsing and subsequent figural matching.
Of particular interest is the recruitment of the LOC. The LOC is now fully established as playing
a fundamental role in form processing and object recognition (Grill-Spector et al., 2001; Haxby
et al., 2001; Kanwisher et al., 1996; Malach et al., 1995) and, like TAM, has been shown to process
global 3D object shape, as opposed to just local 2D shape features (Avidan et al., 2002; Gilaie-Dotan
et al., 2001; Grill-Spector et al., 1998, 1999; Malach et al., 1995; Mendola et al., 1999; Moore and
Engel, 2001; Tse and Logothetis, 2002; Kourtzi and Kanwisher, 2000, 2001; Kourtzi et al., 2003a).
A reasonable interpretation of the increased activity in LOC during the viewing of TAM displays
relative to control stimuli is that in addition to processing global form and figural relationships, this
information is also output to motion-processing areas of the brain, such as hMT+.
Given this interpretation, and the increased activity demonstrated in both LOC and hMT+ dur-
ing TAM displays, it seems that hMT+ and LOC, rather than being motion processing and form
processing areas, respectively, may both serve as part of a form/motion processing circuit. In fact,
multiple studies have shown functional and anatomical overlap between LOC and hMT+ (Ferber
et  al., 2003; Kourtzi et  al., 2003a; Liu and Cooper, 2003; Liu et  al., 2004; Murray et  al., 2003;
Stone, 1999; Zhuo et al., 2003). As noted later in this chapter, it is likely that V3A/B, an area that
also shows increased activity in response to TAM displays, plays a key role in this form/motion
processing circuit. These findings call into question the traditional view of separate motion and
form processing streams contained in the dorsal ‘where’ and ventral ‘what’ pathways (Goodale
and Milner, 1992; Ungerleider and Mishkin, 1982). Although at the very highest representational
levels ‘what’ and ‘where’ may be largely independent (Goodale and Milner, 1992; Ungerleider and
Mishkin, 1982), form and motion processes are likely to be non-independent within the process-
ing stages that serve as inputs to these later representations.
Additional work has been done using electroencephalography (EEG) to study visually-evoked
potentials (VEP) in response to TAM displays as compared with displays that only flashed, but
lacked the TAM percept (Mirabella & Norcia, 2008). This study found that the VEP waveform
evoked by pattern onset and offset was significantly more symmetrical for TAM displays than for
flashing displays. Such TAM-related processing appears within the first 150 ms of object
appearance and disappearance, once again implicating the involvement of early visual areas
in processing TAM. Furthermore, it was shown in the frequency domain that there was a notice-
able reduction in the odd-harmonic components in the frequency spectra for the TAM display,
as compared with that for a flashing patch alone. This further reflects the increased symmetry in
the TAM VEP waveform. Interestingly, as the contrast between the cue and flashing patch in the
TAM display was increased, the symmetry in the resulting VEP waveform decreased. Behavioral
data matched this observation, as the likelihood of participants perceiving TAM in the display was
strongly correlated with the symmetry of the VEP waveform. Thus, both behavioral and EEG data
further demonstrate the influence of object surface features on perceived movement.

Implications for Models of Transformational Apparent Motion
The only formal model that we are aware of that attempts to account for TAM involves three inter-
acting subprocesses (Baloch and Grossberg, 1997). The first is a boundary completion process
where activity flows from V1 to interstripe V2 to V4. The second is a surface filling process where
activity flows from blob V1 to thin stripe V2 to V4. The third is a long-range apparent motion pro-
cess where activity flows from V1 to MT to MST. The model includes an additional link between
V2 and MT that allows the motion-processing stream to track emerging contours and filled-in
color surfaces (Baloch and Grossberg, 1997). The model represents a locally-based, bottom-up
explanation of TAM. In the fMRI experiment described above, each of the areas referenced in the
model has shown higher relative activity during the viewing of TAM displays. However, the model
fails to account for increased activity shown in V3v, V3A/B, and LOC. Furthermore, TAM has
been shown to be influenced by global configural relationships among stimuli, which this locally
based model cannot explain (Tse and Logothetis, 2002). TAM demonstrates many of the central
problems that the visual system must solve, which have been the subject of much study in the field
of visual neuroscience: How is local form information integrated into a global representation of
spatiotemporal figural relationships, and how does this, in turn, influence the interpretation of
local features (Kenkel, 1913; Wertheimer, 1912/1961)? During the perception of TAM, figural
contours must be analysed and integrated globally, over both space and time within and between
scenes.
For both contour integration in general and TAM, fMRI studies have demonstrated the strong-
est activity in lateral occipital areas of both the human and monkey brain (Altmann et al., 2003;
Kourtzi et  al., 2003b; Tse, 2006). However, both V1 and V2 also show increased activity dur-
ing such processes (Altmann et al., 2003; Caplovitz et al. 2008; Kourtzi et al., 2003b; Tse, 2006).
While increased activity in V2 may be unsurprising, given that single unit recordings have shown
its involvement in the perception of illusory contours (von der Heydt et  al., 1984), no such
involvement as early as V1 had previously been demonstrated. In more recent years, visual areas
V1 and V2 have been implicated in the processing of global shape (Allman et al., 1985; Fitzpatrick,
2000; Gilbert, 1992, 1998; Lamme et al., 1998) despite the traditional view that V1 is only involved
in the processing of local features (Hubel and Wiesel, 1968). However, it is still unclear whether
such activity in V1 results from bottom-up or top-down activation. A recent fMRI study found
increased activity in response to the spatial integration of individual elements into perceptually
grouped wholes in early visual cortex, possibly as early as V1 (Caplovitz et al., 2008). This was
true, despite each individual element being located in the periphery of a different visual quadrant,
suggesting such increases in activity are likely due to top-down feedback.
Separate from TAM, parsing can be important in other standard and apparent motion displays,
as pooling the motion energy of multiple objects moving through the same point in space would
lead to inaccurate motion signals (Born and Bradley, 2005). Motion signals arising at occlusion
boundaries may also be spurious (Nakayama and Silverman, 1988), and parsing can facilitate
the segmentation of spurious from real motion signals. It would appear that the visual system
possesses such parsing mechanisms, which help us to accurately perceive the motion of multiple
overlapping objects (Hildreth et al., 1995; Nowlan and Sejnowski, 1995). While there is evidence
that hMT+ plays some role in such motion parsing processes (Bradley et al., 1995; Stoner
and Albright, 1992, 1996), other evidence suggests that aspects of this process, such as figure
segmentation, do not take place in hMT+. Rather, it is more likely that specialized areas, such
as LOC, handle global figural segmentation and similar processes, and that the resulting neural
activity is then output to hMT+. Given such an interaction, the analyses of form and motion, and
thus shape over time and space, can be seen as interacting, inseparable processes. That form and
motion should be analyzed in an integrated spatiotemporal fashion was suggested as early as
Gibson (1979), and has been re-emphasized in more recent years (Gepshtein and Kubovy, 2000; Wallis
and Bülthoff, 2001).

Size, Shape and the Perceived Speed of Rotating Objects: Trackable Features
Recent research has demonstrated that the shape of an object directly affects the speed with which
it appears to rotate (Blair, Goold, Killebrew & Caplovitz, 2014; Caplovitz et al., 2006; Caplovitz
and Tse, 2007a; Porter et al., 2011). Specifically, objects with distinctive contour features, such
as corners or regions of high or discontinuous contour curvature, are perceived to rotate faster
than those without such contour features. For example, when ellipses of various aspect ratios are
rotated with the same angular velocity, the ‘skinnier’ an ellipse is, the faster it appears to rotate
(Caplovitz et al., 2006).
There are various explanations for why this may be the case, and experiments have been con-
ducted to dissociate between them. For example, skinnier objects in general may appear to rotate
faster than fatter ones. Such an explanation is rooted in the temporal frequency with which con-
trast changes at any particular location in the visual field, highlighting the intrinsic ambiguity
that arises between spatial frequency, speed, and temporal frequency (Brown, 1931). Simply put,
the surface of a rotating skinny object will sweep across a neuron’s receptive field in less time than
that of a fatter object. This hypothesis can be ruled out by the fact that no differences were
observed between the perceived speeds of skinny and fat rectangles (Caplovitz et al., 2006).
A second hypothesis is that distinctive contour features serve as trackable features that provide
an unambiguous source of information about the speed and direction of motion of a given object.
This hypothesis is rooted in the works of Wallach (Wallach, 1935; Wallach & O’Connell, 1953;
Wallach et al., 1956) and Ullman (1979), which highlight the importance of such form features in
extracting 3D structure from motion (i.e. the Kinetic Depth Effect). In the case of a skinny ellipse,
the regions of high curvature located at the ends of the major axis may serve as an additional
source of motion information that is unavailable in the case of a fat ellipse. Moreover, this hypoth-
esis is consistent with the lack of effect observed with rotating rectangles whose corners may act
as trackable features regardless of whether they belong to a skinny or fat rectangle. To directly test
this hypothesis, an experiment was conducted in which the corners of a rectangle were ‘rounded
off’ to a lesser or greater degree (Caplovitz et al., 2006). The more the corners were rounded, the
slower the rounded-rectangle appeared to rotate, thereby providing strong support in favor of the
form-defined trackable features hypothesis (see Figure 26.2A).
A third hypothesis, and one consistent with the data derived from the experiments described
above, is that the perceived speed of a rotating object is determined by the magnitudes of locally
detected 1D motion signals (Weiss and Adelson, 2000). Changes to an object’s shape will change
the distribution of component motion signals detected along its contour. When the magnitudes of
component motion signals derived from a skinny ellipse were compared with those derived from
a fat ellipse (see Figure 26.2B) it was found that they scaled in a manner wholly consistent with the
changes in perceived speed. Moreover, because the magnitudes of component motion signals scale


Fig. 26.2  Trackable features and component vectors. (a) Proposed trackable features on rectangles,
ellipses, and rounded rectangles. (b) Changes in local component motion vectors of a rotating
ellipse as a function of changes in aspect ratio. (c) Changes in local component motion vectors as a
function of changes in the size of rotating objects.
as a function of their distance from the center of rotation, there are no differences in distribution
of such signals between skinny and fat rectangles. Although the relationship between component
motion magnitude and perceived speed is not as precise for the case of the rounded rectangles,
there is indeed a parametric decrease in the local distribution of component motion signals in the
corner regions as the corners become more and more rounded (Caplovitz et al., 2006).
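The component-motion account can be illustrated with a small numerical sketch of our own (the parameter values are hypothetical). At each contour point of a rigidly rotating ellipse, only the velocity component normal to the contour is locally detectable (the aperture problem), and the average magnitude of that component grows as the ellipse gets skinnier, even at fixed angular velocity and fixed area:

```python
import math

def mean_component_speed(a, b, omega=1.0, n=3600):
    """Mean magnitude of the 1D (normal) component of motion along the
    contour of an ellipse with semi-axes a, b, rotating rigidly at
    angular velocity omega about its center. Points are sampled
    uniformly in the parametric angle (a simplification)."""
    total = 0.0
    for i in range(n):
        t = 2.0 * math.pi * i / n
        x, y = a * math.cos(t), b * math.sin(t)
        vx, vy = -omega * y, omega * x          # rigid-rotation velocity
        nx, ny = x / a**2, y / b**2             # outward normal direction
        norm = math.hypot(nx, ny)
        total += abs(vx * nx + vy * ny) / norm  # |v . unit normal|
    return total / n

# Same angular velocity, same area (pi * a * b): the skinnier ellipse
# carries larger component motion signals along its contour.
skinny = mean_component_speed(a=4.0, b=0.25)
fat = mean_component_speed(a=2.0, b=0.5)
print(skinny > fat)  # True
```

Note that for a circle (a = b) the normal component is zero everywhere, consistent with the fact that a rigidly rotating circle generates no locally detectable contour motion.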
As such, these initial sets of experiments were unable to conclusively determine whether
shape-related changes in perceived rotational speed arise due to trackable features or the inte-
gration of local component motion signals. It was not until very recently that experiments were
conducted to explicitly dissociate between these two hypotheses (Blair et al., 2014). This study
specifically examined the case of angular velocity. A hallmark of angular velocity is that it is
size invariant. Making a rotating object smaller will not change its angular velocity. However,
doing so will systematically decrease the magnitudes of the component motion signals derived
along its contour (see Figure 26.2C). The study compared the perceived rotational speeds of
small and large objects. There were two primary findings. First, across a range of object
categories (ellipses, rectangles, stars, and rounded rectangles), smaller objects appeared to
rotate more slowly than larger ones. This finding is what would be predicted by the local-motion
integration hypothesis. However, the second main finding of the study is that the degree
to which smaller objects appear to rotate slower is dependent upon the shape of the object.
Specifically, while the relative change in perceived speed of rectangles with very rounded cor-
ners is nearly perfectly predicted by the relative magnitudes of the component motion signals,
very little change in perceived speed is observed for regular rectangles, skinny ellipses, and star
shapes. Indeed, simply reducing the degree to which the corners of the rounded rectangles were
rounded off reduced the effect size of perceived rotational speed. These two findings suggest
that both hypotheses are likely to be true: the perceived speed of a rotating object is determined
by a combination of locally detected motion signals, which comprise a scale-variant source
of information, and the motion of form-defined trackable features, which comprise a scale-
invariant source of information.
What is important to note is that both sources of information are shape-dependent. However,
only the trackable feature motion requires an analysis of form, because in order to provide a use-
ful source of information, the trackable feature must first be classified as belonging to the object
that is rotating (see figural parsing above). Moreover, the motion of the trackable feature must be
attributed to other locations along the object’s contour. Lastly, in order to produce a size-invariant
representation (i.e. angular velocity), the motion of a trackable feature must be integrated with
information about its distance from the center of rotation, a necessarily non-local computation.
In the case of objects that simultaneously translate as they rotate, it appears to be the case that the
rotational motion around the object’s center is segmented from the overall translational motion of
the object (Porter et al., 2011). This suggests that the size invariant signal derived from the motion
of a trackable feature involves the computation of the object’s center.
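The size-invariance argument reduces to simple arithmetic, sketched below with values of our own choosing: the locally detected speed of a trackable feature scales with its distance from the center of rotation, so recovering a size-invariant angular velocity requires dividing by that distance, a necessarily non-local computation.

```python
def feature_angular_velocity(speed, radius):
    """Recover a size-invariant angular velocity from a trackable
    feature's tangential speed and its distance from the center of
    rotation. An illustration of the arithmetic, not a vision model."""
    return speed / radius

omega = 0.5  # true angular velocity, rad/s (hypothetical value)
for radius in (1.0, 2.0, 4.0):         # same shape at different sizes
    tangential_speed = omega * radius  # local signal grows with size...
    recovered = feature_angular_velocity(tangential_speed, radius)
    print(recovered)  # ...but the recovered angular velocity does not
```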
The effects of object shape on the perceived speed of rotational motion have also been observed
and examined in the context of motion fading. Motion fading occurs when a slowly drifting or
rotating pattern appears to slow down and then momentarily stop, while the form of the pattern is
still visible (Campbell and Maffei, 1979, 1981; Lichtenstein, 1963; Spillmann and De Weerd, 2003).
Experiments have shown that the presence of trackable features extends the time that it takes
motion fading to occur for rotating objects, as compared with rotating objects that do not possess
distinct trackable features (Hsieh and Tse, 2007). Furthermore, if the trackable features
of objects such as ellipses are made even more distinct by increasing a rotating ellipse’s aspect
ratio, it takes even longer for motion fading to occur (Kohler et al., 2010). It was further shown
that the effect of shape on the time for motion fading to occur is mediated by the perceived speed
of the rotating object. For example, a fatter ellipse will appear slower than a skinny ellipse and will
therefore take less time for motion fading to occur. Thus, by influencing the perceived speed of
rotation, an object’s contour features dictate how long it takes for a slowly rotating object to appear
to cease moving. This demonstrates the importance of the form-motion interaction that underlies
the role of trackable features in the perception of rotational motion. Not only do trackable features
have a direct effect on perceived speed, but they also have indirect effects on other aspects of
motion perception.

Neural correlates
Clearly, there is strong behavioral evidence for the existence of multiple form–motion interac-
tions. The question stands: where in the brain might these interactions take place? In the context
of the role form plays in the perceived speed of rotating objects, evidence from fMRI studies has
implicated the involvement of V3A. When observers viewed rotating objects that modulated their
contour curvature at one point while remaining constant in speed and area, BOLD activity in area
V3A was also modulated (Caplovitz & Tse, 2007b). Previous research focused on this
area has led to findings consistent with the interpretation that V3A makes use of areas of contour
curvature to process the rotational motion of objects. For one, it has been shown in several studies
that area V3A is motion selective (Tootell et al., 1997; Vanduffel et al., 2002). Motion processing
is only half of the story, and sure enough, V3A per cent BOLD signal change has also been cor-
related with contour and figural processing, even when contours and figures are not consciously
perceived (Schira et al., 2004). To go a step further, BOLD activity in V3A has been correlated
with various additional form-motion interactions. Specifically, it has been shown multiple times
that there is a greater percent BOLD signal change in the V3A when participants observe coher-
ent, as opposed to random motion (Braddick et al., 2000, 2001; Moutoussis et al., 2005; Vaina
et al., 2003). Finally, it was found that the V3A is more responsive to rotational than translational
motion (Koyama et al., 2005). In combination, these various findings indicate that V3A makes use
of form information, specifically contour curvature, to process motion information about moving
objects. The strongest activity may result in situations where the motion is more difficult for the
visual system to interpret, such as with rotation (Kaiser, 1990).
Neurophysiological data recorded in area MT of macaques have further elucidated how areas
of contour curvature on objects may be used in processing object motion. Specifically,
certain neurons in macaque MT have been shown to respond more to the terminator motion of
lines than to the ambiguous motion signals present along a line’s contour. In addition, these neu-
rons respond strongest when terminators are intrinsically owned, as opposed to when they are
extrinsic (Pack et al., 2004). Interestingly, this process is not instantaneous, as it takes roughly 60 ms
for neurons in macaque MT to shift their response properties from those consistent with motion
perpendicular to a moving line, regardless of its actual direction of motion, to those consistent with
the true motion of the line independent of its orientation (Pack and Born, 2001). Behavioral data
examining initial pursuit eye movements support this finding: observers initially follow the
motion perpendicular to the moving line before shifting to eye movements that follow the
unambiguous motion of the line terminators. Further neurophysiological evidence has indicated that
neurons of this sort (dubbed end-stopped neurons) may be present in the visual system as early as
area V1 (Pack et al., 2003). This would mean that trackable feature information could be extracted
and utilized as early as V1 in the visual processing stream. All of these findings could help explain
how the visual system is capable of overcoming the aperture problem under various circumstances
using trackable features, and also, why it does not always do so perfectly.
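The aperture problem that trackable features help overcome can be stated in a few lines of vector algebra: a detector viewing a straight moving contour through a small aperture can recover only the component of the true velocity that is normal to the contour, whereas a terminator carries the full 2D velocity. The sketch below is our own illustration with arbitrary numbers, not an implementation from the cited work.

```python
import math

def normal_component(v, contour_dir):
    """Project the true velocity v onto the unit normal of the contour:
    the only component a local detector can measure along a straight edge."""
    vx, vy = v
    dx, dy = contour_dir
    norm = math.hypot(dx, dy)
    nx, ny = -dy / norm, dx / norm  # unit normal to the contour
    dot = vx * nx + vy * ny
    return (dot * nx, dot * ny)

true_v = (1.0, 0.0)    # a line translating rightward at unit speed
contour = (1.0, 1.0)   # the line is oriented at 45 degrees

measured = normal_component(true_v, contour)
print(measured)  # approx (0.5, -0.5): perpendicular to the contour, not the true motion
```

The measured velocity is slower than, and rotated away from, the true one, which corresponds to the initial perpendicular response reported for MT neurons before terminator signals take over roughly 60 ms later.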
550 Blair, Tse, and Caplovitz

From Moving Parts to Moving Wholes: the Perceived Motion of Perceptually Grouped Objects
Just as an object’s shape has been shown to affect its perceived motion, additional processes,
such as perceptual grouping and the formation of contours from discrete elements, can lead
to changes in perceived motion. For example, one study examined how the perceived speed of
rotating ellipses was modulated when the ellipses' contours were constructed from individual
dots instead of a continuous contour (Caplovitz & Tse, 2007a). Under these cir-
cumstances, one might expect that changing the aspect ratios of these ellipses should have no
effect on their perceived speed, as the individual dots should serve as unambiguous trackable
features not subject to the aperture problem. However, this was only the case when the dots were
spaced sufficiently far apart. When the dots were spaced closely enough together, even though
they never came into direct contact with one another, aspect-ratio-related changes in perceived
speed were observed. This was true even when the ellipses were formed using contrast-balanced
dots, which minimally activate neurons sensitive to low spatial frequencies, whose large receptive
fields might respond to closely spaced dots much as they would to a continuous contour. It was
subsequently hypothesized
that when the dots are closely spaced the visual system is incapable of following the motion
of a single dot. In the absence of such locally unambiguous motion, the visual system makes
use of the information from the perceptually grouped contour implicit in the dot arrangement
(Caplovitz & Tse, 2007a).
Further evidence for the effects of grouping on perceived motion has been demonstrated using
the motion fading paradigm. Specifically, when elements are part of a slowly rotating display, if
disparate elements can be grouped in such a way as to form the perception of an object that pos-
sesses trackable features, the amount of time necessary for motion fading to occur is increased
(Hsieh & Tse, 2007; Kohler et al., 2010). Similar to the previously described experiment examin-
ing the perceived rotational speed of dotted ellipses, the aspect ratio of such ellipses affects the
time course of motion fading only when the dots are spaced closely enough that a single dot can-
not be tracked by the visual system (Kohler et al., 2010).
While the previously discussed examples of the effects of grouping on motion perception
appear to be largely automatic in nature, multistable percepts involving grouping and
perceived speed have also been demonstrated. Specifically, if four dot pairs are evenly spaced
in a square formation and each pair rotates around its own center, observers may interpret
the movement either as four rotating dot pairs or as two overlapping squares translating along
a circular path, one in front of the other, with the dots of each pair forming their corners (Anstis,
2003; Anstis and Kim, 2011). As a participant's perception and interpretation changes, so does
the perceived speed of the elements present (Figure 26.3A). When perceptually grouped into the
global percept of a square, the display appears to slow down (Kohler,
Caplovitz, & Tse, 2009). The dots may be exchanged for various elements that bias the per-
ception in one direction or another (Figure 26.3B). Such elements have been shown to be
perceived as moving faster when viewed simply as rotating pairs than when seen as part of
any of the illusory shapes that may result from interpreting them as corners instead of
individual elements (Kohler, Caplovitz, & Tse, 2009). Thus, form information resulting from
both automatic and multistable perceived groupings of moving objects can affect the perceived
motion of such groups.
Thus far, we have principally discussed the effect of an object's shape on its perceived motion.
However, there are also examples showing that the movement of an object can influence its per-
ceived shape (e.g. the Gelatinous Ellipse, Weiss and Adelson, 2000; and the Kinetic Depth Effect,
Interactions of Form and Motion in the Perception of Moving Objects 551


Fig. 26.3  Emergent motion on the basis of perceptual grouping. (a) When four dot pairs, each
pair rotating around its own common center, are perceived as separate objects, they are perceived
to rotate faster than when dots are perceived to form the corners of two squares translating in
a circular pattern with one sliding in front of the other. (b) The percept of individual elements or
square corners may be biased by element shape and arrangement, with individual elements most
likely to be seen when misaligned (top), and squares more likely to be seen when the elements are
aligned (bottom).

Wallach and O’Connell, 1953). Recently, it has been demonstrated that movement-dependent
shape distortions can result from local form-motion interactions among elements grouped to
form a larger perceived object. As previously mentioned, elongated objects are perceived to move
faster when moving in a direction parallel, as opposed to orthogonal, to their elongated axis
(Georges et al., 2002; Seriès et al., 2002). Taking advantage of this observation, an experiment
was conducted in which differentially elongated Gaussian blobs were used to form the corners
of illusory four-sided translating shapes. In the experiment, the blobs were orientated such
that those on the leading edge of the illusory object were either parallel or orthogonal to
the direction of motion, while those on the trailing edge of the illusory shape were orientated
orthogonally to those on the leading edge. It was found that when the blobs on the leading edge were
parallel to the direction of motion, the resulting illusory object appeared to be elongated, while
the opposite effect was observed when blobs on the leading edge were oriented orthogonally to
the direction of motion, as depicted in Figure 26.4 (McCarthy et al., 2012). This example reveals
how form and motion interact with each other across a range of visual processing stages from
very early (local orientation dependent perceived speed) to later representations of perceived
global shape.
As mentioned in the introduction, a 3D representation of a moving object can be derived from
appropriate 2D velocities of seemingly random dot displays. In such form-from-motion displays,
depth, 3D object shape, and 3D object motion may be perceived if seemingly random dot fields
are moved in ways consistent with the dots in motion being affixed to a particular 3D shape
(Green, 1961). This process represents a form of perceptual grouping in which the individual
dots are grouped into a single perceptual whole. Intriguingly, the shape and motion of the
perceived object do not always match what would be predicted from the individual motions
of the dots that make up the display. Instead, the perceived shape and perceived motion of the
global object depend upon one another. For example, perceived variations
in the angular velocity of rotating 3D shapes simulated by dot fields were more closely tied to the


Fig. 26.4  Form-motion-form interaction. When elliptical Gaussians are arranged in a square
formation and translated in a single common direction, if the leading edge and trailing edge
Gaussians are orientated 90° from one another, the perceived moving shape will appear to be a
rectangle instead of a square. The shape will appear elongated if the leading edge Gaussians are
orientated parallel to their direction of translation, and compressed if the leading edge Gaussians are
orientated orthogonal to their direction of translation.

perceived deformation of the rotating shapes than to actual variations in their angular velocities
(Domini et al., 1998). Similarly, the perceived slant of a simulated surface varies as a function
of the angular velocity with which it rotates when other factors are kept constant (Domini &
Caudek, 1999). These various effects have been demonstrated both when rotating objects are
passively observed and when object motion is a function of simulated optic flow in
response to observer movement (Caudek et al., 2011; Fantoni et al., 2010, 2012). Additionally,
even when binocular visual cues such as disparity are available, such biases and misperceptions
are still observed (Domini et al., 2006). These perceptual biases are also correlated with
corresponding changes in grasping movements directed at the simulated objects (Foster et al., 2011).
A model based on the assumption that the analysis of 3D shape is performed locally accounts
well for successful and unsuccessful interpretation of 3D shape and the movement of 3D shapes
by human observers, as demonstrated by a variety of form-motion interactions observed using

this paradigm (Domini & Caudek, 2003). Thus, not only is visual perception affected by
form-motion interactions, but the behaviors performed in response to such percepts are adjusted
accordingly.
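A minimal form-from-motion display of the kind described above can be sketched as follows (an illustrative construction of our own, not stimulus code from the cited studies): random dots are attached to a transparent vertical cylinder and viewed under orthographic projection, so that each dot's 2D horizontal velocity is determined entirely by its hidden depth.

```python
import math
import random

random.seed(0)
# Random dots on the surface of a vertical cylinder of radius 1,
# stored as (angular position, height); depth is implicit in the angle.
dots = [(random.uniform(0, 2 * math.pi), random.uniform(-1, 1))
        for _ in range(50)]

def project(dots, rotation):
    """Orthographic projection: x = cos(theta + rotation); depth is discarded."""
    return [(math.cos(theta + rotation), y) for theta, y in dots]

omega = 0.1  # rotation per frame in radians (illustrative value)
frame0 = project(dots, 0.0)
frame1 = project(dots, omega)

# Each dot's 2D horizontal velocity varies with its hidden depth: dots near
# the cylinder's silhouette (x close to +/-1) move slowly, dots crossing the
# middle move fastest. This velocity gradient is the cue the observer inverts.
velocities = [x1 - x0 for (x0, _), (x1, _) in zip(frame0, frame1)]
print(max(abs(v) for v in velocities))
```

An observer shown such frames in sequence sees a rotating 3D cylinder rather than fifty independent dots, grouping the elements into a single perceptual whole exactly as described above.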

Conclusion
These results can be taken as further evidence for the inherently constructive nature of motion
processing, and the importance of form operators in motion processing. While it is not clear
where in the brain the analysis of form occurs that results in the perception of rotational
motion, it probably occurs within some or all of the neural circuitry that realizes the form–
motion interactions described above. These results support the general thesis that there are,
broadly speaking, two stages to motion perception: one where motion energy is detected by
cells in early visual areas tuned to motion magnitude and direction, and another where
this detected information is operated upon by grouping and other visual operators that then
construct the motion that will be perceived (Caplovitz & Tse, 2007a; Hsieh & Tse, 2007; Kohler
et al., 2009, 2010). This means that perceived motion, while constructed on the basis of locally
detected motion information, is not itself detected or even present in the stimulus. It should
also be noted that, while we have focused on specific examples from only three broad categories
of form-motion interaction, these represent only a small subset of what has been identified and
tested to date, with further examples ranging from the processes underlying the perception of
biological motion to the ways in which motion is conveyed through static images (e.g. motion
streaks).
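The first of these two stages, the local detection of motion energy, is commonly modeled with delay-and-correlate units. The sketch below implements a bare-bones Reichardt-style correlator as an illustration of the general idea; it is our simplification, not a model proposed by any of the authors cited above.

```python
def reichardt(frame_t0, frame_t1, offset=1):
    """Correlate each sample with its spatially offset neighbor one frame
    later, in both directions; the signed sum indicates motion direction
    (positive = rightward, negative = leftward)."""
    out = 0.0
    for i in range(len(frame_t0) - offset):
        rightward = frame_t0[i] * frame_t1[i + offset]
        leftward = frame_t1[i] * frame_t0[i + offset]
        out += rightward - leftward
    return out

# A single bright element on a dark background...
frame0 = [0, 0, 1, 0, 0, 0]
frame_right = [0, 0, 0, 1, 0, 0]  # ...shifted rightward on the next frame
frame_left = [0, 1, 0, 0, 0, 0]   # ...or shifted leftward

print(reichardt(frame0, frame_right))  # positive: rightward motion energy
print(reichardt(frame0, frame_left))   # negative: leftward motion energy
```

A second, constructive stage of the kind described above would then operate on the outputs of many such local detectors, grouping and reinterpreting them, rather than on the stimulus itself.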
Classically, form and motion perception were considered to be mediated by independent pro-
cesses in the visual system. Indeed there is a good deal of evidence for such independence at
the earliest stages of visual processing, as well as at the highest levels of perceptual represen-
tation. However, there is growing evidence suggesting that the mechanisms that process form
and motion characteristics of the visual scene mutually interact in numerous and complex ways
across a range of mid-level visual processing stages. These form-motion interactions appear to
help resolve fundamental ambiguities that arise at the earliest stages in the processing of the reti-
nal image. By combining information from both domains, these form-motion interactions allow
potentially independent high-level representations of an object’s shape and motion to more accu-
rately reflect what is actually occurring in the world around us.

Acknowledgment
This work was supported by an Institutional Development Award (IDeA) from the National
Institute of General Medical Sciences of the National Institutes of Health under grant number
1P20GM103650-01, and a grant from the National Eye Institute: 1R15EY022775.

References
Allman, J. M., Miezin, F., and McGuinness, E. (1985). Stimulus specific responses from beyond the classical
receptive field: neurophysiological mechanisms for local-global comparisons in visual neurons. Ann Rev
Neurosci 8: 407–430.
Altmann, C. F., Bülthoff, H. H. and Kourtzi, Z. (2003). Perceptual organization of local elements into
global shapes in the human visual cortex. Curr Biol 13(4): 342–349.
Anstis, S. (2003). Levels of motion perception. In Levels of Perception, edited by L. Harris & M. Jenkin,
pp. 75–99. New York: Springer.

Anstis, S., and Kim, J. (2011). Local versus global perception of ambiguous motion displays. J Vision
11(3): 13, 1–12. Available at: http://www.journalofvision.org/content/11/3/13.
Avidan, G., Harel, M., Hendler, T., Ben-Bashat, D., Zohary, E., and Malach, R. (2002). Contrast sensitivity
in human visual areas and its relationship to object recognition. J Neurophysiol 87: 3102–3116.
Baloch, A. A., and Grossberg, S. (1997). A neural model of high-level motion processing: line motion and
form-motion dynamics. Vision Res 37(21): 3037–3059.
Baro, J. A., and Levinson, E. (1988). Apparent motion can be perceived between patterns with dissimilar
spatial frequencies. Vision Res 28: 1311–1313.
Blair, C. B., Goold, J., Killebrew, K., & Caplovitz, G. P. (2014). Form features provide a cue to the angular
velocity of rotating objects. J Exp Psychol Hum Percept Perform 40(1): 116–128. doi: 10.1037/a0033055.
Born, R. T., and Bradley, D. C. (2005). Structure and function of visual area MT. Ann Rev Neurosci
28: 157–189.
Braddick, O. J., O’Brien, J. M., Wattam-Bell, J., Atkinson, J., Hartley, T., and Turner, R. (2001). Brain areas
sensitive to coherent visual motion. Perception 30: 61–72.
Braddick, O. J., O’Brien, J. M., Wattam-Bell, J., Atkinson, J., and Turner, R. (2000). Form and motion
coherence activate independent, but not dorsal/ventral segregated, networks in the human brain. Curr
Biol 10: 731–734.
Bradley, D. C., Qian, N., and Andersen, R. A. (1995). Integration of motion and stereopsis in middle
temporal cortical area of macaques. Nature 373(6515): 609–611.
Brown, J. F. (1931). The visual perception of velocity. Psychol Forsch 14: 199–232.
Bruno, N., & Bertamini, M. (2013). Perceptual organization and the aperture problem. In J. Wagemans
(Ed.), Oxford Handbook of Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Burt, P., and Sperling, G. (1981). Time, distance, and feature trade-offs in visual apparent motion. Psychol
Rev 88: 171–195.
Campbell, F. W., & Maffei, L. (1979). Stopped visual motion. Nature 278: 192–193.
Campbell, F. W., & Maffei, L. (1981). The influence of spatial frequency and contrast on the perception of
moving patterns. Vision Res 21: 713–721.
Caplovitz, G. P., Hsieh, P-J., & Tse, P. U. (2006). Mechanisms underlying the perceived angular velocity of a
rigidly rotating object. Vision Res 46(18): 2877–2893.
Caplovitz, G. P., & Tse, P. U. (2007a). Rotating dotted ellipses: motion perception driven by grouped figural
rather than local dot motion signals. Vision Res 47(15): 1979–1991.
Caplovitz, G. P., & Tse, P. U. (2007b). V3A processes contour curvature as a trackable feature for the
perception of rotational motion. Cerebral Cortex 17(5): 1179–1189.
Caplovitz, G. P., Barroso, D. J., Hsieh, P. J., & Tse, P. U. (2008). fMRI reveals that non-local processing
in ventral retinotopic cortex underlies perceptual grouping by temporal synchrony. Hum Brain Map
29(6): 651–661.
Caudek, C., Fantoni, C., & Domini, F. (2011). Bayesian modeling of perceived surface slant from
actively-generated and passively-observed optic flow. PLoS ONE 6(4): 1–12.
Cavanagh, P., Arguin, M., and von Grünau, M. (1989). Interattribute apparent motion. Vision Res
29(9): 1197–1204.
Cavanagh, P., and Mather, G. (1989). Motion: the long and short of it. Spatial Vis 4: 103–129.
Dawson, M. R. W. (1991). The how and why of what went where in apparent motion: modeling solutions to
the motion correspondence problem. Psychol Rev 98(4): 569–603.
Domini, F., & Caudek, C. (1999). Perceiving surface slant from deformation of optic flow. J Exp Psychol
Hum Percept Perform 25(2): 426–444.
Domini, F., & Caudek, C. (2003). 3-D structure perceived from dynamic information: a new theory. Trends
Cogn Sci 7(10): 444–449.

Domini, F., Caudek, C., & Tassinari, H. (2006). Stereo and motion information are not independently
processed by the visual system. Vision Res 46: 1707–1723.
Domini, F., Caudek, C., Turner, J., & Favretto, A. (1998). Discriminating constant from variable angular
velocities in structure from motion. Percept Psychophys 60(5): 747–760.
Downing, P., and Treisman, A. (1995). The shooting line illusion: attention or apparent motion? Invest
Ophthalmol Vision Sci 36: S856.
Downing, P., and Treisman, A. (1997). The line motion illusion: attention or impletion? J Exp Psychol Hum
Percept Perform 23(3): 768–779.
Fantoni, C., Caudek, C., & Domini, F. (2010). Systematic distortions of perceived planar surface motion in
active vision. J Vision 10(5): 12, 1–20.
Fantoni, C., Caudek, C., & Domini, F. (2012). Perceived slant is systematically biased in actively-generated
optic flow. PLoS ONE 7(3): 1–12.
Faubert, J., and von Grünau, M. (1995). The influence of two spatially distinct primers and attribute
priming on motion induction. Vision Res 35(22): 3119–3130.
Ferber, S., Humphrey, G. K. and Vilis, T. (2003). The lateral occipital complex subserves the perceptual
persistence of motion-defined groupings. Cereb Cortex 13: 716–721.
Fitzpatrick, D. (2000). Seeing beyond the receptive field in primary visual cortex. Curr Opin Neurobiol
10: 438–443.
Foster, R., Fantoni, C., Caudek, C., & Domini, F. (2011). Integration of disparity and velocity information
for haptic and perceptual judgments of object depth. Acta Psychol 136: 300–310.
Georges, S., Seriès, P., Frégnac, Y., & Lorenceau, J. (2002). Orientation dependent modulation of apparent
speed: Psychophysical evidence. Vision Res 42: 2757–2772.
Gepshtein, S., and Kubovy, M. (2000). The emergence of visual objects in spacetime. Proc Natl Acad Sci
USA 97(14): 8186–8191.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gilaie-Dotan, S., Ullman, S., Kushnir, T., and Malach, R. (2001). Shape-selective stereo processing in
human object-related visual areas. Hum Brain Map 15: 67–79.
Gilbert, C. D. (1992). Horizontal integration and cortical dynamics. Neuron 9: 1–13.
Gilbert, C. D. (1998). Adult cortical dynamics. Physiol Rev 78: 467–485.
Goodale, M., and Milner, A. (1992). Separate visual pathways for perception and action. Trends Neurosci
15: 20–25.
Green, B. F., Jr. (1961). Figure coherence in the kinetic depth effect. J Exp Psychol 62(3): 272–282.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., and Malach, R. (1999). Differential
processing of objects under various viewing conditions in the human lateral occipital complex. Neuron
24: 187–203.
Grill-Spector, K., Kushnir, T., Edelman, S., Itzchak, Y., and Malach, R. (1998). Cue-invariant activation in
object-related areas of the human occipital lobe. Neuron 21: 191–202.
Grill-Spector, K., Kourtzi, Z., and Kanwisher, N. (2001). The lateral occipital complex and its role in object
recognition. Vision Res 41: 1409–1422.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., and Pietrini, P. (2001). Distributed
and overlapping representations of faces and objects in ventral temporal cortex. Science
293(5539): 2425–2430.
Herzog, M. H., & Öğmen, H. (2013). Apparent motion and reference frames. In J. Wagemans (Ed.), Oxford
Handbook of Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Hikosaka, O., Miyauchi, S., and Shimojo, S. (1991). Focal visual attention produces motion sensation in
lines. Invest Ophthalmol Vision Sci 32(4): 176.
Hikosaka, O., Miyauchi, S., and Shimojo, S. (1993a). Focal visual attention produces illusory temporal
order and motion sensation. Vision Res 33(9): 1219–1240.

Hikosaka, O., Miyauchi, S., and Shimojo, S. (1993b). Visual attention revealed by an illusion of motion.
Neurosci Res 18(1): 11–18.
Hildreth, E. C., Ando, H., Andersen, R. A., and Treue, S. (1995). Recovering three-dimensional structure
from motion with surface reconstruction. Vision Res 35(1): 117–137.
Hock, H. S. (2013). Dynamic grouping motion: A method for determining perceptual organization for
objects with connected surfaces. In J. Wagemans (Ed.), Oxford Handbook of Perceptual Organization
(in press). Oxford, U.K.: Oxford University Press.
Hsieh, P-J., Caplovitz, G. P., and Tse, P. U. (2005). Illusory rebound motion and the motion continuity
heuristic. Vision Res 45(23): 2972–2985.
Hsieh, P-J., and Tse, P. U. (2006). Stimulus factors affecting illusory rebound motion. Vision Res
46(12): 1924–1933.
Hsieh, P-J., & Tse, P. U. (2007). Grouping inhibits motion fading by giving rise to virtual trackable features.
J Exp Psychol Hum Percept Perform 33: 57–63.
Hubel, D. H., and Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate
cortex. J Physiol 195: 215–243.
Kaiser, M. K. (1990). Angular velocity discrimination. Percept Psychophys 47: 149–156.
Kanizsa, G. (1951). Sulla polarizzazione del movimento gamma [The polarization of gamma movement].
Arch Psichol Neurol Psichiatr 3: 224–267.
Kanizsa, G. (1979). Organization in Vision: Essays on Gestalt Perception. New York: Praeger.
Kanwisher, N., Chun, M. M., McDermott, J., and Ledden, P. J. (1996). Functional imaging of human
visual recognition. Brain Res Cogn Brain Res 5(1–2): 55–67.
Kenkel, F. (1913). Untersuchungen über den Zusammenhang zwischen Erscheinungsgrösse und
Erscheinungsbewegung bei einigen sogenannten optischen Täuschungen [Investigations on the relation
between apparent size and apparent movement in some so-called optical illusions]. Zeitschrift für
Psychologie 67: 358–449.
Kohler, P. J., Caplovitz, G. P., Hsieh, P-J., Sun, J., & Tse, P. U. (2010). Motion fading is driven by perceived,
not actual angular velocity. Vision Res 50: 1086–1094.
Kohler, P. J., Caplovitz, G. P., & Tse, P. U. (2009). The whole moves less than the spin of its parts. Attention,
Percept Psychophys 71(4): 675–679.
Kolers, P. A., and Pomerantz, J. R. (1971). Figural change in apparent motion. J Exp Psychol 87: 99–108.
Kolers, P. A., and von Grünau, M. (1976). Shape and color in apparent motion. Vision Res 16: 329–335.
Koyama, S., Sasaki, Y., Andersen, G. J., Tootell, R. B., Matsuura, M., and Watanabe, T. (2005). Separate
processing of different global-motion structures in visual cortex is revealed by FMRI. Curr Biol
15(22): 2027–2032.
Kourtzi, Z., Erb, M., Grodd, W., and Bülthoff, H. H. (2003a). Representation of the perceived 3-D object
shape in the human lateral occipital complex. Cereb Cortex 13(9): 911–920.
Kourtzi, Z., and Kanwisher, N. (2000). Cortical regions involved in perceiving object shape. J Neurosci
20: 3310–3318.
Kourtzi, Z., and Kanwisher, N. (2001). Representation of perceived object shape by the human lateral
occipital complex. Science 293: 1506–1509.
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., and Logothetis, N. K. (2003b). Integration of local
features into global shapes. Monkey and human FMRI studies. Neuron 37(2): 333–346.
Lamme, V. A., Super, H., and Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in
the visual cortex. Curr Opin Neurobiol 8: 529–535.
Lichtenstein, M. (1963). Spatio-temporal factors in cessation of smooth apparent motion. J Opt Soc Am
53: 304–306.
Liu, T., and Cooper, L. A. (2003). Explicit and implicit memory for rotating objects. J Exp Psychol Learn
Mem Cogn 29: 554–562.

Liu, T., Slotnick, S. D., and Yantis, S. (2004). Human MT+ mediates perceptual filling-in during apparent
motion. NeuroImage 21(4): 1772–1780.
Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., Ledden, P. J., Brady,
T. J., Rosen, B. R., and Tootell, R. B. (1995). Object-related activity revealed by functional magnetic
resonance imaging in human occipital cortex. Proc Natl Acad Sci 92(18): 8135–8139.
McCarthy, J. D., Cordeiro, D., and Caplovitz, G. P. (2012). Local form-motion interactions influence
global form perception. Attention Percept Psychophys 74: 816–823.
Mendola, J. D., Dale, A. M., Fischl, B., Liu, A. K., and Tootell, R. B. H. (1999). The representation of real
and illusory contours in human cortical visual areas revealed by fMRI. J Neurosci 19: 8560–8572.
Mirabella, G., and Norcia, A. N. (2008). Neural correlates of transformational apparent motion. Perception
37: 1368–1379.
Moore, C., and Engel, S. A. (2001). Neural response to perception of volume in the lateral occipital
complex. Neuron 29: 277–286.
Moutoussis, K., Keliris, G., Kourtzi, Z., and Logothetis, N. (2005). A binocular rivalry study of motion
perception in the human brain. Vision Res 45(17): 2231–2243.
Murray, S. O., Olshausen, B. A., and Woods, D. L. (2003). Processing shape, motion and three-dimensional
shape-from-motion in the human cortex. Cereb Cortex 13: 508–516.
Nakayama, K., and Silverman, G. H. (1988b). The aperture problem II. Spatial integration of velocity
information along contours. Vision Res 28(6): 747–753.
Navon, D. (1976). Irrelevance of figural identity for resolving ambiguities in apparent motion. J Exp Psychol
Hum Percept Perform 2: 130–138.
Nowlan, S. J., and Sejnowski, T. J. (1995). A selection model for motion processing in area MT of primates.
J Neurosci 15(2): 1195–1214.
Pack, C. C., and Born, R. T. (2001). Temporal dynamics of a neural solution to the aperture problem in
visual area MT of macaque brain. Nature 409(6823): 1040–1042.
Pack, C. C., Gartland, A. J., and Born, R. T. (2004). Integration of contour and terminator signals in visual
area MT of alert macaque. J Neurosci 24(13): 3268–3280.
Pack, C. C., Livingstone, M. S., Duffy, K. R., and Born, R. T. (2003). End-stopping and the aperture
problem: two-dimensional motion signals in macaque V1. Neuron 39(4): 671–680.
Porter, K. B., Caplovitz, G. P., Kohler, P. J., Ackerman, C. M., & Tse, P. U. (2011). Rotational and
translational motion interact independently with form. Vision Res 51: 2478–2487.
Ramachandran, V.S., Ginsburg, A. P., and Anstis, S. M. (1983). Low spatial frequencies dominate apparent
motion. Perception 12: 457–461.
Ramachandran, V. S., and Gregory, R. L. (1978). Does colour provide an input to human motion
perception? Nature 275: 55–56.
Schira, M. M., Fahle, M., Donner, T. H., Kraft, A., and Brandt, S. A. (2004). Differential contribution of
early visual areas to the perceptual process of contour processing. J Neurophysiol 91(4): 1716–1721.
Seriès, P., Georges, S., Lorenceau, J., & Frégnac, Y. (2002). Orientation dependent modulation of apparent
speed: a model based on the dynamics of feedforward and horizontal connectivity in V1 cortex. Vision
Res 42: 2781–2797.
Spillmann, L., & De Weerd, P. (2003). Mechanisms of surface completion: perceptual filling-in of texture.
In Filling-in: From Perceptual Completion to Cortical Reorganization, edited by L. Pessoa & P. De Weerd,
pp. 81–105. Oxford: Oxford University Press.
Stelmach, L. B., and Herdman, C. M. (1991). Directed attention and perception of temporal order. J Exp
Psychol Hum Percept Perform 17(2): 539–550.
Stelmach, L. B., Herdman, C. M., and McNeil, K. R. (1994). Attentional modulation of visual processes
in motion perception. J Exp Psychol Hum Percept Perform 20(1): 108–121.

Sternberg, S., and Knoll, R. L. (1973). The perception of temporal order: fundamental issues and
a general model. In: Attention and Performance, Vol. IV, edited by S. Kornblum, pp. 629–685.
New York: Academic Press.
Stone, J. V. (1999). Object recognition: view-specificity and motion-specificity. Vision Res 39: 4032–4044.
Stoner, G. R., and Albright, T. D. (1992). Motion coherency rules are form-cue invariant. Vision Res
32(3): 465–475.
Stoner, G. R., and Albright, T. D. (1996). The interpretation of visual motion: evidence for surface
segmentation mechanisms. Vision Res 36(9): 1291–1310.
Titchener, E. B. (1908). Lectures on the Elementary Psychology of Feeling and Attention. New York: Macmillan.
Tootell, R. B., Mendola, J. D., Hadjikhani, N. K., Ledden, P. J., Liu, A. K., Reppas, J. B., Sereno, M. I., and
Dale, A. M. (1997). Functional analysis of V3A and related areas in human visual cortex. J Neurosci
17(18): 7060–7078.
Tse, P. U. (2006). Neural correlates of transformational apparent motion. NeuroImage 31(2): 766–773.
Tse, P. U., and Caplovitz, G. P. (2006). Contour discontinuities subserve two types of form analysis that
underlie motion processing. In: Progress in Brain Research 154: Visual Perception. Part I. Fundamentals
of Vision: Low and Mid-level Processes in Perception, edited by S. Martinez-Conde, S. L. Macknick,
L. M. Martinez, J-M. Alonso, and P. U. Tse, pp. 271–292. Amsterdam: Elsevier.
Tse, P. U., and Cavanagh, P. (1995). Line motion occurs after surface parsing. Invest Ophthalmol Vision Sci
36: S417.
Tse, P. U., Cavanagh, P., and Nakayama, K. (1996). The roles of attention in shape change apparent motion.
Invest Ophthalmol Vision Sci 37: S213.
Tse, P. U., Cavanagh, P., and Nakayama, K. (1998). The role of parsing in high-level motion processing.
In: High-Level Motion Processing: Computational, Neurobiological, and Psychophysical Perspectives, edited
by T. Watanabe, pp. 249–266. Cambridge, MA: MIT Press.
Tse, P. U., and Logothetis, N. K. (2002). The duration of 3-d form analysis in transformational apparent
motion. Percept Psychophys 64(2): 244–265.
Ullman, S. (1979). The Interpretation of Visual Motion. Cambridge, MA: MIT Press.
Ungerleider, L., and Mishkin, M. (1982). Two cortical visual systems. In: Analysis of Visual Behavior, edited
by D. Ingle, M. Goodale, and R. Mansfield, pp. 549–586. Cambridge, MA: MIT Press.
Vaina, L. M., Grzywacz, N. M., Saiviroonporn, P., LeMay, M., Bienfang, D. C., and Conway, A. (2003).
Can spatial and temporal motion integration compensate for deficits in local motion mechanisms?
Neuropsychologia 41: 1817–1836.
Vanduffel, W., Fize, D., Peuskens, H., Denys, K., Sunaert, S., Todd, J. T., and Orban, G. A. (2002).
Extracting 3D from motion: differences in human and monkey intraparietal cortex. Science
298: 413–415.
Vezzani, S., Kramer, P., and Bressan, P. (2013). Stereokinetic effect, kinetic depth effect, and structure
from motion. In: Oxford Handbook of Perceptual Organization, edited by J. Wagemans. Oxford: Oxford
University Press (in press).
Victor, J. D., and Conte, M. M. (1990). Motion mechanisms have only limited access to form information.
Vision Res 30: 289–301.
von Grünau, M., and Faubert, J. (1994). Intraattribute and interattribute motion induction. Perception
23(8): 913–928.
von der Heydt, R., Peterhans, E., and Baumgartner, G. (1984). Illusory contours and cortical neuron
responses. Science 224(4654): 1260–1262.
Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychol Forsch 20: 325–380.
Wallach, H., and O’Connell, D. N. (1953). The kinetic depth effect. J Exp Psychol 45(4): 205–217.
Wallach, H., Weisz, A., and Adams, P. A. (1956). Circles and derived figures in rotation. Am J Psychol
69: 48–59.
Interactions of Form and Motion in the Perception of Moving Objects 559

Wallis, G., and Bülthoff, H. (2001). Effects of temporal association on recognition memory. Proc Natl Acad
Sci USA 98(8): 4800–4804.
Weiss, Y., and Adelson, E. H. (2000). Adventures with gelatinous ellipses—constraints on models of human
motion analysis. Perception 29: 543–566.
Wertheimer, M. (1961). Experimental studies on the seeing of motion. In: Classics in Psychology, edited
by T. Shipley, pp. 1032–1088. New York: Philosophical Library. (Original work published 1912.)
Zhuo, Y., Zhou, T. G., Rao, H. Y., Wang, J. J., Meng, M., Chen, M., Zhou, C., and Chen, L. (2003).
Contributions of the visual ventral pathway to long-range apparent motion. Science 299: 417–420.
Chapter 27

Dynamic grouping motion: A method for determining perceptual organization for objects with connected surfaces

Howard S. Hock

Overview
Rather than focusing on a particular aspect of perceptual organization, the purpose of this chapter
is to describe and extend a new methodology, dynamic grouping, which cuts across and addresses
a wide variety of phenomena and issues related to perceptual organization. The need for this new
methodology, which was introduced by Hock and Nichols (2012), arises from its relevance to the
most common stimulus in our natural environment, objects composed of multiple, connected
surfaces. Remarkably, and despite Palmer and Rock’s (1994) identification of connectedness as a
grouping variable, there has been no systematic research concerned with the perceptual organ-
ization of connected surfaces. This chapter demonstrates the potential of the dynamic grouping
method for furthering our understanding of how grouping processes contribute to object percep-
tion and recognition. It shows how the dynamic grouping method can be used to identify new
grouping variables, examines its relevance for how the visual system solves the ‘surface corres-
pondence problem’ (i.e., determines which of an object’s connected surfaces are grouped together
when different groupings are possible), and provides a concrete realization of the classical idea
that the whole is more than the sum of the parts. The chapter examines the relationship between
dynamic grouping and transformational apparent motion (Tse et al. 1998) and provides insights
regarding the nature of amodal completion and how it can be used to examine classical Gestalt
grouping variables entailing disconnected surfaces (e.g., proximity). Finally, it demonstrates that
perceptual grouping should have a more prominent role in theories of object recognition than is
currently the case, and proposes new theoretical approaches for characterizing the compositional
structure of objects in terms of ‘multidimensional affinity spaces’ and ‘affinity networks’.

The lattice method


Grouping laws, which were originally delineated by Wertheimer (1923), characterize the effect
of various stimulus variables on perceptual organization. How the components of a stimulus are
grouped together depends on such variables as closure, proximity, similarity, movement direction
(common fate), and good continuation (Brooks, this volume). The predominant method for stud-
ying grouping variables has entailed the perceived orientation of 2D lattices composed of discon-
nected surfaces (Wertheimer 1923; Rush 1937; Kubovy and Wagemans 1995; Palmer et al. 1996;
Gori and Spillmann 2010). This method is appropriate for the large volume of research concerned
with the recovery of objects from surface fragments that have become disconnected as a result
of degraded viewing conditions (e.g., Lamote and Wagemans 1999; Shipley and Kellman 2001;
Fantoni et al. 2008). Under non-degraded conditions, however, objects always are composed of
connected surfaces. It would not be surprising, therefore, if a different set of grouping variables
applied. Nor would it be surprising that a substantially different methodology would be required
in order to study these grouping variables.
The great success of the lattice method stems from the isolation of grouping variables and the
determination of their effects from competition between alternative perceptual organizations.
Similarity in shape is isolated for the Wertheimer (1923) lattice in Figure 27.1a; parallel rows are
perceived because the surfaces composing alternating rows are more similar than the surfaces
composing columns, so there is greater grouping strength horizontally than vertically. Proximity
is isolated for the lattice in Figure 27.1b; parallel columns are perceived because the surfaces com-
posing each column are closer together than the surfaces composing each row, so there is greater
grouping strength vertically than horizontally. Finally, shape similarity competes with proximity
for the lattice in Figure 27.1c. Parallel columns are perceived because grouping strength due to
proximity is greater than grouping strength due to shape similarity. Significantly, however, the
outcome of this competition between proximity and shape similarity is not true in general. It
holds only for the particular differences in proximity and the particular differences in shape for
the stimulus depicted in Figure 27.1c.
What is needed for significant progress in our understanding of perceptual organization, espe-
cially as it applies to the connected surfaces of objects, is the development of a new empirical
tool for assessing grouping strength for pairs of adjacent surfaces, and the determination of how
the effects of cooperating grouping variables are combined to establish overall grouping strength
(affinity) for pairs of adjacent surfaces. The prospect for a methodology meeting these require-
ments is a fully described compositional structure for an object (i.e., the pair-wise affinities for
all the object’s surfaces), and the determination that the compositional structure is central to the
recognition of the object.

Dynamic grouping: methodology and concepts


A method with the potential to meet these requirements has recently been reported by Hock
and Nichols (2012). It entails the perception of motion due to dynamic grouping (DG).1 In their
experiments, 2D objects composed of two or more adjacent surfaces are presented in a randomly
ordered series of two-frame trials. The first frame’s duration is on the order of one second, allow-
ing sufficient time for the perceiver to focus attention on the fixation dot located in the center of
the target surface. Preliminary testing has indicated that this duration is sufficient to establish the
compositional structure for simple geometric objects (i.e., the affinity relationships among the
object’s surfaces). However, it remains to be determined whether different compositional struc-
tures would be obtained for other frame durations as a result of differences in the rate with which
affinities are established for different grouping variables (see section below entitled ‘Dynamic
grouping motion versus transformational apparent motion’).
The target in the dynamic grouping paradigm is the surface for which an attribute is changed
during the second frame, the duration of which is on the order of half a second. The luminance of

1 Watt and Phillips (2000) use the term 'dynamic grouping' in a much different sense. Rather than motion
induced by changing values of grouping variables, their emphasis is on the dynamical, self-organizational
aspect of perceptual grouping for both moving and static stimuli.
[Figure 27.1, panels a–k: lattice examples, dynamic grouping stimuli, and nonlinear grouping/affinity functions; see the caption below.]
the target surface always is greater than the luminance of the surfaces with which it is connected.
While some grouping variables remain the same during the transition from Frame 1 to Frame 2,
dynamic grouping variables change in value as a result of changes to the target surface. The change
(say in luminance) increases or decreases the affinity of the target surface with each of the surfaces
adjacent to it, without qualitatively changing the perceptual organization of the geometric object.
Changes (perturbations) in surface affinities that are created by dynamic grouping (DG) variables,
when large enough, elicit the perception of motion across the changing target surface.2,3 The dir-
ection of the DG motion is diagnostic for the affinity relationships among the stimulus’ surfaces
that were established during Frame 1, prior to the change in the target surface during Frame 2.

The direction of dynamic grouping motion


For the 2D objects depicted in Figures 27.1d and 27.1e, connectivity (Palmer and Rock 1994),
co-linearity of horizontal edges (i.e., good continuation) and luminance similarity are grouping
variables that combine to determine the affinity of the two surfaces during Frame 1. Changing the
horizontal bar’s luminance during Frame 2 changes its luminance similarity with the unchanged
square surface next to it; i.e., luminance similarity is the dynamic grouping (DG) variable. The
change in the surfaces’ luminance similarity perturbs the surfaces’ affinity, inducing the percep-
tion of motion across the changing target surface. The motion perceived across the changing
surface is toward the boundary when the affinity of the two surfaces decreases; the bound-
ary is momentarily more salient, as if for the moment the grouping of the surfaces is weaker
(Figures 27.1d and 27.1e). The motion is away from the boundary when their affinity increases;

Fig. 27.1  (a,b,c) Examples using Wertheimer’s (1923) lattice method to identify grouping variables and
determine their relative strength by the outcome of competition between two perceptual organizations.
(d,e) Examples of stimuli for which dynamic grouping (DG) motion is perceived. (f,g) Nonlinear functions
relating the combined effect of grouping variables to the affinity of the surfaces in panels d and e.
Because of super-additivity, changes in affinity are larger and therefore, DG motion is stronger, when
pre-perturbation luminance similarity is greater. (h) Example of a stimulus from Tse et al. (1998) for
which transformational apparent motion (TAM) is perceived in relation to the square. (i) A version of Tse
et al.’s (1998) stimulus for which DG motion also is perceived in relation to the square. (j,k) Nonlinear
functions relating the combined effect of grouping variables to affinity for the two pairs of surfaces in
panel i. Because of super-additivity, changes in affinity are larger and therefore, DG motion is stronger,
for the surface pairs that benefit in pre-perturbation grouping strength from good continuation.
Parts a-c: Data from M. Wertheimer, A Source Book of Gestalt Psychology, tr. W. D. Ellis, Routledge and Kegan Paul,
London, 1923. Parts d-g and i-k: Reprinted from Vision Research, 59, Howard S. Hock and David F. Nichols,
Motion perception induced by dynamic grouping: A probe for the compositional structure of objects, pp. 45–63,
Figure 4, doi: 10.1016/j.visres.2011.11.015 Copyright (c) 2012, with permission from Elsevier. Part h: Reproduced
from Watanabe, Takeo, ed., High-Level Motion Processing: Computational, Neurobiological, and Psychophysical
Perspectives, figure from pages 154–183, © 1998, Massachusetts Institute of Technology, by permission of The
MIT Press.

2  Previous experiments concerned with perceptual grouping and motion perception have studied the effects of
unchanging grouping variables on the perceptual organization of motions elicited by the displacement of sur-
faces (e.g. Kramer and Yantis 1997; Martinovic et al. 2009). Dynamic grouping differs in that the perception
of motion is across a changing surface that is not displaced, and is elicited by changes in grouping variables.
3 Dynamic grouping motion, although weaker, is phenomenologically similar to the line motion illusion that is
obtained when the changing surface is darker than the surfaces adjacent to it (Hock and Nichols 2010). For the
latter, motion perception results from the detection of oppositely signed changes in edge and/or surface
contrast (i.e., counterchange). The avoidance of counterchange-determined motion is why the dynamic grouping
method requires the target surface to be lighter than surfaces adjacent to it.
the boundary is momentarily less salient, as if for the moment the grouping of the surfaces is
strengthened. These directions are characteristic of DG-induced motion. The implications of
fluctuations in eye position or covert attention shifts without eye movements (Posner 1980) are
discussed in the section entitled 'Further implications' at the end of this chapter.

Affinity and the surface correspondence problem


The term affinity is the conceptual lynchpin of the dynamic grouping method. It encompasses any
variable affecting the likelihood of two surfaces being grouped together. The term is derived from
Ullman’s (1979) ‘minimal-mapping’ account of how the visual system solves the motion corre-
spondence problem, which arises when there are competing possibilities for the perception of
apparent motion from an initially presented surface to one of two or more surfaces that replace
it. Ullman shows that such ambiguities in how surfaces are grouped over time can be resolved by
differences in the affinity of the initially presented surface with each of the subsequently presented
surfaces that replace it.
Like Ullman’s (1979) minimal mapping, the dynamic grouping (DG) method stipulates that
differences in affinity resolve ambiguities, but now for ambiguities entailing the alternative ways
in which adjacent surfaces can be grouped. Rather than solving the motion correspondence
problem in time, the objective is to solve this surface correspondence problem in space (the lat-
ter is called ‘instability of structural interpretation’ by Edelman 1997). In contrast with Ullman,
changes in affinity result in the perception of motion within one of two or more adjacent surfaces
rather than motion between two or more non-adjacent surface locations. In addition, Ullman’s
concept of affinity is extended to account for the combined effect of multiple grouping vari-
ables on the affinity of surface pairs; i.e., how they cooperate in determining over-all grouping
strength.

State-dependence and super-additivity


Hock and Nichols (2012) found, for pairs of adjacent surfaces, that the frequency with which
motion is perceived in DG-determined directions depends on the affinity state of the surfaces
(during Frame 1), prior to the perturbation in affinity produced by the dynamic grouping variable
(during Frame 2). Although other grouping variables could serve as DG variables, for example,
hue similarity and texture similarity in Hock and Nichols (2012), the focus in this chapter is on the
luminance similarity of pairs of surfaces (as measured by their inverse Michelson contrast). Thus,
the greater the luminance similarity for a pair of surfaces during Frame 1 (their pre-perturbation
luminance similarity), the more often DG-specified motion is perceived when luminance similar-
ity is changed (perturbed) during Frame 2. Hock and Nichols (2012) showed that these results
were consistent with the affinity of these surfaces depending on the nonlinear summation of the
affinity values ascribable to individual grouping variables (connectivity, good continuation, and
luminance similarity). This is illustrated in Figures 27.1f and 27.1g by power functions (the curved
gray lines), although the only requirement is for the accumulated effects of individual grouping
variables on affinity to be super-additive; i.e., the combined effects of individual variables on affin-
ity must be greater than their linear sum.
It can be seen in these figures that the strength of DG motion induced by perturbing a
surface pair's affinity depends on the Frame 1, pre-perturbation affinity state of the surfaces. It
lies on a steeper segment of the nonlinearly accelerating grouping/affinity function when the
pre-perturbation affinity of the surfaces is larger (in this case because of greater luminance simi-
larity prior to the perturbation). As a result of this advantage in pre-perturbation affinity, the same
Frame 2 perturbation in luminance similarity produces a larger change in the affinity of the two
surfaces, and thereby elicits a stronger signal for motion across the changing surface in
characteristic DG-determined directions (i.e., away from the boundary of the surfaces when their affinity
increases, and toward the boundary when their affinity decreases).
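The state-dependence described above can be made concrete with a small numerical sketch. The power-law affinity function, its exponent, and the grouping-variable weights below are illustrative assumptions, not fitted values from Hock and Nichols (2012); only the qualitative pattern matters: the same perturbation of luminance similarity (inverse Michelson contrast) produces a larger affinity change, and hence stronger DG motion, when pre-perturbation similarity is higher.

```python
# Toy model of state-dependent dynamic grouping (DG). The power-law
# affinity function, the exponent, and the grouping-variable weights are
# illustrative assumptions, not values from Hock and Nichols (2012).

def michelson_contrast(l1, l2):
    """Michelson contrast of two surface luminances."""
    return abs(l1 - l2) / (l1 + l2)

def luminance_similarity(l1, l2):
    """Inverse Michelson contrast: 1.0 means identical luminances."""
    return 1.0 - michelson_contrast(l1, l2)

def affinity(grouping_strengths, exponent=2.0):
    """Super-additive combination: affinity grows faster than the linear
    sum of the individual grouping variables (any exponent > 1 will do)."""
    return sum(grouping_strengths) ** exponent

def dg_motion(pre, post):
    """Sign rule for DG motion across the changing target surface."""
    delta = post - pre
    direction = "away from boundary" if delta > 0 else "toward boundary"
    return delta, direction

# Assumed fixed contributions of connectivity and good continuation.
CONNECTIVITY, GOOD_CONTINUATION = 1.0, 0.5

# The same Frame-2 perturbation (+0.2 in luminance similarity) applied at
# two different pre-perturbation similarity levels; the change in affinity
# is larger at the higher baseline.
for label, sim1 in [("high pre-perturbation similarity", 0.8),
                    ("low pre-perturbation similarity", 0.3)]:
    pre = affinity([CONNECTIVITY, GOOD_CONTINUATION, sim1])
    post = affinity([CONNECTIVITY, GOOD_CONTINUATION, sim1 + 0.2])
    delta, direction = dg_motion(pre, post)
    print(f"{label}: affinity change = {delta:.2f} ({direction})")
```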

Compositional structure: solving the surface correspondence problem


An example stimulus from Tse et al.'s (1998) study of 'transformational apparent motion' (TAM)
is presented in Figure 27.1h (see also Blair et al., this volume). The square and vertical bar are
spatially separated during Frame 1; during Frame 2, a horizontal bar connects them. The square
then appears to be transformed into an elongated horizontal bar. Tse et al. (1998) conclude that
this occurs because the square and horizontal bar are preferentially grouped as a result of good
continuation.
Hock and Nichols (2012) studied a version of this stimulus for which all three surfaces are
always visible (Figure 27.1i). For this stimulus, the square and horizontal bar can be grouped to
form a subunit, and the subunit grouped with the vertical bar. However, an alternative composi-
tional structure also is possible. That is, the vertical and horizontal bars could be grouped to form
a subunit, and the subunit grouped with the square. How this surface correspondence problem
is solved depends on the pre-perturbation affinity relationships among the surfaces composing
the object. On this basis, good continuation is decisive for the stimulus depicted in Figure 27.1i
because of asymmetry in the pre-perturbation affinity of the horizontal bar with its two flanking
surfaces; luminance similarity and connectivity contribute to the pre-perturbation affinity of the
horizontal bar with both flanking surfaces, whereas good continuation only contributes to the
horizontal bar’s affinity with the square (Figures 27.1j and 27.1k).
The asymmetrical effects of good continuation mean that the pre-perturbation affinity state for
the horizontal bar and square is located on a steeper segment of the accelerating grouping/affin-
ity function compared with the pre-perturbation affinity state for the horizontal bar and vertical
bar. Consequently, the same perturbation in luminance similarity produces a larger perturbation
in the horizontal bar’s affinity with the square than its affinity with the vertical bar, and unidirec-
tional DG motion is perceived in relation to the square rather than the vertical bar. That is, the DG
motion that is perceived across the horizontal bar is away from the square when their luminance
similarity increases and is toward the square when it decreases. The dominance of the stronger
affinity relationship of the horizontal bar and the square is confirmed by the perception of the
same DG motion directions when a gap separates the horizontal and vertical bars, but not when
the gap separates the horizontal bar and square.
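The asymmetry argument can be sketched with the same toy affinity function used above. The weights and the quadratic form are hypothetical; the diagnostic feature is only that good continuation contributes to the bar–square pair alone, so an identical luminance-similarity perturbation yields a larger affinity change for that pair.

```python
# Toy illustration of how asymmetric pre-perturbation affinity resolves the
# surface correspondence problem for the Figure 27.1i stimulus. Weights and
# the quadratic affinity function are hypothetical assumptions.

def affinity(grouping_strengths, exponent=2.0):
    # Super-additive (faster-than-linear) combination of grouping variables.
    return sum(grouping_strengths) ** exponent

CONNECTIVITY = 1.0
GOOD_CONTINUATION = 0.5          # contributes only to the bar-square pair
SIM_PRE, SIM_POST = 0.5, 0.7     # luminance similarity before/after Frame 2

pairs = {
    "bar-square": [CONNECTIVITY, GOOD_CONTINUATION],
    "bar-vertical bar": [CONNECTIVITY],
}

deltas = {}
for pair, fixed in pairs.items():
    # Identical similarity perturbation applied to both candidate pairings.
    deltas[pair] = affinity(fixed + [SIM_POST]) - affinity(fixed + [SIM_PRE])
    print(f"{pair}: affinity change = {deltas[pair]:.2f}")

# The pairing whose affinity is perturbed more determines the perceived
# DG motion (here, motion across the bar in relation to the square).
winner = max(deltas, key=deltas.get)
print("DG motion perceived in relation to:", winner)
```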

Dynamic grouping motion versus transformational apparent motion

Another version of the Tse et al. (1998) stimulus indicates that good continuation does not neces-
sarily dominate in resolving the surface correspondence problem. In this example (Figure 27.2a),
the presence of hue similarity strengthens the pre-perturbation affinity of the horizontal and
vertical bars sufficiently for their over-all affinity to frequently predominate in determining the
direction of DG motion, and therefore, the pre-perturbation compositional structure of the stim-
ulus. That is, when luminance similarity increases, unidirectional DG motion is perceived across
the horizontal bar, away from the vertical bar rather than away from the square. This asymmetry
in motion perception can again be traced to the nonlinear grouping/affinity function. That is, the
pre-perturbation affinity state is greater when hue similarity contributes to the grouping of the
[Figure 27.2, panels a–g: stimuli contrasting dynamic grouping motion with transformational apparent motion, and their grouping/affinity functions; see the caption below.]
Fig. 27.2  (a) A version of Tse et al.’s (1998) stimulus for which unidirectional dynamic grouping motion is perceived in the direction determined by hue
similarity. (b) A similar stimulus, but with the horizontal bar presented only during Frame 2. Transformational apparent motion is perceived in the direction
determined by good continuation. (c,d) Nonlinear functions relating the combined effect of grouping variables to affinity for the two pairs of surfaces
in panels a and b. Both are consistent with hue similarity more strongly affecting grouping strength than good continuation. (e) For relatively long
boundary lengths, dynamic grouping (DG) motion is perceived across the changing surface on the left when its luminance is increased. (f) For the same
change in luminance, either no motion or symmetrically divergent motion is perceived when the boundary is shorter. (g) The perception of DG motion
across the surface on the left is restored when the luminance of the surface on the right is raised, increasing the luminance similarity and thereby the
pre-perturbation affinity of the two surfaces.
horizontal and vertical bars, compared with when good continuation contributes to the grouping
of the horizontal bar and square (Figure 27.2c). As a result of the affinity for the horizontal and
vertical bars being located on a steeper segment of the grouping/affinity function, the perturba-
tion of luminance similarity produces a greater change in affinity, and therefore, stronger DG
motion across the horizontal bar in relation to the vertical bar than in relation to the square.
(It is noteworthy that this difference in grouping strength between good continuation and hue
similarity for this stimulus would not be discernible without something like the DG method.)
When the horizontal bar is presented only during the second frame (Figure 27.2b), as in Tse
et al.'s (1998) TAM paradigm, good continuation predominates despite the apparently stronger affinity
of the horizontal and vertical bars because of their hue similarity; i.e., the square appears to
expand into a long horizontal bar. As illustrated in Figure 27.2d, there is minimal pre-perturbation
affinity during the first frame for this stimulus (the effect of proximity grouping for the separated
surfaces is assumed to be negligible), and the insertion of the horizontal bar results in a larger
change in affinity for the grouping of the horizontal and vertical bars compared with the horizon-
tal bar and square. If the perception of motion depended only on the size of the affinity change,
TAM, like DG motion, would have been in relation to the vertical bar. This is the opposite of what
is actually perceived.
The perceptual differences between DG and TAM for the stimuli in Figures 27.2a and 27.2b
indicate that they do not always reflect identical aspects of perceptual organization. What then is
the relationship between them? It can be shown with a dynamical model (Hock and Schöner 2010)
that DG and TAM can entail the same processing mechanisms, with both depending on differ-
ences in the rate of change in affinity that results from changes in grouping variables. DG and TAM
function differently in the model in that TAM depends on different grouping variables having dif-
ferent rates of change in affinity, whereas DG motion depends as well on rates of change varying
according to the level of stable, pre-perturbation affinity. The perceptual results described above
suggest that hue similarity may have a stronger effect on surface affinity than good continuation,
but the contribution of good continuation to surface affinity may emerge more rapidly.
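One way to picture this rate-based account is a toy illustration (not the Hock and Schöner (2010) model itself) in which each grouping variable's affinity contribution approaches its asymptote exponentially with its own time constant. With an assumed fast-but-weak good continuation and a slow-but-strong hue similarity, good continuation leads shortly after stimulus onset (the TAM-like regime), while hue similarity dominates after the roughly one-second Frame 1 (the DG-like regime).

```python
# Illustrative sketch only: asymptotes and time constants are assumptions,
# chosen to show how a variable can be weaker asymptotically yet dominate
# early because its affinity contribution emerges faster.
import math

def contribution(asymptote, tau, t):
    """Exponential approach of a grouping variable's affinity contribution."""
    return asymptote * (1.0 - math.exp(-t / tau))

# Hue similarity: stronger asymptotic effect, slower to emerge (assumed).
HUE = dict(asymptote=1.0, tau=0.40)
# Good continuation: weaker asymptotic effect, emerges quickly (assumed).
GOOD_CONT = dict(asymptote=0.6, tau=0.05)

for t, regime in [(0.1, "shortly after onset (TAM-like)"),
                  (1.0, "after ~1 s of Frame 1 (DG-like)")]:
    hue = contribution(t=t, **HUE)
    gc = contribution(t=t, **GOOD_CONT)
    lead = "good continuation" if gc > hue else "hue similarity"
    print(f"t = {t:.1f} s ({regime}): "
          f"hue = {hue:.2f}, good continuation = {gc:.2f} -> {lead} leads")
```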

Identifying new grouping variables


Although there are many stimulus variables that might affect the appearance of two surfaces, they
do not necessarily affect their affinity. That is, a stimulus variable may or may not function as a
grouping variable. This is an important consideration because it would affect the likelihood that
surfaces would be grouped together when they are embedded in a more complex, multi-surface
object.
The DG method can be used to identify new grouping variables by testing different val-
ues of a stimulus variable and determining whether each value requires a different amount of
pre-perturbation luminance similarity in order for motion to be perceived in directions charac-
teristic of DG. For example, if the length of the boundary separating two surfaces is a grouping
variable that affects their affinity, different levels of luminance similarity would be required in
order for unidirectional DG motion to be perceived for different boundary lengths. When the
boundary is relatively long, the pre-perturbation luminance similarity for the stimulus in Figure
27.2e is sufficient to perceive DG motion across the target surface on the left. When the
boundary is shorter, this level of luminance similarity results in either the perception of no motion
or the perception of symmetrical, diverging motion (Figure 27.2f). Additional pre-perturbation
luminance similarity is required (luminance is raised for the surface on the right) in order for DG
motion to be perceived for the shorter boundary (Figure 27.2g), indicating that the strength of
[Figure 27.3, panels a–e: amodal-completion and proximity stimuli; see the caption below.]
Fig. 27.3  (a) A stimulus for which the perception of dynamic grouping (DG) motion is indicative of
amodal completion behind the occluding cube. The direction of the motion is consistent with the
implied presence of a discontinuous luminance boundary separating surfaces A and C. (b) Unidirectional
DG motion is perceived across the square surface on the right when its luminance is decreased and the
occluding surface is relatively narrow (the squares are relatively close together). (c) For the same change
in luminance, DG motion is not perceived when the occluding surface is relatively wide (the squares
are further apart). (d) The perception of DG motion across the square on the right is restored when the
luminance of the square on the left is lowered, increasing the luminance similarity and therefore the pre-
perturbation affinity of the two physically separated surfaces. (e) Variation of a stimulus from Biederman
(1987). The dynamic grouping motion that is perceived when the luminance of surface B is decreased
is consistent with its grouping with surface A, perhaps to form a truncated cone, a ‘geon’ which
contributes to the recognition of the object as a lamp in Biederman’s (1987) recognition-by-components
theory.
Adapted from Irving Biederman, Recognition-by-components: A theory of human image understanding,
Psychological Review, 94(2), pp. 115–147, http://dx.doi.org/10.1037/0033-295X.94.2.115 © 1987, American
Psychological Association.
the grouping variable increases with increases in the length of the boundary separating pairs of
adjacent surfaces.
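The titration logic can be sketched with the same toy affinity function: treat boundary length as a candidate grouping variable with an assumed strength, and find the smallest pre-perturbation luminance similarity at which the perturbation-induced affinity change exceeds a hypothetical motion threshold. If the required similarity shifts with boundary length, boundary length is behaving as a grouping variable.

```python
# Toy titration of the DG method for a candidate grouping variable (here,
# boundary length). The affinity function, the boundary strengths, and the
# motion threshold are assumptions; the diagnostic is the logic itself.

def affinity(grouping_strengths, exponent=2.0):
    # Super-additive (faster-than-linear) combination of grouping variables.
    return sum(grouping_strengths) ** exponent

def delta_affinity(boundary_strength, sim, perturbation=0.2):
    """Affinity change produced by a fixed luminance-similarity perturbation."""
    return (affinity([boundary_strength, sim + perturbation])
            - affinity([boundary_strength, sim]))

def similarity_threshold(boundary_strength, motion_threshold=0.55):
    """Smallest pre-perturbation similarity at which DG motion is 'seen'
    (i.e., the affinity perturbation exceeds the motion threshold)."""
    for i in range(101):
        sim = i / 100
        if delta_affinity(boundary_strength, sim) >= motion_threshold:
            return sim
    return None  # not reachable within the similarity range

# A longer boundary is assumed to contribute more grouping strength, so it
# should need less pre-perturbation luminance similarity.
for label, strength in [("long boundary", 1.2), ("short boundary", 0.6)]:
    print(f"{label}: similarity needed = {similarity_threshold(strength):.2f}")
```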

Implications of super-additivity
Super-additivity, according to which the combined effects of cooperating grouping variables on
the overall affinity of two surfaces exceed their linear sum, is a concrete realization of the principle
that the whole is more than the sum of the parts (von Ehrenfels 1890; Wagemans, this volume).
An important consequence of super-additive nonlinearity is that the effect of a particular group-
ing variable on the affinity of a pair of adjacent surfaces is context dependent. That is, it will
vary, depending on the presence or absence of other cooperating grouping variables. This contrasts
with Bayesian analyses indicating that the effects of grouping variables are independent,
or additive (e.g., Elder and Goldberg 2002). Although Claessens and Wagemans (2008) confirmed
Bayesian independence using the lattice method, they also found, contrary to such independence,
that the relative strength of proximity and co-linearity depended on whether their lattice was
aligned with the cardinal axes or was oblique.

Amodal completion
The DG method can be used to gain further insights into amodal completion, which is typi-
cally concerned with the continuity of unseen stimulus information in time (e.g., Yantis 1995;
Joseph and Nakayama 1999) and space (e.g., Michotte et al. 1964; Tse 1999; van Lier and Gerbino,
this volume). It also can be used to establish the strength of grouping variables for disconnected
surfaces.

Hidden boundaries
For the stimulus in Figure 27.3a, a partially occluded light gray bar composed of surfaces A and C
is readily perceived during the first frame of a two-frame trial. When surface A’s luminance is
decreased during the second frame, its luminance similarity with surface C decreases, resulting in
diagonally upward DG motion across A, toward an amodal hidden boundary with C. In addition
to its effect on the affinity of surfaces A and C, the luminance decrease for surface A increases
its similarity with surface B, so if DG motion were determined strictly on the basis of whether
surfaces are adjacent on the retina, the motion across surface A would have been in the opposite
direction, away from surface B. That the direction of DG motion is consistent with the grouping
of surfaces A and C is important because: (1) it shows that amodal completion can entail discontinuous luminance boundaries, not just continuity; (2) it shows that the DG method can be diagnostic for the grouping of surfaces even when their common boundaries are hidden; and (3) it enables the measurement of affinity for non-adjacent surfaces. The latter feature is the basis for the measurement of proximity effects, which is described next.

The effects of proximity


Pairs of co-linear squares that are separated by an occluding surface can be used to measure
proximity effects, which would be expected to decrease as the width of the occluding surface is
increased. For the relatively narrow occluder in Figure 27.3b, the perception of unidirectional DG
motion across the target square on the right requires relatively little pre-perturbation luminance
similarity. However, proximity grouping is weaker when the width of the occluder is increased,
so DG motion is not perceived (Figure 27.3c). It is perceived across the square on the right when

luminance is lowered for the square on the left (Figure 27.3d). This is because the change in lumi-
nance increases the pre-perturbation luminance similarity of the two square surfaces, which are
physically separate but nonetheless perceptually grouped.
The pre-perturbation luminance similarity required in order to perceive motion in
DG-determined directions increases (the Michelson contrast of the physically separated surfaces
decreases) with successive increases in the distance between the squares. Precise psychophysi-
cal measurements with systematically varied pre-perturbation luminance similarity will make it
possible to determine whether the ratios based on the equivalent luminance similarity for each
proximity value (including a proximity value of zero) will be consistent with the distance ratios
measured by Kubovy and Wagemans (1995) in their experiments using the lattice method.
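The contrast measure referred to here is the standard Michelson contrast, C = (L_max − L_min)/(L_max + L_min); a minimal sketch (the function name is hypothetical):

```python
def michelson_contrast(l_max: float, l_min: float) -> float:
    """Michelson contrast of two luminances (l_max >= l_min >= 0)."""
    if l_max + l_min == 0:
        raise ValueError("luminances must not both be zero")
    return (l_max - l_min) / (l_max + l_min)

# Two surfaces at 80 and 20 cd/m^2 yield a contrast of 0.6; as their
# luminances converge (similarity increases), the contrast falls toward 0.
```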

Implications for object recognition


The most prominent theories of object recognition are based on the spatial arrangement of 3D
geometric primitives (Marr and Nishihara 1978; Pentland 1987; Biederman 1987). Much of
the research evaluating these theories has addressed their limitations with respect to viewpoint
invariance (e.g., Tarr et al. 1998), leading to alternative models entailing the encoding of different
views of the same object (e.g., Ullman 1989). However, these image-based models have their own
limitations with respect to category invariance; i.e., they are problematic for the classification of
other objects belonging to the same category (Edelman 1997; Tarr and Bülthoff 1995). A further
limitation is that in contrast with the computer vision literature (e.g., Lowe 1987; Arseneault et al.
1994; Jacobs 1996; Iqbal and Aggarwal 2002), grouping properties have not been incorporated
into psychological theories of object recognition (Palmer 1999). A possible reason for this has
been the absence, until now, of a suitable empirical method for identifying grouping variables
specific to the connected surfaces of objects and determining the combined effect of these group-
ing variables. Described below is the use of the DG method to demonstrate the potential for per-
ceptual grouping to play a more significant role in theories of object recognition, like Biederman’s
(1987) recognition-by-components theory.
Biederman’s (1987) theory entails edge extraction, the parsing of surfaces based on their con-
cavities, and the recognition of objects on the basis of whether the parsed surfaces match 3D
geometric primitives (geons) in memory. The stimulus depicted in Figure 27.3e is similar to one
of Biederman’s (1987) examples. The object is presumably recognized as a lamp based on the pres-
ence and relative locations of geons corresponding to the lampshade (a truncated cone), the stem
(a cylinder), and the base (a truncated cylinder). However, surface B by itself does not evoke a
truncated cone or any other geon. A truncated cone is formed only after surface B (corresponding
to the lampshade’s outer surface) is grouped with surface A (the elliptical shadow correspond-
ing to the inside of the lampshade). Hock and Nichols (2012) used the DG method to show that
surfaces A and B are indeed grouped together. When the luminance of surface B decreases, its
luminance similarity with both black surfaces adjacent to it increases, and motion across the
changing surface is downward and to the right, consistent with the outer lampshade having a
greater pre-perturbation affinity with the ellipse (due to good continuation and perhaps boundary
length) than with the cylindrical stem of the lamp.
This example is consistent with a theory of object recognition in which surface-grouping opera-
tions precede the activation of object parts in memory (possibly geons, but other primitives are
not excluded), with the object’s parts serving as the basis for its recognition. (See Jacot-Descombes
and Pun (1997) for an artificial vision model along these lines.) A processing sequence in which

surface grouping precedes comparison with component information in memory would reduce
the complexity of object recognition (Jacobs 1996; Feldman 1999), but it also is possible that the
affinity values for all pairings of the surfaces composing an object are unique, and therefore suf-
ficient for the recognition of the object. In either case, the ultimate test for dynamic grouping, or
any other method for assessing the compositional structure of a multi-surface object, is that the
compositional structure is determinative for the recognition of the object.

Further implications
The example in Figure 27.3e shows that grouping processes should have an explicit role in theories
of object perception, but it is quite another thing to specify what the role should be. The approach
taken in this chapter is that grouping variables determine the affinity of pairs of surfaces, and
thereby, the compositional structure of the object comprising those surfaces. Experiments and
demonstrations with simple, 2D objects composed of two or three surfaces have provided evidence
for the usefulness of the dynamic grouping method for the determination of affinity. Extending
the method to multi-surface, 3D objects creates opportunities for discovering new grouping vari-
ables, and determining how ambiguities in perceptual grouping are resolved (the ‘surface corre-
spondence problem’) in the context of the other surfaces composing a complex object.
The key theoretical concepts are: (1) the affinity of a pair of surfaces belonging to an object
depends on the nonlinear (super-additive) summation of the affinity values ascribable to indi-
vidual grouping variables, and (2)  the compositional structure of the object is revealed by
embedding the pairwise affinity relationships among the surfaces composing the object into a
multidimensional affinity space. This would entail multidimensional scaling (MDS) based on
matrices of DG-measured affinity for all the pairwise combinations of an object’s surfaces. Points
in the space would represent the surfaces composing an object, and the distance between the
points would represent the affinity of the surfaces. In contrast with multidimensional models of
object recognition that specify particular features, like color, shape and texture (e.g., Mel 1997),
the compositional structures determined with the dynamic grouping method will be based on an
abstract entity, affinity, so they will not be specific to the particular features of familiar objects.
They therefore would have the potential to exhibit a degree of invariance; i.e., generalize to other
objects with different features but a similar compositional structure, and to new viewpoints for
the same object.
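The proposed analysis can be sketched with classical (Torgerson) MDS applied to a matrix of pairwise affinities. The affinity values below are hypothetical, chosen only to show how high-affinity surface pairs end up close together in the recovered space:

```python
import numpy as np

# Hypothetical DG-measured affinities for all pairings of four surfaces
# of an object (symmetric; self-affinity set to 1).
affinity = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])

# High affinity should correspond to small distance in the space.
d = 1.0 - affinity
np.fill_diagonal(d, 0.0)

# Classical MDS: double-center the squared distances ...
n = d.shape[0]
j = np.eye(n) - np.ones((n, n)) / n
b = -0.5 * j @ (d ** 2) @ j

# ... and embed the surfaces using the leading eigenvectors.
eigvals, eigvecs = np.linalg.eigh(b)
order = np.argsort(eigvals)[::-1][:2]
coords = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Rows of `coords` are the surfaces; surfaces 1-2 and 3-4 form two
# clusters in the 2D affinity space.
```

Clustering of the rows of `coords` would then indicate candidate parts, in the sense described below.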
Using MDS methods, the compositional structure of an object can be determined without
restrictions or pre-conceptions; e.g., without the typical assumption that the structure is hierarchical (Palmer 1977; Brooks 1983; Cutting 1986; Feldman 1999; Joo et al., this volume). Although
there are no restrictions in the compositional structure’s form, the existence of parts could be
indicated by the clustering of surfaces in multidimensional affinity space, and significant relations
between the parts, including possible hierarchical relations, could be indicated when pairs of sur-
faces from different clusters are relatively close in that abstract space.
An important consideration is the extent to which affinity relationships indicated by the
dynamic grouping method are definitive. In the experiments and demonstrations discussed in
this chapter, instructions have emphasized fixating on a dot placed in the center of the target sur-
face and maintaining attention on the dot for the entire two-frame trial. The purpose is to estab-
lish relatively unbiased conditions for determining the direction of dynamic grouping motion.
However, it is as yet undetermined whether fluctuations in eye position or covert attentional shifts
without eye movements (Posner 1980) will alter the compositional structures that are indicated by

the dynamic grouping method. Indeed, when stimuli like those in Figures 27.1i and 27.2a are
freely examined, there is the sense that the surfaces can be grouped in more than one way.
These uncertainties do not undermine the usefulness of the dynamic grouping method for
objects with more complex surface relationships. That is, changes in fixation or shifts of atten-
tion that reduce the measured affinity of a target surface with another surface would be likely
to also change its affinity with the other surfaces composing the object. Such changes can be
conceived of as the equivalent of the perturbations in luminance similarity that can result
in the perception of dynamic grouping motion. That is, they can temporarily alter the multidi-
mensional compositional structure of an object, but the structure is nonetheless restored after
the perturbation.
The relationships among the surfaces composing an object also could be characterized as
an ‘affinity network’ in which each surface is represented by an activation variable and the
coupling strength for pairs of activation values is determined by their affinity. Changes in
luminance, eye position, or attention could perturb coupling strengths, but the inherent sta-
bility of the network would restore the couplings to their stable values. Exceptions are bistable
objects for which perturbations could result in new couplings among the object’s surfaces that
qualitatively change the compositional structure of the object (e.g., the Necker cube). As in the
case of bistable motion patterns (Hock et al. 2003; Hock & Schöner 2010), such bistable objects
may provide an ideal vehicle for investigating the nature of compositional structure for static
objects.
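The restoring behavior of such an affinity network can be sketched as a simple relaxation dynamics. The coupling matrix, stable activation pattern, and rate constants below are all illustrative assumptions, not fitted parameters:

```python
import numpy as np

# Illustrative 'affinity network': one activation value per surface,
# couplings proportional to pairwise affinity (all values hypothetical).
coupling = np.array([
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.1],
    [0.1, 0.1, 0.0],
])
stable = np.array([1.0, 1.0, 0.5])   # assumed stable activation pattern

x = stable.copy()
x[0] -= 0.4                          # perturb one surface (e.g. a luminance change)

dt = 0.1
for _ in range(300):
    # Each unit is driven back toward its stable value; coupled units
    # additionally pull their neighbors toward the stable pattern.
    drive = -(x - stable) + 0.1 * coupling @ (stable - x)
    x = x + dt * drive

# After the perturbation decays, x is (approximately) back at `stable`;
# a bistable object would instead need multiple stable patterns.
```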

References
Arseneault, J-L, Bergevin, R., and Laurendeau, D. (1994). ‘Extraction of 2D groupings for 3D object
recognition’. Proceedings SPIE 2239: 27.
Biederman, I. (1987). ‘Recognition-by-components: a theory of human image understanding’. Psychological
Review 94: 115–47.
Blair, C.D., Caplovitz, G.P., and Tse, P.U. (this volume). ‘Interactions of form and motion in the perception
of moving objects’. In The Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Brooks, R.A. (1983). ‘Model-based three-dimensional interpretations of two-dimensional images’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 5: 140–49.
Claessens, P.M.E. and Wagemans, J. (2008). ‘A Bayesian framework for cue integration in multistable
grouping: Proximity, colinearity, and orientation priors in zigzag lattices’. Journal of Vision 8: 1–23.
Cutting, J. (1986). Perception with an eye for motion. Cambridge, MA: MIT Press.
Edelman, S. (1997). ‘Computational theories of object recognition’. Trends in Cognitive Sciences 1: 296–304.
Elder, J., and Goldberg, R.M. (2002). ‘Ecological statistics of Gestalt laws for the perceptual organization of
contours’. Journal of Vision 2: 324–53.
Fantoni, C., Hilger, J., Gerbino, W., and Kellman, P. J. (2008). ‘Surface interpolation and 3D relatability’.
Journal of Vision 8: 1–19.
Feldman, J. (1999). ‘The role of objects in perceptual grouping’. Acta Psychologica 102: 137–63.
Gori, S., and Spillmann, L. (2010). ‘Detection vs. grouping thresholds for elements differing in spacing,
size and luminance. An alternative approach towards the psychophysics of Gestalten’. Vision Research
50: 1194–202.
Hock, H.S., and Nichols, D.F. (2010). ‘The line motion illusion: The detection of counterchanging edge and
surface contrast’. Journal of Experimental Psychology: Human Perception and Performance 36: 781–96.
Hock, H.S., and Nichols, D.F. (2012). ‘Motion perception induced by dynamic grouping: A probe for the
compositional structure of objects’. Vision Research 59: 45–63.

Hock, H.S., and Schöner, G. (2010). ‘A neural basis for perceptual dynamics’. In Nonlinear dynamics in human
behavior, edited by R. Huys and V. Jirsa, pp. 151–77. (Berlin: Springer Verlag).
Hock, H.S., Schöner, G., and Giese, M.A. (2003). ‘The dynamical foundations of motion pattern
formation: Stability, selective adaptation, and perceptual continuity’. Perception & Psychophysics
65: 429–57.
Iqbal, Q., and Aggarwal, J.K. (2002). ‘Retrieval by classification of images containing large manmade
objects using perceptual grouping’. Pattern Recognition 35: 1463–79.
Jacobs, D. (1996). ‘Robust and efficient detection of salient convex groups’. IEEE Transactions on Pattern
Analysis and Machine Intelligence 18: 23–37.
Jacot-Descombes, A., and Pun, T. (1997). ‘Asynchronous perceptual grouping: from contours to relevant
2-D structures’. Computer Vision and Image Understanding 66: 1–24.
Joo, J., Wang, S., and Zhu, S.-C. (this volume). ‘Hierarchical organization by and-or tree’. In The Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Joseph, J.S., and Nakayama, K. (1999). ‘Amodal representation depends on the object seen before partial
occlusion’. Vision Research 39: 283–92.
Kramer, P., and Yantis, S. (1997). ‘Perceptual grouping in space and time: Evidence from the Ternus
display’. Perception & Psychophysics 59: 87–99.
Kubovy, M., and Wagemans, J. (1995). ‘Grouping by proximity and multistability in dot lattices: A quantitative
gestalt theory’. Psychological Science 6: 225–34.
Lamote, C., and Wagemans, J. (1999). ‘Rapid integration of contour fragments: From simple filling-in to
parts-based description’. Visual Cognition 6: 345–61.
Lowe, D.G. (1987). ‘Three-dimensional object recognition from single two-dimensional images’. Artificial
Intelligence 31: 355–95.
Marr, D., and Nishihara, H.K. (1978). ‘Representation and recognition of the spatial organization of
three-dimensional shapes’. Proceedings of the Royal Society of London, Series B 211: 151–80.
Martinovic, J., Meyer, G., Muller, M.M., and Wuerger, S.M. (2009). ‘S-cone signals invisible to the motion
system can improve motion extraction via grouping by color’. Visual Neuroscience 26: 237–48.
Mel, B. (1997). ‘Combining color, shape, and texture histogramming in a neurally-inspired approach to
visual object recognition’. Neural Computation 9: 777–804.
Michotte, A., Thinès, G., and Crabbè, G. (1964). Les compléments amodaux des structures perceptives
(Amodal completion of perceptual structures). (Leuven, Belgium: Publications Universitaires de
Louvain).
Palmer, S.E. (1999). Vision science: Photons to phenomenology. (Cambridge MA: Bradford Books).
Palmer, S.E., and Rock, I. (1994). ‘Rethinking perceptual organization: the role of uniform connectedness’.
Psychonomic Bulletin and Review 1: 29–55.
Palmer, S.E., Neff, J., and Beck, D. (1996). ‘Late influences on perceptual grouping: Amodal completion’.
Psychonomic Bulletin and Review 3: 75–80.
Pentland, A.P. (1987). ‘Perceptual organization and the representation of natural form’. Artificial Intelligence
28: 293–331.
Posner, M.I. (1980). ‘Orienting of attention’. Quarterly Journal of Experimental Psychology 32: 3–25.
Rush, G. (1937). ‘Visual grouping in relation to age’. Archives of Psychology, N.Y. 31: No. 217.
Shipley, T.F., and Kellman, P.J., (Eds.) (2001). From Fragments to Objects: Segmentation and Grouping in
Vision. (Amsterdam: Elsevier Science Press).
Tarr, M.J., and Bülthoff, H.H. (1995). ‘Is human object recognition better described by
geon-structural-descriptions or by multiple-views? Comment on Biederman and Gerhardstein (1993)’.
Journal of Experimental Psychology: Human Perception and Performance 21: 1494–505.
Tarr, M. J., Williams, P., Hayward, W. G., and Gauthier, I. (1998). ‘Three dimensional object recognition is
viewpoint-dependent’. Nature Neuroscience 1: 275–77.

Tse, P.U. (1999). ‘Volume completion’. Cognitive Psychology 39: 37–68.


Tse, P., Cavanagh, P., and Nakayama, K. (1998). ‘The role of parsing in high-level motion processing’. In
High-level motion processing: Computational, neurobiological, and psychophysical perspectives, edited by
T. Watanabe, pp. 154–83. (Cambridge, MA: MIT Press).
Ullman, S. (1979). The interpretation of visual motion. (Cambridge, MA: MIT Press).
Ullman, S. (1989). ‘Aligning pictorial descriptions: an approach to object recognition’. Cognition
32: 193–254.
van Lier, R., and Gerbino, W. (in press). ‘Perceptual completions’. In The Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
von Ehrenfels, C. (1890). ‘Über Gestaltqualitäten’. Vierteljahrsschrift für wissenschaftliche Philosophie
14: 224–92. Translated as ‘On Gestalt Qualities’. In B. Smith (ed. and trans.) (1988). Foundations of
Gestalt theory, pp. 82–117. (Munich, Germany: Philosophia Verlag).
Wagemans, J. (in press). ‘Historical and conceptual background: Gestalt theory’. In The Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Watt, R.J., and Phillips, W.A. (2000). ‘The function of dynamic grouping in vision’. Trends in Cognitive
Sciences 4: 447–54.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt’. Psychologische Forschung 4: 301–350.
Reprinted in W.D. Ellis (Ed.) (1938). A source book of Gestalt psychology. (London: Routledge & Kegan Paul).
Yantis, S. (1995). ‘Perceived continuity of occluded visual objects’. Psychological Science 6: 182–6.
Chapter 28

Biological and body motion perception


Martin A. Giese

A huge variety of empirical studies has addressed different aspects of the perception of biological and body motion, ranging from psychophysical questions and the processing of social signals, through ecological and developmental aspects, to clinical implications. Due to space limitations, this chapter focuses primarily on aspects related to pattern formation and the organization of Gestalt for dynamic patterns.
Many topics in body motion perception that cannot be covered in this chapter are treated in excellent review articles and books. These include the original work by Gunnar Johansson (review: Jansson et al. 1994), the psychophysics and the neural basis of body and facial motion processing (Puce and Perrett 2003; Allison et al. 2000; O’Toole et al. 2002; Blake and Shiffrar 2007), computational principles (Giese and Poggio 2003), imaging results (Blakemore and Decety 2001; Puce and Perrett 2003), and the relationship to emotion processing (de Gelder 2006). Another important topic that cannot be adequately treated here is the relationship between body motion perception and motor representations. Several recent books treat exhaustively further aspects of biological and body motion perception that could not be included in this review (e.g. Knoblich et al. 2006; Johnson and Shiffrar 2013; Rizzolatti and Sinigaglia 2008).

Historical Background
While Aristotle had already written about the principles of animal movement, the systematic scientific investigation of body motion perception began in the late nineteenth century with the work of Eadweard Muybridge (1887) and Étienne-Jules Marey (1894), who studied body motion by applying the technique of sequential photography. While classical Gestalt psychologists had not treated the organization of complex motion patterns extensively, the systematic study
of biological and body motion was initiated by the Swedish psychologist Gunnar Johansson in
the 1970s. He was originally interested in studying Gestalt laws of motion organization, and for
him body motion was an example of a complex motion pattern with relevance for everyday life
(Jansson et al. 1994). His work on biological motion grew out of studies on the organization of
much simpler motion patterns during his PhD thesis (Johansson 1950), aiming at the develop-
ment of a general ‘theory of event perception’.
Already classical Gestalt psychologists had described pattern organization phenomena for
simple motion patterns. This includes the classical law of ‘common fate’ (Wertheimer 1923),
work on motion grouping (Ternus 1926) and on ‘induced motion’ by Duncker (1929) (see
Figure 28.1a), and studies by Metzger (1937) on the ‘Prägnanz’ in motion perception
(see Herzog and Öğmen, this volume). In addition, some more recent work by Albert Michotte

[Figure 28.1, panels (a)–(d); each display is shown as a stimulus and its percept]

Fig. 28.1  Perceptual organization of simple motion displays. (a) Induced motion (Duncker 1929): while
in reality the external frame moves and the dot is stationary, the dot is perceived as the moving element.
(The following examples are taken from Johansson (1950)): (b) three dots that move along straight lines
are perceptually grouped into two pairs of dots that move up and down, with a periodic ‘contraction’
of their virtual connection line horizontally. (c) Two dots that move vertically and two that move along
a circle are grouped into a single line that moves vertically. In addition, the exterior points are perceived
as moving horizontally. (d) Two dots, where one moves along a straight line and the second along
piecewise curved paths, are perceived as a ‘rotating wheel’, where one dot is rotating about the other.
Part a: Reproduced from Psychologische Forschung, 12(1), pp. 180–259, Über induzierte Bewegung, Karl
Duncker, © 1929, Springer Science and Business Media. With kind permission from Springer Science and Business
Media. Parts b-d: Reproduced from G. Johansson, ‘Configurations in Event Perception: An experimental study’.
Dissertation, Högskolan, Stockholm, 1950.

(1946/1963) addressed the interpretation of simple motion displays in terms of the perception
of ‘causality’.
Johansson studied Gestalt grouping principles systematically in simple motion displays that consisted of small numbers of moving dots, varying their geometrical and temporal parameters. A variety of his observations are in line with modern theories about
the estimation of optic flow from spatiotemporal image data, such as the tendency to group dots
with similar motion vectors in the image plane, or a tendency to favor correspondences in terms
of slow motion.
In addition, Johansson made the important discovery that he formalized in his theory of vector analysis: often even simple motion patterns are perceptually organized in terms of interpretations that impose a hierarchy of spatial frames of reference, instead of a simple perceptual representation that reflects just the physical structure of the motion. Some example stimuli that illustrate this phenomenon are shown in Figure 28.1b–d. The physical motion of the stimulus is decomposed into components that describe (sometimes non-rigid) deformations within the grouped structure (e.g. a contracting bar), and a second motion component that describes the motion of the whole grouped structure within the external frame of reference (e.g. the movement of the whole bar). The
key point is that the perceptual interpretation provides a description in terms of relative motion

that is described within frames of reference, which partially result from the grouping process itself.
This can be interpreted as a form of vectorial decomposition of the motion, e.g. in a component
that describes the motion of a whole group of dots, and an additive second vectorial component
that describes the relative motion between the individual dots within the groups. It seems obvious that the principle might be extendable to more complex displays, e.g. consisting of multiple non-rigid parts that move against each other. The human body is an example of such a more complex system, and this originally motivated Johansson’s interest in these types of stimuli.
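In the simplest case, such a vectorial decomposition amounts to splitting each dot's image-plane velocity into the common motion of the group and a residual relative motion. The sketch below uses the group mean as the common component (one simple choice among several possible reference frames), with illustrative velocities:

```python
import numpy as np

# Image-plane velocities (vx, vy) of three grouped dots (toy values).
velocities = np.array([
    [2.0,  1.0],
    [2.0, -1.0],
    [2.0,  0.0],
])

# Common component: motion of the group as a whole.
common = velocities.mean(axis=0)        # -> [2., 0.]

# Relative component: motion of each dot within the moving frame.
relative = velocities - common

# The decomposition is exact (common + relative reconstructs the input),
# and dots 1 and 2 move purely up/down within the rightward-moving group.
```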
The analysis of such hierarchical patterns of relative motion is an interesting theoretical problem,
and has motivated theoretical work in psychology that tried to account for the organization of such
patterns by an application of coding theory and the principle of minimum description length (Restle
1979). The underlying idea is to characterize different possible encodings of the motion patterns by
the required number of describing parameters (such as amplitude, phase, and frequency for sinusoidal oscillation). Encodings in terms of hierarchies of relative motions are often more compact, i.e. they require fewer describing parameters than the direct encoding of the physical movements. In computer
vision the minimum description length principle has been successfully applied, e.g., for motion seg-
mentation (Shi et al. 1998) and the compression of motion patterns in videos (e.g. Nicolas et al.
1997). However, general models that decompose complex motion patterns in terms of hierarchies of
relative motion, in the way envisioned by Johansson, remain to be developed.
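In the spirit of Restle's (1979) coding-theory account, the compactness argument can be made concrete by counting parameters. The numbers below are purely illustrative: they assume each sinusoidal motion component costs three parameters (amplitude, phase, frequency) and that relative motions within the moving frame are one-dimensional:

```python
# Assumed cost of one sinusoidal motion component (amplitude, phase, frequency).
PARAMS_PER_COMPONENT = 3
N_DOTS = 4

# Direct encoding: an independent horizontal and vertical component per dot.
direct_cost = N_DOTS * 2 * PARAMS_PER_COMPONENT               # 24 parameters

# Hierarchical encoding: one common 2D component for the whole group, plus
# a single 1D oscillation per dot within the moving frame.
hierarchical_cost = 2 * PARAMS_PER_COMPONENT + N_DOTS * PARAMS_PER_COMPONENT  # 18

# The hierarchical code is shorter, so a minimum-description-length
# criterion favors the relative-motion interpretation of the display.
```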

Psychophysical Investigation of Biological and Body Motion Perception
One of the most famous discoveries by Gunnar Johansson was that body motion can be recog-
nized from motion patterns that present only moving dots at the positions of the joints of moving
humans, in absence of any information about the body surface (Johansson 1973). He generated
these stimuli by fixing light bulbs or reflecting tapes on the major joints of his participants and
filming them in the dark (Figure 28.2), a technique that was originally developed by Murray.
(Today such stimuli are typically generated by motion capture (data bases see, e.g. Vanrie and
Verfaillie 2004; Ma et al. 2006)). Johansson’s unexpected observation was that observers were able
to recognize body motion easily from such strongly impoverished stimuli, even if they were pre-
sented only for a very short time (such as 200 ms) (Johansson 1976). Static patterns of this type,
however, could not be easily interpreted by the observers.

Phenomenological Studies
Subsequent early research on body motion perception verified that different categories of move-
ments could be recognized from point-light stimuli, such as walking, running, or dancing (e.g.
Johansson 1973; Dittrich 1993). Further studies showed that humans also can recognize animals,
such as dogs, from such point-light stimuli (e.g. Bellefeuille and Faubert 1998; Jokisch and Troje
2003). Many early experiments tried to characterize the capability to derive subtle information
from such motion cues, such as gender (Barclay et al. 1978; Cutting et al. 1978; Pollick et al. 2005),
gaits of familiar people or friends (e.g. Beardsworth and Buckner 1981; Cutting and Kozlowski
1977), age (Montepare et al. 1988), or emotions (e.g. Dittrich et al. 1996; Walk and Homan 1984;
Atkinson et al. 2004; Roether et al. 2009). Also, it has been shown that observers can derive phys-
ical properties, such as the weights of lifted objects from such point-light stimuli (e.g. Runeson
and Frykholm 1981). In the context of these early studies, the first mathematical descriptions of critical features (e.g. for gender perception), as well as simplified mathematical models for gait trajectories suitable for the synthesis of point-light patterns by computer graphics (Cutting et al. 1978), were also developed. In addition, minimum coding theory was extended to gait patterns (Cutting 1981).

Fig. 28.2  Point-light biological motion stimulus. (a) Light bulbs or markers are fixed to the major
joints of a moving human. (b) Presentation of moving dots alone results in a point-light stimulus that
induces the vivid perception of a moving human.
Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Neuroscience, 4(3), Martin A. Giese and
Tomaso Poggio, Neural mechanisms for the recognition of biological movements, page 180, Copyright © 2003,
Nature Publishing Group.
Already starting to probe the underlying critical processes, another stream of experiments investigated the robustness of the perception of body motion from point-light stimuli, introducing specific manipulations of Johansson’s original stimuli. This includes the masking of point-light stimuli by moving dot masks, generated from randomly positioned moving dots taken from point-light stimuli (‘scrambled walker noise’) (Bertenthal and Pinto 1994; Cutting
et al. 1978). Other studies tried to degrade the local motion information by introducing tem-
poral delays between the stimulus frames (Thornton et al. 1998), variations of contrast polar-
ity and spatial-frequency information, or by changing the relative phase of the dots or their
disparity information (Ahlström et al. 1997). The depth information in binocularly presented
point-light stimuli could be strongly degraded without the observers even noticing this manipu-
lation (Bülthoff et al. 1998). This observation seems incompatible with mechanisms of biologi-
cal motion recognition that rely on a veridical reconstruction of depth. However, more recent
studies show that depth has an important influence and can disambiguate bistable point-light
stimuli whose orientation in space cannot be uniquely derived from two-dimensional informa-
tion (Vanrie et  al. 2004; Jackson and Blake 2010). Other studies tried to degrade point-light
stimuli by randomizing the positions of the dots on the body (Cutting 1981) and by limiting the
life time of individual dots (e.g. Neri et al. 1998; Beintema and Lappe 2002). Another interesting
manipulation looking specifically for the organization of biological motion patterns in terms of
spatial units were studies that randomized the position of individual parts of the body, leaving
Biological and Body Motion Perception 579

their internal motion invariant (showing e.g. all limbs, vs. only the ipsi- or contralateral limbs)
(Pinto and Shiffrar 1999; Neri 2009).
Finally, another set of studies used the rotation of point-light walkers in the image plane (inversion) in order to study the frames of reference in which the underlying perceptual processing happens. As for the perception of faces, rotation in the image plane strongly degrades the perception of body motion from point-light stimuli (e.g. Sumi 1984; Pavlova and Sokolov 2000). The orientation dependence seems to be tied to an egocentric rather than to the external frame of reference (e.g. Troje 2003). Also the ‘Thatcher illusion’ (i.e. the difficulty of recognizing inverted face parts in faces that are presented upside down) has been generalized to biological motion patterns (Mirenzi
and Hiris 2011). In line with this, a recent study has shown that the features of the local dots (e.g.
color) are less accessible to consciousness when they are embedded in an upright than in an
inverted biological motion walker (Poljac et al. 2012). These results strongly suggest that the per-
ceptual processing of biological motion might be critically dependent on templates that are tied to
the visual frame of reference, rather than on a generic process that reconstructs three-dimensional
shape from motion.

Continuous Perceptual Spaces of Motion


The relevance of learned templates in the processing of biological and body motion is also sup-
ported by the observation of gradual generalization between similar body motion pat-
terns. A  hallmark of such generalization is an encoding in terms of topologically well-defined
perceptual spaces.
In computer graphics, blending techniques have long been applied for the generation of
novel movements with intermediate style properties. Examples are 'gait designers' for
the generation of gender-specific walking or of body movements with different emotional styles
(e.g. Unuma et al. 1995; Wiley and Hahn 1997; Rose et al. 1998). Psychologists have used similar
techniques to generate style spaces of body motion in order to study the perception and cat-
egorization of movements (Pollick et al. 2001; Hill and Pollick 2000; Giese and Lappe 2002; Troje
2002). As for faces, it has been shown that body movements can be made particularly expressive
and discriminable by extrapolation in such style spaces (‘caricature effect’). As for object recogni-
tion (Bülthoff and Edelman 1992), the categorization of motion patterns seems to be character-
ized by smooth generalization fields (Giese and Lappe 2002). In addition, the metric properties
of the underlying perceptual space can be recovered by applying multi-dimensional scaling to
similarity judgments for body motion patterns; this analysis shows that the metric closely resembles
the one defined by space-time distance measures between the trajectories. This implies a 'veridi-
cal' encoding of the physical properties of body motions in such perceptual spaces (Giese et al.
2008).
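The kind of style blending described here can be illustrated with a minimal linear morphing sketch (illustrative only: the data layout and function name are assumptions, and the gait designers cited above use more sophisticated spatio-temporal correspondence algorithms):

```python
import numpy as np

def morph_motions(trajectories, weights):
    """Blend time-aligned point-light trajectories as a weighted combination.

    trajectories: list of arrays of shape (T, n_dots, 2), the 2-D dot
    positions of each prototype movement over T frames (assumed to be
    time-aligned already).
    weights: one blending weight per prototype, normalized to sum to one.
    Weights inside [0, 1] interpolate between prototypes; weights outside
    that range extrapolate in style space ('caricature effect').
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stack = np.stack([np.asarray(t, dtype=float) for t in trajectories])
    return np.tensordot(w, stack, axes=1)  # shape (T, n_dots, 2)
```

Weights such as [1.5, -0.5] move beyond the first prototype along the axis connecting the two styles, which corresponds to the caricaturing manipulation described in the text.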
Neural representations of continuous topological pattern spaces give rise to high-level after-
effects. This was first shown for static pictures of faces (Leopold et al. 2001). Adaptation with
an ‘anti-face’ (a face located opposite to the original face, relative to the average face, in face space)
results in an after-effect: The average face is then briefly perceived as the original face immediately
after the adaptation phase. Similar after-effects have been observed for biological motion: if, for
example, observers are exposed to a female walker for several seconds, they temporarily perceive a
gender-neutral morph as a male walker (Jordan et al. 2006; Troje et al. 2006). It has been shown that such
after-effects are not simply a reflection of low-level form or motion after-effects, and must be based
on higher representations of body motion. Recent studies have started to investigate how form and
motion representations contribute to such high-level after-effects (Theusner et al. 2011).
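In a linear pattern space of this kind, the adapting 'anti' stimulus is simply the reflection of the original pattern through the average. A sketch, assuming patterns are represented as vectors (e.g. flattened joint trajectories; the function name is illustrative):

```python
import numpy as np

def anti_pattern(pattern, average):
    """Reflect a pattern through the average of a linear pattern space.

    Adapting to this 'anti' stimulus biases subsequent perception of the
    average toward the original pattern, the high-level after-effect
    described in the text. Works for any vectorized representation
    (face shape coordinates, flattened point-light trajectories, etc.).
    """
    pattern = np.asarray(pattern, dtype=float)
    average = np.asarray(average, dtype=float)
    return 2.0 * average - pattern
```

Applying the function twice recovers the original pattern, since the operation is an involution about the average.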
580 Giese

From Critical Features to ‘Life Detectors’


A substantial amount of research in the field of biological motion perception has searched
for the visual features that might be critical for the perception of body motion. At the same time,
this work has isolated different levels of the analysis of body motion.
A prominent example of this is work about the relevance of form vs. motion features. While
some studies, in line with Johansson’s original inspiration, have provided evidence for a critical
role of motion features (e.g. Cutting et al. 1988; Mather et al. 1992; Thornton et al. 1998; Neri et al.
1998; Casile and Giese 2005), others have strongly emphasized the role of form information (e.g.
Beintema and Lappe 2002; Hiris 2007). It is critical in this context to define precisely what 'form'
and 'motion information' mean, and what exactly is understood by 'recognizing body motion'.
Figure 28.3a–c tries to illustrate different cues in the processing of body motion. One type of
form-based information is the global configuration of the human body. Information about body
shape seems critical at least for recognizing moving bodies in clutter, such as among randomly moving
background elements (e.g. Lu 2010). However, such global configurations can be specified based
on local form features (panel A), as well as on local motion features (panel B) (specifying com-
plexly structured optic flow patterns). It is thus a logical error to confuse the relevance of the body
configuration with an exclusive relevance of shape information. An alternative to the processing
of the global configural shape is the use of local features, or even individual dot trajectories
(panel C), which is sufficient to solve certain tasks (e.g. detecting body parts, or whether a walker
is going right or left). Such tasks can be solved without necessarily perceiving a whole human body, e.g.
by detecting asymmetries in the motion.


Fig. 28.3  Informative cues in body motion stimuli. The global configuration of a human body can be
recovered either from: (a) local form features (e.g. orientation and positions of limbs or limb parts),
or (b) from local motion features, which specify for each time point a complex instantaneous optic
flow field. (c) Trajectories of individual dots, like the ones of the feet, can also provide sufficient
information for the solution of specific biological motion tasks, e.g. detection of walking direction.
(d) Equivalent of a ‘life detector’ in the form domain. The direction of the nose in a scrambled face
image (middle panel) makes it easy to determine the heading direction of the face (upper panel).
This detection is more difficult if the picture is rotated upside down (‘inversion effect’).

The fact that it is easy to recognize walking or running from static pictures of stick figures
shows that form information is relevant for the processing of body motion (Todd 1983). In
addition, it seems obvious that humans can learn to recognize point-light configurations, just
as any other shape, after sufficient training (Reid et al. 2009). Computational work has tried
to identify critical features for body motion perception, which generalize spontaneously from
full-body figures to point-light stimuli, applying principal components analysis to motion and
form features. It turns out that such generalization is easier to achieve for motion than for form
features (Casile and Giese 2005). In addition, the opponent motion of the hand and the feet
seems to be a critical feature for the recognition of biological motion (Casile and Giese 2005;
Chang and Troje 2009). Arguing against the potential relevance of local motion cues, Beintema
and Lappe (2002) demonstrated that point-light walkers can be recognized from stimuli
where the dot positions are randomized on the skeleton in every frame. This manipulation
degrades the local motion information, but does not eliminate some of the critical motion fea-
tures (Casile and Giese 2005).
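The sequential-position manipulation described here can be sketched as follows (a schematic reconstruction from the description in the text; the data layout and function name are assumptions): in every frame each dot is reassigned to a random location on the skeleton, which degrades local motion signals while preserving frame-wise form information.

```python
import numpy as np

def sequential_position_stimulus(limb_segments, n_dots, rng=None):
    """Reassign dots to random positions on the skeleton in every frame,
    in the spirit of the manipulation of Beintema and Lappe (2002).

    limb_segments: array (T, n_limbs, 2, 2) giving, for each frame, the
    2-D endpoints of every limb segment of the underlying walker.
    Returns an array (T, n_dots, 2): in each frame, each dot is placed
    at a random point on a randomly chosen segment.
    """
    limb_segments = np.asarray(limb_segments, dtype=float)
    rng = np.random.default_rng(rng)
    T, n_limbs = limb_segments.shape[:2]
    frames = np.empty((T, n_dots, 2))
    for t in range(T):
        limbs = rng.integers(0, n_limbs, size=n_dots)
        s = rng.random((n_dots, 1))          # position along the segment
        p0 = limb_segments[t, limbs, 0]
        p1 = limb_segments[t, limbs, 1]
        frames[t] = (1 - s) * p0 + s * p1    # linear interpolation
    return frames
```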
While Lappe and colleagues hypothesized that local motion processing is completely irrelevant
for biological motion processing, unless the moving figure has to be segmented from a (station-
ary) background (Lange and Lappe 2006), studies comparing the relevance of form and motion
cues sometimes found a primary relevance of form and sometimes of motion cues (e.g. Lu and
Liu 2006; Hiris et al. 2007; Thurman and Grossman 2008). Instead of denying the relevance of
individual cues, more recent work has rather studied how the cues are integrated. A recent set of
studies tried to develop reverse correlation techniques in order to identify critical features that
drive the categorization of biological motion patterns (Lu and Liu 2006; Thurman and Grossman
2008; Thurman et al. 2010). These studies found evidence for a relevance of both types of features,
consistent with the hypothesis that the nervous system fuses different informative cues during the
processing of body motion (instead of discarding classes of informative cues). Further evidence
suggests that which cue is more effective depends on the task (Thirkettle et al. 2009). A recent
study points in the same direction, suggesting the existence of separate high-level
after-effects that depend on form or motion cues (Theusner et al. 2011).
A further stream of research about features in the recognition of body motion has been initi-
ated by the observation that the walking direction of point-light walkers can even be derived
from scrambled walkers, for which the configural information about the body shape has been
destroyed. In addition, the recognition of walking direction from these stimuli is worse if these
stimulus patterns are rotated upside down, implying an inversion effect (Troje and Westhoff
2006). That the walking direction can be recognized without configural information
in a forced-choice task reflects the fact that the foot movement trajectory of
walking, in particular, is highly asymmetrical (Figure 28.3c). (This is analogous to the observation that it
is easy to detect the facing direction of side views of faces from only the direction in which the
nose points, see Figure 28.3d.) The recognition of walking direction from such individual dot
trajectories is consistent with motion template detectors that are defined in a retinal frame of
reference. It is unclear to what extent such detectors are learned or partially innate. Some research-
ers have interpreted the above observation as evidence for a special-purpose mechanism for the
detection of the asymmetric foot trajectories, which has been termed ‘life detector’. Since a similar
inversion effect was observed for the tendency of newly hatched chicks to align their bodies with
point-light patterns (Vallortigara and Regolin 2006), it has also been hypothesized that this
special-purpose mechanism is evolutionarily old, and potentially shared across many species.
(See also Koenderink’s chapter on Gestalts as ecological templates, this volume.) The concept
of the ‘life detector’ has initiated a number of follow-up studies, investigating the processing of
biological motion information in the absence of configural cues. For example, the perceived temporal
duration of biological motion and scrambled biological motion is prolonged compared to similar
non-biological stimuli (Wang and Jiang 2012).
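As a toy illustration of how a single foot trajectory could signal facing direction without any configural information (this heuristic is illustrative only, not the detector proposed in the literature): the swing phase is fast and brief while the stance phase is slow and long, so the distribution of horizontal foot velocities is skewed toward the facing direction.

```python
import numpy as np

def facing_direction_from_foot(x, dt):
    """Toy 'life detector' heuristic: classify facing direction from the
    horizontal positions of a single foot dot. The brief, fast swing
    phase and the long, slow stance phase make the velocity distribution
    asymmetric, and the sign of its skewness indicates the direction.
    Illustrates the asymmetry argument in the text only.
    """
    v = np.gradient(np.asarray(x, dtype=float), dt)
    v = v - v.mean()                            # remove net translation
    skew = np.mean(v ** 3) / (np.std(v) ** 3 + 1e-12)
    return 'right' if skew > 0 else 'left'
```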
A further general approach for the characterization of signals that are specific for biological
movements, and which can be processed even in the absence of configural cues, has been motivated
by work in motor control on the differential invariants of body movements. An example of such
an invariant is the two-thirds power law that links the speed and the curvature of the endpoint
trajectories of arm and finger movements, and which holds even for trajectories in locomotion.
Psychophysical and imaging work shows that trajectories compatible with this law are perceived as
smoother (Viviani and Stucchi 1989; Bidet-Ildei et al. 2006), and activate brain structures involved
in body motion processing more strongly than dot trajectories that are incompatible with this
invariant (Dayan et al. 2010; Casile et al. 2011).
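Numerically, compliance with the two-thirds power law can be checked by estimating the exponent β in v = k·κ^(−β) for a sampled planar trajectory (v tangential speed, κ curvature); the law predicts β ≈ 1/3, equivalently angular velocity proportional to κ^(2/3). A sketch assuming densely sampled, noise-free trajectories (an ellipse traversed at constant angular frequency satisfies the law exactly):

```python
import numpy as np

def two_thirds_exponent(x, y, dt):
    """Estimate beta in v = k * kappa**(-beta) for a planar trajectory
    by a log-log least-squares fit; the two-thirds power law predicts
    a value close to 1/3.
    """
    vx, vy = np.gradient(x, dt), np.gradient(y, dt)
    ax, ay = np.gradient(vx, dt), np.gradient(vy, dt)
    v = np.hypot(vx, vy)                                  # speed
    kappa = np.abs(vx * ay - vy * ax) / np.maximum(v, 1e-12) ** 3
    mask = (kappa > 1e-9) & (v > 1e-9)
    # log v = log k - beta * log kappa
    slope, _ = np.polyfit(np.log(kappa[mask]), np.log(v[mask]), 1)
    return -slope
```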

Bottom-up vs. Top-down Processing


There has long been a discussion in the field of body motion perception about possible
contributions of bottom-up vs. top-down mechanisms. 'Bottom-up mechanisms' are typically
understood as processes that derive representations of complex patterns by combining
simpler image features, e.g. using hierarchical representations. ‘Top-down processing’ is typically
understood as a class of mechanisms that either tries to match some higher representation, e.g. of
a moving body to the stimulus sequence, or which actively searches and groups components of
body motion stimuli in the stimulus sequence. Typically, it is assumed that these processes require
attention.
Initial studies investigated the influence of attention on biological motion processing, dem-
onstrating that biological motion perception tolerates longer inter-stimulus intervals (ISIs) than
would be expected from first-order local motion processing (Thornton et al. 1998), and that the
processing of biological motion requires attention in dual task and visual search paradigms
(Figure 28.4a) (Cavanagh et al. 2001; Thornton et al. 2002). Consistent with this idea, patients
with parietal lesions are impaired in visual search tasks with biological motion stimuli (Battelli
et al. 2003). In a more recent study demonstrating top-down interactions in the processing
of biological motion (Hunt and Halper 2008), the dots of a normal point-light walker were
replaced by complex objects (cf. Figure 28.4b). This manipulation interfered strongly with the
processing of body motion, potentially because attentional resources have to be shared between
object and body motion processing.
A substantial attentional modulation of the brain activity related to biological motion process-
ing is also suggested by fMRI and ERP studies (Safford et al. 2010). More detailed psychophysical
studies showed that, in particular, performance variations due to changes of flanker congruency
and in Stroop-related attention tasks correlated with performance in biological motion processing,
while this was not the case for other attention tasks (Chandrasekaran et al. 2010). However, even
unattended, task-irrelevant walkers are processed automatically in a flanker paradigm and
influence the processing of the attended stimulus (Thornton and Vuong 2004). This illustrates
that the control by attention is not complete, and that even in tasks that require top-down control,
bottom-up processes act in parallel.
Further experiments show that the processing of body motion interacts with other percep-
tual processes, and the processing of the scene. For example, the perception of the direction of
ambiguous background motion (suggesting a floor or wall) is biased by the perceived locomotion
direction of walkers (cf. Figure 28.4c) (Fujimoto 2003; Fujimoto and Yagi 2008). Also, Gestalt

Fig. 28.4  Top-down effects in the processing of body motion. (a) Visual search task for point-light
walkers: The target is the walker walking to the left side. Reproduced with permission from
Cavanagh et al. (2001). Attention-based visual routines: sprites. Cognition 80, p. 56, with permission
from Elsevier. (b) Stimulus demonstrating strong interference between shape recognition and body
motion perception. Reproduced from Hunt and Halper (2008). Disorganizing biological motion.
J. Vis. 8(9) 12, p. 3, with permission of the Association for Research in Vision and Ophthalmology.
(c) Motion stimulus by Fujimoto and Yagi (2008), showing that body motion processing interacts
with the organization of ambiguous coherent motion of a grating. The background is preferentially
perceived as moving in the direction that would be compatible with a forward locomotion of walker /
runner. Similar observations hold for point-light patterns.
Adapted from Kiyoshi Fujimoto and Akihiro Yagi, ‘Motion Illusion in Video Images of Human Movement’, in
Entertainment Computing - ICEC 2005, Lecture Notes in Computer Science, p. 532, Copyright © 2005,
Springer-Verlag Berlin Heidelberg. With kind permission from Springer Science and Business Media.

grouping principles interact with the perceptual organization of biological motion displays. This
was, for example, demonstrated by replacing the dots of point-light walkers by oriented Gabor
patches that support or disfavor the correct grouping into limbs (Poljac et al. 2011).

Relevance of Learning
Several studies show that the perception of body motion and other complex motion patterns depends
on learning. It is a classical result that observers can learn to recognize individuals from their
body movements (e.g. Hill and Pollick 2000; Cutting and Kozlowski 1977; Troje et al. 2005).
The discrimination of biological from scrambled patterns can be successfully trained, and this
training induces corresponding changes of the BOLD activity in relevant areas (Grossman et al.
2004). Several studies have compared the learning of biological and similar non-biological motion
patterns, finding substantial learning effects, for both stimulus classes (Hiris et al. 2005; Jastorff
et al. 2006). It seems critical for the learning process that the learned patterns are related to an
underlying skeleton. Beyond this, the learning seems to be very fast, requiring less than 30 repeti-
tions, and it is associated with BOLD activity changes along the whole visual pathway (Jastorff
et al. 2009). Finally, the learning of the visual discrimination of body motion patterns has been
studied extensively in the context of different application domains. For example, experience seems
to improve body motion recognition of identity and emotional expression in dance (e.g. Sevdalis
and Keller 2011), or the efficiency of the prediction of dangerous events in surveillance videos
(e.g. Troscianko et al. 2004).
Related to the role of learning in body motion recognition is the question of the extent to
which this capability is innate, and how it has changed in the course of evolution. This
question is addressed, on the one hand, by many developmental studies showing that the capability
to discriminate point-light from scrambled stimuli emerges very early in child development
(e.g. Fox and McDaniel 1982; Bertenthal 1993). Space does not permit a more detailed
review of this interesting literature. In addition, a variety of studies has investigated biological
motion perception in other species, such as cats, pigeons, or macaques (e.g. Blake 1993; Dittrich
et al. 1998). While many species can discriminate intact point-light from scrambled stimuli, more
detailed investigations suggest that even macaques might not perceive point-light stimuli in the
same way as humans do and require extensive training until they can recognize these patterns
correctly (Vangeneugden et  al. 2010). This makes it crucial to carefully dissociate the relevant
computational levels of the processing of body motion in such experiments with other species,
before drawing far-reaching conclusions about potential evolutionary aspects.

Neural Mechanisms
Electrophysiological Studies
Substantial insights have been gained about neural mechanisms that are involved in the process-
ing of body motion. In particular, the imaging literature on action processing is vast, and a review
would by far exceed the scope of this chapter. In the following only a few key results from monkey
physiology and functional imaging can be highlighted that are particularly relevant for aspects
of visual pattern organization. In addition, it will not be possible to discuss the relevant literature
from neuropsychology and the relationship between body motion perception, brain lesions, and
psychiatric disorders, such as autism. More comprehensive discussions can be found in reviews
about the neural basis of body motion processing (e.g. Decety and Grezes 1999; Vaina et al. 2004;
Puce and Perrett 2003; Knoblich et al. 2006; Blake and Shiffrar 2007; Johnson and Shiffrar 2013).
Neurons with visual selectivity for body motion and point-light stimuli were first described
in the superior temporal sulcus (STS) by the group of David Perrett (Perrett et al. 1985; Oram and
Perrett 1996). This region contains neurons that respond selectively to human movements and
body shapes, and in the monkey likely represents a site of convergence of form and motion infor-
mation along the visual processing stream. Some neurons in this area show specific responses to
combinations of articulatory and translatory body motion, and many of them show selectivity for
the temporal order of the stimulus frames (Jellema and Perrett 2003; Barraclough et al. 2009). The
responses of many of these neurons are specific for certain stimulus views, and such view depend-
ence has been observed even at very high levels of the processing pathway, e.g. in mirror neurons
in premotor cortex (Caggiano et al. 2011). An extensive study of the neural encoding of body
motion in the STS was carried out by Vangeneugden et al. (2009), using a stimulus set that was
generated by motion morphing, and defining a triangular configuration in the morphing space.
Applying multi-dimensional scaling to the responses of populations of STS neurons, correspond-
ing metric configurations in the ‘neural space’ were recovered from the cell activities that closely
resembled these configurations in the physical space (consistent with a veridical neural encoding
of the physical space). In addition, this study reports ‘motion neurons’, especially in the upper
bank and fundus of the STS, which respond to individual and small groups of dots in point-light
stimuli, even in absence of global shape information. Conversely, the lower bank contains many
‘shape neurons’ that are specifically selective for the global shape of the body. Recent studies also
applied neural decoding approaches using classifiers to responses of populations of STS neurons
for stick figure stimuli, as well as for densely textured avatars, showing that such stimuli can be
decoded from such population responses (Singer and Sheinberg 2010; Vangeneugden et al. 2011).
Another literature in the field of electrophysiology that is highly relevant for body motion
processing concerns the 'mirror neuron system', and shows that neurons in parietal and premotor
cortex are also strongly activated by the observation of body motion. Space limitations do not
permit a thorough review here, and the reader is referred to reviews and
books that treat this aspect specifically (e.g. Rizzolatti et al. 2001; Rizzolatti and Craighero 2004;
Rizzolatti and Sinigaglia 2008).

Imaging Studies
By now, a vast imaging literature on the perception of body motion exists, and we can
highlight only a very small number of aspects related to the mechanisms of pattern formation.
Further details can be found in the reviews mentioned at the beginning of this chapter.
Early positron emission tomography (PET) and fMRI studies found evidence for the involve-
ment of a network of areas, including the posterior STS, in the processing of point-light biological
motion (Bonda et al. 1996; Vaina et al. 2001; Grossman and Blake 2002). The relevant network
also includes human MT, parts of the lateral occipital complex (LOC), and the cerebellum. Also,
an inversion effect could be demonstrated for the activity in the STS (Grossman and Blake 2001).
Subsequent studies tried to dissociate activation components related to the action vs. the human
shape (Peuskens et al. 2005), finding that specifically the right pSTS responds selectively to the
human motion. The human STS can also be robustly activated by full-body motion patterns (e.g.
Pelphrey et al. 2003), and several studies have investigated body motion-induced activation pat-
terns using natural stimuli such as movies (e.g. Hasson et al. 2004; Bartels and Zeki 2004), even
being able to decode semantic categories from action videos (Huth et al. 2012). TMS stimulation
in the STS reduces the sensitivity to biological motion stimuli (Grossman et al. 2005).
Substantial work has been dedicated to the study of body-selective areas in the inferotemporal cor-
tex and their involvement in the processing of body motion. One such area is the extrastriate
human body area (EBA) (Peelen and Downing 2007), which is selectively activated by static body
shapes and responds also strongly to body motion. Another relevant area is the fusiform body
area (FBA), which is very close to the fusiform face area (FFA) (Peelen and Downing 2005). Both
areas have been interpreted as specifically processing the form aspects of body motion. Recent
studies, controlling for structure as well as motion cues, suggest that EBA and FBA might repre-
sent an essential stage of body motion processing that links the body information with the action
(Jastorff and Orban 2009). Very similar imaging results have been obtained by fMRI studies in the
monkey cortex, making it possible to establish homologies between human and monkey imaging data on
body motion perception (e.g. Jastorff et al. 2012).
Again, there exists a vast and continuously growing imaging literature about the involvement
of motor and mirror representations in the perceptual processing of body motion; we refer
to other, more specialized reviews (e.g. Buccino et al. 2004; van Overwalle and Baetens 2009) with
respect to this aspect.

Computational and Neural Models


Motion recognition and tracking have been popular topics in computational and computer vision
since the 1990s, and a huge variety of algorithms have been developed in this domain. Only a
Fig. 28.5  Models of body motion recognition. (a) Example for a model for movement recognition by internal simulation of the underlying motor behavior. The core of the
MOSAIC model by Wolpert et al. (2003) is a mixture of expert controllers for different motor behaviors, such as walking or kicking. Forward models for each individual
controller predict the sensory signals that would be caused by the corresponding motor commands. These predictions are compared with the actual sensory input. The
classification of observed movements is obtained by choosing the controller model that produces the smallest prediction error. (b) Neural architecture for body motion
recognition, following models by Giese and Poggio (2003) and Fleischer et al. (2013). The model assumes processing in two parallel pathways that are specialized for form
and motion features. Model neurons at different levels mimic properties of cortical neurons. Recognition in the form pathway is accomplished by integrating the information
from sequences of recognized body shapes (recognized by ‘snapshot neurons’). Recognition from local motion features is accomplished by the detection of sequences of
characteristic optic flow patterns. Recognition is first accomplished in a view-specific manner within view-specific modules. Only at the highest hierarchy the outputs of
these view-specific modules are combined, achieving view-independent recognition. (Potentially relevant cortical areas in monkey and human cortex are indicated by the
abbreviations below the modules of the model. See above references for further details.)
Adapted from Daniel M. Wolpert, Kenji Doya, and Mitsuo Kawato, A unifying computational framework for motor control and social interaction, Philosophical Transactions B, 358 (1431),
pp. 593–602, DOI: 10.1098/rstb.2002.1238, Copyright © 2003, The Royal Society.
small number of these approaches is relevant for biological systems. For a recent overview of
technical approaches see e.g. Moeslund et al. (2006). We will briefly sketch here some computa-
tional approaches that have been developed in the psychological literature on body motion per-
ception, and we will then more thoroughly discuss existing neural models.

Computational Models
Early theories of body motion recognition were based on simple invariants that can be derived from
the three-dimensional movements of articulated figures (e.g., Hoffman and Flinchbaugh 1982;
Webb and Aggarwal 1982). For example, for point-light stimuli the distances between dots on the
same limb tend to vary less than the distances between dots on different limbs. Alternatively, one
can try to derive geometrical constraints for the two-dimensional motion of points that are rigidly
connected in three-dimensional space. Classical work by Marr and Vaina (1982) assumed
that the brain might recover the body shape, and track body movements, using parametric body
models that are composed from cylindrical shape primitives. Other models have exploited other
shape primitives, such as spheres (e.g. O’Rourke and Badler 1980).
Building on this idea, another class of theoretical models has been developed that is presently
very influential in cognitive neuroscience. This class of models assumes that the recognition of
body movements and actions is based on the internal simulation of observed motor behaviors.
A tight interaction between body motion recognition and motor control is suggested by many
experiments (reviews see e.g. Knoblich et al. 2006; Schütz-Bosbach and Prinz 2007). For example,
a study by Jacobs and Shiffrar (2005) shows that the perception of gait speeds of point-light walk-
ers depends on whether the observers are walking or running during the observation. A direct
and highly selective coupling between motor control and mechanisms for the perception of bio-
logical motion is also suggested by a study that used Virtual Reality technology in order to control
point-light stimuli by the concurrent movements of the observer (e.g. Christensen et al. 2011).
In this case, detection of biological motion was facilitated if the stimulus was spatially and tem-
porally coherent with the ongoing movements of the observer, but impaired if this congruency
was destroyed. In addition, a variety of studies demonstrate that motor expertise (independent of
visual expertise) influences performance in body motion perception (e.g. Hecht et al. 2001; Casile
and Giese 2006; Calvo-Merino et al. 2006).
The analysis-by-synthesis idea that underlies this class of models goes back to the classical motor
theory of speech perception, which assumes that perceived speech is mapped onto 'vocal gestures'
that form the units of the production of speech in the vocal tract (Liberman et al. 1967). For action
recognition this idea has been formulated, for example, by Wolpert and colleagues who suggested
that controller models for the execution of body movements might be used also for motion and
social recognition (Wolpert et al. 2003). The underlying idea is illustrated in Figure 28.5a. Their
MOSAIC model is based on a mixture of expert controllers, each paired with a forward model, for the execution of
different behaviors. Recognition is accomplished by predicting the observed sensory signals using
all controller models, and selecting the one that generates the smallest prediction error. Models
based on similar ideas have been suggested as accounts of the function of the 'mirror neuron sys-
tem' in action recognition, and as a basis for the learning of movements by imitation (e.g. Oztop and
Arbib 2002; Erlhagen et al. 2006). In addition, related models have also been formulated exploit-
ing a Bayesian framework (e.g. Kilner et al. 2005).
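The recognition step of such prediction-based schemes can be sketched as a minimum-prediction-error classification (a deliberately simplified sketch: the actual MOSAIC model uses paired controllers and predictors with graded 'responsibility' signals rather than a hard minimum; names and data layout are assumptions):

```python
import numpy as np

def classify_by_prediction_error(observed, predictors):
    """Assign an observed movement to the behavior whose internal
    forward model predicts it best (smallest accumulated squared
    prediction error), as in the scheme of Figure 28.5a.

    observed: array (T, d) of observed sensory features over time.
    predictors: dict mapping a behavior label to a callable returning
    the (T, d) sensory stream predicted by internally simulating that
    behavior.
    """
    observed = np.asarray(observed, dtype=float)
    errors = {label: float(np.sum((observed - np.asarray(pred())) ** 2))
              for label, pred in predictors.items()}
    best = min(errors, key=errors.get)
    return best, errors
```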
Many of the discussed analysis-by-synthesis approaches require the reconstruction of
motor-relevant sensory variables, such as joint angles, at the input level. The estimation of such
variables from monocular image sequences is a very difficult computer vision problem that is
partially unsolved. Correspondingly, only a few of the discussed models have been implemented to a level
that would demonstrate their performance on real video data. It is unclear whether and
how the brain solves the underlying reconstruction problem. Alternatively, the visual system might cir-
cumvent this difficult computational problem, recognizing body motion by computationally sim-
pler strategies.

Neural Models
Another class of models has been inspired by fundamental properties of the architecture of
the visual cortex and extends biologically-inspired models for the recognition of stationary
shapes (e.g. Riesenhuber and Poggio 1999) into space-time. Such an architecture, which repro-
duces a broad range of data about body motion recognition from psychophysics, electrophysi-
ology, imaging, and neuropsychology, is illustrated in Figure 28.5b. (See Giese and Poggio
(2003), Casile and Giese (2005), Giese (2006), Fleischer et al. (2013) for a detailed description.)
Consistent with the anatomy of the visual cortex, the model is organized in terms of two hier-
archical neural pathways, modeling the ventral and dorsal processing streams. The first pathway
is specialized for the processing of form information, while the second pathway processes local
motion information.
Both pathways consist of hierarchies of neural detectors that mimic properties of cortical neu-
rons, and which converge to a joint representation at a level that corresponds to the STS. The
complexity of the extracted features as well as the receptive field sizes of the feature detectors
increase along the hierarchy. The model creates position and scale invariance along the hierarchy
by pooling of the responses of detectors for the same feature over different positions and scales,
using a maximum operation (e.g. Riesenhuber and Poggio 1999). Stimuli can thus be recognized
largely independently of their size and position in the visual field.
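The MAX-pooling step can be illustrated with a small sketch (a toy stand-in for the model's detector hierarchy; `detect_feature` and `invariant_response` are hypothetical names):

```python
import numpy as np

def detect_feature(image, template):
    """Apply a template detector at every valid image position
    (plain correlation; a stand-in for a bank of feature detectors)."""
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = float(np.sum(image[i:i + h, j:j + w] * template))
    return out

def invariant_response(image, template):
    """MAX-pool the detector responses over all positions: the pooled
    output no longer depends on where the feature appears."""
    return detect_feature(image, template).max()

template = np.ones((2, 2))
img_a = np.zeros((8, 8)); img_a[1:3, 1:3] = 1.0   # feature in the top-left
img_b = np.zeros((8, 8)); img_b[5:7, 4:6] = 1.0   # same feature, shifted
print(invariant_response(img_a, template), invariant_response(img_b, template))  # -> 4.0 4.0
```

Pooling in the same way over detectors tuned to several template scales would add scale invariance.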
The detectors in the form pathway mimic properties of shape-selective neurons in the ventral
stream (including simple and complex cells in primary visual cortex, V4 neurons, and shape-
selective neurons in inferotemporal cortex). The detectors on the highest level of the form path-
way (‘snapshot neurons’) are selective for body postures that are characteristic for snapshots from
movies showing the relevant body movement. They are modeled by radial basis function (RBF)
units, which represent a form of fuzzy shape template (the RBF center defining the template).
The motion pathway of the model has the same hierarchical architecture, where its input level is
formed by local motion energy detectors. This pathway recognizes temporal sequences of com-
plexly-structured optic flow patterns, which are characteristic for body motion.
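The tuning of a single snapshot neuron can be written as a Gaussian RBF over posture vectors (an illustrative simplification; the posture encoding and the bandwidth `sigma` are assumptions of the sketch):

```python
import numpy as np

def snapshot_neuron(posture, center, sigma=1.0):
    """Gaussian RBF unit: maximal response when the current posture
    vector matches the stored template (the RBF center), with a smooth,
    graded fall-off for similar postures -- a 'fuzzy' shape template."""
    return float(np.exp(-np.sum((posture - center) ** 2) / (2 * sigma ** 2)))

key_posture = np.array([0.0, 1.0, 0.5])                       # stored template
print(snapshot_neuron(key_posture, key_posture))              # -> 1.0 (perfect match)
print(snapshot_neuron(key_posture + 0.5, key_posture) < 1.0)  # -> True (graded fall-off)
```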
A central idea of the model is that body motion can be recognized by identifying temporal
sequences of features, such as body shapes or optic flow patterns in ‘snapshots’ from a movie (Giese
2000). In order to make the neural detectors selective for the temporal order of such sequences,
the model assumes the existence of asymmetric lateral connections between the snapshot neurons
in the form and motion pathway. The resulting network dynamics suppresses responses to mov-
ies for which the stimulus frames appear in the wrong temporal order (Giese and Poggio 2003).
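The order selectivity produced by such asymmetric connections can be caricatured with a discrete-time chain of snapshot neurons. This is a toy abstraction, not the neural field dynamics of the actual model; here each predecessor neuron facilitates only its successor, so activity builds up for the trained frame order and stays low for scrambled orders.

```python
import numpy as np

def sequence_response(frame_order, n_snapshots=4, leak=0.5, base_drive=0.2):
    """Chain of snapshot neurons with asymmetric lateral coupling:
    neuron k receives feedforward input when frame k is shown,
    amplified by lateral input from neuron k-1 only."""
    u = np.zeros(n_snapshots)            # snapshot neuron activities
    total = 0.0
    for frame in frame_order:
        ff = np.zeros(n_snapshots)
        ff[frame] = 1.0                  # feedforward drive of current frame
        lateral = np.roll(u, 1)
        lateral[0] = 0.0                 # a chain, not a ring
        u = leak * u + ff * (base_drive + lateral)   # predecessor gates the drive
        total += u.sum()
    return total

forward   = sequence_response([0, 1, 2, 3] * 3)   # trained temporal order
scrambled = sequence_response([2, 0, 3, 1] * 3)   # same frames, wrong order
print(forward > scrambled)  # -> True
```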
The model accomplishes recognition first in a view-specific manner, within view-specific mod-
ules that are trained with different views of the body motion sequence. Only at the highest level
of the hierarchy is the information from different view-specific modules combined by pooling,
resulting in view-independent motion recognition (cf. Figure 28.5b).
If such a model is trained with normal full-body motion and tested with point-light walkers,
the motion pathway spontaneously generalizes to point-light stimuli, while this is not the case
for the form pathway. This does not imply that configural information is irrelevant, because
Biological and Body Motion Perception 589

the optic flow templates in the motion pathway also depend on the global body configuration.
In addition, this result does not imply that the form pathway cannot process point-light patterns.
If trained with them, the form pathway responds also perfectly to dot patterns (Casile and Giese
2005), consistent with the fact that trained observers can learn to recognize actions even from
static point-light patterns (Reid et al. 2009).
A closely related model has been proposed by Beintema et al. (2006). This model was originally
designed to account for the processing of biological motion from stimuli that degrade
local motion information by repositioning the dots on the skeleton of a moving point-light figure
in every frame (Beintema and Lappe 2002). This model is very similar to the form pathway
of the model by Giese and Poggio (2003), where the major differences are: (i) the model does
not contain a motion pathway; (ii) it does not contain a mechanism that accounts for position and
scale invariance; and (iii) it implicitly assumes that the form template detectors (RBFs) are always
perfectly positioned and scaled relative to the stimulus. In the presence of static backgrounds this
perfect alignment might be accomplished by motion segmentation (Lange and Lappe 2006), while
this approach seems not applicable in the presence of motion clutter, e.g. for dynamically masked
point-light stimuli. (More extensive discussions of related models can be found in Giese (2006)
and Fleischer et al. (2013).)
Meanwhile, much more computationally efficient versions of the Giese-Poggio model have
been developed in computer vision, reaching state-of-the-art performance for action detection
(e.g. Jhuang et al. 2007; Escobar et al. 2009; Schindler et al. 2008). In addition, the model has been
extended for the recognition of goal-directed actions (Fleischer et al. 2013). For this purpose, add-
itional modules were integrated that model the properties of neurons in parietal and premotor
cortex. One of these modules computes the spatial relationship (relative position and motion)
between the moving effector (e.g. the hand) and the goal object. The other module contains
neurons (probably in the STS and parietal cortex) that combine the information about the goal
object, the effector movement, and the spatial relationship between effector and goal. The model
accomplishes recognition of goal-directed hand actions from real videos, at the same time repro-
ducing a whole spectrum of properties of action-selective neurons in the STS, parietal and the
premotor cortex. In contrast to the architecture shown in Figure 28.5a, recognition by this model
is accomplished without the explicit reconstruction of three-dimensional structure parameters,
such as joint angles, from monocular image sequences. In addition, it has been shown (Fleischer
et al. 2012) that the model even accounts for certain forms of causality perception (Michotte
1946/1963).

Conclusion
This chapter has reviewed some central results and theories about the perception of body motion.
Work on this topic in psychology started from the original work of Johansson, who studied body
motion as an example of complex and ecologically relevant natural motion, and who was aiming
at uncovering and testing Gestalt rules for the perceptual organization of motion. Since then, the
field has developed strongly, absorbing many approaches from outside Gestalt psychology
and pattern formation. These include psychophysical theories of pattern detection, top-down
control by attention, learning-based recognition theories, ecological and developmental
psychology, and modern approaches in physiology and imaging, including neural decoding by
machine learning techniques. The large body of existing work has revealed some neural and
computational principles. However, we still have no clear picture of the underlying neural and
computational processes, and many existing explanations remain phenomenological,
theoretically not rigorously defined, or only loosely tied to experimental data. The main stream
of present research is dominated, on the one hand, by pattern recognition approaches that
implicitly assume signal detection or filtering mechanisms, partly combined with ecological ideas.
On the other hand, research in cognitive neuroscience is fascinated by the idea of an analysis by
internal simulation of motor behavior, often entirely bypassing the aspects of visual pattern
recognition. Both streams move away from Johansson's original idea of uncovering the dynamic
processes that control pattern formation in the organization of complex motion patterns. It seems
likely that such processes play a central role in the organization of ambiguous stimulus
information about body motion, and it seems worthwhile to pick up this old line of research.
Modern mathematical approaches in neurodynamics, Bayesian inference, and computational
learning, combined with the now available computer power, will provide a methodological basis
for re-addressing these questions. Work in this direction seems all the more promising since
previous work has revealed insights about relevant features and underlying basic processes,
laying a basis for the study of active pattern formation in the processing of naturalistic body
motion stimuli.

Acknowledgments
I thank M. Angelovska for help with the illustrations and the editing of the references. I thank
J. Vangeneugden and an anonymous reviewer for helpful comments. Supported by EU Commission,
EC FP7-ICT-248311 AMARSi, FP7-PEOPLE-2011-ITN: ABC PITN-GA-011-290011, HBP
FP7-ICT-2013-FET-F/ 604102; FP7-ICT-2013-10/ 611909 KOROIBOT, Deutsche Forschungsge­
meinschaft: DFG GI 305/4-1, DFG GZ: KA 1258/15-1, and German Federal Ministry of Education
and Research: BMBF, FKZ: 01GQ1002A.

References
Ahlström, V., Blake, R., and Ahlström, U. (1997). Perception of biological motion. Perception 26: 1539–48.
Allison, T., Puce, A., and McCarthy, G. (2000). Social perception from visual cues: role of the STS region.
Trends Cogn Sci. 4: 267–78.
Atkinson, A.P., Dittrich, W.H., Gemmel, A.J., and Young A.W. (2004). Emotion perception from dynamic
and static body expressions in point-light and full-light displays. Perception 33: 717–46.
Barclay, C., Cutting, J., and Kozlowski, L. (1978). Temporal and spatial factors in gait perception that
influence gender recognition. Percept. Psychophys. 23: 145–52.
Barraclough, N.E., Keith, R.H., Xiao, D., Oram, M.W., and Perrett, D.I. (2009). Visual adaptation to
goal-directed hand actions. J. Cogn. Neurosci. 21: 1806–20.
Bartels, A. and Zeki, S. (2004). Functional brain mapping during free viewing of natural scenes. Hum.
Brain Mapp. 21: 75–85.
Battelli, L., Cavanagh, P., and Thornton, I.M. (2003). Perception of biological motion in parietal patients.
Neuropsychologia 41: 1808–16.
Beardsworth, T. and Buckner, T. (1981). The ability to recognize oneself from a video recording of one’s
movements without seeing one’s body. Bulletin of the Psychonomic Society 18: 19–22.
Bellefeuille, A. and Faubert, J. (1998). Independence of contour and biological-motion cues for
motion-defined animal shapes. Perception 27: 225–35.
Beintema, J.A. and Lappe, M. (2002). Perception of biological motion without local image motion.
Proceedings of the National Academy of Sciences USA 99: 5661–3.
Beintema, J.A., Georg, K., and Lappe, M. (2006). Perception of biological motion from limited lifetime
stimuli. Percept. Psychophys. 68(4): 613–24.
Bertenthal, B.I. (1993). Perception of biomechanical motions by infants: intrinsic image and
knowledge-based constraints. In: C. Granrud (ed.), Carnegie Symposium on Cognition: Visual
Perception and Cognition in Infancy, pp. 175–214. Hillsdale: Erlbaum.
Bertenthal, B. I. and Pinto, J. (1994). Global processing of biological motions. Psychological Science
5: 221–5.
Bidet-Ildei, C., Orliaguet, J.P., Sokolov, A.N., and Pavlova, M. (2006). Perception of elliptic biological
motion. Perception, 35: 1137–47.
Blake, R. (1993). Cats perceive biological motion. Psychological Science 4: 54–7.
Blake, R. and Shiffrar, M. (2007). Perception of human motion. Annu Rev Psychol. 58: 47–73.
Blakemore, S.J. and Decety, J. (2001). From the perception of action to the understanding of intention. Nat.
Rev. Neurosci. 2: 561–6.
Bonda, E., Petrides, M., Ostry, D., and Evans, A. (1996). Specific involvement of human parietal systems
and the amygdala in the perception of biological motion. J Neurosci. 16(11): 3737–44.
Bülthoff, I., Bülthoff, H., and Sinha, P. (1998). Top-down influences on stereoscopic depth-perception. Nat.
Neurosci. 1: 254–7.
Bülthoff, H.H. and Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation
theory of object recognition. Proceedings of the National Academy of Sciences 89: 60–4.
Buccino, G., Binkofski, F., and Riggio, L. (2004). The mirror neuron system and action recognition.
Brain Lang. 89(2): 370–76.
Calvo-Merino, B., Grèzes, J., Glaser, D.E., Passingham, R.E., and Haggard, P. (2006). Seeing or doing?
Influence of visual and motor familiarity in action observation. Curr. Biol. 16(19): 1905–10.
Caggiano, V., Fogassi, L., Rizzolatti, G., Pomper, J., Thier, P., Giese, M.A., and Casile, A. (2011). View-based
encoding of actions in mirror neurons of area F5 in macaque premotor cortex. Curr. Biol. 21: 144–8.
Casile, A. and Giese, M.A. (2005). Critical features for the recognition of biological motion. Journal of
Vision 5: 348–60.
Casile, A. and Giese M. A. (2006). Non-visual motor learning influences the recognition of biological
motion. Curr. Biol. 16(1): 69–74.
Casile, A., Dayan, E., Caggiano, V., Hendler, T., Flash, T., and Giese, M.A. (2011). Neuronal encoding of
human kinematic invariants during action observation. Cereb. Cortex 20(7): 1647–55.
Cavanagh, P., Labianca, A.T., and Thornton, I.M. (2001). Attention-based visual routines: sprites.
Cognition 80: 47–60.
Chang, D.H. and Troje, N.F. (2009) Acceleration carries the local inversion effect in biological motion
perception. J. Vis. 9(1): 19, 1–17.
Chandrasekaran, C., Turner, L., Bülthoff, H.H., and Thornton, I.M. (2010). Attentional networks and
biological motion. Psihologija 43(1): 5–20.
Christensen, A., Ilg, W., and Giese, M.A. (2011). Spatiotemporal tuning of the facilitation of biological
motion perception by concurrent motor execution. Journal of Neuroscience 31(9): 3493–9.
Cutting, J. E. (1981). Coding theory adapted to gait perception. Journal of Experimental Psychology: Human
Perception and Performance 7: 71–87.
Cutting, J. E. and Kozlowski, L. T., (1977) Recognizing friends by their walk: Gait perception without
familiarity cues. Bulletin of the Psychonomic Society 9: 353–6.
Cutting, J.E., Proffit D.R., and Kozlowski L.T. (1978). A biomechanical invariant for gait perception.
Journal of Experimental Psychology: Human Perception and Performance 4: 357–72.
Cutting, J.E., Moore, C., Morrison, R. (1988). Masking the motions of human gait. Percept. Psychophys.
44: 339–47.
Dayan, E., Casile, A., Levit-Binnun, N., Giese, M.A., Hendler, T., and Flash, T. (2010). Neural
representations of kinematic laws of motion: evidence for action-perception coupling. Proc. Natl Acad.
Sci. USA 104(51): 20582–7.
Decety, J. and Grèzes, J. (1999). Neural mechanisms subserving the perception of human actions. Trends
Cogn. Sci. 3(5): 172–8.
de Gelder, B. (2006). Towards the neurobiology of emotional body language. Nat. Rev. Neurosci. 7(3): 242–9.
Dittrich, W.H. (1993). Action categories and the perception of biological motion. Perception 22: 15–22.
Dittrich, W. H., Troscianko, T., Lea, S. E., and Morgan, D. (1996). Perception of emotion from dynamic
point-light displays represented in dance. Perception 25: 727–38.
Dittrich, W.H., Lea, S.E.G., Barrett, J., and Gurr, P.R. (1998). Categorization of natural movements by
pigeons: visual concept discrimination and biological motion. J. Exp. Anal. Behav. 70: 281–99.
Duncker, K. (1929). Über induzierte Bewegung (Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung). Psychologische Forschung 12: 180–259.
Erlhagen, W., Mukovskiy, A., and Bicho, E. (2006). A dynamic model for action understanding and
goal-directed imitation. Brain Res. 1083(1): 174–88.
Escobar, M.J., Masson, G.S., Vieville, T., and Kornprobst, P. (2009.) Action recognition using a
bio-inspired feedforward spiking network. Int. J. Comput. Vision 82: 284–301.
Fleischer, F., Christensen, A., Caggiano, V., Thier, P., and Giese, M.A. (2012). Neural theory for the perception
of causal actions. Psychol. Res. 76(4): 476–93.
Fleischer, F., Caggiano, V., Thier, P., and Giese, M.A. (2013). Physiologically inspired model for the visual
recognition of transitive hand actions. Journal of Neuroscience 33(15): 6563–80.
Fox, R. and McDaniel, C. (1982). The perception of biological motion by human infants. Science
218(4571): 486–7.
Fujimoto, K. (2003). Motion induction from biological motion. Perception 32: 1273–7.
Fujimoto, K. and Yagi, A. (2005). Motion illusion in video images of human movement. In: F. Kishino et al.
(eds.), ICEC 2005, LNCS 3711, Springer-Verlag, Berlin/Heidelberg, pp. 531–4.
Fujimoto, K. and Yagi, A. (2008). Biological motion alters coherent motion perception. Perception
37(12): 1783–9.
Giese, M.A. (2000). Neural field model for the recognition of biological motion patterns. In: Proceedings
of the Second International ICSC Symposium on Neural Computation (NC 2000), pp. 1–12.
Giese, M.A. (2006). Computational principles for the recognition of biological movements: model-based
versus feature-based approaches. In: Knoblich, G., Thornton, I.M., Grosjean, M., and Shiffrar, M. (eds),
The Human Body: Perception From the Inside Out, pp. 323–59. Oxford University Press.
Giese, M.A. and Lappe, M. (2002). Measurement of generalization fields for the recognition of biological
motion. Vision Res. 42(15): 1847–58.
Giese, M.A. and Poggio, T. (2003). Neural mechanisms for the recognition of biological movements. Nat.
Rev. Neurosci. 4: 179–92.
Giese, M. A., Thornton, I.M., and Edelman, S. (2008). Metrics of the perception of body movement.
Journal of Vision 8(9): 1–18.
Grossman, E.D. and Blake, R. (2001). Brain activity evoked by inverted and imagined biological motion.
Vision Res. 41(10–11): 1475–82.
Grossman, E.D. and Blake, R. (2002). Brain areas active during visual perception of biological motion.
Neuron 35(6): 1167–75.
Grossman, E.D., Blake, R., and Kim, C.Y. (2004). Learning to see biological motion: brain activity parallels
behavior. J. Cogn. Neurosci. 16: 1669–79.
Grossman, E.D., Battelli, L., and Pascual-Leone A. (2005). Repetitive TMS over STSp disrupts perception
of biological motion. Vis. Res. 45: 2847–53.
Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., and Malach, R. (2004). Intersubject synchronization of cortical
activity during natural vision. Science 303: 1634–1640.
Hecht, H., Vogt, S., and Prinz, W. (2001). Motor learning enhances perceptual judgment: a case for
action-perception transfer. Psychol. Res. 65(1): 3–14.
Herzog, M. H. and Öğmen, H. (2014). Apparent motion and reference frames. In: J. Wagemans (ed.),
Oxford Handbook of Perceptual Organization (in press). Oxford University Press.
Hill, H. and Pollick, F.E. (2000). Exaggerating temporal differences enhances recognition of individuals
from point light displays. Psychological Science Vol. 11 (3): 223–8.
Hiris, E. (2007). Detection of biological and nonbiological motion. J Vis. 7(12) 4: 1–16.
Hiris, E., Krebeck, A., Edmonds, J., and Stout, A. (2005). What learning to see arbitrary motion tells us
about biological motion perception. J. Exp. Psychol.: Hum. Percept. Perform. 31: 1096–106.
Hoffman, D.D. and Flinchbaugh, B.E. (1982). The interpretation of biological motion. Biol Cybern.
42(3): 195–204.
Hunt, A.R. and Halper, F. (2008). Disorganizing biological motion. J Vis. 8(9)12: 1–5.
Huth, A.G., Nishimoto, S., Vu, A.T., and Gallant, J.L. (2012). A continuous semantic space describes
the representation of thousands of object and action categories across the human brain. Neuron.
76(6): 1210–24.
Jackson, S. and Blake, R. (2010) Neural integration of information specifying human structure from form,
motion, and depth. J. Neurosci. 30(3): 838–48.
Jacobs, A. and Shiffrar, M. (2005). Walking perception by walking observers. J. Exp. Psychol.: Hum. Percept.
Perform. 31: 157–69.
Jansson, G., Bergström, S.S., Epstein, W., and Johansson, G. (1994). Perceiving Events and Objects.
Hillsdale: Lawrence Erlbaum Associates.
Jastorff, J. and Orban, G.A. (2009). Human functional magnetic resonance imaging reveals
separation and integration of shape and motion cues in biological motion processing. J. Neurosci.
29(22): 7315–29.
Jastorff, J., Kourtzi, Z., and Giese, M.A. (2006). Learning to discriminate complex movements: biological
versus artificial trajectories. J Vis. 6(8): 791–804.
Jastorff, J., Kourtzi, Z., and Giese, M.A. (2009). Visual learning shapes the processing of complex
movement stimuli in the human brain. J. Neurosci. 29(44): 14026–38.
Jastorff, J., Popivanov, I.D., Vogels, R., Vanduffel, W., and Orban, G.A. (2012). Integration of shape and
motion cues in biological motion processing in the monkey STS. Neuroimage. 60(2): 911–21.
Jellema, T. and Perrett, D.I. (2003). Perceptual history influences neural responses to face and body
postures. J. Cogn. Neurosci. 15(7): 961–71.
Jhuang, H., Serre, T., Wolf, L., and Poggio, T. (2007). A biologically inspired system for action recognition.
In: IEEE 11th International Conference on Computer Vision, ICCV 2007, Rio de Janeiro, Brazil,
October 14-20, pp. 1-8.
Johansson, G. (1950). Configurations in event perception: an experimental study, dissertation.
Stockholm: Högskolan.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception and
Psychophysics 14: 201–11.
Johansson, G. (1976). Spatio-temporal differentiation and integration in visual motion perception: an
experimental and theoretical analysis of calculus-like functions in visual data processing. Psychological
Research 38: 379–93.
Johnson, K. and Shiffrar, M. (2013). People Watching. Oxford University Press.
Jokisch, D. and Troje, N.F. (2003). Biological motion as a cue for the perception of size. J. Vis. 3: 252–64.
Jordan, H., Fallah, M., and Stoner, G.R. (2006). Adaptation of gender derived from biological motion. Nat.
Neurosci. 9(6): 738–9.
Kilner, J., Friston, K.J., and Frith, C.D. (2005). The mirror-neuron system: a Bayesian perspective.
Neuroreport 18(6): 619–23.
Knoblich, G., Thornton, I.M., Grosjean, M., and Shiffrar, M. (2006). Human Body Perception from the
Inside Out. New York: Oxford University Press.
Koenderink, J. (2014). Gestalts as ecological templates. In: J. Wagemans (ed.), Oxford Handbook of
Perceptual Organization (in press). Oxford University Press.
Lange, J. and Lappe, M. (2006). A model of biological motion perception from configural form cues.
J. Neurosci. 26: 2894–906.
Leopold, D.A., O’Toole, A.J., Vetter, T., and Blanz, V. (2001). Proto-type-referenced shape encoding
revealed by high-level aftereffects. Nat. Neurosci. 4: 89–94.
Liberman, A.M., Cooper, F.S., Shankweiler, D.P., and Studdert-Kennedy, M. (1967). Perception of the
speech code. Psychol. Rev. 74(6): 431–61.
Lu, H. (2010). Structural processing in biological motion perception. J. Vis. 10(12): 1–13.
Lu, H. and Liu, Z. (2006). Computing dynamic classification images from correlation maps. J Vis.
6(4): 475–83.
Ma, Y., Paterson, H.M., and Pollick, F.E. (2006). A motion-capture library for the study of identity, gender,
and emotion perception from biological motion. Behav. Res. Methods 38: 134–41.
Marey, E.J. (1894). Le Mouvement, Masson, Paris.
Marr, D. and Vaina, L. (1982). Representation and recognition of the movements of shapes. Proc. R. Soc.
Lond. B. Biol. Sci. 214(1197): 501–24.
Mather, G., Radford, K., and West, S. (1992). Low level visual processing of biological motion. Proc. R. Soc.
Lond. B. Biol. Sci. 249: 149–55.
Metzger, W. (1937). Gesetze des Sehens (Laws of Seeing), 1st German edn.
Michotte, A. (1946). La perception de la causalité. Louvain: Publications Universitaires. (English
translation: The perception of causality. (1963) London: Methuen.)
Mirenzi, A. and Hiris, E., (2011). The Thatcher effect in biological motion. Perception 40(10): 1257–60.
Moeslund, T.B., Hilton, A., and Kruger, V. (2006). A survey of advances in vision-based human motion
capture and analysis. Computer Vision and Image Understanding 104: 90–126.
Montepare, J.M. and Zebrowitz-McArthur, L. (1988). Impressions of people created by age-related
qualities of their gaits. Journal of Personality and Social Psychology 55: 547–56.
Muybridge, E. (1887). Muybridge’s Complete Human and Animal Locomotion. (All 781 Plates from the
1887 ‘Animal Locomotion.’ Volume I. Dover Publications, Inc. 1979.)
Neri, P. (2009). Wholes and subparts in visual processing of human agency. Proc. Biol. Sci.
276(1658): 861–9.
Neri, P., Morrone, M.C., and Burr, D. (1998). Seeing biological motion. Nature 395: 894–6.
Nicolas, H., Pateux, S., and Le Guen, D. (1997). Minimum description length criterion for region-based
video compression. In: Proceedings of the International Conference on Image Processing 1: 346–9.
Oram, M.W., and Perrett, D.I. (1996). Integration of form and motion in the anterior superior temporal
polysensory area (STPa) of the macaque monkey. J. Neurophysiol. 76: 109–29.
O’Rourke J. and Badler N. (1980). ‘Model-based image analysis of human motion using constraint
propagation.’ IEEE Trans. on Pattern Analysis and Machine Intelligence 2(6): 522–36.
O’Toole, A.J., Roark, D.A., and Abdi, H. (2002). Recognizing moving faces: a psychological and neural
synthesis. Trends Cogn. Sci. 6 (6): 261–6.
Oztop, E. and Arbib, M.A. (2002). Schema design and implementation of the grasp-related mirror neuron
system. Biol. Cybern. 87(2): 116–40.
Pavlova, M. and Sokolov, A. (2000). Orientation specificity in biological motion perception. Percept.
Psychophys. 62 (5): 889–99.
Peelen, M.V. and Downing, P.E. (2005). Selectivity for the human body in the fusiform gyrus.
J. Neurophysiol. 93(1): 603–8.
Peelen, M.V. and Downing, P.E. (2007). The neural basis of visual body perception. Nat. Rev. Neurosci.
8(8): 636–48.
Pelphrey, K.A., Mitchell, T.V., McKeown, M.J., Goldstein, J., Allison, T., and McCarthy, G. (2003).
Brain activity evoked by the perception of human walking: controlling for meaningful coherent motion.
J. Neurosci. 23: 6819–25.
Perrett, D.I., Smith, P.A., Mistlin, A.J., Chitty, A.J., Head, A.S., Potter, D.D., Broennimann, R., Milner,
A.D., and Jeeves, M.A. (1985). Visual analysis of body movements by neurons in the temporal cortex of
the macaque monkey: a preliminary report. Behav. Brain Res. 16: 153–70.
Peuskens, H., Vanrie, J., Verfaillie, K., and Orban, G.A. (2005). Specificity of regions processing
biological motion. Eur. J. Neurosci. 21: 2864–75.
Pinto, J. and Shiffrar, M. (1999). Subconfigurations of the human form in the perception of
biological motion displays. Acta Psychol. 102: 293–318.
Poljac, E., Verfaillie, K, and Wagemans, J. (2011) Integrating biological motion: the role of grouping in the
perception of point-light actions. PLoS ONE 6(10): e25867.
Poljac, E., de-Wit, L., and Wagemans, J. (2012). Perceptual wholes can reduce the conscious accessibility of
their parts. Cognition 123: 308–12.
Pollick, F.E., Paterson, H.M., Bruderlin, A., and Sanford, A.J. (2001). Perceiving affect from arm
movement. Cognition 82(2): B51–B61.
Pollick, F.E., Kay, J.W., Heim, K., and Stringer, R. (2005). Gender recognition from point-light walkers.
J. Exp. Psychol.: Hum. Percept. Perform. 31: 1247–65.
Puce, A. and Perrett, D., (2003). Electrophysiology and brain imaging of biological motion. Philos. Trans.
R. Soc. Lond. B Biol. Sci. 358: 435–45.
Reid, R., Brooks, A., Blair, D., and van der Zwan, R. (2009). Snap! Recognising implicit actions in static
point-light displays. Perception 38(4): 613–16.
Restle, F. (1979) Coding theory of the perception of motion configurations. Psychol. Rev. 86(1): 1–24.
Riesenhuber, M. and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci.
2(11): 1019–25.
Rizzolatti, G., Fogassi, L., and Gallese, V. (2001). Neurophysiological mechanisms underlying the
understanding and imitation of action. Nat. Rev. Neurosci. 2: 661–70.
Rizzolatti, G. and Craighero, L. (2004). The mirror-neuron system. Annu. Rev. Neurosci. 27: 169–92.
Rizzolatti, G. and Sinigaglia, C. (2008) Mirrors in the brain: How our minds share actions and emotions.
New York: Oxford University Press.
Roether, C.L., Omlor, L., Christensen, A., and Giese, M. A. (2009). Critical features for the perception of
emotion from gait. Journal of Vision 9(6): 1–32.
Rose, C., Cohen, M.F., and Bodenheimer, B. (1998). Verbs and adverbs: multidimensional motion
interpolation. Computer Graphics and Applications 18(5): 32–40.
Runeson, S. and Frykholm, G. (1981). Visual perception of lifted weight. J. Exp. Psychol.: Hum. Percept.
Perform. 7: 733–40.
Safford, A.S., Hussey E.A., Parasuraman, R., and Thompson, J.C. (2010). Object-based attentional
modulation of biological motion processing: spatiotemporal dynamics using functional magnetic
resonance imaging and electroencephalography. J. Neurosci. 30 (27): 9064–73.
Schindler, K., Van Gool, L., and de Gelder, B. (2008). Recognizing emotions expressed by body pose: a
biologically inspired neural model. Neural Netw. 21(9): 1238–46.
Schütz-Bosbach, S. and Prinz, W. (2007). Perceptual resonance: action-induced modulation of perception.
Trends Cogn. Sci. 11(8): 349–55.
Shi, J., Pan, J., and Yu, S. (1998). Joint motion estimation and segmentation based on the MDL principle.
ICSP ‘98. Fourth International Conference on Signal Processing, Proceedings, 2(2): 963–7.
Singer, J.M. and Sheinberg, D.L. (2010). Temporal cortex neurons encode articulated actions as slow sequences
of articulated poses. J. Neurosci. 30: 3133–45.
Sevdalis, V. and Keller, P.E. (2011). Perceiving performer identity and intended expression intensity in
point-light displays of dance. Psychol. Res. 75(5): 423–34.
Sumi, S. (1984). Upside-down presentation of the Johansson moving light-spot pattern. Perception
13: 283–6.
Theusner, S., de Lussanet, M.H.E., and Lappe, M. (2011). Adaptation to biological motion leads to a
motion and a form after effect. Atten. Percept. Psychophys. 73(6): 1843–55.
Thirkettle, M., Benton, C.P., and Scott-Samuel, N.E. (2009). Contributions of form, motion and task to
biological motion perception. J. Vis. 9(3)28: 1-11.
Thornton, I.M. and Vuong, Q.C. (2004.) Incidental processing of biological motion. Curr. Biol.
14(12): 1084–9.
Thornton, I. M., Pinto J., and Shiffrar, M. (1998).The visual perception of human locomotion. Cognitive
Neuropsychology 15: 535–52.
Thornton, I.M., Rensink, R.A., and Shiffrar, M. (2002) Active versus passive processing of biological
motion. Perception 31(7): 837–53.
Thurman, S.M. and Grossman, E.D. (2008). Temporal ‘Bubbles’ reveal key features for point-light
biological motion perception. J. Vis. 8(3) 28: 1–11.
Thurman, S.M., Giese, M.A., and Grossman, E.D. (2010). Perceptual and computational analysis of critical
features for biological motion. J. Vis. 10: 1–15.
Ternus, J. (1926). Experimentelle Untersuchungen über phänomenale Identität (Experimental
investigations of phenomenal identity). Psychologische Forschung 7: 81–136.
Todd, J.T. (1983). Perception of gait. J. Exp. Psychol.: Hum. Percept. Perform. 9(1): 31–42.
Troje, N.F. (2002). Decomposing biological motion: A framework for analysis and synthesis of human gait
patterns. J. Vis. 2(5) 2: 371–87.
Troje, N.F. (2003). Reference frames for orientation anisotropies in face recognition and biological-motion
perception. Perception 32 (2): 201–10.
Troje, N.F., Sadr, J., Geyer, H., and Nakayama, K. (2006). Adaptation aftereffects in the perception of gender
from biological motion. J. Vis. 6: 850–7.
Troje, N.F. and Westhoff, C. (2006). The inversion effect in biological motion perception: evidence for a ‘life
detector’? Curr. Biol. 16(8): 821–4.
Troje, N.F., Westhoff, C., and Lavrov, M. (2005). Person identification from biological motion: effects of
structural and kinematic cues. Percept Psychophys. 67(4): 667-75.
Troscianko T, Holmes A, Stillman J, Mirmehdi M, Wright D, and Wilson A. (2004) What happens next?
The predictability of natural behaviour viewed through CCTV cameras. Perception 33(1): 87–101.
Unuma, M., K. Anjyo, and R. Takeuchi (1995). Fourier principles for emotion-based human figure
animation, Proceedings of ACM SIGGRAPH ‘95, ACM Press, pp. 91–6.
Vaina, L.M., Solomon, J., Chowdhury, S., Sinha, P., and Belliveau, J.W. (2001). Functional neuroanatomy
of biological motion perception in humans. Proc. Natl. Acad. Sci. USA 98(20): 11656–61.
Vaina, L.M.V., Beardsley, S.A., and Rushton, S. (2004). Optic Flow and Beyond. Dordrecht: Kluwer
Academic Press.
Vallortigara, G. and Regolin, L. (2006). Gravity bias in the interpretation of biological motion by
inexperienced chicks. Curr. Biol. 16(8): R279–R280.
Vangeneugden, J, Pollick, F, and Vogels, R. (2009). Functional differentiation of macaque visual temporal
cortical neurons using a parametric action space. Cereb. Cortex. 19(3): 593–611.
Vangeneugden, J., Vancleef, K., Jaeggli, T., Van Gool, L., and Vogels, R. (2010). Discrimination of
locomotion direction in impoverished displays of walkers by macaque monkeys. J. Vis. 10: 22.1–22.19.
Vangeneugden, J., De Mazière, P.A., Van Hulle, M.M., Jaeggli, T., Van Gool, L., and Vogels, R.
(2011). Distinct mechanisms for coding of visual actions in macaque temporal cortex. J. Neurosci.
31(2): 385–401.
Biological and Body Motion Perception 597

Van Overwalle F. and Baetens K. (2009). Understanding others’ actions and goals by mirror and
mentalizing systems: a meta-analysis. Neuroimage 48(3): 564–84.
Vanrie J. and Verfaillie K. (2004). Perception of biological motion: a stimulus set of human point-light
actions. Behav. Res. Methods Instrum. Comput. 36(4): 625–9.
Vanrie, J., Dekeyser, M., and Verfaillie, K. (2004). Bistability and biasing effects in the perception of
ambiguous point-light walkers. Perception 33(5): 547–60.
Viviani, P., Stucchi, N. (1989). The effect of movement velocity on form perception: geometric illusions in
dynamic displays. Percept. Psychophys. 46(3): 266–74.
Walk, R.D. and Homan, C.P. (1984). Emotion and dance in dynamic light displays. Bull. Psychon. Soc.
22: 437–40.
Wang, L. and Jiang, Y. (2012). Life motion signals lengthen perceived temporal duration. Proc. Natl. Acad.
Sci. USA 109(11): E673–E677.
Webb, J.A. and Aggarwal, J.K. (1982). Structure from motion of rigid and jointed objects. Artif. Intell.
19: 107–30.
Wiley, D.J. and Hahn, J.K. (1997). Interpolation synthesis of articulated figure motion. IEEE Computer
Graphics and Applications 17(6): 39–45.
Wertheimer, M. (1923). Laws of organization in perceptual forms. First published as Untersuchungen zur
Lehre von der Gestalt II, in Psychologische Forschung 4: 301–50.
Wolpert, D. M., Doya, K., and Kawato, M. (2003). A unifying computational framework for motor control
and social interaction. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 358(1431): 593–602.
Section 7

Perceptual organization and other modalities
Chapter 29

Auditory perceptual organization


Susan L. Denham and István Winkler

Introduction and Background


The problem
According to the functionalist view of perception and cognition (Brunswik 1955), perceptual infor-
mation processing serves to support the organism in reaching its fundamental goals: avoiding dangers
and gaining access to resources. Both dangers and resources are provided by objects in our envir-
onment. Thus a large part of perceptual processing can be understood as answering the question,
‘What is out there?’. However, even correctly answering this question is not sufficient for deciding on a
course of action, because our possible interactions with the environment necessarily lie in the future
compared to the time from which the information originated. Therefore, the second question to be
answered is: ‘What will these objects do in the future?’; that is, our perceptual systems must describe the
flow of events in the environment, and interpret them in terms of the behaviors of objects.
In this chapter, we consider how sound information is processed by the human brain to answer
the above questions. Sounds are produced by the movements or actions of objects and by interac-
tions between them. As a consequence, sounds primarily carry information about what happens
in the environment, rather than about the surface features of objects. Together with the fact that
most environments are largely transparent to travelling pressure waves (the physical sound), this
makes sounds especially useful for conveying information about the behaviors of objects.
Sounds pose a number of specific challenges that need to be considered in any account of their
interpretation. Sounds are ephemeral; we can’t go back to re-examine them. Sounds unfold in time
and contain information at many scales of granularity; thus analysis over a number of different
timescales is needed in order to extract their meaning (Nelken 2008). For example, a brief impul-
sive sound may tell the listener that two objects have been in collision, but a series of such sounds
is needed in order for the listener to know that someone is clapping rather than walking. Many
sound sources generate sounds intermittently and information about their behavior typically spans
several discrete sound events. To correctly associate sounds across time requires the formation
of mental representations that are temporally persistent and allow the formation of associations
between sounds emitted by the same source (Winkler et al. 2009). Finally, the pressure waves arriv-
ing at our ears are formed as a composite of all concurrent sounds. Thus the auditory system has to
disentangle them. This process of partitioning acoustic features into meaningful groups is known
as auditory perceptual organization or auditory scene analysis (Bregman 1990).

Chapter overview
How does the auditory system achieve the remarkable feat of (generally correctly) decom-
posing the sound mixture into perceptual objects under the time constraints imposed by
the need to behave in a timely manner? Based on our review we will argue for two key pro-
cessing strategies; firstly, perceptual representations should be predictive (Friston 2005;
Summerfield and Egner 2009), and secondly, perceptual decisions should be flexible
(Winkler et al. 2012). In this chapter, we will first consider the principles that guide the forma-
tion of links between sounds, and their separation from other sounds. Next, some of the key
experimental paradigms that have been used to investigate auditory perceptual organization are
described, and the behavioral and neural correlates of perceptual organization summarized. We
use this information to motivate our working definition of an auditory perceptual object (Kubovy
and Van Valkenburg 2001; Griffiths and Warren 2004; Winkler et al. 2009), and demonstrate the
utility of this concept for understanding auditory perceptual organization. For the purposes of
this chapter we ignore the influences of other modalities, but see Spence (this volume) for the
importance of cross-modal perceptual organization.

Grouping Principles, Events, Streams, and Perceptual Objects in the Auditory Modality
The inverse problem and the need for constraints. If the goal of perception is to characterize dis-
tal objects, then perceptual information processing must solve what physicists term the ‘inverse
problem’: to find the causes (sources) of the physical disturbances reaching the sensors. The problem
is that the information reaching the ears does not fully specify the sources (e.g. Stoffregen and
Bardy 2001; however, see Gibson 1979). Therefore, in order to achieve veridical perception, solutions
need to be constrained in some way; e.g. by knowledge regarding the nature of the sound
sources likely to be found in the given environment (Bar 2007), and/or by expectations arising
from the current and recent context (Winkler et al. 2012). In his seminal book, Bregman (1990)
argued that such constraints had already been discovered by the Gestalt school of psychology
(Köhler 1947) during the first half of the twentieth century.
The core observation of Gestalt psychology was that discrete stimuli form larger perceptual
units, which have properties not present in the separate components, and that the perception
of the components is influenced by the overall perceptual structure. The Gestalt psychologists
described principles that govern the grouping of sensory elements (for a detailed discussion of
the Gestalt theory, see section I.1 in this book and the excellent review by Wagemans et al. 2012).
Because the original Gestalt ‘laws of perception’ were largely based on the study of vision, here we
discuss them in terms of sounds.
Similarity between the perceptual attributes of successive events such as pitch, timbre, loudness
and location provides a basis for linking them (Bregman 1990; Moore and Gockel 2002; Moore
and Gockel 2012). However, it appears that it is not so much the raw difference that is important,
but rather the rate of change; the slower the rate of change between successive sounds, the more
similar they are judged (Winkler et al. 2012). This suggests that in the auditory
modality, the law of similarity is not separate from what the Gestalt psychologists termed good
continuation. Good continuation means that smooth continuous changes in perceptual attributes
favor grouping, while abrupt discontinuities are perceived as the start of something new. Good
continuation can operate both within a single sound event (e.g. amplitude-modulating a noise
with a relatively high frequency results in the separate perception of a sequence of loud sounds
and a continuous softer sound; Bregman 1990), and between events (e.g. glides can help bind suc-
cessive events; Bregman and Dannenbring 1973).
The principle of common fate refers to correlated changes in features; e.g. whether they start
and/or stop at the same time. This principle has also been termed ‘temporal coherence’ specifically
with regard to correlations over time windows that span longer periods than individual events
(Shamma et al. 2011). However, while common onset is a very powerful grouping cue, common
offset is far less influential (for a review see Darwin and Carlyon 1995), and evidence for the
grouping effects of coherent correlations between some other features (e.g. frequency modula-
tions (Darwin and Sandell 1995; Lyzenga and Moore 2005) or spatial trajectories (Bőhm et al.
2012)) is lacking.
Disjoint allocation (or belongingness) refers to the principle that each element of the sensory
input is only assigned to one perceptual object. In an auditory analogy to the exclusive bor-
der assignment in Rubin’s face–vase illusion, Winkler et al. (2006) showed that a tone which
could be equally assigned to two different groups was only ever part of one of them at any
given point in time. However, while this principle often holds in auditory perception, there are
some notable violations; e.g. in duplex perception, the same sound component can contribute
to the perception of a complex sound as well as being heard separately (Rand 1974; Fowler and
Rosenblum 1990).
Finally, the principle of closure refers to the tendency of objects to be perceived as continuing
unless there is evidence for their stopping, e.g. a glide continuing through a masking noise (Miller
and Licklider 1950; Riecke et al. 2008). For example, in ‘temporal induction’ (or phonemic res-
toration), the replacement of part of a sound (speech) with noise results in the perception of the
original, unmodified, sound as well as a noise that is heard separately (Samuel 1981; Warren et al.
1988). However, temporal induction only works if the sound that is deleted is expected, as is found
for over-learnt sounds such as speech; see also Seeba and Klump (2009).
Perception as inference. This raises an important point: namely, that the key idea of a ‘Gestalt’
as a pattern implicitly carries within it the notion of predictability; i.e., parts can evoke the rep-
resentation of the whole pattern. Specifically in the case of sounds, this allows one to generate
expectations about sound events that have not yet occurred. This notion goes beyond Gestalt
theory, aligning it with the empiricist tradition of unconscious inference (Helmholtz 1885) and
perception as hypothesis formation (Gregory 1980; Feldman this volume). Indeed, whereas
Gestalt psychologists thought that grouping principles were rooted in the laws of physics, more
recent thinking (Bregman 1990) regards them as heuristics acquired through evolution and
learning. By detecting patterns (or feature regularities) in the sensory input the brain can con-
struct compressed representations that allow it to ‘explain away’ (Pearl 1988) future events and so
radically reduce the amount of sensory data needed for adequately describing the environment
(Summerfield and Egner 2009). The use of schemata (with the corresponding loss of some detail)
has long been accepted as an explanation for the nature of long-term memory (Bartlett 1932) and
seems also to be the basis for the formation of perceptual representations in general (Neisser 1967;
Hochberg 1981; Bar 2007). In accordance with these ideas, Winkler and Cowan (2005) suggested
that sound sequences are represented by feature regularities (i.e. relationships between features
that define the detected pattern) with only a few items described in full detail for anchoring the
representation.
Auditory perceptual objects as predictive representations. Based on the Gestalt principles
and ideas of perceptual inference outlined above, Winkler and colleagues (Winkler 2007; Winkler
et al. 2009; Winkler 2010) proposed a definition of auditory perceptual objects as predictive rep-
resentations, constructed on the basis of feature regularities extracted from the incoming sounds
(see also Koenderink this volume for a more general treatment of ecological Gestalts). Object
representations are persistent, and absorb expected sensory events. Object representations encode
distributions over featural and temporal patterns and can generalize appropriately with regard to
the current context. Thus in accordance with the ideas of the Gestalt psychologists, it was sug-
gested that individual sound events are processed within the context of the whole, and the con-
solidated object representation refers to patterns of sound events.
In accord with Griffiths and Warren (2004), Winkler et al. (2009) do not distinguish ‘concrete’
from ‘abstract auditory objects’, where the former refers to the physical source and the latter to the
pattern of emission (Wightman and Jenison 1995; Kubovy and Van Valkenburg 2001). Thus, the
notion of an auditory perceptual object is compatible with the definition of an auditory stream, as
a coherent sequence of sounds separable from other concurrent or intermittent sounds (Bregman
1990). However, whereas the term ‘auditory stream’ refers to a phenomenological unit of sound
organization, with separability as its primary property, the definition proposed by Winkler et al.
(2009) concerns the extraction and representation of the unit as a pattern with predictable com-
ponents (Winkler et  al. 2012). This definition of an auditory perceptual object is compatible
with the memory component assumed in hierarchical predictive coding theories of perception
(Friston 2005; Hohwy 2007). These theories posit that the brain acts to minimize the discrep-
ancy between its predictions and the actual sensory input (termed the error signal), and that this
occurs at many different levels of processing (e.g. Friston and Kiebel 2009). Error signals propa-
gate towards higher levels which then attempt to suppress them through refinements to internal
models. Auditory perceptual objects can be regarded as models working at intermediate levels of
this predictive coding hierarchy (Winkler and Czigler 2012).
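As a toy illustration of this error-minimization idea (a sketch of our own devising, not a model from the predictive coding literature; the scalar representation, constant input, and learning rate are all arbitrary simplifications), one level of such a hierarchy can be reduced to a unit that maintains a prediction and refines it in proportion to the residual error:

```python
def predictive_unit(inputs, learning_rate=0.2):
    """A drastically simplified predictive coding level: the internal model
    is a single scalar prediction, updated in proportion to the error
    signal (input minus prediction)."""
    prediction = 0.0
    errors = []
    for x in inputs:
        error = x - prediction               # the discrepancy to be suppressed
        prediction += learning_rate * error  # refinement of the internal model
        errors.append(abs(error))
    return prediction, errors

# A perfectly regular input is progressively 'explained away':
prediction, errors = predictive_unit([1.0] * 20)
```

With a regular input the error shrinks towards zero, so higher levels receive progressively less to explain; an unexpected input would reinject a large error, which is one intuition for how expected events can be absorbed by an object representation while deviations remain salient.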

Behavioral Correlates of Perceptual Sound Organization


Extraction and binding of features. It is generally accepted that the spectral decomposition
carried out by the cochlea results in a topographically organized array of signals; i.e. a repre-
sentation of incoming sounds in terms of their frequency content, and this sets up the tono-
topic organization found through most of the auditory system, up to and including the primary
auditory cortex (Zwicker and Fastl 1999), with other features such as onsets, amplitude and
frequency modulations, and binaural differences, extracted subcortically and largely indepen-
dently within each frequency channel (Oertel et al. 2002). It is important to note that even iso-
lated sounds can be rather complex. In general, natural sounds contain many different frequency
components, and both the frequencies of the components and their amplitudes can vary within
a single sound (Ciocca 2008). Thus the auditory system has to find some way of correctly associ-
ating the features which originate from the same sound source. The classical view suggests that
acoustic features are bound together to form auditory events (Bertrand and Tallon-Baudry 2000;
Zhuo and Yu 2011). By a sound event, or token (Shamma et al. 2011), we mean a sound that is
localized in time and is perceived as originating from a single sound source; for example, a musical
note or a syllable (Ciocca 2008). Events are subsequently grouped sequentially into patterns,
streams, or objects.
However, most of the studies and models of auditory feature extraction to date have been based
on data obtained in experiments presenting isolated sounds to listeners, and many of the prob-
lems encountered in natural environments have not yet been fully explored due to their complex-
ity. One consequence is that the commonly accepted feed-forward hierarchical grouping account,
just described, is too simplistic; see also van Leeuwen this volume. In order to determine the per-
ceptual qualities of two or more overlapping sound events the brain must first bind their compo-
nent features; i.e. it must decide which parts of the complex input belong to each event and group
features according to which event they belong. But there is a problem, as the number of concur-
rent auditory objects and which features belong to each is unknown a priori; this must be inferred
incrementally from the ongoing sensory input. Therefore, feature extraction, feature binding, and
sequential grouping must proceed in an interactive manner. Unfortunately, as yet, little is known
about the nature of these interactions beyond the fact that the ubiquitous presence of descending
pathways throughout the auditory system could provide the substrate for contextual (top-down)
influences (Schofield 2010). Therefore, despite being aware that grouping processes cannot be
fully disconnected from feature extraction and binding, by necessity, we will address grouping as
a separate process.
Auditory Scene Analysis. In the currently most widely accepted framework describing per-
ceptual sound organization, Auditory Scene Analysis, Bregman (1990) proposes two separable
processing stages. The first stage is suggested to be concerned with partitioning sound events
into possible streams (groups) based primarily on featural differences (e.g. spectral content, loca-
tion, timbre). The second stage, within which prior knowledge, context, and/or task demands
exert their influence, is a competitive process between candidate organizations that ultimately
determines which one is perceived. Three notable further assumptions are included in the framework:
(1) Initially, the brain assumes that all sounds belong to the same stream, and segregating
them requires evidence attesting to the probability that they originate from different sources;
(2) For sequences with repeating patterns, perception settles on a final ‘perceptual decision’ after
the evidence-gathering stage is complete; (3) Solutions that include the continuation of a previ-
ously established stream are preferred to alternatives (the ‘old+new’ strategy).
The grouping stage. Most behavioral studies have targeted the first processing stage, assessing
the effects of various cues on auditory group formation. Bregman (1990) distinguishes two classes
of grouping processes: grouping based on concurrent (spectral, instantaneous, or vertical) cues,
and grouping based on sequential (temporal, contextual, or horizontal) cues. However, although
these two classes seem intuitively to be distinct, it turns out that instantaneous cues are susceptible
to the influences of prior sequential grouping (Bendixen, Jones, et al. 2010); e.g. a harmonic can
be pulled out of a complex with which it would otherwise be grouped if there are prior examples
of that tone (Darwin et al. 1995).
So what triggers the automatic grouping and segregation of individual sound events? There
have been surprisingly few experiments addressing this question explicitly, but the gap transfer
illusion (Nakajima et al. 2000) suggests that the auditory system tends to try to match onsets to
offsets according to their temporal proximity, and that the result (which also depends on the
extent to which features at the onset and offset match; Nakajima et al. 2004) is a perceptual event,
as defined above. Since listeners reliably reported the illusory event even though they were not
trying to hear it out, these experiments provide some evidence for obligatory grouping. Another
typical example of this class of obligatory grouping is the mistuned partial phenomenon. When
one partial of a complex harmonic tone is mistuned listeners perceive two concurrent sounds,
a complex tone and a pure tone, the latter corresponding to the mistuned partial (Moore et al.
1986). However, not all features trigger concurrent grouping; e.g. common interaural time differ-
ences between a subset of frequency components within a single sound event do not generate a
similar segregation of component subsets (Culling and Summerfield 1995).
In contrast to concurrent grouping, sequential grouping is necessarily based on some repre-
sentation of the preceding sounds. Most studies of this class of grouping have used sequences of
discrete sound events, and asked two main questions: (a) How do the various stimulus param-
eters affect sequential grouping of sound events, and (b)  What are the temporal dynamics of
this grouping process (for reviews, see Carlyon 2004; Haykin and Chen 2005; Snyder and Alain
2007; Ciocca 2008; Shamma et al. 2011). In the most widely used stimulus paradigm (termed
the auditory streaming paradigm), sequences of the structure ABA- (where A and B denote two
sounds (typically tones) differing in some auditory feature(s) and ‘-’ stands for a silent interval)
are presented to listeners (van Noorden 1975). When the feature separation between A and B
is small and/or they are delivered at a slow pace, listeners predominantly hear a single coher-
ent stream with a galloping rhythm (termed the integrated percept). With a large separation
between the two sounds and/or fast presentation rates, they most often experience the sequence
in terms of two separated streams, one consisting only of the A tones and the other of the
B tones, with each stream having its own isochronous rhythm (termed the segregated percept).
Throughout most of the feature-separation/presentation-rate space there is a trade-off between
the two cues: smaller feature separation can be compensated with higher presentation rate, and
vice versa (van Noorden 1975).
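The structure of the ABA- stimulus is simple enough to sketch in code. The following function (an illustrative sketch of our own; the 440 Hz base frequency, 6-semitone separation, 50-ms tones, and 125-ms onset interval are arbitrary example values, not parameters from any cited study) synthesizes such a sequence, with `semitone_sep` and `onset_interval` playing the roles of the feature separation and presentation rate discussed above:

```python
import numpy as np

def aba_sequence(f_a=440.0, semitone_sep=6, tone_dur=0.05,
                 onset_interval=0.125, n_triplets=10, sr=44100):
    """Synthesize an ABA- triplet sequence (van Noorden-style).

    Each triplet occupies four onset slots: A, B, A, silence ('-').
    A larger `semitone_sep` or a shorter `onset_interval` would favor
    the segregated percept in listeners.
    """
    f_b = f_a * 2 ** (semitone_sep / 12)      # B tone, semitone_sep above A
    slot = int(onset_interval * sr)           # samples per onset slot
    t = np.arange(int(tone_dur * sr)) / sr
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.005)  # 5-ms on/off ramps
    tones = {f: np.sin(2 * np.pi * f * t) * ramp for f in (f_a, f_b)}
    out = np.zeros(4 * slot * n_triplets)
    for k in range(n_triplets):
        for i, f in enumerate((f_a, f_b, f_a)):  # fourth slot stays silent
            start = (4 * k + i) * slot
            out[start:start + len(t)] = tones[f]
    return out

seq = aba_sequence()  # a single channel at 44.1 kHz
```

Whether such a sequence is heard as one galloping stream or as two isochronous streams is, as described above, a property of the listener's perceptual organization, not of the waveform itself.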
Differences in various auditory features, including frequency, pitch, loudness, location, timbre,
and amplitude modulation, have been shown to support auditory stream segregation (Vliegen
and Oxenham 1999; Grimault et  al. 2002; Roberts et  al. 2002). Thus it appears that sequential
grouping is based on perceptual similarity, rather than on specific low-level auditory features
(Moore and Gockel 2002; Moore and Gockel 2012). As for the timing of the sounds, it was shown
that the critical parameter is the silent interval between consecutive tones of the same set (the
within-stream inter-stimulus interval; Bregman et al. 2000); however, see Bee and Klump (2005)
for a counter-view. Temporal structure has also been suggested as a key factor in segregating
streams either by guiding attentive grouping processes (Jones 1976; Jones et al. 1981) or through
temporal coherence between elements of the auditory input (Elhilali, Ma, et  al. 2009). Finally,
contextual effects, such as the presence of additional sounds or attentional set, can bias the final
perceptual outcome, suggesting that the second-stage processes of competition consider all pos-
sible alternative groupings (Bregman 1990; Winkler, Sussman, et al. 2003). In summary, sequen-
tial grouping effects generally conform to the Gestalt principles of similarity/good continuation
and common fate.
The competition/selection stage: Multistability in auditory streaming. Although the results
of many experiments have painted a picture consistent with Bregman’s assumptions (e.g. Cusack
et al. 2004; Snyder et al. 2006), other results appear to be at odds with the notion that the auditory
system (a) always starts from the integrated organization, and (b) eventually reaches a stable
final perception. When listeners are presented with ABA- (or ABAB) sequences of a
few minutes duration and are asked to report their perception in a continuous manner, it has
been found that perception fluctuates between alternative organizations in all listeners and with
all of the combinations of stimulus parameters tested (Anstis and Saida 1985; Roberts et al. 2002;
Denham and Winkler 2006; Pressnitzer and Hupé 2006; Kondo and Kashino 2009; Hill et al. 2011;
Schadwinkel and Gutschalk 2011; Kondo et al. 2012; Denham et al. 2013). Thus the perception
of these sequences appears to be bi- or multistable (Schwartz et al. 2012), similar to some other
auditory (Wessel 1979) and visual stimulus configurations (e.g. Leopold and Logothetis 1999;
Alais and Blake this volume). Furthermore, segregated and integrated percepts are not the only
ones that listeners experience in response to ABA- sequences (Bendixen, Denham, et al. 2010,
Bendixen et al. 2013, Bőhm et al. 2013, Denham et al. 2013, Szalárdy et al. 2013), and, with stimu-
lus parameters strongly promoting the segregated organization, participants often report segrega-
tion first (Deike et al. 2012; Denham et al. 2013). It has also been found that the first experienced
perceptual organization is more strongly determined by stimulus parameters than those experi-
enced later (Denham et al. 2013).
Finally, higher-order cues, such as regularities embedded separately within the A and B streams,
promote perception of the segregated organization (Jones et al., 1981; Drake et al., 2000; Devergie
et al., 2010; Andreou et al., 2011; Rimmele et al., 2012; Rajendran et al., 2013), probably by extend-
ing the duration of the phases (continuous intervals with the same percept) during which lis-
teners experience the segregated percept, while they do not affect the duration of the phases of
the integrated percept (Bendixen, Denham, et al. 2010; Bendixen et al. 2013). This suggests that
predictability (closure in terms of the Gestalt principles) also plays into the competition between
alternative sound organizations, although differently from cues based on the rate of perceptual
change (similarity/good continuation and common fate). Closure in auditory perceptual organ-
ization may therefore be seen to resonate with Koffka’s early intuition as acting not so much as a
low-level grouping cue but rather as something that helps to determine the final perceptual form
(Wagemans et al. 2012). Just as closure in vision allows the transformation of a 1D contour into a
2D shape (Elder and Zucker 1993), so the discovery of a predictable temporal pattern transforms
a sequential series of unrelated sounds into a distinctive motif.
In contrast to the laboratory findings of multistable perception, everyday experience tells us
that we perceive the world in a stable, continuous manner. We may find that initially we are not
able to distinguish individual sound sources when suddenly confronted with a new auditory
scene, such as entering a noisy classroom or stepping out onto a busy street. But generally within
a few seconds, we are able to differentiate them, especially sounds that are relevant to our task.
This experience is well captured by Bregman’s assumptions of initial integration and subsequent
settling on a stable segregated organization. In support of these assumptions, when averaging
over the reports of different listeners, it is generally found that within the initial 5–15 s of an ABA-
sequence, the probability of reporting segregation monotonically increases (termed the build-up
of auditory streaming) (but see Deike et al. 2012), and that a break during this early
period, or directing attention away from the sounds, causes a reset (i.e. a return to integration
followed by a gradual increase in the likelihood of segregation; Cusack et al. 2004). So, should we
disregard the perceptual multistability observed in the auditory streaming paradigm as simply a
consequence of the artificial stimulation protocol used? We suggest not. Illusions and artificially
constructed stimulus configurations have played an important role in the study of perception (e.g.
as the main method of Gestalt psychology), because they provide insights into the machinery of
perception. In the following, we provide a description of auditory perceptual organization based
on insights gained from multistable phenomena.
Winkler et al. (2012) suggested that one should consider sound organization in the brain in
terms of the continuous discovery of proto-objects (alternative groupings) and ongoing com-
petition between them. Continuous discovery and competition are well suited to the everyday
demands on auditory perceptual organization in a changing world. Proto-objects (Rensink 2000)
are the candidate set of representations that have the potential to emerge as the perceptual objects
of conscious awareness (Mill et al. 2013). Within this framework, proto-objects represent patterns
which have been discovered embedded within the incoming sequence of sounds; they are con-
structed by linking sound events and recognizing when a previously discovered sequence recurs
and can thus be used to predict future events. In a new sound scene, the proto-object that is easiest
to discover determines the initial percept. Since the time needed for discovering a proto-object
depends largely on the stimulus parameters (i.e., to what extent successive sound events satisfy/
violate the similarity/good continuation principle), the first percept strongly depends on stimulus
parameters. However, the duration of the first perceptual phase is independent of the percept
(Hupé and Pressnitzer 2012), since it depends on how long it takes for other proto-objects to be
discovered (Winkler et al. 2012).
Once alternative organizations have been discovered they start competing with each other.
Competition between organizations is dynamic both because proto-objects are discovered on the
fly, and may come and go, and because their strength, which determines which of them becomes
dominant at a given time, is probably affected by dynamic factors, such as how often they success-
fully predict upcoming sound events (cf. predictive coding theories (Friston 2005) and Bregman’s
‘old+new’ heuristic (Bregman 1990)), adaptation, and noise (Mill et al. 2013). The latter two influ-
ences are also often assumed in computational models of bi-stable visual perceptual phenomena
(e.g. Shpiro et al. 2009; van Ee 2009); adaptation ensures the observed inevitability of perceptual
608 Denham and Winkler

switching (the dominant percept cannot remain dominant forever), and noise accounts for the
observed stochasticity in perceptual switching (successive phase durations are largely uncor-
related, and the distribution of phase durations resembles a gamma distribution) (Levelt 1968;
Leopold and Logothetis 1999). Generalizing Bregman's (1990) two-stage account of perceptual
organization to two concurrent stages that operate continuously and in parallel, the first
discovering predictive representations (proto-objects) and the second governing the competition
for dominance between them, yields a theoretical and computational framework that explains a
wide set of experimental findings (Winkler et al. 2012; Mill et al. 2013). For example, perceptual
switching, first-phase choice and duration, and differences between the first and subsequent
perceptual phases can all be explained within this framework.
It also accounts for the different influences of similarity and closure on perception; the rate of
perceptual change (similarity/good continuation) determines how easy it is to form links between
the events that make up a proto-object, while predictability (closure) does not affect the discovery
of proto-objects, but can increase the competitiveness (salience) of a proto-object once it has been
discovered (Bendixen, Denham, et al. 2010).
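The adaptation-plus-noise mechanism described above can be illustrated with a minimal two-percept competition model. This is a sketch in the spirit of models such as Shpiro et al. (2009) and Mill et al. (2013), not a reimplementation of any published model; all parameter values are arbitrary choices for illustration:

```python
import numpy as np

def simulate_switching(steps=100_000, seed=1):
    """Two mutually inhibiting percepts with slow adaptation and noise:
    adaptation guarantees that dominance ends, noise makes the phase
    durations stochastic (illustrative parameters throughout)."""
    rng = np.random.default_rng(seed)
    r = np.array([0.6, 0.4])          # activity of the two competing percepts
    a = np.zeros(2)                   # slow adaptation variables
    inh, beta, tau_r, tau_a, sigma = 2.0, 0.8, 10.0, 500.0, 0.2
    winner = np.empty(steps, dtype=int)
    for t in range(steps):
        drive = 1.0 - inh * r[::-1] - beta * a      # input minus cross-inhibition and adaptation
        r = np.clip(r + (-r + np.clip(drive, 0, None)) / tau_r
                    + sigma / np.sqrt(tau_r) * rng.normal(size=2), 0, None)
        a += (-a + r) / tau_a                       # adaptation slowly tracks activity
        winner[t] = int(r[1] > r[0])
    switch_times = np.flatnonzero(np.diff(winner))
    return np.diff(switch_times)                    # perceptual phase durations

durations = simulate_switching()
```

Under these assumptions the dominant percept cannot remain dominant forever, and the run produces a variable sequence of phase durations, echoing the stochasticity of perceptual switching described above.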
Perceptual organization. Up to this point we have used the term ‘sound organization’ in a gen-
eral sense. Now we consider it in a narrower sense. The two sound organizations most commonly
(but not exclusively) appearing in the ABA- paradigm are integration and segregation. Whereas
the integrated percept is fully specified, there are in fact two possible segregated percepts: one may
hear the A sounds in the foreground and the Bs in the background, or vice versa. It is compara-
tively easy to switch between these two variants of the segregated percept (since we are aware of
both of them at the same time), while it is more difficult to voluntarily switch between segregation
and integration (as we are not simultaneously aware of both these organizations, i.e. we don’t hear
the integrated galloping rhythm while we experience the sequence in terms of two streams). In
essence, a specific sound organization corresponds to a set of possible perceptual experiences,
which are, in Bregman’s terms, compatible with each other, while perceptual experiences which are
mutually exclusive belong to different sound organizations.
What determines compatibility? Winkler et al. (2012) suggested that two (or more) proto-objects
are compatible if they never predict the same sound event (i.e. they have no common element—cf.
the Gestalt principle of disjoint allocation), and considered three possible ways in which competi-
tion may be implemented in order to account for perceptual experience. The first possibility they
considered is that compatibility is explicitly extracted and organizations are formed during the first
processing stage. This leads to the assumption of hierarchical competition: one competition
between organizations, and another, within each organization, among its multiple proto-objects.
The second possibility is a foreground–background solution. In this case all proto-objects
compete directly with
each other and once a dominant one emerges, all remaining sounds are grouped together into a
background representation. Results showing no clear separation of sounds in the background are
compatible with this solution (Brochard et al. 1999; Sussman et al. 2005). However, other stud-
ies suggest that the background is not always undifferentiated (Winkler, Teder-Salejarvi, et al.
2003). A third possibility is that proto-objects only compete with each other when they predict
the same sound event (collide). In this case organizations emerge because of the simultaneous
dominance of proto-objects that never collide with each other, and their alternation with other
compatible sets with which they do collide; i.e. when one proto-object becomes dominant in
the ongoing competition, others with which it doesn’t collide will also become strong, while all
proto-objects with which this set does collide are suppressed. Noise and adaptation ensure that at
some point a switch will occur to one of the suppressed proto-objects and the cycle will continue.
A computational model that demonstrates the viability of this solution for modeling perceptual
Auditory Perceptual Organization 609

experience in the ABA- paradigm has recently been developed (Mill et al. 2013). The assumption
that the perceptual organization of sounds is based on continuous competition between predic-
tive proto-objects leads to a system that is flexible, because alternative proto-objects are available
all the time, ready to emerge into perceptual awareness when they prove to be the best predictors
of the auditory input. The system is also stable and robust, because it does not need to reassess all
of its representations with the arrival of a new sound source in the scene, or in the event of tempo-
rary disturbances (such as a short loss of input, or during attentional switching between objects).
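The compatibility criterion, that proto-objects belong to the same organization when they never predict the same sound event, can be made concrete with a toy encoding of the ABA- paradigm. The slot numbering and the names below are our own illustrative choices:

```python
def compatible(p, q):
    """Two proto-objects are compatible (can be simultaneously dominant)
    iff they never predict the same event slot: disjoint allocation."""
    return not (set(p) & set(q))

# event slots within one ABA- cycle: 0 = A, 1 = B, 2 = A, 3 = silence
integrated = (0, 1, 2)        # the galloping ABA pattern claims all tone slots
stream_a = (0, 2)             # the A-A- stream
stream_b = (1,)               # the B--- stream

print(compatible(stream_a, stream_b))    # prints True: one (segregated) organization
print(compatible(integrated, stream_a))  # prints False: they collide, so they compete
```

In the collision scheme sketched above, dominance of `stream_a` would strengthen the compatible `stream_b` while suppressing `integrated`, with which both collide.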

Neural Correlates of Perceptual Organization


We turn now to consider what has been learnt from neurophysiological studies of auditory per-
ceptual organization. Neural responses to individual sounds are profoundly influenced by the
context in which they appear (Bar-Yosef et al. 2002). The question is to what extent the contextual
influences on neural responses reflect the current state of perceptual organization. This ques-
tion has been addressed by a number of studies ranging in focus from the single-neuron level to
large-scale brain responses, and the results provide important clues about the processing strate-
gies adopted by the auditory system.
Stimulus specific adaptation and differential suppression. Context-dependent responses
at the single-neuron level have been probed using repetitive sequences of tones within which
occasional deviant tones (with a different frequency) are inserted. Under these circumstances
many neurons in cortex (Ulanovsky et al. 2003), thalamus (Anderson et al. 2009), and inferior
colliculus (Malmierca et al. 2009) show stimulus specific adaptation (SSA), i.e. the response to a
frequently recurring ‘standard’ tone diminishes, while the response to a ‘deviant’ tone is relatively
enhanced. Furthermore, this preferential response is not solely a function of the low probability of
the deviant sounds but also reflects their novelty; i.e. the extent to which they violate a previously
established pattern (Taaseh et al. 2011). This property of deviance detection is important in that
it signals to the brain, by increased neural activity, that something new has occurred, such as the
start of a new sound source. Thus SSA may indicate the presence of a primitive novelty detector
in the brain.
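The SSA phenomenon, a diminishing response to a repeated standard alongside a relatively preserved response to a rare deviant, can be caricatured with frequency-specific adaptation channels. This is a didactic sketch, not a model of any of the cited studies; the time constant and adaptation depth are arbitrary:

```python
import math

def ssa_responses(tones, tau=5.0, depth=0.8):
    """Each frequency channel carries its own adaptation state, which
    builds with every presentation and passively recovers in between."""
    adapt = {}
    responses = []
    for f in tones:
        for g in adapt:                       # all channels recover a little
            adapt[g] *= math.exp(-1.0 / tau)
        a = adapt.get(f, 0.0)
        responses.append(1.0 - depth * a)     # adapted response to this tone
        adapt[f] = a + (1.0 - a) / tau        # the presented channel adapts further
    return responses

r = ssa_responses(["std"] * 9 + ["dev"])
# the repeated standard adapts (r[8] < r[0]), while the deviant, falling on
# an unadapted channel, evokes the full response (r[9] > r[8])
```

Note that this sketch captures only the probability-driven component of SSA; true deviance detection, a response to pattern violation beyond mere rarity (Taaseh et al. 2011), would require a predictive mechanism on top of adaptation.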
Single-neuron responses to alternating tone sequences as used in the auditory streaming para-
digm have also been investigated (Fishman et al. 2004; Bee and Klump 2005; Micheyl et al. 2005;
Micheyl et al. 2007), and it was found that even when at the start of the stimulus train the neuron
responds to both tones, with time the response to one of the tones (typically corresponding to the
best frequency of the cell) remains relatively strong, while the response to the other tone dimin-
ishes; an effect termed differential suppression. Although no behavioral tests were conducted in
these experiments, it was claimed that differential suppression was a neural correlate of perceptual
segregation (Fishman et al. 2004). This claim was supported by showing that neuronal sensitivity
to frequency difference and presentation rate was consistent with the classical van Noorden (1975)
parameter space, and that spike counts from neurons in primary auditory cortex could predict
an integration/segregation decision closely matching the results of perceptual studies in humans
(Micheyl et al. 2005; Bee et al. 2010). The differential suppression account of auditory streaming
is based on the idea that by default everything is grouped together, but that with time one part
of primary auditory cortex comes to respond to one tone stream while another part responds to
the other. The time taken for these clusters to form, and the degree to which they can be
separated, correspond to the time-varying, stimulus-dependent probability of segregation.
However, this account is challenged by three findings. Firstly, it suggests a fixed perceptual deci-
sion and offers no explanation for the multistability of streaming described in the previous section.
Secondly, the finding that segregation can be reported first contradicts the assumption of integration
as default (see The competition/selection stage section). Thirdly, it has been shown that while a
similar distinct clustering of neural responses can be found when the A and B tones are overlap-
ping in time, in this case, listeners report hearing an integrated pattern (Elhilali, Ma, et al. 2009).
So, while differential suppression may be necessary, it is not a sufficient condition for segregation.
Event-related potential correlates of sound organization. Auditory event-related brain poten-
tials (AERPs) represent the synchronized activity of large neuronal populations, time-locked to
some auditory event. Because they can be recorded non-invasively from the human scalp, one
can use them to study the brain responses accompanying perceptual phenomena, such as audi-
tory stream segregation. An AERP correlate of concurrent sound organization is found when a
partial of a complex tone is mistuned, giving rise to the perception of two concurrent sounds (see
The grouping stage section); a negative wave peaking at about 180 milliseconds after stimulus
onset, whose amplitude increases with the degree of mistuning, is elicited (Alain, Arnott et al.
2001). This AERP component, termed the ‘object-related negativity’ (ORN), is proposed to signal
the automatic segregation of concurrent auditory objects (Alain et al. 2002). An AERP correlate
of sequential sound organization was found in an experiment showing that the amplitudes of two
early sensory AERP components, the auditory P1 and N1, vary depending on whether the same
sounds are perceived as part of an integrated or segregated organization (Gutschalk et al. 2005;
Szalárdy et al. 2013).
Another electrophysiological measure that has been extensively used to probe sequential per-
ceptual organization is the Mismatch Negativity (MMN); for recent reviews see (Winkler 2007;
Näätänen et al. 2011). MMN is elicited by sounds that violate some regular auditory feature of
the preceding sound sequence; therefore, it can be used to probe what auditory regularities are
encoded in the brain. By setting up stimulus configurations which result in different regularities
depending on how the sounds are organized, MMN can be used as an indirect index of auditory
stream segregation. The first studies using MMN in this way (Sussman et al. 1999; Nager et al.
2003; Winkler, Sussman, et al. 2003) showed that the elicitation of MMN can be made dependent
on sound organization, and furthermore, that MMN is only elicited by violations of regularities
characterizing the stream to which a sound belongs, but not by violating the regularities of some
other parallel sound stream (Ritter et al. 2000; Winkler et al. 2006). These observations allowed a
number of issues, not easily accessible to behavioral methods, to be addressed. Here we highlight
three important questions: interactions between concurrent and sequential perceptual organiza-
tion, evidence for the existence of two stages in sound organization, and the role of attention in
forming and maintaining auditory stream segregation.
In a study delivering sequences of harmonic complexes in which the probability of a mistuned
component was manipulated, it was found that the ORN was reliably elicited by mistuning in
all conditions, but its magnitude increased with decreasing probability of occurrence (Bendixen,
Jones, et al. 2010). This was interpreted as reflecting a heightened response to the onset of a
possible new auditory object. The additional finding that a positive AERP component, the P3a,
usually associated with involuntary attentional switching (Escera et al. 2000), was elicited by mis-
tuned sounds in the low mistuning probability condition but not by tuned sounds in the high
mistuning probability condition, suggested that the auditory system is primarily interested in the
onset of new sound sources rather than their disappearance (Dyson and Alain 2008; Bendixen,
Jones, et al. 2010); a view further supported by results obtained in a different behavioral paradigm
(Cervantes Constantino et al. 2012).
It has been shown that the early (<100 ms) AERP correlates of auditory stream segregation,
the P1 and N1 components, are governed by the acoustic parameters (Winkler et al. 2005,
Snyder et al. 2006), whereas later (>120 ms) responses (N2) correlate with perceptual experience
(Winkler et al. 2005, Szalárdy et al. 2013). Furthermore, the amplitude of the later AERP response
correlates with the probability of reporting segregation (the build-up of streams) and it is aug-
mented by attention (Snyder et al. 2006). These results suggest that the initial grouping, which
precedes temporal integration between sound events (Yabe et al. 2001; Sussman 2005), is mainly
stimulus-driven, whereas later occurring perceptual decisions are susceptible to top-down modu-
lation, a view compatible with Bregman’s theoretical framework.
Whereas most accounts of auditory streaming assume that perceptual similarity affects group-
ing through automatic grouping processes, Jones et al. (1978) suggested that segregation results
from a failure to rapidly shift attention between perceptually dissimilar items in a sequence. The
literature is divided on the role of attention in auditory stream segregation. Some electrophysi-
ological studies suggested that auditory stream segregation can occur in the absence of focused
attention (Winkler, Sussman, et al. 2003; Winkler, Teder-Salejarvi, et al. 2003; Sussman et al.
2007). In contrast, results of some behavioral and AERP studies suggest that attention may at least
be needed for the initial formation of streams (Cusack et al. 2004; Snyder et al. 2006); however,
see Sussman et al. (2007). How can attention affect sound organization? Snyder et al. (2012) argue
for an attentional ‘gain model’ in which the representation of attended sounds is enhanced, while
unattended ones are suppressed. Due to the short latency of the observed gain modulation they
suggested that attention operates both on the group formation phase of segregation as well as the
later selection phase (Bregman 1990). However, attention can also have other effects on sound
organization; attention can retune and sharpen representations in order to improve the segrega-
tion of signals from noise (Ahveninen et al. 2011), attention to a stream improves the phase lock-
ing of neural responses to the attended sounds (Elhilali, Xiang, et al. 2009), attention allows the
utilization of learned (non-primitive) grouping algorithms, thus providing additional processing
capacities (Lavie et al. 2004); and attention can bias the competition between alternative sound
organizations (as found in the visual system; Desimone 1998). Which of these are most relevant
to auditory perceptual organization has yet to be established.
The neuroscience view of auditory objects. ‘. . . in neuroscientific terms, the concepts of an
object and of object analysis can be regarded as inseparable’ (Griffiths and Warren 2004: 887).
Thus, neuroscientific descriptions of auditory perceptual objects focus on the processes involved
in forming and maintaining object representations. The detection and representation of regulari-
ties by the brain, as indexed by the MMN, has been used to establish a functional definition of an
auditory object (Winkler et al. 2009). Using evidence from a series of MMN studies, Winkler et al.
(2009) proposed that an auditory object is a perceptual representation of a possible sound source,
derived from regularities in the sensory input (Näätänen et al. 2001) that has temporal persistence
(Winkler and Cowan 2005) and can link events separated in time (Näätänen and Winkler 1999).
This representation forms a separable unit (Winkler et al. 2006) that generalizes across natural
variations in the sounds (Winkler, Teder-Salejarvi, et al. 2003) and generates expectations of parts
of the object not yet available (Bendixen et al. 2009).
Evidence for the representation of auditory objects in cortex, consistent with this defini-
tion, is found in fMRI (Hill et  al., 2011; Schadwinkel and Gutschalk 2011), and in MEG and
multi-electrode surface recording studies of people listening to two competing talkers (Ding
and Simon 2012; Mesgarani and Chang 2012). By decoding MEG signals correlated with the
amplitude fluctuations of each of the speech signals it was shown that the brain preferentially
locks onto the temporal patterns of the attended talker, and that this representation adapts
to the sound level of the attended talker and not the interfering one (Ding and Simon 2012).
Multi-electrode recordings in non-primary auditory cortex similarly show that the brain locks
onto critical features in the attended speech stream, and furthermore that a simple classifier built
from a set of linear filters can be used to decode both the attended speaker and the words being
uttered (Mesgarani and Chang 2012). Other experiments showing that context-dependent pre-
dictive activity in the hippocampus encoded temporal relationships between events and corre-
lated with subsequent recall of episodes (Paz et al. 2010), suggest that the hippocampus may also
be involved, although this work used multisensory cinematic material, so it is not clear whether
the findings hold for sounds alone.
While traditional psychological accounts implicitly or explicitly refer to representations
of objects, there are models of auditory streaming and perception in general, which are not
concerned with positing a representation that would directly correspond to the contents of
conscious perception; we have already referred to two such theories. Although hierarchical
predictive coding (e.g. Friston and Kiebel 2009) includes predictive memory representations,
which are in many ways compatible with the notion of auditory object representations (Winkler
and Czigler 2012), no explicit connection with object representations is made. Indeed, whereas
predictive coding models have been successful in matching the statistics of perceptual decisions
(Lee and Mumford 2003; Aoyama et al. 2006; Yu 2007; Garrido et al. 2009; Daunizeau et al.
2010), they are better suited to describing the neural responses observed during perception
(Grill-Spector et al. 2006), than perceptual experience per se. Shamma and colleagues’ temporal
coherence model of auditory stream segregation (Elhilali and Shamma 2008; Elhilali, Ma, et al.
2009; Shamma et al. 2011) provides another way to avoid the assumption that object represen-
tations are necessary for sound organization; instead it is proposed that objects are essentially
whatever occupies the perceptual foreground and exist only insofar as they do occupy the fore-
ground. Temporal coherence can be calculated using relatively short time windows without
building a description of the past stimulation. Thus auditory streams can be separated in a
single pass. It is also claimed that object formation (binding) occurs late, i.e. the composite mul-
tifeatured percept of conscious awareness is formed through selective attention to some feature
that causes all features correlated with the attended feature to emerge together into perceptual
awareness (and thus form a perceptual object), while the background remains undifferentiated
(Shamma et al. 2011). In summary, there is currently little consensus on the role of auditory
object representations in perceptual organization and the importance placed on object repre-
sentations by the various models differs markedly.
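Temporal coherence in the sense of Shamma and colleagues can be approximated, very roughly, by correlating channel envelopes over a short analysis window: channels that rise and fall together bind into one stream, while anti-correlated channels segregate. The code below is our own simplification, not the published model:

```python
import numpy as np

def coherence_matrix(envelopes):
    """Pairwise correlation of channel envelopes over one analysis
    window: a minimal stand-in for the temporal coherence cue."""
    E = np.asarray(envelopes, dtype=float)
    E = E - E.mean(axis=1, keepdims=True)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E @ E.T

t = np.arange(200)
a = (np.sin(2 * np.pi * t / 25) > 0).astype(float)  # envelope of the A tones
b = np.roll(a, 12)                                   # B tones in the alternating gaps
C = coherence_matrix([a, b, a])
# channels 0 and 2 (both A) are perfectly coherent; A and B alternate,
# so their envelopes are strongly anti-correlated and segregate
```

Because such a window needs only the recent past, the computation requires no stored description of earlier stimulation, which is what allows streams to be separated in a single pass.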

Conclusions and Future Directions


The Gestalt principles and their application to auditory perception instantiated in Bregman’s
two-stage auditory scene analysis framework have provided the impetus and initial basis for
understanding auditory perceptual organization. Recent proposals have extended this framework
in interesting ways. Specifically, a more precise definition of auditory objects (Winkler et al. 2009)
and an explanation for how perceptual organization can emerge through parallel processes of
construction and competition (Winkler et al. 2012; Mill et al. 2013), have been formed by inte-
grating Gestalt ideas (Köhler 1947; Bregman 1990) with the notion of perception as a ratiomor-
phic (Brunswik 1955) inference process (Helmholtz 1885; Gregory 1980; Friston 2005). One key
idea has been to show that perceptual object representations form plausible candidates for the
generative models assumed by predictive coding theories (Winkler and Czigler 2012). The con-
struction of proto-objects on the basis of pattern detection (closure) is well supported by recent
experiments showing that people can detect regularities very quickly (Teki et al. 2011). As dis-
cussed above, the general approach of predictive coding (Friston 2005) and predictive auditory
object representation (Winkler et al. 2009) are compatible (Winkler and Czigler 2012) although
they have somewhat different aims. However, as yet, there have been few attempts to confront
the complexity of real auditory scenes in which grouping and categorization cues are not imme-
diately available; but see Yildiz and Kiebel (2011).
Progress may come from building bridges between competing theories. The instantiation of the
principle of common fate in the form of temporal coherence (Shamma et al. 2011) suggests a basis
for linking features and possibly events within a proto-object. Due to its generic nature, temporal
coherence as a cue is not limited to discrete, well-defined sound events and can thus help to
generalize models that rely on such events. The suggestion of a hierarchical decomposition of the
sound world
into objects which are differentiated by attention and task demands, while others remain rather
more amorphous (Cusack and Carlyon 2003), can also be accommodated within the framework
of predictive object representations. The patterns or regularities encoded by proto-objects rep-
resent distributions over featural and temporal structures. Thus it is entirely feasible for some
proto-objects to represent well-differentiated and separated patterns, such as the voice of the
person to whom one is talking, while others may represent the undifferentiated combination of
background sounds, such as the background babble at a cocktail party (Cherry 1953). Finally,
decomposing complex sounds and finding events in long continuous sounds (Coath and Denham
2007; Yildiz and Kiebel 2011) may feed into models concerned with grouping events into auditory
object representations.
We started out by highlighting the two questions that the auditory system needs to answer: ‘What
is out there?’ and ‘What will it do next?’. In this chapter, we outlined the main approaches currently
being pursued to provide insights into how the human auditory system answers these questions
quickly and accurately under a variety of conditions, which can dramatically affect the cues that
are available. We suggest that in order to deliver robust performance within a changing world,
the human brain builds auditory object representations that are predictive of upcoming events,
and uses these in the formation of perceptual organizations that represent its interpretation of the
world. Flexible switching between candidate organizations ensures that the system can explore
alternative interpretations, and revise its perceptual decisions in the light of further information.
However, there is much that remains to be understood and current models are far from matching
the capabilities of human auditory perception. Perhaps, as outlined above, convergence between
the alternative approaches will provide a more satisfactory account of the processes underlying
auditory perceptual organization.

Acknowledgements
This work was supported in part by the Lendület project awarded to István Winkler by the
Hungarian Academy of Sciences (contract number LP2012-36/2012).

References
Ahveninen, J., M. Hamalainen, I. P. Jaaskelainen, S. P. Ahlfors, S. Huang, F. H. Lin, T. Raij, M. Sams, C.
E. Vasios, and J. W. Belliveau (2011). ‘Attention-Driven Auditory Cortex Short-Term Plasticity Helps
Segregate Relevant Sounds from Noise’. Proc Natl Acad Sci USA 108(10): 4182–4187.
Alain, C., S. R. Arnott, and T. W. Picton (2001). ‘Bottom-Up and Top-Down Influences on Auditory
Scene Analysis: Evidence from Event-Related Brain Potentials’. J Exp Psychol Hum Percept Perform
27(5): 1072–1089.
Alain, C., B. M. Schuler, and K. L. McDonald (2002). ‘Neural Activity Associated with Distinguishing
Concurrent Auditory Objects’. J Acoust Soc Am 111(2): 990–995.
Alais, D. and R. Blake (this volume). ‘Multistability and Binocular Rivalry’. In The Oxford Handbook of
Perceptual Organization, ed. J. Wagemans (Oxford: Oxford University Press).
Anderson, L. A., G. B. Christianson, and J. F. Linden (2009). ‘Stimulus-Specific Adaptation Occurs in the
Auditory Thalamus’. J Neurosci 29(22): 7359–7363.
Andreou, L.-V., M. Kashino, and M. Chait (2011). ‘The Role of Temporal Regularity in Auditory
Segregation’. Hear Res 280(1–2): 228–235.
Anstis, S. and S. Saida (1985). ‘Adaptation to Auditory Streaming of Frequency-Modulated Tones’. J Exp
Psychol Hum Percept Perform 11: 257–271.
Aoyama, A., H. Endo, S. Honda, and T. Takeda (2006). ‘Modulation of Early Auditory Processing by
Visually Based Sound Prediction’. Brain Res 1068(1): 194–204.
Bar, M. (2007). ‘The Proactive Brain: Using Analogies and Associations to Generate Predictions’. Trends
Cogn Sci 11(7): 280–289.
Bar-Yosef, O., Y. Rotman, and I. Nelken (2002). ‘Responses of Neurons in Cat Primary Auditory Cortex to
Bird Chirps: Effects of Temporal and Spectral Context’. J Neurosci 22(19): 8619–8632.
Bartlett, F. C. (1932). Remembering: A Study in Experimental and Social Psychology (Cambridge: Cambridge
University Press).
Bee, M. A. and G. M. Klump (2005). ‘Auditory Stream Segregation in the Songbird Forebrain: Effects of
Time Intervals on Responses to Interleaved Tone Sequences’. Brain Behav Evol 66(3): 197–214.
Bee, M. A., C. Micheyl, A. J. Oxenham, and G. M. Klump (2010). ‘Neural Adaptation to Tone Sequences in
the Songbird Forebrain: Patterns, Determinants, and Relation to the Build-Up of Auditory Streaming’.
J Comp Physiol A Neuroethol Sens Neural Behav Physiol 196(8): 543–557.
Bendixen, A., E. Schröger, and I. Winkler (2009). ‘I Heard That Coming: Event-Related Potential Evidence
for Stimulus-Driven Prediction in the Auditory System’. J Neurosci 29(26): 8447–8451.
Bendixen, A., S. L. Denham, K. Gyimesi, and I. Winkler (2010). ‘Regular Patterns Stabilize Auditory
Streams’. J Acoust Soc Am 128(6): 3658–3666.
Bendixen, A., S. J. Jones, G. Klump, and I. Winkler (2010). ‘Probability Dependence and Functional
Separation of the Object-Related and Mismatch Negativity Event-Related Potential Components’.
Neuroimage 50(1): 285–290.
Bendixen, A., T. M. Bőhm, O. Szalárdy, R. Mill, S. L. Denham, and I. Winkler (2012). ‘Different Roles of
Similarity and Predictability in Auditory Stream Segregation’. Learn Percept, in press.
Bertrand, O. and C. Tallon-Baudry (2000). ‘Oscillatory Gamma Activity in Humans: A Possible Role for
Object Representation’. Int J Psychophysiol 38(3): 211–223.
Bőhm, T. M., L. Shestopalova, A. Bendixen, A. G. Andreou, J. Georgiou, G. Garreau, P. Pouliquen, A.
Cassidy, S. L. Denham, and I. Winkler (2013). ‘The Role of Perceived Source Location in Auditory
Stream Segregation: Separation Affects Sound Organization, Common Fate Does Not’. Learn Percept
5(Suppl 2): 55–72.
Bregman, A. S. and G. Dannenbring (1973). ‘The Effect of Continuity on Auditory Stream Segregation’.
Percept Psychophys 13: 308–312.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound (Cambridge,
MA: MIT Press).
Bregman, A. S., P. A. Ahad, P. A. Crum, and J. O’Reilly (2000). ‘Effects of Time Intervals and Tone
Durations on Auditory Stream Segregation’. Percept Psychophys 62(3): 626–636.
Brochard, R., C. Drake, M. C. Botte, and S. McAdams (1999). ‘Perceptual Organization of Complex
Auditory Sequences: Effect of Number of Simultaneous Subsequences and Frequency Separation’. J Exp
Psychol Hum Percept Perform 25(6): 1742–1759.
Brunswik, E. (1955). ‘Representative Design and Probabilistic Theory in a Functional Psychology’.
Psychological Review 62(3): 193–217.
Carlyon, R. P. (2004). ‘How the Brain Separates Sounds.’ Trends Cogn Sci 8(10): 465–471.
Cervantes Constantino, F., L. Pinggera, S. Paranamana, M. Kashino, and M. Chait (2012). ‘Detection of
Appearing and Disappearing Objects in Complex Acoustic Scenes.’ PLoS One 7(9): e46167.
Cherry, E. C. (1953). ‘Some Experiments on the Recognition of Speech, with One and with Two Ears’.
Chapter 30

Tactile and haptic perceptual organization

Astrid M. L. Kappers and Wouter M. Bergmann Tiest

Introduction
Tactile perception refers to perception by means of touch mediated only through the cutane-
ous receptors (mechanoreceptors and thermoreceptors) located in the skin (Lederman and
Klatzky, 2009; Loomis and Lederman, 1986). When also kinesthetic receptors (mechanorecep-
tors embedded in muscles, joints and tendons) are involved, the term haptic perception is used.
Four main types of cutaneous mechanoreceptors have been distinguished: Merkel nerve endings
(small receptive field, slowly adapting), Meissner corpuscles (small receptive field, fast adapting),
Pacinian corpuscles (large receptive field, fast adapting), and Ruffini endings (large receptive field, slowly adapting). Together, these are responsible for the human's wide range of sensitivities
to all kinds of stimulation, such as pressure, vibration, and skin stretch. The kinesthetic sense, or
kinesthesia, contributes to the perception of the positions and movement of the limbs (Proske and
Gandevia, 2009). The main kinesthetic receptor is the muscle spindle that is sensitive to changes
in length of the muscle; its sensitivity can be adapted to the circumstances. Most of our everyday
activities involving touch (think of handling and identifying objects, maintenance of body pos-
ture, sensing the texture of food in the mouth, estimating the weight of an object, etc.) fall into the
class of haptic perception.
An interesting difference with the sense of vision is that visual receptors are restricted to a small
well-delineated organ (namely the eye), whereas touch receptors are distributed all over the body.
However, the sensitivity of these receptors varies widely over the body. A commonly used measure
for this sensitivity is the two-point threshold: the smallest distance between two stimuli at which they can still be distinguished from a single stimulus. Such thresholds are
typically 2–4 mm for the fingertips, but can be more than 40 mm for the calf, thigh, and shoulder
(Lederman and Klatzky, 2009; Weinstein, 1968). Another interesting fact compared with vision is
that the extremities (limbs) are not only exploratory sense organs, but they are also performatory
motor organs (Gibson, 1966).
The availability of tactual information is usually taken for granted and, as a consequence, its importance is severely underestimated. The importance of haptics, or of touch in general, is usually illustrated by referring to its significance for individuals who lack the use of one of the
other major senses, particularly sight. Blind (or blindfolded) humans clearly have to rely heavily
on the sense of touch. However, this observation disregards the fact that in daily life touch is of
vital importance for everyone, not just for the visually disabled: living without the sense of touch
is virtually impossible (e.g. Cole and Paillard, 1995). Patients suffering from peripheral neuropa-
thy (a condition that deafferents the limbs, depriving the person of cutaneous and haptic touch)
are unable to control their limbs without visual feedback: in the dark or when covered under a
blanket, they are completely helpless. Such patients are fortunately rare, but they make us aware of
our reliance on touch in basically all our daily activities.
Humans are able to perceive a wide range of properties by means of touch. Some of these are
shared with vision, for example, shape and size, but others are specific for touch, such as weight,
compliance, and temperature. Properties like texture can be perceived both visually and haptically,
but in quite different ways, and the two can contradict each other: an object might look smooth,
but feel rough and vice versa. In 1987, Lederman and Klatzky made an inventory of the typi-
cal hand movements humans make when assessing object and material properties. Information
about weight, size, texture, shape, compliance, and temperature can be obtained by unsupported
holding, enclosure, lateral movement, contour following, pressure and static touch, respectively
(Lederman and Klatzky, 1987). These so-called exploratory procedures not only suffice to assess these properties; they are optimal and often even necessary.
This chapter aims to give a concise overview of the human haptic perception of object and
spatial properties. Insight into perceptual organization can often be obtained by studying percep-
tual illusions, as many of these rely on tricks with perceptual organization. The theoretical basis
for this idea lies in the way information from the world around us is processed. A great deal of our
representation of the world is not actually perceived, but supplemented by our brain according
to certain mechanisms. When this process goes wrong, as is the case with illusions, these mecha-
nisms are laid bare and their operation can be fathomed. The topics in this chapter will, therefore,
where possible, be illustrated with tactile or haptic illusions (e.g. Hayward, 2008; Lederman and
Jones, 2011; Robertson, 1902; Suzuki and Arashida, 1992).

Object Properties
The question ‘What is an object?’ or, in particular, ‘How do humans segregate figure from ground?’
has been investigated extensively in vision. In touch, however, only a few studies are relevant in
this respect. For example, Pawluk and colleagues (2010) asked observers to distinguish between
figure and ground by means of a ‘haptic glance’, a very brief gentle contact with all five fingers of a
hand. They showed that such a brief contact is, indeed, sufficient for the distinction between figure
and ground. A similar pop-out phenomenon, immediately separating different aspects of a haptic
scene, has been reported for haptically relevant properties such as roughness (Plaisier et al., 2008)
and compliance (van Polanen et al., 2012). Some other studies report on numerosity perception.
By actively grasping a small number of objects (in this case spheres), one can rapidly
determine the correct number of objects (Plaisier et al., 2009), which gives clear evidence of fast
object individuation by touch.
This section will focus on the haptic perception of object properties, such as curvature, shape,
size, and weight, which have received considerable attention. It will also be shown that some of these
properties are susceptible to strong illusions and these are important for our understanding of
how and what aspects of objects can be perceived by touch.

Curvature
An important aspect of a smooth shape is its curvature, so it is of interest whether and how well humans can perceive and discriminate curvature, and what perceptual mechanism underlies haptic curvature perception. The first studies on curvature perception focused on the question of how well humans could decide whether a stimulus was concave, straight, or convex. Hunter (1954)
and later Davidson (1972) presented curved strips on the horizontal plane and found that what
observers perceive as straight is actually somewhat concave (the middle of the stimulus bent away
from the observer). They also compared the performance of blind and blindfolded sighted observers
and their conclusion was that blind observers give more ‘objective’ (that is, veridical) responses.
Davidson found that if the sighted observers were instructed to use the scanning strategies of
the blind, their performance improved. He concluded that the exploratory movement of an arm
sweep might obscure the stimulus curvature.
Gordon and Morrison (1982) were interested in how well observers could discriminate curved
from flat stimuli. Using small curved stimuli explored by active touch, they could express the
discrimination threshold in terms of geometrical stimulus properties: at threshold, the base-to-peak height of the curved stimulus divided by half its length is constant (see Figure 30.1(a)). This expression
indicates the overall gradient of the stimulus. To exclude the possible influence of kinesthetic perception on curvature discrimination, Goodwin et al. (1991) pressed small curved
stimuli onto the fingers of observers, so that only cutaneous receptors in the finger pads could play
a role. In this way, a 10 per cent difference in curvature could be detected. In a subsequent study
(Goodwin and Wheat, 1992), they found that discrimination thresholds remained the same even
if contact area was kept constant, so contact area was not the determining factor for curvature dis-
crimination. However, discrimination performance increased with contact area. For stimuli with
a larger contact area, the base-to-peak height is also larger, so their finding was consistent with
the conclusion of Gordon and Morrison that the stimulus gradient determines the discrimination
threshold (see Figure 30.1).
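Gordon and Morrison's threshold expression can be made concrete with a little circular-arc geometry. The sketch below is illustrative only (the curvature and base-length values are assumed, not taken from the studies above): it computes the base-to-peak height of an arc and the resulting overall gradient, confirming that a longer stimulus of the same curvature presents a larger gradient.

```python
import math

def arc_gradient(curvature, base_length):
    """Overall gradient of a circular-arc stimulus: the base-to-peak
    height divided by half the base length."""
    half = base_length / 2.0
    radius = 1.0 / curvature
    if half >= radius:
        raise ValueError("base length too large for this curvature")
    height = radius - math.sqrt(radius**2 - half**2)  # base-to-peak height
    return height / half

# Same curvature (2 m^-1, i.e. a 50 cm radius), two base lengths:
# the longer stimulus presents the larger overall gradient.
g_short = arc_gradient(2.0, 0.02)   # 2 cm base
g_long = arc_gradient(2.0, 0.10)    # 10 cm base
print(g_short, g_long)
```

For shallow stimuli the gradient is roughly the curvature times a quarter of the base length, which is why, at a fixed gradient threshold, the just-discriminable curvature drops as the stimulus gets longer.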
Pont et al. (1997) used stimuli similar in curvature and size to those of Hunter (1954) and Davidson (1972), but presented them upright and performed discrimination instead

[Figure 30.1(a)–(c): three curved stimulus profiles, each annotated with its gradient, base-to-peak height, and half base length]
Fig. 30.1  Illustration of the threshold expression of Gordon and Morrison (1982). (a) A curved stimulus has a base-to-peak height and a base length; the base-to-peak height divided by half the base length gives the gradient or slope. (b) A stimulus with a higher curvature has a larger base-to-peak height if the length is the
same as in (a). As a consequence, the gradient is also larger. (c) Stimulus with the same curvature
as in (a), but of smaller length. The gradient is smaller than in (a) because of the nonlinear relation
between slope and stimulus length.
of classification experiments. In various conditions, observers had to place their hand on two suc-
cessive stimuli and they had to decide which of the two had the higher curvature. Figure 30.2(a)–
(c) shows a few of their experimental conditions: stimuli could be placed along the various fingers
as in (a), across the fingers at several locations as in (b), or even at the dorsal side of the hand as in
(c). Consistent with the previous findings, they found that the gradient of the stimuli determined
the curvature discrimination threshold. As the dorsal side of the hand contains far fewer cutaneous mechanoreceptors than the palmar side, the worse discrimination performance with the dorsal
side of the hand showed the importance of the cutaneous receptors in curvature perception. They
also found that performance with static and with dynamic touch of the stimuli did not differ significantly (Pont et al., 1999), possibly because of the important role the cutaneous receptors play in discrimination performance.
If the overall gradient or slope of the stimulus plays a major role in curvature discrimination
performance, then height and local curvature are of minor importance. Pont et al. (1999) inves-
tigated this explicitly by creating a new set of stimuli in which the order of information that
the stimulus contained was varied (see Figure 30.2(d)–(f)). The first stimulus set contained only
height differences (zeroth order information), the second set contained both height differences
and slopes (zeroth and first order information) and the third set contained in addition local curva-
ture information (zeroth, first and second order information). Participants placed their fingers on
the stimuli as shown in Figure 30.2(d)–(f) and had to decide for each stimulus pair (within a set),
which of the two was more convex. All thresholds could be expressed in terms of base-to-peak
height. Convincingly, the thresholds for the zeroth order set were much higher than for the other two sets. There was no significant difference in thresholds if local curvature was added to
the stimuli, so thresholds are indeed based on the gradient information.
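The three stimulus sets can be pictured as samples of a circular arc taken at the finger positions and truncated at different orders. The following sketch is a schematic reconstruction with assumed dimensions, not the actual stimulus specifications of Pont et al.: it computes, at each finger position, the local height, slope, and curvature; the zeroth order set keeps only the heights, the first order set keeps heights and slopes, and the second order set keeps all three.

```python
import math

def sample_arc(curvature, xs):
    """Sample a convex circular arc (apex at x = 0) at horizontal
    positions xs. Returns (height, slope, curvature) per position:
    height relative to the apex (zeroth order), local slope (first
    order), and local curvature (second order)."""
    r = 1.0 / curvature
    out = []
    for x in xs:
        height = math.sqrt(r**2 - x**2) - r   # <= 0 away from the apex
        slope = -x / math.sqrt(r**2 - x**2)   # surface falls off on either side
        out.append((height, slope, curvature))
    return out

# Hypothetical finger positions (metres) on a stimulus of curvature 2 m^-1.
fingers = [-0.03, -0.01, 0.01, 0.03]
samples = sample_arc(2.0, fingers)
zeroth_order = [(h, 0.0, 0.0) for h, _, _ in samples]   # heights only
first_order = [(h, s, 0.0) for h, s, _ in samples]      # heights + slopes
second_order = samples                                  # full local information
print(first_order)
```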
The experiments on stimulus order by Pont et al. (1999) were necessarily done using static
touch. Dostmohamed and Hayward (2005) designed a haptic device that made it possible to per-
form similar experiments using active touch. Participants had to place a finger on a small metal

Fig. 30.2  Illustration of some of the conditions in the experiments by Pont and colleagues (1997,
1999). (a) Stimulus placed along the index finger. (b) Stimulus placed across the fingers. (c) Stimulus
presented dorsally. (d) Stimulus just containing height differences (zeroth order information). (e)
Stimulus containing height and slope differences (zeroth and first order information). (f) Stimulus
containing height, slope and curvature information (zeroth, first, and second order information).
plate that, when actively moved, followed the trajectory of a preprogrammed
stimulus shape. In this way, Wijntjes et al. (2009) could compare discrimination performance with
the same stimulus shapes Pont et al. (1999) used. They also included a condition directly touching
the real curved shapes. Their results were consistent with those obtained for static touch: height
information alone is not sufficient, but as soon as first order information (slope) is present, perfor-
mance is just as good as with the curved shapes. Therefore, the determining factor for curvature
discrimination performance is the overall gradient in the stimulus. It is clear that the principles
of perceptual organization are at work here: from just the orientation of the surface in a few loca-
tions, the entire curved surface is reconstructed according to the principle of good continuation.
Not only is the surface reconstructed, its curvature can also be perceived as accurately as in the
case of a complete surface.

Illusions of curvature
Although humans are sensitive to even small differences in curvature, their perception of
curvature is not veridical. Both Hunter (1954) and Davidson (1972) reported that what is
perceived as straight is actually curved away from the observer. Davidson’s explanation was
that a natural hand movement also follows a curved line, obscuring the stimulus’ curvature.
Vogels et al. (1996, 1997) found that a three-dimensional surface that is perceived as flat cor-
responds to a geometrically concave surface. In other words, an actually flat surface is usually
perceived as convex. There are other, even more pronounced, curvature illusions that will be
described below.

Anisotropy of the hand


Pont et al. (1999) not only showed that curvature discrimination thresholds decreased with
increasing stimulus length, they also showed that perceived curvature was larger for longer
stimuli. This has an interesting implication: as human hands are usually longer than they are wide,
the perceived curvature of a sphere would be larger along the fingers than across the fingers. Pont
et al. (1998) tested this experimentally and could confirm the prediction that spherical objects are
perceived as ellipsoidal.

Curvature after effects


Gibson (1933) was the first to show that touching a curved strip leads to after effects. He asked
observers to move back and forth along a curved strip for 3 minutes, and he reported that a
subsequently touched straight strip felt curved in the opposite direction. Vogels et al. (1996)
performed extensive experiments investigating the curvature after effect of touching a curved
three-dimensional shape. In their experiments, observers, seated behind a curtain, had to place
their hand on a curved adaptation surface for only 5 s, and then decide for the next touched
shape presented at the same location whether it was convex or concave. By systematically vary-
ing the curvatures of both the adaptation and the test surfaces, they established that the strength
of the after effect was about 20 per cent of the curvature of the adaptation shape. Moreover, they
showed that an adaptation time of only 2 s was sufficient to obtain a measurable after effect and
after 10 s the effect was already at its maximum. On the other hand, a delay between touching
the adaptation surface and the test surface of 40 s could not eliminate the after effect.
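The after-effect measurements can be summarized as a simple linear shift of the point of subjective flatness; treating the effect as linear in the adaptation curvature is a simplification of the actual psychometric data.

```python
def subjectively_flat_curvature(adapt_curvature, strength=0.2):
    """Curvature (convex positive) of the test surface that feels flat after
    adaptation.  Vogels et al. (1996) report an after-effect of about 20 per
    cent of the adaptation curvature; the linear form is a simplification."""
    return strength * adapt_curvature

# After adapting to a convex surface, a slightly convex test surface feels
# flat, and a truly flat surface consequently feels concave.
shift = subjectively_flat_curvature(4.0)   # 20 per cent of the adaptation value
```

On this account, the sign of the shift reverses with the sign of the adaptation curvature, matching the "opposite direction" after-effect Gibson described.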
In a follow-up study, Vogels et al. (1997) tried to locate the origin of this curvature after effect.
During a delay between touching the adaptation and test surfaces, observers were instructed to
either keep their hand still in the air, make a fist, or bend and stretch their hand periodically. In this
way, they varied the degree in which the cutaneous, joint, and muscle receptors were stimulated
626 Kappers and Bergmann Tiest

during the decay. As they did not find differences between the three conditions, they concluded
that peripheral receptors do not play a major role in causing the after effect. In a small experiment
with only two participants, they also tested whether the after effect transferred to the other hand.
As they did not find an indication of such a transfer, they had to conclude that the origin of the
after effect cannot be at a high level either.
Van der Horst et al. (2008a) not only found a substantial after effect when the curved surface
was touched by just a single finger, they also found a partial transfer of the after effect to other
fingers, both of the same hand and of the other hand. Because the transfer is only partial, they
conclude that the major part of the after effect is caused at a level where the individual fingers are
represented, but that in addition a part has to occur at a level shared by the fingers. Interestingly, in
another study Van der Horst et al. (2008b) found a full transfer of the after effect when the curved
surfaces were touched dynamically. They conclude that the level of the representation of curvature
apparently depends on the way the information is acquired (see Kappers (2011) for an overview
of all after effect studies).

Curvature perception induced by force


Robles-De-La-Torre and Hayward (2001) designed a haptic device with which they could com-
bine a geometric stimulus presentation with a horizontal force profile. Among other things, they found
that if a flat physical surface was presented together with a force profile of either a bump or a hole,
observers perceived a bump or a hole. Even when a virtual bump or hole was combined with a
physical hole or bump, the virtual stimulus dominated the percept. They concluded that force
could overcome object geometry in the active perception of curvature.

Shape
Curvature is an important property of smooth shapes, but it is also of interest to investigate the
perception of shape itself. A first study was conducted by Gibson (1963), who used a set of smooth
solid objects that were ‘equally different’ from one another to perform matching and discrimi-
nation experiments. He concluded that blindfolded observers could distinguish such shapes by
touch. Klatzky and colleagues (1985) used a large set of common daily life objects, such as a comb,
wallet, screw, and tea bag, and they established that such three-dimensional objects could be rec-
ognized accurately and rapidly by touch alone. Norman and colleagues (2004) made plastic copies
of bell peppers, which they used in matching and discrimination experiments, both unimodally
(touch or vision) and bimodally (touch and vision). As the results in the various conditions were
quite similar, they concluded that the visual and haptic representations of three-dimensional
shape are functionally overlapping.
A different approach was followed by van der Horst and Kappers (2008). They used a set of
cylindrical objects with different elliptical cross-sections and a set of blocks with rectangular
cross-sections. The task of the observers was to grasp (without lifting) a pair of objects and deter-
mine which of the two had the circular (for the cylinders) or square (for the blocks) cross-section.
They found that an aspect ratio (i.e. ratio between the longer and the shorter axes) of 1.03 was
sufficient to distinguish circular from elliptical, but an aspect ratio of 1.11 was necessary for dis-
tinguishing square from rectangular. This was somewhat surprising, since the aspect ratio is more
readily available in the block than in the cylinders. They concluded that apparently the curva-
ture information present in the cylinders could be used in a reliable manner. Using a similar set
of objects, Panday et al. (2012) studied explicitly how local object properties (such as curvature
variation and edges) influence global object perception. They found that both
curvature and curvature change could enhance performance in an object orientation detection
task, but edges deteriorated performance.

Size
Objects are always extended and thus have a certain size. Size can be measured in one, two, or
three dimensions, corresponding to length, area, and volume. In this section, we will restrict
ourselves to the haptic perception of length and volume.

Length
An object’s length can basically be perceived in two ways. The first is the finger-span method, in
which the object is enclosed between thumb and index finger. This method is restricted to lengths
of about 10  cm or less, depending on hand size. The best accuracy (discrimination threshold)
with which lengths can be perceived in this way is about 0.5 mm (1 per cent) for a 5-cm reference
length (Langfeld, 1917). For greater lengths, the thresholds increase somewhat up to about 3 mm
for a 9-cm reference length (Stevens and Stone, 1959).
For even larger objects, the finger-span method cannot be used and movement is required to
perceive the object’s length. When moving the finger over the side of an object, two sources of
information are available. First, the distance travelled can be derived from the kinesthetic
information from muscles and joints. Second, it can be extracted from the cutaneous information
of the fingertip moving over the surface by estimating the movement speed and duration.
Length perception with the movement method is a lot less accurate than the finger span method.
Based on kinesthetic information, the length discrimination threshold for an 8-cm reference
length is 11 mm (14 per cent), while based on cutaneous information, it is 25 mm (32 per cent)
(Bergmann Tiest et al., 2011). In conclusion, haptic length perception can be done with either the
finger-span method, kinesthetic movement information, or cutaneous movement information,
with varying degrees of accuracy.
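Expressed as Weber fractions (threshold divided by reference length), the accuracies cited above can be compared directly; the threshold and reference values below are the ones reported in the studies mentioned.

```python
# Length discrimination thresholds cited above, as (threshold, reference) in cm
methods = {
    "finger span": (0.05, 5.0),   # Langfeld (1917), 5-cm reference
    "kinesthetic": (1.1, 8.0),    # Bergmann Tiest et al. (2011), 8-cm reference
    "cutaneous": (2.5, 8.0),      # Bergmann Tiest et al. (2011), 8-cm reference
}

weber_fractions = {name: jnd / ref for name, (jnd, ref) in methods.items()}

# The finger-span method is an order of magnitude more precise than either
# of the movement-based methods.
assert (weber_fractions["finger span"]
        < weber_fractions["kinesthetic"]
        < weber_fractions["cutaneous"])
```
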

Illusions of length
A well-known illusion in haptic length perception is the radial-tangential illusion, in which lengths
explored in the radial direction (away from and towards the body) are perceived to be larger than
lengths explored in the tangential direction (parallel to the frontoparallel plane; Armstrong and
Marks, 1999). This indicates that haptic space is anisotropic and that the perceived length of an
object depends on its orientation.
Regarding the different methods, it has been found that lengths perceived by the
finger-span method are judged to be shorter than by the movement method, both in a
perception-and-reproduction task (Jastrow, 1886) and in a magnitude estimation task using a
visual scale (Hohmuth et al., 1976). The difference in perceived length between the methods was
as high as a factor of 2.5 in some cases. Furthermore, lengths perceived using the movement
method with only cutaneous information were underestimated more than with only kinesthetic
information (Terada et al., 2006). When kinesthesia and cutaneous perception yielded conflicting
information, the estimate was found to be based on the greatest length.
Finally, the well-known Müller-Lyer illusion, in which the length of a line is perceived differ-
ently depending on the type of arrowheads present at the ends, has been demonstrated in touch
as well as in vision (Millar and Al-Attar, 2002; Robertson, 1902). All in all, these illusions indicate
that haptic length perception is not independent of the direction or the type of movements made,
nor of the direct environment of the object to be perceived.

Volume
Although quite a number of studies focused on the perception of weight (see below), which
usually correlates with object size unless different materials are compared, only a few studies
investigated the haptic perception of volume. Volume is typically assessed by enclosing the
object with the hand(s) (Lederman and Klatzky, 1987). Kahrimanovic et al. (2011b) investigated
the just noticeable difference (JND) for the volumes of spheres, cubes, and tetrahedra that fitted
in the hand. They found that for the smaller stimuli of their set, the volumes of tetrahedra were
significantly more difficult to discriminate than those of cubes and spheres, with Weber fractions
of 0.17, 0.15, and 0.13, respectively. The availability of weight information did not improve
performance.
As visual estimates of volume were found to be biased depending on the object geometry,
Krishna (2006) decided to investigate this so-called ‘elongation bias’ haptically. She found that in
touch, an effect opposite to that in vision occurred: a tall glass was perceived as larger in volume
than a wide glass of the same volume. Her conclusion was that, whereas in vision, ‘height’ is a sali-
ent feature, for touch ‘width’ would be more salient. As objects can differ along more geometric
dimensions than just height or width, Kahrimanovic et al. (2010) investigated volume discrimin-
ation of spheres, cubes and tetrahedra (see Figure 30.3 left). These stimuli were of a size that fitted
in one hand. They found substantial biases: tetrahedra were perceived as much larger than spheres
(about 60 per cent) and cubes (about 30 per cent). Somewhat smaller, but still substantial biases
were found when observers had access to the mass (weight) of the object (although they were not
told explicitly that weight correlated with volume).
The subsequent step in the research was to investigate the physical correlates of these volume
biases. If the volumes of spheres, cubes, and tetrahedra are the same, then their surface areas
and maximal lengths, among other properties, are not identical. It turned out that for volumes that were per-
ceived as being equal, the surface areas of the objects were almost the same (Kahrimanovic et al.,
2010). If participants were instructed to compare surface area of these shapes, their performance
was almost unbiased. This outcome makes sense, if one realizes that surface area correlates with
skin stimulation, which is a more direct measure of object size than the more ‘abstract’ volume.
When the surface area cue was removed by using wire frame versions of the cubes and tetrahedra,
biases increased to an average of 69 per cent in the cube-tetrahedron comparison. In this condi-
tion, the maximum length between two vertex points was the factor correlating with the partici-
pant’s perceived volume. Again, this can be understood by realizing that now length is the more

Fig. 30.3  Examples of tetrahedral stimuli as used by Kahrimanovic et al. (2010, 2011).


direct stimulus compared with volume. It seems to be a general principle of haptic perceptual
organization that volume is perceived on the basis of the most readily available geometric prop-
erty of the stimulus.
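The geometric fact underlying this account is easy to verify: for a fixed volume, a regular tetrahedron has a larger surface area than a cube, which in turn has a larger area than a sphere. The sketch below uses standard mensuration formulas; the volume value is arbitrary.

```python
import math

V = 100.0  # common volume in cm^3 (arbitrary)

# Sphere: V = (4/3) * pi * r^3, area = 4 * pi * r^2
r = (3 * V / (4 * math.pi)) ** (1 / 3)
area_sphere = 4 * math.pi * r ** 2

# Cube: V = a^3, area = 6 * a^2
a = V ** (1 / 3)
area_cube = 6 * a ** 2

# Regular tetrahedron: V = s^3 / (6 * sqrt(2)), area = sqrt(3) * s^2
s = (6 * math.sqrt(2) * V) ** (1 / 3)
area_tetra = math.sqrt(3) * s ** 2

# For equal volume the tetrahedron exposes the most surface, so if perceived
# volume tracks stimulated skin area, it should feel largest.
assert area_sphere < area_cube < area_tetra
```

The sphere is the minimum-area shape for a given volume, so any bias toward surface area necessarily makes the sphere feel smallest, in line with the reported ordering.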
In a follow-up study, similar shapes but of a size much larger than the hand were used (see
Figure 30.3 right). Again a tetrahedron was perceived as larger than both the sphere (22 per cent)
and the cube (12 per cent), and the cube was perceived as larger than the sphere (8 per cent),
although the latter difference was not significant. From these smaller differences than in the pre-
vious study, it could already be seen that surface area could not be the (sole) responsible factor.
This need not be surprising. The objects are larger than the hands, so the skin area stimulated
when holding the objects is probably very similar (namely the whole hand surface) for all shapes.
Moreover, bimanual perception necessarily takes place at a higher level than unimanual percep-
tion, so the experimental findings need not be the same.

Weight
One of the first to report on weight perception was Weber (1834/1986). Since then, quite a number
of studies investigated human discriminability of weight (for an overview, see Jones (1986)). The
methods used to measure these thresholds are rather diverse and as a consequence the reported
Weber fractions also vary over a wide range, from 0.09 to 0.13 for active lifting. Thresholds
obtained with passively resting hands are higher, suggesting that receptors in muscles play a role
in weight discrimination (Brodie and Ross, 1984). Jones (1986) also gives an overview of the rela-
tionships between perceived weight and physical weight, and these also vary widely: most authors
report power functions, but their exponents range from 0.7 to 2.0. When participants were asked
to enclose the objects (spheres, cubes, or tetrahedra), Weber fractions for weight discrimination
were even higher (0.29). They were also higher than volume discrimination thresholds obtained
with the same objects, so apparently weight information could not be the determining factor in
volume discrimination (Kahrimanovic et al., 2011a).
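The reported range of exponents matters perceptually. Under a power law, perceived heaviness P = k * w^n, and doubling the physical weight w has very different consequences at the two ends of the reported range; the values below are illustrative.

```python
def perceived_heaviness(weight, exponent, k=1.0):
    """Power-law relation between physical weight and perceived heaviness."""
    return k * weight ** exponent

# Effect of doubling the physical weight for the two extreme exponents
# reported in the literature (0.7 and 2.0):
ratio_low = perceived_heaviness(200, 0.7) / perceived_heaviness(100, 0.7)
ratio_high = perceived_heaviness(200, 2.0) / perceived_heaviness(100, 2.0)

# With an exponent of 0.7, doubling the weight increases perceived heaviness
# by a factor of about 1.62; with an exponent of 2.0, by a factor of 4.
```
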

Illusions involving weight


A well-known illusion concerning weight is the size–weight illusion. The first experimental evi-
dence was established by Charpentier in 1891 (Murray et  al., 1999). In this illusion, a smaller
object is perceived as heavier than a larger object of equal weight. There have been many attempts
to explain this illusion, such as the ‘expectation theory’ which uses the fact that in general there is a
correlation between size and weight of an object, or the ‘information-integration theory’ in which
size is considered to be an object property that affects its perceived weight (Ellis and Lederman,
1993). The information-integration theory holds that different cues (in this case weight, vol-
ume, or density) are combined with different weight factors to form the final percept. In many of
the experiments, visual inspection plays an essential role. However, Ellis and Lederman (1993)
showed that just as strong an illusion occurs with blindfolded sighted and congenitally blind
observers, suggesting that this illusion is a haptic phenomenon. They concluded that the existing
theories were not really able to predict their results, and that the illusion probably has a sensory
and not a cognitive basis.
There also exists a material-weight illusion, where objects made of a heavier (higher den-
sity) material are perceived to be lighter than same-sized objects of lighter material (e.g. Ellis
and Lederman, 1999). Ellis and Lederman (1999) showed that with only haptic information a
full-strength illusion can be obtained, whereas just visual information caused at most a moderate
illusion.

These illusions show that different cues, which may not always be relevant to the task, contrib-
ute to the final percept. This suggests the existence of a mechanism, also in haptic perception, that
synthesizes the perception of an object from different information sources, possibly operating
according to Gestalt laws.

Spatial Properties
The haptic sense does not only provide us with object properties; the relations between objects
or between parts of objects also have to be perceived. The perception of such spatial relations has
been studied most extensively in raised line drawings.

Line drawings
Although three-dimensional objects are easy to recognize by touch (see above), two-dimensional
raised line drawings are very hard to recognize (e.g. Heller, 1989; Klatzky et al., 1993; Loomis et al.,
1991; Magee and Kennedy, 1980; Picard and Lebaz, 2012), even with extended exploration times.
To illustrate this phenomenon, blindfolded observers had to explore a wire frame stimulus of a
house in an informal experiment; when they felt confident that they could draw what they had
felt, they stopped the exploration (which typically took several minutes), removed the blindfold,
and made a drawing without seeing the stimulus. It can be seen in Figure 30.4 that some of the
participants clearly recognized a house, but most of them missed several details, such as the
door, the bottom line of the roof, or the placement of the chimney. Other participants had no idea
of the shape and were also not able to draw it. They additionally missed more important aspects,
such as the straightness of lines, the relations between lines, or the fact that many of the angles
are right angles. Note that observer LB was only able to recognize the house after he saw his own drawing.
One of the explanations given for the poor performance in recognizing line drawings lies in the
difficulty of integrating spatial information. In the case of the line drawings, information is acquired
sequentially and has to be integrated over time into a coherent representation, a process possibly
governed by Gestalt laws. Loomis et al. (1991) compared tactual performance with that of explor-
ing a drawing visually with just a very limited field of view. If the field of view was similar in size


Fig. 30.4  Result of an informal experiment. The original ‘house’ is a wire frame placed flat on a table
in the correct orientation. Blindfolded participants were asked to explore the stimulus and draw it
when they felt ready to do so. Exploration time was free and usually in the order of minutes. The
resulting drawings of the eight participants are shown.
to that of a finger pad, visual and tactual recognition performance was comparable. In an experi-
ment where the finger of the observer was either guided by the experimenter or actively moved
by the observer, performance was better in the guided condition (Magee and Kennedy, 1980). The
explanation could be that in the active condition movements are much noisier, making integra-
tion of information harder.
The role of vision in recognizing raised line drawings is somewhat controversial (e.g. Picard
and Lebaz, 2012). Some authors report similar performance of blindfolded sighted and con-
genitally blind observers (e.g. Heller, 1989), whereas others report worse performance for blind
observers (e.g. Lederman et al., 1990). In any case, from several studies, notably those by Kennedy
(e.g. 1993), it follows that congenitally blind observers are able to use raised line drawings to their
advantage.
Based on an idea by Ikeda and Uchikawa (1978), Wijntjes and colleagues (2008) gave blind-
folded observers 45 s to recognize drawings of common objects, such as a hammer, a car and
a duck. After this time period, they were forced to guess what they thought the object was.
Subsequently, in the case of a wrong answer (about 50 per cent of the cases), they had to draw
what they felt. Half of the observers did so without a blindfold, the other half while still blind-
folded. Those who drew without a blindfold recognized their own drawing in about 30 per cent
of the cases; those who drew blindfolded mostly remained unaware of what the object was.
This difference showed that the execution of motor movements during drawing could not by
itself cause the recognition. Naive observers were also able to recognize the drawings that their
makers had recognized. Therefore, the authors concluded that the haptically acquired information
was sufficient, but the mental capacities required to identify the drawing from this information
alone were not. Externalization of the stimulus, as done by drawing on a sketchpad, seems to be
a process that can support the identification of serial input that needs to be integrated.

Spatial patterns
Gestalt psychologists have identified a number of regularities or ‘laws’ that can be used to explain
how humans categorize and group individual items, and how they perceive spatial patterns.
Principles of ‘similarity’, ‘proximity,’ and ‘good continuation’ can explain how humans group
items that seem to belong together. Almost all research has been performed using visual experi-
ments, and only recently have a few studies investigated the existence of such laws in the touch
domain (Gallace and Spence, 2011).

Proximity and similarity


Items that are close together (close proximity) will be perceived as being related and will be
grouped together. Items that share some property such as color, shape, or texture will be
grouped because of their similarity. Chang and colleagues (2007b) performed an experiment com-
paring visual and haptic grouping principles. Their stimuli consisted of cards with elements that
differed in color for the visual condition and in texture for the haptic condition. Participants
were asked how they would group the elements and why. Groups could differ in number, prox-
imity, and similarity of the elements. Depending on the stimulus organization, items were either
grouped on the basis of spatial proximity or on the basis of their texture. For a large part, the
groupings in vision and haptics were similar, suggesting that the Gestalt laws of proximity and
similarity are also valid for touch. In a rivalry experiment, Carter et al. (2008) showed that the
proximity of tactile stimuli could bias the perceived movement direction of an ambiguous appar-
ent motion stimulus. As their tactile and visual experiments yielded similar results, they suggest
that this might be based on a strategy common to all modalities.

Overvliet et al. (2012) used a search task to investigate the influence of similarity and proxim-
ity on finding a target item pair among distractor pairs. Their stimuli consisted of two columns
of small vertical and horizontal bars. They found, among other things, that if distractors consisted
of pairs of different items and the target of a pair of identical items, performance was worse (longer
reaction times) than in the reverse condition. However, when searching for a different pair among
identical pairs, the task can be performed by just searching for the odd-one-out in either the left or
the right column. There is no need to correlate the input from the left and right fingers (although
that was the task instruction). This makes the task inherently easier than the reverse task, but in
our opinion, it is questionable whether this has to do with the Gestalt concept of similarity. The
finding that there is no influence of proximity (between the pairs of stimuli in the two columns)
can be explained in the same way.

Good continuation
Items that are aligned tend to be perceived as a group and will be integrated to a perceptual whole.
Chang and colleagues (2007a) also designed a ‘good continuation’ experiment, once again com-
paring visual and haptic performance. They constructed 16 different layouts of shapes that were
partially occluded. The occlusion was represented both by color and by texture, so that the same
stimuli could be used in the visual and haptic experiments. They found that overall visual and
haptic behavior was nearly the same, indicating that the Gestalt principle of continuation is also
applicable to touch.

Spatial relations
Helmholtz (1867/1962) was one of the first to notice that visual perception of the world around
us is not veridical. Hillebrand (1902) showed that lines that appeared parallel to the eye were not
at all parallel. A few years later, Blumenfeld (1913) showed that visually equidistant lines are
also not physically parallel, and, interestingly, that they are different from the ‘parallel alleys’ of
Hillebrand. In the literature, a discussion started about the concept and existence of ‘visual space’.
Inspired by these findings, Blumenfeld (1937) decided to perform similar experiments to investi-
gate the veridicality of haptic space. With pushpins, he fixed two threads to a table and he asked
blindfolded observers to straighten these threads by pulling them towards themselves in such
a way that they would be parallel to each other. Blumenfeld found that these threads were not
parallel: if the distance between the two pushpins was smaller than the observer’s shoulder width,
the threads diverged; if the distance was larger, the threads converged. In the same year, von
Skramlik (1937) also reported on the distortion of haptic space.
For a long time, hardly any research on the perception of haptic space was performed. In the
late nineties, Kappers and colleagues decided to investigate the haptic perception of parallelity in
more detail. Their first set-up consisted of a table on which 15 protractors in a 5 by 3 grid were
placed (e.g. Kappers and Koenderink, 1999). An aluminum bar of 20 cm could be placed on each
of the protractors. The bars could rotate around the center of the protractor. A typical experiment
consisted of a reference bar placed at a certain location in an orientation fixed by the experimenter
and a test bar at another location in a random orientation. The task of the blindfolded observers
was to rotate the test bar in such a way that it felt parallel to the reference bar. In all conditions,
either uni- or bimanual, large but systematic deviations from parallelity were found. Depending on
the condition, these deviations could be more than 90°. The bar at the right-hand side (either the
reference or the test) had to be rotated clockwise with respect to a bar to the left of it in order to be
perceived as haptically parallel (e.g. Kappers, 1999, 2003; Kappers and Koenderink, 1999). These
findings were reproduced in other labs (e.g. Fernández-Díaz and Travieso, 2011; Kaas and van
Mier, 2006; Newport et al., 2002).
The current explanation for the deviations is that they are caused by the biasing influence of an
egocentric reference frame (e.g. Kappers, 2005, 2007; Zuidhoek et al., 2003). The task of the obser-
ver is to make the two bars parallel in an allocentric (physical) reference frame, but of course, the
observer only has recourse to egocentric reference frames, such as the hand or the body reference
frame (see Figure 30.5). If the task had been performed (unintentionally) in an egocentric refer-
ence frame, the deviations would occur in the direction found. However, the deviations are not
as extreme as predicted by performance in just an egocentric reference frame, but they are biased
in that direction.
The evidence for this explanation is accumulating rapidly. For example, a time delay between
exploration of the reference bar and setting of the test bar causes a reduction of the deviation
(Zuidhoek et al., 2003), although in general a time delay would cause a deterioration of task per-
formance. The explanation is thought to lie in a shift during the delay from the egocentrically
biased spatial representation to a more allocentric reference frame, as suggested by Rossetti et al.
(1996) in pointing experiments. Non-informative vision (i.e. vision of the environment without
seeing the stimuli or set-up) strengthens the representation of the allocentric reference frame.
It was shown that this indeed leads to a reduction of the deviations (e.g. Newport et al., 2002;


Fig. 30.5  Illustration of different reference frames. (Top) Allocentric reference frame. This reference
frame coincides with a physical reference frame fixed to the table. Parallel bars have the same
orientation with respect to the protractor, independent of the location of the protractor. (Middle)
Haptically parallel. The two bars shown are perceived as haptically parallel by one of the observers
(the size of the deviations strongly depends on observer). (Bottom) Egocentric reference frame, in
this case fixed to the hand. The two bars have the same orientation with respect to the orientation
of the hand. The orientation of the hand will depend on its location, so the deviation from veridical
will directly depend on the hand. It can be seen that haptically parallel lies in between allocentrically
and egocentrically parallel.

Zuidhoek et al., 2004). Asking observers to make two bars perpendicular results, for some observ-
ers, in almost parallel bars (Kappers, 2004). This is consistent with what would be predicted on the
basis of the reference frame hypothesis. Moreover, mirroring bars in the mid-sagittal plane gave
almost veridical performance (Kappers, 2004; Kaas and van Mier, 2006). This is to be expected as
performance in both an egocentric and an allocentric reference frame would lead to veridical set-
tings. Moreover, the deviations obtained on mid-sagittal (Kappers, 2002), frontoparallel (Volcic
et al., 2007) and three-dimensional set-ups (Volcic and Kappers, 2008) can all be explained with
this same hypothesis.
The nature of the biasing egocentric reference frame originates most probably in a combina-
tion of the hand and the body. Kappers and colleagues (Kappers and Liefers, 2012; Kappers and
Viergever, 2006) manipulated the orientation of the hand during the exploration of the bars and
they showed that the deviation was linearly related to the orientation of the hand, that is, the ori-
entation of the hand reference frame. However, even when the two hands were aligned, a small but
significant deviation remained and this is consistent with influence of the body reference frame.
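This account can be summarized as a weighted average between the allocentric (veridical) orientation and the orientation predicted by pure matching in the hand-centered frame. The sketch below is a simplification: the weighting factor is hypothetical, and the separate body-frame contribution is ignored.

```python
def parallel_setting(ref_orientation, hand_ref, hand_test, ego_weight=0.4):
    """Sketch of a weighted egocentric/allocentric account of haptic
    parallelity (orientations in degrees; ego_weight is hypothetical).

    Pure allocentric matching would reproduce ref_orientation exactly.
    Pure egocentric matching would keep the bar's orientation constant
    relative to the hand, i.e. add the difference in hand orientations.
    """
    egocentric = ref_orientation + (hand_test - hand_ref)
    return (1 - ego_weight) * ref_orientation + ego_weight * egocentric

# The predicted deviation from veridical is linear in the difference between
# the two hand orientations, as found experimentally:
deviation = parallel_setting(30.0, 0.0, 50.0) - 30.0
```

In this model the deviation equals ego_weight times the difference in hand orientations, which reproduces the linear relation reported by Kappers and colleagues while keeping the settings in between the allocentric and egocentric predictions.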

Illusions of orientation
The above-described investigations on the non-veridicality of haptic space already show that per-
ception of orientation is apt to yield illusions. Another class of illusions concerns the so-called
oblique effect (e.g. Appelle and Countryman, 1986; Gentaz et al., 2008; Lechelt and Verenka,
1980). This effect, also reported in vision, shows itself in more variable performance for oblique
orientations (usually 45° or 135°) than for horizontal and vertical orientations (0° and 90°). Gentaz
and colleagues (Gentaz et al., 2008) attribute the haptic oblique effect to gravitational cues and
memory constraints that are specific for haptics.

Concluding Remarks
We focused this chapter on the haptic perception of objects and spatial properties, and left out
all mention of the perception of material properties. Using haptic perception, our mind creates
a representation of the world around us based on observed curvatures, shapes, sizes, weights,
and orientations of objects. It remains to be seen whether all these elements fit together into a
consistent representation governed by rules similar to those formulated by Gestalt psychologists
for visual perception. As we have seen, the perception of these elements is fraught with illusory
effects. The perception of size, orientation, shape, and weight all interact with each other, produc-
ing different results in different situations. It is these interactions that may be very instructive in
the deconstruction of the haptic perceptual system, and it is for this reason that, in addition to
studying the elements in isolation, the interactions between them should be studied and their
mechanisms fathomed.

References
Appelle, S., and Countryman, M. (1986). Eliminating the haptic oblique effect: influence of scanning
incongruity and prior knowledge of the standards. Perception 15(3): 325–329.
Armstrong, L., and Marks, L. E. (1999). Haptic perception of linear extent. Percept Psychophys
61(6): 1211–1226.
Bergmann Tiest, W. M., van der Hoff, L. M. A., and Kappers, A. M. L. (2011). Cutaneous and kinesthetic
perception of traversed distance. In Proc. IEEE World Haptics Conference, edited by C. Basdogan,
S. Choi, M. Harders, L. Jones, and Y. Yokokohji, pp. 593–597 (Istanbul: IEEE).
Tactile and Haptic Perceptual Organization 635

Blumenfeld, W. (1913). Untersuchungen über die scheinbare Grösse im Sehraume. Zeitschr Psychol 65:
241–404.
Blumenfeld, W. (1937). The relationship between the optical and haptic construction of space. Acta Psychol
2: 125–174.
Brodie, E. E., and Ross, H. E. (1984). Sensorimotor mechanisms in weight discrimination. Percept
Psychophys 36(5): 477–481.
Carter, O., Konkle, T., Wang, Q., Hayward, V., and Moore, C. (2008). Tactile rivalry demonstrated with an
ambiguous apparent-motion quartet. Curr Biol 18(14): 1050–1054.
Chang, D., Nesbitt, K. V., and Wilkins, K. (2007a). The Gestalt principle of continuation applies to both
the haptic and visual grouping of elements. In Second Joint EuroHaptics Conference and Symposium on
Haptic Interfaces for Virtual Environment and Teleoperator Systems (WHC’07), pp. 15–20.
Chang, D., Nesbitt, K. V., and Wilkins, K. (2007b). The Gestalt principles of similarity and proximity
apply to both the haptic and visual grouping of elements. In Proc Eighth Australasian Conference on User
Interface, Vol. 64: pp. 79–86 (Darlinghurst: Australian Computer Society, Inc.).
Cole, J., and Paillard, J. (1995). Living without touch and peripheral information about body position and
movement: studies with deafferented patients. In The Body and the Self, edited by J. L. Bermudez, N.
Eilan, and A. Marcel, pp. 245–266 (Cambridge, MA: MIT press).
Davidson, P. W. (1972). Haptic judgments of curvature by blind and sighted humans. J Exp Psychol
93(1): 43–55.
Dostmohamed, H., and Hayward, V. (2005). Trajectory of contact region on the fingerpad gives the illusion
of haptic shape. Exp Brain Res 164(3): 387–394.
Ellis, R. R., and Lederman, S. J. (1993). The role of haptic versus visual volume cues in the size-weight
illusion. Percept Psychophys 53(3): 315–324.
Ellis, R. R., and Lederman, S. J. (1999). The material-weight illusion revisited. Percept Psychophys
61(8): 1564–1576.
Fernández-Díaz, M., and Travieso, D. (2011). Performance in haptic geometrical matching tasks depends
on movement and position of the arms. Acta Psychol 136(3): 382–389.
Gallace, A., and Spence, C. (2011). To what extent do Gestalt grouping principles influence tactile
perception? Psychol Bull 137(4): 538–561.
Gentaz, E., Baud-Bovy, G., and Luyat, M. (2008). The haptic perception of spatial orientations. Exp Brain
Res 187(3): 331–348.
Gibson, J. J. (1933). Adaptation, after-effect and contrast in the perception of curved lines. J Exp Psychol
16(1): 1–31.
Gibson, J. J. (1963). The useful dimensions of sensitivity. Am Psychol 18: 1–15.
Gibson, J. J. (1966). The Senses Considered as Perceptual Systems (Boston: Houghton Mifflin Company).
Goodwin, A. W., John, K. T., and Marceglia, A. H. (1991). Tactile discrimination of curvature by humans
using only cutaneous information from the fingerpads. Exp Brain Res 86(3): 663–672.
Goodwin, A. W., and Wheat, H. E. (1992). Human tactile discrimination of curvature when contact area
with the skin remains constant. Exp Brain Res 88(2): 447–450.
Gordon, I. A., and Morison, V. (1982). The haptic perception of curvature. Percept Psychophys 31: 446–450.
Hayward, V. (2008). A brief taxonomy of tactile illusions and demonstrations that can be done in a
hardware store. Brain Res Bull 75(6): 742–752.
Heller, M. A. (1989). Texture perception in sighted and blind observers. Percept Psychophys 45(1): 49–54.
Hillebrand, F. (1902). Theorie der scheinbaren Grösse bei binocularem Sehen. Denkschrift Wiener Akad
Mathemat-Naturwissensch Klasse 72: 255–307.
Hohmuth, A., Phillips, W. D., and VanRomer, H. (1976). A discrepancy between two modes of haptic
length perception. J Psychol 92(1): 79–87.
Hunter, I. M. L. (1954). Tactile-kinesthetic perception of straightness in blind and sighted humans. Q J Exp
Psychol 6: 149–154.
Ikeda, M., and Uchikawa, K. (1978). Integrating time for visual pattern perception and a comparison with
the tactile mode. Vision Res 18(11): 1565–1571.
Jastrow, J. (1886). The perception of space by disparate senses. Mind 11(44): 539–554.
Jones, L. A. (1986). Perception of force and weight: theory and research. Psychol Bull 100(1): 29–42.
Kaas, A., and van Mier, H. (2006). Haptic spatial matching in near peripersonal space. Exp Brain Res
170: 403–413.
Kahrimanovic, M., Bergmann Tiest, W. M., and Kappers, A. M. L. (2010). Haptic perception of volume
and surface area of 3-D objects. Atten Percept Psychophys 72(2): 517–527.
Kahrimanovic, M., Bergmann Tiest, W. M., and Kappers, A. M. L. (2011a). Characterization of the haptic
shape-weight illusion with 3-dimensional objects. IEEE Trans Haptics 4(4): 316–320.
Kahrimanovic, M., Bergmann Tiest, W. M., and Kappers, A. M. L. (2011b). Discrimination thresholds for
haptic perception of volume, surface area, and weight. Atten Percept Psychophys 73(8): 2649–2656.
Kappers, A. M. L. (1999). Large systematic deviations in the haptic perception of parallelity. Perception
28(8): 1001–1012.
Kappers, A. M. L. (2002). Haptic perception of parallelity in the midsagittal plane. Acta Psychol
109(1): 25–40.
Kappers, A. M. L. (2003). Large systematic deviations in a bimanual parallelity task: further analysis of
contributing factors. Acta Psychol 114(2): 131–145.
Kappers, A. M. L. (2004). The contributions of egocentric and allocentric reference frames in haptic spatial
tasks. Acta Psychol 117(3): 333–340.
Kappers, A. M. L. (2005). Intermediate frames of reference in haptically perceived parallelity. In Proc
1st Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and
Teleoperator Systems, pp. 3–11 (Pisa, Italy: IEEE Computer Society).
Kappers, A. M. L. (2007). Haptic space processing—allocentric and egocentric reference frames. Can J Exp
Psychol 61(3): 208–218.
Kappers, A. M. L. (2011). Human perception of shape from touch. Phil Trans R Soc B 366: 3106–3114.
Kappers, A. M. L., and Koenderink, J. J. (1999). Haptic perception of spatial relations. Perception
28(6): 781–795.
Kappers, A. M. L., and Liefers, B. J. (2012). What feels parallel strongly depends on hand orientation. In
Haptics: Perception, Devices, Mobility, and Communication, Vol. 7282 of Lecture Notes on Computer
Science, edited by P. Isokoski and J. Springare, pp. 239–246 (Berlin Heidelberg: Springer-Verlag).
Kappers, A. M. L., and Viergever, R. F. (2006). Hand orientation is insufficiently compensated for in haptic
spatial perception. Exp Brain Res 173(3): 407–414.
Kennedy, J. M. (1993). Drawing & the Blind: Pictures to Touch (New Haven, CT: Yale University Press).
Klatzky, R. L., Lederman, S. J., and Metzger, V. A. (1985). Identifying objects by touch: an ‘expert system’.
Percept Psychophys 37(4): 299–302.
Klatzky, R. L., Loomis, J. M., Lederman, S. J., Wake, H., and Fujita, N. (1993). Haptic identification of
objects and their depictions. Percept Psychophys 54(2): 170–178.
Krishna, A. (2006). Interaction of senses: the effect of vision versus touch on the elongation bias. J Consum
Res 32(4): 557–566.
Langfeld, H. S. (1917). The differential spatial limen for finger span. J Exp Psychol 2(6): 416–430.
Lechelt, E. C., and Verenka, A. (1980). Spatial anisotropy in intramodal and cross-modal judgments of
stimulus orientation: the stability of the oblique effect. Perception 9(5): 581–589.
Lederman, S. J., and Jones, L. A. (2011). Tactile and haptic illusions. IEEE Trans Haptics 4(4): 273–294.
Lederman, S. J., and Klatzky, R. L. (1987). Hand movements: a window into haptic object recognition.
Cogn Psychol 19(3): 342–368.
Lederman, S. J., and Klatzky, R. L. (2009). Haptic perception: a tutorial. Atten Percept Psychophys
71(7): 1439–1459.
Lederman, S. J., Klatzky, R. L., Chataway, C., and Summers, C. D. (1990). Visual mediation and the haptic
identification of 2-dimensional pictures of common objects. Percept Psychophys 47(1): 54–64.
Loomis, J. M., Klatzky, R. L., and Lederman, S. J. (1991). Similarity of tactual and visual picture
recognition with limited field of view. Perception 20(2): 167–177.
Loomis, J. M., and Lederman, S. J. (1986). Tactual perception. In Cognitive Processes and Performance,
Vol. 2 of Handbook of Perception and Human Performance, edited by K. R. Boff, L. Kaufman, and
J. P. Thomas, Chapter 31, 31.1–31.41 (New York: John Wiley & Sons).
Magee, L. E., and Kennedy, J. M. (1980). Exploring pictures tactually. Nature 283: 287–288.
Millar, S., and Al-Attar, Z. (2002). The Müller-Lyer illusion in touch and vision: implications for
multisensory processes. Percept Psychophys 64(3): 353–365.
Murray, D., Ellis, R., Bandomir, C., and Ross, H. (1999). Charpentier (1891) on the size–weight illusion.
Atten Percept Psychophys 61: 1681–1685.
Newport, R., Rabb, B., and Jackson, S. R. (2002). Noninformative vision improves haptic spatial
perception. Curr Biol 12(19): 1661–1664.
Norman, J. F., Norman, H. F., Clayton, A. M., Lianekhammy, J., and Zielke, G. (2004). The visual and
haptic perception of natural object shape. Percept Psychophys 66(2): 342–351.
Overvliet, K., Krampe, R., and Wagemans, J. (2012). Perceptual grouping in haptic search: the influence of
proximity, similarity, and good continuation. J Exp Psychol Hum Percept Perform 38(4): 817–821.
Panday, V., Bergmann Tiest, W. M., and Kappers, A. M. L. (2012). Influence of local properties on haptic
perception of global object orientation. IEEE Trans Haptics 5: 58–65.
Pawluk, D., Kitada, R., Abramowicz, A., Hamilton, C., and Lederman, S. J. (2010). Haptic figure-ground
differentiation via a haptic glance. In IEEE Haptics Symposium, 25–26 March, Waltham, Massachusetts,
USA, 63–66.
Picard, D., and Lebaz, S. (2012). Identifying raised-line drawings by touch: a hard but not impossible task. J
Visual Impair Blindness 106(7): 427–431.
Plaisier, M. A., Bergmann Tiest, W. M., and Kappers, A. M. L. (2008). Haptic pop-out in a hand sweep.
Acta Psychol 128: 368–377.
Plaisier, M. A., Bergmann Tiest, W. M., and Kappers, A. M. L. (2009). One, two, three, many—subitizing
in active touch. Acta Psychol 131(2): 163–170.
Pont, S. C., Kappers, A. M. L., and Koenderink, J. J. (1997). Haptic curvature discrimination at several
regions of the hand. Percept Psychophys 59(8): 1225–1240.
Pont, S. C., Kappers, A. M. L., and Koenderink, J. J. (1998). Anisotropy in haptic curvature and shape
perception. Perception 27(5): 573–589.
Pont, S. C., Kappers, A. M. L., and Koenderink, J. J. (1999). Similar mechanisms underlie curvature
comparison by static and dynamic touch. Percept Psychophys 61(5): 874–894.
Proske, U., and Gandevia, S. C. (2009). The kinesthetic senses. J Physiol 587(17): 4139–4146.
Robertson, A. (1902). Studies from the Psychological Laboratory of the University of California VI
‘Geometric-optical’ illusions in touch. Psychol Rev 9: 549–569.
Robles-De-La-Torre, G., and Hayward, V. (2001). Force can overcome object geometry in the perception of
shape through active touch. Nature 412(6845): 445–448.
Rossetti, Y., Gaunet, F., and Thinus-Blanc, C. (1996). Early visual experience affects memorization and
spatial representation of proprioceptive targets. NeuroReport 7(6): 1219–1223.
Stevens, S. S., and Stone, G. (1959). Finger span: ratio scale, category scale and JND scale. J Exp Psychol
57(2): 91–95.
Suzuki, K., and Arashida, R. (1992). Geometrical haptic illusions revisited—haptic illusions compared with
visual illusions. Percept Psychophys 52(3): 329–335.
Terada, K., Kumazaki, A., Miyata, D., and Ito, A. (2006). Haptic length display based on
cutaneous-proprioceptive integration. J Robot Mechatron 18(4): 489–498.
van der Horst, B. J., Duijndam, M. J. A., Ketels, M. F. M., Wilbers, M. T. J. M., Zwijsen, S. A., and
Kappers, A. M. L. (2008a). Intramanual and intermanual transfer of the curvature aftereffect. Exp Brain
Res 187(3): 491–496.
van der Horst, B. J., and Kappers, A. M. L. (2008). Using curvature information in haptic shape perception
of 3D objects. Exp Brain Res 190(3): 361–367.
van der Horst, B. J., Willebrands, W. P., and Kappers, A. M. L. (2008b). Transfer of the curvature aftereffect
in dynamic touch. Neuropsychologia 46(12): 2966–2972.
van Polanen, V., Bergmann Tiest, W. M., and Kappers, A. M. L. (2012). Haptic search for hard and soft
spheres. PLoS ONE 7(10): e45298.
von Helmholtz, H. (1867/1962). Treatise on Physiological Optics, Vol. 3 (English transl. by J. P. C. Southall)
for the Optical Society of America (1925) from the 3rd German edn of Handbuch der physiologischen
Optik (New York: Dover).
Vogels, I. M. L. C., Kappers, A. M. L., and Koenderink, J. J. (1996). Haptic aftereffect of curved surfaces.
Perception 25(1): 109–119.
Vogels, I. M. L. C., Kappers, A. M. L., and Koenderink, J. J. (1997). Investigation into the origin of the
haptic after-effect of curved surfaces. Perception 26: 101–107.
Volcic, R., and Kappers, A. M. L. (2008). Allocentric and egocentric reference frames in the processing of
three-dimensional haptic space. Exp Brain Res 188(2): 199–213.
Volcic, R., Kappers, A. M. L., and Koenderink, J. J. (2007). Haptic parallelity perception on the
frontoparallel plane: the involvement of reference frames. Percept Psychophys 69(2): 276–286.
von Skramlik, E. (1937). Psychophysiologie der Tastsinne (Leipzig: Akademische Verlagsgesellschaft).
Weber, E. H. (1834/1986). E.H. Weber on the Tactile Senses, H. E. Ross and D. J. Murray edition
(Hove: Erlbaum (UK) Taylor & Francis).
Weinstein, S. (1968). Intensive and extensive aspects of tactile sensitivity as a function of body part, sex,
and laterality. In The Skin Senses, edited by D. Kenshalo, pp. 195–222 (Springfield, IL: Thomas).
Wijntjes, M. W. A., Sato, A., Hayward, V., and Kappers, A. M. L. (2009). Local surface orientation
dominates haptic curvature discrimination. IEEE Trans Haptics 2(2): 94–102.
Wijntjes, M. W. A., van Lienen, T., Verstijnen, I. M., and Kappers, A. M. L. (2008). The influence of picture
size on recognition and exploratory behavior in raised-line drawings. Perception 37(4): 602–614.
Zuidhoek, S., Kappers, A. M. L., van der Lubbe, R. H. J., and Postma, A. (2003). Delay improves
performance on a haptic spatial matching task. Exp Brain Res 149(3): 320–330.
Zuidhoek, S., Visser, A., Bredero, M. E., and Postma, A. (2004). Multisensory integration mechanisms in
haptic space perception. Exp Brain Res 157(2): 265–268.
Chapter 31

Cross-modal perceptual organization


Charles Spence

Introduction
The last quarter of a century or so has seen a dramatic resurgence of research interest in the ques-
tion of how sensory inputs from different modalities are combined, merged, and/or integrated,
and, more generally, come to affect one another in perception (see Bremner et al. 2012; Stein
2012; Stein et al. 2010, for reviews). Until very recently, however, the majority of this research,
inspired as it often has been by neurophysiological studies of orienting responses in model brain
systems, such as the superior colliculus, has tended to use simple stimuli (e.g., a single beep,
flash, and/or tactile stimulus) on any given trial (see Stein & Meredith 1993 for a review). As a
result, to date, problems of perceptual organization have generally taken something of a back seat
in the world of multisensory perception research. That said, there has recently been a surge of
scientific interest in trying to understand how the perceptual system (normally in humans) deals
with, or organizes, more complex streams/combinations of multisensory inputs into meaningful
perceptual units, and how ambiguous (often bistable) inputs are interpreted over time. In trying
to answer such questions, it is natural that researchers look for inspiration in the large body of
empirical research that has been published over the last century on the Gestalt grouping prin-
ciples identified within the visual (Beck 1982; Kimchi et al. 2003; Kubovy & Pomerantz 1981;
Wagemans et al. 2012; Wertheimer 1923/1938; see also the many other chapters in this publi-
cation), auditory (Bregman 1990; Wertheimer 1923/1938; see also Denham in this publication),
and occasionally tactile systems (Gallace & Spence 2011; see also ‘Tactile and haptic perceptual
organization’ by Kappers & Tiest). One might reasonably imagine that those classic grouping
principles, such as common fate, binding by proximity, and binding by similarity, that have been
shown to influence perceptual organization when multiple stimuli are presented within the same
sensory modality should also operate when combinations of stimuli originating from different
sensory modalities are presented instead.
In this review, the evidence concerning the existence of general principles of cross-modal per-
ceptual organization and multisensory Gestalt grouping is summarized. The focus here is pri-
marily on cross-modal perceptual organization and multisensory Gestalten for the spatial (some
would say ‘higher’) senses of audition, vision, and touch. Given the space constraints, this review
will focus primarily on the results of research that has been published more recently.1 The main
body of the text is arranged around a review of the evidence that is relevant to answering four key
questions that run through the literature on cross-modal perceptual organization.

1  Researchers interested in more of a historical perspective should see Spence et al. (2007) and/or Spence and
Chen (2012).
640 Spence

Four key questions in the study of cross-modal perceptual organization
Q1: Does the nature of the perceptual organization (or interpretation) of stimuli taking place in one
sensory modality influence the perceptual organization (or interpretation) of stimuli presented in
another modality?
Researchers have typically addressed this first question by investigating whether there is any cor-
relation between the perceptual organization/interpretation of an ambiguous (typically bistable)
stimulus (or stream of stimuli) in one modality and the perceptual organization/interpretation of
an ambiguous (typically bistable) stimulus (or stream of stimuli) presented simultaneously in a
different sensory modality (e.g., Hupé et al. 2008; O’Leary & Rhodes 1984).
In what is perhaps the most-often cited early paper on this topic, O’Leary and Rhodes (1984)
presented participants with a six-element bistable auditory display and/or with a six-element
bistable visual display. The auditory display consisted of a sequence of tones alternating in pitch,
while the visual display consisted of an alternating sequence of dots presented from one of two
sets of elevations on a monitor (see Figure 31.1). The onsets of the auditory and visual stimuli
were synchronized. The spacing (in pitch and elevation) and the interstimulus interval between
the successive stimuli in these displays were manipulated until participants’ perception of whether
there appeared to be a single stream of stimuli, alternating in either pitch (audition) or eleva-
tion (vision), versus two distinct streams (presented at different pitches and/or elevations) itself
alternated on a regular basis over time. The specific question that O’Leary and Rhodes wanted to
address in their study was whether their participants’ perception of one versus two streams in a
given sensory modality (say audition) would influence their judgements regarding the number of
streams perceived in the other modality (e.g., vision). Confirming their predictions, the results
did indeed demonstrate that the number of streams that participants reported in one modality
was sometimes influenced by the number of streams that they were currently experiencing (or at
least reported experiencing) in the other modality.
O’Leary and Rhodes (1984) interpreted their findings as providing some of the first empirical
evidence to support the claim that the perceptual organization in one sensory modality affects
the perceptual organization of any (plausibly-related) stimuli that may happen to be presented
in another modality.2 However, most researchers writing since seem convinced that an alterna-
tive non-perceptual explanation (in terms of response bias) might explain the findings just as
well (e.g., Cook & Van Valkenburg 2009; Kubovy & Yu 2012; Spence & Chen 2012; Spence et al.
2007; Vroomen & De Gelder 2000). What is more, in one of the only other studies to have directly
addressed this first question, a negative result was obtained.
In particular, the participants in a study by Hupé et al. (2008) were presented with bistable audi-
tory and visual displays either individually or at the same time. These researchers examined the
statistics of the perceptual alternations that took place in each modality stream when presented
individually (that is, unimodally) and compared them to the pattern of reversals seen when the
stimuli were presented in both modalities simultaneously. The idea was that if the perceptual
organization of the stimuli in one sensory modality was to carry over and influence any perceptual
organization in the other modality, then the statistics of perceptual reversals should change, and/
or be correlated under conditions of multisensory stimulation. However, Hupé et al. found no
such evidence in two experiments.

2  Note that the stimulus displays capitalized on the cross-modal correspondence between pitch and elevation
(see Spence 2011 for a review).


Cross-modal perceptual organization 641

[Figure 31.1 appears here. Panels: (a, b) physical display (auditory and visual stimuli; frequency in
audition / vertical position in vision plotted against time; upper and lower stimuli, T1–T6); (c) one-object
percept (slow rate); (d) two-object percept (fast rate).]
Fig. 31.1  (a, b) Schematic illustration of the sequence of auditory and visual stimuli presented by O’Leary
and Rhodes (1984) in their study of cross-modal influences on perceptual organization. T1–T6 indicate
the temporal order (from first to last) in which the six stimuli were presented in each sensory modality.
Half of the stimuli were from an upper group (frequency in sound, spatial location in vision), the rest
from a lower group. The stimuli were presented in sequence, alternating between events from the
upper and lower groups, either delivered individually (unimodal condition) or else together in synchrony
(in the cross-modal condition). (c, d) Perceptual correlates associated with different rates of stimulus
presentation. In either sensory modality, at slow rates of stimulus presentation (c), a single stream
(auditory or visual) was perceived, as shown by the continuous line connecting the points. At faster rates
of stimulus presentation (d), however, two separate streams were perceived concurrently, one in the
upper range (frequency or spatial position, for sound or vision, respectively) and the other in the lower
range. In the cross-modal condition, at intermediate rates of stimulus presentation, participants’ reports
of whether they perceived one stream versus two in a given sensory modality were influenced by their
perception of there being one or two streams in the other modality. O’Leary and Rhodes took these
results to show that the nature of the perceptual organization in one sensory modality can influence
how the perceptual scene may be organized (or segregated) in another modality.
Reproduced from Stein, Barry E., ed., The New Handbook of Multisensory Processing, figure 14.1, © 2012
Massachusetts Institute of Technology, by permission of The MIT Press.

The visual stimuli in Hupé et al.’s (2008) first experiment consisted of a network of crossing
lines (square wave gratings) viewed through a circular aperture. This display could either be per-
ceived as two gratings moving in opposite directions or as a single plaid moving in an inter-
mediate direction. Meanwhile, pure tones alternating in frequency in the pattern High (pitch)/
Low/High-High/Low/High could be presented over headphones. The participants either heard
two segregated streams (High-High-High, and --Low---Low--) or a single stream with the pitch
alternating from item to item. While the statistics of switching between alternative perceptual
interpretations were similar for the two modalities, there was absolutely no correlation between
the perceptual switches taking place in audition and vision.
This first experiment can, though, be criticized on the grounds that the participants would
have had no particular reason to treat the auditory and visual stimuli as belonging to the same
object or event (that is, they were completely unrelated). Hence, the fact that Hupé et al. (2008)
obtained a null result is perhaps not so surprising. In a second experiment, the auditory and visual
stimuli were spatiotemporally correlated: the auditory stimuli were as in Experiment 1, but were
now presented in an alternating sequence from one of a pair of loudspeaker cones, one placed on
either side of central fixation. The visual stimuli consisted of the illumination of an LED placed in
front of either loudspeaker that could be perceived either as two lights flashing independently, or
else could give rise to the perception of horizontal visual apparent motion. However, once again,
there was no evidence of any correlation between the perceptual switches taking place in the two
modalities. Therefore, despite the fact that the spatiotemporal presentation of the auditory and
visual stimuli was correlated in this study, the participants would presumably not have had any
particularly good reason to bind the contents of their visual and auditory experience.
One other study that is worth mentioning here comes from Sato et al. (2007). They investigated
the auditory and visual verbal transformation effect. In the auditory version of this phenomenon
(see Warren & Gregory 1958), a participant listens to a speech stimulus, such as the word ‘life’, that is
played repeatedly; after a number of repetitions, the percept alternates and the observer will likely hear
it as ‘fly’ instead. As time passes, the percept alternates back and forth. Sato et al. discovered
that the same thing happens if we look at moving lips repeatedly uttering the same syllable instead
(this is known as the visual transformation effect). Sato and his colleagues presented auditory
alone, visual alone, and audiovisual stimulus combinations (either congruent or incongruent).
The participants were instructed to report their initial auditory ‘percept’, and whenever it changed
over the course of the 90 seconds of each trial. In Sato et al.’s study, either /psə/ or /səp/ were used
as the speech stimuli. The results of their first experiment revealed that the incongruent audio-
visual condition, where the visual stimulus alternated between being congruent and incongruent
with what was heard, resulted in a higher rate of perceptual alternations as compared to any of the
other three conditions. Note here that what is seen and what is heard may be taken by participants
to refer to the same phonological entity. In fact, Kubovy and Yu (2012) have argued recently that
this (speech) may constitute a unique case when it comes to multisensory multistability.3
To date, the only studies that have attempted to investigate the question of whether the perceptual
organization taking place in one modality affects the perceptual organization taking place in the other
have involved the presentation of audiovisual stimuli (Hupé et al. 2008; O’Leary & Rhodes 1984; Sato
et al. 2007). It is interesting to speculate, then, on whether a similar conclusion would also have been
reached on the basis of visuotactile studies.4 There is currently surprisingly little unequivocal support

3  One final thing to note here is that it is unclear from Sato et al.’s (2007) study whether their
participants ever experienced the audiovisual stimulus stream as presenting one stimulus auditorily and
another visually, as sometimes happens in McGurk-type experiments.
4  One way to test this possibility would be to look for correlations in the changing interpretation of
bistable spatial displays such as the Ternus display (Harrar & Harris 2007; cf. Shi et al. 2010), or in
simultaneously presented visual and tactile apparent motion quartets (Carter et al. 2008). Suggestive
evidence from Harrar and Harris, not to mention one’s own intuition, would appear to suggest that if the
appropriate stimulus timings

for the view that the perceptual organization (or interpretation) of an ambiguous, or bistable, stimulus
(or stimuli) in one sensory modality will necessarily, and automatically, affect the perceptual organi-
zation (or interpretation) of a stimulus (or stimuli) that happens to be presented in another modality
at around the same time (even when the auditory and visual stimuli can plausibly be related to one
another—e.g., as a result of their cross-modal correspondence, see O’Leary & Rhodes 1984, or due to
their spatiotemporal patterning, see Hupé et al. 2008; see also Kubovy & Yu 2012).

Q2: Does intramodal perceptual grouping modulate cross-modal perceptual grouping?


One of the best-known studies to have addressed the question of whether intramodal perceptual
grouping modulates cross-modal interactions was reported by Watanabe and Shimojo (2001).
The participants in their studies had to report whether two discs that started each trial moving
directly towards each other on a screen looked as though they streamed through each other (the
more common percept when the display is viewed in silence) or else bounced off one another.
This is known as the stream/bounce illusion (Metzger 1934; Michotte 1946/1963). Previously, it
had been demonstrated that if a sound is presented at the moment when the two discs meet, the
likelihood of participants reporting bouncing increases (Sekuler et al. 1997). Now the innovative
experimental manipulation in Watanabe and Shimojo’s study involved demonstrating that the
magnitude of this cross-modal effect was modulated by the strength of any intramodal grouping
taking place within the auditory modality. More specifically, these researchers found that if the
sound presented at the moment of ‘impact’ happened to be embedded within a stream of similar
regularly temporally-spaced tones, then participants reported fewer bounce percepts. However,
the incidence of bounce percepts increased once again if the other tones in the auditory sequence
had a markedly different frequency from the ‘impact’ tone.
Further support for the claim that the cross-modal effect of an auditory stimulus on visual
perception can be modulated by the strength of any intramodal auditory perceptual grouping has
also been demonstrated in a number of other studies, utilizing a variety of experimental para-
digms (e.g., Ngo & Spence 2010; Vroomen & de Gelder 2000). Additionally, other researchers
have reported that the magnitude of the temporal ventriloquism effect5 is modulated by any per-
ceptual grouping that happens to be taking place in the auditory modality (Keetels et al. 2007; see
also Cook & Van Valkenburg 2009).
But what about any cross-modal effects operating in the reverse direction? Does the perceptual
grouping taking place within the visual modality also modulate the cross-modal influence of
vision on auditory perception? The answer would appear to be in the affirmative. The majority of
the work on this particular issue has been conducted using variations of ‘the cross-modal dynamic
capture task’. In a typical study, participants try to discriminate the direction in which an auditory
apparent motion stream moved (i.e., judging whether a pair of sequentially-presented sounds
appeared to move from left to right or vice versa; see Herzog & Ogmen in this publication, on the

could be established, such that synchronous stimulus presentation was maintained while both modality inputs
retained their individual bistability, then any switch in the perceptual interpretation of the visual display
would likely also trigger a switch in the interpretation of the tactile display (one might certainly frame such a
result in terms of visual dominance).
  The temporal ventriloquism effect has most frequently been demonstrated between pairs of auditory and
5

visual stimuli. It occurs when the perceived timing of an event in one modality (normally vision) is pulled
toward temporal alignment with a slightly asynchronous event presented in another modality (e.g., audition;
see Morein-Zamir et al. 2003; Vroomen et al. 2004).
644 Spence

topic of apparent motion). At the same time, the participants are instructed to ignore any cues
delivered by the simultaneous presentation of an irrelevant visual (or, on occasion, tactile) appar-
ent motion stream (see Soto-Faraco et  al. 2004b for a review). The results of numerous stud-
ies have now demonstrated that people simply cannot ignore the visual apparent motion (even
though it may be entirely task-irrelevant), and will often report that they perceived the sound as
moving in the same direction, even if the opposite was, in fact, the case (e.g., Soto-Faraco et al.
2002). As hinted at already, similar cross-modal dynamic capture effects have also been reported
in experiments involving the presentation of tactile stimuli as well, both when tactile apparent
motion happens to act as the target modality, and when it acts as the to-be-ignored distractor
modality (Lyons et al. 2006; Sanabria et al. 2005b; Soto-Faraco et al. 2004a).
One other area of research that is relevant to the question of cross-modal perceptual organiza-
tion relates to the local versus global perceptual grouping taking place within a given modality
and its effect on perceptual organization within another sensory modality. For instance, Sanabria
et al. (2004) demonstrated the dominance of global field effects over local visual apparent motion
when the two were pitted directly against each other in the setting of the cross-modal dynamic
capture task (see Figure 31.2). In this particular experiment, the four-lights display (see Figure
31.2B) induced the impression of two pairs of lights moving in one direction, while the central
pair of lights (if considered in isolation) appeared to move in the opposite direction. In other
words, if the local motion of the two central lights was from right to left, the global motion of the
four-light display was from left to right instead. However, Sanabria et al.’s results revealed that it
was the direction of global visual motion that ‘captured’ the perceived direction of auditory appar-
ent motion (see also Sanabria et al. 2005a).

[Figure 31.2 panels: (a) 2-lights display and (b) 4-lights display, each shown as an incongruent and a congruent trial; T1 and T2 mark the two successive frames of light and sound presentation.]
Fig. 31.2  Schematic illustration of the different trial types presented in Sanabria et al.’s (2004)
study of the effect of local versus global visual perceptual grouping on the cross-modal dynamic
capture effect. The horizontal arrows indicate the (global) direction of visual apparent motion.
The magnitude of the cross-modal dynamic capture effect was significantly greater in the 2-lights
displays (a) than in the 4-lights displays (b). More importantly for present purposes though, the
results also revealed that the modulatory cross-modal effect of visual apparent motion on the
perceived direction of auditory apparent motion was determined by the global direction of visual
apparent motion rather than by the local motion of the central pair of lights (which appeared to
move in the opposite direction).
Data from Daniel Sanabria, Salvador Soto-Faraco, Jason S. Chan, and Charles Spence, When does visual
perceptual grouping affect multisensory integration? Cognitive, Affective, & Behavioral Neuroscience, 4(2),
pp. 218–229, 2004.

Elsewhere, Rahne et al. (2008) have used an alternating high/low tone sequence, similar to that
used by O’Leary and Rhodes (1984), to demonstrate the effect of visual segmentation cues on audi-
tory stream segregation. The participants in their study either saw a circle presented in synchrony
with every third tone (thus being paired successively with a high tone, then with a low tone, then
with a high tone, etc.) or else they saw a square that appeared in synchrony with just the low-pitched
tones. The likelihood that the participants would perceive the auditory sequence as a single stream
was significantly higher in the former (circle) condition than in the latter (square) condition.
In terms of visuotactile interactions, Yao et al. (2009) have investigated whether the presenta-
tion of visual information would affect the cutaneous rabbit illusion (Geldard & Sherrick 1972).
They placed tactile stimulators at either end of a participant’s arm. LEDs were also placed at the
same locations, as well as at the ‘illusory’ locations where the tactile stimuli are generally per-
ceived to have been presented following the activation of the tactors (in this case, at the interven-
ing position, along the arm). Yao et al. reported that the activation of the lights that mimicked the
hopping percept strengthened the tactile illusion, while the activation of the lights at the veridical
locations of tactile stimulation weakened it. This result shows that the tactile grouping underly-
ing the cutaneous rabbit illusion can be modulated by concurrently presented visual information,
even if it is not relevant to the participant’s task.
At this point, it is worth noting that the majority of studies reported thus far in the text have
involved situations in which the conditions for intramodal perceptual grouping were established
prior to the presentation of the critical cross-modal stimuli (e.g., see Ngo & Spence 2010; Vroomen &
De Gelder 2000; Watanabe & Shimojo 2001; Yao et  al. 2009). However, it turns out that even
when the situation is temporally reversed, and the strength of intramodal perceptual grouping is
modulated by any stimuli that happen to be presented after the critical cross-modal stimuli, the
story remains unchanged (e.g., see Sanabria et al. 2005b). Thus, it would appear that intramodal
perceptual grouping normally tends to take precedence over cross-modal perceptual grouping
(see also Cook & Van Valkenburg 2009 for a similar conclusion).
In summary, then, a relatively large body of empirical evidence involving a range of different
behavioural paradigms has by now convincingly demonstrated that as the strength of intramodal
perceptual grouping increases, the magnitude of any cross-modal effects on visual, auditory, or
tactile perception is reduced. Thus, the answer to the second of the questions posed in this chapter
would appear to be unequivocally in the affirmative: that is, the strength of intramodal perceptual
grouping can indeed modulate the strength/magnitude of cross-modal interactions (at least when
the stimuli can be meaningfully related to one another; cf. Cook & Van Valkenburg 2009).
Before moving on, it should be noted that a large body of research shows that the rate of stimulus
presentation in one sensory modality can influence the perceived rate of presentation of stimuli
delivered in another modality (e.g., Gebhard & Mowbray 1959; Recanzone 2003; Wada et al. 2003;
Welch et al. 1986). However, as highlighted by Spence et al. (2007), given the high rates of stimulus
presentation used in the majority of studies in this area, it could plausibly be argued that most of
the results that have been published to date actually tell us more about cross-modal influences on
the perception of a discrete stimulus attribute (e.g., the flicker or flutter rate) rather than necessar-
ily telling us anything meaningful about the cross-modal constraints on perceptual organization.
An argument could certainly be made here that it is only when the stimuli are presented at rates
that are slow enough to allow for the individuation of the elements within the relevant stimulus
streams, and thus the matching of those elements across sensory modalities, that the results of
such research will really start to say anything interesting about cross-modal perceptual organiza-
tion (rather than just being relevant to researchers interested in multisensory integration).
Relevant to this discussion is research by Fujisaki and Nishida (e.g., Fujisaki & Nishida 2010).
They conducted a number of studies demonstrating that people can only really pair (or bind) pairs
of auditory, visual, and/or tactile stimulus streams cross-modally (i.e., in order to make in/out-of-
phase judgements) when the stimuli in those streams are presented at rates that do not exceed
4 Hz.6 If we take this as a legitimate argument (and I am the first to flag up that some may find it
controversial), then the majority of research on cross-modal influences on rate perception and on
flicker/flutter thresholds may, ultimately, turn out not to be relevant to the topic of cross-modal
perceptual organization (see also Benjamins et al. 2008).
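The temporal limits just described can be summarized in a short sketch. The 4 Hz and 12 Hz limits are taken from Fujisaki and Nishida (2010); the function name and the dictionary encoding are mine, added purely for illustration:

```python
# Illustrative encoding of the binding limits reported by Fujisaki &
# Nishida (2010): cross-modal in/out-of-phase judgements are possible
# only up to roughly 4 Hz, except for audiotactile pairings, where the
# limit is around 12 Hz. The table and function are not from the
# chapter; they simply restate the reported thresholds as code.
BINDING_LIMIT_HZ = {
    frozenset(["auditory", "visual"]): 4.0,
    frozenset(["visual", "tactile"]): 4.0,
    frozenset(["auditory", "tactile"]): 12.0,
}

def phase_judgement_possible(modality_a, modality_b, rate_hz):
    """Can observers pair two stimulus streams presented at rate_hz?"""
    return rate_hz <= BINDING_LIMIT_HZ[frozenset([modality_a, modality_b])]
```

On this encoding, an 8 Hz audiovisual stream falls above the binding limit, whereas an 8 Hz audiotactile stream does not, which is the asymmetry noted in the footnote.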

Q3: Do intersensory Gestalten exist?


The first question to address here is ‘What exactly are intersensory Gestalten?’ Well, the termin-
ology is certainly muddled and confusing, with different researchers using different terms for what
may well turn out to be the same underlying concept. Gilbert (1938, 1941) was perhaps the first
to introduce the notion when he wrote: ‘. . . we must also reckon with the total field properties. This
involves the superimposition of one pattern of stimulation upon a heteromodal pattern, with a result-
ing new complex ‘inter-sensory Gestalt’ in which the properties of the original patterns are modified’
(Gilbert 1941, p. 401). Several decades later, Allen and Kolers (1981, p. 1318) talked of a ‘com-
mon or suprasensory organizing principle’. More recently still, Kubovy and Yu (2012, p. 963) have
introduced the notion of ‘trans-modal Gestalts’. What is, however, common to all of these various
suggestions is the idea that there may be some sort of multisensory (or supramodal) organization
(or structure), which, importantly, isn’t present in any of the constituent sensory modalities when
considered individually (see Spence & Chen 2012; Spence et al. 2007). However, over and above
any problem of terminology, the key issue is that despite occasional claims that such intersensory
Gestalten exist (e.g., Harrar et al. 2008; Zapparoli & Reatto 1969), there is surprisingly little con-
crete (i.e., uncontroversial) evidence in their favour (Allen & Kolers 1981; Sanabria et al. 2005b;
Spence & Bayne 2015).
To give but one example of the sort of approach that has been used by researchers in recent
times, let’s take the study reported by Huddleston et al. (2008; Experiment 3). These research-
ers presented a series of auditory and visual stimuli from four locations arranged on a virtual
clock face (e.g., with visual stimuli at 12 and 6, and auditory stimuli at 3 and 9; see Figure
31.3). The visual and auditory stimuli were presented sequentially at a range of temporal
rates. At the appropriate timings, the participants were clearly able to perceive visual appar-
ent motion vertically and auditory apparent motion horizontally. That said, the participants
never reported any circular cross-modal (or intermodal) apparent motion (despite being able
to determine whether the stimuli were being presented in a clockwise or counter-clockwise
sequence). Huddleston et al.’s results therefore provide evidence against the existence of inter-
modal Gestalten.
[Figure 31.3 legend: loudspeaker; LED; visual apparent motion (observed); auditory apparent motion (observed); intermodal apparent motion (anticipated).]
Fig. 31.3  Schematic illustration of the stimulus displays used to investigate the possibility of an intersensory motion Gestalt (i.e., supramodal apparent motion) by Huddleston et al. (2008). When the interstimulus intervals were adjusted appropriately, participants reported visual apparent motion (vertically) and auditory apparent motion (horizontally), but there were no reports of any circular supramodal (or intermodal) apparent motion, thus providing evidence against the existence of an intersensory Gestalt, at least in this case of audiovisual apparent motion.

By contrast, a somewhat different conclusion was reached by Harrar et al. (2008). They presented pairs of stimuli, one from either side of fixation. The two stimuli could both be visual, both tactile, or there might be one visual and one tactile stimulus. The stimuli alternated repeatedly, and participants had to rate the strength of any apparent motion between them. The participants gave a numerical response between 0 (‘No apparent motion’) and 6 (‘Strong apparent motion’), across a range of interstimulus intervals (ISIs). The results revealed that the strength of apparent motion was modulated by the ISI. As one might have expected, the visual apparent motion was stronger than the tactile motion. However, the interesting result for present purposes was that mean ratings of the strength of apparent motion on the cross-modal trials, while much lower than for intramodal motion, were significantly greater than 0 at many of the ISIs tested. One could imagine that if Allen and Kolers (1981) were still writing, they might not be convinced by such effects, based, as they are, on self-report. It would seem plausible that task demands might have played some role in modulating how participants respond in this kind of task. Thus, more objective data from a more indirect task would certainly be useful in order to convince the sceptic. On the other hand, Harrar et al. might want to argue that there is, in fact, nothing fundamentally wrong with using subjective ratings to assess the strength of apparent motion.

6  The one modality pairing where this limit did not apply was for cross-modal interactions between auditory and tactile stimuli. There, phase judgements are possible at stimulus presentation rates as high as 12 Hz (Fujisaki & Nishida 2010).
Researchers have also looked for evidence to support the existence of intersensory Gestalten in
the area of intersensory rhythm perception. The idea here is that it might be possible to experience
a cross-modal (or intermodal) rhythm that is not present in any one of the component unisensory
stimulus streams. However, just as for the other studies already mentioned, a closer look at the
literature reveals that while claims of intermodal rhythm perception certainly do exist (Guttman
et al. 2005), there is actually surprisingly little reliable psychophysical evidence to back up such
assertions. Furthermore, many authors have explicitly argued against the possibility of intermodal
rhythm perception (e.g., Fraisse 1963). Perhaps the strongest evidence in favour of intermodal
rhythm perception comes from recent research on the perception of musical metre.
Huang et al. (2012) have recently provided some intriguing evidence that appears to suggest that
people can efficiently extract the musical metre (defined as the abstract temporal structure corre-
sponding to the periodic regularities of the music) from a temporal sequence of elements, some of
which happen to be presented auditorily, others via the sense of touch. Importantly, here, the metre
information was not available to either modality stream when considered in isolation. Huang et al.’s
results can therefore be taken as providing support for the claim that audiotactile musical metre per-
ception constitutes one of the first genuinely intersensory Gestalten to have been documented to date.
In conclusion, despite a number of attempts having been made over the decades, there is still
surprisingly little scientific evidence to support the claim that intersensory (or cross-modal)
Gestalten really do exist (see Guttman et al. 2005, p. 234; Huddleston et al. 2008).7 That said, both
of the examples just described (Harrar et al. 2008; Huang et al. 2012) might be taken to challenge
the conclusion forwarded recently by Spence and Chen (2012) that truly intersensory Gestalten
do not exist (see also Spence & Bayne 2015). One suggestion here as to why they may be so elu-
sive in laboratory studies (and presumably also in daily life) is that the nature of the experience
that we have in each of the senses is so fundamentally different that it may make cross- or trans-
modal Gestalten particularly difficult, if not impossible, to achieve or find (see Kubovy & Yu 2012;
Spence & Bayne 2015, on this point; though see Aksentijević et al. 2001; Julesz & Hirsh 1972;
Lakatos & Shepard 1997, for evidence that similar grouping principles may structure our experi-
ence in the different modalities).

Q4: Can cross-modal correspondences be considered as examples of intersensory Gestalten?


Cross-modal correspondences have been defined as compatibility effects between attributes, or
dimensions, of stimuli (i.e., objects and events) in different sensory modalities (be they redun-
dant or not; Spence 2011). Cross-modal correspondences have often been documented between
polarized stimulus dimensions, such that a more-or-less extreme stimulus on a given dimension
in one modality should be compatible with a more-or-less extreme value on the corresponding
dimension in another modality. So, for example, increasing auditory pitch tends to be associated
with higher elevations, smaller objects, and lighter visual stimuli (see Spence 2011). What is more,
the presentation of cross-modally corresponding pairs of stimuli often gives rise to a certain feel-
ing of ‘rightness’, despite the fact that there may be no objective truth about the matter (cf. Koriat
2008). Recently, cross-modally congruent combinations of stimuli have been shown to give rise to
enhanced multisensory integration, as compared to when incongruent pairings of stimuli are pre-
sented (see Guzman-Martinez et al. 2012; Parise & Spence 2009; see also Sweeny et al. 2012). And
when it comes to the discussion of perceptual organization, it is worth noting that cross-modally
corresponding stimuli have often been presented in previous studies (e.g., O’Leary & Rhodes,
1984; see also Gebhard & Mowbray, 1959).8
To give an example, research by Parise and Spence (2009) has highlighted the perceptual con-
sequences of playing with the well-documented cross-modal correspondence that exists between
auditory pitch and the size of (in this case visually-perceived) objects. People normally associate
smaller objects with higher-pitched sounds and larger objects with lower-pitched sounds (e.g.,
Parise & Spence 2012). The participants in the first of Parise and Spence’s (2009) studies had to
make unspeeded perceptual judgements regarding the temporal order in which a pair of audi-
tory or visual stimuli had been presented. The stimulus onset asynchrony in the cross-modal
temporal order judgement task was varied on a trial-by-trial basis using the method of constant stimuli. The pair of visual and auditory stimuli presented on each trial were either cross-modally congruent (i.e., a smaller circle was presented together with a higher-pitched sound or a larger circle with a lower-pitched sound) or else they were incongruent (i.e., a smaller circle was paired with a lower-pitched sound or a larger circle paired with a higher-pitched sound). The results revealed that participants found it significantly harder to report the temporal order in which the stimuli had been presented on the cross-modally congruent trials than on the cross-modally incongruent trials. The same pattern of results was also documented in a second experiment in which the cross-modal correspondence between visual shape (angularity) and auditory pitch/waveform was assessed. In a final study, Parise and Spence (2009) went on to demonstrate a larger spatial ventriloquism effect for pairs of spatially-misaligned auditory and visual stimuli when they were cross-modally congruent than when they were incongruent. The results demonstrate enhanced spatiotemporal integration (as measured by the temporal and spatial ventriloquism effects), and thus poorer temporal and spatial resolution of the component unimodal stimuli, on cross-modally congruent as opposed to cross-modally incongruent trials. Such findings suggest that cross-modal correspondences, which can perhaps be thought of as a form of cross-modal Gestalt grouping by similarity, influence multisensory perception/integration.

7  Those working in the field of flavour perception often suggest that flavours constitute a form of multisensory Gestalt (e.g., Delwiche 2004; Small & Green 2011; Spence et al. 2012; Verhagen & Engelen 2006). If such a claim were true, then this could constitute another example of (genuinely intermodal) perceptual grouping. However, it is difficult to determine whether many of the authors making such claims really mean anything more by the suggestion that flavour is a Gestalt than merely that the combination of gustatory, retronasal olfactory, and trigeminal inputs gives rise to an emergent property, or object, that is, the flavour of a food or beverage that happens to be localized to the mouth. There really isn’t time to do justice to these questions here, but the interested reader is directed to Kroeze for further discussion of this issue.

8  It is perhaps worth noting that cross-modal causality also plays an important role in audiovisual integration (see Armontrout et al. 2009; Kubovy & Schutz 2010; Schutz & Kubovy 2009).
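The temporal order judgement (TOJ) procedure described above can be illustrated with a minimal simulation: responses are modelled as a cumulative Gaussian psychometric function of stimulus onset asynchrony (SOA), and a harder discrimination, as on the congruent trials, corresponds to a shallower function, i.e., a larger just-noticeable difference (JND). The function names and the specific JND values below are hypothetical, chosen only to mirror the direction of the reported effect:

```python
import math
import random

def p_visual_first(soa_ms, jnd_ms):
    # Cumulative Gaussian psychometric function: probability of a
    # 'visual first' report as a function of SOA (positive = visual
    # leads). The JND parameter sets the slope (used here as sigma).
    return 0.5 * (1.0 + math.erf(soa_ms / (jnd_ms * math.sqrt(2.0))))

def simulate_toj(jnd_ms, soas, trials_per_soa=200, seed=0):
    # Method of constant stimuli: every SOA is presented many times,
    # and the proportion of 'visual first' reports is tallied per SOA.
    rng = random.Random(seed)
    proportions = {}
    for soa in soas:
        hits = sum(rng.random() < p_visual_first(soa, jnd_ms)
                   for _ in range(trials_per_soa))
        proportions[soa] = hits / trials_per_soa
    return proportions

soas = [-240, -120, -60, -30, 0, 30, 60, 120, 240]
congruent = simulate_toj(jnd_ms=120, soas=soas)    # hypothetical: harder
incongruent = simulate_toj(jnd_ms=70, soas=soas)   # hypothetical: easier
```

On this sketch, the congruent curve rises more gradually across the SOA range than the incongruent one, which is precisely how a temporal order deficit would show up in data collected with the method of constant stimuli.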
A growing number of studies published over the last few years have also demonstrated that the
perception of a bistable or ambiguous stimulus in one modality (normally vision) can be biased
by the information presented in another sensory modality, usually audition (e.g., Conrad et al.
2010; Guzman-Martinez et al. 2012; Kang & Blake 2005; Takahashi & Watanabe 2010, 2011; Van
Ee et al. 2009) but, on occasion, touch/haptics (see Binda et al. 2010; Bruno et al. 2007; Lunghi
et al. 2010). Often, such studies have contrasted pairings of stimuli that do, or do not, correspond
cross-modally. So, for example, in one study, the frequency of an amplitude-modulated auditory
stimulus was shown to bias subjective reports (e.g., in the binocular rivalry situation) toward
one of two competing visual stimuli (gratings) whose phase and contrast modulation frequency
happened to match that of the sound (see Kang & Blake 2005). Similarly, exploring an oriented
grooved surface haptically can also bias a participant’s perception in the binocular rivalry situa-
tion toward a congruently (as opposed to an orthogonally) oriented visual image (grating) of the
same spatial frequency (see Binda et al. 2010; Lunghi et al. 2010).
Thus, taken together, the latest evidence on the topic of cross-modal correspondences demon-
strates that when the stimuli presented in different sensory modalities correspond, there may be
perceptual interactions observed that are not present when the stimuli do not correspond (either
because they are incongruent, or else because they are simply unrelated to the stimuli/task that a
participant has been given to perform; Sweeny et al. 2012). What is more, there is also a feeling
of rightness that accompanies the pairing of stimuli that correspond cross-modally (which isn’t
there for pairs of stimuli that do not correspond; Koriat 2008). Such correspondences need not
be based on a perceptual mapping, but they often are. Moreover, they can often affect both
perceptual organization and awareness. Such phenomena can be conceptualized in terms of the
Gestalt grouping based on similarity. Indeed, cross-modal correspondences have been described
as cross-modal similarities by some researchers (e.g., see Marks 1987a, b).9

9  Note here that there is likely also an interesting link to questions of perceptual organization in synaesthesia
proper (with which cross-modal correspondences are often confused; though see Deroy & Spence 2013) and
their potential use within the burgeoning literature on sensory substitution (see Styles & Shimojo in this pub-
lication).

Conclusions
The latest evidence from a number of psychophysical studies of cross-modal scene perception and
perceptual organization that have been reviewed in this chapter provides some answers to the four
questions that were outlined at the start of this piece. First, it would appear that the perceptual
organization of the stimuli taking place in one sensory modality does not automatically influ-
ence the perceptual organization of stimuli presented in another sensory modality (Hupé et al.
2008; O’Leary & Rhodes 1984), except perhaps in the case of speech (Sato et al. 2007; see also
Kubovy & Yu 2012). Second, intramodal perceptual grouping frequently modulates the strength
of cross-modal perceptual grouping (or interactions; Soto-Faraco et al. 2002; see Spence & Chen
2012 for a review). The evidence suggests that unimodal auditory, visual, and tactile perceptual
grouping can, and do, affect the cross-modal interactions taking place between auditory and
visual stimuli. Third, there is currently little convincing evidence for the existence of intersensory
Gestalten (see Allen & Kolers 1981; Huddleston et al. 2008), despite various largely anecdotal or
introspective claims to the contrary (e.g., see Harrar et al. 2008; Zapparoli & Reatto 1969). We
should keep in mind that several of the latest findings might nevertheless require us to revise this
view (see Harrar et al. 2008; Huang et al. 2012; Yao et al. 2009, on this question). Finally, I have
reviewed the latest evidence showing that cross-modal correspondences (Spence 2011), which
sometimes modulate both perceptual organization and awareness, can be conceptualized in terms
of cross-modal grouping by similarity.
It seems probable that our understanding of the cross-modal constraints on perceptual
organization will be furthered in the coming years by animal (neurophysiological) studies
(see Rahne et al. 2008 for one such study). Furthermore, although beyond the scope of the present
chapter, it should also be noted that attention is likely to play an important role in cross-modal
perceptual organization (see Kimchi & Razpurker-Apfeld 2004; Sanabria et al. 2007; Talsma et al.
2010; and the chapters by Alais, Holcombe, Humphreys, and Rees in this publication). What does
seem clear already, though, is that cross-modal perceptual organization is modulated by Gestalt
grouping principles such as grouping by spatial proximity, common fate, and similarity just as in
the case of intramodal perception.

References
Aksentijević, A., Elliott, M.A., and Barber, P.J. (2001). ‘Dynamics of Perceptual Grouping: Similarities in
the Organization of Visual and Auditory Groups’. Visual Cognition 8: 349–358.
Allen, P. G., and Kolers, P. A. (1981). ‘Sensory Specificity of Apparent Motion’. Journal of Experimental
Psychology: Human Perception and Performance 7: 1318–1326.
Armontrout, J. A., Schutz, M., and Kubovy, M. (2009). ‘Visual Determinants of a Cross-modal Illusion’.
Attention, Perception, & Psychophysics 71: 1618–1627.
Beck, J. (Ed.) (1982). Organization and Representation in Vision (Hillsdale, NJ: Erlbaum).
Benjamins, J. S., van der Smagt, M. J., and Verstraten, F. A. J. (2008). ‘Matching Auditory and Visual
Signals: Is Sensory Modality Just Another Feature?’ Perception 37: 848–858.
Binda, P., Lunghi, C., and Morrone, C. (2010). ‘Touch Disambiguates Rivalrous Perception at Early Stages
of Visual Analysis’. Journal of Vision 10(7): 854.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound (Cambridge,
MA: MIT Press).
Bremner, A., Lewkowicz, D., and Spence, C. (Eds.) (2012). Multisensory Development (Oxford: Oxford
University Press).
Bruno, N., Jacomuzzi, A., Bertamini, M., and Meyer, G. (2007). ‘A Visual-haptic Necker Cube Reveals Temporal
Constraints on Intersensory Merging During Perceptual Exploration’. Neuropsychologia 45: 469–475.
Carter, O., Konkle, T., Wang, Q., Hayward, V., and Moore, C. (2008). ‘Tactile Rivalry Demonstrated with
an Ambiguous Apparent-motion Quartet’. Current Biology 18: 1050–1054.
Conrad, V., Bartels, A., Kleiner, M., and Noppeney, U. (2010). ‘Audiovisual Interactions in Binocular
Rivalry’. Journal of Vision 10(10): 1–15.
Cook, L. A., and Van Valkenburg, D. L. (2009). ‘Audio-visual Organization and the Temporal
Ventriloquism Effect Between Grouped Sequences: Evidence that Unimodal Grouping Precedes
Cross-modal Integration’. Perception 38: 1220–1233.
Delwiche, J. (2004). ‘The Impact of Perceptual Interactions on Perceived Flavor’. Food Quality and
Preference 15: 137–146.
Deroy, O., and Spence, C. (2013). ‘Weakening the Case for “Weak Synaesthesia”: Why Crossmodal
Correspondences are not Synaesthetic’. Psychonomic Bulletin & Review 20: 643–664.
Fraisse, P. (1963). The Psychology of Time (London: Harper & Row).
Fujisaki, W., and Nishida, S. (2010). ‘A Common Perceptual Temporal Limit of Binding Synchronous
Inputs Across Different Sensory Attributes and Modalities’. Proceedings of the Royal Society B
277: 2281–2290.
Gallace, A., and Spence, C. (2011). ‘To What Extent do Gestalt Grouping Principles Influence Tactile
Perception?’ Psychological Bulletin 137: 538–561.
Gebhard, J. W., and Mowbray, G. H. (1959). ‘On Discriminating the Rate of Visual Flicker and Auditory
Flutter’. American Journal of Psychology 72: 521–528.
Geldard, F. A., and Sherrick, C. E. (1972). ‘The Cutaneous “Rabbit”: A Perceptual Illusion’. Science
178: 178–179.
Gilbert, G. M. (1938). ‘A Study in Inter-sensory Gestalten’. Psychological Bulletin 35: 698.
Gilbert, G. M. (1941). ‘Inter-sensory Facilitation and Inhibition’. Journal of General Psychology 24: 381–407.
Guttman, S. E., Gilroy, L. A., and Blake, R. (2005). ‘Hearing What the Eyes See: Auditory Encoding of
Visual Temporal Sequences’. Psychological Science 16: 228–235.
Guzman-Martinez, E., Ortega, L., Grabowecky, M., Mossbridge, J., and Suzuki, S. (2012). ‘Interactive
Coding of Visual Spatial Frequency and Auditory Amplitude-modulation Rate’. Current Biology
22: 383–388.
Harrar, V., and Harris, L. R. (2007). ‘Multimodal Ternus: Visual, Tactile, and Visuo-tactile Grouping in
Apparent Motion’. Perception 36: 1455–1464.
Harrar, V., Winter, R., and Harris, L. R. (2008). ‘Visuotactile Apparent Motion’. Perception & Psychophysics
70: 807–817.
Huang, J., Gamble, D., Sarnlertsophon, K., Wang, X., and Hsiao, S. (2012). ‘Feeling Music: Integration of
Auditory and Tactile Inputs in Musical Meter Perception’. PLoS ONE 7(10): e48496.
Huddleston, W. E., Lewis, J. W., Phinney, R. E., and DeYoe, E. A. (2008). ‘Auditory and Visual
Attention-based Apparent Motion Share Functional Parallels’. Perception & Psychophysics 70: 1207–1216.
Hupé, J. M., Joffo, L. M., and Pressnitzer, D. (2008). ‘Bistability for Audiovisual Stimuli: Perceptual
Decision is Modality Specific’. Journal of Vision 8(7): 1–15.
Julesz, B., and Hirsh, I. J. (1972). ‘Visual and Auditory Perception—An Essay of Comparison’. In Human
Communication: A Unified View, edited by E. E. David, Jr., and P. B. Denes, pp. 283–340
(New York: McGraw-Hill).
Kang, M.-S., and Blake, R. (2005). ‘Perceptual Synergy Between Seeing and Hearing Revealed During
Binocular Rivalry’. Psichologija 32: 7–15.
Keetels, M., Stekelenburg, J., and Vroomen, J. (2007). ‘Auditory Grouping Occurs Prior to Intersensory
Pairing: Evidence From Temporal Ventriloquism’. Experimental Brain Research 180: 449–456.
652 Spence

Kimchi, R., Behrmann, M., and Olson, C. R. (Eds.). (2003). Perceptual Organization in Vision: Behavioral
and Neural Perspectives (Mahwah, NJ: Erlbaum).
Kimchi, R., and Razpurker-Apfeld, I. (2004). ‘Perceptual Grouping and Attention: Not All Groupings are
Equal’. Psychonomic Bulletin & Review 11: 687–696.
Koriat, A. (2008). ‘Subjective Confidence in One’s Answers: The Consensuality Principle’. Journal of
Experimental Psychology: Learning, Memory, and Cognition 34: 945–959.
Kubovy, M., and Pomerantz, J. R. (Eds.) (1981). Perceptual Organization (Hillsdale, NJ: Erlbaum).
Kubovy, M., and Schutz, M. (2010). ‘Audio-visual Objects’. Review of Philosophy & Psychology 1: 41–61.
Kubovy, M., and Yu, M. (2012). ‘Multistability, Cross-modal Binding and the Additivity of Conjoint
Grouping Principles’. Philosophical Transactions of the Royal Society B 367: 954–964.
Lakatos, S., and Shepard, R. N. (1997). ‘Constraints Common to Apparent Motion in Visual, Tactile,
and Auditory Space’. Journal of Experimental Psychology: Human Perception & Performance
23: 1050–1060.
Lunghi, C., Binda, P., and Morrone, M. C. (2010). ‘Touch Disambiguates Rivalrous Perception at Early
Stages of Visual Analysis’. Current Biology 20: R143–R144.
Lyons, G., Sanabria, D., Vatakis, A., and Spence, C. (2006). ‘The Modulation of Crossmodal Integration by
Unimodal Perceptual Grouping: A Visuotactile Apparent Motion Study’. Experimental Brain Research
174: 510–516.
Marks, L. E. (1987a). ‘On Cross-modal Similarity: Auditory-visual Interactions in Speeded Discrimination’.
Journal of Experimental Psychology: Human Perception and Performance 13: 384–394.
Marks, L. E. (1987b). ‘On Cross-modal Similarity: Perceiving Temporal Patterns by Hearing, Touch, and
Vision’. Perception & Psychophysics 42: 250–256.
Metzger, W. (1934). ‘Beobachtungen über phänomenale Identität (Observations on Phenomenal Identity)’.
Psychologische Forschung 19: 1–60.
Michotte, A. (1946/1963). The Perception of Causality (London: Methuen).
Morein-Zamir, S., Soto-Faraco, S., and Kingstone, A. (2003). ‘Auditory Capture of Vision: Examining
Temporal Ventriloquism’. Cognitive Brain Research 17: 154–163.
Ngo, M., and Spence, C. (2010). ‘Crossmodal Facilitation of Masked Visual Target Identification’. Attention,
Perception, & Psychophysics 72: 1938–1947.
O’Leary, A., and Rhodes, G. (1984). ‘Cross-modal Effects on Visual and Auditory Object Perception’.
Perception & Psychophysics 35: 565–569.
Parise, C., and Spence, C. (2009). ‘When Birds of a Feather Flock Together: Synesthetic Correspondences
Modulate Audiovisual Integration in Non-synesthetes’. PLoS ONE 4(5): e5664.
Parise, C. V., and Spence, C. (2012). ‘Audiovisual Crossmodal Correspondences and Sound Symbolism: An
IAT Study’. Experimental Brain Research 220: 319–333.
Rahne, T., Deike, S., Selezneva, E., Brosch, M., König, R., Scheich, H., Böckmann, M., and Brechmann,
A. (2008). ‘A Multilevel and Cross-modal Approach Towards Neuronal Mechanisms of Auditory
Streaming’. Brain Research 1220: 118–131.
Recanzone, G. H. (2003). ‘Auditory Influences on Visual Temporal Rate Perception’. Journal of
Neurophysiology 89: 1078–1093.
Sanabria, D., Soto-Faraco, S., Chan, J. S., and Spence, C. (2004). ‘When Does Visual Perceptual Grouping
Affect Multisensory Integration?’ Cognitive, Affective, & Behavioral Neuroscience 4: 218–229.
Sanabria, D., Soto-Faraco, S., Chan, J. S., and Spence, C. (2005a). ‘Intramodal Perceptual Grouping
Modulates Multisensory Integration: Evidence from the Crossmodal Congruency Task’. Neuroscience
Letters 377: 59–64.
Sanabria, D., Soto-Faraco, S., and Spence, C. (2005b). ‘Assessing the Effect of Visual and Tactile Distractors
on the Perception of Auditory Apparent Motion’. Experimental Brain Research 166: 548–558.
Sanabria, D., Soto-Faraco, S., and Spence, C. (2007). ‘Spatial Attention Modulates Audiovisual Interactions
in Apparent Motion’. Journal of Experimental Psychology: Human Perception and Performance
33: 927–937.
Sato, M., Basirat, A., and Schwartz, J. (2007). ‘Visual Contribution to the Multistable Perception of Speech’.
Perception & Psychophysics 69: 1360–1372.
Schutz, M., and Kubovy, M. (2009). ‘Causality and Cross-modal Integration’. Journal of Experimental
Psychology: Human Perception & Performance 35: 1791–1810.
Sekuler, R., Sekuler, A. B., and Lau, R. (1997). ‘Sound Alters Visual Motion Perception’. Nature
385: 308.
Shi, Z., Chen, L., and Müller, H. (2010). ‘Auditory Temporal Modulation of the Visual Ternus Display: The
Influence of Time Interval’. Experimental Brain Research 203: 723–735.
Small, D. M., and Green, B. G. (2011). ‘A Proposed Model of a Flavour Modality’. In Frontiers in the Neural
Bases of Multisensory Processes, edited by M. M. Murray and M. Wallace, pp. 705–726 (Boca Raton,
FL: CRC Press).
Soto-Faraco, S., Lyons, J., Gazzaniga, M., Spence, C., and Kingstone, A. (2002). ‘The Ventriloquist in
Motion: Illusory Capture of Dynamic Information Across Sensory Modalities’. Cognitive Brain Research
14: 139–146.
Soto-Faraco, S., Spence, C., and Kingstone, A. (2004a). ‘Congruency Effects Between Auditory and
Tactile Motion: Extending the Phenomenon of Crossmodal Dynamic Capture’. Cognitive, Affective, &
Behavioral Neuroscience 4: 208–217.
Soto-Faraco, S., Spence, C., Lloyd, D., and Kingstone, A. (2004b). ‘Moving Multisensory Research
Along: Motion Perception Across Sensory Modalities’. Current Directions in Psychological Science
13: 29–32.
Spence, C. (2011). ‘Crossmodal Correspondences: A Tutorial Review’. Attention, Perception, & Psychophysics
73: 971–995.
Spence, C., and Bayne, T. (2015). ‘Is Consciousness Multisensory?’ In Perception and its Modalities, edited
by D. Stokes, M. Matthen, and S. Biggs, pp. 95–132 (Oxford: Oxford University Press).
Spence, C., and Chen, Y.-C. (2012). ‘Intramodal and Crossmodal Perceptual Grouping’. In The New
Handbook of Multisensory Processing, edited by B. E. Stein, pp. 265–282 (Cambridge, MA: MIT Press).
Spence, C., Ngo, M., Percival, B., and Smith, B. (2012). ‘Crossmodal Correspondences: Assessing Shape
Symbolism for Cheese’. Food Quality & Preference 28: 206–212.
Spence, C., Sanabria, D., and Soto-Faraco, S. (2007). ‘Intersensory Gestalten and Crossmodal Scene
Perception’. In Psychology of Beauty and Kansei: New Horizons of Gestalt Perception, edited by
K. Noguchi, pp. 519–579 (Tokyo: Fuzanbo International).
Stein, B. E. (Ed.) (2012). The New Handbook of Multisensory Processing (Cambridge, MA: MIT Press).
Stein, B. E., and Meredith, M. A. (1993). The Merging of the Senses (Cambridge, MA: MIT Press).
Stein, B. E., Burr, D., Constantinidis, C., Laurienti, P. J., Meredith, M. A., Perrault, T. J., et al. (2010).
‘Semantic Confusion Regarding the Development of Multisensory Integration: A Practical Solution’.
European Journal of Neuroscience 31: 1713–1720.
Sweeny, T. D., Guzman-Martinez, E., Ortega, L., Grabowecky, M., and Suzuki, S. (2012). ‘Sounds
Exaggerate Visual Shape’. Cognition 124: 194–200.
Takahashi, K., and Watanabe, K. (2010). ‘Implicit Auditory Modulation on the Temporal Characteristics of
Perceptual Alternation in Visual Competition’. Journal of Vision 10(4): 1–13.
Takahashi, K., and Watanabe, K. (2011). ‘Visual and Auditory Influence on Perceptual Stability in Visual
Competition’. Seeing and Perceiving 24: 545–564.
Talsma, D., Senkowski, D., Soto-Faraco, S., and Woldorff, M. G. (2010). ‘The Multifaceted Interplay
Between Attention and Multisensory Integration’. Trends in Cognitive Sciences 14: 400–410.
van Ee, R., van Boxtel, J. J. A., Parker, A. L., and Alais, D. (2009). ‘Multisensory Congruency
as a Mechanism for Attentional Control over Perceptual Selection’. Journal of Neuroscience,
29: 11641–11649.
Verhagen, J. V., and Engelen, L. (2006). ‘The Neurocognitive Bases of Human Multimodal Food
Perception: Sensory Integration’. Neuroscience and Biobehavioral Reviews 30: 613–650.
Vroomen, J., and de Gelder, B. (2000). ‘Sound Enhances Visual Perception: Cross-modal Effects of
Auditory Organization on Vision’. Journal of Experimental Psychology: Human Perception and
Performance 26: 1583–1590.
Vroomen, J., Keetels, M., de Gelder, B., and Bertelson, P. (2004). ‘Recalibration of Temporal Order
Perception by Exposure to Audio-visual Asynchrony’. Cognitive Brain Research 22: 32–35.
Wada, Y., Kitagawa, N., and Noguchi, K. (2003). ‘Audio-visual Integration in Temporal Perception’.
International Journal of Psychophysiology 50: 117–124.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt,
R. (2012). ‘A Century of Gestalt Psychology in Visual Perception. I. Perceptual Grouping and
Figure-ground Organization’. Psychological Bulletin 138: 1218–1252.
Warren, R. M., and Gregory, R. L. (1958). ‘An Auditory Analogue of the Visual Reversible Figure’. American
Journal of Psychology 71: 612–613.
Watanabe, K., and Shimojo, S. (2001). ‘When Sound Affects Vision: Effects of Auditory Grouping on Visual
Motion Perception’. Psychological Science 12: 109–116.
Welch, R. B., DuttonHurt, L. D., and Warren, D. H. (1986). ‘Contributions of Audition and Vision to
Temporal Rate Perception’. Perception & Psychophysics 39: 294–300.
Wertheimer, M. (1923/1938). ‘Laws of Organization in Perceptual Forms’. In A Source Book of Gestalt
Psychology, edited by W. Ellis, pp. 71–88 (London: Routledge & Kegan Paul).
Yao, R., Simons, D., and Ro, T. (2009). ‘Keep Your Eye on the Rabbit: Cross-modal Influences on the
Cutaneous Rabbit Illusion’. Journal of Vision 9: 705.
Yau, J. M., Olenczak, J. B., Dammann, J. F., and Bensmaia, S. J. (2009). ‘Temporal Frequency Channels are
Linked across Audition and Touch’. Current Biology 19: 561–566.
Zapparoli, G. C., and Reatto, L. L. (1969). ‘The Apparent Movement Between Visual and Acoustic Stimulus
and the Problem of Intermodal Relations’. Acta Psychologica 29: 256–267.
Chapter 32

Sensory substitution:
A new perceptual experience
Noelle R. B. Stiles and Shinsuke Shimojo

Introduction
The theme of this book, ‘perceptual organization’, asks how sensory inputs are organized into an
integrated, structured percept. Whereas most of the chapters address this question within a single
modality, several chapters, including this one and the one by Spence (this volume), ask the same
question across modalities. We may rephrase it as: how does cross-modal organization generate our
unique perceptual experience? Individual modalities have traditionally been isolated as specific sensations, yet all
senses are seamlessly blended into a holistic experience in the typical daily environment. Where
is the line segregating each modality? Is vision visual because the information comes from the
retina, or could it be ‘vision’ if the information derives from an image even if it is encoded by
a sound? As recent studies have shown evidence for the processing of both auditory and tactile
information in visual cortex (Bavelier and Neville 2002; Cohen et al. 1997; Collignon et al. 2009;
Sadato et al. 1996), the definition of vision in the brain has become increasingly blurry. Sensory
substitution (SS) encodes an image into a sound or tactile stimulation, and trained subjects have
been found not only to utilize the stimulus to coordinate adaptive behavior, but also to process
it in early visual areas. Some superusers of a sensory substitution device have further claimed to
subjectively experience a vision-like perception associated with device usage (Ward and Meijer
2010). This chapter will not only go over the technical and historical perspective of SS, but will
also more importantly highlight the implications of SS to cross-modal plasticity and the potential
of SS to reveal cross-modal perceptual organization.
Sensory substitution is processed like vision at cortical levels, but is transduced by audition (or
somatosensation) at receptor levels; thus it should be considered neither pure vision nor pure
audition/somatosensation, but rather a third type of subjective sensation, or ‘qualia’. If perceptual
experience in sensory substitution is unique, do the same visual primitives hold? Are these visual
primitives fundamental to all vision-like processing, or are they dependent on the visual sen-
sory transduction process? Several other questions fundamental to the essential nature of visual
experience also become feasible to investigate with this new broader definition of ‘visual’ process-
ing, such as holistic vs. local processing, static vs. dynamic recognition and depth perception, and
perception based on purely sensory vs. sensory-motor neural processing. Studies with sensory
substitution attempt to aid the blind by understanding these questions and thereby improving
both SS devices and the users’ quality of life. Further, these investigations advance neuroscience
by demonstrating the roles that neural plasticity and sensory integration play in the organization
of visual perception. In short, SS provides scientists and philosophers with a new artificial
dimension to examine perceptual organization processes.
Historical and Technical Overview


Sensory substitution was designed as an aid to help the blind recover normal mobility and daily
task functionality. Over 300 million people are visually impaired worldwide, with 45 million
entirely blind (World Health Organization 2009). The majority of the blind acquire blindness late
in life (Resnikoff et al. 2004), but congenital blindness, or blindness present at or near birth, still affects
one out of every 3300 children in developed countries (Bouvrie and Sinha 2007). While specialized
therapies, surgeries, and medication make most blindness preventable, blindness often
cannot be ameliorated once the neural damage is complete. Therefore, several types of electronic
prosthetic devices (such as retinal prostheses) have been designed that take over the function
of the damaged neural circuitry by stimulating still-functional visual neurons (Humayun et al.
2003; Merabet et al. 2005; Stiles et al. 2010; Winter et al. 2007). However, these devices are inva-
sive and are still in development. An alternative approach is sensory substitution, which encodes
visual information into a signal perceived by another still-functional sensory modality, such as
somatosensation of the skin or audition. Extensive cross-modal plasticity then enables the brain
to interpret the tactile sensations and sounds visually.
Tactile sensation was the first modality used in sensory substitution to transmit visual spatial information.
The Tactile Visual Substitution System (TVSS) device used stimulators embedded in the back of a
dental chair that were fed video by a camera mounted on a tripod (Bach-y-Rita et al. 1969). With
TVSS, six blind participants were anecdotally able to ‘discover visual concepts such as perspective,
shadows, shape distortion as a function of viewpoint, and apparent change in size as a function
of distance’ (Bach-y-Rita et al. 1969, pp. 963–964). TVSS was later modified into the Brainport
device that stimulates the tongue surface (Bach-y-Rita et al. 1998) in order to reduce stimulation
voltages and energy requirements as well as to utilize the high tactile resolution there.
Audition has also been used for sensory substitution with multiple types of encodings into
sound. Early devices such as the vOICe and PSVA devices used a direct brightness to volume and
pixel location to sound frequency transformation. The vOICe device encodes an image by rep-
resenting vertical position as distinct frequencies, horizontal position as scan time (left to right),
and the brightness of individual pixels as volume (Meijer 1992) (Figure 32.1). The Prosthesis
Substituting Vision by Audition (PSVA) device assigns a specific frequency to each pixel, and
encodes brightness with volume (Arno et al. 2001; Capelle et al. 2002). More recent devices such
as the Computer Aided System for Blind People (CASBliP) and the Michigan Visual Sonification
System (MVSS) have used 3-D sound (encoded with head-related transfer functions) to encode
the spatial location of objects (Araque et al. 2008; Clemons et al. 2012).
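The vOICe-style mapping just described can be sketched in a few lines of code. The following is only a rough illustration of the general scheme, not the actual vOICe implementation; the function name, image size, sampling rate, and frequency range are all arbitrary choices for this example, and the stereo panning of the real device is omitted:

```python
import numpy as np

def image_to_soundscape(img, fs=16000, scan_time=1.0,
                        f_low=500.0, f_high=5000.0):
    """Rough vOICe-style encoding: rows -> frequencies (top row highest),
    columns -> time (scanned left to right), brightness (0..1) -> loudness."""
    n_rows, n_cols = img.shape
    freqs = np.linspace(f_high, f_low, n_rows)   # one frequency per row
    samples_per_col = int(fs * scan_time / n_cols)
    t = np.arange(samples_per_col) / fs
    columns = []
    for c in range(n_cols):
        # Each column becomes a chord: a sum of row sinusoids,
        # each weighted by that pixel's brightness.
        chord = sum(img[r, c] * np.sin(2 * np.pi * freqs[r] * t)
                    for r in range(n_rows))
        columns.append(chord)
    wave = np.concatenate(columns)
    peak = np.max(np.abs(wave))
    return wave / peak if peak > 0 else wave

# A 4x4 image with one bright pixel in the top-left corner yields a
# high-pitched tone during the first quarter of the one-second scan:
img = np.zeros((4, 4))
img[0, 0] = 1.0
wave = image_to_soundscape(img)
```

In the real device the soundscape is additionally panned in stereo as the scan proceeds, and the scan repeats for each new video frame.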
Despite a diverse array of sensory substitution devices, none are currently commercially avail-
able or have a large user population. The limited commercial success of sensory substitution is
likely due to the long duration (and substantial effort) required to learn a variety of basic visual
tasks, and to the limited functionality realized once training is completed. Furthermore, a large
part of the training improvement on psychophysical tests appears due to top-down executive con-
trol and concentration of attention, even at the intermediate to advanced stages.1 Recent devices
such as the MVSS and CASBliP hope to increase subject function and decrease training time by
changing device encodings from vision-centric to audition-centric. By encoding spatial location
in auditory coordinates, these devices exploit existing hardwired processing in auditory cortex

1  Discussion of the ‘effort’ and ‘practice’ required for sensory substitution learning implies top-down attention
(Browne 2003, p. 277). Further, the lack of blind subject ‘confidence’ due to ‘long experimental time’ indicates
slow conscious processing rather than automatic perception (Dunai 2010, p. 84).
[Figure 32.1 appears here: a schematic of the vOICe device, with labels ‘Louder in left ear’, ‘Louder in right ear’, ‘Scan time (left to right)’, a frequency axis from ‘High’ to ‘Low’, ‘Brighter pixels are louder’, and blocks for the video input, the portable computer running the vOICe software, and the audio output.]

Fig. 32.1  Schematic diagram of the vOICe device, which encodes an image into sound in real time.
A subject wears a pair of glasses with a camera attached that transmits live video to a portable
computer. The computer runs the vOICe software, transforming the image into a soundscape by
encoding the brightness of pixels into loudness of a sound frequency range that is high for upper
pixels and progressively lower for middle and bottom pixels. This column of pixels is scanned across
the image at one Hz with stereo panning (the scan rate is adjustable). The soundscape representing
an image frame is communicated to the user via headphones.

while conveying useful information about obstacles. An alternative route to reducing training
time and enhancing performance may be the improvement of training methods, such as training
that exploits intrinsic cross-modal correspondences (Pratt 1930; Spence 2011; Stevens and Marks
1965), making devices more intuitive, as will be elaborated later in this chapter.

Sensory Substitution as a Cross-modal Interaction


Regardless of the specific encoding employed, sensory substitution is intrinsically
cross-modal, as the information from the transducing modality is communicated to
visual cortex for processing by means of neural plasticity engendered through training. The
cross-modal interactions utilized by sensory substitution exist as both hardwired developmen-
tal connections and plasticity-induced changes in adulthood. For example, the Illusory Flash or
Double Flash Illusion (in which a single flash accompanied by two short sounds is perceived to
be doubled) seems to be lower-level-sensory, since the illusion is relatively immune to at least
certain cognitive factors, such as feedback, reward, etc. (Andersen et al. 2004; Mishra et al.
2007; Rosenthal et al. 2009; Shams et al. 2000). This illusion demonstrates that the modality
carrying the more discontinuous, and therefore more salient, signal becomes the influential or
modulating modality (Shams et al. 2002; Shimojo and Shams 2001). It has also been shown that a wide
variety of cross-modal information is combined such that the resulting variance is minimized,
thereby mimicking maximum likelihood estimation (MLE) models (Ernst and Banks 2002).
Ernst and Banks were able to conclude from MLE that the modality that dominates in cross-
modal information integration is the one with the lowest variance. As for the plasticity-induced
changes, it has been proposed that the brain, including the visual cortex, may be ‘metamodal’,
such that brain regions are segregated by processing of different types of information and not by
stimulus modality (Pascual-Leone and Hamilton 2001). The metamodal theory of the brain was
supported by the activation of the shape-decoding region, Lateral Occipital tactile-visual area
(LOtv), by audition when shape was conveyed by vOICe encoded sounds (Amedi et al. 2007).
Modalities are also plastic after development and can generate learned relations across senses,
as witnessed in visual activation during echolocation, sound localization, and braille reading in
the blind (late blind vs. early blind) (Bavelier and Neville 2002; Cohen et al. 1997; Collignon et al.
2009; Sadato et al. 1996). Braille reading activated primary visual cortex (BA 17) and extrastri-
ate cortices bilaterally in blind subjects (Sadato et al. 1996). Repetitive Transcranial Magnetic
Stimulation (rTMS) was used to deactivate visual cortical regions in blind braille experts and
generated errors in braille interpretation (Cohen et al. 1997). These results demonstrate a func-
tional and causal link between visual activation and the ability to read braille in the blind. Other
studies provide even more evidence for plasticity in the handicapped such as enhanced visual
ERPs (Event Related Potentials) in early-onset deaf (Neville et al. 1983; Neville and Lawson 1987),
auditory ERPs in the posterior (occipital) region in early and late blind (Kujala et al. 1995), and
posterior DC potentials in blind by tactile reading (Uhl et al. 1991).
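The MLE rule referred to above (Ernst and Banks 2002) can be stated compactly: each cue is weighted by its inverse variance, so the combined estimate both favors the more reliable modality and has a lower variance than either cue alone. A minimal numerical sketch (the function name and the example values are ours, not taken from the original study):

```python
def mle_combine(est_a, var_a, est_b, var_b):
    """Inverse-variance (maximum-likelihood) combination of two
    independent cues; returns (combined estimate, combined variance)."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    combined = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    combined_var = 1.0 / (w_a + w_b)
    return combined, combined_var

# Illustrative values: vision estimates a bar at 50 mm (variance 1),
# haptics at 56 mm (variance 4); the more reliable cue dominates.
size, var = mle_combine(50.0, 1.0, 56.0, 4.0)
# size = (50/1 + 56/4) / (1 + 1/4) = 51.2 mm
# var  = 1 / (1 + 1/4) = 0.8, lower than either cue alone
```

Degrading one modality (e.g. blurring vision) raises its variance and shifts dominance toward the other sense, which is exactly the manipulation Ernst and Banks used.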
Perceptual organization usually refers to Gestalt principles, such as proximity-based (both in
space and time) grouping/segregation, regularity, and Prägnanz (good shape). Vision, audition,
and somatosensation have partly the same, but partly different (unique) perceptual organization
rules. For example, segregation or chunking rules operate across modalities in the same way at
the most abstract level, but indeed it could be spatial in vision but temporal in audition (Bregman
and Campbell 1971; Neri and Levi 2007; Vroomen and de Gelder 2000; see also Denham and
Winkler, this volume). SS provides an opportunity to investigate what would happen to such perceptual
organization rules when between-modality connectivity is enhanced by training. To be more
specific, sensory substitution allows detailed investigation of questions such as: (a) would the auditory
or the tactile modality acquire vision-like perceptual organization rules, and (b) would cross-modal
combinations themselves self-organize and generate new cross-modal organization principles?
Existing literature on cross-modal interactions is a guide to understanding and interpreting
the visual nature of sensory substitution processing. Sensory substitution requires plastically
generating new learned relationships across modalities, but it may also rely on existing developmental
connections. In fact, SS might modulate the strength of existing developmental connec-
tions, and thereby alter cross-modal perception, even in sighted subjects. Ideally, the training of
participants can exploit these existing cross-modal interactions and mappings to enable effortless
training and signal interpretation. In addition, training on SS devices should take into account
cross-modal interaction variance across both functional and experimental subject groups, includ-
ing the early blind with no visual experience, the late blind who have limited visual experience,
and the sighted with normal visual perception (Bavelier and Neville 2002; Poirier et al. 2007b).

Phenomenological Evidence for ‘Vision-like’ Processing


Sensory substitution generates activation in the primary visual cortex, and may also
generate a vision-like perceptual experience, or visual qualia, in select long-term users. (Note
that we only refer to the absolute unique quality of subjective perceptual experience here, regard-
less of whether the neural basis of qualia is a ‘hard problem’ or not, as D. Chalmers (1995) has
postulated.) In particular, late-blind vOICe user PF claims to have a visual experience with a
sensory substitution device, and to even have color fill-in from previous visual experiences (Ward
and Meijer 2010). PF remembers colors in familiar items such as a strawberry, which she describes
as a ‘red color with yellow seeds all around it and a green stalk’; whereas for unfamiliar objects her
brain ‘guesses’ at the color such as ‘greyish black’ for a sweater, and occasionally reduces the object
detail to a line drawing (Ward and Meijer 2010, p. 497). When rTMS was applied to her visual cor-
tex, she claimed that the visual experience was dampened, causing her to ‘carefully listen to the details
of the soundscapes’ instead of having an automatic ‘seeing’ sensation, qualitatively linking visual
activation to ‘visual’ characteristics of the subjective experience (Merabet et al. 2009, p. 136). The
vOICe ‘visual’ experience according to PF:
‘Just sound? . . . No, it is by far more, it is sight! . . . When I am not wearing the vOICe, the light I perceive
from a small slit in my left eye is a grey fog. When wearing the vOICe the image is light with all the little
greys and blacks . . . The light generated is very white and clear, then it erodes down the scale of color to
the dark black.’
Ward and Meijer 2010, p. 495

Subject PF has not been the only blind user who has reported visual experiences with sensory sub-
stitution devices. A study with eighteen blind subjects and ten sighted controls found that in the
last three weeks of a three month training period, seven blind subjects claimed to perceive phos-
phenes while using a tactile sensory substitution device (Ortiz et al. 2011). Four out of seven sub-
jects with visual experiences retained light perception; they ranged in blindness onset from one
to 35 years old. In most cases the phosphenes appeared in the shape and angle of the line stimulus
tactilely presented; the ‘visual’ perception over time dominated the tactile perception (Ortiz et al.
2011). The blind group with ‘visual’ experience had activation in occipital lobe regions such as BA
17, 18, and 19 measured via electroencephalography (EEG); in contrast, the non-phosphene blind
subjects did not have visual activation (Ortiz et al. 2011).
Tactile devices have been studied for distal attribution of users (i.e. the externalization of the
stimulus) as defined by: (1) the coupling of subject movement and stimulation; (2) the presence of
an external object; and (3) the existence of ‘perceptual space’ (Auvray et al. 2005). Distal attribu-
tion was tested on sixty subjects naïve to the auditory sensory substitution device and its encod-
ing. Subjects moved freely with headphones, webcam attached, and a luminous object in hand
and in some conditions were provided an object to occlude the luminous object. A link between
subject’s actions and auditory stimulation was often perceived, this coupling perception occurred
more often than perception of distal object or environmental space.
Key questions about ‘visual’ sensations with sensory substitution remain. These include the
connection between ‘visual’ perception and functionality with the device, i.e. whether a ‘visual’ quality
of experience enhances recognition and localization with sensory substitution. The cause of
visual perception with sensory substitution is also still unclear. Is ‘visual perception’ via sensory
substitution just mediated by primary visual areas, or do prefrontal and higher visual cortices play
a key role? Further, a quantitative rTMS study of Ortiz’s subjects who have ‘visual’ experience may
show if the visual cortical activation is necessary for their visual perception of sensory substitu-
tion stimuli. Deactivation of prefrontal regions (via rTMS) might demonstrate if those regions are
a part of a top-down cognitive network necessary to the distinctively unique subjective experience
of ‘visual’ nature with sensory substitution.
A major complication in visual activation and ‘visual’ perception with sensory substitution is
the role of visualization, particularly in the late blind. The late blind have experienced vision and
therefore are more familiar with visual principles but also have the ability to activate visual cortex
via visualization, or a mental effort to visually imagine a scene/object. PF is late blind (blindness
onset at age of twenty-one years) and five out of seven of Ortiz’s blind subjects with ‘visual’ percep-
tion had blindness onset at the age of four years or later (Ortiz et al. 2011). Therefore, it is possible
that the visual activation in these late-blind subjects is due to top-down cognitive visualization
rather than an automatic ‘visual’ perception. The major evidence against visualization was limited
to the qualitative claims that (1) the ‘visual’ perception happens automatically, and (2) (in Ortiz’s
subjects) that tactile sensations fade and ‘visual’ perception dominates. A quantitative study of the
automaticity of ‘visual’ perception with a sensory substitution device (i.e. whether it occurs even when
top-down attention is distracted) may further clarify the role of visualization in sensory substitu-
tion ‘visual’ experience. It will no doubt provide empirical seeds for theoretical reconsideration of
the subjective aspects of perception, including the issue of ‘qualia’.

Functional and Psychological Evidence for ‘Vision-like’ Processing


In order for sensory substitution to be visual, it must also mimic the functional and psychological
aspects of vision, or the organization and hierarchy of visual processing, that allow people to inter-
act effectively with their environment. Key to visual functionality is depth perception with mon-
ocular depth cues such as perspective (parallel lines converge at infinity), relative size of objects,
and motion parallax (lateral movement causes object movement to vary with distance) (Palmer
1999). Furthermore, perceptual illusions are critical probes into vision-like processing, demon-
strating the assumptions necessary to disambiguate a 3-D world from 2-D retinal images. Vision
exhibits perceptual constancies that keep our perception of a given object the same despite the
environment, which may change the ambient brightness (brightness constancy), object distance
(size constancy), color of illumination (color constancy), tilt of the head (rotation constancy), and
angle of the object (shape constancy), etc. (Palmer 1999). Finally, effortless localization of objects
in simple to cluttered environments and recognition of object properties and categories are crit-
ical to visual perception.
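These monocular cues are geometric, which is what makes them available in principle to a low-resolution substituted stream. A toy sketch of how two of them scale with distance (object size, observer speed, and distances are arbitrary illustrative numbers, not values from any study cited here):

```python
import math

# Toy geometry behind two monocular depth cues (illustrative numbers):
# relative size: an object's angular size shrinks with distance;
# motion parallax: lateral observer motion sweeps near objects across
# the visual field faster than far ones (small-angle approximation for
# an object directly abeam of the motion path).
def angular_size_deg(object_size_m, distance_m):
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))

def parallax_deg_per_s(observer_speed_m_s, distance_m):
    return math.degrees(observer_speed_m_s / distance_m)

# Doubling the viewing distance roughly halves both cues:
near_size = angular_size_deg(0.5, 2.0)   # 0.5 m object at 2 m
far_size = angular_size_deg(0.5, 4.0)    # same object at 4 m
```

Doubling the distance roughly halves both the angular size and the parallax rate; this is the regularity a substitution device's users would have to learn to exploit.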
Recognition of patterns and natural objects has been investigated with tactile and auditory
sensory substitution devices with positive results. Bach-y-Rita and colleagues (1998) tested five
sighted subjects on simple shape discrimination (such as circles and squares) with a Tongue
Display Unit (a tactile sensory substitution device). Recognition performance averaged 79.8
percent correct across shapes using arrays of 16, 25, 36, or 49 electrodes, and percent correct
also improved with object size (Figure 32.2Aa, line TO). Poirier et al. (2007a) tested pattern
recognition with the PSVA (an auditory sensory substitution device) in blindfolded sighted
subjects. Patterns were simple combinations of vertical and horizontal bars. Six sighted subjects
improved significantly on element and pattern recognition after two hours of training
(Figure 32.2Ab). Poirier et al. (2006a) compared simple and complex pattern recognition with
the PSVA in a behavioral analysis, concluding that subjects recognized element size and spatial
arrangement better than the patterns’ element features (such as vertical and horizontal bars).
Face perception via sensory substitution was investigated with the PSVA for neural correlates
similar to those of natural visual face perception, but recognition performance was not reported
(Plaza et al. 2009). Natural object recognition was tested in Auvray et al.’s (2007) study using the
vOICe (auditory sensory substitution). Ten natural objects (such as a plant, shoe, and table) were
identified by six sighted subjects against an artificial white background (brightness was inverted
before sonification) in an average of 42.4 seconds each (Auvray et al. 2007). Subjects listed 1.6 objects
on average before choosing the correct object. The time to identification improved over training
(from 57.6 seconds to 34.7 seconds) and varied with object type and individual subject.
Object categories were studied using the ten natural objects together with nine additional objects in
[Figure 32.2: (Aa) Pattern recognition, tactile sensory substitution (Bach-y-Rita et al. 1998): proportion correct vs. pattern size for four conditions, RD (fingertip-perceived raised dots), TO (electrotactile tongue discrimination), ET (fingertip electrotactile discrimination, subject dynamically modulates current), and ES (fingertip electrostatic stimulation); chance performance 0.33. (Ab) Pattern recognition, auditory sensory substitution (Poirier et al. 2007a): percent correct for elements and patterns, before vs. after training; both differences significant (Wilcoxon test for paired samples; elements: Z = 1.99, p < 0.05; patterns: Z = −2.23, p < 0.03). (Ba) Object localization, tactile sensory substitution (Chebat et al. 2011): correct responses (%) for congenitally blind (CB) vs. sighted controls (SC), large (L) vs. small (S) objects, and step-around (SA) vs. step-over (SO) obstacles (*p ≤ 0.05; **p ≤ 0.001). (Bb) Object localization, auditory sensory substitution (Auvray et al. 2007): pointing error (cm) as a function of horizontal and vertical distance to the elbow.]

Fig. 32.2  Behavioral outcomes of Sensory Substitution training. Psychophysical testing with tactile
and auditory sensory substitution devices has had similar outcomes. Object recognition testing
with Tongue Display Unit (Aa) has shown a correlation between the pattern size and proportion
correct; all subjects exceeded the chance performance. Pattern recognition with an auditory
device (Ab) significantly improved with training and had a similar average percent correct as tactile
pattern recognition (between 0.6 and 0.8 proportion correct). Obstacle localization in an uncluttered
maze environment with a tactile device (Ba) yielded between 0.8 and 1 proportion correct for most
object types. Localization of a 4 cm diameter ball with an auditory device showed that inaccuracy
increased with distance to the object (the webcam viewing the environment was held in the right hand
and aligned with the elbow) (Bb).

the same categories as the original objects. Subjects performed above chance at recognizing specific
objects even within a category, and were more accurate when each category contained fewer
objects.
A majority of the studies on object recognition with sensory substitution have focused on
artificial stimuli in simplified environments. No studies have yet explored natural objects in natural
environments (such as finding a shirt in a closet or a clock on a nightstand) or the role of distractor
objects in object perception (such as recognizing an object in the center of the field of view with
two objects to the left and right). A likely reason is that artificial patterns are easier to identify
and can also be manipulated to test sensory substitution resolution and to quantify object complexity
relatively easily, in the hope that more cluttered scenes will eventually become recognizable as
training progresses. Several key visual questions, such as spatially segregating objects, object
recognition independent of point of view (i.e. shape constancy), and differentiating shadows and
reflections from physical objects, remain unanswered.
Vision is to perceive ‘what is where by looking’ (Marr 1982, p. 3). The recognition studies above
investigated the ‘what’ element of perception; localization studies highlight the ‘where’ element
of vision. Clinically, object localization has been most commonly studied with locomotion
through a maze of obstacles. Chebat and collaborators (2011) constructed a life-sized maze
consisting of a white hallway with black boxes, tubes, and bars, either horizontal (on the floor or
partially protruding from the wall) or vertical (aligned with the left or right wall). Sixteen
congenitally blind subjects and eleven sighted controls navigated the maze with a tactile display
unit (10 × 10 pixels) and were scored on obstacle detection (pointing at the obstacle) and obstacle
avoidance (walking past the obstacle without touching it) (Figure 32.2Ba). The congenitally blind
(CB in the figure) detected and avoided obstacles significantly more accurately than the sighted
controls (SC in the figure). Both groups performed the tasks above chance. Larger obstacles (white
bars labeled L in the figure) were easier to avoid and detect than smaller obstacles (black bars
labeled S), and step-around obstacles (white bars labeled SA) were easier to negotiate than step-over
obstacles (black bars labeled SO) (Figure 32.2Ba). A study by Proulx and colleagues (2008) showed that
auditory sensory substitution localization was enhanced when subjects were allowed to use the
SS device in daily life (in addition to device assessments) compared to subjects using the
device only during assessments. Other localization studies have investigated artificial maze
environments and tracking of stimuli in 2-D and 3-D space (Chekhchoukh et al. 2011; Kupers et al.
2010). Auvray and colleagues (2007) used an auditory sensory substitution device to study the
accuracy of localization with a pointing task (Figure 32.2Bb) and found a mean pointing error of
7.8 cm for a 4 cm diameter ball. The pointing inaccuracy increased proportionally with distance
to the hand-held camera (vertically aligned with the subject’s elbow).
Depth perception is also a key part of visual processing. Given sensory substitution’s monocular
camera and low resolution, it can be especially challenging for users to learn. Nevertheless, sighted
users have been found to experience key monocular depth illusions. As described earlier in this
chapter, Renier and colleagues (2005b) tested for perception of the Ponzo illusion with an auditory
sensory substitution device and found that blindfolded sighted subjects could perceive it similarly
to the sighted, but early-blind subjects could not (Renier et al. 2005b). Investigation of the
vertical-horizontal illusion (vertical lines appear longer than horizontal lines) showed that sighted
subjects could perceive this illusion with an auditory sensory substitution device, but early-blind
subjects could not (Renier et al. 2006). These results may indicate either that previous visual
experience is essential for the perception of certain illusions, or that the duration of training was
too short or superficial. Testing late-blind subjects may further elucidate why congenitally blind
subjects did not perceive these illusions.

The perceptual organization of sensory substitution perception has many properties yet to be
determined. Recognition and localization in natural environments have not been thoroughly
quantified, nor has performance in cluttered environments or in shadowy and glare-ridden
settings. Further questions, such as what the sensory substitution primitives might be (analogous
to edges or spatial frequencies in vision), have not been answered. Scene perception with sensory
substitution also remains ambiguous: can the spatial relations of a scene be conveyed with sensory
substitution, and how much does this depend on past visual experience and the mode of stimulation
(auditory or tactile)? The active allocation of attention via gaze is also a critical
component of normal visual function that is entirely absent in sensory substitution encodings.
Does the absence of active sensation inhibit the processing of sensory substitution stimuli and the
generation of choice? Or would exploration and orienting via head turns easily compensate
for gaze shifts with minimal training? How does the absence of the gaze cascade impact
preference in the sensory substitution ‘visual’ experience (Shimojo et al. 2003)? Finally, Gestalt
binding principles of proximity and shared properties may or may not be perceived with sensory
substitution, and may be controlled by the transducing modality (somatosensation or audition) or
the processing modality (vision). These questions need to be answered in future research.

Neural (fMRI) Evidence for ‘Vision-like’ Processing


Neuroimaging and stimulation studies have recently shown visual activation with limited SS device
usage in sighted, late-blind, and early-blind participants. Poirier et al. (2007b) reviewed sensory
substitution imaging studies, concluding that early-blind users rely primarily on cross-modal
plasticity, whereas blindfolded sighted users rely mainly on visual imagery, to generate visual
activation during sensory substitution use. PET and fMRI studies with tactile and auditory SS
devices have shown activation in BA 17, BA 18, and BA 19 during recognition and localization
tasks in early- and late-blind as well as occasionally blindfolded sighted subjects (Amedi et al. 2007;
Arno et al. 2001; Kupers et al. 2010; Merabet et al. 2009; Poirier et al. 2006b; Poirier et al. 2007a, b;
Ptito et al. 2005; Renier et al. 2005a, b; Renier and De Volder 2010). Early PET studies showed
activation in occipital cortex for early-blind subjects but not for sighted subjects (Arno et al. 2001;
Ptito et al. 2005). Later fMRI studies found visual activation with sensory substitution use in sighted
subjects during pattern recognition and localization, in particular in visual areas within the dorsal
and ventral streams (Poirier et al. 2006b; Poirier et al. 2007a) (Figure 32.3B). Amedi and
colleagues (2007) showed with fMRI
that the lateral occipital tactile-visual (LOtv) area known to interpret object shape was also activated
by auditory sensory substitution device usage (Amedi et al. 2007) (Figure 32.3A). Plaza and col-
laborators (2009) demonstrated that PSVA could activate the fusiform face area with face stimuli
in blindfolded volunteers. Renier et al. (2005a, b) investigated depth perception with an SS device
and found that blindfolded sighted subjects could perceive the Ponzo illusion and showed activation
in occipito-parietal cortex while exploring 3-D images during PET imaging.
Even non-sensory substitution binding of cross-modal stimuli can generate visual activation
from unimodal stimuli. Zangenehpour and Zatorre (2010) found that training on the spatial and
temporal congruence of beeps and flashes activated visual cortex even in the auditory-only
condition. Therefore, visual cortex can be trained to respond to audition if subjects are taught to
associate temporally and spatially collocated beeps and flashes. This indicates that a critical part
of training-induced plasticity is simultaneous stimulation of sensory substitution (audition or
somatosensation) and vision (for sighted subjects), potentially due to Hebbian learning. Hebbian
learning can also be potentially extended to the blind if stimuli are felt by the hand simultaneously
with stimulation by sensory substitution.
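The Hebbian mechanism described above can be illustrated with a toy simulation; everything in it (pattern sizes, learning rate, number of pairings) is an arbitrary illustrative choice, not a model from the studies reviewed here:

```python
import numpy as np

# Toy Hebbian sketch of cross-modal association: repeated pairing of an
# auditory 'beep' pattern with a visual 'flash' pattern strengthens the
# audio-to-visual weights, so the beep alone later drives the visual units.
rng = np.random.default_rng(0)
n_audio, n_visual = 8, 8
W = np.zeros((n_visual, n_audio))       # audio -> visual weights
eta = 0.1                               # learning rate

beep = rng.random(n_audio)              # fixed auditory pattern
flash = rng.random(n_visual)            # fixed visual pattern

before = np.linalg.norm(W @ beep)       # visual response to beep alone
for _ in range(50):                     # paired audio-visual exposure
    W += eta * np.outer(flash, beep)    # Hebbian update: dW = eta * v a^T
after = np.linalg.norm(W @ beep)
print(f"visual response to beep alone: before={before:.2f}, after={after:.2f}")
```

After training, the auditory pattern alone drives the ‘visual’ units, which is the signature of cross-modal recruitment the paragraph describes.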

[Figure 32.3: (A) Activation in blind and sighted subjects with a shape estimation task (Amedi et al. 2007): (a) single sighted subjects’ neural activation (SV1–SV5), (b) blind subjects’ neural activation, (c) single sighted subjects’ activation from an auditory control task (SA1–SA5), (d) average across seven vOICe-trained users (the subjects in a and b; n = 7, p = 0.005; p = 0.05 corrected). (B) Sighted subject activation as a function of training session on a pattern recognition task (Poirier et al. 2006b): sessions 1–4; voxels corrected for multiple comparisons across the whole brain, threshold p < 0.05; six sighted subjects.]

Fig. 32.3  Imaging with Sensory Substitution. Neural activation was shown in the left
occipitotemporal cortex in all sighted and blind expert users during sensory substitution shape
classification (Aa–Ab), whereas sighted users did not have visual activation during the auditory
control task (Ac). Averaged results show activation in several multimodal regions (Ad). During a
sensory substitution pattern recognition task, six sighted subjects showed a progressive increase in
occipital activation with training on an auditory sensory substitution device (B).

fMRI and PET studies have demonstrated that visual cortex activation correlates with sensory
substitution use, but they cannot prove causality. Repetitive transcranial magnetic stimulation (rTMS)
temporarily deactivates a region of cortex, allowing the possible causal link between neural activation
and subject performance to be examined. Collignon and colleagues (2007) applied rTMS to the right
dorsal extrastriate occipital cortex of seven sighted and seven early-blind subjects (both groups
trained on the PSVA auditory sensory substitution device) preceding sensory substitution pattern
recognition (Collignon et al. 2007). Early-blind subjects had longer reaction times and lower
accuracies with rTMS applied than in a sham rTMS condition; sighted subjects showed no
performance change (Collignon et al. 2007) (Figure 32.4B). Merabet et al. (2009) also deactivated
occipital peristriate regions of a late-blind sensory substitution superuser, PF, and demonstrated a
decrement in recognition accuracy relative to pre-rTMS and post-sham rTMS conditions
(Figure 32.4A). In the tactile domain, TMS applied to occipital cortex elicited somatotopic tactile
sensations in blind but not blindfolded sighted users of a tactile sensory substitution device (Kupers
et al. 2006). Overall, rTMS studies indicate that blind users of sensory substitution devices
functionally and causally recruit the occipital cortex, potentially due to long-term cross-modal
plasticity from visual deprivation.
Dynamic Causal Modeling (DCM) studies in the blind have constructed a cross-modal net-
work for auditory and somatosensory processing and the visual cortex (Fujii et al. 2009; Klinge
et al. 2010). It remains to be shown if these networks are used in blind subjects with sensory

[Figure 32.4: (A) rTMS on a late-blind auditory sensory substitution expert (Merabet et al. 2009): percent correct at baseline vs. post-rTMS for the occipital pole (significant, *p < 0.05) and for the vertex (not significant, NS). (B) rTMS on early-blind auditory sensory substitution users (Collignon et al. 2007): PSVA form recognition percent correct under sham vs. real rTMS for sighted and blind groups (*p < 0.05; error bars indicate standard errors).]

Fig. 32.4  rTMS with Sensory Substitution. Repetitive transcranial magnetic stimulation (rTMS)
decreases neural activation and influences behavior, thereby probing the causal link between
behavioral outcomes and activation of a neural region. rTMS of an occipital region significantly
reduced percent correct at object identification in an expert vOICe user, PF (A). PF’s recognition was
not significantly impaired by rTMS of a vertex location. Seven early-blind subjects were also impaired
at a sensory substitution pattern recognition task by rTMS to right dorsal extrastriate occipital cortex
(B). Seven sighted subjects’ performance was not significantly affected by rTMS (B).

substitution, and if the cross-modal network in the sighted is similar to, or different from, that in
blind subjects. Nevertheless, the literature on functional connectivity of sensory substitution stimuli
and dynamic causal modeling in the blind can be used to generate several neural network possibilities
(Figure 32.5A and 32.5B) with feedforward and feedback connections. The network likely includes
the primary sensory region of the transducing modality (somatosensation or audition), which
connects to a multimodal region that further connects to primary visual regions (V3, V2, or V1).
The filtering of stimuli as sensory substitution stimuli or natural stimuli could occur at the
primary region of the transducing modality (A1 or S1) or at the multimodal region; more studies
on the specificity of the plasticity will be required to elucidate this. The role of prefrontal regions
in top-down cognitive processing of the cross-modal stimulus has yet to be shown. More critically,
it remains to be fully determined which specific regions in the network are causally linked to
performance and therefore what role each region plays in stimulus processing. Feedback between
visual regions and the multimodal regions may play a significant role in stimulus processing, yet
the degree of feedback in sensory substitution processing is unclear. Motor regions and other
primary sensory regions may also play an important role in plastic changes in the sensory
substitution neural network.

Sensory Substitution and Aesthetics


A key aspect of perception, whether visual or auditory, is aesthetics, or the pleasantness of the
stimulus. This is mainly because perception is fundamentally an active, not a passive, process,
and such active orienting is often triggered by positive (hedonic) or negative (aversive) values
that the stimulus carries. Needless to say, aesthetic evaluation of stimuli in a sensory modality

[Figure 32.5: (a) tactile and (b) auditory sensory substitution neural networks, each showing left (L) and right (R) hemisphere regions (S1 for tactile or A1 for auditory devices, parietal cortex (PC), STS, V3, and V1) linked by feedforward and feedback connections.]

Fig. 32.5  Network with Sensory Substitution. Visual, auditory, and tactile regions form a neural
network in blind and sighted sensory substitution users that processes sensory information within a
feedforward and feedback hierarchy (A for tactile devices and B for auditory devices) (after Poirier
et al. 2007b). The sensory information is first filtered by primary sensory regions (A1 or S1 for
auditory and tactile devices, respectively). Sensory information is then communicated to multimodal
regions (such as STS or parietal cortex) and forwarded to primary visual regions (V3, V2 (not shown),
or V1). It is also likely that feedback and reiterative processing play a role in the perception of the
sensory substitution stimuli.

is closely interlinked with perceptual organization principles in that modality (e.g., Palmer et al.,
this volume; van Tonder and Vishwanath, this volume). Since sensory substitution adds new
critical associative dimensions to our perceptual experiences, it attracts artists with the possibility
of significant changes in the overall structure of multisensory aesthetics. If some subjects (primarily
late blind) perceive sensory substitution with a vision-like perception, do their preferences for
stimuli follow the aesthetics of vision rather than those of the transducing modality, i.e. audition
or somatosensation?
One interesting, though anecdotal, case is Neil Harbisson, a congenitally achromatic artist, who
uses a sensory substitution device to perceive ‘color’ as sound (Harbisson 2012). He seems to still
‘hear’ the color rather than ‘see’ it, and as such his perception of beautiful color combinations
derives from the aesthetics of audition rather than those of vision. His ‘color’ perception may
qualify as ‘a third kind of qualia’, given that he has mixed the information of vision (i.e., color)
as the decoded, and that of audition as the decoding. He also misinterprets natural sounds as
colors, thereby generating a new artificial synaesthesia. He uses these misinterpretations to gen-
erate visual artwork that represents the colors he perceives when listening to natural sounds, such
as famous music or speeches. One remaining question in his case, however, would be whether his
‘color’ experience is just a form of associative imagery or a real percept, as in a true synaesthete.
Aside from being an interesting case study, his experience opens the question of whether typical
sensory substitution users have aesthetics closer to those of audition or of vision (or else a

newly emerged cross-modal aesthetic organization), and whether this depends on how they perceive
the stimulus. It may be that aesthetics follows the mode of perception, such that late-blind
users, who are more likely to perceive sensory substitution as ‘vision’, will prefer different stimuli
from those of blindfolded sighted users, who are more likely to have an auditory experience with
sensory substitution.

Discussion
The practical objective of sensory substitution research is its rehabilitation potential for the
blind. Training methods and device encodings have yet to generate high-functionality outcomes
with minimal training requirements. Several efforts are attempting to ameliorate this problem,
including encodings that utilize spatial auditory processing and optimized training algorithms.
Improving training of existing devices such as the vOICe may be possible by incorporating
findings from multimodal research. Well-known cross-modal correspondences, or intrinsic
mappings of visual and auditory stimuli, may enhance participant performance by using
pre-existing connections between auditory and visual stimuli to implicitly teach subjects how
to interpret sensory substitution stimuli. An alternative to improving training is to employ new
devices such as CASBLiP and MVSS that use 3-D sound to generate artificial sounds with a 3-D
spatial location, thereby indicating obstacles and overhangs to blind users and bypassing the 2-D
representation (‘image’). The idea behind them is unique and potentially innovative, because it
abandons the idea of vision as a 2-D (fronto-parallel) image whose parameters need to be
translated into auditory (or somatosensory) parameters. Instead, it relies on the very simple idea of
direct perception, which immediately guides action for navigation and obstacle avoidance. While
CASBLiP and MVSS have been developed, no extensive psychophysical evaluations of subject
capabilities have yet been published, leaving their impact on rehabilitation an open question.
Systematic evaluations of obstacle avoidance in cluttered environments and of object identification
will clarify the potential role of these new devices in improving blind users’ quality of life.
With both approaches, sensory substitution may have significant possibilities for blind
rehabilitation, to the degree that the brain retains vigorous cross-modal plasticity.
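The 3-D sound such devices depend on can be approximated from interaural time and level differences (ITD/ILD). A minimal sketch under textbook assumptions (the Woodworth ITD formula; the head radius and the roughly 6 dB far-ear attenuation are illustrative values, not taken from the CASBLiP or MVSS implementations):

```python
import numpy as np

# Toy binaural spatialization via interaural time and level differences
# (ITD/ILD), the kind of 3-D sound cue spatial-audio devices rely on.
def spatialize(tone, azimuth_deg, fs=44100, head_radius=0.0875, c=343.0):
    az = np.radians(abs(azimuth_deg))
    itd = head_radius / c * (az + np.sin(az))   # Woodworth ITD model
    delay = int(round(itd * fs))                # far ear hears it later
    ild = 10 ** (-6 * np.sin(az) / 20)          # far ear hears it quieter
    near = tone
    far = np.concatenate([np.zeros(delay), tone])[: len(tone)] * ild
    # positive azimuth = source on the right, so the right ear is 'near'
    return (far, near) if azimuth_deg >= 0 else (near, far)

t = np.arange(4410) / 44100
tone = np.sin(2 * np.pi * 440 * t)
left, right = spatialize(tone, 45)              # source 45 degrees to the right
```

Rendering each detected obstacle as a tone spatialized at its true direction conveys layout directly, without the intermediate 2-D image.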
Sensory substitution of vision may not only help rehabilitate the blind, but also provides a
powerful and unique method to study cross-modal interactions and visual perception. While
sensory substitution is similar to visual perception and often retains visual illusions, properties,
and activation in visual cortex, most sighted subjects still experience it as auditory or somatosensory
perception. As reviewed above, a select few device users, often superusers and the late blind,
claim to have vision-like experiences with device use. The imaging, rTMS, and behavioral data
indicate that the visual or auditory/somatosensory dominance of sensory substitution depends
on the plasticity of the individual’s multimodal neural network and previous visual experience. Key
questions remain about the structure of the multimodal network, and about which unimodal or
amodal regions process the temporal and spatial aspects of sensory substitution stimuli. Unanswered
questions include the topographical mapping of sensory substitution stimuli onto visual cortex
via training, the decay rate of visual activation from sensory substitution after a period of disuse
(in the blind and sighted), the automaticity of sensory substitution processing (i.e., whether
effortless perception can be acquired without massive top-down attention), and how temporal
coordination is accomplished across modalities.
Although the information provided to subjects by sensory substitution devices may be derived
from the same source as visual stimuli in the sighted, it is interpreted and processed in a unique way
by the central nervous system, generating a percept that is neither visual nor auditory but instead

is intrinsically cross-modal. Blind and sighted subjects interpret auditory cues differently, have
different connectivity between visual and auditory/somatosensory cortices, and therefore likely use
different aspects of the information from sensory substitution to generate perception. Sensory
substitution is a new way of pairing sensory modalities, such that it may be understood as a new
sub-modality that uses the transduction of audition or somatosensation and is processed by visual
cortex. How could such new sensory experiences be perceptually organized, experienced, and guide
action? It will be a challenge to further quantify the unique aspects of this third type of qualia and
to understand the features, such as new illusions, that are wholly unique to this form of perception.

References
Amedi, A., Stern, W.M., Camprodon, J.A., et al. (2007). Shape conveyed by visual-to-auditory sensory
substitution activates the lateral occipital complex. Nature Neuroscience 10: 687–9.
Andersen, T.S., Tiippana, K., and Sams, M. (2004). Factors influencing audiovisual fission and fusion
illusions. Cognitive Brain Research 21: 301–8.
Araque, N.O., Dunai, L., Rossetti, F., et al. (2008). Sound map generation for a prototype blind mobility
system using multiple sensors. Service Robotics and Smart Homes: How a gracefully adaptive integration
of both environments can be envisaged? Bilbao, Spain.
Arno, P., De Volder, A.G., Vanlierde, A., et al. (2001). Occipital activation by pattern recognition in the
early blind using auditory substitution for vision. Neuroimage 13: 632–45.
Auvray, M., Hanneton, S., Lenay, C., and O’Regan, K. (2005). There is something out there: distal
attribution in sensory substitution, twenty years later. Journal of Integrative Neuroscience 4: 505–21.
Auvray, M., Hanneton, S., and O’Regan, J.K. (2007). Learning to perceive with a visuo-auditory
substitution system: localisation and object recognition with the vOICe. Perception 36: 416–30.
Bach-y-Rita, P., Collins, C.C., Saunders, F.A., White, B., and Scadden, L. (1969). Vision substitution by
tactile image projection. Nature 221: 963–4.
Bach-y-Rita, P., Kaczmarek, K.A., Tyler, M.E., and Garcia-Lara, J. (1998). Form perception with a 49-point
electrotactile stimulus array on the tongue: a technical note. Journal of Rehabilitation Research and Development 35: 427–30.
Bavelier, D. and Neville, H.J. (2002). Cross-modal plasticity: where and how? Nature Reviews Neuroscience
3: 443–52.
Bouvrie, J.V. and Sinha, P. (2007). Visual object concept discovery: observations in congenitally blind
children, and a computational approach. Neurocomputing 70: 2218–33.
Bregman, A.S. and Campbell, J. (1971). Primary auditory stream segregation and perception of order in
rapid sequences of tones. Journal of Experimental Psychology 89: 244–9.
Browne, R.F. (2003). Toward mobility aid for the blind. Image and Vision Computing New Zealand, pp.
275–9. Palmerston North, New Zealand.
Capelle, C., Trullemans, C., Arno, P., and Veraart, C. (2002). A real-time experimental prototype for
enhancement of vision rehabilitation using auditory substitution. IEEE Transactions on Biomedical
Engineering 45: 1279–93.
Chalmers, D.J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies
2: 200–19.
Chebat, D.R., Schneider, F.C., Kupers, R., and Ptito, M. (2011). Navigation with a sensory substitution
device in congenitally blind individuals. Neuroreport 22: 342–7.
Chekhchoukh, A., Vuillerme, N., and Glade, N. (2011). Vision substitution and moving objects tracking
in 2 and 3 dimensions via vectorial electro-stimulation of the tongue. Actes de ASSISTH 2011, 2eme
Conference internationale sur l’Accessibilite et les Systemes de Suppleance aux personnes en situaTions de
Handicaps. Paris.

Clemons, J., Bao, S.Y., Savarese, S., Austin, T., and Sharma, V. (2012). MVSS: Michigan Visual Sonification
System. 2012 IEEE International Conference on Emerging Signal Processing Applications (ESPA),
pp. 143–6. Las Vegas.
Cohen, L.G., Celnik, P., Pascual-Leone, A., et al. (1997). Functional relevance of cross-modal plasticity in
blind humans. Nature 389: 180–2.
Collignon, O., Lassonde, M., Lepore, F., Bastien, D., and Veraart, C. (2007). Functional cerebral
reorganization for auditory spatial processing and auditory substitution of vision in early blind subjects.
Cerebral Cortex 17: 457–65.
Collignon, O., Voss, P., Lassonde, M., and Lepore, F. (2009). Cross-modal plasticity for the spatial
processing of sounds in visually deprived subjects. Experimental Brain Research 192: 343–58.
Dunai, L. (2010). Design, modeling and analysis of object localization through acoustical signals for
cognitive electronic travel aid for blind people. Universidad Politecnica De Valencia, School of Design
Engineering, PhD Thesis.
Ernst, M.O. and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically
optimal fashion. Nature 415: 429–33.
Fujii, T., Tanabe, H.C., Kochiyama, T., and Sadato, N. (2009). An investigation of cross-modal plasticity
of effective connectivity in the blind by dynamic causal modeling of functional MRI data. Neuroscience
Research 65: 175–86.
Harbisson, N. (2012). I listen to color. TEDGlobal, [Online] Jul 2012, Available at: http://www.ted.com/
talks/neil_harbisson_i_listen_to_color.html, accessed 26 Sept 2012.
Humayun, M.S., Weiland, J.D., Fujii, G.Y., et al. (2003). Visual perception in a blind subject with a chronic
microelectronic retinal prosthesis. Vision Research 43: 2573–81.
Klinge, C., Eippert, F., Roder, B., and Buchel, C. (2010). Corticocortical connections mediate
primary visual cortex responses to auditory stimulation in the blind. The Journal of Neuroscience
30: 12798–805.
Kujala, T., Huotilainen, M., Sinkkonen, J., et al. (1995). Visual cortex activation in blind humans during
sound discrimination. Neuroscience Letters 183: 143–6.
Kupers, R., Fumal, A., de Noordhout, A.M., Gjedde, A., Schoenen, J., and Ptito, M. (2006). Transcranial
Magnetic Stimulation of the visual cortex induces somatotopically organized qualia in blind subjects.
Proceedings of the National Academy of Sciences 103: 13256–60.
Kupers, R., Chebat, D.R., Madsen, K.H., Paulson, O.B., and Ptito, M. (2010). Neural correlates of
virtual route recognition in congenital blindness. Proceedings of the National Academy of Sciences
107: 12716–21.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of
Visual Information. WH San Francisco: Freeman and Company.
Meijer, P.B.L. (1992). An experimental system for auditory image representations. IEEE Transactions on
Biomedical Engineering 39: 112–21.
Merabet, L., Rizzo, J., Amedi, A., Somers, D., and Pascual-Leone, A. (2005). What blindness can tell us
about seeing again: merging neuroplasticity and neuroprostheses. Nature Reviews Neuroscience 6: 71–7.
Merabet, L.B., Battelli, L., Obretenova, S., Maguire, S., Meijer, P., and Pascual-Leone, A. (2009).
Functional recruitment of visual cortex for sound encoded object identification in the blind.
Neuroreport 20: 132–8.
Mishra, J., Martinez, A., Sejnowski, T.J., and Hillyard, S.A. (2007). Early cross-modal interactions in
auditory and visual cortex underlie a sound-induced visual illusion. The Journal of Neuroscience
27: 4120–31.
Neri, P. and Levi, D.S. (2007). Temporal dynamics of figure-ground segregation in human vision. Journal of
Neurophysiology 97: 951–7.
670 Stiles and Shimojo

Neville, H.J. and Lawson, D. (1987). Attention to central and peripheral visual space in a movement
detection task: an event-related potential and behavioral study. II. Congenitally deaf adults. Brain
Research 405: 268–83.
Neville, H.J., Schimidt, A., and Kutas, M. (1983). Altered visual-evoked potentials in congenitally deaf
adults. Brain Research 266: 127–32.
Ortiz, T., Poch, J., Santos, J.M., et al. (2011). Recruitment of occipital cortex during sensory substitution
training linked to subjective experience of seeing in people with blindness. PLoS One 6: e23264.
Palmer, S.E. (1999). Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.
Pascual-Leone, A. and Hamilton, R. (2001). The metamodal organization of the brain. In: Casanova, C.
and Ptito, M. (eds.). Vision: From Neurons to Cognition, pp. 427–45. Amsterdam: Elsevier Science.
Plaza, P., Cuevas, I., Collignon, O., Grandin, C., De Volver, A.G., and Renier, L. (2009). Percieving
schematic faces and man-made objects using a visual-to-auditory sensory substitution activates the
fusiform gyrus. 10th International Multisensory Research Forum. New York.
Poirier, C.C., Richard, M.A., Duy R.T., and Veraart C. (2006a). Assessment of sensory substitution
prosthesis potentialities in minimalist conditions of learning. Applied Cognitive Psychology 20: 447–60.
Poirier, C.C., De Volder, A.G., Tranduy, D., and Scheiber, C. (2006b). Neural changes in the ventral
and dorsal visual streams during pattern recognition learning. Neurobiology of Learning and Memory
85: 36–43.
Poirier, C., De Volder, A., Tranduy, D., and Scheiber, C. (2007a). Pattern recognition using a device
substituting audition for vision in blindfolded sighted subjects. Neuropsychologia 45: 1108–21.
Poirier, C., De Volder, A.G., and Scheiber, C. (2007b). What neuroimaging tells us about sensory
substitution. Neuroscience and Biobehavioral Reviews 31: 1064–70.
Pratt, C.C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology 13: 278.
Proulx, M.J., Stoerig, P., Ludowig, E., and Knoll, I. (2008). Seeing ‘where’ through the ears: effects of
learning-by-doing and long-term sensory deprivation on localization based on image-to-sound
substitution. PLoS One 3: e1840.
Ptito, M., Moesgaard, S.M., Gjedde, A. and Kupers, R. (2005). Cross-modal plasticity revealed by
electrotactile stimulation of the tongue in the congenitally blind. Brain 128: 606–14.
Renier, L., Collignon, O., Poirier, C., et al. (2005a). Cross-modal activation of visual cortex during depth
perception using auditory substitution of vision. Neuroimage 26: 573–80.
Renier, L., Laloyaux, C., Collignon, O., et al. (2005b). The ponzo illusion with auditory substitution of
vision in sighted and early-blind subjects. Perception 34: 857–67.
Renier, L., Bruyer, R., and De Volder, A. (2006). Vertical-horizontal illusion present for sighted but not
early blind humans using auditory substitution of vision. Perception and Psychophysics 68: 535–42.
Renier, L. and De Volder, A. (2010). Vision substitution and depth perception: early blind subjects
experience visual perspective through their ears. Disability & Rehabilitation: Assistive Technology
5: 175–83.
Resnikoff, S., Pascolini, D., Etya’ale, D., et al. (2004). Global data on visual impairment in the year 2002.
Bulletin of the World Health Organization 82: 844–52.
Rosenthal, O., Shimojo, S., and Shams, L. (2009). Sound-induced flash illusion is resistant to feedback
training. Brain Topography 21: 185–92.
Sadato, N., Pascual-Leone, A., Grafman, J., et al. (1996). Activation of the primary visual cortex by braille
reading in blind subjects. Nature 380: 526–8.
Shams, L., Kamitani, Y., and Shimojo, S. (2000). What you see is what you hear. Nature, 40: 788.
Shams, L., Kamitani, Y., and Shimojo, S. (2002). Visual illusion induced by sound. Cognitive Brain Research
14: 147–52.
Shimojo, S. and Shams, L. (2001). Sensory modalities are not separate modalities: plasticity and
interactions. Current Opinion in Neurobiology 11: 505–9.
Sensory Substitution 671

Shimojo, S., Simion, C., Shimojo, E., and Scheier, C. (2003). Gaze bias both reflects and influences
preference. Nature Neuroscience 6: 1317–22.
Spence, C. (2011). Crossmodal correspondences: a tutorial review. Attention, Perception, and Psychophysics
73: 971–95.
Stevens, J.C. and Marks, L.E. (1965). Cross-modality matching of brightness and loudness. Proceedings of
the National Academy of Sciences of the United States of America 54: 407–11.
Stiles, N.R.B., McIntosh, B.P., Nasiatka, P.J., et al. (2010). An intraocular camera for retinal protheses:
restoring sight to the blind. In: A. Serpenguzel and A.W. Poon (eds.). Optical Processes in Microparticles
and Nanostructures: A Festschrift Dedicated to Richard Kounai Chang on His Retirement from Yale
University, pp. 385–430. Singapore: World Scientific.
Uhl F., Lindinger, G., Lang, W., and Deecke, L. (1991). On the functionality of visually deprived occipital
cortex in early blind persons. Neuroscience Letters 124: 256–9.
Vroomen, J. and De Gelder, B. (2000) Sound enhances visual perception: crossmodal effects of auditory
organization on vision. Journal of Experimental Psychology: Human Perception and Performance 26:
1583–90.
Ward, J. and Meijer, P. (2010). Visual experiences in the blind induced by an auditory sensory substitution
device. Consciousness and Cognition 19: 492–500.
Winter, J.O., Cogan, S.F., and Rizzo, J.F. (2007). Retinal prostheses: current challenges and future outlook.
Journal of Biomaterials Science, Polymer Edition 18: 1031–55.
World Health Organization. (2009). Visual impairment and blindness. [Online] June 2012, Available
at: http://www.who.int/mediacentre/factsheets/fs282/en/index.html, accessed 4 Oct 2012.
Zangenehpour, S. and Zatorre, R.J. (2010). Crossmodal recruitment of primary visual cortex following
brief exposure to bimodal audiovisual stimuli. Neuropsychologia 48: 591–600.
Chapter 33

Different modes of visual organization for perception and for action

Melvyn A. Goodale and Tzvi Ganel

Introduction
We depend on vision, more than on any other sense, to perceive the world of objects and events
beyond our bodies. We also use vision to move around that world and to guide our goal-directed
actions. Over the last 25 years, it has become increasingly clear that the visual pathways in the
brain that mediate our perception of the world are quite distinct from those that mediate the
control of our actions. This distinction between ‘vision-for-perception’ and ‘vision-for-action’ has
emerged as one of the major organizing principles of the visual brain, particularly with respect to
the visual pathways in the cerebral cortex (Goodale and Milner, 1992; Milner and Goodale, 2006).
According to Goodale and Milner’s (1992) account, the ventral stream of visual processing,
which arises in early visual areas and projects to inferotemporal cortex, constructs the rich and
detailed representation of the world that serves as a perceptual foundation for cognitive opera-
tions, allowing us to recognize objects, events and scenes, attach meaning and significance to them,
and infer their causal relations. Such operations are essential for accumulating a knowledge-base
about the world. In contrast, the dorsal stream, which also arises in early visual areas, but projects
instead to the posterior parietal cortex, provides the necessary visual control of skilled actions,
such as manual prehension. Even though the two streams have different functions and operating
principles, in everyday life they have to work together. The perceptual networks of the ventral
stream interact with various high-level cognitive mechanisms, and enable an organism to select
a goal and an associated course of action, while the visuomotor networks in the dorsal stream
(and their associated cortical and subcortical pathways) are responsible for the programming and
on-line control of the particular movements the action entails. Of course, the dorsal and ventral
streams have other roles to play as well. For example, the dorsal stream, together with areas in
the ventral stream, plays a role in spatial navigation – and areas in the dorsal stream appear to
be involved in some aspects of working memory (Kravitz et al., 2011). This review, however, will
focus on the respective roles of the two streams in perception and action – and will concentrate
largely on the implications of the theory for the principles governing perceptual organization and
visuomotor control.

Different neural computations for perception and action


Evidence from a broad range of empirical studies from human neuropsychology to single-unit
recording in non-human primates (for reviews, see Culham and Valyear, 2006; Goodale, 2011;
Kravitz et al., 2011) supports the idea of two cortical visual systems. Yet the question remains as
to why two separate systems evolved in the first place. Why couldn’t one ‘general purpose’ visual
system handle both vision-for-perception and vision-for-action? The answer to this question lies
in the differences in the computational requirements of vision-for-perception on the one hand
and vision-for-action on the other. To be able to grasp an object successfully, for example, the
visuomotor system has to deal with the actual size of the object, and its orientation and posi-
tion with respect to the hand you intend to use to pick it up. These computations need to reflect
the real metrics of the world, or at the very least, make use of learned ‘look-up tables’ that link
neurons coding a particular set of sensory inputs with neurons that code the desired state of
the limb (Thaler and Goodale, 2010). The time at which these computations are performed is
equally critical. Observers and goal objects rarely stay in a static relationship with one another
and, as a consequence, the egocentric location of a target object can often change radically from
moment-to-moment. In other words, the required coordinates for action need to be computed at
the very moment the movements are performed.
In contrast to vision-for-action, vision-for-perception does not need to deal with the abso-
lute size of objects or their egocentric locations. In fact, very often such computations would be
counter-productive because our viewpoint with respect to objects does not remain constant  –
even though our perceptual representations of those objects do show constancy. Indeed, one can
argue that it would be better to encode the size, orientation, and location of objects relative to
each other. Such a scene-based frame of reference permits a perceptual representation of objects
that transcends particular viewpoints, while preserving information about spatial relationships
(as well as relative size and orientation) as the observer moves around. The products of perception
also need to be available over a much longer time scale than the visual information used in the
control of action. By working with perceptual representations that are object- or scene-based, we
are able to maintain the constancies of size, shape, color, lightness, and relative location, over time
and across different viewing conditions.
The differences between the relative frames of reference required for vision-for-perception and
absolute frames of reference required for vision-for-action lead, in turn, to clear differences in the
way in which visual information about objects and their spatial relationships is organized and
represented. These differences can be most readily seen in the way in which the two visual systems
deal with visual illusions.

Studies of visual illusions


The most intriguing – yet also the most controversial – evidence for dissociations between action
and perception in healthy subjects has come from studies of visual illusions of size (for a review
see Goodale, 2011). In visual illusions of size, an object is typically embedded within the context
of other objects or other pictorial cues that distort its perceived size. Visual illusions, by defin-
ition, have robust effects on perceptual judgments. Surprisingly, the same illusions can have little
or no effect on visuomotor tasks, such as grasping. Thus, even though a person might perceive
an object embedded within an illusion to be larger or smaller than it really is, when they reach out
to pick up the object, the opening of their grasping hand is often unaffected by the illusion.
In other words, the grip aperture is scaled to the real, not the apparent size of the goal object. This
result has been interpreted as evidence for the idea that vision-for-action makes use of real-world
metrics while vision-for-perception uses relative or scene-based metrics (Goodale and Milner,
2005). This interpretation, however, has been vigorously challenged over the past decade by stud-
ies claiming that when attention and other factors are taken into account, there is no difference
between the effects of size-contrast illusions on grip scaling and perceptual reports of size (for a
review, see Franz and Gegenfurtner, 2008).
A representative example of such conflicting results comes from studies that have compared the
effects of the Ebbinghaus illusion on action and perception. In this illusion, a circle surrounded by
an annulus of smaller circles appears to be larger than the same circle surrounded by an annulus
of larger circles (see Figure 33.1A). It is thought that the illusion arises because of an obligatory
comparison between the size of the central circle and the size of the surrounding circles, with
one circle looking relatively smaller than the other (Coren and Girgus, 1978). It is also possible
that the central circle within the annulus of smaller circles will be perceived as more distant (and
therefore larger) than the circle of equivalent retinal-image size within the array of larger circles.
In other words, the illusion may be simply a consequence of the perceptual system’s attempt to
make size-constancy judgments on the basis of an analysis of the entire visual array (Gregory,
1963). In addition, the distance between the surrounding circles and the central circle may also
play a role; if the surrounding circles are close to the central circle, then the central circle appears
larger, but if they are further away, the central circle appears smaller (Roberts et al., 2005). In many

[Panels (a)–(d) of Figure 33.1 appear here; panel (d) plots grip aperture (mm) over the first 1.0 s of the movement for the large and small disks.]
Fig. 33.1  The effect of a size-contrast illusion on perception and action. (a) The traditional
Ebbinghaus illusion in which the central circle in the annulus of larger circles is typically seen as
smaller than the central circle in the annulus of smaller circles, even though both central circles are
actually the same size. (b) The same display, except that the central circle in the annulus of larger
circles has been made slightly larger. As a consequence, the two central circles now appear to be the
same size. (c) A 3D version of the Ebbinghaus illusion. Participants are instructed to pick up one of
the two 3D disks placed either on the display shown in panel A or the display shown in panel B.
(d) Two trials with the display shown in panel B, in which the participant picked up the small disk on
one trial and the large disk on another. Even though the two central disks were perceived as being
the same size, the grip aperture in flight reflected the real not the apparent size of the disks.
Reprinted from Current Biology, 5(6), Salvatore Aglioti, Joseph F.X. DeSouza, and Melvyn A. Goodale, Size-
contrast illusions deceive the eye but not the hand, pp. 679–85, Copyright (1995), with permission from Elsevier.
experiments, the size of the surrounding circles and the distance between them and the central
circle are confounded. But whatever the critical factors might be in any particular Ebbinghaus
display, it is clear that the apparent size of the central circle is influenced by the context in which it is
embedded. These contextual effects are remarkably resistant to cognitive information about the
real size of the circles. Thus, even when people are told that the two circles are identical in size
(and this fact is demonstrated to them), they continue to experience a robust illusion of size.
The first demonstration that grasping might be refractory to the Ebbinghaus illusion was car-
ried out by Aglioti et al. (1995). These investigators constructed a 3-D version of the Ebbinghaus
illusion, in which a poker-chip type disk was placed in the centre of a 2-D annulus made up of
either smaller or larger circles (Figure 33.1C). Two versions of the Ebbinghaus display were used.
In one case, the two central disks were physically identical in size, but one appeared to be larger
than the other (Figure 33.1A). In the second case, the size of one of the disks was adjusted so that
the two disks were now perceptually identical, but had different physical sizes (Figure 33.1B).
Despite the fact that the participants in this experiment experienced a powerful illusion of size,
their anticipatory grip aperture was unaffected by the illusion when they reached out to pick up
each of the central disks. In other words, even though their perceptual estimates of the size of
the target disk were affected by the presence of the surrounding annulus, maximum grip aper-
ture between the index finger and thumb of the grasping hand, which was reached about 70% of
the way through the movement, was scaled to the real not the apparent size of the central disk
(Figure 33.1D).
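The grip-scaling measure used in these studies can be made concrete: grip aperture is simply the distance between thumb and index-finger markers at each motion-capture sample, and its maximum over the reach is the quantity compared with perceptual reports. A minimal sketch, using a synthetic trajectory (all numbers below are illustrative, not data from the study):

```python
import numpy as np

def max_grip_aperture(thumb_xyz, index_xyz):
    """Return the peak thumb-index distance over a reach and the sample
    at which it occurs. Peak (maximum) grip aperture is the standard
    measure of grip scaling in the grasping studies discussed here."""
    apertures = np.linalg.norm(np.asarray(index_xyz) - np.asarray(thumb_xyz), axis=1)
    return float(apertures.max()), int(apertures.argmax())

# Synthetic, purely illustrative trajectory sampled at 100 Hz for 1 s:
# the hand opens, peaks about two-thirds of the way through, then closes.
t = np.linspace(0.0, 1.0, 101)
opening_mm = 20 + 400 * t**2 * (1 - t)      # illustrative aperture profile
thumb = np.zeros((101, 3))                  # thumb marker fixed at the origin
index = np.column_stack([opening_mm, np.zeros(101), np.zeros(101)])

mga, sample = max_grip_aperture(thumb, index)
print(f"peak aperture {mga:.1f} mm at {sample}% of the movement")
```

In this toy profile the peak falls at about two-thirds of the movement, in rough agreement with the 70% figure reported above; real trajectories are, of course, noisier and three-dimensional.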
The findings of Aglioti et al. (1995) have been replicated in a number of other studies (for
a review, see Carey, 2001; Goodale, 2011). Nevertheless, other studies using the Ebbinghaus
illusion have failed to replicate these findings. Franz et al. (2000a,b, 2001), for example, used
a modified version of the illusion and found similar (and significant) illusory effects on both
vision-for-action and vision-for-perception, arguing that the two systems are not dissociable
from one another, at least in healthy participants. These authors argued that the difference
between their findings and those of Aglioti et  al. resulted from different task demands. In
particular, in the Aglioti study (as well as in a number of other studies showing that visuo-
motor control is resistant to visual illusions), subjects were asked to attend to both central
disks in the illusory display in the perceptual task, but to grasp only one object at a time in
the action task. Franz and colleagues argued that this difference in attention in the perceptual
and action tasks could have accounted for the pattern of results in the Aglioti et al. study. In
the experiments by Franz and colleagues, participants were presented with only a single disk
surrounded by an annulus of either smaller or larger circles. Under these conditions, Franz
and colleagues found that both grip aperture and perceptual reports were affected by the
presence of the surrounding annulus. The force of this demonstration, however, was undercut
by experiments by Haffenden and Goodale (1998), who asked participants either to
estimate the size of one of the central disks manually by opening their finger and thumb a
matching amount or to pick it up. Even though in both cases, participants were arguably
directing their attention to only one of the disks, there was a clear difference in the effect of
the illusion: the manual estimates, but not the grasping movements were affected by the size
of the circles in the surrounding annulus.
Franz (2003) later argued that the slope of the function describing the relationship between
manual estimates and the real size of the target object was far steeper than that of more ‘conventional’
psychophysical measures and that, when one adjusted for the difference in slope, both action and
perception were affected to the same degree by the Ebbinghaus and by other illusions. Although
this explanation, at least on the face of it, is a compelling one, it cannot explain why Aglioti et al.
(1995) and Haffenden and Goodale (1998) found that when the relative sizes of the two target
objects in the Ebbinghaus display were adjusted so that they appeared to be perceptually identical,
the grip aperture that participants used to pick up the two targets continued to reflect the physical
difference in their size. Nor can it explain the findings of a recent study by Stöttinger and
colleagues (2012), who showed that even when slopes were adjusted, manual estimates of object size
were much more affected by the illusion (in this case, the Diagonal illusion) than were grasping
movements.
Recently, several studies have suggested that online visual feedback during grasping could be a
relevant factor accounting for some of the conflicting results in the domain of visual illusions and
grasping. For example, Bruno and Franz (2009) have performed a meta-analysis of studies that
looked at the effects of the Müller–Lyer illusion on perception and action, and concluded that the
dissociation between the effects of this illusion on grasping and perception is most pronounced
when online visual feedback is available. According to this account, feedback from the fingers
and the target object during the grasp can be effectively used by the visuomotor system to counteract
the effect of visual illusions on grip aperture. Further support for this proposal comes from stud-
ies that showed that visual illusions, such as the Ebbinghaus illusion, affect grasping trajectories
only during initial stages of the movement, but not in later stages, in which visual feedback can
be effectively used to allow the visuomotor system to compensate for the effects of the illusory
context (Glover and Dixon, 2002). However, other studies that manipulated the availability of visual
feedback during the grasp failed to find an effect of visual feedback on grasping performance in the
context of visual illusions (Ganel et al., 2008a; Westwood and Goodale, 2003).
The majority of studies that have claimed that action escapes the effects of pictorial illusions
have demonstrated this by finding a null effect of the illusory context on grasping movements. In
other words, they have found that perception (by definition) was affected by the illusion, but peak
grip aperture of the grasping movement was not. Null effects like this are never as compelling as
double dissociations between action and perception.
As it turns out, a more recent study has, in fact, demonstrated a double dissociation between
perception and action. Ganel and colleagues (2008a) used the well-known Ponzo illusion in
which the perceived size of an object is affected by its location within pictorial depth cues.
Objects located at the diverging end of the display appear to be smaller than those located at
the converging end. To dissociate the effects of real size from those of illusory size, Ganel and col-
leagues manipulated the real sizes of two objects that were embedded in a Ponzo display so that
the object that was perceived as larger was actually the smaller one of the pair (see Figure 33.2A).
When participants were asked to make a perceptual judgment of the size of the objects, their per-
ceptual estimates reflected the illusory Ponzo effect. In contrast, when they picked up the objects,
the aperture between the finger and thumb of their grasping hand was tuned to their actual size.
In short, the difference in their perceptual estimates of size for the two objects, which reflected the
apparent difference in the size, went in the opposite direction from the difference in their peak
grip aperture, which reflected the real difference in size (Figure 33.2B). This double dissociation
between the effects of apparent and real size differences on perception and action respectively
cannot be explained away by appealing to differences in attention or differences in slope (Franz et
al., 2001; Franz et al., 2000a,b; Franz, 2003).
In a series of experiments that used both the Ebbinghaus and the Ponzo illusions, Gonzalez
and her colleagues provided a deeper understanding of the conditions under which grasping can
escape the effects of visual illusions (Gonzalez et al., 2006). They argued that many of the earlier
studies showing that actions are sensitive to the effects of pictorial illusions required participants
to perform movements requiring different degrees of skill under different degrees of deliberate
control and with different degrees of practice. If one accepts the idea that high-level conscious
processing of visual information is mediated by the ventral stream (Milner and Goodale, 2006),
[Panels (a) and (b) of Figure 33.2 appear here; panel (b) plots the distance between the fingers (mm) for the long and the short object under grasping (maximum grip aperture) and perceptual-estimation conditions.]
Fig. 33.2  The effect of the Ponzo illusion on grasping and manual estimates. (a) Two objects embedded
in the Ponzo illusion used in Ganel et al.’s (2008a) study. Although the right object is perceived as larger,
it is actually smaller in size. (b) Maximum grip apertures and perceptual estimation data show that the
fingers’ aperture was not affected by the perceived sizes but rather was tuned to the actual sizes of the objects.
Perceptual estimations, on the other hand, were affected by the Ponzo illusory context.
Reproduced from Tzvi Ganel, Michal Tanzer, and Melvyn A. Goodale, Psychological Science, 19(3), A Double
Dissociation Between Action and Perception in the Context of Visual Illusions: Opposite Effects of Real and Illusory
Size, pp. 221–225, doi:10.1111/j.1467-9280.2008.02071.x, Copyright © 2008 by SAGE Publications. Reprinted
by Permission of SAGE Publications.

then it is perhaps not surprising that the less skilled, less practiced, and thus, more deliberate an
action, the greater the chances that the control of this action would be affected by ventral stream
perceptual mechanisms. Gonzalez et al. (2006) provided support for this conjecture by demon-
strating that awkward, unpracticed grasping movements, in contrast to familiar precision grips,
were sensitive to the Ponzo and Ebbinghaus illusions. In a follow-up experiment, they showed that
the effects of these illusions on initially awkward grasps diminished with practice (Gonzalez et al.,
2008). Interestingly, similar effects of practice were not obtained for right-handed subjects grasp-
ing with their left hand. Even more intriguing is the finding that grasping with the left hand, even
for many left-handed participants, was affected to a larger degree by pictorial illusions compared
with grasping with the right hand (Gonzalez et al., 2006). Gonzalez and colleagues have interpreted
these results as suggesting that the dorsal-stream mechanisms that mediate visuomotor control
may have evolved preferentially in the left hemisphere, which primarily controls right-handed
grasping. Additional support for this latter idea comes from work with patients with optic
ataxia from unilateral lesions of the dorsal stream (Perenin and Vighetto, 1988). Patients with
left-hemisphere lesions typically show what is often called a ‘hand effect’ – they exhibit a deficit
in their ability to visually direct reaching and grasping movements to targets situated in both the
contralesional and the ipsilesional visual field. In contrast, patients with right-hemisphere lesions
are impaired only when they reach out to grasp objects in the contralesional field.
Although the debate over whether or not action escapes the effects of perceptual illusions is far
from being resolved (for recent findings, see Foster et al., 2012; Heed et al., 2011; van der Kamp
et al., 2012), the focus on this issue has directed attention away from the more general question
of the nature of the computations underlying visuomotor control in more natural situations. One
example of an issue that has received only minimal attention from researchers is the role of infor-
mation about object shape on visuomotor control (but see Cuijpers et al., 2004, 2006; Goodale
et al., 1994b; Lee et al., 2008) – and how that information might differ in its organization from
conventional perceptual accounts of shape processing.
Studies of configural processing of shape


The idea that vision treats the shape of an object in a holistic manner has been a basic theme run-
ning through theoretical accounts of perception from early Gestalt psychology (Koffka, 1935) to
more contemporary cognitive neuroscience (e.g. Duncan, 1984; O’Craven et al., 1999). Encoding
an object holistically permits a representation of the object that preserves the relationship between
object parts and other objects in the visual array without requiring precise information about the
absolute size of each of the object’s dimensions (see Behrmann et al., 2013; Pomerantz and Cragin,
2013). In fact, as discussed earlier, calculating the exact size, distance, and orientation of every
aspect of every object in a visual scene carries a huge computational load. Holistic (or configural)
processing is much more efficient for constructing perceptual representations of objects. When
we interact with an object, however, it is imperative that the visual processes controlling the action
take into account the absolute metrics of the most relevant dimension of the object without being
influenced by other dimensions or features. In other words, rather than being holistic, the visual
processing mediating action should be analytical.
Empirical support for the idea that the visual control of action is analytical, rather than configural, comes from experiments using a variant of the Garner speeded classification task (Ganel
and Goodale, 2003). In these experiments, participants were required to either make perceptual
judgments of the width of rectangles or to grasp them across their width, while in both cases try-
ing to ignore the length. As expected, participants could not ignore the length of a rectangle when
making judgments of its width. Thus, when the length of a rectangle was varied randomly from
trial to trial, participants took longer to discriminate a wide rectangle from a narrow one than
when the length did not change. In sharp contrast, however, participants appeared to completely
ignore the length of an object when grasping it across its width. Thus, participants took no longer
to initiate (or to complete) their grasping movement when the length of the object varied than
when its length did not change. These findings show that the holistic processing that character-
izes perceptual processing does not apply to the visual control of skilled actions such as grasping.
Instead, the visuomotor mechanisms underlying this behavior deal with the basic dimensions
of objects as independent features. This finding of a dissociation between holistic and analytical
processing for perception and action, respectively, using Garner’s paradigm has been replicated by
several other different studies (Janczyk and Kunde, 2012; Kunde et al., 2007) and, more recently,
has been reported in young children (Schum et al., 2012).
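The logic of this design can be sketched in a few lines; the stimulus values, reaction times, and function names below are illustrative inventions, not data or code from Ganel and Goodale (2003).

```python
import random

def garner_blocks(widths, lengths, trials, seed=0):
    """Baseline block: the irrelevant dimension (length) is held constant.
    Filtering block: length varies randomly from trial to trial."""
    rng = random.Random(seed)
    baseline = [(rng.choice(widths), lengths[0]) for _ in range(trials)]
    filtering = [(rng.choice(widths), rng.choice(lengths)) for _ in range(trials)]
    return baseline, filtering

def garner_interference(baseline_rts, filtering_rts):
    """Mean reaction-time cost of the randomly varying irrelevant dimension.
    A positive cost signals holistic processing; ~0 signals analytic processing."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(filtering_rts) - mean(baseline_rts)

# Fabricated RTs (ms) mimicking the pattern described in the text:
perceptual_cost = garner_interference([420, 430, 425], [470, 480, 475])  # 50.0 ms
grasping_cost = garner_interference([310, 320, 315], [312, 318, 315])    # 0.0 ms
```

On this logic, a reliably positive interference score for width judgments, alongside a null score for grasping latencies, is what licenses the conclusion that perception is configural while visuomotor control is analytic.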
Beyond revealing configural processing, subjects' inability to ignore information about an irrelevant dimension when estimating the size of a relevant dimension often leads to a directional distortion in their size perception. In particular, because a rectangle's width is always perceived relative to its length, longer rectangles will always be perceived as narrower, even in cases in
which their actual width is kept constant (see Figure 33.3). This type of illusion, in which the per-
ceived element is affected by irrelevant dimensions belonging to the same object, has been termed
a within-object illusion (see Ben-Shalom and Ganel, 2012). Interestingly, it has been recently
argued that within-object illusions and between-objects illusions (discussed in the previous sec-
tion) rely on different cognitive mechanisms; for example, it has been shown that representations
in iconic memory are affected by the latter type of illusion, but not by within-object illusions.
More relevant to the present discussion, it has been shown that within-object illusions, like
between-object illusions, do not affect visuomotor control. That is, unlike perceptual estimations
of a rectangle's width, which are affected by its length, the aperture of the fingers when grasping the
rectangle across its width was shown to be unaffected by length (Ganel and Goodale, 2003). Taken
together, all these findings point to the same conclusion: unlike visual perception, which is always
Different Modes of Visual Organization for Perception and for Action 679

Fig. 33.3  An example of a within-object illusion of shape. Although the two rectangles have an
equal width, the shorter rectangle is perceived as wider than the taller rectangle (see Ganel and
Goodale, 2003; Ben-Shalom and Ganel, 2012).

affected by relative frames of reference, the visual control of action is more analytical and is there-
fore immune to the effects of both within-object and between-objects pictorial illusions.
Recent work also suggests that there are fundamental differences in scene segmentation for
perception and action planning. It is well established that our perceptual system parses complex scenes into discrete objects, but what is less well known is that parsing is also required for planning
visually-guided movements, particularly when more than one potential target is present. In a
recent study, Milne et al. (2013) explored whether perception and motor planning use the same
or different parsing strategies, and whether perception is more sensitive to contextual effects than
is motor planning. To do this, they used the ‘connectedness illusion’, in which observers typically
report seeing fewer targets if pairs of targets are connected by short lines (Franconeri et al., 2009;
He et al., 2009; see Figure 33.4).
Milne et al. (2013) tested participants in a rapid reaching paradigm they had developed that
requires subjects to initiate speeded arm movements toward multiple potential targets before one
of the targets is cued for action (Chapman et al., 2010). In their earlier work, they had shown that
when there were an equal number of targets on each side of a display, participants aimed their ini-
tial trajectories toward a midpoint between the two target locations. Furthermore, when the dis-
tribution of targets on each side of a display was not equal (but each potential target had an equal
probability of becoming the goal target), initial trajectories were biased toward the side of the
display that contained a greater number of targets. They argued that this behavior maximizes the
chances of success on the task because movements are directed toward the most probable location
of the eventual goal, thereby minimizing the ‘cost’ of correcting the movement in-flight. Because
it provides a behavioral ‘read-out’ of rapid comparisons of target numerosity for motor planning,
the paradigm is an ideal way to measure object segmentation in action in the context of the con-
nectedness illusion. When participants were asked to make speeded reaches towards the targets
where sometimes the targets were connected by lines, their reaches were completely unaffected by
the presence of the connecting lines. Instead, their movement plans, as revealed by their move-
ment trajectories, were influenced only by the difference in the number of targets present on each
side of the display, irrespective of whether connecting lines were there or not. Not unexpectedly,

Fig. 33.4  There appear to be fewer circles on the right than on the left, even though in both cases
there are 22 individual circles. Connecting the circles with short lines creates the illusion of fewer
circles. Even so, when our brain plans actions to these targets it computes the actual number of
targets. In the task used by Milne et al. (2013) far fewer circles were used, but the effect was still
present in perceptual judgments but not in the biasing of rapid reaching movements. In the action
task, it was the actual not the apparent number of circles that affected performance.
Reproduced from Jennifer L. Milne, Craig S. Chapman, Jason P. Gallivan, Daniel K. Wood, Jody C. Culham, and
Melvyn A. Goodale, Psychological Science, 24(8), Connecting the Dots: Object Connectedness Deceives Perception
but Not Movement Planning, pp. 1456–1465, doi:10.1177/0956797612473485, Copyright © 2013 by SAGE
Publications. Reprinted by Permission of SAGE Publications.

however, when they were asked to report whether there were fewer targets present on one side
compared with the other, their reports were biased by the connecting lines between the targets.
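The planning principle described above, launching toward the centre of the potential target locations when each is equally likely to become the goal, can be sketched as follows; the coordinates and function name are our own illustrative choices, not the actual analysis of Chapman et al. (2010).

```python
def initial_aim(targets):
    """Aim point for a speeded reach launched before the goal is cued:
    the mean of all potential target positions, since each target is
    equally likely to become the goal."""
    n = len(targets)
    return (sum(x for x, _ in targets) / n, sum(y for _, y in targets) / n)

# Equal split (two targets per side): launch straight up the middle.
balanced = initial_aim([(-2.0, 10.0), (-1.0, 10.0), (1.0, 10.0), (2.0, 10.0)])  # (0.0, 10.0)

# Unequal split (three left, one right): the launch is biased leftward,
# minimising the expected cost of any in-flight correction.
biased = initial_aim([(-3.0, 10.0), (-2.0, 10.0), (-1.0, 10.0), (2.0, 10.0)])  # (-1.0, 10.0)
```

Note that connecting pairs of targets with lines changes none of these numbers, which is why the initial-trajectory measure is read as tracking actual rather than apparent numerosity.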
The work by Milne et al. (2013) suggests that scene segmentation for perception depends
on mechanisms that are distinct from those that allow humans to plan rapid and efficient
target-directed movements in situations where there are multiple potential targets. While the per-
ception of object numerosity can be dramatically influenced by manipulations of object grouping,
such as the connectedness illusion, the visuomotor system is able to ignore such manipulations,
and to parse individual objects and accurately plan, execute, and control rapid reaching move-
ments to multiple goals. These results are especially compelling considering that initial goal selec-
tion is undoubtedly based on a perceptual representation of the goal (for a discussion of this issue,
see Milner and Goodale, 2006). The planning of the final movement, however, is able to effectively
by-pass the contextual biases of perception, particularly in situations where rapid planning and
execution of the movement is paramount.

Studies of object size resolution


The 19th century German physician and scientist, Ernst Heinrich Weber, is usually credited
with the observation that our sensitivity to changes in any physical property or dimension of an
object or sensory stimulus decreases as magnitude of that property or dimension increases. For
example, if a bag of sugar weighs only 50 g, then we will notice a change in weight if only a few
grams of sugar are added or taken away. However, if the bag weighs 500 g, much more sugar must
be added or taken away before we notice the difference. Typically, if the weight of something is
doubled, then the smallest difference in weight that can be perceived is also doubled. Similar,
but not identical functions have been demonstrated for the loudness of sounds, the brightness
of visual stimuli, and a broad range of other sensory experiences. Imagine, for example, that you
are riding on an express train on your way to an important meeting. As the train accelerates from
220 to 250 km an hour, you might scarcely notice the change in velocity, even though the same
change in velocity was easily noticed as the train left the station earlier and began to accelerate.

In short, the magnitude of the ‘just-noticeable difference’ (JND) increases with the magnitude
or intensity of the stimulus. The German physicist-turned-philosopher Gustav Fechner later for-
malized this basic psychophysical principle mathematically and called it Weber’s Law.
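The regularity Weber observed, and Fechner formalized, can be stated compactly; the Weber fraction used in the worked figures below, k = 0.04, is an illustrative value, not one reported in the text.

```latex
\frac{\Delta I}{I} = k
\qquad\Longrightarrow\qquad
\Delta I = k\,I
% Illustrative: with k = 0.04,
%   I = 50\,\mathrm{g}  \Rightarrow \Delta I = 2\,\mathrm{g}
%   I = 500\,\mathrm{g} \Rightarrow \Delta I = 20\,\mathrm{g}
```

Here ΔI is the just-noticeable difference, I is the stimulus magnitude, and k is the (sense-specific) Weber fraction; doubling I doubles ΔI, as in the sugar-bag example.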
Weber’s law is one of the most fundamental features of human perception. It is not clear, how-
ever, if the visual control of action is subject to the same universal psychophysical function. To
investigate this possibility, Ganel and colleagues (Ganel et al., 2008b) carried out a series of psy-
chophysical and visuomotor experiments in which participants were asked either to grasp or to
make perceptual estimations of the length of rectangular objects. The JNDs were defined in this
study by using the standard deviation of the mean grip aperture and the standard deviation of the
mean perceptual judgment for a given stimulus. This is akin to the classical Method of Adjustment
in which the amount of variation in the responses for a given size of a stimulus reflects an ‘area of
uncertainty’ in which participants are not sensitive to fluctuations in size. Not surprisingly, Ganel
and colleagues found that the JNDs for the perceptual estimations of the object’s length showed
a linear increase with length, as Weber’s law would predict. The JNDs for grip aperture, however,
showed no such increase with object length and remained constant as the length of the object
increased (see Figure 33.5). In other words, the standard deviation for grip aperture remained
the same despite increases in the length of the object. Simply put, visually guided actions appear
to violate Weber's law, reflecting a fundamental difference in the way that object size is computed
for action and for perception (Ganel et al., 2008a,b). This fundamental difference in the psycho-
physics of perception and action has been found to emerge in children as young as 5 years of age
(Hadad et al., 2012, see Figure 33.6).
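The JND-as-variability logic of these studies can be sketched as follows; the response arrays are fabricated for illustration, and the helper functions are ours, not Ganel et al.'s (2008b) analysis code.

```python
import statistics

def jnd(responses):
    """JND proxy: the standard deviation of repeated responses
    (grip apertures or manual estimates) to a single object size."""
    return statistics.stdev(responses)

def slope(sizes, jnds):
    """Ordinary least-squares slope of JND against object size.
    Weber's law predicts a positive slope."""
    n = len(sizes)
    mx, my = sum(sizes) / n, sum(jnds) / n
    num = sum((x - mx) * (y - my) for x, y in zip(sizes, jnds))
    return num / sum((x - mx) ** 2 for x in sizes)

# Fabricated estimates (mm) whose spread grows with size (perception)
# or stays flat (grasping), mirroring the pattern in Figure 33.5:
perceptual = {20: [19, 21, 20, 22, 18], 40: [37, 43, 40, 44, 36], 60: [54, 66, 60, 63, 57]}
grasping = {20: [21, 23, 22, 24, 20], 40: [41, 43, 42, 44, 40], 60: [61, 63, 62, 64, 60]}

sizes = sorted(perceptual)
p_slope = slope(sizes, [jnd(perceptual[s]) for s in sizes])  # positive: obeys Weber's law
g_slope = slope(sizes, [jnd(grasping[s]) for s in sizes])    # ~0: violates Weber's law
```

A positive slope for the perceptual condition together with a flat grasping function is the signature dissociation the studies report.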

[Figure 33.5: two panels, 'Grasping' (left) and 'Perceptual adjustments' (right), each plotting JND (mm) against object size (mm).]
Fig. 33.5  Effects of object size on visual resolution (Just Noticeable Difference: JND). (Left panel) The
effect of object size on JNDs for Maximum Grip Apertures (MGAs) during grasping. (Right panel)
The effect of object size on JNDs during perceptual estimations. Note that JNDs for the perceptual
condition increased linearly with length, following Weber’s law, whereas the JNDs for grasping were
unaffected by size.
Adapted from Current Biology, 18(14), Tzvi Ganel, Eran Chajut, and Daniel Algom, Visual coding for action
violates fundamental psychophysical principles, pp. R599–R601, Copyright (2008), with permission from Elsevier.

[Figure 33.6: panels (a) and (b) plotting JND (mm) against disk size (mm), with separate series for adults, ages 5–6, and ages 7–8.]
Fig. 33.6  JNDs for perceptual estimations (a) and for grasping (b) in different age groups. In all age
groups, JNDs for perceptual condition increased with object size, following Weber’s law. Importantly,
however, the JNDs for grasping in all groups were unaffected by changes in the size of the target.
Reproduced from Functional dissociation between perception and action is evident early in life, Bat-Sheva
Hadad, Galia Avidan, and Tzvi Ganel, Developmental Science, 15(5), pp. 653–658, DOI: 10.1111/j.1467-
7687.2012.01165.x Copyright © 2012, Blackwell Publishing Ltd.

This difference in the psychophysics of perception and action can be observed in other contexts
as well. In a recent study (Ganel et al., 2012), for example, participants were asked to grasp or to
make perceptual comparisons between pairs of circular disks. Importantly, the actual difference in
size between the members of the pairs was set below the perceptual JND. Again, a dissociation was
observed between perceptual judgments of the size and the kinematic measures of the aperture of
the grasping hand. Regardless of whether or not participants were accurate in their judgments
of the difference in size between the two disks, the maximum opening between the thumb and
forefinger of their grasping hand in flight reflected the actual difference in size between the two
disks (see Figure 33.7). These findings provide additional evidence for the idea that the computa-
tions underlying the perception of objects are different from those underlying the visual control
of action. They also suggest that people can show differences in the tuning of grasping movements
directed to objects of different sizes even when they are not conscious of those differences in size.
The demonstrations showing that the visual control of grasping does not obey Weber’s law
resonates with Milner and Goodale’s (2006) proposal that there is a fundamental difference in the
frames of reference and metrics used by vision-for-perception and vision-for-action (Ganel et al., 2008b). These findings also converge with the results of imaging studies that suggest that the ventral
and the dorsal streams represent objects in different ways (James et al., 2002; Konen and Kastner,
2008; Lehky and Sereno, 2007). Yet, the interpretation of these results has not gone unchallenged
(Heath et al., 2011, 2012; Holmes et al., 2011; Smeets and Brenner, 2008). For example, in a series
of papers, Heath and his colleagues (Heath et al., 2011, 2012; Holmes et al., 2011) have exam-
ined the effects of Weber’s law on grip aperture throughout the entire movement trajectory and
found an apparent adherence to Weber’s law early, but not later in the trajectory of the movement.
A recent paper by Foster and Franz (2013), however, has suggested that these effects are confounded by movement velocity. In particular, due to task demands that require subjects to hold
their finger and thumb together prior to each grasp, subjects tend to open their fingers faster for
larger compared with smaller objects, a feature that characterizes only early stages of the grasping

[Figure 33.7: panel (a) shows the set-up; panel (b) plots maximum grip aperture (mm) for the smaller and larger disks, with separate series for correct and incorrect trials.]
Fig. 33.7  Grasping objects that are perceptually indistinguishable. (a) The set-up with examples of
the stimuli that were used. Participants were asked on each trial to report which object of the two
was the larger and then to grasp the object in each pair that was in the centre of the table (task
order was counterbalanced between subjects). (b) MGAs for correct and for incorrect perceptual
size classifications. MGAs reflected the real size differences between the two objects even in trials in
which subjects erroneously judged the larger object in the pair as the smaller one.
Reproduced from Tzvi Ganel, Erez Freud, Eran Chajut, and Daniel Algom, Accurate Visuomotor Control below
the Perceptual Threshold of Size Discrimination, PLoS One, 7(4), e36253, Figures 1 and 2 DOI: 0.1371/journal.
pone.0036253 Copyright © 2012, The Authors. This work is licensed under a Creative Commons Attribution 3.0
License.

trajectory. Therefore, the increased grip variability for larger compared with smaller objects dur-
ing the early portion of the trajectories could be attributed to velocity differences in the opening
of the fingers rather than to the effects of Weber’s law.
In their commentary on Ganel et al.'s (2008b) paper, Smeets and Brenner (2008) argue that
the results can be more efficiently accommodated by a ‘double-pointing’ account of grasping.
According to this model, the movements of each finger of a grasping hand are controlled indepen-
dently, each digit being simultaneously directed to a different location on the goal object (Smeets
and Brenner, 1999, 2001). Thus, when people reach out to pick up an object with a precision grip,
for example, the index finger is directed to one side of the object and the thumb to the other. No
computation of object size is required, only the computation of two separate locations on the
object, one for the finger and the other for the thumb. The apparent scaling of the grip to object size
is nothing more than a by-product of the fact that the index finger and thumb are moving towards
their respective end points. Smeets and Brenner go on to argue that because size is not computed
for grasping, and only location matters, Weber’s law would not apply. In other words, because
location, unlike size, is a discrete, rather than a continuous dimension, Weber’s law is irrelevant
for grasping. Smeets and Brenner’s account also comfortably explains why grasping escapes the
effects of pictorial illusions, such as the Ebbinghaus and Ponzo illusions. In fact, more generally,
their double-pointing or position-based account of grasping would appear to offer a more parsi-
monious account of a broad range of apparent dissociations between vision-for-perception and
vision-for-action than appealing to a two-visual-systems model.
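A toy rendering of the double-pointing idea makes the contrast concrete; the straight-line trajectories and coordinates are our simplifying assumptions for illustration, not Smeets and Brenner's actual model.

```python
def digit_path(start, end, steps=5):
    """One digit's trajectory, controlled independently of the other:
    a straight line from its start to its own contact point."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * t / steps, y0 + (y1 - y0) * t / steps)
            for t in range(steps + 1)]

# Finger and thumb each aim at a contact point on opposite sides of a
# 4-mm-wide object; no size variable appears anywhere in the controller.
finger = digit_path((0.0, 0.0), (10.0, 2.0))
thumb = digit_path((0.0, 0.0), (10.0, -2.0))

# Yet the "grip aperture" (inter-digit distance) ends up matched to the
# object's width purely as a by-product of the two endpoints.
aperture = [abs(fy - ty) for (_, fy), (_, ty) in zip(finger, thumb)]  # final value: 4.0
```

On this account the apparent grip scaling needs no size computation at all, which is exactly why the perturbation and patient evidence discussed next is taken to cut against it.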
Although Smeets and Brenner’s (1999, 2001) interpretation is appealing, there are several lines
of evidence showing that the fingers' trajectories during grasping are tuned to object size, rather than

location. For example, van de Kamp and Zaal (2007) have shown that when one side of a target
object, but not the other is suddenly pushed in or out (with a hidden compressed-air device)
as people are reaching out to grasp it, the trajectories of both digits are adjusted in flight. In
other words, the trajectories of both the finger and the thumb change to reflect the change in
size of the target object. Smeets and Brenner’s model would not predict this. According to their
double-pointing hypothesis, only the digit going to the perturbed side of the goal object should
change course. The fact that the trajectories of both digits show an adjustment is entirely consist-
ent with the idea that the visuomotor system is computing the size of the target object. In other
words, as the object changes size, so does the grip.
Another line of evidence that goes against Smeets and Brenner’s double-pointing hypoth-
esis comes from the neuropsychological literature. Damage to the ventral stream in the human
occipitotemporal cortex can result in visual form agnosia, a deficit in visual object recognition.
The best-documented example of such a case is patient DF, who has bilateral lesions to the lateral
occipital area rendering her unable to recognize or discriminate between even simple geometric
shapes such as a rectangle and a square. Despite her profound deficit in form perception, she is
able to scale her grasp to the dimensions of the very objects she cannot describe or recognize,
presumably using visuomotor mechanisms in her dorsal stream. As is often the case for neurological patients, DF is able to (partially) compensate for her deficits by relying on non-natural strategies based on her residual intact abilities. Schenk and Milner (2006), for example, found
that, under certain circumstances, DF could use her intact visuomotor skills to compensate for
her marked impairment in shape recognition. When DF was asked to make simple shape clas-
sifications (rectangle/square classifications), her performance was at chance. Yet, her shape clas-
sifications markedly improved when performed concurrently with grasping movements toward
the target objects she was being asked to discriminate. Interestingly, this improvement appeared not to depend on afferent feedback from the grasping fingers, because it was found even when DF was merely planning her actions, just before the fingers actually started to move. Schenk
and Milner therefore concluded that information about an object’s dimensions is available at
some level via visuomotor activity in DF’s intact dorsal stream and this, in turn, improves her
shape-discrimination performance. For this to happen, the dorsal-stream mechanisms would
have to be computing the relevant dimension of the object to be grasped and not simply the
locations on that object to which the finger and thumb are being directed (for similar evidence
in healthy individuals, see Linnell et al., 2005). Again, these findings are clearly not in line with
Smeets and Brenner’s double-pointing hypothesis and suggest that the dorsal stream uses infor-
mation about object size (more particularly, the relevant dimension of the target object) when
engaged in visuomotor control. Parenthetically, it is interesting to note that the results of one of
the experiments in the Schenk and Milner study also provide indirect evidence that grip aper-
ture is not affected by the irrelevant dimension of the object to be grasped (Ganel and Goodale,
2003). When DF was asked to grasp objects across a dimension that was not informative of shape
(i.e., grasp across rectangles of constant width that varied in length), no grasping-induced per-
ceptual improvements in distinguishing between the different rectangles were found. This find-
ing not only shows that shape per se was not being used in the earlier tasks where she did show
some enhancement in her ability to discriminate between objects of different widths, but it also
provides additional evidence for the idea that visuomotor control is carried out in an analytical
manner (e.g. concentrating entirely on object width) without being influenced by differences in
the configural aspects of the objects.
As mentioned at the beginning of the chapter, Milner and Goodale (2006) have argued
that visuomotor mechanisms in the dorsal stream tend to operate in real time. If the target

object is no longer visible when the imperative to begin the movement is given, then any
object-directed action would have to be based on a memory of the target object, a memory
that is necessarily dependent on earlier processing by perceptual mechanisms in the ventral
stream. Thus, DF is unable to scale her grasp for objects that she saw only seconds earlier,
presumably because of the damage to her ventral stream (Goodale et al., 1994a). Similarly,
when neurologically intact participants are asked to base their grasping on memory repre-
sentations of the target object, rather than on direct vision, the kinematics of their grasping
movements are affected by Weber's law and by pictorial illusions (Ganel et al., 2008b; for
review, see Goodale, 2011). Again, without significant modification, Smeets and Brenner’s
double-pointing model does not provide a parsimonious account for why memory-based
action control should be affected by size, whereas real-time actions should not. However,
as we have already seen, according to the two-visual-systems account, when vision is not
allowed and memory-based actions are performed, such actions have to rely on earlier per-
ceptual processing of the visual scene, processing that in principle is subject to Weber’s law
and pictorial illusions of size.

Conclusions
The visual control of skilled actions, unlike visual perception, operates in real time and reflects
the metrics of the real world. This means that many actions, such as reaching and grasping, are
immune to the effects of a range of pictorial illusions, which by definition affect perceptual judg-
ments. Only when the actions are deliberate and cognitively ‘supervised’ or are initiated after the
target is no longer in view do the effects of illusions emerge. All of this suggests that our perceptual
representations of objects are organized in a fundamentally different way from the visual informa-
tion underlying the control of skilled actions directed at those objects. As we have seen, the visual
perception of objects and their relations tends to be holistic and contextual with relative poor
real-world metrics, whereas the visual control of skilled actions is more analytical, circumscribed,
and metrically accurate. Of course, in everyday life, vision-for-perception and vision-for-action
work together in the production of purposive behavior: vision-for-perception, together with other cognitive systems, selects the goal object from the visual array, while vision-for-action, working with associated motor networks, carries out the required computations for the goal-directed
action. In a very real sense, then, the strengths and weaknesses of these two kinds of vision com-
plement each other in the production of adaptive behavior.

References
Aglioti, S., DeSouza, J. F., and Goodale, M. A. (1995). Size-contrast illusions deceive the eye but not the
hand. Curr Biol 5(6): 679–685.
Behrmann, M., Richler, J., and Avidan, G. (2013). Holistic face perception. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Ben-Shalom, A., and Ganel, T. (2012). Object representations in visual memory: evidence from visual
illusions. J Vision 12(7): 1–11.
Bruno, N., and Franz, V. H. (2009). When is grasping affected by the Müller-Lyer illusion? A quantitative
review. Neuropsychologia 47(6): 1421–1433.
Carey, D. P. (2001). Do action systems resist visual illusions? Trends Cogn Sci 5(3): 109–113.
Chapman, C. S., Gallivan, J. P., Wood, D. K., Milne, J. L., Culham, J. C., and Goodale, M. A. (2010).
Reaching for the unknown: multiple target encoding and real-time decision making in a rapid reach
task. Cognition 116: 168–176.

Coren, S., and Girgus, J. S. (1978). Seeing is Deceiving: the Psychology of Visual Illusions. Hillsdale,
NJ: Lawrence Erlbaum Associates.
Cuijpers, R. H., Brenner, E., and Smeets, J. B. J. (2006). Grasping reveals visual misjudgements of shape.
Exp Brain Res 175(1): 32–44.
Cuijpers, R. H., Smeets, J. B. J., and Brenner, E. (2004). On the relation between object shape and grasping
kinematics. J Neurophysiol 91(6): 2598–2606.
Culham, J. C., and Valyear, K. F. (2006). Human parietal cortex in action. Curr Opin Neurobiol 16(2): 205–212.
Duncan, J. (1984). Selective attention and the organization of visual information. J Exp Psychol Gen
113(4): 501–517.
Foster, R. M., and Franz, V. H. (2013). Inferences about time course of Weber’s Law violate statistical
principles. Vision Res 78: 56–60.
Foster, R. M., Kleinholdermann, U., Leifheit, S., and Franz, V. H. (2012). Does bimanual grasping of
the Müller-Lyer illusion provide evidence for a functional segregation of dorsal and ventral streams?
Neuropsychologia 50(14): 3392–3402.
Franconeri, S. L., Bemis, D. K., and Alvarez, G. A. (2009). Number estimation relies on a set of segmented
objects. Cognition 113: 1–13.
Franz, V. H. (2003). Manual size estimation: a neuropsychological measure of perception? Exp Brain Res
151(4): 471–477.
Franz, V. H., Fahle, M., Bülthoff, H. H., and Gegenfurtner, K. R. (2001). Effects of visual illusions on
grasping. J Exp Psychol Hum Percept Perform 27(5): 1124–1144.
Franz, V. H., and Gegenfurtner, K. R. (2008). Grasping visual illusions: consistent data and no dissociation.
Cogn Neuropsychol 25(7–8): 920–950.
Franz, V. H., Gegenfurtner, K. R., Bülthoff, H. H., and Fahle, M. (2000a). Grasping visual illusions: no
evidence for a dissociation between perception and action. Psychol Sci 11(1): 20–25.
Ganel, T., Chajut, E., and Algom, D. (2008b). Visual coding for action violates fundamental psychophysical
principles. Curr Biol 18(14): R599–601.
Ganel, T., Freud, E., Chajut, E., and Algom, D. (2012). Accurate visuomotor control below the perceptual
threshold of size discrimination. PloS One 7(4): e36253.
Ganel, T., and Goodale, M. A. (2003). Visual control of action but not perception requires analytical
processing of object shape. Nature 426(6967): 664–667.
Ganel, T., Tanzer, M., and Goodale, M. A. (2008a). A double dissociation between action and
perception in the context of visual illusions: opposite effects of real and illusory size. Psychol Sci
19(3): 221–225.
Glover, S., and Dixon, P. (2002). Dynamic effects of the Ebbinghaus illusion in grasping: support for a
planning/control model of action. Percept Psychophys 64(2): 266–278.
Gonzalez, C. L. R, Ganel, T., Whitwell, R. L., Morrissey, B., and Goodale, M. A. (2008). Practice makes
perfect, but only with the right hand: sensitivity to perceptual illusions with awkward grasps decreases
with practice in the right but not the left hand. Neuropsychologia 46(2): 624–631.
Gonzalez, C. L. R, Ganel, T., and Goodale, M. A. (2006). Hemispheric specialization for the visual control
of action is independent of handedness. J Neurophysiol 95(6): 3496–3501.
Goodale, M. A. (2011). Transforming vision into action. Vision Res 51(13): 1567–1587.
Goodale, M. A, Jakobson, L. S., and Keillor, J. M. (1994a). Differences in the visual control of pantomimed
and natural grasping movements. Neuropsychologia 32(10): 1159–1178.
Goodale, M. A, Meenan, J. P., Bülthoff, H. H., Nicolle, D. A., Murphy, K. J., and Racicot, C. I. (1994b).
Separate neural pathways for the visual analysis of object shape in perception and prehension. Curr Biol
4(7): 604–610.

Goodale, M. A, and Milner, A. D. (1992). Separate visual pathways for perception and action. Trends
Neurosci 15(1): 20–25.
Goodale, M. A., and Milner, A. D. (2005). Sight Unseen: An Exploration of Conscious and Unconscious
Vision. New York: Oxford University Press.
Gregory, R. L. (1963). Distortion of visual space as inappropriate constancy scaling. Nature 199: 678–680.
Hadad, B-S., Avidan, G., and Ganel, T. (2012). Functional dissociation between perception and action is
evident early in life. Develop Sci 15(5): 653–658.
Haffenden, A. M., and Goodale, M. A. (1998). The effect of pictorial illusion on prehension and perception.
J Cogn Neurosci 10(1): 122–136.
He, L., Zhang, J., Zhou, T., and Chen, L. (2009). Connectedness affects dot numerosity
judgment: Implications for configural processing. Psychonom Bull Rev 16: 509–517.
Heath, M., Holmes, S. A., Mulla, A., and Binsted, G. (2012). Grasping time does not influence the early
adherence of aperture shaping to Weber’s law. Frontiers Hum Neurosci 6: 332.
Heath, M., Mulla, A., Holmes, S. A., and Smuskowitz, L. R. (2011). The visual coding of grip aperture
shows an early but not late adherence to Weber’s law. Neurosci Lett 490(3): 200–204.
Heed, T., Gründler, M., Rinkleib, J., Rudzik, F. H., Collins, T., Cooke, E., and O’Regan, J. K. (2011). Visual
information and rubber hand embodiment differentially affect reach-to-grasp actions. Acta Psychol
138(1): 263–271.
Holmes, S. A., Mulla, A., Binsted, G., and Heath, M. (2011). Visually and memory-guided grasping: aperture
shaping exhibits a time-dependent scaling to Weber’s law. Vision Res 51(17): 1941–1948.
James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., and Goodale, M. A. (2002). Differential effects of
viewpoint on object-driven activation in dorsal and ventral streams. Neuron 35(4): 793–801.
Janczyk, M., and Kunde, W. (2012). Visual processing for action resists similarity of relevant and irrelevant
object features. Psychonom Bull Rev 19(3): 412–417.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace.
Konen, C. S., and Kastner, S. (2008). Two hierarchically organized neural systems for object information in
human visual cortex. Nature Neurosci 11(2): 224–231.
Kravitz, D. J., Saleem, K., Baker, C. I., and Mishkin, M. (2011). A new neural framework for visuospatial
processing. Nature Rev Neurosci 12(4): 217–230.
Kunde, W., Landgraf, F., Paelecke, M., and Kiesel, A. (2007). Dorsal and ventral processing under
dual-task conditions. Psychol Sci 18(2): 100–104.
Lee, Y-L., Crabtree, C. E., Norman, J. F., and Bingham, G. P. (2008). Poor shape perception is the reason
reaches-to-grasp are visually guided online. Percept Psychophys 70(6): 1032–1046.
Lehky, S. R., and Sereno, A. B. (2007). Comparison of shape encoding in primate dorsal and ventral visual
pathways. J Neurophysiol 97(1): 307–319.
Linnell, K. J., Humphreys, G. W., McIntyre, D. B., Laitinen, S., and Wing, A. M. (2005). Action modulates
object-based selection. Vision Res 45(17): 2268–2286.
Milne, J. L., Chapman, C. S., Gallivan, J. P., Wood, D. K., Culham, J. C., and Goodale, M. A. (2013).
Connecting the dots: object connectedness deceives perception but not movement planning. Psychol Sci
24(8): 1456–1465.
Milner, A. D., and Goodale, M. A. (2006). The Visual Brain in Action, 2nd edn. New York: Oxford
University Press.
O’Craven, K. M., Downing, P. E., and Kanwisher, N. (1999). fMRI evidence for objects as the units of
attentional selection. Nature 401(6753): 584–587.
Perenin, M. T., and Vighetto, A. (1988). Optic ataxia: a specific disruption in visuomotor mechanisms.
I. Different aspects of the deficit in reaching for objects. Brain: J Neurol 111(3): 643–674.
Pomerantz, J. R., and Cragin, A. I. (2014). Emergent features and feature combination. In Oxford
Handbook of Perceptual Organization, edited by J. Wagemans. Oxford, UK: Oxford University Press.
688 Goodale and Ganel

Roberts, B., Harris, M. G., and Yates, T. A. (2005). The roles of inducer size and distance in the Ebbinghaus
illusion (Titchener circles). Perception 34(7): 847–856.
Schenk, T., and Milner, A. D. (2006). Concurrent visuomotor behaviour improves form discrimination in a
patient with visual form agnosia. Eur J Neurosci 24(5): 1495–1503.
Schum, N., Franz, V. H., Jovanovic, B., and Schwarzer, G. (2012). Object processing in visual perception
and action in children and adults. J Exp Child Psychol 112(2): 161–177.
Smeets, J. B., and Brenner, E. (1999). A new view on grasping. Motor Control 3(3): 237–271.
Smeets, J. B., and Brenner, E. (2001). Independent movements of the digits in grasping. Exp Brain Res
139(1): 92–100.
Smeets, J. B., and Brenner, E. (2008). Grasping Weber’s law. Curr Biol 18(23): R1090–1091.
Stöttinger, E., Pfusterschmied, J., Wagner, H., Danckert, J., Anderson, B., and Perner, J. (2012). Getting
a grip on illusions: replicating Stöttinger et al. [Exp Brain Res (2010) 202: 79–88] results with 3-D
objects. Exp Brain Res 216(1): 155–157.
Thaler, L., and Goodale, M. A. (2010). Beyond distance and direction: the brain represents target
locations non-metrically. J Vision 10(3): 3.1–27.
Van de Kamp, C., and Zaal, F. T. (2007). Prehension is really reaching and grasping. Exp Brain Res
182(1): 27–34.
Van der Kamp, J., De Wit, M. M., and Masters, R. S. W. (2012). Left, right, left, right, eyes to the front!
Müller-Lyer bias in grasping is not a function of hand used, hand preferred or visual hemifield, but
foveation does matter. Exp Brain Res 218(1): 91–98.
Westwood, D. A., and Goodale, M. A. (2003). Perceptual illusion and the real-time control of action.
Spatial Vision 16(3–4): 243–254.
Section 8

Special interest topics


Chapter 34

Development of perceptual
organization in infancy
Paul C. Quinn and Ramesh S. Bhatt

Introduction
Even simple visual displays can have multiple interpretations. Consider the stimulus depicted
in Figure 34.1A. Why is it that most adults report perceiving an overlapping hexagon and cross,
despite the fact that other interpretations, such as those in Figure 34.1B–D, are equally physically
possible? As put by Metzger (1936/2006, p. 43, italics from original text), the ‘stimulus distribution
in the eye is always infinitely ambiguous’. One could argue that the favoured interpretation receives
support from language and instruction, given that during development we come to learn that the
labels ‘hexagon’ and ‘cross’ refer to those particular constellations of contours. However, the rapid
emergence of visual cognition (with many grouping phenomena evident in the initial months of
life), combined with the difficulty of the problem, suggests that the development of perceptual
organization results from the imposition of strong constraints (Quinn et al. 2008a). This chapter
will take up the task of identifying those constraints and explicating their developmental deter-
minants. In particular, we will examine how the constraints are a mix of the inherent operational
characteristics of the visual system and the learning engendered by a structured environment
(Bhatt and Quinn 2011). First, however, we consider some theoretical accounts of the ontogeny
of perceptual organization.

Historical Theoretical Positions on the Development of Perceptual Organization
Gestalt accounts
For the Gestaltists, holistic percepts are realized even on initial presentation of a visual pattern
(Wagemans et al. 2012). As stated by Köhler (1929, p. 163), ‘elementary organization is an origi-
nal sensory fact’ and it occurs because our perceptual systems are constrained to follow certain
grouping principles that operate on the basis of the proximity, similarity, common movement,
and good continuation properties of the elements. The perception of one organization over
other organizations that are equally physically possible reflects adherence to such principles
(Wertheimer 1923/1958). Emphasizing the nativist basis for perceptual organization, Koffka
(1935, p. 209) observed that, ‘Whereas to traditional psychology the articulation of our field into
things . . . appears as a clear example of experience or learning, our theory considers this articula-
tion as the direct result of . . . the spontaneous organization aroused by the stimulus mosaic’.
Zuckerman and Rock (1957) sided with Gestalt claims of innate organizing processes on the
grounds of logic and parsimony. That is, if one does not posit such processes, then the starting


Fig. 34.1  (a) Configuration of contours perceived as a hexagon and cross, even though one could
just as readily perceive (b), (c), and (d).
Reproduced from Metzger, Wolfgang. Translated by Lothar Spillman., Laws of Seeing, figure 27, © 2006
Massachusetts Institute of Technology, by permission of The MIT Press.

point for infants is an unorganized ‘mosaic of sensory impressions’ (Zuckerman and Rock 1957,
p. 278), and experience with different shapes and forms must somehow induce the transforma-
tion of sensory data into bounded regions. Such transformation is presumably mediated through
memory but, according to Zuckerman and Rock, if that memory consists of amorphous sensa-
tions rather than cohesive shapes then it is unclear how it could lead to subsequent organized
percepts. Instead, it is simpler to assume that innate organizing processes account for the initial
structuring of visual displays into coherent patterns. As summarized by Zuckerman and Rock
(1957, p. 291), ‘the organization of the visual field into shaped areas is not an outcome of learn-
ing—past experience cannot carve visual form out of initially formless perception’.

Learning accounts
Two other views of the development of perceptual organization have proposed mechanisms that
allow one to more readily envision how organization could emerge, even if it is not the initial start-
ing point. For Hebb (1949), perception of a whole object is a learned process that is founded in per-
ception of the individual features of the object and the integration of those perceptions as achieved
through eye movements. As described by Hebb (1949, p. 83), ‘If line and angle are the bricks from
which form perceptions are built, the primitive unity of the figure might be regarded as the mortar,
and eye movement as the hand of the builder’. For Hebb, the emergence of perceptual organiza-
tion would take considerable developmental time because of dependence on improvements in eye
movements that yield more holistic perceptions as visual scanning becomes more systematic.
Another account of the emergence of perceptual organization relies neither on inherent con-
straints nor on perceptual learning that occurs from the development of visual scanning, but
rather on the learning of probabilistic image statistics derived from regularities in the environ-
ment (Brunswik and Kamiya 1953; Elder and Goldberg 2002; Elder, this volume). Consider the
organizing principle of proximity, which specifies that close elements will be grouped together. In
the Brunswik and Kamiya view, proximity may actually be learned because image elements that
correspond to the same object are likely to be closer to each other than elements that correspond
to different objects. Likewise, in the case of lightness similarity, discontinuities in luminance cues
are correlated with boundaries where one object ends and another begins. The discovery of such

correlations by infants can presumably be used as a basis for integrating sequences of elements
that project from common structures in a visual scene.
With different theorists offering differing accounts of the development of perceptual organiza-
tion, some stressing innate grouping factors and others emphasizing ways in which visual order
could emerge through maturation of internal mechanisms or experience with a structured envi-
ronment, we turn to a discussion of the evidence.

Initial Eye Movement Evidence in Infants: Salapatek (1975)


At the time that Gestalt theory and reactions to it were being composed, methodologies were
not available to investigate perceptual abilities in infants. However, such methods did become
available in the 1960s and 1970s, and one technique in particular provided some early evidence
relevant to the debate. Specifically, Salapatek (1975) recorded infants’ eye movements while they
visually scanned simple outline figures, and reported a developmental trend over the first months
of life in which scanning was initially limited to single features and gradually expanded to include
multiple features and eventually the whole pattern. These eye movement data are consistent with
a Hebbian account of the development of perceptual organization, although one can question how
direct a relation there is between fixation and the surrounding expanse of visual attention. That
is, if visual attention is distributed broadly about the fixation point, then an infant who fixates a
corner of a triangle could actually be processing information across the entire triangle. For this
reason, it is unclear what inferences can be drawn from infant visual scanning data, at least as they
pertain to the ontogeny of perceptual organization.

Demonstrations of Organizational Phenomena in Infants


Looking-time procedures used to study perceptual organization in infants are based on the
infant’s visual preference for novel stimuli (Fantz 1964). To determine whether two stimuli can be
discriminated, for example, infants can be familiarized with one of the stimuli and subsequently
presented with the familiar stimulus paired with the novel stimulus. A preference for the novel
stimulus that cannot be attributed to a priori preference implies that infants have recognized the
familiar stimulus and can discriminate between it and the novel stimulus.

Configural superiority
A strategy for researchers interested in the start-up of visual cognition has been to take empirical
phenomena supportive of a particular mental faculty in adults and adapt looking-time procedures
to study those same phenomena in infants. One such occurrence relevant to perceptual organi-
zation is the configural-superiority effect (Pomerantz 1981; Chapter 26, this volume). In adults,
configural superiority is in evidence when the mirror image line elements shown in Figure 34.2A
are found easier to discriminate when embedded in the non-informative contextual frame shown
in Figure 34.2B (Pomerantz et al. 1977). This result poses difficulty for feature analytic models
of visual processing, because if one were processing only the features of the visual forms (i.e. the
individual line segments), then the stimuli in Figure 34.2B should be more easily confused than
those in Figure 34.2A given the overlap of features in the horizontal and vertical line segments.
Instead, the finding suggests that emergent relations between features (i.e. angles, corners, whole
forms) are represented when processing visual patterns.
It could be argued that the configural-superiority effect shown in Figure 34.2A and B is lin-
guistically based given that labels such as ‘arrow’ versus ‘triangle’ may generate an acquired

Fig. 34.2  Configural superiority: (a) line segments in isolation; (b) line segments embedded in a right-angle contextual frame. Subjective contours: configuration of elements produces (c) and does not produce (d) the perception of a square shape.
Reprinted from Infant Behavior and Development, 9(1), Paul C. Quinn and Peter D. Eimas, Pattern-line effects and units of visual processing in infants, pp. 57–70, doi: 10.1016/0163-6383(86)90038-X, Copyright (1986), with permission from Elsevier.
Reprinted from Infant Behavior and Development, 13(2), Hei-Rhee Ghim, Evidence for perceptual organization in infants: Perception of subjective contours by young infants, pp. 221–48, doi: 10.1016/0163-6383(90)90032-4, Copyright (1990), with permission from Elsevier.

distinctiveness of the patterns. However, that interpretation is defeated by demonstrations of configural superiority in young infants (Bomba et al. 1984; Colombo et al. 1984; Quinn and Eimas
1986). In Quinn and Eimas (1986), for example, 3- to 4-month-olds familiarized with a single
line element showed no preference when tested with the familiar element paired with its mirror
image line element (Figure 34.2A). By contrast, when these elements were embedded in the right-
angle contextual frame (Figure 34.2B), the infants reliably preferred the novel stimulus. These
results suggest that the configural-superiority effect is perceptually based, and that young infants
represent more global visual processing units that emerge when simple components are grouped
together.

Global precedence
Another perceptual effect that has been considered as evidence of organization in adults and that
has been of interest to developmentalists is the global-precedence effect (Navon 1977; Kimchi,
this volume). In the procedure used to generate this effect, adult observers are presented with a
multilevel stimulus consisting of a large letter made from small letters. The global letter matches
or does not match the local letter and the observer’s task is to identify either the global letter or
the local letters. The key findings are that: (1) response times are faster to the global letter, (2) con-
flicting local letters do not impact upon processing at the global level, and (3) a conflicting global
letter interferes with processing of the local letters. This pattern of outcomes indicates that global
aspects of a stimulus are processed and recognized before local aspects.
Ghim and Eimas (1988) investigated whether a global precedence effect could be demonstrated
in young infants. In one condition, 3- to 4-month-old infants were familiarized with a global
square made up of local squares followed by either a local or global preference test. The local test
contrasted a pair of global diamond stimuli, one constructed from local squares and the other
from local diamonds. By contrast, the global test paired a global square with a global diamond,
each composed of novel local diamonds. If global precedence is occurring, then in the local test,
the novelty at the global level would lead infants to divide their attention evenly between the two

stimuli, even though there is a source of novelty at the local level residing in the local diamonds.
However, in the global test, infants should prefer the global diamond, even though there is a com-
peting source of novelty from the local diamonds. These predictions were confirmed: infants in
the local test did not respond differentially, whereas those in the global test preferred the global
diamond (even though a control condition showed that infants were sensitive to the change in the
local elements). The findings provide evidence that, as is the case with adults, global information
has a processing advantage over local information in young infants (see also Frick et al. 2000).

Subjective contours
Yet another manifestation of organization in adult vision is the perception of subjective con-
tours (Kanizsa 1955; van Lier and Gerbino, this volume). Consider Figure 34.2C: adults perceive
a white square atop some pacman shapes. The contour appears to continue across the white
space between the shapes, thereby suggesting a completion process. Although one can argue for
a top-down explanation and suggest that the completion process is facilitated by knowledge of
the square form, this explanation is weakened by demonstrations that infants perceive illusory
contours (Ghim 1990; Johnson and Aslin 1998; Kavsek 2002; Hayden et al. 2008). For example,
Ghim (1990) reported that 3- to 4-month-olds were more likely to display novelty preferences in
tasks involving a pattern that elicited the perception of subjective contours (Figure 34.2C) versus
one that did not (Figure 34.2D) relative to tasks involving two patterns neither of which produced
subjective contours. In addition, after familiarization with an outline square, infants preferred
a pattern that did not produce subjective contours (Figure 34.2D) to one that did produce the
illusory square in adults (Figure 34.2C). This evidence suggests that, like adults, young infants are
capable of a completion process that produces the perception of subjective contours.
Demonstrations of configural superiority, global precedence, and subjective contours in infants
suggest that at least some of the mechanisms that produce perceptual organization in adults are
also functional in the initial months of life. However, these demonstrations are less informative
about how infants relate individual elements to each other. For example, in the cases of configural
superiority and global precedence, was it the Gestalt principles of closure, good continuation,
proximity, lightness similarity, or form similarity or some combination that allowed infants to
organize the patterns? Similarly, in the case of subjective contours, any of the above principles,
with the exception of proximity, could be involved. To better identify which specific grouping fac-
tors are functional during early development, some investigators have taken the approach of stud-
ying how infants will respond to displays of elements that could be organized by one or another
principle. We now turn to a discussion of these studies.

An Influential Study: Kellman and Spelke (1983)


Kellman and Spelke (1983) presented 4-month-olds with a display of a rod partly occluded by a block.
The question was whether infants represented the continuity of the rod behind the block based on the
Gestalt properties of the visible portions of the rod including their good continuation and similar-
ity of shape. After familiarization with the rod–block display, infants were presented with a complete
rod versus a broken rod. If infants represented the continuity of the rod, then they should respond
to the complete rod as familiar and the broken rod as novel. However, if the rod fragments were not
grouped together, then the broken rod should be perceived as familiar and the complete rod as novel.
The infants looked equally to the complete and broken rod displays, a null result that was difficult
to interpret. A follow-up experiment was conducted that was a replication of the initial experiment,
except that the cue of common motion was added to the visible portions of the rod. In this instance, the

infants preferred the broken rod. An additional experiment that pitted common motion against good
continuation and similarity confirmed that it was common motion alone rather than the combination
of common motion, good continuation, and similarity that enabled infants to group the rod. Moreover,
using a similar methodology, Spelke (1982) reported that same-aged infants perceived the continuity
of two adjacent objects as long as their surfaces were contiguous and even when those surfaces were
dissimilar in size, shape, and textural markings.
These results led Spelke (1982) to develop a hybrid account of the development of object
perception, incorporating innate organizing principles as well as a role for learning based on
experience with a structured environment. Specifically, Spelke argued that infants at birth are
constrained by two core organizational principles, common movement and connected surface.
Adherence to these principles would parse from a visual scene those surfaces that move together
and maintain their coherence as they move and grant them the status of objects. The resulting
object ‘blobs’ can then be tracked over real time. Such experience, according to Spelke, allows
infants to discover that objects exhibit other properties including proximity of parts, similarity
of surface, and good continuation of contour (Brunswik and Kamiya 1953). In this way, some of
the principles that were considered to be innate organizing principles by the Gestaltists were, by
the Spelke account, learned through their natural correlation with the core principles.

Further Work on Perceptual Grouping of Visual Displays by Infants via Classic Organizational Principles
Questions that arise from the initial Spelke (1982) account concern the status of grouping princi-
ples such as similarity, good continuation, and proximity. While common motion was found to be
a more potent determinant of grouping relative to similarity and good continuation in a stimulus
display in which different sources of information were in competition, one can ask whether simi-
larity and good continuation might be functional when not in competition with another principle.
Moreover, although similarity and good continuation were not sufficiently strong to provide a
basis for grouping when they were the sole sources for organization in the case of partial occlusion
with the rod–box display, these principles might be functional for displays in which the elements
are completely visible. In addition, Kellman and Spelke (1983) investigated a particular type of
similarity (form), thus leaving unresolved the issue of whether infants might be able to utilize
other forms of similarity (e.g. lightness). Furthermore, one other classic Gestalt principle, proxim-
ity, was not investigated, thereby keeping open the question of its functionality.

Lightness similarity
Quinn et al. (1993) asked whether 3-month-olds could utilize lightness similarity to organize col-
umns or rows of elements that could be grouped only on the basis of their lightness versus darkness
(see also Quinn and Bhatt 2006). The test stimuli were horizontal versus vertical bars (see Figure
34.3, top panel). If the organization in the row and column arrays is apprehended, then infants
familiarized with columns should prefer horizontal bars and infants familiarized with rows should
prefer vertical bars. The findings provided positive evidence for use of lightness similarity: infants
preferred the novel organization of bars. An additional control experiment showed that infants
could discriminate between arrays differing in the shape (square versus diamond) of the dark or
light elements. This latter finding mitigates explanations of the preference for the novel organization
based on immature resolution acuity and indicates that infants were able to perceive the individual
elements of the displays and organize them into larger perceptual units (i.e. rows versus columns)
based on lightness similarity. Of note is that Farroni et al. (2000) used a similar methodology to


Fig. 34.3  Luminance (top panel): familiarization and test stimuli used in the study of Quinn et al.
(1993) investigating whether 3-month-old infants can organize visual patterns in accord with
lightness similarity. Proximity (bottom panel): familiarization and test stimuli used to determine
whether infants adhere to proximity when organizing visual patterns.
Reprinted from Acta Psychologica, 127(2), Paul C. Quinn, Ramesh S. Bhatt, and Angela Hayden, Young infants
readily use proximity to organize visual pattern information, pp. 289–98, doi: 10.1016/j.actpsy.2007.06.002
Copyright (2008), with permission from Elsevier.

argue that even newborns adhere to lightness similarity when organizing visual patterns; however,
because that study did not determine if the individual light elements could be resolved, it left open
the question of whether the displays were organized via the proximity of the dark elements.

Proximity
Another classic grouping principle investigated was proximity (Quinn et al. 2008b). As shown
in the bottom panel of Figure 34.3, using the same methodology as Quinn et  al. (1993), 3- to
4-month-olds were presented with arrays of elements that could be organized into rows or col-
umns via proximity, and then tested with horizontal versus vertical bars. Infants preferred the test
stimuli with the novel organization, and subsequent control experiments indicated that the pref-
erences were not attributable to an a priori preference or to an inability to resolve elements within
the rows and columns. The results indicate that proximity joins lightness similarity as a grouping
principle that can be used to organize visual patterns by young infants.

Good continuation
A third classic static principle investigated was good continuation (Quinn and Bhatt 2005a). In con-
trast to the column versus row methodology used to study lightness similarity and proximity, a
methodology was adopted that had been used to investigate good continuation grouping by adults


Fig. 34.4  Examples of the familiarization and test stimuli used in Quinn and Bhatt (2005a). The
in-line condition is depicted in (a) and the off-line condition in (b).

(Prinzmetal and Banks 1977). The displays (shown in Figure 34.4) consisted of a line of circular dis-
tracters and a square or diamond target. Infants were presented with one pattern and then tested for
discrimination between the familiar pattern and a novel one. In the top panel A, the target appeared
in line, embedded, or aligned (and thus in good continuation) with the distracters, whereas in the
bottom panel B, the target was off line with the distracters. The expectation is that if infants per-
ceived the patterns in accord with good continuation, then the change in the target should be more
difficult to detect when the target is in a good continuation relation with the distracters, as in the
in-line condition. By contrast, in the off-line condition, the target would not group with the distract-
ers and would retain its status as an independently processed unit of information, thereby increasing
the likelihood that a change in its form would be detected. Three- to 4-month-olds preferred the
novel test stimulus in the off-line condition, but not in the in-line condition. This evidence suggests
that good continuation is a third organizational principle available to young infants.

Form similarity
The functionality of form similarity in young infants was examined by Quinn et al. (2002),
who drew upon the methodology that was used to investigate lightness similarity and prox-
imity. As shown in Figure 34.5, 3- to 4-month-olds were familiarized with rows or columns
of Xs versus Os, and then tested with horizontal versus vertical bars. If infants group the
familiarization stimulus into rows or columns via form similarity, then they should prefer the
novel organization of bars. However, the infants did not display such a preference, even when
familiarization time was doubled; instead, attention was divided between the test stimuli.
A  control study showed that infants were capable of discriminating between the familiari-
zation arrays and arrays that consisted entirely of Xs or Os. This latter result indicates that
failure of the infants to use form similarity was not due simply to an inability to discriminate
between the constituent X and O shapes.
With the data thus far described not demonstrating the use of form similarity by young infants,
Quinn et al. (2002) tested older infants aged 6 to 7 months on the form similarity task. This age
group preferred the novel organization. Thus, 6- to 7-month-olds, but not 3- to 4-month-olds, can
organize visual patterns in accord with form similarity. In combination with outcomes indicat-
ing that 3- to 4-month-olds can utilize lightness similarity, proximity, and good continuation to
organize visual patterns under similar testing conditions (Quinn et al. 1993, 2008b; Quinn and
Bhatt 2005a), the results indicating that only 6- to 7-month-olds can use form similarity suggest


Fig. 34.5  Examples of the familiarization and test stimuli used to test for perceptual organization by
form similarity in Quinn et al. (2002).
Reproduced from Paul C. Quinn, Ramesh S. Bhatt, Diana Brush, Autumn Grimes, and Heather Sharpnack,
Psychological Science, 13(4), Development of Form Similarity as a Gestalt Grouping Principle in Infancy,
pp. 320–328, doi: 10.1111/1467-9280.00458, Copyright © 2002 by SAGE Publications. Reprinted by Permission
of SAGE Publications.

that different Gestalt principles may become functional over different time courses of develop-
ment and that not all principles are readily deployed.
The findings are inconsistent with a strict Gestalt view that all organizing principles are
automatically activated upon first encounter with a visual pattern (e.g. Köhler 1929). The
data are, however, consistent with evidence indicating that adults have independent lumi-
nance- and edge-based grouping mechanisms (Gilchrist et al. 1997). They are also in accord
with the finding that some visual agnosics show intact lightness similarity and proximity
grouping, but impaired shape configuring and form-based grouping ability (Behrmann and
Kimchi 2003; Humphreys 2003), and the result that individuals with Williams syndrome
show superior lightness similarity and good continuation grouping abilities relative to those
for form similarity (Farran 2005). The developmental evidence contrasting the time course
of emergence of the principles of proximity and form similarity in infants is moreover con-
sistent with microgenetic evidence in adults indicating that proximity grouping occurs more
rapidly than form-based grouping in the time course of processing (Ben-Av and Sagi 1995;
Han et al. 1999). However, we now consider evidence indicating that the inability of young
infants to use form to organize visual images is not absolute.

Contribution of learning to the development


of form-based grouping
Because Quinn et  al. (2002) reported a later development of form similarity, Quinn and Bhatt
(2005b) sought to determine whether this development was driven by maturation or learning. They
reasoned that if form similarity is under experiential control, then it might be possible to find a
stimulus display or procedural manipulation that would allow 3- to 4-month-olds to organize vis-
ual patterns in accord with form similarity. Alternatively, if infants’ use of form similarity is matu-
rationally determined, then methodological variants would not be expected to bring about positive
evidence that form similarity is functional in the younger age group. Given that Quinn et al. (2002)
found that the X–O form contrast yielded null results with 3- to 4-month-olds, Quinn and Bhatt
(2005b) tested this age group with two other form contrasts, square versus diamond geometric
shapes and H versus I letter shapes, on the form similarity task. Neither contrast was successful in
producing a preference for the novel organization; infants in both cases divided attention between
700 Quinn and Bhatt

the horizontal and vertical bars. This result suggests that young infants’ inability to organize by
form similarity is not a specific deficit with Xs versus Os, but rather a more general phenomenon.
A second attempt to determine if 3- to 4-month-olds could be induced to use form similarity employed
a training regime. Specifically, Quinn and Bhatt (2005b) asked whether variations in the patterns used to
depict rows or columns during familiarization would enhance infants’ performance in the form similarity
task. One may reason that pattern variation will facilitate performance because the invariant organization
of the stimuli will be more easily detected against a changing background. In other words, variation might
provide infants with the opportunity to form concepts of ‘rows’ or ‘columns’. To investigate this possibil-
ity, the form similarity task that had previously produced null results (when each of the three different
form contrasts was presented individually) was administered, but in this instance with each of the three
form contrasts presented during a single familiarization session (see Figure 34.6). Younger infants now
preferred the novel organization of bars. This striking result suggests that 3- to 4-month-olds can use form
similarity to organize elements if they are provided with varied examples with which to abstract the invar-
iant arrangement of the pattern. The outcome is theoretically significant because it demonstrates that
perceptual learning may play a role in acquiring some aspects of visual organization. Moreover, following
Goldstone’s (2003) proposal that one mechanism by which perceptual learning occurs is by increasing
attention to relevant information and decreasing attention to irrelevant information, Bhatt and Quinn
(2011) have suggested that variability led to grouping based on shape similarity because it enhanced
infant attention to global structures and diminished attention to local elements.

Perceptual Grouping of Visual Displays by Infants via Modern Organizational Principles
While the classic grouping principles were described by Wertheimer (1923/1958), the group-
ing principles that will be examined in this section, connectedness and common region, were


Fig. 34.6  Familiarization and test stimuli used in Quinn and Bhatt (2005b).
Reproduced from Paul C. Quinn and Ramesh S. Bhatt, Psychological Science, 16(7), Learning Perceptual
Organization in Infancy, pp. 511–515, doi: 10.1111/j.0956-7976.2005.01567.x, Copyright © 2005 by SAGE
Publications. Reprinted by Permission of SAGE Publications.

introduced by Palmer and Rock in the 1990s (Rock and Palmer 1990; Palmer 1992; Palmer and
Rock 1994; see also Brooks, this volume).

Connectedness
Rock and Palmer (1990) described the principle of connectedness as the visual system’s tendency to
group together connected entities, and remarked that ‘connectedness . . . may be the most fundamental
principle of grouping yet uncovered’ (Rock and Palmer 1990, p. 86). To determine whether sensitivity
to connectedness is operational in early infancy, as shown in Figure 34.7, infants as young as 3 months
of age were habituated to the connected patterns shown in panels A or B, and then administered a pref-
erence test pairing connected elements (panel C) with disconnected elements (panel D) (Hayden et al.
2006). The expectation was that if the infants organize the habituation patterns on the basis of connect-
edness, then they should display a novelty preference for the disconnected-element test stimulus. This
outcome was observed, and a control condition showed that it could not be attributed to a spontaneous
preference. The results indicate that young infants are sensitive to the connectedness principle.
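The novelty-preference logic behind these habituation studies can be quantified in a simple way: each infant's looking time to the novel test stimulus is expressed as a percentage of total test looking, and the group mean is compared against the 50% chance level. The following is a minimal sketch of that computation; the looking-time values and the function name are illustrative assumptions, not data or code from Hayden et al. (2006).

```python
# Hypothetical looking-time data (seconds) for a habituation study:
# each pair is one infant's looking at the novel (disconnected) vs.
# familiar (connected) test stimulus. Values are invented for illustration.
looking_times = [
    (7.2, 4.1), (6.5, 5.0), (8.0, 3.6), (5.9, 4.4),
    (6.8, 5.2), (7.5, 3.9), (6.1, 4.7), (7.0, 4.0),
]

def novelty_preference(novel: float, familiar: float) -> float:
    """Percentage of total test looking directed at the novel stimulus."""
    return 100.0 * novel / (novel + familiar)

scores = [novelty_preference(n, f) for n, f in looking_times]
mean_score = sum(scores) / len(scores)

# A group mean reliably above the 50% chance level is taken as evidence
# that the infants organized the habituation patterns and perceived the
# altered organization as novel.
print(f"mean novelty preference: {mean_score:.1f}%")
```

In practice the group mean is tested against 50% with a one-sample statistic, and a no-habituation control group (as in the study described above) rules out a spontaneous preference for one display.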

Common region
Another newer grouping principle is common region, which states that elements within a region
are grouped together and separated from those in other regions (Palmer 1992). Palmer has also
proposed that common region is driven by a characteristic that is external to the elements them-
selves. In other words, the ‘common region’ quality that engenders grouping of elements is not
inherent in the elements themselves. By contrast, other grouping principles such as similarity
are based on intrinsic characteristics of the elements to be grouped. Palmer thus distinguished
between ‘extrinsic’ versus ‘intrinsic’ organizational cues and suggested that common region is an
extrinsic cue. This distinction raises the possibility that common region could be a different kind


Fig. 34.7  The stimuli used in Hayden et al. (2006). Infants in the habituation conditions were
habituated to the connected patterns in panels (a) or (b) and tested with the patterns in panels (c)
and (d). Infants in the no-habituation condition were tested with the patterns in panels (c) and (d)
without prior exposure to the patterns in panels (a) and (b).
Reproduced from Psychonomic Bulletin & Review, 13(2), pp. 257–261, Infants’ sensitivity to uniform
connectedness as a cue for perceptual organization, Angela Hayden, Ramesh S. Bhatt, and Paul C. Quinn,
Copyright © 2006, Springer-Verlag. With kind permission from Springer Science and Business Media.

of organizational cue from many others, thereby adding to the importance of understanding its
emergence in infants.
To examine whether young infants use common region to organize visual patterns, 3- to
4-month-olds were familiarized with a display consisting of two pairs of shapes, with one pair
(e.g. A and B) located together in a region and the other pair (e.g. C and D) located together in
another region (see Figure 34.8) (Bhatt et al. 2007). The locations of the individual shapes changed
from one trial to the next, but the shapes A and B always shared a region while the shapes C and
D shared another region. Infants were then tested with a within-region grouping (e.g. AB) versus
a between-region grouping (e.g. BC; see Figure 34.8). Importantly, because the physical distance
between A and B versus B and C was equivalent, the only difference between the A and B versus
B and C pairs was that the former pair shared the same region, whereas the members of the lat-
ter pair were from different regions. If common region is functional in infancy, then the A and B
elements should be grouped together because they always shared the same region. That is, infants
should find the within-region grouping to be familiar and the between-region grouping to be novel,
and respond differentially to these patterns during the test.
Another aspect of the work of Bhatt et al. (2007) is that it asks whether grouping will carry over
to novel regions, given that infants were habituated to vertical regions and tested with horizontal


Fig. 34.8  Examples of the stimuli used in Bhatt et al. (2007). Infants were habituated to two pairs of
shapes, with one pair sharing a vertical region and the other pair a different vertical region. Infants
were then tested for their preference between a pair of shapes that had shared a common region
during habituation (within-region pair) versus a pair of shapes that had been in different regions during
habituation (between-region pair), both presented in novel horizontal regions.
Reproduced from Perceptual Organization Based on Common Region in Infancy, Ramesh S. Bhatt, Angela Hayden,
and Paul C. Quinn, Infancy, 12(2), pp. 147–168, Copyright © 2007 International Society on Infant Studies.

regions. This manipulation allows one to determine whether the perceptual system expects grouping to remain intact when elements that were previously grouped based on one set of regions are subsequently encountered in novel regional configurations. Presumably, if grouping
and perceptual organization are to be functionally advantageous, they need to allow the world to
be structured into meaningful entities that transcend particular situations.
The major result from Bhatt et al. (2007) was that the infants discriminated the grouping of
elements from different regions from the grouping of elements that had shared a common region
during habituation. Moreover, Hayden et al. (2008) extended these results to regions formed by
illusory contours. The findings that infants are sensitive to common region suggest that the extrin-
sic nature of this cue did not preclude its role as an organizing factor. In other words, infants,
like adults, are not solely dependent upon the intrinsic nature of elements to organize them; they
are able to use extrinsic factors such as common region to organize. Additionally, the result that
performance transferred across differently shaped regions from familiarization to test provides
evidence that the perceptual organizational abilities of infants can produce processing units of
an abstract nature. This latter result actually points toward a unitization process by which previ-
ously disparate elements become grouped and begin to function as coherent units in new contexts
(Goldstone 2003; Bhatt and Quinn 2011).

Relations Among the Principles

Transfer of organization across principles
Although the research reviewed thus far in this chapter suggests that there is flexibility in early
grouping in that perceptual units formed by applying a particular grouping principle can be general-
ized to novel patterns organized by the same principle, one may also ask whether perceptual units
formed from application of one principle can be transferred to process a visual pattern organized
by a different principle. To this end, Quinn and Bhatt (2009) investigated the possibility of transfer
of organization between two principles, lightness similarity and form similarity, both of which were
previously shown to be functional in 6- to 7-month-olds. Six- to 7-month-olds were familiarized
with arrays that could be organized into columns or rows based on lightness similarity. The infants
were then given a novelty-preference test that paired arrays that could be organized into columns
or rows based on form similarity (see Figure 34.9, top panel). If infants can organize the familiariza-
tion patterns by lightness similarity and use the represented organization as a basis for processing
test patterns organized by form similarity, then they should prefer the novel organization. It should
be noted that this transfer task is more demanding than the lightness similarity task of Quinn et al.
(1993) because it calls upon infants to group the elements in the test displays based on their form
similarity. The infants performed in accord with this expectation, indicating that perceptual units
formed from application of one grouping principle (lightness similarity) can be transferred to appre-
hend an organization defined by a different grouping principle (form similarity).
Kangas et al. (2011) also reported transfer of organization from common region to proximity
in 6- to 7-month-olds, but not in 3- to 4-month-olds; however, they did demonstrate transfer of
organization from connectedness to proximity at the younger age. These latter results indicate that
transfer of organization across principles is evident early in life, although it continues to undergo
quantitative change during infancy.

Perceptual scaffolding
Given transfer between lightness and form similarity, one can inquire as to whether evidence might be
found for perceptual scaffolding, a process by which learning based on an already functional organizational

principle enables an organizational process that is not yet functional. That is, might infants who are oth-
erwise not able to group based on an organizational principle be induced to do so if they are previously
allowed to group elements based on an already functional organizational process? To answer this ques-
tion, Quinn and Bhatt (2009) capitalized on previous evidence showing that 3- to 4-month-old infants
readily organize via lightness similarity (Quinn et al. 1993), whereas organization by form similarity is not
readily exhibited until 6 to 7 months of age (Quinn et al. 2002), and administered the procedure depicted
in Figure 34.9 (top panel) to a group of 3- to 4-month-olds. The younger infants succeeded in the task,
thereby showing that the already developed luminance-based organizational system facilitated grouping
based on form similarity. This conclusion was upheld by the null performance of a control group of 3- to


Fig. 34.9  Illustrations of the luminance→shape (top panel) and shape→shape (bottom panel) tasks
presented to infants by Quinn and Bhatt (2009) to examine whether infants will learn to use shape cues to
organize if presented in the context of organization based on luminance cues.
Reproduced from Paul C. Quinn and Ramesh S. Bhatt, Psychological Science, 20(8), Transfer and Scaffolding
of Perceptual Grouping Occurs Across Organizing Principles in 3- to 7-Month-Old Infants, pp. 933–938, doi:
10.1111/j.1467-9280.2009.02383.x, Copyright © 2009 by SAGE Publications. Reprinted by Permission of SAGE
Publications.

4-month-olds who were familiarized and tested with the form elements shown in Figure 34.9 (bottom
panel). Taken together, the results highlight a scaffolding process that may engender learning by enabling
infants to group based on a new cue using an already functioning organizational process. Importantly,
this work demonstrates that new organizational principles can be learned via bootstrapping onto already
functioning organizational principles, as Spelke (1982) had suggested.

A salience hierarchy?
Although the chapter has thus far documented that a variety of organizational principles are oper-
ational in infants, what has not yet been discussed is whether there is differential salience among
the cues. That is, are there differences in cue salience when multiple cues are concurrently avail-
able in a stimulus display presented to infants? This question derives significance because of the
previously discussed differences in how readily principles such as lightness similarity and form
similarity are deployed, and because of arguments that connectedness may be the most funda-
mental of all the principles (Rock and Palmer 1990).
In an initial experiment that tested the salience of connectedness versus form similarity, 6- to
7-month-olds were habituated to a pattern that could be organized on the basis of both connect-
edness and shape similarity (Hayden et al. 2009). The stimuli contained alternating rows or col-
umns of two different shapes (Xs and Os). The shapes were connected by a black bar in the same
configuration (rows or columns) in which the shapes were organized (see Figure 34.10). Following
habituation, infants were tested with a pair of new stimuli: one in which connectedness was altered
(by breaking the connectedness among the shapes), and the other in which shape organization
was altered (a change from rows to columns or vice versa). The connectedness manipulation was
accomplished by positioning the previously connecting lines higher, rather than using shorter
lines in their original familiarization location, to keep the total amount of contour constant across
the displays. X–O stimuli were used to depict the shape contrast; while one could have used alter-
native displays to depict the shape contrast (e.g. square versus diamond), several different shape
contrasts presented to infants have yielded equivalent grouping results (Quinn and Bhatt 2005b).
If one of the perceptual organizational cues (connectedness versus shape similarity) was more



Fig. 34.10  Examples of the stimuli used by Hayden et al. (2009). Infants were habituated to patterns of
the kind shown in panel (a). These patterns could be organized based on both connectedness cues and
shape similarity cues. The infants were tested with a pattern in which the connectedness was altered
(panel (b)) paired with a pattern in which the shape similarity was altered (panel (c)).
Reproduced from Attention, Perception, & Psychophysics, 71(1), pp. 52–63, Relations between uniform
connectedness, luminance, and shape similarity as perceptual organizational cues in infancy, Angela Hayden,
Ramesh S. Bhatt, and Paul C. Quinn, Copyright (c) 2009, Springer-Verlag. With kind permission from Springer
Science and Business Media.

salient than the other, the change induced by the manipulation of this cue should be more novel
and the infants should look longer at this pattern than at the pattern in which the less salient cue
was altered. The key finding was that infants preferred the pattern displaying the change in con-
nectedness, a result suggesting that connectedness is more salient than shape similarity.
Hayden et al. (2009) next examined the salience relations of connectedness and lightness simi-
larity by repeating their experimental procedure, except that the patterns previously organized
by shape (i.e. X versus O) were now organized by lightness (i.e. dark versus light squares). In
this case, infants preferred to look at the pattern displaying a luminance change to a significantly
greater degree than the pattern displaying a connectedness change, a result suggesting that lumi-
nance similarity was more salient than connectedness. The pattern of results of Hayden et al. (2009) provides evidence for a luminance–connectedness–shape salience hierarchy operating among the organizational cues to which 6- to 7-month-olds have been shown to be sensitive.

Further Evidence on the Flexibility of the Principles


While having a set of organizing principles functioning in the initial months establishes the
coherence of visual patterns, it is also the case that such principles need to work in conjunction
with other cognitive processes such as concept formation. This observation suggests that some
flexibility may be needed in the deployment of the principles given that visual features that are
diagnostic of a category can, in certain instances, be features that would not be selected by Gestalt
organizing principles. Schyns et al. (1998) have therefore argued for a flexible system of perceptual
unit formation, one in which some of the features that come to define objects are extracted during
concept learning. Moreover, concepts possessed by an individual at a specific point in time should
affect subsequent perceptual organization processes.
Quinn and Schyns (2003) undertook a set of experiments to determine whether features that
are specified as coherent by Gestalt principles would be ‘overlooked’ by young infants if alternative
means of perceptual organization are ‘suggested’ by presenting a category of objects in which the
features uniting the objects are ‘non-natural’ in the Gestalt sense. In Experiment 1, 3- to 4-month-
olds were familiarized with a number of complex figures, examples of which are shown in the top
portion of Figure 34.11. Subsequently, during a novelty preference test, the infants were presented
with the pacman shape paired with the circle shown in the bottom portion of Figure 34.11. The
infants preferred the pacman shape, a finding which suggests that they had parsed the circle from
the complex figures via good continuation (Quinn et al. 1997).
In Experiment 2, Quinn and Schyns (2003) (see also Quinn et al. 2006) asked whether an
invariant part abstracted during category learning would interfere with perceptual organization
achieved by good continuation. Experiment 2 consisted of two parts. In Part 1, infants were famil-
iarized with multiple exemplars consistent with category learning, with each exemplar marked by
an invariant pacman shape, and subsequently administered a novelty preference test that paired the
pacman and circle shapes. Examples of the stimuli are shown in Figure 34.12. The pacman shape
was recognized as familiar, as evidenced by a preference for the circle shape. Part 2 of the proce-
dure was then administered and it consisted of a replication of the procedure from Experiment 1.
If the category learning from Part 1 of Experiment 2, in particular the representation of the
invariant pacman shape, could interfere with the Gestalt-based perceptual organization that was
observed in Experiment 1, then the preference for the pacman shape observed in Experiment 1
should no longer be observed in Part 2 of Experiment 2. In fact, if representation of the pacman
shape carried over from Part 1 to Part 2 of Experiment 2, one would expect the opposite result in
which infants continue to prefer the circle in the test phase. The latter result was observed, and it
suggests that perceptual units formed during category learning can interfere with the formation

Fig. 34.11  Examples of the familiarization and test stimuli used in Quinn and Schyns (2003) and
Quinn et al. (2006). If the infants can parse the circle from the familiar patterns in accord with good
continuation, then they should prefer the pacman shape over the circle during the test trials.
Reproduced from What goes up may come down: perceptual process and knowledge access in the organization
of complex visual patterns by young infants, Paul C. Quinn and Philippe G. Schyns, Cognitive Science, 27(6),
pp. 923–35, Copyright © 2003, Cognitive Science Society, Inc.


Fig. 34.12  Examples of the familiarization and test stimuli used in Quinn and Schyns (2003) and Quinn
et al. (2006). If the infants can extract the invariant pacman from the familiar patterns, then they should
prefer the circle shape over the pacman shape during the test trials.
Reproduced from What goes up may come down: perceptual process and knowledge access in the organization
of complex visual patterns by young infants, Paul C. Quinn and Philippe G. Schyns, Cognitive Science, 27(6),
pp. 923–35, Copyright © 2003, Cognitive Science Society, Inc.

of perceptual units organized by good continuation. The bias set by good continuation can thus be
thought of as soft-wired. More generally, an individual’s history of categorization can affect their
subsequent organizational processes.

Conclusions
This chapter has reviewed evidence on the development of perceptual organization, described
against a backdrop of different theoretical views, including those that emphasize innate organiz-
ing principles and others that highlight perceptual learning. The studies clearly show that several
phenomena that have been taken as evidence of perceptual organization in adults, such as con-
figural superiority, global precedence, and subjective contours, can be demonstrated in infants.
The data also suggest that different organizational principles may become functional over different time courses of development, may be governed by different developmental determinants (i.e. maturation versus experience), may have differential salience, and may not all be readily deployed in the manner originally proposed by Gestalt theorists.
The principles were additionally shown to be flexible in their operation in terms of producing
units of processing that would transfer across different displays organized by the same principle
and also across different principles. In this sense, the units produced by the infant’s organizational
processes may be regarded as conceptual-like in their generalizability.
To comment further on the differences among grouping principles, there is evidence for early
functionality of classic organizational principles that include common motion, good continua-
tion, lightness similarity, and proximity, as well as for the modern organizational principles of
common region and connectedness. By contrast, form similarity was shown to develop later
and not be as readily deployed. However, form similarity was shown to be activated when young
infants were provided with multiple element contrasts, thereby suggesting a role for perceptual
learning in its emergence. Form similarity was also activated when pulled along by the already
functional principle of lightness similarity, thus demonstrating a perceptual scaffolding process by
which new organizational principles can be learned.
Overall, the evidence points to a hybrid model to explain the development of perceptual
organization. As contended by the Gestaltists (Wertheimer 1923/1958; Köhler 1929; Koffka 1935;
Metzger 1936/2006), as well as Zuckerman and Rock (1957), a number of grouping principles
are operational in the early months. However, as contended by Hebb (1949) and Brunswik and
Kamiya (1953), other principles may be learned through perceptual experience (Bhatt and Quinn
2011). The data actually lend support to the type of model proposed by Spelke (1982) in which
some start-up principles enable other principles to be bootstrapped onto them.
As we look to the future, there are a number of aspects of the development of perceptual
organization that are likely to be subject to further empirical inquiry. First, there are few stud-
ies of perceptual organization in newborns, with the majority of studies being conducted with
infants aged 3  months or older. Additional work on the functionality of the principles from
birth to 3 months of age has the potential to change our understanding of what competencies
are part of the infant’s initial endowment. Second, given evidence that the development of per-
ceptual organization continues into adolescence (e.g. Kovacs 2000; Kimchi et al. 2005; Hadad
and Kimchi 2006; Scherf et al. 2009; Hadad et al. 2010), we need to know more about how the
perceptual organizing abilities of infants are both continuous and discontinuous with those of
children and young adults.
A third issue centers on the mechanisms by which infants learn perceptual organization. In the
sections on Further Work on Perceptual Grouping . . . and Relations Among the Principles, we
reviewed studies showing that variability exposure and scaffolding based on already functional

organizational principles facilitate the use of new organizational cues in infancy. Moreover, Bhatt
and Quinn (2011) have suggested attentional enhancement and unitization (Goldstone 2003) as
mechanisms that underlie perceptual learning in infancy. By attentional enhancement, we refer
to an increased weighting of global structure in situations that allow infants to be exposed to dif-
ferent element contrasts depicting a common organization. By unitization, we mean the process
by which elements are grouped via adherence to one organizational principle, and continue to be
combined in novel contexts organized by the same principle or even different principles, thereby
functioning as higher-order building blocks. Future research will need to address these and other
proposals (e.g. Johnson 2010) concerning the nature of learning that contributes to the develop-
ment of perceptual organization.
In addition, we know little of the cognitive neuroscience underlying development of perceptual
organization in infants (for an exception see Csibra et al. 2000). What neural correlates underlie
development of the different grouping principles? Also, given recent advances in our abilities to
track the eye movements of infants as they scan visual displays, what is the role of eye movements
in the establishment of perceptual organization? Although eye movements may not play quite the
defining role that was proposed by Hebb (1949), there is evidence of a correlation between visual
scanning and perceptual completion for displays of partly occluded objects (Johnson et al. 2004).
Furthermore, while figure–ground segregation has been an area of investigation in the literature
on adult perceptual organization (e.g. Peterson 1994; Vecera et al. 2002), we know little about pro-
cesses of figure–ground segregation in infants. Finally, it will be interesting to learn how well the
grouping principles described here as being functional for a variety of two-dimensional displays
can scale up to organizing even more complex three-dimensional displays (e.g. Soska and Johnson
2008; Vrins et al. 2011). Continuing investigation on these and the other topic areas reviewed in
this chapter is likely to shed further light on the question of how we come to establish perceptual
organization in the domain of vision.

Acknowledgements
Preparation of this chapter was supported by grant HD-46526 from the National Institute of Child
Health and Human Development. We thank Johan Wagemans and two anonymous reviewers for
their comments. Correspondence should be sent to Paul C. Quinn, Department of Psychology,
University of Delaware, Newark, DE 19716, USA. E-mail: pquinn@udel.edu.

References
Behrmann, M. and R. Kimchi (2003). ‘What does visual agnosia tell us about perceptual organization and
its relationship to object perception?’. J Exp Psychol: Human Percept Perform 29: 19–42.
Ben-Av, M. B. and D. Sagi (1995). ‘Perceptual grouping by similarity and proximity: experimental results
can be predicted by autocorrelations’. Vision Res 35: 853–866.
Bhatt, R. S., A. Hayden, and P. C. Quinn (2007). ‘Perceptual organization based on common region in
infancy’. Infancy 12: 147–168.
Bhatt, R. S. and P. C. Quinn (2011). ‘How does learning impact development in infancy? The case of
perceptual organization’. Infancy 16: 2–38.
Bomba, P. C., P. D. Eimas, E. R. Siqueland, and J. L. Miller (1984). ‘Contextual effects in infant visual
perception’. Perception 13: 369–376.
Brunswik, E. and J. Kamiya (1953). ‘Ecological cue validity of ‘proximity’ and other gestalt factors’. Am
J Psychol 66: 20–32.
Colombo, J., C. A. Laurie, T. A. Martelli, and B. R. Hartig (1984). ‘Stimulus context and infant orientation
discrimination’. J Exp Child Psychol 37: 576–586.
710 Quinn and Bhatt

Csibra, G., G. Davis, M. W. Spratling, and M. H. Johnson (2000). ‘Gamma oscillations and object
processing in the infant brain’. Science 290: 1582–1585.
Elder, J. H. and R. M. Goldberg (2002). ‘Ecological statistics of Gestalt laws from the perceptual
organization of contours’. J Vision 2: 324–353.
Fantz, R. L. (1964). ‘Visual experience in infants: decreased attention to familiar patterns relative to novel
ones’. Science 146: 668–670.
Farran, E. K. (2005). ‘Perceptual grouping in Williams syndrome: evidence for deviant patterns of
performance’. Neuropsychologia 43: 815–822.
Farroni, T., E. Valenza, F. Simion, and C. Umilta (2000). ‘Configural processing at birth: evidence of
perceptual organization’. Perception 29: 355–372.
Frick, J. E., J. Colombo, and J. R. Allen (2000). ‘Temporal sequence of global-local processing in
3-month-old infants’. Infancy 1: 375–386.
Ghim, H. (1990). ‘Evidence for perceptual organization in infants: perception of subjective contours by
young infants’. Infant Behav Dev 13: 221–248.
Ghim, H. R. and P. D. Eimas (1988). ‘Global and local processing in 3- and 4-month-old infants’. Percept
Psychophys 43: 165–171.
Gilchrist, I. D., G. W. Humphreys, M. J. Riddoch, and H. Neumann (1997). ‘Luminance and edge
information in grouping: a study using visual search’. J Exp Psychol: Human Percept Perform 23: 464–480.
Goldstone, R. L. (2003). ‘Learning to perceive while perceiving to learn’. In Perceptual Organization in
Vision: Behavioral and Neural Perspectives, edited by R. Kimchi, M. Behrmann, and C. R. Olson, pp.
223–278 (Mahwah, NJ: Erlbaum).
Hadad, B. and R. Kimchi (2006). ‘Developmental trends in utilizing closure for grouping of shape: Effects
of spatial proximity and collinearity’. Percept Psychophys 68: 1264–1273.
Hadad, B. S., D. Maurer, and T. L. Lewis (2010). ‘The development of contour interpolation’. J Exp Child
Psychol 106: 163–176.
Han, S., G. W. Humphreys, and L. Chen (1999). ‘Uniform connectedness and classical Gestalt principles of
perceptual grouping’. Percept Psychophys 61: 661–674.
Hayden, A., R. S. Bhatt, and P. C. Quinn (2006). ‘Infants’ sensitivity to uniform connectedness as a cue for
perceptual organization’. Psychon Bull Rev 13: 257–271.
Hayden, A., R. S. Bhatt, and P. C. Quinn (2008). ‘Perceptual organization based on illusory regions in
infancy’. Psychon Bull Rev 15: 443–447.
Hayden, A., R. S. Bhatt, and P. C. Quinn (2009). ‘Relations between uniform connectedness, luminance,
and shape similarity as perceptual organizational cues in infancy’. Attention, Percept Psychophys
71: 52–63.
Hebb, D. O. (1949). The Organization of Behavior (New York: Wiley).
Humphreys, G. W. (2003). ‘Binding in vision is a multistage process’. In Perceptual Organization in
Vision: Behavioral and Neural Perspectives, edited by R. Kimchi, M. Behrmann, and C. R. Olson,
pp. 377–399 (Mahwah, NJ: Erlbaum).
Johnson, S. P. (ed.) (2010). Neoconstructivism: the New Science of Cognitive Development (New York: Oxford
University Press).
Johnson, S. P. and R. N. Aslin (1998). ‘Young infants’ perception of illusory contours in dynamic displays’.
Perception 27: 341–353.
Johnson, S. P., J. A. Slemmer, and D. Amso (2004). ‘Where infants look determines how they see: eye
movements and object perception performance in 3-month-olds’. Infancy 6: 185–201.
Kangas, A., N. Zieber, A. Hayden, P. C. Quinn, and R. S. Bhatt (2011). ‘Transfer of associative grouping to
novel perceptual contexts in infancy’. Attention, Percept Psychophys 73: 2657–2667.
Kanizsa, G. (1955). ‘Margini quasi-percettivi in campi con stimolazione omogenea’. Riv Psicologia 49: 7–30.
Kavsek, M. J. (2002). ‘The perception of static subjective contours in infancy’. Child Dev 73: 331–344.
Development of Perceptual Organization in Infancy 711

Kellman, P. J. and E. S. Spelke (1983). ‘Perception of partly occluded objects in infancy’. Cogn Psychol
15: 483–524.
Kimchi, R., B. Hadad, M. Behrmann, and S. Palmer (2005). ‘Microgenesis and ontogenesis of perceptual
organization: evidence from global and local processing of hierarchical patterns’. Psychol Sci
16: 282–290.
Koffka, K. (1935). Principles of Gestalt Psychology (New York: Harcourt, Brace and World).
Köhler, W. (1929). Gestalt Psychology (New York: Horace Liveright).
Kovacs, I. (2000). ‘Human development of perceptual organization’. Vision Res 40: 1301–1310.
Metzger, W. (1936/2006). The Laws of Seeing, translated by L. Spillmann (Cambridge, MA: MIT Press).
Navon, D. (1977). ‘Forest before trees: the precedence of global features in visual perception’. Cogn Psychol
9: 353–383.
Palmer, S. E. (1992). ‘Common region: a new principle of perceptual grouping’. Cogn Psychol 24: 436–447.
Palmer, S. E. and I. Rock (1994). ‘Rethinking perceptual organization: the role of uniform connectedness’.
Psychon Bull Rev 1: 29–55.
Peterson, M. A. (1994). ‘Shape recognition can and does occur before figure-ground organization’. Curr
Direct Psychol Sci 3: 105–111.
Pomerantz, J. R. (1981). ‘Perceptual organization in information processing’. In Perceptual Organization,
edited by M. Kubovy and J. R. Pomerantz, pp. 141–180 (Hillsdale, NJ: Erlbaum).
Pomerantz, J. R., L. C. Sager, and R. J. Stoever (1977). ‘Perception of wholes and of their component
parts: some configural superiority effects’. J Exp Psychol: Human Percept Perform 3: 422–435.
Prinzmetal, W. and W. P. Banks (1977). ‘Good continuation affects visual detection’. Percept Psychophys
21: 389–395.
Quinn, P. C. and R. S. Bhatt (2005a). ‘Good continuation affects discrimination of visual pattern
information in young infants’. Percept Psychophys 67: 1171–1176.
Quinn, P. C. and R. S. Bhatt (2005b). ‘Learning perceptual organization in infancy’. Psychol Sci 16: 511–515.
Quinn, P. C. and R. S. Bhatt (2006). ‘Are some Gestalt principles deployed more readily than others during
early development? The case of lightness versus form similarity’. J Exp Psychol: Human Percept Perform
32: 1221–1230.
Quinn, P. C. and R. S. Bhatt (2009). ‘Transfer and scaffolding of perceptual grouping occurs across
organizing principles in 3- to 7-month-old infants’. Psychol Sci 20: 933–938.
Quinn, P. C. and P. D. Eimas (1986). ‘Pattern–line effects and units of visual processing in infants’. Infant
Behav Dev 9: 57–70.
Quinn, P. C. and P. G. Schyns (2003). ‘What goes up may come down: perceptual process and knowledge
access in the organization of complex visual patterns by young infants’. Cogn Sci 27: 923–935.
Quinn, P. C., S. Burke, and A. Rush (1993). ‘Part–whole perception in early infancy: evidence for
perceptual grouping produced by lightness similarity’. Infant Behav Dev 16: 19–42.
Quinn, P. C., C. R. Brown, and M. L. Streppa (1997). ‘Perceptual organization of complex visual
configurations by young infants’. Infant Behav Dev 20: 35–46.
Quinn, P. C., R. S. Bhatt, D. Brush, A. Grimes, and H. Sharpnack (2002). ‘Development of form similarity
as a Gestalt grouping principle in infancy’. Psychol Sci 13: 320–328.
Quinn, P. C., P. G. Schyns, and R. L. Goldstone (2006). ‘The interplay between perceptual organization and
categorization in the representation of complex visual patterns by young infants’. J Exp Child Psychol
95: 117–127.
Quinn, P. C., R. S. Bhatt, and A. Hayden (2008a). ‘What goes with what? Development of perceptual
grouping in infancy’. In Psychology of Learning and Motivation, Vol. 49, edited by B. H. Ross, pp. 105–
146 (San Diego: Elsevier).
Quinn, P. C., R. S. Bhatt, and A. Hayden (2008b). ‘Young infants readily use proximity to organize visual
pattern information’. Acta Psychol 127: 289–298.
Rock, I. and S. Palmer (1990). ‘The legacy of Gestalt psychology’. Sci Am 263: 84–90.
Salapatek, P. (1975). ‘Pattern perception in early infancy’. In Infant Perception: From Sensation to
Cognition: Vol. 1 Basic Visual Processes, edited by L. B. Cohen and P. Salapatek, pp. 133–248
(New York: Academic Press).
Scherf, K. S., M. Behrmann, R. Kimchi, and B. Luna (2009). ‘Emergence of global shape processing
continues through adolescence’. Child Dev 80: 162–177.
Schyns, P. G., R. L. Goldstone, and J. P. Thibaut (1998). ‘The development of features in object concepts’.
Behav Brain Sci 21: 1–54.
Spelke, E. S. (1982). ‘Perceptual knowledge of objects in infancy’. In Perspectives on Mental Representation,
edited by J. Mehler, M. Garrett, and E. Walker, pp. 409–430 (Hillsdale, NJ: Erlbaum).
Soska, K. C. and S. P. Johnson (2008). ‘Development of three-dimensional object completion in infancy’.
Child Dev 79: 1230–1236.
Vecera, S. P., E. K. Vogel, and G. F. Woodman (2002). ‘Lower region: a new cue for figure–ground
segregation’. J Exp Psychol: Gen 131: 194–205.
Vrins, S., S. Hunnius, and R. van Lier (2011). ‘Volume completion in 4.5-month-old infants’. Acta Psychol
138: 92–99.
Wagemans, J., J. H. Elder, M. Kubovy, S. E. Palmer, M. A. Peterson, M. Singh, and R. von der Heydt
(2012). ‘A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground
organization’. Psychol Bull 138: 1172–1217.
Wertheimer, M. (1923/1958). ‘Principles of perceptual organization’. In Readings in Perception, edited by
D. C. Beardslee and M. Wertheimer, pp. 115–135 (Princeton, NJ: Van Nostrand). Translated from the
German by M. Wertheimer.
Zuckerman, C. B. and I. Rock (1957). ‘A reappraisal of the roles of past experience and innate organizing
processes in visual perception’. Psychol Bull 54: 269–296.
Chapter 35
Individual differences in local and global perceptual organization
Lee de-Wit and Johan Wagemans

Local versus global processing: a binary distinction, worth investigating
Allen Newell (1973) argued that a lot of cognitive science could be characterized as a game of
twenty questions, whereby researchers would identify a potentially interesting phenomenon and
then set about asking dualistic questions about the underlying mechanisms: ‘is this process serial
or parallel’, ‘automatic or controlled’, ‘local or global’? Newell argued that this research agenda
was not the answer to building an effective paradigm for cognitive science. Whilst sensitive to
Newell’s criticism, this chapter will argue that one such dualistic distinction may in fact provide
deep insights into the mechanisms underlying perceptual organization, and potentially cognition
more generally. The integration of local signals into more global wholes in visual perception could
be viewed as a process of abstraction that could be applied to many domains: letters need to be
assembled into words, words to sentences, sentences to stories. To meaningfully interact with
others we make inferences beyond individual acts of behavior to construct abstract notions of the
self and other. Across many domains of processing it is therefore clear that abstracting parts into
wholes is a fundamental part of cognition, one into which the study of the visual integration of local
signals into more global representations could provide important insights.
The need to combine local signals into more global wholes is essentially built into the architec-
ture of the visual system, which begins with an array of receptive fields sampling spatially distinct
parts of the input. Transforming local signals into more global ones is therefore not a side question
in vision, but one of the key challenges of perceptual organization. Indeed this transformation of
local signals into more global ones is a recurrent feature of visual processing as signals are repeat-
edly pooled via neurons with larger receptive fields into new retinotopic maps based on the input
to V1 (Harvey and Dumoulin 2011) at higher stages of the visual system (Arcaro et al. 2009).
This integration of local signals into more global ones is such an integral feature of human
vision that it sometimes results in poor task performance for local details (Gottschaldt 1926;
Scholl et al. 2001). Indeed, humans have been argued to have a general global preference, bias or
precedence (Navon 1977; see also Hochstein and Ahissar 2002). However, this bias towards more
global percepts is not equally evident across individuals. Especially in certain patient populations
one sees an interesting reversal whereby patients perform better for certain tasks that require
the use of local details, and show a reduced sensitivity to certain Gestalt grouping cues (see sec-
tions on schizophrenia and autism). In this chapter we argue that these individual differences
are neither a peripheral question to visual perception nor an arbitrary dualistic distinction with
which psychologists can play twenty questions. Finding common variance between different local-global
tasks could provide important pointers to common mechanisms or principles of perceptual
organization.
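The logic of finding common variance is, at bottom, correlational: if two tasks tap a shared mechanism, individual scores on them should covary. A minimal sketch in Python (the participant scores and their pairing below are invented purely for illustration, not data from any study cited in this chapter):

```python
# Toy illustration with invented data: shared variance between two
# hypothetical local-global task scores, as the squared Pearson correlation.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented scores for 8 participants: search time on an embedded-figures
# task and orientation error on a rod-and-frame task (lower = more 'local').
eft_times = [12.1, 15.3, 9.8, 20.5, 11.0, 18.2, 14.4, 10.6]
raf_errors = [3.2, 4.1, 2.5, 5.9, 2.9, 5.1, 3.8, 2.7]

r = pearson_r(eft_times, raf_errors)
shared_variance = r ** 2  # proportion of variance common to both tasks
print(round(r, 3), round(shared_variance, 3))
```

For these invented numbers the two tasks share almost all of their variance; in real data, how large such cross-task correlations actually are is precisely the construct-validity question raised later in the chapter.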

Visions of individual differences: one man’s noise is another man’s signal
Rutherford is famously quoted as saying that ‘all science is either physics or stamp collecting’.
Insecure of their discipline’s place as a science, many psychologists hold physics up
as an ideal, and seem to be in something of a rush to manipulate independent variables and find
significant differences in dependent ones, whilst ignoring nature’s experiments inherent in the
individual variability across participants. If done rigorously and systematically, however, stamp
collecting individual differences could provide a fundamental tool for advancing science. Darwin’s
observations (stamp collecting) of the variability of beak lengths on different islands are a powerful
illustration of this.
In his presidential address to the American Psychological Association, ‘The two disciplines of
scientific psychology’, Cronbach (1957) argued that the hypothesis testing of experimental psychology
needed to be combined with an interest in measuring and understanding individual differences
(which is typically found in more applied areas of psychology). Perhaps as a reflection of vision
science’s increasing maturity as a field, the integration of experimental work with measures of
individual differences, although not yet mainstream, is becoming an increasingly prevalent fea-
ture of our research.
Indeed, the individual difference is beginning to make a difference in almost all domains of
vision science, from the interpretation of fMRI (Yovel and Kanwisher 2005; Vogel and Awh
2008), EEG (van Leeuwen and Smit 2012) and behavioral results (Wang et al. 2012), to the rela-
tionship between visual abilities and structural differences in cortical volume (Kanai and Rees
2011; Gilaie-Dotan et al. 2012), and in relation to neurotransmitter concentrations (Sumner et al.
2010). The idea that individual differences at a behavioral level can be associated with neuro-
transmitter concentrations is reinforced in a study by van Loon et al. (2013), where they first
established that individual differences in three bi-stable phenomena were correlated with GABA
concentrations in early visual areas, and then followed up this correlation using a pharmacological
intervention, known to influence GABA, which increased perceptual stability across participants.
These examples are intended to make clear that if measuring individual differences is to be
regarded as stamp collecting, then stamp collecting should be regarded as an integral research
tool to develop and test hypotheses in vision science (Wilmer 2008), and in the cognitive neuro-
sciences more generally (e.g., Duncan 2012).

Historical developments: local to global, from Witkin to today
This section is intended to put some key empirical developments on the table before explor-
ing their theoretical underpinnings and their construct validity in later sections. This chapter
will then return to these developments in more detail in the sections on autism and schizo-
phrenia. The study of perceptual organization began in earnest with the Gestalt school (see
Wagemans, this volume). The primary focus of this school of thought was somewhat dif-
ferent to modern vision science: Gestalt psychologists were interested in how parts were
organized into wholes, as distinct figures from their backgrounds (see also Wagemans et al.
2012a), whereas modern vision science studies object recognition in a manner that some-
times implicitly excludes these more basic stages of perceptual organization (Wichmann et al.
2010). This is not to imply that there is a fundamental discrepancy between object recogni-
tion and perceptual organization (see Biederman 1987; Feldman and Hock, this volume).
Rather this is a question of emphasis: whilst modern research often focuses on how objects
are recognized, Gestalt research focused on how visual input could be organized into distinct
objects. In reality these processes are surely intertwined: recognition influences grouping and
grouping influences recognition (Pelli et al. 2009; Peterson 1994).
Gestalt psychologists, inspired more by the experimental science of physics than the stamp col-
lecting of biology, also focused on identifying universal laws in organizing visual input such as
the minimum or simplicity principle (see van der Helm, this volume). Nevertheless, the stimuli
and paradigms developed within the Gestalt school have motivated some of the most significant
developments in the study of individual differences. One of the most important of these was the
Embedded Figures Test (EFT—Gottschaldt 1926; see Figure 35.1).

Fig. 35.1  [Panels (a)–(h) not reproduced.] (a) Embedded Figures Test: the stimulus on the left has to be identified in the embedded
context to the right; (b) Navon Letters: the local and global letters are illustrated in direct conflict;
(c) Ebbinghaus Illusion: the perceived size of the central dot is influenced by its context; (d) Mooney
Figure and Original: a novel two-tone Mooney image is illustrated on the left based on the original to
the right; (e) Ponzo Illusion: the perceived size of the two dots is influenced by their context; (f) Block
Design: subjects have to replicate a simple pattern with 3-D cubes; critical for the local-global literature
is the difference between the standard version on the left and the segmented version on the right; (g)
Collinear Contour: the co-alignment of a string of Gabors creates the impression of a closed shape
(generated using GERT, Demeyer and Machilsen 2012); (h) Proximity Dot Lattice: the slight difference in
spacing between the dots creates the impression of a row of oriented lines.
Reproduced from Lee de-Wit, Stimuli used to study individual differences in local and global perceptual
organization. FigShare. http://dx.doi.org/10.6084/m9.figshare.707082 (c) 2013, The Authors. This work is licensed
under a Creative Commons Attribution 3.0 License.

In the EFT a target element is literally embedded (often exploiting a range of grouping cues, including proximity, closure and good
continuation) in a new more complex pattern, and participants have to find this local element in the
more complex whole. Herman Witkin (1962) used performance on this task to help motivate the
constructs of ‘field independent’ (more local) and ‘field dependent’ (more global) processing styles.
Witkin exploited this task not out of an interest in visual perception per se, but rather because he
regarded it as providing a more objective test of what he argued was a general cognitive style.
One of the important strengths of Witkin’s work was that he not only used participants’ embedded
figures score to measure their degree of perceptual bias. He also showed that performance on the EFT
was highly correlated with performance on the ‘rod-and-frame’ test (Witkin and Asch 1948). In this
test the orientation of a central rod has to be judged while that rod is surrounded by a larger oriented
frame. For some observers the judgment of the orientation of the individual rod is heavily influenced
by the surrounding context. Thus, analogous to the EFT, a judgment about a local part has to be made
whilst trying to ignore the influence of a more global whole. Witkin’s work, and in particular the use
of the EFT provided important groundwork for the study of individual differences in numerous con-
texts, from education (Goodenough 1976) to cultural differences (Witkin and Berry 1975), but prob-
ably most significantly for Uta Frith’s work with autism. Frith (1989) theorized that visual perception
in autism was altered in a manner that meant that parts were less likely to be integrated into coherent
wholes. This theory was motivated by the finding that across a number of tasks, including EFT, people
with autism were actually better at extracting or using local information.
The identification of changes in perceptual organization in autism has paralleled research in
schizophrenia. Already in 1952, Matussek postulated that schizophrenia involved an increased
perceptual disembedding of parts from wholes, which was integrally related to the disruption of
feeling meaningfully embedded as an agent in the world. Over the 1990s and 2000s a wide range
of evidence has accumulated that schizophrenia, and in particular its disorganized symptoms
(Uhlhaas et al. 2006b), is associated with a reduction in the ability to use
a range of Gestalt grouping cues to integrate local signals into more global organized percepts
(Kurylo et al. 2007).
Since Witkin’s work, a wide range of stimuli and tasks have emerged as operationalizations of local
and global processing. A sample of these stimuli and tasks is illustrated in Figure 35.1. As will
become clear, however, these stimuli and tasks can be conceptualized as engaging very different
underlying processes. Some tasks are global in the sense of requiring a comparison of the relation
between two local elements (configural tasks), some tasks involve global judgments that critically
depend on the integration of local elements (Mooney 1957), some illusions test the perception of
a local element when spatially surrounded by contextual elements (Ebbinghaus and Ponzo illu-
sions), some tasks require the detection of a local element when spatially and structurally embed-
ded in a new context (EFT), some tasks attempt to explicitly put local and global responses in
conflict with each other (Navon 1977), other tasks look at the detection of changes in focal objects
in contrast to global scene contexts (Masuda and Nisbett 2006), and so forth. Other tasks do not
require perceptual judgments per se, but involve more complex responses, such that participants
have to draw a complex figure (Complex Figure of Rey) or have to reproduce a global pattern
using individual blocks (Block Design—WAIS). Given this wide range of ‘local global’ tasks it is
perhaps no surprise that the literature in this domain appears somewhat inconsistent, with some
authors reporting clear relationships between different local-global tasks and others finding that
distinct measures seem to be dominated by entirely unrelated sources of individual variance (see
Construct validity: all that varies is not global, below).
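The local-global conflict built into Navon’s hierarchical letters can be made concrete with a toy text rendering: a global letter whose strokes are tiled with copies of a different local letter. The 5 × 5 template below is an invented illustration, not a reconstruction of any published stimulus:

```python
# Toy Navon-style hierarchical letter: a global "H" drawn with local "S"s.
# The 5x5 template is an assumption for illustration only.
GLOBAL_H = [
    "X...X",
    "X...X",
    "XXXXX",
    "X...X",
    "X...X",
]

def navon(template, local_letter):
    """Replace each filled template cell with the local letter."""
    return [
        "".join(local_letter if cell == "X" else " " for cell in row)
        for row in template
    ]

for line in navon(GLOBAL_H, "S"):
    print(line)
```

Read locally, the display is all Ss; read globally, it is an H, and the two levels can be put in direct conflict by asking observers to report one while the other is task-irrelevant.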
A chapter reviewing local and global paradigms could provide a useful service to the field by
developing a taxonomy of local and global paradigms. Indeed one could argue that until such a
clear taxonomy of tasks is defined there is no way to make progress. Often in psychology, however,
if one really wants to study interesting underlying processes, one cannot start from a predefined
concept. The terms local and global obviously have no meaning except with respect to a given
information processing system. From the perspective of perceptual organization, being able to
define the terms local and global in advance would require that we already have a definitive model
or explanation of how visual input is organized. This would be putting the cart before the horse.
The next section will therefore attempt to outline a range of theoretical perspectives that could
at least provide some candidate horses that could be pulling the clusters of correlated individual
differences in local-global tasks. This overview will only use the terms (and tasks pertaining to)
local and global when these have an inherently spatial component. The terms local and global are
sometimes used as synonymous with assumed levels of processing or levels of abstraction. For
example, view-invariant object recognition may be described as a more global task, whereas rec-
ognizing the orientation of an object might be described as a more local task. In this chapter how-
ever, global tasks pertain only to tasks where a percept integrates visual stimuli (local parts) over
space. This integration is likely to be a recurrent feature at many levels of visual processing. For
example, edges may become integrated into a longer line, this line may be integrated as the border
of a rectangle, this rectangle may be integrated as a part (screen) of a larger object (a laptop), and
that object may in turn be integrated into an (office) scene. The potentially recurrent nature of
local to global integration at different spatial scales may indeed overlap with the extraction of
more and more abstract (or higher-level) representations, but this overlap need not be assumed,
and we do not use it here as a defining feature of local-global tasks.

General principles for explaining local and global biases
It is unlikely that all biases in local or global paradigms result from one overarching
framework or principle. It would, however, be equally naive to assume that every bias in the
local-global literature reflects an entirely distinct process or isolated module. Visual perception
in the human brain may turn out to be a ‘bag of tricks’ (Ramachandran 1985), but it would be
unfortunate to have prematurely given up on general principles before exploring their potential.
This section is therefore intended to flesh out a number of general underlying principles that
could help us to understand the factors underlying individual differences in local-global tasks and
the sensitivities to different grouping cues. This section takes an intentionally global view of the
potential contributions to perceptual biases, sometimes blurring local details in order to simplify
the explanation of a given approach. This blurring is not intended to convince the reader of any
one perspective, but rather to enable readers from different domains to understand why each fac-
tor is important to consider as a plausible general constraint.

Are Gestalt grouping principles an internalization of ‘likely’ input statistics?
Gestalt psychologists are often credited only with the cataloguing of grouping cues (proximity,
similarity, collinearity, etc.) without providing a principled underlying explanation (Wagemans
et al. 2012b). This characterization probably says more about the way Gestalt psychology is pre-
sented in modern textbooks than it does about the actual Gestalt tradition. Classically, Gestalt
psychologists contrasted two ways of thinking about how visual input could be organized. One
line of thinking, actually challenged by Gestalt psychologists, held that mental life was dominated
by the formation of associations based on what was probable. This class of explanations can be
captured with the term ‘likelihood’. Thus the grouping of a series of collinear lines - - - - - is based
on an association learnt in the past (in one’s life, or over the course of evolution), namely that
these edges co-occur as part of the same line (see Elder and Goldberg 2002, for evidence that col-
linearity is a ‘likely’ feature of our input). The role of likelihood was contrasted with the notion
of simplicity: here interpretations were not based on likely associations but rather on the inherent
simplicity of different perceptual interpretations (see Translating Gestalt simplicity into intrinsic
anatomical constraints, below).
The Gestalt focus on emergent properties based on the construct of simplicity may have led to
an unfortunate neglect of the possibility that many Gestalt laws could be learnt on the basis of
associations that are likely in the statistics of co-occurring features in the input to the visual sys-
tem. Assume, for example, that perception does come with certain building blocks (for luminance,
color, motion, simple edges) and that Hebbian learning causes these building blocks to become
associated over time, as neurons that fire together wire together. Under these assumptions
many Gestalt principles for integrating local signals could emerge from associations that are ‘likely’
in the input to the visual system. A sensitivity to common fate, for example, could reflect an
internalization of the statistical likelihood that, if one part of a rigid-bodied object is moving, so too are
the other parts of that object. A sensitivity to proximity could emerge based on the fact that two
input signals that are spatially close together are more likely to have similar properties than two
signals that are more distant from each other. Even good continuation could result from a statistical
likelihood that in the input the visual system receives: In many real world scenes any edge is more
likely to continue in the same direction than in a different direction (Elder and Goldberg 2002;
Geisler 2008), an association that a simple process of Hebbian learning could potentially be
sensitive to (for a potential implementation of this, see Prodöhl et al. 2003).
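This learning story can be caricatured in a few lines of code. All probabilities, the learning rate, and the decay term below are invented simplifications: the point is only that a bare Hebbian rule strengthens the connection between two units whose inputs tend to co-occur (as the inputs to two collinear edge detectors might), relative to a pair whose inputs are independent:

```python
# Toy Hebbian simulation: a connection strengthens between units whose
# inputs co-occur, but not between units with independent inputs.
# All numbers here are invented for illustration.
import random

random.seed(0)

def hebbian_weight(p_joint, p_a, p_b, steps=5000, lr=0.01):
    """Grow a weight with a simple Hebbian rule under given input statistics."""
    w = 0.0
    for _ in range(steps):
        if random.random() < p_joint:   # both inputs fire together
            a, b = 1, 1
        else:                           # inputs fire independently
            a = 1 if random.random() < p_a else 0
            b = 1 if random.random() < p_b else 0
        w += lr * a * b                 # "fire together, wire together"
        w *= 0.999                      # mild decay keeps the weight bounded
    return w

# Collinear pair: inputs often co-occur. Unrelated pair: they rarely do.
w_collinear = hebbian_weight(p_joint=0.3, p_a=0.1, p_b=0.1)
w_unrelated = hebbian_weight(p_joint=0.0, p_a=0.1, p_b=0.1)
print(w_collinear > w_unrelated)
```

Under this toy scheme the weight between the frequently co-active pair grows roughly in proportion to the co-occurrence rate, which is the sense in which the statistics of the input could come to be internalized as a grouping bias.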
How could individual differences emerge from this sensitivity to likely associations in the envi-
ronmental input? The primate (and human) visual system appears to be highly flexible in learning
contingencies in visual input (Cox et al. 2005; Li and DiCarlo 2010). Indeed algorithms based on
extracting contingencies in visual input that remain more stable over time result in representa-
tions that are not only useful for object recognition, but which also closely resemble the recep-
tive field sensitivities of early visual areas (Berkes and Wiskott 2005). There are therefore good
reasons to think that the nature of representations in the visual system could be shaped by one’s
experience: Given that individuals live in different environments (see Global priors and/or global
predictions, below), and may have different eye movement strategies to sample input from those
environments (particularly in patient populations), this provides a plausible cause for individual
differences. Critically here, whilst many low-level statistical properties may be equivalent across
different image contexts, the kinds of associations that might shape the mid-level vision processes
important for local or global biases are likely to differ. Collinearity is a good case in point, since it
seems logical that urban environments contain more collinearity (straight lines) than rural ones
(though this requires quantifying). If one’s sensitivity to collinearity is shaped by one’s input, then
one might expect that inhabitants of urban environments would show more global (integrated)
percepts (see Caparos et al. 2012 and Personality, mood, and culture, below), particularly in tasks
where collinearity is an important integration cue, like the Embedded Figures Test.
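The ‘stable over time’ criterion behind such algorithms can be caricatured with a slowness measure in the spirit of Berkes and Wiskott (2005); the measure and the toy signal below are deliberate simplifications for illustration. In a sequence where object identity changes rarely but retinal position changes every frame, the slower-varying feature is the one a stability-seeking learner would favor:

```python
# Toy slowness comparison: which feature of a simulated input sequence
# varies least from frame to frame? The sequence itself is invented.
def slowness(signal):
    """Mean squared frame-to-frame difference; lower = slower = more stable."""
    diffs = [(b - a) ** 2 for a, b in zip(signal, signal[1:])]
    return sum(diffs) / len(diffs)

frames = 200
# Identity flips only every 50 frames; position jitters on every frame.
identity = [(t // 50) % 2 for t in range(frames)]
position = [t % 7 for t in range(frames)]  # rapid, repetitive displacement

print(slowness(identity) < slowness(position))
```

A learner that preferentially encodes slowly varying features would thus come to represent identity-like properties, despite never being told which feature matters.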

Translating Gestalt simplicity into intrinsic anatomical constraints
Gestalt psychologists developed the formulation that the whole is different from the sum of its
parts as an explicit challenge to the empiricist notion that knowledge (or perceptual represen-
tations in this case) could reflect the building of associations between more primitive building
blocks (elementary sensations in this case) based on experience. Gestalt psychologists argued that
different perceptual interpretations were selected, not because they were probable, but because
those interpretations were inherently simpler. Defining what exactly makes a given perceptual
interpretation more ‘simple’ than another is by no means trivial (see van der Helm, this volume).
One way of thinking about this is in terms of the description length of a given perceptual inter-
pretation in a coding language (Chater 1996). Some Gestalt psychologists attempted to explain
simplicity in biophysical terms: They thought that visual stimulation generated electrical fields in
the brain, and that these electrical fields could more easily settle into certain formations based on
a minimization of energy, which determined the perceptual experience of the observer. The exact
biophysical implementation in terms of electrical fields is no longer tenable per se, but it should
inspire us to think about the ways in which intrinsic biophysical constraints could influence per-
ception (see Zucker, this volume).
A useful case in point is the heuristic rule to group input on the basis of proximity: The visual
system may well be organized into retinotopic maps, such that neighboring input will lead to
activation in neighboring neurons, but these neurons are physically separate entities (albeit con-
nected by synapses), and there is no a priori reason to assume that two neurons that are physi-
cally close to each other in the brain are any more likely to communicate or combine input than
two distant neurons. If however one adds some additional constraints, such that neurons that
are closer physically on a retinotopic map in the cortex share more connections, and that later-
ally communicated signals are delayed by the slower conduction rates of non-myelinated neu-
rons, then there are plausible (though not necessarily correct) reasons to think that these intrinsic
architectural constraints could shape how perceptual input is organized such that proximity
becomes a strong grouping cue (though see above for the alternative idea that the strength of local
connectivity could be learnt based on associations in the input). The possibility that such intrinsic
constraints could have a direct impact on local and global biases in perception is borne out by a
study by Schwarzkopf et al. (2011), who demonstrate that sensitivity to a number of contextual
size illusions is correlated with the functionally defined surface area of the primary visual cortex
of each individual. An intrinsic architectural constraint may therefore have a very direct influence
on how visual signals are integrated, and, thus provide a source of variance that could be common
to a number of local and global paradigms. If this correlation is not caused by a common third
process, then we need to further identify how cortical size can be related to perceptual biases. For
example, a smaller V1 could be associated with a greater strength of lateral interactions, which
could in turn follow from the constraint that neural signals take longer to conduct over larger
areas of cortical tissue. In addition, cortical size could also influence the scale over which signals
at one level are pooled to drive signals at subsequent stages, an idea that could be tested by look-
ing at topographic relations between visual field maps (see Heinzle et al. 2011 and Harvey and
Dumoulin 2011).
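The intrinsic-constraint story above — neighboring neurons on a retinotopic map sharing more lateral connections — can be reduced to a toy sketch of how proximity grouping might fall out of distance-dependent connectivity. The exponential fall-off and its length constant below are illustrative assumptions, not claims about actual cortical wiring:

```python
import math

def grouping_strength(pos_a, pos_b, length_constant=1.0):
    """Toy model: lateral connection strength decays exponentially
    with distance on a retinotopic map, so nearby elements are
    more strongly coupled than distant ones."""
    distance = math.dist(pos_a, pos_b)
    return math.exp(-distance / length_constant)

# Three elements on a retinotopic map (arbitrary units).
a, b, c = (0.0, 0.0), (0.5, 0.0), (3.0, 0.0)

# The near pair couples far more strongly than the far pair, so
# a and b would tend to group together rather than b and c.
print(grouping_strength(a, b))  # strong coupling
print(grouping_strength(b, c))  # weak coupling
```

On this toy view, the a–b pair wins the grouping competition simply because lateral coupling decays with cortical distance; the learning-based alternative discussed above would instead make the length constant itself a function of input statistics.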

Intrinsic constraints in establishing cortical oscillations for integrating signals
Whilst cortical size and connectivity patterns may be critical intrinsic constraints, it is important
to consider that the multiplicity of potential ways in which cortical signals will need to be com-
bined to group perceptual input will require flexible mechanisms that are not fixed in the anat-
omy of the visual system. A potential candidate here is the formation of cortical rhythms (see
van Leeuwen, this volume) that enable a greater integration of spatially separated signals on the
cortex by synchronizing their firing patterns (see also Schwarzkopf et al. 2012, for evidence that
720 de-Wit and Wagemans

cortical size may influence such larger scale cortical dynamics). There are a number of sources
of evidence that these larger scale cortical rhythms are associated with more global object per-
ception (Tallon-Baudry and Bertrand 1999) and individual differences in perceptual grouping
in particular (Nikolaev et al. 2010). Indeed, changes in the formation of more long-range corti-
cal oscillations have also been directly linked to changes in grouping sensitivity in schizophre-
nia (Spencer et al. 2003; Uhlhaas et al. 2006a). In autism, there is also evidence for changes in
functional connectivity (Barttfeld et al. 2011), both purely at a neural level when perceiving the Kanizsa illusion (Brown et al. 2005) and in relation to behavioral performance in the detection of Mooney figures (Sun et al. 2012).
There is also causal evidence that the entrainment of cortical rhythms, either via visual stimula-
tion (Elliot and Muller 1998) or TMS (Romei et al. 2011) can directly influence perceptual organi-
zation. Indeed, Romei et al. used the Navon task to show that the entrainment of slower rhythms
(5 Hz) caused more global biases, whilst faster rhythms (20 Hz) induced more local biases. It is
tempting to speculate that slower rhythms facilitate global integration precisely because global percepts require the integration of signals separated by larger distances on cortical maps, and thus require longer time windows (favoring slower rhythms) to achieve integration.
As a side-point to debates concerning how to describe simplicity, the approach outlined above
focuses on the relative constraints imposed by the biophysical implementation of integrative
information processing. This approach contrasts with the focus on interpreting the Gestalt energy
minimization principle in terms of the length or complexity of description of different visual
interpretations (Chater 1996; see also van der Helm, this volume). We would argue that a modern
revision of the Gestalt principle of simplicity may prove more valuable in understanding percep-
tual organization when framed in terms of the Relative-Simplicity of the biological constraints on
integrated signal processing rather than the Strong-Simplicity implied by (biologically implausi-
ble) coding languages. Indeed, an inherent feature of the Strong-Simplicity approach is that all
coding languages have a common description length (Chater 1996), leaving no immediate scope
to explain individual differences.

Flexible read-out from a cortical hierarchy


As already intimated above, it is often assumed that local signals (edges) are represented in
early stages of the visual system such as V1 (e.g., Hubel and Wiesel 1959) and that more global
interpretations (segmented surfaces and shapes) are represented at higher stages such as the
lateral occipital complex or LOC (e.g., Kourtzi and Kanwisher 2001). The possibility that per-
ceptual organization occurs in stages is not an obvious feature of our phenomenology, how-
ever, which arguably contains only one clear interpretation at any one time (see van Leeuwen,
Chapter 48, this volume). A potential reconciliation here is that our conscious perception is
determined by focusing only on representations at a given level of processing (cf. Reverse
Hierarchy Theory by Hochstein and Ahissar 2002). This logic is potentially consistent with
the finding that switches between more local and global interpretations of bi-stable stimuli are
associated with increases in activity at lower and higher stages of the visual system, respec-
tively (Fang et al. 2008; de-Wit et al. 2012).
If more local and more global interpretations can be mapped onto different stages of the visual
hierarchy, and there is some flexibility across individuals regarding where in the system informa-
tion is ‘read-out’, then this could also lead to consistent sources of variability across individuals.
There are many potential ways this read out could be envisaged. It could be that people have a
bias towards reading out from higher or lower stages of processing. Alternatively, people could differ
in their flexibility, with some being unable to switch to the most appropriate level for a given task.
Finally, it could be that the ability to read out from early areas versus the ability to read out from
higher areas are independent, such that an individual may be good at accessing information from
early stages, but that is not predictive of whether they are good at computing or accessing informa-
tion from higher stages. This may seem highly speculative, but it actually has important implications
in relation to a debate within the autism literature that enhanced local processing may exist without a
reduction in global perception (central coherence) per se (Mottron et al. 2006). Mottron et al. partly
motivated the idea that people with autism have an enhancement in local processing via demonstra-
tions of greater fMRI activity in sensory processing areas. There is however substantial evidence that
activation in early areas is dependent upon the interpretations formed in higher areas of the brain
(Muckli 2010). Of particular importance here are demonstrations that perceptually organizing input
into a global shape in higher areas can cause a reduction of activation in earlier areas (de-Wit et al.
2012; Fang et al. 2008; Murray et al. 2002). At the level of fMRI therefore it is sometimes not possible
to study representations at one stage of the system independent of how those representations inter-
act with higher stages of the system. Indeed, this observation in fMRI is complemented by numerous
behavioral demonstrations of a direct interaction, such that global interpretations directly influence
the accessibility of local information (Chakravarthi and Pelli 2011; He et al. 2012; Poljac et al. 2012;
Sayim et al. 2010). Thus, returning to our suggestion that the ‘reading out’ of information at different stages of the cortical hierarchy may provide a useful framework for thinking about how a local or global perceptual bias could arise: this framework also needs to take into account the dynamic interactions between levels of the hierarchy, which will sometimes mean that the accessibility of local and global interpretations is interdependent.

Integration and the scale of attention


Although there are instances in which the integration of visual signals into more global interpreta-
tions can occur in the absence of visual attention (Driver and Mattingley 1998), it is also undoubtedly
the case that typical visual processing is dominated by a close interaction between perceptual group-
ing and the allocation of attention (Driver et al. 2001, see also Gillebert and Humphreys, this volume).
The influence of visual attention upon visual processing is often characterized as a kind of flexible
spotlight that can spatially focus, and zoom in on important aspects of the visual field. It is plausible
to imagine that the scale of attentional focus could play a direct role in influencing the extent to which
signals are integrated with their surrounding contexts when those contexts are attended. Recent sup-
port for this idea has come from a TMS paper which highlights that disrupting attentional processing
in the parietal lobe can influence whether participants perceive a bi-stable stimulus in a local or global
configuration (Zaretskaya et al. 2013). Indeed, Robertson et al. (2013) have reported that people with autism appear to have a kind of tunnel vision in their focus of attention, showing a much sharper spatial gradient in the allocation of attention. Interestingly, the degree of sharpening in this study also correlated with autistic traits in a non-clinical sample.
However, there are also other ways in which a change in attentional selection might manifest.
For example, it could be that some individuals can focus more easily on task relevant features.
Such variability in the selectivity of attention is suggested by results with the Navon task, in which individuals from one culture can show either an enhanced local or an enhanced global report (relative to observers from another culture), depending on what they are asked to focus on (Caparos et al. 2013). Caparos et al. actually use this observation to argue that variability in the
Navon task is more associated with selective attention, and that this could be regarded as distinct
from a bias in perception per se (see Construct validity: all that varies is not global, below).
Personality, mood, and culture


There are numerous papers that account for a local or global bias in terms of a more general per-
sonality trait. These include a cultural bias to focus on contextual relations (in the East), versus
focusing on localized objects (in the West, see Nisbett and Miyamoto 2005), differences in mood
(Gasper and Clore 2002), regulatory focus (Förster and Higgins 2005), and of course Witkin’s
(1962) formulation of field-(in)dependent processors. As noted by Nisbett and Miyamoto, these
differences (specifically referring to those across cultures) have not yet been taken up with great
interest by vision scientists. This is potentially unfortunate given that these links to personality
and culture in the healthy population may provide insights into the broader changes in patient
groups (see sections on schizophrenia and autism).
The forms of explanation given for these personality and cultural differences are however quite
far from those normally considered within vision science. Indeed, many vision scientists would
probably think of these findings as phenomena to be explained, rather than offering explanations
in their own right. In terms of the potential factors already outlined, it could well be that many
of the effects related to mood or personality can be explained in terms of a change to the scale of
attentional focus.
Another line of explanation is suggested by the results of Caparos et al. (2012) who found evi-
dence that cultural differences may in fact be related to exposure to different environments rather
than social or personality differences per se. For example, they found that members of the same
culture show clear differences depending on their degree of exposure to urban environments. It may be that these different contexts promote the learning of different regularities (Are Gestalt grouping principles an internalization of ‘likely’ input statistics?, above) or that they promote a different scale of attentional focus (Integration and the scale of attention, above). Indeed, other studies on cultural differences have also found differences in the way in which urban environments are constructed in different cultures. Furthermore, such culturally specific input seems to induce different perceptual styles, even in individuals from other cultures (Masuda and Nisbett 2006). Such a flexible induction of different styles when exposed to stimuli
from different cultures would surely require a flexible mechanism, unlikely to be accounted for by
neuroanatomical factors (considered above).

Global priors and/or global predictions


Some authors have attempted to explain dynamic interactions within the cortical hierarchy in
terms of the implementation of a form of predictive coding, whereby higher levels feed predic-
tions back to lower areas in order to compare bottom-up input with top-down predictions or
expectations (Rao and Ballard 1999). These predictive coding models predict a reduction in the
representational salience of local signals that can be ‘explained’ (i.e., predicted) by global interpretations via feedback mechanisms. If predictive coding is implemented across many domains, this
could provide a common principle with which to explain many perceptual biases. Indeed, we have
argued that the biases in autism (in perception and other domains) could potentially be explained
in terms of predictions that are over-fitted to sensory input (Van de Cruys et al. 2013).
Predictive coding can be considered as an implementation of a much broader class of hier-
archical Bayesian inference models (Friston 2008). In more general Bayesian terms, perceptual
interpretations are determined not only on the basis of sensory input, but rather sensory input
is weighted against prior expectations. This Bayesian framework has been used in the context
of autism, to argue that perceptual differences could be explained in terms of weaker priors for
interpreting sensory input (Pellicano and Burr 2012). However, whilst Bayesian frameworks are
useful for explicitly implementing approaches to perception, they do not provide any inherent
account for where the priors that bias the interpretation of sensory input actually come from (see
also Feldman, this volume). Thus, a Bayesian approach to explaining a global bias (or local in
autism) may ultimately have to operationalize changes to perceptual priors in terms of one of the
other factors outlined above.
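In the simplest Gaussian case, the weighting of sensory input against a prior reduces to precision-weighted averaging. The sketch below is a textbook reduction (not a model from the work cited here) of how a weaker, higher-variance prior of the kind Pellicano and Burr propose leaves the posterior percept closer to the raw sensory evidence:

```python
def posterior_mean(prior_mean, prior_var, sensory_mean, sensory_var):
    """Combine a Gaussian prior and a Gaussian likelihood:
    the posterior mean is a precision-weighted average of the two."""
    prior_precision = 1.0 / prior_var
    sensory_precision = 1.0 / sensory_var
    return ((prior_mean * prior_precision + sensory_mean * sensory_precision)
            / (prior_precision + sensory_precision))

# Sensory evidence says 1.0; the prior expects 0.0.
typical = posterior_mean(0.0, prior_var=1.0, sensory_mean=1.0, sensory_var=1.0)
weak_prior = posterior_mean(0.0, prior_var=10.0, sensory_mean=1.0, sensory_var=1.0)

print(typical)     # 0.5  -- prior and input weighted equally
print(weak_prior)  # ~0.91 -- a weaker prior leaves the percept closer to the input
```

The sketch also makes the chapter’s point concrete: nothing in the arithmetic says where the prior’s mean and variance come from, so a Bayesian account of a local bias must still be cashed out in terms of one of the other factors outlined above.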

Conclusions: As many theories as there are phenomena


The intention of this broad (but still limited) overview was to flesh out a range of important levels
of explanation for individual differences on local global tasks, without trying to select a preferred
theory per se, because (a) multiple mechanisms may be involved, (b) the data do not enable one to
pin one’s colors to any one theory, and (c) this overview is intended to provide a counterbalance
to much of the literature, in which individual papers often consider only a narrow range of poten-
tial explanations. Despite this, there are still further important factors that could be discussed
(including spatial frequency and hemispheric differences). Also, because of space constraints we
have chosen not to discuss the potential role of local or global biases in face perception, although
this is a very important issue, particularly with regards to the debate regarding developmental
prosopagnosia (Behrmann et al. 2005; Busigny and Rossion 2011; see also Behrmann et al.,
this volume). In addition, we have not gone into detail regarding the induction of individual differences via pharmacological interventions (Wagemans et al. 1998), which could also prove important in understanding potential underlying differences in neurotransmitter concentrations
associated with individual differences, particularly in schizophrenia (Uhlhaas et al. 2007).

Construct validity: all that varies is not global


The dangerous face validity of local and global tasks
When learning a foreign language, one often comes across ‘false-friends’, words that sound so
intuitively like something familiar that their meaning is assumed, when (actuellement) they mean
something very different. Tasks assumed to be good measures of variance in local versus global
processing suffer from a similarly dangerous face validity. The Navon task is a prime example
in this context: this task was designed to demonstrate a general bias towards global processing
(see also Kimchi, this volume). This does not mean, however, that the primary source of indi-
vidual variability on the task will always be related to the visual integration of local signals into
global ones. There are numerous other processes at play when performing this task, including the resolution of response conflict, the maintenance of the current task goal, and the allocation of visual selection mechanisms (Caparos et al. 2013; though see Personality, mood, and culture, above). Individual variance on this task is likely to be a mixture of all of these factors.

Critical mechanisms versus sources of variance in tasks assessing individual differences
This reflection on the sources of variance in the Navon task brings us to an important consideration
regarding a distinction needed when interpreting individual variability, namely, between mecha-
nisms that are critical to a given task, and mechanisms that cause the most variance in a given task.
It is often tempting to assume that if a mechanism is known to be important for a given task, it will
also be the primary source of variance for that task. The danger of this assumption is brought to
light in a recent study by Goodbourn et al. (2012), in which they measured the shared variance in
three tasks known to require the functioning of magnocellular neurons. These tasks have been used
to motivate the idea that dyslexia (and autism) is associated with a general magnocellular deficit. Contrary to expectations, however, Goodbourn et al. found no shared variance between these three measures, despite demonstrating a wide range of variance in their sample that was stable over successive testing sessions. Thus, whilst magnocellular neurons may be critically needed in order to perform the three tasks in question, this does not mean that variance in this neuron type provides
a primary (common) source of variance on these tasks. In many ways the Goodbourn et al. study
sets a benchmark standard for what research on individual differences in visual perception should
look like. Firstly, they tested a very large sample (over a thousand participants), and demonstrated
that levels of correlation do not differ for participants with different levels of performance, nor do
the correlations (in this instance) differ for a subsample of participants with a diagnosis of dyslexia. They also included a control task (thought to measure a different function) to demonstrate that correlations with this task are as high as those between the other (‘magnocellular’) tasks.
Last but most definitely not least, they established the test-retest reliability of all of their measures
with a subsample of their participants on a different day, giving a baseline for correlations that can
be expected between tasks based on the consistency of individual differences within each task.
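One concrete way such test-retest reliabilities set a baseline is the classical psychometric correction for attenuation, which bounds the cross-task correlation one can expect from two imperfectly reliable measures. A minimal sketch (the formula is standard; the numbers are invented for illustration):

```python
import math

def disattenuated_r(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation: estimate the correlation
    between the underlying constructs, given each task's
    test-retest reliability."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# An observed cross-task correlation of .30 between tasks with
# test-retest reliabilities of .70 and .80 implies a substantially
# higher latent correlation (~.40) between the constructs.
print(disattenuated_r(0.30, 0.70, 0.80))
```

The same arithmetic run in reverse shows why unreliable tasks can mask a genuinely shared mechanism: even a perfect latent correlation could not produce an observed correlation above the geometric mean of the two reliabilities.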
Returning to the critical issue of distinguishing between mechanisms that are critical for a given
task and mechanisms that are a primary source of variance for that task, it is important that future
research focuses not just on individual differences on one task, but focuses on the correlations
across tasks. If one assumes that variance on a given task relates to an underlying process, then it
is important to demonstrate that this task correlates with variance on another task (and the more
dissimilar the better) which is assumed to be dependent upon the same underlying mechanism.
This focus on a common factor underlying variance in multiple tasks would bring us back to the
original formulation of consistent individual biases identified in the correlation between the rod
and frame task and the EFT used by Witkin.
The subsequent literature on local and global biases since Witkin has also revealed some striking
correlations between very different tasks. For example, the difference score in the Block Design
task (specifically between the locally segmented and standard versions) has been found to corre-
late with a number of more basic perceptual tasks, even though the Block Design task requires a
very complicated attentional/saccadic sampling and motor reconstruction process. Indeed, there
is evidence of correlated biases in the ability to respond to local or global properties across differ-
ent modalities (Bouvet et al. 2011). Furthermore, there have been several replications of (at least)
a cluster of correlated tasks in the general population (Grinter et al. 2009; Milne and Szczerbinski
2009) and in patient groups (Bolte et al. 2007; Uhlhaas et al. 2006b). At the same time however,
it is also clear that many tasks operationalized as local or global measures do not share a primary source of common variance (Milne and Szczerbinski 2009).

How much common variance is there between different local and global paradigms?
Milne and Szczerbinski (2009) have provided a great service to the field by testing whether the individual variability on a wide range of tasks assumed to measure a local or global bias actually loads on a common factor. They found a cluster of correlated tasks that loaded on a factor closely related to Witkin’s original work, but also found that a large range of tasks had very little, if any, loading on this factor (including the Navon task). Just as a correlation
can have many underlying causes, so too can its absence. It could be that the many tasks opera-
tionalized as measures of a local or global bias simply do not depend upon a common process
involved in local to global integration. This is also a methodological concern in repetitive tasks,
where a discrimination that could at face value appear to require a global analysis (based on the
integration of multiple spatially separated local signals) can sometimes be solved by picking up on
one local cue. An alternative concern is that whilst the integration of local signals into global ones
is critical to these tasks, it is possible that this aspect of the task is not the most prominent factor
in generating individual variance on these tasks (as already outlined above). The impact of this
problem is likely to be compounded by the fact that tests proposed as measures of local or global bias have very different task demands and very different output measures (e.g., the drawing of the Complex Figure of Rey). This problem is also interrelated with the fact that there may not be sufficient variability in the selected population for variance in a mechanism of interest to manifest as a clear factor dominating the individual differences (especially if one only recruits Western undergraduates with a psychology major). The validity of this concern is potentially borne out by the
fact that correlations between local-global paradigms are often higher in patient groups—who
presumably have more variance on the continuum of interest (see sections on schizophrenia and
autism). A final possibility however, is that the integration of local signals into global representa-
tions is simply implemented differently for different stimuli and different task demands.
Differentiating between the idea that the brain is just a ‘bag of tricks’ and the idea that common mechanisms are involved, but become hard to identify because individual variability is dominated by other factors for a given task, will be a major challenge for research which aims to use individual differences as a means of unearthing underlying mechanisms. This problem can be illustrated with two studies already discussed, in which a common mechanism may seem to be implied, but a correlation is not found. First, in the study reported earlier, Schwarzkopf et al. (2011) found that the size of the primary visual cortex influences the strength of two contextual illusions, but that the strengths of these illusions did not correlate with each other. In a
similarly intriguing example, Caparos et al. (2012) have found that although exposure to an urban
environment influences bias on the Navon task and sensitivity to a contextual illusion, perfor-
mance on these tasks did not correlate. However, interpreting these ‘null’ effects is limited by our
current focus on null hypothesis testing, which only enables one to report if the null hypothesis
can be rejected. In other words, although they highlight that there is an absence of evidence for a cor-
relation, they do not actually provide evidence against the existence of a correlation. Hopefully a
greater emphasis on the power needed to find effects (Button et al. 2013) and an increasing adop-
tion of Bayesian statistical techniques will enable studies to more meaningfully quantify support
for, and against, the existence of a correlation.
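As an illustration of what such Bayesian quantification can look like, a coarse BIC-based approximation to the Bayes factor for a Pearson correlation can be computed from nothing more than r and n. This is our own illustrative sketch under a unit-information prior, not the specific technique the cited authors advocate; a full analysis would place a dedicated prior on the correlation:

```python
import math

def bf01_correlation(r, n):
    """Approximate Bayes factor in favour of the null (r = 0), via the
    BIC approximation: the alternative model adds one slope parameter
    and scales residual variance by (1 - r^2).
    Values > 1 favour the null; values < 1 favour a correlation."""
    delta_bic = n * math.log(1.0 - r ** 2) + math.log(n)
    return math.exp(delta_bic / 2.0)

# A clear correlation in a modest sample: strong evidence against the null.
print(bf01_correlation(0.5, 50))   # << 1, i.e. evidence for a correlation

# A near-zero correlation in the same sample yields evidence FOR the null,
# which null-hypothesis significance testing alone could never provide.
print(bf01_correlation(0.05, 50))  # > 1, i.e. evidence for the null
```

Note that the second call returns a Bayes factor above 1: unlike a non-significant p-value, this is positive (if modest) evidence for the absence of a correlation.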
Clearly, there is more work to be done to establish when different local-global paradigms do and
do not correlate at an individual level. This will require larger scale studies that simultaneously
test many paradigms, and use statistical techniques that accumulate evidence both for and against
the existence of correlations. Ideally, these studies also need to test broad participant samples,
because, as discussed in the next section, it is often within patient samples that one sees clearer
correlations between tests.

Empirical differences in different populations


Schizophrenia
There is substantial evidence that the disorganized symptom types of schizophrenia are associ-
ated with changes to perceptual organization (see Silverstein and Keane 2011, for a review). These
changes to perceptual organization manifest as a reduced sensitivity to a wide range of group-
ing cues, including proximity, similarity (Kurylo et al. 2007), collinearity (Silverstein et al. 2000;
Must et al. 2004; Kéri et al. 2005), and common fate (Chen et al. 2003). The reduced sensitivity to
common fate is evidenced via the increase in the number of coherent dots required to recognize
motion in one direction. As will become apparent, this deficit in global motion provides a very
direct parallel to that revealed in autism. Somewhat surprisingly however, direct theoretical or
empirical comparisons between the perceptual organizational differences in autism and schizo-
phrenia are rare, but direct comparisons including both patient groups have found highly compa-
rable changes (Bolte et al. 2007).
One of the reasons for the lack of direct comparisons between autism and schizophrenia may
result from a very explicit attempt by some researchers in schizophrenia to avoid some of the more
clinical or indirect measures of perceptual organization popular in autism research that poten-
tially include too many contributing factors (Kurylo et al. 2007). Interestingly, in their review
Silverstein and Keane (2011) explicitly exclude any discussion of what they call ‘global-local’ tasks
(one assumes they mean the Navon task), because they argue most of the variance induced in
these tasks is caused by attentional processes (which may indeed be valid; see Caparos et al. 2013).
Despite the emphasis on tests that look more directly at the sensitivity to different Gestalt
grouping principles, there are also interesting results in schizophrenia using slightly less con-
strained tests of perceptual organization. Johnson et al. (2005) for example report a clear local bias
in a version of the Navon task, in which they match the salience of the local and global targets such
that there is no ‘global precedence’ for control participants. Perhaps more interestingly, Uhlhaas
et al. (2006b) measure in parallel the ability of patients to detect a contour, group Gabor elements,
recognize a Mooney figure and assess size in contextual illusions. Uhlhaas et al. find a very con-
sistent change in perceptual organization across these very different tasks (towards what could be
described as a local bias). They also make clear that this change is more closely associated with
‘disorganized’ symptoms, although a differential sensitivity to different contextual effects may also
be evident for other symptoms (Yang et al. 2012). Uhlhaas et al. also highlight that performance
in all three of their tasks develops with age, something they raise to motivate a speculation that
the development of the ability to form long-range cortical synchronizations may be critical to all
of these tasks. The importance of long-range cortical synchronization in perceptual organization
in schizophrenia is also supported by work looking at Kanizsa figures (Spencer et al. 2003) and
Mooney figures (Uhlhaas et al. 2006a).
As in autism (see below) there is also some debate regarding whether the perceptual changes
are causal to the broader clinical syndrome. There are certainly interesting parallels between the reductions in perceptual organization and the less organized world views of patients with schizophrenia (Uhlhaas and Mishara 2007), and there are reported correlations between perceptual
thresholds for form and motion coherence and deficits in Theory of Mind (Kelemen et al. 2005),
although this was studied with respect to the negative symptoms of schizophrenia. Also of signifi-
cant interest for this chapter, while correlations between different perceptual tasks appear to be
higher amongst Schizophrenics (Uhlhaas et al. 2006b), this does not appear to imply a fundamen-
tally altered mode of perceptual organization. In contrast, there is evidence that the continuum
of symptoms associated with certain aspects of schizophrenia also correlate with impairments
in contour integration and a reduced sensitivity to context illusions for non-clinical participants
scoring high on both the Schizotypy and the Thought Disorder Index (Uhlhaas et al. 2004).

Autism
Numerous reviews on the perceptual abilities of people with autism have concluded that there
are differences, but how consistent these differences are, and how they should be characterized or
Individual Differences in Local and Global Perceptual Organization 727

explained is not yet clear (Dakin and Frith 2005). This section will not attempt to provide an
additional review per se; rather, it will selectively focus on issues that might help to resolve
some of these inconsistencies, or that are of general interest to other questions regarding cultural and
developmental differences in perceptual organization.
Frith (1989) initially launched the focus on local and global differences in autism with findings from
the Block Design test and the EFT. Whilst interesting in themselves, the exact conclusions that one
can draw from these findings regarding perceptual organization are complicated by the multiple
processes undoubtedly recruited in solving these tasks. Organizing perceptual input is certainly
a critical mechanism in these tests, but whether it is the main source of variance in all instances
is questionable. There does seem to be some evidence that these tasks are more closely related
to each other in autism (Bölte et al. 2007) than in the typical population (Pellicano et al. 2005),
which could suggest that the role of perceptual organization becomes more evident when it has a
larger influence on task performance. However, given the likely role of general task-solving
functions and strategies in tasks such as these, and the known executive-function problems in autism,
conclusions from these tasks certainly require careful consideration.
Many researchers have therefore attempted to use different paradigms, in particular ones
that are more clearly motivated by vision science. One task that shows promise here
is the measurement of the threshold required for coherent global motion detection. This has
become one of the most replicated findings in autism (Davis et al. 2006; Milne et al. 2002; Spencer
et al. 2000; Tsermentseli et al. 2008), although the effect is more clearly seen with short presenta-
tion times (Robertson et al. 2012). Interestingly, there is also evidence for a negative correlation
between global motion thresholds and more complex tasks like the EFT for non-clinical samples
who score highly on autistic traits (Grinter et al. 2009). Milne and Szczerbinski (2009) also find a
negative correlation between a ‘disembedding’ factor (based on performance on the Block Design
and EFT tasks) and global motion thresholds in a non-clinical sample.
However, the attempt to shift to simpler paradigms has not resolved the debate regarding the
existence and nature of perceptual changes in autism. In this regard it is striking that reviews in
the domain of schizophrenia have come to a much clearer conclusion that there is an impaired
or weakened use of grouping principles in perceptual organization. There are numerous reasons
why the picture may be more complicated in autism, the most obvious being that the nature of the
perceptual changes may be very different. For example, whilst a breakdown in contour integration
is one of the findings most consistently associated with the disorganized symptoms of schizophrenia,
there are numerous indications that this process is not impaired in autism (Blake et al. 2003;
Del Viva et al. 2006). Another salient difference is that while patients with schizophrenia are normally
diagnosed, or at least studied, in adulthood (or late adolescence), patients with autism are studied
from a much younger age. This significantly complicates the interpretation of studies on younger
samples with autism because the processes underlying the integration of local information into a
more global organization are known to continue to develop from childhood into adulthood. This
is well illustrated in a study by Scherf et al. (2008), who demonstrate that a difference between
participants with autism and typically developing children on the Navon task only emerges later
in adolescence, as the typically developing children begin to adopt an increasingly global bias.
The role of development could also be important in contextual illusions. An initial study by
Happé (1996) showed a reduced sensitivity to a number of contextual illusions, but Ropar and
Mitchell (2001) did not find evidence for such a difference. This inconsistency is unfortunate for
linking different strands of research, because these are versions of the same illusions that are related
to V1 size in adulthood, show different biases across cultures, and reveal weaker
contextual effects in patients with disorganized symptoms of schizophrenia. There is, however,
728 de-Wit and Wagemans

clear evidence that the sensitivity to these illusions also develops, and that the adult-like sensitiv-
ity is not apparent until later adolescence (Doherty et al. 2010; Káldy and Kovács 2003). To our
knowledge, studies that have compared autism and control samples at older ages have in fact
found evidence for differential sensitivities to these illusions (Bölte et al. 2007; see also Mitchell
et al. 2010), suggesting that participants with autism do perceive these illusions differently, but
that this difference is only clear once the perceptual processes underlying these illusions have
matured. This is not simply a methodological point. It also has important theoretical implications
regarding the causal role of differences in perceptual bias in autism. If the perceptual changes in
autism are more reliably discernible from the typical population only at older ages, then this
suggests that, if these perceptual biases are different, they may not play a causal role in generating
the broader syndrome, but rather emerge from an underlying mechanism that impacts many
domains of processing.

Looking forward
Our aim in this chapter was to provide a global overview of a fragmented literature. Much of the
existing literature focuses only on specific tasks, one patient group, or one theoretical approach,
or simply dismisses individual differences as a valid research tool. This chapter provides somewhat
more room to explore the space of theories, tasks, methods, patients, and populations of interest.
Hopefully, this outline will motivate larger-scale empirical research and will provide a broader
scope within which local and global tasks can be understood, both as an intrinsic part of perceptual
organization and in terms of their relation to a domain-general challenge in combining local
signals into more abstract global wholes. However, rather than focusing on (premature) conclusions,
this final section offers some reflections on how to move this field of research forward.

Individual differences can be seen as a tool, not a problem
Perhaps out of cognitive dissonance resulting from the convenience of using relatively homogeneous
samples of undergraduates as participants, most vision scientists assume that individual variability
is either negligible or noise. We would argue that taking an interest in individual differences
will not only provide insights into the mechanisms underlying the mid- and higher-level research
illustrated here, but that taking into account (and controlling for and/or modeling) individual
variability can also provide a much more precise means of measuring even lower-level visual
phenomena. Busse et al. (2011) provide an excellent illustration of this in modeling the biases,
strategies, and trial history when measuring contrast sensitivity in mice. In this sense, vision science
could learn from other areas of psychology by implementing statistical techniques that explicitly
model how differences between conditions differ across individuals.
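The kind of modeling advocated here can be sketched in a few lines. The following Python snippet is our own illustration, with simulated, hypothetical data rather than values from any study cited above: it generates a local–global reaction-time effect that genuinely differs across participants, and reports both the group-level mean effect and the between-subject spread that an aggregate-only analysis would discard as noise.

```python
import random
import statistics

random.seed(1)

# Simulate a global-advantage effect (ms) that differs across participants:
# the population mean is 30 ms, but individuals vary around it (SD = 20 ms).
n_subjects, n_trials = 40, 50
per_subject_effects = []
for _ in range(n_subjects):
    true_effect = random.gauss(30, 20)  # this subject's real bias
    local_rts = [random.gauss(500, 40) for _ in range(n_trials)]
    global_rts = [random.gauss(500 - true_effect, 40) for _ in range(n_trials)]
    # Estimate the subject-level effect from that subject's own trials.
    per_subject_effects.append(statistics.mean(local_rts) - statistics.mean(global_rts))

group_mean = statistics.mean(per_subject_effects)
between_sd = statistics.stdev(per_subject_effects)
print(f"group-level effect: {group_mean:.1f} ms")
print(f"between-subject SD: {between_sd:.1f} ms")
# A conventional aggregate analysis reports only the first number;
# treating the second as signal is what makes individual differences a tool.
```

In a full analysis one would fit a mixed-effects model with a random slope for the condition effect; the point of the sketch is simply that the between-subject term is itself a measurable quantity of interest.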

Learning to use individual differences as a research tool
One of the biggest challenges in trying to obtain an overview of this literature (a challenge that would
also undermine actual meta-analytical reviews) is defining which tasks should
be regarded as genuine replications. Many studies employ the concepts of local and global biases
with very different tasks (or with modifications to the same task). It is a pity here that Witkin’s original
focus on tasks that are known to correlate (the EFT and the rod-and-frame test), and thus load on an
underlying factor, has not been consistently maintained. Of course, returning to the focus on a
factor (rather than one task) requires a clear set of tasks that one regards as a definitive
operationalization of a local or global bias. Often the Navon task is assumed to fulfill this role. In
the context of individual differences, however, we think this is not an optimal choice: its
test-retest reliability is quite low (Dale and Arnell 2013), it has an unclear relationship to other
measures of local and global bias (Milne and Szczerbinski 2009), and, more critically, there are too
many sources of variance contributing to task performance. At the current time, we would regard
an advantage on the EFT and a reduced ability to detect coherent motion as good benchmarks
for a local perceptual bias (especially when used in combination). However, these tasks, and
especially the EFT (White and Saldana 2011), are also not without their problems. Ideally, the field
needs to translate experimental paradigms into broad-scale test batteries that capture continuous
variability in multiple aspects of local-to-global integration with minimal variation in executive
task demands.
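To make the factor-based scoring concrete, here is a minimal sketch with entirely hypothetical scores and participants, not a validated battery: two benchmark measures are standardized and averaged, so the resulting composite reflects what the tasks share rather than either task’s idiosyncratic variance.

```python
import statistics

def zscores(values):
    """Standardize a list of scores to mean 0, SD 1."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical scores for five participants (higher = more local bias on both).
eft_advantage = [12.0, 5.0, 20.0, 8.0, 15.0]       # e.g. seconds faster on the EFT
motion_threshold = [0.45, 0.30, 0.60, 0.35, 0.50]  # coherence needed to see motion

# Average the standardized tasks so the composite loads on the shared factor
# rather than on the idiosyncrasies of either single task.
local_bias = [statistics.mean(pair)
              for pair in zip(zscores(eft_advantage), zscores(motion_threshold))]

ranking = sorted(range(len(local_bias)), key=lambda i: local_bias[i], reverse=True)
print(ranking)  # participants ordered from most to least locally biased → [2, 4, 0, 3, 1]
```

Because the two hypothetical tasks happen to rank the participants identically here, the composite preserves that order; with real data, the correlation between the tasks determines how much reliability the composite gains over either measure alone.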

Broader demands for larger-scale research, open data, evidence accumulation, and open tasks
This last recommendation could be applied to most of modern psychology and neuroscience (Button
et al. 2013), but particularly within this domain we think a different scale, and a more open style,
of research is needed. Many of the studies informing this literature are based on surprisingly
small sample sizes; for example, claims are made about differences between Eastern and Western
cultures based on sample sizes below 50. Given the differences between different environments
(urban or rural) within the same culture (Caparos et al. 2012), one has to be concerned that differences
found with such small samples may arise from the testing contexts rather than from cultural
differences per se.
Alternatively, smaller-scale studies can be useful when the results are actually based on exactly
the same methods, rather than on harder-to-interpret ‘conceptual’ replications (Yong 2012). This
shift to more genuine replications would be hugely facilitated by a shift to Free and Open Source
Software. Indeed, implementing tests online would also be advantageous here—not only in
making the same tests available to researchers working in different cultures, but also to clinical
researchers working with different patient groups.
Finally, a shift is also needed to a more open mode of data availability. Other fields of sci-
ence have succeeded in making such a shift (e.g., the Human Genome Project), and it is time we
think about turning the resources available from publishing companies, academic societies and
research councils towards the development of more open platforms for sharing data and experimental
code. A change in the openness of data should also be complemented by a change in how
evidence is statistically accumulated. Currently, individual studies are interpreted in a ‘one-shot’
null-hypothesis-testing framework. Facilitating access to previous data would enable more
informative inference, based on the evidence for an effect accumulated across
multiple studies.
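As a concrete sketch of such accumulation (with illustrative numbers only, not effect sizes from the literature reviewed here), fixed-effect inverse-variance pooling shows how the estimate of an effect, and the uncertainty about it, is updated as each new study’s data become available:

```python
import math

def pool(effects, variances):
    """Fixed-effect inverse-variance pooling of standardized effect sizes."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# Hypothetical effect sizes (Cohen's d) and sampling variances from
# three small studies of the same local-bias effect.
effects = [0.60, 0.35, 0.50]
variances = [0.09, 0.06, 0.04]

# Accumulate the evidence study by study instead of testing each in isolation.
for k in range(1, len(effects) + 1):
    d, se = pool(effects[:k], variances[:k])
    print(f"after {k} studies: d = {d:.2f}, SE = {se:.3f}")
```

Each additional study shrinks the pooled standard error below that of any single study on its own, which is exactly the kind of inference a ‘one-shot’ significance test applied to each study in isolation cannot deliver.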

Acknowledgements
We would like to thank Sander Van de Cruys, Ruth Van der Hallen, Kris Evers, Cees van Leeuwen,
Marlene Behrmann, Sam Schwarzkopf, Karina Linnell, Roeland Verhallen, Pieter Moors, Jonas
Kubilius, Brian Keane, Steve Silverstein, and Peter van der Helm for providing valuable feedback
on a previous version of this chapter. The Navon and Mooney images for Figure 35.1 were pro-
vided by Sander Van de Cruys. This work was supported by long-term structural funding from
the Flemish Government to JW (METH/08/02) and a postdoctoral fellowship from the Research
Foundation—Flanders (FWO) to LdW.

References
Arcaro, M. J., McMains, S. A., Singer, B. D., and Kastner, S. (2009). Retinotopic organization
of human ventral visual cortex. The Journal of Neuroscience 29(34): 10638–52. doi:10.1523/
JNEUROSCI.2807-09.2009.
Barttfeld, P., Wicker, B., Cukier, S., Navarta, S., Lew, S., and Sigman, M. (2011). A big-world network
in ASD: Dynamical connectivity analysis reflects a deficit in long-range connections and an excess of
short-range connections. Neuropsychologia 49(2): 254–63. doi:10.1016/j.neuropsychologia.2010.11.024.
Behrmann, M., Avidan, G., Marotta, J. J., and Kimchi, R. (2005). Detailed exploration of face-related
processing in congenital prosopagnosia: 1. Behavioral findings. Journal of Cognitive Neuroscience
17(7): 1130–49. doi:10.1162/0898929054475154.
Berkes, P. and Wiskott, L. (2005). Slow feature analysis yields a rich repertoire of complex cell properties.
Journal of Vision 5(6). doi:10.1167/5.6.9.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological
Review 94(2): 115–47. doi:10.1037/0033-295X.94.2.115
Blake, R., Turner, L. M., Smoski, M. J., Pozdol, S. L., and Stone, W. L. (2003). Visual recognition of
biological motion is impaired in children with autism. Psychological Science 14(2): 151–57.
doi:10.1111/1467-9280.01434.
Bölte, S., Holtmann, M., Poustka, F., Scheurich, A., and Schmidt, L. (2007). Gestalt perception and
local-global processing in high-functioning autism. Journal of Autism and Developmental Disorders
37(8): 1493–504. doi:10.1007/s10803-006-0231-x.
Bouvet, L., Rousset, S., Valdois, S., and Donnadieu, S. (2011). Global precedence effect in audition and
vision: Evidence for similar cognitive styles across modalities. Acta Psychologica 138(2): 329–35.
doi:10.1016/j.actpsy.2011.08.004.
Brown, C., Gruber, T., Boucher, J., Rippon, G., and Brock, J. (2005). Gamma abnormalities during
perception of illusory figures in autism. Cortex 41(3): 364–76. doi:10.1016/S0010-9452(08)70273-9.
Busigny, T. and Rossion, B. (2011). Holistic processing impairment can be restricted to faces in acquired
prosopagnosia: Evidence from the global/local Navon effect. Journal of Neuropsychology 5(1): 1–14.
doi:10.1348/174866410X500116.
Busse, L., Ayaz, A., Dhruv, N. T., Katzner, S., Saleem, A. B., Schölvinck, M. L., et al. (2011). The detection
of visual contrast in the behaving mouse. The Journal of Neuroscience 31(31): 11351–61. doi:10.1523/
JNEUROSCI.6689-10.2011.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., and Munafò, M. R.
(2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews
Neuroscience 14(5): 365–76. doi:10.1038/nrn3475.
Caparos, S., Ahmed, L., Bremner, A. J., de Fockert, J. W., Linnell, K. J., and Davidoff, J. (2012). Exposure
to an urban environment alters the local bias of a remote culture. Cognition 122(1): 80–5. doi:10.1016/j.
cognition.2011.08.013.
Caparos, S., Linnell, K. J., Bremner, A. J., de Fockert, J. W., and Davidoff, J. (2013). Do local and global
perceptual biases tell us anything about local and global selective attention? Psychological Science.
doi:10.1177/0956797612452569.
Chakravarthi, R. and Pelli, D. G. (2011). The same binding in contour integration and crowding. Journal of
Vision 11(8). doi:10.1167/11.8.10.
Chater, N. (1996). Reconciling simplicity and likelihood principles in perceptual organization. Psychological
Review 103(3): 566–81.
Chen, Y., Nakayama, K., Levy, D., Matthysse, S., and Holzman, P. (2003). Processing of global, but not
local, motion direction is deficient in schizophrenia. Schizophrenia Research 61(2-3): 215–27.
Cox, D. D., Meier, P., Oertelt, N., and DiCarlo, J. J. (2005). ‘Breaking’ position-invariant object recognition.
Nature Neuroscience 8(9): 1145–7. doi:10.1038/nn1519.
Cronbach, L. (1957). The two disciplines of scientific psychology. American Psychologist 12(11): 671–84.
Dakin, S. and Frith, U. (2005). Vagaries of visual perception in autism. Neuron 48(3): 497–507.
doi:10.1016/j.neuron.2005.10.018.
Dale, G. and Arnell, K. M. (2013). Investigating the stability of and relationships among global/
local processing measures. Attention, Perception and Psychophysics 75(3): 394–406. doi:10.3758/
s13414-012-0416-7.
Davis, R. A. O., Bockbrader, M. A., Murphy, R. R., Hetrick, W. P., and O’Donnell, B. F. (2006). Subjective
perceptual distortions and visual dysfunction in children with autism. Journal of Autism and
Developmental Disorders 36(2): 199–210. doi:10.1007/s10803-005-0055-0.
de-Wit, L. (2013). Stimuli used to study individual differences in local and global perceptual organization.
figshare. doi:10.6084/m9.figshare.707082.
de-Wit, L. H., Kubilius, J., Wagemans, J., and Op de Beeck, H. P. (2012). Bistable Gestalts reduce activity in
the whole of V1, not just the retinotopically predicted parts. Journal of Vision 12(11). doi:10.1167/12.11.12.
Del Viva, M. M., Igliozzi, R., Tancredi, R., and Brizzolara, D. (2006). Spatial and motion integration in
children with autism. Vision Research 46(8-9): 1242–52. doi:10.1016/j.visres.2005.10.018.
Demeyer, M. and Machilsen, B. (2012). The construction of perceptual grouping displays using GERT.
Behavior Research Methods 44(2): 439–46. doi:10.3758/s13428-011-0167-8.
Doherty, M. J., Campbell, N. M., Tsuji, H., and Phillips, W. A. (2010). The Ebbinghaus
illusion deceives adults but not young children. Developmental Science 13(5): 714–21.
doi:10.1111/j.1467-7687.2009.00931.x.
Driver, J. and Mattingley, J. B. (1998). Parietal neglect and visual awareness. Nature Neuroscience
1(1): 17–22. doi:10.1038/217.
Driver, J., Davis, G., Russell, C., Turatto, M., and Freeman, E. (2001). Segmentation, attention and
phenomenal visual objects. Cognition 80(1–2): 61–95.
Duncan, J. (2012). How Intelligence Happens. Yale University Press.
Elder, J. H. and Goldberg, R. M. (2002). Ecological statistics of Gestalt laws for the perceptual organization
of contours. Journal of Vision 2(4): 324–53. doi:10:1167/2.4.5.
Elliott, M. A. and Müller, H. J. (1998). Synchronous information presented in 40-Hz flicker enhances visual
feature binding. Psychological Science 9(4): 277–83. doi:10.1111/1467-9280.00055.
Fang, F., Kersten, D., and Murray, S. O. (2008). Perceptual grouping and inverse fMRI activity patterns in
human visual cortex. Journal of Vision 8(7). doi:10.1167/8.7.2.
Förster, J. and Higgins, E. (2005). How global versus local perception fits regulatory focus. Psychological
Science 16(8): 631–36. doi:10.1111/j.1467-9280.2005.01586.x.
Friston, K. (2008). Hierarchical models in the brain. PLoS Computational Biology 4(11): e1000211.
doi:10.1371/journal.pcbi.1000211.
Frith, U. (1989). Autism: Explaining the enigma. Oxford: Blackwell.
Gasper, K. and Clore, G. L. (2002). Attending to the big picture: Mood and global versus local processing of
visual information. Psychological Science 13(1): 34–40.
Geisler, W. S. (2008). Visual perception and the statistical properties of natural scenes. Annual Review of
Psychology 59(1): 167–92. doi:10.1146/annurev.psych.58.110405.085632.
Gilaie-Dotan, S., Kanai, R., Bahrami, B., Rees, G., and Saygin, A. P. (2012). Neuroanatomical
correlates of biological motion detection. Neuropsychologia 51(3): 457–63. doi:10.1016/j.
neuropsychologia.2012.11.027.
Goodbourn, P. T., Bosten, J. M., Hogg, R. E., Bargary, G., Lawrance-Owen, A. J., and Mollon, J. D. (2012).
Do different ‘magnocellular tasks’ probe the same neural substrate? Proceedings of the Royal Society,
B. Biological sciences 279(1745): 4263–71. doi:10.1098/rspb.2012.1430.
Goodenough, D. R. (1976). The role of individual differences in field dependence as a factor in learning and
memory. Psychological Bulletin 83(4): 675–94.
Gottschaldt, K. (1926). Über den Einfluß der Erfahrung auf die Wahrnehmung von Figuren. I. Über den
Einfluß gehäufter Einprägung von Figuren auf ihre Sichtbarkeit in umfassenden Konfigurationen
[About the influence of experience on the perception of figures]. Psychologische Forschung 8: 261–317.
Grinter, E. J., Maybery, M. T., Van Beek, P. L., Pellicano, E., Badcock, J. C., and Badcock, D. R. (2009).
Global visual processing and self-rated autistic-like traits. Journal of Autism and Developmental
Disorders 39(9): 1278–90. doi:10.1007/s10803-009-0740-5.
Happé, F. G. (1996). Studying weak central coherence at low levels: children with autism do not succumb
to visual illusions. A research note. Journal of Child Psychology and Psychiatry, and its Allied Disciplines
37(7): 873–7.
Harvey, B. M. and Dumoulin, S. O. (2011). The relationship between cortical magnification factor and
population receptive field size in human visual cortex: Constancies in cortical architecture. The Journal
of Neuroscience 31(38): 13604–12. doi:10.1523/JNEUROSCI.2572-11.2011.
He, D., Kersten, D., and Fang, F. (2012). Opposite modulation of high—and low-level visual aftereffects by
perceptual grouping. Current Biology 22(11): 1040–5. doi:10.1016/j.cub.2012.04.026.
Heinzle, J., Kahnt, T., and Haynes, J.-D. (2011). Topographically specific functional connectivity
between visual field maps in the human brain. NeuroImage 56(3): 1426–36. doi:10.1016/j.
neuroimage.2011.02.077.
Hochstein, S. and Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual
system. Neuron 36(5): 791–804. doi:10.1016/S0896-6273(02)01091-7.
Hubel, D. H. and Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. The
Journal of Physiology 148(3): 574–91.
Johnson, S. C., Lowery, N., Kohler, C., and Turetsky, B. I. (2005). Global-local visual processing in
schizophrenia: evidence for an early visual processing deficit. Biological Psychiatry 58(12): 937–46.
doi:10.1016/j.biopsych.2005.04.053.
Káldy, Z. and Kovács, I. (2003). Visual context integration is not fully developed in 4-year-old children.
Perception 32(6): 657–66. doi:10.1068/p3473.
Kanai, R. and Rees, G. (2011). The structural basis of inter-individual differences in human behaviour and
cognition. Nature Reviews Neuroscience 12(4): 231–42. doi:10.1038/nrn3000.
Kelemen, O., Erdélyi, R., Pataki, I., Benedek, G., Janka, Z., and Kéri, S. (2005). Theory of Mind and
motion perception in schizophrenia. Neuropsychology 19(4): 494–500. doi:10.1037/0894-4105.19.4.494.
Kéri, S., Kelemen, O., Benedek, G., and Janka, Z. (2005). Lateral interactions in the visual cortex of patients
with schizophrenia and bipolar disorder. Psychological Medicine 35(7): 1043–51.
Kourtzi, Z. and Kanwisher, N. (2001). Representation of perceived object shape by the human lateral
occipital complex. Science 293(5534): 1506–9. doi:10.1126/science.1061133.
Kurylo, D. D., Pasternak, R., Silipo, G., Javitt, D. C., and Butler, P. D. (2007). Perceptual organization
by proximity and similarity in schizophrenia. Schizophrenia Research 95(1-3): 205–14. doi:10.1016/j.
schres.2007.07.001.
Li, N. and DiCarlo, J. J. (2010). Unsupervised natural visual experience rapidly reshapes size-invariant
object representation in inferior temporal cortex. Neuron 67(6): 1062–75. doi:10.1016/j.
neuron.2010.08.029.
Masuda, T. and Nisbett, R. E. (2006). Culture and change blindness. Cognitive Science 30(2): 381–99.
doi:10.1207/s15516709cog0000_63.
Matussek, P. (1952). [Studies on delusional perception. I. Changes of the perceived external world in
incipient primary delusion]. Archiv für Psychiatrie und Nervenkrankheiten, vereinigt mit Zeitschrift für
die gesamte Neurologie und Psychiatrie, 189(4), 279–319; contd.
Milne, E. and Szczerbinski, M. (2009). Global and local perceptual style, field-independence, and central
coherence: An attempt at concept validation. Advances in Cognitive Psychology 5: 1–26. doi:10.2478/
v10053-008-0062-8.
Milne, E., Swettenham, J., Hansen, P., Campbell, R., Jeffries, H., and Plaisted, K. (2002). High motion
coherence thresholds in children with autism. Journal of Child Psychology and Psychiatry, and its Allied
Disciplines 43(2): 255–63.
Mitchell, P., Mottron, L., Soulières, I., and Ropar, D. (2010). Susceptibility to the Shepard illusion in
participants with autism: Reduced top-down influences within perception? Autism Research 3(3): 
113–19. doi:10.1002/aur.130.
Mooney, C. M. (1957). Age in the development of closure ability in children. Canadian Journal of
Psychology 11(4): 219–26.
Mottron, L., Dawson, M., Soulières, I., Hubert, B., and Burack, J. (2006). Enhanced perceptual functioning
in autism: An update, and eight principles of autistic perception. Journal of Autism and Developmental
Disorders 36(1): 27–43. doi:10.1007/s10803-005-0040-7.
Muckli, L. (2010). What are we missing here? Brain imaging evidence for higher cognitive functions in
primary visual cortex V1. International Journal of Imaging Systems Technology 20(2): 131–9. doi:10.1002/
ima.v20:2.
Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P., and Woods, D. L. (2002). Shape perception
reduces activity in human primary visual cortex. Proceedings of the National Academy of Sciences
99(23): 15164–9. doi:10.1073/pnas.192579399.
Must, A., Janka, Z., Benedek, G., and Kéri, S. (2004). Reduced facilitation effect of collinear flankers on
contrast detection reveals impaired lateral connectivity in the visual cortex of schizophrenia patients.
Neuroscience Letters 357(2): 131–4. doi:10.1016/j.neulet.2003.12.046.
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive
Psychology 9(3): 353–83. doi:10.1016/0010-0285(77)90012-3
Newell, A. (1973). You can’t play 20 questions with nature and win: Projective comments on the papers of this
symposium. In: W. G. Chase (ed.), Visual information processing, pp. 283–308. New York: Academic Press.
Nikolaev, A. R., Gepshtein, S., Gong, P., and van Leeuwen, C. (2010). Duration of coherence intervals in
electrical brain activity in perceptual organization. Cerebral Cortex 20(2): 365–82. doi:10.1093/cercor/bhp107.
Nisbett, R. E., and Miyamoto, Y. (2005). The influence of culture: holistic versus analytic perception. Trends
in Cognitive Sciences 9(10): 467–73. doi:10.1016/j.tics.2005.08.004.
Pelli, D. G., Majaj, N. J., Raizman, N., Christian, C. J., Kim, E., and Palomares, M. C. (2009). Grouping
in object recognition: The role of a Gestalt law in letter identification. Cognitive Neuropsychology 26(1): 
36–49. doi:10.1080/13546800802550134.
Pellicano, E. and Burr, D. (2012). When the world becomes ‘too real’: A Bayesian explanation of autistic
perception. Trends in Cognitive Sciences 16(10): 504–10. doi:10.1016/j.tics.2012.08.009.
Pellicano, E., Maybery, M., and Durkin, K. (2005). Central coherence in typically developing
preschoolers: Does it cohere and does it relate to mindreading and executive control?
Journal of Child Psychology and Psychiatry, and its Allied Disciplines 46(5): 533–47.
doi:10.1111/j.1469-7610.2004.00380.x.
Peterson, M. A. (1994). Object recognition processes can and do operate before figure–ground organization.
Current Directions in Psychological Science 3(4): 105–111. doi:10.1111/1467-8721.ep10770552.
Poljac, E., de-Wit, L., and Wagemans, J. (2012). Perceptual wholes can reduce the conscious accessibility of
their parts. Cognition 123(2): 308–12. doi:10.1016/j.cognition.2012.01.001
Prodöhl, C., Würtz, R. P., and von der Malsburg, C. (2003). Learning the Gestalt rule of collinearity from
object motion. Neural Computation 15(8): 1865–96. doi:10.1162/08997660360675071.
Ramachandran, V. S. (1985). Guest editorial: The neurobiology of perception. Perception 14: 127–34.
Rao, R. P. N. and Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation
of some extra-classical receptive-field effects. Nature Neuroscience 2(1): 79–87. doi:10.1038/4580.
Robertson, C. E., Martin, A., Baker, C. I., and Baron-Cohen, S. (2012). Atypical integration of motion
signals in autism spectrum conditions. PLoS ONE 7(11): e48173. doi:10.1371/journal.pone.0048173.
Robertson, C. E., Kravitz, D. J., Freyberg, J., Baron-Cohen, S., and Baker, C. I. (2013). Tunnel
vision: Sharper gradient of spatial attention in autism. The Journal of Neuroscience 33(16): 6776–81.
doi:10.1523/JNEUROSCI.5120-12.2013.
Romei, V., Driver, J., Schyns, P. G., and Thut, G. (2011). Rhythmic TMS over parietal cortex links distinct
brain frequencies to global versus local visual processing. Current Biology 21(4): 334–7. doi:10.1016/j.
cub.2011.01.035.
Ropar, D. and Mitchell, P. (2001). Susceptibility to illusions and performance on visuospatial tasks in
individuals with autism. The Journal of Child Psychology and Psychiatry and its Allied Disciplines
42(04): 539–49. doi:10.1017/S002196300100717X.
Sayim, B., Westheimer, G., and Herzog, M. H. (2010). Gestalt factors modulate basic spatial vision.
Psychological Science 21(5): 641–4. doi:10.1177/0956797610368811.
Scherf, K. S., Luna, B., Kimchi, R., Minshew, N., and Behrmann, M. (2008). Missing the big picture: Impaired
development of global shape processing in autism. Autism Research 1(2): 114–29. doi:10.1002/aur.17.
Scholl, B. J., Pylyshyn, Z. W., and Feldman, J. (2001). What is a visual object? Evidence from target
merging in multiple object tracking. Cognition 80(1–2): 159–77.
Schwarzkopf, D. S., Song, C., and Rees, G. (2011). The surface area of human V1 predicts the subjective
experience of object size. Nature Neuroscience 14(1): 28–30. doi:10.1038/nn.2706.
Schwarzkopf, D. S., Robertson, D. J., Song, C., Barnes, G. R., and Rees, G. (2012). The frequency of
visually induced gamma-band oscillations depends on the size of early human visual cortex. The Journal
of Neuroscience 32(4): 1507–12. doi:10.1523/JNEUROSCI.4771-11.2012.
Silverstein, S. M., Kovács, I., Corry, R., and Valone, C. (2000). Perceptual organization, the disorganization
syndrome, and context processing in chronic schizophrenia. Schizophrenia Research 43(1): 11–20.
Silverstein, S. M., and Keane, B. P. (2011). Perceptual organization impairment in schizophrenia and
associated brain mechanisms: Review of research from 2005 to 2010. Schizophrenia Bulletin 37(4): 
690–9. doi:10.1093/schbul/sbr052.
Spencer, J., O’Brien, J., Riggs, K., Braddick, O., Atkinson, J., and Wattam-Bell, J. (2000). Motion
processing in autism: Evidence for a dorsal stream deficiency. Neuroreport 11(12): 2765–7.
Spencer, K. M., Nestor, P. G., Niznikiewicz, M. A., Salisbury, D. F., Shenton, M. E., and McCarley, R. W.
(2003). Abnormal neural synchrony in schizophrenia. The Journal of Neuroscience 23(19): 7407–11.
Sumner, P., Edden, R. A. E., Bompas, A., Evans, C. J., and Singh, K. D. (2010). More GABA, less
distraction: A neurochemical predictor of motor decision speed. Nature Neuroscience 13(7): 825–7.
doi:10.1038/nn.2559.
Sun, L., Grützner, C., Bölte, S., Wibral, M., Tozman, T., Schlitt, S., . . . Uhlhaas, P. J. (2012).
Impaired gamma-band activity during perceptual organization in adults with Autism Spectrum
Disorders: Evidence for dysfunctional network activity in frontal-posterior cortices. The Journal of
Neuroscience 32(28): 9563–73. doi:10.1523/JNEUROSCI.1073-12.2012.
Tallon-Baudry, C. and Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object
representation. Trends in Cognitive Sciences 3(4): 151–62.
Tsermentseli, S., O’Brien, J. M., and Spencer, J. V. (2008). Comparison of form and motion coherence
processing in autistic spectrum disorders and dyslexia. Journal of Autism and Developmental Disorders
38(7): 1201–10. doi:10.1007/s10803-007-0500-3.
Uhlhaas, P. J. and Mishara, A. L. (2007). Perceptual anomalies in schizophrenia: Integrating phenomenology
and cognitive neuroscience. Schizophrenia Bulletin 33(1): 142–56. doi:10.1093/schbul/sbl047.
Uhlhaas, P. J., Linden, D. E. J., Singer, W., Haenschel, C., Lindner, M., Maurer, K., and Rodriguez,
E. (2006a). Dysfunctional long-range coordination of neural activity during Gestalt perception in
schizophrenia. The Journal of Neuroscience 26(31): 8168–75. doi:10.1523/JNEUROSCI.2002-06.2006.
Uhlhaas, P. J., Phillips, W. A., Mitchell, G., and Silverstein, S. M. (2006b). Perceptual grouping in
disorganized schizophrenia. Psychiatry Research 145(2–3): 105–17. doi:10.1016/j.psychres.2005.10.016.
Individual Differences in Local and Global Perceptual Organization 735

Uhlhaas, P. J., Millard, I., Muetzelfeldt, L., Curran, H. V., and Morgan, C. J. A. (2007). Perceptual
organization in ketamine users: Preliminary evidence of deficits on night of drug use but not 3 days
later. Journal of Psychopharmacology 21(3): 347–52. doi:10.1177/0269881107077739.
Uhlhaas, P. J., Silverstein, S. M., Phillips, W. A., and Lovell, P. G. (2004). Evidence for impaired visual
context processing in schizotypy with thought disorder. Schizophrenia Research 68: 249–60. doi:10.1016/
S0920-9964(03)00184-1.
Van de Cruys, S., de-Wit, L., Evers, K., Boets, B., and Wagemans, J. (2013). Weak priors versus overfitting
of predictions in autism: Reply to Pellicano and Burr (TICS, 2012). i-Perception 4(2): 95–7. doi:10.1068/
i0580ic.
Van Leeuwen, C., and Smit, D. J. A. (2012). Restless brains, wandering minds. In: S. Edelman, T. Fekete,
and N. Zach (eds.): Being in time: Dynamical models of phenomenal awareness. Advances in consciousness
research, pp. 121–47. Amsterdam: John Benjamins PC.
Van Loon, A. M., Knapen, T., Scholte, H. S., St. John-Saaltink, E., Donner, T. H., and Lamme, V. A. F.
(2013). GABA shapes the dynamics of bistable perception. Current Biology (in press). doi:10.1016/j.
cub.2013.03.067.
Vogel, E. K. and Awh, E. (2008). How to exploit diversity for scientific gain using individual
differences to constrain cognitive theory. Current Directions in Psychological Science 17(2): 171–6.
doi:10.1111/j.1467-8721.2008.00569.x.
Wagemans, J., Notebaert, W., and Boucart, M. (1998). Lorazepam but not diazepam impairs identification
of pictures on the basis of specific contour fragments. Psychopharmacology 138(3–4): 326–33.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt, R.
(2012a). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground
organization. Psychological Bulletin 138(6): 1172–217. doi:10.1037/a0029333.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P. A., and van
Leeuwen, C. (2012b). A century of Gestalt psychology in visual perception: II. Conceptual and
theoretical foundations. Psychological Bulletin 138(6): 1218–52. doi:10.1037/a0029334.
Wang, R., Li, J., Fang, H., Tian, M., and Liu, J. (2012). Individual differences in holistic processing predict
face recognition ability. Psychological Science 23(2): 169–77. doi:10.1177/0956797611420575.
White, S. J. and Saldaña, D. (2011). Performance of children with autism on the Embedded Figures
Test: A closer look at a popular task. Journal of Autism and Developmental Disorders 41(11): 1565–72.
doi:10.1007/s10803-011-1182-4.
Wichmann, F. A., Drewes, J., Rosas, P., and Gegenfurtner, K. R. (2010). Animal detection in natural
scenes: Critical features revisited. Journal of Vision 10(4). doi:10.1167/10.4.6.
Wilmer, J. B. (2008). How to use individual differences to isolate functional organization, biology, and
utility of visual functions; with illustrative proposals for stereopsis. Spatial Vision 21(6): 561–79.
doi:10.1163/156856808786451408.
Witkin, H. A. (1962). Psychological differentiation: studies of development. New York: Wiley.
Witkin, H. A. and Asch, S. E. (1948). Studies in space orientation: Further experiments on perception of
the upright with displaced visual fields. Journal of Experimental Psychology 38(6): 762–82.
Witkin, H. A. and Berry, J. W. (1975). Psychological differentiation in cross-cultural perspective. Journal of
Cross-Cultural Psychology 6(1): 4–87.
Yang, E., Tadin, D., Glasser, D. M., Hong, S. W., Blake, R., and Park, S. (2012). Visual context processing in
schizophrenia. Clinical Psychological Science. doi:10.1177/2167702612464618.
Yong, E. (2012). Replication studies: Bad copy. Nature 485(7398): 298–300. doi:10.1038/485298a.
Yovel, G. and Kanwisher, N. (2005). The neural basis of the behavioral face-inversion effect. Current Biology
15(24): 2256–62. doi:10.1016/j.cub.2005.10.072.
Zaretskaya, N., Anstis, S., and Bartels, A. (2013). Parietal cortex mediates conscious perception of illusory
Gestalt. The Journal of Neuroscience 33(2): 523–31. doi:10.1523/JNEUROSCI.2905-12.2013.
Chapter 36

Mutual interplay between perceptual organization and attention: A neuropsychological perspective

Céline R. Gillebert and Glyn W. Humphreys

1  Introduction
The visual system possesses the remarkable ability to rapidly group elements in a complex visual
environment based on a range of factors first elucidated by the Gestalt psychologists, including
proximity, similarity, and common fate (Wertheimer 1923). However, there is also a competition
for neural representation, given constraints on neuronal tuning and the presence of large recep-
tive fields at higher levels of visual association cortex (Desimone and Duncan 1995). To deal with
the complexity that exists in the environment, there need to be processes which prioritize the
information that is most relevant to on-going behavior. Representing the world efficiently requires
both the selection of a fraction of the information that reaches our senses and the organization of
this information into coherent and meaningful elements.
In this chapter, we discuss the dynamic interplay between (on the one hand) visual, selective
attention and (on the other) perceptual organization, two important processes that allow us to
perceive a seamless, integrated world. In describing this interplay, we will draw on evidence from
neuropsychology, which provides striking examples where (i) perceptual organization appears to
operate despite a patient having a very poor ability to select visual information, and (ii) spatial
attention appears to operate even when perceptual organization is impaired. At least at first sight,
such evidence provides one of the strongest examples of perceptual organization being independ-
ent of visual attention. Whether this is a robust conclusion will be something we will review. In
this chapter, we will predominantly focus on perceptual grouping.

1.1  A neuropsychological example of the interplay of attention and perceptual organization
As we shall review, neuropsychology provides many striking examples of the interplay between
attention and perceptual organization. A  case described by Alexander Luria in 1959 provides
a good illustration. Luria reported a patient with simultanagnosia after bilateral occipitopari-
etal brain injury—a major impairment in “seeing” more than one object at a time. The patient
was shown two versions of the Star of David, formed by two overlapping triangles. When the
triangles differed in color, the patient only reported a single triangle. However, when triangles
were the same color, the patient immediately perceived the complete star. Similarly, when two
separate shapes were briefly exposed, only one was seen at a time. Nevertheless, when the shapes
were identical, or combined into a single structure through a connecting line, their perception
was facilitated (Luria 1959). This case study demonstrates how perceptual organization (notably
grouping by similarity or connectedness) can determine where attention is allocated and which
objects are accessible for explicit report.
The mutual interplay between perceptual grouping and attention can be assessed through dif-
ferent lenses, answering at least three distinct but related questions:
•  Can perceptual grouping constrain visual attention, determining which objects will be selected
and be candidates for explicit report?
•  Can perceptual grouping occur even without (focused) attention, or does perceptual grouping
fully depend on the availability of attentive resources?
•  Can visual attention modulate perceptual grouping, determining how elements are grouped to
form meaningful wholes?
Note that evidence that perceptual grouping constrains attention, and that grouping can oper-
ate without focused attention, can be taken to indicate that attention has no influence on group-
ing. However, this would be an incorrect inference, since evidence for grouping without attention
does not necessarily indicate that attention does not modulate grouping under appropriate condi-
tions. This is the conclusion we will come to.
In the next paragraphs, we will first define the concept of “visual attention,” distinguish it from
the concept of “awareness,” and describe the most common attentional neuropsychological defi-
cits after stroke. We will then tackle each of our questions, drawing on evidence from neuropsy-
chological studies in patients with attention deficits, along with evidence from behavioral and
neuroimaging studies in healthy volunteers. We will then outline a framework for the dynamic
modulation of perceptual grouping by attention. In particular, we will argue that perceptual
grouping is weakly constrained by visual attention, but that attention nevertheless can play a role
in dynamically altering the “weighting” of elements in any organized structure, especially under
conditions in which stored knowledge and learning cannot play a major role.

2  Visual attention
2.1  Assigning attentional priorities
Visual attention can be defined as the set of cognitive functions that prioritize visual information
according to our current task goals and expectations. Many models of selective attention posit
that processing resources are allocated to perceptual units on the basis of the dynamically evolv-
ing peak of activity in an “attentional priority map” (e.g., Bays et al. 2010; Bisley and Goldberg
2010; Bundesen 1990; Gillebert et al. 2012; Ipata et al. 2009; Mavritsaki et al. 2011; Ptak 2012;
Vandenberghe and Gillebert 2009; Vandenberghe et al. 2012). The attentional priority map pro-
vides an abstract, topographical representation of the environment in which each object (or loca-
tion) is “weighted” by its sensory characteristics and its current behavioral relevance. At any given
moment in time, attention is directed towards the object (or location) with the highest priority
(e.g., Koch and Ullman 1985; Treisman 1998). These models are strongly based on the concept
of a saliency map, proposed by Koch and Ullman (1985) and computationally elaborated by Itti
and Koch (2000), which encodes the local conspicuity (physical
“saliency”) of the visual scene. The term priority map, however, goes beyond this to posit the
joint influence of bottom-up and top-down factors, such as behavioral goals and expectations
(Bisley and Goldberg 2010; Ptak 2012; Vandenberghe and Gillebert 2009). The attentional priority
map is a key concept in the Theory of Visual Attention (TVA) (Bundesen 1990; Bundesen et al.
2005, 2011), a mathematical framework related to the biased competition account (Desimone
and Duncan 1995), to which we will return in detail. Evidence from single-unit studies,
functional neuroimaging, and lesion-symptom mapping in patients with brain damage suggests
that attentional priorities are encoded in a network of frontoparietal areas—the so-called dorsal
attention network—which includes the intraparietal sulcus and the frontal eye fields (Bisley and
Goldberg 2010; Corbetta and Shulman 2002; Gillebert et al. 2012; Gillebert et al. 2011; Ptak 2012;
Vandenberghe and Gillebert 2009).
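The weighting scheme just described lends itself to a toy sketch (our illustration, not a model from this chapter or any of the works cited; the object names and the mixing parameter are assumptions): each object's priority combines its bottom-up salience with its top-down behavioral relevance, and attention is directed to the peak of the map.

```python
# Toy sketch of an attentional priority map (illustrative only, not a
# model from this chapter): each object's priority is a weighted mix of
# bottom-up salience and top-down behavioral relevance.

def priority_map(salience, relevance, top_down_weight=0.5):
    """Combine bottom-up salience and top-down relevance per object."""
    return {obj: (1 - top_down_weight) * salience[obj]
                 + top_down_weight * relevance[obj]
            for obj in salience}

def select(priorities):
    """Attention is directed to the peak of the priority map."""
    return max(priorities, key=priorities.get)

# A physically salient distracter competes with a task-relevant target:
salience  = {"red_flash": 0.9, "target_letter": 0.4}
relevance = {"red_flash": 0.1, "target_letter": 0.9}

print(select(priority_map(salience, relevance)))  # "target_letter"
```

With the mixing parameter shifted towards the bottom-up term, the same map would instead select the salient distracter; this joint influence of the two factor types is what distinguishes a priority map from a pure saliency map.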

2.2  Attention and awareness


If not identical, attention and awareness are often considered to be two sides of the same coin (e.g.,
Posner 1994). The implicit assumption behind this view is that attending to an object is necessary and
sufficient for awareness of the object. However, ample evidence has been provided that attention
and conscious awareness can be dissociated, both at a cognitive level and a neural level (Kentridge
2011; Koch and Tsuchiya 2007; Wyart and Tallon-Baudry 2008). In particular, attention is not
sufficient to give rise to awareness (see also chapter by Schwarzkopf and Rees, this volume). For
example, spatial attention can facilitate the processing of stimuli which do not reach awareness
in patients with blindsight (Kentridge et al. 1999). It remains debated, however, whether or not
conscious awareness can occur in the absence of attention (Prinz 2011).

2.3  Neuropsychological deficits of visual attention


Impairments in visual attention are a frequent consequence of brain lesion, with the incidence
of problems being particularly high after right hemisphere brain damage (Stone et  al. 1993).
Patients with attention deficits may fail to be aware of items in the side of space opposite the
lesion (hemispatial neglect), show impaired report of an item on the contralesional side of
space only when simultaneously presented with an ipsilesional item (visual extinction), or they
may be poor at detecting multiple visual items, regardless of where the stimuli appear in space
(simultanagnosia).
Patients with hemispatial neglect are typically unaware of stimuli presented on the side of space
contralateral to the brain damage, even in the absence of sensory or motor loss. In its most
extreme form, these patients may act as if the contralesional side of the world does not exist.
A spontaneous and sustained deviation of the eyes and head towards the ipsilesional side of space
may form the core deficit underlying the neglect syndrome, although patients with neglect often
exhibit a variety of other attentional and spatial deficits (Karnath and Rorden 2012). Neglect
should therefore be considered a heterogeneous disorder which affects attentional, intentional,
and representational processes to different degrees, depending on the extent of the damage to
parietal (Golay et al. 2008), temporal (Hillis et al. 2005; Ptak and Schnider 2005) or prefrontal
cortex (Husain and Kennard 1997; Verdon et al. 2010). However, the core deficit of the neglect
syndrome, i.e. biased orienting of attention, has been suggested to be specifically induced by
structural or functional damage to a set of regions surrounding the sylvian fissure, including the
inferior parietal lobule, the superior/middle temporal cortex and underlying insula, and the ven-
trolateral prefrontal cortex (Karnath and Rorden 2012). Hemispatial neglect differs from sensory
syndromes, such as hemianopia, in being modulated by contextual variables, such as motivation
(Malhotra et al. 2013), experience (Rossetti et al. 1998), expectancy (Geng and Behrmann 2006;
Riddoch and Humphreys 1983), task demands (Vuilleumier and Rafal 2000), novelty (Karnath
1994), and the organization of the visual input itself (Driver and Halligan 1991). The syndrome
is diagnosed on the basis of a set of conventional neuropsychological tests (Heilman et al. 1993;
Humphreys et al. 2012; Mesulam 2000; Vallar and Perani 1986), such as cancellation, line bisec-
tion, and copying.
Visual extinction differs from hemispatial neglect as it is usually only detected with brief pres-
entations of at least two competing stimuli (Heilman et al. 1993). Patients with visual extinction
fail to detect a contralesional stimulus only when it is presented together with a competing
ipsilesional stimulus. In the conventional clinical task for extinction in the visual domain, the
patient is presented with either a visibly wiggling finger on the left or the right side, or with two
wiggling fingers concurrently on both sides (Bender 1952; Humphreys et  al. 2012). Patients
with visual extinction can detect a single stimulus on either side, but are impaired at detecting
the contralesional stimulus when two stimuli are presented simultaneously on opposite sides.
Visual extinction, primarily associated with damage to the right temporoparietal junction (e.g.,
Chechlacz et  al. 2013; Ticini et  al. 2010; Vossel et  al. 2011), has typically been attributed to
the brain lesion biasing attentional selection, so that less attentional weight is allocated to the
contra- relative to the ipsilesional side of space. The weight assigned to the contralesional side
can be sufficient for a single contralesional item to be detected, but this item then loses any
competition for selection when a competing stimulus appears simultaneously on the ipsile-
sional side (Duncan et al. 1997).
Patients with simultanagnosia, typically induced by bilateral lesions of the occipitoparietal cor-
tex and underlying white matter (Chechlacz et  al. 2012), show impaired report of two stimuli
relative to one, are poor at integrating multiple objects in a scene, and at integrating local elements
into a coherent object (Bálint 1909; Rizzo and Vecera 2002). In other words, simultanagnosic
patients are biased towards selecting the local shape representations (unless counteracted by
grouping between local elements) rather than more global stimuli (Shalev et al. 2004).
These deficits of visual attention may be a consequence of damage to or dysfunction of the
attentional priority map (Ptak and Fellrath 2013). For example, patients with hemispatial neglect
may fail to assign attentional priorities to events in the contralesional side of space—resulting in a
competitive advantage for ipsilesional events to be candidates for attentional orienting. In particu-
lar, visual attention deficits in patients with hemispatial neglect may be driven by impairment in
integrating bottom-up and top-down factors to compute attentional priorities (Dombrowe et al.
2012; Ptak and Fellrath 2013).

3  Perceptual grouping influences the assignment of attentional priorities
In this section, we will argue that perceptual grouping can influence attentional priorities and can
therefore determine which elements in the visual field are selected. In particular, we will demon-
strate that items that belong together are selected together, even if one of the items is irrelevant
for the current task goal or if it has a competitive disadvantage in patients with visual attention
deficits.

3.1  Evidence from patients with attention deficits


Perceptual grouping based on both low-level and high-level factors can result in recovery from
extinction, attenuation of neglect, and the ability to see more than one item in simultanagnosia.

3.1.1  Low-level grouping


Recovery of extinction can be obtained when the contralesional item groups with the ipsilesional
item on the basis of the Gestalt principles of similarity (Berti et al. 1992; Ptak et al. 2002; Ward et al.
1994) (but see Baylis et al. 1993; Vuilleumier and Rafal 1999, 2000), proximity (Pavlovskaya et al.
2007), symmetry (Ward et al. 1994), connectedness (Driver et al. 1997; Humphreys 1998), brightness
(Gilchrist et al. 1996), collinearity (Boutsen and Humphreys 2000; Gilchrist et al. 1996; Mattingley
et al. 1997; Pavlovskaya et al. 2007), common shape (Gilchrist et al. 1996; Humphreys 1998; Ptak and
Schnider 2005) and common contrast polarity (Gilchrist et al. 1996; Humphreys 1998).
Mattingley et  al. (1997), for example, presented a patient with left-sided extinction with a
sequence of displays, consisting of four circles arranged to form a square (Figure 36.1a). On each
trial, quarter segments were briefly removed from the circles either from the left, from the right,
from both sides, or not at all. The patient’s task was to detect the side of the offsets. When the
segments were configured such that no grouping emerged, bilateral removal of quarter segments

[Figure 36.1 near here. Panel (a): the surface-completion task (“From which side were the segments
removed?”), with extinction (<20% left detections) for non-grouping displays and no extinction
(>80% left detections) for displays grouping into a Kanizsa figure. Panel (b): bar chart of the number
of two-item responses (/30) as a function of grouping factor: baseline, brightness, collinearity,
connectedness, surroundness.]
Fig. 36.1  Perceptual grouping and recovery from extinction. (a) Example of a task requiring
discrimination between displays where segments were briefly removed from circles on the left, right,
both sides or on neither side. On bilateral trials, when segments were removed on the outer side
of the circle, extinction occurred. When segments were removed on the inner side of the circle,
inducing a Kanizsa figure, no extinction was observed. Adapted from Mattingley et al. (1997).
(b) Results on a detection task from two-item displays as a function of the grouping among the
contra- and ipsilesional item. The task required the discrimination between displays with no, one, or
two items.
Adapted from Glyn W. Humphreys, Neural representation of objects in space: a dual coding account, Philosophical
Transactions of The Royal Society B: Biological Sciences, 353 (1373), pp. 1341–1351, doi: 10.1098/rstb.1998.0288
Copyright © 1998, The Royal Society.
induced extinction: the patient made more errors for offset detections on the left side which were
presented together with right-sided offsets, when compared with unilateral left presentations.
Extinction, however, was less severe when the stimulus configuration could be grouped to form a
Kanizsa square (see also Conci et al. 2009).
Several of these factors were investigated in GK, a patient who suffered bilateral lesions of the
occipitoparietal and parietotemporal region, resulting in Bálint’s syndrome and in extinction of
left-sided targets. Humphreys and colleagues (Gilchrist et al. 1996; Humphreys 1998) presented
GK either with a single stimulus in the left or right visual field, or with two stimuli, one in the
left and one in the right visual field. GK showed recovery from extinction if the elements had: the
same brightness (two white or two black circles), collinear edges (with aligned squares), a con-
necting line (joining circles with opposite contrast polarities), and inside-outside relations (e.g.,
a left-field circle appearing within a surrounding rectangle) (Figure 36.1b). Grouping not only
operated between items presented in the impaired and his “better” visual field, but also when both
items were presented within the impaired visual field.
These data suggest that patients with visual attention deficits can explicitly report the contral-
esional stimulus if perceptual grouping allows it to be processed together with the ipsilesional
stimulus. The benefit of perceptual grouping may result from attentional priorities being assigned
to the perceptual group as a whole, rather than to the items constituting the group, therefore facili-
tating the selection of individual items within the group. In other words, the ability to compute
attentional priority for one item in the display (e.g., the ipsilesional item in extinction) may spread
this attentional priority to the item with which it is grouped.
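The weight-spreading account sketched above can be made concrete in a toy simulation (our illustrative sketch, not a model from the chapter; the lesion factor, report threshold, and equal-sharing rule are all assumptions): the lesion attenuates the weight of contralesional items, items compete for a share of the total priority, and grouping pools weights across the group so that the contralesional item inherits priority from its ipsilesional partner.

```python
# Toy sketch of extinction and its recovery through grouping
# (illustrative assumptions throughout: lesion factor, report
# threshold, and the equal-sharing rule for grouped items).

LESION_FACTOR = 0.3      # attenuation of contralesional weights (assumed)
REPORT_THRESHOLD = 0.35  # minimum share of total priority for report

def weights(items):
    """items: dict name -> side ('contra' or 'ipsi'); unit base weights."""
    return {name: (LESION_FACTOR if side == "contra" else 1.0)
            for name, side in items.items()}

def reported(w, grouped=False):
    """Items compete for a share of total priority; grouping pools it."""
    if grouped:  # grouping spreads priority equally within the group
        mean = sum(w.values()) / len(w)
        w = {name: mean for name in w}
    total = sum(w.values())
    return {name for name, v in w.items() if v / total >= REPORT_THRESHOLD}

single = {"left_circle": "contra"}
pair = {"left_circle": "contra", "right_circle": "ipsi"}

print(reported(weights(single)))              # single item: reported
print(reported(weights(pair)))                # left item extinguished
print(reported(weights(pair), grouped=True))  # both circles reported
```

A single contralesional item wins its trivial competition and is reported; under bilateral competition its share falls below threshold; pooling the weights through grouping restores report, mirroring the recovery from extinction described above.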

3.1.2  High-level grouping


As well as there being evidence for low-level grouping in neglect and extinction, there is also evi-
dence for grouping based on higher-level perceptual properties of stimuli, where access to stored
knowledge is required.
Hemispatial neglect is attenuated for familiar words or compound word pairs compared to
meaningless strings or unrelated word pairs (Behrmann et al. 1990; Braet and Humphreys 2006;
Brunn and Farah 1991; Riddoch et al. 1990; Sieroff et al. 1988), or when two visual elements form
a meaningful whole (Seron et al. 1989). Extinction is also reduced if elements are both part of
a known shape or a familiar configuration (Kumada and Humphreys 2001; Vuilleumier 2000;
Vuilleumier and Sagiv 2001; Vuilleumier et al. 2001a; Ward et al. 1994), or if there are associa-
tive relations between separate words (Coslett and Saffran 1991). For example, Ward et al. (1994)
found recovery from extinction when two symbolic stimuli formed a familiar configuration (e.g.
an arrow <-) relative to an unfamiliar configuration (e.g. V-). Similarly, patients with extinction
are better at identifying left-sided letters in words than in non-words (Kumada and Humphreys
2001). Interestingly, Kumada and Humphreys reported that word-level grouping between letters
over-rode the effects of a failure to group on the basis of low-level similarity relations. These
authors reported that having two letters with opposite contrast polarities (one white, one black,
against a grey background) disrupted report when the letters formed a nonword, but there was
recovery of the contralesional letter irrespective of the contrast polarity when the letters formed
a word.
Hence, when participants are presented with pairs of objects that do not group on the basis
of low-level Gestalt factors, extinction can still be modulated by the relationship between the
stimuli. This argument is also supported by evidence that visual extinction is reduced when there
is an action relation between the contra- and ipsilesional objects. When stimuli are positioned
where they appear to be engaged in a common action (e.g., a bottle pouring into a glass), patients
show less extinction than when the objects are depicted in locations where they could not be
used together (e.g., bottle pouring underneath a glass; Riddoch et al. 2010; Riddoch et al. 2006;
Riddoch et al. 2002). Several factors appear to contribute to this result. The effect is stronger when
objects are used frequently together, and are correctly positioned for the action (Riddoch et al.
2006), but it is also eliminated if the objects are inverted (Riddoch et al. 2011). Such results suggest
that the familiarity of the action as it is standardly seen (with objects in their usual orientation
for the interaction) is important for grouping the objects for selection. Riddoch et al. (2010) addi-
tionally suggest that it is the implied motion from one object to another which links the objects
together so they are encoded as a single perceptual unit.

3.1.3  When perceptual grouping is disruptive for patients with attention deficits
Whereas grouping has a beneficial effect on the report of contralesional items in patients when
there is a meaningful relationship between the contra- and ipsilesional items, it may negatively
affect the ability to name the left-side item in some cases. For example, within the syndrome of
neglect it is possible to distinguish between patients who show a deficit to stimuli on one side
of space in relation to the body, and patients whose deficits reflect the position of parts within
an object (so-called egocentric versus allocentric neglect; see Chechlacz et al. 2010; Humphreys
and Riddoch 1994; Verdon et al. 2010). Positive effects of grouping on the perceptual report of
neglected stimuli may be evident in egocentric neglect, where the coding of elements within a
group reduces the egocentric attentional bias. However, grouping may be disruptive for patients
with allocentric neglect (Buxbaum and Coslett 1994; Humphreys and Heinke 1998; Tian et al.
2011; Young et al. 1992). For example, Young et al. (1992) reported the case of a patient able to
report two images of the left half of different faces but who showed a lack of awareness for the left
half of a chimeric face formed by linking the left and right sides of two faces. In this case, grouping
the left and right sides of a face induced neglect, presumably because there was biased allocation
of attention to an object-based representation of the stimulus. In some models (e.g., Heinke and
Humphreys 2003), the setting of attentional weights within an object-centered representation can
be separated from setting attentional weights within a spatial priority map for separate objects.
The reference frame is indeed important when making predictions about the effect of grouping
in patients with spatial attention disorders (Behrmann and Tipper 1994; Tipper and Behrmann
1996). Behrmann and Tipper presented neglect patients with two circles to the left and the right
of the midline, one colored red and the other blue. When grouping the circles by a connecting line
induced an object-centered reference frame, and the object was rotated by 180 degrees, patients
ignored the ipsilesional item (contralesional side of the object) and reported the contralesional
item (ipsilesional side of the object).
The distinction between egocentric and allocentric neglect also links onto the presence of
respectively more anterior and posterior brain lesions, and more dorsal versus ventral lesions
within posterior parietal cortex (Chechlacz et al. 2010; Verdon et al. 2010). Beneficial effects of
grouping may reflect spared ventral coding in patients with egocentric neglect and more dorsal
lesions, while more ventral lesions may impact on spatial coding within allocentric representations.

3.1.4  Neural basis


At which level of representation does perceptual organization influence the distribution of atten-
tional weights? The evidence cited above clearly demonstrated that perceptual grouping can influ-
ence the distribution of attentional weights, despite structural or functional damage to the dorsal
attention network. In contrast, lesions of the ventral visual stream, such as the lateral occipital
complex, are associated with agnosia, an impairment of object recognition that cannot be attributed to
visual loss (see chapter by Behrmann and colleagues, this volume, for a discussion of prosopag-
nosia, an impairment of face recognition). In the case of apperceptive agnosia, the percept of the
object is not fully constructed—hence these patients may have deficits in perceptual grouping.
Double dissociations can indeed be found. In contrast to neglect (Schindler et al. 2009), patients
with agnosia can normally orient their attention to the contralesional visual field, but their allo-
cation of attention is not influenced by objects (de-Wit et al. 2009; Vecera and Behrmann 1997).
We conclude that perceptual organization can influence the distribution of attentional weights
through representation in the ventral visual stream rather than in the parietal cortex. Nevertheless,
the setting of spatial attentional weights can be dissociated from such ventral input, in cases of
agnosia (de-Wit et al. 2009; Vecera and Behrmann 1997).

3.2  Evidence from healthy volunteers


Reminiscent of the beneficial effects of grouping in neuropsychological cases, responses from nor-
mal participants to multiple targets are facilitated when the targets group on the basis of Gestalt
cues (Behrmann et al. 1998; Duncan 1984; Lavie and Driver 1996; Vecera and Farah 1994), or
when the objects are positioned for action (Roberts and Humphreys 2011). In selective attention
tasks, however, the grouping of targets and distractors can disrupt performance. For example,
target-distracter grouping by low-level factors such as color similarity, connectedness, common
motion, continuation (Baylis and Driver 1992; Driver and Baylis 1989; Harms and Bundesen 1983;
Kahneman and Henik 1981; Kramer and Jacobson 1991), or high-level factors such as familiarity
(Green and Hummel 2006), increases the level of interference by the distracter. Similarly, the abil-
ity to keep track of independently moving targets in multiple-object tracking tasks (Pylyshyn and
Storm 1988) is impaired when the targets are merged to form objects with distracters, for example
by connectedness (Howe et al. 2012; Scholl et al. 2001).
Egly et al. (1994) provided further evidence suggesting that attention is allocated to perceptual
groups. In their study, participants were presented with two rectangles. Attention was briefly cued
to one end of one of the rectangles, and participants were asked to detect a target presented either
on a validly or on an invalidly cued location. On invalid trials, reaction times were faster when
the target appeared within the same rectangle that was cued than when it appeared at an equal
distance from the cue but in a different rectangle, suggesting that a spread of attention within an object can facilitate selection. The same-object advantage also applies to objects that require perceptual completion due to
occlusion and objects formed from subjective contours (Moore et al. 1998) or contour alignment
(Norman et al. 2013). Interestingly, relevant to our understanding about the relations between
attention and awareness, the same-object advantage occurs even when participants are unaware
of these objects (Norman et al. 2013). In the study by Norman and colleagues (2013), the objects
were rendered invisible to the participants: texture elements in the objects had an orientation contrast of 90 degrees relative to the elements in the background, and when the texture elements both inside and outside the object boundaries were continually reversed at a high frequency, participants were unaware of the objects. Nevertheless, participants were faster in discriminating the target’s color when the cue and the target appeared within the same object relative to
when they appeared in different objects. Hence, similarly to the neuropsychological evidence, the
data suggest that perceptual grouping can operate without attention and awareness.
Converging evidence for enhanced processing of unattended stimuli that group with attended stimuli comes from functional magnetic resonance imaging (fMRI) and event-related potential (ERP) studies: relevant and irrelevant elements that group through an illusory contour elicit a very
744 Gillebert and Humphreys

similar response pattern in visual cortex (Martinez et al. 2007; Martinez et al. 2006) and there is neural
activation of unattended items if they share a featural property with an attended item (Saenz et al. 2002).
These studies suggest that attention has a tendency to spread throughout perceptual groups
(Richard et al. 2008). In other words, attending to one element of a perceptual group can cause attention to spread to other elements of the same group, thereby enhancing the sensory representation of these elements. Conversely, grouping between distracter elements can
facilitate visual search because distracters can be rejected together—a process termed spreading
suppression (e.g., Dent et al. 2011; Donnelly et al. 1991; Duncan and Humphreys 1989; Gilchrist
et al. 1997; Humphreys et al. 1989). Hence, the outcome of perceptual grouping constrains visual
attention.
Not only can attention spread throughout a perceptual group, but a good perceptual group can also in itself capture attention (Humphreys and Riddoch 2003; Humphreys et al. 1994; Kimchi et al.
2007; Yeshurun et al. 2009). Kimchi and colleagues (2007) presented participants with displays
containing eight distracters and a target defined by its location relative to a cue. On some trials,
a subset of the elements grouped to form a diamond based on the Gestalt principle of collinear-
ity. Compared to the condition when no perceptual group was present in the display, reaction
times to the target were shorter when the cue appeared within the perceptual group and longer
when the cue occurred outside the perceptual group (Kimchi et al. 2007). Similarly, given two
stimuli, simultanagnosic patients tend to perceive the stimulus whose parts group more strongly
(Humphreys et al. 1994), even when the strong group is less complex than the competing weak
group (Humphreys and Riddoch 2003). Furthermore, Humphreys and Riddoch (2003) showed
that attention is drawn to the location of the strong group, facilitating the identification of a sub-
sequently presented letter in that location.

4  Perceptual grouping can operate without selection by attention
According to many theories of attention, fundamental visual processes, such as figure-ground
segmentation and perceptual grouping, are fully pre-attentive: they occur automatically, with-
out attention, effort or “scrutiny” (Julesz 1981; Marr 1982; Neisser 1967; Treisman 1982). This
view has drawn support from behavioral experiments in normal participants, such as visual
search, showing that reaction times increase as a function of the number of distracter groups
rather than individual distracter elements (Treisman 1982). An opposing account suggests that
little, if any, perceptual organization can occur in the absence of attention: perceptual organi-
zation cannot proceed without attention being allocated to the location where organization is
computed (Ben-Av et al. 1992), or, in other words, without the attentional priority of that loca-
tion being high.
Support for the latter view can be derived from dual-task experiments, where observers
are unable to explicitly report perceptual groups whilst attention is concurrently engaged in a
demanding task not involving the groups (Ben-Av et al. 1992). Mack, Rock, and their colleagues
(Mack et al. 1992; Rock et al. 1992) developed the “inattention paradigm” to determine whether
perceptual grouping can occur not only in the absence of attention to the constituent elements,
but also when there is not even the intention to perceive the elements. Participants were presented
with a task-relevant cross in the center of the screen, along with a task-irrelevant Gestalt group-
ing display in the background (Figure 36.2a). The task was to determine whether the vertical or
horizontal line of the cross was longer. The basic finding, replicated in several studies (Mack and
Rock 1998), was that the observers were unable to report anything about how the elements in the
background grouped, when surprise questions were given retrospectively.


Fig. 36.2  Perceptual grouping without awareness or attention. (a) Example of a display used in the
“inattention paradigm” developed by Mack et al. (1992). Participants were to judge which of the two
arms of the cross was longer. The elements in the background could be grouped by color similarity.
Participants were asked surprise questions about the background grouping. (b,c) Example of a type of
display used by Moore and Egeth (1997). Participants were to judge which of two horizontal lines was
longer, while dots in the background formed displays such as in the Ponzo (b) or Müller-Lyer illusion (c).
Line judgments were influenced by the illusions.
Data from A. Mack, B. Tang, R. Tuma, S. Kahn, and I. Rock, Perceptual organization and attention, Cognitive
Psychology, 24(4), pp. 475–501, 1992.

However, the inability to explicitly report grouping, i.e. not being aware of it, when attention
is engaged in a concurrent demanding task does not necessarily imply that perceptual grouping
in itself requires attention. In studies of patients with blindsight, and also in normal observers
with stimuli presented under masking conditions, there can be enhanced perceptual processing of
stimuli that the observer is unaware of, indicating that attention to the location of an object does
not necessarily imply awareness of that object (Kentridge et al. 1999); awareness can be dissoci-
ated from attention. In addition, limited explicit report/awareness of a stimulus may, for example,
also reflect poor encoding of the item into memory. To counteract this criticism, Moore and Egeth
(1997) used an implicit measure of perceptual grouping: observers were to judge the length of
line segments, presented along with background elements that were entirely task-irrelevant. The
background elements were arranged so that, if perceptually grouped, they could induce optical
illusions, such as the Ponzo illusion (Figure 36.2b) or the Müller-Lyer illusion (Figure 36.2c).
Although observers appeared unaware of the background elements when retrospectively questioned, the arrangement of the elements clearly modulated line length judgments. For example, when
the background pattern could induce the Ponzo illusion (Figure 36.2b), the line that was closer
to the converging end of the background pattern was judged to be longer than the line that was
further away from the converging end. This suggests that perceptual grouping can occur with-
out attention. Several other studies in healthy volunteers and patients with hemispatial neglect
support these findings (Chan and Chua 2003; Kimchi and Razpurker-Apfeld 2004; Lamy et al.
2006; Russell and Driver 2005; Shomstein et al. 2010). For example, Shomstein and colleagues
(2010) investigated whether perceptual grouping in the poorly attended (contralesional) visual
field of neglect patients affected performance on stimuli presented in the intact (ipsilesional)
visual field. To assess this, they adapted a paradigm developed by Russell and Driver (2005): they
asked patients with hemispatial neglect to perform a change detection task on complex target
stimuli, successively presented to the ipsilesional hemifield (Figure 36.3a). At the same time, irrel-
evant distracter elements appeared in the contralesional hemifield, either changing or retaining
their perceptual grouping on successive displays. Changes in perceptual grouping of the con-
tralesional distracters produced congruency effects on the attended (ipsilesional) target-change

(a) Effect of irrelevant grouping in the contralesional hemifield on change detection in the ipsilesional hemifield. (b) Effect of irrelevant background grouping on change detection at the fovea: grouping of columns/rows by color similarity, grouping of shape by homogeneous elements, or grouping of shape by color similarity.
Fig. 36.3  Perceptual grouping without attention in neglect and healthy volunteers. (a) Example of the
change detection paradigm used by Shomstein et al. (2010). Participants were asked to judge whether
successively presented checkerboards in the ipsilesional hemifield were the same or different, while the
grouping in the contralesional hemifield was manipulated independently. (b) Example of displays used in
a similar change detection task by Kimchi and Razpurker–Apfeld (2004). The elements in the background
were grouped into columns/rows by similarity, into a shape, or into a shape by color similarity.
Data from C. Moore and H. Egeth, Perception without attention: Evidence of grouping under conditions of
inattention, Journal of Experimental Psychology. Human Perception and Performance, 23(2), pp. 339–52, 1997.

judgment—for example, the time taken to decide that two ipsilesional stimuli differed was speeded
if the grouping relations in the contralesional field changed. The magnitude of the effect was the
same in neglect patients and control participants. Again it appears that perceptual grouping can
take place in the absence of attention allocated to the elements forming the perceptual grouping.
There is converging evidence too from patients with simultanagnosia. Even though normal
participants can show a bias to global hierarchical shapes, rather than to their local constituents
(Navon 1977) (see Figure 36.4a) (see chapter by Kimchi, this volume, for a detailed analysis of the
processing of hierarchical figures), simultanagnosic patients tend to show a local bias—they may recognize the local elements whilst being poor at explicitly reporting the global shape (Huberle and
Karnath 2006; Karnath et al. 2000). However, the same patients can be faster at naming the local
letters when their identity is congruent with the global letter compared to when it is incongruent.
These congruency effects again suggest that, even if the global shape is not available for explicit
report, grouping based on proximity of local elements can still occur in simultanagnosic patients.
In line bisection tasks, patients with hemispatial neglect have to indicate the midpoint of a
horizontal line presented on a piece of paper in front of them. Deviation of the estimated mid-
point towards the side of brain damage is typically regarded as being indicative of hemispatial
neglect. Vuilleumier and colleagues (Vuilleumier and Landis 1998; Vuilleumier et  al. 2001b)

(a) Local bias in simultanagnosia affected by congruency between local and global shape. (b) Spatial bias of neglect in line bisection task, also present with illusory contours.
Fig. 36.4  Implicit perceptual grouping in simultanagnosia and neglect. (a) Patients with simultanagnosia are typically poor at explicitly reporting the global shape in hierarchical letters, but are faster at identifying the local shapes when these are congruent with the global shape. (b) In line bisection tasks, the midpoint indicated by patients with neglect typically deviates towards the side of brain damage, even when bisecting an illusory contour.
Adapted from Neuropsychologia, 39(6), Patrik Vuilleumier, Nathalie Valenza, and Theodor Landis, Explicit and
implicit perception of illusory contours in unilateral spatial neglect: behavioural and anatomical correlates of
preattentive grouping mechanisms, pp. 597–610, doi: 10.1016/S0028-3932(00)00148-2 Copyright © 2001, with
permission from Elsevier.

used Kanizsa-type illusory figures to examine whether patients with neglect would also deviate
from the midpoint when marking the midpoint of illusory contours rather than real contours
(Figure 36.4b). Bisection judgments in neglect patients were similar on Kanizsa stimuli with
illusory contours and connected stimuli with real contours, even though the patients could not
detect the contralateral inducers explicitly. These results suggest that neglect patients can implic-
itly group inducing elements prior to the stage where the attentional bias towards the ipsilesional
side of space arises. Interestingly, patients with lesions extending posteriorly to the lateral occipital
complex did not show this systematic bisection pattern, suggesting that implicit grouping may
depend on the integrity of lateral occipital areas (Vuilleumier et al. 2001b).
Other evidence that perceptual grouping can occur without observers paying attention to
the constituent elements comes from fMRI studies in healthy volunteers. One line of work has
exploited the visual suppression that occurs between simultaneously presented, proximal visual
elements. These competitive interactions appear to occur automatically, without attention, in
early visual cortex (Kastner et al. 1998; Reynolds et al. 1999). McMains and Kastner (2010)
assessed whether the level of competitive interaction induced by task-irrelevant elements varied
as a function of the strength of perceptual grouping between the elements. They found that
competitive interactions in early visual cortex and V4 were reduced when the elements could
be grouped on the basis of the Gestalt principles of collinearity, proximity, or illusory contour

formation compared to when the same stimuli could not be grouped, even if these elements
were task-irrelevant and observers performed a concurrent demanding task (McMains and
Kastner 2010).
Whether or not perceptual grouping requires attentive resources may, however, also depend on
the type of perceptual grouping involved (Han et al. 1999; Han et al. 2001; Han et al. 2002; Kimchi
and Razpurker-Apfeld 2004). Kimchi and Razpurker-Apfeld (2004) used Russell and Driver’s (2005) paradigm to study different forms of grouping under inattention. On each trial, participants were presented with two successive displays, each containing a central target matrix surrounded by task-irrelevant grouped background elements, and participants performed a demanding change
detection task on the target matrix. Grouping between the background elements stayed the same
or changed across successive displays, independent of any change in the target matrix. Grouping
of columns/rows by color similarity and grouping of shape by homogeneous elements affected
performance on the central change detection task (Figure 36.3b). Grouping of shape by color sim-
ilarity, however, did not result in congruency effects, suggesting that the latter form of grouping is
contingent upon the availability of (sufficient) attentional resources. Whether or not attention is necessary for grouping to occur may not be an all-or-none phenomenon. Kimchi and colleagues
(Kimchi and Peterson 2008; Kimchi and Razpurker-Apfeld 2004) proposed that a continuum of
attentional requirements exists as a function of the processes involved in different types of group-
ing. According to this view, grouping of shape by color similarity may be a weaker form of group-
ing requiring more attentional resources.
Other support for attention playing a necessary role in grouping comes from both brain imaging and neuropsychological studies, which indicate that damage to posterior parietal cortex, a brain region implicated in attentional control, disrupts grouping (e.g., Zaretskaya et al. 2013). Global pattern coding, for which local integration processes are not sufficient, also seems to depend on the integrity of brain areas controlling attention, such as the intraparietal
cortex. Lestou et al. (2014) observed reduced activity to global radial and concentric Glass pat-
terns in structurally preserved intermediate regions such as the lateral occipital complex, after
lesions of the intraparietal cortex. This suggests that the intraparietal cortex plays a critical role in
modulating grouping in regions such as the lateral occipital cortex, which are typically thought
to respond to perceptual groups. Furthermore, perceptual grouping may not be as efficient in neglect patients as in healthy volunteers. Han and Humphreys (2007) examined
the role of the frontoparietal cortex in top-down modulation of perceptual grouping by recording
ERPs from two patients with frontoparietal lesions and eight controls. In controls, grouping by
proximity and collinearity was indexed by short-latency activity over the medial occipital cortex
and long-latency activity over the occipitoparietal areas. For the patients, however, both the short-
and long-latency activities were eliminated or weakened.
We can conclude from the above studies that some types of perceptual grouping can occur with-
out focused attention, although attentive resources appear to be necessary for the outputs of these
grouping processes to be accessible for explicit report. In contrast, other forms of grouping cannot
be accomplished optimally without focused attention (see also Kimchi 2009). Additional research
is needed to investigate in more detail which forms of grouping require attentional resources.

5  Attention constrains perceptual grouping


Several studies indicate that attention can modulate neural activity associated with grouping in
early visual cortex (e.g., Casco et al. 2005; Freeman et al. 2003; Freeman et al. 2001; Khoe et al.
2006; Wu et al. 2005). Freeman et al. (2001) showed that contrast thresholds for a central Gabor

stimulus are lower when it is flanked by collinear, oriented grating stimuli, but only when the
flankers are attended. In a subsequent study, Freeman and colleagues (2003) showed that the
attentional modulation persists even for high flanker contrasts, suggesting that attention acts on the integration of the local elements into a global form, rather than by changing the local sensitivity to the flankers themselves. Goldsmith and Yeari (2003) demonstrated that effects of grouping
are found under conditions of divided attention—allowing attention to spread across the vis-
ual field—but that grouping effects are reduced under conditions of focused attention. Effects of
attention have also been observed for higher-level types of grouping. For example, Roberts and
Humphreys (2011) showed that the benefit of positioning pairs of objects for action is reduced by
cueing attention towards one of the objects. Converging evidence has been obtained using fMRI
(Han et al. 2005a) and ERP techniques (Han et al. 2005b) by Han and colleagues showing that
proximity grouping is modulated by whether stimuli fall within an attended region. Furthermore,
de Haan and Rorden (2010) showed that similarity grouping can be modulated by whether or not
the grouping mechanism is relevant for the task.
Other studies (McMains and Kastner 2011) hypothesized that attentional modulation of corti-
cal activity may vary as a function of the degree of perceptual grouping in the display. Participants
were presented either with a strong perceptual group (i.e. an illusory shape), a weak perceptual
group (i.e. an illusory shape with ill-defined borders), or no perceptual group. McMains and
Kastner observed that the amount of attentional modulation of competitive interactions in
early visual cortex depended on the degree of competition left unresolved by bottom-up pro-
cesses: attentional modulation was greatest for displays without perceptual groups—when neural
competition was little influenced by bottom-up mechanisms—and smallest, although still signifi-
cantly present, for displays containing a strong perceptual group. However, when observers paid
attention to the elements forming the perceptual group, competitive interactions were similar
for all levels of perceptual grouping, suggesting that bottom-up and top-down processes interact
dynamically to maximally resolve neural competition.

6  Discussion and framework


The results we have reviewed, drawn from behavioral and neuroimaging studies with both normal observers and neuropsychological patients, are consistent with the view that, whilst not being necessary for at least some forms of perceptual grouping, visual attention can nevertheless modulate
grouping. The modulation effects are stronger on some forms of grouping than others, and atten-
tion seems necessary in order for explicit report and awareness of the perceptual groups to take
place.
One framework to account for the array of data is that offered by TVA (Bundesen 1990). TVA
suggests that selection is directed by an attentional priority map that can be affected both by
bottom-up cues (e.g., the strength of local Gestalt grouping between proximal elements, the
“goodness” of the perceptual object) and top-down factors (e.g., stored knowledge about how
objects interact, or stored knowledge about words). Strong bottom-up grouping could pull atten-
tional priority to stimuli, enabling selection to be captured by the group. In addition, strong
top-down knowledge could push attentional prioritization to matching stimulus elements
(see also Humphreys and Riddoch 1993). Importantly, these “push and pull” operations may still
operate even if the attentional priority map is damaged or operating under conditions of noise
due to brain lesion. Our conclusion is that attentional selection is dynamically set by bottom-up
stimulus factors, top-down knowledge and the allocation of attention to space and within grouped
regions of objects.
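The "push and pull" on the priority map can be made concrete with TVA's two core equations. The sketch below follows Bundesen's (1990) weight equation, w_x = Σ_j η(x, j)π_j, and rate equation, v(x, i) = η(x, i)β_i·w_x/Σ_z w_z; the element names and numerical values are invented purely for illustration and are not drawn from any study reviewed here.

```python
# Illustrative sketch of Bundesen's (1990) Theory of Visual Attention (TVA).
# The weight and rate equations are TVA's; the stimulus values are hypothetical.

def attentional_weight(eta_x, pertinence):
    """Weight equation: w_x = sum_j eta(x, j) * pi_j."""
    return sum(eta_x[j] * pertinence[j] for j in pertinence)

def processing_rate(eta, beta, weights, x, i):
    """Rate equation: v(x, i) = eta(x, i) * beta_i * w_x / sum_z w_z."""
    return eta[x][i] * beta[i] * weights[x] / sum(weights.values())

# Sensory evidence eta(x, j) for two elements: one belonging to a strong
# perceptual group (bottom-up "pull") and one isolated distracter.
eta = {
    "grouped":  {"group": 0.9, "isolated": 0.1},
    "isolated": {"group": 0.1, "isolated": 0.9},
}
pertinence = {"group": 1.0, "isolated": 0.2}  # top-down "push" favors the group
beta = {"group": 1.0, "isolated": 1.0}        # response biases

weights = {x: attentional_weight(eta[x], pertinence) for x in eta}
# The grouped element draws the larger attentional weight, and hence the
# larger share of the limited processing capacity.
print(weights)
print(processing_rate(eta, beta, weights, "grouped", "group"))
```

Because the rate equation normalizes each weight by the summed weights, strengthening the grouped element's weight necessarily slows the processing of its competitors, capturing the competitive flavor of the framework.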

Acknowledgements
We would like to thank Lee de-Wit and one anonymous reviewer for their valuable feedback on
this chapter. Preparation of this work was supported by an ERC Advanced Investigator Award to
GWH and a Sir Henry Wellcome Fellowship to CRG (grant number 098771/Z/12/Z).

References
Bálint, R. (1909). Seelenlähmung des “Schauens,” optische Ataxie, räumliche Störung der Aufmerksamkeit.
Monatschrift für Psychiatrie und Neurologie 25: 51–81.
Baylis, G. and Driver, J. (1992). Visual parsing and response competition: The effect of grouping factors.
Perception & Psychophysics 51(2): 145–62.
Baylis, G., Driver, J., and Rafal, R. (1993). Visual extinction and stimulus repetition. Journal of Cognitive
Neuroscience 5(4): 453–66.
Bays, P., Singh-Curry, V., Gorgoraptis, N., Driver, J., and Husain, M. (2010). Integration of goal- and
stimulus-related visual signals revealed by damage to human parietal cortex. The Journal of Neuroscience
30(17): 5968–78.
Behrmann, M. and Tipper, S. P. (1994). Object-based attentional mechanisms: Evidence from patients with
unilateral neglect. In: C. Umilta and M. Moscovitch (eds.), Attention and Performance XV: Conscious
and Nonconscious Processing and Cognitive Functioning, pp. 351–75. Cambridge: MIT Press.
Behrmann, M., Moscovitch, M., Black, S., and Mozer, M. (1990). Perceptual and conceptual mechanisms
in neglect dyslexia: Two contrasting case studies. Brain 113(4): 1163–83.
Behrmann, M., Zemel, R., and Mozer, M. (1998). Object-based attention and occlusion: Evidence from
normal participants and a computational model. Journal of Experimental Psychology. Human Perception
and Performance 24(4): 1011–36.
Ben-Av, M., Sagi, D., and Braun, J. (1992). Visual attention and perceptual grouping. Perception &
Psychophysics 52(3): 277–94.
Bender, M. B. (1952). Disorders in Perception. Springfield: Thomas Publisher.
Berti, A., Allport, A., Driver, J., Dienes, Z., Oxbury, J., and Oxbury, S. (1992). Levels of processing for
visual stimuli in an “extinguished” field. Neuropsychologia 30(5): 403–15.
Bisley, J. and Goldberg, M. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of
Neuroscience 33: 1–21.
Boutsen, L. and Humphreys, G. (2000). Axis-based grouping reduces visual extinction. Neuropsychologia
38(6): 896–905.
Braet, W. and Humphreys, G. (2006). The “special effect” of case mixing on word
identification: Neuropsychological and transcranial magnetic stimulation studies dissociating case
mixing from contrast reduction. Journal of Cognitive Neuroscience 18(10): 1666–75.
Brunn, J. and Farah, M. (1991). The relation between spatial attention and reading: Evidence from the
neglect syndrome. Cognitive Neuropsychology 8(1): 59–75.
Bundesen, C. (1990). A theory of visual attention. Psychological Review 97(4): 523–47.
Bundesen, C., Habekost, T., and Kyllingsbæk, S. (2005). A neural theory of visual attention: Bridging
cognition and neurophysiology. Psychological Review 112(2): 291–328.
Bundesen, C., Habekost, T., and Kyllingsbæk, S. (2011). A neural theory of visual attention and short-term
memory (NTVA). Neuropsychologia 49(6): 1446–57.
Buxbaum, L. J. and Coslett, H. B. (1994). Neglect of chimeric figures: Two halves are better than a whole.
Neuropsychologia 32(3): 275–88.
Casco, C., Grieco, A., Campana, G., Corvino, M., and Caputo, G. (2005). Attention modulates
psychophysical and electrophysiological response to visual texture segmentation in humans. Vision
Research 45(18): 2384–96.

Chan, W. and Chua, F. (2003). Grouping with and without attention. Psychonomic Bulletin & Review
10(4): 932–8.
Chechlacz, M., Rotshtein, P., Bickerton, W. L., Hansen, P. C., Deb, S., and Humphreys, G. W. (2010).
Separating neural correlates of allocentric and egocentric neglect: Distinct cortical sites and common
white matter disconnections. Cognitive Neuropsychology 27(3): 277–303.
Chechlacz, M., Rotshtein, P., Hansen, P. C., Riddoch, J. M., Deb, S., and Humphreys, G. W. (2012). The
neural underpinnings of simultanagnosia: Disconnecting the visuospatial attention network. Journal of
Cognitive Neuroscience 24(3): 718–35.
Chechlacz, M., Rotshtein, P., Hansen, P. C., Deb, S., Riddoch, M. J., and Humphreys, G. W. (2013). The
central role of the temporo-parietal junction and the superior longitudinal fasciculus in supporting
multi-item competition: Evidence from lesion-symptom mapping of extinction. Cortex 49(2): 487–506.
Conci, M., Böbel, E., Matthias, E., Keller, I., Müller, H., and Finke, K. (2009). Preattentive surface and
contour grouping in Kanizsa figures: Evidence from parietal extinction. Neuropsychologia 47(3): 726–32.
Corbetta, M. and Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the
brain. Nature Reviews Neuroscience 3(3): 201–15.
Coslett, H. and Saffran, E. (1991). Simultanagnosia: To see but not two see. Brain 114(4): 1523–45.
de-Wit, L. H., Kentridge, R. W., and Milner, A. D. (2009). Object-based attention and visual area LO.
Neuropsychologia 47(6): 1483–90.
de Haan, B. and Rorden, C. (2010). Similarity grouping and repetition blindness are both influenced by
attention. Frontiers in Human Neuroscience 4: 20.
Dent, K., Humphreys, G. W., and Braithwaite, J. J. (2011). Spreading suppression and the guidance of search by
movement: Evidence from negative color carry-over effects. Psychonomic Bulletin & Review 18(4): 690–6.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of
Neuroscience 18: 193–222.
Dombrowe, I., Donk, M., Wright, H., Olivers, C. N., and Humphreys, G. W. (2012). The contribution
of stimulus-driven and goal-driven mechanisms to feature-based selection in patients with spatial
attention deficits. Cognitive Neuropsychology 29(3): 249–74.
Donnelly, N., Humphreys, G. W., and Riddoch, M. J. (1991). Parallel computation of primitive shape
descriptions. Journal of Experimental Psychology. Human Perception and Performance 17(2): 561–70.
Driver, J. and Baylis, G. (1989). Movement and visual attention: The spotlight metaphor breaks down.
Journal of Experimental Psychology. Human Perception and Performance 15(3): 448–56.
Driver, J. and Halligan, P. (1991). Can visual neglect operate in object-centred co-ordinates? An affirmative
single-case study. Cognitive Neuropsychology 8(6): 475–96.
Driver, J., Mattingley, J., Rorden, C., and Davis, G. (1997). Extinction as a paradigm measure of attentional
bias and restricted capacity following brain injury. In: P. Thier and H. O. Karnath (eds.), Parietal Lobe
Contributions to Orientation in 3D Space, pp. 401–29. Heidelberg: Springer-Verlag.
Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental
Psychology. General 113(4): 501–17.
Duncan, J. and Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review
96(3): 433–58.
Duncan, J., Humphreys, G., and Ward, R. (1997). Competitive brain activity in visual attention. Current
Opinion in Neurobiology 7(2): 255–61.
Egly, R., Driver, J., and Rafal, R. (1994). Shifting visual attention between objects and locations: Evidence
from normal and parietal lesion subjects. Journal of Experimental Psychology. General 123(2): 161–77.
Freeman, E., Sagi, D., and Driver, J. (2001). Lateral interactions between targets and flankers in low-level
vision depend on attention to the flankers. Nature Neuroscience 4(10): 1032–6.
Freeman, E., Driver, J., Sagi, D., and Zhaoping, L. (2003). Top-down modulation of lateral interactions
in early vision: Does attention affect integration of the whole or just perception of the parts? Current
Biology 13(11): 985–9.
752 Gillebert and Humphreys

Geng, J. and Behrmann, M. (2006). Competition between simultaneous stimuli modulated by location
probability in hemispatial neglect. Neuropsychologia 44(7): 1050–60.
Gilchrist, I., Humphreys, G. W., and Riddoch, M. (1996). Grouping and extinction: Evidence for low-level
modulation of visual selection. Cognitive Neuropsychology 13(8): 1223–49.
Gilchrist, I., Humphreys, G. W., Riddoch, M., and Neumann, H. (1997). Luminance and edge information
in grouping: A study using visual search. Journal of Experimental Psychology. Human Perception and
Performance 23(2): 464–80.
Gillebert, C. R., Mantini, D., Thijs, V., Sunaert, S., Dupont, P., and Vandenberghe, R. (2011). Lesion
evidence for the critical role of the intraparietal sulcus in spatial attention. Brain 134: 1694–709.
Gillebert, C. R., Dyrholm, M., Vangkilde, S., Kyllingsbæk, S., Peeters, R., and Vandenberghe, R.
(2012). Attentional priorities and access to short-term memory: Parietal interactions. NeuroImage
62(3): 1551–62.
Golay, L., Schnider, A., and Ptak, R. (2008). Cortical and subcortical anatomy of chronic spatial neglect
following vascular damage. Behavioral and Brain Functions 4: 43.
Goldsmith, M. and Yeari, M. (2003). Modulation of object-based attention by spatial focus under
endogenous and exogenous orienting. Journal of Experimental Psychology. Human Perception and
Performance 29(5): 897–918.
Green, C., and Hummel, J. (2006). Familiar interacting object pairs are perceptually grouped. Journal of
Experimental Psychology. Human Perception and Performance 32(5): 1107–19.
Han, S. and Humphreys, G. (2007). The fronto-parietal network and top-down modulation of perceptual
grouping. Neurocase 13(4): 278–89.
Han, S., Humphreys, G. W., and Chen, L. (1999). Parallel and competitive processes in hierarchical
analysis: Perceptual grouping and encoding of closure. Journal of Experimental Psychology. Human
Perception and Performance 25(5): 1411–32.
Han, S., Song, Y., Ding, Y., Yund, E., and Woods, D. (2001). Neural substrates for visual perceptual
grouping in humans. Psychophysiology 38(6): 926–35.
Han, S., Ding, Y., and Song, Y. (2002). Neural mechanisms of perceptual grouping in humans as revealed
by high density event related potentials. Neuroscience Letters 319(1): 29–32.
Han, S., Jiang, Y., Mao, L., Humphreys, G. W., and Gu, H. (2005a). Attentional modulation of perceptual
grouping in human visual cortex: Functional MRI studies. Human Brain Mapping 25(4): 424–32.
Han, S., Jiang, Y., Mao, L., Humphreys, G. W., and Qin, J. (2005b). Attentional modulation of perceptual
grouping in human visual cortex: ERP studies. Human Brain Mapping 26(3): 199–209.
Harms, L. and Bundesen, C. (1983). Color segregation and selective attention in a nonsearch task.
Perception & Psychophysics 33(1): 11–19.
Heilman, K., Watson, R., and Valenstein, E. (1993). Neglect and related disorders. In: K. Heilman and
E. Valenstein (eds.), Clinical Neuropsychology, pp. 279–336. New York: Oxford University Press.
Heinke, D. and Humphreys, G. W. (2003). Attention, spatial representation, and visual neglect: Simulating
emergent attention and spatial memory in the selective attention for identification model (SAIM).
Psychological Review 110(1): 29–87.
Hillis, A. E., Newhart, M., Heidler, J., Barker, P. B., Herskovits, E. H., and Degaonkar, M. (2005).
Anatomy of spatial attention: Insights from perfusion imaging and hemispatial neglect in acute stroke.
The Journal of Neuroscience 25(12): 3161–7.
Howe, P., Incledon, N., and Little, D. (2012). Can attention be confined to just part of a moving object?
Revisiting target-distractor merging in multiple object tracking. PloS One 7(7): e41491.
Huberle, E. and Karnath, H. (2006). Global shape recognition is modulated by the spatial distance of local
elements—Evidence from simultanagnosia. Neuropsychologia 44: 905–11.
Humphreys, G. W. (1998). Neural representation of objects in space: A dual coding account. Philosophical
Transactions of the Royal Society B: Biological Sciences 353(1373): 1341–51.
Mutual interplay between perceptual organization and attention 753

Humphreys, G. W. and Heinke, D. (1998). Spatial representation and selection in the brain: Neuropsychological and computational constraints. Visual Cognition 5(1–2): 9–47.
Humphreys, G. W. and Riddoch, M. (1993). Interactions between object and space systems revealed
through neuropsychology. In: D. E. Meyer and S. Kornblum (eds.), Attention and Performance
XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience,
pp. 143–162. Cambridge: MIT Press.
Humphreys, G. W. and Riddoch, M. J. (1994). Attention to within-object and between-object spatial
representations: Multiple sites for visual selection. Cognitive Neuropsychology 11(2): 207–41.
Humphreys, G. W. and Riddoch, M. (2003). From what to where: Neuropsychological evidence for implicit
interactions between object- and space-based attention. Psychological Science 14(5): 487–92.
Humphreys, G. W., Quinlan, P. T., and Riddoch, M. J. (1989). Grouping processes in visual
search: Effects with single- and combined-feature targets. Journal of Experimental Psychology. General
118(3): 258–79.
Humphreys, G. W., Romani, C., Olson, A., Riddoch, M., and Duncan, J. (1994). Non-spatial extinction
following lesions of the parietal lobe in humans. Nature 372(6504): 357–9.
Humphreys, G. W., Bickerton, W.-L., Samson, D., and Riddoch, M. (2012). Birmingham Cognitive Screen
(BCoS). Hove: Psychology Press.
Husain, M. and Kennard, C. (1997). Distractor-dependent frontal neglect. Neuropsychologia 35(6): 829–41.
Ipata, A., Gee, A., Bisley, J., and Goldberg, M. (2009). Neurons in the lateral intraparietal area create a
priority map by the combination of disparate signals. Experimental Brain Research 192(3): 479–88.
Itti, L. and Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual
attention. Vision Research 40(10–12): 1489–506.
Julesz, B. (1981). Textons, the elements of texture perception, and their interactions. Nature
290(5802): 91–7.
Kahneman, D. and Henik, A. (1981). Perceptual organization and attention. In: M. Kubovy and
J. R. Pomerantz (eds.), Perceptual Organization, pp. 181–211. Hillsdale: Erlbaum.
Karnath, H. O. (1994). Subjective body orientation in neglect and the interactive contribution of neck
muscle proprioception and vestibular stimulation. Brain 117: 1001–12.
Karnath, H. O., and Rorden, C. (2012). The anatomy of spatial neglect. Neuropsychologia 50(6): 1010–17.
Karnath, H. O., Ferber, S., Rorden, C., and Driver, J. (2000). The fate of global information in dorsal
simultanagnosia. Neurocase 6: 295–305.
Kastner, S., De Weerd, P., Desimone, R., and Ungerleider, L. G. (1998). Mechanisms of directed attention
in the human extrastriate cortex as revealed by functional MRI. Science 282(5386): 108–11.
Kentridge, R. W. (2011). Attention without awareness: A brief review. In: C. Mole, D. Smithies, and
W. Wu (eds.), Attention: Philosophical and Psychological Essays, pp. 228–46. Oxford: Oxford University Press.
Kentridge, R. W., Heywood, C. A., and Weiskrantz, L. (1999). Attention without awareness in blindsight.
Proceedings of the Royal Society of London. Series B. 266(1430): 1805–11.
Khoe, W., Freeman, E., Woldorff, M., and Mangun, G. (2006). Interactions between attention and
perceptual grouping in human visual cortex. Brain Research 1078(1): 101–11.
Kimchi, R. (2009). Perceptual organization and visual attention. Progress in Brain Research 176: 15–33.
Kimchi, R. and Peterson, M. A. (2008). Figure-ground segmentation can occur without attention.
Psychological Science 19(7): 660–8.
Kimchi, R. and Razpurker-Apfeld, I. (2004). Perceptual grouping and attention: Not all groupings are
equal. Psychonomic Bulletin & Review 11(4): 687–96.
Kimchi, R., Yeshurun, Y., and Cohen-Savransky, A. (2007). Automatic, stimulus-driven attentional capture
by objecthood. Psychonomic Bulletin & Review 14(1): 166–72.
Koch, C. and Tsuchiya, N. (2007). Attention and consciousness: Two distinct brain processes. Trends in
Cognitive Sciences 11(1): 16–22.
Koch, C. and Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry.
Human Neurobiology 4(4): 219–27.
Kramer, A. and Jacobson, A. (1991). Perceptual organization and focused attention: the role of objects and
proximity in visual processing. Perception & Psychophysics 50(3): 267–84.
Kumada, T. and Humphreys, G. (2001). Lexical recovery from extinction: Interactions between visual form
and stored knowledge modulate visual selection. Cognitive Neuropsychology 18(5): 465–78.
Lamy, D., Segal, H., and Ruderman, L. (2006). Grouping does not require attention. Perception &
Psychophysics 68(1): 17–31.
Lavie, N. and Driver, J. (1996). On the spatial extent of attention in object-based visual selection. Perception
& Psychophysics 58(8): 1238–51.
Lestou, V., Lam, J. M., Humphreys, K., Kourtzi, Z., and Humphreys, G. W. (2014). A dorsal visual route necessary for global form perception: evidence from neuropsychological fMRI. Journal of Cognitive Neuroscience 26(3): 621–34.
Luria, A. (1959). Disorders of “simultaneous perception” in a case of bilateral occipitoparietal brain injury.
Brain 82: 437–49.
Mack, A. and Rock, I. (1998). Inattentional Blindness. Cambridge: MIT Press.
Mack, A., Tang, B., Tuma, R., Kahn, S., and Rock, I. (1992). Perceptual organization and attention.
Cognitive Psychology 24(4): 475–501.
Malhotra, P. A., Soto, D., Li, K., and Russell, C. (2013). Reward modulates spatial neglect. Journal of
Neurology Neurosurgery and Psychiatry 84(4): 366–9.
Marr, D. (1982). Vision. San Francisco: W. H. Freeman and Co.
Martinez, A., Teder-Salejarvi, W., and Hillyard, S. A. (2007). Spatial attention facilitates selection of
illusory objects: evidence from event-related brain potentials. Brain Research 1139: 143–52.
Martinez, A., Teder-Salejarvi, W., Vazquez, M., Molholm, S., Foxe, J. J., Javitt, D. C., et al. (2006). Objects
are highlighted by spatial attention. Journal of Cognitive Neuroscience 18(2): 298–310.
Mattingley, J., Davis, G., and Driver, J. (1997). Preattentive filling-in of visual surfaces in parietal
extinction. Science 275(5300): 671–4.
Mavritsaki, E., Heinke, D., Allen, H., Deco, G., and Humphreys, G. W. (2011). Bridging the gap between
physiology and behavior: evidence from the sSoTS model of human visual attention. Psychological
Review 118(1): 3–41.
McMains, S. and Kastner, S. (2010). Defining the units of competition: Influences of perceptual
organization on competitive interactions in human visual cortex. Journal of Cognitive Neuroscience
22(11): 2417–26.
McMains, S. and Kastner, S. (2011). Interactions of top-down and bottom-up mechanisms in human visual
cortex. The Journal of Neuroscience 31(2): 587–97.
Mesulam, M. M. (2000). Attentional networks, confusional states, and neglect syndromes. In: M.
M. Mesulam (ed.), Principles of Behavioral and Cognitive Neurology, 2nd edn., pp. 174–256.
New York: Oxford University Press.
Moore, C. and Egeth, H. (1997). Perception without attention: Evidence of grouping under conditions of
inattention. Journal of Experimental Psychology. Human Perception and Performance 23(2): 339–52.
Moore, C., Yantis, S., and Vaughan, B. (1998). Object-based visual selection: Evidence from perceptual
completion. Psychological Science 9(2): 104–10.
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive
Psychology 9(3): 353–83.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Norman, L. J., Heywood, C. A., and Kentridge, R. W. (2013). Object-based attention without awareness.
Psychological Science 24(6): 836–43.
Pavlovskaya, M., Soroker, N., and Bonneh, Y. (2007). Extinction is not a natural consequence of unilateral
spatial neglect: evidence from contrast detection experiments. Neuroscience Letters 420(3): 240–4.
Posner, M. I. (1994). Attention: The mechanisms of consciousness. Proceedings of the National Academy of
Sciences of the United States of America 91(16): 7398–403.
Prinz, J. J. (2011). Is attention necessary and sufficient for consciousness? In: C. Mole, D. Smithies, and
W. Wu (eds.), Attention: Philosophical and Psychological Essays, pp. 174–203. Oxford: Oxford University
Press.
Ptak, R. (2012). The frontoparietal attention network of the human brain: action, saliency, and a priority
map of the environment. The Neuroscientist 18(5): 502–15.
Ptak, R. and Fellrath, J. (2013). Spatial neglect and the neural coding of attentional priority. Neuroscience
and Biobehavioral Reviews 37(4): 705–22.
Ptak, R. and Schnider, A. (2005). Visual extinction of similar and dissimilar stimuli: Evidence for
level-dependent attentional competition. Cognitive Neuropsychology 22(1): 111–27.
Ptak, R., Valenza, N., and Schnider, A. (2002). Expectation-based attentional modulation of visual
extinction in spatial neglect. Neuropsychologia 40(13): 2199–205.
Pylyshyn, Z. and Storm, R. (1988). Tracking multiple independent targets: Evidence for a parallel tracking
mechanism. Spatial Vision 3(3): 179–97.
Reynolds, J. H., Chelazzi, L., and Desimone, R. (1999). Competitive mechanisms subserve attention in
macaque areas V2 and V4. The Journal of Neuroscience 19(5): 1736–53.
Richard, A. M., Lee, H., and Vecera, S. P. (2008). Attentional spreading in object-based attention. Journal of
Experimental Psychology. Human Perception and Performance 34(4): 842–53.
Riddoch, M. and Humphreys, G. (1983). The effect of cueing on unilateral neglect. Neuropsychologia
21(6): 589–99.
Riddoch, M., Humphreys, G., Cleton, P., and Fery, P. (1990). Interaction of attentional and lexical
processes in neglect dyslexia. Cognitive Neuropsychology 7(5–6): 479–517.
Riddoch, M., Humphreys, G. W., Edwards, S., Baker, T., and Willson, K. (2002). Seeing the action:
Neuropsychological evidence for action-based effects on object selection. Nature Neuroscience 6(1): 82–9.
Riddoch, M., Humphreys, G., Hickman, M., Clift, J., Daly, A., and Colin, J. (2006). I can see what you are
doing: Action familiarity and affordance promote recovery from extinction. Cognitive Neuropsychology
23(4): 583–605.
Riddoch, M., Bodley Scott, S., and Humphreys, G. (2010). No direction home: Extinction is affected by
implicit motion. Cortex 46(5): 678–84.
Riddoch, M., Pippard, B., Booth, L., Rickell, J., Summers, J., Brownson, A., et al. (2011). Effects of
action relations on the configural coding between objects. Journal of Experimental Psychology. Human
Perception and Performance 37(2): 580–7.
Rizzo, M. and Vecera, S. P. (2002). Psychoanatomical substrates of Bálint’s syndrome. Journal of Neurology,
Neurosurgery, and Psychiatry 72(2): 162–78.
Roberts, K. and Humphreys, G. W. (2011). Action relations facilitate the identification of briefly-presented
objects. Attention, Perception & Psychophysics 73(2): 597–612.
Rock, I., Linnett, C., Grant, P., and Mack, A. (1992). Perception without attention: Results of a new
method. Cognitive Psychology 24(4): 502–34.
Rossetti, Y., Rode, G., Pisella, L., Farné, A., Li, L., Boisson, D., et al. (1998). Prism adaptation to a
rightward optical deviation rehabilitates left hemispatial neglect. Nature 395(6698): 166–9.
Russell, C. and Driver, J. (2005). New indirect measures of “inattentive” visual grouping in a
change-detection task. Perception & Psychophysics 67(4): 606–23.
Saenz, M., Buracas, G. T., and Boynton, G. M. (2002). Global effects of feature-based attention in human
visual cortex. Nature Neuroscience 5(7): 631–2.
Schindler, I., McIntosh, R. D., Cassidy, T. P., Birchall, D., Benson, V., Ietswaart, M., et al. (2009). The
disengage deficit in hemispatial neglect is restricted to between-object shifts and is abolished by prism
adaptation. Experimental Brain Research 192(3): 499–510.
Scholl, B., Pylyshyn, Z., and Feldman, J. (2001). What is a visual object? Evidence from target merging in
multiple object tracking. Cognition 80(1–2): 159–77.
Seron, X., Coyette, F., and Bruyer, R. (1989). Ipsilateral influences on contralateral processing in neglect
patients. Cognitive Neuropsychology 6(5): 475–98.
Shalev, L., Humphreys, G. W., and Mevorach, C. (2004). Global processing of compound letters in a patient
with Bálint’s syndrome. Cognitive Neuropsychology 22(6): 737–51.
Shomstein, S., Kimchi, R., Hammer, M., and Behrmann, M. (2010). Perceptual grouping operates
independently of attentional selection: evidence from hemispatial neglect. Attention, Perception &
Psychophysics 72(3): 607–18.
Sieroff, E., Pollatsek, A., and Posner, M. (1988). Recognition of visual letter strings following injury to the
posterior visual spatial attention system. Cognitive Neuropsychology 5(4): 427–49.
Stone, S., Halligan, P., and Greenwood, R. (1993). The incidence of neglect phenomena and related
disorders in patients with an acute right or left hemisphere stroke. Age and Ageing 22(1): 46–52.
Tian, Y. H., Huang, Y., Zhou, K., Humphreys, G. W., Riddoch, M. J., and Wang, K. (2011). When
connectedness increases hemispatial neglect. PloS One 6(9): e24760.
Ticini, L. F., de Haan, B., Klose, U., Nagele, T., and Karnath, H. O. (2010). The role of temporo-parietal
cortex in subcortical visual extinction. Journal of Cognitive Neuroscience 22(9): 2141–50.
Tipper, S. P. and Behrmann, M. (1996). Object-centered not scene-based visual neglect. Journal of
Experimental Psychology. Human Perception and Performance 22(5): 1261–78.
Treisman, A. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal
of Experimental Psychology. Human Perception and Performance 8(2): 194–214.
Treisman, A. (1998). Feature binding, attention and object perception. Philosophical Transactions of the
Royal Society B: Biological Sciences 353(1373): 1295–306.
Vallar, G. and Perani, D. (1986). The anatomy of unilateral neglect after right-hemisphere stroke lesions.
A clinical/CT-scan correlation study in man. Neuropsychologia 24(5): 609–22.
Vandenberghe, R. and Gillebert, C. R. (2009). Parcellation of parietal cortex: Convergence between
lesion-symptom mapping and mapping of the intact functioning brain. Behavioural Brain Research
199(2): 171–82.
Vandenberghe, R., Molenberghs, P., and Gillebert, C. R. (2012). Spatial attention deficits in humans: The
critical role of superior compared to inferior parietal lesions. Neuropsychologia 50(6): 1092–103.
Vecera, S. and Behrmann, M. (1997). Spatial attention does not require preattentive grouping.
Neuropsychology 11(1): 30–43.
Vecera, S. and Farah, M. (1994). Does visual attention select objects or locations? Journal of Experimental
Psychology. General 123(2): 146–60.
Verdon, V., Schwartz, S., Lovblad, K. O., Hauert, C. A., and Vuilleumier, P. (2010). Neuroanatomy
of hemispatial neglect and its functional components: A study using voxel-based lesion-symptom
mapping. Brain 133(3): 880–94.
Vossel, S., Eschenbeck, P., Weiss, P. H., Weidner, R., Saliger, J., Karbe, H., et al. (2011). Visual extinction
in relation to visuospatial neglect after right-hemispheric stroke: Quantitative assessment and statistical
lesion-symptom mapping. Journal of Neurology, Neurosurgery and Psychiatry 82(8): 862–8.
Vuilleumier, P. (2000). Faces call for attention: Evidence from patients with visual extinction.
Neuropsychologia 38(5): 693–700.
Vuilleumier, P. and Landis, T. (1998). Illusory contours and spatial neglect. Neuroreport 9(11): 2481–4.
Vuilleumier, P. and Rafal, R. (1999). “Both” means more than “two”: Localizing and counting in patients
with visuospatial neglect. Nature Neuroscience 2(9): 783–4.
Vuilleumier, P. and Rafal, R. (2000). A systematic study of visual extinction. Between- and within-field
deficits of attention in hemispatial neglect. Brain 123: 1263–79.
Vuilleumier, P. and Sagiv, N. (2001). Two eyes make a pair: Facial organization and perceptual learning
reduce visual extinction. Neuropsychologia 39(11): 1144–9.
Vuilleumier, P., Sagiv, N., Hazeltine, E., Poldrack, R., Swick, D., Rafal, R., et al. (2001a). Neural fate
of seen and unseen faces in visuospatial neglect: A combined event-related functional MRI and
event-related potential study. Proceedings of the National Academy of Sciences of the United States of
America 98(6): 3495–500.
Vuilleumier, P., Valenza, N., and Landis, T. (2001b). Explicit and implicit perception of illusory contours in
unilateral spatial neglect: Behavioural and anatomical correlates of preattentive grouping mechanisms.
Neuropsychologia 39(6): 597–610.
Ward, R., Goodrich, S., and Driver, J. (1994). Grouping reduces visual extinction: Neuropsychological
evidence for weight-linkage in visual selection. Visual Cognition 1(1): 101–29.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II. Psychologische Forschung 4: 301–50.
Translated as “Investigations on Gestalt principles, II”. In: L. Spillmann (ed.) (2012). On motion and
figure-ground organization, pp. 2127–82. Cambridge: MIT Press.
Wu, Y., Chen, J., and Han, S. (2005). Neural mechanisms of attentional modulation of perceptual grouping
by collinearity. Neuroreport 16(6): 567–70.
Wyart, V. and Tallon-Baudry, C. (2008). Neural dissociation between visual awareness and spatial
attention. The Journal of Neuroscience 28(10): 2667–79.
Yeshurun, Y., Kimchi, R., Sha’shoua, G., and Carmel, T. (2009). Perceptual objects capture attention. Vision
Research 49(10): 1329–35.
Young, A. W., Hellawell, D. J., and Welch, J. (1992). Neglect and visual recognition. Brain 115: 51–71.
Zaretskaya, N., Anstis, S., and Bartels, A. (2013). Parietal cortex mediates conscious perception of illusory
gestalt. The Journal of Neuroscience 33(2): 523–31.
Chapter 37

Holistic face perception


Marlene Behrmann, Jennifer J. Richler,
Galia Avidan, and Ruth Kimchi

Unlike most objects, for which recognition at the category level is usually sufficient (e.g., ‘chair’; Rosch et al. 1976), recognizing faces at the individual level (e.g., ‘Bob’ rather than ‘Joe’) is essential in day-to-day interactions. But face recognition, as a perceptual process, is not trivial: in addition to the fact that recognition must be accomplished rapidly and accurately, there is the added perceptual burden that all faces consist of the same kinds of features (eyes, nose, and mouth) appearing in the same configuration (eyes above nose, nose above mouth). Thus, an obvious challenge associated with face recognition is the need to individuate a large number of visually similar exemplars successfully while, at the same time, generalizing across perceptual features that are not critical for the purpose of identification, such as differences in illumination or viewpoint, or even in the age of the face and changes in hairstyle, amongst others. As is evident, the cognitive demands of face perception differ from those of most other forms of non-face object recognition. Unsurprisingly, then, there are many instances where performance with faces differs from performance with other categories of objects. For example, inversion of the input disrupts recognition of faces disproportionately compared with other objects (Yin 1969), and changing the spatial relations between features impairs face perception to a greater degree than is true for other objects (Tanaka and Sengco 1997).
In light of these apparent distinctions, many have posited that faces are processed differently from other objects, and that the representations and/or processes that mediate face perception are qualitatively different from those supporting the recognition of other non-face object categories (Farah et al. 1995; Farah et al. 1998; Tanaka and Farah 2003). Specifically, according to some proponents, face processing is thought to require encoding as a whole or a Gestalt, and this is necessary in order to ensure that, during processing, the input matches a face template that enforces the first-order configuration of parts (e.g., eyes above nose, nose above mouth). Such (holistic or unified) representations are believed to facilitate the extraction of second-order configural information (e.g., spacing between features) that is coded as deviations from the template prototype (Diamond and Carey 1986). This second-order spatial or configural information is, according to some researchers, particularly critical for distinguishing between objects that are structurally very similar; the class of faces is a paradigmatic example of a collection of homogeneous exemplars (for review, see Maurer et al. 2002). A possible corollary of the assumption that face representations are processed holistically is that the individual parts are not explicitly or independently represented. In its extreme version, this view assumes that faces are not decomposed into parts at all and, moreover, that the parts themselves are especially difficult to access (Davidoff and Donnelly 1990). Consistent with this is the claim that the face template may have no internal part structure; as stated, ‘the representation of a face used in face recognition is not composed of the faces’ parts’
(Tanaka and Farah 1993). On such an account, there is mandatory perceptual integration across
the entire face region (McKone 2008), or, similarly, mandatory interactive processing of all facial
information (Yovel and Kanwisher 2004) (and for a recent review of holistic processing in relation
to the development of face perception, see McKone et al. 2012). Note that the notion of a unified
face template bears similarity to the view espoused by Gestalt psychologists and the reader is
referred to other chapters in this volume that articulate this concept in greater depth (for example,
Koenderink, this volume) and also that offer empirical evidence for the use of such a Gestalt and
individual differences therein (for example, de-Wit and Wagemans, this volume).
In this chapter, we focus specifically on the viability of a unified face template as implicated in
face perception. We first review behavioral evidence suggesting that face recognition is indeed
holistic in nature (Part 1), and we draw on data from normal observers and patient groups to
support this point. In Part 2, we examine the nature of the mechanisms that give rise to holistic
face recognition. Specifically, we argue that holistic face processing is not necessarily based on template-like, undifferentiated representations; rather, we suggest that holistic processing can also be accomplished by alternative mechanisms, such as an automatic attentional strategy, and/or that it can emerge from the interactive processing of face configuration and features. We conclude by claiming that holistic processing is engaged in face perception but that the underlying mechanism is not likely to be that of a single, unified template.

Evidence that Face Recognition is Holistic


Normal Observers
Several lines of empirical evidence have been offered in support of the view that face recognition is holistic. A particularly strong line of support derives from the ‘part-whole effect’, which refers to the finding that a particular facial feature (e.g., the nose) is recognized less accurately when tested in isolation (65 % accuracy) than when presented in the context of the entire studied face (77 %), an effect that is not observed for non-face objects (e.g., houses: isolated house parts, 81 % accuracy; whole house, 79 % accuracy) (Tanaka and Farah 1993). This finding has been taken as evidence that face parts (but not object parts) are represented together: thus, matching an individual isolated face feature is less accurate than matching an entire face because the stored representation corresponds to the entire face rather than to its individual parts. In anticipation of the argument we present later that face parts must be represented as well, we draw the reader’s attention to the observation that, even in this classic study, participants must have access to parts to some extent (see the 65 % accuracy for isolated face part matching). Therefore, the conclusion that there is no decomposition of the face is not supported by the empirical results.
In addition to evidence from such part-whole effects, data obtained from another well-known
paradigm, the composite task, is also often taken as strong evidence that faces—but not other
objects—are represented as undifferentiated wholes. In the composite task1 (Hole 1994; Young
et al. 1987) (see Figure 37.1), participants are asked to judge whether one half (e.g., the top) of
two sequentially presented composite faces are the same or different while ignoring the other,

1 Note that there are two versions of the composite task being used in the literature, and an ongoing debate over which is more appropriate (e.g., Gauthier and Bukach 2007 vs. Robbins and McKone 2007). The interested reader might also wish to consult the recent exchange by Rossion (2013) and by Richler and Gauthier (2013). Details of this debate are beyond the scope of this chapter.

[Figure 37.1 panels, left to right: Study face, Mask, Cue, Test face]


Fig. 37.1  Example of a single trial from the composite task. Participants are asked to judge whether
the cued face half (in this case, top) is the same or different between the study and test face
while ignoring the other, task-irrelevant face half (in this case, bottom). Here, the correct answer
is ‘different’ because the top parts are different, even though the bottom parts are the same.
Holistic processing is indexed by the extent to which the task-irrelevant bottom part interferes with
performance on the target part as a function of alignment.

[Figure 37.2: sensitivity (d′) on congruent and incongruent trials, plotted for aligned and misaligned conditions, for Faces (left panel) and Novel Objects (right panel)]
Fig. 37.2  Re-plotted composite task data from Richler et al. (2011c, Experiment 2). Holistic processing is
indexed by a congruency effect (better performance on congruent vs. incongruent trials) that is reduced
or eliminated when parts are misaligned. As shown above, this effect is robust for faces (left panel), but
is absent for non-face objects in novices (right panel).
Data from Jennifer J. Richler, Michael L. Mack, Thomas J. Palmeri, and Isabel Gauthier, Inverted faces are
(eventually) processed holistically, Vision Research, 51(3), pp. 333–342, doi: 10.1016/j.visres.2010.11.014,
Experiment 2, 2011.

task-irrelevant face half (e.g., the bottom). The face stimuli were taken from the MPI face database (Troje and Bülthoff 1996). Holistic processing is indexed by a failure to selectively attend to just one half of the face: because faces are processed as wholes, the task-irrelevant face half cannot be successfully ignored and, consequently, influences judgments on the target face half. Thus, participants are more likely to produce a false alarm (say ‘different’) when the top halves of the two faces are identical but their bottom halves differ than when both the top and the bottom halves are identical. Interference from the task-irrelevant half is reduced
when the normal face configuration is disrupted by misaligning the face halves (Hole 1994;
Richler et al. 2008), and, as one might expect from the holistic face view, is absent for non-face
objects (Farah et al. 1998; Richler et al. 2011d) (see Figure 37.2). Importantly, the magnitude of
holistic processing as indexed by the interference in the composite task is a significant predictor
Holistic Face Perception 761

of face recognition abilities more generally (DeGutis et al. 2013; McGugin et al. 2012; Richler
et al. 2011b), validating the presumed role of holistic processing as an important component of
face recognition2.
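The congruency-by-alignment interaction that indexes holistic processing in the composite task is straightforward signal-detection arithmetic: compute d' per condition, take the congruency effect (congruent minus incongruent d') at each alignment, and subtract. The sketch below illustrates that arithmetic in Python with invented hit and false-alarm rates; the function names and numbers are illustrative assumptions, not values from the studies cited here.

```python
from statistics import NormalDist

def dprime(hit_rate, fa_rate, floor=0.01, ceiling=0.99):
    """Signal-detection sensitivity: z(hits) - z(false alarms).
    Rates are clipped away from 0 and 1 to keep the z-scores finite."""
    z = NormalDist().inv_cdf
    clip = lambda p: min(max(p, floor), ceiling)
    return z(clip(hit_rate)) - z(clip(fa_rate))

def holistic_index(rates):
    """Congruency x alignment interaction on d'.

    `rates` maps (congruency, alignment) -> (hit_rate, fa_rate).
    A positive index means the congruency effect shrinks when the
    halves are misaligned -- the signature of holistic processing."""
    d = {cond: dprime(h, f) for cond, (h, f) in rates.items()}
    ce_aligned = d[("congruent", "aligned")] - d[("incongruent", "aligned")]
    ce_misaligned = d[("congruent", "misaligned")] - d[("incongruent", "misaligned")]
    return ce_aligned - ce_misaligned

# Invented rates with the qualitative shape of the face data in Fig. 37.2:
# a sizeable congruency effect when aligned that shrinks with misalignment.
rates = {
    ("congruent", "aligned"): (0.85, 0.15),
    ("incongruent", "aligned"): (0.65, 0.35),
    ("congruent", "misaligned"): (0.80, 0.22),
    ("incongruent", "misaligned"): (0.75, 0.28),
}
print(round(holistic_index(rates), 2))  # positive for this 'face-like' pattern
```

For a novice viewing non-face objects (right panel of Fig. 37.2), congruent and incongruent rates would be roughly equal at both alignments, and the index would hover near zero.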

Prosopagnosia
Support for the claim that face processing is necessarily holistic (i.e., faces treated as an undiffer-
entiated whole) is also gleaned from the findings that individuals who suffer from prosopagnosia
and fail to recognize faces appear unable to process visual information in a holistic or configural
fashion. In one of the earliest case studies, Levine and Calvanio (1989) argued that patient LH
suffered from a deficit in configural processing, which they defined as ‘the ability to identify by
getting an overview of an item as a whole in a single glance' (p. 160). This patient painstakingly
analysed a stimulus such as a face detail by detail, over several visual fixations, noting the shapes
of the features and their spatial relationships. Consistent with the failure to represent the whole,
this patient was also impaired in the Gestalt completion tests of visual closure. Similar descrip-
tions abound for other cases. In his popular book The Man who Mistook his Wife for a Hat, Oliver
Sacks reports the following incident concerning his patient, Dr P.
Sacks noted that when Dr. P. looked at him, he seemed to fixate on individual features of his face—an
eye, the right ear, his chin—instead of taking it in as a whole. The only faces he got right were of his
brother— ‘Ach, Paul! That square jaw, those big teeth; I would know Paul anywhere!’—and Einstein
whom he also seemed to recognize from characteristic features—Einstein’s signature hair and mustache.

Considerable empirical evidence supports such anecdotes, with the central claim being that a
breakdown in holistic processing or the ability to integrate the disparate local elements of a face
into a coherent unified representation is causally related to the impairment in face processing
(Barton 2009; Rivest et al. 2009). Indeed, it has been suggested that a key characteristic of patients
with acquired prosopagnosia (AP) is the inability to derive a unified perceptual representation
from the multiple features of an individual face (Ramon et al. 2010; Saumier et al. 2001). Similar
claims have been made about individuals with congenital prosopagnosia (CP). CP is a more
recently recognized deficit in face recognition that occurs in the absence of frank neurological
damage or altered cognition or vision, and that is apparently present throughout development.
The growing consensus is that CP individuals are also unable to rapidly process the whole of
the face (e.g., Avidan et al. 2011; Behrmann et al. 2006; Lobmaier et al. 2010; Palermo et al. 2011),
and it appears that the patterns of impairment in face perception are extremely similar across the
acquired and congenital groups of prosopagnosia (although performance in perceiving emotional
expression may differ across the groups, e.g. Humphreys et al. 2007).
We now consider the same sources of evidence gleaned from individuals with prosopagnosia
as we did with normal participants (the part-whole and composite paradigms), along with some
additional data from experiments that manipulate the spatial configuration of face parts and
probe sensitivity to spatial relations. Rather few studies have directly examined the part-whole effect
in prosopagnosia. In a variant of the standard part-whole task, two well-characterized APs showed
a slight part-over-whole advantage for eye trials, in contrast to the whole-over-part advantage
found in controls, suggesting that these prosopagnosic individuals have severe holistic processing

2  Studies that have not found support for this relationship have been criticized for the measure of holistic
processing used (Konar et al. 2010) and erroneous interpretation of a correlation based on difference scores
(Wang et al. 2012).
762 Behrmann, Richler, Avidan, and Kimchi

deficits, at least for the eye region (Busigny et al., 2010; Ramon et al. 2010). Similar findings were
obtained in a small group of congenital (or as they define them, developmental) prosopagnosics
who showed a lack of a holistic advantage for both Korean and Caucasian faces (though the CPs'
overall holistic advantage for Caucasian faces was not significantly different from that of controls,
who did show a significant advantage) (DeGutis et al. 2011). Compatible with these findings is
the result of an incomplete part-whole task (no isolated parts trials) in which a single patient
was significantly worse at discriminating part changes in faces than controls, but not for houses
(de Gelder and Rouw, 2000a). These data support the claim that the prosopagnosic individual did
not benefit from the context of the face when making part judgments. A recent study has repli-
cated the lack of benefit from the whole in CP but it appears that this may be specific to the eyes
as trials in which the mouth was presented in context versus alone showed no differential
performance across CPs and controls (DeGutis et al. 2012). The differential reliance on mouth versus eye
processing in prosopagnosia has been reported on several occasions (Barton et al. 2003; Bukach
et al. 2008; Caldara et al. 2005).
As has been the case with normal individuals (see above), the composite face paradigm has
been employed to explore the underlying processing in individuals with prosopagnosia. In con-
trast with normal individuals, in the context of a composite face paradigm, congenital prosopag-
nosic individuals performed equivalently with aligned and misaligned faces and were impervious
to (the normal) interference from the task-irrelevant bottom part of faces (Avidan et al. 2011).
Interestingly, the extent to which these individuals were impervious to the misalignment
manipulation was correlated with poorer performance on diagnostic face processing tasks (such as the
Cambridge Face Memory Test; Duchaine and Nakayama, 2006). Consistent with these results,
others have also shown that prosopagnosic (both AP and CP) individuals show reduced interfer-
ence from the unattended part of the face in the composite face paradigm (Busigny et al. 2010;
Ramon et al. 2010) (note, however, that again, not every individual with prosopagnosia evinces
the same profile and some appear to show the normal interference effects; Le Grand et al. 2006;
Susilo et al. 2010). In general, these findings have been taken as evidence to support the notion
that the severity of the face recognition impairment is directly related to the difficulty in attending
to multiple parts of the face in parallel.
Individuals with prosopagnosia also show reduced sensitivity to the spacing between the fea-
tures, implying a difficulty in representing the ‘second order’ relations between facial features.
For example, Ramon and Rossion (2010) reported that patient PS, who suffers from acquired
prosopagnosia, performed poorly on a task that required matching unfamiliar faces in which
the faces differed either with respect to local features or inter-feature distances, over the upper
and lower areas of the face. PS was impaired at matching when the relative distances between the
features differed and this was true even when the location of the features was held constant (and
uncertainty about their position was eliminated) (Caldara et al. 2005; Orban de Xivry et al. 2008).
Consistent with this, patients with prosopagnosia appear to adopt an analytical feature-by-feature
face processing style and focus only on a small spatial window at a time (Bukach et  al. 2006).
The failure to focus on the eye region of the face (Bukach et al. 2006; Bukach et al. 2008; Caldara
et  al. 2005; Rossion et  al. 2009) as well as the relative distances between features (Barton and
Cherkasova 2005; Barton et al. 2002), as mentioned above, may be a direct consequence of defective
holistic processing (Rivest et al. 2009). Also, in a paradigm in which the interocular distance, the
nose-mouth distance, or other relative distances between features were altered, prosopagnosic
patients performed more poorly when required to decide which of three faces was the 'odd' one
(Barton et al. 2002).

Finally, we review studies that examine whether configural and/or featural processing
are affected in prosopagnosia. Some studies that directly examined featural
versus configural processing have found that CPs show face discrimination deficits only for faces
that differ in configural information (Lobmaier et al. 2010), whereas others report that CPs
are impaired in discriminating both faces that differ only in configural information and faces that
differ only in featural information (Barton et al. 2003; Duchaine et al. 2007; Yovel and Duchaine
2006). However, Le Grand et al. (2006) found that three of their eight developmental prosopag-
nosic individuals were impaired in discrimination of faces that differed in the shape of internal
features, four were impaired in discrimination of faces that differed in spacing, and one partici-
pant performed normally on both discrimination tasks. Taken together, these findings suggest
that CPs can be impaired in processing featural information, configural information, or both.
Whether the impairment in configural and featural processing versus configural processing alone
reflects heterogeneity in the population, or whether methodological differences across the various
paradigms elicit somewhat different patterns of performance, remains to be determined.

Why is Face Recognition Holistic?


The Holistic Account
Much of the literature on holistic face processing in normal observers has focused on effects of
stimulus manipulations, such as spatial frequency filtering (Cheung et al. 2008; Goffaux 2009;
Goffaux and Rossion 2006), face race (e.g., Michel et al. 2006; Mondloch et al., 2010), and ori-
entation (e.g., Robbins and McKone 2003; Rossion and Boremanse 2008). Such results are often
explained by a holistic representation account in which manipulations that disrupt first-order
configuration (e.g., inversion, misalignment) result in patterns that are no longer consistent
with the face template, and so are encoded more similarly to other objects. This latter encoding
style permits selective attention to parts (i.e., no composite effect), as parts are not integrated
in the representation, and, additionally, eliminates any advantage of a whole-face context when
matching parts because part representations themselves are explicitly available (no part-whole
effect).
Importantly, however, although the results from the part-whole and composite task are consist-
ent with a processing mechanism that might be optimized for faces versus other objects, there
is surprisingly little direct empirical evidence that this is the result of holistic representations
per se. Indeed, there are several results that are incompatible with the notion of template-like face
representations created during encoding. For example, when a face composite task and a novel
object composite task are interleaved, novel objects are processed holistically in some conditions.
Specifically, participants exhibit difficulty in selectively attending to parts of novel objects when
they are preceded by an aligned face (that is processed holistically) but not when they are pre-
ceded by a misaligned face (that is not processed holistically; Richler et al. 2009a). This result is
difficult to explain by invoking a face template—how would a holistic face representation created
during encoding influence processing of a subsequent object that does not share the same con-
figuration of features?
Other work showing that holistic processing can be modulated by experimentally induced
attentional biases is also difficult to reconcile with the idea of a face template. For example, holis-
tic processing of faces is larger when each trial of the composite task is preceded by a task that
requires attention to the global elements of an unrelated, non-face hierarchical stimulus like
Navon compound letters (Navon 1977) versus a task that requires attention to the local elements

of the compound letter (Gao et al. 2011; Macrae and Lewis 2002). Similarly, Curby et al. (2012)
found that inducing a negative mood—a manipulation that is believed to promote a local process-
ing bias (Basso et al. 1996)—led to a decrease in holistic processing measured in the composite
task relative to inducing a positive or neutral mood. Thus, promoting global vs. local
attentional biases can clearly influence holistic processing, but there is no simple explanation
for how such manipulations would alter the use of a face template, or disrupt face representations.
For example, although it is conceivable that these global/local manipulations operate on a tem-
plate representation, such that a global bias enhances the Gestalt representation and a local bias
draws attention to features, it is unclear how the latter would work if the face features were not
independently represented in the first place. The key distinction, then, is between an underlying
holistic template, which serves as the representation of a face, and a mechanism that allows for
rapid processing of the disparately represented features in tandem.
Finally, according to the holistic representation view, inverted faces do not fit the face template
(first-order configuration is disrupted), and so should (and could) never be processed holistically
(e.g., Rossion and Boremanse 2008). Thus, the holistic representation view posits a qualitative pro-
cessing difference between upright and inverted faces. However, a growing body of work suggests
that performance differences between upright and inverted faces are quantitative, such that upright
faces and inverted faces are processed in qualitatively the same way, but that upright faces are pro-
cessed more efficiently than inverted faces (Loftus et al. 2004; Riesenhuber et al. 2004; Sekuler et al.
2004). Inversion effects (and their loss in patients with prosopagnosia) have also been documented
for non-face objects, especially those that have a canonical orientation (de Gelder et al. 1998; de
Gelder and Rouw 2000b). Consistent with this more graded account of inversion effects, results
from a composite task show that both upright and inverted faces are processed equally holistically,
but overall performance is better and faster for upright faces (Richler et al. 2011c)3.
One interesting consequence of the difference in processing efficiency for upright versus
inverted faces is that holistic effects require longer presentation times to be observed for inverted
faces (Richler et al. 2011c). Interference from task-irrelevant parts is observed for upright faces
presented for as little as 50ms (Richler et  al. 2009b; Richler et  al. 2011c), and the modulation
of this interference due to misalignment that characterizes holistic processing occurs with pres-
entation times of 183ms. In contrast, although performance is above chance for inverted faces
presented for 50ms and 183ms, there is no evidence for holistic processing of inverted faces until
presentation times of 800ms (Richler et al. 2011c).
The interaction between presentation time and holistic processing challenges the holistic repre-
sentation account for several reasons. First, the holistic representation account would not predict
that presentation time should influence holistic processing—faces either are or are not encoded
into the face template, and, consequently, holistic processing should be all or none. Second, the fact
that presentation time influences holistic processing suggests that parts are, in fact, being encoded
independently: above chance performance in the composite task only requires encoding of the
target part, whereas interference indicative of holistic processing in the composite task requires
that the irrelevant part be encoded as well. Accordingly, one interpretation of these results is that
at 50ms and 183ms only the target part of inverted faces could be encoded, resulting in successful
performance but no interference. Longer presentation times are required to encode both parts of

3  This study also shows that the results of studies that find reduced holistic processing of inverted faces are
driven by differences in response bias between upright and inverted faces. Interested readers are encouraged
to see Richler et al. (2012) and Richler and Gauthier (2013) for discussion of this issue.

inverted faces, so more time is required to observe interference. In contrast, although they may
be encoded separately, both the target and distractor part in upright faces can be encoded within
50ms (Curby and Gauthier 2009), leading to interference from holistic processing at the fastest
presentation times.
While compelling, the evidence for independent part representations based on the interaction
between holistic processing and time in Richler et al. (2011c) is certainly speculative. However,
other findings also suggest that individual face features can be used in face recognition (e.g.,
Cabeza and Kato 2000; Rhodes et al. 2006; Schwarzer and Massaro 2001), indicating that part
representations are accessible. Indeed, participants can recognize previously learned faces with
above chance accuracy when the face parts are presented in a scrambled configuration, a condi-
tion in which recognition must rely on feature information alone because configural informa-
tion has been removed. Although recognition performance is better in a blurred condition where
facial configuration is maintained but facial featural information is ‘blurred out’ compared to the
scrambled condition, above chance performance in the scrambled condition implies that feature
representations are available and can be used, as well (Schwaninger et al. 2009; see also Hayward
et al. 2008). In fact, at the extreme, face discrimination performance can be guided by a single
feature in the absence (or near absence) of configural variability (Amishav and Kimchi 2010).

Holistic Processing as an Automatized Attentional Strategy


If faces are not encoded as unified representations, and face parts can be encoded independently,
then what mechanism gives rise to differences in performance between faces and objects, and how
can we account for the interference effects that are unique to faces and are described as holistic pro-
cessing? Studies comparing holistic processing of faces and failures of selective attention that can
be found for other objects converge to show that while failures of selective attention to object parts
are malleable and responsive to changes in task demands and strategy (Richler et al. 2009a; Wong
and Gauthier 2010), holistic processing of faces is automatic and impervious to top-down strategic
manipulations (Richler et al. 2011a; Richler et al. 2009b). This has led to the suggestion that holistic
processing of faces is the outcome of a perceptual strategy of attending to all object parts together
and that this strategy becomes automated with extensive experience (Richler et al. 2011d). Unlike
objects where parts are interchangeable and largely independent (e.g., one can replace the armrests
of a chair without affecting the shape of the cushions), face parts often change together: face parts
move together during speech or changes in emotional expression. Thus, although we can volition-
ally attend to all parts of a chair, this attentional strategy becomes increasingly automatized in
cases where we learn that the higher-order statistics are particularly useful. Importantly, although
an attentional strategy may influence encoding, it does not require that the individual face parts
attended to simultaneously are integrated at the level of the resulting representation.
The results from the interleaved face and object composite tasks described earlier can be accom-
modated by this account: the holistic processing strategy that was automatically engaged for the
aligned face could not be ‘turned off ’ in time to process the subsequent object, leading to holistic
processing of that object as well (Richler et al. 2009a). Additionally, although holistic processing is
robust to strategic, top-down control, it can be modulated by perceptually-driven manipulations
of attentional resources (Curby et al. 2012; Gao et al. 2011). This suggests that holistic process-
ing itself is the outcome of an attentional strategy, and may explain the fact that we see impaired
holistic processing in CP for non-face stimuli, as well.
The idea that holistic processing of faces can be understood within the context of domain-general
attentional processes is supported by a composite task study by Curby et al. (2013). In that study,

face parts were always presented in an aligned format. Square regions surrounding the two face
halves were either the same color and aligned, or different colors and misaligned. Remarkably,
this manipulation led to a decrease in holistic processing that was similar in magnitude to that
observed when face parts themselves are misaligned. In other words, discouraging the grouping
of face parts by disrupting the classic Gestalt cue of common region reduced holistic processing in the
same manner as physically misaligning the face parts.

Holistic Processing as Interactivity between Features and Configuration
Another possible way in which interactivity might emerge is one in which the features themselves
are processed independently (Macho and Leder 1998; Rossion et al. 2012), and holistic processing
is the result of interactive processing of features and configuration (Amishav and Kimchi 2010;
Kimchi and Amishav 2010; Wenger and Townsend 2006). Support for this view comes from a
study based on Garner's speeded classification task (Garner 1974). In this paradigm, observers
classify faces based on a single dimension that could be either configural (inter-eyes and nose-
mouth spacing) or featural (shape of eyes, nose, and mouth) while ignoring the other dimension
which remains constant in some blocks (baseline) or varies independently in others (filtering)
(see Figure 37.3a). Critically, the relationship between the two dimensions is inferred from the
relative performance across these two conditions. Equal performance in the baseline and filtering
conditions indicates perfect selective attention to the relevant dimension, and the dimensions are
considered separable. Poorer performance in the filtering than in the baseline condition—Garner
interference—indicates that participants could not selectively attend to one dimension without
being influenced by irrelevant variation in another dimension, and the dimensions are considered
integral. Using this paradigm, Amishav and Kimchi (2010) documented that normal participants
exhibited symmetric Garner interference: they could not selectively attend to the features with-
out interference from irrelevant variation in the configuration, nor could they attend to the con-
figuration without interference from irrelevant variation in the features and both ‘interference’
effects were comparable in magnitude. These findings indicate that features and configuration are
perceptually integral in the processing of upright faces and cannot be processed independently.
Interestingly, when only face features were manipulated, participants were able to attend to
variation in one feature (e.g., the nose) and ignore variation in another feature (e.g., the mouth),
providing further support for the notion that features are perceptually separable. However, when faces
were inverted, an asymmetrical Garner interference was observed such that participants could
attend the features while ignoring configuration but not vice versa, thus showing evidence for the
dominance of featural information in inverted compared to upright faces. Taken together, these
experiments provide support for the notion that holistic processing, indexed by the combined
integration of features and their configuration, is dominant only for upright faces.
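The Garner logic just described reduces to simple arithmetic: interference on a judged dimension is the performance cost of the filtering blocks relative to baseline, and the pattern across the two dimensions (both, one, or neither showing a cost) classifies the dimension pair. Below is a minimal sketch of that bookkeeping; the response times and the 15 ms cut-off are invented for illustration and are not data from Amishav and Kimchi (2010).

```python
def garner_interference(baseline_rt, filtering_rt):
    """Garner interference: slowing (ms) caused by irrelevant variation
    on the other dimension. Near zero => separable; positive => integral."""
    return filtering_rt - baseline_rt

def classify(featural_ms, configural_ms, threshold=15.0):
    """Label a dimension pair from its two interference scores.
    `threshold` is an arbitrary cut-off for 'reliable interference'."""
    f = featural_ms >= threshold
    c = configural_ms >= threshold
    if f and c:
        return "integral (symmetric interference)"
    if f != c:
        return "asymmetric interference"
    return "separable (no interference)"

# Pattern reported for normal observers viewing upright faces:
upright = classify(
    garner_interference(baseline_rt=780, filtering_rt=840),  # featural judgments
    garner_interference(baseline_rt=790, filtering_rt=845),  # configural judgments
)
# Pattern reported for congenital prosopagnosics:
cp = classify(
    garner_interference(baseline_rt=900, filtering_rt=905),
    garner_interference(baseline_rt=910, filtering_rt=912),
)
print(upright)  # integral (symmetric interference)
print(cp)       # separable (no interference)
```

The inverted-face result maps onto the asymmetric branch: configural judgments suffer interference from irrelevant featural variation, but not vice versa.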
In a recent study, Kimchi et al. (2012) adopted Amishav and Kimchi’s (2010) version of
Garner’s speeded classification task and applied it to individuals with congenital prosopagnosia,
along with matched control participants. This study replicated the finding that normal observers
evince symmetric Garner interference for upright faces as revealed by the failure to selectively
attend to features without being influenced by irrelevant variation in configuration, and vice
versa, indicating that featural and configural information are integral in normal upright face pro-
cessing (see Figure 37.3b, 37.c). In contrast, the prosopagnosics showed no Garner interference:
they were able to attend to configural information without interference from irrelevant variation
in featural information, and they were able to attend to featural information without interfer-
ence from irrelevant variation in configural information. The absence of Garner interference in

[Figure 37.3 appears here: (a) the four stimulus faces A-D; (b) response times (ms) and (c) error rates (%) for CPs and matched controls in the baseline and filtering conditions, for configural and featural judgments.]
Fig. 37.3  (a) The stimulus set used in Amishav and Kimchi (2010) and Kimchi et al. (2012). Faces
in each row (Faces A and B and Faces C and D) vary in their configural information (inter-eyes and
nose-mouth distance) but have the same components (eyes, nose, and mouth). Faces in each column
(Faces A and C and Faces B and D) vary in their components (eyes, nose, and mouth) but have the
same configural information (inter-eyes and nose-mouth distance).
Reproduced from Psychonomic Bulletin & Review, 17(5), pp. 743–748, Perceptual integrality of componential and
configural information in faces, Rama Amishav and Ruth Kimchi, doi: 10.3758/PBR.17.5.743 Copyright © 2010,
Springer-Verlag. With kind permission from Springer Science and Business Media.

prosopagnosics provides strong evidence that featural information and configural information
are perceptually separable and processed independently by individuals with congenital prosop-
agnosia implying that, in contrast with normal observers, these individuals do not perceive faces
holistically.

The finding that information about both the parts and the configuration of a face is available
is also supported by fMRI and electrophysiological recordings, which indicate the existence of
whole- and part-based representations in face-selective regions of the human and monkey brain,
such as the right fusiform gyrus (Harris and Aguirre 2008, 2010), and which suggest that such
tuning is surprisingly flexible and dynamic. Similar findings have been uncovered in studies with
non-human primates (Freiwald et al. 2009). Holistic processing is largely attenuated when only
high spatial frequencies are preserved in the stimulus (Goffaux 2009; Goffaux and Rossion 2006)
(but see Cheung et al. 2008, who found equal holistic processing for LSF and HSF faces). However,
a face shown only in high spatial frequencies is still readily detected as a face by observers, suggesting
again that detecting a face (and presumably activating the template representation of an upright
face) may not be enough to engage holistic processing. More recently, evidence has indicated that
holistic processing might depend on the availability of discriminative local feature information
(Goffaux et al. 2012).
Before we conclude, we draw some speculative observations about the mechanisms we have
considered and their possible generality. We have articulated a perspective in which face parts
are processed holistically and in which, over the course of experience, this integrated processing
becomes more automatized. Similar mechanisms may play out in other visual domains as well at
both lower and higher levels of the visual system where context (co-occurrence of other
information) is present. For example, a similar discussion about holistic processing appears in the
literature on crowding, concerning the need for, and difficulty of, extracting individual components
from a multiplicity of items; debates about the inability to attend to only a part, and whether this
affects perception of the whole, are rife in that field too (Oliva and Torralba 2007). Finally,
discussions about context in scene perception have a similar flavor, and so we tentatively suggest that
similar mechanisms in which higher-order statistics are derived from the input, especially with
greater experience, may be at play throughout the visual system (e.g., Bar and Aminoff 2003).

Conclusions
There is abundant behavioral evidence that face recognition is holistic based on effects that are
observed in faces but not non-face objects in normal observers, and that are absent in patient
groups characterized by face recognition deficits. But there remains disagreement about what
mechanisms are responsible. Of course, what it means for face recognition to be ‘holistic’ need not
be all-or-none. Here, we have argued against the holistic representation view that, in the extreme,
posits that faces are represented as undifferentiated wholes with no explicit representation of indi-
vidual features. However, ‘more-than-features’ can take on more graded meanings. For example,
spatial relations between face features may be explicitly represented and used in addition to infor-
mation about the features themselves.
It is also important to note that the alternatives to the extreme holistic representation view that
we have proposed here—automatic attentional strategy account and the interactive account—are
not mutually exclusive. For example, proponents of the view that holistic processing is the result
of interactivity between features and configuration often describe face features as being processed
in parallel (Kimchi and Amishav 2010; Macho and Leder 1998; see also Fific and Townsend 2010),
which may be consistent with the notion that attention is automatically deployed to the entire
face at the same time (Richler et al. 2011d). Importantly, certain aspects of these two accounts
need to be empirically reconciled. For example, the classic finding in the composite task (used to

support the automatic attentional strategy account) is that participants cannot selectively attend
to one face half (e.g., Richler et al. 2008), but in the Garner paradigm (used to support the inter-
active account) participants are able to make classification judgments based on one feature while
successfully ignoring other features (Amishav and Kimchi 2010). Moreover, the failures of selec-
tive attention documented in the composite task are also observed for inverted faces (Richler
et al. 2011c), but interactivity of features and configuration assessed in the Garner paradigm are
specific to upright faces (Kimchi and Amishav 2010). Thus, the two paradigms lead to different
conclusions about whether processing differences between upright and inverted faces are qualita-
tive vs. quantitative. One potential reason for these discrepancies is that the coarse parts used in
the composite task (full face halves) contain both feature changes (e.g., a different bottom part will
have a different mouth) and subtle configural changes, whereas in the Garner paradigm used
by Amishav and Kimchi (2010) feature and configural information are fully isolated and manipu-
lated independently. An exciting avenue for future research is to explore how these two lines of
work and the theoretical accounts they support come together to explain normal face perception.

Acknowledgements
The preparation of this chapter and the associated research was supported by a grant from the
National Science Foundation to MB (BCS0923763), by a grant from the Temporal Dynamics of
Learning Center, SBE0542013 (G. Cottrell), and by a grant from the Israeli Science Foundation
(ISF, 384/10) to GA.

References
Amishav, R., and Kimchi, R. (2010). ‘Perceptual integrality of componential and configural information in
faces’. Psychon Bull Rev 17(5): 743–8.
Avidan, G., Tanzer, M., and Behrmann, M. (2011). ‘Impaired holistic processing in congenital
prosopagnosia’. Neuropsychologia 49(9): 2541–52. doi: 10.1016/j.neuropsychologia.2011.05.002.
Bar, M., and Aminoff, E. (2003). ‘Cortical analysis of visual context’. Neuron 38(2): 347–58.
Barton, J. J. S. (2009). ‘What is meant by impaired configural processing in acquired prosopagnosia?’
Perception 38(2): 242–60.
Barton, J. J. S., and Cherkasova, M. V. (2005). ‘Impaired spatial coding within objects but not between
objects in prosopagnosia’. Neurology 65(2): 270–4.
Barton, J. J. S., Press, D. Z., Keenan, J. P., and O’Connor, M. (2002). ‘Lesions of the fusiform face area
impair perception of facial configuration in prosopagnosia’. Neurology 58: 71–8.
Barton, J. J. S., Cherkasova, M. V., Press, D. Z., Intriligator, J. M., and O’Connor, M. (2003).
‘Developmental prosopagnosia: a study of three patients’. Brain Cogn 51(1): 12–30.
Basso, M. R., Schefft, B. K., Ris, M. D., and Dember, W. N. (1996). ‘Mood and global-local visual
processing’. Journal of the International Neuropsychological Society 2(3): 249–55.
Behrmann, M., Avidan, G., Leonard, G. L., Kimchi, R., Luna, B., Humphreys, K., and Minshew, N.
(2006). ‘Configural processing in autism and its relationship to face processing’. Neuropsychologia
44(1): 110–29.
Bukach, C. M., Bub, D. N., Gauthier, I., and Tarr, M. J. (2006). ‘Perceptual expertise effects are not all
or none: spatially limited perceptual expertise for faces in a case of prosopagnosia’. J Cogn Neurosci
18(1): 48–63.
Bukach, C. M., Le Grand, R., Kaiser, M. D., Bub, D. N., and Tanaka, J. W. (2008). ‘Preservation of mouth
region processing in two cases of prosopagnosia’. J Neuropsychol 2(Pt 1): 227–44.
770 Behrmann, Richler, Avidan, and Kimchi

Busigny, T., Joubert, S., Felician, O., Ceccaldi, M., and Rossion, B. (2010). ‘Holistic perception of
the individual face is specific and necessary: evidence from an extensive case study of acquired
prosopagnosia’. Neuropsychologia 48(14): 4057–92. doi: 10.1016/j.neuropsychologia.2010.09.017.
Cabeza, R., and Kato, T. (2000). ‘Features are also important: contributions of featural and configural
processing to face recognition’. Psychol Sci 11(5): 429–33.
Caldara, R., Schyns, P., Mayer, E., Smith, M. L., Gosselin, F., and Rossion, B. (2005). ‘Does Prosopagnosia
Take the Eyes Out of Face Representations? Evidence for a Defect in Representing Diagnostic Facial
Information following Brain Damage’. J Cogn Neurosci 17(10): 1652–66.
Cheung, O. S., Richler, J. J., Palmeri, T. J., and Gauthier, I. (2008). ‘Revisiting the Role of Spatial
Frequencies in the Holistic Processing of Faces’. Journal of Experimental Psychology: Human Perception
and Performance 34(6): 1327–36.
Curby, K. M., and Gauthier, I. (2009). ‘The temporal advantage for individuating objects of
expertise: perceptual expertise is an early riser’. J Vis 9(6): 7, 1–13. doi: 10.1167/9.6.7.
Curby, K. M., Johnson, K. J., and Tyson, A. (2012). ‘Face to face with emotion: holistic
face processing is modulated by emotional state’. Cognition and Emotion 26(1): 93–102.
doi: 10.1080/02699931.2011.555752.
Curby, K. M., Goldstein, R. R., and Blacker, K. (2013). ‘Disrupting perceptual grouping of face parts
impairs holistic face processing’. Atten Percept Psychophys 75(1): 83–91. doi: 10.3758/s13414-012-0386-9.
Davidoff, J., and Donnelly, N. (1990). ‘Object superiority: A comparison of complete and part probes’. Acta
Psychologica 73: 225–43.
de Gelder, B., and Rouw, R. (2000a). ‘Configural face processes in acquired and developmental
prosopagnosia: evidence for two separate face systems?’ NeuroReport 11(14): 3145–50.
de Gelder, B., and Rouw, R. (2000b). ‘Paradoxical configuration effects for faces and objects in
prosopagnosia’. Neuropsychologia 38(9): 1271–9.
de Gelder, B., Bachoud-Levi, A. C., and Degos, J. D. (1998). ‘Inversion superiority in visual agnosia may be
common to a variety of orientation polarised objects besides faces’. Vision Research 38(18): 2855–61.
de-Wit, L., and Wagemans, J. (in press). ‘Individual differences in local and global perceptual organization’.
In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
DeGutis, J., DeNicola, C., Zink, T., McGlinchey, R., and Milberg, W. (2011). ‘Training with own-race
faces can improve processing of other-race faces: evidence from developmental prosopagnosia’.
Neuropsychologia 49(9): 2505–13. doi: 10.1016/j.neuropsychologia.2011.04.031.
DeGutis, J., Cohan, S., Mercado, R. J., Wilmer, J., and Nakayama, K. (2012). ‘Holistic processing of the
mouth but not the eyes in developmental prosopagnosia’. Cognitive Neuropsychology 29(5–6): 419–46.
doi: 10.1080/02643294.2012.754745.
DeGutis, J., Wilmer, J., Mercado, R. J., and Cohan, S. (2013). ‘Using regression to measure holistic face
processing reveals a strong link with face recognition ability’. Cognition, 126(1), 87–100. doi: 10.1016/j.
cognition.2012.09.004.
Diamond, R., and Carey, S. (1986). ‘Why faces are and are not special: An effect of expertise’. Journal of
Experimental Psychology: General 115: 107–17.
Duchaine, B., and Nakayama, K. (2006). ‘The Cambridge Face Memory Test: Results for neurologically
intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic
participants’. Neuropsychologia 44(4): 576–85.
Duchaine, B., Yovel, G., and Nakayama, K. (2007). ‘No global processing deficit in the Navon task in 14
developmental prosopagnosics’. Soc Cogn Affect Neurosci 2(2): 104–13. doi: 10.1093/scan/nsm003.
Farah, M. J., Tanaka, J. W., and Drain, H. M. (1995). ‘What causes the face inversion effect?’ Journal of
Experimental Psychology: Human Perception and Performance 21(3): 628–34.
Farah, M. J., Wilson, K. D., Drain, M., and Tanaka, J. W. (1998). ‘What is “special” about face perception?’
Psychol Rev 105(3): 482–98.
Holistic Face Perception 771

Fific, M., and Townsend, J. T. (2010). ‘Information-processing alternatives to holistic
perception: identifying the mechanisms of secondary-level holism within a categorization paradigm’.
J Exp Psychol Learn Mem Cogn 36(5): 1290–313. doi: 10.1037/a0020123.
Freiwald, W. A., Tsao, D. Y., and Livingstone, M. S. (2009). ‘A face feature space in the macaque temporal
lobe’. Nature Neuroscience 12(9): 1187–96. doi: 10.1038/nn.2363.
Gao, Z., Flevaris, A. V., Robertson, L. C., and Bentin, S. (2011). ‘Priming global and local processing of
composite faces: Revisiting the processing-bias effect on face perception’. Attention, Perception and
Psychophysics 73: 1477–86.
Garner, W. R. (1974). The Processing of Information and Structure. (Hillsdale, NJ: Erlbaum).
Gauthier, I., and Bukach, C. (2007). ‘Should we reject the expertise hypothesis?’ Cognition 103(2): 322–30.
doi: 10.1016/j.cognition.2006.05.003.
Goffaux, V. (2009). ‘Spatial interactions in upright and inverted faces: re-exploration of spatial scale
influence’. Vision Research 49(7): 774–81. doi: 10.1016/j.visres.2009.02.009.
Goffaux, V., and Rossion, B. (2006). ‘Faces are “spatial”—holistic face perception is supported by low spatial
frequencies’. Journal of Experimental Psychology: Human Perception and Performance 32(4): 1023–39.
doi: 10.1037/0096-1523.32.4.1023.
Goffaux, V., Schiltz, C., Mur, M., and Goebel, R. (2012). ‘Local discriminability determines the
strength of holistic processing for faces in the fusiform face area’. Front Psychol 3: 604. doi: 10.3389/
fpsyg.2012.00604.
Harris, A., and Aguirre, G. K. (2008). ‘The representation of parts and wholes in face-selective cortex’.
Journal of Cognitive Neuroscience 20(5): 863–78. doi: 10.1162/jocn.2008.20509.
Harris, A., and Aguirre, G. K. (2010). ‘Neural tuning for face wholes and parts in human fusiform gyrus
revealed by FMRI adaptation’. Journal of Neurophysiology 104(1): 336–45. doi: 10.1152/jn.00626.2009.
Hayward, W. G., Rhodes, G., and Schwaninger, A. (2008). ‘An own-race advantage for components
as well as configurations in face recognition’. Cognition 106(2): 1017–27. doi: 10.1016/j.
cognition.2007.04.002.
Hole, G. J. (1994). ‘Configurational factors in the perception of unfamiliar faces’. Perception 23: 65–74.
Humphreys, K., Avidan, G., and Behrmann, M. (2007). ‘A detailed investigation of facial expression
processing in congenital prosopagnosia as compared to acquired prosopagnosia’. Experimental Brain
Research 176(2): 356–73.
Kimchi, R., and Amishav, R. (2010). ‘Faces as perceptual wholes: The interplay between component and
configural properties in face processing’. Visual Cognition 18(7): 1034–62.
Kimchi, R., Avidan, G., Behrmann, M., and Amishav, R. (2012). ‘Perceptual separability of featural and
configural information in congenital prosopagnosia’. Cognitive Neuropsychology 5–6: 447–63.
Koenderink, J. (in press). ‘Gestalts as ecological templates’. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. (Oxford: Oxford University Press).
Konar, Y., Bennett, P. J., and Sekuler, A. B. (2010). ‘Holistic processing is not correlated with
face-identification accuracy’. Psychological Science 21(1): 38–43. doi: 10.1177/0956797609356508.
Le Grand, R., Cooper, P. A., Mondloch, C. J., Lewis, T. L., Sagiv, N., de Gelder, B., and Maurer, D.
(2006). ‘What aspects of face processing are impaired in developmental prosopagnosia?’ Brain Cogn
16(11): 1584–94.
Levine, D. N., and Calvanio, R. (1989). ‘Prosopagnosia: a defect in visual configural processing’. Brain Cogn
10(2): 149–70.
Lobmaier, J. S., Bolte, J., Mast, F. W., and Dobel, C. (2010). ‘Configural and featural processing in humans
with congenital prosopagnosia’. Advances in Cognitive Psychology 6: 23–34. doi: 10.2478/v10053-008-0074-4.
Loftus, G. R., Oberg, M. A., and Dillon, A. M. (2004). ‘Linear theory, dimensional theory, and the
face-inversion effect’. Psychological Review 111: 835–62.
Macho, S., and Leder, H. (1998). ‘Your eyes only? A test of interactive influence in the processing of facial
features’. Journal of Experimental Psychology: Human Perception and Performance 24(5): 1486–500.
Macrae, C. N., and Lewis, H. L. (2002). ‘Do I know you? Processing orientation and face recognition’.
Psychological Science 13(2): 194–6.
Maurer, D., Le Grand, R., and Mondloch, C. J. (2002). ‘The many faces of configural processing’. TRENDS
in Cognitive Sciences 6(6): 255–60.
McGugin, R. W., Richler, J. J., Herzmann, G., Speegle, M., and Gauthier, I. (2012). ‘The Vanderbilt
Expertise Test reveals domain-general and domain-specific sex effects in object recognition’. Vision
Research 69: 10–22. doi: 10.1016/j.visres.2012.07.014.
McKone, E. (2008). ‘Configural processing and face viewpoint’. Journal of Experimental Psychology: Human
Perception and Performance 34(2): 310–27. doi: 10.1037/0096-1523.34.2.310.
McKone, E., Crookes, K., Jeffery, L., and Dilks, D. D. (2012). ‘A critical review of the development of
face recognition: Experience is less important than previously believed’. Cognitive Neuropsychology.
doi: 10.1080/02643294.2012.660138.
Michel, C., Rossion, B., Han, J., Chung, C. S., and Caldara, R. (2006). ‘Holistic processing is finely tuned
for faces of one’s own race’. Psychological Science 17(7): 608–15. doi: 10.1111/j.1467-9280.2006.01752.x.
Mondloch, C. J., Elms, N., Maurer, D., Rhodes, G., Hayward, W. G., Tanaka, J. W., and Zhou, G.
(2010).‘Processes underlying the cross-race effect: an investigation of holistic, featural, and relational
processing of own-race versus other-race faces’. Perception 39(8): 1065–85.
Navon, D. (1977). ‘Forest before trees: The precedence of global features in visual perception’. Cognitive
Psychology 9(3): 353–83.
Oliva, A., and Torralba, A. (2007). ‘The role of context in object recognition’. TRENDS in Cognitive Sciences
11(12): 520–7. doi: 10.1016/j.tics.2007.09.009.
Orban de Xivry, J. J., Ramon, M., Lefevre, P., and Rossion, B. (2008). ‘Reduced fixation on the upper area
of personally familiar faces following acquired prosopagnosia’. J Neuropsychol 2(Pt 1): 245–68.
Palermo, R., Willis, M. L., Rivolta, D., McKone, E., Wilson, C. E., and Calder, A. J. (2011). ‘Impaired
holistic coding of facial expression and facial identity in congenital prosopagnosia’. Neuropsychologia
49(5): 1226–35. doi: 10.1016/j.neuropsychologia.2011.02.021.
Ramon, M., and Rossion, B. (2010). ‘Impaired processing of relative distances between features
and of the eye region in acquired prosopagnosia—two sides of the same holistic coin?’ Cortex
46(3): 374–89. doi: 10.1016/j.cortex.2009.06.001.
Ramon, M., Busigny, T., and Rossion, B. (2010). ‘Impaired holistic processing of unfamiliar
individual faces in acquired prosopagnosia’. Neuropsychologia 48(4): 933–44. doi: 10.1016/j.
neuropsychologia.2009.11.014.
Rhodes, G., Hayward, W. G., and Winkler, C. (2006). ‘Expert face coding: configural and component
coding of own-race and other-race faces’. Psychonomic Bulletin and Review 13(3): 499–505.
Richler, J. J., Tanaka, J. W., Brown, D. D., and Gauthier, I. (2008). ‘Why does selective attention to parts fail
in face processing?’ J Exp Psychol Learn Mem Cogn 34(6): 1356–68. doi: 10.1037/a0013080.
Richler, J. J., Bukach, C. M., and Gauthier, I. (2009a). ‘Context influences holistic processing of nonface
objects in the composite task’. Atten Percept Psychophys 71(3): 530–40. doi: 10.3758/APP.71.3.530.
Richler, J. J., Mack, M. L., Gauthier, I., and Palmeri, T. J. (2009b). ‘Holistic processing of faces happens at a
glance’. Vision Research 49(23): 2856–61. doi: 10.1016/j.visres.2009.08.025.
Richler, J. J., Cheung, O. S., and Gauthier, I. (2011a). ‘Beliefs alter holistic face processing . . . if response
bias is not taken into account’. J Vis 11(13): 17. doi: 10.1167/11.13.17.
Richler, J. J., Cheung, O. S., and Gauthier, I. (2011b). ‘Holistic processing predicts face recognition’.
Psychological Science 22(4): 464–71. doi: 10.1177/0956797611401753.
Richler, J. J., Mack, M. L., Palmeri, T. J., and Gauthier, I. (2011c). ‘Inverted faces are (eventually) processed
holistically’. Vision Research 51(3): 333–42. doi: 10.1016/j.visres.2010.11.014.
Richler, J. J., Wong, Y. K., and Gauthier, I. (2011d). ‘Perceptual Expertise as a Shift from Strategic
Interference to Automatic Holistic Processing’. Current Directions in Psychological Science 20(2): 129–34.
doi: 10.1177/0963721411402472.
Richler, J. J., Palmeri, T. J., and Gauthier, I. (2012). ‘Meanings, mechanisms, and measures of holistic
processing’. Front Psychol 3: 553. doi: 10.3389/fpsyg.2012.00553.
Richler, J. J., and Gauthier, I. (2013). ‘When intuition fails to align with data: A reply to Rossion (2013)’.
Visual Cognition 21(2): 254–76.
Riesenhuber, M., Jarudi, I., Gilad, S., and Sinha, P. (2004). ‘Face processing in humans is compatible with a
simple shape-based model of vision’. Proc Biol Sci 271 Suppl 6: S448–450. doi: 10.1098/rsbl.2004.0216.
Rivest, J., Moscovitch, M., and Black, S. (2009). ‘A comparative case study of face recognition: the
contribution of configural and part-based recognition systems, and their interaction’. Neuropsychologia
47(13): 2798–811. doi: 10.1016/j.neuropsychologia.2009.06.004.
Robbins, R., and McKone, E. (2003). ‘Can holistic processing be learned for inverted faces?’ Cognition
88: 79–107.
Robbins, R., and McKone, E. (2007). ‘No face-like processing for objects-of-expertise in three behavioural
tasks’. Cognition 103(1): 34–79. doi: 10.1016/j.cognition.2006.02.008.
Rosch, E. H., Mervis, C. B., Gray, W. D., Johnson, D. M., and Boyes-Braem, P. (1976). ‘Basic objects in
natural categories’. Cognitive Psychology 8: 382–439.
Rossion, B. (2013). ‘The composite face illusion: A whole window into our understanding of holistic face
perception’. Visual Cognition 21(2): 139–253.
Rossion, B., and Boremanse, A. (2008). ‘Nonlinear relationship between holistic processing of individual
faces and picture-plane rotation: evidence from the face composite illusion’. J Vis 8(4): 3, 1–13.
doi: 10.1167/8.4.3.
Rossion, B., Kaiser, M. D., Bub, D., and Tanaka, J. W. (2009). ‘Is the loss of diagnosticity of the eye region
of the face a common aspect of acquired prosopagnosia?’ J Neuropsychol 3(Pt 1): 69–78.
Rossion, B., Prieto, E. A., Boremanse, A., Kuefner, D., and Van Belle, G. (2012). ‘A steady-state visual
evoked potential approach to individual face perception: Effect of inversion, contrast-reversal and
temporal dynamics’. NeuroImage 63(3): 1585–1600. doi: 10.1016/j.neuroimage.2012.08.033.
Saumier, D., Arguin, M., and Lassonde, M. (2001). ‘Prosopagnosia: a case study involving problems in
processing configural information’. Brain Cogn 46(1–2): 255–9.
Schwaninger, A., Lobmaier, J. S., Wallraven, C., and Collishaw, S. (2009). ‘Two routes to face
perception: evidence from psychophysics and computational modeling’. Cognitive Science 33(8): 1413–40.
doi: 10.1111/j.1551-6709.2009.01059.x.
Schwarzer, G., and Massaro, D. W. (2001). ‘Modeling face identification processing in children and adults’.
Journal of Experimental Child Psychology 79(2): 139–61. doi: 10.1006/jecp.2000.2574.
Sekuler, A. B., Gaspar, C. M., Gold, J. M., and Bennett, P. J. (2004). ‘Inversion leads to quantitative, not
qualitative, changes in face processing’. Curr Biol 14(5): 391–6.
Susilo, T., McKone, E., Dennett, H., Darke, H., Palermo, R., Hall, A., . . . Rhodes, G. (2010). ‘Face
recognition impairments despite normal holistic processing and face space coding: evidence from a case
of developmental prosopagnosia’. Cogn Neuropsychol 27(8): 636–64. doi: 10.1080/02643294.2011.613372.
Tanaka, J. W., and Farah, M. J. (1993). ‘Parts and wholes in face recognition’. Quarterly Journal of
Experimental Psychology 46A: 225–45.
Tanaka, J. W., and Farah, M. J. (2003). ‘The holistic representation of faces’. In Analytic and Holistic
Processes in Perception of Faces, Objects and Scenes, edited by G. Rhodes and M. A. Peterson.
(New York: Oxford University Press).
Tanaka, J. W., and Sengco, J. A. (1997). ‘Features and their configuration in face recognition’. Mem Cognit
25(5): 583–92.
Troje, N., and Bülthoff, H. H. (1996). ‘Face recognition under varying poses: The role of texture and shape’.
Vision Research 36: 1761–71.
Wang, R., Li, J., Fang, H., Tian, M., and Liu, J. (2012). ‘Individual differences in holistic processing predict
face recognition ability’. Psychological Science 23(2): 169–77. doi: 10.1177/0956797611420575.
Wenger, M. J., and Townsend, J. T. (2006). ‘On the costs and benefits of faces and words: process
characteristics of feature search in highly meaningful stimuli’. Journal of Experimental Psychology:
Human Perception and Performance 32(3): 755–79. doi: 10.1037/0096-1523.32.3.755.
Wong, Y. K., and Gauthier, I. (2010). ‘Holistic processing of musical notation: Dissociating failures of
selective attention in experts and novices’. Cognitive, Affective and Behavioral Neuroscience 10(4): 541–51.
doi: 10.3758/CABN.10.4.541.
Yin, R. K. (1969). ‘Looking at upside-down faces’. Journal of Experimental Psychology 81: 141–5.
Young, A. W., Hellawell, D., and Hay, D. C. (1987). ‘Configurational information in face perception’.
Perception 16: 747–59.
Yovel, G., and Duchaine, B. (2006). ‘Specialized face perception mechanisms extract both part and spacing
information: evidence from developmental prosopagnosia’. Journal of Cognitive Neuroscience 18(4): 580–93.
doi: 10.1162/jocn.2006.18.4.580.
Yovel, G., and Kanwisher, N. (2004). ‘Face perception: domain specific, not process specific’. Neuron
44(5): 889–98.
Chapter 38

Binocular rivalry and perceptual ambiguity

David Alais and Randolph Blake

Introduction and Background


Humans possess the impressive ability to achieve coherent and reliable perception of the exter-
nal world. Remarkably, this achievement is realized despite the relatively low resolution of the
retinal images, images that are inherently two-dimensional and often under-represent what one
is actually looking at. Consequently, many important aspects of objects and scenes are funda-
mentally ambiguous at the input stage to vision, including size, distance, depth ordering, shape,
and color. The general reliability of visual perception is striking given that not all pieces of the
puzzle are present in the retinal input. To overcome this limitation, perception relies on percep-
tual organization (Wertheimer 1923) and knowledge about the likely properties of the external
world acquired through evolution or learned from experience to make ‘unconscious inferences’
(von Helmholtz 1925) about the world we live in. Thanks to these processes, we are generally able
to construct a plausible interpretation of the world from the ambiguous and incomplete retinal
image. Circumstances may arise, however, that defeat the brain’s ability to infer a single coherent
percept (Leopold and Logothetis 1999). In cases where more than one plausible percept is pos-
sible, the competing perceptual interpretations alternate over time in an irregular fashion each
second or so, as the reader can experience by viewing a well-known ambiguous figure known as
the Necker cube (Figure 38.1a). This class of phenomenon, generally labelled bistable perception,
reveals the competition or ‘rivalry’ that occurs when the perceptual system is confronted with
ambiguous visual information (e.g., Blake and Logothetis 2002). As well as competition, bistable
perception also reveals a key role for inhibition, as the competing percepts are mutually exclu-
sive: only one interpretation is visible at a time, with the other being suppressed from perceptual
awareness.
Examples of bistable perception are found in many areas of vision including 3D perspective,
figure/ground organization, binocular rivalry (Wheatstone 1838), and new varieties discovered
in motion (e.g., Hupe and Rubin 2003), perception of human action (Vanrie et al. 2004) and
stereo-depth organization (van Ee et al. 2003). Other modalities, too, must deal with stimulus
uncertainty. Conflicting dichoptic auditory messages also compete for dominance, creating bin-
aural rivalry (Brancucci and Tommasi 2011). Tone sequences that can be perceptually grouped
into two distinct patterns produce auditory bistability (e.g., Pressnitzer and Hupe 2006). In the
tactile domain, rivalry occurs when vibrotactile sequences supporting two interpretations are
applied to a finger tip (Carter et al. 2008). See chapters by Denham and Winkler (this volume)
and Kappers and Bergmann Tiest (this volume) for further discussion of perceptual ambiguity
in the auditory and tactile domains, respectively. In general, fluctuations in perception seem
to be the rule when sensory input is ambiguous. The phenomenology of all forms of bistable
[Figure 38.1: (a) three example bistable stimuli (Necker cube, Schroeder’s stairs, Rubin’s vase/face); (b) binocular rivalry setup (video monitor viewed via a mirror stereoscope or LCD shutters), a timeline of percept durations, and a gamma-shaped frequency distribution of percept durations.]
Fig. 38.1  (a) Examples of perceptually ambiguous stimuli. Inspecting any of these figures will elicit
perceptual alternations between two roughly equally probable interpretations. The first two stimuli
are examples of ambiguous perspective that can arise when three-dimensional forms are rendered as
two-dimensional images, as commonly occurs in the retinal image of the external world. Over time,
the two perspectives or ‘view points’ alternate. The third example shows an instance of ambiguous
segregation between figure and ground. A vase is perceived when the white region is interpreted as
figure, or as two faces in profile when the black region is interpreted as figure. (b) Binocular rivalry is
a very actively researched form of ambiguous perception. Separate images are presented to the eyes,
usually by means of a mirror stereoscope. Any significant interocular difference in orientation, color,
texture, movement, etc. will suffice to trigger binocular rivalry, which is experienced as a series of
irregular perceptual alternations over time as first one image is perceived and then the other. While
one image is perceived, the other is suppressed from visual awareness. A given image therefore
undergoes periods of dominance and suppression. All forms of bistable perception produce a
skewed gamma-like distribution when the durations of many dominance periods are pooled. For
binocular rivalry, the peak of this distribution typically would be around 2–3 seconds, with occasional
longer dominance periods.

perception is broadly similar in that all involve exclusive alternations between the competing
perceptual interpretations. One common hallmark is the apparent randomness of the alter-
nations between competing interpretations, as evidenced by the gamma-like, skewed normal
frequency histograms of dominance durations (Fox and Herrmann 1967) (see Figure 38.1b).
Several studies have shown that diverse instances of perceptual rivalry all exhibit this pattern of
temporal dynamics (Carter and Pettigrew 2003; Long and Toppino 2004; Brascamp et al. 2005;
van Ee 2005; O’Shea et al. 2009), suggesting that it may be a general characteristic of bistable
perception.
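The gamma-like shape of these pooled duration histograms is easy to illustrate with a short simulation. The sketch below is illustrative only: the shape parameter and mean duration are assumed values chosen to fall in the range typically reported for binocular rivalry, not figures taken from the studies cited above.

```python
import numpy as np

# Illustrative sketch: sample hypothetical dominance durations from a
# gamma distribution. Shape k and the mean are assumed values, not
# taken from any particular study.
rng = np.random.default_rng(0)
k = 3.5                              # shape: larger k -> less skewed
mean_duration = 2.5                  # mean dominance duration in seconds
durations = rng.gamma(shape=k, scale=mean_duration / k, size=10_000)

# Right-skew signature of a gamma-like histogram: the mean exceeds the
# median, and the upper tail is long (occasional very long periods).
mean, median = durations.mean(), np.median(durations)
skewness = ((durations - mean) ** 3).mean() / durations.std() ** 3
print(f"mean={mean:.2f}s  median={median:.2f}s  skewness={skewness:.2f}")
```

For a gamma distribution the theoretical skewness is 2/sqrt(k), so shape values around 3 to 4 give the moderately right-skewed, unimodal profile sketched in Figure 38.1b.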
In this chapter we focus on the most widely studied form of bistable perception, binocular
rivalry (Blake 2001; Tong 2001; Blake and Logothetis 2002; Alais and Blake 2005). We begin by
describing the basic properties of binocular rivalry, and then review work on rivalry relating to
perceptual organization, including figure/ground segregation and perceptual grouping. The sec-
ond half of the chapter broadens the scope by discussing the role of attention in binocular rivalry
and considering the impact of top-down and contextual influences. Broader still, the final section
examines recent work studying binocular rivalry in a multisensory context.

Binocular Rivalry
Binocular rivalry is a compelling bistable phenomenon first systematically studied by
Wheatstone (1838) following his invention of the mirror stereoscope. Binocular rivalry occurs
when each eye views incompatible images at the same retinal location, where ‘incompatible’
means stimuli sufficiently different to prevent a binocular match. This can be easily achieved
in the laboratory using a mirror stereoscope to present a different image to each eye, as shown
in Figure 38.1b. Perceptually, binocular rivalry is experienced as seemingly random fluctua-
tions in dominance between one image and the other that continue as long as the dissimilar
images are viewed. For stimuli of similar salience, these stochastic fluctuations tend to even
out over time so that each image is seen equally often during extended viewing. Stimulus
salience in binocular rivalry is largely governed by low-level stimulus properties, such as con-
trast, luminance, and orientation, with a relatively small but demonstrable role for high-level
stimulus factors such as attention and context (reviewed later in the chapter). Generally, while
one image is dominant, little or no trace of the other image is perceived. Interest in binocular
rivalry has increased in recent decades, in part because rivalry allows systematic examination
of processes governing perceptual competition, neural dynamics and selection of the contents
of visual awareness.
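The combination of competition, mutual inhibition, and irregular alternation described here is often captured computationally by reciprocal-inhibition rate models with slow adaptation (in the spirit of models such as Laing and Chow 2002, which this chapter does not itself discuss). The sketch below is a minimal, hypothetical implementation with assumed parameter values: each unit is driven by one eye's stimulus, suppresses its rival, and slowly adapts while dominant, so that dominance passes back and forth, with noise perturbing the switch times.

```python
import numpy as np

def simulate_rivalry(t_max=120.0, dt=0.001, drive=1.0, inhib=2.5,
                     g_adapt=2.0, tau_r=0.015, tau_a=2.0, sigma=0.1,
                     seed=7):
    """Two rate units inhibit each other; slow adaptation plus noise
    yields alternating dominance (all parameter values illustrative)."""
    rng = np.random.default_rng(seed)
    steps = int(t_max / dt)
    r = np.array([1.0, 0.0])      # firing rates; unit 0 starts dominant
    a = np.zeros(2)               # slow adaptation variables
    winner = np.empty(steps, dtype=np.int8)
    for t in range(steps):
        # input = stimulus drive - cross-inhibition - self-adaptation
        inp = drive - inhib * r[::-1] - g_adapt * a
        r += dt / tau_r * (-r + np.maximum(inp, 0.0)) \
             + sigma * np.sqrt(dt) * rng.standard_normal(2)
        r = np.maximum(r, 0.0)                 # rates cannot go negative
        a += dt / tau_a * (r - a)              # adaptation tracks rate
        winner[t] = r[1] > r[0]
    # Dominance durations = lengths of runs with a constant winner;
    # discard brief noise-driven flickers around the switch points.
    switches = np.flatnonzero(np.diff(winner.astype(int)))
    dur = np.diff(switches) * dt
    return dur[dur > 0.25]

durations = simulate_rivalry()
```

The design choice doing the work is the strong cross-inhibition (so only one unit is active at a time, mirroring the mutual exclusivity of rivalry) combined with adaptation an order of magnitude slower than the rate dynamics, which undermines the dominant unit until its suppressed rival can escape. The specific time constants here (15 ms rate dynamics, 2 s adaptation) are assumptions, not fitted values.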
Although binocular rivalry has much in common with other forms of bistable perception,
some very important differences set binocular rivalry apart. First, binocular rivalry is unique
in presenting a different stimulus to each eye, whereas other bistable examples involve a single
stimulus viewed binocularly. This interocular conflict disrupts normal binocular vision and trig-
gers binocular rivalry, in part because the conflict interferes with the establishment of binocular
correspondence necessary for stereomatching. Second, the alternations in binocular rivalry are
generally mutually exclusive, such that when one image is perceived the other is completely sup-
pressed. Other forms of bistable perception involve a single stimulus that supports two interpre-
tations, and it is those interpretations that alternate over time while the stimulus itself remains
visible. The Necker cube, for example, elicits bistable alternations of perceived perspective without
any part of the cube disappearing from visual awareness. Third, binocular rivalry has a strong local
component, as revealed by the phenomenon of piecemeal rivalry in which large images tend to
alternate as a patchwork (O’Shea et al. 1997). By contrast, other bistable stimuli tend to alternate
globally and do not exhibit obvious ‘piecemeal’ states. There are, however, conditions under which
rivalry behaves globally, and this makes it useful as a tool for studying perceptual organization.
Accordingly, the following sections review basic features of binocular rivalry that illustrate its
links to the principles of perceptual organization.

Gestalt Organizing Principles in Binocular Rivalry


Figure/Ground Segregation and Binocular Rivalry
One of the primary processes in perceptual organization is figure/ground segregation, the process
by which some regions within the visual image merge perceptually to form objects while remain-
ing regions are treated as the background against which those objects appear. The relationship
between figure and ground is one of occluder and occluded because the figure, in terms of depth
ordering, must be nearer than the background. Surprisingly little work in binocular rivalry has
examined figure/ground organization directly, although it has been widely studied in other con-
texts (see Kogo and Van Ee, this volume). In one old study, Alexander (1951) attempted to weaken
the strength of rivaling figures by using dashed lines instead of continuous contours to portray
shapes and by reducing the lines’ contrast by printing them on gray paper. The rationale was that
these manipulations would reduce ‘figural strength’ and make vigorous rivalry less likely, because
figural strength entails resistance to distortion, impressiveness, internal articulation, density of
energy and symmetry (Koffka 1935). In fact, Alexander did find reduced alternation rates for
the weak figures, but a contemporary interpretation of that finding would focus simply on the
accompanying variations in stimulus contrast: stimuli higher in contrast and greater in contour
strength produce more vigorous rivalry (Levelt 1965), presumably because of contrast-dependent
responses in early cortical areas tuned to orientation. Still, it could be argued that those response
properties in turn contribute to figure/ground relationships.
One reasonable hypothesis arising from figure/ground organization is that stimulus regions
defined as figure should engage more vigorously in rivalry than regions deemed to be back-
ground. This is in line with traditional thinking on figure/ground classification and also squares
with modern thinking about visual processing in which visual objects are extracted from the
visual image and compete for visual attention (Desimone and Duncan 1995), although there is no
direct test of this notion in the published literature on rivalry. A simple test would be to present
dichoptic displays consisting of a small figure region (e.g., red horizontal lines) within a sur-
rounding background region (e.g., green vertical lines), with the reverse pattern in the other eye,
as shown in Figure 38.2a. More vigorous rivalry for the figure region could be demonstrated in
two ways: by showing that rivalry alternations were faster in the figure region, consistent with the
figure having greater stimulus strength, or by measuring contrast sensitivity to probe stimuli—a
common method for measuring rivalry suppression strength (Fox and Check 1968; Nguyen et al.
2003; Alais and Melcher 2007). The prediction would be that probes presented in the figure region
would show greater threshold elevation during rivalry suppression than probes presented in the
background region.
Although there is little work directly examining the impact of figure/ground organization on
binocular rivalry, several studies have looked at other aspects of visual scene organization. One
examined the salience of different regions of a visual scene by inducing rivalry between a simu-
lated ground plane and a simulated ceiling plane (Ozkan and Braunstein 2009). The ground plane
was a receding checkerboard appearing to incline towards the horizon while the ceiling plane was
a receding checkerboard appearing to decline towards the horizon (Figure 38.2b). Thus, the two
stimuli were identical except for one being a rotated version of the other, and yet the ground plane
tended to predominate over the ceiling plane. Moreover, the ground plane, when suppressed,
returned more quickly to dominance than did the ceiling plane. Other studies have highlighted
the relevance of surface layout, finding that it influences the dynamics of rivalry alternations by
inhibiting false matches between the eyes according to ecological constraints. Other aspects of
Binocular Rivalry and Perceptual Ambiguity 779


Fig. 38.2  (a) Figure/ground segregation has not been widely investigated in binocular rivalry. In
this stimulus, the left- and right-eye images contain clearly defined central 'figure' regions that are
mismatched in color and orientation; the surrounding 'ground' regions are likewise mismatched,
but with the inverse arrangement. Perceptual organization prioritizing figure over ground should
produce more vigorous rivalry for the central region, which would manifest as a faster rivalry
alternation rate and stronger suppression of the unseen stimulus – both well-known consequences
of increasing stimulus strength. (b) Perceptual interpretation of the rivaling monocular images can
also influence binocular rivalry. The left image simulates a ground plane and the right image a
ceiling plane. The two images are identical except for a 180° rotation applied to one of them;
however, a ground plane has greater ecological relevance in our interaction with the world.
Consistent with the ground plane having more salience, it tends to predominate over the ceiling
plane in overall dominance duration and returns to dominance more quickly when suppressed.

surface properties such as natural boundary contours (Ooi and He 2006) and the coherence of
surfaces (Ooi and He 2003) influence dynamics and dominance durations in rivalry. As an exam-
ple, continuous or homogeneous surfaces tend to dominate over discontinuous images (Ooi and
He 2003).

Perceptual Grouping in Binocular Rivalry


Another fundamental process in perceptual organization is grouping. Unlike the paucity of work
on figure/ground classification in rivalry, a good deal of research has been done on perceptual
grouping. For example, Whittle et  al. (1968) demonstrated grouping by similarity in showing
robust configural effects among multiple, small contour segments when each engaged in rivalry.
Observers tended to see simultaneous dominance of segments that formed an extended line, even
when those segments were presented to different eyes. More dramatic versions of figural grouping
encouraging globally synchronized dominance have been reported by Dorrenhaus (1975),
Kovacs et al. (1996), and Alais et al. (2000), which suggest that grouping in rivalry is possible
at a binocular level (Figure 38.3). In a similar vein, Van Lier and De Weert (2003) showed group-
ing by color in binocular rivalry: in a multi-element display, similarly colored features tended to
dominate together. Kim and Blake (2007) showed this also occurs with illusory colors experienced

Fig. 38.3  Two examples of rivalry stimuli that engage in large-scale perceptual organization. (a) First
published by Diaz-Caneja in 1928, these two images show a tendency to alternate as globally coherent
patterns, switching between entirely red horizontal lines and entirely green concentric lines. Theories
explaining rivalry as a competition between monocular channels predict that the dominant percept
should never be globally coherent as one or the other of the bipartite monocular stimuli should be
dominant at any given moment. The fact that the dominant percept may become grouped into a
coherent whole shows that perceptual organization can occur interocularly and combine independent
monocular views into perceptual wholes. (b) Dichoptically viewing the upper pair of images produces
rivalrous alternations between the left- and right-eye stimuli. The lower pair also produces left- vs.
right-eye rivalry, but in addition produces periods of rivalry between the coherent images (the monkey
face vs. the page of text), which requires grouping elements from each image simultaneously across
the eyes (Kovacs et al. 1996). These demonstrations show that coherent perceptual organization can
be imposed on conflicting monocular images when strong Gestalts are present. Because this requires
interocular grouping, it implies a binocular process over-riding earlier interocular suppression.
Reproduced from E. Diaz-Caneja, Sur l’alternance binoculaire, Annales D’Oculistique , 165, pp. 721–31, Copyright
© 1928, The Author.
Reproduced from Ilona Kovács, Thomas V. Papathomas, Ming Yang, and Ákos Fehér, When the brain changes its
mind: Interocular grouping during binocular rivalry, Proceedings of the National Academy of Sciences, USA, 93
(26), pp. 15508–15511, Figures 1 a and b, Copyright (1996) National Academy of Sciences, U.S.A.

by color-graphemic synesthetes. In the domain of motion perception, spatially distributed dots
that move in the manner of a human figure (so-called point-light animations) remain dominant
as an entire figure more often during rivalry than does the same configuration when inverted to
form an upside down figure, or when distributed between the eyes (Watson et al. 2004). Evidently,
conjoint dominance of individual dots is promoted when they form a dynamic and globally
coherent human figure.
The findings summarized above pertain to perceptual grouping among multiple, spatially dis-
tributed elements each engaged in rivalry. Grouping can also occur within a single large-field
stimulus, especially when it contains meaningful spatial structure (Lee and Blake 2004; Alais and
Melcher 2007), although before reviewing this work it is necessary to describe the phenomenon of
‘piecemeal rivalry’. When two small stimuli engage in binocular rivalry, they will usually produce
coherent fluctuations in perception so that either one image or the other dominates entirely. This
is generally true for stimuli subtending a degree or two of visual angle. Rivalry between larger
stimuli, however, tends to fragment into a patchwork of local alternations, with the local patches
appearing to alternate between the left and right eyes’ images independently of each other. This
mosaic of independent local rivalry zones is commonly referred to as ‘piecemeal’ rivalry and is
very common when large images engage in rivalry. Piecemeal rivalry points to the local nature of
rivalry, yet there are also occasions when large stimuli appear to alternate in a coherent or syn-
chronized manner. Clearly some cooperative grouping process is at work in coordinating these
otherwise independent local processes.
The existence of piecemeal rivalry prompts two fundamental questions. First, what determines
the size of local rivalry zones, and second, what are the cooperative processes that promote inter-
actions among these local zones? Regarding the first question, there is good evidence that the
spatial extent of local rivalry zones is governed by the size of receptive fields in early visual cortex.
In central vision, rivalry zones are typically about a degree or so in diameter; however, their size
increases with eccentricity at a similar rate to the expanding size of V1/V2 receptive fields with
eccentricity (O’Shea et  al. 1997). This implies that rivalry has a spatial extent governed by the
sizes of receptive fields in early visual cortex and that rivalry alternations are more likely to be
piecemeal when stimuli activate neurons spanning multiple receptive fields. The link with recep-
tive field size also relates to another interesting observation, namely that rivalry appears to have
a minimum size. It has been shown that even when the interocular conflict is limited to a single
point, as when two thin orthogonal lines are viewed dichoptically, there exists a zone of sup-
pression that extends around that point (Kaufman 1963), with the size of the suppression zone
depending on eccentricity. Rivalry therefore appears to be a process that operates locally over an
extent determined by receptive field sizes in early cortex. One advantage of rivalry being local is
that suppression is localized and allows binocular vision to operate normally in any binocularly
congruent regions outside the region of interocular conflict.
The second question prompted by piecemeal rivalry is why independent local rivalry zones
sometimes appear to function synchronously to form global alternations. One study examined
this question by presenting two adjacent gratings to one eye, rivaling with corresponding noise
patches in the other eye (Alais and Blake 1999). Observers tracked rivalry alternations at the two
grating locations and the orientations of the gratings were manipulated over blocks to be either
collinear, orthogonal, or parallel. The perceptual fluctuations reported in the orthogonal condi-
tion were independent, meaning that both gratings occasionally were visible at the same time
but not more often than would be expected by chance alone. In the collinear condition, how-
ever, the gratings were often jointly dominant, significantly more than predicted by independence
(Figure 38.4). This grouping tendency was very strong when the two pairs of rivaling stimuli were
782 Alais and Blake

[Figure 38.4 annotations: left panel, 'The association field' – spatial interactions between discrete
orientation patches: close and collinear patches show correlated alternations, patches too distant do
not correlate; collinear arrangements re-establish correlated dominance while non-collinear
arrangements prevent it. Right panel, 'Travelling waves' – local dominance travels as a wave along
collinear contours; radial patterns discourage travelling waves.]
Fig. 38.4  When large stimuli engage in rivalry, their perceptual alternations are not global but
piecemeal. Instead of coherent oscillations between one whole image and the other, a multitude of
local rivalry zones emerges, each appearing to alternate independently of the others. These local
zones of suppression may exhibit coordinated alternation dynamics, especially when adjacent zones
share collinear or near-collinear contours, as illustrated by the ‘association field’. This can be studied
using discrete orientation patches and varying relative orientation and distance. In continuous
stimuli, as shown in the annular stimuli on the right-hand side, these local interactions manifest as
travelling waves of dominance when the orientation is collinear or nearly so. Such a stimulus will first
emerge from suppression in a local region, with dominance then spreading smoothly behind
a wave front travelling along the orientation. In an annulus with radial orientation, travelling
dominance waves are not generally observed and piecemeal rivalry is more likely.

adjacent in the same hemifield (therefore projecting to adjacent columns in the same cortical
hemisphere), and was still quite strong when the rivaling stimuli were placed on either side of fixa-
tion. The fact that grouping was still observed for grating patches placed on either side of fixation
suggests that callosal connections between hemispheres are able to establish the adjacency of the
grating patches in the visual field as well as their orientation relationship. Consistent with this sug-
gestion, a study of binocular rivalry in a split-brain observer found that coordinated dominance

between rivalry patches did not occur when those patches were located either side of the midline
(O’Shea and Corballis 2005). The corpus callosum does indeed seem critical for perceptual group-
ing across the vertical midline.
Binocular rivalry is therefore a process occurring in local zones, but these can group together
into pairs or larger ensembles (Bonneh and Sagi 1999) according to the principle of the ‘asso-
ciation field’ (Field et al. 1993). This notion (see Figure 38.4) is similar to the Gestalt principle
of common fate or good continuation and posits that collinear orientations will tend to associ-
ate more strongly than oblique contours (Alais et al. 2006), and that the strength of association
declines with distance. The association field is thought to have a basis in the long-range horizontal
connections in V1 which are known to be longer and stronger for collinear orientations and to fall
off monotonically with angular difference (Kapadia et al. 1995). Related work shows that spatial
interactions influencing rivalry can arise outside regions of the visual field within which rivalry
is occurring. For instance, the predominance and strength of suppression of a patch of grating
engaged in rivalry are influenced by a surrounding grating that is not engaged in rivalry (Paffen
et al. 2004; Paffen et al. 2005). This interaction is thought to have a neural basis in center-surround
interactions between classical and extended receptive fields (e.g., Blakemore and Tobin 1972;
Fitzpatrick 2000).
Another line of work pointing to local grouping between rivalry zones comes from studies of
‘traveling waves’ of rivalry dominance (Wilson et al. 2001; Kang et al. 2010). These studies exam-
ined the often noted observation that when a large rivalry stimulus is suppressed, dominance will
often break through in a single small region and then spread like a wave, sweeping across the entire
stimulus until it is fully visible. Psychophysical observations have shown that traveling waves tend
to travel faster and further along collinear contours than non-collinear contours (see Figure 38.4),
in keeping with the association field hypothesis (Wilson et al. 2001; Kang et al. 2010). An fMRI
study (Lee et al. 2005) has shown that when a traveling wave is experienced in rivalry it produces
a concomitant wave of changing BOLD activity across the occipital cortex that is correlated spa-
tially and temporally with the perceived traveling wave. The speed of the wave in perception, in
other words, is tightly correlated with the spreading wave within neural tissue, as is the spatial
movement of the wave in the visual field and in retinotopic cortical areas (Lee et al. 2007).
Taken together, these findings are consistent with binocular rivalry being a local process with
lateral interactions capable of coordinating rivalry states across adjacent locations, thereby allow-
ing coherent states to emerge through perceptual grouping and synchronized transitions. Rivalry
thus exhibits grouping over space and time. This grouping is made possible by cooperation
along collinear or near-collinear orientations and is likely mediated by lateral cortico-cortical
networks (Kapadia et al. 1995; Angelucci et al. 2002). For a full review of contour interactions, see
Hess et al. (this volume). Consistent with this reasoning, natural images—which contain locally
correlated orientations across spatial scales—tend to resist breaking into piecemeal zones and will
remain coherent at much larger image sizes than gratings will (Alais and Melcher 2007). Natural
images will also tend to predominate over non-natural images when the two are pitted against one
another in rivalry (Baker and Graf 2009).

Dynamics of Binocular Rivalry


One of the striking features of binocular rivalry is that the competition between conflicting
monocular inputs never seems to be resolved. Alternations in dominance between dissimilar
monocular patterns persist for as long as those patterns are viewed, although the incidence of
mixed dominance tends to increase when one views rivalry for very long periods of time (Klink

et al. 2010). What underlies the temporal dynamics of binocular rivalry? This section will review
the factors governing rivalry dynamics, and in doing so will lay the groundwork for the subse-
quent sections discussing top-down and contextual influences on binocular rivalry.
Levelt (1965), one of the first to examine rivalry dynamics in detail, borrowed the idea of recip-
rocal inhibition from early neurophysiologists. He contended that when conflicting rival images
first activate respective neural populations, reciprocal inhibition would inevitably cause one
response to dominate the other. The reason is that a stronger response in one population—even
a slight one—leads to greater inhibition over the other population. Any degree of advantage less
inhibition is exerted back by the weaker population, freeing the stronger population to respond
even more strongly (and exert still further inhibition over the other). This process rapidly leads to
one population completely inhibiting the other so that only one image is visible. Most subsequent
models of binocular rivalry have employed reciprocal inhibition to account for rivalry suppres-
sion (Lehky 1988; Blake 1989; Mueller 1990; Laing and Chow 2002; Freeman 2005).
Reciprocal inhibition offers an explanation of the suppression of one image at rivalry onset,
but how does it explain the ensuing alternation of perceptual dominance? Simply adding neural
adaptation to the reciprocal inhibition process is sufficient to account for ongoing fluctuations in
dominance because it reverses the process. Adaptation gradually attenuates the responses within
the dominant population, progressively weakening its inhibitory hold over the suppressed popu-
lation. Concurrent with weakening inhibition, the suppressed neurons are also recovering from
adaptation incurred in their previous dominance phase and are thus gaining strength. Over time,
responses in the two populations converge towards a balance point where any minor change in
response can trigger a flip in perceptual dominance. The adapting reciprocal inhibition model of
binocular rivalry is sufficient to explain both suppression and alternation dynamics. Importantly,
the tipping point is somewhat variable, as it is influenced by external factors such as eye move-
ments or blinks, or by internal factors such as attentional shifts or neuronal noise in response
levels (Kim et  al. 2006; Lankheet 2006; Moreno-Bote et  al. 2007). These potential tipping fac-
tors assume increasing significance as the tipping point approaches and can trigger perceptual
shifts at irregular times, consistent with the fundamentally stochastic nature of rivalry dynamics
(Brascamp et al. 2006; Shpiro et al. 2009).
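To make the mechanism concrete, the sketch below simulates a pair of mutually inhibiting populations with slow adaptation and additive noise. The equations, parameter values, and the function name are illustrative choices for demonstration only, not those of any particular published model (cf. Lehky 1988; Laing and Chow 2002).

```python
import math
import random

def simulate_rivalry(steps=20000, dt=1.0, beta=3.0, gain=2.0,
                     tau=10.0, tau_a=1000.0, noise=0.05, seed=1):
    """Simulate two mutually inhibiting neural populations with slow
    adaptation and additive noise (illustrative parameters). Returns the
    list of dominance durations in time steps; successive entries belong
    to alternating populations."""
    rng = random.Random(seed)

    def f(x):
        # sigmoidal response nonlinearity (an illustrative choice)
        return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.2)))

    r = [0.2, 0.1]   # firing rates of the two populations
    a = [0.0, 0.0]   # adaptation states
    durations, current, start = [], None, 0
    for t in range(steps):
        # each population receives its stimulus (input = 1.0), inhibition
        # from the other population, its own adaptation, and noise
        drive = [1.0 - beta * r[1] - a[0] + rng.gauss(0.0, noise),
                 1.0 - beta * r[0] - a[1] + rng.gauss(0.0, noise)]
        for i in (0, 1):
            r[i] += dt * (-r[i] + f(drive[i])) / tau      # fast rate dynamics
            a[i] += dt * (-a[i] + gain * r[i]) / tau_a    # slow adaptation
        dominant = 0 if r[0] > r[1] else 1
        if dominant != current:              # record a perceptual switch
            if current is not None:
                durations.append(t - start)
            current, start = dominant, t
    return durations
```

With these illustrative settings, dominance phases last on the order of hundreds of time steps (set mainly by the adaptation time constant `tau_a`), while the noise term makes individual durations irregular, reproducing the qualitative pattern described above: reciprocal inhibition yields suppression, adaptation yields alternation, and noise yields stochastic phase durations.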
The adapting reciprocal inhibition model of rivalry predicts that suppression strength should
weaken over a dominance period, reaching a minimum level just prior to a dominance switch.
Two studies testing this prediction found that sensitivity for detecting probes in the suppressed
eye was no better late in a suppression period than early in the period (Fox and Check 1968;
Norman et al. 2000), implying that inhibition was not weakening over time. However, two limi-
tations may explain their null finding. First, both studies used gratings as rival stimuli but meas-
ured sensitivity using completely different probes (letters or small spots of light) that would not
tap into the same neurons signaling (and adapting to) the suppressed grating. Second, the ‘late’
probes in these studies were presented at the median dominance duration so that no genuinely
late probes were measured. Recently, a new approach solved these problems (Alais et al. 2010a).
First, the probe was a contrast increment of the suppressed stimulus itself, meaning it directly
probed contrast sensitivity of the neurons encoding the suppressed stimulus. Second, in a new
‘reverse correlation’ approach, hundreds of probes were presented at random times and their
timing relative to suppression onset was later mapped onto observers’ rivalry alternation data. In
this design, probes could fall early or late in a rivalry phase with equal probability. Plotting probe
sensitivity within rivalry phases showed a striking reciprocity: dominance performance was ini-
tially stable but declined late in the period, and suppression performance was initially stable but
improved in a complementary fashion late in the period (Figure 38.5). The complementarity

[Figure 38.5 plot: per cent correct probes (left axis, 50–100) and tally per bin (right axis, 0–2500)
as a function of normalized duration (0–1).]
Fig. 38.5  The classical model of rivalry is based on reciprocal inhibition between competing
neural representations of images viewed by the left and right eyes. This model explains how a
monocular image becomes suppressed, and the ongoing alternation dynamics are attributed to
adaptation occurring within the currently dominant neurons, thus shifting the balance of inhibition.
A key prediction of this model is that suppression strength should weaken during a rivalry phase as
adaptation increases. This was confirmed in a recent study that had observers detect randomly timed
probe stimuli at random contrasts over many hundreds of trials to build up a picture of contrast sensitivity
over a rivalry phase (Alais et al. 2010a). Data from this method are illustrated here and show contrast
sensitivity declining over a period of dominance, with a corresponding reciprocal rise in sensitivity during
suppression. The two sensitivity curves converge just prior to a change in perceptual dominance.

of these curves confirms the reciprocity of the model, and their convergence late in the period
confirms the role of adaptation in rivalry dynamics.
A study by van Ee (2009) explored the role of noise in rivalry dynamics using a computational
model. A comparison was made between adding noise to the adapting representation of the
dominant stimulus or to the cross-inhibited neural activity. The intention was to clarify whether
the mutual inhibition process adapts, as has been suggested (Klink et al. 2010), or whether it is
the response to the dominant stimulus. Results showed that adding noise to the cross-inhibition
process did not produce typical rivalry dynamics, but adding noise to the dominant response
did. Van Ee suggested this reflects differing time scales. Cross-inhibition is a fast process (millisecond
scale) and no amount of noise perturbation produces significant variations in dominance dura-
tions (typically lasting a second or so). However, noise added to the adaptation of the dominant
stimulus does produce typical rivalry dynamics, showing that noisy adaptation within a recipro-
cal inhibition framework can account for stochastic rivalry dynamics. This and related work by
others has seen noise and adaptation become key, interacting features in recent rivalry models
(Brascamp et al. 2006; Kim et al. 2006; Moreno-Bote et al. 2007; Kang and Blake 2011; Seely and
Chow 2011; Roumani and Moutoussis 2012).
Another key characteristic of rivalry dynamics is that phase durations are significantly affected
by stimulus contrast (Mueller and Blake 1989; Lankheet 2006). Rivalry alternation rate reliably
increases as the contrast of both stimuli increases, with each stimulus perceived for shorter peri-
ods on average. Within the reciprocal inhibition model, this is attributed to faster adaptation
arising from stronger neural responses to high-contrast stimuli. Interestingly, increasing the con-
trast of only one stimulus will also increase alternation rates but in a curious way:  increasing
one image’s contrast can slightly increase its dominance duration, but the main consequence is

to decrease the dominance duration of the other image (Levelt 1965; Mueller and Blake 1989;
Bossink et al. 1993). This counterintuitive relationship is easily explained within the framework of
reciprocal inhibition where a given stimulus generates not an isolated response but one linked to
the response generated by the other, competing stimulus.
This underscores the distinction between overall rivalry alternation rate and the relative dura-
tions of the dominance and suppression phases making up a rivalry cycle, which is referred to
as ‘predominance’. Rivalry predominance is measured by tracking rivalry alternations and then
calculating the proportion of time each image was visible. Alternation rate relates to the period of
a full rivalry cycle (i.e., dominance plus suppression duration), whereas predominance effectively
measures the duty cycle (the proportion of each phase relative to the cycle period). Both measures
are important, as a change in predominance of one stimulus over the other (e.g., from 50:50 to
70:30) could go unnoticed if only alternation rate were measured. This is an important point for
the following sections where we discuss how perceptual organization, as manifest through a vari-
ety of contextual and top-down effects, influences rivalry dynamics. By way of preview, contextual
and top-down effects in rivalry generally affect the duration that a given rival target is dominant,
but less often when it is suppressed. This implies that perceptual organization’s influence during
rivalry operates primarily on the rival pattern already selected for conscious awareness.
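Because the two measures are easy to conflate, the following sketch (with an illustrative function name and data format) computes both from the same hypothetical tracking record:

```python
def rivalry_stats(phases):
    """phases: chronological list of (image, duration_in_seconds) pairs from
    a perceptual tracking record. Returns (alternation_rate, predominance):
    alternation rate in full rivalry cycles per second, and each image's
    predominance as the proportion of total time it was dominant."""
    total = sum(d for _, d in phases)
    time_per_image = {}
    for image, d in phases:
        time_per_image[image] = time_per_image.get(image, 0.0) + d
    cycles = len(phases) / 2.0      # one cycle = one phase of each image
    predominance = {img: t / total for img, t in time_per_image.items()}
    return cycles / total, predominance

# e.g., image A dominant twice as long as image B per cycle:
rate, pred = rivalry_stats([('A', 2.0), ('B', 1.0), ('A', 2.0), ('B', 1.0)])
# rate = 2 cycles / 6 s; pred = {'A': 2/3, 'B': 1/3}
```

Note that doubling every duration in the record halves the alternation rate but leaves predominance unchanged, which is exactly why a shift in predominance (e.g., from 50:50 to 70:30) can go unnoticed if only alternation rate is measured.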

Top-down and Contextual Influences on Binocular Rivalry


Attention in Binocular Rivalry
The first top-down influence on rivalry we consider is attention, a concept closely linked to rivalry
over the years because both can be thought of as acts of selection. Attention involves select-
ing among competing objects and rivalry could be interpreted as perceptual selection between
competing images. The role of attention in binocular rivalry has been debated since the begin-
nings of experimental psychology. Von Helmholtz thought attention played a key role and that
rivalry alternations were under volitional control and easily manipulated by will. Hering adopted
a contrary position and considered rivalry to be driven by physiological processes related to the
stimuli. More than a century later, both positions have support. There is ample evidence support-
ing Hering’s contention that basic stimulus properties such as contrast and spatial frequency are
important determinants of rivalry. In support of von Helmholtz, it is also clear that attention can
modulate aspects of rivalry such as alternation dynamics, dominance durations, and selection of
initial perceptual dominance. The key point, however, is that no act of attention or will-power can
arrest the alternations of rivalry so that a single image remains dominant, undermining the notion
that rivalry is completely synonymous with attentional selection.
In more recent times, Lack was the first to systematically examine the role of attention in
binocular rivalry (Lack 1978). Lack found that attentional control over rivalry was generally
limited, although with training observers were better able to select and hold one stimulus. This
led to extended dominance durations (by about 20 per cent) relative to a baseline condition,
showing a degree of endogenous or volitional control over rivalry (although much less than von
Helmholtz had suggested). In other experiments, Lack used spatial cueing to draw attention to
the dominant image, which extended its dominance duration, or to cue the suppressed stimu-
lus, which increased the likelihood of it becoming dominant. This established that exogenous
attention could also influence binocular rivalry. Other papers have confirmed that voluntary
and involuntary attention affect binocular rivalry. Ooi and He (1999) presented four targets
to the dominant eye and asked observers to attend to one. A transient signal in the suppressed

eye, which would normally trigger a dominance switch, was less likely to cause a switch when
it occurred at the attended location, compared to the three unattended locations. Voluntary
attention can therefore help maintain the ‘selected’ image despite transient exogenous stimuli.
These authors also used a monocular pop-out cue flanking a suppressed image to show that
involuntary attention directed to a suppressed stimulus could cause it to become dominant. In
related work, Paffen and Van der Stigchel (2010) presented rivalry at two locations and added
an exogenous cue around one of them, finding that alternations occurred earlier and more
frequently at the cued location, linking rivalry dynamics to the spatio-temporal properties of
visual attention. In other words, drawing attention to a spatial location increases the rate of
perceptual alternation at that location.
Object-based attention can also bias which image dominates in binocular rivalry. In one study
(Mitchell et al. 2004), observers were first presented with two objects superimposed in transpar-
ency that were binocularly viewed for a brief period before shutter glasses activated and streamed
them separately to the two eyes to trigger rivalry. Just before the rivalry stage, one object was
exogenously cued with a transient movement. This caused the cued object to achieve perceptual
dominance at rivalry onset and showed that an object selection made during normal binocular
viewing is maintained despite a change to rivalrous dichoptic viewing. A subsequent study using
different techniques drew the same conclusions (Chong and Blake 2006). Endogenous cuing, too,
has been shown to produce a similar effect (Chong et al. 2005; Klink et al. 2008), although in
both cases the cue’s influence in determining image dominance is restricted to the early phase of
rivalry, after which normal alternation dynamics are observed. Studies with other kinds of per-
ceptually bistable stimuli show similar modulatory effects of attention (Struber and Stadler 1999;
van Ee 2005) in that attention can bias which percept tends to dominate, although several studies
have found that attentional control over rivalry is generally weaker than control over other forms
of bistability (Meng and Tong 2004; van Ee et al. 2005).
These studies manipulated attention by selecting one of the perceptual alternatives, either
endogenously or exogenously. An alternative approach involves directing attention away from
the rival stimuli towards a peripheral secondary task. Paffen et al. used this method to show that
removing attention from the stimuli causes rival alternations to slow. The slowing effect was
graded, being stronger for a more difficult secondary task (Paffen et  al. 2006), with some evi-
dence that alternations cease altogether when attention is completely removed from rival stimuli
(Brascamp and Blake 2012). A similar paradigm was used to show that perceptual alternations
in bistable motion perception are also slowed by a difficult attentional distractor (Pastukhov and
Braun 2007). In a neuroimaging study examining the withdrawal of attention, Lee et al. (2007)
investigated rivalry between large images designed to produce a travelling wave of dominance fol-
lowing a path of ‘good continuation’ along locally similar orientations. With attention directed to
the rival images, the travelling waves of perceptual dominance produced corresponding waves of
activity sweeping across retinotopic areas V1, V2, and V3. However, when attention was diverted
to a letter monitoring task at the center of the display, activity in V2 and V3 no longer indicated a
travelling wave and rivalry-related activity was restricted to V1.

Interpretation and Affect Influence Rivalry Dynamics


As noted already, there is abundant evidence that low-level visual attributes impact on binocular
rivalry dynamics. Indeed, most reciprocal inhibition models described earlier assume that rivalry
transpires early in visual processing where inhibitory competition occurs between local features
signaled by monocular neurons. Several lines of evidence, however, have emerged to show that
788 Alais and Blake

seemingly ‘high-level’ influences can govern the occurrence and dynamics of rivalry, as can feed-
back from mid-level vision (Alais and Blake 1998; Watson et al. 2004; Pearson and Clifford 2005;
van Boxtel et al. 2008). Top-down approaches to rivalry, in focusing on interpretation of ambigu-
ous retinal input, broaden the scope of potential influences on rivalry. We will focus here on
results implicating high-level influences operating during rivalry, for those results bear on the
role of perceptual organization in governing rivalry dynamics. We start by summarizing findings
from a growing list of studies showing that the meaning or emotional content of rivalry stimuli
can influence rivalry dynamics.
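The reciprocal inhibition architecture mentioned above can be made concrete with a minimal two-population rate model, in which mutual inhibition enforces exclusive dominance and slow self-adaptation destabilizes whichever population is currently dominant, producing spontaneous alternations. The sketch below is generic and illustrative; the equations and parameter values are textbook-style choices, not those of any specific published model.

```python
import numpy as np

def simulate_rivalry(T=60.0, dt=0.001, I=1.0, beta=2.0, g=2.0,
                     tau=0.02, tau_a=0.9):
    """Minimal reciprocal-inhibition model of rivalry alternations.

    Two populations with rates r[0], r[1] inhibit each other (strength beta)
    and carry a slow adaptation variable a (strength g, time constant tau_a).
    The gain function is threshold-linear. Adaptation gradually weakens the
    dominant population until the suppressed one escapes, so dominance
    alternates without any external switch signal.
    """
    n_steps = int(T / dt)
    r = np.array([0.6, 0.4])          # slightly asymmetric start
    a = np.zeros(2)                   # adaptation states
    dominant = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        drive = I - beta * r[::-1] - g * a     # r[::-1]: mutual inhibition
        r = r + dt * (-r + np.maximum(drive, 0.0)) / tau
        a = a + dt * (-a + r) / tau_a
        dominant[t] = int(r[1] > r[0])
    # each change in the identity of the dominant population is one switch
    switches = int(np.count_nonzero(np.diff(dominant)))
    return switches
```

With these illustrative parameters the model alternates spontaneously every second or so, so a 60-second run yields tens of switches; strengthening adaptation (larger g, smaller tau_a) shortens dominance durations, echoing the adaptation–inhibition trade-off in formal models.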
The question of cognitive and motivational influences on rivalry goes back to the middle of
the previous century (reviewed by Walker 1978). In early studies, rival stimuli with conflicting
emotional or symbolic content were presented to different groups and predominance was measured. When Jewish and Catholic observers viewed the Star of David versus a Christian cross,
Jewish observers tended to see the star more than the cross, and vice versa for Catholic observers
(Losciuto and Hartley 1963). In a similar vein, figures a person had seen before tended to pre-
dominate in rivalry over figures never seen before (Goryo 1969). These results were interpreted
to mean that non-visual factors such as affective content and familiarity influence the resolution
of stimulus conflict during binocular rivalry (Walker 1978). Recently, interest in this question has
returned with several new papers addressing this topic (reviewed by Blake 2013). For example, studies report that emotionally arousing pictures—whether positive or negative—produce
longer dominance durations than non-arousing pictures, even when both images have compara-
ble low-level image properties (Sheth and Pham 2008). Dominance durations are also longer for
emotional faces rivalling against neutral faces. An emotional face is also more likely to dominate
first at rivalry onset (Alpers and Pauli 2006). More remarkably, neutral-looking faces dominate
significantly longer if they have previously been associated with negative social behaviors through
conditioning (‘threw a chair at a classmate’), relative to faces associated with positive or neutral
behaviors (Anderson et al. 2011). Even the simple act of imagining a given stimulus can subsequently boost its dominance in rivalry, implying a boost in stimulus strength from the act of
imagining (Pearson et al. 2008).
Top-down influences such as these are not too surprising given our knowledge that attention
can modulate rivalry durations (Lack 1978; Paffen et al. 2006): familiar, imagined, or emotion-
ally charged stimuli may command greater attention and, hence, receive a boost in rivalry.
Accordingly, enhanced rivalry predominance could arise from lengthened dominance dura-
tions, for it is presumably the dominant stimulus that receives attention during rivalry. Is that
the sole basis of context’s modulation of rivalry? To answer this, we turn to recent work using a
new procedure that isolates context’s influence on suppression durations. These new studies all
employ continuous flash suppression (CFS: Figure 38.6), a robust form of binocular rivalry pro-
duced when one eye views a rapidly changing array of densely overlaid, high-contrast shapes
(the CFS inducer) and the other eye views a more conventional, static rival image (Tsuchiya and
Koch 2005). Because of the broadband spatio-temporal energy spectrum of the CFS inducer
(Yang and Blake 2012), it is always the initially dominant stimulus at rivalry onset, and it
remains dominant for an unusually long duration compared to rivalry produced by conven-
tional rival stimuli.
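As a concrete illustration, an inducer stream of this kind might be generated as follows; the frame rate, rectangle density, and rectangle size range are illustrative choices only, not parameters taken from any particular study.

```python
import numpy as np

def mondrian_frame(size=256, n_rects=300, rng=None):
    """One CFS frame: a Mondrian-like array of densely overlaid rectangles.

    Rectangle sizes span a wide range so each frame has broad spatial-
    frequency content; luminances are drawn uniformly for high contrast.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.full((size, size), 0.5)                 # mid-grey background
    for _ in range(n_rects):
        h, w = rng.integers(size // 16, size // 3, size=2)
        y = rng.integers(0, size - h)
        x = rng.integers(0, size - w)
        img[y:y + h, x:x + w] = rng.random()         # random luminance in [0, 1)
    return img

def cfs_inducer(duration=2.0, rate=10.0, size=256, seed=0):
    """A stream of independent frames refreshed at ~10 Hz, as in a CFS inducer."""
    rng = np.random.default_rng(seed)
    n_frames = int(duration * rate)
    return np.stack([mondrian_frame(size=size, rng=rng) for _ in range(n_frames)])
```

Because every frame is drawn independently, the stream carries energy across a broad range of spatial and temporal frequencies, the property credited with CFS's unusually deep and durable suppression.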
Exploiting the robustness of CFS, recent studies have used a variant whereby the CFS inducer is
initially presented to one eye and a probe stimulus is presented to the other eye shortly after. The
predominance of CFS at onset prevents observers from seeing the probe at first, but probe con-
trast is steadily increased until eventually the observer can indicate in which of four display quad-
rants the probe appeared. In some cases, contrast of the CFS inducer is also gradually decreased,
Binocular Rivalry and Perceptual Ambiguity 789


Fig. 38.6  An illustration of the continuous flash suppression (CFS) paradigm, a method of producing deep interocular suppression. A sequence of independent Mondrian-like arrays is presented at a rate of ~10 Hz to one eye and causes the image in the other eye to be deeply suppressed for far longer periods (several tens of seconds) than is typical of binocular rivalry. Because the dynamic inducing pattern has a broad and dense spatio-temporal energy spectrum, it will always be dominant over the static image at onset.

to ensure the probe will eventually be perceived. The dependent measure is the duration of sup-
pression, the period from probe onset until successful reporting of the probe’s location. Using
this approach, several recent studies have asked what stimulus properties empower an initially
suppressed probe to overcome the potent suppression from the CFS inducer. Whatever those
properties turn out to be, they cannot be due to a boost from attention because the identity and
location of the suppressed probe remain unknown to the observer until it emerges from suppression. Some examples of findings from these studies are:
•  Upright faces emerge from suppression more quickly than inverted faces, as do words printed
in familiar script that can be read by an observer compared to words in unfamiliar script (Jiang
et al. 2007).
•  Angry faces escape suppression faster than neutral or happy faces (Yang et al. 2007; Tsuchiya
et al. 2009).
•  Faces implying direct eye contact break suppression faster than the same faces with gaze slightly
diverted (Stein et al. 2011a).
•  Scenes containing an object (e.g., a watermelon) in a bizarre context (a basketball game) are
freed from suppression faster than the same scenes with a contextually appropriate object (e.g.,
a basketball) (Mudrik et al. 2011).
Based on this kind of speeded emergence from suppression, most (but not all) of these stud-
ies conclude that meaning, affective connotation and contextual relevance of suppressed stimuli
are still registered, despite being completely absent from visual awareness. At first glance, these
kinds of findings seem to rule out attention as the modulating factor in enhanced predominance
of certain stimuli engaged in rivalry. However, there are some reasons to take that conclusion
with a grain of salt. Two papers that used CFS together with emotional faces adopted a more
cautious tone by pointing to actual feature differences between faces that break suppression early
and those that do not (Yang et al. 2007; Gray et al. 2013). Also, the investigators who documented gaze direction's effect on dominance (Stein et al. 2011a) expressed doubt in another paper about the adequacy of the control measures typically employed to rule out alternative explanations (Stein et al. 2011b).

Rivalry in a Multisensory Context


Next we turn to studies that have asked whether sensory inputs from modalities other than vision
can influence binocular rivalry dynamics. As we live in a multisensory world, there are many
occasions when visual signals from the external environment are accompanied by auditory or tac-
tile signals (see chapter by Spence, this volume, for multisensory processing, including a section
on multisensory bistability). Psychophysical and neurophysiological evidence shows the brain
combines information across senses if it is likely to refer to the same stimulus event (see recent
reviews: Alais et al. 2010b; Spence 2011). This helps achieve a more veridical and less ambiguous
percept, one of the main functions of cross-modal interactions (Ernst and Bulthoff 2004). Recent
results suggest multisensory signal combination can significantly modulate rivalry dynamics.
Specifically, a sound congruent with one of the rival stimuli biases perceptual dominance towards
that stimulus (Kang and Blake 2005; van Ee et al. 2009; Conrad et al. 2010; Chen et al. 2011;
Lunghi et al. 2014), and rubbing a finger back and forth over a tactile grating promotes domi-
nance of a visual grating of matched orientation (Lunghi et al. 2010; Lunghi and Alais 2013). Even
smelling a distinctive odor while experiencing binocular rivalry can bias dominance in favor of
a congruent visual rival target (Zhou et al. 2010). The motor system, too, can influence binocular
rivalry dynamics, as evidenced by increased predominance when the motion of a rival stimulus is
controlled by the observer’s self-generated actions (Maruya et al. 2007). More broadly, motor and
non-visual sensory signals can bias other forms of visual bistability, including ambiguous motion
(Sekuler et al. 1997) and ambiguous depth perspective (Blake et al. 2004).
One way that multisensory interactions can influence binocular rivalry is by boosting the degree
of attentional control over perceptual alternations. A recent multisensory study added two differ-
ent auditory signals to the rivalry stimulus, with one signal being congruent with one of the visual
stimuli (van Ee et al. 2009). It was found that attentional control over rivalry was augmented by a
congruent auditory signal, relative to the non-congruent signal. The boost to attentional control
over rivalry was also shown with a congruent tactile signal. In a trimodal experiment, a combina-
tion of both auditory and tactile congruency afforded even more attentional control over binocu-
lar rivalry than either modality alone. This study shows that the attentional resources involved in
exerting voluntary control over binocular rivalry are central or ‘supramodal’, a finding that squares with
another study showing that attending to an auditory distractor task slows binocular rivalry (Alais
et al. 2010c), in the same way that attending to a visual distractor slows rivalry (Paffen et al. 2006).
These multisensory influences in binocular rivalry demonstrate perceptual organization in its
full breadth, as information from all available sensory modalities is used in pursuit of a coherent,
disambiguated interpretation of the external world.

Cortical Networks Underlying Rivalry Alternations


Consistent with the top-down influences on rivalry reviewed in the preceding sections, recent
brain imaging work has implicated fronto-parietal networks in control of rivalry dynamics. The
first study suggesting such a role found transient activation in parietal and prefrontal areas during
switches in perceptual dominance, activations which were much smaller when observers viewed a
physically alternating image sequence (Lumer et al. 1998). This study highlighted that selection for
consciousness in binocular rivalry may involve networks in common with top-down attentional
control (Desimone and Duncan 1995; Kastner and Ungerleider 2000; Bisley 2011). Subsequent
studies also found evidence for a fronto-parietal network in binocular rivalry (Lumer and Rees
1999; Miller et al. 2000; Cosmelli et al. 2004; Sterzer and Rees 2008). According to a top-down
view of rivalry, frontal and parietal regions trigger the process of perceptual selection and then

promote that selection via feedback to early visual areas (Leopold and Logothetis 1999). Further
evidence for this view comes from studies showing frontal (Sterzer and Kleinschmidt 2007) and
parietal (Britz et al. 2011) activity preceding occipital activity associated with perceptual alterna-
tions, although these studies used ambiguous motion and Necker cubes—stimuli that are clearly
bistable but lack the interocular conflict that triggers rivalry. One study that did use binocular
rivalry confirmed fronto-parietal activation associated with perceptual alternations but a phase
analysis indicated the activity resulted from occipital sources (Kamphuisen et al. 2008). This study,
together with a subsequent one (Knapen et al. 2011), implies that fronto-parietal activations may
be a result of experiencing rivalry alternations rather than a cause of those alternations.
A recent TMS study implicated parietal cortex in mediating perceptual alternations (Carmel
et al. 2010), finding that TMS applied over the right superior parietal lobule (SPL) shortened rivalry
dominance durations. Later, Kanai, Carmel, Bahrami and Rees (Kanai et al. 2011) reported that
disrupting right anterior SPL shortened dominance durations, while disrupting right posterior
SPL increased dominance durations. Contrasting results, however, were found in a similar study
that used TMS over anterior SPL and reported increased rivalry durations (Zaretskaya et al. 2010).
The reason for this discrepancy is not clear, and more research will be needed to resolve it, but the combined evidence suffices to implicate parietal cortex in binocular rivalry dynamics.

A Bayesian View
As evidence has emerged for top-down and contextual processing in binocular rivalry, so have
new theoretical models of rivalry that formalize the interpretative aspect of perception and its
response to ambiguous input (e.g., Sterzer et al. 2009), including models based on a Bayesian
probabilistic framework (Dayan 1998; Hohwy et al. 2008; Sundareswara and Schrater 2008). On
the Bayesian view (see Feldman, this volume, for a full analysis of Bayesian models of perceptual
organization), the existence of incompatible monocular images precludes a single interpretation
of the visual environment. That is, there is a very low prior probability that both images could be
true simultaneously (two different objects occupying the same visual location is logically impossible). If
the likelihoods of each image being true are roughly equal, the model is faced with two equivalent
solutions and perception alternates between the two competing percepts. On this view, binocular
rivalry is a consequence of the conflicting interpretations of the left- and right-eye images, rather
than of inhibitory connections between early feature-tuned neurons (Dayan 1998). This kind of
model can accommodate a good deal of the traditional low-level psychophysical data about bin-
ocular rivalry (reviewed in Hohwy et al. 2008). It is also well suited to describing how multisen-
sory interactions help resolve visual ambiguity. Where one visual image is correlated with signals
in another modality, that visual image will have a higher likelihood than the other and will receive
a higher weighting in alternation dynamics and therefore tend to dominate rivalry perception.
Through learning and experience, too, certain auditory, visual and tactile combinations will have
high prior probabilities and be favored when the visual stimuli alone may be ambiguous.
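The argument can be illustrated with a toy Bayesian computation (all probability values are illustrative): with equal priors and matched likelihoods the two interpretations tie, whereas a sound that is better predicted by one interpretation raises that interpretation's posterior.

```python
def posterior(prior_a, lik_a, lik_b):
    """Posterior probability that interpretation A is the true scene."""
    pa = prior_a * lik_a
    pb = (1.0 - prior_a) * lik_b
    return pa / (pa + pb)

# Vision alone: the two monocular images account for the input equally well,
# so the interpretations are equiprobable and perception alternates.
p_vision = posterior(prior_a=0.5, lik_a=0.8, lik_b=0.8)            # 0.5

# Add a sound congruent with image A. Treated as conditionally independent
# evidence, its likelihood multiplies the visual likelihood for each hypothesis.
p_sound_a, p_sound_b = 0.9, 0.3     # illustrative likelihoods of the sound
p_multisensory = posterior(0.5, 0.8 * p_sound_a, 0.8 * p_sound_b)  # 0.75
```

Interpretation A now carries the greater posterior weight, which on this view predicts its increased predominance in rivalry; a learned prior favoring a familiar audiovisual pairing would shift prior_a above 0.5 with the same effect.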

Conclusion
We began the chapter by mentioning a school of thought that sees perception as a process of infer-
ence and interpretation, a tradition that stretches back to Helmholtz in the late nineteenth century.
Although binocular rivalry has been an active field since those times, most rivalry research con-
ducted since Levelt’s seminal work in the 1960s has focused on basic stimulus features and early
cortical processing. Although low-level factors are undoubtedly important in binocular rivalry,
the chapter’s second half focused on more recent work showing the significance of top-down

processing and perceptual organization. We reviewed the importance of top-down influences
such as attention and context in controlling rivalry dynamics. These top-down influences are
broad, including the familiarity or affective content of rivaling stimuli, object-based properties
of those stimuli, and of course a pervasive role for attention. These factors can be thought of col-
lectively as perceptual organization in binocular rivalry, a top-down influence helping resolve a
very low-level visual ambiguity. These top-down, contextual effects influence rivalry dynamics by
modulating the duration of the dominant image rather than the unseen, suppressed image, show-
ing that perceptual organization operates on the consciously perceived pattern.
We also reviewed very recent evidence showing that information from non-visual senses can
influence visual alternations in binocular rivalry. As we live in a multisensory environment,
perceiving the external world is a multisensory problem and perceptual organization should
occur in a multisensory context. The ability of touch and sound (and even olfactory stimuli)
to alter rivalry dynamics and help resolve ambiguity is evidence of this. These effects gener-
ally require the non-visual stimuli to be congruent with the visual stimuli, either in terms of
low-level stimulus attributes (e.g., spatially or temporally matched) or to be semantically congru-
ent. This inter-sensory influence on binocular rivalry is an example of perceptual organization
in its broadest extent. Recent Bayesian models of rivalry, being fundamentally inferential, sit
well within a high-level, interpretive view of rivalry. One advantage of Bayesian models and a
top-down ‘perceptual organization’ approach is that they are not tied to the inhibitory interocular interactions that characterize most earlier models of rivalry. They can therefore be applied easily to bistable stimuli in general, since all forms of perceptual rivalry share a similar phenomenology of irregular perceptual alternations and common temporal dynamics.
Overall, the last decade or so of binocular rivalry research has seen a steady stream of contex-
tual and top-down findings that can be interpreted within the framework of perceptual organiza-
tion. These studies have not replaced the important low-level work that dominated recent decades
of rivalry research but they do provide important balance. They show the value of a top-down
view in complementing the recently prevalent low-level focus, and importantly the top-down
view provides scope for a more complete account of binocular rivalry and perceptual ambiguity.

References
Alais, D. and Blake, R. (1998). ‘Interactions between global motion and local binocular rivalry’. Vision Res
38(5): 637–44.
Alais, D. and Blake, R. (1999). ‘Grouping visual features during binocular rivalry’. Vision Res
39(26): 4341–53.
Alais, D. and Blake, R. (2005). Binocular rivalry. (Cambridge: MIT Press).
Alais, D. and Melcher, D. (2007). ‘Strength and coherence of binocular rivalry depends on shared stimulus
complexity’. Vision Res 47(2): 269–79.
Alais, D., O’Shea, R. P. et al. (2000). ‘On binocular alternation’. Perception 29(12): 1437–45.
Alais, D., Lorenceau, J. et al. (2006). ‘Contour interactions between pairs of Gabors engaged in binocular
rivalry reveal a map of the association field’. Vision Res 46(8–9): 1473–87.
Alais, D., Cass, J. et al. (2010a). ‘Visual sensitivity underlying changes in visual consciousness’. Current
Biology 20: 1362–7.
Alais, D., Newell, F. N. et al. (2010b). ‘Multisensory processing in review: from physiology to behaviour’.
Seeing Perceiving 23(1): 3–38.
Alais, D., van Boxtel, J. J. et al. (2010c). ‘Attending to auditory signals slows visual alternations in binocular
rivalry’. Vision Res 50(10): 929–35.

Alexander, L. T. (1951). ‘The influence of figure-ground relationships in binocular rivalry’. J Exp Psychol
41(5): 376–81.
Alpers, G. W. and Pauli, P. (2006). ‘Emotional pictures predominate in binocular rivalry’. Cognition and
emotion 20: 596–607.
Anderson, E., Siegel, E. H. et al. (2011). ‘The visual impact of gossip’. Science 332(6036): 1446–8.
Angelucci, A., Levitt, J. B. et al. (2002). ‘Circuits for local and global signal integration in primary visual
cortex’. J Neurosci 22(19): 8633–46.
Baker, D. H. and Graf, E. W. (2009). ‘Natural images dominate in binocular rivalry’. Proc Natl Acad Sci USA
106(13): 5436–41.
Bisley, J. W. (2011). ‘The neural basis of visual attention’. J Physiol 589(Pt 1): 49–57.
Blake, R. (1989). ‘A neural theory of binocular rivalry’. Psychol Rev 96(1): 145–67.
Blake, R. (2001). ‘A Primer on Binocular Rivalry, Including Current Controversies’. Brain and Mind 2: 5–38.
Blake, R. (2013). Binocular rivalry updated. In The New Visual Neurosciences, edited by J. S. Werner and L.
M. Chalupa. (Cambridge, MA: MIT Press).
Blake, R. and Logothetis, N. K. (2002). ‘Visual competition’. Nat Rev Neurosci 3(1): 13–21.
Blake, R., Sobel, K. V. et al. (2004). ‘Neural synergy between kinetic vision and touch.’ Psychol Sci
15(6): 397–402.
Blakemore, C. and Tobin, E. A. (1972). ‘Lateral inhibition between orientation detectors in the cat’s
visual cortex’. Exp Brain Res 15(4): 439–40.
Bonneh, Y. and Sagi, D. (1999). ‘Configuration saliency revealed in short duration binocular rivalry’. Vision
Res 39(2): 271–81.
Bossink, C. J., Stalmeier, P. F. et al. (1993). ‘A test of Levelt’s second proposition for binocular rivalry’. Vision
Res 33(10): 1413–19.
Brancucci, A. and Tommasi, L. (2011). ‘ “Binaural rivalry”: Dichotic listening as a tool for the investigation
of the neural correlate of consciousness’. Brain Cogn 76(2): 7.
Brascamp, J. W., van Ee, R. et al. (2005). ‘Distributions of alternation rates in various forms of bistable
perception’. Journal of Vision 5(4): 287–98.
Brascamp, J. W., van Ee, R. et al. (2006). ‘The time course of binocular rivalry reveals a fundamental role of
noise’. Journal of Vision 6(11): 1244–56.
Brascamp, J. W., and Blake, R. (2012) ‘Inattention abolishes binocular rivalry: perceptual evidence’.
Psychological Science 23: 1159–67.
Britz, J., Pitts, M. A. et al. (2011). ‘Right parietal brain activity precedes perceptual alternation during
binocular rivalry’. Hum Brain Mapp 32(9): 1432–42.
Carmel, D., Walsh, V. et al. (2010). ‘Right parietal TMS shortens dominance durations in binocular rivalry’.
Curr Biol 20(18): R799–800.
Carter, O. L., Konkle, T. et al. (2008). ‘Tactile rivalry demonstrated with an ambiguous apparent-motion
quartet’. Curr Biol 18(14): 1050–4.
Carter, O. L. and Pettigrew, J. D. (2003). ‘A common oscillator for perceptual rivalries?’ Perception
32(3): 295–305.
Chen, Y. C., Yeh, S. L. et al. (2011). ‘Crossmodal constraints on human perceptual awareness: auditory
semantic modulation of binocular rivalry’. Front Psychol 2: 212.
Chong, S. C. and Blake, R. (2006). ‘Exogenous attention and endogenous attention influence initial
dominance in binocular rivalry’. Vision Res 46(11): 1794–803.
Chong, S. C., Tadin, D. et al. (2005). ‘Endogenous attention prolongs dominance durations in binocular
rivalry’. Journal of Vision 5(11): 1004–12.
Conrad, V., Bartels, A. et al. (2010). ‘Audiovisual interactions in binocular rivalry’. Journal of Vision
10(10): 27.

Cosmelli, D., David, O. et al. (2004). ‘Waves of consciousness: ongoing cortical patterns during binocular
rivalry’. Neuroimage 23(1): 128–40.
Dayan, P. (1998). ‘A hierarchical model of binocular rivalry’. Neural Comput 10(5): 1119–35.
Denham, S. L., & Winkler, I. (2014). ‘Auditory perceptual organization’. In J. Wagemans (Ed.), Oxford
Handbook of Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Desimone, R. and Duncan, J. (1995). ‘Neural mechanisms of selective visual attention’. Annu Rev Neurosci
18: 193–222.
Dorrenhaus, W. (1975). ‘Pattern specific visual competition’. Naturwissenschaften 62(12): 578–9.
Ernst, M. O. and Bulthoff, H. H. (2004). ‘Merging the senses into a robust percept’. Trends Cogn Sci
8(4): 162–9.
Feldman, J. (2014). ‘Bayesian models of perceptual organization’. In J. Wagemans (Ed.), Oxford Handbook of
Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Field, D. J., Hayes, A. et al. (1993). ‘Contour integration by the human visual system: evidence for a local
“association field”’. Vision Res 33(2): 173–93.
Fitzpatrick, D. (2000). ‘Seeing beyond the receptive field in primary visual cortex’. Curr Opin Neurobiol
10(4): 438–43.
Fox, R. and Check, R. (1968). ‘Detection of motion during binocular rivalry suppression’. J Exp Psychol
78(3): 388–95.
Fox, R. and Herrmann, J. (1967). ‘Stochastic properties of binocular rivalry alternations’. Perception &
Psychophysics 2: 432–6.
Freeman, A. W. (2005). ‘Multistage model for binocular rivalry’. J Neurophysiol 94(6): 4412–20.
Goryo, K. (1969). ‘The effect of past experience on binocular rivalry’. Japanese Psychological Research 11: 46–53.
Gray, K. L., Adams, W. J. et al. (2013). ‘Faces and awareness: Low-level, not emotional factors, determine
perceptual dominance’. Emotion, 13(3): 537–44, doi: 10.1037/a0031403.
Hess, R. F., May, K. A., & Dumoulin, S. O. (2014). ‘Contour integration: Psychophysical,
neurophysiological and computational perspectives’. In J. Wagemans (Ed.), Oxford Handbook of
Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Hohwy, J., Roepstorff, A, et al. (2008). ‘Predictive coding explains binocular rivalry: an epistemological
review’. Cognition 108(3): 687–701.
Hupe, J. M., Rubin, N. et al. (2003). ‘The dynamics of bi-stable alternation in ambiguous motion
displays: a fresh look at plaids’. Vision Res 43(5): 531–48.
Jiang, Y., Costello, P. et al. (2007). ‘Processing of invisible stimuli: advantage of upright faces and
recognizable words in overcoming interocular suppression’. Psychol Sci 18(4): 349–55.
Kamphuisen, A., Bauer, M. et al. (2008). ‘No evidence for widespread synchronized networks in binocular
rivalry: MEG frequency tagging entrains primarily early visual cortex’. Journal of Vision 8(5): 4, 1–8.
Kanai, R., Carmel, D. et al. (2011). ‘Structural and functional fractionation of right superior parietal cortex
in bistable perception’. Curr Biol 21(3): R106–7.
Kang, M. and Blake, R. (2005). ‘Perceptual synergy between seeing and hearing revealed during binocular
rivalry’. Psichologija 32: 7–15.
Kang, M. S. and Blake, R. (2011). ‘An integrated framework of spatiotemporal dynamics of binocular
rivalry’. Front Hum Neurosci 5: 88.
Kang, M.-S., Lee, S.-H. et al. (2010). ‘Modulation of spatiotemporal dynamics of binocular rivalry by
collinear facilitation and pattern-dependent adaptation’. Journal of Vision 10(11): 3.
Kapadia, M. K., Ito, M. et al. (1995). ‘Improvement in visual sensitivity by changes in local context: parallel
studies in human observers and in V1 of alert monkeys’. Neuron 15(4): 843–56.

Kappers, A. M. L., & Bergmann Tiest, W. M. (2014). ‘Tactile and haptic perceptual organization’. In
J. Wagemans (Ed.), Oxford Handbook of Perceptual Organization (in press). Oxford, U.K.: Oxford
University Press.
Kastner, S. and Ungerleider, L. G. (2000). ‘Mechanisms of visual attention in the human cortex’. Annu Rev
Neurosci 23: 315–41.
Kaufman, L. (1963). ‘On the Spread of Suppression and Binocular Rivalry’. Vision Res 61: 401–15.
Kim, C. Y. and Blake, R. (2007). ‘Illusory colors promote interocular grouping during binocular rivalry’.
Psychon Bull Rev 14(2): 356–62.
Kim, Y. J., Grabowecky, M. et al. (2006). ‘Stochastic resonance in binocular rivalry’. Vision Res 46(3): 392–406.
Klink, P. C., van Ee, R. et al. (2008). ‘Early interactions between neuronal adaptation and voluntary control
determine perceptual choices in bistable vision’. Journal of Vision 8(5): 16, 1–18.
Klink, P. C., Brascamp, J. W. et al. (2010). ‘Experience-driven plasticity in binocular vision’. Current Biology
20(16): 1464–9.
Knapen, T., Brascamp, J. et al. (2011). The role of frontal and parietal brain areas in bistable perception.
J Neurosci 31: 10293–301.
Koffka, K. (1935). Principles of Gestalt Psychology. (New York: Harcourt Brace).
Kogo, N., & van Ee, R. (2014). ‘Neural mechanisms of figure-ground organization: Border-ownership,
competition and perceptual switching’. In J. Wagemans (Ed.), Oxford Handbook of Perceptual
Organization (in press). Oxford, U.K.: Oxford University Press.
Kovacs, I., Papathomas, T. V. et al. (1996). ‘When the brain changes its mind: interocular grouping during
binocular rivalry’. Proc Natl Acad Sci USA 93(26): 15508–11.
Lack, L. C. (1978). Selective attention and the control of binocular rivalry. (The Hague: The Netherlands,
Mouton).
Laing, C. R. and Chow, C. C. (2002). ‘A spiking neuron model for binocular rivalry’. J Comput Neurosci
12(1): 39–53.
Lankheet, M. J. (2006). ‘Unraveling adaptation and mutual inhibition in perceptual rivalry’. Journal of
Vision 6(4): 304–10.
Lee, S. H. and Blake, R. (2004). ‘A fresh look at interocular grouping during binocular rivalry’. Vision Res
44(10): 983–91.
Lee, S.-H., Blake, R. et al. (2005). ‘Traveling waves of activity in primary visual cortex during binocular
rivalry’. Nat Neurosci 8(1): 22–3.
Lee, S. H., Blake, R. et al. (2007). ‘Hierarchy of cortical responses underlying binocular rivalry’. Nat
Neurosci 10(8): 1048–54.
Lehky, S. R. (1988). ‘An astable multivibrator model of binocular rivalry’. Perception 17(2): 215–28.
Leopold, D. A. and Logothetis, N. K. (1999). ‘Multistable phenomena: changing views in perception’.
Trends in Cognitive Sciences 3(7): 254–64.
Levelt, W. (1965). On Binocular Rivalry. (Soesterberg, The Netherlands: Institute for Perception).
Long, G. M. and Toppino, T. C. (2004). ‘Enduring interest in perceptual ambiguity: alternating views of
reversible figures’. Psychol Bull 130(5): 748–68.
Losciuto, L. A. and Hartley, E. L. (1963). ‘Religious Affiliation and Open-Mindedness in Binocular
Resolution’. Percept Mot Skills 17: 427–30.
Lumer, E. D., Friston, K. J. et al. (1998). ‘Neural correlates of perceptual rivalry in the human brain’. Science
280(5371): 1930–4.
Lumer, E. D. and Rees, G. (1999). ‘Covariation of activity in visual and prefrontal cortex associated with
subjective visual perception’. Proc Natl Acad Sci USA 96(4): 1669–73.
Lunghi, C. and Alais, D. (2013). ‘Touch Interacts with Vision during Binocular Rivalry with a Tight
Orientation Tuning’. PLoS ONE 8(3): e58754.

Lunghi, C., Binda, P. et al. (2010). ‘Touch disambiguates rivalrous perception at early stages of visual
analysis’. Current Biology 20(4): R143-R144.
Lunghi, C., Morrone, M. C. et al. (2014). ‘Auditory and tactile signals combine to influence vision during
binocular rivalry’. J Neurosci 34(3): 784–792.
Maruya, K., Yang, E. et al. (2007). ‘Voluntary action influences visual competition’. Psychol Sci
18(12): 1090–98.
Meng, M. and Tong, F. (2004). ‘Can attention selectively bias bistable perception? Differences between
binocular rivalry and ambiguous figures’. Journal of Vision 4(7): 539–51.
Miller, S. M., Liu, G. B. et al. (2000). ‘Interhemispheric switching mediates perceptual rivalry’. Curr Biol
10(7): 383–92.
Mitchell, J. F., Stoner, G. R. et al. (2004). ‘Object-based attention determines dominance in binocular
rivalry’. Nature 429(6990): 410–13.
Moreno-Bote, R., Rinzel, J. et al. (2007). ‘Noise-induced alternations in an attractor network model of
perceptual bistability’. J Neurophysiol 98(3): 1125–39.
Mudrik, L., Deouell, L. Y. et al. (2011). ‘Scene congruency biases Binocular Rivalry’. Conscious Cogn
20(3): 756–67.
Mueller, T. J. (1990). ‘A physiological model of binocular rivalry’. Vis Neurosci 4(1): 63–73.
Mueller, T. J. and Blake, R. (1989). ‘A fresh look at the temporal dynamics of binocular rivalry’. Biol Cybern
61(3): 223–32.
Nguyen, V. A., Freeman, A. W. et al. (2003). ‘Increasing depth of binocular rivalry suppression along two
visual pathways’. Vision Res 43(19): 2003–8.
Norman, H. F., Norman, J. F. et al. (2000). ‘The temporal course of suppression during binocular rivalry’.
Perception 29(7): 831–41.
Ooi, T. L. and He, Z. J. (1999). ‘Binocular rivalry and visual awareness: The role of attention’. Perception
28: 551–74.
Ooi, T. L. and He, Z. J. (2003). ‘A distributed intercortical processing of binocular rivalry: psychophysical
evidence’. Perception 32(2): 155–66.
Ooi, T. L. and He, Z. J. (2006). ‘Binocular rivalry and surface-boundary processing’. Perception
35(5): 581–603.
O’Shea, R. P. and Corballis, P. M. (2005). ‘Visual grouping on binocular rivalry in a split-brain observer’.
Vision Res 45(2): 247–61.
O’Shea, R. P., Sims, A. J. et al. (1997). ‘The effect of spatial frequency and field size on the spread of
exclusive visibility in binocular rivalry’. Vision Res 37(2): 175–83.
O’Shea, R. P., Parker, A. et al. (2009). ‘Monocular rivalry exhibits three hallmarks of binocular
rivalry: evidence for common processes’. Vision Res 49(7): 671–81.
Ozkan, K. and Braunstein, M. L. (2009). ‘Predominance of ground over ceiling surfaces in binocular
rivalry’. Atten Percept Psychophys 71(6): 1305–12.
Paffen, C. L. E., te Pas, S. F. et al. (2004). ‘Center-surround interactions in visual motion processing during
binocular rivalry’. Vision Research 44: 1635–9.
Paffen, C. L. E. and Van der Stigchel, S. (2010). ‘Shifting spatial attention makes you flip: Exogenous
visual attention triggers perceptual alternations during binocular rivalry’. Attention, Perception, &
Psychophysics 72(5): 1237–43.
Paffen, C. L. E., Alais, D. et al. (2005). ‘Center-surround inhibition deepens binocular rivalry suppression’.
Vision Res 45(20): 2642–9.
Paffen, C. L. E., Alais, D. et al. (2006). ‘Attention speeds binocular rivalry’. Psychological Science 17(9): 752–6.
Pastukhov, A. and Braun, J. (2007). ‘Perceptual reversals need no prompting by attention’. Journal of Vision
7(10): 5, 1–17.
Binocular Rivalry and Perceptual Ambiguity 797

Pearson, J. and Clifford, C. W. G. (2005). ‘When your brain decides what you see: grouping across
monocular, binocular, and stimulus rivalry’. Psychological science: a journal of the American Psychological
Society/APS 16(7): 516–19.
Pearson, J., Clifford, C. W. et al. (2008). ‘The functional impact of mental imagery on conscious perception’.
Curr Biol 18(13): 982–6.
Pressnitzer, D. and Hupe, J. M. (2006). ‘Temporal dynamics of auditory and visual bistability reveal
common principles of perceptual organization’. Current Biology 16(13): 1351–7.
Roumani, D. and Moutoussis, K. (2012). ‘Binocular rivalry alternations and their relation to visual
adaptation’. Front Hum Neurosci 6: 35.
Seely, J. and Chow, C. C. (2011). ‘Role of mutual inhibition in binocular rivalry’. J Neurophysiol
106(5): 2136–50.
Sekuler, R., Sekuler, A. B. et al. (1997). ‘Sound alters visual motion perception’. Nature 385(6614): 308.
Sheth, B. R. and Pham, T. (2008). ‘How emotional arousal and valence influence access to awareness’. Vision
Res 48(23–24): 2415–24.
Shpiro, A., Moreno-Bote, R. et al. (2009). ‘Balance between noise and adaptation in competition models of
perceptual bistability’. J Comput Neurosci 27(1): 37–54.
Spence, C. (2011). ‘Crossmodal correspondences: a tutorial review’. Atten Percept Psychophys 73(4): 971–95.
Spence, C. (2014). ‘Cross-modal perceptual organization’. In J. Wagemans (Ed.), Oxford Handbook of
Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Stein, T., Senju, A. et al. (2011a). ‘Eye contact facilitates awareness of faces during interocular suppression’.
Cognition 119(2): 307–11.
Stein, T., Hebart, M. N. et al. (2011b). ‘Breaking Continuous Flash Suppression: A New Measure of
Unconscious Processing during Interocular Suppression?’ Front Hum Neurosci 5: 167.
Sterzer, P. and Kleinschmidt, A. (2007). ‘A neural basis for inference in perceptual ambiguity’. Proc Natl
Acad Sci USA 104(1): 323–8.
Sterzer, P., Kleinschmidt, A. et al. (2009). ‘The neural bases of multistable perception’. Trends Cogn Sci
13(7): 310–18.
Sterzer, P. and Rees, G. (2008). ‘A neural basis for percept stabilization in binocular rivalry’. J Cogn Neurosci
20(3): 389–99.
Struber, D. and Stadler, M. (1999). ‘Differences in top-down influences on the reversal rate of different
categories of reversible figures’. Perception 28(10): 1185–96.
Sundareswara, R. and Schrater, P. R. (2008). ‘Perceptual multistability predicted by search model for
Bayesian decisions’. Journal of Vision 8(5): 12, 1–19.
Tong, F. (2001). ‘Competing Theories of Binocular Rivalry: A Possible Resolution’. Brain and Mind 2: 55–83.
Tsuchiya, N. and Koch, C. (2005). ‘Continuous flash suppression reduces negative afterimages’. Nat Neurosci
8(8): 1096–101.
Tsuchiya, N., Moradi, F. et al. (2009). ‘Intact rapid detection of fearful faces in the absence of the amygdala’.
Nat Neurosci 12(10): 1224–5.
van Boxtel, J. J. A., Alais, D. et al. (2008). ‘Retinotopic and non-retinotopic stimulus encoding in binocular
rivalry and the involvement of feedback’. Journal of Vision 8(5): 1–10.
van Ee, R. (2005). ‘Dynamics of perceptual bi-stability for stereoscopic slant rivalry and a comparison with
grating, house-face, and Necker cube rivalry’. Vision Res 45(1): 29–40.
van Ee, R. (2009). ‘Stochastic variations in sensory awareness are driven by noisy neuronal
adaptation: evidence from serial correlations in perceptual bistability’. J Opt Soc Am A Opt Image Sci Vis
26(12): 2612–22.
van Ee, R., Adams, W. J. et al. (2003). ‘Bayesian modeling of cue interaction: bistability in stereoscopic slant
perception’. J Opt Soc Am A Opt Image Sci Vis 20: 1398–406.
van Ee, R., van Dam, L. C. et al. (2005). ‘Voluntary control and the dynamics of perceptual bi-stability’.
Vision Res 45(1): 41–55.
van Ee, R., van Boxtel, J. J. et al. (2009). ‘Multisensory congruency as a mechanism for attentional control
over perceptual selection’. J Neurosci 29(37): 11641–9.
van Lier, R. and De Weert, C. M. M. (2003). ‘Intra- and interocular colour-specific activation during
dichoptic suppression’. Vision Res 43(10): 1111–6.
Vanrie, J., Dekeyser, M. et al. (2004). ‘Bistability and biasing effects in the perception of ambiguous
point-light walkers’. Perception 33: 547–60.
von Helmholtz, H. (1925). Treatise on physiological optics. (New York: Dover).
Walker, P. (1978). ‘Binocular rivalry: central or peripheral selective processes?’. Psychological Bulletin
85: 376–89.
Watson, T., Pearson, J. et al. (2004). ‘Perceptual grouping of biological motion promotes binocular rivalry’.
Current Biology 14(18): 1670–4.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt, II’. Psychologische Forschung 4: 301–50.
Wheatstone, C. (1838). ‘Contributions to the Physiology of Vision. Part the First. On some remarkable, and
hitherto unobserved, phenomena of binocular vision’. Philosophical Transactions of the Royal Society of
London 128: 371–94.
Whittle, P., Bloor, D. C. et al. (1968). ‘Some experiments on figural effects in binocular rivalry’. Perception
& Psychophysics 4: 183–8.
Wilson, H. R., Blake, R. et al. (2001). ‘Dynamics of travelling waves in visual perception’. Nature
412(6850): 907–10.
Yang, E. and Blake, R. (2012). ‘Deconstructing continuous flash suppression’. Journal of Vision 12(3): 8.
Yang, E., Zald, D. H. et al. (2007). ‘Fearful expressions gain preferential access to awareness during
continuous flash suppression’. Emotion 7(4): 882–6.
Zaretskaya, N., Thielscher, A. et al. (2010). ‘Disrupting parietal function prolongs dominance durations in
binocular rivalry’. Curr Biol 20(23): 2106–11.
Zhou, W., Jiang, Y. et al. (2010). ‘Olfaction Modulates Visual Perception in Binocular Rivalry’. Curr Biol
20: 1356–58.
Chapter 39

Perceptual organization and consciousness
D. Samuel Schwarzkopf and Geraint Rees

Introduction
All of our lives revolve around our conscious experience of the world we inhabit. In spite of that, the questions of why we have consciousness in the first place and how much it influences our perception and action remain largely unanswered. Is consciousness just an epiphenomenon, a genetic quirk that arose in the course of evolution as a consequence of other processes in the human brain, or does it serve a purpose? One function commonly attributed to consciousness is the interpretation of sensory input. For vision, this interpretation depends not only on the object or feature that is the current focus of attention, but also on the perceptual context in which it is embedded. Yet surprisingly little is currently understood about how perceptual organization affects our consciousness, whether conscious awareness of sensory stimuli is a prerequisite for interpreting them as coherent objects and scenes, or what the underlying neural processes in the human brain are.
This chapter will review the state of research on how consciousness is entwined with the per-
ceptual organization of sensory input. The first section, ‘Access to Consciousness’, describes the
categorical nature of how our conscious perception is typically viewed and how this can be used to
make inferences about the neural correlates of consciousness. The following section, ‘Unconscious
Perceptual Organization’, goes into more depth on the interaction between awareness of a stimu-
lus and the brain’s interpretation of it. This also includes a discussion of studies trying to address
the question of whether there is any information that requires conscious awareness of the stimulus to be processed. The final section, ‘Phenomenological Contents of Consciousness’, describes research that goes beyond the purely categorical aspects of our awareness and instead concentrates on the mechanisms determining a person’s percept of the environment.

Access to Consciousness
We are all familiar with the ways in which our awareness and our perception interact. At any
point in time, our sense organs are bombarded by an overwhelming amount of input; however,
we are usually not aware of this information overload. Rather, we usually feel that we are only
really conscious of a particular part or aspect of the environment. Moreover, some aspects of our
sensorium are usually or almost always outside our awareness (James 1890). For example, we are
generally unaware of our heartbeat or of the workings of our internal organs even though there
are afferent nerves continuously transmitting signals to our brain. Only when something requires
our attention, for example when we are hungry or sick, do we usually feel anything about our bod-
ies, and even then it is merely a vague feeling, not a thorough awareness of all our affected bodily
functions. Thus, the focus of awareness constantly fluctuates, partly under our own volition and
partly for reasons that are mostly outside our control.
Studies investigating the neural events that determine whether a sensation reaches conscious-
ness and what kind of perceptual processing occurs unconsciously can take several forms. One
obvious approach is to manipulate directly whether the observer is aware of the sensory stimulus.
In the visual domain this is typically done through masking procedures, of which there are numerous variations. It is possible to mask a stimulus from being consciously perceived by presenting a masking stimulus either directly before or directly after the onset of the target. Among such methods, meta-contrast masking (Breitmeyer and Ogmen 2000) employs a masking stimulus with contours of opposite contrast polarity to the stimulus of interest, presented immediately after that stimulus. This method can render even bright stimuli invisible to the observer. An extension of this method presents the mask for a longer period before the stimulus of interest; repeating this cycle several times results in a ‘standing wave of invisibility’ that can render a stimulus invisible for prolonged periods (Macknik and Livingstone 1998). This methodology can show that information about stimulus orientation is present in primary visual cortex (V1) even when the orientation does not reach awareness (Haynes and Rees 2005), consistent with behavioural experiments
showing that grating stimuli rendered invisible through various forms of masking can produce
contextual interactions or adaptation effects on contrast or orientation perception (Clifford and
Harris 2005; Falconbridge, Ware, and MacLeod 2010; Motoyoshi and Hayakawa 2010).
While such methods can be very effective in removing a stimulus from conscious access and
typically allow excellent experimental control over awareness, they share the caveat that they are
based on substantial perturbations of the stimulus and that it therefore becomes difficult to distin-
guish the effect of changes in the stimulus parameters from changes in consciousness. It is unsur-
prising that a stimulus presented in close temporal proximity to another stimulus will interfere
with the neuronal response to that stimulus (Macknik and Livingstone 1998). Nevertheless, this
approach can provide important insights into what distinguishes conscious and unconscious pro-
cessing as long as this stimulus confound is taken into account. In essence, if a stimulus can exert
unconscious effects when rendered invisible through masking (or any other stimulus manipula-
tion), this is sufficient evidence that it is processed even in the absence of awareness. However,
when no unconscious effects are observed, the interpretation is more complicated. The only direct
conclusion that can be made in this situation is that the processing of a stimulus is disrupted by
this stimulus manipulation. Further inference on the role of conscious awareness can only be
made through convergent evidence combining other masking procedures or different manipula-
tions of awareness.
Another popular approach to studying unconscious processing is therefore directly to exploit
the fluctuating focus of awareness. To do this, one can use multistable perception. Ambiguous
images, like those shown in Figure 39.1, can be interpreted in more than one way, but only one
interpretation is ever experienced at a time. The dynamics and behavioural studies of ambiguous
images are discussed in detail in the chapter by Alais and Blake (this volume). For example, the
Necker cube (Figure 39.1A) can be perceived such that the upper corner is either facing forward
or facing backward. Sometimes a third state is reported in which the impression of depth is lost
entirely—a two-dimensional collection of parallelograms. Critically, however, it is impossible to
see all of these interpretations simultaneously.
Under ideal situations, comparing the variable percept evoked by such ambiguous images dis-
sociates the contents of awareness (which alternate) from physical stimulation (which remains
unchanging). Naturally, this is based on the assumption that peripheral processes in the individ-
ual perceiving these stimuli are constant between the different perceptual experiences. This may
Fig. 39.1  Examples of ambiguous stimuli, showing both traditional examples (a, b) and stimuli that become multistable because of changes in how the visual system interprets low-level information (c, d). (a) The Necker cube, for which perception alternates between which face is interpreted as being in front. (b) Binocular rivalry: when this stimulus is viewed with red-blue anaglyph glasses, perception alternates between the two oblique grating patches (see the chapter by Alais and Blake for an in-depth discussion and more examples). (c) Even though only the black bars are visible and physically moving up and down (denoted by red arrows), perception can also interpret this stimulus as a black diamond shape (implied by the dashed grey lines) viewed behind white, vertical occluding bars. (Please refer to http://www.pnas.org/content/suppl/2002/10/26/192579399.DC1/5793Movie2Legend.html for a moving demonstration.) (d) Each of the four pairs of discs constantly circles around a hinge point (denoted by red arrows). We can interpret this locally as four pairs of discs, but perception can also be dominated by a global interpretation in which there are two groups of four dots arranged in the squares implied by the dashed lines. Please refer to http://anstislab.ucsd.edu/2012/11/27/local-and-global-motion-with-juno-kim/ for a moving demonstration and a discussion of the parameters determining whether the local or global interpretation predominates.

not be the case in all situations. For example, subtle eye movements may change the retinal projec-
tion of the Necker cube and favour one interpretation of the two-dimensional image over another
(Einhäuser, Martin, and König 2004). In this context it is also worth noting that eye movements
do not correspond with perceived depth of a stimulus but reflect low-level attributes of the image
(Wismeijer et al. 2008, 2010). For ambiguous structure-from-motion stimuli that lead to percep-
tion of a three-dimensional shape spinning either clockwise or anti-clockwise, the percept may
depend on whether attention is directed to the dots drifting to the left or to the right. Moreover,
for many ambiguous stimuli one of the interpretations is more dominant. Thus, provided such
peripheral factors are controlled for adequately, this approach permits a stronger inference to
be made about the neural correlates of consciousness than manipulating the stimulus directly.
However, by using multistable stimuli one loses direct experimental control over the observer’s
conscious perceptual experience.
One particular form of bistable perception occurs when two different stimuli are presented
to separate paired sensory organs, so that the brain receives conflicting sensory inputs. This has
been studied most extensively with binocular rivalry, when each eye is presented with a different
image. Rather than seeing an incoherent mixture or blend of the two images, conscious percep-
tion typically alternates between each monocular percept just as with other types of ambiguous
stimuli. A third, piecemeal percept, in which the perceived image is a mosaic of images seen by the left and right eyes, can also occur. During the switches between alternate interpretations, perception does not flip instantaneously from one state to another but spreads rapidly from an initiating location across the visual field, akin to a wave travelling across the image. Psychophysical studies
of binocular rivalry and such perceptual waves also receive much greater attention in the chapter
by Alais and Blake (this volume).
Of course, the eyes are not the only sensory organs that come as a pair. Therefore, it is unsur-
prising that there are equivalents of binocular rivalry for other senses. In binaural rivalry, the two
ears hear different sequences of tones. The resulting percept alternates between the specific sensa-
tions rather than evoking a cacophony of mismatching sounds (van Ee et al. 2009; Brancucci and
Tommasi 2011). Even more surprising, in binaral rivalry two different odours are administered
separately to each of the nostrils and again the perceived smell switches back and forth between
the two (Zhou and Chen 2009). Unlike binocular rivalry, which occurs naturally under normal
viewing conditions outside Panum’s fusional area, binaural and binaral rivalry are sensory condi-
tions that must be artificially created in a laboratory. In the normal environment of an organism
it is not probable that each of the nostrils would receive conflicting smells or that completely dif-
ferent sounds would reach each of the ears without any crossover between the two. On the other
hand, in natural vision the images projected onto the two retinas are generally quite distinct and
there are frequent occurrences where two completely different images are seen at least by parts of
each retina: for example, the region blocked by the nose. Moreover, outside Panum’s fusional area
binocular fusion does not occur. Fusing the two retinal images in a meaningful way is the basis
of stereovision and thus important for judging depth and distance. Thus, binocular rivalry is an
extreme situation that reveals a mechanism associated with normal visual processing away from
fixation. Binaural and binaral rivalry, on the other hand, seem to be a purer demonstration of the
processes underlying the wavering focus of awareness. It is therefore of note that in spite of this,
the three forms of bisensory rivalry are phenomenologically very similar.
Perhaps the simplest form of bistable perception occurs when two stimuli are superimposed or
mixed. In the visual domain this is sometimes referred to as monocular rivalry, that is, when the
same picture contains two different images. Again, the focus of perception can alternate between
the two individual images. Even though this effect may not have the same potency as binocular
rivalry or other ambiguous images, it underlines that not all of the sensory input can be processed
simultaneously with equal processing resources. We can focus on one component image but only
perceive the other one as a distracting background blur or vice versa (O’Craven, Downing, and
Kanwisher 1999); alternatively, we may force vision to perceive both at the same time but this only
results in a messy, broken-up percept.
It should also be noted that the fact that perception can be multistable at all has implications for
our understanding of the perceptual apparatus. The reason that we are not conscious of all possible
interpretations of an ambiguous stimulus could be related to a limit in the capacity with which the
brain can perceptually organize and interpret the overwhelming sensory input. If this is true, it must mean that some information can only be processed with awareness of the stimulus.
Conversely, the fact that our percept does not simply stabilize into one of the possibilities is incon-
sistent with any account that the brain merely interprets the sensorium using the most probable
prior expectation. Instead, perhaps the continuous fluctuation in perception reflects the brain’s way of searching for an appropriate solution when faced with strongly ambiguous input. Reconciling
theories of prediction with rivalrous perception remains an important topic for future research.
What neural processes underlie the perceptual switches and periods of perceptual dominance
in multistable perception? The advent of modern neuroimaging techniques like positron emis-
sion tomography (PET), functional magnetic resonance imaging (fMRI), electroencephalography
(EEG), and magnetoencephalography (MEG) has made it possible to measure neural activity
throughout the human brain whilst measuring behavioural reports of the observer’s percep-
tual state in real time. Such experiments show that regions of superior parietal and prefrontal
cortex, which are also associated with attentional deployment, are active during the transitions
from one perceptual state to another (Lumer, Friston, and Rees 1998). Moreover, the structure
of such regions is related to the frequency of perceptual switches. Specifically, individual differ-
ences in the grey matter volume in right superior parietal cortex correlate with the switch rate
for a structure-from-motion stimulus (Kanai, Bahrami, and Rees 2010). Disrupting neural activity in these regions with transcranial magnetic stimulation (TMS) using continuous theta-burst stimulation decreases the switch rate (Kanai et al. 2010), showing that these areas play a
causal role in generating perceptual switches. Moreover, applying TMS to a slightly more anterior
part of parietal cortex has the opposite effect on switch rates in binocular rivalry (Carmel et al.
2010; Zaretskaya et al. 2010). Taken together, these findings suggest a model in which parietal
(and perhaps prefrontal) cortices play a complex causal role in generating top-down signals that
ultimately resolve perceptual competition in ventral visual cortex (Kanai et al. 2011).
The link between brain structure and the switch rate in these forms of perceptual rivalry also
hints at the possibility that these processes are deeply rooted in human physiology. While grey
matter volume can change over the lifespan and there is some short-term experience-dependent
plasticity associated with learning motor tasks (Draganski et  al. 2004), there is a strong herit-
able component to switch rate in multistable perception (Miller et al. 2010; Shannon et al. 2011).
Moreover, switch rate correlates with the occurrence and severity of bipolar disorder (Pettigrew
and Miller 1998; Miller et al. 2003; Krug et al. 2008; Nagamine et al. 2009). This obviously does
not imply that binocular rivalry, or bistable perception in general, causes psychiatric or neurological conditions, but it does suggest that rivalry shares mechanisms that are affected in these conditions.
Recent studies have investigated the balance of excitatory and inhibitory signalling in visual cortex,
motivated by the assumption that this balance relates to the dynamics of perceptual rivalry (van
Loon et al. 2013), which could be altered in certain conditions (Aznar Casanova et al. 2013; Said
et al. 2013).
Naturally, the focus of awareness does not exist in isolation from wider perceptual process-
ing. While there is a strong stochastic element to how and when perceptual transitions occur
during multistable perception, the timing of such transitions is also strongly influenced by the
stimuli used and other factors, such as what stimuli had been presented previously or atten-
tional deployment. So it is possible to some degree to control perceptual alternations through
selectively attending to one particular interpretation (Ooi and He 1999; Hugrass and Crewther
2012), although binocular rivalry may be less susceptible to voluntary control than other forms
of multistability (Meng and Tong 2004). Moreover, when viewing of a binocular rivalry stimulus
is interrupted by a blank epoch, the first percept reported when the rivalrous stimulus returns is
frequently the same as the one last perceived before the blank epoch (Leopold et al. 2002). Even
more fundamentally, basic image statistics can influence bistable perception. During binocular
rivalry, sharp edges with high contrasts and sudden movement usually result in perceptual domi-
nance, while homogeneous regions of an image tend to be suppressed. Thus rivalrous images that
contain a large degree of heterogeneity in one eye but homogeneous regions in the other tend to
be dominated by the heterogeneous image. The sudden appearance of one monocular image can
substantially bias the percept to being dominated by that image, a process known as flash suppres-
sion (Wolfe 1984), perhaps because sudden appearance of a stimulus is particularly salient (Cole
et al. 2004).
This phenomenon can be exploited to sustain perceptual dominance of one eye for prolonged
periods. One eye views a dynamic stream of constantly changing patterns of high-contrast geo-
metric shapes (e.g. a Mondrian-like pattern) while the other views a low-contrast stimulus. Under
the right circumstances such continuous flash suppression (CFS) results in complete dominance
of perception for extended periods of time by the dynamic stimulus, thus suppressing the other
monocular stimulus from awareness (Tsuchiya and Koch 2005). It is, however, critical to keep in mind that this suppression may differentially affect low-level stimulus components, such as the
stimulus spatial frequency (Yang and Blake 2012) and the phase alignment of stimulus and mask
(Maehara et al. 2009). CFS has been used to study unconscious stimulus processing in numerous
studies and enjoys increasing popularity due to the ease of its use. In one variant of these experi-
ments, the contrast of the suppressed image is gradually increased and the critical parameter to
be measured is the ‘time to emergence’ when the suppressed stimulus breaks through the masking
stimulus in the other eye and reaches awareness. Comparing this parameter for different stimulus
conditions can reveal differences in the unconscious processing of the images (Jiang, Costello, and
He 2007). However, it is always important to keep in mind that the time it takes a stimulus to break interocular suppression may be determined not by the stimulus parameter of interest but by other, low-level features of the suppressed image. Further, it is possible that
a faster time to emergence does not actually reflect unconscious processing but rather the speed
(or other dynamics) with which the stimulus breaks through suppression once it has passed the
threshold to conscious perception.
At an even more basic level, image statistics vie for perceptual dominance. When one eye views
white noise images, while the other views noise images filtered to fall within the 1/f spectrum typi-
cally observed in natural scenes (Field 1987; Simoncelli and Olshausen 2001), the latter dominate
perception for significantly longer periods than the white noise images (Baker and Graf 2009).
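Such 1/f-filtered noise is straightforward to generate by shaping the amplitude spectrum of white noise in the Fourier domain. The sketch below is purely illustrative (the function name is our own, not from the cited studies) and shows the general technique, not the exact stimuli used by Baker and Graf (2009):

```python
import numpy as np

def pink_noise_image(n=256, seed=0):
    """Filter white noise so its amplitude spectrum falls off as 1/f,
    approximating the statistics of natural scenes (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n, n))
    # Spatial-frequency grid (cycles per pixel) for each FFT coefficient
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    f[0, 0] = 1.0  # avoid division by zero at the DC component
    # White noise has a flat expected amplitude spectrum; dividing by f
    # imposes the 1/f fall-off
    pink = np.real(np.fft.ifft2(np.fft.fft2(white) / f))
    # Normalize to zero mean and unit variance for display
    return (pink - pink.mean()) / pink.std()

img = pink_noise_image()
```

A white-noise counterpart for the rivalrous pairing is simply the unfiltered `white` array, normalized the same way.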
This may suggest that the visual system preferentially brings stimuli whose image statistics conform to those of the natural world into the focus of awareness. However, the same may not apply
to higher-order image statistics, such as the collinearity or co-circularity of orientated segments
in the image. While some studies show that collinear gratings in a binocular rivalry stimulus tend
perceptually to transition as a group (Alais and Blake 1999), there have also been reports that
when a noisy field of grating patches of random orientations is paired with a field of varying levels
of co-circularity in the other eye, it is the incoherent, random pattern that dominates perception
(Hunt, Mattingley, and Goodhill 2012), even though the natural environment contains a high
degree of such co-circular regularities (Geisler et al. 2001; Geisler 2008). The reason for that may
be that the two monocular images in that study were not perfectly overlapping, so that individual
patches were not in direct rivalry with one another. Of particular relevance to the question of how the visual system organizes stimulus elements into coherent objects, interocular suppression spreads along contours and around angles, and even across gaps in a contour provided the gap is interpreted as arising from occlusion (Maruya and Blake
2009). It is evident that the same processes that are involved in organizing our perception into a
coherent representation of the environment have complex interactions with awareness.
Bistability of the contents of awareness can also be experienced with regard to how the brain
interprets information as a coherent whole. A  stimulus like that shown in Figure 39.1C can
be perceived in different states, reflecting the way individual stimulus elements are regarded
as being independent or part of a larger object (Murray et al. 2002; Fang, Kersten, and Murray
2008). In the local state the two lines are perceived as drifting up or down, i.e. the veridical
interpretation. However, in the global state the observer instead reports the lines as the sides
of a square that is moving left and right behind several occluding rectangles. Which particular
interpretation currently dominates perception also influences the aftereffects from using these
stimuli as adaptors (He, Kersten, and Fang 2012). A similar stimulus is shown in Figure 39.1D.
There are four groups of stimuli, each comprising two discs circling around a central hinge point.
Under the local interpretation, each of these groups is perceived as an independent moving object
(perhaps akin to binary star systems). However, in the global state discs from distant locations
are grouped into larger entities, resulting in the percept of two squares rotating around one
another. Neuroimaging experiments show that in the global state, neural responses in early vis-
ual cortex to such stimuli are reduced relative to the local interpretation (Zaretskaya, Anstis, and
Bartels 2013). Such a response pattern is a hallmark of coherent perceptual organization, pos-
sibly indicative of predictive coding by which areas higher up in the processing hierarchy send
feedback signals to early visual cortex that cancel out the neural activity that is ‘explained away’
by coherent objects (Rao and Ballard 1999; Murray et al. 2002; Joo, Boynton, and Murray 2012).
However, such an interpretation is complicated by the fact that while responses in early visual
cortex are reduced, this reduction is general to the whole region rather than specific to the
location responding to the stimulus (de-Wit et al. 2012). Moreover, the neural representation
of the stimulus is enhanced (Kok, Jehee, and de Lange 2012), which could be related to the fact
that there is reduced variability in stimulus features (Dumoulin and Hess 2006) and thus reduced
lateral inhibition (which would appear as metabolic activity in neuroimaging measurements)
between adjacent neuronal populations with different tuning properties (Kinoshita, Gilbert, and
Das 2009). While such lower-level explanations cannot entirely account for findings supporting
the predictive coding hypothesis in the context of ambiguous stimuli, the underlying neural
mechanisms are probably more complicated than the predictive coding account proposes.
The beauty of these particular stimulus examples lies in the fact that, like all bistable images, the
stimuli themselves are physically constant and only perceptual organization alternates. However,
one problem with these particular forms of bistable perceptual organization is that our interpre-
tation is typically fairly biased towards one state. For instance, in the latter example the percept
becomes more predominantly local as the speed of rotation is increased, and, more critically, it
tends to become more global with prolonged exposure (Anstis and Kim 2011). This is also why it
is necessary to adapt stimulus parameters continuously to ensure relatively equal dominance of
each state (Zaretskaya et al. 2013), something that is typically less problematic for more classical
ambiguous stimuli like binocular rivalry or structure-from-motion displays that constantly switch
between perceptual states. Nevertheless, as these and other studies illustrate, stimuli like these can
be used successfully to reveal how grouping processes influence the contents of awareness.
One way to reveal neural correlates of consciousness and to understand what information is
processed in the absence of awareness is to rely entirely on whether a stimulus gains access to
conscious report or not. Multistability is not the only means of doing this. For example, there have
been demonstrations of priming effects exerted by stimuli that remained undetected in change
blindness paradigms (Silverman and Mack 2006; Yeh and Yang 2009). Interestingly, while previous
806 Schwarzkopf and Rees

neuroimaging and TMS experiments implicate right parietal and dorsolateral prefrontal cortex in
signalling the presence of a change in the stimulus (Beck et al. 2001, 2006; Turatto, Sandrini,
and Miniussi 2004), there is also evidence to suggest that the memory trace of a stimulus can be
boosted by applying TMS to visual cortical areas encoding the stimulus (Schwarzkopf et al. 2010).
Research on the neural correlates of consciousness (Rees, Kreiman, and Koch 2002) has also
implicated recurrent connectivity between brain regions in the sensory hierarchy as critically
important for conscious perception of a stimulus. The visibility of a visual stimulus under
meta-contrast masking correlates with effective connectivity between early visual areas and fusi-
form cortex, which seems to relate to activity in the region immediately surrounding the reti-
notopic representation of the stimulus in early visual cortex (Haynes, Driver, and Rees 2005).
Further, it has been proposed that feedback from higher regions into earlier areas is critical for
conscious perception (Roelfsema, Lamme, and Spekreijse 1998; Lamme and Roelfsema 2000;
Lamme 2006), although others have argued that at least for visual masking paradigms conscious-
ness varies due to disruptions in feed-forward processing (Tse et al. 2005; Dehaene et al. 2006;
Macknik and Martinez-Conde 2007).

Unconscious Perceptual Organization


Bistability is not the only phenomenon that can illustrate interactions between perceptual organi-
zation and awareness. The contents of awareness are modulated by many factors that can depend
on our perceptual organization. One such phenomenon is motion-induced blindness (Bonneh,
Cooperman, and Sagi 2001). Here a few small target stimuli, yellow dots, are placed inside a dark
background containing a rotating grid of blue dots. After viewing this stimulus (and maintaining
stable eye fixation) one of the yellow dots will vanish, completely blotted out by the surround-
ing dynamic background (see http://www.michaelbach.de/ot/mot-mib/index.html for a dem-
onstration). The neural processes underlying this effect remain unclear. Activity in retinotopic
regions corresponding to the target location is modulated upon its disappearance. The pattern
of modulation is complex with decreased activity in ventral region V4 accompanied by some-
what counter-intuitive increases in dorsal and early visual cortex (Donner et al. 2008; Schölvinck
and Rees 2010). Interestingly, the probability of disappearance is also enhanced when attention is
directed at the target (Schölvinck and Rees 2009).
A similar phenomenon is the artificial scotoma that occurs when we view a small plain target on
a background of high-contrast, dynamic noise. After prolonged viewing, the target is perceptually
filled in by the background and vanishes from awareness (Ramachandran and Gregory 1991).
This process is not dissimilar to the filling in that occurs in the blind spot corresponding to the
optic disc in the eye or with scotomas resulting from damage to the retina. Such filling in and per-
ceptual extrapolation mechanisms can be so effective that the affected individual is not even
aware that anything is abnormal about their vision. Neuroimaging experiments show that
the disappearance of the target stimulus is accompanied by a reduction in target-related neural
responses in early visual cortices (Weil et al. 2007; Weil, Watkins, and Rees 2008). These examples
demonstrate how our perceptual apparatus continuously works to interpret the sensory input and
extrapolates across gaps in the sensory representation to generate a more coherent representation
of the world in our mind’s eye.
The plethora of perceptual phenomena related to consciousness, both in terms of awareness
of an object’s presence and the fluctuating focus of our perceptual interpretation of the sensory
input, show that awareness and perceptual organization are closely intertwined. But to what
extent does awareness influence perception? Because the neural nature of consciousness remains
very poorly understood, the role it plays in our interpretation of the environment is also difficult
to establish. Are there any perceptual functions that require conscious awareness of the stimulus?
Alternatively, could consciousness simply be a product of the mind but irrelevant for how the
brain analyses sensory information?
There have been numerous demonstrations of how unconscious stimuli can have complex and
powerful effects on behaviour. Images of emotional faces rendered invisible through masking
can influence behavioural performance (Yang, Zald, and Blake 2007; Faivre, Berthet, and Kouider
2012; Almeida et al. 2013) and produce brain activation in neuroimaging experiments linked to
emotional processing, like enhanced amygdala responses to fearful faces (Williams et al. 2004;
De Gelder et  al. 2005). This suggests that the neural mechanisms required for detecting emo-
tional expressions operate even when we are not aware of the stimulus. Similar findings have been
made for social information in faces. For example, the time for a face to emerge from continuous
flash suppression (i.e. the time it takes for a low contrast face stimulus to break through the
dichoptic mask) is influenced by its dominance or trustworthiness (Stewart et al. 2012). It has
been argued that the information about emotional valence, in particular fear responses, is con-
fined to low spatial frequencies and bypasses the high-resolution image analysis in early visual
cortex entirely (Vuilleumier et al. 2003; Winston, Vuilleumier, and Dolan 2003) through a subcor-
tical pathway. This would suggest that while perceptual analysis necessary for such primal emo-
tional responses is independent of awareness, conscious processing may nevertheless be required
for detailed perceptual organization.
However, even more complex information is processed in the absence of awareness. For exam-
ple, semantic information can be processed without awareness and break through binocular
suppression (Costello et  al. 2009), although it is unclear how much semantic information can
be processed whilst undergoing dichoptic suppression (Zimba and Blake 1983). Organizing
local image features like lines and angles into letters, and subsequently letters into words, must
require fairly sophisticated processing. At least to some extent this process must be preserved in
the absence of conscious awareness. Whether or not an invisible stimulus exerts an influence on
perception probably also depends on what aspect of perception is measured: while a high-order
visual stimulus, like a spiral, may not produce adaptation (unlike simpler stimuli, like a grat-
ing) when masked from awareness, a complex, naturalistic image may still capture attentional
resources (Lin and He 2009). Further, as discussed earlier, one important aspect to consider is also
that the means by which a stimulus is rendered invisible may influence whether a stimulus can
have subliminal effects (Faivre et al. 2012; Yang and Blake 2012). A briefly presented stimulus fol-
lowed by a mask may be available to complete perceptual processing even though it is unavailable
to conscious report. On the other hand, presenting the same stimulus under conditions of binocu-
lar rivalry may eliminate its neural representation in higher brain regions where the information
about the stimulus eye of origin is lost.
In light of this problem, it is even more interesting that even the processing of complex natural
images appears to proceed under continuous flash suppression that renders the images invisible.
One study measured the time to emergence for visual scenes that were either congruent with the
natural world or contained some form of inconsistency, such as an archer using a tennis racket
instead of an arrow or basketball players using a watermelon instead of a ball (Mudrik et al. 2011).
Intriguingly, incongruent scenes broke through perceptual suppression faster than congruent
scenes. This may suggest that even the complex integration of objects in their semantic context
can occur in the absence of awareness. Even if we assume that this effect may in some way be
influenced by low-level image properties (an explanation which is somewhat unlikely due to the
diverse range of natural stimuli used in that study) and bypasses detailed visual analysis through
different pathways, it must require some complex processes to identify the out-of-place features.
This finding is in some way contrary to the aforementioned reports of a bias for more ‘natural’
stimuli to dominate in binocular rivalry (Baker and Graf 2009). However, as discussed in the pre-
vious section, it is also important to note that the measure used by this study, time to emergence
from CFS, may not truly reflect the processing that occurs under suppression but the detection of
incongruent scenes at the moment of transition between suppression and visibility, which in turn
allows them to reach perceptual dominance more quickly.
In contrast to this finding, the neural representation of complex visual stimuli may not be
the same in the absence of awareness as during conscious viewing. For example, one study
used multivariate pattern decoding techniques to decode distributed activations measured with
high-resolution functional MRI in higher ventral visual cortex to distinguish processing associ-
ated with viewing of face or house images (Sterzer, Haynes, and Rees 2008). While it was possible
to decode which of the two stimulus classes was being processed, regardless of whether or not they
were rendered invisible by continuous flash suppression, the results suggested that the nature of
the pattern information under awareness and invisibility was different. This is notably different
from the situation in early visual cortex, where the neural representation of invisible orientated
gratings is similar to visible stimuli (Haynes and Rees 2005). The overall visual response in higher
visual brain regions to stimuli rendered invisible through binocular fusion (when two
complementary images are presented to the two eyes and perceived merely as a uniform blank) can be
very similar, albeit weaker, to the response to visible stimuli (Moutoussis and Zeki 2002; Schurger et al. 2010).
This suggests that there may be fundamental differences in terms of how information about the
visual stimulus is encoded during unconscious processing.
It has been argued that one neural correlate of awareness is the reliability of the visual response
to the stimulus (Schurger et al. 2010). Using functional MRI and multivariate decoding analysis
these authors showed that the pattern of activation produced by invisible stimuli is indeed more
variable compared to that for visible stimuli. However, it seems curious to regard this as a neural
correlate of consciousness: by definition variability must be determined over the course of multi-
ple or prolonged measurements. Consciousness, on the other hand, can vary from one moment
to the next. While it is certainly possible that one property granting neural representations access
to consciousness is their temporal stability, the response patterns in functional MRI are measured
on a trial-by-trial basis, with each trial comprising slow haemodynamic measurements over
several seconds. It seems unlikely that response variability between such trials can explain the
absence (or presence) of awareness across all trials because awareness of a stimulus operates at
much faster time scales. More importantly, because this study employed a stimulus manipulation
(binocular fusion) to render the stimulus invisible, it illustrates the earlier point about
masking methods: it is impossible to determine whether the reduced reliability of fMRI responses is
related to consciousness or is merely a result of differences in the stimulus. Only a design that
compares conscious and unconscious trials with identical stimulation can conclusively arbitrate
between those possibilities.
Nevertheless, the finding is interesting because it suggests that without awareness a stabiliz-
ing influence on the neural representation may be lost. This is also supported by psychophysical
experiments showing that without awareness, behavioural tuning to orientation is broader, con-
sistent with greater variability (Ling and Blake 2009). In that study, awareness was manipulated
by using binocular rivalry with flash suppression, and comparing identical stimulus conditions in
the presence and absence of awareness, rather than directly manipulating the stimulus to render
it invisible. This provides stronger evidence that the differences indeed relate to consciousness
rather than physical differences in visual input.
Another interesting aspect of Schurger and colleagues’ finding was that the brain regions where
the most diagnostic information about the visual images was encoded differed between visible
and invisible stimuli (Schurger et al. 2010). While visible stimuli selectively activated well-replicated
areas in ventral cortex known to respond preferentially to images of faces and houses, respectively,
invisible stimuli were decoded by more posterior regions in intermediate fusiform
cortex presumably corresponding to areas V4 and the VO complex (Wandell, Dumoulin,
and Brewer 2007). While these regions are already sensitive to relatively complex visual infor-
mation, they are not as selective for object identity. It is therefore possible that in the absence of
awareness visual information is encoded in a more incoherent form, relying on more primitive
features rather than abstract classes. At least some perceptual organization, transforming geomet-
ric primitives into coherent and meaningful objects, may thus require consciousness.
To test this notion, in behavioural experiments we measured priming effects produced by
simple visual shapes that were either visible or rendered invisible by fast counter-phase flicker,
a method that seems to allow for at least low-level processing of visual information to occur
(Falconbridge et  al. 2010). Shapes comprised sparse fragments and could either be defined by
the position or the orientation of the elements (Schwarzkopf and Rees 2010). We observed that
priming effects from invisible stimuli on the discrimination of shapes of the opposite feature only
occurred when the primes were defined by orientation. Moreover, this effect disappeared when
the discrimination targets were rescaled. This indicates that without awareness, oriented elements
are not integrated into an abstract representation of a shape but that some more local processes
involved in spatial integration, possibly confined to early retinotopic cortex, are nonetheless func-
tioning. Consciousness, it seems, is after all required for some more abstract analysis of the visual
environment.
This notion was also supported by an experiment in which we tested whether Kanizsa triangles
are formed when the inducers producing this type of illusory contour are rendered invisible by
continuous flash suppression, while a central region containing the illusory contour produced by
the stimulus configuration remained available to conscious perception (Harris et al. 2011). Participants
were required to discriminate the orientation of the illusory contour. Without awareness, per-
formance was consistently at chance levels, indicating that participants could not perceive the
illusory contour. This contrasts with a control experiment where we showed that simultaneous
brightness contrast (Figure 39.2A), the contextual modulation of perceived brightness when a
stimulus is presented against a dark or light background, is preserved even when the background
is suppressed from awareness. This null finding for perception of illusory contours when the
inducers are suppressed from awareness cannot be explained by lack of statistical power, because
each participant performed a large number of trials and performance was extremely consistent
across the group. However, as previously discussed with any of these studies in which awareness
is manipulated by a change in the stimulus, it is possible that the dichoptic masking procedure,
rather than consciousness per se, interfered with the formation of the illusory contours. Others
have shown that illusory contours are not perceived when the inducers are suppressed
during binocular rivalry (Sobel and Blake 2003). There is evidence that illusory contours
are mediated by binocular neurons (Liu, Stevenson, and Schor 1994; Gillam and Nakayama 1999;
Häkkinen and Nyman 2001) that may have been affected by dichoptic masking. One argument
speaking against that is that Kanizsa triangles enhance the speed with which a stimulus breaks
through binocular suppression (Wang, Weng, and He 2012), although this is inconsistent with
the absence of any effect on dominance periods during binocular rivalry (Sobel and Blake 2003),
and it remains unclear to what degree the time to emergence from binocular suppression reflects
unconscious processing per se.
Fig. 39.2  Visual illusions. (a). Simultaneous brightness contrast. The luminance of the two circles
is identical. (b). Contrast suppression. The contrast in the two circular patches is identical. (c).
Ebbinghaus illusion. The size of the two light grey circles is identical. (d). Ponzo illusion. The length
of the two horizontal lines is identical. (e). Mueller-Lyer illusion. The length of horizontal section of the
two arrows is identical. (f). Shepard’s Tables. The surface area of the two tables is identical.

It is also likely that inferring illusory contours operates through a multi-stage process where
first the local stimulus features are segmented and grouped into objects, which then produces the
illusory percept possibly mediated by hierarchically earlier stages of visual processing through
feedback (Kogo et al. 2010; see also the chapter by Kogo and van Ee, this volume). This is consistent
with the finding that stimuli that mimic the salience of Kanizsa figures but that do not
produce the percept of illusory contours produce similar neural responses in lateral occipital cortex,
a region presumed to be involved in extracting surfaces and objects (Stanley and Rubin 2003).
It also agrees with recent findings that the perception of Kanizsa stimuli depends not only on
processing in early visual cortex but also on feedback from higher lateral occipital cortex (Wokke
et al. 2013). The arrangement of the inducers may attract attention to the Kanizsa stimulus with-
out producing an actual percept of illusory contours. This is not an unlikely explanation because
there is considerable evidence that, while related, attention is a process distinct from awareness
(Kentridge, Heywood, and Weiskrantz 1999; Lamme 2003; Koch and Tsuchiya 2007; Bahrami
et al. 2008a, 2008b; Zhaoping 2008). Further, the spread of attentional responses in V1 is deter-
mined by Gestalt principles (Wannig, Stanisor, and Roelfsema 2011). The extent to which pro-
cessing of illusory contours occurs without awareness thus still remains a question to be resolved
by future research. However, our results already point towards the fact that illusory contours are
formed at least at a higher-level stage of processing beyond where signals from the two eyes are
still separate.
Interestingly in this context, there have been findings from stroke patients with parietal extinc-
tion (where a stimulus on the side contralateral to a parietal lesion remains undetected if a
simultaneous ipsilateral stimulus is presented). Grouping of stimuli that form Kanizsa figures
can alleviate the effects of extinction (Mattingley, Davis, and Driver 1997; Conci et al. 2009), sug-
gesting that these processes are not dependent on awareness of the stimulus. However, again in
this situation it is unclear which comes first: the production of illusory contours or the segmenta-
tion of stimuli into surfaces. This line of research is discussed in greater detail in the chapter by
Gillebert and Humphreys (this volume).

Phenomenological Contents of Consciousness


Thus far we have considered consciousness in terms of the focus and contents of the mind’s eye.
However, the concept of the contents of awareness is broader than merely whether we are aware
of a stimulus or not. Perception of objects is strongly modulated by interactions with their neigh-
bours and the context in which they appear. Contextual illusions like simultaneous brightness
contrast and other examples shown in Figure 39.2 reveal processes by which the visual system
shapes perception of objects rather than representing a physically accurate reality. These pro-
cesses serve a teleological purpose because they reflect the way the brain interprets the small,
inherently two-dimensional images falling on the retinae in the eyes as originating in a large,
three-dimensional world. Perceiving an object of constant luminance as brighter or darker depending
on whether it is brightly lit or in the shade, or two objects of identical retinal size as bigger
or smaller depending on how far away we believe them to be, reflects mechanisms for interpreting sensory
input in a meaningful way. Our perception may be ‘fooled’ by illusions, because they are tailored
around the way perceptual processing works; however, in the real world this processing typically
helps us understand that an object close to us is not oppressively large even though it covers most
of the visual field. The visual system is not designed for making photometric measurements or
precise estimations of visual angle. Its purpose is to help the observer understand and interpret
the environment and form a representation about their place in the world.
Through these modulations of our sensory input, illusions alter the contents of consciousness.
Rather than simply determining what we perceive at all, consciousness also reflects how we per-
ceive the world around us. Because they disentangle the physical reality of the stimulus from our
subjective experience of it, illusions are also excellent tools for research into how consciousness
interacts with perceptual organization and into the underlying neuronal mechanisms.
Typically, these illusions rely on the fact that physically identical stimuli can appear notably
different depending on either the surround they appear in or on other global interpretations. We
already mentioned simultaneous brightness contrast (Figure 39.2A), where the brightness of a
stimulus is influenced by the brightness of the surround. Similar effects are seen in the tilt illusion,
where the orientation of a central grating appears to be tilted away from that of a surrounding
annulus; contrast-suppression (Figure 39.2B), where the contrast of a central stimulus surrounded
by a high-contrast annulus appears to be reduced; and the Ebbinghaus illusion (Figure 39.2C),
where a stimulus appears larger or smaller depending on the size of and the distance from stimuli
surrounding it (Roberts, Harris, and Yates 2005). Other illusions, like the Ponzo and Mueller-Lyer
illusions (2D, E) and variants thereof, may affect the neural processes underpinning interpretation
of three-dimensional distance (Gregory 2008). Objects interpreted to be at a far distance appear to
be larger than those near to us. However, alternative accounts for several of these illusions have also
been proposed instead, positing that our perception of these illusions reflects the statistical prop-
erties of the visual environment (Howe and Purves 2004, 2005). The Shepard Tables (Figure 39.2F)
influence our judgment of object size by exploiting inherent assumptions about perspective.
Finally, some illusions, like the rotating snakes (http://www.ritsumei.ac.jp/~akitaoka/index-e.html),
evoke the percept of motion that is not physically present in the image. Similarly, in the percept of
illusory contours and amodal completion in images like the aforementioned Kanizsa figures, or the
extrapolation of edges from abutting line segments (see the chapter by Kogo and van Ee, this volume,
for an in-depth discussion of these processes), we perceive a faint luminance edge that can be of
remarkable clarity simply due to the presence of inducing image components that imply the pres-
ence of a figure or an edge even though there is no physical luminance contrast. Thus, even very
simple geometric stimulus features can influence and alter the contents of awareness, making us
experience things that are not actually there.
Naturally, this list is not exhaustive but meant to give an overview of the different types of visual
illusions. One thing that they all share is that they affect the contents of our awareness by letting us
see things that are at odds with physical reality. Many neuroimaging studies show that the neural
representation of our perceived environment can be found even at relatively early stages of corti-
cal visual processing. For example, activity produced by physically identical stimuli in primary
visual cortex (V1) reflects their perceived size (Murray, Boyaci, and Kersten 2006). Subsequent
work shows that this was not solely due to larger responses to stimuli perceived as larger and that
this effect required participants to attend to the stimulus (Fang, Boyaci, et al. 2008). More recently,
this effect was further corroborated by the finding that the perceived size of a retinal afterim-
age is also reflected by V1 activity (Sperandio, Chouinard, and Goodale 2012). Intriguingly, the
perceived size of afterimages is also susceptible to contextual size illusions (Sperandio, Lak, and
Goodale 2012).
Consistent with this, in our own experiments the Ebbinghaus illusion is reduced under
dichoptic presentation when inducers and target stimuli are presented to different eyes (Song,
Schwarzkopf, and Rees 2011). Such absent or weak interocular transfer of an effect indicates that
it must be at least partly mediated by early stages of visual processing where the information from
the two eyes has not been fully combined, such as V1. We therefore hypothesized that the cortical
surface area of V1, which varies quite considerably between individuals (Andrews, Halpern, and
Purves 1997; Dougherty et al. 2003), might co-vary with the strength of such size illusions. In
particular, we reasoned that if the circuits mediating these illusions (lateral connections, feedback
pathways) do not scale with V1 surface area, the strength of these illusions should thus be reduced
in individuals with a larger V1. We measured the surface area of V1 in thirty individuals using
functional MRI and retinotopic mapping procedures (Schwarzkopf, Song, and Rees 2011) and
compared that to the magnitude of the Ebbinghaus and a variant of the Ponzo illusion measured
behaviourally in a psychophysics lab. As predicted, illusion magnitude was negatively correlated
with V1 surface area. In subsequent experiments we further showed that this correlation is present
for both components of the Ebbinghaus stimulus, that is, for contexts with small inducers and
for contexts with large inducers. Our results further support the interpretation that the cortical distance over which
the contextual interaction occurs is a major factor determining illusion strength (Schwarzkopf
and Rees 2013). While correlational studies like this cannot resolve the question of causality and
the specific circuits mediating the illusion remain to be identified, our findings suggest that the
surface area of V1 at least in part reflects the subjective awareness of object size.
All of the examples in this section thus far have been in the visual domain. As with perceptual
science in general, vision has received most attention. However, there are also perceptual illu-
sions in other sensory domains and it is important not to neglect these as of course all sensory
input contributes to our subjective experience of the world. One example is the Aristotle illusion
from the somatosensory modality that can occur when we cross our fingers (as when wishing
Perceptual Organization and Consciousness 813

somebody luck, or hoping for our Nature manuscript to be accepted for publication) and then
touch a single marble so that it is held between the two fingertips. One then has the experience
(especially when moving the marble along the surface of a table or the floor) that there are
two marbles, each touching one finger (Aristotle 1924). This percept may arise because our
interpretation of somatosensory input assumes that the fingers are not crossed, and so under typical
conditions the sensation caused by this finger configuration would truly reflect the presence
of two independent objects. Different sensory modalities may also interact to produce perceptual
illusions, such as in the flash-beep illusion, where two sounds presented in brief
succession simultaneously with a single visual flash can produce the percept of two independent
flashes (Shams, Kamitani, and Shimojo 2000; Watkins et al. 2007). Interestingly, how prone an
individual is to this illusion correlates with grey matter volume in early visual cortex (De Haas
et al. 2012). Another example is the McGurk effect (McGurk and MacDonald 1976), which occurs
when an auditory vocalization of a syllable is presented together with an incongruent movie of
a face vocalizing a different syllable. The actual percept tends to be a mixture of the two modali-
ties. Interestingly, in the context of the topics discussed earlier, congruency between the visual
face stimulus and the auditory vocalization helps the face break through interocular suppression
(Alsius and Munhall 2013); however, face stimuli rendered invisible through CFS did not produce
the McGurk illusion, suggesting that in order for a stimulus to exert multimodal effects it must be
consciously perceived (Palmer and Ramsey 2012).

Conclusion
In this chapter, we outlined some of the ways in which consciousness interacts with the perceptual
organization of our sensory input. Not only does the brain’s interpretation of stimuli influence
whether or not they reach the focus of our awareness, but we can also regard the way a scene is
perceived to be a reflection of our subjective experience, the contents of awareness. We described
a number of experiments investigating the processes by which our percepts are shaped by the
brain and how to separate those functions that operate in the absence of awareness from those
that require conscious processing. What kinds of sensory information can be interpreted without
awareness remains unclear. The literature on this question is patchy, with several studies inves-
tigating small aspects of unconscious perceptual processing, but a general theory tying together
these findings is elusive.
It also remains unresolved how different means of removing a stimulus from conscious
access relate in terms of their neural mechanisms, and thus to what extent they can be compared.
The best experimental manipulations to study consciousness are those that keep the stimulus con-
stant and instead rely on subjective differences in awareness to dissociate objective physical prop-
erties from subjective experience. This makes bistable stimuli and contextual illusions popular
targets for experimental investigations, but the approach is not suited to addressing all questions.
Therefore, a more comprehensive comparison of different masking techniques will be instrumen-
tal in advancing our understanding of the role consciousness plays in perceptual organization.

References
Alais, D. and R. Blake (1999). ‘Grouping Visual Features during Binocular Rivalry’. Vision Research
39: 4341–4353.
Almeida, J., P. E. Pajtas, B. Z. Mahon, K. Nakayama, and A. Caramazza (2013). ‘Affect of the
Unconscious: Visually Suppressed Angry Faces Modulate our Decisions’. Cognitive Affective &
Behavioral Neuroscience 13: 94–101.
814 Schwarzkopf and Rees

Alsius, A. and K. G. Munhall (2013). ‘Detection of Audiovisual Speech Correspondences without Visual
Awareness’. Psychological Science 24: 423–431.
Andrews, T. J., S. D. Halpern, and D. Purves (1997). ‘Correlated Size Variations in Human Visual Cortex,
Lateral Geniculate Nucleus, and Optic Tract’. Journal of Neuroscience 17: 2859–2868.
Anstis, S. and J. Kim (2011). ‘Local versus Global Perception of Ambiguous Motion Displays’. Journal of
Vision 11 (3): 13.
Aristotle (1924). Metaphysics. Oxford: Oxford University Press.
Aznar Casanova, J. A., J. A. Amador Campos, M. Moreno Sánchez, and H. Supér (2013). ‘Onset Time
of Binocular Rivalry and Duration of Inter-dominance Periods as Psychophysical Markers of ADHD’.
Perception 42: 16–27.
Bahrami, B., D. Carmel, V. Walsh, G. Rees, and N. Lavie (2008a). ‘Spatial Attention Can Modulate
Unconscious Orientation Processing’. Perception 37: 1520–1528.
Bahrami, B., D. Carmel, V. Walsh, G. Rees, and N. Lavie (2008b). ‘Unconscious Orientation Processing
Depends on Perceptual Load’. Journal of Vision 8 (3): 12.
Baker, D. H. and E. W. Graf (2009). ‘Natural Images Dominate in Binocular Rivalry’. Proceedings of the
National Academy of Sciences USA 106: 5436–5441.
Beck, D. M., N. Muggleton, V. Walsh, and N. Lavie (2006). ‘Right Parietal Cortex Plays a Critical Role in
Change Blindness’. Cerebral Cortex 16: 712–717.
Beck, D. M., G. Rees, C. D. Frith, and N. Lavie (2001). ‘Neural Correlates of Change Detection and Change
Blindness’. Nature Neuroscience 4: 645–650.
Bonneh, Y. S., A. Cooperman, and D. Sagi (2001). ‘Motion-induced Blindness in Normal Observers’.
Nature 411: 798–801.
Brancucci, A. and L. Tommasi (2011). ‘“Binaural Rivalry”: Dichotic Listening as a Tool for the Investigation
of the Neural Correlate of Consciousness’. Brain and Cognition 76: 218–224.
Breitmeyer, B. G. and H. Ogmen (2000). ‘Recent Models and Findings in Visual Backward
Masking: A Comparison, Review, and Update’. Perception and Psychophysics 62: 1572–1595.
Carmel, D., V. Walsh, N. Lavie, and G. Rees (2010). ‘Right Parietal TMS Shortens Dominance Durations in
Binocular Rivalry’. Current Biology 20: R799–R800.
Clifford, C. W. G. and J. A. Harris (2005). ‘Contextual Modulation outside of Awareness’. Current Biology
15: 574–578.
Cole, G. G., R. W. Kentridge, and C. A. Heywood (2004). ‘Visual Salience in the Change
Detection Paradigm: The Special Role of Object Onset’. Journal of Experimental Psychology: Human
Perception and Performance 30: 464–477.
Conci, M., E. Böbel, E. Matthias, I. Keller, H. J. Müller, et al. (2009). ‘Preattentive Surface and
Contour Grouping in Kanizsa Figures: Evidence from Parietal Extinction’. Neuropsychologia
47: 726–732.
Costello, P., Y. Jiang, B. Baartman, K. McGlennen, and S. He (2009). ‘Semantic and Subword Priming
during Binocular Suppression’. Consciousness and Cognition 18: 375–382.
De Gelder, B., J. S. Morris, and R. J. Dolan (2005). ‘Unconscious Fear Influences Emotional Awareness of
Faces and Voices’. Proceedings of the National Academy of Sciences USA 102: 18682–18687.
De Haas, B., R. Kanai, L. Jalkanen, and G. Rees (2012). ‘Grey Matter Volume in Early Human Visual
Cortex Predicts Proneness to the Sound-induced Flash Illusion’. Proceedings of the Royal Society
B: Biological Sciences 279: 4955–4961.
Dehaene, S., J.-P. Changeux, L. Naccache, J. Sackur, and C. Sergent (2006). ‘Conscious, Preconscious,
and Subliminal Processing: A Testable Taxonomy’. Trends in Cognitive Sciences 10: 204–211.
de-Wit, L. H., J. Kubilius, J. Wagemans, and H. P. Op de Beeck (2012). ‘Bistable Gestalts Reduce Activity in
the Whole of V1, not just the Retinotopically Predicted Parts’. Journal of Vision 12 (11): 12.
Donner, T. H., D. Sagi, Y. S. Bonneh, and D. J. Heeger (2008). ‘Opposite Neural Signatures of
Motion-induced Blindness in Human Dorsal and Ventral Visual Cortex’. Journal of Neuroscience
28: 10298–10310.
Dougherty, R. F., V. M. Koch, A. A. Brewer, B. Fischer, J. Modersitzki, et al. (2003). ‘Visual Field
Representations and Locations of Visual Areas V1/2/3 in Human Visual Cortex’. Journal of Vision 3 (10): 1.
Draganski, B., C. Gaser, V. Busch, G. Schuierer, U. Bogdahn, et al. (2004). ‘Neuroplasticity: Changes in
Grey Matter Induced by Training’. Nature 427: 311–312.
Dumoulin, S. O. and R. F. Hess (2006). ‘Modulation of V1 Activity by Shape: Image-statistics or
Shape-based Perception?’ Journal of Neurophysiology 95: 3654–3664.
Einhäuser, W., K. A. C. Martin, and P. König (2004). ‘Are Switches in Perception of the Necker Cube
Related to Eye Position?’ European Journal of Neuroscience 20: 2811–2818.
Faivre, N., V. Berthet, and S. Kouider (2012). ‘Nonconscious Influences from Emotional
Faces: A Comparison of Visual Crowding, Masking, and Continuous Flash Suppression’. Frontiers in
Psychology 3: 129.
Falconbridge, M., A. Ware, and D. I. A. MacLeod (2010). ‘Imperceptibly Rapid Contrast Modulations
Processed in Cortex: Evidence from Psychophysics’. Journal of Vision 10 (8): 21.
Fang, F., H. Boyaci, D. Kersten, and S. O. Murray (2008). ‘Attention-dependent Representation of a Size
Illusion in Human V1’. Current Biology 18: 1707–1712.
Fang, F., D. Kersten, and S. O. Murray (2008). ‘Perceptual Grouping and Inverse fMRI Activity Patterns in
Human Visual Cortex’. Journal of Vision 8 (7): 2.
Field, D. J. (1987). ‘Relations between the Statistics of Natural Images and the Response Properties of
Cortical Cells’. Journal of the Optical Society of America A 4: 2379–2394.
Geisler, W. S., J. S. Perry, B. J. Super, and D. P. Gallogly (2001). ‘Edge Co-occurrence in Natural Images
Predicts Contour Grouping Performance’. Vision Research 41: 711–724.
Geisler, W. S. (2008). ‘Visual Perception and the Statistical Properties of Natural Scenes’. Annual Review of
Psychology 59: 167–192.
Gillam, B. and K. Nakayama (1999). ‘Quantitative Depth for a Phantom Surface Can Be Based on
Cyclopean Occlusion Cues Alone’. Vision Research 39: 109–112.
Gregory, R. L. (2008). ‘Emmert’s Law and the Moon Illusion’. Spatial Vision 21: 407–420.
Häkkinen, J. and G. Nyman (2001). ‘Phantom Surface Captures Stereopsis’. Vision Research 41: 187–199.
Harris, J. J., D. S. Schwarzkopf, C. Song, B. Bahrami, and G. Rees (2011). ‘Contextual Illusions Reveal the
Limit of Unconscious Visual Processing’. Psychological Science 22: 399–405.
Haynes, J.-D., J. Driver, and G. Rees (2005). ‘Visibility Reflects Dynamic Changes of Effective Connectivity
between V1 and Fusiform Cortex’. Neuron 46: 811–821.
Haynes, J.-D. and G. Rees (2005). ‘Predicting the Orientation of Invisible Stimuli from Activity in Human
Primary Visual Cortex’. Nature Neuroscience 8: 686–691.
He, D., D. Kersten, and F. Fang (2012). ‘Opposite Modulation of High- and Low-level Visual Aftereffects by
Perceptual Grouping’. Current Biology 22: 1040–1045.
Howe, C. Q. and D. Purves (2004). ‘Size Contrast and Assimilation Explained by the Statistics of Natural
Scene Geometry’. Journal of Cognitive Neuroscience 16: 90–102.
Howe, C. Q. and D. Purves (2005). ‘The Müller-Lyer Illusion Explained by the Statistics of Image-source
Relationships’. Proceedings of the National Academy of Sciences USA 102: 1234–1239.
Hugrass, L. and D. Crewther (2012). ‘Willpower and Conscious Percept: Volitional Switching in Binocular
Rivalry’. PLoS ONE. 7: e35963.
Hunt, J. J., J. B. Mattingley, and G. J. Goodhill (2012). ‘Randomly Oriented Edge Arrangements Dominate
Naturalistic Arrangements in Binocular Rivalry’. Vision Research 64: 49–55.
James W. (1890). The Principles of Psychology. New York: Holt.
Jiang, Y., P. Costello, and S. He (2007). ‘Processing of Invisible Stimuli: Advantage of Upright
Faces and Recognizable Words in Overcoming Interocular Suppression’. Psychological Science
18: 349–355.
Joo, S. J., G. M. Boynton, and S. O. Murray (2012). ‘Long-range, Pattern-dependent Contextual Effects
in Early Human Visual Cortex’. Current Biology 22: 781–786.
Kanai, R., B. Bahrami, and G. Rees (2010). ‘Human Parietal Cortex Structure Predicts Individual
Differences in Perceptual Rivalry’. Current Biology 20: 1626–1630.
Kanai, R., D. Carmel, B. Bahrami and G. Rees (2011). ‘Structural and Functional Fractionation of Right
Superior Parietal Cortex in Bistable Perception’. Current Biology 21: R106–R107.
Kentridge, R. W., C. A. Heywood, and L. Weiskrantz (1999). ‘Attention without Awareness in Blindsight’.
Proceedings of the Royal Society B: Biological Sciences 266: 1805–1811.
Kinoshita, M., C. D. Gilbert, and A. Das (2009). ‘Optical Imaging of Contextual Interactions in V1 of the
Behaving Monkey’. Journal of Neurophysiology 102: 1930–1944.
Koch, C. and N. Tsuchiya (2007). ‘Attention and Consciousness: Two Distinct Brain Processes’. Trends in
Cognitive Sciences 11: 16–22.
Kogo, N., C. Strecha, L. Van Gool, and J. Wagemans (2010). ‘Surface Construction by a 2-D
Differentiation-integration Process: A Neurocomputational Model for Perceived Border Ownership,
Depth, and Lightness in Kanizsa Figures’. Psychological Review 117: 406–439.
Kok, P., J. F. M. Jehee, F. P. de Lange (2012). ‘Less is More: Expectation Sharpens Representations in the
Primary Visual Cortex’. Neuron 75: 265–270.
Krug, K., E. Brunskill, A. Scarna, G. M. Goodwin, and A. J. Parker (2008). ‘Perceptual Switch Rates
with Ambiguous Structure-from-motion Figures in Bipolar Disorder’. Proceedings of the Royal Society
B: Biological Sciences 275: 1839–1848.
Lamme, V. A. F. and P. R. Roelfsema (2000). ‘The Distinct Modes of Vision Offered by Feedforward and
Recurrent Processing’. Trends in Neurosciences 23: 571–579.
Lamme, V. A. F. (2003). ‘Why Visual Attention and Awareness are Different’. Trends in Cognitive Sciences
7: 12–18.
Lamme, V. A. F. (2006). ‘Towards a True Neural Stance on Consciousness’. Trends in Cognitive Sciences
10: 494–501.
Leopold, D. A., M. Wilke, A. Maier, and N. K. Logothetis (2002). ‘Stable Perception of Visually Ambiguous
Patterns’. Nature Neuroscience 5: 605–609.
Lin, Z. and S. He (2009). ‘Seeing the Invisible: The Scope and Limits of Unconscious Processing in
Binocular Rivalry’. Progress in Neurobiology 87: 195–211.
Ling, S. and R. Blake (2009). ‘Suppression During Binocular Rivalry Broadens Orientation Tuning’.
Psychological Science 20: 1348–1355.
Liu, L., S. B. Stevenson, and C. M. Schor (1994). ‘Quantitative Stereoscopic Depth without Binocular
Correspondence’. Nature 367: 66–69.
Lumer, E. D., K. J. Friston, and G. Rees (1998). ‘Neural Correlates of Perceptual Rivalry in the Human
Brain’. Science 280: 1930–1934.
McGurk H. and J. MacDonald (1976). ‘Hearing Lips and Seeing Voices’. Nature 264: 746–748.
Macknik, S. L. and M. S. Livingstone (1998). ‘Neuronal Correlates of Visibility and Invisibility in the
Primate Visual System’. Nature Neuroscience 1: 144–149.
Macknik, S. L. and S. Martinez-Conde (2007). ‘The Role of Feedback in Visual Masking and Visual
Processing’. Advances in Cognitive Psychology 3: 125–152.
Maehara, G., P.-C. Huang, and R. F. Hess (2009). ‘Importance of Phase Alignment for Interocular
Suppression’. Vision Research 49: 1838–1847.
Maruya, K. and R. Blake (2009). ‘Spatial Spread of Interocular Suppression is Guided by Stimulus
Configuration’. Perception 38: 215–231.
Mattingley, J. B., G. Davis, and J. Driver (1997). ‘Preattentive Filling-in of Visual Surfaces in Parietal
Extinction’. Science 275: 671–674.
Meng, M. and F. Tong (2004). ‘Can Attention Selectively Bias Bistable Perception? Differences between
Binocular Rivalry and Ambiguous Figures’. Journal of Vision 4 (7): 2.
Miller, S. M., B. D. Gynther, K. R. Heslop, G. B. Liu, P. B. Mitchell, et al. (2003). ‘Slow Binocular Rivalry in
Bipolar Disorder’. Psychological Medicine 33: 683–692.
Miller, S. M., N. K. Hansell, T. T. Ngo, G. B. Liu, J. D. Pettigrew, et al. (2010). ‘Genetic Contribution to
Individual Variation in Binocular Rivalry Rate’. Proceedings of the National Academy of Sciences USA
107: 2664–2668.
Motoyoshi, I. and S. Hayakawa (2010). ‘Adaptation-induced Blindness to Sluggish Stimuli’. Journal of Vision
10 (2): 16.
Moutoussis, K. and S. Zeki (2002). ‘The Relationship between Cortical Activation and Perception
Investigated with Invisible Stimuli’. Proceedings of the National Academy of Sciences USA 99: 9527–9532.
Mudrik, L., A. Breska, D. Lamy, and L. Y. Deouell (2011). ‘Integration without Awareness: Expanding the
Limits of Unconscious Processing’. Psychological Science 22: 764–770.
Murray, S. O., D. Kersten, B. A. Olshausen, P. Schrater, and D. L. Woods (2002). ‘Shape Perception
Reduces Activity in Human Primary Visual Cortex’. Proceedings of the National Academy of Sciences
USA 99: 15164–15169.
Murray, S. O., H. Boyaci, and D. Kersten (2006). ‘The Representation of Perceived Angular Size in Human
Primary Visual Cortex’. Nature Neuroscience 9: 429–434.
Nagamine, M., A. Yoshino, M. Miyazaki, Y. Takahashi, and S. Nomura (2009). ‘Difference in Binocular
Rivalry Rate between Patients with Bipolar I and Bipolar II Disorders’. Bipolar Disorders 11: 539–546.
O’Craven, K. M., P. E. Downing, and N. Kanwisher (1999). ‘fMRI Evidence for Objects as the Units of
Attentional Selection’. Nature 401: 584–587.
Ooi, T. L. and Z. J. He (1999). ‘Binocular Rivalry and Visual Awareness: The Role of Attention’. Perception
28: 551–574.
Palmer, T. D. and A. K. Ramsey (2012). ‘The Function of Consciousness in Multisensory Integration’.
Cognition. 125: 353–364.
Pettigrew, J. D. and S. M. Miller (1998). ‘A “Sticky” Interhemispheric Switch In Bipolar Disorder?’
Proceedings of the Royal Society B: Biological Sciences 265: 2141–2148.
Ramachandran, V. S. and R. L. Gregory (1991). ‘Perceptual Filling In of Artificially Induced Scotomas in
Human Vision’. Nature 350: 699–702.
Rao, R. P. and D. H. Ballard (1999). ‘Predictive Coding in the Visual Cortex: A Functional Interpretation of
Some Extra-classical Receptive-field Effects’. Nature Neuroscience 2: 79–87.
Rees, G., G. Kreiman, and C. Koch (2002). ‘Neural Correlates of Consciousness in Humans’. Nature
Reviews Neuroscience 3: 261–270.
Roberts, B., M. G. Harris, and T. A. Yates (2005). ‘The Roles of Inducer Size and Distance in the
Ebbinghaus Illusion (Titchener Circles)’. Perception 34: 847–856.
Roelfsema, P. R., V. A. Lamme, and H. Spekreijse (1998). ‘Object-based Attention in the Primary Visual
Cortex of the Macaque Monkey’. Nature 395: 376–381.
Said, C. P., R. D. Egan, N. J. Minshew, M. Behrmann, and D. J. Heeger (2013). ‘Normal Binocular Rivalry
in Autism: Implications for the Excitation/Inhibition Imbalance Hypothesis’. Vision Research 77: 59–66.
Schölvinck, M. L. and G. Rees (2009). ‘Attentional Influences on the Dynamics of Motion-induced
Blindness’. Journal of Vision 9 (1): 38.
Schölvinck, M. L. and G. Rees (2010). ‘Neural Correlates of Motion-induced Blindness in the Human
Brain’. Journal of Cognitive Neuroscience 22: 1235–1243.
Schurger, A., F. Pereira, A. Treisman, and J. D. Cohen (2010). ‘Reproducibility Distinguishes Conscious
from Nonconscious Neural Representations’. Science 327: 97–99.
Schwarzkopf, D. S. and G. Rees (2010). ‘Interpreting Local Visual Features as a Global Shape Requires
Awareness’. Proceedings of the Royal Society B: Biological Sciences. http://rspb.royalsocietypublishing.org/
content/early/2010/12/04/rspb.2010.1909.
Schwarzkopf, D. S., J. Silvanto, S. Gilaie-Dotan, and G. Rees (2010). ‘Investigating Object Representations
during Change Detection in Human Extrastriate Cortex’. European Journal of Neuroscience
32: 1780–1787.
Schwarzkopf, D. S., C. Song, and G. Rees (2011). ‘The Surface Area of Human V1 Predicts the Subjective
Experience of Object Size’. Nature Neuroscience 14: 28–30.
Schwarzkopf, D. S. and G. Rees (2013). ‘Subjective Size Perception Depends on Central Visual Cortical
Magnification in Human V1’. PLoS ONE 8: e60550.
Shams, L., Y. Kamitani, and S. Shimojo (2000). ‘Illusions. What You See Is What You Hear’. Nature
408: 788.
Shannon, R. W., C. J. Patrick, Y. Jiang, E. Bernat, and S. He (2011). ‘Genes Contribute to the Switching
Dynamics of Bistable Perception’. Journal of Vision 11 (3): 8.
Silverman, M. E. and A. Mack (2006). ‘Change Blindness and Priming: When it Does and Does Not Occur’.
Consciousness and Cognition 15: 409–422.
Simoncelli, E. P. and B. A. Olshausen (2001). ‘Natural Image Statistics and Neural Representation’. Annual
Review of Neuroscience 24: 1193–1216.
Sobel, K. V. and R. Blake (2003). ‘Subjective Contours and Binocular Rivalry Suppression’. Vision Research
43: 1533–1540.
Song, C., D. S. Schwarzkopf, and G. Rees (2011). ‘Interocular Induction of Illusory Size Perception’. BMC
Neuroscience 12: 27.
Sperandio, I., P. A. Chouinard, and M. A. Goodale (2012). ‘Retinotopic Activity in V1 Reflects the
Perceived and not the Retinal Size of an Afterimage’. Nature Neuroscience 15: 540–542.
Sperandio, I., A. Lak, and M. A. Goodale (2012). ‘Afterimage Size is Modulated by Size-contrast Illusions’.
Journal of Vision 12 (2): 18.
Stanley, D. A. and N. Rubin (2003). ‘fMRI Activation in Response to Illusory Contours and Salient Regions
in the Human Lateral Occipital Complex’. Neuron 37: 323–331.
Sterzer, P., J.-D. Haynes, and G. Rees (2008). ‘Fine-scale Activity Patterns in High-level Visual Areas
Encode the Category of Invisible Objects’. Journal of Vision 8 (15): 10.
Stewart, L. H., S. Ajina, S. Getov, B. Bahrami, A. Todorov, et al. (2012). ‘Unconscious Evaluation of Faces
on Social Dimensions’. Journal of Experimental Psychology: General 141: 715–727.
Tse, P. U., S. Martinez-Conde, A. A. Schlegel, and S. L. Macknik (2005). ‘Visibility, Visual Awareness, and
Visual Masking of Simple Unattended Targets are Confined to Areas in the Occipital Cortex beyond
Human V1/V2’. Proceedings of the National Academy of Sciences USA 102: 17178–17183.
Tsuchiya, N. and C. Koch (2005). ‘Continuous Flash Suppression Reduces Negative Afterimages’. Nature
Neuroscience 8: 1096–1101.
Turatto, M., M. Sandrini, and C. Miniussi (2004). ‘The Role of the Right Dorsolateral Prefrontal Cortex in
Visual Change Awareness’. NeuroReport 15: 2549–2552.
van Ee, R., J. J. A. van Boxtel, A. L. Parker, and D. Alais (2009). ‘Multisensory Congruency as a Mechanism
for Attentional Control over Perceptual Selection’. Journal of Neuroscience 29: 11641–11649.
van Loon, A. M., T. Knapen, H. S. Scholte, E. St John-Saaltink, T. H. Donner, et al. (2013). ‘GABA Shapes
the Dynamics of Bistable Perception’. Current Biology 23: 823–827.
Vuilleumier, P., J. L. Armony, J. Driver, and R. J. Dolan (2003). ‘Distinct Spatial Frequency Sensitivities for
Processing Faces and Emotional Expressions’. Nature Neuroscience 6: 624–631.
Wandell, B. A., S. O. Dumoulin, and A. A. Brewer (2007). ‘Visual Field Maps in Human Cortex’. Neuron
56: 366–383.
Wang, L., X. Weng, and S. He (2012). ‘Perceptual Grouping without Awareness: Superiority of Kanizsa
Triangle in Breaking Interocular Suppression’. PLoS ONE 7: e40106.
Wannig, A., L. Stanisor, and P. R. Roelfsema (2011). ‘Automatic Spread of Attentional Response
Modulation along Gestalt Criteria in Primary Visual Cortex’. Nature Neuroscience 14: 1243–1244.
Watkins, S., L. Shams, O. Josephs, and G. Rees (2007). ‘Activity in Human V1 Follows Multisensory
Perception’. Neuroimage 37: 572–578.
Weil, R. S., J. M. Kilner, J. D. Haynes, and G. Rees (2007). ‘Neural Correlates of Perceptual Filling-in of an
Artificial Scotoma in Humans’. Proceedings of the National Academy of Sciences USA 104: 5211–5216.
Weil, R. S., S. Watkins, and G. Rees (2008). ‘Neural Correlates of Perceptual Completion of an Artificial
Scotoma in Human Visual Cortex Measured Using Functional MRI’. Neuroimage 42: 1519–1528.
Williams, M. A., A. P. Morris, F. McGlone, D. F. Abbott, and J. B. Mattingley (2004). ‘Amygdala Responses
to Fearful and Happy Facial Expressions under Conditions of Binocular Suppression’. Journal of
Neuroscience 24: 2898–2904.
Winston, J. S., P. Vuilleumier, and R. J. Dolan (2003). ‘Effects of Low-spatial Frequency Components of
Fearful Faces on Fusiform Cortex Activity’. Current Biology 13: 1824–1829.
Wismeijer, D. A., R. van Ee, and C. J. Erkelens (2008). ‘Depth Cues, rather than Perceived Depth, Govern
Vergence’. Experimental Brain Research 184: 61–70.
Wismeijer, D. A., C. J. Erkelens, R. van Ee, and M. Wexler (2010). ‘Depth Cue Combination in
Spontaneous Eye Movements’. Journal of Vision 10 (6): 25.
Wokke, M. E., A. R. E. Vandenbroucke, H. S. Scholte, and V. A. F. Lamme (2013). ‘Confuse your
Illusion: Feedback to Early Visual Cortex Contributes to Perceptual Completion’. Psychological Science
24: 63–71.
Wolfe, J. M. (1984). ‘Reversing Ocular Dominance and Suppression in a Single Flash’. Vision Research
24: 471–478.
Yang, E., D. H. Zald, and R. Blake (2007). ‘Fearful Expressions Gain Preferential Access to Awareness
during Continuous Flash Suppression’. Emotion 7: 882–886.
Yang, E. and R. Blake (2012). ‘Deconstructing Continuous Flash Suppression’. Journal of Vision 12 (3): 8.
Yeh, Y.-Y. and C.-T. Yang (2009). ‘Is a Pre-change Object Representation Weakened under Correct
Detection of a Change?’ Consciousness and Cognition 18: 91–102.
Zaretskaya, N., S. Anstis, and A. Bartels (2013). ‘Parietal Cortex Mediates Conscious Perception of Illusory
Gestalt’. Journal of Neuroscience 33: 523–531.
Zaretskaya, N., A. Thielscher, N. K. Logothetis, and A. Bartels (2010). ‘Disrupting Parietal Function
Prolongs Dominance Durations in Binocular Rivalry’. Current Biology 20: 2106–2111.
Zhaoping, L. (2008). ‘Attention Capture by Eye of Origin Singletons even without Awareness: A Hallmark
of a Bottom-up Saliency Map in the Primary Visual Cortex’. Journal of Vision 8: 1.1–1.18.
Zhou, W. and D. Chen (2009). ‘Binaral Rivalry between the Nostrils and in the Cortex’. Current Biology
19: 1561–1565.
Zimba, L. D. and R. Blake (1983). ‘Binocular Rivalry and Semantic Processing: Out of Sight, Out of Mind’.
Journal of Experimental Psychology: Human Perception and Performance 9: 807–815.
Chapter 40

The temporal organization of perception

Alex Holcombe

Visual perception textbooks and handbooks customarily do not include sections devoted to the
topic of time perception (the exception is van de Grind, Grusser, and Lunkenheimer 1973). But
this may soon change, with this chapter a sign of the times. In journals, the literature on temporal
factors has grown very rapidly, and reviews of time perception have proliferated (Vroomen and
Keetels 2010; Holcombe 2009; Wittmann 2011; Eagleman 2010; Grondin 2010;
Nishida and Johnston 2010; Spence and Parise 2010). In an attempt to restrict this review to fun-
damental issues, only simple judgments of temporal order will be considered. The rapidly growing
literature on duration judgments will not be discussed.
Interpreting experimental results requires assumptions. For temporal experience, it is tempting
to think of it as forming a single timeline, with all sensations mapped to points or extents
on that timeline. This assumption is often implicit in the literature, together with another
assumption to allow for the experience of simultaneity: that sensations closer than a certain inter-
val, the duration of the ‘simultaneity window’, are perceived as simultaneous (Meredith et al. 1987).
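This two-part assumption (a single timeline plus a simultaneity window) can be made concrete with a toy sketch. The 100 ms window below is an arbitrary illustrative value, not a figure from the text:

```python
def perceived_relation(t1_ms, t2_ms, window_ms=100.0):
    """Toy model of the implicit 'single timeline + simultaneity window'
    assumption: two events closer in time than the window are reported as
    simultaneous; otherwise their order on the timeline is available.
    The 100 ms default is an arbitrary illustrative value."""
    if abs(t1_ms - t2_ms) < window_ms:
        return "simultaneous"
    return "first-then-second" if t1_ms < t2_ms else "second-then-first"

print(perceived_relation(0, 40))   # prints "simultaneous"
print(perceived_relation(0, 250))  # prints "first-then-second"
```

The chapter's argument is precisely that this model may be too strong: some pairs of percepts may have no defined order at all, even when they fall well outside the window.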
Yet it is far from clear whether experience comprises a single ordered timeline. This chapter
will question this assumption and ultimately suggest that our experience is frequently the product
of organizational processes whose purpose is not to create an ordered timeline. Rather, simpler
grouping and segmentation processes can be more important, with ordering sometimes only a
byproduct or not occurring at all.
Similar matters have arisen in the study of spatial perception. Marr (1982) suggested that the
visual system delivered a representation of the ordered 3-D layout of all the objects and surfaces in
a scene. This is similar to the ordered timeline view of temporal experience. The evidence suggests
that visual representation may be more impoverished than what Marr envisioned (Koenderink,
Richards, and van Doorn 2012) but in the spatial domain can still provide ordered and metric
depth relations (van Doorn et al. 2011). Whether our timeline of experience achieves that level of
organization, a consistent ordering, remains unclear.
One alternative to a well-ordered timeline is that we sometimes experience objects and quali-
ties with undefined temporal relationships. That is, there may be some percepts for which we do
not have an experience of before or after, and where the explanation for this failure is not simply
that the two stimuli fall within the simultaneity window. A possible example is provided in the
animations showcased at http://www.psych.usyd.edu.au/staff/alexh/research/colorMotionSimple.
In those animations, a field of dots alternates between leftward motion and rightward motion.
In synchrony with the motion direction alternation, the dots’ colour alternates between red and
green. Yet at alternation rates above about six times per second, one is unable to judge the pairing
of motion and colour, for example whether the leftward motion is paired with red or with green
(Arnold 2005; Holcombe and Clifford 2012). However, this rate is slow enough that the successive
colours and motions should not fall inside the same simultaneity window (Wittmann 2011).
A potentially related phenomenon was reported by William James in 1890. In Chapter 15 of his
Principles of Psychology, James claimed that
When many impressions follow in excessively rapid succession in time, although we may be distinctly
aware that they occupy some duration, and are not simultaneous, we may be quite at a loss to tell which
comes first and which last. (p. 610)

Unfortunately, James provided no examples, so we do not know to what he was referring. More
detailed descriptions of dissociations between temporal order judgments and asynchrony judgments
have been provided by Jaśkowski and others (Jaśkowski 1991; Allan 1975); however, these may
be explainable by decision-criterion differences of a few tens of milliseconds between the two tasks.
A temporal order deficit that seems less likely to be explained by decision criteria differences was
reported by Holcombe, Kanwisher, and Treisman (2001), and can be experienced here: http://
www.psych.usyd.edu.au/staff/alexh/research/MOD/demo.html. When four letters are presented
serially, each for about 200 ms, and the sequence repeats, observers are typically unable to report
their order. Yet if the sequence is presented just once, the order of the items is easily perceived (for
a possible auditory analogue, see Warren et al. 1969).
What are the implications of this phenomenon for the nature of temporal experience? It may
mean that temporal experience is less organized than spatial experience. Ordering seems more
integral to our representations of space, which benefit from the retinotopic organization of vis-
ual cortices. The positions of items on the retina are readily available thanks to this topography
(although determining their locations in external space is another matter, requiring more myste-
rious mechanisms). This organization also affords parallel processing of a large range of locations.
Orientation and boundary processing as well as local motion processing occur at many locations
simultaneously, providing some spatial relationships preattentively and continuously (e.g. Levi
1996; Forte, Hogben, and Ross 1999). At a larger scale, perception of certain global forms is based
on massively parallel processing (Clifford, Holcombe, and Pearson 2004), which may also be true
of perceiving the location of the centroid of a large array (Alvarez 2011).
The visual brain has retinotopy but does not seem to have chronotopy. That is, no brain area
seems to include an array of neurons that systematically respond to different times, arranged
in temporal order. A possible exception is neurons selective for temporal rank order in
movement-related areas of cortex (Berdyyeva and Olson 2010), but as far as we know these are
not involved in time perception. Our knowledge of the relative times of stimuli surely suffers for
lack of a chronotopic representation. Not only does the lack of chronotopy suggest the absence
of a readily available ordered temporal array, it may also mean less parallel processing of dis-
tinct times than of distinct locations. It is difficult to imagine that the brain gets by without any
parallel temporal processing, and without any sort of temporally structured buffer. Smithson
and Mollon (2006) and Smith et al. (2011) have provided some evidence for a temporally struc-
tured buffer in vision, but overall temporal processing seems less pre-organized than spatial
processing.
Retinotopy (or chronotopy) is not a full solution to the problem of perceiving spatial (or tem-
poral) relationships, even ignoring the complication of movements of the eyes and body. Some
aspects of spatial perception are not achieved by specialized parallel processing, and the solutions
used for those aspects might also be used in temporal processing.
Two recent pieces of research suggest that some spatial relationships become available via serial,
one-by-one processing, through shifts of attention (Holcombe, Linares, and Vaziri-Pashkam
822 Holcombe

2011; Franconeri et al. 2011). With a moving spatial array, the Holcombe et al. (2011) study docu-
mented an inability to apprehend the spatial order of the items in the array when the items moved
faster than the speed limit on attentional tracking. This, together with a telling pattern of errors,
indicated that a time-consuming shift of spatial attention was necessary to determine the spatial
relationships among the stimuli. Converging evidence from Franconeri et al. (2011) suggests that
shifts of spatial attention are also involved in perceiving spatial relationships among static stimuli.
Attention may serve to select stimuli of interest for the limited-capacity processing that deter-
mines temporal and spatial relations.
Some aspects of the rich spatial layout we enjoy are thus a result of accumulated represen-
tations from multiple shifts of attention (see Cavanagh et al. 2010 for related ideas). In this
dependence on serial processing, spatial experience may be similar to temporal experience.
But even these attention-mediated aspects of spatial perception seem to capitalize on the par-
allel processing advantage of retinotopy. Shifting attention involves moving from activating
one set of location-labelled neurons to another set of location-labelled neurons (assuming
local sign has been set during the development of the organism—Lotze 1881). This may help
to calculate the vector of the attention shift, which then indicates the relative location of the
two regions.
Although it is limited by the absence of chronotopy, temporal processing does reap some ben-
efits from retinotopy. Thanks to retinotopy, motion detectors can operate in parallel across the
visual field. The motion direction they compute indicates the temporal order of stimuli.
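The way a motion signal carries temporal order can be sketched with a minimal correlation-type (Reichardt-style) detector. This toy sketch, with made-up signals and a one-frame delay, is only an illustration of the principle, not a model proposed in this chapter:

```python
def reichardt_output(left, right, delay=1):
    """Minimal correlation-type motion detector spanning two adjacent
    locations: correlate the delayed signal at one location with the
    current signal at the other. A positive output signals rightward
    motion, meaning the stimulus reached 'left' before 'right'; the sign
    of the motion signal thus implicitly encodes temporal order.
    All details here are illustrative."""
    n = len(left)
    rightward = sum(left[t - delay] * right[t] for t in range(delay, n))
    leftward = sum(right[t - delay] * left[t] for t in range(delay, n))
    return rightward - leftward

# A pulse at the left location, then one frame later at the right location:
print(reichardt_output([0, 1, 0, 0], [0, 0, 1, 0]))   # 1 (positive: rightward)
```

Swapping the two inputs flips the sign of the output, so reading off the motion direction is equivalent to reading off which location was stimulated first.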
It has also been suggested that retinotopy allows the visual system to compute in parallel
whether stimuli across the visual field change together (in synchrony) or not. Some investigators
suggested that this occurs not just for the luminance transients known to engage the motion
system, but also for direction and contrast changes (Usher and Donnelly 1998; Lee and Blake 1999).
Follow-up work, however, supported alternative explanations (Dakin and Bex 2002; Beaudot
2002; Farid and Adelson 2001; Farid 2002). The issue remains unsettled, but the continued
absence of good evidence for parallel temporal processing feeds the suspicion that perception of
relative timing is serial and possibly attention-mediated. Temporal processing may be restricted
to what can be processed serially in the short interval before it disappears from our sensory
buffer.
In some ways even better than chronotopy would be time-stamping of all stimuli by an inter-
nal clock. The time stamp might be provided by a dedicated internal clock comprising a pace-
maker and counter (Treisman 1963; Ivry and Schlerf 2008) or a neural network with intrinsic
dynamics and an internal model of the network that translates the network state into the cur-
rent time (Karmarkar and Buonomano 2007). With time-stamping, relative timing of two events
is judged by simply comparing the time-stamps of the two events, just as is done by desktop
computers with files on a hard drive. If this were automatic and preattentive, then we might
have better-organized temporal experience than spatial experience. But there is little or no evi-
dence for extensive time-stamping. Instead the system may rely on less reliable information, like
the relative activation of different stimulus types. Because activation in cortex and presumably
short-term memory typically decreases over time, the most active item is likely to be the last one
presented, the second most active the item presented before, etc. This ‘recency’ scheme is sub-
ject to distortion as other factors like attention can affect which item is most active (Reeves and
Sperling 1986). The use of relative activation might also be thwarted with repeating displays that
result in saturation of the activation of multiple items.
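A toy model makes the fragility of this ‘recency’ scheme concrete. The exponential decay, its time constant, and the attentional boost below are illustrative assumptions, not values from the literature:

```python
import math

def activation(age_ms, boost=1.0, tau=300.0):
    """Toy activation trace: exponential decay with stimulus age, scaled by
    an attention-dependent boost. The decay constant and boost values are
    illustrative assumptions."""
    return boost * math.exp(-age_ms / tau)

def recency_order(traces):
    """Guess presentation order from current activation alone: the least
    active item is assumed oldest, the most active most recent."""
    return [name for name, act in sorted(traces.items(), key=lambda kv: kv[1])]

# Three letters presented 600, 400, and 200 ms ago, equally attended:
now = {name: activation(age) for name, age in [('A', 600), ('B', 400), ('C', 200)]}
print(recency_order(now))     # correct guess: ['A', 'B', 'C']

# Attention boosts the oldest item; the inferred order is now distorted:
biased = dict(now, A=activation(600, boost=3.0))
print(recency_order(biased))  # wrong guess: ['B', 'A', 'C']
```

With equal attention the scheme recovers the presentation order, but a modest attentional boost to one item reorders the inferred sequence, which is the kind of distortion described above.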
An earlier paragraph described the alternating-motion display for which one cannot deter-
mine which colour goes with which motion direction (http://www.psych.usyd.edu.au/staff/alexh/
The Temporal Organization of Perception 823

research/colorMotionSimple). The repetition of this display may saturate the activation levels
of the colours and motions in memory, preventing the use of relative activation levels to pair
the features. Another reason feature pairing may be difficult here is that pairing ordinarily
involves using salient temporal transients to temporally segment the dynamic scene (Holcombe
and Cavanagh 2008; Nishida and Johnston 2010; Nishida and Johnston 2002). The unusual unin-
terrupted motion of the alternating-motion display results in continual transients that swamp
registration of the transient associated with the colour change, and without other cues to rapidly
guide attention to the transients of interest (Holcombe and Cavanagh 2008), temporal experience
of the colour and motion remains poorly organized.
Only when the rate is slow can attention select an individual phase of the cycle, and that selec-
tion returns two features, indicating they occurred at the same time (Holcombe and Cavanagh
2008). This is like spatial visual search, for which Treisman and Gelade (1980) suggested that
attentional mediation is required to perceive that a colour and shape originate from the same
location. For time, strong luminance transients serve to engage the selective mechanism (perhaps
attention, or a ‘when’ pathway) that can make temporal relations explicit.
Thus determination of temporal order and simultaneity is best when just two punctate, discrete
events with strong transients are presented. In the remainder of this chapter we will set aside the
segmentation and processing capacity problems created by complex scenes. For the ideal situation
of two stimuli, we will examine how sophisticated visual temporal processing can be.
There is an important basic theoretical distinction between the time a percept is created and
the time the observer experiences the event to have taken place. The analogous distinction
in spatial perception is uncontroversial, with the phrase ‘where an object is perceived’ taken to
mean ‘where an object is perceived to be’ rather than where in the brain the percept is created.
Yet if time is substituted for space and we write ‘when an object is perceived’, this will be inter-
preted by many as the time the percept was created rather than the time the percept refers to.
This is the issue of brain time vs event time—whether the brain processes events such that when
a percept arises is not identical to the time it is experienced as having occurred (Dennett and
Kinsbourne 1992).
Event time advocates have affirmed the distinction and moreover claimed that the system rou-
tinely considers the time of sensory signals together with other cues to infer the time of the cor-
responding stimuli in the external world. But this conclusion may be premature.

Brain time theory versus event time theory


Conceivably, there is no distinction between when an object is perceived and the time that it is
perceived to refer to. In other words, the time a percept occurs may be identical to the time that its
object is perceived to have occurred. This possibility is referred to as the brain time theory of tem-
poral perception. As Köhler put it in 1947, ‘Experienced order in time is always structurally iden-
tical with a functional order in the sequence of correlated brain processes’ (1947: 62) (Köhler’s
statement might also allow stretching of time that preserves order, but we will set aside this
complication).
According to this brain time theory, an event is perceived as occurring when the signals it
evokes in the senses reach the processes responsible for consciousness. Some signals may take
longer than others to travel from the receptors to the processes responsible for consciousness, and
this will result in temporal illusions, because there is no processing that might compensate for
delays. That is, signals with long latencies will be perceived as having occurred later than signals
with short latencies.
The alternative to brain time theory is that some property of signals other than when they arrive
affects when the associated events are perceived to have taken place. The brain may have adaptive
processes that result in perceived timing being closer to veridical than they would be otherwise.
But some question this supposition, among them Moutoussis, who writes that ‘the idea of the
perception of the time of a percept being different to the time that the actual percept is being
perceived, seems quite awkward’ (Moutoussis 2012: 4).
To other thinkers (e.g. Dennett and Kinsbourne 1992), this would be no more peculiar than
spatial illusions, wherein the perceived location of an object is dissociated from its retinal location
(e.g. Roelofs 1935; De Valois and De Valois 1991). Time perception may be as much a construc-
tive, interpretational process as is space perception. But to date, the evidence is that time percep-
tion does not adaptively take into account various cues to correct timing as comprehensively as
spatial perception uses spatial cues.

Event time theory and simultaneity constancy


Event time refers to the time that events occur in the environment rather than the time that they
are processed by various stages of the brain. Event time theory is the idea that the perceived
timing of events does not always correspond to brain time; rather, the brain may effectively label a
percept as referring to a time different from when the percept became conscious. This could result
in the perceived time of events being more accurate. For the brain, there are two aspects to the
problem of recovering event time.
A first aspect is the different latencies and processing times that re-order the temporal sequence
of signals as they ascend the neural processing hierarchy. This is referred to as the differential
neural latency problem. The second aspect is the different times signals require to travel from their
physical sources to the receptors of the organism. For example, the light emanating from an object
will arrive at the eye sooner than its sound will arrive at the ear. This is the problem of differential
external latencies.
In the face of these two differential latency problems, recovering actual event time would be a
major achievement. It is sometimes claimed that the brain does accomplish this feat (Kopinska
and Harris 2004). Just as the visual system recovers the correct size of external objects despite
wide variation in retinal extent (size constancy), the brain may also recover the correct time of
events—‘simultaneity constancy’ (Kopinska and Harris 2004).

Brain time rules the day, and the minute


At the very coarse time frame of years, days, or hours, it’s clear that brain time rules and
simultaneity constancy fails. At night, when we look up at the sky and see stars, all the light
we receive was caused by events that took place years ago. But our brain does not compensate
for this travel time, and we perceive the stars’ appearance as their present appearance rather
than one that is years old. When we look at the moon, we see what it was 1.3 seconds
ago, but again the brain does not compensate for this lag. Clearly any ‘simultaneity constancy’
or compensation for differential latencies is only partial at best. It is unreasonable to expect
the brain to know the distance of heavenly bodies, but more than this, absolutely no examples
of evidence for simultaneity constancy on the scale of minutes or longer have ever been
offered (as far as I know). On the scale of minutes, hours, and days, brain time rules. At the
finer sub-second timescale however, some researchers have provided evidence for event time
processing.
Does brain time rule the split-second?


Some researchers suggest that the brain generally does reconstruct event times, at least at the
sub-second scale (Harris et al. 2010). Eagleman writes that ‘the brain can keep account of laten-
cies’ (Eagleman 2010). His theory is that the brain waits until the slowest signals arrive, and then
reconstructs the order of events, compensating for the latencies of their neural signals.
The full range of evidence, however, includes some conspicuous failures of the system to account
for latencies, even at the sub-second scale with good cues available. These failures rule out the
strong form of the event time theory—that latencies are comprehensively accounted for. After
discussing those failures, we will examine evidence for successful event time reconstruction,
which will lead us to reject the other extreme, brain time theory; we will conclude that partial
compensation does occur.

Failures to compensate for differential neural and external latencies


The strength of a sensory signal can have a dramatic effect on its neural latency. The neural
signals evoked by a high-contrast flash reach visual cortex tens of milliseconds sooner than a
low-contrast one (Maunsell et al. 1999; Oram et al. 2002). This effect is very consistent, and Oram
et al. (2002) reported that, even at higher-order cortical areas such as STS, stimulus contrast is the
major determinant of response latency.
Successful compensation would amount to low-contrast flashes being perceived at the same
time as high-contrast flashes. But if people are asked to report which of two simultaneous flashes
of different contrasts came first, they more frequently report the higher-contrast one (Allik and
Kreegipuu 1999; Alpern 1954; Arden and Weale 1954; Exner 1875). It is natural to conclude that
high-contrast flashes are perceived before low-contrast flashes, constituting a failure of event
time perception. But that conclusion would be premature, because the greater salience of the
high-contrast stimulus may bias decisions regarding temporal order, even if perception is unaf-
fected (Yarrow et al. 2011; Schneider and Bavelier 2003). Such biases complicate the interpreta-
tion of much of the literature on temporal judgments. Fortunately, more convincing evidence
comes from two other illusions where decisional biases are unlikely to be responsible for the
phenomenon.
The first of these illusions was described by Hess in 1904. Hess and his subjects viewed two
patches, one directly above the other while they both moved from left to right. When one patch
was dimmer than the other, it appeared to lag the brighter patch, suggesting a difference in per-
ceptual latency. The spatial size of the lag seems to scale with speed (Wilson and Anstis 1969),
consistent with a constant temporal delay between two stimuli with a particular luminance dif-
ference. And the delay is substantial, around a few dozen milliseconds per log unit difference in
luminance (Wilson and Anstis 1969; White, Linares, and Holcombe 2008).
Eagleman (2010) argued that the Hess effect display was one of only a few special cases where
the brain cannot succeed in accounting for differential latencies. Eagleman suggested that it was a
very special case indeed, writing that the Hess effect only occurs ‘when one uses a neutral density
filter over half the screen—simply reducing the contrast of a single dot is insufficient’. Contrary to
this proposal however, White, Linares and Holcombe (2008), for example, obtained a Hess effect
without changing the background luminance. And for the additional illusions reviewed below,
stimuli also were typically not presented in a larger filtered region.
The perceptual correlate of the intensity-related neural delay also manifests in motion signal
processing. Roufs (1963) and Arden and Weale (1954) presented two flashes simultaneously and
side by side on a dark background. When one flash was brighter than the other, motion was per-
ceived from the brighter flash to the dimmer flash. Stromeyer and Martini (2003) documented a
similar effect for two gratings differing in contrast rather than luminance. Motion was perceived
in the direction from the higher-contrast grating to the lower-contrast grating, consistent with
physiological evidence for latency decreasing with contrast as well as with luminance (Shapley and
Victor 1978; Benardete and Kaplan 1999). A number of other motion illusions are also consistent
with the effect of luminance or contrast on latency (Purushothaman et al. 1998; Ogmen et al. 2004;
Lappe and Krekelberg 1998; White, Linares, and Holcombe 2008; Kitaoka and Ashida 2007).
An apparent concordance of physiological latency and percepts is also observed for stimuli
darker than the background vs stimuli brighter than the background. ON-centre ganglion cells
in primate retina respond ~5 ms faster than OFF-centre cells. Correspondingly, psychophysical
motion nulling experiments in humans indicate that dark dots have a processing latency about
3 ms shorter than that of bright dots (Del Viva, Gori, and Burr 2006).
Together these illusions indicate that brain time rules when it comes to neural latency differ-
ences caused by variations in luminance or contrast. Unfortunately we cannot exclude the pos-
sibility that the brain engages in partial compensation for the latency difference while consistently
falling short of full compensation. But the sizes of the effects are similar in human perceptual
studies and in the latency of physiological responses in nonhuman animals (Maunsell et al. 1999;
Oram et al. 2002), so any neural accounting for latency differences must be woefully incomplete.
To explain these phenomena, defenders of the event time hypothesis may argue that they are
an exception, perhaps because these luminance-related latency differences are unimportant to the
organism. But this argument is less than compelling, as explained in the next section.

Compensation in action but not perception?


Well-timed behaviour is critical in playing many sports, in fighting, and in hunting. The size of the
Hess effect in the photopic range is roughly 8 ms per log unit of luminance (White, Linares, and
Holcombe 2008). Comparing a daylight-illuminated object to one in dark shadow (5 log units or
more), then, the object in shadow will be delayed by about 40 ms. If the objects were moving at 10
km/hr, this would result in a perceived spatial offset of 11 cm.
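These figures follow directly from the quantities given in the text; the only addition here is the unit conversion:

```python
# Back-of-envelope check of the numbers in the text: 8 ms of Hess-effect
# delay per log unit of luminance, a 5 log unit difference between daylight
# and dark shadow, and an object moving at 10 km/h.
hess_ms_per_log_unit = 8.0
log_units = 5.0
delay_ms = hess_ms_per_log_unit * log_units           # 40 ms of extra latency
speed_m_per_s = 10.0 * 1000 / 3600                    # 10 km/h is about 2.78 m/s
offset_cm = speed_m_per_s * (delay_ms / 1000) * 100   # perceived spatial offset
print(delay_ms, round(offset_cm, 1))                  # 40.0 11.1
```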
These numbers may seem small, but they are large relative to the accuracy of human performance
in hitting a ball with a bat. Even amateurs achieve better than 15 ms resolution (McLeod,
McLaughlin, and Nimmo-Smith 1985) and some expert cricket batters seem to have 2 ms
resolution (McLeod and Jenkins 1991). The size of the Hess effect is
large enough, then, to substantially impair performance. Its existence should be surprising
to theorists who are sanguine about the general ability of the visual system to compensate for
latencies.
But even if sensory learning does not compensate for delays caused by low luminance, this
does not mean that sportsmen are condemned to miss the ball when the sun begins to set.
Sensorimotor (as opposed to sensory) learning may save the day (White, Linares, and Holcombe
2008; Nijhawan 2008). Actions like hitting a ball involve mapping the timing of sensory signals
onto behaviour. Mappings between particular luminances and particular timings could perhaps
be learned thanks to the feedback involved in successful action. But if this learning does not
occur in the sensation→perception mapping (as argued in this chapter), then it may apply only to
the perception→action mapping. That is, the error signal may not propagate to the deeper (sensa-
tion and perception) layers of the system because they are farther from the teaching feedback.

Evidence for event time reconstruction


As reviewed above, luminance contrast has a consistent effect on latencies in the visual system,
but perception does not seem to take account of these effects for reconstruction of event time.
Let’s consider another factor that consistently affects latencies: the sensory modality of the signal.
Auditory signals reach cortex quicker than visual signals, by roughly 30 to 50 ms (Regan 1989;
Musacchia and Schroeder 2009).
Yet the sight and sound of snapped fingers are not noticeably out of sync. This apparent
discrepancy between perception and neural latencies has been cited as a case of simultaneity constancy or
‘active editing’ of time (Eagleman 2007, 2009, 2010). The sight and sound of snapped fingers may
indeed be typically perceived as simultaneous. This does not however imply editing of event time.
Rather, the perceived simultaneity may simply be due to our poor acuity for perceiving temporal
differences or to a broad simultaneity window.
Consider the relevant sort of psychophysical experiment. Such experiments reveal that although in many
cases people are more likely to judge physically simultaneous sounds and flashes as simultaneous
than as having occurred at different times, simultaneity is not the timing most likely to yield a per-
cept of simultaneity. Instead, the best timing for perceptual simultaneity is, for most participants,
to present the flash before the sound (Stone et al. 2001), consistent with sounds being processed
faster than flashes. The point of subjective simultaneity is the relative timing value at which both
responses are equally likely when a person is forced to choose which of two signals was presented
first. The non-zero point of subjective simultaneity suggests that the differences in latency were
not entirely compensated for, or not compensated at all.
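The point of subjective simultaneity is typically estimated by fitting a psychometric function to temporal order judgments and reading off the 50% point. A minimal sketch with synthetic data follows; the logistic model, fixed slope, and grid-search fit are illustrative simplifications, not any particular study's method:

```python
import math

def p_flash_first(soa, pss, slope=30.0):
    """Logistic model of the probability of reporting 'flash first' as a
    function of SOA (ms of flash lead; positive means the flash came first).
    pss is the SOA at which both reports are equally likely. The slope value
    is an illustrative assumption."""
    return 1.0 / (1.0 + math.exp(-(soa - pss) / slope))

def fit_pss(soas, proportions):
    """Crude least-squares grid search for the PSS, holding the slope fixed."""
    return min(range(-100, 101),
               key=lambda pss: sum((p_flash_first(s, pss) - p) ** 2
                                   for s, p in zip(soas, proportions)))

# Synthetic observer whose PSS sits at a 40 ms flash lead: the flash must be
# presented 40 ms before the sound for subjective simultaneity.
soas = [-120, -80, -40, 0, 40, 80, 120]
props = [p_flash_first(s, pss=40) for s in soas]
print(fit_pss(soas, props))   # 40
```

In real experiments the proportions come from response counts, the slope is fitted rather than fixed, and maximum-likelihood fitting replaces this grid search; the non-zero fitted PSS is the quantity interpreted as uncompensated latency difference.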
Then why do the sight and sound of snapped fingers seem in sync? The perceptual asynchrony
may simply not be large enough to be detected. Temporal order discrimination ability is just too
poor (e.g. van Eijk et al. 2008). Active editing or reconstruction of event time need not be invoked.
An additional factor that might make the snapped-fingers asynchrony even more difficult to notice
is the ambiguity about which moment of the temporally extended visual event generated the sound.
It is not until the end of the fingers’ movement that the finger generates the snapping sound. If
the brain instead assumes that the sound corresponds to the beginning of the movement, it pairs
the sound with an earlier visual event, diminishing the difference in neural latencies between the
sound and the corresponding sight.
While the auditory/visual latency difference and luminance contrast effects demonstrate fail-
ures to reconstruct event time, they do not imply that the perceptual system never reconstructs
event time. After all, even the clear successes of adaptive vision turn into failures when certain
limits are exceeded. In the case of size constancy for example, while the visual system does an
acceptable job, failures are common (McBeath, Neuhoff, and Schiano 1993; Granrud et al. 2003).
If an organism must learn its own latencies over its lifespan, we might end up with a patchwork of
partial event time reconstructions. To fully evaluate whether the brain takes account of latencies,
we must review the other phenomena promulgated as evidence for simultaneity constancy.

Compensation for auditory distance?


Several researchers have suggested that the brain compensates for the effect of the slow speed of
sound relative to the faster speed of light. Although the difference in timing of sound and sight is
small for most events, during storms we sometimes experience a very large timing difference. A dis-
tant thunderclap is heard a few seconds after the light from the physically simultaneous lightning
bolt. Because we do not perceive distant thunder and lightning as simultaneous, clearly our brain
does not reconstruct the simultaneity of these events. This is unsurprising even for advocates of event
time reconstruction, because the nature of the event and its distance are not easily perceived. For much
closer events, however, from a few centimetres to a few dozen metres away, some have suggested that
neural processing does result in perceiving an associated sound and light as simultaneous.
Studies of the issue have generally presented a light and a sound at different distances and
different relative timings. According to the event time hypothesis, the point of subjective simul-
taneity for the sound and the light should shift with greater object distance. That is, for greater
object distances, larger sound delays should be considered simultaneous. However, different stud-
ies have yielded very different results. Keetels and Vroomen (2012) and Vroomen and Keetels
(2010) provide good reviews of the subject and consider various explanations for the discrepancy
between those that favour the hypothesis (Sugita and Suzuki 2003; Alais and Carlile 2005; Engel
and Dougherty 1971; Kopinska and Harris 2004) and those that do not (Arnold, Johnston, and
Nishida 2005; Heron et al. 2007; Lewald and Guski 2003; Stone et al. 2001). The issue is complex.
First, negative findings can be blamed on the experimenters presenting the visual and auditory
information in such a way that the observer perceives the distance to the sound inaccurately.
Second, whether trials with different times and distances were blocked or mixed can change the
adaptation state of the observer, and as this can shift the simultaneity point (as described below),
it might explain some of the findings supporting latency compensation.

Compensation for the length of tactile nerves?


Simultaneity constancy in tactile perception would be more straightforward to assess, and presum-
ably for the brain to implement, than simultaneity constancy in the audiovisual domain. Tactile
signals from the toe reach the brain about 40 ms after the signals from the face (Macefield et al.
1989). The brain might compensate for the longer latencies from parts of the body farther from
the brain, so that a simultaneous touch on toe and forehead feels simultaneous. Whereas
audiovisual simultaneity constancy is complicated by the fact that the transmission time of sounds
varies with the distance of the source, the latency differences of tactile stimulation should be more
stable, possibly making it easier to learn.
Otto Klemm, at the time a junior colleague of Wilhelm Wundt in Leipzig, published a series of
studies of the topic (Klemm 1925). Klemm presented tactile stimuli to the forehead, index finger,
and ankle. The method he used is not entirely clear but he seems to have asked participants to
report which of two touches was presented first, while also giving them the option of responding
‘simultaneous’.
An interesting complication he encountered may be relevant to whether sensations are consist-
ently assigned to points on a timeline or instead are represented differently. In the simple situation
of a touch on the head accompanied by one near the ankle, Klemm reports (1925: 215): ‘At the
beginning of the series some of the observers were helpless even when fairly large temporal sepa-
rations were used . . . observers had a lot of trouble to judge direct simultaneity: Since the two tac-
tile impressions did not go together [zusammengingen] into one common Gestalt it was difficult
to merge [zusammenfassen] them to simultaneity’ (translation courtesy of Lars T. Boenke). Fraisse
(1964) makes a related observation that it is difficult to combine stimuli of different modalities
and perceive them as synchronous.
Klemm pressed on with testing his subjects until they produced reliable measurements (he did
not report how much experience was required). He determined that five of his six participants,
when presented with simultaneous stimulation to ankle and forehead, tended to report that the
forehead was stimulated first. More specifically, in those five participants the ankle had to be
touched 23 to 30 ms earlier than the forehead for the best chance of perceived simultaneity. In the
sixth observer, he instead found evidence for simultaneity constancy, with the point of subjective
simultaneity being true physical simultaneity. It is hard to know what to conclude, and indeed
Klemm himself expressed some frustration. Klemm also noted that even when participants
performed the temporal task without a problem, some continued to report, as described in the
previous paragraph, that it felt artificial to categorize temporal order.
Halliday and Mingay (1964) performed a similar study, but unfortunately with only two partici-
pants. For both participants, Halliday and Mingay concluded that touches of more distal body parts
(toe vs index finger, in their case) were perceived to have occurred later. Harrar and Harris (2005)
followed with more experiments that yielded the same result, using temporal order judgments to
infer the time difference for subjective simultaneity. Quantitatively, pooling the data across their
six participants, they reported that the difference in perceived timing was approximately that pre-
dicted by the differences in simple reaction time to the body parts involved. Unfortunately, they
did not assess whether some participants differed from others, so we do not know if there
was the significant variation between participants that Klemm found. Bergenheim et al. (1996)
also investigated the issue, and like the others found evidence that stimulation of the more distal
body parts was perceived later than more proximal areas. However, Bergenheim et al. suggested
that the discrepancy they found between foot and arm (12 ms) was not as large as it should be for
the difference in conduction latency indicated by physiological studies.
In summary, all researchers found that on average, stimulation of distal areas of the skin was
perceived as occurring later in time than stimulation of more proximal areas. If there is any
compensation at all, it appears that the proportion of latency difference compensated for is small,
or the proportion of people who compensate for latency is small. Settling the issue will require
more studies of this topic using modern physiological methods, larger numbers of participants,
and enough data per participant to assess simultaneity constancy in each participant.
To evaluate whether the times at which signals are perceived reflect compensation for signal
processing latencies, we have reviewed the effects on perceptual latency of luminance, originat-
ing modality, the speed of sound, and the length of tactile fibers. The support in the literature for
adaptive compensation in these instances ranges from none to mixed.
Yet one class of studies provides strong evidence for limited compensation. These are the stud-
ies of adaptation to asynchrony. The phenomenon involved suggests a path to understanding the
imperfect and limited processing that can compensate for differential latency.

Intersensory adaptation to take account of latency differences
Fujisaki et al. (2004) repeatedly exposed participants to a particular asynchrony between auditory
and visual information, and found consistent effects on the point of subjective simultaneity. In
one condition, a tone pip was followed 235 ms later by a flashed ring. After about three minutes
of repeated exposure to that sequence, participants made temporal order judgments to a range of
temporal offsets, which revealed that the point of subjective simultaneity had shifted by an average
of 22 ms. The shift was in the direction appropriate to compensate for the 235-ms offset between
sight and sound. Other studies have shown this result to be robust (Vroomen et al. 2004; Hanson,
Heron, and Whitaker 2008; Harrar and Harris 2008; Di Luca, Machulla, and Ernst 2009; Roach
et al. 2010), and a similar phenomenon has been observed for other modality pairings (Di Luca,
Machulla, and Ernst 2009).
830 Holcombe
Compensation for a particular asynchrony has also been observed for
the temporal delay between actions and their sensory consequences (Cunningham, Billock, and
Tsou 2001; Stetson et al. 2006), and these shifts do not seem to be caused by shifting the physical
time of stimulus-evoked neural signals (Roach et al. 2010).
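The point of subjective simultaneity in these studies is estimated by fitting a psychometric function to temporal order judgments collected over a range of offsets. The following is a purely illustrative sketch of that analysis, with invented data and an assumed cumulative-Gaussian form; it is not the analysis code of any study cited here.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(soa, pss, sigma):
    """Cumulative Gaussian: P('visual first') as a function of the
    stimulus onset asynchrony (SOA, ms; positive = visual presented first)."""
    return norm.cdf(soa, loc=pss, scale=sigma)

# Invented response proportions for illustration only.
soas = np.array([-240.0, -120.0, -60.0, 0.0, 60.0, 120.0, 240.0])
p_visual_first = np.array([0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.98])

# The fitted 'pss' is the SOA at which the two orders are reported
# equally often; a shift in pss after adaptation is the measure of
# recalibration used in these experiments.
(pss, sigma), _ = curve_fit(psychometric, soas, p_visual_first, p0=(0.0, 100.0))
print(f"PSS = {pss:.1f} ms, sigma = {sigma:.1f} ms")
```

With these invented data the fitted PSS comes out slightly negative, i.e. the sound must lead the flash somewhat for the pair to appear simultaneous.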
Not only do these results constitute evidence for event time reconstruction rather than reliance
on brain time, but they also indicate how latency differences might be known, through learning.
The rationale for these shifts may stem from the statistics of the natural environment, where the
distribution of the relative timing of stimulation by external events is likely to be centred on or
near zero (simultaneity). Processes that compensate for any consistent departure of the average
from zero may therefore cause the adaptation effects.
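Such a compensation process can be caricatured as a leaky average of recently experienced asynchronies that is subtracted from incoming timing differences. The toy model below is my own illustrative sketch, not a model proposed in the studies above; the rate parameter and number of exposures are invented.

```python
def recalibrate(asynchronies, rate=0.05):
    """Toy leaky-integrator model: the internal simultaneity point drifts
    a small step toward each experienced audiovisual asynchrony (ms).
    Returns the resulting shift of the point of subjective simultaneity."""
    shift = 0.0
    for a in asynchronies:
        shift += rate * (a - shift)  # move fractionally toward this sample
    return shift

# Repeated exposure to a constant +235 ms lag drives only a partial shift;
# with these invented parameters the compensation remains incomplete, of
# the same order as the ~22 ms shift reported by Fujisaki et al. (2004).
print(recalibrate([235.0] * 10, rate=0.01))
```

Because the integrator is leaky and exposure is brief, compensation is partial, which is qualitatively consistent with the small shifts observed empirically.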
These adaptation effects are analogous to after-effects for other aspects of perception such as
motion and orientation. Accordingly, to explain these effects researchers typically invoke similar
neural mechanisms as those that have been proposed to explain traditional adaptation effects.
Specifically, a typical suggestion is that neurons in the brain are selective for the adapted feature,
and that adaptation of these neurons causes the after-effect. In the case of the intersensory timing
shifts, both Roach et al. (2010) and Cai, Stetson, and Eagleman (2012) suggest that the responsi-
ble neurons are multimodal neurons tuned to different asynchronies between the modalities. In
the cat, there are indeed multimodal neurons that prefer different asynchronies (Meredith et al.
1987) and these also appear to exist in rhesus monkeys (Wallace, Wilkinson, and Stein 2012).
The relative timing perceived may reflect the differing activity of these neurons. Adaptation shifts
this activity difference in a manner that compensates for the asynchrony (Roach et al. 2010; Cai,
Stetson, and Eagleman 2012).

Timing-selective neurons vs criterion shifts and expectations


The explanation of asynchrony after-effects in terms of a population of neurons tuned to vari-
ous asynchronies is appealing. But other possible explanations should be considered, especially
because one recent result is difficult to explain in the standard way.
An adaptation effect reported by Roseboom and Arnold (2011) amounts to a shift in perceived
audiovisual timing that is specific to the visual stimulus used. Participants in the experiment saw
video clips of a male and a female actor on different trials, all saying the syllable ‘ba’. In one condi-
tion the auditory signal of the male actor was always presented 300 ms before the video, whereas
the auditory signal of the female actor was always presented 300 ms after the video. In other
words, participants adapted to opposite A-V timing shifts for the male speaker and for the female
speaker. After 50 presentations of these stimuli, participants were tested to determine what timing
relationship they considered simultaneous.
For the test, participants were shown the videos with a range of relative timings between the
auditory and visual component, and each time asked to judge whether the sound and the video
were simultaneous. It turned out that the point of subjective simultaneity had shifted by a few
dozen milliseconds to compensate for the adapted asynchrony, but shifted in different directions
for the male actor and the female actor. The temporal shift maintained this association with the
actor even though the locations of the two actors were switched during test, meaning that the tim-
ing shift was contingent more on the actor than on the location the actor was presented in during
the adaptation phase.
Unlike the experiments involving a simple, single auditory-visual timing offset, these results
cannot be explained by the adaptation of a population of multimodal neurons tuned to various
auditory-visual timings. The contingency on the actor requires additional processes.
The Temporal Organization of Perception 831
One might extend the logic of explaining simple asynchrony adaptation with multimodal neurons by
positing neurons that are jointly selective for actor and audiovisual timing. But this might lead to a
combinatorial explosion of neurons, as the contingency on ‘actor’ is unlikely to be the only pos-
sible contingency. A range of neurons would be needed for each kind of contingency. A process
with more flexibility should be considered.
The processing that shifts decision criteria may fit the bill of a suitably flexible process that can
accommodate different contingencies. In signal detection theory, the criterion is a threshold level
of the internal signal that the observer uses to decide which response to make. In the context of
a simultaneity judgment the relevant signal may be something like the difference in the internal
timing of the auditory response and the visual response. This signal is assumed to have a Gaussian
distribution. As the timing difference is signed (indicating whether auditory was before vs after
visual), two criteria may be involved: one for the positive side of the distribution (discriminating
simultaneous from auditory after visual) and one for the negative side (discriminating simultane-
ous from visual after auditory). See Yarrow et al. (2011) for discussion.
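To illustrate how such a two-criterion model behaves, here is a toy sketch with invented parameter values (criteria at ±100 ms, internal noise of 70 ms); it is not a model fitted in any of the papers cited.

```python
from scipy.stats import norm

def p_simultaneous(soa, c_low, c_high, sigma=70.0):
    """Two-criterion signal detection model: the internal audio-visual
    timing difference is Gaussian around the physical SOA (ms), and
    'simultaneous' is reported when it falls between the two criteria."""
    return (norm.cdf(c_high, loc=soa, scale=sigma)
            - norm.cdf(c_low, loc=soa, scale=sigma))

# Baseline: criteria symmetric around zero.
baseline = [p_simultaneous(soa, -100, 100) for soa in range(-300, 301, 50)]

# Shifting both criteria by +30 ms shifts the point of maximal
# 'simultaneous' responding (the PSS) by 30 ms, while the separation
# between the criteria -- and hence sensitivity -- is unchanged.
adapted = [p_simultaneous(soa, -70, 130) for soa in range(-300, 301, 50)]
```

The key property is that a pure criterion shift translates the whole response curve along the SOA axis without changing its width, which is why criterion accounts predict a shifted PSS with unchanged discrimination performance.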
Shifts of these decision criteria result in shifts in points of subjective simultaneity, from which
perceived timing is inferred. Repeated exposure to a particular asynchrony might cause the sys-
tem to shift the decision criteria in the direction of compensation. This account is in a different
spirit than those involving adaptation of a population of asynchrony-tuned neurons (Roach et al.
2010; Cai, Stetson, and Eagleman 2012). Among psychophysicists, criterion shifts are often con-
sidered uninteresting. The notion seems to be that a criterion shift is more likely to be caused by
observers taking a different attitude towards their percepts rather than perception itself changing.
In contrast, the asynchrony-tuned neuron account is firmly a theory of change of percepts, from
a shift in underlying neural populations. Fortunately, there is some hope of distinguishing these
accounts by experiment, although this has not yet been done. The asynchrony-tuned neuron code
account appears to predict that sensitivity will change, not just criterion.
The evidence in the literature appears consistent with a shift in criteria (Fujisaki et al. 2004;
Vroomen et al. 2004; Yarrow et al. 2011; Hanson, Heron, and Whitaker 2008). Certainly, no one
has demonstrated that their result could not be explained by a shift in criteria or greater variability
in criteria (Roach et al. 2010; Yarrow et al. 2011).
But one should not dismiss lack of evidence for sensitivity change as implying that percepts
did not change. As Michael Morgan and colleagues have pointed out, even some indisputably
perceptual effects, like the motion after-effect, may be caused by criterion shifts (or ‘subtractive
adaptation’) rather than sensitivity changes (Morgan, Chubb, and Solomon 2011; Morgan and
Glennerster 1991; Morgan, Hole, and Glennerster 1990).
Thus an after-effect that manifests only as a criterion shift is not necessarily non-perceptual. To
get a fuller view of what needs to be explained, future investigations should document the scope
of contingencies adapted to. Perhaps, given an appropriate task and stimulus exposure protocol,
timing shifts could be accomplished for completely arbitrary stimulus pairings, with one pair of
criteria for pictures of Jennifer Aniston, another for pictures of pink koalas, and another for a
person whose face you didn’t encounter until the experiment began. For the brain to accomplish
such a feat, some process has to store these criteria and trot them out for the appropriate tasks and
stimuli. This topic is rarely discussed in the adaptation literature, but raises interesting issues that
may be widespread in the study of human cognition and learning.
While the Roseboom and Arnold (2011) result may herald an explosion of contingent timing
shifts, this may be restricted to situations of high temporal uncertainty regarding the time of sen-
sory signals. For rather than using a simple tone and flash as had been used in previous studies,
Roseboom and Arnold (2011) presented extended, time-varying video and auditory stimuli. The
video clip involved facial movements of the actor that extended over what appears to be (from
the supplementary clip provided in the paper) several hundred milliseconds, and the duration
of the auditory syllable signal was probably also at least a few hundred milliseconds. Both were
complex stimuli with multiple features occurring over their time-course, with differing durations
and without unambiguous discrete onsets and offsets. In such a situation, to determine whether
the stimuli were simultaneous, one must identify which stimulus features should go together.
The adaptation process may then be one of associating particular features of the extended video
signal that occur at certain times with particular features of the auditory train. This might be the
explanation of the results—after repeated experience hearing a particular part of the auditory
train presented simultaneously with a particular lip movement, one may learn that is the way that
particular speaker talks. Deviations from that learned timing for simultaneity are then perceived,
correctly, as temporally shifted from that speaker’s usual timing. This may thus be a criterion shift,
and one that does not generalize to cases where the auditory-visual matching is unambiguous.
This interpretation that the contingent asynchrony adaptation found by Roseboom and Arnold
(2011) will not generalize to unambiguous audiovisual correspondence situations gets some sup-
port from the results of Heron et al. (2012). Like Roseboom and Arnold (2011), Heron et al. (2012)
tested whether intersensory asynchrony adaptation could be contingent on the identity of the stim-
ulus. Instead of using different actors paired with their respective voices, they used high spatial
frequency gratings with high-pitched tones and low spatial frequency gratings with low-pitched
tones. Other researchers have shown that observers tend to spontaneously associate these values
(Evans and Treisman 2010; Spence 2011), suggesting they are not entirely unnatural associations.
Yet unlike Roseboom and Arnold (2011), these authors found that the asynchrony adaptation did
not ‘stick’ to the identity of the stimulus, but was instead tied to the spatial location. Thus they
demonstrated adaptation to opposite asynchronies (visual before auditory and visual after audi-
tory) tied to distinct locations. This is compatible with mediation by a brain area like the superior
colliculus that is retinotopically organized and has neurons tuned to audiovisual asynchronies. The
account based on a population of neurons tuned to various asynchronies therefore remains viable.
We have considered whether the brain sets the perceived timing of sensory signals to com-
pensate for learned or imputed sensory latencies. In a limited way it does, but the scope of the
phenomenon and nature of the underlying processing remains obscure.

Grouping and Gestalts


Auditory stimuli can have a powerful effect on temporal aspects of visual perception. A single
flash looks like two if two sounds are presented within about 100 ms of it (Shams,
Kamitani, and Shimojo 2000, 2002). Sounds also shift the perceived timing of flashes, in a manner
suggesting strong perceptual integration (Morein-Zamir et al. 2003; Freeman and Driver 2008;
Kafaligonul and Stoner 2010). But these shifts in perceived timing are not necessarily conse-
quences of processing that evolved to extract event time. That is, although they may mean that the
brain time theory is wrong, this does not mean that the event time theory is right. Instead of the
brain being bent on recovering the time of sensory events and achieving simultaneity constancy,
perceived timing may instead be a secondary effect of grouping and integration. Evolutionary
selection pressure may have driven the brain towards organizing ambiguous stimuli into the most
likely groupings, without special consideration for timing.
A striking auditory illusion discovered a century ago supports this theory that the brain pri-
oritizes grouping over correct timing. Benussi in 1913 reported that simple punctate sound
sequences result in consistent illusions of temporal order (Sinico 1999; Albertazzi 1999). In a
demonstration available online (http://i-perception.perceptionweb.com/journal/I/volume/3/article/i0490sas),
Koenderink et al. (2012) present one example: a sequence comprising a low tone, a
high tone, and a noise burst. When the noise burst is presented as the middle sound, so that the
tones are not neighbouring each other temporally, perceptually one hears the tones as grouped
together and the noise occurring afterwards. This likely occurs because the tones form a good
Gestalt, and the noise is segmented away from them. The shifting of the time perceived may be
a byproduct of processes driven primarily by the need for auditory comprehension and source
identification (see also Spence, this volume). This is very different from the view of event time
theorists, who assume the goal of perceiving the correct time of events is the primary factor deter-
mining perceived timing.
Brain time theory is wrong, but so is the strong form of event time theory. Instead, the brain’s
priority may be grouping sensory signals originating with a common event together. But this does
not exclude the existence of adaptation and criterion shifts that on average push perceived timing
towards veridicality.

Summary
We do not yet know whether perception consistently represents event sequences as a timeline, in
the way that in the spatial domain we have a strong sense of the layout of a scene. It may be that
temporal experience is more impoverished.
When several to many stimuli are presented rather than just a few, most of the temporal relations
may be unavailable or reliant on unreliable cues like relative strength of the items in short-term
memory (Reeves and Sperling 1986). When just two stimuli accompanied by strong transients are
presented, they are more likely to engage attention and result in a clear percept of temporal order
(Fujisaki and Nishida 2007).
Extracting certain spatial relationships also seems to require attentional mediation (Holcombe,
Linares, and Vaziri-Pashkam 2011; Franconeri et al. 2011). But aspects of spatial perception take
advantage of the brain’s topographic arrays to process information in parallel, whereas the visual
brain may lack a chronotopic bank of processors.
In recent years much of the literature has focused on deciding between the event time recon-
struction theory and brain time. But the reality may be a modest amount of event time reconstruc-
tion that emerges from a recalibration process that shifts cross-modal simultaneity points after
prolonged exposure to asynchrony. Operating in parallel with this recalibration may be organiza-
tional processes that create temporal illusions as a byproduct of Gestalt grouping (Benussi 1913).
In evolutionary history, success at event reconstruction has likely been a factor in selecting the
winning organisms over the now-extinct losers. But segmenting events and identifying them may
have been both more important for the organism and more feasible than determining exact event
timing. When absolute timing is critical, learning of sensorimotor mappings may be used for cor-
rect timing of behaviour rather than changes to perception.

Acknowledgments
I thank Lars T. Boenke, Colin Clifford, and Paolo Martini for discussions, and Lars T. Boenke,
Alex L. White, and Daniel Linares for comments on an earlier version of the manuscript. I thank
Alex L.  White for the point that in snapping one’s fingers, it is not obvious which part of the
visual sequence generated the sound. Lars T. Boenke translated Klemm (1925) from German into
English. The writing of this chapter was supported by ARC grants DP110100432 and FT0990767.
References
Alais, D. and S. Carlile (2005). ‘Synchronizing to Real Events: Subjective Audiovisual Alignment Scales
with Perceived Auditory Depth and Speed of Sound’. Proceedings of the National Academy of Sciences of
the United States of America 102(6): 2244–2247.
Albertazzi, L. (1999). 'The Time of Presentness. A Chapter in Positivistic and Descriptive Psychology'.
Axiomathes 10: 49–73.
Allan, L. G. (1975). ‘The Relationship between Judgments of Successiveness and Judgments of Order’.
Perception and Psychophysics 18: 29–36.
Allik, J. and K. Kreegipuu (1998). ‘Multiple Visual Latency’. Psychological Science 9: 135–138.
Alpern, M. (1954). 'The Relation of Visual Latency to Intensity'. A.M.A. Archives of Ophthalmology
51: 369–374.
Alvarez, G. A. (2011). Representing multiple objects as an ensemble enhances visual cognition. Trends in
Cognitive Sciences 15(3): 122–131. doi:10.1016/j.tics.2011.01.003.
Arden, G. B. and R. A. Weale (1954). ‘Variations of the Latent Period of Vision’. Proceedings of the Royal
Society of London B 142: 258–267.
Arnold, D. H. (2005). ‘Perceptual Pairing of Colour and Motion’. Vision Research 45(24): 3015–3026.
Arnold, D. H., A. Johnston, and S. Nishida (2005). Timing sight and sound. Vision Research 45: 1275–1284.
doi:10.1016/j.visres.2004.11.014.
Beaudot, W. H. (2002). Role of onset asynchrony in contour integration. Vision Research, 42(1), 1–9.
Benardete, E. A. and E. Kaplan (1999). ‘The Dynamics of Primate M Retinal Ganglion Cells’. Visual
Neuroscience 16: 355–368.
Benussi, V. (1913). Psychologie der Zeitauffassung. Winter: Heidelberg.
Berdyyeva, T. K. and C. R. Olson (2010). Rank signals in four areas of macaque frontal cortex during
selection of actions and objects in serial order. Journal of Neurophysiology 104(1): 141–159.
Bergenheim, M., H. Johansson, B. Granlund, and J. Pedersen (1996). ‘Experimental Evidence for a
Sensory Synchronization of Sensory Information to Conscious Experience’. In Towards a Science of
Consciousness: The First Tucson Discussions and Debates, edited by S. R. Hameroff, A. W. Kaszniak, and
A. C. Scott, pp. 301–310. Cambridge, MA: MIT Press.
Cai, M., C. Stetson, and D. M. Eagleman. (2012). A Neural Model for Temporal Order Judgments and
their Active Recalibration: A Common Mechanism for Space and Time? Frontiers in Psychology
3(November): 1–11. doi:10.3389/fpsyg.2012.00470.
Cavanagh, P., A. R. Hunt, A. Afraz, and M. Rolfs (2010). ‘Visual Stability Based on Remapping of Attention
Pointers’. Trends in Cognitive Sciences 14(4): 147–153. doi:10.1016/j.tics.2010.01.007.
Clifford, C. W. G., A. O. Holcombe, and J. Pearson (2004). Rapid global form binding with loss of
associated colors. Journal of Vision 4: 1090–1101.
Cunningham, D. W., V. A. Billock, and B. H. Tsou (2001). ‘Sensorimotor Adaptation to Violations of
Temporal Contiguity’. Psychological Science 12: 532–5.
Dakin, S. C. and P. J. Bex (2002). 'The Role of Synchrony in Contour Binding: Some Transient Doubts
Sustained'. Journal of the Optical Society of America A 19(4): 678–686.
De Valois, R. L. and K. K. De Valois (1991). ‘Vernier Acuity with Stationary Moving Gabors’. Vision
Research 31(9): 1619–1626.
Del Viva, M. M., M. Gori and D. C. Burr (2006). ‘Powerful Motion Illusion Caused by Temporal
Asymmetries in ON and OFF Visual Pathways’. Journal of Neurophysiology 95(6): 3928–32. doi:10.1152/
jn.01335.2005
Dennett, D. and M. Kinsbourne (1992). ‘Time and the Observer: The Where and When of Consciousness
in the Brain’. Behavioral and Brain Sciences 15(1992): 1–35.
Di Luca, M., T. K. Machulla, and M. O. Ernst (2009). 'Recalibration of Multisensory Simultaneity:
Cross-modal Transfer Coincides with a Change in Perceptual Latency’. Journal of Vision 9: 7–16.
Eagleman, D. M. (2007). ‘10 Unsolved Mysteries of the Brain’. Discover (August): 1–3.
Eagleman D. M. (2009). ‘Brain Time’. In What’s Next: Dispatches From the Future of Science, edited by
M. Brockman. London: Vintage Books.
Eagleman, D. M. (2010). ‘How Does the Timing of Neural Signals Map onto the Timing of Perception’. In
Issues of Space and Time in Perception and Action, edited by R. Nijhawan and B. Khurana, pp. 216–231.
Cambridge: Cambridge University Press.
Engel, G. R. and W. G. Dougherty (1971). ‘Visual–Auditory Distance Constancy’. Nature 234(5327): 308.
Evans, K. K. and A. Treisman. (2010). Natural cross-modal mappings between visual and auditory features.
Journal of Vision 10(1).
Exner, S. (1875). 'Experimentelle Untersuchung der einfachsten psychischen Processe. III Abhandlung'
[Experimental investigation of the simplest psychical processes, third treatise]. Pflügers Archiv für die
gesammte Physiologie des Menschen und Thiere 11: 403–432.
Farid, H. (2002). Temporal synchrony in perceptual grouping: A critique. Trends in Cognitive Sciences, 6(7),
284–288.
Farid, H. and E. H. Adelson (2001). Synchrony does not promote grouping in temporally structured
displays. Nature Neuroscience 4(9): 875–876.
Forte, J., J. H. Hogben, and J. Ross (1999). ‘Spatial Limitations of Temporal Segmentation’. Vision Research
39: 4052–4061.
Fraisse, P. (1964). ‘The Psychology of Time’. London: Eyre and Spottiswoode.
Franconeri, S., J. Scimeca, J. Roth, S. Helseth, and L. Kahn (2011). ‘Flexible Visual Processing of Spatial
Relationships’. Cognition 122: 210–227.
Freeman, E. and J. Driver (2008). 'Direction of Visual Apparent Motion Driven Solely by Timing of a
Static Sound'. Current Biology 18(16): 1262–1266.
Fujisaki, W. and S. Nishida (2007). ‘Feature-based Processing of Audio-visual Synchrony Perception
Revealed by Random Pulse Trains’. Vision Research 47(8): 1075–1093.
Fujisaki, W., S. Shimojo, M. Kashino, and S. Nishida (2004). Recalibration of audiovisual simultaneity.
Nature Neuroscience, 7(7): 773–778.
Granrud, C. E., M. A. Granrud, J. C. Koc, R. W., Peterson, and S. M. Wright (2003). ‘Perceived Size of
Traffic Lights: A Failure of Size Constancy for Objects Viewed at a Distance’. Journal of Vision 3(9): 491.
Grondin, S. (2010). ‘Timing and Time Perception: A Review of Recent Behavioral and Neuroscience
Findings’. Attention, Perception and Psychophysics 72(3): 561–582. doi:10.3758/APP.
Halliday, A. and R. Mingay (1964). ‘On the Resolution of Small Time Intervals and the Effect of
Conduction Delays on the Judgement of Simultaneity’. Quarterly Journal of Experimental Psychology
16(1): 37–41.
Hanson, J. V., J. Heron, and D. Whitaker (2008). ‘Recalibration of Perceived Time across Sensory
Modalities’. Experimental Brain Research 185: 347–352.
Harrar, V. and L. R. Harris (2005). 'Simultaneity Constancy: Detecting Events with Touch and Vision'.
Experimental Brain Research 166: 465–473. doi:10.1007/s00221-005-2386-7.
Harrar, V. and L. R. Harris (2008). 'The Effect of Exposure to Asynchronous Audio, Visual, and Tactile
Stimulus Combinations on the Perception of Simultaneity'. Experimental Brain Research 186: 517–524.
Harris L. R., V. Harrar, P. Jaekl, and A. Kopinska (2010). ‘Mechanisms of Simultaneity Constancy’. In
Space and Time in Perception and Action, edited by R. Nijhawan, pp. 232–253. Cambridge: Cambridge
University Press.
Heron, J., D. Whitaker, P. V. McGraw, and K. V. Koroshenkov (2007). ‘Adaptation Minimizes
Distance-related Audiovisual Delays’. Journal of Vision 7: 1–8.
Heron, J., J. V. M. Hanson, and D. Whitaker (2009). ‘Effect Before Cause: Supramodal Recalibration of
Sensorimotor Timing’. PLoS ONE 4: e7681. doi:10.1371/journal.pone. 0007681.
Heron, J., N. W. Roach, J. V. M. Hanson, P. V. McGraw, and D. Whitaker (2012). ‘Audiovisual Time
Perception is Spatially Specific’. Experimental Brain Research 218(3): 477–485. doi:10.1007/
s00221-012-3038-3.
Hess, C. V. (1904). 'Untersuchungen über den Erregungsvorgang im Sehorgan der Katze bei kurz- und bei
länger dauernder Reizung'. Pflügers Archiv für die gesamte Physiologie 101: 226–262.
Holcombe, A. O. and P. Cavanagh (2008). ‘Independent, Synchronous Access to Color and Motion
Features’. Cognition 107(2): 552–580.
Holcombe, A. O. (2009). ‘Seeing Slow and Seeing Fast: Two Limits on Perception’. Trends in Cognitive
Science 13(5): 216–221.
Holcombe, A. O., D. L. Linares, and M. Vaziri-Pashkam (2011). ‘Perceiving Spatial Relationships via
Attentional Tracking and Shifting’. Current Biology 21: 1–5.
Holcombe, A. O. and C. W. Clifford (2012). ‘Failures to Bind Spatially Coincident Features: Comment on
Di Lollo’. Trends in Cognitive Science 16(8): 402.
Holcombe, A. O., N. Kanwisher, and A. Treisman (2001). ‘The Midstream Order Deficit’. Perception and
Psychophysics 63(2): 322–329.
Ivry, R. B. and J. E. Schlerf (2008). Dedicated and intrinsic models of time perception. Trends in Cognitive
Sciences 12(7): 273–280.
James, W. (1890). Principles of Psychology. Accessed from http://psychclassics.yorku.ca/James/Principles/
Jaśkowski, P. (1991). ‘Two-Stage Model for Order Discrimination’. Perception and Psychophysics 50: 76–82.
Kafaligonul, H. and G. R. Stoner (2010). ‘Auditory Modulation of Visual Apparent Motion with Short
Spatial and Temporal Interval’. Journal of Vision 10: 1–13. doi:10.1167/10.12.31.
Karmarkar, U. R. and D. V. Buonomano (2007). Timing in the absence of clocks: encoding time in neural
network states. Neuron, 53(3): 427–38.
Keetels, M. and J. Vroomen (2012). 'Perception of Synchrony between the Senses'. In Frontiers in the Neural
Basis of Multisensory Processes, edited by M. T. Wallace and M. M. Murray, pp. 147–178. London:
CRC Press.
Kitaoka, A. and H. Ashida (2003). 'Phenomenal Characteristics of the Peripheral Drift Illusion'. VISION
15: 261–262.
Kitaoka, A. and H. Ashida (2007). 'A Variant of the Anomalous Motion Illusion Based upon Contrast and
Visual Latency'. Perception 36(7): 1019–1035. doi:10.1068/p5362.
Klemm, O. (1925). ‘Über die Wirksamkeit kleinster Zeitunterschiede auf dem Gebiete des Tastsinns’. Archiv
fur die gesamte Psychologie 50: 205–220.
Koenderink, J., W. Richards, and A. van Doorn (2012). ‘Space-time Disarray and Visual Awareness’.
i-Perception 3(3): 159–162. doi:10.1068/i0490sas.
Köhler, W. (1947). Gestalt Psychology: An Introduction to New Concepts in Modern Psychology.
New York: Liveright.
Kopinska, A. and L. R. Harris. (2004). ‘Simultaneity Constancy’. Perception 33(9): 1049–1060.
Lappe, M., & Krekelberg, B. (1998). The position of moving objects. Perception, 27(12), 1437–1449.
Lee, S. H., and R. Blake (1999). Visual form created solely from temporal structure. Science, 284(5417),
1165–1168.
Levi, D. (1996). ‘Pattern Perception at High Velocities’. Current Biology 6(8): 1020–1024.
Lewald, J. and R. Guski (2004). ‘Auditory–Visual Temporal Integration as a Function of Distance: No
Compensation for Sound-transmission Time in Human Perception’. Neuroscience Letters
357(2): 119–122.
Lotze, H. (1881). Grundzüge der Psychologie. Leipzig: Dictate aus den Vorlesungen S. Hirzel.
McBeath, M. K., J. G. Neuhoff, and D. J. Schiano (1993). ‘Familiar Suspended Objects Appear Smaller than
Actual Independent of Viewing Distance’. Paper presented at the Annual Convention of the American
Psychological Society, Chicago, IL.
Macefield, G., S. C. Gandevia, and D. Burke (1989). ‘Conduction Velocities of Muscle and Cutaneous
Afferents in the Upper and Lower Limbs of Human Subjects’. Brain 112(6): 1519–1532.
McLeod, P., C. McLaughlin, and I. Nimmo-Smith (1985). ‘Information Encapsulation and Automaticity
Evidence from the Visual Control of Finely Timed Actions’. In Attention and Performance XI, edited by
M. I. Posner and O. S. Marin. Hillsdale, NJ: Erlbaum.
McLeod, P. and S. Jenkins (1991). ‘Timing Accuracy and Decision Time in High-speed Ball Games’.
International Journal of Sport Psychology 22: 279–295.
Marr, D. (1982). Vision. San Francisco, CA: Freeman.
Maunsell, J. H., G. M. Ghose, J. A. Assad, C. J. McAdams, C. E. Boudreau, and B. D. Noerager (1999).
‘Visual Response Latencies of Magnocellular and Parvocellular LGN Neurons in Macaque Monkeys’.
Visual Neuroscience 16(1): 1–14.
Meredith, M. A., J. W. Nemitz, and B. E. Stein (1987). Determinants of multisensory integration in
superior colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7(10): 3215–3229.
Morein-Zamir, S., S. Soto-Faraco, and A. Kingstone. (2003). ‘Auditory Capture of Vision: Examining
Temporal Ventriloquism’. Cognitive Brain Research 17(1): 154–163.
Morgan, M. J., G. J. Hole, and A. Glennerster (1990). ‘Biases and Sensitivities in Geometrical Illusions’.
Vision Research 30: 1793–1810.
Morgan, M. J. and A. Glennerster (1991). ‘Efficiency of Locating Centres of Dot-clusters by Human
Observers’. Vision Research 31: 2075–2083.
Morgan, M. J., C. Chubb, and J. A. Solomon (2011). ‘Evidence for a Subtractive Component in Motion
Adaptation’. Vision Research 51: 2312–2316.
Morgan, M., B. Dillenburger, S. Raphael, and J. A. Solomon (2012). ‘Observers Can Voluntarily Shift their
Psychometric Functions without Losing Sensitivity’. Attention, Perception and Psychophysics 74: 185–193.
Moutoussis, K. (2012). Asynchrony in Visual Consciousness and the Possible Involvement of Attention.
Frontiers in Psychology 3: 1–9.
Musacchia, G. and C. E. Schroeder (2009). 'Neuronal Mechanisms, Response Dynamics and Perceptual
Functions of Multisensory Interactions in Auditory Cortex’. Hearing Research 258(1–2): 72–79.
doi:10.1016/j.heares.2009.06.018.
Nijhawan, R. (2008). ‘Visual Prediction: Psychophysics and Neurophysiology of Compensation for Time
Delays’. Behavioral and Brain Sciences 31: 179–239.
Nishida, S. and A. Johnston (2002). ‘Marker Correspondence, not Processing Latency, Determines
Temporal Binding of Visual Attributes’. Current Biology 12(5): 359–368.
Nishida S. and A. Johnston (2010). ‘The Time Marker Account of Cross-channel Temporal Judgments’.
In Space and Time in Perception and Action, edited by R. Nijhawan and B. Khurana, pp. 278–300.
Cambridge: Cambridge University Press.
Ogmen, H., S.S. Patel, H.E. Bedell, and K. Camuz (2004). Differential latencies and the dynamics of the
position computation process for moving targets, assessed with the flash-lag effect. Vision Research
44: 2109–2128.
Oram, M. W., D. Xiao, B. Dritschel, and K. R. Payne (2002). ‘The Temporal Resolution of Neural
Codes: Does Response Latency Have a Unique Role?’ Philosophical Transactions of the Royal Society
B: Biological Sciences 357(1424): 987–1001.
Purushothaman, G., S. S. Patel, H. E. Bedell, and H. Ogmen (1998). Moving ahead through differential
visual latency. Nature 396(6710): 424. doi:10.1038/24766.
Reeves, A. and G. Sperling (1986). ‘Attention Gating in Short-term Visual Memory’. Psychological Review
93(2): 180–206.
838 Holcombe

Regan, D. (1989). Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science
and Medicine. New York: Elsevier.
Roach, N. W., J. Heron, D. Whitaker, and P. V. McGraw (2010). ‘Asynchrony Adaptation Reveals Neural
Population Code for Audio-visual Timing’. Proceedings of the Royal Society: Biological Sciences
278(1710): 1314–1322. doi:10.1098/rspb.2010.1737.
Roelofs, C. (1935). ‘Optische localisation’. Archive fur Augenheilkunde 109: 395–415.
Roseboom, W. and D. H. Arnold (2011). Twice upon a time: multiple concurrent temporal recalibrations of
audiovisual speech. Psychological Science, 22(7): 872–7. doi:10.1177/0956797611413293.
Roseboom, W., S. Nishida, W. Fujisaki, and D. H. Arnold (2011). ‘Audio-visual Speech Timing Sensitivity
Is Enhanced in Cluttered Conditions’. PloS ONE 6(4): 1–8. doi:10.1371/journal.pone.0018309.
Roufs, J. A. J. (1963). ‘Perception Lag as a Function of Stimulus Luminance’. Vision Research 3: 81–91.
Schneider, K. A. and D. Bavelier (2003). ‘Components of Visual Prior Entry’. Cognitive Psychology
47(4): 333–366.
Shams, L., Y. Kamitani, and S. Shimojo (2002). ‘Visual Illusion Induced by Sound’. Cognitive Brain Research
14(1): 147–152.
Shams, L., Y. Kamitani, and S. Shimojo (2000). ‘Illusions. What You See Is What You Hear’. Nature
408(6814): 788.
Shapley, R. M. and J. D. Victor (1978). ‘The Effect of Contrast on the Transfer Properties of Cat Retinal
Ganglion Cells’. Journal of Physiology 285: 275–298.
Shore, D. I., E. Spry, and C. Spence (2002). ‘Confusing the Mind by Crossing the Hands’. Cognitive Brain
Research 14: 153–163.
Sinico, M. (1999). ‘Benussi and the History of Temporal Displacement’. Axiomathes 10: 75–93.
Smith, W. S., J. D. Mollon, R. Bhardwaj, and H. E. Smithson (2011). ‘Is There Brief Temporal Buffering of
Successive Visual Inputs?’ The Quarterly Journal of Experimental Psychology: 64(4): 767–791.
Smithson, H. and J. Mollon (2006). ‘Do Masks Terminate the Icon?’ Quarterly Journal of Experimental
Psychology 59(1): 150–160.
Snowden, R., P. Thompson, and T. Troscianko (2006). Basic Vision. Oxford: Oxford University Press.
Spence, C. and C. Parise (2010). ‘Prior-entry: A Review’. Consciousness and Cognition 19(1): 364–79.
doi:10.1016/j.concog.2009.12.001.
Spence, C. (2011). ‘Crossmodal Correspondences: A Tutorial Review’. Attention, Perception, and
Psychophysics 73: 971–995.
Stetson, C., X. Cui, P. R. Montague, and D. M. Eagleman (2006). ‘Motor-sensory Recalibration Leads to an
Illusory Reversal of Action and Sensation’. Neuron 51: 651–659.
Stone, J. V., M. M. Hunkin, J. Porrill, R. Wood, V. Keeler, M. Beanland, M. Port, et al. (2001). ‘When is
Now? Perception of Simultaneity’. Proceedings of the Royal Society: Biological Sciences 268(1462): 31–8.
doi:10.1098/rspb.2000.1326.
Stromeyer, C. F. and P. Martini (2003). ‘Human Temporal Impulse Response Speeds Up with Increased
Stimulus Contrast’. Vision Research 43(3): 285–298.
Sugita, Y. and Y. Suzuki (2003). Audiovisual perception: Implicit estimation of sound-arrival time. Nature
421(6926): 911.
Tanji, J. (2001). ‘Sequential Organization of Multiple Movements: Involvement of Cortical Motor Areas.
Annual Reviews of Neuroscience 24: 631– 651.
Treisman, A. and G. Gelade (1980). A feature integration theory of attention. Cognitive Psychology
12: 97–136.
Treisman, M. (1963). Temporal discrimination and the indifference interval: Implications for a model of the
“internal clock”. Psychological Monographs General Applied 77(13): 1–31.
Usher, M. and N. Donnelly (1998). Visual synchrony affects binding and segmentation in perception.
Nature 394(9 July): 179–182.
The Temporal Organization of Perception 839

Uttal, W. R. (1979). ‘Do Central Nonlinearities Exist?’ Behavioral and Brain Sciences 2: 286.
van Eijk, R. L., A. Kohlrausch, J. F. Juola, and S. van de Par (2008). ‘Audiovisual Synchrony and Temporal
Order Judgments: Effects of Experimental Method and Stimulus Type’. Perception and Psychophysics
70(6): 955–968.
Van de Grind, W. A., O. -J. Grüsser, and H. U. Lunkenheimer (1973). Temporal transfer properties of the
afferent visual system. Psychophysical, neurophysiological and theoretical investigations. In R. Jung
(Ed.), Handbook of sensory physiology (Vol. VII/3, pp. 431–573). Berlin: Springer, Chapter 7
van Doorn, A. J., J. J. Koenderink, and J. Wagemans (2011). Rank order scaling of pictorial depth.
i-Perception (special issue on Art & Perception) 2: 724–744. doi:10.1068/i0432aap.
Vicario, G. B. (2003). ‘Temporal Displacement’. In The Nature of Time: Geometry, Physics, and Perception,
edited by R. Buccheri, M. Saniga, and M. S. Stuckey, pp. 53–66. Dordrecht: Kluwer Academic.
von der Malsburg, C. (1981). ‘The Correlation Theory of Brain Function’. In Models of Neural Networks II:
Temporal Aspects of Coding and Information Processing in Biological Systems, edited by J. L. Domany, J. L.
van Hemmen and K. Schulten, pp. 95–119. New York: Springer-Verlag (reprinted in 1994).
Vroomen, J. and M. Keetels (2010). ‘Perception of Intersensory Synchrony: A Tutorial Review’. Attention,
Perception, and Psychophysics 72(4): 871–884. doi:10.3758/APP.
Vroomen, J., M. Keetels, B. de Gelder, and P. Bertelson (2004). ‘Recalibration of Temporal Order
Perception by Exposure to Audio-visual Asynchrony’. Cognitive Brain Research 22(1): 32–5.
doi:10.1016/j.cogbrainres.2004.07.003.
Wackermann, J. (2007). ‘Inner and Outer Horizons of Time Experience’. The Spanish jOurnal of Psychology
10(1): 20–32.
Wallace, M. T., L. K. Wilkinson, & B. E. Stein (2012). ‘Representation and Integration of Multiple Sensory
Inputs in Primate Superior Colliculus’. Journal of Neurophysiology 76: 1246–1266.
Warren, R. M., C. J. Obusek, R. M. Farmer, and R. P. Warren (1969). ‘Auditory Sequence: Confusion of
Patterns Other than Speech or Music’. Science: 164: 586–587.
White, A. L., D. Linares, and A. O. Holcombe (2008). Visuomotor timing compensates for changes in
perceptual latency. Current Biology 18(20): R951–3.
Williams, J. M. and A. Lit (1983). ‘Luminance-dependent Visual Latency for the Hess Effect, the Pulfrich
Effect and Simple Reaction Time’. Vision Research 23: 171–179.
Wilson, J. A, & S. M. Anstis (1969). Visual delay as a function of luminance. The American Journal of
Psychology 82(3): 350–8.
Wittmann, M. (2011). ‘Moments in Time’. Frontiers in Integrative Neuroscience 5(October): 1–9.
doi:10.3389/fnint.2011.00066.
Yarrow, K., N. Jahn, S. Durant, and D. H. Arnold (2011). ‘Shifts of Criteria or Neural Timing? The
Assumptions Underlying Timing Perception Studies’. Consciousness and Cognition 20(4): 1518–1531.
doi:10.1016/j.concog.2011.07.003.
Section 9

Applications of perceptual organization
Chapter 41

Camouflage and perceptual organization in the animal kingdom
Daniel C. Osorio and Innes C. Cuthill

Introduction
There is hardly a law of vision that is not found again serving camouflage.
(Metzger 1936, transl. Spillmann 2009, p. 85)
Animal camouflage is subtle and beautiful to the human eye, but it has evolved to deceive non-human adversaries. Multiple mechanisms are involved. For example, crypsis works by defeating
figure-ground segregation, whereas patterns that disguise the animal as a commonplace object or
lead to misclassification are known as masquerade and mimicry (Endler 1981; Ruxton, Speed, and
Sherratt 2004b; but see also Stevens and Merilaita 2009 for a discussion of these terms). Mimetic
patterns, which are often conspicuous, work by similarity to a different animal, typically one that
is avoided by the predator, whereas in masquerade the animal resembles a commonplace but val-
ueless object, such as a bird-dropping or plant thorn. Early Gestalt psychologists used examples
from animal camouflage to illustrate their principles of perception (Metzger 2009), which were,
in turn, used to explain deceptive coloration (Keen 1932). What early studies of animal camouflage failed to appreciate, or underestimated, were the differences in vision between humans and
other animals, even though it is these ‘other animals’ that have been the selective force in evolu-
tion (Endler 1978; Cuthill and Bennett 1993; Bennett, Cuthill, and Norris 1994). Conversely, there
has been a view that certain aspects of vision, such as object completion, may require mechanisms
specific to the neocortex, and so are not expected in animals without such a structure (Nieder
2002; Shapley, Rubin, and Ringach 2004; Zylinski, Darmaillacq, and Shashar 2012; van Lier and
Gerbino this volume). The fact that camouflage is effective against humans suggests that common
principles of perceptual organization apply across diverse visual environments, eye designs, and
types of brain. In any case, camouflage offers an approach to the vision of non-human animals
that is both more naturalistic and very different from standard methods, such as tests of associa-
tive learning.
Historically, biological camouflage was studied from about 1860 to 1940 as evidence for the the-
ory of natural selection and for military applications. Notable contributors included the American
artist Thayer (1896, 1909), who was fascinated by countershading and disruptive coloration, and
the English zoologist Cott whose beautifully illustrated book Adaptive coloration in animals (1940)
set out principles of camouflage such as ‘maximum disruptive contrast’ and ‘differential blending’
(Figure 41.2A). Cott’s view that these principles are attributable to the ‘optical properties’ of the
image, rather than being physiological or psychological phenomena, ignored the possible influence
of differences in perception between animals. Cott could not have been aware of the diversity of
animal colour vision. A trichromatic bee (with ultraviolet, blue, and green photoreceptors), a tet-
rachromatic bird (with UV, blue, green, and red photoreceptors), and a trichromatic human will
process identical spectral radiance in different ways, but all these animals face common challenges,
such as figure-ground segmentation and colour constancy. Furthermore, for camouflage that has
evolved as concealment against multiple visual systems (e.g. a praying mantis in foliage, concealed
both to its insect prey and reptilian and avian predators), the common denominators will prevail
over viewer-specific solutions. As the ultimate common denominator is the physical world, one
might, for example, expect the colours of many camouflaged animals to be based on pigments
that have similar reflectances to natural backgrounds across a broad spectral range, even though
in principle a metamer might be effective against any one visual system (Wente and Phillips 2005;
Chiao et al. 2011).
In contrast to Cott, Metzger’s account of camouflage in The Laws of Seeing (2009) was explicitly cognitive, not optical, drawing attention to the Gestalt psychological principles of ‘belonging’, ‘common fate’, and ‘good continuation’. Metzger also devotes a chapter to the obliteration of 3D form by countershading. More recently, Julesz’s (1971, 1981) influential work in vision was
motivated by the idea that image segregation by texture, depth, and motion evolved to break
camouflage. His lecture at the 1998 European Conference on Visual Perception was entitled ‘In
the Last Minutes of the Evolution of Life, Stereoscopic Depth Perception Captured the Input
Layer to the Visual Cortex to Break Camouflage’ (Frisby 2004). Julesz’s ideas remain relevant
to understanding texture matching, and also raise the question of whether any camouflage can
defeat the stereo-depth and motion-sensitive mechanisms that allow figure-ground segregation
in ‘random-dot’ images.
Recently, research on camouflage has been stimulated by the realization that direct evidence for
how particular types of camouflage exploit perceptual mechanisms was sparser than textbooks
might suggest. In addition, such evidence as did exist had been evaluated via human perception
of colour and pattern, not that of the evolutionarily relevant viewer. For example, the bright warning colours of toxic insects such as ladybirds have evolved under the selective pressure exerted by, among
others, bird eyes and brains, and avian colour vision is tetrachromatic and extends into the ultra-
violet (Cuthill 2006). This has led to experimental tests, within the natural environment, of basic
camouflage principles such as disruptive coloration and countershading, informed by physiologi-
cally based models of non-human low-level vision (Cuthill et al. 2005; Stevens and Cuthill 2006).
Biologists also recognize that animal coloration patterns often serve multiple functions, including sexual and warning signals, and non-visual purposes such as thermoregulation and mechanical strengthening. Not only must animal colours be understood in the light of trade-offs between these functions (Ruxton et al. 2004b), but it is often difficult to be sure which function is relevant (Stuart-Fox and Moussalli 2009).
Other recent studies, which we describe here, have investigated animals that can change their
appearance, such as chameleons (Stuart-Fox and Moussalli 2009), flatfish, and especially cuttlefish (Figure 41.1). Cuttlefish, like other cephalopod molluscs, control their appearance with extraordinary facility, which allows them to produce a vast range of camouflage patterns under visual
control. These patterns illustrate interesting and subtle features of camouflage design, includ-
ing disruptive and depth effects. However, the special feature of actively controlled camouflage
is that one can ask what visual features and image parameters the animals use to select colora-
tion patterns. This gives us remarkable insights into perceptual organization in these advanced
invertebrates.
[Figure 41.1: (a) four panels, i–iv, labelled ‘Spots’ and ‘Blotches’; (b) photographs]
Fig. 41.1  Images of (a) a flatfish, the plaice (Pleuronectes platessa) and (b) a cuttlefish (Sepia
officinalis), which vary their appearance to match the background. The plaice varies the level of
expression of two patterns, which we call blotches and spots. These can be expressed at low
levels (i), separately (ii, iii), or mixed (iv). The cuttlefish displays a great range of patterns. Here the
upper left panel illustrates an animal expressing a Disruptive type of pattern on a checkerboard
background, and the lower left a Mottle on the background with the same power spectrum
but randomized phase. The right-hand panel shows two animals on a more natural background
expressing patterns with both disruptive and mottle elements.
Adapted from Emma J. Kelman, Palap Tiptus and Daniel Osorio, Juvenile plaice (Pleuronectes platessa) produce
camouflage by flexibly combining two separate patterns, The Journal of Experimental Biology, 209 (17),
pp. 3288–3292, Figure 1, doi: 10.1242/jeb.02380 © 2006, The Company of Biologists.
Principles of Camouflage
A naive view is that camouflage ‘matches the background’, but the simplicity of the concept has
proved deceptive and led to controversies about definitions up to the present day (for instance
Stevens and Merilaita’s 2009 arguments about cryptic camouflage). An exact physical match, such
that the pattern on the animal and the substrate against which it is viewed are perceptually identical, is possible only with a uniform background, if only because differences in pattern phase at
the boundary between object and background, or 3D cues from shadowing on its surface, are
almost inevitable. A fascinating example of near-perfect background matching, in this very literal
sense, is produced by the scales of many fish that work as vertical mirrors. Ideally, such mirrors
reflect the ‘space-light’ of open water so that a viewer sees the same light as it would with an uninterrupted line of sight, making the fish invisible (Denton 1970; Jordan, Partridge, and Roberts 2012).
Accepting that invisibility through exact replication of the occluded background is rarely achiev-
able, in the biological literature ‘background matching’ (largely replacing earlier terms such as
‘general protective resemblance’) is taken to mean matching the visual texture of the background.
That texture may be a continuous patterned surface such as tree bark, or it may include discrete
3D objects, such as pebbles or leaves, that could in principle be segmented individually. Exactly how
best to match the background is a topic we return to in ‘The problem of multiple backgrounds’.
Logically distinct from crypsis is ‘masquerade’, where an animal mimics a specific background
object that is inedible or irrelevant (leaf-mimicking butterflies and bird’s-dropping-mimicking insect
pupae are classic examples; Skelhorn, Rowland, and Ruxton 2010a; Skelhorn et al. 2010b). Although a
stick insect benefits both from matching its generally stick-textured background and from looking like a stick, the distinction can be made when such an animal is seen against a non-matching background.
Masquerading as a stick can be successful even when completely visible, whereas matching a sample
of the background texture ceases to be an effective defence when the animal is readily segmented
from the background. Masquerade depends on the mechanisms of object recognition and relative
abundance of model and mimic (frequency-dependent selection), rather than perceptual organization, so we say no more about it here but refer the reader to a recent review (Skelhorn et al. 2010a).
Historically (Cott 1940), two main camouflage strategies have been recognized: cryptic and dis-
ruptive camouflage. Cryptic camouflage relies on the body pattern in some sense matching its back-
ground. At present there is no simple way to predict whether two visual textures will match, yet
the quality of camouflage patterns is striking, especially considering the complexity of generating
naturalistic visual textures in computer graphics (Portilla and Simoncelli 2000; Peyré 2009; Allen
et al. 2011; Rosenholtz 2013). The lack of a simple theory for the classification of visual textures, as
envisaged by Julesz (1981, 1984; Kiltie, Fan, and Laine 1995), has limited progress in the understand-
ing of camouflage, which leaves this area open. However, the adaptive camouflage of flatfish and
cuttlefish offers an experimental approach to the question of what range of patterns is needed for one
type of natural background—namely seafloor habitats—and to test what local image parameters and
features are used by these marine animals to classify the substrates that they encounter.
Disruptive camouflage ‘classically’ involves well-defined (e.g. high-contrast) visual features that
create false edges and hence interfere with figure-ground segregation (Figures 41.1–41.3; Cott
1940; Osorio and Srinivasan 1991; Cuthill et al. 2005). However, the idea can be generalized to
any mechanism that interferes with perceptual grouping of the object’s features. Hence disruptive
camouflage gives a more direct route to understanding principles of perceptual organization. It has
had more attention than cryptic camouflage, which works by matching the background, perhaps
because, in some sense, it appears to be more sophisticated, involving deceptions resembling opti-
cal illusions. A major impetus for recent research has been the realization that the effectiveness of
disruptive camouflage had been accepted for over a century without direct test (Merilaita 1998;
Cuthill et al. 2005). It may be that the widespread use of (allegedly) disruptive patterning in military
(a) (b)

(c)

Fig. 41.2  (a) Drawings adapted from the artwork by Hugh Cott illustrating coincident colours that create false contours on the leg and body of the frog Rana temporaria. (b) The frog Limnodynastes tasmaniensis showing enhanced edges to the camouflage pattern. (c) Cott’s (1940, Figure 17) interpretation of the enhanced border on the wing of a butterfly as being consistent with a surface discontinuity. It is an interesting question how often such intensity profiles occur in nature.
Reproduced from H.B. Cott, Adaptive Coloration in Animals, Figure 21, Methuen, London, UK Copyright © 1940,
Methuen.
Reproduced from D. Osorio and M. V. Srinivasan, Camouflage by Edge Enhancement in Animal Coloration
Patterns and Its Implications for Visual Mechanisms, Proceedings of The Royal Society B, 244 (1310), pp. 81–85,
DOI: 10.1098/rspb.1991.0054 Copyright © 1991, The Royal Society.
Reproduced from H.B. Cott, Adaptive Coloration in Animals, Figure 17, Methuen, London, UK Copyright © 1940,
Methuen.

camouflage, where historically the early inspiration was often from nature (Behrens 2002, 2011),
reinforced its acceptance as ‘proven’ in biology. Given that crypsis depends upon matching the back-
ground, whereas disruptive effects depend upon creating false edges or surfaces, it is an interesting
question as to how crypsis and disruptive coloration work in tandem—a topic we return to later.
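The edge images in Figure 41.3 were produced with a Laplacian-of-Gaussian filter, a standard model of edge detection in early vision. As an illustration only (the function names and parameter values below are ours, not those of the studies cited), such a filter can be sketched in Python with NumPy:

```python
import numpy as np

def log_kernel(sigma, size=None):
    """Laplacian-of-Gaussian kernel; responds at intensity edges and blobs."""
    if size is None:
        size = int(6 * sigma) | 1        # odd width covering about +/- 3 sigma
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()                  # zero mean: no response to uniform areas

def edge_image(img, sigma=2.0):
    """'Valid' filtering of a greyscale image with the LoG kernel.
    The kernel is radially symmetric, so correlation equals convolution."""
    k = log_kernel(sigma)
    kh, kw = k.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):                   # plain loops avoid a SciPy dependency
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out
```

Applied to a target image, the filter output is near zero over uniform regions and swings positive and negative on either side of a luminance boundary, which is why an animal's outline shows up so clearly in the right-hand panels of Figure 41.3 unless disrupted.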
We now outline experimental studies of camouflage relevant to four main aspects of perceptual
organization: first, cryptic coloration and background matching; second, the problem of obscur-
ing edges; third the problem of obscuring 3D form; and fourth the concealment of motion.

Cryptic Coloration and Background Matching


Julesz (1981, 1984) proposed that just as trichromatic colour vision encodes visible spectra via
three channels, which are defined by the cone photoreceptor spectral sensitivities, so there should
be a small number of local texture channels (Landy and Graham 2004; Rosenholtz this volume).
One could hope to replicate any texture with a small number of textons in the same way that one
can reproduce colours with three primaries. Julesz found that textures were in some cases readily
discriminated when they had the same mean intensity and second-order (i.e. spatial frequency
power spectrum) and even higher-order statistics. This led to the hypothesis that there are chan-
nels that would represent local features, such as the size and aspect ratio of ‘blobs’, the termination
of lines and the presence of line intersections. This theory has been influential, especially in work
on preattentive visual discrimination, but the limited set of textons has yet to be identified. In
recent decades much effort has gone into understanding the coding of natural images, but to our
Fig. 41.3  Artificial targets, baited with mealworms, survived bird predation better when their contrasting colour patches intersected the ‘wing’ edges (bottom left) than did targets bearing otherwise similar oak-bark-like textures that did not intersect the edges (top left). High-contrast edge-disrupting
patterns and differential blending with the background reduce the signal from the target’s outline
(right-hand panels: edge images from applying a Laplacian-of-Gaussian filter to similar targets).
Reproduced from Martin Stevens and Innes C Cuthill, Disruptive coloration, crypsis and edge detection in early
visual processing, Proceedings of The Royal Society B, 273 (1598), pp. 2433–38, DOI: 10.1098/rspb.2006.3556
Copyright © 2006, The Royal Society.

knowledge a small basis-set of spatial mechanisms analogous to cone fundamentals has not been
identified. Indeed, the principle of sparse coding argues for a large set of low-level mechanisms
(Simoncelli and Olshausen 2001). Similarly, systems for generating naturalistic visual textures in
computer graphics involve many free parameters (Portilla and Simoncelli 2000; Peyré 2009), but,
even so, graphics do not convincingly resemble natural surfaces. It is therefore intriguing that
cryptic camouflage often matches the background so well (Figure 41.1).
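Second-order statistics of the kind Julesz studied can be held fixed while all higher-order structure is destroyed by keeping an image's Fourier amplitude spectrum and substituting random phases; this is the manipulation behind the phase-randomized cuttlefish background in Figure 41.1. A minimal sketch (Python/NumPy; the helper name is ours):

```python
import numpy as np

def phase_scramble(img, rng=None):
    """Return an image with the same Fourier amplitude spectrum as `img`
    (identical second-order statistics) but with randomized phase."""
    rng = np.random.default_rng() if rng is None else rng
    amplitude = np.abs(np.fft.fft2(img))
    # Take the phases from the transform of real white noise: they have the
    # Hermitian symmetry needed for a real-valued inverse transform.
    noise = rng.standard_normal(img.shape)
    noise_phase = np.exp(1j * np.angle(np.fft.fft2(noise)))
    noise_phase[0, 0] = 1.0              # keep the mean (DC term) unchanged
    return np.real(np.fft.ifft2(amplitude * noise_phase))
```

Because the amplitude spectrum is untouched, the scrambled image is indistinguishable from the original by any purely second-order measure, so any behavioural difference it elicits must reflect sensitivity to higher-order structure.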
Hanlon (2007) has proposed that three main types of camouflage pattern—which he calls
Uniform, Mottle, and Disruptive—are widespread in both aquatic and terrestrial animals. This
classification often seems to work, but the number of distinguishable backgrounds and camou-
flage patterns is much greater than three. However, it is possible that a small basis-set of patterns
can generate cryptic camouflage for a wide range of backgrounds (Julesz 1984). Coloration pat-
terns are typically under genetic control and, at least in the wings of butterflies and moths, a small
number of developmental mechanisms underlie much diversity (Beldade and Brakefield 2002).
An animal lineage with a suitable ‘basis-set’ of genetically defined patterns would perhaps be able
to evolve camouflage for a range of natural backgrounds. Certainly, the coat pattern variation
in all living cat species does not seem to be heavily constrained by taxonomic similarity (Allen
et al. 2011). Instead, the colour variation, which could plausibly be generated by slight changes in
the reaction-diffusion equations underlying pattern development, has readily switched between
spots, stripes, and uniform fur in relation to habitat type.

Physiologically Controlled Coloration


Flatfish and cuttlefish provide direct evidence for the range of spatial patterns needed for cam-
ouflage. These bottom-living marine animals use a limited set of patterns or local features, whose
contrast is varied under rapid physiological control (Figure 41.1). Both groups alter their appear-
ance under visual control to produce superb camouflage, over a few minutes for flatfish or less
than a second for cuttlefish. In terms of ecology, the ability to change colour rapidly has major benefits: it increases the range of habitats in which an animal can be concealed, and rapid colour change can itself be employed as a distraction tactic, or to prevent the adversary from developing a search image (Hanlon, Forsythe, and Joneschild 1999; Bond and Kamil 2006). In terms of how camouflage patterns work, it matters little whether the colours are produced by
chromophores under neural control (as in cephalopods), fixed pigments in skin, hair, feathers, or
a shell, or from an artist’s palette. What colour-changing animals do give us is a powerful experi-
mental system for asking the animal itself what matters for concealment.

Flatfish Patterns
Three studies have looked at how flatfish vary their visual appearance (Figure 41.1A). We encourage the
reader to view images of these animals via the internet. Saidel (1988) found that two North American
species, the southern flounder (Paralichthys lethostigma) and the winter flounder (Pseudopleuronectes
americanus), control the level of expression of a single pattern in response to varying backgrounds.
Both species control the contrast in a pattern of dark and light, somewhat blurred, spots roughly
10 mm across. In Paralichthys both the mean reflectance and the contrast of the background influence
the coloration, and the maximum contrast across the body ranged from 14% to 70% (Saidel 1988).
Another North Atlantic species, the plaice (Pleuronectes platessa; Figure 41.1A; Kelman, Tiptus, and
Osorio 2006), has an advantage over the southern and winter flounders in that it can add two patterns
to a fairly uniform ‘ground’ pattern. One of these patterns consists predominantly of about thirty small (<5 mm diameter) dark and light spots in roughly equal numbers; the other consists of blurred dark blotches, which form a low-frequency grating-like pattern. The fish mixes these two patterns freely,
changing appearance over the course of a few minutes according to the visual background.
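Contrast figures such as Saidel’s 14% to 70% are commonly expressed as Michelson contrast, although we cannot be certain which definition that study used. A one-line sketch (the helper name is ours):

```python
import numpy as np

def michelson_contrast(img):
    """Michelson contrast of a greyscale pattern: (Lmax - Lmin) / (Lmax + Lmin)."""
    lo, hi = float(img.min()), float(img.max())
    return (hi - lo) / (hi + lo)
```

On this definition a uniform surface has zero contrast, and a pattern spanning the full range from black to white has a contrast of 1 (100%).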
The most elaborate adaptive coloration described in a fish is for the eyed flounder Bothus ocellatus. When Ramachandran and co-workers (1996) analysed Fourier-transformed images of the
fish, they found that three principal components accounted for the range of patterns that the ani-
mals could display in their aquaria. The authors describe the components as composed of a ‘low vs
high’ spatial frequency channel, a medium spatial frequency channel, and a narrow-band channel
at eight cycles per fish. It is not easy to relate these principal components, defined in terms of spatial frequency, directly to body patterns, but the eight-cycle per fish channel probably corresponds
to a regular pattern of dark blotches much like those on the plaice (Figure 41.1A; Ramachandran
et al. 1996, Figure 41.1C). Another pattern corresponds to the roughly 100 light annular (or ‘ocel-
lar’) features and a smaller number (about thirty) of dark annuli that give this fish its name. In
addition, the fish can display a finer-grained gravel-like texture. Apart from the evidence for three
principal components, the fish can apparently display isolated features, such as a single dark spot.
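A principal-components analysis of this kind treats each photograph as a point in a high-dimensional space and asks how few axes capture the variation. A minimal sketch via the singular value decomposition (Python/NumPy; function and variable names are ours, and this illustrates the generic method rather than Ramachandran and co-workers' exact pipeline, which operated on Fourier-transformed images):

```python
import numpy as np

def pattern_components(frames, n_components=3):
    """PCA of a stack of registered pattern images, shape (n_frames, h, w).
    Returns component images, per-frame scores, and explained-variance ratios."""
    n, h, w = frames.shape
    X = frames.reshape(n, -1).astype(float)
    X = X - X.mean(axis=0)               # centre each pixel across frames
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    components = vt[:n_components].reshape(n_components, h, w)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return components, scores, explained[:n_components]
```

If an animal's repertoire really is built from a small basis-set of patterns, the explained-variance ratios fall off sharply after the first few components, as reported for Bothus.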
Ramachandran and co-workers (1996) pointed out that the eyed flounder lives in shallow tropi-
cal water, which is relatively clear. They suggested that this could explain why it has a more elaborate
coloration system than the southern and winter flounders, which have only one degree of freedom
in their pattern: changing contrast. It is tempting to suggest—though without direct evidence—that
flatfish use one, two, or three basic patterns according to the visual environment in which they live.
Fish that live in clearer water of more varied habitats would benefit from a greater range of patterns.
Shohet and co-workers (2007) make a similar proposal for different cuttlefish species.

Cuttlefish
Although flatfish often have good camouflage, their adaptive coloration is much simpler than that
of cephalopod molluscs, especially octopuses and cuttlefish (Figure 41.1B). These animals change
their skin coloration under visual control in a fraction of a second, and can even produce moving
patterns of dark bands. Observation of cuttlefish coloration patterns, produced in response to
varying backgrounds, allows unique insights into the vision of these extraordinary molluscs—and
of their adversaries, especially teleost fish (Langridge, Boon, and Osorio 2007).
European cuttlefish (Sepia officinalis) body patterns are produced by the controlled expression of
about forty visual features known as behavioural components, and they can also control the physical
texture of their skin (Hanlon and Messenger 1988). The level of expression of each component can
be varied in a continuous manner (Kelman, Osorio, and Baddeley 2008). Our unpublished principal
components analysis of the coloration patterns displayed on a large range of natural backgrounds
indicates that there are at least six degrees of freedom in the range of cryptic patterns produced by
cuttlefish (see also Crook, Baddeley, and Osorio 2002). This is suggestive of great flexibility and inde-
pendent control of the separate pattern components, which must be matched by a corresponding
visual ability. At present, however, the way in which the expression of these patterns is coordinated,
and the full range of camouflage patterns produced in natural conditions, remains poorly studied.
Hanlon and Messenger (1988) suggested that five main body patterns are used for camouflage.
These were called: Uniform Light, Stipple, Light Mottle, Dark Mottle, and Disruptive. The reader
should note that the terms for body patterns are capitalized to distinguish them from camou-
flage mechanisms. In particular, it is not certain that the Disruptive pattern works as disrup-
tive rather than cryptic camouflage (Ruxton et al. 2004b; Zylinski and Osorio 2011). As we have
mentioned, Hanlon (2007) has identified three basic types of pattern in cephalopods and other
animals: Uniform, Mottle, and Disruptive. In experimental aquaria, most cuttlefish patterns can
indeed be classified by a combination of mottle and disruptive elements, which is comparable
to the two degrees of freedom seen in the plaice (Figure 41.1). The ‘disruptive’ pattern compo-
nents, defined by expert human observers, include about ten comparatively large well-defined
light and dark features, including a white square on the centre of the animal and a dark head
bar (Figure 41.1B; Hanlon and Messenger 1988; Chiao, Kelman, and Hanlon 2005). The mottle
pattern comprises less crisply defined features, and is comparable to the blotches used by flatfish
(Hanlon and Messenger 1988).

Selection of Coloration Patterns by Cuttlefish


The cuttlefish’s capacity to alter its appearance according to the visual background allows us to
investigate the animal’s spatial vision. Most obviously, one can test the effects of varying a specific
image parameter in the background. Studies have used both printed patterns, such as checker-
boards (Figure 41.1B; Chiao and Hanlon 2001; Zylinski, Osorio, and Shohet 2009a), and more
natural substrates, such as sand, gravel, and stones (Marshall and Messenger 1996; Shohet et al.
2007; Barbosa et al. 2008). Patterns have been designed to test the animals’ sensitivity to low-level
visual parameters, including colour, spatial frequency, contrast, orientation, and spatial phase
Camouflage and Perceptual Organization in the Animal Kingdom 851

(Marshall and Messenger 1996; Zylinski and Osorio 2011), or local features such as edges, objects,
and depth cues (e.g. Chiao, Chubb, and Hanlon 2007; Zylinski et al. 2009a, 2009b). This work is
reviewed elsewhere (Kelman et al. 2008; Hanlon et al. 2011; Zylinski and Osorio 2011), but the
main conclusions are as follows. Regarding low-level image parameters, cuttlefish are sensitive
to mean reflectance, contrast, spatial frequency, and spatial phase (Kelman et al. 2008). They are
sensitive to orientation, but this affects the body and arm orientation rather than the pattern
displayed (Shohet et al. 2006; Barbosa et al. 2011). Cuttlefish are sensitive both to the presence
of local edges (Zylinski et al. 2009a, 2009b), and to whether the spatial organization of local edge
fragments is consistent with the presence of objects (Zylinski et al. 2012). Cuttlefish are sensitive
to visual depth and pictorial cues consistent with visual depth (Kelman et al. 2008). Often the
contrast of the coloration patterns is varied to match approximately the contrast in the background
(Kelman et al. 2008; Zylinski et al. 2009a). Despite their mastery of camouflage, cuttlefish
are colour-blind, having only one visual pigment (Marshall and Messenger 1996; Mäthger et al.
2006), but this deficiency seems to be of little detriment to camouflage (Chiao et al. 2011),
presumably because reflectance spectra of their natural backgrounds have a simple and predictable
form (yellows-through-browns), where reflectance increases monotonically with wavelength and,
as such, the colour is well predicted by luminance.
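The logic of that last claim can be illustrated numerically. The sketch below generates a hypothetical family of monotonically rising 'yellow-through-brown' reflectance spectra and checks that a crude luminance statistic correlates strongly with a crude long-versus-short-wavelength colour signal; the spectra and both summary statistics are our own toy assumptions, not measured substrate data or a model of any real photoreceptor.

```python
import numpy as np

rng = np.random.default_rng(5)
wavelengths = np.linspace(400, 700, 61)  # nm

# Hypothetical substrate spectra: reflectance rises monotonically with
# wavelength, with random overall scale and steepness per spectrum.
n_spectra = 200
scale = rng.uniform(0.1, 0.9, n_spectra)
steepness = rng.uniform(2.0, 8.0, n_spectra)
x = (wavelengths - 400.0) / 300.0
spectra = scale[:, None] / (1.0 + np.exp(-steepness[:, None] * (x - 0.5)))

# Crude summaries: "luminance" as mean reflectance, "colour" as a
# long-minus-short wavelength opponent signal.
luminance = spectra.mean(axis=1)
colour = (spectra[:, wavelengths > 550].mean(axis=1)
          - spectra[:, wavelengths < 550].mean(axis=1))

r = np.corrcoef(luminance, colour)[0, 1]
print(f"correlation between luminance and colour signal: r = {r:.2f}")
```

Because every spectrum in this family rises monotonically, brighter patches are also systematically 'yellower', so the two statistics are largely redundant, which is the sense in which luminance predicts colour.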
Many of the cuttlefish’s responses can be interpreted on the basis that the animals express the
Disruptive pattern on a background composed of discrete objects, whose size approximates that
of the ‘white square’ pattern component, and the Mottle on a textured surface (Figure 41.1B). It
is striking how many image parameters, local features, and higher-level information are used to
make this seemingly simple decision. This is reminiscent of the multiple mechanisms that humans
use for figure-ground segregation (Kelman et al. 2008; Zylinski and Osorio 2011; Zylinski et al.
2012; see also Peterson this volume).

Symmetry
Almost all mobile animals have a clear plane of symmetry, usually bilateral—flatfish are an obvious
exception—and symmetries of both the outline and the surface patterning are known Gestalt cues
for perceptual organization (van der Helm this volume). The absence of simple planes of symmetry
in most natural backgrounds is therefore a potential problem for cryptic animals. Indeed,
Cuthill and co-workers (Cuthill, Hiby, and Lloyd 2006; Cuthill et  al. 2006) showed that birds
found symmetrically coloured camouflaged prey more rapidly than asymmetric patterned prey,
although not all symmetrical patterns are necessarily equally easy to detect (Merilaita and Lind
2006). This makes it rather perplexing that more animals have not evolved asymmetric patterning,
although, in insects at least, there may be genetic or developmental constraints that make
it hard for surface pattern and underlying body plan to be decoupled. Selection experiments for
changed wing shape in butterflies produce tightly correlated changes in colour pattern (Monteiro,
Brakefield, and French 1997). Thus the genetic control of morphological symmetry, which is
probably constrained by locomotor requirements, seems tightly linked to surface patterning (see
discussion in Cuthill, Stevens, et al. 2006). Regularity could be expected to be another feature that
predators use to break camouflage, and blue tits find prey with spatially regular patterns more
rapidly (Dimitrova and Merilaita 2012).
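One simple way to quantify the symmetry cue discussed above is the correlation between a surface pattern and its mirror image. The sketch below is our own toy illustration on random arrays, not a measure used in any of the cited experiments; the function name and test patterns are hypothetical.

```python
import numpy as np

def symmetry_index(pattern: np.ndarray) -> float:
    """Correlation between a 2D pattern and its left-right mirror image.

    Returns 1.0 for a perfectly bilaterally symmetric pattern and a value
    near 0 for a pattern unrelated to its own reflection.
    """
    mirrored = pattern[:, ::-1]
    a = pattern - pattern.mean()
    b = mirrored - mirrored.mean()
    return float((a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum()))

rng = np.random.default_rng(1)
half = rng.random((32, 16))
symmetric = np.hstack([half, half[:, ::-1]])  # mirror the left half
asymmetric = rng.random((32, 32))

print(symmetry_index(symmetric))   # exactly 1.0 by construction
print(symmetry_index(asymmetric))  # close to 0
```

A predator exploiting symmetry would, in effect, be computing something like this index over candidate midlines; an asymmetric camouflage pattern keeps the index low everywhere.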

The Problem of Multiple Backgrounds


In trying to understand the complex colour patterns of animals that cannot change their
appearance, Thayer (1909) painted background scenes as viewed through animal-shaped sten-
cils: a duck-shaped segment of lakeside, a fish-shaped portion of sea-grass. Interpreting animal
852 Osorio and Cuthill

camouflage as sampling the background was a major conceptual advance, but the question
arises:  what sort of background sample is optimal? Endler (1978, 1984, 1991) proposed that
crypsis should be defined as coloration that represents a random sample of the background at the
place and time where predation risk is highest. Others have argued that a random sample is not
necessarily optimal (Merilaita, Tuomi, and Jormalainen 1999; Merilaita, Lyytinen, and Mappes
2001; Ruxton et al. 2004b), supported by experiments showing that not all random samples are
equally concealed (Merilaita et al. 1999). If the background is heterogeneous and a single sample
must be chosen (i.e. no colour change by an individual), what is the best sample? Natural selection
will favour the pattern with the minimum average detectability across all backgrounds it may be
viewed against. The sample that is the minimum average difference from all possible backgrounds
against which it might be viewed is the most likely sample (in the sense of statistical likelihood),
not any random sample (Cuthill and Troscianko 2009). Defining such a maximum likelihood
sample is straightforward for a single perceptual dimension, but not for multiple dimensions
and not when low-level attributes such as colours, lines, and textures have been integrated into
features. However, if we accept that such a ‘most likely’ pattern can be defined, three evolutionary
outcomes can be imagined: selection for a single, ‘typical’, specialist colour pattern; negative fre-
quency dependent selection (i.e. the predation intensity on any one pattern—phenotype—varies
with the relative abundance of that phenotype, such that rare phenotypes have an advantage and
common phenotypes are at a disadvantage) for multiple patterns matching different, common,
backgrounds; or selection for a single, ‘compromise’, pattern that combines possible backgrounds
as a weighted average. The best strategy will depend on how relative discriminability varies across
the multiple backgrounds (Merilaita et  al. 1999; Houston, Stevens, and Cuthill 2007). Loosely
speaking, similar backgrounds favour a compromise ‘average’ coloration, while the possibility of
being seen against rather different substrates favours a single specialist pattern, or divergent selec-
tion for multiple specialist patterns. In an ingenious experiment where captive blue jays searched
for computer-generated prey, whose coloration was controlled by a genetic algorithm and so
could evolve in response to the birds’ predation success, Bond and Kamil (2006) showed that a
fine-grained homogeneous background selected for a single prey colour whereas coarse-grained
heterogeneous backgrounds selected for polymorphism (multiple types). However, without a
metric for perceived contrast between different textures, the question of which backgrounds count
as ‘similar’ or ‘different’ has to be answered empirically on a case-by-case basis. This
is an important area for future research and relates directly to the need for a mechanism-rooted
theory of texture perception.
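For the single perceptual dimension where the text notes the definition is straightforward, the contrast between Endler's random sample and the 'most likely' sample can be made concrete. The sketch below uses entirely hypothetical numbers and a crude stand-in for detectability (mean absolute difference along one dimension), not any published model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical backgrounds characterized along a single perceptual
# dimension (say, lightness): two common substrate types.
backgrounds = np.concatenate([rng.normal(0.3, 0.05, 500),
                              rng.normal(0.7, 0.05, 500)])

# Candidate colorations along the same dimension.
candidates = np.linspace(0.0, 1.0, 101)

# Crude 1D stand-in for detectability: mean absolute difference between a
# candidate coloration and every background it might be viewed against.
mean_diff = np.abs(candidates[:, None] - backgrounds[None, :]).mean(axis=1)
best = float(candidates[np.argmin(mean_diff)])

# Endler-style random sample of the background, for comparison.
random_sample = float(rng.choice(backgrounds))
print(f"minimum-average-difference sample: {best:.2f}")
print(f"one random background sample:      {random_sample:.2f}")
```

Under this distance the optimum sits near the median of the background distribution, between the two substrate modes, illustrating the 'compromise' outcome; a different detectability model, or more widely separated substrates, could instead favour a specialist or polymorphic solution.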
The similarity to the background is not the only factor affecting detectability of a target. The
complexity of the background also affects visual search; that is, locating the target depends not
only on target-distractor similarity but also on the amount of variation among background features
that are similar to the target (Duncan and Humphreys 1989). As a result, a camouflaged animal
may be better concealed in more complex habitats independent of its match to the background
(Merilaita et al. 2001; Merilaita 2003; Dimitrova and Merilaita 2010). In line with this, there is
recent evidence for animals choosing backgrounds that are not merely a good match to their own
patterns, but that are more visually complex (Kjernsmo and Merilaita 2012).

Obscuring Edges
The previous section has dealt with how visual textures in camouflage patterns match the back-
ground but, even when there is a close match, visual discontinuities at edges can reveal the outline
of an object or salient features within the object. The latter can include phase differences at the
conjunction of body parts (e.g. limbs against body) or features, such as eyes or their components,
with a contour unlike those in the background. One strategy to obscure edges, which is used by
flatfish and cuttlefish, is to have partially transparent marginal fins that also continue the body
pattern, and hence merge the body into the background (Figure 41.1); partial burying has a simi-
lar effect.
Much better known are disruptive patterns, where colour is used to disguise or distract atten-
tion from the true outline of the animal or salient body parts, and hence to defeat figure-ground
segregation. Thayer (1909) was the first to outline what Cott (1940, p. 47) said were ‘certainly
the most important set of principles relating to concealment’. Both Thayer and Cott were art-
ists, having an intuitive understanding of the use of shading to create false perceptions of shape,
form, and movement, and both were active in campaigning for the adoption of camouflage by
the military in, respectively, the First and Second World Wars (Behrens 2002, 2011). Cott greatly
refined Thayer’s original ideas, and he produced a battery of illustrations from across the animal
kingdom to explain how disruption could work and plausibly to illustrate its action in nature
(Figure 41.2A). However, as recent researchers have realized, the term ‘disruptive coloration’ actu-
ally comprises several mechanisms, and some of those discussed by Thayer and Cott as disruptive
are better classified differently (Stevens and Merilaita 2009). We discuss these in turn.
For Thayer (1909), the central thesis was a paradox: that apparently conspicuous colours could
be concealing. This included patterns we now regard as classic disruptive coloration (he used the
term ‘ruptive’), namely the use of adjacent high-contrast colours to break up shape and form, but
he also extended the principle to patterns that do not conceal but instead deceive in other ways.
For example, the idea that high-contrast patterns could interfere with motion perception and
otherwise confuse attackers is discussed later in the section on Concealing Motion.
‘True’ disruptive coloration, for concealment per se, works against object detection by percep-
tual grouping, but, as Merilaita (1998) clarified, it employs mechanisms above and beyond back-
ground matching. Indeed, in Cott’s (1940) original formulation, it is essential that some colour
patches do not resemble colour patches found in the background; in our own treatment of dis-
ruptive coloration we relax this constraint. For Cott, two components were vital and, although he
did not make the connection, they relate directly to principles of perception. First, some colour
patches must match the background; second, some colour patches must contrast strongly
with the first patch type(s) and, in Cott’s and Thayer’s views, also with the background. Cott
called this ‘differential blending’, and we can see this as working against perceptual grouping of
the target by colour similarity. The background matching of some patches creates a weak bound-
ary between the animal and its surround at these junctions. The high and sharp contrast between
other patches on the animal and these background-matching regions creates strong false edges
internal to the animal’s boundary. The effect is that, for the viewer, some colour patches on the ani-
mal are statistically more likely to belong to the background than they are to each other (Cuthill
and Troscianko 2009).
In order to disrupt the outline of the animal, the prediction is that the contrasting colour patches
should intersect the edge of the animal more often than expected if the animal’s pattern were simply
a random sample of the background texture. That is, if the animal’s true outline is interrupted
by high contrast, ‘strong’ pseudo-edges that are perpendicular to the animal’s boundary, then the
viewer gets powerful conflicting evidence for edges that are not consistent with the continuous
outline of a prey item. Merilaita (1998) showed this to be true of the dark and light colour patches
on a marine isopod crustacean. More recently, the efficacy of disruptive patterning against birds
has been demonstrated by using simulated wing patterns on artificial moth-like baited targets
pinned to trees (Cuthill et al. 2005). This study showed that colour blocks that intersected the edge
of the ‘wing’ reduced the rate of attacks on the models compared to otherwise similar controls
with only internal patterning or uniform coloration. A computer-based experiment
using the same sort of targets on pictures of tree bark replicated the results with humans (Fraser
et al. 2007), suggesting that the perceptual mechanisms being fooled are common across birds
and humans. The most plausible mechanism is continuity of strong edges, which suggests a bounding contour.
Consistent with this, it is striking that edges in camouflage patterns are often ‘enhanced’ with a
light margin to pale regions and a dark margin to dark regions (Figure 41.2B), a fact remarked
upon by Cott (1940). One possible interpretation (Osorio and Srinivasan 1991) is that such fea-
tures strongly excite edge detectors without unduly compromising cryptic camouflage. With this
in mind, Stevens and Cuthill (2006) analysed in situ photographs of the experimental targets used
in the bird predation experiments of Cuthill et al. (2005), appropriately calibrated for avian colour
vision. Using a straight-line detector from machine vision, the Hough transform, allied to a physi-
ologically plausible edge detector, the Marr-Hildreth Laplacian-of-Gaussian, Stevens and Cuthill
(2006) showed that edge-intersecting disruptive coloration defeated target detection, compared to
non-disruptive controls, in a pattern similar to the observed bird predation (Figure 41.3).
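The image-analysis pipeline just described can be approximated with standard tools. The sketch below is a minimal stand-in, not Stevens and Cuthill's (2006) actual code or calibration: a Laplacian-of-Gaussian filter (via scipy) marks candidate edges in a toy image, and a simple Hough accumulator then recovers the dominant straight line. The image, threshold, and parameter values are all our own illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# Toy image: a dark horizontal stripe on a light background, standing in
# for a high-contrast 'false edge' in a camouflage pattern.
img = np.full((64, 64), 0.8)
img[30:34, :] = 0.2

# Marr-Hildreth step: Laplacian-of-Gaussian filtering; pixels with a
# strong response are marked as candidate edges.
log = gaussian_laplace(img, sigma=2.0)
edges = np.abs(log) > 0.5 * np.abs(log).max()

# Hough transform: every edge pixel votes for all lines (theta, rho)
# passing through it, with rho = x*cos(theta) + y*sin(theta).
thetas = np.deg2rad(np.arange(180))
ys, xs = np.nonzero(edges)
rhos = np.round(xs[:, None] * np.cos(thetas)
                + ys[:, None] * np.sin(thetas)).astype(int)
diag = int(np.ceil(np.hypot(*img.shape)))
accumulator = np.zeros((2 * diag + 1, len(thetas)), dtype=int)
np.add.at(accumulator,
          (rhos + diag, np.broadcast_to(np.arange(180), rhos.shape)), 1)

# The strongest line should be horizontal (theta = 90 degrees) at the stripe.
rho_idx, theta_idx = np.unravel_index(np.argmax(accumulator), accumulator.shape)
print(f"strongest line: theta = {theta_idx} deg, rho = {rho_idx - diag}")
```

In the published analysis the same kind of accumulator peak, computed on calibrated photographs, was weakest for the edge-disrupted targets, matching the bird predation results.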
A camouflaged animal’s outline is not the only potentially revealing feature; mismatches in
the phases of patterns on adjacent body parts, or the distinctive colour and shape of an eye are
also salient features for a predator. Cott (1940) illustrated species, from birds to fish, that have eye
stripes that match the colour of the pupil or iris, effectively forming a background with which the
eye blends. He also noted species with stripes bisecting the eye, using disruption to break up the
circular shape. Similarly, he illustrated frogs whose complex body patterns matched seamlessly on
different parts of the folded leg when sitting hunched up (Figure 41.2A). He called this coincident
disruptive coloration, the adjacency of strong contrasts creating false bounding contours span-
ning different body parts. Recently the effectiveness of coincident disruptive coloration in con-
cealing separate body regions has been experimentally verified in the field, using artificial targets
under bird predation (Cuthill and Székely 2009).
The resurgence in interest in Cott’s theories has focused mainly on concealment of the body’s
edge through peripherally placed disruptive colour patches. As we have discussed, the effects can
be explained as exploiting low-level visual processes, namely edge detection and contour inte-
gration. However, Cott’s and subsequent accounts make frequent reference to disruptive colora-
tion distracting attention from the body’s edge, through internally placed coherent ‘false shapes’
that contrast strongly with the surrounding body coloration. Cott called this ‘surface disruption’
and Stevens and others (2009) showed that this can be as effective as, or more effective than, edge disruption
against avian predators. It is not clear whether the mechanism is actually diversion of attention, or
a lower-level process such as simultaneous contrast masking nearby (true) edges. Indeed, Cott’s
suggestion that small, highly conspicuous ‘distraction marks’ could decrease predation by dis-
tracting attention has rather equivocal support. One might imagine that if the marks are both
conspicuous and uniquely borne by prey, predators would learn to use these cues to detect prey.
This is what has been found in field experiments on birds searching for artificial prey (Stevens,
Graham et al. 2008). However, in laboratory experiments on birds where trials were intermixed
and there was a correspondingly reduced potential to learn that a mark was a perfect predictor of
prey presence, distraction marks reduced detection (Dimitrova et al. 2009).
There are a number of open questions about disruptive camouflage. Disruptive coloration
is sometimes discussed as if it were a strict alternative to background matching. It is certainly
true that seemingly disruptive camouflage patterns have a high visual contrast, and Cott (1940)
argued for a principle of ‘maximum disruptive contrast’, in which, subject to some patches matching
the background (‘differential blending’), the remaining colour patches should be maximally
contrasting with these, and unlike background colours. However, in principle there is no reason
why features that distract from the natural outline of an animal should not present the same level
of contrast as background objects, as is probably the case for the cuttlefish Disruptive pattern
(Mäthger et al. 2006; Kelman et al. 2008; Zylinski et al. 2009a); indeed all military camouflage
patterns described as ‘disruptive’ consist of colours found in natural backgrounds. Stevens and
co-workers (2006), again using artificial moth-like prey in the field, found that bird predation was
lowest for disruptive patterns where the contrast between adjacent patches was high, but all col-
ours were within the background range. Disruptive patterns where some elements had yet higher
contrast, but were rare in the background, had increased predation, although they still fared better
than similarly coloured targets without outline-disrupting elements. In much the same way, when
humans search for similar targets on computer screens, if some prey patch colours are not found
in the background, detectability increases regardless of high internal contrast (Fraser et al. 2007).
The conclusion is that high contrast between adjacent patches is beneficial for the creation of false
bounding contours, but that, contrary to Cott’s suggestion, contrast is constrained by the need to
match common background colours.

Obscuring 3D Form
Both cryptic and disruptive camouflage are often studied from the point of view of 2D image
segregation. However, it is perfectly plausible that animals may benefit from cryptic patterns that
match the light and shade of naturally illuminated scenes, especially when the animal is larger
than the objects that make up the background. The intensity difference between objects in shadow
compared to directly illuminated surfaces can be very much larger than that between reflective
surfaces under uniform illumination, but to our knowledge no one has attempted to establish how
the dynamic range of camouflage patterns matches the intensity range of surfaces such as leaves
or stones or their shadows.
Although there are few if any direct studies, it seems plausible that some camouflage patterns
produce a disruptive effect whereby a continuous body surface is seen as lying in different depth
planes. For example, matte black spots or patches can appear as holes in a surface, and white fea-
tures as glossy highlights. Figure 41.2C illustrates Cott’s (1940) interpretation of the enhanced
borders as a 3D effect. A charming example of a false 3D effect is produced by cuttlefish, which
shadow the white square on their mantle to create the effect of a pebble (Langridge 2006).

Countershading
Countershading, like disruptive coloration, is a principle of camouflage that was ‘discovered’ in
the late nineteenth century (Poulton 1890; Thayer 1896), found military application in the early
twentieth century, and has recently been a subject of direct experimental study. Many animals
have a dark upper surface and a pale lower surface separated by an intensity gradient. This type of
pattern counters the effect of natural illumination gradients on the 3D body, which may benefit
camouflage. Thus, when cuttlefish rotate from their usual orientation, they move their dark and light
regions so they remain on the top and bottom body surfaces, respectively (Ferguson, Messenger,
and Budelmann 1994). Historically, the taxonomic ubiquity of such dorso-ventral gradients in
coloration was seen as evidence of the adaptive benefits of concealment of 3D form. However,
there are many adaptive reasons to have such a gradient, some of which see the colour only as an
incidental by-product of the pigment gradient: for example, protection from UV light, or resist-
ance to abrasion—because melanin toughens biological tissues (Kiltie 1988; Ruxton, Speed, and
Kelly 2004a; Rowland 2009). In fact, recent experimental studies on model ‘caterpillars’ coloured
uniformly, or with countershading or reverse countershading patterns, have demonstrated that
countershading helps concealment from birds (Rowland et al. 2007, 2008). However, the principle
by which countershading patterns achieve camouflage is less obvious. In pelagic fish it is
likely that countershading allows the animals to match the space light in the open water beyond
the animal (an effect also achieved by mirror-like scales), so the fish becomes invisible. In other
habitats countershading may either facilitate matching of the background, where the background
differs according to viewing direction (e.g. for pelagic fish, the light surface when seen from below
favours a light belly, the dark depths when seen from above favour a dark back), or conceal the 3D
form of the body through diminished self-shading. Recently Allen and co-workers (2012) compared
the pattern of fur shading predicted to counteract dorso-ventral gradients created by illumination
in different light environments with the distribution of coat colours across 114 species of
ruminants (grazing mammals such as deer, sheep, and cattle). There is a correspondence between
the observed pattern and that predicted, after controlling for possible confounding effects of
similarity due to taxonomic closeness; this lends support to the self-shadow concealment hypothesis.

Concealing Motion
The term ‘motion camouflage’ can be discussed in two contexts: crypsis, when the background
itself moves, and concealment while the animal itself is in motion. To take the first, many back-
grounds have moving elements—leaves in the wind, seaweed in the tide—and an otherwise
background-matching, but static, animal may be revealed by its failure to match the motion sta-
tistics of the background. The swaying, stop-start motion of a chameleon or praying mantis seems
to mimic the rocking of leaves and twigs in the breeze, and the lack of consistent linear motion
towards the prey may itself reduce salience. Analysis of the movements of an Australian lizard, the
jacky dragon Amphibolurus muricatus, shows that when it signals to other members of its species,
its motion statistics move well outside the background distribution, but when not signalling, its
own distribution falls within that of the background (Peters and Evans 2003; Peters, Hemmi, and
Zeil 2007). Cuttlefish reduce the contrast in their body patterns during motion (Zylinski, Osorio,
and Shohet 2009c), perhaps because the high contrast edges seen in disruptive patterning are
more easily detected in motion.
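Comparisons of this kind, between an animal's motion statistics and the background's, can be made with a simple overlap measure. The sketch below uses entirely hypothetical speed distributions, loosely in the spirit of the jacky dragon analyses, not the published data; the gamma parameters and function name are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical speed samples (arbitrary units): wind-blown background
# vegetation, a stop-start 'cryptic' gait with background-like statistics,
# and a conspicuous signalling bout well outside the background range.
background = rng.gamma(2.0, 1.0, 5000)
cryptic_motion = rng.gamma(2.2, 1.0, 5000)
signal_motion = rng.gamma(9.0, 1.5, 5000)

def hist_overlap(a, b, bins=50, upper=40.0):
    """Shared area of two normalized speed histograms (1 = identical)."""
    edges = np.linspace(0.0, upper, bins + 1)
    ha, _ = np.histogram(a, bins=edges, density=True)
    hb, _ = np.histogram(b, bins=edges, density=True)
    return float(np.minimum(ha, hb).sum() * (edges[1] - edges[0]))

o_cryptic = hist_overlap(cryptic_motion, background)
o_signal = hist_overlap(signal_motion, background)
print(f"cryptic vs background overlap: {o_cryptic:.2f}")
print(f"signal vs background overlap:  {o_signal:.2f}")
```

A cryptic mover keeps this overlap high, whereas a signalling animal, as in the jacky dragon study, pushes its distribution well outside the background's.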
The second issue is whether a moving animal can remain concealed. Many facts point to the
conclusion that motion breaks camouflage. Correlated motion is a strong cue to grouping, so
that an otherwise highly camouflaged object is readily segregated from the background because
its pattern elements share a common fate absent in otherwise identical background elements.
Experiments on the detection of targets on complex backgrounds indicate that, for single targets,
neither background matching nor disruptive camouflage offers any benefit (Hall et al. 2013). This
would explain why big cats stalking prey, and soldiers moving across open ground, alternate
stealthy motion with frequent pauses.
If the need for motion precludes concealment, other means of defence must be used (e.g. capac-
ity for flight, defensive spines, or toxins), some of which involve the use of colour. Warning colours
associated with unpalatability, or mimicry of such patterns, fall outside the remit of this chapter
(instead see, e.g., Ruxton et al. 2004b), but coloration designed to confuse or deceive has histori-
cally, although erroneously, been bracketed within disruptive coloration and so we discuss it briefly
here. For example, the idea that high-contrast patterns could interfere with judgment of velocity
and otherwise confuse attackers goes back to Thayer (1909); the tactic became
known as ‘dazzle’ coloration when deployed on ships during both World Wars (see Williams 2001;
Behrens 2002). Part of the alleged success was attributed to interference with the optical range
finding used on U-boats, but the difficulty of judging speed and trajectory has also been cited
(Williams 2001; Behrens 2002). The mechanism(s) by which such patterns have their effects is
less clear, because perception of speed is affected by many factors, notably size, contrast, and
texture orientation (see Scott-Samuel et al. 2011). Dazzle patterning may work through any or
all of such factors. Recent research shows that the perceptual distortions created by high-contrast
stripes can be quite significant for speed (Scott-Samuel et al. 2011) and can affect capture success
(Stevens, Yule, and Ruxton 2008). This can be added to the (long) list of proposed evolutionary
explanations for zebra stripes (see, e.g., Cloudsley-Thompson 1999; Caro 2011). Thayer (1909)
argued that the stripes matched the vertical patterning created by savannah grasses, and so func-
tion through background matching, but Godfrey, Lythgoe, and Rumball (1987), through Fourier
analysis, showed that zebra stripes, unlike tiger stripes, were a poor match to the background.
Alternatively, given that zebra live in herds, the stripes could serve both a background-matching
and disruptive function, if the background is considered to be other zebras. Ironically, given their
frequent occurrence in discussions on camouflage, the only function for zebra stripes that has
been experimentally tested is their effectiveness in repelling biting flies (Waage 1981; Egri et al.
2012; Caro et al. 2014).

Conclusions
The scientific study of animal camouflage and the development of Gestalt psychology drew
heavily from each other in the first half of the twentieth century. Nature provides compelling
examples of the sort of problems a visual system has to solve in separating figure from ground
and in identifying relevant objects for attention. To explain the form of animal camouflage,
it remains essential to understand not only the photoreceptors of the animal from which the
target seeks concealment (photoreceptors which may be very different in number and tuning
from our own), but also the cognitive processes behind perception. It is clear that features
such as disruptive coloration and edge enhancement, coincidence of colour patches across
adjacent body parts, and gradients in shading that counter illumination gradients, to name
but a few, are adaptations against the Gestalt principles used in object segregation. In turn, we
believe that animal camouflage offers an excellent model system in which to test the general-
ity of these principles beyond Homo sapiens.

References
Allen, W. L., R. Baddeley, I. C. Cuthill, and N. E. Scott-Samuel (2012). ‘A Quantitative Test of the
Predicted Relationship between Countershading and Lighting Environment’. Am. Nat. 180: 762–776.
Allen, W. L., I. C. Cuthill, N. E. Scott-Samuel, and R. Baddeley (2011). ‘Why the Leopard Got Its
Spots: Relating Pattern Development to Ecology in Felids’. Proc. R. Soc. B 278: 1373–1380.
Barbosa, A., L. M. Mäthger, K. C. Buresch, J. Kelly, C. Chubb, et al. (2008). ‘Cuttlefish Camouflage: The
Effects of Substrate Contrast and Size in Evoking Uniform, Mottle or Disruptive Body Patterns’. Vision
Res. 48: 1242–1253.
Barbosa, A., J. J. Allen, L. M. Mäthger, and R. T. Hanlon (2011). ‘Cuttlefish Use Visual Cues to Determine
Arm Postures for Camouflage’. Proc. R. Soc. B 279: 84–90.
Behrens, R. R. (2002). False Colors: Art, Design and Modern Camouflage. Dysart, IA: Bobolink Books.
Behrens, R. R. (2011). ‘Nature’s Artistry: Abbott H. Thayer’s Assertions about Camouflage in Art, War and
Nature’. In Animal Camouflage: Mechanisms and Function, edited by M. Stevens and S. Merilaita, pp.
87–100. Cambridge: Cambridge University Press.
Beldade, P. and P. M. Brakefield (2002). ‘The Genetics and Evo-Devo of Butterfly Wing Patterns’. Nature
Reviews Genetics 3: 442–452.
Bennett, A. T. D., I. C. Cuthill, and K. Norris (1994). ‘Sexual Selection and the Mismeasure of Color’.
Am. Nat. 144: 848–860.
Bond, A. B. and A. C. Kamil (2006). ‘Spatial Heterogeneity, Predator Cognition, and the Evolution of Color
Polymorphism in Virtual Prey’. Proc. Nat Acad. Sci. USA 103: 3214–3219.
Caro, T. (2011). ‘The Functions of Black-and-White Colouration in Mammals’. In Animal
Camouflage: Mechanisms and Function, edited by M. Stevens and S. Merilaita, pp. 298–329.
Cambridge: Cambridge University Press.
Caro, T., A. Izzo, R. C. Reiner, H. Walker, and T. Stankowich. (2014). ‘The Function of Zebra Stripes’.
Nat. Commun. 5: 3535.
Chiao, C.-C. and R. T. Hanlon (2001). ‘Cuttlefish Camouflage: Visual Perception of Size, Contrast
and Number of White Squares on Artificial Substrata Initiates Disruptive Coloration’. J. Exp. Biol.
204: 2119–2125.
Chiao, C.-C., E. J. Kelman, and R. T. Hanlon (2005). ‘Disruptive Body Patterning of Cuttlefish (Sepia
officinalis) Requires Visual Information Regarding Edges and Contrast of Objects in Natural Substrate
Backgrounds’. Biological Bulletin 208: 7–11.
Chiao, C.-C., C. Chubb, and R. T. Hanlon (2007). ‘Interactive Effects of Size, Contrast, Intensity and
Configuration of Background Objects in Evoking Disruptive Camouflage in Cuttlefish’. Vision Res.
47: 2223–2235.
Chiao, C.-C., J. K. Wickiser, J. J. Allen, B. Genter, and R. T. Hanlon (2011). ‘Hyperspectral Imaging of
Cuttlefish Camouflage Indicates Good Color Match in the Eyes of Fish Predators’. Proc. Nat. Acad. Sci.
USA 108: 9148–9153.
Cloudsley-Thompson, J. L. (1999). ‘Multiple Factors in the Evolution of Animal Coloration’. Naturwiss.
86: 123–132.
Cott, H. B. (1940). Adaptive Coloration in Animals. London: Methuen.
Crook, A. C., R. J. Baddeley, and D. Osorio (2002). ‘Identifying the Structure in Cuttlefish Visual Signals’.
Phil. Trans. R. Soc. Lond. B 357: 1617–1624.
Cuthill, I. C. and A. T. D. Bennett (1993). ‘Mimicry and the Eye of the Beholder’. Proc. R. Soc. B
253: 203–204.
Cuthill, I. C., M. Stevens, J. Sheppard, T. Maddocks, C. A. Parraga, et al. (2005). ‘Disruptive Coloration
and Background Pattern Matching’. Nature 434, 72–74.
Cuthill, I. C. (2006). ‘Color Perception’. In Bird Coloration. Vol. 1: Mechanisms and Measurement, edited by
G. E. Hill and K. J. McGraw, pp. 3–40. Cambridge, MA: Harvard University Press.
Cuthill, I. C., E. Hiby, and E. Lloyd (2006a). ‘The Predation Costs of Symmetrical Cryptic Coloration’. Proc.
R. Soc. B 273: 1267–1271.
Cuthill, I. C., M. Stevens, A. M. M. Windsor, and H. J. Walker (2006b). ‘The Effects of Pattern Symmetry
on Detection of Disruptive and Background Matching Coloration’. Behav. Ecol. 17: 828–832.
Cuthill I. C. and A. Székely (2009). ‘Coincident Disruptive Coloration’. Phil. Trans. R. Soc. B 364: 489–496.
Cuthill, I. C. and T. S. Troscianko (2009). ‘Animal Camouflage: Biology Meets Psychology, Computer
Science and Art’. Int. J. Des. Nat. Ecodyn. 4(3): 183–202.
Denton, E. J. (1970). ‘On the Organization of Reflecting Surfaces in Some Marine Animals’. Phil. Trans.
R. Soc. B 258: 285–313.
Dimitrova, M., N. Stobbe, H. M. Schaefer, and S. Merilaita (2009). ‘Concealed by
Conspicuousness: Distractive Prey Markings and Backgrounds’. Proc. R. Soc. B 276: 1905–1910.
Dimitrova, M. and S. Merilaita (2010). ‘Prey Concealment: Visual Background Complexity and Prey
Contrast Distribution’. Behav. Ecol. 21: 176–181.
Camouflage and Perceptual Organization in the Animal Kingdom 859

Dimitrova, M. and S. Merilaita (2012). ‘Prey Pattern Regularity and Background Complexity Affect
Detectability of Background-Matching Prey’. Behav. Ecol. 23: 384–390.
Duncan, J. and G. W. Humphreys (1989). ‘Visual Search and Stimulus Similarity’. Psych. Rev. 96: 433–458.
Egri, A., M. Blahó, G. Kriska, R. Farkas, M. Gyurkovszky, S. Åkesson, and G. Horváth (2012)
‘Polarotactic Tabanids Find Striped Patterns with Brightness and/Or Polarization Modulation Least
Attractive: An Advantage Of Zebra Stripes’. J. Exp. Biol. 215: 736–745.
Endler, J. A. (1978). ‘A Predator’s View of Animal Color Patterns’. Evol. Biol. 11: 319–364.
Endler, J. A. (1981). ‘An Overview of the Relationships between Mimicry and Crypsis’. Biol. J. Linn. Soc.
16: 25–31.
Endler, J. A. (1984). ‘Progressive Background Matching in Moths, and a Quantitative Measure of Crypsis’.
Biol. J. Linn. Soc. 22: 187–231.
Endler, J. A. (1991). ‘Interactions between Predators and Prey’. In Behavioural Ecology: An Evolutionary
Approach. 3rd edn, edited by J. R. Krebs and N. B. Davis, pp. 169–196. Oxford: Blackwell.
Ferguson, G., J. Messenger, and B. Budelmann (1994). ‘Gravity and Light Influence the Countershading
Reflexes of the Cuttlefish Sepia officinalis’. J. Exp. Biol. 191: 247–256.
Fraser, S., A. Callahan, D. Klassen, and T. N. Sherratt (2007). ‘Empirical Tests of the Role of Disruptive
Coloration in Reducing Detectability’. Proc. Roy. Soc. B 274: 1325–1331.
Frisby, J. (2004). ‘Bela Julesz 1928—2003: A Personal Tribute’. Perception 33: 633–637.
Godfrey, D., J. N. Lythgoe, and D. A. Rumball (1987). ‘Zebra Stripes and Tiger Stripes: The Spatial
Frequency Distribution of the Pattern Compared to that of the Background is Significant in Display and
Crypsis’. Biol. J. Linn. Soc. 32: 427–433.
Hall, J. R., I. C. Cuthill, R. Baddeley, A. J. Shohet, and N. E. Scott-Samuel (2013). ‘Camouflage, Detection
and Identification of Moving Targets’. Proc. R. Soc. B 280(1758): 20130064.
Hanlon, R. T. and J. B. Messenger (1988). ‘Adaptive Coloration in Young Cuttlefish (Sepia officinalis
L.): The Morphology and Development of Body Patterns and their Relation to Behaviour’. Phil. Trans.
R. Soc. B 320: 437–487.
Hanlon, R. T., J. W. Forsythe, and D. E. Joneschild (1999). ‘Crypsis, Conspicuousness, Mimicry and
Polyphenism as Antipredator Defences of Foraging Octopuses on Indo-Pacific Coral Reefs, with a
Method of Quantifying Crypsis from Video Tapes’. Biol. J. Linn. Soc. 66: 1–22.
Hanlon, R. T. (2007). ‘Cephalopod Dynamic Camouflage’. Curr. Biol. 17: 400–404.
Hanlon, R. T., C.-C. Chiao, L. M. Mäthger, K. C. Buresch, A. Barbosa, J. J. Allen, L. Siemann, and C.
Chubb (2011). ‘Rapid Adaptive Camouflage in Cephalopods’. In Animal Camouflage: Mechanisms and
Functions, edited by M. Stevens, and S. Merilaita, pp. 145–163. Cambridge: Cambridge University Press.
Houston, A. I., M. Stevens, and I. C. Cuthill (2007). ‘Animal Camouflage: Compromise or Specialize in a
2 Patch-Type Environment?’ Behav. Ecol. 18: 769–775.
Jordan, T. M., J. C. Partridge, and N. W. Roberts (2012). ‘Non-Polarizing Broadband Multilayer Reflectors
in Fish’. Nature Photonics 6: 759–763.
Julesz, B. (1971). Foundations of Cyclopean Perception. Chicago: University of Chicago Press.
Julesz, B. (1981). ‘Textons, the Elements of Texture Perception, and their Interactions’. Nature 290: 91–97.
Julesz, B. (1984). ‘A Brief Outline of the Texton Theory of Human Vision’. Trends Neurosci. 7: 41–45.
Keen, A. M. (1932). ‘Protective Coloration in the Light of Gestalt Theory’. J. Gen. Psychol. 6: 200–203.
Kelman, E. J., P. Tiptus, and D. Osorio (2006). ‘Juvenile Plaice (Pleuronectes platessa) Produce Camouflage
by Flexibly Combining two Separate Patterns’. J. Exp. Biol. 209: 3288–3292.
Kelman E. J., D. Osorio, and R. J. Baddeley (2008). ‘A Review of Cuttlefish Camouflage and Object
Recognition and Evidence for Depth Perception’. J. Exp. Biol. 211: 1757–1763.
Kiltie, R. A. (1988). ‘Countershading: Universally Deceptive or Deceptively Universal?’ Trends Ecol. Evol.
3: 21–23.
860 Osorio and Cuthill

Kiltie, R. A., J. Fan, and A. F. Laine (1995). ‘A Wavelet-Based Metric for Visual Texture Discrimination with
Applications in Evolutionary Ecology’. Math. Biosci. 126: 21–39.
Kjernsmo, K. and S. Merilaita (2012). ‘Background Choice as an Anti-Predator Strategy: The Roles of
Background Matching and Visual Complexity in the Habitat Choice of the Least Killifish’. Proc. R. Soc.
B. 279: 4192–4198.
Landy, M. S. and N. Graham (2004). ‘Visual perception of texture’. In The Visual Neurosciences, edited by
L. M. Chalupa and J. S. Werner, pp. 1106–1118. Cambridge, MA: MIT Press.
Langridge, K. V. (2006). ‘Symmetrical Crypsis and Asymmetrical Signalling in the Cuttlefish Sepia
officinalis’. Proc. R. Soc. B. 273: 959–967.
Langridge, K. V., M. Broom, and D. Osorio (2007). ‘Selective Signalling by Cuttlefish to Predators’. Current
Biology 17 R1044–R1045.
Marshall, N. J. and J. B. Messenger (1996). ‘Colour-Blind Camouflage’. Nature 382: 408–409.
Mäthger, L., A. Barbosa, S. Miner, and R. T. Hanlon (2006). ‘Color Blindness and Contrast Perception in
Cuttlefish (Sepia officinalis) Determined by a Visual Sensorimotor Assay’. Vis. Res. 46: 1746–1753.
Merilaita, S. (1998). ‘Crypsis through Disruptive Coloration in an Isopod’. Proc. Roy. Soc. B. 265:
1059–1064.
Merilaita, S., J. Tuomi, and V. Jormalainen (1999). ‘Optimization of Cryptic Coloration in Heterogeneous
Habitats’. Biol. J. Linn. Soc. 67: 151–161.
Merilaita, S., A. Lyytinen, and J. Mappes (2001). ‘Selection for Cryptic Coloration in a Visually
Heterogeneous Habitat’. Proc R. Soc. Lond. B 268: 1925–1929.
Merilaita, S. (2003). ‘Visual Background Complexity Facilitates the Evolution of Camouflage’. Evolution
57: 1248–1254.
Merilaita, S. and J. Lind (2006). ‘Great Tits (Parus major) Searching for Artificial Prey: Implications for
Cryptic Coloration and Symmetry’. Behav. Ecol. 17: 84–87.
Metzger, W. (2009). Laws of Seeing, trans. by L. Spillman and S. Lehar. Cambridge, MA: MIT Press.
(Originally published 1936. Gesetze des Sehens. Frankfurt: Kramer.)
Monteiro, A., P. M. Brakefield, and V. French (1997). ‘The Relationship between Eyespot Shape and
Wing Shape in the Butterfly Bicyclus anynana: A Genetic and Morphometrical Approach’. J. Evol. Biol.
10: 787–802.
Nieder A. (2002). ‘Seeing More than Meets the Eye: Processing of Illusory Contours in Animals’. J. Comp.
Physiol. A 188: 249–260.
Osorio, D., Srinivasan, M. V. (1991). Camouflage by edge enhancement in animal coloration patterns and
its implications for visual mechanisms. Proc. R. Soc. Lond. B, 244: 81–85.
Peters, R. A. and C. S. Evans (2003). ‘Design of the Jacky Dragon Visual Display: Signal and Noise
Characteristics in a Complex Visual Environment’. J. Comp. Physiol. A 189: 447–459.
Peters, R. A., J. M. Hemmi, and J. Zeil (2007). ‘Signalling against the Wind: Modifying Motion Signal
Structure in Response to Increased Noise’. Curr. Biol. 17: 1231–1234.
Peyré, G. (2009). ‘Sparse Modeling of Textures’. J. Mathematical Imaging and Vision 34: 17–31.
Portilla, J. and E. P. Simoncelli (2000). ‘A Parametric Texture Model Based on Joint Statistics of Complex
Wavelet Coefficients’. Int. J. Computer Vision: 40: 49–70.
Poulton, E. B. (1890). The Colours of Animals: Their Meaning and Use. Especially Considered in the Case of
Insects. 2nd edn. London: Kegan Paul, Trench Trübner and Co.
Ramachandran, V. S., C. W. Tyler, R. L. Gregory, D. Rogers-Ramachandran, S. Duensing, C. Pillsbury,
and C. Ramachandran (1996). ‘Rapid Adaptive Camouflage in Tropical Flounders’. Nature
379: 815–818.
Rowland, H. M., M. P. Speed, G. D. Ruxton, M. Edmunds, M. Stevens, and I. F. Harvey (2007).
‘Countershading Enhances Cryptic Protection: An Experiment with Wild Birds and Artificial Prey’.
Anim. Behav. 74: 1249–1258.
Camouflage and Perceptual Organization in the Animal Kingdom 861

Rowland, H. M., I. C. Cuthill, I. F. Harvey, M. P. Speed, and G. D. Ruxton (2008). ‘Can’t Tell the
Caterpillars from the Trees: Countershading Enhances Survival in a Woodland’. Proc. R. Soc. B
275: 2539–2545.
Rowland, H. M. (2009). ‘From Abbott Thayer to the Present Day: What Have We Learned about the
Function of Countershading?’ Phil. Trans. R. Soc. B 364: 519–527.
Ruxton, G. D., M. P. Speed, and D. Kelly (2004a). ‘What, if Anything, is the Adaptive Function of
Countershading?’ Anim. Behav. 68: 445–451.
Ruxton, G., M. Speed, and T. Sherratt (2004b). Avoiding Attack: The Evolutionary Ecology of Crypsis,
Warning Signals and Mimicry. Oxford: Oxford University Press.
Saidel, W. M. (1988). ‘How to Be Unseen: An Essay in Obscurity’. In Sensory Biology of Aquatic Animals,
edited by J. Atema, R. Fay, A. N. Popper, and W. Tavolga, pp. 487–513. New York: Springer.
Scott-Samuel, N. E., R. Baddeley, C. E. Palmer, and I. C. Cuthill (2011). ‘Dazzle Camouflage Affects Speed
Perception’. PLoS One 6(6): e20233.
Shapley, R. M., N. Rubin, and D. Ringach (2004). ‘Visual Segmentation and Illusory Contours’. In The
Visual Neurosciences, edited by L. M. Chalupa and J. S. Werner, pp. 1119–1128. Chicago: MIT Press.
Shohet A. J., R. J. Baddeley, J. C. Anderson, E. J. Kelman, and D. Osorio (2006). ‘Cuttlefish Response
to Visual Orientation of Substrates, Water Flow and a Model of Motion Camouflage’. J. Exp. Biol.
209: 4717–4723.
Shohet, A., R. J. Baddeley, J. Anderson, and D. Osorio (2007). ‘Cuttlefish Camouflage: A Quantitative
Study of Patterning’. Biol. J. Linn. Soc. 92: 335–345.
Simoncelli, E. P. and B. A. Olhausen (2001). ‘Natural Image Statistics And Neural Representation’. Ann.
Rev. Neurosci. 24: 1193–1216.
Skelhorn, J., H. M. Rowland, and G. D. Ruxton (2010a). ‘The Evolution and Ecology of Masquerade’.
Bio. J. Linn. Soc. 99: 1–8.
Skelhorn, J., H. M. Rowland, M. P. Speed, and G. D. Ruxton (2010b). ‘Masquerade: Camouflage Without
Crypsis’. Science 327: 51.
Stevens, M. and I. C. Cuthill (2006). ‘Disruptive Coloration, Crypsis and Edge Detection in Early Visual
Processing’. Proc. R. Soc. B 273: 2141–2147.
Stevens, M., I. C. Cuthill, A. M. M. Windsor, and H. J. Walker (2006). ‘Disruptive Contrast in Animal
Camouflage’. Proc. R. Soc. B 273: 2433–2438.
Stevens, M., J. Graham, I. S. Winney, and A. Cantor (2008). ‘Testing Thayer’s Hypothesis: Can Camouflage
Work by Distraction?’ Biol. Lett. 4: 648–650.
Stevens, M., D. H. Yule, and G. D. Ruxton (2008). ‘Dazzle Coloration and Prey Movement’. Proc. R. Soc. B
275: 2639–2643.
Stevens, M. and Merilaita, S. (2009). Animal camouflage: current issues and new perspectives. Phil. Trans.
R. Soc. B 364: 423–427.
Stevens, M., I. S. Winney, A. Cantor, and J. Graham (2009). ‘Object Outline and Surface Disruption in
Animal Camouflage’. Proc. R. Soc. B 276: 781–786.
Stuart-Fox D. and A. Moussalli (2009). ‘Camouflage, Communication and Thermoregulation: Lessons from
Colour Changing Organisms’. Phil. Trans. R. Soc. B 364: 463–470.
Thayer, A. H. (1896). ‘The Law Which Underlies Protective Coloration’. Auk 13: 477–482.
Thayer, G. H. (1909). Concealing-Coloration in the Animal Kingdom: An Exposition of the Laws of Disguise
through Color and Pattern: Being a Summary of Abbott H. Thayer’s Discoveries. New York: Macmillan.
Waage, J. (1981). ‘How the Zebra Got its Stripes—Biting Flies as Selective Agents in the Evolution of Zebra
Coloration’. J. Ent. Soc. S. Afr. 44: 351–358.
Wente, W. H. and J. B. Phillips (2005). ‘Microhabitat Selection by the Pacific Treefrog, Hyla regilla’. Animal
Behaviour 70: 279–287.
Williams, D. (2001). Naval Camouflage 1914–1945. Barnsley: Pen and Sword Books.
862 Osorio and Cuthill

Zylinski, S., D. Osorio, and A. J. Shohet (2009a). ‘Edge Detection and Texture Classification by Cuttlefish’. J.
Vision 9: 1–10.
Zylinski, S., D. Osorio, and A. J. Shohet (2009b). ‘Perception of Edges and Visual Texture in the
Camouflage of the Common Cuttlefish, Sepia officinalis’. Phil. Trans. R. Soc. B 364: 439–448.
Zylinski, S., D. Osorio, and A. J. Shohet (2009c). ‘Cuttlefish Camouflage: Context-Dependent Body Pattern
Use during Motion’. Proc. R. Soc. B 276: 3963–3969.
Zylinski, S. and D. Osorio (2011). ‘What Can Camouflage Tell us about Non-Human Visual Perception?
A Case Study of Multiple Cue Use in the Cuttlefish’. In Animal Camouflage: Mechanisms and Function,
edited by M. Stevens and S. Merilaita, pp. 164–185. Cambridge: Cambridge University Press.
Zylinski, S. and A. S. Darmaillacq, and N. Shashar (2012). ‘Visual Interpolation for Contour Completion
by the European Cuttlefish (Sepia officinalis) and its Use in Dynamic Camouflage’. Proc. R. Soc. B
279: 2386–2390.
Chapter 42

Design Insights: Gestalt, Bauhaus, and Japanese Gardens

Gert J. van Tonder and Dhanraj Vishwanath

Introduction to Perceptual Organization and Visual Design
‘Design’ encompasses a range of concepts that go well beyond visual perception. The word derives
from the Latin dēsignāre ‘to designate,’ meaning ‘to mark out’ (Collins English Dictionary 2011).
In terms of visual design (e.g. graphic, landscape, architectural, sculptural, product and fashion
design), this refers to lifting out from the morass of configural possibilities those which most
clearly convey the intentions of the designer, communicate how the design should be used, and
intuitively reveal the physical constraints imposed by the material design.
With a given utility in mind, the designer shapes an object into a given form, selectively enhanc-
ing the essence of its visual character. Visual style is the framework through which the clarifica-
tion of pattern is achieved. The creativity of designers lies in the originality of their conceptions
and style. In good design, perceived form matches perceived function, and both are consistent
with intended function. This, in fact, is not too distant from a broadly construed definition of
perception. Yet great design is not automatically achieved. Specifying the relevant goals, environ-
ment, primitives, requirements and constraints depends on the talent, skill, and experience of the
designer (Ralph and Wand 2009).
Design is rarely experienced as neutral. It is often imbued with an aesthetic that, while resist-
ing succinct verbal description, offers an immediate affirmation in its resonance with perception
(Arnheim 1969). Certain designed objects—a car, house, garment, garden, or painting—may be
coveted for their visual appearance, while others are not. What, visually, sets them apart?
When someone creates a design, the salient perceptual qualities in what is both mentally
envisaged and actually seen in real time as the design progresses dominate the trajectory
of the design process: perception serendipitously and fundamentally shapes the design out-
come. Any human-made creation therefore reflects back upon perception, offering potential
insight into the constructs that resonate with the internal organization of percepts—a good
reason why vision researchers should have an active interest in visual design. Haptic percep-
tion and motor function are, of course, other fundamental dimensions of design, especially
in design with a human end user in mind. Haptics constrain the range of possibilities among
visual patterns that would permit a given action. A device built for manual manipulation
needs to suit the physical dimensions, constraints, and functionality of the human hand,
regardless of visual appearance. In this chapter, the focus will be on visual perception with
the assumption that we are already considering designs that fall within the functional haptic
range of the human body.
A different visual aesthetic results when considerations about functional utility of the designed
item far outweigh those regarding the accommodation of a human user. Craftsmanship is the
art of combining qualitatively and aesthetically rich user interfaces with a high degree of func-
tional utility. Perception is not infallible: some designs are intentionally made with a high degree
of visual appeal, but handling of such an object should swiftly expose discrepancies between its
visual ‘promise’ of functionality and its actual frustrating performance. Design can even deliber-
ately counter the perceptual tendency to match form with function. Cartoons by Heath Robinson
(1872–1944) and Rube Goldberg (1883–1970) depict machines that accomplish simple tasks
through absurdly complex means, to the point of rendering them useless in practical terms.
Nature can be considered the evolutionary cradle for perception. While it is likely that all sen-
tient entities experience their own version of ‘reality’ (von Uexküll 1926), human-made designs
can alter, enhance, or antagonize mechanisms of perceptual organization that originally evolved
to deal with a natural environment unfettered by human hands. Of particular interest in this chap-
ter, therefore, are applied examples where human design aims to recreate some idealized aspect
of nature. The first section will be devoted to the intuitive insight captured by instances in which
classical Japanese designs emphasize the relation between human perception and natural form.
The same perceptual factors implicit in the centuries-old gardening manuals of Japan are partly
incorporated in ideas put forth by the Gestalt school of psychology, the Bauhaus and other move-
ments, nearly a millennium later, as will be discussed in the second section. We will also demon-
strate how Japanese design principles more directly influenced Bauhaus design.
In the third section, we discuss how naturalistic structure shares principles with the visual
patterns emphasized in Japanese design, Gestalt and Bauhaus approaches, thus serving as their
potential common denominator.
The appendix at the end of the chapter revisits a few recent general frameworks for thinking
about visual perception of designed structure.

Perceptual Effects in Classical Japanese Architecture and Designed Landscape
Japanese Design Concepts
Among the great landscape designs of the world, classical Japanese architecture and gardens are
of special interest. Over the last millennium, they have culminated in a canon of design principles for engendering an idealized naturalistic order among design elements; specifically, the aim is to recreate, within a relatively limited space, the order present in large naturalistic vistas. Japanese
garden design offers valuable insight into what a good, balanced natural shape is and how different
natural and human-made structures can be harmoniously combined.
The quintessential Japanese garden contrasts starkly with baroque perspective gardens,
such as the courts at Versailles, Herrenhausen, and Veitshöchheim. These structures impose
non-naturalistic, pure geometries onto natural design elements, usually over large spatial scales.
The baroque garden appears as the continuation of human architectural geometry into the sur-
rounding exterior space, while in a classical Japanese rock garden the transition from human
design to naturalistic form is more emphasized (Arnheim 1966).
The key concepts in Japanese design relate to form and visual organization. Nōtan—the
overall gist of light and dark in a design—concentrates on the balance, spatial layout, and
softness of light and dark (Tanizaki [1933] 1977); it concerns the shape of figure and the
shape of the empty spaces delineated around the figure; the interplay and shapes of light,
specularity, and shadow, and any contrasting visual attributes, be it light, colour, size, shape,
or other qualities. Hongatte—the way in which the design layout guides the gaze (Kuitert
2002)—refers to visual balance, asymmetry, and incompleteness in the visible parts. Mitate—
literally ‘setting up the eye’—relates to techniques for bringing a new visual awareness to a
familiar object through the creation of visual allegories. For example, re-using the foundation
stone from a pillar of a temple as a stone washbasin not only introduces novel, interesting
stone shapes into a new context, but creates metaphorical narratives, for example by linking
the foundations of a place of spiritual practice with a fountain, a life-giving source of purifica-
tion. Shin-Gyō-Sō concerns the degree of formality in applying light, shadow, asymmetry, and
irregularity (Keane 1996, p. 77). In the design of a stone path, for example, at the most formal
level—Shin—stone shapes will be regular, angular, with little or no variation in colour, shape,
and size, arranged in a regular tessellation in a straight path with a straight border. Individual
stones and the path as a whole will tend to occupy fully rectangular frames. However, the
stones are not usually smoothly polished, as this is thought to rob them of their simple, natu-
ral materiality—a powerful Japanese design aspect referred to as Wabi-Sabi (Yanagi 1972). At
the most informal level—Sō—stones of varied shapes are spaced at more irregular intervals,
with small stones interspersed with large, light stones with dark, regular with irregular, rough
with smooth, the entire path winding within a loosely defined, jagged border, as if acciden-
tally stumbled upon in nature.
In actual designs, the combination of the three levels gives rise to very complex variations on
the theme. A formal path may intersect an informal one going in a different direction, creating
the impression that the paths overlap transparently. A path with a formal border may have a more
informal placement of stones within the border, and so forth. Such differences in levels of formality are found in many design cultures. Their formalized expression in Japanese garden design turned them into a universally useful design aid, ubiquitous among all the Japanese arts.
There is no simple recipe for design. Concepts like Nōtan, Hongatte, Mitate, Shin-Gyō-Sō, and Wabi-Sabi directly relate to the appearance of a design, intuitively conveying qualitative relations between part and whole. Their greatest utility is as mental tools for increasing one's own awareness of, and ability to respond more sensitively to, various perceived visual aspects of the design as it is created.

Visual Structure in Classical Japanese Interior Design


A major limitation in the traditional Japanese dwelling was, and still is, the shortage of space and
natural light (Tanizaki [1933] 1977). The solution is a system of rectangular architectural frames,
wherein layers of sliding doors can swiftly alter visual access to the exterior. Traditional Japanese
architecture thus naturally lends itself to a style of dwelling where the imagery of nature is always
near, subtly framed by layers of wood, clay, and paper.
Sliding doors consist of wooden lattices covered in opaque (Fusuma) or translucent (Shōji)
paper. Smaller windows usually consist of shaped openings in clay walls, fitted with a latticed
sliding window panel. This wide array of sliding panels can therefore let in diffuse light, allow
a direct view to the exterior, or cut off all visibility. It changes not only the amount of light
entering the room, but also articulates the interior space. First, windows of various sizes cre-
ate impressions of spatial depth perspective (Figure 42.1). At the entrances to tea huts, such
as Tai-An and Saigyō-An in Kyoto, the smallest window is placed furthest from the entrance
(Suzuki 1979) and usually closer to the floor than windows right at the entrance. Windows
are deliberately not aligned, but arranged in an irregular step-like manner, a strategy also
followed on the architectural exterior. These effects help engender the appearance of greater
spaciousness in the architectural interior.
Fig. 42.1  A glimpse of the exterior and interior of a small tea hut at Nobutsu-An in Kyoto, Japan.
Note the many contrasts between light and dark, small and large, and regular pattern set off against
irregular pattern. Intersecting lines are carefully avoided, while clearly demarcated T-junctions
enhance spaciousness.

Gilded panels reflect ambient light back onto surfaces, brightening the room, clearly delineating shape silhouettes, and appearing as transparent layers beyond which space continues. When applied in parallelogram shapes (Naito and Nishikawa 1977, colour plate 91), gilding gives a shimmering impression of spacious floorboards continuing around corners. Coloured panels are
traditionally painted with ink and a mixture of powdered seashell, ground semi-precious stone,
and nikawa—a gelatinous glue. This matte pigment results in nearly equiluminant coloured
regions, confounding the definiteness of distance, perceived size and the flatness, or shape, of the
surrounding walls (Akino 2012). Woven tatami mats reflect light back from the floor, whereas
strips of white washi, Japanese mulberry-bark paper, are pasted low along walls at locations where
various small tasks, such as mixing tea, require better visibility (Figure 42.1).
Straight lines are carefully placed so that repetitive sequences are contrasted with irregular patterns. This is beautifully demonstrated in the irregular bundling together of built-in bamboo lattices that act as window meshing (e.g. the far left of Figure 42.1). Although the meshing appears as a regular lattice from a distance, its inherent irregularity dominates when viewed up close, creating the impression of different meshes overlaid—another interesting depth and grouping effect.
Where two wooden frame lines intersect, the thinner line will deliberately be misaligned on
both sides of the thicker line to reduce the degree of smooth continuation. Discontinuity across
a visual junction is configured into two adjacent T-junctions, thus implying a greater number of
occluding elements than if there were merely a crossing of two straight lines. This enhances the
perception of spatiality, not necessarily veridical depth. In traditional construction, the layout of
sliding panels in their frames results in three nested T-junctions overlaid at each corner. In modern
design, where such simple details are often neglected, this kind of spatial articulation is easily lost.
Nōtan is thus expressed through light and dark paper, different hues of clay walls, with wooden
beams, gilded wall panels, and windows carefully arranged into an irregular, balanced pattern
with a subtle interplay of light and dark. Combining these devices culminates in an open-ended,
underspecified visual space of many scales and amassed layers of potential occlusion, from which
perception constructs an experience of a rich depth articulation and expansiveness in the sur-
rounding space. This perceptually inferred space is, at some level, physically implausible if the
physical visual clues were interpreted as literal ecological cues.
Occluding layers—the rich variety of sliding windows, in particular—hint at the spatial con-
tinuation of whatever is occluded. Traditional architects and gardeners are well aware that a small
section of a garden outside, viewed through layered frames, appears much enlarged, filled with
a greater number of components, and that shapes seen within the frame appear more beautiful
(Nitschke 1993). This traditional design wisdom is supported by psychophysical observation of boundary extension (Intraub and Richardson 1989)—a consistent perceptual tendency to recall
a greater (reconstructed) portion of what was actually seen through a frame, as if subjects could
imagine what lies beyond occluding edges.
Irregularity is a key aspect of the visual character of naturalistic landscape design. The care-
ful attention to irregularity in the architectural interior is therefore a powerful visual link to the
designed exterior landscape.

The Green Gestalt School: Visual Organization in Japanese Gardens
The two oldest surviving texts on Japanese garden design originate from eleventh-century instruc-
tions. Sakuteiki (attributed to Toshitsuna Tachibana, late eleventh/early twelfth century) presents
design guidelines in the context of classical poetic metaphor (Shimoyama 1976); the other—the
Sansui manual (Shingen 1466; attributed to the teachings of the eleventh-century gardener priest,
Zōen)—uses more concise design statements and illustrations as mode of instruction. Both texts
emphasize gardens as a recreation of the profound mystery and beauty experienced in nature; but
not all of nature is considered essential. The designer is instructed to search for and emulate places
of unusual natural splendour, but not as an exhaustive miniature replica of nature; a reduction of
the number of parts is implied. This constitutes one of the main challenges in Japanese garden
design.
The texts draw attention to naturalistic landmarks that appear irregular and asymmetric (van Tonder and Lyons 2005) as most ideal for emulation, and instruct on how to choose natural materials to realize them. They provide guidelines on the sizes of rocks needed for a given garden courtyard space. This sets the scale of the whole in relation to the rectangular frame of
the courtyard walls. The relative scale of nearest neighbours is also important: rocks should not
be of equal size or half the size of each other, but arranged in a one-third or two-thirds size ratio
(Shingen 1466; Slawson 1987). The rule of thirds (Smith 1797), similar to the golden ratio, is also
common in Western art and design.
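As an aside, the one-third/two-thirds sizing guideline can be stated as a simple numeric check. The sketch below is purely illustrative: the rock sizes, the tolerance, and the function names are our own invention, not part of the gardening manuals.

```python
# Illustrative sketch of the one-third / two-thirds size guideline for
# neighbouring rocks. All values and names here are hypothetical.

def size_ratio(a, b):
    """Ratio of the smaller to the larger of two rock sizes."""
    small, large = sorted((a, b))
    return small / large

def follows_rule_of_thirds(a, b, tol=0.05):
    """True if the smaller-to-larger size ratio is near 1/3 or 2/3."""
    r = size_ratio(a, b)
    return abs(r - 1 / 3) < tol or abs(r - 2 / 3) < tol

print(follows_rule_of_thirds(30, 90))  # 1:3 ratio -> True
print(follows_rule_of_thirds(60, 90))  # 2:3 ratio -> True
print(follows_rule_of_thirds(45, 90))  # 1:2 ratio (discouraged) -> False
```

The explicit rejection of 1:1 and 1:2 size ratios in favour of 1:3 and 2:3 is what keeps neighbouring rocks from reading as copies or simple halvings of one another.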
The main rocks are first arranged into a structural ‘backbone’, with smaller rocks later added
in ‘good agreement’. The shapes of main rocks should ideally be angular and asymmetrical. Their
placement on the ground should never line up, but follow an irregular winding pattern, with
stones interspersed like the ‘scales on a dragon’. If a rock, or rock cluster, appears to lean in one
direction, its neighbouring elements, of different sizes, should lean back, creating counterbalance
(Jiroh and Keane 2001). This aspect of asymmetric structure is a key towards understanding visual
balance in naturalistic shape (Figure 42.2A), and is even continued in the empty spaces between
rocks (van Tonder, Lyons, and Ejima 2002).
Rocks must not be spaced at equal intervals. Exact repetition is avoided where possible. Informal
analysis suggests that the ratio of the average size of any two nearest neighbouring rocks (or rock clus-
ters) to the distance between their geometrical centroids is roughly kept constant (Figure 42.6B). In
Ryoanji, this ratio is roughly 1:2 (van Tonder and Lyons 2005), a ratio at which textural crowding
diminishes. Textural crowding is the involuntary grouping together of elements into a texture pattern
in which the shapes of individual texture elements are not effortlessly apprehensible (see Rosenholtz,
this volume). Hence, at a spacing ratio of 1:2 the global tessellation of rocks would be as visually salient
as the individual rock shapes. The method seems like a sophisticated proximity-and-size rule, where
smaller rocks are placed more closely together, and larger rocks spaced further apart—another essen-
tial aspect observed in natural rocky outcrops (Figure 42.2A).
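The informal proximity-and-size rule can be illustrated with a small numerical sketch. The sizes and coordinates below are invented for illustration (they are not measurements from Ryoanji or any other garden), and the helper `spacing_ratio` is our own naming:

```python
import math

def spacing_ratio(size_a, size_b, centre_a, centre_b):
    """Ratio of the average size of two neighbouring rocks (or rock
    clusters) to the distance between their geometric centroids."""
    mean_size = (size_a + size_b) / 2.0
    distance = math.dist(centre_a, centre_b)
    return mean_size / distance

# Hypothetical layout obeying the 1:2 rule: two small rocks sit close
# together, while a larger rock is spaced proportionally further away.
rocks = [  # (diameter, (x, y) centroid), illustrative numbers only
    (1.0, (0.0, 0.0)),
    (1.0, (2.0, 0.0)),
    (3.0, (6.0, 0.0)),
]

r_small = spacing_ratio(rocks[0][0], rocks[1][0], rocks[0][1], rocks[1][1])
r_large = spacing_ratio(rocks[1][0], rocks[2][0], rocks[1][1], rocks[2][1])
print(r_small, r_large)  # both 0.5, i.e. the 1:2 size-to-spacing ratio
```

Keeping this ratio constant across neighbours is exactly what makes small rocks cluster and large rocks spread, as in natural outcrops.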
868 van Tonder and Vishwanath

Fig. 42.2  (a) Exposed bedrock, eroded by wind, sun, ice and rain, remains as irregularly overlapping
heaps, facing upwards against gravity, with similar triangular shapes appearing at many spatial
scales. (b) The most visually dominant rock cluster in the garden at Ryoanji temple, Kyoto. Note the
many instances of triangularity in whole shapes and surface texture markings, with individual rocks
leaning towards each other. (c) The Ryoanji garden emulates a sparse naturalistic rock outcrop.

Japanese gardeners today still use the metaphor that a good design will show its ‘skin, flesh and
bones’ (Ogawa 2011) in one glance, meaning that the overall structural backbone, the shapes of
clusters, individual rocks, and their textures must all be visible. A rock should be placed in the
original orientation in which it was found in the wilderness so as not to ‘anger its inhabiting spirit’.
This taboo is a way of preserving the visual integrity between the shape of the rock as a whole,
and the directionality of its smaller facets and surface textures as chiselled out by erosion, so that
the impression of an entire rocky ridge can be conveyed with a single design component. Rocks
should be buried deeply enough that the visual junction with the ground plane lends the appear-
ance of continuing as solid bedrock underground (Slawson 1987), instead of betraying the pres-
ence of a small, unconnected design component (Figure 42.2B). A similar practice prevails among
Western masons, who match the orientation of stone in construction to its original alignment in
Design Insights 869

the quarry. Many cultures also pay heed to the orientation of timber: matching the dry and wet (north and south) sides of the wood to architectural conditions on site, and using timber from trees that endured windy conditions for the building components that must bear the greatest loads, both increase the durability of a wooden construction.
Using triadic rock groupings (Shingen 1466)—where each individual rock and rock cluster is
approached as a triangle—allows the design to be conceived of as a multiscale composition of
triangles knit into a whole (Figure 42.2B). Deliberately using a hierarchy of triangular templates
is thought to simplify the mental load for the designer (Arnheim 1966, 1969) when having simul-
taneously to deal with a lot of visual factors, such as asymmetry, proportion, and visual balance
(Slawson 1987).
Medieval Japanese design influenced Jugendstil, Art Nouveau, the Vienna Secession, and
Bauhaus, nearly a millennium later, to adopt a renewed sensitivity to irregularity, asymmetry,
minimalism, and other factors that characterize perceptual organization.

Gestalt Principles of Grouping and Design


The Gestalt School, Bauhaus, and Influence of Japanese Design
The Bauhaus design school and the Gestalt school of psychology were contemporary institutions
grappling to understand perception in their own terms. One notable Bauhaus exercise was devel-
oped to hone perception of light and shadow, similar to the notion of Nōtan, by rapid live sketch-
ing of scenery as a reduced mosaic grid with as few cells as possible in different grey values (Itten
1975). The emphasis was on ‘seeing the gist’ and capturing its impression through drawing of
light, dark, curve, and texture.
It is known that Japanese art influenced Bauhaus (Behrens 2002), as contact between Japan
and the West increased dramatically towards the end of the nineteenth century. The minimalism
of Japanese woodblock prints and katagami—paper stencils for silk dyeing—appealed greatly to
Western graphic designers, becoming a major inspiration for renewed clarity of line and empha-
sis on non-figural depiction. This appeal was not without major misunderstandings. For exam-
ple, numerous layers of katagami sheets are used to stencil in different sections and colours of a
textile design, such as a floral motif with birds. Each separate sheet by itself, however, appears
as a strangely non-figural, abstract design. Unknown in Europe at the time and becoming very
popular among members of the Vienna Secession, these strange-looking stencils were mistakenly
regarded as intentional abstract designs (Shin-tsu Tai et al. 1998, pp. 89–90), unwittingly spurring
a design style aimed at abstraction of natural shape. ‘Idealized nature’, a concept shared with (and
to some extent even borrowed from) Japanese design, neatly fits with late nineteenth-century
Western ideas in art theory (Hildebrand [1893] 1945) that again influenced art nouveau, later art
deco, and also Bauhaus.
Greater access to East Asian calligraphy also influenced Western design during this era. Written
vertically, East Asian kanji script lends itself more readily to the ideals of balanced composition followed
in landscape and figural painting. For example, even normal fluctuations in the darkness of ink as
the brush runs dry create the impression of spatial landmarks (Figure 42.3 top right). East Asian
scrolls probably began to influence Western approaches to page layout, so that it is not really
surprising that the design of text enjoyed renewed interest among Bauhaus instructors, such as
Moholy-Nagy.
Some medieval European script is among the hardest to read fluently. Spacing is strictly uni-
form, key features on individual letters are virtually undifferentiated, and particular orientations
dominate. Visually beautiful (Figure 42.3 top left), the letters, words, and paragraphs melt into a

Developed in 1957 by Max Miedinger & Eduard Hoffmann in Switzerland, Helvetica was intended as a
neutral font without intrinsic meaning in the shape of letters. We are not so sure about that, but it does
read smoothly.

Fig. 42.3  Examples of page layout and font design. Top left: Section from an anonymous medieval
vellum manuscript. Courtesy of the National Library of Medicine. Top right: A section from a
seventeenth-century letter between friends, courtesy of Nobutsu-An, Kyoto. Bottom: Example of a
modern font based on Bauhaus ideals.

grey monoglyph that resists fluent reading. Its East Asian counterpart may be found in the love
letters of court nobility in classical Japan, where excessively fluid script renders text virtually unin-
telligible to all but the most accomplished among the initiated.
Typography designers at the Bauhaus, among others, sought the opposite effect: page format
with clearly articulated flow of text lines and paragraphs, with text and figures interspersed in
a more irregular, asymmetrical composition in an effort to improve readability. Improved font
design was another objective. A good font balances the salience of individual letters with that of
whole words. Overt spacing is important, but the shape of extremities on individual letters also influences the similarity, alignment, and spacing between parts, with significant effect on the perceptual grouping of letters into words (Figure 42.3 bottom). This insight is incorporated in the technique of 'kerning': letters with salient primitives, such as closed bubbles ('a'), gaps ('c'), junctions ('k', 'x'), and bilateral symmetry ('w'), resist blending into a uniform texture, promoting legibility.
The debate on legibility against readability of serif vs sans-serif fonts is still ongoing and delves
further into this issue (Poole 2008).
The mantras of good design relating to principles of composition developed at the Bauhaus and
other contemporaneous movements bear testament to the importance of the perceptual effects of
sparse, irregular, and asymmetrically balanced patterns. 'Ornament and crime' (Loos 1908), 'form follows function' (Sullivan 1896), and 'less is more' (attributed to Mies van der Rohe; see Schulze and Windhorst 2012) conceivably refer to aspects of perceptual organization and more generally
to the notion of “good Gestalt”.

Internal Laws of Perceptual Organization


Cross-pollination between the Bauhaus and Gestalt school is putatively evident in their shared
emphasis on concepts such as figural ‘goodness’—structural configurations that facilitate
lawful perceptual organization. However, the true extent of their mutual influence remains
surprisingly obscure (Boudewijnse 2012). The idea of Gestalt qualities was first proposed by

Christian von Ehrenfels and later championed by Wertheimer (1938a), who was one of the
founders of the Berlin Gestalt movement. The central idea of gestalt perception was that the
perceptual whole transcends and modifies the properties of the parts. These ideas originate
in work by Brentano and his school, of which Ehrenfels, Wertheimer, and other figures in the
Gestalt movement were students (see Wagemans and Albertazzi for an overview of the origins
of Gestalt philosophy).
A significant contribution of the Gestalt movement was the derivation of a number of inter-
nal ‘laws’ that seemingly govern perceptual grouping. Every visual experience is perceptually
organized as a figure seen on a surrounding background, the visual qualities of the figure and
background (see Kogo and van Ee, this volume) unfolding even in the absence of clear visual
markings, such as when viewing a parabolic Ganzfeld screen (Metzger [1936] 2006). Here, the
perceptual figure appears to span the entire visual field, in the form of a thick bank of fog. In sim-
ple terms, perceptual organization is crystallized along structural constraints, such as smooth-
ness of alignment between parts, similarity or shared commonality in one or more visual aspect,
spatial proximity and density (on parts, see Singh, this volume), the degree of figural complete-
ness or closure in the arrangement of parts, and the degree of bilateral or higher-order symmetry
in the configuration of parts (Koffka 1935). Convex formations (see Bertamini and Casati, this
volume) appear more salient than concave configurations within the same set of parts (Rubin
1921), and the simplest potential configuration of parts arises as the dominant perceptual figure
(Wertheimer 1938b).
Arnheim (1966) presented a powerful vocabulary of higher-level qualities in perceptual organi-
zation, based on his interpretation of order and complexity. He defines order as ‘the degree and
kind of lawfulness governing the relations among the parts of an entity’, and complexity as ‘the
multiplicity of the relationships among the parts of an entity’. Order and complexity are antago-
nistic yet interdependent. Great design would display a high degree of both order and complexity.
Different kinds of structural order can be discerned. Homogeneity, at a minimum level of
complexity, is the application of a common quality to an entire pattern, whereas coordination,
of greater complexity, is the degree to which all parts constituting the whole have similar impor-
tance and carry similar weight. Parts constitute a hierarchy when distributed along a gradient of
importance with regards to the whole. Accident is highly defined, irrational, and not achieved by
an explicit principle.
Disorder could be thought of as the clash of uncoordinated orders among parts, and only
possible when within each part there is a discernible order. Structural definition is the extent
to which a given order is carried through. A  relation between parts is rational when it is
being formed according to some simple principle such as straightness, exact repetition, or
symmetry.
Arnheim (1966, 1988) also discusses ‘directed tension’ between parts as a quality of gestalt.
A universal design strategy is to present a structural centre—analogous to the concept of percep-
tual figure—from which various tensions are directed to the other elements of a composition (see
also Alexander 2002). Depending on the perceived directionality of these tensions, different quali-
tative wholes are experienced. The tensions may be directed in obedience to some larger organiz-
ing principle, such as gravity. In triangular composition—a canon of many artistic traditions—the
triangle is a centre with a strong directed tension in itself. In a mandala, the overall tensions are
directed towards and away from a central middle point.
With this articulation of structural aspects discernible in design, Arnheim provided a vocab-
ulary that still inspires scientific experiments in the perception of design (e.g. Locher 2003;
McManus, Stoever, and Kim 2011).

Design and Koffka’s Analysis of Art


The Gestalt psychologist Heinz Werner (1956) investigated, among many other aspects of per-
ception, the human ability to imitate. This led him to postulate that the world is naturally expe-
rienced physiognomically—imbued with meaning, mood and personality—when the observer
stops explicitly thinking about the metric properties of what is perceived. The animation movie
sequence of simple geometric figures by Heider and Simmel (1944) is a classical example. Deeply
influenced by Werner, Koffka (1940) presented an analysis—now mostly forgotten—of the psy-
chology of art. He proposed that the physiognomy of the perceptual Gestalt was experienced as a
relationship between ‘self ’ and the perceived ‘world’, in what he called an ‘ego-world field’ echoing
the idea of perception as intentional acts (after Brentano; Koffka rejected most historical schools
of aesthetic theory, including the ‘empathy’ theory of art developed earlier by Lipps in 1903).
In Koffka’s analysis of qualities, also developed by Metzger (see Albertazzi 2010), the primary
qualities experienced directly in perception concern both part and whole. Spatial location, light-
ness, color and orientation are examples. Secondary qualities are more diffuse or holistic, extend-
ing beyond immediate visual attributes to an overall character: rounded, smooth, elongated, spiky,
rough, large, and so forth. The tertiary qualities—physiognomy—transcend these structural lev-
els to express a disposition. The gestalt as a perceptual object in the inner realm of perception
could be cheerful, graceful, cheeky, sad, bold, difficult, revealing its fundamental inner nature, or
‘requiredness’, so that one would know how to behave meaningfully towards it. Koffka defines
the relationship between the ‘self ’ and the perceptual ‘world’ as a field in which the depth, breadth,
and directionality characterize the scope of one’s resonance with the perceived world. When the
part–whole relationships in a Gestalt are not lawful—if a certain part occupies the wrong place
in this hierarchy, or if it contradicts its order, if it seems superfluous, or if it shifts the balance
by demanding too much attention—one senses that there is something ‘wrong’ with the design.
When a design is not a self-contained Gestalt, but demands extraneous relationships to be mean-
ingful, the disruption to the bidirectional self–world resonance is immediately felt.
Koffka’s analysis and the general principles of perception and phenomenology deriving from the
ideas of the Brentano school have profound implications for the relation between perception and
design. In the design process, the true intentions of the designer resonate, implicitly or explicitly, with
the design. Those intentions set the requiredness of the perceived design. Hence, the design becomes
a genuine interface linking the inner perceptual realms of designer and user. If it violates the physi-
ognomy of the Gestalt it will distort the resonance of the self of the user with the perceived design. In
such a case it may be difficult to articulate exactly what is amiss, but there would be a sense that the
design is awkward or dishonest. Instead of a linguistic critique, the intuitive user experience of the
perceptual physiognomy of a design could therefore more truly gauge the success of the designer’s
intentions. The concept of affordances (Gibson 1979)—the way in which object shapes appear to imply
their intended use—is analogous to the ‘requiredness’ of Koffka’s psychology of art, but is not primar-
ily concerned with meaning. It is focussed on cycles of stimulus and response through which learned
associations with physical parameters in the environment are acquired. The term derives from the German word Aufforderungscharakter (demand character) used by Koffka, a significant influence on Gibson, and is also reflected in von Uexküll's use of funktionale Tönung (functional tone). Affordances further share some aspects of empathy theory (Vischer 1873; Lipps 1903) and of the emotive expression considered by Hildebrand ([1893] 1945) in painting and sculpture. More modern variants of this idea are found in the mirror neuron hypothesis (Rizzolatti and Craighero 2004) and perception-action modelling
(Preston and de Waal 2002).

Nature and Design, Chaos and Symmetry


Patterns of Growth and Decay
Vision, with its internal laws of perceptual pattern organization, evolved with nature as its train-
ing ground. Rugged mountain slopes, swirling clouds, and the branching structure of a tree, in
fact, are imbued with the very part–whole qualities identified by the Gestalt school. In natural
form, these structural properties emerge from processes of growth and decay that causally link
part and whole at all different spatiotemporal scales (Thompson 1917), even if different in their
causative origin. Pressure in the earth’s crust and underlying magma, or erosive forces of sun,
rain, and wind, vs cell growth rate depending on the amount of sunlight, nutrient gradients, and
carefully clocked hormones, all conjure self-similar growth structures. The result is that physical
structures more closely related to the same causative origin are more proximal in space and time,
constituting closed convex hulls at some spatial scale. The parts share similarities in size, shape,
and other structural properties, and the structural similarities populate proximal spatiotemporal
scales. It is possible that in these properties lie the evolutionary source for many of the perceptual
laws discussed by the Gestalt school (see also Koenderink, this volume, on Gestalt as ecological
templates).
Essentially, faster-growing parts stretch and break away from their source, slam into slower
growing parts, and pile up until the density of material forces a change in the direction of struc-
tural growth. For example, as a tree branch grows, new potential branches are sent off into various
orientations at each branch node, among which only those that result in receiving the great-
est amount of light thicken as main branches for structural support. Over time, the structure
becomes an undulating structural spine with thinner twigs fanning out to cover as large a surface
area as possible (Figure 42.4A). Similarly, water flowing through a narrowing cascade acceler-
ates, stretches away from the slower water behind it, and collides with water that already passed
through and slowed down. The crashing water piles up and deflects further incoming water side-
ways, into the opposite direction. Structurally, the rushing water is thus very similar to a growing
branch (Figure 42.4B).
These are the shapes intuitively aspired to when Japanese designers attempt to capture the
essence of nature. The trained eye can uncoil the complexity of that Gestalt into just a few compo-
nents that still evoke a similar naturalistic effect (recall Figure 42.2A). This is the essence of what
might be referred to as naturalistic minimalism.
In our natural environment, perfect symmetry is an exception, rather than the rule. When
a drop falls perpendicularly into a still body of liquid, the ensuing collision is sufficiently
symmetrical to allow a perfect splash crown to emerge. Evolutionarily speaking, symmetrical
bodies should demand less complex genetics and motor control. Symmetrical flowers, fruit-
ing bodies, and the bilateral bodies of animals can be regarded as symmetrical collisions
between two or more equal parts. Symmetry thus signals an unusual occurrence against the natural backdrop
of structural hierarchies. For animals, bilaterally symmetrical configurations strongly hint
at the potential presence of other intentional agents. The necessity for a rapid flight-or-fight
response may be the evolutionary factor driving the acute perceptual sensitivity to symmetry
(see van der Helm’s chapter, this volume, on symmetry perception). The perceptual domi-
nance of this tendency surfaces when we make designs, as noticed by Japanese gardeners and
Gestaltists alike. Humans naturally tend to arrange objects at evenly spaced intervals or into
symmetric compositions. This innate tendency can even become a hindrance when the aim
is to create naturalistic design.

Fig. 42.4  Natural and handmade patterns of growth and decay. (a) The undulating branch of a
clover azalea. Notice how the thickest branch or spine undulates to and fro. (b) Splashing white
foam in a flowing stream. Where the flow decelerates, the foam changes direction and sends
small eddies swirling outwards, creating a spine of to and fro lines. On a much larger scale, such
patterns appear as Kármán Vortex Streets in the atmosphere, where clouds swirl around an isolated
mountainous island in an open ocean. (c) Tracing detail of swirls on a first-century-BC bronze Celtic
mirror, excavated at Dordrecht. The undulating spine coils over four spatial scales, branching out
at sudden changes in direction. (d) A gilded wooden swirl from an eighteenth-century Austrian
baroque palace, showing one complete cycle of piling (bottom spiral), acceleration (smooth middle
section), deceleration (curl on upper end), and directional change (outwards swirls at the top).

Visual Structure in Natural Landscape and its Implications


In a very large survey on global visual preferences, Komar and Melamid (1995) found that land-
scapes resembling Pleistocene savannah were by far the most universally appealing, whether
tested on subjects hailing from tundra, desert, or anywhere else. A savannah landscape typically
has a few trees set in level grassland, with blue skies, a source of water in view, signs of the presence
of other humans, mountains in the distance, and a path leading off into the horizon. Apparently,
this is the evolutionary imprint of the hominin Eden. Its effect is most pronounced in visual pref-
erences of prepubescent subjects (Synek 1998), but clearly asserts itself in designed landscapes.
Most gardening traditions employ the components mentioned above (Dutton 2009); Japanese
gardens employ this landscape formula in particular clarity.
This preference has also been interpreted in terms of perceptual tuning to fractal patterns. Normal subjects have difficulty distinguishing true fractals from pseudo-fractals, but consistently prefer fractal dimensions between 1.3 and 1.5 when comparing fractal stimuli (Spehar et al. 2003). It is thought
that trade-offs between strategies for visual reconnaissance and hiding are optimal at this level of
visual complexity. Unsurprisingly, the savannah grassland has a similar fractal dimension.
Eye movements during a search task trace out a trajectory with a fractal dimension of about
1.4 (Fairbanks and Taylor 2011). Compared to either random search trajectories or linear scan-
ning strategies, the 1.4 fractal search path is more effective in discovering targets in the visual
field. A tantalizing possibility is that preference for 1.4 dimensional fractal landscapes and the
wider recurrence of self-similar structure in human design are due to the close resonance of these
structures with evolved search strategies implicit in eye movements, hence allowing perceptual
organization to function in an optimal way not yet well understood.
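Box counting is the usual way of estimating such fractal dimensions: cover the point set with grids of shrinking box size s and take the slope of log N(s) against log(1/s), where N(s) is the number of occupied boxes. The sketch below (pure Python, with invented data rather than measured eye movements) runs the procedure on a straight scan path, whose dimension is exactly 1; a trajectory of dimension near 1.4 would yield a correspondingly steeper slope.

```python
import math

def box_count_dimension(points, box_sizes):
    """Estimate fractal dimension by box counting: the slope of
    log N(s) versus log(1/s), where N(s) is the number of grid
    boxes of side s containing at least one point."""
    logs, logN = [], []
    for s in box_sizes:
        boxes = {(int(x / s), int(y / s)) for x, y in points}
        logs.append(math.log(1.0 / s))
        logN.append(math.log(len(boxes)))
    # least-squares slope of logN against logs
    n = len(logs)
    mx, my = sum(logs) / n, sum(logN) / n
    num = sum((a - mx) * (b - my) for a, b in zip(logs, logN))
    den = sum((a - mx) ** 2 for a in logs)
    return num / den

# A straight scan path in the unit square has dimension 1;
# a plane-filling path would approach 2.
line = [(i / 10000.0, i / 10000.0) for i in range(10000)]
d = box_count_dimension(line, [0.1, 0.05, 0.025, 0.0125])
print(round(d, 2))  # close to 1.0
```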
Natural images display an inverse power distribution in their Fourier power spectra, reminis-
cent of the power laws observed by Zipf (1949). Fourier spectra of artistic images from Western
and Eastern traditions also obey the same inverse power law, even if in a dense form (Graham and
Field 2007). The finding is interpreted either as an aesthetic effect (Spehar et al. 2003), namely, that
artists implicitly recreate natural scene statistics (see Dakin’s chapter, this volume, on statistical
features) because of aesthetic preferences, or purely as an adaptation to scene statistics (Graham
and Redies 2010). In the latter view, artists intuitively present visual markings that the visual sys-
tem can more naturally parse, regardless of aesthetics.
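The inverse power law can be demonstrated on a one-dimensional analogue (natural-image statistics are usually computed from radially averaged 2-D spectra, but the principle is the same). The sketch below, with a synthetic signal rather than real image data, builds a signal whose amplitude spectrum falls off as 1/f and recovers the corresponding slope of about -2 in log power versus log frequency:

```python
import cmath, math

N = 256
# Synthesize a 1-D signal whose amplitude spectrum falls off as 1/f,
# mimicking the spectral statistics of natural scenes (illustration only).
signal = [sum((1.0 / f) * math.cos(2 * math.pi * f * t / N)
              for f in range(1, 65)) for t in range(N)]

def dft(x):
    """Naive discrete Fourier transform (adequate for small N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]

X = dft(signal)
freqs = range(1, 65)
log_f = [math.log(f) for f in freqs]
log_p = [math.log(abs(X[f]) ** 2) for f in freqs]

# Least-squares slope of log power against log frequency.
n = len(log_f)
mx, my = sum(log_f) / n, sum(log_p) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(log_f, log_p))
         / sum((a - mx) ** 2 for a in log_f))
print(round(slope, 1))  # near -2: power falls as 1/f^2
```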

Future Directions for the Scientific Exploration of Perception and Visual Design
As discussed above, perception and design share deep connections. Evolution already shaped
the visuomotor skills necessary for stone-knapping hominids in Olduvai, 2.6 million years ago,
demanding acute perceptual sensitivity to the smoothness of convexity on chipped stone surfaces
(de la Torre 2011). By 800,000 years ago, proto-design had thus apparently already evolved into a
process of shaping meaningful parts, and had been in practice long enough to assert itself in the
perception of Homo sapiens.
With the advent of 3D and 4D printing, the assembly-by-parts approach is about to be replaced
by assembly of Cartesian layers. A motor with all its movable parts, all made from different materials,
can be printed as a complete, fully functional configuration from the outset (ZCorporation 2010).
Virtual 3D folding of shape enables design unconstrained by physical material limitations imposed in
our normal environment (Hansmeyer 2012). How these new visual forms outside the realm of ‘organic’
assembly will eventually affect the laws of perception is an open question.

Analytical effort aimed at automating the design process promises to free human designers
from overwhelming repetitive details (Jupp and Gero 2006). In fact, there is such an enormous
amount of bad design in the world that one may wish for the coming of the great ‘design-bot’.
However, as authors with a passion for art and science, we would like to see greater scientific
understanding of design and its process, but not with the aim of removing the human designer
from the loop. It should be aimed at better equipping the coming generation of designers, rather
than planning their extinction.

Appendix

Measures of Designed Structure


Stylistic Visual Signature
The Shin-Gyō-Sō levels of visual formality in Japanese design, and Gestalt observations of reifica-
tion (perceiving a complete whole from incomplete parts) and invariance (perceiving a constant
whole even if parts are distorted) (Lehar 2003) bear on the fact that perceptual organization of
figure and ground continues normally even when the parts are deformed, as long as a consistent
transformation is applied throughout.
Style, at various levels, can be compared to a broad transformation of this kind. Think of a
cathedral, built in the Gothic style. First, it is clearly distinct from other architectural styles, even
those that are also intricately hewn from stone. One can conceive of the gothic church as a feature-
less house, which is then transformed so that its components are elongated in the vertical orienta-
tion. Each salient part, such as a sloped point, window, or corner, is locally multiplied at slightly
different locations and spatial scales. These are then selectively further elongated, vertically. All
upper horizontals are replaced with gothic arcs. With a knotted vine motif as a final touch along
the edges, what started as a normal house will have a distinctly Gothic appearance.
Second, the same Gothic building can be interpreted and its style conveyed through a different
visual style—for example, as a sculpture made out of scrap metal. Welded into position, different
rusted metal rods can conjure in assembly the visual signature of vertical elongation, arcs, knotted
vines, and other features that characterize a structure as Gothic.
Third, either the cathedral or the sculpture can be shown in a picture, in different visual
styles. It can be a photograph—again taken in any of a huge array of photographic styles—or
it can be drawn as an architectural plan, the outlines of every part clearly emphasized in
the absence of textures and colours. If sketched in charcoal, the rough, dusty strokes may
evoke a granular gist of light and shadow; it can be painted in oil, with dapples of colour, and
not so much boundary contours, conveying an impression of the arcs, spires, and gargoyles.
Through copper etching, it may be shown in a sea of black dots and scrapes that swarm into
an instantly recognizable gestalt of a gothic church. All of these designs, if well executed, will
convey a distinctly ‘gothic’ character.
One visual style can therefore be presented in another stylistic mode, with each layer of style retaining its own character. Style is primarily a qualitative visual system: mastery of a style implies consistent application of a given transformation to all the parts, embodying the Gestalt notion
of invariance. The fact that the gothic style is still recognizable when depicted in stylized—often
disconnected—markings bears witness to the efficacy of Gestalt reification.
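The invariance just described can be made concrete in a toy sketch (the 'house' shapes and coordinates below are invented): applying one consistent vertical elongation to every part changes the figure's character while leaving the relations between parts, and hence the style's internal proportions, intact.

```python
def elongate(points, k):
    """Apply one consistent transformation, vertical elongation by
    factor k, to every vertex of a shape (an illustrative 'gothic' move)."""
    return [(x, k * y) for x, y in points]

def height(poly):
    """Vertical extent of a vertex list."""
    ys = [y for _, y in poly]
    return max(ys) - min(ys)

# A featureless 'house': a body and a door, as vertex lists.
body = [(0, 0), (4, 0), (4, 3), (0, 3)]
door = [(1.5, 0), (2.5, 0), (2.5, 1.5), (1.5, 1.5)]

tall_body = elongate(body, 3)
tall_door = elongate(door, 3)

# The parts change, but their relations are invariant: the door is
# still half the height of the body, so the whole reads as one style.
print(height(door) / height(body), height(tall_door) / height(tall_body))
```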
Proportionality is a key feature of natural shape (Thompson 1917), giving distinct species their
unique structures. Proportional systems are already present among the oldest human-made visual
designs, such as bodily proportions used in Palaeolithic art (Francis 2001). Specific proportions

canonize the design shapes of different ancient civilizations, such as the instantly recognizable
proportions of an Egyptian sculpture or funerary mask. In the proportional systems used in
font design, or depicting the human body (Massironi 2002, pp.  35–43) by da Vinci, Dürer, Le
Corbusier, and many others, proportion refers to consistent spatial size relationships between
defined parts. In other stylistic effects, proportion can refer to the relative amount of colour to the
amount of luminance contrast, the salience of contours in relation to the salience of colours (think
of Monet’s impressionist painting style versus a cartoon by Hergé), or textures, or it can relate to
the degree to which contours are locally deformed, or even disconnected, while grouping globally
into a specified configuration. If applied to various objects in the same style, these objects appear
to belong together, a consequence of the shared fate of their underlying features.

Structured Empty Space and Medial Axis Representation


Perceptually, the empty space—what artists often refer to as negative space (Arnheim 1966,
p. 130)—is more emphasized in a deliberately minimalist design, such as a sparse landscape com-
position (Tanizaki [1933] 1977; Nitschke 1993). When the rocks and the empty spaces between
them are particularly clearly articulated, such as the flat gravel courtyard with five rock clusters
at Ryoanji temple in Kyoto (Figure 42.2C), even modest analytical means could reveal essential
structural aspects of the design. Here, we will revisit an analysis in which medial axis transforma-
tion is used.
Blum (1973) conceived of the medial axis as a means for compact shape encoding. Medial axes
can be thought of as the set of loci that would run along the central skeletal spines of the main
body and protrusions of a shape silhouette (Figure 42.5C). The axis can be computed via various
methods, for example, by collecting the centres of all the largest disks that can be locally fitted
into a silhouette shape (Figure 42.5B). Psotka (1978) showed that points coinciding
with medial axes are highly salient, apparently playing a role in guiding attention during percep-
tion of whole figures. Kovács, Fehér, and Julesz (1998) suggest that certain sets of stable points on
the medial axis may be perceptually significant when keeping track of biological shapes in motion.
These medial points seem to coincide with the locations at which motion sensors, placed on a
moving agent, yield believable impressions of bodily movement.
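Blum's maximal-disk idea can be made concrete in a few lines of code. The sketch below is purely illustrative (the pixel-grid discretization, the Chebyshev distance, and names such as `medial_axis_points` are our own assumptions, not anything from Blum's paper): every foreground pixel is assigned the radius of the largest disk that fits around it, and pixels whose disks are not exceeded by any neighbour's approximate the medial axis.

```python
def medial_axis_points(shape_pixels):
    """Approximate the medial axis of a pixel silhouette.

    shape_pixels: set of (x, y) foreground pixels. For each pixel the
    distance to the nearest background pixel gives the radius of the
    largest disk centred there; ridge points of this distance map
    approximate the medial axis (cf. Figure 42.5B).
    """
    pts = set(shape_pixels)

    def radius(p):
        # nearest background pixel, found by growing Chebyshev rings
        x, y = p
        r = 0
        while True:
            r += 1
            ring = [(x + dx, y + dy)
                    for dx in range(-r, r + 1)
                    for dy in range(-r, r + 1)
                    if max(abs(dx), abs(dy)) == r]
            if any(q not in pts for q in ring):
                return r

    rad = {p: radius(p) for p in pts}
    axis = set()
    for (x, y), r in rad.items():
        nbrs = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        # a ridge point: no 4-neighbour holds a strictly larger disk
        if all(rad.get(n, 0) <= r for n in nbrs):
            axis.add((x, y))
    return axis

rect = {(x, y) for x in range(7) for y in range(3)}
axis = medial_axis_points(rect)
print(sorted(axis))
```

For a 7 × 3 pixel rectangle this recovers the central row together with the corner pixels, a coarse version of a rectangle's true skeleton, which consists of a central segment with diagonal branches running to the corners.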
The empty space between the stones in Ryoanji would be encoded as a compact structural skel-
eton connecting all the open gravel spaces. This reveals the medial ‘shape’ of the negative space.
The empty space globally constitutes a dichotomously branching structure (Figure 42.6A,  D)
resembling small rivulets successively converging into a single axis (van Tonder et  al. 2002).


Fig. 42.5  (a) Medial axis transformation of the empty space between two points: any point on the
medial axis is equidistant from the two points. (b) The set of centres of the largest included disks that
touch the boundary contours of this triangle traces out an inverted ‘Y’-shaped medial axis. (c) Medial
axis transformation of a human silhouette appears as a skeletal midline along the body and limbs.
Local maxima—or medial points—are emphasized in black.
878 van Tonder and Vishwanath


Fig. 42.6  (a) Medial axes in the empty space at the Ryoanji dry rock garden form a four-level
dichotomous branching tree. Thin lines indicate the architectural layout of the temple before it was
destroyed in 1797. The intended viewing location is indicated by the letter ‘O’ inside the central hall.
(b) Note the relative size–distance relations between nearest rocks. Taller rocks are shaded darker.
Rocks in the leftmost cluster (c) and the whole set of clusters (d) do not line up, but are arranged
into irregular folding screen configurations facing the viewing location.

Going from the trunk to the tips of the tree, the lengths of the limbs increase logarithmically.
Combined with a branching pattern at counterbalanced angles, the empty space resembles the
branching structures ubiquitous throughout nature (Prusinkiewicz and Lindenmayer 1990). A similar
branching structure converges outward from the most conspicuous rock cluster on the left (Figure
42.6A, C). Adding or removing any element in the composition significantly disrupts the ordered
structure of the empty space. Even if dissimilar at a glance, baroque vista gardens can also
be represented as branching networks. This level of abstraction thus enables a more sophisticated
comparison of different landscaping traditions.
Medial axes designate information-rich loci where maximal amounts of shape boundary
surfaces can be encoded with minimal parameters (Leyton 1987). A practical consequence
in Ryoanji is that the surface facets from the entire set of rock clusters (approximating each
cluster with a convex hull envelope) are at their most surveyable at the most global medial
point Y. There are obvious evolutionary connotations with placing the viewer in a location
that affords high visual access to the surroundings. Strikingly, this point is near one of the
intended viewing points of the garden, the centre ‘O’ of the abbot’s hall in the original archi-
tectural layout. Classical illustrations depict the Ryoanji rock garden from this viewpoint
(Akisato 1799). Outlining the central loci of empty spaces, medial axes also map the paths of
least obstruction for spatial navigation.
The original intentions with the Ryoanji garden design are not exactly known, but the
probability of randomly stumbling upon this composition is sufficiently small (van Tonder
2006) to suggest that the perception of visual balance and other proportional relationships
may be particularly acute when a subject’s viewing location is physically aligned with the
medial loci of the viewed spatial layout, a perceptual consequence related to natural mapping
(see ‘Natural Mappings’).

Isovist Theory and Space Syntax in Urban and Architectural Layout

Isovist theory (Benedikt 1979), another analytical approach to visuospatial accessibility, predicts the
perceived degree of spaciousness of an architectural space. An isovist graph is computed from the
viewer’s position by sampling sight-lines in all possible directions. The 2D
isovist in a room can be thought of as the set of rays or sight-lines that would emanate outwards from
the viewer in every direction and terminate on an architectural structure (Figure 42.7A, B). The isovist
graph is a plot of the length of each sight-line against angle, with the viewer’s direction of gaze as the
zero angle reference (Figure 42.7C, D). Graph entries can be scaled down with angular distance from
the direction of gaze to enhance the predicted differences in perceived spaciousness. Using the technique,
a rectangular room is predicted to appear more spacious when viewed from a corner than when it is
looked at from the middle of a wall (Figure 42.7C, D). The isovist theory was developed by an archi-
tect trying to address the discrepancy between physical floor space—a fixed number of square metres
regardless of where the entrance is—and how the architectural plan and placement of entrances influ-
ence the appearance of spaciousness. In tea architecture (Figure 42.1), entrances are placed in room
corners to convey a sense of greater spaciousness (Suzuki 1979), a device underscoring the predic-
tions of isovist theory. Combining isovist theory with medial axis transformation, space syntax theory
(Hillier and Hanson 1984) successfully predicts the known density patterns of traffic and pedestrian
flow in major cities and architectural spaces around the world. The Olympic Delivery Authority (ODA)
used space syntax in planning the complex 2012 London Olympics and Paralympics infrastructure. Its
success underscores the observation that humans visually assess a path for greatest visual accessibility,
simplicity, and depth of sight-line.
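The isovist construction described above can be sketched numerically. The fragment below is a simplified stand-in, not Benedikt's formulation: `sight_line` and `mean_sight_line` are invented names, and the mean sight-line length over all open directions is used as a crude proxy for the area under the isovist graph. It compares a corner viewpoint with a mid-wall viewpoint in a rectangular room.

```python
import math

def sight_line(px, py, theta, w, h):
    """Length of the sight-line from (px, py) at angle theta inside a
    w-by-h axis-aligned room, i.e. distance to the nearest wall."""
    c, s = math.cos(theta), math.sin(theta)
    ts = []
    if c > 1e-12:
        ts.append((w - px) / c)   # right wall
    elif c < -1e-12:
        ts.append(-px / c)        # left wall
    if s > 1e-12:
        ts.append((h - py) / s)   # far wall
    elif s < -1e-12:
        ts.append(-py / s)        # near wall
    return min(ts)

def mean_sight_line(px, py, w, h, n=3600):
    """Average sight-line length over all directions that point into
    the room -- a crude proxy for the area under the isovist graph."""
    total, count = 0.0, 0
    for i in range(n):
        theta = 2 * math.pi * i / n
        c, s = math.cos(theta), math.sin(theta)
        # skip directions that leave the room immediately
        x2, y2 = px + 1e-6 * c, py + 1e-6 * s
        if 0 < x2 < w and 0 < y2 < h:
            total += sight_line(px, py, theta, w, h)
            count += 1
    return total / count

w, h = 4.0, 3.0
corner = mean_sight_line(0.0, 0.0, w, h)   # viewpoint in a corner
side = mean_sight_line(w / 2, 0.0, w, h)   # middle of the long wall
print(corner, side)
```

Consistent with Figure 42.7, the corner viewpoint yields longer sight-lines on average, matching the prediction that the room appears more spacious when viewed from a corner.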


Fig. 42.7  Isovists projected from the (a) corner and (b) side of a rectangular room, and their sight-line
graphs (c, d). Here, sight-lines are linearly scaled down away from the direction of gaze (centre red
bold line) to emphasize the influence of the viewing direction. The area under the isovist graph is
(c) larger for the corner projection than from the side (d), predicting that the room will look more
spacious from this viewpoint.

Bilateral Symmetry and Self-similarity in Human Design


In the use of bilateral symmetry, perceptual organization completes a full circle, from nature as
the driver of perceptual evolution to the internal laws that shape how we see, and affect what
will become salient in our designs. Even the oldest known human-made engraving on an
80 000-year-old stone blade reflects our resilient natural tendency towards symmetrical design,
attention to the central axis of symmetry, exact repetition of shape and interval, and smooth align-
ment between parts (Henshilwood et al. 2002).
Strictly speaking, self-similarity, reflections, translations, rotations, and other transforms are all
scalar aspects of symmetry (Weyl 1952), but at an intuitive glance, self-similarity and symmetry—
bilateral symmetry in particular—appear qualitatively unique enough that designers and artists
distinguish between the two.
In hindsight, it is obvious that the self-similarity of natural form would emerge throughout
many epochs of human design (see Kimchi’s chapter on hierarchical patterns, this volume).
Cathedral and temple architecture in particular complement a high degree of various aspects of
symmetry with a repetition of the whole in its parts, in some cases over many spatial scales (Bovill
1996). Medieval Japanese garden design guidelines developed from the refined observation of
actual rock formations and many attempts to recreate nature’s essential balanced asymmetry, in
spite of the innate human perceptual bias towards more pronounced bilateral symmetry.
The self-similar circular layout of Ba-ila villages in southern Zambia (Eglash 1999, p. 27) recurs
over at least three spatial scales. The gates to Ba-ila villages and compounds and the entrances to
individual dwellings are arranged along various axes of symmetry that relate to a global structural
centre onto which all the constituent parts converge. The layout of Tang Dynasty capitals repre-
sents self-similarity in a rectangular format (Nitschke 2000). The city as a whole, its aristocratic
quarters and normal compounds, down to the main hall of each compound, are laid out as bilat-
erally symmetrical rectangles, centred along a central north–south axis with a protective barrier
and deity on the north side, and a main entrance towards the south—a self-similar arrangement
spanning four orders of magnitude in these urban complexes.
Pseudo self-similar flourishes, knots, and mazes commonly adorn an infinite range of designs
throughout the ages (Gombrich 1979), appearing in African carvings, textiles, and basketry,
Greek mosaics, Roman frescoes, Celtic mazes and accessories (Figure 42.4C), engraved Mayan
masonry, Islamic arabesques, curling vines in Indonesia, lattices depicting lightning and smoke
in East Asia, leafy branches in European cathedrals, and shell motifs in Baroque palaces (Figure
42.4D). In essence, these decorations are stylistic signatures of undulating growth and decay, pat-
terns fitted into symmetrical, regular frames to suit rectilinear human-made objects. While on
the surface the possibilities for doing so may appear infinite, there is a surprisingly limited set of
unique spatial arrangements for tiling such motifs into one-, two-, and three-dimensional pat-
terns (see Koenderink, this volume).
The drip paintings by Jackson Pollock exhibit unexpected self-similar properties (Taylor,
Micolich, and Jonas 1999). Since these works predate Mandelbrot’s (1977) formalization of
fractal geometry, Pollock evidently acted upon his perceptual experience, whether that
involved implicit perception of fractal structure or some other equivalent order. In his own words,
he tried to ‘capture the language of nature’.
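The fractal analysis of Taylor, Micolich, and Jonas rests on box counting: cover the pattern with boxes of side s, count the occupied boxes N(s), and estimate the dimension D as the slope of log N(s) against log(1/s). The sketch below is our own toy implementation, exercised on a chaos-game Sierpinski triangle rather than on a Pollock image; the function name and the choice of scales are assumptions for illustration.

```python
import math
import random

def box_count_dimension(points, sizes):
    """Estimate fractal dimension D as the least-squares slope of
    log(occupied box count) against log(1 / box size)."""
    xs, ys = [], []
    for s in sizes:
        boxes = {(int(x / s), int(y / s)) for x, y in points}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
            sum((x - mx) ** 2 for x in xs))

# Test pattern: the Sierpinski triangle generated by the chaos game.
random.seed(1)
corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
x, y = 0.1, 0.1
pts = []
for i in range(60100):
    cx, cy = random.choice(corners)
    x, y = (x + cx) / 2, (y + cy) / 2
    if i >= 100:          # discard a short transient
        pts.append((x, y))

d = box_count_dimension(pts, sizes=[1/8, 1/16, 1/32, 1/64])
print(round(d, 2))
```

The true dimension of the Sierpinski triangle is log 3 / log 2 ≈ 1.585, and the finite sample lands close to that value; the same procedure, applied to digitized drip paintings, is what yields the fractal dimensions reported by Taylor and colleagues.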

Natural Mappings
Natural mapping (Norman 1988) emphasizes the importance of resonance between form and
function. Specifically, natural mapping refers to a design methodology where the layout of

controls is intentionally arranged to resemble the spatial layout of the designed object or environ-
ment. Consider, for example, a gas stove top with four burners arranged into a square layout. When
the control knobs are aligned in a straight line, it is not clear which knob maps to
which burner. Even after repeated use, users may still make mistakes, when all it takes to create a
flawless interface is to place the four knobs into a square pattern that visually matches the layout of
the burners. According to Norman, great designs require neither labels nor manuals, but are suf-
ficiently intuitive to be used on the fly. The alignment between the user, controls, and the design
itself is also important for fluent use. We know from experience how difficult it can be to navigate
from a map that is rotated relative to the actual surroundings, even if it is an accurate mapping
of the terrain. Through the use of an intentional viewing point, classic Japanese gardens place the
viewer within a natural mapping from which the visual balance and other features of the design
can be most acutely experienced—a form of natural mapping for aesthetic enhancement or map-
ping where the need for mental rotation is kept to a minimum.
On the scale of architecture, the new Seattle Central Library, by Koolhaas and Prince-Ramus
(Goldberger 2004) presents a natural mapping of the romanized alphabet. The entire floor space
in the building consists of one long alphabetically indexed walkway, coiled into a huge helical
spiral. One can thus literally walk from book indices A to Z in one single stretch, a very efficient
design for both staff and users, although in this case the mapping is not directly perceptual but
requires cognitive knowledge of the relation between letters and organization. This type of heli-
cal structure is already exemplified in designs such as the Guggenheim Museum in Manhattan,
by Frank Lloyd Wright, although in the Seattle Central Library the helix is intentionally mapped
to another structure, the alphabet, and thus presents a clearer example of intentional functional
mapping between two structures. The design was received with mixed emotions, for reasons other
than the impact of the helical design (Cheek 2007).
Natural mapping can be extended to the structural mapping of the human body. The chair is an
example of a hugely successful design because it naturally maps to the body. The seat, arm rests,
opening for the legs, and rest for the back and head closely resemble the visual layout of the user’s
anatomy, resulting in an intuitively grasped design. Such fluent visual grasp can, however, diverge
from the actual qualitative experience of physically interacting with the
design: some of the most beautifully designed chairs have delivered an extremely uncomfortable
sitting experience, to the surprise of both their makers and users.
Ba-ila villages and Tang dynasty cities represent large-scale examples of natural mappings with
bilateral symmetry along a central axis, and with a clearly directional head-and-tail assignment.
As with a chair, these design layouts are suggestive of the human body. In fact, in traditional
maps showing the layout of Zen temple complexes in Kyoto, the names of architectural gates,
paths, halls, and facilities within the temple complex are typically inscribed on a human silhouette
(Masuno 2008, p. 150), spread in the ‘Vitruvian man’ style, with the different facilities
mapped to specified body parts.
Self-similar urban layouts mapped to the body are doubly powerful. First, there is the mapping
with the familiar body. Second, grasping the mapping of urban organization at any spatial level
informs one’s knowledge of its organization at other scales.

Acknowledgements
The authors thank Johan Wagemans, Steve Palmer, and the anonymous reviewers for many help-
ful comments. Thanks also to Branka Spehar for re-discovering the 1940 essay on art and psychol-
ogy by Koffka.

References
Akino, A. (2012). Unpublished interview with the artist. Ai Akino is a classically trained Nihonga painter
from Kyoto, Japan.
Akisato, R. (1799). Miyako Rinsen Meishō Zue (Illustrated Guide to Famous Places In and Around the
Capital). 6 vols. Kyoto.
Albertazzi, L. (2010). ‘The Roots of Metaphorical Information’. In Perception Beyond Inference. The
Information Content of Perceptual Processes, edited by L. Albertazzi, G. van Tonder, and D. Vishwanath,
pp. 345–390. Cambridge MA: MIT Press.
Alexander, C. (2002). The Order of Nature. New York: Routledge.
Arnheim, R. (1966). ‘Order and Complexity in Landscape Design’. In Toward a Psychology of Art, pp. 123–
135. Berkeley: University of California Press.
Arnheim, R. (1969). Visual Thinking. Berkeley: University of California Press.
Arnheim, R. (1988). The Power of the Centre: A Study of Composition in the Visual Arts. Berkeley: University
of California Press.
Behrens, R. (2002). ‘How Form Functions: On Esthetics and Gestalt Theory’. Gestalt Theory 24: 317–325.
Benedikt, M. (1979). ‘To Take Hold of Space: Isovists and Isovist Fields’. Environment and Planning B
6: 47–65. doi: 10.1068/b060047
Blum, H. (1973). ‘Biological Shape and Visual Science (Part I)’. Journal of Theoretical Biology
38: 205–287.
Boudewijnse, G. (2012). ‘Gestalt Theory and Bauhaus—A Correspondence’. Gestalt Theory 34(1): 81–98.
Bovill, C. (1996). Fractal Geometry in Architecture and Design. Boston: Birkhäuser.
Cheek, L. (2007; updated 2012). On Architecture: How the New Central Library Really Stacks Up. Online.
http://www.seattlepi.com/ae/article/On-Architecture-How-the-new-Central-Library-1232303.
php?source=mypi. Accessed 15 August 2012.
Collins English Dictionary 11th Edition (2011; updated 2012). Collins. Online http://www.collinsdictionary.
com/dictionary/english. Accessed 30 November 2012.
Dutton, D. (2009). The Art Instinct. New York: Bloomsbury Press.
Eglash, R. (1999). African Fractals: Modern Computing and Indigenous Design. New Brunswick: Rutgers
University Press.
Fairbanks, M. S. and R. P. Taylor (2011). ‘Measuring the Spatial Properties of Temporal and Spatial
Patterns: From the Human Eye to the Foraging Albatross’. In Non-linear Dynamical Analysis for the
Behavioral Sciences Using Real Data. Boca Raton, FL: CRC Press, Taylor and Francis Group.
Francis, J. E. (2001). ‘Style and Classification’. In Handbook of Rock Art Research, edited by D. S. Whitley,
pp. 221–244. New York: Altamira Press.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Goldberger, P. (2004; updated 2012). ‘High-Tech Bibliophilia’. New Yorker 17 May. Online. http://www.
newyorker.com/critics/skyline/?040524crsk_skyline. Accessed 17 November 2012.
Gombrich, E. H. (1979). The Sense of Order: A Study in the Psychology of Decorative Art. Ithaca, NY: Cornell
University Press.
Graham, D. J. and D. J. Field (2007). ‘Statistical Regularities of Art Images and Natural Scenes: Spectra,
Sparseness and Nonlinearities’. Spatial Vision 21: 149–164. doi: 10.1163/156856807782753877
Graham, D. J. and C. Redies (2010). ‘Statistical Regularities In Art: Relations with Visual Coding and
Perception’. Vision Research 50: 1503–1509. doi: 10.1016/j.visres.2010.05.002
Hansmeyer, M. (2012). Building Unimaginable Shapes. TEDGlobal 2012. [Online]. http://www.ted.com/
talks/michael_hansmeyer_building_unimaginable_shapes.html. Accessed 14 December 2012.
Heider, F. and M. Simmel (1944). ‘An Experimental Study of Apparent Behavior’. American Journal of
Psychology 57: 243–259.

Henshilwood, C. S., F. d’Errico, R. Yates, Z. Jacobs, C. Tribolo, G. A. T. Duller, N. Mercier,


J. C. Sealy, H. Valladas, I. Watts, and A. G. Wintle (2002). ‘Emergence of Modern Human
Behavior: Middle Stone Age Engravings from South Africa’. Science 295: 1278–1280. doi: 10.1126/
science.1067575
Hildebrand, A. ([1893] 1945). The Problem of Form in Painting and Sculpture, translated by M. Meyer and
R. M. Ogden. New York: G. E. Stechert. (Originally published 1893, Strazburg.)
Hillier B. and J. Hanson (1984). The Social Logic of Space. Cambridge: Cambridge University Press.
Intraub, H. and M. Richardson (1989). ‘Wide-Angle Memories of Close-Up Scenes’. Journal of
Experimental Psychology: Learning, Memory and Cognition 15: 179–187.
Itten, J. (1975). Design and Form: The Basic Course at the Bauhaus, translated from the German
Gestaltungs—und Formenlehre by F. Bradley. London: Thames and Hudson.
Jiroh, T. and M. P. Keane (2001). Sakuteiki: Visions of the Japanese Garden—A Modern Translation of Japan’s
Gardening Classic. Tokyo: Tuttle Publishing.
Jupp, J. and J. S. Gero (2006). ‘Towards Computational Analysis of Style in Architectural Design’. Journal of
the American Society for Information Science 57(11): 1537–1550.
Keane, M. P. (1996). Japanese Garden Design. Tokyo: Charles E. Tuttle.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt.
Koffka, K. (1940). ‘Problems in the Psychology of Art’. In ART: A Bryn Mawr Symposium, edited by
R. Bernheimer, R. Carpenter, K. Koffka, and M. C. Nahm, pp. 180–273 (reissued in 1972 from Bryn
Mawr Notes and Monographs, Volume IX, 1940). New York: Sentry Press.
Komar, V. and A. Melamid (1995). Komar + Melamid: The Most Wanted Paintings. Dia Center for the Arts.
Online. http://awp.diaart.org/km/index.php/homepage.html. Accessed 2 January 2012.
Kovács, I., A. Fehér, and B. Julesz (1998). ‘Medial-Point Description of Shape: A Representation for Action
Coding and Its Psychophysical Correlates’. Vision Research 38: 2323–2333.
Kuitert, W. (2002). Themes in the History of Japanese Garden Art. Honolulu: University of Hawaii Press.
Lehar, S. (2003). ‘Gestalt Isomorphism and the Primacy of Subjective Conscious Experience: A Gestalt
Bubble Model’. Behavioral and Brain Sciences 26(4): 357–408.
Leyton, M. (1987). ‘Symmetry-Curvature Duality’. Computer Vision, Graphics and Image Processing
38: 327–341.
Lipps, T. (1903). Ästhetik: Psychologie des Schönen und der Kunst: Grundlegung der Ästhetik, Erster Teil.
Hamburg: L. Voss.
Locher, P. (2003). ‘An Empirical Investigation of the Visual Rightness Theory of Picture Perception’. Acta
Psychologica 114: 147–164.
Loos, A. (1908). Ornament and Crime. Innsbruck (reprint Vienna 1930).
McManus, I. C., K. Stoever, and D. Kim (2011). ‘Arnheim’s Gestalt Theory of Visual Balance: Examining
the Compositional Structure of Art Photographs and Abstract Images’. i-Perception 2: 1–2.
Mandelbrot, B. B. (1977). The Fractal Geometry of Nature. New York: W. H. Freeman.
Massironi, M. (2002). The Psychology of Graphic Images: Seeing, Drawing, Communicating.
London: Lawrence Erlbaum Associates.
Masuno, S. (2008). 禅と禅芸術としての庭 (Gardens Related to Zen and Zen Art). Tokyo: Asahi Press.
Metzger, W. (2006). Laws of Seeing. Cambridge, MA: MIT Press. (Original German text published in
1936.)
Naito, A. and T. Nishikawa (1977). Katsura: A Princely Retreat, translated into English by Charles S. Terry.
Tokyo: Kodansha International.
Nitschke, G. (1993). From Shinto to Ando. London: The Academy Group.
Nitschke, G. (2000). Japanese Gardens. Cologne: Benedikt Taschen.
Norman, D. A. (1988). The Design of Everyday Things. New York: Basic Books.
Ogawa, K. (2011). Unpublished interview with master gardener Katsuaki Ogawa, Kyoto, Japan.

Poole, A. (2008). Which Are More Legible: Serif or Sans Serif Typefaces? Online. (Updated March 2012).
http://alexpoole.info/blog/which-are-more-legible-serif-or-sans-serif-typefaces/. Accessed on 18 March
2012.
Preston, S. D. and F. B. M. de Waal (2002). ‘Empathy: Its Ultimate and Proximate Bases’. Behavioural Brain
Science 25: 1–72.
Prusinkiewicz, P. and A. Lindenmayer (1990). The Algorithmic Beauty of Plants. Berlin: Springer.
Psotka, J. (1978). ‘Perceptual Processes that May Create Stick Figures and Balance’. Journal of Experimental
Psychology Human Perception and Performance 4: 101–111.
Ralph, P. and Y. Wand (2009). ‘A Proposal for a Formal Definition of the Design Concept’. In Design
Requirements Workshop (LNBIP 14), edited by K. Lyytinen, P. Loucopoulos, J. Mylopoulos, and
W. Robinson, pp. 103–136. New York: Springer. doi: 10.1007/978-3-540-92966-6_6
Rizzolatti, G. and L. Craighero (2004). ‘The Mirror-Neuron System’. Annual Review of Neuroscience
27: 169–192.
Rubin, E. (1921). Visuell Wahrgenommene Figuren. Copenhagen: Gyldendals.
Schulze, F. and E. Windhorst (2012). Mies Van Der Rohe, a Critical Biography (New and Revised Edition).
Chicago: University of Chicago Press.
Shimoyama, S. (1976). Translation of Sakuteiki: The Book of the Garden. Tokyo: Town and City Planners.
Shingen (1466). Senzui Narabi ni Yagyou no Zu (Illustrations for Designing Mountain, Water and Hillside
Field Landscapes). Sonkeikaku Library, Sonkeikaku Sōkan Series. Tokyo: Ikutoku Zaidan.
Shin-tsu Tai, S., S. Campbell Kuo, R. L. Wilson, and T. S. Michie (1998). Carved Paper: The Art of the
Japanese Stencil. New York and Tokyo: Santa Barbara Museum of Arts and Weatherhill Inc.
Slawson, D. A. (1987). Secret Teachings in the Art of Japanese Gardens. Tokyo: Kodansha.
Smith, J. T. (1797). Remarks on Rural Scenery with Twenty Etchings of Cottages, from Nature: And Some
Observations and Precepts Relative to the Picturesque. London: Joseph Downes.
Spehar, B., C. Clifford, B. Newell, and R. P. Taylor (2003). ‘Universal Aesthetics of Fractals’. Computers and
Graphics 27: 813–820. doi: 10.1016/S0097-8493(03)00154-7
Sullivan, L. H. (1896). ‘The Tall Office Building Artistically Considered’. Originally published in Lippincott’s
Magazine 57: 403–409.
Suzuki, T. (1979). 茶室と露地 (Tea Rooms and Tea Gardens). Tokyo: Sekai Bunkasha.
Synek, E. (1998). ‘Evolutionäre Ästhetik: Vergleich von prä—und postpubertären Landschaftspräferenzen
durch Einsatz von computergenerierten Bildern’. (Evolutionary Aesthetic: Comparison of Visual
Preference for Computer Generated Landscapes before and after Adolescence). Doctoral thesis,
University of Vienna.
Tanizaki, J. ([1933] 1977). In’ei Raisan. (In Praise of Shadows). Translated by E. Seidensticker and T. Harper.
Sedgwick, ME: Leete’s Island Books.
Taylor, R. P., A. Micolich, and D. Jonas (1999). ‘Fractal Analysis of Pollock’s Drip Paintings’. Nature
399: 422. doi: 10.1038/20833
Thompson, D. W. (1917). On Growth and Form: The New Edition. Cambridge: Cambridge University Press.
Also see On Growth and Form: The Complete Revised Edition (1992). New York: Dover Publications.
de la Torre, I. (2011). ‘The Origins of Stone Tool Technology in Africa: A Historical Perspective’.
Philosophical Transactions of the Royal Society B 366(1567): 1028–1037.
von Uexküll, J. (1926). Theoretical Biology. New York: Harcourt, Brace & Co.
van Tonder, G. J., M. J. Lyons, and Y. Ejima (2002). ‘Visual Structure of a Japanese Zen Garden’. Nature
419: 359–360. doi: 10.1038/419359a
van Tonder, G. J. and M. J. Lyons (2005). ‘Visual Perception in Japanese Rock Garden Design’. Axiomathes
Special Issue on Cognition and Design 15(3): 353–371. doi: 10.1007/s10516-004-5448-8
van Tonder, G. J. (2006). ‘Order and Complexity in Naturalistic Landscapes’. In Visual Thought: The
Depictive Space of Perception, edited by L. Albertazzi, pp. 257–301. Amsterdam: Benjamin Press.

Vischer, R. (1873). On the Optical Sense of Form: A Contribution to Aesthetics. Doctoral thesis.


Werner, H. (1956). ‘On Physiognomic Perception’. In The New Landscape in Art and Science, edited by
G. Kepes. Chicago: Paul Theobald and Co.
Wertheimer, M. (1938a). ‘Gestalt Theory’. In A Sourcebook of Gestalt Psychology, edited by W. D. Ellis,
pp. 1–11. New York: Harcourt, Brace and Co.
Wertheimer, M. (1938b). ‘Laws of Organization in Perceptual Forms’. In A Sourcebook of Gestalt
Psychology, edited by W. D. Ellis, pp. 56–71. New York: Harcourt, Brace and Co.
Weyl, H. (1952). Symmetry. Princeton, NJ: Princeton University Press.
Yanagi, S. (1972). The Unknown Craftsman: A Japanese Insight into Beauty. Tokyo: Kodansha International.
ZCorporation (2010; updated 2012). ZPrinter ® 650. (Promotional video content). Online. http://www.
youtube.com/watch?v=7QP73uTJApw. Accessed 14 December 2012.
Zipf, G. K. (1949). Human Behaviour and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.
Chapter 43

Perceptual organization in visual art


Jan J. Koenderink

Introduction
Definition of ‘visual art’
‘Art’ is not necessarily defined by an aesthetic dimension. A sunset may evoke aesthetic experi-
ences, so may flowers, or butterflies, but natural phenomena are not art. One might suppose that
art is necessarily of human manufacture. But if someone points out a sunset to you, what is the
difference from pointing at a urinal, as Duchamp famously did1? The sunset was certainly not
manufactured, but merely pointed out. So was the urinal. If the urinal is appreciated as an objet
trouvé2 (admitted as an objet d’art), then why not the sunset, the flower, or the butterfly? The single
common factor appears to be that art is intentional3, it implies an ‘artist’, who may, but need not, be
a manufacturer. This is indeed a necessary requirement, but it is not sufficient. I will first introduce
a few important distinctions.
‘Visual art’ is art that is meant to be looked at, instead of being heard, felt, etc. However, a
copy of The Brothers Karamazov is meant to be looked at too (one is supposed to read it), but it
is generally not reckoned to be ‘visual art’. Yet Fyodor Dostoyevsky4 was certainly an artist, and
his novel is ART. Likewise the famous Fountain (actually a ‘found’ urinal) displayed by Marcel
Duchamp in 1917, is art, but not ‘visual art’. It appeals to cognition and reflective thought, rather
than immediate visual awareness. Today, conceptual art5 holds the floor—this is indeed the polit-
ically correct thing in a democracy, because most people ‘see with their ears’ as my artist friends
say. However, this chapter is focused singularly on visual art, ignoring conceptual art.

1  Duchamp’s Fountain is one of the landmark objects of twentieth-century art. Virtually any book on ‘modern
art’ will have a section on it. A place to start is http://en.wikipedia.org/wiki/Fountain_%28Duchamp%29.


2  Objet trouvé is French for ‘found object’. It has become a standard term in art circles. In English one more often
uses ‘ready made’. A place to start is http://en.wikipedia.org/wiki/Found_object.


3  ‘Intentionality’ is a philosophical term meaning something akin to ‘pointing to something (usually something
in the world)’. For instance a thought is necessarily about something, you cannot have a thought that is about
nothing, although you may have thoughts about NOTHING. The term is usually traced to the teachings of
Franz Brentano (see also Albertazzi, this volume). Notice that ‘intention’ has nothing to do with the intentions
of anybody. A starting point is http://en.wikipedia.org/wiki/Intentionality. On Franz Brentano see
http://en.wikipedia.org/wiki/Franz_Brentano.
4  Fyodor Mikhailovich Dostoyevsky (1821–1881) was a Russian writer of novels, short stories, and essays. See
http://en.wikipedia.org/wiki/Fyodor_Dostoyevsky. The Brothers Karamazov is his final novel. See
http://en.wikipedia.org/wiki/The_Brothers_Karamazov.
5  In our times ‘conceptual art’ is almost synonymous with art, period. This is a fact, whatever thoughts one may
have on it. A starting point is http://en.wikipedia.org/wiki/Conceptual_art. The number of popular books on
the topic is almost infinite.
Perceptual Organization in Visual Art 887

Although one should not fail to distinguish sharply between ‘visual art’ and ‘conceptual art’, this
may not always be easy because many paintings from western art fit into both categories. Raphael’s
Sistine Madonna6 (Figure 43.1 La Madonna di San Sisto, 1513/1514) is meant to be looked at, and
manages to make an immediate visual impression. Yet it was commissioned as an altarpiece, and
has obvious religious connotations. It is art, both visual and conceptual. To someone coming from
a non-western culture the conceptual part may be non-existent; to such an observer the painting
is pure visual art. The same applies to the western appreciation of African tribal art as visual art,
when it was originally intended as conceptual.
As everyone knows from the newspapers, art has an important economic dimension, and
indeed one pragmatic definition of art is that it has a value on the art market. When a tin of shit
(Piero Manzoni’s Merda d’Artista7, Figure 43.2, 1961) sold for £97 250 at Sotheby’s in October 2008
(tin number 83 of 90; the cans were originally to be valued according to their weight in gold, or
$37 each in 1961), this marked it as a piece of Art. The value on the art market is important
for both visual and conceptual art. It is often considered a metric of artistic value, comparable to
the citation count in the case of scientific contributions, and making similar sense. This definition
places works of art in a single category with rare coins and postage stamps, which is evidently unfortunate.
What is lacking here is an ‘observer’. The investor is not an observer; in fact an investor is likely to
store the artwork in a vault. Here we identify another necessary condition for designating some
objects ‘art’.
This is perhaps best explained with an example; I use the case of pictures. What exactly is a
‘picture’, a painting say? It was famously discovered by Maurice Denis8 that a painting is (among
other things) a physical object:
It is well to remember that a picture before being a battle horse, a nude woman, or some anecdote, is
essentially a flat surface covered with colours assembled in a certain order.

However, used as a tea tray, such an object is certainly not a picture. In order to be a picture, there should exist a double-sided intentionality, namely:
• the picture was intended by an artist to be looked at as a picture;
• the picture is looked at as a picture, by an ‘observer’.

6 Raphael is the short name of Raffaello Sanzio da Urbino (1483–1520). Raphael was one of the best known Italian painters and architects of the High Renaissance. There are many books on the man and his work, a convenient starting point is http://en.wikipedia.org/wiki/Raphael. Raphael’s Sistine Madonna (La Madonna di San Sisto, 1513/4) is the last painting he personally finished. It was completed ca. 1513–1514, as a commissioned altarpiece. See http://en.wikipedia.org/wiki/Sistine_Madonna.
7 I use Piero Manzoni’s Merda d’Artista to illustrate what I think of ‘conceptual art’. Maybe you (the reader) think it is a work of genius. That is fine, as long as my point that conceptual art is not visual art comes across. (Who cares for visual art anyway? It is the concept that counts!) My (mis-)use of Manzoni is perhaps unfair. Read up on this at http://en.wikipedia.org/wiki/Artist%27s_shit and http://en.wikipedia.org/wiki/Piero_Manzoni.
8 Maurice Denis (1870–1943) was a French painter, a member of the Symbolist and Les Nabis movements. He was something of a theorist too, and did quite a bit of writing. On his life see http://en.wikipedia.org/wiki/Maurice_Denis. The quotation is from a Symbolist Manifesto of 1890: ‘Se rappeler qu’un tableau, avant d’être un cheval de bataille, une femme nue ou une quelconque anecdote, est essentiellement une surface plane recouverte de couleurs en un certain ordre assemblées’ (Définition du néo-traditionalisme, Revue Art et Critique, 30 August 1890).
Fig. 43.1  La Madonna di San Sisto, or the Sistine Madonna by Raphael (Raffaello Sanzio da Urbino, 1483–1520). It was finished only a few years before his death, c. 1513–1514, as a commissioned altarpiece. It was his last painting.
Raphael (1483–1520): The Sistine Madonna, 1512–1513. Dresden, Gemaeldegalerie Alte Meister, Staatliche Kunstsammlungen. Photo: Elke Estel/Hans-Peter Klut. © 2015. Photo Scala, Florence/bpk, Bildagentur fuer Kunst, Kultur und Geschichte, Berlin

Fig. 43.2  Piero Manzoni (1933–1963), Merda d’artista, No. 4, 1961. Diameter 6.5 cm.
Manzoni, Piero (1933–1963): Merda d’artista (Artist’s Shit) No. 014. May, 1961. New York, Museum of Modern Art (MoMA). Metal, paper, and ‘artist’s shit’, 1 7/8" (4.8 cm) x 2 1/2" (6.5 cm) in diameter. Gift of Jo Carole and Ronald S. Lauder. Acc. n.: 4.1999. © 2015. The Museum of Modern Art, New York/Scala, Florence

‘Looked at as a picture’ implies looking ‘into’, and entering a ‘pictorial world’9. Consider these examples:
An ancient stained wall is not a picture: even though it might beat a Jackson Pollock10 in attracting visual interest, the artist is lacking. No work of art comes into existence as a cosmic accident. Designating the wall an objet trouvé2 might provide an artist’s intention3, although this in no way changes the wall as a physical object. People have discovered striking renderings of the face of Jesus in trees, old rags, cookies, and the wood grain of toilet doors11 (see http://en.wikipedia.org/wiki/Holy_Face_of_Jesus or http://en.wikipedia.org/wiki/Perceptions_of_religious_imagery_in_natural_phenomena). These are not to be counted as works of art, since the artist’s intention is lacking.

9 See Koenderink, J., van Doorn, A. J., and Wagemans, J. (2011). Depth. i-Perception 2(6): 541–564.
10 Paul Jackson Pollock (1912–1956), known as Jackson Pollock, was an influential American painter and a major figure in the abstract expressionist movement. Jackson Pollock was best known for his unique drip painting, and was sometimes known as ‘Jack the Dripper’. See http://en.wikipedia.org/wiki/Jackson_Pollock. (If you fail to ‘get’ the nickname see http://en.wikipedia.org/wiki/Jack_the_Ripper.)
11 The Holy Face of Jesus is one of the acheiropoieta relating to Christ. These have been reported throughout the centuries. Devotions to the face of Jesus have been practiced throughout the ages. Devotions to the Holy Face were approved by Pope Leo XIII in 1895 and Pope Pius XII in 1958. The shroud of Turin is the best known example. See http://en.wikipedia.org/wiki/Holy_Face_of_Jesus. On the face in the toilet door see http://www.telegraph.co.uk/news/religion/6373674/Jesuss-facespotted-on-the-toilet-door-in-Ikea-Glasgow.html. Another recent example is a face in a tree stump at Belfast cemetery

The observer’s intention is just as necessary. In a hilarious painting by Mark Tansey12, a cow is forced by several earnest-looking men to look at a painting by Paulus Potter (Figure 43.3). The cow remains apparently unaware of the explicit erotic overtones of this work; thus one concludes that in the bovine universe the painting is just another object, despite its lifelike size and color. The observer is lacking, because the cow is looking ‘at’ instead of ‘into’ the painting. In this setting Potter’s work is just an object.
Depending on the art-form, the physical object matters. Although no mere physical object is a ‘work of art’, it may provide ‘a link’ to it. Examples of this are Roman marble copies (mere pieces of stone handiwork) of original Greek bronzes13. Without such a link, the work of art (in the intention of the Greek authors) no longer exists. Without the double-sided intentional significance14, the physical object is just junk.
The double-sided intentional nature thus explains the ontological status of ‘pictures’. The value on the market is irrelevant. There is much that might well be considered ‘art’ that is either not marketable or would bring merely some value typical of used goods. Examples are tattoos, ornaments on teacups or weapons, facial makeup, and so forth.
In this chapter I take a broad view and consider ‘art’ (used as short for visual art) to be any object, change applied to an object, happening, or expression, when it has double-sided intentionality15. Art is designed to affect immediate visual awareness in some specific way.
A work of art presupposes a certain ‘visual literacy’ in order to be ‘read’. It is a hermeneutical
task15, in George Steiner’s16 terms ‘not a science, but an exact art’. Steiner’s ‘four movements’ indeed

(http://www.belfasttelegraph.co.uk/news/local-national/northern-ireland/face-of-jesus-christ-appears-
on-tree-stump-at-belfast-cemetery-16195735.html), which drew crowds of visitors.
12 Mark Tansey (born 1949) is an American painter born in San Jose, California. The Innocent Eye Test dates from 1981. According to Tansey (quoted in Mark Tansey: Visions and Revisions, by Arthur C. Danto; and see http://www.101bananas.com/art/innocent.html): ‘I think of the painted picture as an embodiment of the very problem that we face with the notion “reality”. The problem or question is, which reality? In a painted picture, is it the depicted reality, or the reality of the picture plane, or the multidimensional reality the artist and viewer exist in? That all three are involved points to the fact that pictures are inherently problematic. This problem is not one that can or ought to be eradicated by reductionist or purist solutions. We know that to successfully achieve the real is to destroy the medium; there is more to be achieved by using it than through its destruction.’
13 Roman marble copies of original Greek bronzes: a well known example is the famous Discobolus. See http://en.wikipedia.org/wiki/Discobolus. The Greek original was completed towards the end of the Severe period, c. 460–450 BC, but the original Greek bronze is lost. However, there exist numerous Roman copies, including full-scale ones in marble. The first one found (in 1781) is the Palombara Discobolus. It was famously bought by Adolf Hitler in 1937 (and put in the Munich Glyptothek), but was returned to Rome in 1948.
14 Edmund Husserl has a notion of ‘double-intentionality’ that is quite different from my meaning here. In order to avoid problems I will speak of a ‘double-sided intentionality’ associated with works of art. In Husserl’s view the Langsintentionalität runs along protention and retention in the living present, whereas the Querintentionalität runs from the living present to the object of which consciousness is aware. See http://www.iep.utm.edu/phe-time/#SH1e. On Husserl (Edmund Gustav Albrecht Husserl, 1859–1938) see http://en.wikipedia.org/wiki/Edmund_Husserl.
15 Hermeneutics is (roughly speaking) the art and science of text interpretation. See http://en.wikipedia.org/wiki/Hermeneutics.
16 Francis George Steiner (born 1929) is an influential European-born American literary critic, essayist, philosopher, novelist, translator, and educator. See http://en.wikipedia.org/wiki/George_Steiner. Here I am mainly referring to his influential book on translation, After Babel (1975), for which see http://en.wikipedia.org/wiki/After_Babel.
Fig. 43.3  (a) Mark Tansey’s (born 1949) The Innocent Eye Test, 1981. The cow is looking at Paulus Potter’s (1625–1654) The Young Bull, 1647 (b). The cow remains apparently unaware of the explicit erotic overtones of this work. One concludes that in the bovine universe the painting is just another irrelevant object, despite its life size and lifelike color. (Keep in mind that this figure reproduces a painting, rather than a ‘documentary photograph’!)
(a) Tansey, Mark (b. 1949): The Innocent Eye Test, 1981. New York, Metropolitan Museum of Art. Oil on canvas. 78 x 120 in. (198.1 x 304.8 cm). Gift of Jan Cowles and Charles Cowles, in honor of William S. Lieberman, 1988. © 2015. Image copyright The Metropolitan Museum of Art/Art Resource/Scala, Florence. (b) Potter, Paul (1625–1654): Le jeune taureau. Un berger et son bétail, bélier, agneau, vache et taureau. 1647. The Hague, Mauritshuis. © 2015. White Images/Scala, Florence

apply to art appreciation. First there is the blind trust to find something there, a step into the dark, for better or for worse: to find nothing is experienced as a painful breach of trust. Then there is an act of aggression, as the observer ‘conquers’ the work, followed by incorporation, as the observer makes the work his or her own. Finally, there is retribution, wherein the observer (as indeed with the initial trust) honors the artist’s intentions. The work is re-created in the observer, albeit in novel form, for ‘to understand is to decipher; to see [orig. hear] significance is to translate’. Exact re-creation is impossible; the artist’s meaning is always lost. Each observer sees only him- or herself.
My central interest will be modern western art (which involves the art of western Europe from the late middle ages to the present, the art of the United States since the sixteenth century, etc.), especially painting, sculpture, and architecture. I will also occasionally touch on non-western art and other fields of endeavor such as photography, cinema, fashion, graphic design, and so forth. Of course, the interest is merely in visual organization; I ignore the conceptual, magical, religious, and so forth, connotations, even though these are often the very reason for the existence of the art. I focus on Gestalt properties, that is, on the nature of the organization of the work, to the extent that it may be considered ‘visual’17. Although there are certainly works of art whose organization is almost completely visual, in many cases there exists organization on many simultaneous levels. I start by making some (minimal) distinctions.

The Stratified Structure of Works of Art


I again use the case of pictures as an example. Pictures sometimes carry ideal meanings, not unlike
poems, although this is not necessarily the case. Here I am mainly concerned with an ‘anatomical’
analysis. Pictures may be analyzed as being composed of mutually heterogeneous levels of ‘being’18,
of which I identify four major (from the perspective of visual organization) ones:
Level 1: the smallest relevant constituents. These are the strokes of a drawing, the touches of a
painting, and so forth, as they are visually evident. These are essential infima, the structure of
the paper or canvas often being noticeable, but seen as part of the physical object rather than
the double-intentional picture. If the maker intentionally chooses a physical texture (rough
paper, film grain) such that it becomes part of the work, it is considered an objet trouvé.
Level 2: simple meaningful units. Here one thinks of mutually dependent pairs of strokes, sets of
touches making up an edge, and so forth. ‘Meaningful’ involves a spontaneously felt relation
in immediate awareness. A single stroke may well be a meaningful unit, but sometimes the
simplest units contain many strokes.
Level 3: salient Gestalts. Any number of simple meaningful units may cohere in Gestalts. These do
not necessarily stand for nameable parts. If they do, the naming comes afterwards, as cognition
kicks in. They appear in awareness as significant geometrical configurations, or even volumetric
entities. These Gestalts often fluctuate on prolonged observation, as microgenesis organizes the
presentations. The work may actually prevent microgenesis from ever reaching a ‘fixed point’.
Level 4: represented entities. These are perceived objects, events, states of affairs, and in some
cases plots or stories. The spectrum is huge; this merges into the domain of reflective thought.

17 Classic authors on the topic are Rudolf Arnheim (1904–2007; see http://en.wikipedia.org/wiki/Rudolf_Arnheim) and José Ortega y Gasset (1883–1955; see http://plato.stanford.edu/entries/gasset/).


18 Roman Ingarden’s ontological thoughts are particularly relevant here. See http://plato.stanford.edu/entries/ingarden/.

None of these strata is necessarily present in any given instance, although they may all be simultaneously relevant. The profile of weights that might be placed on the strata is a useful indicator of style. It varies widely, as one notices when comparing works by Mondrian19, Pollock10, Malevich20, Rubens21, and Botticelli22, for instance.
One may associate different aesthetic values, either positive or negative, with the strata. But
what is more important is that the strata are never seen in isolation, except for special cases like
art restoration work—but then the work is not a ‘picture’ in the sense used by me here. Pictures
are organic wholes, implying that the strata are mutually interdependent23. There appears to be a
two-way causal flow24. A superstratum contributes context to objects or processes in a substratum, whereas a substratum contributes substantial qualities to objects of the superstratum. In
this way, paintings may be comparable to polyphonic harmonies. Notice that there is room for
both harmony and disharmony, a crucial point in aesthetic appreciation. Of course, this may be
more easily noticeable in a Rubens painting than in a work by Malevich, simply because of their
very different structural complexities.

Some Illustrative Instances


Ornamental patterns
Perhaps the purest examples of visual art are ornamental patterns25. These range from very simple, like an intentional scar, tattoo, or war paint, to extremely complicated, like the ornamental tessellations of the Alhambra26.
The simplest ornamental patterns are found in all cultures worldwide. They almost invariably include spirals, used in scarification, tattoos, amulets, and ornamentation. In the west they are perhaps best known as the Celtic symbols27 found on many dolmens and grave sites. The Celtic spirals mostly rotate clockwise. One finds both dense (Archimedean) and open (logarithmic) varieties28. They also occur in connected pairs and triples (triskele). In modern western culture one finds these designs in church windows, mosaic floors, emblems, jewelry, and so forth29. Very similar designs

19 Pieter Cornelis ‘Piet’ Mondriaan, after 1906 Mondrian (1872–1944), was a Dutch painter. He was an important contributor to the De Stijl art movement and group. See http://en.wikipedia.org/wiki/Piet_Mondrian.
20 Kazimir Severinovich Malevich (1879–1935) was a Russian painter and art theoretician. He was a pioneer of geometric abstract art and the originator of the avant-garde Suprematist movement. See http://en.wikipedia.org/wiki/Kazimir_Malevich.
21 Sir Peter Paul Rubens (1577–1640) was a Flemish baroque painter, and a proponent of an extravagant baroque style that emphasized movement, color, and sensuality. See http://en.wikipedia.org/wiki/Peter_Paul_Rubens.


22 Alessandro di Mariano di Vanni Filipepi, better known as Sandro Botticelli (1445–1510), was an Italian painter of the early Renaissance. He belonged to the Florentine school under the patronage of Lorenzo de Medici. See http://en.wikipedia.org/wiki/Sandro_Botticelli.
23 Riedl, R. (1978). Order in Living Organisms: a Systems Analysis of Evolution. New York: Wiley.

24 Riedl, R. (1984). Biology of Knowledge: the Evolutionary Basis of Reason. Chichester: John Wiley and Sons.

25 Gombrich, E. H. (1994). The Sense of Order: a Study in the Psychology of Decorative Art (The Wrightsman Lectures, Vol. 9), 2nd edn. London: Phaidon Press.


26 On the Alhambra see http://en.wikipedia.org/wiki/Alhambra.

27 On Celtic ornaments see http://en.wikipedia.org/wiki/Celtic_art.
28 On spiral curves see http://en.wikipedia.org/wiki/Spiral.
29 On the triskele see http://en.wikipedia.org/wiki/Triple_spiral.

occur in facial tattoos of the Maori30, African scarifications31 (Figure 43.4) and jewelry (earrings),
Navaho sand paintings32, Australian aboriginal art33, and Japanese family emblems34.
The spiral has a very simple organization, not much more complicated than a line. However, it manages to cover an arbitrarily large area in a manner that is immediately visually evident. One might say spirals render an area visible. Other ways to render areas are by (usually regular) stippling, or (usually regular) hatching—also common, and visually evident patterns.
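The ‘dense’ versus ‘open’ distinction between the two spiral varieties mentioned above can be made precise. As a sketch in polar coordinates (the constants a and b are free parameters, not taken from the text):

```latex
% Archimedean spiral: successive turns keep a constant spacing,
% which is why it covers an area densely and uniformly.
% Logarithmic spiral: the spacing grows geometrically per turn,
% which yields the 'open', self-similar variety.
\begin{align}
  r(\theta) &= a\,\theta,
    & r(\theta + 2\pi) - r(\theta) &= 2\pi a \quad \text{(constant ring spacing)},\\
  r(\theta) &= a\,e^{b\theta},
    & \frac{r(\theta + 2\pi)}{r(\theta)} &= e^{2\pi b} \quad \text{(spacing grows each turn)}.
\end{align}
```

The first form underlies the dense, area-filling scrolls; the second the open, self-similar ones.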
The double and triple spirals are composite patterns, yet are immediately recognized as unified designs. Unlike the single spiral, they cannot be arbitrarily extended. Thus, they naturally fit within a circular outline. Concentric circles, ornamental knots, mazes, and labyrinths fit into the same overall family of visual organization. They are found as ornamentation on bodies, weapons, pottery, jewelry, floors, and walls. They serve as family emblems, powerful symbols (the swastika of the Third Reich falls in this class), etc.
Another important class of ornamentation that often has strong perceptual organization is that of band patterns. These occur in Europe from the stone age on35, and are found worldwide in virtually all cultures. They naturally occur at the boundaries of disks and as ‘bracelets’ on rotationally symmetric objects like weapons, pots, and sticks. In the simplest cases one finds parallel lines, often zig-zag or wavy. In more complicated cases one finds repeated localized configurations. The repetition is often ‘with variations’, usually regular ones. Most typical are simple alternations, as in the ‘egg and dart’ pattern36 found at the Erechtheion (c. 421 BCE37).
Formally, the organization is defined by the ‘frieze groups’38, which are the classes of infinite discrete symmetry groups for patterns on a strip. There are seven different frieze groups. The groups are built on translations and glide reflections; one may find additional reflections along the translation axis as well as half-turns. These basic organizations are found in ornamental borders of the most diverse origin (e.g., painted on or scratched in pottery, in basketry, in ‘barbed wire’ tattoos, in tile borders), all over the world, in the most diverse cultures. Although the repetition with variation is indeed visually salient, there is little indication that the taxonomy of the frieze groups plays an important role in visual organization39. It is apparently not part of a ‘visual grammar’.
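The seven-fold classification can be made concrete with a small computational sketch (my own illustration, not from the text; the motif, the character-grid representation, and the IUC group names are assumed ingredients). One translational period of each frieze group is obtained by applying the appropriate isometries to an asymmetric motif:

```python
# Sketch: one translational period of each of the seven frieze groups,
# obtained by applying plane isometries to a small asymmetric motif.
# '#' marks ink, '.' marks ground; the motif has no symmetry of its own,
# so every operation is visible in the result.

MOTIF = ["#.",
         "##",
         ".."]

def flip_h(cell):
    """Mirror across a vertical axis (left-right reflection)."""
    return [row[::-1] for row in cell]

def flip_v(cell):
    """Mirror across the horizontal axis (top-bottom reflection)."""
    return cell[::-1]

def rot180(cell):
    """Half-turn."""
    return flip_h(flip_v(cell))

def beside(*cells):
    """Concatenate cells side by side along the strip."""
    return ["".join(rows) for rows in zip(*cells)]

M = MOTIF
FRIEZE = {
    # IUC name: one period (repeat it along the strip for the full pattern)
    "p1":   beside(M),                                   # translations only
    "p11g": beside(M, flip_v(M)),                        # glide reflection
    "p1m1": beside(M, flip_h(M)),                        # vertical mirrors
    "p2":   beside(M, rot180(M)),                        # half-turns
    "p2mg": beside(M, flip_h(M), flip_v(M), rot180(M)),  # mirrors, half-turns, glide
    "p11m": beside(M) + beside(flip_v(M)),               # horizontal mirror (stacked)
    "p2mm": beside(M, flip_h(M)) + flip_v(beside(M, flip_h(M))),  # all of the above
}

for name, period in FRIEZE.items():
    print(name)
    print("\n".join(period))
```

Printing the periods side by side makes the seven ‘border grammars’ directly visible; repeating any period along the strip generates the full band pattern.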

30  On tattooed Maori heads see http://en.wikipedia.org/wiki/Mokomokai.


31 On African scarifications see http://www.ezakwantu.com/Gallery%20Scarification.htm.

32 On sand painting see http://en.wikipedia.org/wiki/Sandpainting.

33 On indigenous Australian art (also known as Australian Aboriginal art) see http://en.wikipedia.org/wiki/Indigenous_Australian_art.
34 On Japanese family emblems see http://en.wikipedia.org/wiki/Mon_%28emblem%29.

35 On the Funnelbeaker culture see http://en.wikipedia.org/wiki/Funnelbeaker_culture.


36 On the egg and dart pattern see http://en.wikipedia.org/wiki/Egg-and-dart.

37 On the Erechtheion see http://en.wikipedia.org/wiki/Erechtheion.


38 The frieze groups are treated in Coxeter, H. S. M. (1969). Introduction to Geometry, pp. 47–49. New York: John Wiley and Sons. See also Jablan, S. V. (1995). Theory of Symmetry and Ornament. Belgrade: Mathematical Institute. (Electronic reprint available as Symmetry and Ornament at http://www.emis.de/monographs/jablan/index.html.)
39 On visual discrimination of the frieze (note 38) and wallpaper (note 40) groups see Landwehr, K. (2011). Visual discrimination of the 17 plane symmetry groups. Symmetry 30(3): 207–219.

Fig. 43.4  Example of traditional African scarification.


in order of appearance: © John Warburton-Lee Photography / Alamy; © Robert Harding Picture Library Ltd /
Alamy; © Joerg Boethling / Alamy

The patterns that are being repeated are necessarily ‘local’. They are often abstract geometrical forms, like circles or crosses, that may also be used for their own sake. Indeed, starburst patterns, circles (concentric or intertwined pairs or triples), and especially crosses, are found in all cultures. Crosses are especially common, even in non-Christian (due to distance in space or time) civilizations. These simple configurations have frequently been given meaningful interpretations (circles and starbursts standing for the sun, crosses for human copulation, etc.), but it would seem that the visual salience preceded such meanings (which indeed can vary). The basic forms are also found in the colorations of animals and plants; think of the ‘eyes’ found on butterfly wings. The ‘releasers’ that evoke standard action patterns in birds and fishes are often based on similar patterns. In more advanced cultures one often encounters stylized images of floral motifs, animals, and humans. However, such stylizations are frequently based upon one of the basic forms, which appears to give them their impact.
It would seem that these forms are indeed part of a ‘visual grammar’. Their common property appears to be simplicity (minimal structural information content) combined with high non-accidentalness (see also van der Helm, this volume, on simplicity).
In two dimensions one obtains the so-called ‘wallpaper patterns’40. Again, their organization can be fully formalized through the symmetry groups in the plane. There are 17 distinct groups, as has been known since 189141. All were already used by the ancient Egyptians! Indeed, these groups have been invented independently by many cultures worldwide. Fabulous examples are found in the tilings of Islamic architecture. The Alhambra is the paradigmatic example (Figure 43.5). I know of no comprehensive accounts of the visual perception of these patterns. It seems unlikely that naive observers would spontaneously differentiate between the various types. As with the frieze groups, there is little indication that the taxonomy of the wallpaper groups plays an important role in visual organization. It is not a part of ‘visual grammar’.
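The same game can be played in two dimensions. As an illustrative sketch (my own construction, not from the text): placing the four quarter-turn rotations of an asymmetric square motif around a tile center produces a unit cell of the wallpaper group p4, one of the seventeen.

```python
# Sketch: a unit cell of wallpaper group p4, one of the 17 plane symmetry
# groups. The four quarter-turn rotations of an asymmetric square motif
# are arranged so that a quarter-turn about the cell center maps the cell
# onto itself; tiling the plane with this cell gives the full pattern.

MOTIF = ["#..",
         "##.",
         "..."]

def rot90(cell):
    """Rotate a character grid 90 degrees clockwise."""
    return ["".join(col) for col in zip(*cell[::-1])]

def beside(a, b):
    """Horizontal concatenation of two grids of equal height."""
    return [ra + rb for ra, rb in zip(a, b)]

def p4_cell(m):
    r1 = rot90(m)
    r2 = rot90(r1)
    r3 = rot90(r2)
    # top-left, top-right, then bottom-left, bottom-right quadrants:
    return beside(m, r1) + beside(r3, r2)

CELL = p4_cell(MOTIF)
print("\n".join(CELL))
```

A motif with internal mirror symmetry would enlarge the group (towards p4m); in this way the seventeen classes differ only in which isometries the unit cell happens to admit.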
A particularly simple manner to induce perceptually salient organization is by bilateral symmetry about a vertical axis42 (see also van der Helm, this volume, on symmetry). This works with virtually any pattern—witness the Rorschach inkblot figures43 (Figure 43.6). Such patterns are localized and are easily fitted into various bilaterally symmetrical regions (coins, round emblems, square tiles, heraldic patterns, vases, etc.). Although heraldic symmetry is often very strict, e.g., spread eagles with two heads, one looking left, one looking right, heraldic trees are often not quite bilaterally symmetric. They don’t need to be, because they ‘simply look it’ anyway (Figure 43.6). With some degree of scrutiny you can make out the difference, but this has no relevance to the Gestalt. ‘Just looking’ reveals a ‘visual symmetry’, even if (strictly speaking) it isn’t there.
Bilateral symmetry about a vertical axis again combines minimization of structural information content (a mere ‘etcetera’ suffices) with remarkable non-accidentalness.
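The Rorschach construction is trivial to mimic: mirroring any random half-pattern about a vertical axis yields an inkblot-like figure. A minimal sketch (the grid size, ink density, and character rendering are arbitrary choices of mine):

```python
import random

def inkblot(height=9, width=14, density=0.45, seed=7):
    """A bilaterally symmetric 'inkblot': fill the left half of a character
    grid at random, then mirror it about the vertical midline."""
    rng = random.Random(seed)
    rows = []
    for _ in range(height):
        half = "".join("#" if rng.random() < density else "."
                       for _ in range(width // 2))
        rows.append(half + half[::-1])  # mirror about the vertical axis
    return rows

blot = inkblot()
print("\n".join(blot))
# Every row is a palindrome, so the figure is exactly mirror-symmetric:
assert all(row == row[::-1] for row in blot)
```

Perturbing a few cells on one side of such a pattern gives the heraldic case described above: no longer strictly symmetric, yet still ‘simply looking it’ at a glance.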

40 On the ‘wallpaper groups’: Pólya, G. (1924). Über die Analogie der Kristallsymmetrie in der Ebene. Z Kristallogr 60: 278–282.
41 Fedorov, E. (1891). Simmetrija na ploskosti [Symmetry in the plane]. Zapiski Imperatorskogo Sant-Petersburgskogo Mineralogicheskogo Obshchestva [Proceedings of the Imperial St. Petersburg Mineralogical Society], 28 (series 2): 245–291 (in Russian).
42 On symmetries see Weyl, H. (1952). Symmetry. Princeton University Press. On the importance of the vertical axis of bilateral symmetry in perception, see Mach, E. (1886). Die Analyse der Empfindungen und das Verhältnis des Physischen zum Psychischen. The text is available at http://www.uni-leipzig.de/~psycho/wundt/opera/mach/empfndng/AlysEmIn.htm.
43 On the Rorschach test see http://en.wikipedia.org/wiki/Rorschach_test.

Fig. 43.5  Example of a sophisticated tiling pattern from the Alhambra. The Alhambra is a treasure trove
of such tessellations of the plane. The reason is, no doubt, that Islam forbids the depiction of reality.
Thus artists either design all kinds of abstractions of Koranic writings or they move towards ornamental
patterns. Of course, mural tile work is perfectly suited for that.
© batarliah/istockphoto.com

Faces (as seen en face) are the most important instances of bilateral symmetry from a (human) biological perspective. Given almost any bilaterally symmetric blob, human observers are likely to ‘see’ a face in it44. This fact (though rarely acknowledged explicitly) is of the utmost importance to the visual arts. Women in particular specialize in optimizing the ideal ‘face’ configuration (see Behrmann et al., this volume). Ideal faces are perfectly bilaterally symmetric of course, whereas no actual face really is. Bilateral symmetry is a visual organization that readily arises in vision, even when the actual patterns are far from ‘ideal’. Apparently it has a marked template character (see also Koenderink, this volume, on Gestalts as ecological templates).
Humbert de Superville45, in his Essai sur les Signes Inconditionnels dans l’Art (Leiden, 1827), lists the most important visual organizations of the generic face. This is perhaps one of the more interesting treatises from the perspective of experimental phenomenology.

Fashion
Human figures are easily the most important objects for a human observer. Virtually all humans are ‘artists’ in that they intentionally shape and decorate their bodies so as to evoke certain
44 On pareidolia (seeing faces anywhere) see http://en.wikipedia.org/wiki/Pareidolia. Spectacular examples are found regularly on the Faces in Places website (http://facesinplaces.blogspot.nl/).


David Pierre Giottino Humbert de Superville (born The Hague, 18 July 1770, died Leiden 9 January 1849).
45

See http://digi.ub.uni-heidelberg.de/diglit/superville1827/0006?sid=dd31a03a096431e9277bcc612775728c.

Fig. 43.6  (a) Card 2 of the Rorschach test. Some popular responses are ‘two humans’, ‘four-legged animal’, ‘animal: dog, elephant, bear’. The website adds: ‘The red details of card II are often seen as blood, and are the most distinctive features. Responses to them can provide indications about how a subject is likely to manage feelings of anger or physical harm. This card can induce a variety of sexual responses’. (b), (c), and (d) Drawings by Alphonse Mucha (1860–1939). Notice the apparent symmetry. This ‘symmetry’ does not survive scrutiny, or even a good look. Yet the symmetry is obvious at first glance! Perhaps unfortunately, we don’t have much of a ‘psychophysics of the cursory glance’ today.
(a) © zmeel/istockphoto.com (b) ‘Awakening of Morning’, 1899. Chicago (IL), The Curt Teich Postcard Archives. © 2015. Photo Curt Teich Postcard Archives/Heritage Images/Scala, Florence (c) Mucha, Alphonse (1860–1939): Irises, 1898. Moscow, Pushkin Museum. © 2015. Photo Fine Art Images/Heritage Images/Scala, Florence (d) Dance (From the series The Arts), 1898. Artist: Mucha, Alfons Marie (1860–1939). © 2015. Photo Fine Art Images/Heritage Images/Scala

Fig. 43.7  Make-up scheme (Yauheniya Piatkevich-Hauss no. 11865306). Such ‘face charts’ (for various complexions) can be found all over the Web. This one is at http://depositphotos.com/11865306/stock-photo-Make-up-scheme.html. Such charts clearly reveal the releaser function of make-up. Niko Tinbergen (the ethologist) made similar schemes for the heads of various birds.
© Solveig/Depositphotos.com

gut-level visual responses in others. Methods may aim at eternity (witness mummified Maori heads), a lifetime (scarification, tattoo, skull deformation), a short period (seasonal fashion), a mere occasion (make-up), or just a fleeting moment (an intentional smile, the slight bending of the arm by Victorian ladies in order to de-emphasize the elbow joint, an articulated finger pattern). Most of these methods immediately address the momentary visual awareness of others. Both faces and bodies yield strong Gestalts. Paintings and sculptures can be seen as carrying on body display ‘by other means’.
Most facial ‘make-up’ is aimed at evoking emotional responses, often of a sexual nature, in others. This generally implies the accentuation of desirable ‘releaser’ patterns46 (Figure 43.7), that is to say, accentuations of the natural countenance. Comparatively rare exceptions include the make-up used by the military to merge visually with the environment (camouflage47) and tribal ‘war paints’ that are supposed to induce fear in opponents, or, perhaps, promote courage, or recklessness, in the wearer. The camouflage techniques reverse the usual make-up techniques by de-emphasizing the eyes and mouth, and even optically defragment the face. The dark eye-stripes48 encountered with many prey animals similarly de-emphasize the eyes, which are otherwise salient indicators of an animal’s presence. Apparently the laws of visual organization rule throughout the animal kingdom (see also Cuthill & Osorio, this volume).
A steady component of female make-up is the accentuation of the eyes, usually by darkening or coloring the eye sockets, evidently with the intention of drawing attention to them. It is known from ancient Egyptian, Greek, and Roman remains. This sometimes includes taking a drug (Atropa belladonna49) in order to dilate the pupils. Another steady component is overall face color (white in the Japanese geisha, brownish in modern western women), hairline (shaving in
46 On releasers see: http://en.wikipedia.org/wiki/Ethology.
47 The art of camouflage was actually developed by a remarkable artist: see http://en.wikipedia.org/wiki/Abbott_Handerson_Thayer.
48 On eye stripes see the entry on the blog of the artist James Gurney, http://gurneyjourney.blogspot.nl/2008/02/eye-stripe.html.
49 On Atropa belladonna see http://en.wikipedia.org/wiki/Atropa_belladonna.
900 Koenderink

the middle ages), hair silhouette (cutting, braiding, binding), and hair color (tinting). Usually the
mouth receives a strong accent (much like the eyes), involving lip color, shape, and size. These
components define the overall first impression. They cause the face to ‘read’ clearly, even at a cur-
sory glance. They also introduce a ‘style’ (e.g., compare the classical geisha, the ancient Egyptian
woman, the modern western young urban professional); thus they intentionally set out to trigger
specific visual organizations. More volatile fashions aim at the shape of the face (false shading to
accentuate bony structure, rouge to raise the cheeks, powder to kill a highlight on the nose, and so
forth). In some cases actual ornamentation may be added. All this is carefully orchestrated so as
to evoke a highly organized perception in immediate visual awareness.
That these facial Gestalts are to a large extent conventional becomes evident by widening the
scope beyond one’s daily social environment. Different cultures often use fully different methods,
and even one’s own culture changes over time, in both the short and the long term. As one compares
painted portraits over the centuries one encounters remarkable uniformity over an era, but great
diversity over longer time spans. In more recent times we have photography and the cinema,
yielding detailed and veridical data. Of course, one has to ‘correct’ for various photographic
techniques here, the camera operators typically adding their own kind of ‘make-up’ in a purely optical
way. With only moderate experience one is able to date a face accurately, hardly being off by a
decade and usually getting it right within a few years. The ‘decade look’50 can be picked up at a
glance, and is mostly a matter of visual organization.
Theatrical make-up uses the same techniques51, but in a highly condensed manner. The face
should ‘read’ in the intended manner even from a great distance, and in all lights. The methods
of stage make-up and glamour make-up differ only quantitatively.
Both aim at creating a strong visual Gestalt of some desired kind, say of age, character, or profession.
What goes for the face ipso facto holds for the body52. A person may control the visual
impression of the body by assuming certain (studied) poses, by moving in particular ways, and by
accentuating or hiding various features by way of appropriately chosen dress. If there is an ample layer of fat,
‘foundation’ (corsetry, bras, etc.) may work wonders ‘behind the scenes’ – optically, that is. These are
deployed so as to influence the immediate visual impression of others.
Again, going through western painting throughout the centuries (not to speak of non-western
cultures!) reveals an amazing variety over time, especially as concerns women. Men appear to
vary predominantly through different conventional clothing, whereas women actually appear to
vary in body shape, as is evident from the rendering of nudes. Yet this is evidently nonsense!
From a biological perspective, it is evident that women have (anatomically and physiologically)
not changed that much during historical time. Going through a selection of paintings forcefully
shows that the body image is a conventional Gestalt. It is of vital importance in society, and it also
pervades the visual arts, both in sculpture and in painting.
One might say (as is the case with the ornaments discussed above) that the body image is
a meme53. It is no different from (and closely related to) ‘fashion’ in clothes. Memes are
comparatively stable ‘mental images’ (or schemes) that are somehow ‘contagious’. They apparently

50 On decade looks see http://www.addictedcosmetics.co.uk/site/images/infotheque/pdf/Make%20up%20Through%20the%20Decades.pdf.
51 On theatrical make-up see http://en.wikipedia.org/wiki/Theatrical_makeup.
52 On the female body in art throughout the ages see Hollander, A. (1980). Seeing Through Clothes. New York: Avon Books.
53 On memes see Blackmore, S. J. (1999). The Meme Machine. Oxford: Oxford University Press.
Perceptual Organization in Visual Art 901

spread from person to person within a time-slice of culture, and soon become traditionalized.
One witnesses changes that are fast compared with the lifetime of an
established meme. Almost by definition, all memes of interest to the present quest are especially
good Gestalts.
Here is a striking example of such a sudden ‘transition’. The female body image throughout
(visually) recorded history is roughly characterized as a vertical column with some conventional
modulation of the silhouette (accentuated belly and short legs in the western middle ages, flat belly,
narrow waist, and wide hips (‘36–24–36’) in modern times) with a structured upper part (breasts,
shoulders, and head). The columnar nature is emphasized in Egyptian, Greek (kore), and Roman
art, to be continued in the western middle ages all the way up to the twentieth century. The long
robe is the dress that most strongly accentuates this by hiding the legs, thus delineating the column rising
from the floor. Trousers came only recently.
In 1961 Marilyn Monroe54 wears jeans55 (and even a bikini56, invented by Louis Réard in 1946)
in The Misfits57. Her penultimate act is an emotional solo performance. She intentionally keeps her
legs together, although she goes through emotional contortions, mainly bending at the hips and
knees. Michelangelo Antonioni’s Blow-Up58 dates from 1966, only 5 years later. One notices that
the photographer’s models are instructed to pose with legs widely apart, poses that are orthogonal
to the classical ideal. Jean Shrimpton59 (‘the shrimp’) and Lesley Lawson60 (‘Twiggy’) set the scene
in the fashion world of that period, and introduced a novel image of the modern female. The
poses became angular, emphasizing knee and elbow joints, which tended to be played down in
the past. Fashion accentuated this through strategically constructed sleeves and stockings,
striving for an androgynous effect. Designers often forced the models to wear caps, causing them
to look like young boys at an awkward age. Remarkably, this changeover occurred in just a few
years. Pre-1960s and post-1960s photographs of women are impossible to confuse. The fashion
(graphic) artists immediately followed suit. Soon modern visual artists did the same.
The particular revolution described above gave rise to major changes in the composition of
fashion photographs. This can be nicely monitored from Antonioni’s Blow-Up photo sessions
mentioned in the last paragraph58. Instead of the composition involving the single figure
(essentially a Greek sculpture), or a small group (say the three Graces61), the composition involves an
arbitrary number of models that repeat (or play upon) each other’s awkward poses. If a single
model is photographed in the angular pose, the pose is usually related to the picture frame, or
suitably arranged props. In this way one obtains again a well-organized perceptual organization,

54 On Marilyn Monroe (born Norma Jeane Mortenson, 1926–1962) see http://en.wikipedia.org/wiki/Marilyn_Monroe.
55 On jeans see http://en.wikipedia.org/wiki/Jeans.
56 The bikini was invented by Louis Réard in 1946 (http://en.wikipedia.org/wiki/Louis_Réard).
57 The Misfits (1961) is a film drama directed by John Huston, starring Clark Gable, Marilyn Monroe, Montgomery Clift, Thelma Ritter, and Eli Wallach.
58 Blow-up stars David Hemmings. There is a special role for the sixties model Veruschka. The plot is after a short story by Julio Cortázar, Las Babas del Diablo (1959).
59 Jean Rosemary Shrimpton (born 1942) is an English model and actress.
60 Lesley Lawson (born Hornby, 1949), widely known by the nickname Twiggy, is an English model, actress, and singer.
61 The three Graces (Charites) became a popular theme in western art. See http://en.wikipedia.org/wiki/Charites.

albeit of a completely different kind from the generic perceptual organizations from before the
transition. This illustrates that strong compositions are possible in any ‘style’. No photographer
could avoid the change, as a study of the work of the well-known fashion photographers reveals
(study Richard Avedon62 as an example).

Sculpture
Sculpture is the art of composition in three dimensions. Here we mainly focus on the clas-
sical bronze, stone, and wood sculptures, although the realm of ‘sculpture’ has been greatly
expanded in recent times. Moreover we concentrate on simple works (busts, figures, putti,
single animals, etc.), and ignore most groups (like Rodin’s Burghers of Calais63), or extended
scenes (like Bernini’s St Theresa64). Some dyadic and even triadic topics are readily regarded as
‘simple’ though—think of ‘the three Graces’62, ‘mother and child’ (e.g., Isis with Horus, Mary
with the Infant Jesus), or ‘woman with male corpse’ (e.g., the Pietà), in one of the conventional
poses.
Sculpture is all about perceptual organization. Although one may display the plaster cast of an
object as a ‘sculpture’ (not uncommon in our era), this is evidently conceptual art, not different
from displaying a urinal. Sculpture proper is ‘architectonic’: it is about the composition of
volumes and surfaces. In 1893 the German sculptor Adolf von Hildebrand65 published a theory that
was ridiculed by some (but acclaimed by others) at the time. He was only interested in ‘naturalistic’ work.
He distinguished sharply between the Daseinsform and the Wirkungsform of volumetric objects.
The Daseinsform is what might be called the physical presence of an object. It enters awareness
through movements of the vantage point (binocular vision, moving around the object, or looking
at the manipulated object). Thus, it is not a thing of immediate visual awareness, but a cognitive
construction on the basis of many successive awarenesses. The Wirkungsform is an artistic con-
struction that works from a single viewpoint, immediately. This involves architectonic thinking
on the part of the artist. The artist has to understand microgenesis. The observer should appreci-
ate the view as ‘natural’, and be able to capture it in immediate visual awareness. As Hildebrand
observes, children’s drawings work immediately. He concludes that the Wirkungsform should
include what makes children’s drawings work. Thus, sculpting is not about copying nature. It is
about affecting human visual awareness. He mentions the ‘Grecian nose’66 as an example (‘ . . . it is
not as if the Greeks had noses like that. . . . ’).
Most western sculpture made before World War I is ‘volumetric’, and can be largely understood
in terms of an overall composition based on a small number of simple (ovoid, cubical, or cylin-
drical) major forms, smoothed together and elaborated by way of surface relief. Here ‘surface’

62 Richard Avedon (1923–2004), born Richard Avonda, was an American fashion and portrait photographer. See http://en.wikipedia.org/wiki/Richard_Avedon.
63 The Burghers of Calais is one of Rodin’s major works. See http://en.wikipedia.org/wiki/The_Burghers_of_Calais.
64 Saint Teresa in Ecstasy is a sculptural group in the Cornaro Chapel, Santa Maria della Vittoria, Rome. It was designed by Gian Lorenzo Bernini. It is a major work of the high Roman baroque.
65 Adolf von Hildebrand was the author of an important book Das Problem der Form (1893). One can find a wealth of information at http://www.adolf-von-hildebrand.de.
66 On the Grecian nose in art see http://www.ehow.co.uk/facts_7568296_greek-nose.html. In her book on cosmetics (Harriet Hubbard Ayer’s Book of Health and Beauty) of 1902 the author describes the Greek nose as ‘perfect’. This seems to have been the general opinion throughout the nineteenth century.

should be understood in a very broad sense. Thus—for visual purposes—a cube can be under-
stood as essentially a sphere (a compact volumetric object with aspect ratios of roughly 1:1:1),
with a superficial ‘dressing’ of corners and edges. The overall composition is due to the mutual
relation of the major forms, and is retained when the sculpture suffers through weathering, and
so forth, as is often seen in old unrestored works. Even the overall configuration usually yields a
strong cylindrical, ovoid, or block-like impression67 (Figure 43.8). Exceptions (e.g., horse rider,
boy with dolphin, etc.) are usually seen as ‘groups’ of pieces that might exist as individuals. The
relations between group members are of a higher order than the relations between the subvolumes
of a single member.
An interesting instance of variations on a single basic shape is the ‘character heads’ made by
the Austrian sculptor Franz Xaver Messerschmidt68 (Figure 43.9). By all counts Messerschmidt
might be described as mentally ill when he produced 64 studies of his own head assuming the
most incredible grimaces. There is no doubt a system in this madness, although we remain in
the dark as to Messerschmidt’s formal design. What is of interest here is that the basic form,
Messerschmidt’s skull, remains constant over the series, whereas the muscular/fatty/skinny clad-
ding varies widely. The set is well documented, and makes a fascinating body of work for the study
of (sculptural) form.
Later developments in mainstream western sculpture involve extreme non-convexities. These
may take the form of holes (see also Bertamini & Casati, this volume) or are due to the bending of
elongated volumes. Such work still retains the overall volumetric character though. Constructivism
changed that by introducing non-volumetric elements like wires, rods, and plates. Such work may
lead to completely different perceptual organizations, in which the overall, mostly empty space,
dominates over volumetric, filled space. If the classical organization is like a rock, the new one is
like a leafless tree in the winter. The introduction of non-rigidly connected parts in arbitrary move-
ments destroyed even this static spatial organization. The perceptual organization may be similar
to that of a flock of birds. The visual organization changes when you walk around a work, very
differently for open and closed sculpture, the reason being that you look through open structures
(Figure 43.10). The Constructivists introduced transparent material for much the same reasons.

Painting
By ‘painting’ I refer to any type of essentially ‘planar’ art, be it drawing, embroidery, map making,
intarsia, sand painting, you name it. I mainly limit the discussion to works of human or slightly
smaller size, confined to some visually obvious ‘frame’. The frame may be implicitly
defined by the size of the paper or explicitly as with an actual frame around a canvas, etc. In most
cases the frame, in whatever form, is an important part of the composition. Paintings as physical
objects are arrangements of colors on a planar surface of limited extent. Paintings as artworks may
or may not succeed in evoking varieties of visual awareness in observers that suit the intention of
the artist. Success or failure depends upon the distribution of colors, at least for observers in
the artist’s intended target group. Thus ‘composition’ is everything69.
Of course, the range of possible visual awarenesses that the artist might want to evoke is virtu-
ally unlimited. To complicate matters, artists often had, and have, secret agendas. Apart from the

67 An introduction to sculpture is http://en.wikipedia.org/wiki/Sculpture.
68 On Franz Xaver Messerschmidt (1736–1783) see http://en.wikipedia.org/wiki/Franz_Xaver_Messerschmidt.
69 On composition in the visual arts see http://en.wikipedia.org/wiki/Composition_(visual_arts).

Fig. 43.8  The Egyptian piece in (a) is almost a cubical chunk of stone (a man called Ay, Second
Prophet of Amun and High Priest of the goddess Mut at Thebes; limestone, XVIII Dynasty, 1336–
1327 BCE, Brooklyn Museum, New York). (b) Peplos Kore from Paros (c. 530 BCE, Acropolis Museum,
Athens). (c) The Venus de Milo, Greek Hellenistic, c. 100 BCE, Louvre, Paris. Notice that so-called
‘abstraction’ comes first and so-called ‘naturalism’ only in later stages. This is entirely typical. Art
does not arise from a need for mimesis; it derives from an urge to create something that should
hold itself against nature. Naturalism only becomes possible when the artist has ‘conquered
nature’.
(a) Block Statue of Ay, ca. 1336–1327 B.C.E. Limestone, 18 9/16 x 10 x 12 1/4in. (47.1 x 25.4 x 31.1cm). Brooklyn
Museum, Charles Edwin Wilbour Fund, 66.174.1. Creative Commons-BY Accession Number: 66.174.1 (b) Peplos
Kore, c. 530 b.C., from Athens. Athens, Acropolis Museum. Marble. h 4 ft. (m 1.21).- © 2015. Marie Mauzy/Scala,
Florence (c) Greek civilization, 2nd century b.C. Statue of Aphrodite known as Venus of Milos, circa 100 b.C. From
the Island of Milos, Cyclades, Greece. Paris, Louvre. Marble, height 202 cm.© 2015. DeAgostini Picture Library/Scala,
Florence

Fig. 43.9  Three ‘character heads’ by Franz Xaver Messerschmidt (1736–1783). At one point in his
career Messerschmidt became mentally ill, and started on a project of 64 representations of his own
head in various states of grimace. The set (most have been kept) is worth close study because these
(mutually very different) shapes are all based on a single template, namely the sculptor’s own skull.
(a) Messerschmidt, Franz Xaver (1736–1783): The Yawner, after 1770. Budapest, Museum of Fine Arts Budapest
(Szepmueveszeti Muzeum). Photo: Jozsa Denes © 2015. The Museum of Fine Arts Budapest/Scala, Florence.
(b) Messerschmidt, Franz Xaver (1736–1783): A Hypocrite and Slanderer, Bust, Austrian, Made in: Austria, ca.
1770–1783. New York, Metropolitan Museum of Art. © 2015. Image copyright The Metropolitan Museum of
Art/Art Resource/Scala, Florence. (c) Messerschmidt, Franz Xaver (1736-1783): A Hypocrite and Slanderer, Bust,
Austrian, Made in: Austria, ca. 1770–1783. New York, Metropolitan Museum of Art. © 2015. Image copyright
The Metropolitan Museum of Art/Art Resource/Scala, Florence

Fig. 43.10  Naum Gabo (1890–1977) Constructed Head No. 2 (1916, original lost). The Gabo is
constructed from planar sheets. Compare the Egyptian piece in figure 43.8 (a), which is compact,
like a pebble.
Artist: Gabo, Naum Caption: Head No. 2 ,1916, enlarged version 1964 Classification: sculpture Medium: Steel
Dimensions: object: 1753 x 1340 x 1226 mm © Tate, London 2015. The Work of Naum Gabo © Nina & Graham Williams

urge to evoke visual awareness in their intended audience, they often have pedagogic or idealistic
objectives (this includes propaganda and advertisement). Here we only consider visual aware-
ness proper. The best illustrators and propagandists are invariably good artists. They have to be,
otherwise their ‘messages’ would not be driven home. For all we care, ‘pure art’ is a nonentity.
I simply concentrate on the perceptual organization, and ignore the ‘message’. This may be hard if
the cognitive message is very loud. A thoroughly detached attitude is of the foremost importance.
Experimental phenomenology should proceed in the same way as a physician performing an
autopsy. In studying visual awareness one should be ‘all eye’.
The first impact upon the eye is the composition. The composition is often not noticed by the
observer in a conscious fashion, but it is always an important part of the artist’s trade. The
composition is why certain images are remembered forever and others are forgotten after no more
than a glance.
An example of a memorable image is the photograph taken by Joe Rosenthal on 23 February 1945
on Iwo Jima, generally known as Raising the Flag on Iwo Jima70 (Figure 43.11). It depicts five marines

70 On the battle of Iwo Jima see http://en.wikipedia.org/wiki/Battle_of_Iwo_Jima.



Fig. 43.11  (a) Original photograph of the raising of the flag at Iwo Jima. (b) the first stamp.
(c) a recent parody.
(a) © MPVHistory / Alamy. (b) © Zoonar GmbH / Alamy.

and a US Navy corpsman raising the US flag atop Mount Suribachi. Three of the five did not survive
the battle. The photograph won a Pulitzer Prize in the same year, and in 1954 it was used as the
theme of the Marine Corps War Memorial (by Felix de Weldon) at Arlington National Cemetery. By
public demand it was printed on a postage stamp 5 months after the event, selling over 137 million
copies (the biggest-selling stamp issued by the US Post Office). The photograph has been re-enacted,
published, painted, sculpted, cartooned, tattooed, etc., countless times. It is a true public image.
Another example is the painting American Gothic71 (Figure 43.12) by Grant Wood (1930). Whereas
initially the painting raised huge controversy, it soon became a public image. There exist numerous
copies (including sculptures), and countless parodies. A postage stamp was issued in 1998.
Why do these images command such public interest, even among people with scant interest in the
arts, and even many years after their first publication? It is not just their conceptual meaning, although
that evidently plays a role too. It is their immediate visual impact, as shown by the many parodies,
many of which are just visual puns only roughly reflecting the gist of the image. Apparently these images ‘have
something’ that other pictures lack. The ‘something’ evidently has to do with the perceptual organization
evoked by them. The images have a Gestalt quality that easily survives reduction to postage stamp size.
The first visual impression is largely based upon the overall ‘gist’72. This gist is retained even
in a thumbnail reduction to a dozen by a dozen pixels. Art directors73 who have to select pic-
tures for magazines often look at reduced images (by printing proof sheets, using a reducing
glass, and so on). It is generally agreed that if an image doesn’t survive such minified viewing
it will certainly fail to have ‘impact’, even when printed large at high resolution in some glossy
magazine. Of course, in cases of iconic images, images for use in signs, etc., the gist may be all
there is (Figure 43.13).
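The art director’s reduced-image test is easy to imitate digitally. A minimal sketch in Python (standard library only; the toy image and the block-averaging scheme are illustrative assumptions, not a description of any actual proofing tool):

```python
def thumbnail(pixels, block):
    """Block-average a 2-D grayscale image (list of rows) by an integer factor."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(0, h, block):
        row = []
        for c in range(0, w, block):
            # Mean of one block x block tile becomes a single 'thumbnail' pixel.
            vals = [pixels[r + dr][c + dc]
                    for dr in range(block) for dc in range(block)]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out

# A toy 4x4 image reduced to 2x2: the coarse light/dark layout survives.
img = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [255, 255, 0,   0  ],
    [255, 255, 0,   0  ],
]
print(thumbnail(img, 2))   # [[0, 255], [255, 0]]
```

Block averaging is the crudest possible minification; the point of the paragraph above is precisely that if the coarse light/dark layout (the gist) does not survive such a reduction, printing the image large and sharp will not give it impact either.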
Artists use various kinds of preliminary depictions74. The croquis is a gestural drawing of the live
model. It is done fast, and captures the essentials. The croquis (usually a number of croquis) are
used by the artist to design the final composition. The croquis is sought by the connoisseur because
of its sprezzatura75. The esquisse75 is a first sketch. The esquisse is intended to be used by the artist,
and is sought by the connoisseur because it allows a rare insight in the artist’s mind set. The esquisse
is often a stronger statement than the finished work. Several (or many) may be made, in order to
explore the range of possibilities of a project. The croquis and esquisse are usually small in size. The
ébauche74 is the underpainting for a painting; it is not intended to be seen, or used as such, since
its fate is to be overpainted. It is the size of the final painting. Since it is painted in a much broader
style, the ébauche may well be more indicative of the artist’s intentions than the final work.
Famously, the Impressionists were accused of passing off their ébauches as final paintings.
Thus, the exploration of the gist is usually an important part of the evolution of a work. All these
exploratory or summary statements are of considerable interest to the study of visual organiza-
tion as it applies to the visual arts. In many cases they may be of more immediate interest than

71 On Grant Wood’s American Gothic see http://en.wikipedia.org/wiki/American_Gothic.
72 On gist see Aude Oliva’s chapter ‘Gist of a scene’ at http://cvcl.mit.edu/papers/oliva04.pdf.
73 On art directors see http://en.wikipedia.org/wiki/Art_director.
74 On croquis see http://en.wikipedia.org/wiki/Croquis, on esquisse http://fr.wikipedia.org/wiki/Esquisse, and on ébauche http://en.wikipedia.org/wiki/Ébauche.
75 The term sprezzatura derives from Baldassare Castiglione’s The Book of the Courtier (1528); it is ‘. . . a certain nonchalance, so as to conceal all art and make whatever one does or says appear to be without effort and almost without any thought about it . . . ’. The book is available at http://archive.org/details/bookofcourtier00castuoft.

Fig. 43.12  (a) Grant Wood’s (1891–1942) American Gothic (1930, Art Institute of Chicago). (b) a
Department of Agriculture Food Bank Debit Card. (c) one of the many parodies [the Web message
said: ‘Paris Hilton, left, and Nicole Richie pose with Tinkerbelle in this undated publicity photo. The
friends star in Fox’s new reality series “The Simple Life”, in which Hilton and Richie try to survive on
a camp.’] Notice how such parodies can (pictorially) be far off (e.g., the left figure is higher than the
right one, both figures are female, much younger, the clothes are very different, also in color, the
background is fully different, and so forth), yet are immediately recognized for what they are. There
seems to be no explicit ‘reasoning’ involved. Apparently the ‘gist’ is very generic in such cases.
(a) Wood, Grant (1892–1942): American Gothic (American Gothic), 1930. Chicago (IL), Art Institute of Chicago.
oil on panel, 78 x 65 cm © 2015. DeAgostini Picture Library/Scala, Florence (b) © GarRobMil (c) © REX/Snap Stills

Fig. 43.13  Isotypes (International System of Typographic Picture Education) were promoted by Otto
Neurath (1882–1945), an Austrian philosopher and member of the Wiener Kreis, in about 1935.
They were designed by an artist, Gerd Arntz (German-Dutch, 1900–1988; see http://www.gerdarntz.
org/). Such pictograms are still widely used all over the world. Most can be ‘read’ at a glance,
without any prior instruction.
© DACS 2015.

the study of completed works. It is hard to say to what extent the artistic development of a work
parallels microgenesis of visual perception76—cases where it apparently does and cases where it
clearly does not are not hard to find.
The impact of an image starts with the gist, but most images, except perhaps gestural sketches,
esquisses made in preparation for final works, and so forth, have relevant structures at other scales
that will be revealed under continued observation. Even comparatively simple paintings usually
require a ‘good glance’ involving a dozen fixations in order to obtain a preliminary impression.
This is not yet full scrutiny, but it certainly moves part of the way to visual cognition. Many of the
parts will still be in mere visual awareness though. Their impact on the whole is pre-cognitive and
depends upon Gestalt factors rather than cognitive factors. Most images one sees have many lay-
ers of scale, and even after scrutiny there is usually quite a bit of ‘mystery’ left; there are structural

76 On microgenesis see Brown, J. W. (1999). Microgenesis and Buddhism: the Concept of Momentariness. Philosophy East and West 49(3): 261–277.



elements that remain on the pre-cognitive level although one is well aware of them. An
understanding of this spectrum, which ranges from pure awareness, through cognitive stages, to
pure reflective thought, is largely lacking.
A fact that is often forgotten, or certainly highly underestimated, is that virtually all images are
instances from an extremely huge number of possibilities. Consider a low-quality image from the
internet: it is likely to have a file size of 4 kb, implying that it is one of a set of 8^4000, a huge
number. The image is a member of a set of more than 2 × 10^3612 possible images. No one has a
feel for numbers like that. You have at most 10^5 hairs on your head. The number of particles in
the universe is estimated at 10^80, again, much smaller. Remember that is for just a low-quality image! Thus, the
number of possible images is for all practical purposes infinite. Of course, most of these images
‘look like nothing’, that is to say they look like ‘noise patterns’, which all look the same. The ones
that ‘look like something’ are only a tiny fraction, though still an essentially infinite set. There is
no way one could ever see them all.
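The magnitudes in this counting argument (8^4000 possible 4 kb files, a bit more than 2 × 10^3612, versus roughly 10^5 hairs and 10^80 particles) are easy to check mechanically. A minimal sketch in Python, which has exact arbitrary-precision integers:

```python
# Number of distinct 4 kb files, per the counting argument above.
n_images = 8 ** 4000

# Inspect the exact integer: how many decimal digits, and the leading digit.
digits = len(str(n_images))
leading = str(n_images)[0]

print(digits)    # 3613, so n_images is a bit more than 2 x 10^3612
print(leading)   # '2'

# The comparison scales from the text are vanishingly small next to it.
print(n_images > 10 ** 5)    # True (hairs on a head)
print(n_images > 10 ** 80)   # True (particles in the universe)
```

That the number has 3613 digits with leading digit 2 is exactly the statement that 8^4000 lies just above 2 × 10^3612.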
The ‘space of images’ as explored here is merely the space of physical images, or as Maurice
Denis put it ‘essentially a flat surface covered with colors assembled in a certain order’. What is
really of interest in the present investigation is, of course, the space of visual presentations of a
human observer. This is much more difficult to describe: it is a virtual space. The discussion
that follows focuses on this visual space, although I will use the space
of physical images to indicate rough ballpark estimates.
One can identify the style of a painting at a glance and immediately identify an artist from a
work one has never seen before; a ‘fake van Gogh’ can be spotted at first sight, and so forth. It is a
priori likely that the set of images that are striking at first sight is also huge, but no doubt one will
not have encountered more than a vanishingly small fraction yet, no matter what one’s age. There
is still ample room for further development in the arts, so to speak. Perhaps the amazing thing
is that ‘visual organization’ works as well as it apparently does. However, it seems quite possible,
perhaps even likely, that the ability of human observers to deal with images extends to only a
small, singular subset.
From the perspective of experimental phenomenology, it is evidently of interest to attempt
to attain an overview of the boundaries of human visual microgenesis. This is far more diffi-
cult a problem than might be expected. Throughout the history of western art there have been
‘paradigm shifts’, not only of a mild character (a style change) but also of a cataclysmic nature.
Although hardly imaginable now, the paintings of the early Impressionists were considered
dangerous enough that pregnant women were kept away from the Salon des Refusés for fear of
miscarriages77. The Cubist movement, and the work of ‘Jack the Dripper’78, perhaps fall into a similar
category. Such occasions can be seen as the conquest of a novel area, previously terra incognita,
of the space of images. In the case of the globe one at least had a notion that there was a ‘white
area’ somewhere; it could be marked hic sunt dracones79. This is not really possible with the space
of images. The new area discovered by Jackson Pollock must have evoked something more like
the fear of early sailors that they would fall off the edge of the (supposedly flat) earth.
Many of these cataclysmic changes had to do with attacks on our trust in the structure of the
generic terrestrial environment. This involves the ground plane, the existence of mutually disjunct

77 On the Salon des Refusés see http://en.wikipedia.org/wiki/Salon_des_Refusés.
78 Jack the Dripper (Paul Jackson Pollock, 1912–1956, known as Jackson Pollock) was an influential American painter and a major figure in the abstract expressionist movement.
79 On hic sunt dracones (‘here be dragons’) see http://en.wikipedia.org/wiki/Here_be_dragons.


Fig. 43.14  (a) Ingres (1780–1867) La Source (begun 1820, completed 1856, Musée d’Orsay, Paris).
(b) Pollock (1912–1956) Echo No. 25 (1951, Pollock-Krasner Foundation/Artists Rights Society (ARS),
New York). Compare the spatial structure. The figure in the Ingres is a solid form that stands in front
of a background, there is space behind the body. In the Pollock there is only a faint, fleeting, and
changing impression of objects and environment. The pictorial surface dominates over any classical
‘pictorial space’.
(a) Ingres, Jean Auguste Dominique (1780–1867): La source. Paris, Musée d’Orsay. Peinture. © 2015. White
Images/Scala, Florence. (b) Pollock, Jackson (1912–1956): Echo (Number 25, 1951). New York, Museum
of Modern Art (MoMA). Enamel on unprimed canvas, 7′ 7⅞″ × 7′ 2″ (233.4 x 218.4 cm). Acquired through the
Lillie P. Bliss Bequest and the Mr. and Mrs. David Rockefeller Fund. 241.1969 © 2015. The Museum of Modern
Art, New York/Scala, Florence

solid bodies, optical properties like the opaqueness and diffuse scattering of material surfaces, and
so forth. Impressionism80 destroyed part of that by dissolving the picture of the environment into
a chromatic, misty space. Cubism81 merged solid bodies with the background, and began their
fragmentation. Pollock completely sacrificed solid bodies (Figure 43.14). The observer finally lost
the ground under his or her feet. Meanwhile, movements like Surrealism and Dadaism attacked
from the other side, so to speak, and destroyed the relationships an observer silently expects to
find in the generic terrestrial scene82.
An analysis in terms of experimental phenomenology suggests a first rough inventory of the
part of the space of images that might be open to the human visual observer. One criterion is

80 On impressionism see http://en.wikipedia.org/wiki/Impressionism.
81 On cubism see http://en.wikipedia.org/wiki/Cubism.
82 On generic terrestrial scenes see Clark, K. (1949) Landscape into Art (available for download at http://archive.org/details/landscapeintoart000630mbp).
Perceptual Organization in Visual Art 913

whether microgenesis arrives at some fixed point after prolonged looking. Such fixed points
appear to occur in one of the following three cases:
a more or less uniform image;
a highly structured image that is statistically uniform even in its small parts;
a ‘classical’ scene.
In the first case one sees nothing remarkable, whereas it is evident that this will never
change, for want of structure. The blue sky is an instance, so are many modern minimalist
paintings83. In the second case microgenesis ‘gives up’ in the face of complexity. The image is
summarized as ‘texture’. The film grain in the sky of a 1950s monochrome photograph is an
example84. One doesn’t even try to ‘see anything’ in such a sky, although the texture is noted.
The third case is that of the nineteenth-century still life, landscape, or genre painting. One
simply sees what is there, and that is it. The proviso here is that images are rarely exhausted
at one ontic level. The genre scene may well offer interesting ‘mystery’ in the background, in
the rendering of structure and so forth. After all, no painter is going to paint all the individual
leaves of grass, yet the image of a meadow can hardly be painted a uniform (dead) green.
These three categories serve for a first parceling of the space of images, a bit like the distinc-
tion between the oceans and continents of the globe. Of course, the boundaries cannot be sharp.
Given any image, it is always possible to construct a huge number of images that are essentially
look-alikes. Thus, an image is not like a point, but like an open environment85 in image space. Such
open environments will be different for a glance, a good look, or under scrutiny. Under a glance
the environment of look-alikes may well have a complicated structure, since the observer is likely
to ‘miss’ parts that would be easily ‘got’ at another glance.
Perhaps more interesting are the images for which microgenesis fails to immediately arrive at a
(single) fixed point. One may distinguish (at least)
spontaneous jumps from one fixed point to another;
spontaneous fluctuations between a limited number of fixed points;
endless, chaotic fluctuations of visual presentation.
In the first case the observer notices that visual awareness suddenly changes, whereas it is hard
to regain the previous presentation. An example is the well known ‘Dalmatian dog’ picture86. At
first blush it looks like a pattern of blotches. Once you’ve seen the dog, it will stubbornly stay. In
the second case the presentations jump back and forth between a number of fairly obvious pres-
entations. A well-known case is Jastrow’s duck-rabbit:87 you never see anything like a ‘duck-rabbit’,
but either a duck or a rabbit. Moreover, these presentations spontaneously flip. The third case
is perhaps the most interesting, both from an artistic and a scientific perspective. It is the case
famously described by Leonardo da Vinci, in which the observer never ceases to ‘hallucinate’ in the

83 On ‘minimal art’ see http://en.wikipedia.org/wiki/Minimalism#Minimal_art.2C_minimalism_in_visual_art.
84 On film grain see http://en.wikipedia.org/wiki/Film_grain and http://grubbasoftware.com/filmlibrary_trixpan.html. Famous for its artistic use of film grain was the German Twen magazine (1951–1971): http://de.wikipedia.org/wiki/Twen_(Zeitschrift).
85 On open environments see http://en.wikipedia.org/wiki/Neighbourhood_(mathematics).
86 The Dalmatian dog picture can be seen at http://psylux.psych.tu-dresden.de/i1/kaw/diverses%20Material/www.illusionworks.com/html/camouflage.html.
87 Jastrow’s duck-rabbit can be seen at http://en.wikipedia.org/wiki/File:Duck-Rabbit_illusion.jpg.
Fig. 43.15  (a) Rapid East by Suzanne Unrein, Courtesy of the artist (b) Robert Pepperell, Succulus
(2005) Oil on panel, 123 x 123 cm. Notice how Unrein paints in a ‘post-neo-baroque’-style. She
writes: ‘I started with Rubens, Correggio and Raphael, then branched out to less likely combinations
of Poussin and Bougereau. Now it’s the animaliers of the 17th & 18th centuries, the boar hunts
and dogfights. By combining the hounds from these genres with the figures from more epic scenes
the dogs become a dysfunctional Greek chorus further confusing the summarizing of a scene.
I am less interested in the narrative than the elements and forms that inspire the abstraction, and
movement, with a larger range of color combinations. By combining figures from a variety of artists
in a range of eras, I want to transport them from their original meaning into the contemporary
presence of an image88. The first to attempt an analysis in the style of experimental phenomenology
on the topic was John Ruskin89. The effect was used in western art mainly in informal draw-
ings, or the background of ‘official’ paintings, until the surrealists claimed it as one of their main
devices. Leonardo writes:
look at walls splashed with a number of stains or stones of various mixed colors. If you have to invent
some scene, you can see their resemblances (similitudine) to a number of landscapes, adorned in vari-
ous ways with mountains, rivers, rocks, trees, plains, wide valleys and hills. Moreover, you can see vari-
ous battles, the rapid actions of figures, strange expressions on faces, costumes, and an infinite number
of things, which you can reduce to good, integrated form. This happens thus on walls and varicolored
stones, as in the sound of bells, in whose pealing you can find every word and name you can imagine.

Of course, the same thing happens when you look at (or into) a painting. John Ruskin is special
because he saw that one doesn’t need any ancient stained wall. Every vision suffices if you only
tune into the presence of ‘mystery’ in everything. Nothing is absolutely clear. You cannot count
the grains of sand beneath your feet, nor the leaves on the tree before you. What the painter paints
is not the leaves, but a leafy, ‘mysterious’ texture90. Therein lies the art.
There is a huge realm of the visual arts that exploits the pleasure experienced by observers due
to Ruskin’s mystery. It has merely come bluntly to the surface in modern times. Like all pictorial
structure, mystery occurs at all ontic levels. Much of surrealism occurred at the level of the repre-
sented entities. This is the level where René Magritte91 worked. In a sense, it is the least ‘visual’ of
these manifestations. The level of the ‘leafy texture’ is the level of the smallest relevant constitu-
ents. It is purely visual, and interesting, although only mildly so. It is to be expected in virtually
any serious painting (Magritte intentionally tried to avoid it). The most interesting levels from a
conceptual point of view are the levels of the simple meaningful units and the salient Gestalts. Some
of the more interesting work of Salvador Dali92 plays on the latter level, but the former is perhaps
the more interesting from the viewpoint of experimental phenomenology. Artists who address

88 Leonardo’s observations on what one might see in an old wall can be found at http://www.mirabilissimeinvenzioni.com/ing_treatiseonpainting_ing.html.
89 John Ruskin’s mystery is discussed in his Elements of Drawing, which can be downloaded from http://www.gutenberg.org/files/30325/30325-h/30325-h.html.
90 On background texture (leafiness) see http://www.artsconnected.org/toolkit/encyc_texturetypes.html. Good descriptions can be found in John Ruskin’s Modern Painters, an electronic version of which is available at http://www.lancs.ac.uk/fass/ruskin/empi/index.htm.
91 René François Ghislain Magritte (1898–1967) was a Belgian surrealist artist. See http://en.wikipedia.org/wiki/René_Magritte.
92 Salvador Domingo Felipe Jacinto Dalí i Domènech, 1st Marqués de Dalí de Pubol (1904–1989), known as Salvador Dalí, was a major surrealist artist. See http://en.wikipedia.org/wiki/Salvador_Dal%C3%AD.

Fig. 43.15 (continued) domain and the challenge of newer interpretations’. Pepperell’s painting is ambiguous
on purpose; he writes: ‘. . . paintings and drawings are the result of intensive experimentation in materials
and methods designed to evoke a very specific, though elusive, state of mind. The works induce
a disrupted perceptual condition in which what we see cannot be matched with what we know.
Instead of a recognizable depiction the viewer is presented with—what the art historian Dario
Gamboni has called—a ‘potential image’, that is, a complex multiplicity of possible images, none of
which ever finally resolves’.
this level (for instance, Robert Pepperell93, or Suzanne Unrein94) play on the sentiments described
by Leonardo (Figure 43.15).

Conclusion
The topic is virtually boundless. I have only touched on a few conceptually interesting issues
here, fully ignoring extensive fields of endeavor like architecture, photography, cinema, or mime.
Moreover, I did not touch on the tangencies with music, poetry, and so forth. Each subtopic could
easily be extended into a book, or a lifetime of research.
My main objective in this chapter has been to offer some general background for thought, and
to indicate potentially profitable openings for future research in the experimental phenomenol-
ogy of the visual arts.

93 Robert Pepperell (born 1963) is an artist and professor of fine art at the Cardiff School of Art and Design. His website is http://www.robertpepperell.com.
94 Suzanne Unrein is a Californian artist. Her website is http://www.suzanneunrein.com.
Section 10

Theoretical approaches
Chapter 44

Hierarchical organization by
and-or tree
Jungseock Joo, Shuo Wang, and Song-Chun Zhu

Introduction
A natural scene is composed of many components; see the example beach scene in
Figure 44.2. When we look at this image, our visual system performs a series of tasks in order to
understand the whole scene: it decomposes the whole scene into parts, groups
them to form larger and larger parts, and organizes the discovered parts in a certain way. It has been a
fundamental problem in computer vision to mimic these procedures with machine vision systems.
However, this is a very challenging task due to the huge complexity arising from the enormous
number of distinct scene configurations, which are composed of a variety of objects and regions
of varying shapes in different layouts.
In this chapter we will introduce a general model for scene or object categories that can repre-
sent varying configurations effectively. The desired properties of such models can be summarized
as follows:
1 It should incorporate generic grouping rules among image primitives for low- to middle-level
interpretation (i.e. Gestalt laws) as well as category-specific production rules of parts at high
level (i.e. image grammar).
2 Compositionality is required, as it ensures that the model can be expressive enough to deal with
hugely varying configurations of many components using a relatively small dictionary.
3 The structural representation should be flexible so that it can adaptively capture the unique
configuration of each instance at multiple scales, as opposed to fixed representations.
4 Finally, the learned models should be unambiguous and allow only one interpretation for each
instance of a given scene or an object.
In order to fulfil such requirements, the proposed model will be a hierarchical compositional
model based on the tiling method. Tiling, as shown in Figure 44.1, can be seen as a process
of composing complex shapes by assembling smaller and simpler parts. Figure 44.1(a) shows a
tiling puzzle, an ancient Chinese invention called ‘Tangram’. While it is composed of a small set
of very simple pieces, one can compose an enormous variety of complex shapes by
assembling them. The same intuition can also be found in real-world examples such as tessellated
street pavement and ceramic tile flooring. In such cases, one can observe complex high-order
patterns emerging from one or a few types of tiles according to specific configurations, namely,
organizations of tiles.
Inspired by these examples, each individual component of a scene or object will be treated as a tile
in the proposed model, whose visual dictionary will be a collection of all observable tiles. Each tile
is treated as a template that explains a specific part of the image. Then, the task of understanding
(Fig. 44.1 panel labels: (a) Tangram, with example shapes fish, swan, and house; (b) Tiling.)


Fig. 44.1  (a) The ‘Tangram’, the ancient Chinese puzzle, which consists of seven pieces, and a few
examples of completed shapes formed by the pieces. One can compose an enormous number
of different shapes by assembling the same set of pieces. (b) Various types of tilings, also called
tessellations, in the real world. Although the building blocks are simple and may even be identical,
high-order patterns can still emerge from specific configurations, namely, organizations.

(Fig. 44.2 panel labels — top row: street scene, composition by scene tiles, hierarchical organization;
bottom row: human upper-body, composition by body parts, hierarchical organization.)

Fig. 44.2  A natural scene (top) as well as an object (bottom) contains a number of components and
their subcomponents. We can completely understand the image by decomposing the whole into its
parts and organizing them.

the whole scene will simply become tiling, which is identifying proper tiles and assembling them.
Given the nature of tiling, in this chapter we consider the assembly of tiles in 2D space, in contrast
to another class of models that cope with the 3D arrangement of parts or primitives.
Our framework, which utilizes image parts (tiles) and their relations, is closely related
to a series of theories of part-based object recognition in human vision, for example
‘Recognition-by-Components’ (Biederman 1987). According to these models, humans per-
ceive given scenes as ‘structural descriptions’ built from a limited set of known components in
memory, while huge flexibility is achieved through combinations of the components. On the
other hand, another class of theories, ‘image-based’ models (Edelman and Bülthoff 1992; Tarr
and Bülthoff 1998), suggest that our brains store many viewpoint-specific images of the same
object. By analogy, our model also incorporates multiple templates, each of which explains an
aspect specific to viewpoint or appearance type. Such treatment allows us to deal with complex
and non-rigid parts of real-world objects such as humans. In contrast to image-based models, we
define the set of templates at the part level (rather than at the entire image level) and parse the
image into the parts with selected templates where the relations among the parts are also captured
by the model structure. Therefore, our proposed model can be seen as a combined approach that
can benefit from both classes of models.

Background Review
In this section, a group of related lines of research on perceptual organization will be briefly reviewed.
In particular, we will consider two different dimensions: (1) whether their grouping rules and
parts are generic or category-specific (see ‘Grouping rules: generic vs category-specific’), and
(2) whether their representations are built on a flat layer or in a hierarchy (see ‘Organization: flat
vs. hierarchical’).

Grouping Rules: Generic vs Category-Specific


At low level, an image can be seen as a collection of simple image features or primitives such as
line segments, junctions, and so on. At this level of abstraction, the relationships among primitives
are disregarded. It is the role of perceptual organization to exploit such relationships and detect the
groupings of elementary primitives. Gestalt laws such as proximity, continuity, etc. explain certain
patterns of the grouping capabilities of humans, which lead to an advanced interpretation enriched by
geometric context among primitives as a middle-level representation.
These grouping rules and simple primitives are generic and commonly observed in all types
of objects and scene categories. The generic grouping rules of image structures have been studied
in many works in the literature, including Lowe’s early work (1985). Lowe viewed the goal of
perceptual organization as finding image relations that arise from actual structure in the scene.
He measured this quantity for each grouping rule, such as collinearity and parallelism. Mohan
and Nevatia (1992) also exploited such grouping rules to detect geometrically related edges for
scene segmentation. These generic grouping rules often form simple and common groupings
of primitives, such as L-junction. More recently, Wu and Zhu (2007) defined a set of common
‘graphlets’ (simple primitives and junctions) as basic building blocks, and parsed the whole scene
from detected graphlets in a bottom-up manner.
Besides generic parts, any object or scene class also has its own unique parts as well as distinct
grouping rules, which can be seen as category-specific information. Thus, it is difficult to under-
stand the entire pattern of an image solely by generic rules. Such unique parts, which may be formed
from generic parts, can have complex structures (compared to simple primitives) and be shared by
objects within one or a few classes. Therefore, learning and representing them cannot be achieved
in the same way as generic parts and grouping rules. Saund (1992) was among
the first to go beyond generic Gestalt laws. He pointed out that domain-specific knowledge plays
an important role in shape representation and one might lose this important information when
relying on a fixed set of generic shape primitives alone.
More generally, the goal of many high-level vision tasks is to learn category-specific dictionar-
ies of parts and their configurations. These dictionaries tend to contain more complex elements
than common primitives so that they can reflect distinct properties of each category of object or
scene. The corresponding configurations can also capture unique structure or relations of parts.
For example, a human and a dog have different sets of parts and different configurations, and none
of them can be identified by generic rules without domain knowledge.

Organization: Flat vs Hierarchical
The generic grouping rules, such as Gestalt laws, have often been posed as relational constraints
on the parts, which are modelled in a flat layer. For example, Zhu (1999) proposed a mathematical
framework based on Markov Random Fields (MRFs) whose neighbourhood structures captured
relationships between line segments. Through these structures, Gestalt laws were explicitly modelled
as pairwise features so that they could act as constraints on shape elements. Porway, Wang,
and Zhu (2010) also employed MRFs for aerial image parsing, where the common elements of
aerial images, such as parking lots, roads, etc., were defined on a graph. Statistical constraints
such as relative position were then added between objects.
However, certain relations or groupings can be better organized and expressed in a hierarchy of
different levels of abstraction. A fractal pattern is a good example, in which one can observe the law of
symmetry recursively at infinite scales. Let us also recall the beach example in Figure 44.2, which contains
many components and their subcomponents. One can easily imagine the huge complexity that would
be generated by modelling all components and their relations together in a flat representation.
The use of hierarchical representation for image modelling dates back to the 1970s, to Fu’s early
works (Fu 1974): syntactic approaches in which pattern structures and sub-pattern relations
were modelled as symbolic tokens and production rules by analogy to natural languages.
Dickinson, Pentland, and Rosenfeld (1992) adopted a hierarchical Bayesian network for 3D
object recognition, where layers of short boundaries, object faces, and aspects were linked hier-
archically. Sarkar and Boyer (1994) also used the Bayesian network for grouping primitives into
hierarchical structures in aerial images. In both models, groupings were governed by condi-
tional probabilities defined over layers in the hierarchy. More recently, Geman and collaborators
(Bienenstock, Geman, and Potter 1997) presented grammatical and compositional frameworks
with applications such as vehicle licence plate recognition (Jin and Geman 2006). Zhu and
Mumford (2006) also proposed a general framework for image grammar named the And-Or
Graph, which we adopt in our model and will discuss in detail in ‘Hierarchical Organization
by AOT’.
The key advantage of these approaches is that they can represent an enormous number of dis-
tinct configurations by composing a relatively smaller number of elements, instead of enumerat-
ing all possible configurations. In addition, hierarchical structures further allow us to limit local
complexity at each scale. As discussed at the beginning of this chapter, these are critical aspects in
modelling highly complex and versatile scene or object classes.
Again, the remaining question is how to learn image parts and their relations. In the rest of this
chapter, we will introduce a hierarchical compositional model based on ‘Hierarchical Tiling’. In
this model, the grouping rules will be defined by region-based recursive decomposition and each
subregion will correspond to an atomic element in the dictionary (see ‘Hierarchical Organization
by AOT’). Then the learning problem can be posed as a node pruning and parameter estimation
problem (see ‘Structure Learning by Parameter Estimation in AOT’).

Hierarchical Organization by AOT


Now we provide the definition and details of our model for hierarchical organization. We
adopt the And-Or Tree (AOT) (Zhu and Mumford 2006) as our main framework. The AOT
has been used for modelling objects and scenes in the computer vision literature (Zhu,
Chen, and Yuille 2009). An AOT, as a stochastic image grammar, represents the hierar-
chical decompositions of elements and produces a number of varying configurations by
alternating among sub-components, subject to probability distributions defined over nodes and
edges.
Each node in the AOT plays a distinct role according to its node type. As Figure 44.3 illustrates,
an AOT has three types of node: AND nodes, OR nodes, and Terminal nodes. Note that all nodes
are associated with specific subregions and the root node corresponds to the whole region of
image. Each type can be characterized as follows.
1 An AND node represents the composition of two subregions. For instance, ‘upper-body’ = ‘head’
∪ ‘torso’. By the definition of hierarchical tiling, AND nodes always have two child nodes.
2 An OR node contains several alternating ways to decompose the current region. This is a switch
indicating how and where to partition the current region.
3 A Terminal node corresponds to the most elementary region that is not decomposed further.
Note that an AOT is a ‘whole’ representation of the entire scene class, in the sense that all possible
decompositions of all subregions are integrated in this AOT. In order to represent a particular
image, one needs to make choices at OR nodes to select specific decompositions. We call this
process parsing, which yields corresponding representations as follows:
1 A Parse Tree is an image-specific instance drawn from the AOT. This is a set of selected nodes,
including terminal and non-terminal nodes.
2 A Configuration is a spatial layout of the elementary regions in a parse tree. In other words, it
is the set of terminal nodes of a parse tree, which does not reflect hierarchical relationships.
3 A whole AOT can be seen as the entire collection of all possible parse trees and
configurations.
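The three node types and the parsing process can be sketched in a few lines of code. The following is a minimal illustrative sketch, not the chapter's implementation: the class layout, field names, and the toy tree for a 2 x 1 region are assumptions of this sketch.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    region: Tuple[int, int, int, int]       # (x, y, w, h) subregion of the grid
    kind: str                               # 'AND', 'OR', or 'TERMINAL'
    children: List['Node'] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # OR branching frequencies

def sample_configuration(node: Node) -> List[Tuple[int, int, int, int]]:
    """Draw one parse tree from the AOT and return its configuration,
    i.e. the spatial layout of its terminal regions."""
    if node.kind == 'TERMINAL':             # elementary region, not decomposed
        return [node.region]
    if node.kind == 'AND':                  # composition of two subregions
        return [r for child in node.children for r in sample_configuration(child)]
    # OR node: a switch selecting one of the alternative decompositions
    branch = random.choices(node.children, weights=node.probs)[0]
    return sample_configuration(branch)

# A toy AOT for a 2 x 1 region: keep it whole, or split it into two cells.
whole = Node((0, 0, 2, 1), 'TERMINAL')
split = Node((0, 0, 2, 1), 'AND',
             [Node((0, 0, 1, 1), 'TERMINAL'), Node((1, 0, 1, 1), 'TERMINAL')])
root = Node((0, 0, 2, 1), 'OR', [whole, split], probs=[0.3, 0.7])
print(sample_configuration(root))
```

Repeated calls at the OR node yield either the one-tile configuration or the two-cell configuration, with the relative frequencies stored in `probs`; a parse tree is the particular set of nodes visited on one such descent.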
One important benefit of this representation is the flexibility that is required to account for
varying scene components and configurations. When built on an 8 x 8 grid, the AOT can
generate more than 4 × 10^31 different parse trees. This flexibility comes from only 1296 rectangular
building blocks that are reconfigurable. The efficiency of this model partly relies on the

(Fig. 44.3 schematic: an initial full AOT/HST with |Ω_pt| ~ O(10^31) configurations is reduced by
learning to a learned HST with |Ω_pt| ~ O(10^3).)

Fig. 44.3  During the learning process, a number of invalid configurations are eliminated from the
initial model. This results in a huge drop in the complexity of the model and the final model only
contains a compact set of meaningful configurations which can be frequently observed in the
training images.
fact that smaller subregions—nodes in lower layers of the AOT—can be shared by multiple par-
ent regions of a higher order. However, such huge flexibility also increases complexity and
ambiguity. We will discuss this issue in detail in the following
section.
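The counts quoted above can be checked with a short script. The recursion below is an illustrative variant assumed for this sketch: any rectangle may serve as a terminal tile, and an AND node splits its region with a single vertical or horizontal cut along a grid line. The chapter's exact grammar may count configurations slightly differently.

```python
from functools import lru_cache

def count_rectangles(n: int) -> int:
    """Axis-aligned rectangles on an n x n grid: choose two of the n + 1
    vertical grid lines and two of the n + 1 horizontal grid lines."""
    pairs = (n + 1) * n // 2                # C(n + 1, 2)
    return pairs * pairs

@lru_cache(maxsize=None)
def count_parse_trees(w: int, h: int) -> int:
    """Hierarchical tilings of a w x h rectangle under the assumed rules."""
    total = 1                               # the rectangle itself as a terminal
    for x in range(1, w):                   # vertical cuts
        total += count_parse_trees(x, h) * count_parse_trees(w - x, h)
    for y in range(1, h):                   # horizontal cuts
        total += count_parse_trees(w, y) * count_parse_trees(w, h - y)
    return total

print(count_rectangles(8))                  # 1296 building blocks, as in the text
print(count_parse_trees(8, 8))              # combinatorially many parse trees
```

With these rules the 8 x 8 grid indeed yields 1296 rectangular tiles, and the parse-tree count is astronomical, in line with the more-than-4 × 10^31 figure above (the exact value depends on the production rules the model admits).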

Mathematical Formalism
In this subsection, we define notations and introduce the mathematical formalism. Given a set of N
training images {I_i}, our objective is to learn an AOT with its visual dictionary and associated
parameters. Let us define the AOT as follows.

AOT = (S, V; Θ, ∆),  (1)

where S is the start symbol at the root, i.e. the whole region, and V is the set of nodes in the AOT. A
node v_i ∈ V has one of the types {AND, OR, Terminal} described above. Θ is the set of model parameters
which control the frequencies with which decomposition rules are activated at OR nodes. The tiling
dictionary of the scenes is denoted by Δ, which is also the set of terminal nodes in V.
From the AOT, the learning problem can be formulated as maximum likelihood estimation
(MLE).
(∆, Θ)* = arg max_{∆,Θ} ∑_{i=1}^{N} log p(I_i; ∆, Θ).  (2)

In the AOT model, each image I is generated by a hidden parse tree, pt. Then the data likelihood
in Eq. (2) can be marginalized over parse trees and further factorized as follows.

p(I_i; ∆, Θ) = ∑_{pt} p(I_i, pt; ∆, Θ)  (3)

             = ∑_{pt} p(I_i | pt; ∆) · p(pt; Θ).  (4)

For a certain parse tree, pt, the first factor of the product in Eq. (4) represents the likelihood
of the image given the parse tree. In other words, it measures how well the parse tree and corre-
sponding configuration are suited to, or explain, the given image. The second factor, p(pt; Θ),
is the prior probability of the parse tree, and measures how commonly this parse tree would be
used. This part is not affected by the choice of image.
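As a toy illustration of Eqs. (3)-(4), suppose the parse trees of a small AOT can be enumerated; the marginal likelihood is then a prior-weighted sum. The parse-tree names and the numbers below are made up for illustration.

```python
def image_likelihood(likelihoods, priors):
    """Eq. (4): p(I; Delta, Theta) = sum over pt of p(I | pt; Delta) * p(pt; Theta)."""
    assert abs(sum(priors.values()) - 1.0) < 1e-9  # p(pt; Theta) is a distribution
    return sum(likelihoods[pt] * priors[pt] for pt in priors)

# Two candidate parses of the same image: keep the region whole, or split it.
likelihoods = {'whole': 0.2, 'split': 0.6}   # p(I | pt; Delta): fit to this image
priors      = {'whole': 0.7, 'split': 0.3}   # p(pt; Theta): independent of the image
print(image_likelihood(likelihoods, priors)) # 0.2 * 0.7 + 0.6 * 0.3 = 0.32
```

Note how the two factors play distinct roles: a parse that fits the image well ('split') can still contribute less than a common one if its prior is low.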

Structure Learning by Parameter Estimation in AOT


So far, we have discussed the general structure of our model. The next step is to learn actual
models from training images. To learn a model means to define the whole structure and estimate
optimal parameters, such as probability distributions, from the training data. In our model, this can
be understood as learning how frequently each decomposition has occurred and ruling out those
paths that have never or rarely occurred.
This procedure can be easily understood when we think of what we do when we learn our
visual world. For example, let’s imagine a typical scene of ‘beach’ (as presented in Figure 44.2).
One would probably construct in mind a horizontally divided scene with the sky at the top and
the ocean at the bottom, because this spatial configuration is very common in beach scenes that
we have observed, and we have learned and stored such frequency of configurations in our mind.
Therefore, our learning procedure follows the exact same strategy as humans do. The algorithm
takes as input a set of training images and infers the most probable interpretations of them, i.e.
parse trees and configurations. Next, it can evaluate what kinds of configuration are the most
common and how frequent each one is. Such information is stored as parameters of the learned
model, and eventually, can be used for analysing a new image.
On the other hand, the main difficulty in many structure-learning algorithms comes from the
fact that there are too many different ways to decompose the scene into parts, i.e. ambiguity.
This difficulty can be alleviated here by constraining the feasible set of structures by definition
of the hierarchical tiling described in the previous section. The hierarchical tiling AOT contains
a number of rectangles on the grid as basic building blocks as well as rules of decomposition.
In this representation, the original continuous geometric space is quantized at the resolution of
the grid, and moreover, factorized into the local forms of three regions: one parent region at an
AND node and two subsequent subregions. Therefore, the complexity is locally limited, and this
makes the model manageable in learning. Note that, despite this constraint, it can still represent
a combinatorial number of parse trees, which provide enough flexibility for modelling a variety
of configurations.
Figure 44.3 illustrates the key idea of the learning procedure which can be seen as a shrink-
ing process. It first establishes a very ‘fat’ and highly over-complete initial model. This model
can generate an exponential number of different configurations. Some of these configurations
are useful (they correspond to real examples of natural scenes); however, most of the other
configurations do not make any sense and are unable to capture the meaningful structure of
any natural scenes. These meaningless configurations will be gradually eliminated from the
initial model during the learning procedure. Eventually, the learned model can generate a
much more compact set of configurations and parse trees, which one can commonly observe
in real images.

Iterative Learning
In our formulation, a parse tree is a latent variable that is not observable. One common algorithm
used for maximum likelihood estimates with latent variables is the expectation-maximization
(EM) algorithm (Dempster, Laird, and Rubin 1977). This is an iterative algorithm and alternates
between evaluating the posterior distribution of the latent variable and updating model parameters,
based on the current estimates at each iteration. Our learning algorithm follows a similar iterative
strategy which alternates between inference of the optimal parse trees and updating parameters.
The details of each step can be summarized as follows.
1 Inference. Inference is the task of evaluating the most probable parse tree, which can be
considered the best interpretation of a given image under the current parameters of the
AOT. We obtain the optimal parse tree for each image by dynamic programming (DP) in a
bottom-up process. For a given image Ii, the optimal parse tree, pt i*, maximizes the following
probability:

$pt_i^* = \arg\max_{pt}\ p(I_i \mid pt; \Delta_t)\, p(pt; \Theta_t)$ (5)

The parse tree prior is the product of branching frequencies at OR nodes.

$p(pt; \Theta_t) = \prod_{v \in V^{OR} \cap\, pt} \Theta_t(v, v_{ch})$, (6)

where $\Theta_t(v, v_{ch})$ is the branching frequency from an OR node $v$ to its chosen child node, $v_{ch}$.
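Concretely, Eq. (6) says that the prior of a parse tree is just the product of the branching frequencies of the OR choices it makes. A minimal sketch (the tiny AOT, node names, and frequencies below are invented for illustration, not taken from the chapter):

```python
# Toy illustration of Eq. (6): the parse-tree prior is the product of the
# branching frequencies Theta[(v, v_ch)] over the OR nodes used by the tree.

def parse_tree_prior(or_choices, theta):
    """or_choices: list of (or_node, chosen_child) pairs in the parse tree."""
    p = 1.0
    for v, v_ch in or_choices:
        p *= theta[(v, v_ch)]
    return p

# Hypothetical branching frequencies; each OR node's children sum to 1.
theta = {
    ("root", "split_h"): 0.7, ("root", "split_v"): 0.3,
    ("top", "sky"): 0.8, ("top", "building"): 0.2,
}

pt = [("root", "split_h"), ("top", "sky")]
print(parse_tree_prior(pt, theta))  # 0.7 * 0.8
```

A parse tree that repeatedly takes common branches thus receives a high prior, while one built from rare decomposition rules is penalized multiplicatively at every level.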
926 Joo, Wang, and Zhu

2 Activation frequency update. After obtaining the optimal parse trees, the model parameters
are updated. These parameters include the activation frequency, Θ, which indicates the
frequencies of the decomposition rules.

$\Theta_{t+1}(v, v_{ch}) = \dfrac{\sum_i \mathbf{1}[(v, v_{ch}) \in pt_i^*]}{\sum_i \mathbf{1}[v \in pt_i^*]}$ (7)

3 Node pruning. According to the updated frequency, the dictionary is compressed by pruning
nodes which have never or rarely been activated.

$\Delta_{t+1} = \Delta_t \setminus \{v : f(v) < \epsilon,\ v \in \Delta_t\}$, (8)

$f(v) = \dfrac{1}{M} \sum_i \mathbf{1}[v \in pt_i^*]$. (9)


These steps are repeated until the model converges. At the beginning, an initial AOT contains a huge
number of decomposition rules and a large dictionary, and there is a very high ambiguity in parsing
images. As iterations proceed, the model parameters keep being refined; in addition, the size of the
dictionary becomes smaller. A series of relevant experimental results is presented in the following
sections, with applications to the scene and the human body.
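The three steps can be sketched end-to-end on toy data. This is only a schematic: real inference runs dynamic programming over the AOT, whereas here each image simply scores a small enumerated set of candidate parse trees, and the branching-frequency update of Eq. (7) is reduced to per-node activation counts. All names and data are hypothetical.

```python
# Hedged sketch of the iterative learning loop:
#   inference -> activation-frequency update -> node pruning.
def learn(images, candidate_trees, likelihood, dictionary, eps=0.2, rounds=3):
    freq = {v: 1.0 for v in dictionary}  # optimistic initial frequencies
    for _ in range(rounds):
        def prior(pt):  # node frequencies stand in for OR branching frequencies
            p = 1.0
            for v in pt:
                p *= freq[v]
            return p
        # Step 1 -- inference: best candidate tree per image (likelihood * prior).
        best = [max((pt for pt in candidate_trees if pt <= dictionary),
                    key=lambda pt: likelihood(img, pt) * prior(pt))
                for img in images]
        # Step 2 -- activation-frequency update (Eq. 7, reduced to node counts).
        freq = {v: sum(v in pt for pt in best) / len(images) for v in dictionary}
        # Step 3 -- node pruning (Eqs. 8-9): drop rarely activated nodes.
        dictionary = {v for v in dictionary if freq[v] >= eps}
    return dictionary

# Toy run: trees over three nodes; 'noise' never wins and gets pruned.
trees = [frozenset({"sky", "road"}), frozenset({"sky", "noise"})]
lik = lambda img, pt: 0.9 if "road" in pt else 0.2
learned = learn(["img1", "img2", "img3"], trees, lik, {"sky", "road", "noise"})
print(learned)  # the 'noise' node is pruned, leaving {'sky', 'road'}
```

The shrinking behaviour described in the text appears even in this toy: nodes that keep losing the per-image argmax accumulate near-zero frequency and fall below the pruning threshold.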

Case Study I. Scene


In this section, we present an example of the concrete development of the introduced algorithm
for scene modelling and its evaluation. The experimental results of this section were reported in
Wang, Wang, and Zhu (2012).
For the purpose of scene analysis, a dataset of natural scene images has been made available to the computer
vision community (Russell et al. 2008). This dataset contains 2688 images from eight categories of
outdoor scene, including coast, highway, open country, street, forest, tall building, inside city, and
mountain. Figure 44.5 shows examples of each category in the dataset.
For each image, our algorithm first generates multiple segmentations by a graph-based segmentation
method (Felzenszwalb and Huttenlocher 2004), as shown in Figure 44.4(b), while
varying the parameter, k, which controls the granularity of segmented regions. From the set of
segmentation layers, we obtain the optimal parse tree and corresponding configuration, which
are consistent with the learned parsing prior and preserve the homogeneity of each terminal tile.
That is, we encourage the model to parse an image into a more familiar configuration where each
perceptually homogeneous subregion, an image segment, is explained by an elementary part in
one piece.
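To make the idea of a multi-granularity segmentation pool concrete, the sketch below plays the role of the parameter k on a 1-D toy signal: neighbours whose difference falls below k are merged, so larger k yields coarser segments. This merely mimics the effect of k; it does not implement the actual graph-based criterion of Felzenszwalb and Huttenlocher (2004).

```python
# Toy 1-D stand-in for a segmentation pool: one segmentation per value of k.
def segment_1d(pixels, k):
    """Group consecutive pixels whose intensity difference is below k."""
    segments, current = [], [pixels[0]]
    for a, b in zip(pixels, pixels[1:]):
        if abs(b - a) < k:
            current.append(b)
        else:
            segments.append(current)
            current = [b]
    segments.append(current)
    return segments

image = [1, 2, 2, 9, 9, 10, 30, 31]
pool = {k: segment_1d(image, k) for k in (2, 10, 40)}
for k, segs in pool.items():
    print(k, [len(s) for s in segs])  # finer partitions for small k, coarser for large k
```

The parse-tree inference then searches this pool for a configuration whose terminal tiles coincide with perceptually homogeneous segments at some granularity.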

Qualitative Results
Table 44.1 shows the statistics on the complexity of the AOT. The size of the parsing space that an initial
AOT defines is combinatorial. It contains a huge number of region decomposition rules and
can generate an enormous number of distinct parse trees. This also implies a high ambiguity.
Through the iterative learning procedure, the admissible parsing space quickly shrinks by pruning
Hierarchical Organization by And-Or Tree 927

Table 44.1  The shrinkage of AOT for a ‘street’ scene at each iteration round

Round    |V^AND|    |V^OR|    |V^T|    |Ω_pt|
0        6048       1296      1296     4.48 × 10^31
1        570        519       366      8.01 × 10^7
2        351        386       256      2.23 × 10^5
3        238        302       184      1.14 × 10^3
4        221        290       173      9140


Fig. 44.4  Parse an image into scene configuration. (a) Input image. (b) Segmentations in different
layers. (c) The optimal parse tree of given image. (d) Scene configuration. (e) Scene configuration
with localized parts.

many infrequent parsing rules and nodes. After convergence, the learned AOT only contains a
compact set of common parsing paths and nodes.

Scene Category Classification


The goal of scene category classification is to predict the scene category to which each image belongs.
This is a multi-class classification problem which has attracted much research in computer vision.
Many prior works have focused on exploring better image features without considering structural
representation, e.g. ‘gist’ (Oliva and Torralba 2001) or ‘bag of words’ (Li and Perona 2005), or on building their
models on limited or fixed structures, e.g. the ‘spatial pyramid’ (Lazebnik, Schmid, and Ponce 2006). In contrast,
our model can take advantage of the much more flexible representation afforded by the AOT.
Specifically, we obtain a set of typical configurations of each scene category from the learned
AOT, as shown in Figure 44.5. We use SIFT descriptors and colour moments of each terminal
window as image features and train category classifiers by support vector machine (SVM). Given
a test image, we assign the category whose prediction score is maximum. Note that the

ground-truth segmentations (label map) of training images are provided in this dataset and we
used them for fair comparison with the other methods.
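The classification step can be sketched as follows. A hand-written linear scorer stands in for the trained SVMs, and both the per-category weight vectors and the pooled feature values are invented for illustration:

```python
# Hedged sketch of category prediction: each category's classifier scores the
# pooled per-window features, and the test image takes the argmax category.
def classify(feature_vec, classifiers):
    """Return (best_label, scores) under linear per-category scorers."""
    scores = {c: sum(w * x for w, x in zip(wvec, feature_vec))
              for c, wvec in classifiers.items()}
    return max(scores, key=scores.get), scores

# Hypothetical weight vectors over a 3-dim pooled feature
# (standing in for SIFT + colour-moment descriptors of terminal windows).
classifiers = {
    "coast":  [0.9, -0.2, 0.1],
    "street": [-0.3, 0.8, 0.5],
}
features = [0.1, 0.7, 0.6]
label, scores = classify(features, classifiers)
print(label)  # street
```

In the actual system each category's typical configurations define the terminal windows over which features are pooled, so the score already reflects the learned spatial layout rather than a fixed grid.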
We compare the performance of our model with prior works including: (1) a holistic ‘gist’
feature-based method (Oliva and Torralba 2001); (2) a BoW-based method (Li and Perona 2005);
(3) the spatial pyramid matching (SPM) method (Lazebnik et al. 2006); (4) the locality-constrained
linear coding (LLC) (Wang et al. 2010); and (5) the tangram model (Tgm) (Zhu et al. 2012).
Figure 44.5 shows the average precision (AP) of different methods, where our method outper-
forms the others.
This is strong evidence supporting the need for flexible and hierarchical models in understanding
the scene. Without such a hierarchy, one can still identify some common visual words (BoW),
but one loses the spatial information and the relationships between parts, and fails to capture the
context of the entire scene. Although some uniformly predefined configurations have been used
in SPM, it still results in poor performance. One possible explanation is that their configurations,
regular grids at multiple resolutions, are not coherent with real images of scenes. Therefore, by


Methods   Gist    BoW     SPM     LLC     Tgm     Ours

AP (%)    72.15   84.57   84.92   87.97   86.07   91.71

Fig. 44.5  Scene classification based on the categorical typical configurations. (a) The learned
configuration distributions where the horizontal axis is the index of configuration and the vertical
axis is the posterior probability. (b)–(i) The categorical typical configurations for each scene category.
The performance of scene classification is shown in the bottom table.

Fig. 44.6  Examples of upper bodies of humans.

pursuing meaningful spatial layout from training data, the hierarchical tiling model can improve
classification performance.

Case Study II. Object: Human Figures


In this section, we present the application of our algorithm to objects, using human
bodies as an example. As shown in Figure 44.7, a complete human body can be understood as a hierarchical
organization of body parts. In fact, this type of hierarchical model has been used for tasks such as
human pose estimation (Zhu et al. 2011) and general object detection (Felzenszwalb et al. 2010)
in the recent literature. The common idea behind such methods is to decompose the whole object
into its parts and analyse them. Compared to the conventional whole-template-based approach,
which has no part definition, the strength of the part-based approach is in capturing the indi-
vidual geometric variation of each part and the relationship between parts; this has led to an
improvement in object detection performance (Felzenszwalb et al. 2010).
While a majority of works focus on learning the parameters of manually defined object structures,
another line of research has pursued learning the unknown structure of objects from
images (Zhu et al. 2008; Fidler and Leonardis 2007). The learning method we have introduced in this
chapter also falls into this category. In our method the task of learning is equivalent to identifying
the hierarchical dictionary of body parts, including their varying types of appearance, from raw images.
Figure 44.6 shows examples of input training images that contain the upper bodies of humans.
Images are pre-processed by cropping and aligning with respect to the positions of head and waist.
The algorithm starts by learning appearance models at each rectangular subregion in the AOT.
This is essentially a task to learn the conditional image likelihood given a terminal node, p(I | pt )
in Eq. (4). To model the likelihood of appearance, a hybrid image template (HIT) has been used
in this experiment. The HIT is a generative image model with four different types of low-level
feature: {sketch, colour, texture, flatness}. Details can be found in Si and Zhu (2012). A single HIT
template can be learned for each terminal node to represent an individual part. An entire AOT
containing a number of HIT templates can generate many compositional human poses, each of
which is a composed HIT template for the human body.
For each subregion at a terminal node, the corresponding patches of all training images are
cropped and clustered by their appearance into k distinct groups. From each cluster, a single HIT
template is learned so that one rectangular subregion has k different appearance models. These k
templates address the different appearance types of a part. For example, a head can be modelled
as a mixture of templates including ‘head with hat’, ‘head with long hair’, and so on. For this
reason, the task of parsing now includes the choice of a specific appearance type at each previously
terminal node. Consequently, each terminal node becomes another OR node whose children are a set of
appearance templates. A complete parse tree now includes the spatial configuration as well as the
associated appearance types of parts.
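The per-subregion clustering can be sketched with a minimal 1-D k-means, which stands in for appearance clustering (HIT learning itself is well beyond this sketch). Patch "appearances" are reduced to scalar stand-ins, e.g. mean intensities, and all values are invented:

```python
# Toy stand-in for clustering a terminal subregion's patches into k appearance
# types, after which the terminal acts as an OR node over k templates.
def kmeans_1d(values, centers, iters=10):
    """Plain Lloyd iterations on scalars; returns the sorted final centers."""
    for _ in range(iters):
        groups = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(c - v))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centers)

# e.g. head patches: dark ('long hair') vs bright ('hat') mean intensities
patches = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
templates = kmeans_1d(patches, centers=[0.0, 1.0])
print(templates)  # two appearance templates, roughly [0.15, 0.85]
```

Each recovered cluster center plays the role of one learned template; a real implementation would learn a full HIT template from each cluster's patches instead of a scalar mean.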



Fig. 44.7  As in the case of scene, one can interpret the human body as a collection of body parts
which can be organized in the object-level And-Or Tree. The configuration will be governed by the
pose of body as well as different clothing or accessories (jeans, skirt, etc.) that each person wears.

Learning Body Parts


At this point, we still do not have a clue as to which subregions are true human parts, and all templates
are treated as potential parts. As in the case of the scene, we build a fat initial AOT and go through
iterations in order to develop and refine a compact model in which ambiguous parts are suppressed.
Figure 44.8 shows a series of optimal configurations being developed through iterations. At
the beginning, the ambiguity is very high as there are too many redundant parts and the parse
tree prior is still immature. As the learning proceeds, the optimal configuration becomes more
meaningful and finally captures the correct parts of human bodies. Some of those parts are presented in Figure 44.8. These are the parts which appear most frequently during the learning, and
are thus included in the learned dictionary.
From the result, one might wonder how the model can determine true parts or which parts are
preferred over the others. There are two factors which decide the optimal parse tree: the image
likelihood from selected appearance templates and the parse tree prior, which controls the overall
frequency of parts being activated. The set of true atomic parts that can be modelled by rigid
templates tends to be more robust to articulation, which leads to higher image likelihood.
As a result, we can deduce that some good appearance templates (hence, good parse trees) are
more likely to be selected at the earlier stages of learning and that the other ambiguous parse trees
will move towards a smaller set of good parse trees to which stronger priors are given.

Conclusion
In this chapter, a hierarchical representation of images and its learning algorithm were discussed.
The And-Or Tree (AOT) was adopted as the main framework modelling the hierarchy of image
structure. An algorithm to learn the parameters and dictionary of the AOT was suggested with


Fig. 44.8  (Left) The optimal configurations pooled from the AOT at each iteration. At the beginning,
the ambiguity is very high. As the learning proceeds, the optimal configuration becomes more
meaningful, and finally captures the correct parts of human bodies. (Right) Some popular elements
in the dictionary of the AOT after learning.

mathematical formalisms. Finally, to demonstrate the introduced model and learning method,
two concrete cases, for natural scenes and human bodies, were presented, with various experi-
mental results.

Acknowledgements
This work was supported by NSF CNS 1028381, DARPA MSEE grant FA 8650-11-1-7149 and
MURI grant from ONR N00014-10-1-0933. We would like to thank Johan Wagemans and two
anonymous reviewers for their valuable comments.

References
Biederman, I. (1987). ‘Recognition-by-Components: A Theory of Human Image Understanding’.
Psychological Review 94: 115–147.
Bienenstock, E., S. Geman, and D. Potter (1997). ‘Compositionality, MDL Priors, and Object Recognition’.
In Advances in Neural Information Processing Systems, edited by M. C. Mozer, M. I. Jordan, and
T. Petsche, pp. 838–844. Cambridge, MA: MIT Press.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). ‘Maximum Likelihood from Incomplete Data via the
EM Algorithm’. Journal of the Royal Statistical Society, Series B 39: 1–38.
Dickinson, S. J., A. P. Pentland, and A. Rosenfeld. (1992). ‘From Volumes to Views: An Approach to
3-D Object Recognition’. Computer Vision, Graphics, and Image Processing: Image Understanding
55(2): 130–154.
Edelman, S. and H. H. Bülthoff (1992). ‘Orientation Dependence in the Recognition of Familiar and Novel
Views of Three-Dimensional Objects’. Vision Research 32: 2385–2400.
Felzenszwalb, P. F. and D. P. Huttenlocher (2004). ‘Efficient Graph-Based Image Segmentation’.
International Journal of Computer Vision 59(2): 167–181.
Felzenszwalb, P. F., R. B. Girshick, D. A. McAllester, and D. Ramanan (2010). ‘Object Detection with
Discriminatively Trained Part-based Models’. IEEE Transactions on Pattern Analysis and Machine
Intelligence 32(9): 1627–1645.
932 Joo, Wang, and Zhu

Fidler, S. and A. Leonardis (2007). ‘Towards Scalable Representations of Object Categories: Learning
a Hierarchy of Parts’. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. Los
Alamitos, CA: IEEE.
Fu, K.-S. (1974). Syntactic Methods in Pattern Recognition. New York: Academic.
Jin, Y. and S. Geman (2006). ‘Context and Hierarchy in a Probabilistic Image Model’. In IEEE Conference on
Computer Vision and Pattern Recognition, pp. 2145–2152. Los Alamitos, CA: IEEE.
Lazebnik, S., C. Schmid, and J. Ponce (2006). ‘Beyond Bags of Features: Spatial Pyramid Matching for
Recognizing Natural Scene Categories’. In IEEE Conference on Computer Vision and Pattern Recognition,
pp. 2169–2178. Los Alamitos, CA: IEEE.
Li, F.-F. and P. Perona (2005). ‘A Bayesian Hierarchical Model for Learning Natural Scene Categories’. In
IEEE Conference on Computer Vision and Pattern Recognition, pp. 524–531. Los Alamitos, CA: IEEE.
Lowe, D. G. (1985). Perceptual Organization and Visual Recognition. Norwell, MA: Kluwer Academic.
Mohan, R. and R. Nevatia (1992). ‘Perceptual Organization for Scene Segmentation and Description’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 14(6): 616–635.
Oliva, A. and A. Torralba. (2001). ‘Modeling the Shape of the Scene: A Holistic Representation of the
Spatial Envelope’. International Journal of Computer Vision 42(3):145–175.
Porway, J., Q. Wang, and S.-C. Zhu (2010). ‘A Hierarchical and Contextual Model for Aerial Image Parsing’.
International Journal of Computer Vision 88(2): 254–283.
Russell, B. C., A. Torralba, K. P. Murphy, and W. T. Freeman (2008). ‘Labelme: A Database and Web-Based
Tool for Image Annotation’. International Journal of Computer Vision 77(1–3): 157–173.
Sarkar, S. and K. L. Boyer (1994). Computing Perceptual Organization in Computer Vision. Hackensack,
NJ: World Scientific.
Saund, E. (1992). ‘Putting Knowledge into a Visual Shape Representation’. Artificial Intelligence
54(1): 71–119.
Si, Z. and S.-C. Zhu (2012). ‘Learning Hybrid Image Templates (HIT) by Information Projection’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 34(7): 1354–1367.
Tarr, M. J. and H. H. Bülthoff (1998). ‘Image-Based Object Recognition in Man, Monkey and Machine’.
Cognition 67: 1–20.
Wang, J., J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong (2010). ‘Locality-Constrained Linear Coding for
Image Classification’. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3360–3367.
Los Alamitos, CA: IEEE.
Wang, S., Y. Wang, and S.-C. Zhu (2012). ‘Hierarchical Space Tiling in Scene Modeling’. In Asia Conference
on Computer Vision, pp. 796–810. Berlin: Springer.
Wu, T., G.-S. Xia, and S.-C. Zhu (2007). ‘Compositional Boosting for Computing Hierarchical Image
Structures’. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. Los Alamitos,
CA: IEEE.
Zhu, J., T. Wu, S.-C. Zhu, X. Yang, and W. Zhang (2012). ‘Learning Reconfigurable Scene Representation
by Tangram Model’. In IEEE Workshop on Computer Vision, pp. 449–456. Los Alamitos, CA: IEEE.
Zhu, L., Y. Chen, and A. L. Yuille (2009). ‘Unsupervised Learning of Probabilistic Grammar-Markov
Models for Object Categories’. IEEE Transactions on Pattern Analysis and Machine Intelligence
31(1): 114–128.
Zhu, L., Y. Chen, C. Lin, and A. L. Yuille (2011). ‘Max Margin Learning of Hierarchical Configural
Deformable Templates (HCDTs) for Efficient Object Parsing and Pose Estimation’. International Journal
of Computer Vision 93(1): 1–21.
Zhu, S.-C. (1999). ‘Embedding Gestalt Laws in Markov Random Fields’. IEEE Transactions on Pattern
Analysis and Machine Intelligence 21(11): 1170–1187.
Zhu, S.-C. and D. Mumford (2006). ‘A Stochastic Grammar of Images’. Foundations and Trends in Computer
Graphics and Vision 2(4): 259–362.
Chapter 45

Probabilistic models of
perceptual features
Jacob Feldman

Features
A ubiquitous element in perceptual theory is that of a feature, meaning a measurable attribute
of an object, such as its color, form, orientation, or motion. Features are a routine part of the
description of experimental stimuli, and an essential component of verbal descriptions of every-
day visual experience (the black pen is on the square table). Features play a wide variety of roles
in perceptual theory. Features such as convexity and symmetry are thought to influence figure/
ground interpretation (Kanizsa and Gerbino, 1976), helping to form initial representations of
objects (see Peterson, this volume). Later on each object’s features are bound together to form
complex object representations (Treisman and Gelade 1980; Ashby et al. 1996). Still later each
object’s features are used to classify it into larger categories (Feldman 2000; Lee and Navarro
2002; Ullman et al. 2002).
But behind the simple idea of a ‘feature’ lurk some deep theoretical issues and controversies,
involving how features are defined and what motivates the choice of a particular feature vocabulary
(Jepson and Richards 1992; Koenderink 1993). This brief chapter centers on the ongoing
evolution of the feature concept from a ‘classical’ view, in which features are deterministic
attributes of objects, to a more probabilistic view, in which features are probabilistic estimates
of attributes inferred from image data. The newer view has grown in prominence in conjunction
with a broader probabilistic conception of perceptual inference more generally (Knill and
Richards 1996).
It is useful first to distinguish among certain commonly used terms and different notions of ‘feature’
that are occasionally conflated. The terms feature, property, dimension, and attribute are all
commonly used to refer to an image characteristic that varies among visual objects. Each of these
terms is sometimes used to indicate the characteristic that can vary (e.g. size), or a particular
value that it can take (e.g. large). Thus some authors refer to color as a feature, while others use
the term to refer to specific values such as red, white, or blue; and so forth. Some authors reserve
one term (e.g. feature) for the variable, another for the value (e.g. property), but such usage does
not seem to be consistent across the literature. The term feature is sometimes reserved for discrete
qualities, meaning those that can take one of a finite number of distinct values, including discre-
tizations of what are normally continuous-valued properties: examples include red vs. green (two
discrete cases of the continuous parameter color) or vertical vs. horizontal (discrete cases of the
continuous parameter orientation), and so forth (Aitkin 2009). Features with exactly two values,
often referred to as binary or Boolean, can be understood to involve the presence or absence of
some attribute (e.g. red vs. non-red).
934 Feldman

A more subtle distinction particular to the term feature is that it is sometimes used to refer
to localizable elements within an image, such as the facial ‘features’—eyes, nose, and mouth—
located at various positions on a face. Researchers in stereopsis, for example, refer to correspondence between features in the left and right visual images, meaning local elements of the image with
well-defined locations (Poggio and Poggio, 1984). Any visual function that involves searching for,
counting, or measuring distances among features presumes this sense of the word. In contrast
many ‘features’, such as shape or color, are not localizable, but are characteristic of whole objects.
The distinction between these two senses of feature breaks down a bit when spatially localizable
elements are described in terms of their characteristics. For example, a T-junction is a spatially
localizable element of a line drawing, but is also a characteristic that some line junctions have and
others do not. In this review I will focus on the first sense of feature, as a characteristic that varies
among objects, although the issue of localizability becomes central later when we consider local
vs. global support for features.

Classical vs. probabilistic models of features


Historically, features have usually (and often tacitly) been defined by clear-cut criteria: e.g. feature
f holds when image measurement m lies above some threshold m0 (m ≥ m0) and does not hold
(i.e., ¬f holds) otherwise (m < m0) (Figure 45.1). Thus vertical lines are those within 5° of
the direction of gravity; collinear edges have orientation difference of less than 30◦ (e.g. Field et al.
1993); relatable edges have linear extensions that intersect at an acute angle (Kellman and Shipley
1991). Such definitions have the advantage of clarity, and are often perfectly apt for experimental
contexts in which stimuli are artificially constructed to either fulfill them or not fulfill them as
desired for the purposes of the experiment. However, with natural stimuli this simple criterial
conceptualization of features suffers from at least three problems: hard boundaries, arbitrariness,
and insensitivity to context.

Fig. 45.1  Schematic illustrating the difference between (a) classical features, which divide the
measurement space (m) into clean-cut classes; and (b) probabilistic features, which are based on
potentially overlapping probability distributions.
Probabilistic models of perceptual features 935

1  Hard boundaries. Criterial definitions impose clear-cut boundaries between values of a
feature, treating all instances that meet the criterion equivalently. Thus all vertical lines are
equally vertical, while all non-vertical lines are equally not. Such a boundary inevitably treats
many nearby cases as qualitatively different, while folding together cases that are distant in the
underlying space, a distortion that rarely corresponds well to the more graded percept. With hard
criteria, in-between cases do not exist; there is no way of expressing the idea of a line that is
somewhat, almost, or partly vertical.

This issue parallels a famous debate in the literature on cognitive categories, which following the
seminal papers of Posner and Keele (1968) and Rosch (1973) evolved from a ‘classical’ conception
based on necessary and sufficient features (see Smith and Medin 1981) to a graded and ‘fuzzy’
view based on prototypes (Posner and Keele 1968; Reed 1972), exemplars (Medin and Schaffer
1978; Nosofsky 1986), or both (Nosofsky et al. 1994; Anderson and Betz 2001), in order to account
for the observation that some category instances seem to be better examples of the category than
others. In recent years the modern view has been expressed via probabilistic models in which
conceptual representations are probabilistic estimates of underlying generating classes (Anderson
1991; Ashby and Alfonso-Reese 1995; Goodman et al. 2008; Briscoe and Feldman 2011). In a few
famous cases, perceptual processes do seem to impose relatively hard boundaries at thresholds
along continuous parameters, a phenomenon known as categorical perception (see Harnad 1987).
However such cases are exceptional, and in any case even they seem to involve gradations in the
vicinity of the threshold.
2  Arbitrariness. In the classical view a feature like between 600 and 601 meters in height is
perfectly well-defined, even though it captures no natural kind, and may not distinguish in any
useful way between objects that satisfy it and those that do not. Such features are arbitrary in
that they fail to relate to real classes actually extant in the environment. A desirable property of a
feature vocabulary is that it be well-tuned to the classes it is used to describe, a desideratum the
classical model in no way guarantees.
3  Insensitivity to context. A feature like has a 6-cylinder engine is perfectly well-defined for
cars, but makes no sense when applied to trees, and vice versa for evergreen. Such features
make meaningful distinctions only within a single narrow context. Indeed human subjects are
known to employ different features depending on context (Blair and Homa 2005; Schyns et al.
1998; Goldstone and Steyvers 2001) and can learn new features in new contexts (De Baene
et al. 2008; Stilp et al. 2010). But a classical feature vocabulary does not in any way constrain
the context in which features are applied, since their definitions make reference only to image
conditions satisfied or not. As with arbitrariness, the problem is that classical features allow no
connection between their definitions and the properties of the environment.

The sections that follow outline a modern probabilistic conception of features that avoids each
of the above defects. Probabilistic conceptions of features are certainly not new, but have grown
over several decades (from roots in signal detection theory; see Green and Swets 1966). The recent
explosion in probabilistic conceptions of perception (see Kersten 2004 or Feldman, this volume)
has introduced a natural mathematical language for expressing many probabilistic ideas, including that of a perceptual feature. In what follows I attempt to lay out the basic modern idea of
probabilistic features in a simple and general way.

Probabilistic features
From a probabilistic viewpoint, features are attributes of objects that are estimated from image
measurements, rather than measurements of image properties per se. The assumption is that
measurable image properties derive from both fixed distal properties of objects as well as random
noise, and that useful features attempt to extract the signal from the noise (see Figure 45.1). To
formalize this, we assume that an object feature f involves a likelihood distribution over image
measurements m,

$p(m \mid f) \sim \mu + e$ (1)

where µ is some mean value of m conditioned on the presence of the feature, and e is an error
drawn from a noise distribution with mean 0, such as a Gaussian

$e \sim N(0, \sigma^2)$ (2)

The probabilistic assignment of feature values to image structures then proceeds by Bayes’ rule: an
object with measurement m is assigned feature f in proportion to the posterior

$p(f \mid m) \propto p(m \mid f)\, p(f)$ (3)

where $p(f)$ is a prior distribution over feature values. The prior may
(though need not) be uniform (e.g. $p(f) = p(\lnot f)$ in the case of a Boolean feature), in which case
the posterior is proportional to the likelihood. The likelihood model p(m | f ) is sometimes called
a generative model because in effect it is a model of how the image was generated, describing how
observables (m) are generated stochastically from the distal reality (f). The Gaussian model given
above is only an example; other functional forms may be assumed, so long as they define a distri-
bution p(m | f ).
For example, the feature large might classically have been defined via a range of permissible
object sizes. But probabilistically it would be defined via a mean size µ, say 3 cm, plus some error
distribution, say normal with standard deviation 1 cm. (The mean µ itself might be conditioned
on other aspects of context, allowing large to mean different things in different contexts; see
below.) In contrast with the classical view, this means that largeness is a graded quality, with some
objects more likely to be regarded as large (namely, those closer to 3 cm) and others less likely.
This also means that the category of large objects can actually overlap with that of small objects
(see Figure 45.1). That is, a given object can be described by two contradictory features, although
generally with different probabilities.
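This graded, overlapping assignment of Eqs. (1)-(3) is easy to compute directly. The sketch below uses the text's example for large (mean 3 cm, sd 1 cm); the small category's mean and the equal priors are assumed values added for illustration:

```python
# p(large | m) by Bayes' rule over two overlapping Gaussian likelihoods.
import math

def gauss(m, mu, sigma):
    """Gaussian density N(mu, sigma^2) evaluated at m."""
    return math.exp(-0.5 * ((m - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_large(m, p_large=0.5, mu_large=3.0, mu_small=1.0, sigma=1.0):
    num = gauss(m, mu_large, sigma) * p_large
    den = num + gauss(m, mu_small, sigma) * (1 - p_large)
    return num / den

for m in (0.5, 2.0, 3.5):
    print(m, round(posterior_large(m), 3))
# the posterior rises smoothly with m; at m = 2 cm the two overlapping
# categories are exactly equally likely
```

Unlike a hard threshold, no measurement flips the interpretation discontinuously: every size receives some posterior probability of being large and some of being small.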

Non-accidental features
Non-accidental features are an important class of perceptual feature that have received somewhat
more careful mathematical attention. As originally defined by Binford (1981) and Lowe (1987)
non-accidental features are properties of 2D configurations (e.g., cotermination of line segments
in the image) that reliably occur in the presence of associated 3D configurations (cotermination of
3D line segments in the world) but are very unlikely otherwise; that is, they are unlikely to occur
‘by accident’. Other examples include collinearity, parallelism, and skew symmetry (Wagemans
1993). More generally, a non-accidental feature is one that has high probability if certain distal
Probabilistic models of perceptual features 937

conditions are satisfied, but low probability otherwise. There is substantial, though not unalloyed,
empirical evidence that the visual system is particularly sensitive to non-accidental features1
(Wagemans 1992; Vogels et al. 2001; Feldman 2007; Amir et al. 2012), and they play an impor-
tant role in Biederman’s influential (1987) Recognition by Components (RBC) account of object
recognition.
Formally, a discrete image feature M (corresponding, say, to a fixed range of some measure-
ment m) is non-accidental with respect to a distal feature f if M has high probability in the
presence of f, i.e. p( M | f ) ≈ 1, but low probability otherwise, p( M | ¬f ) ≈ 0. Jepson and Richards
(1992) showed that another condition is required in order for M to reliably indicate the presence
of f, namely that the prior on f be elevated relative to alternatives. That is, f must be a condition
that occurs with elevated probability in the world; it must be a recurring regularity (see also
Feldman 2009).2 As in the ubiquitous illustration of Bayesian inference in a medical context—in
which reliable inference of a disease based on a positive test requires not only an accurate (sen-
sitive and specific) test but also a high prior (e.g. see Gigerenzer and Hoffrage 1995)—it is not
sufficient that a measurement class M be likely conditioned on a world state f; the world state
f must itself have a high prior.
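The Jepson and Richards condition can be checked with a few lines of arithmetic; here ε is an assumed small 'accident probability', as in the footnoted derivation.

```python
def posterior_f_given_M(p_M_given_f, p_M_given_not_f, prior_f):
    """Bayes' rule for a binary distal feature f given an image feature M."""
    hit = p_M_given_f * prior_f
    false_alarm = p_M_given_not_f * (1.0 - prior_f)
    return hit / (hit + false_alarm)

eps = 0.01  # assumed small 'accident probability'

# Non-accidentalness alone (p(M|f) near 1, p(M|not-f) near 0) combined with a
# low prior p(f) = eps leaves the posterior at exactly 1/2: M is uninformative.
print(posterior_f_given_M(1 - eps, eps, eps))   # 0.5

# With an elevated prior (a recurring regularity), the inference is reliable.
print(posterior_f_given_M(1 - eps, eps, 0.5))   # 0.99
```

As in the medical-testing analogy, the likelihood terms alone cannot rescue an inference whose prior is as small as the accident probability itself.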
An example of a non-accidental feature is collinearity, extensively studied in the literature on
contour integration and completion (Hess et al., this volume; Field et al. 1993; Uttal et al. 1970;
Elder and Goldberg 2002; Geisler et al. 2001). In classical definitions, collinearity is defined via a
criterion on the orientation difference between successive edges in a chain. In probabilistic for-
mulations (Feldman 1995, 1997; Feldman and Singh 2005), collinearity is defined by a probabil-
ity distribution over turning angles (usually a normal or von Mises distribution) centered on
0° (straight continuation). This distribution gives a formal definition of the graded quality the
Gestaltists called ‘good continuation’, with perfectly straight being the ‘best’ and deviations from
straight constituting progressively ‘worse’ instances. In the probabilistic conception there is no
such thing as a turning angle that is definitely collinear or definitely not; any turning angle might
be an instance of the class (i.e., have been generated from a smooth contour process), though
straighter ones are more likely to be. Moreover, collinearity understood this way satisfies the

1  More precisely, there is very strong evidence that qualitative features such as non-accidental ones have spe-
cial salience relative to ‘metric’ or quantitative features (see references in text). But it is not completely clear
whether non-accidentalness is the correct mathematical characterization of ‘qualitative’ features.
2  To see why, assume that we express the condition p(M | f) ≈ 1 as p(M | f) = 1 − ε (with ε some low nonzero probability), and similarly p(M | ¬f) = ε. Assume also that f has a low prior compared to alternatives, e.g. p(f) = ε and p(¬f) = 1 − ε (meaning that f occurring a priori is just as unlikely an accident as M occurring without f). With these assumptions the posterior on f when M holds will be

$$p(f \mid M) = \frac{p(M \mid f)\,p(f)}{p(M \mid f)\,p(f) + p(M \mid \neg f)\,p(\neg f)} = \frac{(1-\varepsilon)(\varepsilon)}{(1-\varepsilon)(\varepsilon) + (\varepsilon)(1-\varepsilon)} = \frac{1}{2}$$

That is, the probability of f in the presence of M (1/2) is no greater than the probability of ¬f (also 1/2): if f has a low prior, then even though M is non-accidental in the standard sense, observing M does not actually indicate that f is particularly likely. As Jepson and Richards showed, a small 'accident probability' ε (i.e., non-accidentalness) only leads to reliable inference if the feature prior p(f) is substantially greater than ε.
938 Feldman

requirement of elevated prior needed to guarantee statistical reliability. Collinear turning angles,
generated approximately from the von Mises distribution, occur along smooth contours, but rela-
tively rarely otherwise (only ‘by accident’). Smooth contours themselves are ubiquitous in the
world because they occur along the boundaries of many objects (Ren et al. 2008). Because of this
elevated probability, image conditions suggestive of collinearity generally do reliably signal col-
linearity in the world. Like a positive test for a disease that does have a high prior, observed col-
linearity reliably signals common physical origins.
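The probabilistic reading of good continuation can be sketched as follows. The smooth-contour model below uses a normal distribution on turning angle (the von Mises alternative behaves similarly near 0°); the 'accidental' model is uniform over angles. The spread and the prior are illustrative assumptions, not fitted values.

```python
import math

SIGMA = math.radians(20)  # assumed spread of the smooth-contour model

def smooth_likelihood(angle_rad):
    """p(turning angle | smooth contour): normal centered on 0 (straight)."""
    return math.exp(-0.5 * (angle_rad / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

UNIFORM = 1.0 / (2.0 * math.pi)  # 'accidental' model: all angles equally likely

def p_smooth(angle_deg, prior_smooth=0.5):
    """Posterior that an observed turning angle came from a smooth contour."""
    num = smooth_likelihood(math.radians(angle_deg)) * prior_smooth
    return num / (num + UNIFORM * (1.0 - prior_smooth))

# No angle is definitely collinear or definitely not: any angle might have been
# generated by the smooth-contour process, but straighter ones are more likely.
for angle_deg in (0, 10, 30, 90):
    print(f"{angle_deg:3d} deg -> p(smooth) = {p_smooth(angle_deg):.3f}")
```

The posterior falls off smoothly with turning angle but never reaches zero, which is exactly the graded 'good continuation' behaviour described in the text.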

Local vs. global features


A persistent issue in the definition of visual features is the size of the image region that contributes
data to determining them. At one extreme, local features, like color, depend on data at a point or
within a small neighborhood of the image. At the other extreme, more global features reflect prop-
erties of entire objects, large image regions, or even the entire image. Few features are perfectly
local. Even nominally local image features such as motion or luminance, which are in principle
well-defined at each point in the image, often require integration over substantial regions of the
image in order to achieve stable estimates. Image motion, for example, is often ambiguous unless
a substantial image region is considered (Ullman 1979). The percept of lightness (perceived
reflectance) can involve comparisons over large image distances (Gilchrist 1977). Texture per-
ception similarly requires integration across image patches (Rosenholtz, this volume; Wagemans
et al. 1993; Pizlo et al. 1997) and is even influenced by global shape (Harrison and Feldman 2009).
Many features, like figure/ground polarity along a contour, are in principle properties of indi-
vidual points or small neighborhoods (Kim and Feldman 2009), but are nevertheless determined
in part by evidence from outside this neighborhood (Kogo and van Ee, this volume; Zhang and
von der Heydt 2010).
The ubiquitous dependence of local features on structure elsewhere in the image has led to a
widespread recognition of the insufficiency of the classical notion of receptive field (the image
region that directly influences a cell’s response), as many cells are also demonstrably influenced
by a much larger region (Fitzpatrick 2000). Whether this influence is conveyed via feedback from
later brain areas or via horizontal (lateral) connections is an area of ongoing debate (Angelucci
and Bullier 2003; Craft et al. 2007).
From a computational point of view, the difficulty posed by non-local features is the potentially enormous increase in computational complexity they entail. The larger the region of the image
contributing to the determination of a feature, the more complex the computation. Partly as a
result many of the most influential modern proposals for basic feature vocabularies (e.g. SIFT,
Lowe 2004 and HMAX, Riesenhuber and Poggio 1999) rely on more sophisticated definitions of
local image features and feed-forward computational architectures. But many perceptual deci-
sions made by human observers with apparent ease depend on subtle aspects of entire objects or
scenes that are difficult to specify or model (Treisman and Paterson 1984; Biederman and Shiffrar
1987; Pomerantz and Pristach 1989; Wilder et al. 2011). To understand such non-local features
probabilistically requires the construction of appropriate generative models, in many cases multi-
dimensional and hierarchical ones.
Many examples come from the domain of shape, a quintessentially non-local class of feature
that defies easy classical definitions. That is, many intuitively transparent shape features lack clear
qualitative definitions, but can be understood probabilistically once suitable probabilistic models
are defined. For example, human observers can readily distinguish shapes with two parts from
those with only one (Figure 45.2), suggesting a perceptually accessible feature of two-partedness.

But the distinction between multipart and single-part shapes is notoriously difficult to model
because the decomposition of shapes into component parts does not rely on any simple attribute
but instead involves a large set of non-local shape cues (Singh and Hoffman, 2001; de Winter and
Wagemans 2006). Classically, one would need to find some parameter reflecting two-partedness,
and set a threshold above which a shape is considered to have two parts rather than one. But such
a parameter is difficult to identify, and any threshold along it would be arbitrary. One can define
a spectrum of shapes (see abscissa of Figure 45.2) that vary smoothly from shapes clearly having
one part (left of figure) to those clearly having two (right of figure). Exactly where along this spec-
trum the boundary between one and two lies is unclear.
Alternatively, one can understand this shape feature probabilistically by defining distinct gen-
erative models for one- and two-part shapes. In the framework of Feldman and Singh (2006), a
one-part model would have a single axis (see Figure 45.2) from which the shape grows laterally;
this tends to yield simple elliptical shapes with random variations. Similarly, a two-part model
would have two axes, one branching off the other (see Figure 45.2), thus tending to generate shapes
with two distinct parts. (The recursively branching aspect of this generative model makes it hier-
archical; see Goldstone et al. 1991; Sanocki 1999; Geisler and Super 2000 for diverse discussions
of hierarchy in perceptual representations.) Each model can generate shapes anywhere along the
spectrum, but with different probabilities; the distributions overlap. Figure 45.2 illustrates how
the relative probability of the two models (more specifically, their posterior ratio) varies from one
end of the shape space to the other, with clearly one-part shapes (left) having higher probability
under the one-axis model, and clearly two-part shapes (right) having higher probability under

[Figure 45.2 here: probability plotted for the ‘one-part’ model A and the ‘two-part’ model B against the posterior ratio p(A|SHAPE)/p(B|SHAPE), which runs from High through 1 to 1/High.]
Fig. 45.2  The shape feature two-parts vs. one-part, viewed probabilistically. The figure shows a
spectrum of shapes ranging from a single part (left) to two parts (right). Towards the left shapes are
well fit by a one-part model and poorly fit by a two-part model; at the right, vice versa. (Models are
shown with ribs; likelihood is diminished by variance in the lengths and directions of the ribs along
with several other factors.) The figure illustrates how the relative probability (posterior ratio) of the two
models shifts from favoring the one-part model on the left to favoring the two-part model on the right.

the two-axis model, and intermediate shapes lying in between. (In the Feldman and Singh (2006)
model, variance in the lengths and angles of the ‘ribs’ [correspondences between axis points and
shape points, shown in the figure] entails poor fit between the model and shape and thus diminishes likelihood. One can see by looking at the ribs in the figure how, for example, variance among
the rib lengths increases as the fit between the shape and the model degrades.) Briscoe (2008; see
Feldman et  al. 2013) found empirical evidence for an exaggerated perceptual division between
one-part and two-part shapes at about the point where the posterior ratio shifts from favoring one
model to favoring the other.
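The logic of the comparison can be reduced to a toy computation. Purely for illustration, the two axial models are summarized here as likelihoods over a single hypothetical 'second-part salience' coordinate s along the shape spectrum; the actual Feldman and Singh (2006) models are of course far richer.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density, used here to stand in for each model's likelihood."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def log_posterior_ratio(s, mu_one=0.2, mu_two=0.8, sigma=0.25):
    """log [p(one-part model | shape) / p(two-part model | shape)], equal priors.
    s is a hypothetical 1-D 'second-part salience' along the shape spectrum."""
    return math.log(normal_pdf(s, mu_one, sigma) / normal_pdf(s, mu_two, sigma))

# The ratio shifts smoothly from favoring the one-part model (left end of the
# spectrum) to favoring the two-part model (right end); the category boundary
# lies where the log ratio crosses 0, here at s = 0.5.
for s in (0.1, 0.4, 0.5, 0.6, 0.9):
    print(f"s = {s:.1f} -> log posterior ratio = {log_posterior_ratio(s):+.2f}")
```

Both models assign nonzero probability everywhere along the spectrum; only the ratio, not any hard threshold, determines which description wins.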
Figures 45.3 and 45.4 illustrate two other shape features, respectively straight vs. bent
(Figure 45.3) and circular vs. elliptical (Figure 45.4). Again, both these shape spaces involve
smoothly varying aspects of shape that, in a classical view, would require an arbitrary division
between shape categories, but which are more elegantly described as varying probabilistically.
Incidentally, both of these shape features (in their classical guises) are invoked in distinctions
between geons in RBC (Biederman 1987).

Probabilistic features and the statistical structure of the environment
Viewing features probabilistically solves the three problems of the classical model described above.
1  Soft boundaries. First, and most obviously, probabilistic features avoid the hard boundaries
characteristic of classical features, instead allowing smooth variation in likelihood depending

[Figure 45.3 here: probability plotted for the ‘straight’ model A and the ‘curved’ model B against the posterior ratio p(A|SHAPE)/p(B|SHAPE), which runs from High through 1 to 1/High.]
Fig. 45.3  The shape feature bent vs. straight viewed probabilistically. Straighter shapes (left) are well
fit by a straight-axis model and poorly fit by a bent-axis model, while more bent shapes (right) are
better fit by the bent-axis model. (Models are shown with ribs; likelihood is diminished by variance in
the lengths and directions of the ribs along with several other factors.) The figure illustrates how the
relative probability (posterior ratio) of the two models shifts from favoring the straight-axis model on
the left to favoring the bent-axis model on the right.

on image parameters. While classical features may lump together highly dissimilar objects, or
exaggerate small differences among highly similar objects, probabilistic features make categorical
distinctions only in accord with the statistical evidence.
2  Non-arbitrariness. More subtly, probabilistic features also solve the problem of arbitrariness and context insensitivity. One of the main benefits of the probabilistic approach is
that it allows us to understand and formalize the connection between the feature lexicon—the set
of features used by the observer—and the statistical structure of the world (Barlow 1961; Shepard
1994). The world has predictable probabilistic structure: forms, scenes, and spatial relations tend
to occur in systematic, reliably recurring ways. A useful feature vocabulary is one that effectively
describes the probabilistic terrain.

One way to characterize the probabilistic structure in the world is by describing its ‘modes’,
meaning statistical peaks in the probability distribution that describes the world. A  simple
example is the mean-plus-error definition of feature f = µ + e given above, which defines a mode
p(m|f) in the measurement space m. A  simple assumption is that image structure contains a
set of such modes, each corresponding to a distinct naturally occurring class; in this case the
underlying distribution is the union of such modes, called a mixture distribution (see McLachlan
and Basford 1988). An effective feature, then, would be one that distinguishes ‘natural modes’

[Figure 45.4 here: probability plotted for the ‘circular’ model A and the ‘elliptical’ model B against the posterior ratio p(A|SHAPE)/p(B|SHAPE), which runs from High through 1 to 1/High.]
Fig. 45.4  The shape feature circular vs. elliptical viewed probabilistically. More circular shapes (left)
are well fit by a point-axis model and poorly fit by a straight-axis model, while more elliptical shapes
(right) are better fit by the straight-axis model. (Models are shown with ribs; likelihood is diminished
by variance in the lengths and directions of the ribs along with several other factors.) The figure
illustrates how the relative probability (posterior ratio) of the two models shifts from favoring the
point-axis model on the left to favoring the straight-axis model on the right.

(Richards and Bobick 1988; Feldman 2012). Just as a single probabilistic feature separates one
modal distribution from another (see again Figure 45.1), a set of features is useful when it dis-
tinguishes the variety of modes extant in the world from each other. That is, a feature set is
meaningful when it ‘carves nature at its joints’—and the probabilistic formulation allows us to
specify where the joints are. Probabilistic features viewed this way are both non-arbitrary and
context-dependent.
Probabilistic features are non-arbitrary because their utility depends on the statistical structure
of the world they are used to describe, and a model of this statistical structure is part of the theory
supporting them. Classical features, by contrast, are defined ex nihilo; their definitions need not
in any way relate to the world. A classical definition of large/small might adopt an arbitrary size
cutoff; a probabilistic definition hinges on modal size categories in the world, and thus would be

[Figure 45.5 here: contour plot of the joint distribution p(m1, m2) with three modes, together with the marginal distributions p(m1) and p(m2) and the features f1 and f2.]
Fig. 45.5  Context-sensitivity in probabilistic features. Because of the shape of the joint distribution
p(m1, m2) (shown in inset and as contour plot in main figure), feature f2 is well-defined for one value
of f1 (where it distinguishes mode A from mode B) but not for the other value of f1, which has only
one mode (C).

different for spoons (one mode about 10 cm, the other about 12 cm, say) vs. cars (one mode about
4 m, the other about 5 m).
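A minimal mixture-of-modes sketch of the spoon example makes the point concrete. The mode locations (10 cm and 12 cm) come from the text; the mixture weights and spread are assumed values chosen for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density for one natural mode in the mixture."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p_large_spoon(length_cm, mu_small=10.0, mu_large=12.0, sigma=0.7, w_large=0.5):
    """Posterior that a spoon length belongs to the larger of two natural modes.
    The world model is the mixture p(m) = w*N(mu_large) + (1-w)*N(mu_small)."""
    large = w_large * normal_pdf(length_cm, mu_large, sigma)
    small = (1.0 - w_large) * normal_pdf(length_cm, mu_small, sigma)
    return large / (large + small)

# The large/small cutoff is not arbitrary: with equal weights it falls at the
# crossover of the two modes (11 cm). A car-sized world, with modes around
# 4 m and 5 m, would yield an entirely different boundary.
for length_cm in (9.5, 10.5, 11.0, 11.5, 12.5):
    print(f"{length_cm:4.1f} cm -> p(large mode) = {p_large_spoon(length_cm):.3f}")
```

The feature boundary is dictated by where the modes of the world's distribution cross, not by any cutoff imposed ex nihilo.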

3  Context-sensitivity. Similarly, probabilistic features are potentially sensitive to context, because the nature of the modes to which they are attuned can change subject to the probabilistic structure
of the world (that is, the joint probability distribution p(m1, m2, . . .) of all image measurements).
A feature may usefully distinguish modes in one context (i.e. conditioned on the value of another
feature) but not in another (just as ‘has a six-cylinder engine’ makes a useful distinction among
cars but not among trees). Figure 45.5 illustrates a simple joint probability distribution (that is,
a model of a world)—a mixture of three modes—in which one feature f1 distinguishes modes
for one value of another feature f2, but not for the other value of f2—an admittedly simplistic but
useful illustration of context-sensitivity.
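The situation in Figure 45.5 can be mimicked with a tiny three-mode joint distribution; the mode placements below are assumed for illustration. Conditioned on one value of m2 the distribution over m1 is bimodal, so a feature on m1 separates two natural classes there; conditioned on the other value it is unimodal, and the same feature marks no real distinction.

```python
import math

def normal_pdf(x, mu, sigma=0.5):
    """Gaussian density for one dimension of a mode."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Assumed three-mode joint p(m1, m2): two modes sit at m2 = 0 (m1 = -1, +1),
# one mode sits alone at m2 = 2 (m1 = 0), loosely after Figure 45.5.
MODES = [(-1.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

def joint(m1, m2):
    """Unnormalized mixture density p(m1, m2)."""
    return sum(normal_pdf(m1, mu1) * normal_pdf(m2, mu2) for mu1, mu2 in MODES)

def bimodal_in_m1(m2):
    """Crude bimodality check on the conditional slice at m2: a dip at m1 = 0
    flanked by higher density on both sides indicates two modes."""
    grid = [i / 10.0 - 2.0 for i in range(41)]   # m1 from -2.0 to 2.0
    vals = [joint(m1, m2) for m1 in grid]
    dip = vals[20]                               # density at m1 = 0
    return max(vals[:20]) > dip and max(vals[21:]) > dip

print(bimodal_in_m1(0.0))  # True: a feature on m1 distinguishes two modes here
print(bimodal_in_m1(2.0))  # False: only one mode, the same feature is useless
```

Whether the m1 feature is worth having thus depends entirely on the value of m2, which is the context-sensitivity the figure illustrates.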

Gestalt perceptual features, like proximity, good continuation, and closure, are infamous for
their vague definitions. The probabilistic formulation suggests that these features are difficult to
define because they mean different things in different contexts; a rich probabilistic description of
the world is required to specify exactly what they mean in the diverse situations in which they are
used. Creating such generative models is, of course, a substantial scientific challenge that has not
yet been met in many cases.

Conclusion
Perceptual features are involved in virtually all aspects of vision science, but are still treated in a
variety of divergent ways. Behavioral experiments still often use features defined by intuitively
simple criteria. At the same time, an enormous neuroscientific literature has established sophis-
ticated feature concepts based on the response properties of cells in visual cortical areas. Early
in the processing stream, these include such well-established properties as orientation, motion,
and stereoscopic disparity. Later in the stream, these include increasingly non-local properties
such as contour curvature (Pasupathy and Connor 2002), medial axis structure (Hung et al. 2012;
Lescroart and Biederman 2013), aspects of 3D shape (Yamane et al. 2008), and other less eas-
ily verbalized aspects of global shape (Op de Beeck et al. 2001; David et al. 2006; Cadieu et al.
2007). An important common theme to many modern proposals is that the visual system’s choice
of features is in some way optimized to the statistical structure of the visual world (Field 1987;
Olshausen 2003; Geisler et al. 2009). Indeed, there is a growing consensus that the underlying
neural code is inherently probabilistic (Rieke et al. 1996; Yang and Shadlen 2007). However, a fully
developed probabilistic model of visual features, in particular one that extends beyond early rep-
resentations to incorporate non-local features such as form, shape, and spatial relations, does not
yet exist. Such a model must be considered one of the main goals of the next decade of research
in the visual sciences.

Acknowledgment
I am grateful to Irv Biederman, Manish Singh, Wolf Vanpaemel, Johan Wagemans, and an anony-
mous reviewer for helpful comments. Preparation of this article was supported by NIH EY021494.
Please direct correspondence to the author at jacob@ruccs.rutgers.edu.

References
Aitkin, C. (2009). Discretization of continuous features by human learners. Unpublished doctoral
dissertation, Rutgers University.
Amir, O., Biederman, I., and Hayworth, K. J. (2012). ‘Sensitivity to nonaccidental properties across various
shape dimensions’. Vision Research 62: 35–43.
Anderson, J. R. (1991). ‘The adaptive nature of human categorization’. Psychological Review 98(3): 409–29.
Anderson, J. R., and Betz, J. (2001). ‘A hybrid model of categorization’. Psychonomic Bulletin and Review
8(4): 629–47.
Angelucci, A., and Bullier, J. (2003). ‘Reaching beyond the classical receptive field of V1
neurons: horizontal or feedback axons?’ Journal of Physiology Paris 97(2–3): 141–54.
Ashby, F. G., and Alfonso-Reese, L. A. (1995). ‘Categorization as probability density estimation’. Journal of
Mathematical Psychology 39: 216–33.
Ashby, F. G., Prinzmetal, W., Ivry, R., and Maddox, W. T. (1996). ‘A formal theory of feature binding in
object perception’. Psychological Review 103: 165–92.
Barlow, H. B. (1961). ‘Possible principles underlying the transformation of sensory messages’. In Sensory
communication, edited by W. A. Rosenblith, pp. 217–234 (Cambridge: M.I.T. Press).
Biederman, I. (1987). ‘Recognition by components: a theory of human image understanding’. Psychological
Review 94: 115–47.
Biederman, I., and Shiffrar, M. (1987). ‘Sexing day-old chicks’. Journal of Experimental
Psychology: Learning, Memory, and Cognition 13: 640–5.
Binford, T. (1981). ‘Inferring surfaces from images’. Artificial Intelligence 17: 205–44.
Blair, M., and Homa, D. L. (2005). ‘Integrating novel dimensions to eliminate category exceptions: when
more is less’. Journal of Experimental Psychology: Learning, Memory and Cognition 31(2): 258–71.
Briscoe, E. (2008). Shape skeletons and shape similarity. Unpublished doctoral dissertation, Rutgers
University.
Briscoe, E., and Feldman, J. (2011). ‘Conceptual complexity and the bias/variance tradeoff ’. Cognition
118: 2–16.
Cadieu, C., Kouh, M., Pasupathy, A., Connor, C. E., Riesenhuber, M., and Poggio, T. (2007). ‘A model of
V4 shape selectivity and invariance’. Journal of Neurophysiology 98: 1733–50.
Craft, E., Schutze, H., Niebur, E., and von der Heydt, R. (2007). ‘A neural model of figure-ground
organization’. Journal of Neurophysiology 97(6): 4310–26.
David, S. V., Hayden, B. Y., and Gallant, J. L. (2006). ‘Spectral receptive field properties explain shape
selectivity in area V4’. Journal of Neurophysiology 96: 3492–505.
De Baene, W., Ons, B., Wagemans, J., and Vogels, R. (2008). ‘Effects of category learning on the stimulus
selectivity of macaque inferior temporal neurons’. Learning and Memory 15: 717–27.
De Winter, J., and Wagemans, J. (2006). ‘Segmentation of object outlines into parts: A large-scale
integrative study’. Cognition 99(3): 275–325.
Elder, J. H., and Goldberg, R. M. (2002). ‘Ecological statistics of Gestalt laws for the perceptual
organization of contours’. Journal of Vision 2(4): 324–53.
Feldman, J. (1995). ‘Perceptual models of small dot clusters’. In Partitioning data sets, edited by I. J. Cox,
P. Hansen, and B. Julesz, pp. 331–357 DIMACS Series in Discrete Mathematics and Theoretical
Computer Science, vol. 19.
Feldman, J. (1997). ‘Curvilinearity, covariance, and regularity in perceptual groups’. Vision Research
37(20): 2835–48.
Feldman, J. (2000). ‘Minimization of Boolean complexity in human concept learning’. Nature 407: 630–3.
Feldman, J. (2007). ‘Formation of visual ‘objects’ in the early computation of spatial relations’. Perception
and Psychophysics 69(5): 816–27.

Feldman, J. (2009). ‘Bayes and the simplicity principle in perception’. Psychological Review 116(4): 875–87.
Feldman, J. (2012). ‘Symbolic representation of probabilistic worlds’. Cognition 123: 61–83.
Feldman, J. (this volume). ‘Bayesian models of perceptual organization’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Feldman, J., and Singh, M. (2005). ‘Information along contours and object boundaries’. Psychological
Review 112(1): 243–52.
Feldman, J., and Singh, M. (2006). ‘Bayesian estimation of the shape skeleton’. Proceedings of the National
Academy of Science 103(47): 18014–19.
Feldman, J., Singh, M., Briscoe, E., Froyen, V., Kim, S., and Wilder, J. D. (2013). ‘An integrated Bayesian
approach to shape representation and perceptual organization’. In Shape perception in human and
computer vision: an interdisciplinary perspective, edited by S. Dickinson and Z. Pizlo, pp 55–70.
(New York: Springer).
Field, D. J. (1987). ‘Relations between the statistics of natural images and the response properties of cortical
cells’. Journal of the Optical Society of America A 4(12): 2379–94.
Field, D. J., Hayes, A., and Hess, R. F. (1993). ‘Contour integration by the human visual system: Evidence
for a local “association field”’. Vision Research 33(2): 173–93.
Fitzpatrick, D. (2000). ‘Seeing beyond the receptive field in primary visual cortex’. Current Opinion in
Neurobiology 10: 438–43.
Geisler, W. S., and Super, B. J. (2000). ‘Perceptual organization of two-dimensional patterns’. Psychological
Review 107(4): 677–708.
Geisler, W. S., Perry, J. S., Super, B. J., and Gallogly, D. P. (2001). ‘Edge co-occurrence in natural images
predicts contour grouping performance’. Vision Research 41: 711–24.
Geisler, W. S., Najemnik, J., and Ing, A. D. (2009). ‘Optimal stimulus encoders for natural tasks’. Journal of
Vision 9(13): 1–16.
Gigerenzer, G., and Hoffrage, U. (1995). ‘How to improve Bayesian reasoning without
instruction: Frequency formats’. Psychological Review 102(4): 684–704.
Gilchrist, A. L. (1977). ‘Perceived lightness depends on perceived spatial arrangement’. Science 195: 185–87.
Goldstone, R. L., and Steyvers, M. (2001). ‘The sensitization and differentiation of dimensions during
category learning’. Journal of Experimental Psychology 130(1): 116–39.
Goldstone, R. L., Medin, D. L., and Gentner, D. (1991). ‘Relational similarity and the nonindependence of
features in similarity judgments’. Cognitive Psychology 23: 222–62.
Goodman, N. D., Tenenbaum, J. B., Feldman, J., and Griffiths, T. L. (2008). ‘A rational analysis of
rule-based concept learning’. Cognitive Science 32(1): 108–54.
Green, D. M., and Swets, J. A. (1966). Signal detection theory and psychophysics. (New York: Wiley).
Harnad, S. (1987). Categorical perception: the groundwork of cognition. (Cambridge: Cambridge
University Press).
Harrison, S., and Feldman, J. (2009). ‘Influence of shape and medial axis structure on texture perception’.
Journal of Vision 9(6): 1–21.
Hess, R. F., May, K. A., and Dumoulin, S. O. (this volume). ‘Contour integration: Psychophysical,
neurophysiological and computational perspectives’. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. (Oxford: Oxford University Press).
Hung, C. C., Carlson, E. T., and Connor, C. E. (2012). ‘Medial axis shape coding in macaque infer-
otemporal cortex’. Neuron 74(6): 1099–113.
Jepson, A., and Richards, W. A. (1992). ‘What makes a good feature?’ In Spatial vision in humans and
robots, edited by L. Harris and M. Jenkin, pp. 89–125 (Cambridge: Cambridge University Press).
Kanizsa, G., and Gerbino, W. (1976). ‘Convexity and symmetry in figure-ground organization’. In Vision
and artifact, edited by M. Henle, pp. 25–32. (New York: Springer).

Kellman, P. J., and Shipley, T. F. (1991). ‘A theory of visual interpolation in object perception’. Cognitive
Psychology 23: 141–221.
Kersten, D., Mamassian, P., and Yuille, A. (2004). ‘Object perception as Bayesian inference’. Annual Review
of Psychology 55: 271–304.
Kim, S.-H., and Feldman, J. (2009). ‘Globally inconsistent figure/ground relations induced by a negative
part’. Journal of Vision 9(10): 1–13.
Knill, D. C., and Richards, W. (Eds.). (1996). Perception as Bayesian inference. (Cambridge: Cambridge
University Press).
Koenderink, J. J. (1993). ‘What is a “feature”?’ Journal of Intelligent Systems 3(1): 49–82.
Kogo, N., and van Ee, R. (this volume). ‘Neural mechanisms of figure-ground organization: Border-
ownership, competition and perceptual switching’. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. (Oxford: Oxford University Press).
Lee, M. D., and Navarro, D. J. (2002). ‘Extending the ALCOVE model of category learning to featural
stimulus domains’. Psychonomic Bulletin and Review 9(1): 43–58.
Lescroart, M. D., and Biederman, I. (2013). ‘Cortical representation of medial axis structure’. Cerebral
Cortex 23: 629–37.
Lowe, D. G. (1987). ‘Three-dimensional object recognition from single two-dimensional images’. Artificial
Intelligence 31: 355–95.
Lowe, D. G. (2004). ‘Distinctive image features from scale-invariant keypoints’. International Journal of
Computer Vision 60(2): 91–110.
McLachlan, G. J., and Basford, K. E. (1988). Mixture models: inference and applications to clustering.
(New York: Marcel Dekker).
Medin, D. L., and Schaffer, M. M. (1978). ‘Context model of classification learning’. Psychological Review
85: 207–38.
Nosofsky, R. M. (1986). ‘Attention, similarity, and the identification-categorization relationship’. Journal of
Experimental Psychology: General 115(1): 39–61.
Nosofsky, R. M., Palmeri, T. J., and McKinley, S. C. (1994). ‘Rule-plus-exception model of classification
learning’. Psychological Review 101(1): 53–79.
Olshausen, B. (2003). ‘Principles of image representation in visual cortex’. In The Visual Neurosciences,
edited by L. M. Chalupa and J. S. Werner, pp. 1603–15 (Cambridge: M.I.T. Press).
Op de Beeck, H., Wagemans, J., and Vogels, R. (2001). ‘Inferotemporal neurons represent low- dimensional
configurations of parameterized shapes’. Nature Neuroscience 4(12): 1244–52.
Pasupathy, A., and Connor, C. E. (2002). ‘Population coding of shape in area V4’. Nature Neuroscience
5(12): 1332–8.
Peterson, M. (this volume). ‘Low-level and high-level contributions to figure-ground organization’. In
Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University
Press).
Pizlo, Z., Salach-Golyska, M., and Rosenfeld, A. (1997). ‘Curve detection in a noisy image’. Vision Research
37(9): 1217–41.
Poggio, G. F., and Poggio, T. (1984). ‘The analysis of stereopsis’. Annual Review of Neuroscience 7: 379–412.
Pomerantz, J. R., and Pristach, E. A. (1989). ‘Emergent features, attention, and perceptual glue in visual
form perception’. Journal of Experimental Psychology: Human Perception and Performance 15(4): 635–49.
Posner, M. I., and Keele, S. W. (1968). ‘On the genesis of abstract ideas’. Journal of Experimental Psychology
77(3): 353–63.
Reed, S. K. (1972). ‘Pattern recognition and categorization’. Cognitive Psychology 3: 382–407.
Ren, X., Fowlkes, C. C., and Malik, J. (2008). ‘Learning probabilistic models for contour completion in
natural images’. International Journal of Computer Vision 77: 47–63.

Richards, W. A., and Bobick, A. (1988). ‘Playing twenty questions with nature’. In Computational processes
in human vision: An interdisciplinary perspective, edited by Z. Pylyshyn, pp. 3–26 (Norwood, NJ: Ablex
Publishing Corporation).
Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1996). Spikes: exploring the neural
code. (Cambridge: M.I.T. Press).
Riesenhuber, M., and Poggio, T. (1999). ‘Hierarchical models of object recognition in cortex’. Nature
Neuroscience 2: 1019–25.
Rosch, E. H. (1973). ‘Natural categories’. Cognitive Psychology 4: 328–50.
Rosenholtz, R. (this volume). ‘Texture perception’. In Oxford Handbook of Perceptual Organization, edited
by J. Wagemans. (Oxford: Oxford University Press).
Sanocki, T. (1999). ‘Constructing structural descriptions’. Visual Cognition 6(3/4): 299–318.
Schyns, P. G., Goldstone, R. L., and Thibaut, J.-P. (1998). ‘The development of features in object concepts’.
Behavioral and Brain Sciences 21: 1–54.
Shepard, R. N. (1994). ‘Perceptual-cognitive universals as reflections of the world’. Psychonomic Bulletin and
Review 1(1): 2–28.
Singh, M., and Hoffman, D. D. (2001). ‘Part-based representations of visual shape and implications
for visual cognition’. In From fragments to objects: segmentation and grouping in vision, advances in
psychology, edited by T. Shipley and P. Kellman, vol. 130, pp. 401–59. (New York: Elsevier).
Smith, E., and Medin, D. (1981). Categories and concepts. (Cambridge, MA: Harvard University Press).
Stilp, C. E., Rogers, T. T., and Kluender, K. R. (2010). ‘Rapid efficient coding of correlated complex acoustic
properties’. Proceedings of the National Academy of Sciences 107(50): 21914–19.
Treisman, A., and Gelade, G. (1980). ‘A feature-integration theory of attention’. Cognitive Psychology
12: 97–136.
Treisman, A., and Paterson, R. (1984). ‘Emergent features, attention, and object perception’. Journal of
Experimental Psychology: Human Perception and Performance 10(1): 12–31.
Ullman, S. (1979). The Interpretation of Visual Motion. (Cambridge, MA: M.I.T. Press).
Ullman, S., Vidal-Naquet, M., and Sali, E. (2002). ‘Visual features of intermediate complexity and their use
in classification’. Nature neuroscience 5(7): 682–7.
Uttal, W. R., Bunnell, L. M., and Corwin, S. (1970). ‘On the detectability of straight lines in visual noise: an
extension of French’s paradigm into the millisecond domain’. Perception and Psychophysics 8(6) 385–8.
Vogels, R., Biederman, I., Bar, M., and Lorincz, A. (2001). ‘Inferior temporal neurons show greater
sensitivity to nonaccidental than to metric shape differences’. Journal of Cognitive Neuroscience
13(4): 444–53.
Wagemans, J. (1992). ‘Perceptual use of non-accidental properties’. Canadian Journal of Psychology
46(2): 236–79.
Wagemans, J. (1993). ‘Skewed symmetry: a nonaccidental property used to perceive visual forms’. Journal of
Experimental Psychology: Human Perception and Performance 19(2): 364–80.
Wagemans, J., van Gool, L., Swinnen, V., and van Horebeek, J. (1993). ‘Higher-order structure in regularity
detection’. Vision Research 33(8): 1067–88.
Wilder, J., Feldman, J., and Singh, M. (2011). ‘Superordinate shape classification using natural shape
statistics’. Cognition 119: 325–40.
Yamane, Y., Carlson, E. T., Bowman, K. C., Wang, Z., and Connor, C. E. (2008). ‘A neural code
for three-dimensional object shape in macaque inferotemporal cortex’. Nature Neuroscience
11(11): 1352–60.
Yang, T., and Shadlen, M. N. (2007). ‘Probabilistic reasoning by neurons’. Nature 447: 1075–82.
Zhang, N., and von der Heydt, R. (2010). ‘Analysis of the context integration mechanisms underlying
figure-ground organization in the visual cortex’. Journal of Neuroscience 30(19): 6482–96.
Chapter 46

On the dynamic perceptual characteristics of Gestalten: Theory-based methods

James T. Townsend and Michael J. Wenger

Introduction
A major historical event transpired in 2012, marking the centennial anniversary of the year in
which Wertheimer published his famous monograph, ‘Experimental Studies of the Perception
of Movement’. Many published reviews of progress, experimental and theoretical studies, and
stock-taking essays marked this signal year. Over the intervening century there has been
inspiring growth in the corpus of data related to Gestalt phenomena and in suggestions as to
operational definitions of holism.
The very existence of the present volume on perceptual organization is a testament to the
importance and new vitality of many interlocked themes within this fold. Especially recommended for readers of this chapter are collateral chapters by Bertamini and Casati, Feldman,
Kimchi, Pomerantz and Cragin, Behrmann, and van Leeuwen.
With certain exceptions, it seems fair to make the following observations about this body of work: first, there is a noticeable absence of a generally accepted, unified theory of Gestalt phenomena. Second, aside from a few quite specific models of performance in some particular sphere, rigorous definitions and quantitative models are scarce. Third, in the realm of quantitative dynamic information-processing characteristics, definitions, proposed explanations, and derivations regarding concepts of holistic vs non-holistic objects are rare if extant at all. Our focus is on the third of these.
Our primary goal is the establishment of a mathematical language within which the properties of strategic concepts that describe and purport to distinguish configural as opposed to non-configural perception can be elucidated. A secondary goal is to propose what seem to be reasonable specifications, within this language, of configural vs non-configural perception. The first goal is theoretically noncommittal, and should be relatively uncontroversial. The second amounts to stating hypotheses (we call them 'working axioms') about how configural vs non-configural processing may take place. However, it is important to point out that this approach in no way pretends to be a computational model of configural perception. Rather, it should be viewed as a meta-theoretical set of methodologies that are capable of assessing a number of critical mechanisms associated with configural vs non-configural perception, and hypotheses about them. As such, their application should aid in guiding the construction of principled, parameterized, computational models of configural and non-configural perception.

A Meta-Theoretical Language for Dynamic Perceptual Gestalten: Systems Factorial Technology
Our approach (O’Toole, Wenger, and Townsend 2001; Townsend and Nozawa 1995; Townsend and Wenger 2004a, 2004b; Wenger and Townsend 2001) is founded on a meta-theory and taxonomy of key properties of elementary psychological systems. By meta-theory we mean a broad theoretical set of axioms, usually expressible in mathematical or logical syntax, within which a set of explicitly parameterized models resides (i.e. obeys the axioms). A key characteristic of our approach to characterizing Gestalten1 is that each of the concepts is defined mathematically, in the most general manner possible, using the formalisms of probability theory. Space precludes our providing all of the technical details, and so we suggest that interested readers pursue these in a set of our more technical publications (see, in particular, Townsend and Ashby 1983; Townsend and Nozawa 1995; Townsend and Wenger 2004b). Readers interested in an historical overview of the use of these constructs should consult Townsend and Wenger (2004a).
The relationship of constituent parts to the whole that they comprise has a long history, exemplified in eighteenth- and nineteenth-century philosophy (see also Albertazzi, this volume). The philosophical precursor of Gestalt psychology (as in Wertheimer, Koehler, and Koffka) lies in the phenomenological schools, as opposed to the forebears of structuralism (as in Wundt and Titchener). Over several generations, it has been supposed that the three founders of modern Gestalt psychology always espoused the precept ‘the whole is greater than the sum of the parts’. As pointed out by Kubovy and Pomerantz (1981), there is no record of such a proclamation. In fact, Koehler seems to have suffered dismay at continually being associated with that quotation. And Koffka (1935) takes considerable pains to emphasize that ‘the whole is something else than the sum of the parts, because summing is a meaningless procedure, whereas the whole-part relationship is meaningful’. This broader interpretation of the forefathers’ views is more compatible with the present study of both potential superiority and potential inferiority, depending on circumstances, as we shall see below.
So, even in colloquial language, configural perception should somehow differ from the folk idea of a percept of an object being merely ‘the sum of the parts’: the parts of a perceptual object should interact in some manner. In striving to locate a single term which, at least globally, if somewhat indefinitely, captures the concept of interaction, we are impelled to consider opposing concepts, and especially that of independence or the lack thereof. Thus, a key concept will be probabilistic or stochastic independence. Suppose A and B are the two events in question; they are independent if and only if P(A ∩ B) = P(A) × P(B). This foundational definition can be used to define independence with respect to either the times or frequencies of events (Townsend and Ashby 1983).
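As a minimal sketch (ours, not the chapter's; the event probabilities are arbitrary), the product rule can be checked directly on simulated event frequencies:

```python
import random

random.seed(1)
n = 200_000

# Two independent binary events A and B (illustrative probabilities).
pA, pB = 0.3, 0.6
draws = [(random.random() < pA, random.random() < pB) for _ in range(n)]

prob_A = sum(a for a, _ in draws) / n
prob_B = sum(b for _, b in draws) / n
prob_AB = sum(a and b for a, b in draws) / n

# Under stochastic independence, P(A and B) should match P(A) * P(B).
print(round(prob_AB, 3), round(prob_A * prob_B, 3))
```

With dependent events the joint frequency would depart from the product of the marginals, which is exactly the kind of departure the constructs below are designed to detect.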
In addition to independence, there are other critical issues that must be taken into account, and we refer to the cumulative development of these issues as systems factorial technology (Townsend and Thomas 1994). Figure 46.1 illustrates a subset of these distinctions schematically. One is architecture: is perception of any set of parts of an object accomplished in parallel (simultaneously), serially (one at a time with no temporal overlap), or in some hybrid fashion? Serial processing is defined by a set of discrete items or subsystems (e.g. stages) being worked on one at a time. Parallel

1 With a nod to linguistic refinement, we will follow German usage in using leading capitals when Gestalt appears in noun form but lower case when employed adjectivally. Also, Gestalten with the added en will follow standard German to indicate the plural.

[Figure 46.1 shows a 2 × 2 schematic: architecture (serial vs parallel) crossed with stopping rule (self-terminating/minimum time vs exhaustive/maximum time), with the processing of parts A and B plotted against time in each panel.]
Fig. 46.1  Schematic representation of the critical distinctions with respect to processing architecture
and stopping-rule. In these examples, two processes (A and B) execute either serially (sequentially)
or in parallel. Once begun, processing continues until either the first (or fastest) or last (or slowest)
process completes.

processing is defined by a set of discrete items or subsystems (e.g. channels) being worked on
simultaneously. In Figure 46.1, this can be understood in terms of the temporal arrangement of
the two processes (A and B). Formally, this distinction is captured in the form of the probability
distribution for overall finishing time (the externally observable reaction time, in the terms of an
experiment), which is composed from the probability distributions on the (usually unobservable)
finishing times of the two internal processes. General forms for the four possibilities considered in
Figure 46.1 can be found in Appendix A of Townsend and Nozawa (1995, pp. 351–354).
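The four cases in Figure 46.1 can be sketched in a few lines of simulation. Everything here is an illustrative assumption of ours: channel finishing times are drawn from exponential distributions with arbitrary rates, and in the serial minimum-time case we assume process A happens to be worked on first and suffices for a response.

```python
import random

random.seed(2)

def channel_times():
    # Unobservable finishing times of processes A and B (illustrative exponentials).
    return random.expovariate(1.0), random.expovariate(1.5)

def overall_rt(architecture, stopping):
    a, b = channel_times()
    if architecture == "serial":
        # One at a time: exhaustive (AND) = both durations in sequence;
        # minimum time (OR) = first item only (assume A is processed first).
        return a + b if stopping == "AND" else a
    # Parallel: both run simultaneously; AND waits for the slower channel,
    # OR responds as soon as the faster one finishes.
    return max(a, b) if stopping == "AND" else min(a, b)

n = 100_000
for arch in ("serial", "parallel"):
    for stop in ("OR", "AND"):
        mean_rt = sum(overall_rt(arch, stop) for _ in range(n)) / n
        print(arch, stop, round(mean_rt, 3))
```

The mean RT ordering that emerges (parallel-OR fastest, serial-AND slowest) is one observable signature of the architecture-by-stopping-rule distinction; the full distributional predictions are those in Townsend and Nozawa's Appendix A.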
Of course, there are many kinds of architectures other than serial and parallel, although these
have received the bulk of the attention of the cognitive community. For instance, hybrid models
could be a mixture of serial and parallel models, or more complex network models of which parallel
and serial networks comprise a special case (Schweickert 1978; Schweickert and Townsend 1989).
Another important type of system is constituted by a sequence of processes but with overlap of the
processing times, unlike true serial processing (e.g. Taylor 1976). When the next stage can start at the
same time as the previous one, we have the concept of continuous flow (e.g. Ashby 1982; McClelland
1979; Schweickert and Mounts 1998). These models are of great value, but they currently lie outside
the scope of methodologies that can test them against ordinary parallel or serial systems.
Not quite so paramount is the notion of the decisional stopping rule, or ‘stopping rule’ for short. Suppose, as in many experiments and real-life situations, that a subset of features is sufficient to make a correct response. In that case, a reasonable question is whether all the features
are processed even if they need not be. In the psychological literature, there are three cases of
interest:
1 Exhaustive or maximum-time processing. All aspects (e.g. features) are processed. In the case of
two elements, this can be represented by the Boolean AND operator.
2 Race or minimum-time processing. Processing ceases as soon as a single aspect is processed. In
the case of two elements, this can be represented by the Boolean OR operator.
3 Single-target self-termination. There is only one aspect in an object that is capable of
determining the correct response, and the system stops when and only when that aspect is
completed.
Since we typically think of a Gestalt as being a total unity, one axiom or part of a definition of
Gestalt processing might be that a Gestalt is perceived as a unity, which would imply exhaustive
processing of all features, even though a correct decision could be made on the basis of only a
subset of the features.
Finally, the concept of workload capacity turns out to be pivotal in our working definition of Gestalt processing. This issue concerns how increasing workload—for instance, objects or faces made up of fewer or more aspects—affects processing efficiency. A traditional approach might be to use mean reaction time (RT). However, we have developed an instrument that takes into account the entire distribution of RTs in greater vs lesser workload conditions (Townsend and Nozawa 1995; Townsend and Wenger 2004b; Wenger et al. 2010). As will become increasingly apparent throughout this chapter, capacity will serve as a prime gauge of configural superiority, introduced above as a potential marker of Gestalt perception.
The workload capacity yardstick consists of the predictions of a standard parallel model, which assumes parallel processing, stochastic independence, and unlimited capacity. This model will prove highly useful in our assembly of a yardstick for measuring capacity in arbitrary systems. The capacity statistic C(t) measures the speed of channels acting together in comparison with the speed predicted by the standard parallel model.
Stochastic independence implies that each channel’s processing time is independent of all others. Unlimited capacity stipulates that the marginal processing time distribution of each channel is invariant across any changes in workload. Informally, unlimited capacity implies that the average processing time of any channel is unaffected by the overall workload on the system. It is critical to observe that in processing a finite number of items, the decisional stopping rule will affect the overall decision time. For instance, minimum-time (OR) processing requires that only a single item be finished. On the other hand, maximum-time (AND) processing demands that all items be completed. Therefore, we must derive capacity measures that take the appropriate stopping rule into account. This can be accomplished for any logical stopping rule (e.g. find the one target among five distractors), but we will focus on the two most commonly studied in the literature so far: the OR and AND decision rules.
If at any point in time C(t) = 1, the system is said to be of unlimited capacity at that time point. Overall, the system is acting just as efficiently as the standard parallel model but not more efficiently. If at time t, C(t) > 1, we call the system super-capacity. In super-capacity systems, the individual channels are running faster than when they were working alone. Finally, if C(t) < 1 at time t, the system is said to be of limited capacity at that time point. Thus, the bound separating super-capacity from limited capacity is simply C(t) = 1. In addition, C(t) permits observation and prediction of workload capacity over an entire range of time. For instance, we have observed that in some tasks people can be super-capacity early on, but reveal limited capacity later in time. In contrast, most modern conceptions take capacity as a non-dynamic, single number.
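The chapter does not reproduce the formula, but for a minimum-time (OR) design Townsend and Nozawa (1995) define C(t) as a ratio of integrated hazards, C(t) = H_AB(t) / [H_A(t) + H_B(t)], where H(t) = −log S(t) and S is the survivor function of the RTs. A sketch of estimating it from simulated data; the channels here are, by construction, independent, unlimited-capacity, and parallel, so C(t) should hover near 1:

```python
import math
import random

random.seed(3)
n = 50_000

# Illustrative RT data (exponential rates are arbitrary); real data would come
# from single-target and redundant-target conditions of an OR experiment.
single_A = [random.expovariate(1.0) for _ in range(n)]
single_B = [random.expovariate(1.0) for _ in range(n)]
redundant = [min(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(n)]

def integrated_hazard(rts, t):
    # H(t) = -log S(t), with S(t) the empirical survivor function of the RTs.
    return -math.log(sum(rt > t for rt in rts) / len(rts))

for t in (0.25, 0.5, 1.0):
    c = integrated_hazard(redundant, t) / (
        integrated_hazard(single_A, t) + integrated_hazard(single_B, t)
    )
    print(f"C({t}) = {c:.2f}")
```

Facilitatory channel interactions would push the redundant condition's hazard above the benchmark sum, yielding C(t) > 1; inhibition or limited capacity would push it below.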

Working axioms for configural perception


The notion of working axioms is motivated by the following considerations. First, the term ‘axiom’ suggests a proposition accepted at first without proof in order to study its consequences. Second, the modifier ‘working’ emphasizes the evolving interaction of theoretical correlates with ongoing experimentation. The driving idea in these working definitions is that a configuration or Gestalt is evidenced through properties such as performance efficiency, as captured by our workload capacity statistic. The potential material existence of brain systems which carry out configural processing operations with these properties is pointed out in Kubilius, Wagemans, and Op de Beeck (2011).
First, we distinguish between two potential effects of configurality: configural superiority and
configural inferiority. Configural superiority is manifested when the perceptual cohesion of parts
renders more efficient perception than when the parts are processed independently. Configural
inferiority is manifested when the perceptual cohesion of parts renders less efficient perception
than when the parts are processed independently.

Working axioms for configural perception 1


For configural superiority:
1.1  Gestalt perception is parallel on any partition of the figure.
1.2  (A) When a configural object, such as a face, is represented by a set of features, Gestalt perception
is based on mutually facilitatory parallel channels. (B) In the limit of 1.2(A), perception of a
configural object becomes holistic in the sense that parts start and finish simultaneously.
1.3  Gestalt perception is super-capacity on any partition, such as features, of the figure.
1.4  Gestalt perception is exhaustive on any partition of the figure in configural superiority
designs.
The neutral term ‘partition’ is used in lieu of loaded expressions such as ‘feature’ to allow for future interpretations that may not exist presently. We use this word in the formal sense, as any division of a pattern whose set union of the parts equals the pattern. Subsequent empirical research will determine the axioms’ veracity. It can thus be observed that whereas 1.2 refers to features, the other stipulations refer to arbitrary partitions. This is because we feel comfortable at this point in making the 1.2 assertion only for a psychologically meaningful segregation of the figure’s parts, whereas we are stating that however the researcher divides up the figures, the other constraints (1.1, 1.3, 1.4) will be in force. For instance, even if the investigator divides up a face into the bottom vs top halves instead of natural features like eyes, mouth, and so on, it is proposed that the perceptual system will still process the former in parallel.
In a sense, we view configural inferiority as being evidenced when an observer’s task cannot benefit, and may even suffer, from a Gestalt’s systemic properties as outlined in Working Axiom 1 just above. As such, we encapsulate this propensity as follows:

Working axioms for configural perception 2


2. For configural inferiority: there is inevitably a strong tendency for all of 1.1–1.4 to be implemented. Certain perceptual tasks fail to take advantage of the configurality, which may even impede performance.
Working Axiom 1.1 is probably the least controversial. It is doubtful if many investigators would
wish to assume that Gestalt perception takes place in a serial fashion (although there may be
circumstances where Gestalt organization can proceed in a more or less sequential manner; see,
e.g. Roelfsema and Houtkamp 2011).
Although parallel processing is an obvious choice with regard to the architecture associated with configural processing, a question immediately arises as to the stochastic independence of the parallel channels. For instance, the classic parallel race model assumes stochastically independent parallel channels (e.g. Egeth 1966; Smith 2000; Townsend 1974). Furthermore, the channels could actually prove to be negatively (i.e. mutually inhibitory) interactive, which seems far from the sense of configurality. Hence, we posit that in many tasks a positive interaction will lead to workload capacity results that are super-capacity. Parallel models possessing facilitatory channels readily produce super-capacity, while mutually inhibitory channels evoke limited capacity (e.g. Egeth 1966; Smith 2000; Townsend 1974).
Super-capacity processing obviously exceeds standard parallel processing in efficiency and is a palpable example of configural superiority, as intimated in Working Axiom 1.3. The triad of parallelism, positive interactions, and super-capacity seems to be compatible with certain stochastic versions of Hebbian learning. Thus, a stochastic Hebbian model advanced in a dissertation by Blaha (2010) captures many aspects of a dramatic improvement of performance by observers in a configural learning experiment.
The intent of Working Axiom 1.4 is to capture the oft-heard claim that ‘holistic face perception is obligatory’, and presumably this admonition might also refer to any Gestalt (although see, for example, Plomp and van Leeuwen 2006; Stins and van Leeuwen 1993; van Leeuwen and Lachmann 2004; also see Behrmann, Richler, Avidan, and Kimchi, this volume; Koenderink on Gestalt templates, this volume). Although there may be more than one meaning to this statement, at least one appears to be that if one part of a face is gazed at, all parts are perceived. Moreover, it is also motivated by the notion of a Gestalt existing as a unity. If a unity is processed, no part should be omitted.
Working Axiom 2 supplements the original list of Wenger and Townsend (2001; see also O’Toole et al. 2001), since the latter focused on configural superiority. To encompass phenomena associated with configural inferiority, more facets are needed.

The Garnerian approach


As remarked earlier, Garner’s research on dimensions or features that were either susceptible or not to perceptual analysis has proven to be extremely influential. In fact, it may be fair to say that the majority of the research effort on topics relating to Gestalten, in face perception in particular, owes a great deal to his approach and, in a sizeable number of cases, to his actual paradigms.
Garner made seminal contributions to the study of Gestalten. Among a number of innovations, his notions of separable vs integral dimensions have been particularly influential. Garner interpreted these fundamental concepts through operational definitions that resulted in predictions in experiments designed to invoke those definitions. Separability intuitively captures the idea of being susceptible to analysis and independence; integrality is just the opposite. We will learn that all of the processing issues limned earlier participate in computational investigations of separability and analogous themes.
It is useful to parse out these notions a bit before proceeding. Although integral dimensions could, in principle, be learned or ‘welded together’ through practice, they could just be that way due to more or less innate properties of our sensory-cognitive systems. Perhaps the principal example is that of perception of hue and saturation in colour vision. These dimensions appear to be inborn as far as we can tell (e.g. Fific, Nosofsky, and Townsend 2008). At some
risk of oversimplification, Garner’s major operational specifications can be divided into two
major types:
1 Integrality of dimensions can hurt performance when the task involves attention to one
dimension and other dimensions, with which the attended one is integral, are present and
varied (usually more or less randomly) in the trial-to-trial presentations. This operationalism
eventuates in Garner filtering tasks and, if inferior performance does erupt, the phenomenon
of Garner interference.
2 Integrality of dimensions can help performance if perception of any of two or more dimensions
is redundant with regard to specifying the correct response.
In carrying out either 1 or 2, it can make a difference whether, say, the studied dimension or item is, in a control condition, accompanied by nothing else (e.g. a blank), or whether a neutral distractor is used. In any case, it is clear that in 1, having the full Gestalt present when the observer is supposed to focus on only one of the dimensions may be deleterious.
This phenomenon is clearly a type of configural inferiority. The assistance provided by the presence of the Gestalt-interactive pair (or more) of redundant dimensions (as opposed to their simple additivity) is a kind of configural superiority. However, the latter term constitutes a very broad spectrum of potential mechanisms and empirical consequences as opposed to the narrower focus of a redundant targets effect.
Yet we do need to observe that while both of these Garnerian concepts intuitively capture themes of Gestalten, they are by no means logically related to one another. Experiments could logically find any combination of outcomes regarding them. Likewise, qualitative and quantitative models of perception could well predict any particular combination of them. Of course, they could be linked in any particular system.
From this standpoint, we learn that point 2, the redundancy facilitation effect, must be mildly modified to be theoretically sound. Thus, in the case of accuracy, even when the dimensions are stochastically independent, their redundancy leads to performance superior to a single dimension by itself (a prediction known as probability summation). A completely analogous prediction appears with RTs, in the sense that independent redundant dimensions predict superiority (i.e. faster RTs or improved accuracy) over single dimensions, at least in the presence of parallel processing. Thus, redundant superiority per se need not be associated with integrality or any particular form of Gestalt behaviour. We can view this state of affairs through our workload capacity statistic. As a prime example using RTs within a redundant target design, assume for simplicity that both of the two channels operate equally quickly. Then, if C(t) > 1/2, a redundancy gain will occur (performance will exceed that of either of the channels stimulated alone). When C(t) = 1, the standard parallel model prediction, the benefits of redundancy are reasonably dramatic.
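A back-of-the-envelope illustration of these baseline gains (the accuracy value and exponential RT distributions are arbitrary choices of ours):

```python
import random

# Accuracy: probability summation for two stochastically independent dimensions.
p_single = 0.80                       # arbitrary single-dimension accuracy
p_redundant = 1 - (1 - p_single) ** 2
print(round(p_redundant, 2))          # 0.96: better than either dimension alone

# RT: an independent race (the minimum of two finishing times) is faster on
# average than a single channel, so a redundancy gain by itself does not
# imply configurality.
random.seed(4)
n = 50_000
single = [random.expovariate(1.0) for _ in range(n)]
race = [min(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(n)]
print(sum(race) / n < sum(single) / n)   # True: gain from independence alone
```

This is why raw redundancy gains must be benchmarked against the standard parallel model before being read as evidence of Gestalt processing.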
Accordingly, a straightforward tactic to strengthen the Garner redundancy concept, ruling out the increase in speed due to redundancy alone in non-configural systems, is to inspect data to see if performance exceeds that expected from such systems. This concept of performance contrasted with what can be predicted from ordinary parallel processing has some history (e.g. Raab 1962; Townsend and Nozawa 1995). However, historically, it took some time for notions such as co-activation (e.g. Colonius and Townsend 1997; Miller 1982) and super-capacity (e.g. Townsend and Nozawa 1995; Townsend and Wenger 2004b) to develop.
Now recall that the typical Garner filtering or interference experiment assays performance on a single target, both with the control (fixed) distractor dimension and with the experimental varying-value distractor dimension. There is no way to use any kind of redundancy to improve performance. However, just as in the case of superiority, the causal mechanism for interference
could exist at one or more of several levels, from a relatively low perceptual echelon to a higher
order attentional level.

Configural superiority and inferiority: a new view


At this point, it is relevant to reintroduce our workload capacity function C(t), which measures performance in a redundant signals condition against that expected from a certain special kind of parallel model. Recall that any model of this class possesses stochastically independent channels and unlimited capacity (the efficiency of each channel does not depend on how many other channels are working). The bound separating super-capacity from standard parallel performance is basically a prediction from that very class of parallel models, giving us the ability to quantify configurality as a function of time within the performance of a single observer (e.g. super-capacity early on, but limited capacity later).
With respect to other important characterizations of Gestalten, particularly those of Garner, it is important to note that Pomerantz and Garner (e.g. 1973; see also Pomerantz and Cragin, this volume) distinguish between integrality and configurality, whereas at this point we do not. In their view, integral dimensions are fused together in such a way that both redundancy gains are found (although it is not typically determined whether these exceed what could be expected from simple statistical race gains) as well as inferiority in Garner interference tasks. Several modelling potentials exist for such findings. For instance, facilitatory, interactive parallel channels are a possibility, as is co-activation, as far as the superiority goes. Possibly, attentional failure in selecting the pertinent dimension might cause inferiority in the interference design.
Their notion of configurality, which began to appear in the mid-1970s (first under the name ‘nominal’; see, e.g., Garner 1974, pp. 168–169), is seen rather differently. Pomerantz and colleagues sometimes use the metaphor of a single faucet that mixes hot and cold water (Pomerantz, personal communication 2013). This metaphor seems very close to our mathematical description of co-activation (e.g. Townsend and Nozawa 1995). In any case, they associate this type of Gestalt effect with the presence of inferiority in interference designs, but a failure to discover superiority in redundancy conditions. Their idea is that in the control condition, observers are able to take advantage of the Gestalten formed by the figures in such a way that the redundancy conditions permit no further gains. The inferiority in the interference conditions, on the other hand, is due simply to an inability to profit from the Gestalthood of the figures, rather than to true interference.

A Critical Complementary Consideration of Gestalten: General Recognition Theory and Violations of Independence and Separability
As noted earlier, a starting point for our meta-theoretical characterization of Gestalten is the construct of independence. Specifically, we take the core meaning of a Gestalt to arise from the antithesis of independence. As we have developed our theory in the time domain, we have in parallel (no pun intended) developed a characterization of Gestalten based on constructs originally developed to address many of the issues associated with the Garnerian notions of integrality and separability.
The specific theoretical foundation for this complementary approach is known as general recognition theory (GRT; Ashby and Townsend 1986). GRT is a multidimensional generalization of signal detection theory (Green and Swets 1966), which extends the distinction between differential levels of stimulus information and the manner in which that information is used from simple
one-dimensional stimuli to multidimensional combinations. Many of the earliest applications of GRT to questions of integrality and separability were made in the context of categorization judgments (e.g. Ashby et al. 2001; Ashby and Maddox 1994; Ashby, Boynton, and Lee 1994) with later
ments (e.g. Ashby et al. 2001; Ashby and Maddox 1994; Ashby, Boynton, and Lee 1994) with later
applications including consideration of Gestalt perception of objects and faces (e.g. Cornes et al.
2011; Ingvalson and Wenger 2005; Wenger and Ingvalson 2002, 2003).
The most powerful aspect of GRT with respect to the characterization of Gestalten is that, like the temporally based approach discussed previously, it begins with a theory of perceptual representation and decision-making that immediately links to empirical methods and measures for assessing the nature and extent of Gestalt states, and can do so at the level of the individual observer. We begin with the theoretical characterization of the perceptual representation of the stimulus, and do this for the simplest possible Gestalt: one arising from two stimulus dimensions, each of which can exist at two levels.
We assume variability in the encoding of the stimulus dimensions across repeated encounters
(Ashby and Lee 1993; Ashby and Townsend 1986). As such, we can idealize the perceptual repre-
sentation for each stimulus as a bivariate distribution of perceptual effects. This can be done using
any distributional assumptions; for present purposes we adhere to the practice that has been used
most frequently in the application of GRT and assume that this bivariate distribution is Gaussian.
Thus, the perceptual representation for any of the i = 1, . . . , 4 stimuli in our simplest case is com-
pletely specified by a mean vector
\[
\mu_i = \begin{pmatrix} \mu_{iA} \\ \mu_{iB} \end{pmatrix}
\]

and a covariance matrix

\[
\Sigma_i = \begin{pmatrix} \sigma_{iA}^2 & \rho_i\,\sigma_{iA}\sigma_{iB} \\ \rho_i\,\sigma_{iA}\sigma_{iB} & \sigma_{iB}^2 \end{pmatrix}
\]

To make the theoretical characterization complete, we need only add decision bounds, to ‘carve
up’ the representational space into response regions. For simplicity only, we will assume that
these decision bounds are continuous and linear, though more complex types can be easily
accommodated (e.g. Maddox and Bogdanov 2000; Maddox 2001; Maddox and Bohil 2003).
With these as the elements of our theoretical language, we can develop theory-based characteri-
zations of any given hypothesized Gestalten that allow for immediate predictions for observable
behaviour.
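This setup is concrete enough to simulate directly. The Python sketch below (all means, variances, and bound locations are arbitrary illustrative values, not estimates from any experiment) builds the four bivariate Gaussian perceptual distributions for the 2 × 2 case, applies linear decision bounds that carve the plane into four response regions, and tallies how often each stimulus elicits each response:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four stimuli: dimensions A and B, each at one of two levels.
# Under the 'null' model below, PI, PS, and DS all hold:
# marginal means do not depend on the other dimension, every
# within-stimulus correlation is zero, and the bounds are fixed.
means = {
    ("A1", "B1"): np.array([0.0, 0.0]),
    ("A1", "B2"): np.array([0.0, 2.0]),
    ("A2", "B1"): np.array([2.0, 0.0]),
    ("A2", "B2"): np.array([2.0, 2.0]),
}
cov = np.eye(2)  # unit variances, rho_i = 0 for every stimulus

bound_a = bound_b = 1.0  # linear bounds orthogonal to each axis

def identify(x):
    """Map one bivariate perceptual effect to an identification response."""
    return ("A2" if x[0] > bound_a else "A1",
            "B2" if x[1] > bound_b else "B1")

# Complete-identification experiment: each stimulus equally often.
n_trials = 5000
stimuli = list(means)
confusion = {s: {r: 0 for r in stimuli} for s in stimuli}
for stim in stimuli:
    for x in rng.multivariate_normal(means[stim], cov, size=n_trials):
        confusion[stim][identify(x)] += 1

for stim in stimuli:
    print(stim, [round(confusion[stim][r] / n_trials, 2) for r in stimuli])
```

With these parameters the modal response to every stimulus is the correct one; introducing a nonzero correlation, level-dependent means, or level-dependent bounds would instead produce the departures from independence and separability characterized in what follows.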
We begin with the natural ‘null hypothesis’ for a percept that is neither a Gestalt, in which parts
interact positively, nor one whose parts interact negatively: complete independence and separabil-
ity everywhere. We now define the pertinent concepts more formally.
A first possibility for a Gestalt is one in which the integrality exists in the manner in which a
response decision is made. This type of Gestalten can be represented by allowing the decision
bounds to vary in their location across the levels of one or both of the dimensions, and is referred
to as a violation of decisional separability (DS). A second possibility is one in which the perceptual
distributions change, in their location, variability, or both, as a function of the level of each of the
two dimensions. This is referred to as a violation of perceptual separability (PS). Each of these two
possibilities is a type of Gestalten that is defined across stimuli.
The third possibility is one that is defined within stimuli and is thus closest to the vernacular con-
ception of a Gestalt (see O’Toole et al. 2001). In this possibility, the ‘amount’ of perceptual evidence
for one of the dimensions reliably co-varies in some way with the ‘amount’ of perceptual evidence
On the Dynamic Perceptual Characteristics of Gestalten 957

for the other dimension. One way to represent this in our simple Gaussian example is to allow any
or all of the ρi to be non-zero. This is referred to as a violation of perceptual independence (PI).
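The three constructs can be kept apart by writing the parameterization down explicitly. In the toy model below (illustrative Python; the particular magnitudes, and the choices of which stimulus carries the correlation and which dimension's bound moves, are arbitrary assumptions), each hypothetical Gestalt state is a single departure from the fully independent, separable baseline:

```python
# Two dimensions, A and B, each at level 1 or 2; baseline means are
# mu[1] = 0 and mu[2] = 2 on both dimensions.

def make_model(rho=0.0, mean_shift=0.0, bound_shift=0.0):
    """Return (means, rhos, bounds); a nonzero argument introduces a
    violation of PI, PS, or DS respectively."""
    mu = {1: 0.0, 2: 2.0}
    # PS violation: A's marginal mean depends on B's level.
    means = {(a, b): (mu[a] + (mean_shift if b == 2 else 0.0), mu[b])
             for a in (1, 2) for b in (1, 2)}
    # PI violation: a nonzero within-stimulus correlation (here,
    # arbitrarily, only for the (2, 2) stimulus).
    rhos = {(a, b): (rho if (a, b) == (2, 2) else 0.0)
            for a in (1, 2) for b in (1, 2)}
    # DS violation: the bound on A depends on B's level.
    bounds = {1: 1.0, 2: 1.0 + bound_shift}
    return means, rhos, bounds

def violations(model):
    """Report which of PI, PS, DS the parameter set violates."""
    means, rhos, bounds = model
    out = set()
    if any(r != 0.0 for r in rhos.values()):
        out.add("PI")
    if any(means[(a, 1)][0] != means[(a, 2)][0] for a in (1, 2)):
        out.add("PS")
    if bounds[1] != bounds[2]:
        out.add("DS")
    return out

print(violations(make_model()))                  # baseline: no violations
print(violations(make_model(rho=0.5)))           # {'PI'}
print(violations(make_model(mean_shift=0.7)))    # {'PS'}
print(violations(make_model(bound_shift=-0.4)))  # {'DS'}
```

Note that only the first departure (nonzero ρ) lives within a single stimulus; the other two are defined only across the set of stimuli, which is the distinction drawn above.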
The experimental methodology that follows from the theoretical requirements of GRT is
known as the complete identification paradigm, and the experimental design implemented in this
paradigm is the feature-complete factorial design. In this design, each level of each dimension is
presented with equal frequency, and the observer is required to give a response (or sequence of
responses) that provides explicit evidence of the observer’s perceptual and decisional state with
respect to each dimension on each trial. The paradigm and design are flexible enough to address
both configural superiority and configural inferiority effects.
Within the assumptions of this paradigm and design, we can add information from GRT to our
working axioms:

Working axioms for configural perception 3


For both configural superiority and inferiority designs:
3.1 Gestalt perception of an individual figure will evidence a violation of PI on any partition of that
figure.
3.2  Gestalt perception of any partition of a set of figures in the context of variations across that set of
figures will evidence violations of PS, DS, or both.
A small set of simple examples may be of help here, and we rely on an analysis of the Thatcher
illusion (Cornes et al. 2011) for these examples. Consider first how a violation of PI might rep-
resent a Gestalt state. For the purposes of this example, assume that Gestalt states will exist only
for upright faces. Assume next that when the orientation of the facial surround and the internal
features are both upright, there exists a positive dependency in the sources of perceptual informa-
tion about these two aspects of the stimulus. Finally, assume that when the internal features are
mis-oriented with respect to the facial surround, there exists a negative dependency in the two
sources of perceptual dependency. This would give rise to the representation in Figure 46.2, panel
(c). In this case, note that for all four perceptual states the marginal means and variances for each
of the two sources of stimulus information are invariant across the levels of the other source of
information. The Gestalt effects (positive and negative dependencies within the perceptual rep-
resentation of each stimulus) are hypothesized to lie in the non-zero correlations in each of the
respective covariance matrixes.
A second type of Gestalt state could arise because of a violation of perceptual separability. In
this case, the source of the Gestalt is hypothesized to be a variation in the mean level of percep-
tual evidence for the orientation of the internal features as a function of the orientation of the
facial surround. In this case, the mean level of information supporting the perception of the
internal features as upright is greatest when the two dimensions of the stimulus are consistent.
In addition, the mean level of the information supporting the perception of the internal features
as inverted is greatest when the facial surround is upright. The Gestalt effects are in this case
hypothesized to lie in the values of the marginal means for the internal features, with the within-
stimulus correlations being 0 (i.e. no violations of PI, panel (d) of Figure 46.2).
The third type of Gestalt state could arise because the decision to judge the internal features as upright
or inverted is different when the facial surround is upright rather than inverted. For this hypothesis,
assume that when the facial surround is upright, the observer adopts a liberal response strategy with
respect to identifying the internal features relative to when the facial surround is inverted. The Gestalt
effects are in this case hypothesized to lie in the location of the decision bounds that divide the space of
the perceptual representation into the different response regions for each of the dimensions, with the
decision bounds in this example assumed to be linear with a non-zero slope (panel (e) of Figure 46.2).

Fig. 46.2  Example GRT representations of the hypothetical sources of configurality in the Thatcher
illusion: (a) Bivariate distributions of perceptual evidence given stimuli in which the facial surround
and the internal features are presented either upright (Upr) or inverted (Inv). The vertical planes
(outlined in red) represent the decision bounds which divide the representational space into four
response regions. (b) Contours of equal likelihood given preservation of PI, PS, and DS. (c) Contours of
equal likelihood for the situation in which PI is violated in upright but not inverted stimuli. (d) Contours
of equal likelihood for the situation in which PS is violated for the upright but not inverted stimuli.
(e) Contours of equal likelihood for the situation in which PI and PS are preserved and DS is violated.

In each of these three examples, the variations in the stimulus change the pattern of behav-
ioural responses that are predicted. In each case, there is the potential for the Thatcher
manipulation---inversion of the internal features relative to the facial surround---to be best
detected when the facial surround is upright rather than inverted. This would be the behav-
ioural ‘signature’ of the Thatcher illusion as a Gestalt effect. However, a critical point to note
here is that only one of the hypotheses just considered applies to the perception of an indi-
vidual stimulus on a within-trial basis, and that is the violation of PI. Violations of either PS or
DS pertain to the perception of sets of stimuli. This raises an interesting ‘disconnect’ between
the general state of theorizing (or, more accurately operationalizing) about Gestalten and the
experimental methods that are typically used to assess the presence or absence of Gestalt states.
In general, the vernacular conception of Gestalten within the scientific community is most
consistent with a violation of PI. That is, the Gestalt state is assumed to exist for the observer
within the perception of an individual stimulus (see Cornes et al. 2011 for a discussion specific
to the Thatcher illusion). Unfortunately, the overwhelming majority of experimental studies that
have probed Gestalt perception have used tasks (including tasks used in the Garnerian approach)
in which it is possible to glean information about the observer’s state with respect to only one of
the stimulus dimensions on each trial. Thus, these tasks cannot provide the data needed to assess
potential violations of PI, meaning that it becomes difficult if not impossible to connect the exper-
imental evidence with the theoretical construct at the level at which investigators are postulating
the Gestalt state. The exceptions are studies that implement the feature-complete factorial design
and use a complete identification response task. We will have more to say about data from these
designs in the final section of this chapter.

A Brief Consideration of the Experimental Evidence


Systems factorial technology
A number of experiments focusing on Gestalt perception and utilizing SFT have been carried out since its
inception in the late 1980s (Townsend and Nozawa 1988, 1995). Wenger and Townsend (2001)
confirmed parallel processing for both realistic and scrambled-feature faces. However, there
was also widespread limited capacity along with some evidence for super-capacity in realistic faces
vs scrambled-feature faces. Moreover, obligatory face perception in the sense of exhaustive feature
completion even when early termination could yield correct responses was never affirmed: observ-
ers inevitably cease feature processing as soon as they can. The latter finding indicates that people
can choose to be feature-analytic when circumstances afford such attentional control.
These findings have been confirmed in the broad sense in every study we have run, but sub-
sequent studies have further determined that when exhaustive processing of facial features
is demanded, people tend to demonstrate super-capacity parallel perception (e.g. Wenger and
Townsend 2006). Indeed, word perception is explained by the same type of systems characteristics
as facial perception.
Fific and Townsend (2010) developed an extension of SFT and selective influence for categori-
zation of faces. In this study, they replicated and expanded the part–whole paradigm (e.g. Tanaka
and Farah 1993; Tanaka and Sengco 1997) to include two features, and to second-order rather
than primary features. The part–whole paradigm compares placement of a learned vs unlearned
feature in a known facial context (e.g. contours and other features) as contrasted with an unknown
context. Neither offers logical information about the featural identity since both appear randomly
as context. However, the familiar context aids accuracy. After replication of the earlier findings
with two features and using RTs, our investigation carried out AND and OR experiments designed
to identify architecture, stopping rules and, less directly, channel interactions across the studied
configural features. Finally, we also used not only new facial contexts but also feature-alone, with-
out any facial context at all.
First, in both the OR as well as the AND conditions, observers were faster in the familiar face
stimuli than with the new face or features alone situation. Next, in the OR conditions, all observ-
ers indicated strong parallel processing along with a ‘stop as soon as the first target feature is
completed’ (i.e. minimum time) stopping rule, both in the familiar face context as well as the new
face context. However, some observers proved to be serial, minimum time, although only in the
features-alone conditions. The combination of ordinary parallel or serial processing, for instance,
not co-active or parallel interactive, provides strong support for analytic processing even though
the learned contexts aided efficiency.
In contrast, within the AND experiment and when presented with familiar faces, observers
appeared to mix an ordinary exhaustive (note: more holistic!) parallel processing strategy with a
decided tendency towards facilitatory interactive channels (see, e.g., Eidels et al. 2011). There was
also some interaction present in the new face and features-alone conditions though not much.
Analysis of the learning phases of the experiment also supports this account.
Overall, these results point to a graded notion of Gestalt perception, namely that significant
parallel interactions can appear under certain circumstances, such as when exhaustive processing
of facial features is obligatory. However, when experimental conditions afford the opportunity
to be analytic and stop as soon as sufficient information is accrued to make a correct response,
observers will do so. Even when interactive parallel processing is found, the parts do not reveal a
perfect correlation (i.e. starting and finishing at the same moments, indicating the whole is
processed as a complete unit). Supplementing the above précis with other studies in the literature, we
summarize the provisional findings through SFT. Theoretical and empirical results accrued over
the past fifteen years or so have thoroughly verified the parallel nature of within-object feature and
dimensional perception. In a number of experiments with well-organized figures like faces, a type
of parallel processing called co-activation has been discovered. Co-activation entails summation
across channels or possibly positive channel interaction.
Interestingly, even objects such as realistic faces, which are prime candidates for Gestalten, do
not inevitably evoke super-capacity perception. Sometimes even moderately limited capacity is
found in such circumstances, especially if early termination (i.e. non-exhaustive processing) of
features is allowed. On the other hand, when a task calls for processing of all the featural informa-
tion contained in Gestalt items (exhaustive processing), the investigator tends to witness higher
degrees of super-capacity. Moreover, when people learn to glue together meaningless features into
patterns, again within tasks which demand exhaustive featural processing of the target category,
rather extraordinary magnitudes of super-capacity are witnessed, implying efficiency far exceed-
ing ordinary parallel processing (as per Blaha and Townsend 2004).

General recognition theory


The meta-theoretical language provided by GRT has been used theoretically and experimentally
to characterize a variety of Gestalten. We would be remiss, however, if we did not point out that in
addition to applications of GRT to questions of perceptual and cognitive independence, it has also
served as the foundation of one of the leading theories of categorization. Interested readers should
consult the numerous contributions by Ashby, Maddox, and their colleagues for examples of this
latter work (e.g. Ashby and Lee 1991; Ashby and Maddox 1993, 1994; Maddox 1992; Maddox and
Ashby 1993; Maddox 2001).

The most recent applications of GRT to questions of configurality have come in the context of
studies of the perception of and memory for faces, although we should also note that we have done
the same with respect to perceptual organization of hierarchical forms (Copeland and Wenger
2006). Specifically, we have applied the constructs and methods of GRT to the holistic encoding
hypothesis (Wenger and Ingvalson 2002, 2003), the composite face effect (Richler et al. 2008), the
Thatcher illusion (Cornes et al. 2011), and face inversion (Mestry et al. 2012). An intriguing regu-
larity from these studies is the consistent lack of evidence (or at best weak evidence) for violations
of PI. Instead, these studies have revealed that the empirical regularities that are commonly taken
as the ‘signatures’ of Gestalten do not produce compelling evidence for the state—a violation of
PI—that is most consistent with the vernacular conception of Gestalten.
One intriguing possibility here is that the non-parametric quantitative methods that have to
date been the most widely used methods for supporting inferences regarding PI, PS, and DS may
actually be overly conservative with respect to detecting violations of PI. This observation has
come from ongoing work by Menneer and colleagues (e.g. Menneer et al. 2009; Menneer, Blaha,
and Wenger 2012) examining alternative statistical methods for supporting inferences regarding
PI, PS, and DS. One particular aspect of this work is the application of probit regression models to
GRT data, as first suggested by DeCarlo (e.g. 2003). Preliminary results suggest that probit models
are capable of detecting true violations of PI that can be missed by other methods. The following
paragraphs attempt to encapsulate the recent contributions arrived at through GRT.
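The best known of the non-parametric checks is marginal response invariance: if PS and DS hold on a dimension, the probability of reporting a given level of that dimension should not depend on the level of the other dimension. The sketch below computes the relevant marginals from a confusion matrix whose counts are fabricated purely for illustration:

```python
import numpy as np

# Rows: stimuli; columns: responses; order (A1B1, A1B2, A2B1, A2B2).
# Counts are invented for illustration, not data from any study.
labels = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]
counts = np.array([
    [400,  50,  40,  10],
    [ 60, 390,  10,  40],
    [ 45,  15, 380,  60],
    [ 10,  45,  55, 390],
], dtype=float)
probs = counts / counts.sum(axis=1, keepdims=True)

def p_report_a(stim_idx, a_level):
    """P(report a_level on dimension A | stimulus), collapsing over B responses."""
    cols = [j for j, (a, _) in enumerate(labels) if a == a_level]
    return probs[stim_idx, cols].sum()

# Marginal response invariance on A for level A1: the report probability
# should be the same whether B was at level 1 (stimulus 0) or 2 (stimulus 1).
p1 = p_report_a(0, "A1")
p2 = p_report_a(1, "A1")
print(f"P(report A1 | A1,B1) = {p1:.3f}")
print(f"P(report A1 | A1,B2) = {p2:.3f}")
print("invariance holds (tol 0.05):", abs(p1 - p2) < 0.05)
```

A failure of this equality implicates PS or DS (though it cannot by itself say which); a violation of PI, by contrast, requires the full within-trial response pattern, which is why the complete identification task is essential.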

Perceptual independence
Recall that perceptual independence (PI) is defined as the stochastic independence occurring on
a within-trial basis among features or dimensions. We have previously suggested that, in a sense,
violations of PI could be considered the strongest type of non-independence possibly indicative
of Gestalt perception. It has not often been detected in our data, even for respectable Gestalten. It
is not clear why this is the case, as featural inter-channel dependencies, for example in a Hebbian
sense, stand as one of the most natural ways to bring about configural superiority. In addition,
cross-channel interactions provide the best explanation in a number of response-time experi-
ments where configural superiority effects are found (a few of which are Eidels, Townsend, and
Pomerantz 2008; Fific and Townsend 2010; Townsend and Houpt 2012; Eidels et al. 2008).

Perceptual separability
Violations of perceptual separability (PS) occur when a change on one feature, across trials,
for example, causes perceptual effects on a distinct feature. Although violations of PS could be
brought about through a failure of perceptual independence, dynamic systems have been devel-
oped which evince non-separability even though perceptual independence is intact. Perceptual
non-separability in the form of what Garner called integrality has been found with Gestalten more
frequently than positive perceptual dependencies, but less often than decisional non-separability
of a type that would be associated with Gestalt-like decision making.

Decisional separability
Intriguingly, when viewing Gestalten such as realistic faces, a failure of decisional separability
(DS) has been experimentally diagnosed more frequently than either of the other two types of
‘independence’. Investigators working in the area of visual object perception have sometimes
recoiled from these findings apparently because it is felt that a decisional influence is not suf-
ficiently perceptual. Our view is that such influences are also perceptual. For instance, when, as
we have sometimes discovered with Gestalten (e.g. faces), decisional criteria apparently tend to be
lowered or raised on the constituent features together, is this not a perceptual effect? For example,
in a recent GRT study of facial race-feature perception and adaptation, it was discovered that
adaptation to racial physiognomy or skin tone led to dramatic alterations in both perceptual
separability as well as decisional criteria (Blaha, Silbert, and Townsend 2011).

Conclusions and Frontiers


We have begun developing a theoretical language for Gestalt perception. Use of the language
permits the construction of tentative definitions and axioms of Gestalt processing. It allows, and
sometimes even compels, connections among diverse operational and verbal concepts and defini-
tions. Moreover, it facilitates the translation of Gestalt properties and theorems about them into
experimental hypotheses and subsequent tests.
One essential area of research which we do not have space to cover in any detail is the relation-
ship of holistic or configural vs featural information processing. Many studies have used inverted
faces to segregate out featural vs configural processing, with the idea that inverted perception
must rely on feature perception. However, in most cases, this concept is employed as a definition
without a converging system of checks.
In any event, the pendulum has swung back and forth so fast on this question that it is almost
invisible. One reason for the distinct findings could be that, as declared earlier, most investigators
tacitly assume that the various operational definitions proposed by Garner, Shepard, and others
inevitably accompany configural or holistic perception. Yet, as we have been at pains to convince
the reader, none of them is by any means destined to call upon the same systems properties as the
others. As a case in point, the Garner interference (i.e. a configural inferiority type of task) paradigm
demands an efficient segregation of attentional resources. On the other hand, configural superior-
ity should be in evidence when various parts of a face or object interact (or perhaps co-activate)
in a facilitatory manner. A  theorist can invent a model in which these ‘definitional’ properties
co-occur, but it is equally straightforward to construct models where they are dissociated.
Consider two relatively recent studies. We can take as starting point, the straightforward
hypothesis of Searcy and Bartlett (1996) that within faces, these two information modes are, in the
present terminology, perceived in an independent, parallel format. Ingvalson and Wenger (2005),
employing the strategies put forth herein, investigated this hypothesis and, in addition, stopping
rule and workload capacity. They discovered that configural and featural information sources
were processed in parallel and with unlimited and sometimes super-capacity. The stopping rule
was identified as ‘minimum time’ or a horse race between the two types of information. Thus, any
kind of serial processing as well as an exhaustive stopping rule were falsified. The combination of
minimum time stopping with decided evidence of super-capacity is interesting.
With regard to theoretical explanations of the Ingvalson and Wenger (2005) findings, it is
theoretically possible to witness super-capacity even though the two channels are processed
independently: for instance, if an observer simply puts more effort into her task in spite of, or
because of, the increased workload (see Kahneman 1973). However, a more natural account for the Ingvalson and Wenger
super-capacity findings provides for super-capacity through positive (facilitatory) channel inter-
actions (e.g. Eidels et al. 2011; Townsend and Wenger 2004b).
Nevertheless, positively interactive parallel models make predictions not only for capacity, but
also with regard to architectural tests. In fact, as the Eidels et al. simulation results indicate, facili-
tatory interactions tend to produce a small negative blip in the survivor function interaction con-
trasts, followed by a large positive hump, much like co-active models (e.g. Townsend and Nozawa
1995). Such negative departures of the contrast functions are not visible in the Ingvalson and
Wenger data. Further research on this issue is called for. There were other less critical findings
that have to be neglected here.
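The survivor-function interaction contrast referred to above can be estimated directly from double-factorial RT data as SIC(t) = S_LL(t) − S_LH(t) − S_HL(t) + S_HH(t). The sketch below simulates an independent parallel minimum-time (OR) model with exponential channels (the rates are arbitrary choices), for which the theoretical SIC is non-negative at every t; a facilitatory-interaction model would instead show the small early negative blip described in the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def survivor(rts, t):
    """Empirical survivor function S(t) = P(RT > t) on a grid of times t."""
    return (np.asarray(rts)[:, None] > t).mean(axis=0)

# Double-factorial conditions: each channel's salience is High (fast)
# or Low (slow). Channels are independent exponentials; the response
# is the first channel to finish (parallel, minimum-time / OR rule).
rate = {"H": 8.0, "L": 4.0}
n = 20000
cond = {a + b: np.minimum(rng.exponential(1.0 / rate[a], n),
                          rng.exponential(1.0 / rate[b], n))
        for a in "HL" for b in "HL"}

t = np.linspace(0.0, 1.0, 201)
sic = (survivor(cond["LL"], t) - survivor(cond["LH"], t)
       - survivor(cond["HL"], t) + survivor(cond["HH"], t))

# Theory: independent parallel OR models give SIC(t) >= 0 everywhere
# (sampling noise can push the estimate slightly below zero).
print(f"SIC range: [{sic.min():.4f}, {sic.max():.4f}]")
```

Other architectures leave different signatures: serial minimum-time processing gives SIC(t) = 0 throughout, while co-active and facilitatory parallel models show the negative-then-positive S-shape, which is what makes the contrast diagnostic.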
In contrast, Amishav and Kimchi (2010) used the Garner interference (therefore, configural
inferiority) design to investigate this issue. In contrast to the Ingvalson and Wenger (2005) find-
ings, they determined processing to be highly integral (i.e. non-independent and non-separable),
possibly indicating strong cross-talk across the two types of informational channels. It is logically,
mathematically, and scientifically possible that in attentional sharing (or divided attention) exper-
iments, relative independence or even positive facilitation might be found, but that in a configural
inferiority design, attention cannot be confined to a single source without a cost. Although this is
not the place for a detailed review of the literature, we suggest that any such literature evaluation
should first parse the studies into the types of methodology used. If there is sufficient regularity
after that, perhaps general inference drawing can advance.
Our approach is, like that of Garner and colleagues (see Pomerantz and Cragin, this volume)
oriented toward an information processing perspective. However, it seems clear that ultimately,
topology and geometry must be brought into the picture (see Bertamini and Casati, this volume
for a related discussion). A  very brief overview of these topics is now in order. First we need
quickly to note that topology is the branch of mathematics where qualitative, but not quantitative,
relationships among points matter. In fact, any deformation of an object which does not tear it is
a perfectly good topological transformation. The legendary statement that ‘topologists are defined
by the characteristic that they can’t tell the difference between a tea cup and a doughnut’ is due to
this aspect of topology. Geometry, on the other hand, is devoted to the study of shape, size, rela-
tive position of figures, and certain quantifiable properties of space. In general, geometries assume
that a distance between points and things like angles exist—properties that are meaningless in
topology. Euclidean geometry can be characterized in a number of ways, but the presence of the
famous Euclidean metric in which the distance between points A  and B in an n-dimensional
space is

\[
\sqrt{\sum_{i=1}^{n} (B_i - A_i)^2}
\]

is the best-known property. It took centuries for mathematicians to discover the existence of
non-Euclidean geometries. Considerable effort has been devoted by psychometricians and
mathematical psychologists to investigate at least some non-Euclidean geometries in the context of
human perception (e.g. Shepard 1964).
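The distinction is easy to make concrete with the Minkowski family of distances, which contains the Euclidean metric (r = 2) as well as the city-block metric (r = 1) that the scaling literature has often associated with separable dimensions:

```python
def minkowski(a, b, r=2.0):
    """Minkowski distance of order r: r=2 is Euclidean, r=1 is city-block."""
    return sum(abs(bi - ai) ** r for ai, bi in zip(a, b)) ** (1.0 / r)

a, b = (0.0, 0.0), (3.0, 4.0)
print("Euclidean: ", minkowski(a, b, r=2))  # sqrt(3**2 + 4**2) = 5.0
print("City-block:", minkowski(a, b, r=1))  # |3| + |4| = 7.0
```

Topological invariants, by contrast, assign no number at all to these point pairs: any distance-distorting but tear-free deformation leaves them equivalent.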
Chen (2005) has discussed the relationship of certain topological notions, such as the presence
of holes, to Gestalt perception. Eidels and colleagues (2008) showed how similarity concepts asso-
ciated with Chen’s efforts could be merged with systems factorial technology in studying Gestalt
processing.
Though quantitatively rigorous, our approach is at a substantially more macroscopic level than
those which attempt to capture neuro-anatomical structure and process. One apposite example is
the feed-forward model provided by Poggio and colleagues (e.g. Riesenhuber and Poggio 1999;
Serre, Oliva, and Poggio 2007). This model rests on a hierarchical ascending network of computa-
tions based on summation and max-rule decisions, which capture some of the elemental increas-
ing invariance of feature processing in the afferent, ventral pathways. It is unknown whether such
models could be extended to make predictions corresponding to the relatively larger-scale aspects
treated here but it would seem valuable to do so.

Another contender with regard to object vs face perception is defined by Biederman and his
colleagues. For instance, Biederman and Kalocsai (1997) introduced a theory based on earlier
ideas stemming from von der Malsburg’s laboratory. The key elements envision an early layer of
hypercolumn pattern of representation for objects as well as faces. Subsequently, several types of
relational variables are instituted among the parts (typically Biederman’s geons; see, e.g., Hummel
and Biederman 1992) that permit discrimination and generalization among objects. However, the
system associated with face perception is strikingly different and contains two sub-tracks. One
of these tracks preserves spatial relationships and stores the information in hypercolumn-like
lattices which can later be matched against probe stimuli. These lattices are permitted to undergo
a certain degree of distortion to maximize closeness of match. In addition, a second track centres
each column of filters on a particular facial feature. The latter apparently allows selectivity of the
input into a holistic representation, thus avoiding such artefacts as unrelated object occlusion.
This bipartite structure is able to encompass a number of phenomena associated with face and
(and vs) object perception, including certain configural properties in face processing. Although
inspired by visual neurophysiology, much of the data guiding this as well as the Poggio and com-
pany model are behavioural in nature. Thus, it does not seem too outlandish to suggest that exten-
sions or special analyses might engender predictions concerning the architecture (presumably
heavily parallel, though with sequential hierarchies), workload capacity, stopping rule, and inde-
pendence, for example, of various types of parts (e.g. geons).
One of the most prominent and exciting developments, with respect to the focus of this chap-
ter, must be the theoretical unification of SFT and GRT. This effort has begun on several fronts.
For instance, we have recently formulated a new mathematical workload capacity function
which bonds information based on RTs (part of the SFT toolbox) with that assessing accuracy
(Townsend and Altieri 2012). However, this new statistic has not yet been employed in the study
of Gestalt perception. Similarly, Townsend, Houpt, and Silbert (2012) offer an extended GRT
which includes parallel architectures and permits a strengthened methodology based both on
RT as well as accuracy. Nonetheless, the RT-based methodologies which afford identification of
architecture (e.g. serial vs parallel processing; Townsend and Wenger 2004a) have not yet been
unified with GRT and accuracy in general.
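For reference, the workload capacity measure at the centre of these developments is, for OR tasks, the capacity coefficient C_OR(t) = H_AB(t) / [H_A(t) + H_B(t)], with H the estimated cumulative hazard (Townsend and Nozawa 1995). The sketch below estimates it for an unlimited-capacity independent parallel benchmark (exponential channels with an arbitrary rate), for which C_OR(t) should hover near 1:

```python
import numpy as np

rng = np.random.default_rng(2)

def cum_hazard(rts, t):
    """Estimate the cumulative hazard H(t) = -log S(t) from RT samples."""
    s = (np.asarray(rts)[:, None] > t).mean(axis=0)
    return -np.log(np.clip(s, 1e-12, None))

# Unlimited-capacity independent parallel (UCIP) benchmark: each channel's
# finishing-time distribution is unchanged when the other channel is added.
n = 50000
rate = 5.0
rt_a = rng.exponential(1.0 / rate, n)   # single-target A trials
rt_b = rng.exponential(1.0 / rate, n)   # single-target B trials
rt_ab = np.minimum(rng.exponential(1.0 / rate, n),
                   rng.exponential(1.0 / rate, n))  # redundant-target trials

t = np.linspace(0.05, 0.4, 50)
c_or = cum_hazard(rt_ab, t) / (cum_hazard(rt_a, t) + cum_hazard(rt_b, t))
# C_OR near 1: unlimited capacity; > 1: super-capacity; < 1: limited.
print(f"C_OR(t) min {c_or.min():.3f}, max {c_or.max():.3f}")
```

Values reliably above 1 are the super-capacity signature discussed throughout this chapter; values below 1 indicate the limited capacity often observed when early termination is permitted.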
Finally, theoretical work on the applied mathematics associated with model analysis and prob-
ing of failures of the different types of dependence is proceeding at a lively pace, both on GRT as
well as SFT. It could turn out that, say, perceptual independence may be more subject to Type II
errors than the other two types of independence. Only further theoretical and experimental prob-
ing will tell the tale. We think the next decade or so should see a growing comprehension of the
underpinning process machinery which handles Gestalt perception.

References
Amishav, R. and R. Kimchi (2010). ‘Perceptual Integrality of Componential and Configural Information in
Faces’. Psychonomic Bulletin & Review 17(5): 743–748.
Ashby, F. G. (1982). ‘Deriving Exact Predictions from the Cascade Model’. Psychological Review
89: 599–607.
Ashby, F. G. and J. T. Townsend (1986). ‘Varieties of Perceptual Independence’. Psychological Review
93: 154–179.
Ashby, F. G. and W. W. Lee (1991). ‘Predicting Similarity and Categorization from Identification’. Journal of
Experimental Psychology: General 120: 150–172.
Ashby, F. G. and W. W. Lee (1993). ‘Perceptual Variability as a Fundamental Axiom of Perceptual Science’.
In Foundations of Perceptual Theory, edited by S. C. Masin, pp. 369–399. Amsterdam: Elsevier.
On the Dynamic Perceptual Characteristics of Gestalten 965

Ashby, F. G. and W. T. Maddox (1993). ‘Relations between Prototype, Exemplar, and Decision Bound
Models of Categorization’. Journal of Mathematical Psychology 37: 372–400.
Ashby, F. G., G. Boynton, and W. W. Lee (1994). ‘Categorization Response Time with Multidimensional
Stimuli’. Perception & Psychophysics 55: 11–27.
Ashby, F. G. and W. T. Maddox (1994). ‘A Response Time Theory of Separability and Integrality in Speeded
Classification’. Journal of Mathematical Psychology 38: 423–466.
Ashby, F. G., E. M. Waldron, W. W. Lee, and A. Berkman (2001). ‘Suboptimality in Human Categorization
and Identification’. Journal of Experimental Psychology: General 130: 77–96.
Biederman, I. and P. Kalocsai (1997). ‘Neurocomputational Bases of Object and Face Recognition’.
Philosophical Transactions of the Royal Society of London B: Biological Sciences 352: 1203–1219.
Blaha, L. M. and J. T. Townsend (2004). ‘From Nonsense to Gestalt: The Influence of Configural Learning
on Processing Capacity’. Paper presented at the 2004 Meeting of the Society for Mathematical
Psychology, July, Ann Arbor, MI.
Blaha, L. M. (2010). ‘A Dynamic Hebbian-style Model of Configural Learning’. Dissertation submitted
in partial fulfilment of the requirements for the degree doctor of philosophy, Indiana University,
Bloomington.
Blaha, L. M., N. Silbert, and J. T. Townsend (2011). ‘A General Recognition Theory of Race Adaptation’. Paper presented at the annual meeting of the Vision Sciences Society, May, Naples, FL.
Chen, L. (2005). ‘The Topological Approach to Perceptual Organization’. Visual Cognition 12: 553–637.
Colonius, H. and J. T. Townsend (1997). ‘Activation-state Representation of Models for the
Redundant-signals-effect’. In Choice, Decision, and Measurement: Essays in Honor of R. Duncan Luce,
edited by A. A. J. Marley, pp. 245–254. Hillsdale, NJ: Erlbaum.
Copeland, A. M. and M. J. Wenger (2006). ‘An Investigation of Perceptual and Decisional Influences on the
Perception of Hierarchical Forms’. Perception 35: 511–529.
Cornes, K., N. Donnelly, H. Godwin, and M. J. Wenger (2011). ‘Perceptual and Decisional Factors
Affecting the Detection of the Thatcher Illusion’. Journal of Experimental Psychology: Human Perception
and Performance 37: 645–668.
DeCarlo, L. T. (2003). ‘Using the Plum Procedure of SPSS to Test Unequal Variance and Generalized Signal
Detection Models’. Behavior Research Methods, Instruments, and Computers 35: 49–56.
Egeth, H. (1966). ‘Parallel versus Serial Processes in Multidimensional Stimulus Discrimination’. Perception
and Psychophysics 1: 245–252.
Eidels, A., J. T. Townsend, and J. R. Pomerantz (2008). ‘Where Similarity Beats Redundancy: The
Importance of Context, Higher Order Similarity, and Response Assignment’. Journal of Experimental
Psychology: Human Perception and Performance 34(6): 1441–1463.
Eidels, A., J. W. Houpt, N. Altieri, L. Pei, and J. T. Townsend (2011). ‘Nice Guys Finish Fast and Bad Guys Finish Last: Facilitatory vs Inhibitory Interaction in Parallel Systems’. Journal of Mathematical Psychology
55: 176–190.
Fific, M., R. M. Nosofsky, and J. T. Townsend (2008). ‘Information-processing Architectures in
Multidimensional Classification: A Validation Test of the Systems Factorial Technology’. Journal of
Experimental Psychology: Human Perception and Performance 34(2): 356–375.
Fific, M. and J. T. Townsend (2010). ‘Information-processing Alternatives to Holistic
Perception: Identifying the Mechanisms of Secondary-level Holism within a Categorization Paradigm’.
Journal of Experimental Psychology: Learning, Memory, and Cognition 36(5): 1290–1313.
Garner, W. R. (1974). The Processing of Information and Structure. New York: Wiley.
Green, D. M. and J. A. Swets (1966). Signal Detection Theory and Psychophysics. New York: Wiley.
Hummel, J. E. and I. Biederman (1992). ‘Dynamic Binding in a Neural Network for Shape Recognition’.
Psychological Review 99: 480–517.
Ingvalson, E. M. and M. J. Wenger (2005). ‘A Strong Test of the Dual Mode Hypothesis’. Perception and
Psychophysics 67: 14–35.
966 Townsend and Wenger

Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace.
Kubilius, J., J. Wagemans, and H. P. Op de Beeck (2011). ‘Emergence of Perceptual Gestalts in
the Human Visual Cortex: The Case of the Configural-superiority Effect’. Psychological Science
22(10): 1296–1303.
Kubovy, M. and J. R. Pomerantz (1981). Perceptual Organization. Hillsdale, NJ: Erlbaum.
McClelland, J. L. (1979). ‘On the Time Relations of Mental Processes: An Examination of Systems of
Processes in Cascade’. Psychological Review 86: 287–330.
Maddox, W. T. (1992). ‘Perceptual and Decisional Separability’. In Multidimensional Models of Perception
and Cognition, edited by F. G. Ashby, pp. 147–180. Hillsdale, NJ: Erlbaum.
Maddox, W. T. and F. G. Ashby (1993). ‘Comparing Decision Bound and Exemplar Models of
Categorization’. Perception & Psychophysics 53: 49–70.
Maddox, W. T. and S. V. Bogdanov (2000). ‘On the Relation between Decision Rules and Perceptual
Representation in Multidimensional Perceptual Categorization’. Perception & Psychophysics 62: 984–997.
Maddox, W. T. (2001). ‘Separating Perceptual Processes from Decisional Processes in Identification and
Categorization’. Perception & Psychophysics 63: 1183–1200.
Maddox, W. T. and C. J. Bohil (2003). ‘A Theoretical Framework for Understanding the Effects of
Simultaneous Base-rate and Payoff Manipulations on Decision Criterion Learning in Perceptual
Categorization’. Journal of Experimental Psychology: Learning, Memory, and Cognition 29: 307–320.
Menneer, T., N. Silbert, K. Cornes, M. J. Wenger, J. T. Townsend, et al. (2009). ‘Contrasting Methods of
Model Estimation for Configural and Holistic Perception’. Poster presented at the 2009 Vision Sciences
Society Meeting, May, Naples FL.
Menneer, T., L. Blaha, and M. J. Wenger (2012). ‘Probit Analysis for Multidimensional Signal
Detection: An Evaluation and Comparison with Standard Analyses’. Manuscript under review.
Mestry, N., T. Menneer, M. J. Wenger, and N. Donnelly (2012). ‘Identifying Sources of Configurality in
Three Face Processing Tasks’. Manuscript under review.
Miller, J. O. (1982). ‘Divided Attention: Evidence for Coactivation with Redundant Signals’. Cognitive
Psychology 14: 247–279.
O’Toole, A. J., M. J. Wenger, and J. T. Townsend (2001). ‘Quantitative Models of Perceiving and
Remembering Faces: Precedents and Possibilities’. In Computational, Geometric, and Process Perspectives
on Facial Cognition: Contexts and Challenges, edited by M. J. Wenger and J. T. Townsend, pp. 1–38.
Mahwah NJ: Erlbaum.
Plomp, G. and C. van Leeuwen (2006). ‘Asymmetric Priming Effects in Visual Processing of Occlusion
Patterns’. Attention, Perception, and Psychophysics 68(6): 946–958.
Pomerantz, J. R. and W. R. Garner (1973). ‘Stimulus Configuration in Selective Attention Tasks’. Attention,
Perception, and Psychophysics 14(3): 565–569.
Pomerantz, J. R. (2013). Personal communication, February.
Raab, D. H. (1962). ‘Statistical Facilitation of Simple Reaction Times’. Transactions of the New York Academy
of Sciences 24: 574–590.
Richler, J. J., L. Gauthier, M. J. Wenger, and T. J. Palmeri (2008). ‘Holistic Processing of Faces: Perceptual
and Decisional Components’. Journal of Experimental Psychology: Learning, Memory, and Cognition
38: 328–342.
Riesenhuber, M. and T. Poggio (1999). ‘Hierarchical Models of Object Recognition in Cortex’. Nature
Neuroscience 2: 1019–1025.
Roelfsema, P. R. and R. Houtkamp (2011). ‘Incremental Grouping of Image Elements in Vision’. Attention, Perception, and Psychophysics 73: 2542–2572.
Schweickert, R. (1978). ‘A Critical Path Generalization of the Additive Factor Method: Analysis of a Stroop
Task’. Journal of Mathematical Psychology 18: 105–139.
Schweickert, R. and J. T. Townsend (1989). ‘A Trichotomy Method: Interactions of Factors Prolonging
Sequential and Concurrent Mental Processes in Stochastic PERT Networks’. Journal of Mathematical
Psychology 33: 328–347.
Schweickert, R. and J. Mounts (1998). ‘Additive Effects of Factors on Reaction Time and Evoked Potentials
in Continuous-flow Models’. In Recent Progress in Mathematical Psychology: Psychophysics, Knowledge,
Representation, Cognition, and Measurement, edited by C. E. Dowling and F. S. Roberts, pp. 311–327.
Mahwah, NJ: Erlbaum.
Searcy, J. H. and J. C. Bartlett (1996). ‘Inversion and Processing of Component and Spatial-relational
Information in Faces’. Journal of Experimental Psychology: Human Perception and Performance
22: 904–915.
Serre, T., A. Oliva, and T. Poggio (2007). ‘A Feedforward Architecture Accounts for Rapid Categorization’.
Proceedings of the National Academy of Sciences 104: 6424–6429.
Shepard, R. (1964). ‘Attention and the Metric Structure of the Stimulus Space’. Journal of Mathematical
Psychology 1(1): 54–87.
Smith, P. (2000). ‘Stochastic Dynamic Models of Response Time and Accuracy: A Foundational Primer’.
Journal of Mathematical Psychology 44(3): 408–463.
Stins, J. F. and C. van Leeuwen (1993). ‘Context Influence on the Perception of Figures as Conditional upon
Perceptual Organization Strategies’. Attention, Perception, and Psychophysics 53(1): 34–42.
Tanaka, J. and M. Farah (1993). ‘Parts and Wholes in Face Recognition’. Quarterly Journal of Experimental
Psychology 46(2): 225–245.
Tanaka, J. and J. Sengco (1997). ‘Features and their Configuration in Face Recognition’. Memory &
Cognition 25(5): 583–592.
Taylor, D. A. (1976). ‘Stage Analysis of Reaction Time’. Psychological Bulletin 83: 161–191.
Townsend, J. T. (1974). ‘Issues and Models Concerning the Processing of a Finite Number of Inputs’. In
Human Information Processing: Tutorials in Performance and Cognition, edited by B. H. Kantowitz,
pp. 133–168. Hillsdale, NJ: Erlbaum.
Townsend, J. T. and F. G. Ashby (1983). Stochastic Modeling of Elementary Psychological Processes.
Cambridge: Cambridge University Press.
Townsend, J. T. and G. Nozawa (1988). ‘Strong Evidence for Parallel Processing with Dot Stimuli’. Paper
presented at the 29th Meeting of the Psychonomic Society, November, Chicago.
Townsend, J. T. and R. D. Thomas (1994). ‘Stochastic Dependencies in Parallel and Serial Models: Effects
on Systems Factorial Interactions’. Journal of Mathematical Psychology 38: 1–34.
Townsend, J. T. and G. Nozawa (1995). ‘On the Spatio-temporal Properties of Elementary Perception:
An Investigation of Parallel, Serial, and Coactive Theories’. Journal of Mathematical Psychology
39: 321–359.
Townsend, J. T. and M. J. Wenger (2004a). ‘The Serial-parallel Dilemma: A Case Study in a Linkage of
Theory and Method’. Psychonomic Bulletin & Review 11: 391–418.
Townsend, J. T. and M. J. Wenger (2004b). ‘A Theory of Interactive Parallel Processing: New Capacity
Measures and Predictions for a Response Time Inequality Series’. Psychological Review 111: 1003–1035.
Townsend, J. T. and N. Altieri (2012). ‘An Accuracy/Response Time Capacity Assessment Function that
Measures Performance against Standard Parallel Predictions’. Psychological Review 119(3): 500–516.
Townsend, J. T. and J. W. Houpt (2012). ‘A New Perspective on Visual Word Processing Efficiency’. In
Proceedings of Fechner Day 28: 91–96.
Townsend J. T., J. W. Houpt, and N. D. Silbert (2012). ‘General Recognition Theory Extended to Include
Response Times: Predictions for a Class of Parallel Systems’. Journal of Mathematical Psychology
56(6): 476–94.
van Leeuwen, C. and T. Lachmann (2004). ‘Negative and Positive Congruence Effects in Letters and
Shapes’. Attention, Perception, and Psychophysics 66(6): 908–925.
Wenger, M. J. and J. T. Townsend (2001). ‘Faces as Gestalt Stimuli: Process Characteristics’. In
Computational, Geometric, and Process Perspectives on Facial Cognition, edited by M. J. Wenger and
J. T. Townsend, pp. 229–284. Mahwah, NJ: Erlbaum.
Wenger, M. J. and E. M. Ingvalson (2002). ‘A Decisional Component of Holistic Encoding’. Journal of
Experimental Psychology: Learning, Memory, and Cognition 28: 872–892.
Wenger, M. J. and E. M. Ingvalson (2003). ‘Preserving Informational Separability and Violating Decisional
Separability in Facial Perception and Recognition’. Journal of Experimental Psychology: Learning,
Memory, and Cognition 29: 1106–1118.
Wenger, M. J. and J. T. Townsend (2006). ‘On the Costs and Benefits of Faces and Words’. Journal of
Experimental Psychology: Human Perception and Performance 32: 755–779.
Wenger, M. J., S. Negash, R. C. Petersen, and L. Petersen (2010). ‘Modeling and Estimating Recall
Processing Capacity: Sensitivity and Diagnostic Utility in Application to Mild Cognitive Impairment’.
Journal of Mathematical Psychology 54: 73–89.
Chapter 47

Hierarchical stages or emergence in perceptual integration?

Cees van Leeuwen

Visual hierarchy gives straightforward but unsatisfactory answers

Ever since Hubel and Wiesel’s (1959) seminal investigations of primary visual cortex (V1),
researchers have overwhelmingly been studying visual perception from a hierarchical perspective
on information processing. The visual input signal proceeds from the retina through the Lateral
Geniculate Nuclei (LGN), to reach the neurons in primary visual cortex. Their classical recep-
tive fields, i.e. the stimulation these neurons maximally respond to, are mainly local (approx. 1
degree of visual angle in cat) orientation-selective transitions in luminance, i.e. static contours or
perpendicular contour movement. Lateral connections between these neurons were disregarded
or were understood mainly to be inhibitory and contrast-sharpening, and thus receptive fields
were construed as largely context-independent. Thus the receptive fields provided the low-level
features that form the basis of mid- and high-level visual information processing.
Hubel and Wiesel (1974) found the basic features to be systematically laid out in an orientation
preference map. This map, together with those of other features such as color, form, location, and spatial frequency, suggests combinatorial optimization; for instance, iso-orientation gradients on the orientation map are orthogonal to iso-frequency gradients (Nauhaus et al. 2012). Such systematicity may
be adaptive for projecting a multi-dimensional feature space onto an essentially two-dimensional
sheet of cortical tissue.
Whereas the basic features are all essentially separate, they are usually not part of our visual experience. The properties we experience are usually integral: they emerge from relationships between basic features. (More about them in the section on Garner interference. Garner distinguished integral and configural dimensions, a distinction that need not concern us here; see also Townsend and Wenger, this volume.) From the initial mosaic of features, in order to achieve an
integral visual representation, visually-evoked activity continues its journey through a hierarchical
progression of regions. Felleman and Van Essen (1991) already distinguished ten levels of cortical processing; fourteen if the retina and LGN at the front end, and the entorhinal cortex and hippocampus at the top end, are also taken into account. One visual pathway goes through V2 and
V4 to areas of the inferior temporal cortex: posterior inferotemporal (PIT), central inferotemporal
(CIT), and anterior inferotemporal (AIT): the ventral stream; the other stream branches off after
V2/V3 (Livingstone and Hubel 1988): the dorsal stream. For perceptual organization the primary
focus has typically been the ventral stream; this is where researchers situate the grouping of early
features into tentative structures (Nikolaev and van Leeuwen 2004); from which higher up in the
hierarchy whole, recognizable objects are construed.
970 van Leeuwen

The visual hierarchy achieves integral representation through convergence. LGN neurons are not selective for orientation; obtaining this feature in V1 requires the output of several LGN neurons to converge on V1 simple cells. Besides simple cells, complex cells were distinguished, whose receptive fields are larger and more distinctive; Hubel and Wiesel proposed them to be the result of output from several simple cells converging on a complex cell. Convergence is
understood to continue along the ventral stream (Kastner et al. 2001), leading to receptive field
properties not available at lower level (Hubel and Wiesel 1998): e.g. a representation in V4 is based
on convex and concave curvature (Carlson et al. 2011). Correspondingly, these representations
are becoming increasingly abstract; e.g. curvature representations in macaque V4 are invariant
against color changes (Bushnell and Pasupathy 2011). Also, the populations of neurons that carry
the representations become increasingly sparse (Carlson et al. 2011).
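A toy computation makes the convergence principle concrete. The sketch below is our own illustration (the subunit layout, filter widths, and names are invented, not a model from the cited studies): summing the outputs of a few center-surround, LGN-like filters whose centers are aligned along a vertical line yields a unit that prefers a bar of the matching orientation.

```python
import numpy as np

SIZE = 41
ys, xs = np.mgrid[0:SIZE, 0:SIZE].astype(float)

def dog(cx, cy, s_center=1.0, s_surround=2.0):
    # center-surround receptive field (difference of Gaussians), LGN-like
    r2 = (xs - cx) ** 2 + (ys - cy) ** 2
    center = np.exp(-r2 / (2 * s_center**2)) / (2 * np.pi * s_center**2)
    surround = np.exp(-r2 / (2 * s_surround**2)) / (2 * np.pi * s_surround**2)
    return center - surround

# five LGN-like subunits whose centers are aligned along a vertical line
subunits = [dog(20.0, cy) for cy in (12.0, 16.0, 20.0, 24.0, 28.0)]

def simple_cell(image):
    # convergence: half-rectified sum of the subunit outputs
    return max(sum(float(np.sum(f * image)) for f in subunits), 0.0)

def bar(theta_deg, half_width=1.5):
    # a bright bar through the image centre, at orientation theta
    th = np.deg2rad(theta_deg)
    dist = (xs - 20.0) * np.cos(th) + (ys - 20.0) * np.sin(th)
    return (np.abs(dist) < half_width).astype(float)

r_pref = simple_cell(bar(0.0))   # vertical bar, aligned with the subunits
r_orth = simple_cell(bar(90.0))  # horizontal bar
```

A complex cell could be sketched the same way, by pooling several such simple cells at neighbouring positions.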
The higher up, the more the representations become integral and abstract, i.e. invariant under
perturbations such as location or viewpoint changes (Nakatani et  al. 2002) or occlusion (e.g.
Plomp et al. 2006). Among individual neurons of macaque inferotemporal cortex (Tanaka et al. 1991), some respond specifically to whole, structured objects such as faces or hands, but most are more responsive to simplified objects. These cells provide higher-order features with more or less position- and orientation-invariant representation. The ‘more or less’
is added because the classes of stimuli these neurons respond to vary widely; some are orienta-
tion invariant, some are not; some are invariant with respect to contrast polarity, some are not.
Collectively, neurons in temporal areas represent objects by using a variety of combinations of
active and inactive columns for individual features (Tsunoda et al. 2001). They are organized in
spots, also known as columns, that are activated by the same stimulus. Some researchers proposed that these columns constitute a map whose dimensions represent some abstract parameters of object space (Op de Beeck et al. 2001). Whether or not this proposal holds, it remains true
that realistic objects at this level are coded in a sparse and distributed population (Quiroga et al.
2008; Young and Yamane 1992).
In the psychological literature, the hierarchical approach to the visual system has found a func-
tional expression early on in the influential work of Neisser (1967), who identified the hierarchical
levels with stages of processing. Although Neisser retracted much of these views in subsequent work (Neisser 1976), these early ideas have remained remarkably persistent amongst psychologists. Most today acknowledge hierarchical stages in perception, albeit ones that are ordered as cascades rather than strictly sequentially. Neisser (1967) regarded the early stages of perception as automatic and the later ones as attentional. This notion has been elaborated by Anne Treisman, mostly in
visual search experiments. Treisman and Gelade (1980) showed that visual detection of target ele-
ments in a field of distracters is easy when the target is distinguished by a single basic feature.
When, however, a conjunction of features is needed to identify a target, search is slow and difficult. Presumably, this is because attention is deployed by visiting the spatial location of each item, one by one. Treisman concluded that spatially selective attention is needed for feature integration.
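The two search regimes can be caricatured in a few lines of code. This is a didactic sketch under invented assumptions (the time constants and the strictly serial scan are ours, not parameters from Treisman and Gelade 1980): feature search is flat in set size, while conjunction search pays a per-item cost.

```python
import random

def search_time(set_size, conjunction_target, t_base=0.4, t_item=0.05):
    """Toy search-time model: feature targets 'pop out' (flat in set size);
    conjunction targets require visiting items one by one with attention."""
    if not conjunction_target:
        return t_base                      # parallel pop-out: no set-size cost
    visits = random.randint(1, set_size)   # serial scan until the target is hit
    return t_base + t_item * visits

random.seed(1)
trials = 5_000
mean_rt = lambda n, conj: sum(search_time(n, conj) for _ in range(trials)) / trials
feature_slope = mean_rt(24, False) - mean_rt(4, False)    # roughly zero
conjunction_slope = mean_rt(24, True) - mean_rt(4, True)  # clearly positive
```

The positive set-size slope for conjunctions is the classic empirical signature of serial attentional scanning.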
However, regardless of whether a basic feature identifies the target, the ease of finding it
amongst non-targets depends on their homogeneity (Duncan and Humphreys 1989); search for
conjunctions of basic features need not involve spatial selection, as long as these conjunctions
result in the emergence of a higher-order, integral feature that is salient enough (Nakayama and
Silverman 1986; Treisman and Sato 1990; Wolfe et al. 1988). We will come back to this notion
shortly. For now we may consider salience as the product of competition amongst target and dis-
tracter features, positively biased for relevant target features (Desimone and Duncan 1995) and/
or negatively biased for nontarget features, including the target’s own components (Rensink and
Enns 1995).
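This competitive notion of salience admits a minimal formal reading. The sketch below is entirely our own simplification of biased competition (not a model from Desimone and Duncan 1995): bottom-up feature drives are multiplied by top-down biases and then divisively normalized, so a positively biased target feature gains salience at the expense of negatively biased ones.

```python
import numpy as np

def salience(drive, bias):
    # competition as divisive normalization: each feature's biased drive
    # is weighed against the summed biased drive of all features
    g = np.asarray(drive, dtype=float) * np.asarray(bias, dtype=float)
    return g / g.sum()

drive = np.array([1.0, 1.0, 1.0, 1.0])   # equal bottom-up drive
bias = np.array([2.0, 1.0, 1.0, 0.5])    # favour the target feature,
                                         # suppress the target's own components
s = salience(drive, bias)
```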
Hierarchical Stages or Emergence in Perceptual Integration? 971

Rapid detection of conjunctions could, in principle, be explained by strategic selection of a
higher-order feature map—but since in natural scenes, rather complex features including entire
3D objects could be efficiently searched (Enns and Rensink 1990), this would require an arbi-
trary number of feature maps. These being unavailable, complex detection in this approach must
be restricted to higher, object-level representations of the world (Duncan and Humphreys 1989;
Egeth and Yantis 1997; Kahneman and Henik 1981).
To enable complex detection at the highest levels of processing, according to the hierarchi-
cal approach, it is required that widely spread visual information, including that from different
regions along the ventral and dorsal pathways, converges on specific areas. Candidate regions are
those that receive information from multiple modalities, such as the Lateral Occipital Complex
(LOC). Neural representations here are found to be specific to combinations of dorsal and ventral
stream information, e.g. neurons have been found in LOC that are selective for graspable visual
objects over faces or houses (Amedi et al. 2002). A subset of these convergence areas may enable
conscious access to visual representations (Dehaene et al. 2006), in other words: be held responsi-
ble for the content of our visual experience.

Unresolved problems in the hierarchical approach


Contrary to the hierarchical approach, in which visual consciousness ‘reads out’ the visual information at the highest levels, Gaetano Kanizsa (1994) and earlier Gestaltists warned against such an ‘error of expectancy’: the hierarchical view of perception misleads us about why objects look the way they do. It mistakes the content of perception for our reasoning about these contents.
The latter is informed by instruction, background knowledge and our inferences. But consider
Figure 47.1. What it tells us is that the highest level is not always in control of our experience.
While discussing visual search, we have already encountered the concept of ‘salience’. Here,
again, we might want to say that the perceptual content is salient; it ‘pops out’ and automatically
grabs our attention in a way similar to a loud noise or a high-contrast moving stimulus. But
such notions are question-begging. For explaining why something pops out, we rely on common
sense. A loud noise pops out because it is loud. But what is it about Figure 47.1? We might want

Fig. 47.1  Popping out or popping in? Seeing is not always believing. <http://illutionista.blogspot.
be/2011/07/eating-hand-illusion-punching-face.html.>

to say that the event is salient, because it is unlikely. Recall, however, that we are then drawing on
precisely the kind of knowledge and inferences that would prevent us from seeing what we are
actually seeing here. We might say the event is salient, because mid-level vision is producing an
unusual output. This requires conscious awareness to have access to the mid-level representations,
in which, according to Wolfe and Cave (1999), targets and non-targets consist of loosely bundled
collections of features. But as far as this level is concerned, there is nothing unusual about the
scene; it is just a few bundles of surfaces, some of which are partially occluded. The event is salient
because it seems a fist is being swallowed. This illusion, therefore, is taking the notion of popping
out to the extreme: what is supposedly popping out is actually popping in.
All things considered, perhaps perception scientists have focused too exclusively on the hier-
archical approach. In fact, from a neuroscience point of view the hierarchical picture is not that
clear-cut. On the one hand, hierarchy seems not always necessary: single cells have been found in V1 that code for grouping and are, for example, sensitive to occlusion information (Sugita 1999). On
the other hand, neurons selective for specific grouping configurations, irrespective of the sensory
characteristics of their components, occur outside of the ventral stream hierarchy, in macaque
lateral intraparietal sulcus (LIP) (Yokoi and Komatsu 2009). The LIP belongs to the dorsal stream
or ‘where’ system, for processing location and/or action-relevant information (Ungerleider and
Mishkin 1982; Goodale and Milner 1992), and is associated with attention and saccade targeting.
Using fMRI, areas of both the ventral and dorsal stream showed object-selectivity; in intermediate
processing areas these representations were viewpoint and size specific, whereas in higher areas
they were viewpoint-independent (Konen and Kastner 2008). Generally speaking, it is not surpris-
ing that the ‘where’ system is involved in perceptual grouping. Consider, for instance, grouping by
proximity, which is primarily an issue of ‘where’ the components are localized in space (Gepshtein
and Kubovy 2007; Kubovy et al. 1998; Nikolaev et al. 2008). These observations might suggest that
hierarchy does not adequately characterize the distribution of labor in visual processing areas.

Approaches opposing the hierarchical view


Some perceptual theorists and experimenters have long rebelled against the hierarchical view: German ‘Ganzheitspsychologie’, Gestalt psychology, and Gibsonian ecological realism.
All these approaches have sought to downplay the basic role of the mosaic of isolated local fea-
tures, arguing from a variety of perspectives that basic visual information consists of holistic prop-
erties. Consider what Koffka addressed as ‘der Aufforderungscharakter der Stimuli’ and Gibson with the apparently cognate notion of ‘affordance’, both emphasizing that perception is dynamic in nature and samples over time the higher-order characteristics of the surrounding environment. Gibsonians considered the visual system to be naturally or adaptively ‘tuned’ to this information. Gestaltists considered it to be the product of a creative synthesis, guided by the valuation of
the whole, for which sensory stimulation offers no more than boundary conditions. In Gestalt
psychology, this valuation was summarized under the notion of Prägnanz, meaning goodness of
the figure. ‘Ganzheitspsychologie’ (Sander, Krüger) regarded early perception to originate in the
perceiver’s emotions, body, and behavioral dispositions. Shape characteristics like ‘roundedness’ and ‘spikiness’ provide a context for further differentiation based on sensory information.
These approaches claimed to have principled answers to why we see the world the way we do
(Gestalt) or why we base our actions around certain properties (Gibson). However, they have
left the mosaic of featural representations an uncontested neurophysiological reality. Without an
account of how holistic properties could arise in the visual system, all this talk has to remain
question-begging.

Integral properties challenge the hierarchical model


Studies aiming to establish holistic perception early in the visual system have focused on integral properties. The prevalence of such properties in perception is confirmed in psychological studies of the configural superiority effect (Pomerantz et al. 1977; see also Pomerantz and Cragin, this volume). These authors found, for instance, that ‘()’ and ‘((’, despite the presence of an extra, redundant parenthesis, were easier to distinguish from each other than ‘)’ and ‘(’. Kimchi and
Bloch (1998) showed that whereas classification of two curved lines together or two straight lines
together was easier than classifying mixtures of the two, the opposite occurred when the items
formed configurations, e.g. a pattern of two straight lines is extremely difficult to distinguish from
a pattern of two curved lines, if the two have a similar global configuration (e.g. both look like
an X-shape), whereas mixtures that differ in their configuration e.g. ‘X’ vs ‘()’ are extremely easy.
Thus, notwithstanding the hierarchical view, ‘how things look’ matters in what is easy to perceive.
How could it possibly be that these integral properties are present in early perception? After all,
they are supposedly built out of elementary features. We should distinguish, however, between
how we construe them and what is prior in processing. For constructing closure ‘()’ you need
‘(’ and ‘)’. But that doesn’t mean that, when ‘()’ is presented in a scene, you detect this by first
analyzing ‘(’ and ‘)’ separately and then putting them together. You could begin, for instance by
fitting a closure ‘O’ template to it, before segmenting the two halves. In that case you would have
detected closure before seeing the ‘(’ and the ‘)’. Of course, a problem is that the number of possible templates explodes. Perception can only operate with a limited number. How are they
determined?
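The template idea can be made concrete in a few lines. In this toy sketch (the images, parameters, and names are all our own invention), a single normalized correlation with a closed ‘O’ template signals closure directly, without first representing ‘(’ and ‘)’ as separate parts:

```python
import numpy as np

SIZE = 21
ys, xs = np.mgrid[0:SIZE, 0:SIZE].astype(float)

def ring(radius=6.0, thickness=1.5):
    # a closed 'O' contour centred in the image
    d = np.hypot(xs - SIZE // 2, ys - SIZE // 2)
    return (np.abs(d - radius) < thickness).astype(float)

def match(image, template):
    # normalized cross-correlation: 1.0 for a perfect closure fit
    a = image - image.mean()
    b = template - template.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

o_template = ring()
closed = ring()                            # '()': the two halves close
open_figure = ring() * (xs <= SIZE // 2)   # a '((' stand-in: left half only

fit_closed = match(closed, o_template)
fit_open = match(open_figure, o_template)
```

The closed figure fits the template perfectly, while the open figure yields only a partial match; closure is thereby detected in one step, prior to any part decomposition.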

Reverse Hierarchy Theory


One way in which this process could be understood is Reverse Hierarchy Theory; Hochstein and Ahissar (2002), for instance, believed that a crucial part of perception is top-down activity. In this
view, high-level object representations are pre-activated, and selected based on the extent they
fit with the lower level information. Rather than being inert until external information comes
in, the brain is actively anticipating visual stimulation. This state of the brain implies that prior
experience and expectancies may bias visual perception (Hochstein and Ahissar 2002; Lee and
Mumford 2003). Top-down connections would, in principle, effectuate such feedback from higher
levels. Feedback activity might be able to make contact with the incoming lower-level informa-
tion, at any level needed, selectively enhancing and repressing certain activity patterns every-
where in a coordinated fashion, and thus configure lower-order templates on the fly.
This certainly sounds attractive as it would make sense of the abundant top-down connectiv-
ity between the areas of the visual system, but on the other hand it also has the ring of wishful
thinking. Recall that the brain does not have room for indefinite numbers of higher-order feature maps. How does the higher-level system know which lower-level neural subpopulation to selectively activate? Treisman and Gelade (1980) at least provided a partial solution to this problem,
by making selection a matter of spatially focused attention. Only items in the limited focus of
attention are effectively integrated. Spatial selectivity is easy to project downward from areas such as LIP, since all downward areas preserve to some degree the spatial coordinates of the visual field (ignoring the complication of trans-saccadic remapping of receptive fields during eye movements; e.g. Melcher and Colby 2008). On the other hand, the problem of how integration is achieved
is not resolved merely by restricting it to a small spatial window. There are, moreover, a host of
other forms of attentional selectivity besides purely spatial ones, such as object driven and divided
attention, that pose greater selection problems.
A modern, neurally-informed version of Treisman’s approach is found in Roelfsema (2006), which
distinguishes between base and incremental grouping. Base grouping is easy; it can be done through
a feedforward sweep of activity converging on single neurons. Grouping is hard, for instance, in
the presence of nearby and/or similar distracter information. Incremental grouping is performed, according to this account, through top-down feedback, all the way down to V1 (Roelfsema et al.
1998). This, however, is a slow process that depends on the spreading of an attentional gradi-
ent through the early visual system, by way of a mechanism such as synchronous oscillations or
enhanced neuronal activity (Roelfsema 2006). Neurons in macaque V1, for instance, responded
more strongly to texture elements belonging to a figure defined by texture boundaries than to ele-
ments belonging to a background (Roelfsema et al. 1998; Lamme et al. 1998; Zipser et al. 1996). Yet
this mechanism remains too slow to establish perceptual organization in the real-time processing
of stimuli of realistic complexity. Whereas, as we will discuss, perceptual organization in complex
stimuli arises within 60 ms (Nikolaev and van Leeuwen 2004), attentional effects in humans have
onset latencies in the order of 100 ms (Hillyard et al. 1998), and this is before recurrent feedback
even begins to spread.
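The slowness of incremental grouping can be illustrated with a toy label-spreading sketch (not Roelfsema's actual model): an enhancement label starts at an attended seed element and spreads, one ring per iteration, only to connected elements of the same figure, so large or convoluted figures take more iterations. The grid and 4-connectivity rule are invented for illustration.

```python
# Toy illustration of incremental grouping as iterative label spreading:
# '1' cells form figures on a background of '0'; enhancement spreads from
# a seed only to 4-connected figure cells, one ring per iteration.
grid = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]

def incremental_group(grid, seed):
    enhanced = {seed}
    frontier = {seed}
    steps = 0
    while frontier:
        nxt = set()
        for r, c in frontier:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < len(grid) and 0 <= cc < len(grid[0])
                        and grid[rr][cc] == 1 and (rr, cc) not in enhanced):
                    enhanced.add((rr, cc))
                    nxt.add((rr, cc))
        frontier = nxt
        steps += 1
    return enhanced, steps

group, steps = incremental_group(grid, seed=(0, 1))
print(len(group), steps)  # only the seed's figure is enhanced; the other is not
```

The number of iterations grows with the extent of the attended figure, which is the toy analogue of why incremental grouping is slow relative to a single feedforward sweep.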

Predictive Coding
According to Murray (2008), we must take care to distinguish effects of attention that are
pattern-specific from non-specific shifts in the baseline firing rates of neurons. Baseline shifts can
strengthen or weaken a given lower-level signal and can selectively affect a certain brain region, independently of what is represented there; attention can raise the firing rates of neurons even when no stimulus is present in the receptive field (Luck et al. 1997).
Moreover, reductions in activity have also been reported as a result of attention allocation (Corthout and Supèr 2004). Possibly, this top-down effect could be understood as predictive coding: this notion proposes that inferences of high-level areas are compared with incoming sensory information in lower areas through cortical feedback, and the error between them is minimized by modifying the neural activities (Rao and Ballard 1999). Using fMRI, Murray et al. (2002) found that when elements were grouped into objects, as opposed to randomly arranged, activity increased in higher areas, in particular the lateral occipital complex, whereas activity in primary visual cortex was reduced. This observation suggests that activity in early visual areas may be reduced
as a result of grouping processes in higher areas. Reduced activity in early visual areas, as measured by fMRI, was shown to indicate a reduction of visual sensitivity (Hesselmann et al. 2010), presumably due to these processes.
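The error-minimization idea can be caricatured in a few lines. In this sketch (a toy in the spirit of, not a reproduction of, Rao and Ballard's model), a fixed generative matrix U stands in for feedback weights, and the higher-level cause estimate r is adjusted by gradient descent on the lower-level prediction error; dimensions, noise level, and learning rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A higher level holds cause estimates r; a fixed generative matrix U maps
# them to a predicted lower-level input U @ r. Feedback carries the
# prediction down, the residual error is carried up, and r is adjusted
# to minimize that error. All sizes here are illustrative choices.
n_input, n_causes = 16, 4
U = rng.normal(size=(n_input, n_causes))               # feedback (generative) weights
true_causes = np.array([1.0, -0.5, 0.25, 0.0])
x = U @ true_causes + 0.01 * rng.normal(size=n_input)  # noisy sensory input

r = np.zeros(n_causes)                                 # higher-level activity
lr = 0.01
for _ in range(500):
    error = x - U @ r                                  # prediction error at the lower level
    r += lr * U.T @ error                              # feedback-guided update of the causes

print("residual error:", round(float(np.linalg.norm(x - U @ r)), 4))
```

As r converges, the lower-level error signal shrinks toward the noise floor, which is one way to read the fMRI finding above: successful grouping at higher levels leaves less residual activity to be expressed in early areas.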
Reduction of activity has also been given the opposite interpretation: Kok et al. (2012) found that the reduction corresponded to a sharpening of sensory representations. Sharpening is understood as top-down suppression of neural responses that are inconsistent with the current expectations. These results suggest an active pruning of neural representations; in other words, active expectation makes representations increasingly sparse. Then again, multi-unit recording studies in ferrets and rats have provided evidence against such active sparsification in visual cortex (Berkes et al. 2009).
Overall, we may conclude that top-down effects on early visual perception are both ubiquitous and varied, sufficiently so to accommodate contradicting theories; top-down effects may selectively or non-selectively increase or decrease firing rates, and change the tuning properties of neurons, including receptive field locations and sizes. Some of these effects may be predictive; perception does not begin when the light hits the retina. None of these mechanisms, however, is fast enough to enable the rapid detection of complex object properties that configural superiority requires.
Hierarchical Stages or Emergence in Perceptual Integration? 975

Intrinsic generation of holistic representation


Let us therefore consider the possibility of intrinsic holism: the view that the visual system has
an intrinsic tendency to produce coherent patterns of activity from the visual input. Already at
the level of early processing, in particular V1, intrinsic mechanisms for generating global struc-
ture may exist. Conversely, some ‘basic’ grouping might occur at the higher levels. Gilaie-Dotan
et al. (2009) offer a case in point. They observed a patient with severely deactivated mid-level
visual areas (V2-V4). The patient lacked the specific, dedicated function of these areas: ‘look-
ing at objects further than about 4 m, I can see the parts but I cannot see them integrated as
coherent objects, which I could recognize; however, closer objects I can identify if they are not
obstructed; sometimes I  can see coherent integrated objects without being able to figure out
what these objects are’ (p. 1690). In addition, face perception was severely impaired. Nevertheless, the patient was capable of near-normal everyday behavior. Most interestingly, higher areas in this patient were selectively activated for object categories like houses and places. This suggests that activity in higher-order brain regions is not driven by lower-order activity, but that higher-level representations are ‘. . . generated ‘de novo’ by local amplification processes in each cortical region’ (p. 1700).

Early higher-order features


Some response properties of V1 neurons are suggestive of the power of early, intrinsic holism.
I  mentioned Sugita’s (1999) occlusion-selective V1 neurons. Moreover, some V1 neurons will
respond with precise timing to a line ‘passing over’ their RFs even when the RF and surround are
masked (Fiorani et al. 1992). Neurons in V1 and V2 have been observed to respond to complex
contours, such as illusory boundaries (Grosof et al. 1993; Von der Heydt and Peterhans 1989).
Contours can, in principle, be defined not only by simple luminance edges but also by more complex cues, such as texture (Kastner et al. 2001). Texture-defined boundaries as found in V1 defy the hierarchical model, as they are complex by definition. Kastner et al. (2001) showed that texture-based segregation can be found in the human visual cortex using fMRI. Line textures activated area V1, as well as V2/VP, V4, TEO, and V3A, as compared with blank presentations.
Kastner et  al. (2001) also observed that texture checkerboard patterns evoked more activity,
relative to uniform textures, in area V4 but not in V1 or V2. This means that here we have a
later area being involved in processes typically believed to occur earlier—the early areas respond
strongly to normal checkerboards of similar dimensions. Perhaps larger spatial-scale receptive field sizes than V1 or V2 could provide were needed here; or perhaps these early areas lack the specific long-range connections for texture boundaries. We may, therefore, propose that integration occurs
within each level subject to restrictions given by the layout of receptive fields and the nature of
their intrinsic connections.

Contextual modulation
Neurons in primary visual cortex (V1) respond differently to a simple visual element when it is
presented in isolation from when it is embedded within a complex image (Das and Gilbert 1995).
Beyond their classical receptive field, there is a surround region; its diameter is estimated to be
at least 2–5 times larger than the classical receptive field (Fitzpatrick 2000). Stimulation of this
region can cause both inhibition and facilitation of a cell’s responses, and modification of its RF
(Blakemore and Tobin 1972), spatial summation of low-contrast stimuli (Kapadia et  al. 1995),
and cross-orientation modulation (Das and Gilbert 1999; Khoe et al. 2004). Khoe et al. (2004)
studied detection thresholds for low-contrast Gabor patches, in combination with event-related
potentials (ERP) analyses of brain activity. Detection sensitivity increases for such stimuli when
flanked by other patches in collinear orientation, as compared to ones in the orthogonal orientation. Collinear stimuli gave rise to an increased ERP response between 80 and 140 ms from stimulus onset, centered on the midline occipital scalp, which could be traced to primary visual cortex.
Such interactions are thought to depend on local excitatory connections between cells in V1
(Kapadia et al. 1995; Polat et al. 1998).
Das and Gilbert (1999) showed that the strength of these connections declines gradually with
cortical distance in a manner that is largely radially symmetrical and relatively independent of
orientation preferences. Contextual influence of flanking visual stimuli varies systematically with
a neuron’s position within the cortical orientation map. The spread of connections could provide
neurons with a graded specialization for processing angular visual features such as corners and
T junctions. This means that already at the level of V1, complex features can be detected. In particular, T-junctions are an important cue that an object is partially hidden behind an occluder, in accordance with the observation that occlusion is detected early in perception (see Kogo and
van Ee, this volume). According to Das and Gilbert (1999), these features could have their own
higher-order maps in V1, linked with the orientation map. In other words, higher-order maps
thought to belong to mid-level may be found already in early visual areas.

Long-range contextual modulation


An important further mechanism of early holism could be found in the way feature maps in V1
are linked beyond the surround region (see Alexander and van Leeuwen 2010 for a review). Long-range connectivity enables modulation of activity by stimuli well beyond the classical RF and its
immediate surround. In contrast with short-range connections, long-range intrinsic connections
are excitatory, and link patchy regions with similar response properties (Malach et al. 1993; Lund
et al. 1993). Traditionally, the function of these long-range connections has been understood to
be assembling the estimates of local orientation (within columns) into long curves. These connections may have other possible roles as well, such as representing texture flows: patterns of multiple locally near-parallel curves, such as zebra stripes (Ben-Shahar et al. 2003). Texture flows are more than individual parallel curves; the flow is distributed across a region; consider, for instance, the ‘flow’ patterns that can be observed in animal fur. The perception of contour flow enables the segregation of complex textures (Das and Gilbert 1995) and the perception of curvature (Ben-Shahar and Zucker 2004). Whereas this information is available early, it is emphasized in later processing areas. In
V4, for instance, shape contours are collectively represented in terms of the minima and maxima
in curvature they contain (Feldman and Singh 2005).
From the survey of neural representation, we may conclude that the necessary architecture for
early holism is available already at the level of V1. If so, what to make of the empirical evidence
for convergence and the increasingly sparse representations in mid-level and higher visual areas?
Sparsification may be a way to establish selectivity dynamically (e.g. Lörincz et al. 2002). Now
consider that basically all evidence for sparsification comes from animal studies. Training requires animals to spend months of exposure to the same, restricted set of configurations. In other
words, their representations will have been extremely sparsified. How much this encoding resem-
bles what arises in more natural conditions remains unknown. Here, I have made efforts to show
that the two need not be too similar.

Time course of contextual modulation


Early holism could be achieved through the spreading of activity via these lateral connections.
Accordingly, the response properties of many cells in V1 are not static, but develop over time.
In V1, and more predominantly in the adjacent area V2, Zhou et al. (2000) and Qiu and von der
Heydt (2005) observed, in macaque, neurons sensitive to border-ownership assignment. One neuron will
fire if the figure is on one side of an edge, but will remain silent and another will fire instead if the
figure is on the other side of the edge. These distinctions are made as early as 30 ms after stimulus
onset. Thus, even receptive fields in early areas such as V1 are sensitive to context almost instan-
taneously after a stimulus onset.
In the input layers (4C) of V1, neurons reach a peak in orientation selectivity with a latency of 30–45 ms, persisting for 40–85 ms (macaque). The output layers (layers 2, 3, 4B, 5, or 6), however, show a development in selectivity, in which neurons often show several different peaks. This could be understood in terms of the wide-range lateral inhibition needed for a high level of orientation selectivity in V1 (Ringach et al. 1997), but also, I should add, as a result of modulation from
long-range connections within V1. Along with the architecture of neural connectivity, the dynamics
provides the machinery for early holism, through spreading of activity within the early visual areas.
Due to activation spreading, the activity of cells, regions, and systems in early visual areas shows increasing context-dependency over time. Around 60 ms from stimulus onset
the activity of neurons in V1 becomes dependent on that of their neighbors through horizontal
connections (in the same neuronal layer), for instance the interactions of oriented contour seg-
ments through local association fields (Kapadia et al. 1995; Polat et al. 1998; Bauer and Heinze
2002). These effects can be observed in human scalp EEG: the earliest ERP component C1—which
peaks at 60–90 ms after stimulus onset—is not affected by attention (Clark et al. 2004; Martinez
et al. 1999; Di Russo et al. 2003), although the later portion of this component may reflect contribu-
tions from visual areas other than V1 (Foxe and Simpson 2002). The earliest attentional processes
in EEG reflect spatial attention. ERP studies (reviewed by Hillyard et al. 1998) showed that spatial
attention affects ERP components not earlier than about 90 ms after stimulus onset. The 80–100
ms latency is generally understood to be the earliest moment where attentional feedback kicks in.

Time course of attention deployment


According to the early holism proposal, in animal studies attentional modulation affects an already
organized activity pattern in V1—contra Treisman and Gelade (1980). This result has been contested
in studies with humans using EEG. Using high-density event-related brain potentials, Han et al.
(2001) compared grouping by proximity with grouping by similarity, relative to a uniform grouping
condition with static pictorial stimuli. They found that the time course and focus of activity of group-
ing by proximity and similarity differ. Proximity grouping gave rise to an early positivity (around 110
ms) in the medial occipital region in combination with an occipitoparietal negativity around 230 ms
in the right hemisphere. Similarity grouping showed a negativity around 340 ms, with a maximum
amplitude over the left occipitoparietal area. This is in accordance with Khoe et al. (2004), who found that the later effects of collinearity (latencies of 245–295 and 300–350 ms) occurred laterally, suggesting an origin in the LOC. With the criterion that beyond 100 ms processes in low-level vision are subject to feedback, Han et al. concluded that all processes involved in grouping are affected by attention.
Han et al. (2001) interpreted the early occipital activity as spatial parsing, and the subsequent
occipitoparietal negativity as suggesting the involvement of the ‘where’ system. In case of simi-
larity grouping, the late onset as well as the scalp distribution of the activity suggests that the
‘what’ system is mostly doing the grouping work. Hence the hemispheric asymmetry in both
processes:  left-side processing tends to be oriented towards substructures, which typically are
small-scale; right-hemisphere processing favors macro-structures, which are typically of larger
scale (Sergent 1982; Kitterle et al. 1992; Kenemans et al. 2000). Thus, that proximity grouping was centered on the right hemisphere and similarity grouping on the left reflects the fact that the former can be done on the basis of low spatial resolution information, whereas the latter requires a combination of low and high spatial resolution aspects of the stimuli. When low spatial frequency information was eliminated from the stimuli, left-hemisphere activity became dominant.
Even though for proximity, the locus of these effects seems early, the time course of perceptual
grouping might seem to confirm that it is attentionally driven. By varying the task, requiring spa-
tial attention to be narrowly or widely focused, it is possible to observe differences in perceptual
integration (Stins and van Leeuwen 1993). Han et al. (2005) varied the task by requiring detection of a target color either in the center of the stimulus or distributed across it. They measured the effects of this manipulation on evoked potentials. They found that all the
grouping-related evoked activity not only started later than 100 ms, but also depended on the task.
There are, however, earlier correlates of grouping in neural activity than the ones observed by Han et al. (2001, 2005). With the dot-lattice displays of Figure 47.2, Nikolaev et al. (2008) studied

Fig. 47.2  Dot lattices. The dots appear to group into strips. (a) The four most likely groupings are
labeled a, b, c, and d, with the inter-dot distance increasing from a to d. Perception of lattices
depends on their aspect ratio (AR), which is the ratio of two shortest inter-dot distances: along a
(the shortest) and b. When AR = 1.0, the organizations parallel to a and b are equally likely. When
AR > 1.0, the organization parallel to a is more likely than the organization parallel to b. These
phenomena are manifestations of grouping by proximity. (b) Dot lattices of four aspect ratios.
Reproduced from Experimental Brain Research, 186(1), pp. 107–122, Dissociation of early evoked cortical activity in
perceptual grouping, Andrey R. Nikolaev, Sergei Gepshtein, Michael Kubovy, and Cees van Leeuwen, DOI: 10.1007/
s00221-007-1214-7 Copyright (c) 2008, Springer-Verlag. With kind permission from Springer Science and Business Media.
grouping by proximity using a design based on a parametrized grouping strength. They found
an effect of proximity, more precisely of aspect ratio (AR, see Figure 47.2) on C1 in the medial
occipital region starting from 55 ms after onset of the stimulus. As mentioned, C1 is considered
the earliest evoked response of the primary visual cortex; it is usually registered in the central
occipital area 45–100 ms after stimulus presentation. This result suggests that C1 activity reflects
early spatial grouping. The early activity was higher in the right than the left hemisphere, consistent with Han et al.’s (2001) observation that low spatial frequencies are processed more in the right than the left hemisphere. Therefore, proximity grouping at this stage depends more on the low than the high spatial frequency content of visual stimuli.
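These parametric effects of proximity have a simple quantitative reading. Under the ‘pure distance law’ proposed for dot lattices by Kubovy and colleagues, the odds of organizing along b rather than a fall off exponentially with relative inter-dot distance; the sketch below assumes that form, considers only the a and b organizations, and uses an arbitrary sensitivity parameter s (which, like the observer sensitivity discussed below, varies per person).

```python
import math

def grouping_odds(ar: float, s: float = 3.0) -> float:
    """Odds of organizing the lattice along b rather than the shorter
    vector a, under an exponential 'pure distance law':
    p(b)/p(a) = exp(-s * (AR - 1)), with AR = |b|/|a| >= 1.
    The sensitivity s is an arbitrary illustrative choice."""
    return math.exp(-s * (ar - 1.0))

def p_grouping_along_a(ar: float, s: float = 3.0) -> float:
    """Probability of the organization parallel to a (ignoring c and d)."""
    return 1.0 / (1.0 + grouping_odds(ar, s))

for ar in (1.0, 1.1, 1.2, 1.3):
    print(f"AR={ar:.1f}: P(a) = {p_grouping_along_a(ar):.2f}")
```

At AR = 1.0 the two organizations come out equally likely, matching the caption of Figure 47.2, and the probability of grouping along a grows smoothly with AR, which is what makes the design a parametrized grouping strength.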
One of the reasons this result was not observed in Han et al. (2001) may have been that their
task never involved reporting grouping. In this respect it is interesting that in Nikolaev et al.
(2008) the amplitude of C1 depended on individual sensitivity to subtle differences in AR. The
more sensitive an observer, the better AR predicted the amplitude of C1. The absence of an effect of AR on C1 in observers with low grouping sensitivity was compensated for by an effect on the next peak: the P1 in posterior lateral occipital areas (without a clear asymmetry), which showed its earliest effect of proximity (AR) at 108 ms from stimulus onset, i.e. right at the onset of attentional feedback activity. The effect is present in all observers, but the trend is opposite to that of
C1, in that the lower the proximity sensitivity, the larger its effect on P1 amplitude. Thus, the two
events represent different aspects of perceptual grouping, with the transition between the two
taking place on the interval from 55 to 108 ms after stimulus onset. Perceptual grouping, therefore, may be regarded as a multistage process, consisting of early attention-independent processes and later attention-dependent processes, where the latter may compensate for the former if needed.

Traces of pre-attentional binding in attentional processes


Like context-sensitivity within areas, attention-based grouping also seems to involve spreading; in macaque V1, spatially selective attention spreads out over a period of approximately 300 ms from the focus of attention, following grouping criteria (Wannig et al. 2011). Attention spreads through modally, but not amodally, completed regions (Davis and Driver 1997), and attention spreading depends on whether object components are similar or connected (Baylis and Driver 1992). Attention spreads
even between visual hemifields. Kasai and Kondo (2007) and Kasai (2010) presented stimuli to
both hemifields, which were either connected or unconnected by a line. The task involved target
detection in one hemifield. Attention was reflected by a larger amplitude of ERP at occipitotem-
poral electrode sites in the contralateral hemisphere. These effects were revealed in ERP: first in
N1 (150–210 ms) and also in the subsequent N2pc (330/310–390 ms). The N1 component is
associated with orienting visuospatial attention to a task relevant stimulus (Luck et al. 1990) and
with enhancing the target signal (Mangun et al. 1993); the N2pc component is associated with spatial selection of target stimuli in visual search displays (Eimer 1996; Luck and Hillyard 1994) and
in particular with selecting task-relevant targets or suppression of their surrounding nontargets
(Eimer 1996). These effects were reduced by the presence of a connection between the two objects.
Thus, attention spreads mandatorily based on connectedness.
Attention involves already organized representations; attentional selection, therefore, can-
not prevent the intrusion of information that the early visual feature integration processes have
already tied up with the target. Effects of irrelevant features on selective attention can, therefore,
be interpreted as a sign that feature integration has taken place (cf. Mounts and Tomaselli 2005;
Pomerantz and Lockhead 1991). Two of its particular manifestations, incongruence effects (MacLeod 1991; van Leeuwen and Bakker 1995; Patching and Quinlan 2002) and Garner effects (Garner 1974, 1976, 1988), have had a crucial role in detecting feature integration in behavioral studies.
Incongruence effects involve the deterioration of the response to a target feature resulting from one or more incongruent but irrelevant other features presented on the same trial, as compared to a congruent feature.
1935) in which naming the ink color of a color-word is delayed if the color-word is different
(incongruent) from the color of the ink which has to be named (e.g. the word red printed in
green ink), as well as auditory versions (Hamers and Lambert 1972), the Eriksen flanker paradigm
(Eriksen and Eriksen 1974), tasks using individual faces and names (Egner and Hirsch 2005),
numerical values and physical sizes (Algom et  al. 1996), names of countries and their capitals
(Dishon-Berkovits and Algom 2000), and versions employing object—or shape-based stimuli
(Pomerantz et al. 1989; for a review: Marks 2004). These effects, therefore, are generic to different
levels of processing. Different Stroop-like tasks will involve a mixture of partially overlapping, and
partially distinct brain mechanisms (see, for instance, a recent meta-analysis in Nee et al. 2007).
Consider the stimuli in Figure 47.3. According to their contours, the stimuli on one diagonal are congruent and the ones on the other incongruent. Participants responding to whether the concave contour has a rectangular or triangular shape show an effect of congruency of the outer contour on response latencies and EEG. These effects imply that the concave and surrounding contour shapes have somehow become related in the representation of the figure.
Garner interference was named by Pomerantz (1983) after the work of Garner (1974, 1976, and
1988). Stimulus dimensions, such as brightness or saturation, are assumed to describe a stimulus
in a ‘feature space’ (Garner 1976). Dimensions are called separable if variation along the irrelevant
dimension results in the same performance as without variation. An example of separable dimensions is circle size and radius inclination (Garner and Felfoldy 1970). When variation of the stimuli along an irrelevant dimension of this space slows the response to the target, compared to when the irrelevant dimension is held constant, Garner called such dimensions integral, which means that they have been
integrated perceptually. Brightness and saturation are typically integral dimensions (Garner 1976).

G3L3 G3L4

G4L3 G4L4

Fig. 47.3  Stimuli composed of a larger outer contour (global feature G) and a smaller inner contour
(local feature L) which were either a triangular or rectangular in shape, yielding the congruent stimuli
G3L3, G4L4 and the incongruent ones: G3L4, G4L3.
Participants classified the figures as triangular or rectangular according to the shape of the inner contour. Reprinted
from NeuroImage, 45(4), Lars T. Boenke, Frank W. Ohl, Andrey R. Nikolaev, Thomas Lachmann, and Cees van
Leeuwen, Different time courses of Stroop and Garner effects in perception — An Event-Related Potentials Study, pp.
1272–1288, doi: 10.1016/j.neuroimage.2009.01.019 Copyright (c) 2009, with permission from Elsevier.
In one of his studies, for instance, Garner (1988) used the dimensions ‘letters’ and ‘color’. Letters
C and O were presented in green or red ink color. The task was to name the ink color, which varied
randomly in both letter conditions. Here, the irrelevant feature was associated with the ‘letters’
dimension. In the baseline condition, the letters ‘O’ or ‘C’ would occur in separate blocks; in the
filtering conditions they would be randomly intermixed. Irrelevant variation of the letters had
impact on the response to the color dimension, which implies that letter identity and color are
integral dimensions.
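The baseline/filtering logic can be made concrete in a small sketch; the response times below are fabricated for illustration, and Garner interference is simply the mean RT cost of the filtering blocks over the baseline blocks.

```python
from statistics import mean

# Hypothetical color-naming RTs (ms) for Garner's letter/color example.
# Baseline: the letter ('C' or 'O') is fixed within a block; filtering:
# letters are randomly intermixed while the task remains color naming.
baseline_rts  = [432, 418, 445, 427, 439, 421]
filtering_rts = [463, 471, 454, 480, 458, 466]

garner_interference = mean(filtering_rts) - mean(baseline_rts)
print(f"Garner interference: {garner_interference:.1f} ms")  # 35.0 ms here

# A clearly positive difference marks the dimensions as integral;
# a difference near zero would mark them as separable.
assert garner_interference > 0
```

Note that the irrelevant dimension varies only between trials within a filtering block, never within a single display, which is why the Garner effect is described below as a conflict operating between presentations.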
As independent factors in one single experiment, incongruence and Garner effects occurred
either jointly (Pomerantz 1983; Pomerantz et al. 1989; Marks 2004) or mutually exclusively (Melara and Mounts 1993; Patching and Quinlan 2002; van Leeuwen and Bakker 1995). These effects might thus be considered as belonging to different mechanisms. But, perhaps better, they
could be regarded as the same mechanism operating on two different time scales. In both cases, the principle is that attentional selection fails because task-irrelevant information has previously been integrated with the target information. Their difference may then be considered in terms of the time it takes this irrelevant information to become connected with the target. Incongruence effects occur when conflicting information is presented within a narrow time window (Flowers 1990); thus, memory involvement is minimal. The Garner effect, on the other hand, is a conflict operating between presentations, and thus involves episodic memory. Incongruence and Garner effects, therefore, differ considerably in the width of their scope and that of their feedback cycle, the latter drawing upon a much wider feedback cycle than the former.
As a result, their time course will differ. Boenke et  al. (2009) used ERP analyses to observe
the time course of incongruence and Garner effects. In accordance with Kasai’s (2010) effects
of spreading of attention, they found incongruence effects on N1 and N2. The first interval was
observed on N1, between 172 and 216 ms after stimulus onset, with a maximum at 200 ms, located in the parieto-occipital areas, predominantly on the right. The amplitude was larger in the incongruent than the congruent condition. The second interval occurred between 268 and 360 ms after
stimulus onset and included the negative component N2 and the rising part of the P3 component,
predominantly in the fronto-central region of the scalp.
Garner effects in Boenke et al. (2009) started off later; the earliest occurred between 328 and 400 ms after stimulus onset. This interval corresponded to the rising part of the positive component P3 and was observed predominantly above the fronto-central areas. The first maximum in the Garner effect almost coincided with the second maximum in the incongruence effect. This moment (336 ms) was also when the interaction with the Garner effect was maximal, observed over left frontal, central, temporal, and parietal areas. This result implies that Stroop and Garner effects
occur in cascaded stages, resolving the longstanding question about their interdependence. We
may conclude that the time course of Garner effects follows the principle of spreading attention;
with Garner effects depending on information from the preceding episode, they depend on a
wider feedback cycle than incongruence effects, and thus the rise time of the former is longer, and
their latency larger, than that of the latter.

Conclusions and open issues


In the present chapter, I have tried to go beyond placing some critical notes in the margin of the hierarchical approach to perception and, instead of hierarchical convergence to higher-order representation, to suggest an alternative principle of perception. I have sketched the visual system as a complex network of lateral and large-scale within-area connections, as well as between-area feedback loops; these enable areas and circuits to reach integral representation through recurrent
activation cycles operating at multiple scales. These cycles work in parallel (e.g. between ventral and
dorsal stream), but where the onset of their evoked activity differs, they operate as cascaded stages.
According to a principle I have been peddling since the late eighties (e.g. van Leeuwen et al.
1997), early holism is realized through diffusive coupling through lateral and large-scale intrinsic
connections, prior to the deployment of attentional feedback. The coupling results in spreading
activity on, respectively, circuit-scale (Gong and van Leeuwen, 2009), area-scale (Alexander et al.
2011), and whole head-scale traveling wave activity (Alexander et al. 2013).
Starting from approximately 100 ms after onset of a stimulus, attentional feedback also begins
to spread, but cannot separate what earlier processes have already joined together. Early-onset
attentional feedback processes have been shown to extend to congruency of proximal information
in the visual display; later ones to extend to information in episodic memory (Boenke et al. 2009).
This is because the onset latency of the effect is determined by the width of the feedback cycle,
which determines the time it takes for the contextual modulation to arrive: short for features close
by within the pattern or long for episodic memory.

Perceiving beyond the hierarchy


Spreading activity in perceptual systems cannot go on forever. It needs to settle, and next be annihi-
lated, in order for the system to continue working. Within each area, we may therefore expect activa-
tion to go through certain macrocycles, in which pattern coherence is periodically reset. In olfactory
perception, Skarda and Freeman (1987) have described such macrocycles as transitions between sta-
ble and instable regimes in system activity, as coordinated with the breathing cycle; upon inhalation
the system is geared towards stability, and thereby responsive to incoming odor; upon exhalation the
attractors are annihilated for the system to be optimally sensitive to new information. Freeman and
van Dijk (1987) observed a similar cycle in visual perception; we might consider a system becoming
instable, and thus ready to anticipate new information in preparation for, what was dubbed a ‘visual
sniff ’ (Freeman 1991). Whenever new information is expected, for instance, when moving our eyes
to a new location, we may be taking a visual sniff. Macrocycles in visual perception can be consid-
ered on the scale of saccadic eye-movement, i.e. approx. 300–450 ms on average. Within this period,
the visual system to envelop several perceptual cycles, starting from the elementary interactions
between neighboring neurons and gradually extending to include episodic and semantic memory.

Open issues
In this chapter, I have drawn a perspective on visual processing based on intrinsic holism, as established
through the dynamic spreading of signals via short- and long-range lateral, as well as top-down feedback
connections. Since the mechanism is essentially indifferent with respect to pre-attentional
and attentional processes in perception, we might consider a unified theoretical framework in
which processes are distinguished based on the scale at which these interactions take place.
The exact layout of the theory will depend on a precise, empirical study of the way spreading activity
can achieve coherence in the brain. The next chapter will provide some of the results that could
offer the groundwork for such a theory.

Acknowledgments
The author is supported by an Odysseus research grant from the Flemish Organization for
Science (FWO) and wishes to thank Lee de-Wit, Pieter Roelfsema, and Andrey Nikolaev for use-
ful comments.
Hierarchical Stages or Emergence in Perceptual Integration? 983

References
Alexander, D.M. and van Leeuwen, C. (2010). Mapping of contextual modulation in the population
response of primary visual cortex. Cognitive Neurodynamics 4: 1–24.
Alexander, D.M., Trengove, C., Sheridan, P., and van Leeuwen, C. (2011). Generalization of learning by
synchronous waves: from perceptual organization to invariant organization. Cognitive Neurodynamics
5: 113–32.
Alexander, D.M., Jurica, P., Trengove, C., Nikolaev, A.R., Gepshtein, S., Zvyagintsev, M., Mathiak,
K., Schulze-Bonhage, A., Rüscher, J., Ball, T., and van Leeuwen, C. (2013). Traveling waves and
trial averaging: the nature of single-trial and averaged brain responses in large-scale cortical signals.
NeuroImage doi: 10.1016/j.neuroimage.2013.01.016.
Algom, D., Dekel, A., and Pansky, A. (1996). The perception of number from the separability of the
stimulus: the Stroop effect revisited. Memory & Cognition 24: 557–72.
Amedi, A., Jacobson, G., Hendler, T., Malach, R., and Zohary, E. (2002). Convergence of visual and tactile
shape processing in the human lateral occipital complex. Cerebral Cortex 12: 1202–12.
Bauer, R. and Heinze, S. (2002). Contour integration in striate cortex. Experimental Brain Research
147: 145–52.
Baylis, G.C. and Driver, J. (1992). Visual parsing and response competition: The effect of grouping factors.
Perception & Psychophysics 51: 145–62.
Ben-Shahar, O. and Zucker, S.W. (2004). Sensitivity to curvatures in orientation-based texture
segmentation. Vision Research 44: 257–77.
Ben-Shahar, O., Huggins, P.S., Izo, T., and Zucker, S.W. (2003). Cortical connections and early visual
function: intra- and inter-columnar processing. Journal of Physiology Paris 97: 191–208.
Berkes, P., White, B.L., and Fiser, J. (2009). No evidence for active sparsification in the visual cortex. Paper
presented at NIPS 22. <http://books.nips.cc/papers/files/nips22/NIPS2009_0145.pdf>
Blakemore, C. and Tobin, E.A. (1972). Lateral inhibition between orientation detectors in the cat’s visual
cortex. Experimental Brain Research 15: 439–40.
Boenke, L.T., Ohl, F., Nikolaev, A.R., Lachmann, T., and van Leeuwen, C. (2009). Stroop and Garner
interference dissociated in the time course of perception, an event-related potentials study. NeuroImage
45: 1272–88.
Bushnell, B.N. and Pasupathy, A. (2011). Shape encoding consistency across colors in primate V4. Journal
of Neurophysiology 108: 1299–308.
Carlson, E.T., Rasquinha, R.J., Zhang, K., and Connor, C.E. (2011). A sparse object coding scheme in area
V4. Current Biology 21: 288–93.
Clark, V.P., Fan, S., and Hillyard, S.A. (2004). Identification of early visual evoked potential generators by
retinotopic and topographic analyses. Human Brain Mapping 2(3): 170–87.
Corthout, E. and Supèr, H. (2004). Contextual modulation in V1: the Rossi-Zipser controversy.
Experimental Brain Research 156: 118–23.
Das, A. and Gilbert, C.D. (1995). Long-range horizontal connections and their role in cortical
reorganization revealed by optical recording of cat primary visual cortex. Nature 375: 780–4.
Das, A. and Gilbert, C.D. (1999). Topography of contextual modulations mediated by short-range
interactions in primary visual cortex. Nature 399: 655–61.
Davis, G. and Driver, J. (1997). Spreading of visual attention to modally versus amodally completed
regions. Psychological Science 8(4): 275–81.
Dehaene, S., Changeux, J.P., Naccache, L., Sackur, J., and Sergent, C. (2006). Conscious, preconscious, and
subliminal processing: a testable taxonomy. Trends in Cognitive Sciences 10: 204–11.
Di Russo, F., Martínez, A., and Hillyard, S.A. (2003). Source analysis of event-related cortical activity
during visuo-spatial attention. Cerebral Cortex 13(5): 486–99.
984 van Leeuwen

Dishon-Berkovits, M. and Algom, D. (2000). The Stroop effect: it is not the robust phenomenon that you have
thought it to be. Memory & Cognition 28: 1437–49.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of
Neuroscience 18(1): 193–222.
Duncan, J. and Humphreys, G.W. (1989). Visual search and stimulus similarity. Psychological Review
96: 433–58.
Eimer, M. (1996). The N2pc component as an indicator of attention selectivity. Electroencephalography and
Clinical Neurophysiology 99: 225–34.
Egeth, H.E. and Yantis, S. (1997). Visual attention: control, representation, and time course. Annual
Review of Psychology 48(1): 269–97.
Egner, T. and Hirsch, J. (2005). Cognitive control mechanisms resolve conflict through cortical
amplification of task-relevant information. Nature Neuroscience 8: 1784–90.
Enns, J.T. and Rensink, R.A. (1990). Sensitivity to three-dimensional orientation in visual search.
Psychological Science 1(5): 323–6.
Eriksen, B.A. and Eriksen, C.W. (1974). Effects of noise letters upon the identification of a target letter in a
nonsearch task. Perception & Psychophysics 16: 143–9.
Feldman, J. and Singh, M. (2005). Information along contours and object boundaries. Psychological Review
112: 243–52.
Felleman, D.J. and Van Essen, D.C. (1991). Distributed hierarchical processing in the primate cerebral
cortex. Cerebral Cortex 1: 1–47.
Fiorani, M., Rosa, M.G., Gattass, R., and Rocha-Miranda, C.E. (1992). Dynamic surrounds of receptive
fields in primate striate cortex: a physiological basis for perceptual completion? Proceedings of the
National Academy of Sciences USA 89: 8547–51.
Fitzpatrick, D. (2000). Seeing beyond the receptive field in primary visual cortex. Current Opinion in
Neurobiology 10: 438–43.
Flowers, J.H. (1990). Priming effects in perceptual classification. Perception & Psychophysics 47:
135–48.
Foxe, J.J. and Simpson, G.V. (2002). Flow of activation from V1 to frontal cortex in humans. Experimental
Brain Research 142(1): 139–50.
Freeman, W.J. (1991). Insights into processes of visual perception from studies in the olfactory system.
In: L. Squire, N.M. Weinberger, G. Lynch, and J.L. McGaugh (eds.), Memory: Organization and Locus of
Change, pp. 35–48. New York: Oxford University Press.
Freeman, W.J. and van Dijk, B.W. (1987). Spatial patterns of visual cortical fast EEG during conditioned
reflex in a rhesus monkey. Brain Research 422(2): 267–76.
Garner, W.R. (1974). The Processing of Information and Structure. Potomac: Erlbaum Publishers.
Garner, W.R. (1976). Interaction of stimulus dimensions in concept and choice processes. Cognitive
Psychology 8: 98–123.
Garner, W.R. (1988). Facilitation and interference with a separable redundant dimension in stimulus
comparison. Perception & Psychophysics 44: 321–30.
Garner, W.R. and Felfoldy, G.L. (1970). Integrality of stimulus dimensions in various types of information
processing. Cognitive Psychology 1: 225–41.
Gepshtein, S. and Kubovy, M. (2007). The lawful perception of apparent motion. Journal of Vision
7(8):9: 1–15.
Gilaie-Dotan, S, Perry, A., Bonneh, Y., Malach, R., and Bentin, S. (2009). Seeing with profoundly
deactivated mid-level visual areas: nonhierarchical functioning in the human visual cortex. Cerebral
Cortex 19: 1687–703.
Gong, P. and van Leeuwen, C. (2009). Distributed dynamical computation in neural circuits with
propagating coherent activity patterns. PLoS Computational Biology 5(12): e1000611.

Goodale, M.A., and Milner, A.D. (1992). Separate visual pathways for perception and action. Trends in
Neuroscience 15: 20–5.
Grosof, D.H., Shapley, R.M., and Hawken, M.J. (1993). Macaque V1 neurons can signal ‘illusory contours’.
Nature 365: 550–2.
Hamers, J.F. and Lambert, W.E. (1972). Bilingual interdependencies in auditory perception. Journal of
Verbal Learning and Verbal Behavior 11: 303–10.
Han, S., Song, Y., Ding, Y., Yund, E.W., and Woods, D.L. (2001). Neural substrates for visual perceptual
grouping in humans. Psychophysiology 38: 926–35.
Han, S., Jiang, Y., Mao, L., Humphreys, G.W., and Qin, J. (2005). Attentional modulation of perceptual
grouping in human visual cortex: ERP studies. Human Brain Mapping 26: 199–209.
Hesselmann, G., Sadaghiani, S., Friston, K.J., and Kleinschmidt, A. (2010). Predictive coding or evidence
accumulation? False inference and neuronal fluctuations. PLoS ONE 5(3): e9926. doi:10.1371/journal.
pone.0009926.
Hillyard, S.A., Vogel, E.K., and Luck, S.J. (1998). Sensory gain control (amplification) as a mechanism of
selective attention: electrophysiological and neuroimaging evidence. Philosophical Transactions of the
Royal Society of London. Series B: Biological Sciences 353: 1257–70.
Hochstein, S. and Ahissar, M. (2002). View from the top: hierarchies and reverse hierarchies in the visual
system. Neuron 36(5): 791–804.
Hubel, D.H. and Wiesel, T.N. (1959). Receptive fields of single neurones in the cat’s striate cortex. Journal of
Physiology 148: 574–91.
Hubel, D.H. and Wiesel, T.N. (1974). Sequence regularity and geometry of orientation columns in the
monkey striate cortex. Journal of Comparative Neurology 158: 267–94.
Hubel, D.H. and Wiesel, T.N. (1998). Early exploration of the visual cortex. Neuron 20: 401–12.
Kahneman, D. and Henik, A. (1981). Perceptual organization and attention. In: M. Kubovy and
J.R. Pomerantz (eds), Perceptual Organization, pp. 181–211. Hillsdale: Erlbaum.
Kanizsa, G. (1994). Gestalt theory has been misinterpreted, but has also had some real conceptual
difficulties. Philosophical Psychology 7: 149–62.
Kapadia, M.K., Ito, M., Gilbert, C.D., and Westheimer, G. (1995). Improvement in visual sensitivity by
changes in local context: parallel studies in human observers and in V1 of alert monkeys. Neuron
15: 843–56.
Kasai, T. (2010). Attention-spreading based on hierarchical spatial representations for connected objects.
Journal of Cognitive Neuroscience 22: 12–22.
Kasai, T. and Kondo, M. (2007). Electrophysiological correlates of attention-spreading in visual grouping.
Neuroreport 18: 93–8.
Kastner, S., De Weerd, P., Pinsk, M.A., Elizondo, M.I., Desimone, R., and Ungerleider, L.G. (2001).
Modulation of sensory suppression: implications for receptive field sizes in the human visual cortex.
Journal of Neurophysiology 86: 1398–411.
Kenemans, J.L., Baas, J.M., Mangun, G.R., Lijffijt, M., and Verbaten, M.N. (2000). On the processing of
spatial frequencies as revealed by evoked-potential source modeling. Clinical Neurophysiology
111: 1113–23.
Khoe, W., Freeman, E., Woldorff, M.G., and Mangun. G.R. (2004). Electrophysiological correlates of
lateral interactions in human visual cortex. Vision Research 44: 1659–73.
Kimchi, R. and Bloch, B. (1998). Dominance of configural properties in visual form perception.
Psychonomic Bulletin & Review 5: 135–9.
Kitterle, F.L., Hellige, J.B. and Christman, S. (1992). Visual hemispheric asymmetries depend on which
spatial frequencies are task relevant. Brain and Cognition 20: 308–14.
Kok, P., Jehee, J.F.M., and de Lange, F.P. (2012). Less is more: expectation sharpens representations in the
primary visual cortex. Neuron 75: 265–70.

Konen, Ch. and Kastner, S. (2008). Two hierarchically organized neural systems for object information in
human visual cortex. Nature Neuroscience 11: 224–31.
Kubovy, M., Holcombe, A.O., and Wagemans, J. (1998). On the lawfulness of grouping by proximity.
Cognitive Psychology 35: 71–98.
Lamme, V.A., Supèr, H., and Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in
the visual cortex. Current Opinion in Neurobiology 8: 529–35.
Lee, T.S. and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the
Optical Society of America A 20(7): 1434–48.
Livingstone, M. and Hubel, D. (1988). Segregation of form, color, movement, and depth: anatomy,
physiology, and perception. Science 240: 740–9.
Lörincz, A., Szirtes, G., Takács, B., Biederman, I., and Vogels, R. (2002). Relating priming and repetition
suppression. International Journal of Neural Systems 12: 187–201.
Luck, S.J., Heinze, H.J., Mangun, G.R., and Hillyard, S.A. (1990). Visual event-related potentials
index focused attention within bilateral stimulus arrays: II. Functional dissociations of P1 and N1
components. Electroencephalography and Clinical Neurophysiology 75: 528–42.
Luck, S.J. and Hillyard, S.A. (1994). Spatial filtering during visual search: evidence from human
electrophysiology. Journal of Experimental Psychology: Human Perception and Performance 20: 1000–14.
Luck, S.J., Chelazzi, L., Hillyard, S.A., and Desimone, R. (1997). Neural mechanisms of spatial selective
attention in areas V1, V2, and V4 of macaque visual cortex. Journal of Neurophysiology 77: 24–42.
Lund, J.S., Yoshioka, T., and Levitt, J.B. (1993). Comparison of intrinsic connectivity in different areas of
macaque monkey cerebral cortex. Cerebral Cortex 3: 148–62.
MacLeod, C.M. (1991). Half a century of research on the Stroop effect: an integrative review. Psychological
Bulletin 109: 163–203.
Malach, R., Amir, Y., Harel, M., and Grinvald, A. (1993). Relationship between intrinsic connections and
functional architecture revealed by optical imaging and in vivo targeted biocytin injections in primate
striate cortex. Proceedings of the National Academy of Sciences USA 90: 10469–73.
Mangun, G.R., Hillyard, S.A., and Luck, S.J. (1993). Electrocortical substrates of visual selective attention.
In: Meyer, D. and Kornblum, S. (eds.), Attention and Performance XIV, pp. 219–43. Cambridge,
MA: MIT Press.
Marks, L.E. (2004). Cross-modal interactions in speeded classification. In: G. Calvert, C. Spence and B.E.
Stein (eds.), The Handbook of Multisensory Processes, pp. 85–106. Cambridge, MA: MIT Press.
Martinez, A., Anllo-Vento, L., Sereno, M.I., Frank, L.R., Buxton, R.B., Dubowitz, D.J. et al. (1999).
Involvement of striate and extrastriate visual cortical areas in spatial attention. Nature Neuroscience
2: 364–9.
Melara, R.D. and Mounts, J.R. (1993). Selective attention to Stroop dimensions: effects of baseline
discriminability, response mode, and practice. Memory & Cognition 21: 627–45.
Melcher, D. and Colby, C.L. (2008). Trans-saccadic perception. Trends in Cognitive Science 12: 466–73.
Mounts, J.R. and Tomaselli, R.G. (2005). Competition for representation is mediated by relative attentional
salience. Acta Psychologica 118: 261–75.
Murray, S.O. (2008). The effects of spatial attention in early human visual cortex are stimulus independent.
Journal of Vision 8(10).
Murray, S.O., Kersten, D., Olshausen, B.A., Schrater, P., and Woods, D.L. (2002). Shape perception
reduces activity in human primary visual cortex. Proceedings of the National Academy of Sciences USA
99: 15164–9.
Nakatani, C., Pollatsek, A., and Johnson, S.H. (2002). Viewpoint-dependent recognition of scenes. The
Quarterly Journal of Experimental Psychology: Section A 55(1): 115–39.
Nakayama, K. and Silverman, G.H. (1986). Serial and parallel processing of visual feature conjunctions.
Nature 320: 264–5.

Nauhaus, I., Nielsen, K.J., Disney, A.A., and Callaway, E.M. (2012). Orthogonal micro-organization of
orientation and spatial frequency in primate primary visual cortex. Nature Neuroscience 15: doi:10.1038/
nn.3255.
Nee, D.E., Wager, T.D., and Jonides, J. (2007). Interference resolution: insights from a meta-analysis of
neuroimaging tasks. Cognitive, Affective, & Behavioral Neuroscience 7: 1–17.
Neisser, U. (1967). Cognitive Psychology. East Norwalk: Appleton-Century-Crofts.
Neisser, U. (1976). Cognition and reality: Principles and Implications of Cognitive Psychology. New York,
NY: WH Freeman, Holt and Co.
Nikolaev, A.R. and van Leeuwen, C. (2004). Flexibility in spatial and non-spatial feature grouping: an
Event-Related Potentials study. Cognitive Brain Research 22: 13–25.
Nikolaev, A.R., Gepshtein, S., Kubovy, M., and van Leeuwen, C. (2008). Dissociation of early evoked
cortical activity in perceptual grouping. Experimental Brain Research 186: 107–22.
Op de Beeck, H., Wagemans, J., and Vogels, R. (2001). Inferotemporal neurons represent low-dimensional
configurations of parameterized shapes. Nature Neuroscience 4: 1244–52.
Patching, G.R. and Quinlan, P.T. (2002). Garner and congruence effects in the speeded classification of
bimodal signals. Journal of Experimental Psychology: Human Perception and Performance 28: 755–75.
Plomp, G., Liu, L., van Leeuwen, C., and Ioannides, A.A. (2006). The mosaic stage in amodal completion
as characterized by magnetoencephalography responses. Journal of Cognitive Neuroscience 18: 1394–405.
Polat, U., Mizobe, K., Pettet, M.W., Kasamatsu, T., and Norcia, A.M. (1998). Collinear stimuli regulate
visual responses depending on cell’s contrast threshold. Nature 391: 580–4.
Pomerantz, J.R. (1983). Global and local precedence: selective attention in form and motion perception.
Journal of Experimental Psychology: General 112: 516–40.
Pomerantz, J.R. and Lockhead, G.R. (1991). Perception of structure: an overview. In: G.R. Lockhead and
J.R. Pomerantz (eds.), The Perception of Structure, pp. 1–20. Washington, DC: American Psychological
Association.
Pomerantz, J.R., Sager, L.C., and Stoever, R.J. (1977). Perception of wholes and of their component
parts: some configural superiority effects. Journal of Experimental Psychology: Human Perception &
Performance 3(3): 422.
Pomerantz, J.R., Pristach, E.A., and Carson, C.E. (1989). Attention and object perception. In: B. Shepp,
and S. Ballesteros (eds.), Object Perception: Structure and Process, pp. 53–89. Hillsdale: Erlbaum.
Qiu, F.T. and von der Heydt, R. (2005). Figure and ground in the visual cortex: V2 combines stereoscopic
cues with Gestalt rules. Neuron 47(1): 155.
Quiroga, R.Q., Kreiman, G., Koch, C., and Fried, I. (2008). Sparse but not ‘grandmother-cell’ coding in the
medial temporal lobe. Trends in Cognitive Science 12: 87–91.
Rao, R.P. and Ballard, D.H. (1999). Predictive coding in the visual cortex: a functional interpretation of
some extra-classical receptive-field effects. Nature Neuroscience 2: 79–87.
Rensink, R.A. and Enns, J.T. (1995). Preemption effects in visual search: Evidence for low-level grouping.
Psychological Review 102: 101–30.
Ringach, D., Hawken, M., and Shapley, R. (1997). The dynamics of orientation tuning in the macaque
monkey striate cortex. Nature 387: 281–4.
Roelfsema, P.R. (2006). Cortical algorithms for perceptual grouping. Annual Review of Neuroscience
29: 203–27.
Roelfsema, P.R., Lamme, V.A., and Spekreijse, H. (1998). Object-based attention in the primary visual
cortex of the macaque monkey. Nature 395(6700): 376–81.
Sergent, J. (1982). The cerebral balance of power: Confrontation or cooperation? Journal of Experimental
Psychology: Human Perception & Performance 8: 253–72.
Skarda, C.A. and Freeman, W.J. (1987). How brains make chaos in order to make sense of the world.
Behavioral and Brain Sciences 10: 161–95.

Stins, J. and van Leeuwen, C. (1993). Context influence on the perception of figures as conditional upon
perceptual organization strategies. Perception & Psychophysics 53: 34–42.
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology
18: 643–62.
Sugita, Y. (1999). Grouping of image fragments in primary visual cortex. Nature 401: 269–72.
Tanaka, K., Saito, H., Fukada, Y., and Moriya, M. (1991). Coding visual images of objects in the
inferotemporal cortex of the macaque monkey. Journal of Neurophysiology 66: 170–89.
Treisman, A. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology
12: 97–136.
Treisman, A. and Sato, S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human
Perception & Performance 16: 459–78.
Tsunoda, K., Yamane, Y., Nishizaki, M., and Tanifuji, M. (2001). Complex objects are represented
in macaque inferotemporal cortex by the combination of feature columns. Nature Neuroscience
4(8): 832–8.
Ungerleider, L.G. and Mishkin, M. (1982). Two cortical visual systems. In: D.J. Ingle, M.A. Goodale, and
R.J.W. Mansfield (eds.), Analysis of Visual Behavior, pp. 549–80. Cambridge: MIT Press.
von der Heydt, R. and Peterhans, E. (1989). Mechanisms of contour perception in monkey visual cortex.
I. Lines of pattern discontinuity. Journal of Neuroscience 9: 1731–48.
van Leeuwen, C. and Bakker, L. (1995). Stroop can occur without Garner interference: Strategic and
mandatory influences in multidimensional stimuli. Perception & Psychophysics 57: 379–92.
van Leeuwen, C., Steyvers, M., and Nooter, M. (1997). Stability and intermittency in large-scale coupled
oscillator models for perceptual segmentation. Journal of Mathematical Psychology 41: 319–44.
Wannig, A., Stanisor, L., and Roelfsema, P.R. (2011). Automatic spread of attentional response
modulation along Gestalt criteria in primary visual cortex. Nature Neuroscience 14: 1243–4.
Wolfe, J.M. and Cave, K.R. (1999). Psychophysical evidence for a binding problem in human vision. Neuron
24: 11–17.
Wolfe, J.M., Cave, K.R., and Franzel, S.L. (1989). Guided search: an alternative to the feature integration
model for visual search. Journal of Experimental Psychology: Human Perception & Performance
15: 419–33.
Yokoi, I. and Komatsu, H. (2009). Relationship between neural responses and visual grouping in the
monkey parietal cortex. Journal of Neuroscience 29: 13210–21.
Young, M.P. and Yamane, S. (1992). Sparse population coding of faces in the inferotemporal cortex. Science
256: 1327–31.
Zipser, K., Lamme, V.A., and Schiller, P.H. (1996). Contextual modulation in primary visual cortex. The
Journal of Neuroscience 16: 7376–89.
Zhou, H., Friedman, H.S., and von der Heydt, R. (2000). Coding of border ownership in monkey visual
cortex. The Journal of Neuroscience 20(17): 6594–611.
Chapter 48

Cortical dynamics and oscillations:


What controls what we see?
Cees van Leeuwen

The Visual System As Distributed and Parallel


In the previous chapter, I sketched the visual system as a complex network in which lateral and
large-scale within-area as well as between-area feedback loops connect brain regions and circuits.
The system reaches integral representation through recurrent activation cycles operating at multiple
scales within this network. These cycles work in parallel (for instance between ventral and
dorsal stream), but where the onset of their evoked activity differs, they may operate as cascaded
stages. In all these stages, activity spreading within and between regions makes visual representations
dynamically dependent on their context: from contour patterns in early visual perception,
to episodic events in the later stages. In perceptual organization, it is clearly evident that these
different processes jointly contribute to what we perceive.
For instance, which part of an image we see as figure and which as ground depends on traditional
Gestalt factors such as good continuation, parallelism, convexity, and symmetry (Rubin
1921). These are likely to belong to the ‘what’ system in perceptual organization, in other words the
ventral stream. But contrary to the notion that visual object information is exclusively processed
in the ventral stream, object representations exist in parallel in both streams (Konen and Kastner
2008). Foreground assignment also depends on the dorsal stream, or the ‘where’ system: perceivers tend to
assign the role of figure to surfaces in the lower part of the visual field (Vecera et al. 2002) and
to surfaces with a wide base and a narrow top (Hulleman and Humphreys 2004). Semantic
or episodic factors also come into play: a silhouette of familiar shape is more likely to be considered
figure than the same shape upside-down (Peterson and Skow-Grant 2003). We may conclude that
representation in the visual system is distributed; different parts of the system represent visual
information in different, and potentially contradictory respects.
Classical recurrent neural networks can represent visual information in a distributed manner.
But they can process only one distributed pattern at a time, since pattern components are identified
based on simultaneous activity (von der Malsburg 1985). Perceptual representations are
distributed in a more radical sense than this: visual input is intrinsically ambiguous and, because
of this, it would be important for perceptual organization not to settle on one single representation,
but to offer a range of options. Partially occluded objects illustrate this. Any such object can,
in principle, be completed in an indefinite number of ways, and the task for the visual system is
to consider a range of plausible ones (Buffart et al. 1983; van Lier et al. 1995). We maintain such
alternative representations at least for the time the visual system needs to settle on one of
them. Among the possibilities, there is likely to be a representation of the pattern as if
without occlusion, i.e. as a mosaic. For instance, consider Figure 48.1.
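The single-pattern limitation noted above can be made concrete with a minimal Hopfield-style network (a generic textbook construction, not a model from this chapter; sizes and seeds are arbitrary): two patterns are stored in Hebbian weights, and even an ambiguous blend of both settles onto just one of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two distributed patterns stored in Hebbian weights
p1 = rng.choice([-1.0, 1.0], n)
p2 = rng.choice([-1.0, 1.0], n)
W = (np.outer(p1, p1) + np.outer(p2, p2)) / n
np.fill_diagonal(W, 0)

def settle(state, iters=20):
    """Synchronous sign-threshold updates until the net relaxes."""
    for _ in range(iters):
        state = np.sign(W @ state)
        state[state == 0] = 1.0
    return state

# An ambiguous input: a blend agreeing roughly 60% with p1, 40% with p2
blend = np.where(rng.random(n) < 0.6, p1, p2)
settled = settle(blend)

# The attractor reached is one stored pattern outright, not the blend
is_p1 = np.array_equal(settled, p1)
is_p2 = np.array_equal(settled, p2)
```

Whatever mixture it starts from, the network commits to a single stored pattern; it has no way to keep two alternative interpretations simultaneously active, which is exactly what occlusion seems to require.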

Fig. 48.1  Four occluded figures (right side of each panel) and their possible local, global, and mosaic
interpretations.
Part (a): Adapted from R.J. van Lier, P.A. van der Helm, and E.L.J. Leeuwenberg, Competing global and local
completions in visual occlusion, Journal of Experimental Psychology: Human Perception and Performance, 21(3),
pp. 571–583. http://dx.doi.org/10.1037/0096-1523.21.3.571 (c) 1995, American Psychological Association.
Parts (b)–(d): Reproduced from G. Plomp, C. Nakatani, V. Bonnardel, and C. van Leeuwen, Amodal completion as
reflected by gaze durations, Perception, 33(10), pp. 1185–1200, doi: 10.1068/p5342x Copyright © 2004, Pion.
With kind permission from Pion Ltd, London www.pion.co.uk and www.envplan.com.

In line with the hierarchical account of perception described in the previous chapter, Sekuler and
Palmer (1992) proposed that the mosaic interpretation is actually computed first. In behavioral
studies, priming with short stimulus onset asynchronies (SOAs; the latency between the onset of the
prime and that of the target stimulus) facilitated the mosaic figure, whereas long SOAs facilitated the
occlusion interpretation. More recent studies of facilitation by the prime using MEG measurement showed no
such processing order. Indeed, in the period of 50–300 ms after stimulus onset, priming facilitated
both mosaic and different occluded interpretations. This effect was found in occipitotemporal areas,
in particular in the right fusiform cortex, which therefore acts as a hub for different occluded figure
interpretations in this stage of perception (Liu et al. 2006). Thus, for at least this time period, this
part of the visual system keeps active multiple alternative representations of a pattern, including the

mosaic, and thus leaves the choice between several alternative options open. Surrounding (Bruno
et al. 1997; Dinnerstein and Wertheimer 1957; Rauschenberger et al. 2004) or preceding context,
including primes (Plomp et al. 2006; Plomp and van Leeuwen 2006) can bias the choice between
these interpretations during this interval. Occlusion, therefore, provides a key example of the visual
system keeping multiple representations of the same object active at the same time.
Since the visual system compiles and maintains different representations in parallel, even of the
same pattern, neural networks that allow only one pattern to be processed at a time will not do. Since
each of these representations is determined, to various extents, by shared information from ‘what’
and ‘where’ visual functions, as well as by episodic and semantic memory, a study of isolated areas,
regions or activity sources alone, will not do. We need to consider the coexistence of these repre-
sentations, their interaction, and the mechanisms with which these interactions are effectuated.

Distributed Systems and Connectivity Issues


This section will be concerned with the question: what kind of architecture permits the visual
system to consist of different subsystems, and yet provides connectivity rich enough for them
to share their information? We need to consider the connectivity linking circuits within brain
areas, as well as large-scale brain networks. The latter are collections of interconnected brain areas at
distances larger than two cm, which involve cortical areas, subcortical structures, and the neurons
that control muscles and glands (Bressler and Menon 2010). The brain has a complex network
structure that is known as a ‘modular small-world’ structure (He et al. 2007; Iturria-Medina et al.
2007; Sporns and Zwi 2004). ‘Small world’ means that the network consists of a number of densely
interconnected clusters (like regular networks) with sparse connections in-between, which connect
units between clusters in an optimally efficient way (like random networks). ‘Modular’
means that the clusters are connected via hubs.
Processing in domain-specific subsystems (local processes) and processing with access to widely
distributed domains of semantic and episodic information (global processes) might seem to require
two vastly different kinds of network architectures. However, small-world networks do enable us
to combine both types of processes, as their architecture is both locally clustered and globally
connected (Watts and Strogatz 1998). In fact, small-world structure is demonstrably an optimal way to
organize the interactions of large arrays of dynamical units (Latora and Marchiori 2001). The architecture
is efficient enough to enable global processing, without the need for the output of local processes to
converge on a single area. Areas in which information from different local processes converges could
therefore have a different function than previously considered. Rather than being the seat of higher, global
processing, they are hubs: relay stations that globally shared information passes through.
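These two properties can be checked numerically. The sketch below (a generic Watts–Strogatz-style construction; node counts and probabilities are arbitrary choices) builds a ring lattice, rewires a fraction of its edges into long-range shortcuts, and compares clustering and characteristic path length before and after.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring lattice: each node linked to its k nearest neighbors per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            adj[i].add((i + d) % n)
            adj[(i + d) % n].add(i)
    return adj

def rewire(adj, p, rng):
    """Redirect each edge with probability p to a random new target,
    creating sparse long-range shortcuts (Watts-Strogatz style)."""
    n = len(adj)
    edges = sorted({tuple(sorted((i, j))) for i in adj for j in adj[i]})
    for i, j in edges:
        if rng.random() < p and j in adj[i]:
            candidates = [t for t in range(n) if t != i and t not in adj[i]]
            if candidates:
                t = rng.choice(candidates)
                adj[i].remove(j); adj[j].remove(i)
                adj[i].add(t); adj[t].add(i)
    return adj

def char_path_length(adj):
    """Mean shortest-path length over all reachable node pairs (BFS)."""
    total, count = 0, 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

def avg_clustering(adj):
    """Mean fraction of each node's neighbor pairs that are themselves linked."""
    acc = 0.0
    for u in adj:
        nb = sorted(adj[u])
        if len(nb) < 2:
            continue
        links = sum(1 for x in nb for y in nb if x < y and y in adj[x])
        acc += 2.0 * links / (len(nb) * (len(nb) - 1))
    return acc / len(adj)

rng = random.Random(7)
regular = ring_lattice(200, 5)
L_reg, C_reg = char_path_length(regular), avg_clustering(regular)
small_world = rewire(ring_lattice(200, 5), 0.1, rng)
L_sw, C_sw = char_path_length(small_world), avg_clustering(small_world)
```

With these settings the rewired network retains most of the lattice's clustering while its characteristic path length drops sharply: locally clustered, yet globally connected.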
How did the brain become such an optimal structure? It cannot possibly be prescribed by the
genes, which simply do not contain enough information to determine the layout of all possi-
ble connections between a billion neurons. This suggests that part of the problem is solved by
self-organization. Brain structure evolves through gradual rewiring of synaptic connections, in
which, along with processes such as maturation, the activity patterns within the network play a
constitutive role. Early on in the visual system and throughout the immature brain, large-scale
burst and wave-like pattern dynamics (Nakatani et al. 2003) dominate spontaneous activity. In a
series of papers (Gong and van Leeuwen 2003, 2004; Kwok et al. 2007; Rubinov et al. 2009a; van den Berg et al. 2012; van den Berg and van Leeuwen 2004), it has been shown, in a simplified theoretical model, how such spontaneous activity shapes and maintains, in principle, the essential
properties of a global brain network’s optimal state (see Figure 48.2 for illustration).
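The rewiring principle can be caricatured in a few lines of code: the units are chaotic logistic maps, and every so often one unit trades its least synchronized connection for a link to its most synchronized non-neighbour. This is a loose toy sketch, not the actual model of the papers cited above; the coupling strength, map parameter, and rewiring schedule are all invented for illustration.

```python
import random

def adaptive_rewire(n=60, k=6, steps=2000, seed=0):
    """Toy adaptive rewiring: coupled chaotic logistic maps whose connections
    migrate toward currently synchronized partners (loosely in the spirit of
    the models discussed in the text; all parameters are invented here)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    while sum(len(s) for s in adj.values()) < n * k:   # random initial graph
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            adj[i].add(j); adj[j].add(i)
    x = [rng.random() for _ in range(n)]               # oscillator states
    eps, a = 0.8, 1.7                                  # coupling, map parameter
    f = lambda v: 1 - a * v * v                        # chaotic logistic map
    for t in range(steps):
        # one synchronous update of the coupled map network
        x = [(1 - eps) * f(x[i])
             + eps * sum(f(x[j]) for j in adj[i]) / max(len(adj[i]), 1)
             for i in range(n)]
        if t % 10 == 0:   # occasionally rewire one unit
            i = rng.randrange(n)
            others = [j for j in range(n) if j != i and j not in adj[i]]
            if not others or not adj[i]:
                continue
            near = min(others, key=lambda j: abs(x[i] - x[j]))  # most in sync
            far = max(adj[i], key=lambda j: abs(x[i] - x[j]))   # least in sync
            adj[i].discard(far); adj[far].discard(i)
            adj[i].add(near); adj[near].add(i)
    return adj

net = adaptive_rewire()
print(sum(len(s) for s in net.values()) // 2)  # → 180 (the edge count is conserved)
```

Each rewiring step removes one edge and adds one, so total connectivity stays constant while the wiring diagram reorganizes itself around the ongoing dynamics.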
Evolution of small-world structure may be disrupted when connections become too sparse. This
may be what we are observing in initial stages of schizophrenia. Failure to maintain small-world
992 van Leeuwen

Fig. 48.2  Adaptive rewiring leads from an initial random network (left), to modular small-world
structure (right) in small iterative steps. Coupled chaotic oscillators at the nodes synchronize and
desynchronize their activity spontaneously. Over time, pairs of synchronized units that are not connected
receive a connection, and where connected units are not synchronized, connections are removed.
During this process, a modular, small-world structure emerges from an initially random configuration.
Reproduced from Daan van den Berg, Pulin Gong, Michael Breakspear, and Cees van Leeuwen, Fragmentation:
loss of global coherence or breakdown of modularity in functional brain architecture?, Frontiers in Systems
Neuroscience, 6, p. 20, doi: 10.3389/fnsys.2012.00020 (c) 2012, Frontiers Media S.A. This work is licensed under
a Creative Commons Attribution 3.0 License.

structure with increasing sparseness means that the network tends, to some degree, to resemble
a random structure (van den Berg et al. 2012). In the real brain, this may have dramatic conse-
quences. Because of the randomness, the system will have difficulties tracing the origin of signals
in the brain, which means that the observer cannot distinguish perception from hallucination. In
random networks, global connections are relatively predominant (Rubinov et al. 2009b; van den
Berg et al. 2012). The consequence is that patients who suffer connectivity loss, e.g. beginning
schizophrenics, will have difficulty in directing their attention towards local structures (Bellgrove
et al. 2003; Coleman et al. 2009).
Sleep deprivation is another way in which excess randomness is introduced to the network. Our
wakeful experiences continually modify brain connectivity, in a manner that can be considered
random as far as large-scale structure is concerned. One of the functions of sleep, therefore, is to
restore the small-world network structure (Koenis et al. 2011). Indeed, whereas (REM) sleep deprivation selectively affects only basic visual discrimination tasks (Karni et al. 1994), general
sleep deprivation (but not, for instance, physical exercise) leads to weakened perceptual organiza-
tion performance on the hidden figures task (Lybrand et al. 1954). In non-REM sleep, we observe
wave-like activity similar to the immature brain, and we may speculate on its role in restoring the
network connectivity structure.
I mentioned the importance of brain connectivity and its pathologies. But structural connectivity is only relevant insofar as it leads to co-activation of brain circuits and regions. Studies using
fMRI have shown large-scale, distributed patterns of spontaneous activity in the brain (Cordes
et al. 2000; Lowe et al. 1998). These patterns reflect brain connectivity structure (Achard et al.
2006; Bassett and Bullmore 2006; Stam 2004). Correlated patterns in spontaneous fMRI activity
predict which brain regions are likely to respond together during a task (Fox and Raichle 2007).
Pre-stimulus activity could therefore be a way of anticipating the incoming sensory information
by dynamically established coordination of active circuits (Hesselmann et al. 2008). These authors
briefly presented Rubin’s ambiguous face/vase stimuli and observed that when pre-stimulus
Cortical Dynamics and Oscillations 993

activity in the fusiform area, a cortical region preferentially responding to faces, was high, observ-
ers were likely to subsequently perceive the stimulus as a face instead of a vase. Correlated activity
in brain circuits and regions should enable transient coalitions of distributed brain regions, which
jointly represent the information available to the system.
It is possible, therefore, to extract a ‘functional network’ from the activity patterns (for reviews,
see Bassett and Bullmore 2006; Bullmore and Sporns 2009). In addition to small worlds, functional
networks extracted from fMRI (Eguiluz et al. 2005) and EEG (Linkenkaer-Hansen et al. 2001 for
amplitude; Gong et al. 2003 for coherence interval durations) have the property of scale invari-
ance. This means that their characteristics are preserved if the measurement scale is increased
or decreased. Scale invariance is a necessary condition for criticality, and hence for dynamically
assembled complexity and long-term memory in brain activity (Linkenkaer-Hansen et al. 2001).
Networks that have both scale-invariance and modular small world properties can arise as a prod-
uct of network rewiring to spontaneous activity, if we assume that new units are recruited at random
into the network (Gong and van Leeuwen 2003). Thus, the properties of functional connectivity
networks may be the product of adaptation of the system to its own spontaneous activity patterns.

Oscillatory Activity
Coordination of brain regions across a range of scales should be flexible, in a manner that hard-
wired connectivity alone could not provide. One way in which this could be achieved is through
control of excitability. Simultaneous activity between neurons, or regions, is an effective means of
enhancing signal efficacy (Fries 2005).
Let us therefore consider which properties of brain activity are useful in this respect. Activity that is bounded and cyclical is called oscillatory or (in the spatially continuous case) wave activity. Periodic and aperiodic oscillators have a natural tendency to synchronize, either completely (Yamada and Fujisaka 1983; Pecora and Carroll 1990) or in phase only (Rosenblum et al. 1996).
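This synchronization tendency can be seen in a minimal two-oscillator phase model in the Kuramoto style. The sketch below is illustrative; the natural frequencies, coupling values, and integration step are arbitrary choices. Once the coupling exceeds half the difference in natural frequencies, the phase difference locks at a constant value.

```python
import math

def simulate(k, steps=20000, dt=0.001):
    """Euler-integrate two coupled phase oscillators with coupling strength k."""
    w1, w2 = 10.0, 11.0          # natural frequencies (rad/s)
    th1, th2 = 0.0, 2.0          # initial phases
    for _ in range(steps):
        d1 = w1 + k * math.sin(th2 - th1)
        d2 = w2 + k * math.sin(th1 - th2)
        th1, th2 = th1 + dt * d1, th2 + dt * d2
    return math.sin(th2 - th1)   # constant once the pair is phase-locked

# locking requires k > |w2 - w1| / 2 = 0.5 in this model
locked = simulate(k=2.0)     # settles at sin(dphase) = (w2 - w1) / (2k) = 0.25
drifting = simulate(k=0.1)   # below threshold: the phase difference keeps drifting
print(round(locked, 2))      # → 0.25
```

With strong coupling the two units run at a common compromise frequency with a fixed phase lag; with weak coupling the faster unit slips past the slower one indefinitely.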
In 1929, Hans Berger first observed the oscillatory properties of the EEG. Tallon-Baudry and
Bertrand (1999) argued that synchrony is always the result of a mixture of internal states and
external events. The effects of spontaneous activity on perception can be explained by the fact that
it continues during task performance: evoked activity shows a similar neuroanatomical distribu-
tion to that observed at rest (Arieli et al. 1996). This property of brain activity may have become
recruited for coordinating activity, and for enabling multiple patterns of activity simultaneously
(evidence reviewed in Thut et al. 2012). According to an influential point of view, synchronization
of oscillatory activity binds together distributed representations (Milner 1974; von der Malsburg
1985). Unlike in classical neural networks, synchronous oscillations allow multiple distributed
patterns to be processed in parallel, as they can be separated in phase.
Episodes of oscillatory brain activity are typically decomposed into an array of band-passed signals.
We distinguish delta, theta, alpha, beta and gamma frequency bands. Distinct cognitive and perceptual
functions have traditionally been associated with each of these bands. EEG and MEG signals provide
us with a picture of how phase and amplitude evolve over time within bands at different locations
of the scalp. We can study couplings between amplitudes and/or phases at different locations within
frequency bands or between phases and amplitudes of different frequency bands. This includes, for
instance, the coupling of phase (phase synchrony) at two different locations at the scalp or the coupling
between theta phase and gamma amplitude at a certain location (phase-amplitude coupling).
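The latter kind of measure can be sketched as follows: a mean-vector-length estimate of phase-amplitude coupling (in the spirit of Canolty et al. 2006) between a synthetic theta phase and a gamma amplitude envelope. The pure-Python DFT-based Hilbert transform and all signal parameters are illustrative choices, not the analysis pipeline of any study cited here.

```python
import math, cmath

def analytic(x):
    """Analytic signal via the DFT: a slow but dependency-free Hilbert
    transform, adequate for short demonstration signals."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(1, n):
        if k < n // 2:
            spec[k] *= 2      # double the positive frequencies
        elif k > n // 2:
            spec[k] = 0       # zero the negative frequencies
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def modulation_index(slow, fast):
    """Mean-vector-length coupling between the phase of `slow` and the
    amplitude envelope of `fast` (both assumed band-limited already)."""
    phase = [cmath.phase(z) for z in analytic(slow)]
    amp = [abs(z) for z in analytic(fast)]
    return abs(sum(a * cmath.exp(1j * p) for a, p in zip(amp, phase))) / sum(amp)

fs, n = 128, 256  # 2 s of signal at 128 Hz
theta = [math.cos(2 * math.pi * 6 * t / fs) for t in range(n)]
# a gamma rhythm whose envelope follows the theta cycle, vs. a flat one
coupled = [(1 + theta[t]) * 0.3 * math.cos(2 * math.pi * 40 * t / fs) for t in range(n)]
flat = [0.3 * math.cos(2 * math.pi * 40 * t / fs) for t in range(n)]
print(round(modulation_index(theta, coupled), 2))  # → 0.5
print(round(modulation_index(theta, flat), 2))     # → 0.0
```

The index is high only when gamma amplitude systematically waxes and wanes with a preferred theta phase; a constant envelope yields an index near zero.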

Alpha Activity
Generally, large-scale wave patterns in activity, below eight Hz, are uncommon in healthy adults
when awake. Without stimulation and when the observer is relaxed, spontaneous activity is

dominated by eight to twelve Hz, i.e. alpha activity. Alpha activity is a ‘far from unitary phenome-
non’ (Foxe and Snyder 2011, p. 10). It arises from cortico-thalamic or cortico-cortical loops. Alpha
frequency increases during execution of difficult tasks compared with simpler ones (complex addition and mental rotation vs. simple addition and visual imagery). The increase is largest in
the hemisphere that is dominant for the task, i.e. arithmetical tasks for the left, and visuo-spatial
tasks for the right hemisphere (Osaka 1984). Peak alpha frequency correlates positively with spe-
cific verbal and non-verbal abilities (Anokhin and Vogel 1996; Jausovec and Jausovec 2000; Shaw
2004) and memory performance (Klimesch et al. 1990), and is a reliable individual characteristic.
In perceptual organization, the peak alpha frequency has implications for whether a perceiver is likely to perceive a stimulus as integrated with its surrounding context (i.e. field dependence) or as isolated from that context (field independence); see van Leeuwen and Smit (2012). This individual difference
has consequences for whether a pattern is perceived as a consistent whole, or as a loose collection
of object features. According to some authors (Peterson and Hochberg 1983; Peterson and Gibson
1991), objects are predominantly perceived in a ‘piecemeal fashion’. That is, they are seen as a loose
collection of features. This, however, may be a consequence of presenting objects in isolation.
When objects are seen in a surrounding context, the objects themselves tend overall to be seen as
integral wholes. However, this happens to different degrees, depending on the perceiver’s peak alpha frequency.
Alpha activity, thus, is an important modulator of whether perception is predominantly local or
global.
This observation is in accordance with the understanding that alpha activity is involved in sup-
pressing neurons responsible for processing stimuli outside of the focus of attention (Lopes da
Silva 1991). Alpha oscillations exert a rhythmic ‘pulsed inhibition’ (Mathewson et al. 2011) on attentional processes. In the previous chapter, we have seen that attention spreads over
time (e.g. Roelfsema 2006). When the spreading is periodically inhibited, and this happens relatively fast, perceptual integration will remain within a restricted region.
Presentation of a stimulus affects the ongoing alpha EEG/MEG. This effect takes the form of an
event-related amplitude decrease (called event-related desynchronization or ERD, based on the
assumption that amplitude is the result of large numbers of neurons firing in unison) and subsequent resynchronization (ERS). A visual input results in the desynchronization of occipital alpha
rhythms (Pfurtscheller and Lopes da Silva 1999). The alpha ERD can be understood as a sign that
the area is engaged in processing.
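The ERD measure itself is simple: band power in a post-stimulus window is expressed as a percentage change relative to a pre-stimulus reference window (following the ERD/ERS convention of Pfurtscheller and Lopes da Silva 1999). A toy computation on a synthetic alpha trace, with arbitrary sampling rate and amplitudes:

```python
import math

def band_power(x):
    """Mean squared amplitude: a simple proxy for band power."""
    return sum(v * v for v in x) / len(x)

fs = 250
# synthetic alpha-band trace: 10 Hz at full amplitude before 'stimulus onset',
# at half amplitude afterwards (an event-related amplitude decrease)
pre = [math.cos(2 * math.pi * 10 * t / fs) for t in range(fs)]
post = [0.5 * math.cos(2 * math.pi * 10 * t / fs) for t in range(fs)]

R = band_power(pre)       # reference (baseline) power
A = band_power(post)      # power in the post-stimulus window
erd = (A - R) / R * 100   # ERD/ERS percentage; negative = desynchronization
print(round(erd, 1))      # → -75.0
```

Halving the amplitude quarters the power, hence the 75 per cent drop: a reminder that ERD is quantified in power, not amplitude.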

Pattern Dynamics of Alpha Activity


Pioneering work by Lehmann and colleagues has analyzed the spatial distribution of amplitude
of spontaneous EEG activity in the alpha range (Lehmann et al. 1987). They showed that in the
resting condition, certain spatial patterns of EEG activity across the scalp are systematically pre-
ferred. Distributions of electrical brain potential, consisting of a maximum and minimum, each
surrounded by concentric gradients, remained stationary for certain periods of time, before sud-
denly jumping to a new location.
More recently this phenomenon has been studied using phase synchronization of alpha activ-
ity over the entire scalp. The large-scale correlation patterns in spontaneous activity have a
small-world structure with heritable characteristics (Smit et al. 2007).
The patterns themselves take the form of travelling or standing waves (Ito et al. 2005): one is
a gradual phase shift in alpha activity between frontal and occipital regions. The other pattern
involves an abrupt phase shift in the central region. This pattern may correspond to a stand-
ing wave composed of two traveling waves propagating in opposite directions. In-between the

periods where wave activity dominates the brain, there are episodes where the activity appears
more disorganized.
The alternation of irregular and regular episodes is a fundamental property of brain activity
(Gong et al. 2007; Kitzbichler et al. 2009). These episodes emerge, hold, and dissipate across a
range of temporal scales (Freeman and Baird 1987; Friston 2000; Gong et al. 2003; Leopold and
Logothetis 2003). Ito et al. (2007) characterized the short- and long-term behavior of these pat-
terns. The system had a tendency to dwell in certain patterns for hundreds of milliseconds, and to return to patterns visited earlier on a time scale of several to ten seconds. The transitions were irregular
in the short-term but showed systematic preferences in the long-term dynamics. This kind of
wandering behavior is called chaotic itinerancy (Kaneko and Tsuda 2001). Chaotic itinerancy is a
mechanism that enables a system to visit a broad variety of synchronized states, and to dwell near
them without becoming trapped in any of them. Chaotic itinerancy offers a theoretical basis for
the transient character of brain dynamics and suggests flexibility which is essential for effective
brain functioning. Thus, the dynamical properties of spontaneous activity provide the brain with
flexibility: an openness to respond to a great variety of stimuli.
This kind of dynamics may play a role in perceptual organization. First: consider perceptual
organization to be a process that needs to be achieved rapidly. Too much stability of any preced-
ing state will hamper that. Second: dynamic flexibility is needed, in order not to settle on a given
interpretation. We can observe spontaneous changes of interpretation in ambiguous figures, such
as the Necker cube. The same mechanism may be at work, when it comes to detecting a hidden
perceptual structure. This will never work if the system settles on a given interpretation of an
object and stays there, until perturbed by new incoming stimulation. Some spontaneous wander-
ing should characterize perceptual organization.

Anticipatory Activity: Beta and Gamma


When the observer changes from relaxation to active anticipation, activity changes as well: faster
rhythms gain in prominence. Lopes da Silva et  al. (1970) observed this phenomenon in dogs.
Cortical areas that showed alpha rhythms in relaxed animals shifted to beta and gamma activity
when a stimulus associated with a reward was expected.
The beta band activity has traditionally been associated with sensori-motor integration
(Murthy and Fetz 1992). Tallon-Baudry et al. (2001) observed sustained beta range activity during
short-term memory rehearsal of a stimulus in epilepsy patients with intracranially implanted elec-
trodes. In a study in which monkeys had to discriminate between vibrotactile stimuli, beta band
oscillations were observed in medial prefrontal and primary motor cortices prior to the motor
response. These oscillations were absent, however, in a control condition, where the motor behav-
ior did not require a perceptual decision (Hernandez et al. 2010). Beta-activity is also observed in
visual object retrieval from semantic memory (Supp et al. 2005). Von Stein et al. (1999) observed
enhanced beta coherence in temporal and parietal cortex during presentation of semantic infor-
mation, independently of the presentation modality.
Beta oscillations arise in model studies of realistic neural circuits consisting of regular-spiking
pyramidal neurons, fast-spiking and low-threshold interneurons. These oscillations peak at
high-beta: 23–24 Hz. Normally, when fast-spiking interneurons are selectively activated, this
leads to higher-frequency, gamma activity. But when the low-threshold spiking neurons become
involved, their intermittent recruitment lowers the resonance frequency of the ensemble
(Vierling-Claassen et al. 2010). Low-threshold interneurons are interesting for communication
between areas, because, unlike the other interneurons, which synapse locally, these ones synapse

on distal dendrites of pyramidal neurons (Markram et al. 2004). Beta oscillations may therefore
facilitate information transfer between areas (Livanov 1977). Wrobel (2000) showed in the cat that during attentive visual behavior, bursts of beta frequency activity lasting 300 ms to one second oper-
ate within the cortico-geniculate feedback cycle to enhance visual information transmission from
the LGN. Beta bursting spread to other visual centers, including the lateral posterior and pulvinar
complex and higher cortical areas. These bursts coincide in time with gamma oscillations.
Accordingly, Vierling-Claassen et al.’s (2010) model produced a great deal of gamma along with the
beta activity. Across various cognitive tasks, beta and gamma power show similar scalp distribu-
tions (Fitzgibbon et al. 2004). According to Siegel et al. (2012), whereas gamma activity reflects
the emergence of a percept, it is likely that beta oscillations reflect maintenance of perceptual
information. Combined with the previous observations about the role of beta in transmission of
information, this implies that maintenance of visual stimuli occurs through interactions between
areas (Simione et al. 2012).
Gross et al. (2004), using MEG, demonstrated a role for beta oscillations in maintenance of information under attentional blink conditions. The attentional blink involves the presentation of sev-
eral visual stimuli in rapid succession (at a rate of approximately one per 100 ms); two targets are embedded in
the presentation sequence. Whereas the first one is usually detected easily, the second one is often
missed, in particular if the temporal separation (lag) equals 300 ms. Gross et al. (2004) showed
that detection in these conditions was accompanied by enhanced beta coherence between sources
in temporal cortex, dorsolateral prefrontal cortex (DLPFC), and posterior parietal cortex (PPC). In the same task, Nakatani et al. (2005) demonstrated the
role of gamma synchrony prior to the onset of the target, which was increased when the target
was successfully detected, as compared to when the target was missed. Taken together, the results of Gross et al. and Nakatani et al. support Siegel et al.’s (2012) proposal about the complementary roles of beta and gamma frequencies.
Synchrony in the gamma band, therefore, may be related to the emergence of the percept rather
than to its maintenance. Nakatani and van Leeuwen (2006) studied the relationship between
long-distance transient phase synchronization in EEG and perceptual switching in the Necker
cube. Transient periods of response-related synchrony between parietal and frontal areas were observed. These start 800–600 ms prior to the switch response and are sometimes accompanied by
transient alpha band activity in the occipital area. The results indicate that perceptual switching
processes involve parietal and frontal areas; these are the ones that are normally associated with
visual attention and decision-making.

Evoked Activity: Beta and Gamma


Consistency of synchrony in evoked activity may result from ongoing activity through a reor-
ganization of phase (phase resetting). Phase resetting is held responsible for the generation of event-related potentials (ERPs) (Makeig et al. 2002). Quasi-stable patterns of synchrony in the beta
and gamma frequency range in the rest condition are demarcated by abrupt phase changes with
a frequency in the theta or alpha range (Freeman et al. 2003). Stimulation aligns such patterns to
stimulus onset (Freeman 2005).
Thus, episodes of regular and irregular activities alternate, not only in spontaneous but also in
evoked activity. These episodes may have a functional role in information processing. Irregular
activity will reflect information processes occurring within regions; at the scalp, these periods will
look desynchronized and unstable.
The episodes of quasi-stable synchronized activity have been called ‘coherence intervals’ (van
Leeuwen 2007; van Leeuwen and Bakker 1995). During these intervals, previously processed

information is propagated to other brain areas. The differences in the time it takes for such information to reach its multiple destinations are accommodated by keeping the window open for a
while, e.g. up to 200 ms (van Wassenhove et al. 2007).
The regular episodes thus provide a mechanism for global broadcasting of the results of information processing that are needed for conscious access to visual information (Baars 1988, 2002). In
the previous chapter, we have seen how, traditionally, conscious access is centered upon convergence zones: areas where the information from many regions comes together. Rather than conver-
gence, we see these areas as hubs, or relay stations, in the communication between brain regions,
based on principles of synchrony. As a result, conscious access functions belong to organized
brain activity, rather than specific local regions. The activity is not tied to any region in particular,
as it travels along the cortex; it may, however, visit the hub regions more consistently than others
(see Alexander et al. 2013).
During these intervals, the informational content remains unchanged. As a result, the con-
tent of perceptual experience is fixed in an extended psychological present (cf. Stroud 1955).
The duration of coherence intervals was estimated at 50–300 ms (Bressler et al. 1993; Dennett
and Kinsbourne 1991; Varela 1995). In the rest condition the durations of the patterns have a
power-law distribution (Gong et al. 2003; Kitzbichler et al. 2009) which indicates that the system
is in a state of dynamical criticality (Kitzbichler et al. 2009). When the system is perturbed by
a stimulus, the scale-free distribution is suppressed and changes into a characteristic distribu-
tion (Nikolaev et al. 2010; Nikolaev et al. 2005). The new distribution often turns out to be an
extreme-value distribution (Nikolaev et al. 2010). Indeed, as the interval reflects the propagation
of information, this will take place in parallel across multiple channels. The extreme-value distri-
bution of these intervals then means that the length of the interval is determined by the slowest
channel (cf. Pöppel 1970). Since the slowest channel determines the durations of episodes of syn-
chronous activity, their averages may reflect information-processing demands of the task at hand.
We tested this prediction by studying the patterns of quasi-stable synchrony over small regions
on the human scalp with an electrode spacing of two cm (Nikolaev et  al. 2005). We selected
electrode chains over the scalp region with maximal ERP activity following presentation of the
stimuli. To obtain the intervals of quasi-stable synchrony we measured the variability of phase
synchronization indices within electrode chains. Then the duration of the intervals in which the
variability fell below the threshold was computed. The comparison of durations showed that in the
beta EEG frequency range the intervals were longer when observers were engaged in a perceptual
task than when they were stimulated without task. This result was interpreted as evidence that
more information was transferred across brain areas in ‘task’ than ‘no-task’ conditions.
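A simplified version of this detection procedure can be sketched as follows. For brevity the sketch thresholds the phase-locking index itself rather than its variability, and assumes instantaneous phases have already been extracted; the window length, threshold, and synthetic data are arbitrary:

```python
import cmath, math, random

def sync_index(ph1, ph2):
    """Phase-locking value over a window of instantaneous phase differences."""
    return abs(sum(cmath.exp(1j * (a - b)) for a, b in zip(ph1, ph2))) / len(ph1)

def stable_intervals(series, threshold, min_len=3):
    """Runs of consecutive windows whose synchronization index stays above
    threshold; returns (start_window, length) pairs."""
    runs, start = [], None
    for t, v in enumerate(series + [None]):  # sentinel closes a trailing run
        if v is not None and v >= threshold:
            start = t if start is None else start
        else:
            if start is not None and t - start >= min_len:
                runs.append((start, t - start))
            start = None
    return runs

rng = random.Random(0)
n, w = 600, 20  # samples and window length
ph1 = [2 * math.pi * 10 * t / 200 for t in range(n)]
# second channel: locked to the first in the middle third, random elsewhere
ph2 = [p + (0.05 if 200 <= t < 400 else rng.uniform(-math.pi, math.pi))
       for t, p in enumerate(ph1)]
series = [sync_index(ph1[s:s + w], ph2[s:s + w]) for s in range(0, n - w, w)]
print(stable_intervals(series, threshold=0.8))
```

Only the middle stretch, where the phase difference is constant, survives the threshold, and its duration in windows is what a coherence-interval analysis would record.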

Coherence Intervals Reflect Stimulus Pattern Information


In order to quantitatively demonstrate the role of these local synchronization patterns in global
information processes, we adopted a paradigm from psychophysics, in which participants
reported orientation of the perceived groupings of dot lattices. Proximity determines perceived
grouping through the aspect ratio (AR), which is the ratio of the two shortest inter-dot distances, b vs. a (Kubovy et al. 1998, chap. 53). The larger the AR, the stronger the preference for grouping according to proximity; the closer AR approaches 1, the more ambiguous the interpretation of the orientation of the dot lattice. Ambiguity equals uncertainty, or the inverse of information (van Leeuwen and
van den Hof 1991). Thus, the larger the AR, the more information is contained in the stimulus.
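Quantitatively, Kubovy et al. (1998) modelled grouping by proximity with a ‘pure distance law’, in which the odds of organizing the lattice along the shorter distance a rather than the longer distance b decay exponentially with the aspect ratio. The sketch below assumes that formulation; the decay parameter s is an observer-specific quantity, and the value used here is arbitrary.

```python
import math

def grouping_odds(ar, s=4.0):
    """Odds of grouping along the shortest distance a rather than b = AR * a,
    under an exponential-decay (pure distance law) attraction function.
    s (attraction decay) is a free, observer-specific parameter."""
    return math.exp(s * (ar - 1.0))

# AR = 1: a fully ambiguous lattice, both organizations equally likely
print(grouping_odds(1.0))   # → 1.0
# larger AR: an increasingly strong preference for grouping by proximity
print(grouping_odds(1.2) > grouping_odds(1.1) > 1.0)  # → True
```

This captures the two regularities in the text: odds of 1 at AR = 1 (maximal ambiguity, minimal information) and monotonically growing preference, hence information, as AR increases.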
In a preliminary investigation, we determined which evoked component of the brain signal was
sensitive to AR (Nikolaev et al. 2008). At the scalp location of that component, we measured the

durations of synchronized intervals in relation to the aspect ratio of the dot lattice. We found a
simple, linear relation of aspect ratio with coherence interval duration. This means that the more
information contained in the stimulus, the longer the coherence intervals in the evoked activity.
In individuals, the duration of the coherence intervals was found to be strongly correlated to
grouping sensitivity. Thus, coherence intervals directly reflect the amount of stimulus information
processed rather than available in the physical stimulus.
We concluded that the intervals of synchronized activity may reflect the time needed for prom-
ulgation of the stimulus information from the visual system to the rest of the brain. The coherence
intervals, thus, represent global broadcasting of visual information. Global broadcasting has been
associated with visual conscious awareness and the emergence of visual experience (Dehaene
et al. 2006).
Global broadcasting takes central stage in global workspace theories and models of visual
information processing. These models are increasingly successful in dealing with a wide range of
phenomena in visual experience, such as the limited capacity of visual working memory, visual
persistence, and the attentional blink (e.g. Simione et al. 2012). Large-scale dynamics provides a
mechanism for coordinating the information processing which endows these models with greater
neural plausibility.

Event-Related Gamma Activity


With oscillatory activity, two patterns can be simultaneously active and still be separated in phase.
Singer and others set out to study oscillatory activity in local field potentials, initially in mildly anaesthetized cats and monkeys, and later on in awake animals (Eckhorn et al. 1988; Gray et al.
1989). They observed synchronization between distinct areas of the visual cortex, depending on
whether these areas were activated by a single, coherent pattern. These synchronizations typically
occurred in the gamma range (40–70 Hz) of oscillation frequency. The dynamic phase synchrony
in the gamma band enables transient association of cortical assemblies (Engel and Singer 2001).
The authors concluded, somewhat controversially to this day, that gamma oscillations were involved
in the representation of distinct features as belonging to a perceptual whole, in other words, in
perceptual integration of visual features.
These kinds of invasive studies are impossible in humans. At a larger scale, oscillations can be
studied by measuring electrical (EEG) or magnetic (MEG) potential at the scalp (Revonsuo et al.
1997; Rodriguez et al. 1999; Varela et al. 2001; Engel and Singer 2001). Phase synchrony in the
gamma band (30–80 Hz) EEG is a sensitive measure for various phenomena, such as object detec-
tion, memory retention, illusion, attention, readiness, and consciousness (Fell et al. 2003; Lee et al.
2003; Lutz et al. 2002; Rodriguez et al. 1999; Tallon-Baudry and Bertrand 1999; Tallon-Baudry
et  al. 1998; Tallon-Baudry et  al. 1997). In a random-dot stereogram experiment, gamma band
synchrony appears transiently when a percept becomes organized and it disappears quickly after
the percept has been obtained (Revonsuo et al. 1997).

Slow Wave Modulations


Transitions involved in conscious access can be related to the delta (<4 Hz) and theta (4–8 Hz) ranges (Baars
and Franklin 2003; Gaillard et al. 2009; Sergent et al. 2005; Zylberberg et al. 2011). Cortical theta is
prominent in young children; in older children and adults, it tends to appear predominantly dur-
ing drowsy, meditative, or sleeping states, but not during the deepest stages of sleep. Theta phase is
considered as the carrier for information encoding and read-out, which are the two most funda-
mental functions of neural information processing and conscious access (Lisman and Idiart 1995).

Delta band activity is the frequency of the P3 ERP component, which has been taken to signal the
emergence of global workspace activity (Sergent et al. 2005). Delta activity is observed as a solitary, high-amplitude brain wave with an oscillation frequency between zero and four hertz. Delta phase
has been related to top-down modulation of sensory signal strength (Lakatos et al. 2005, 2009).

Coupling of Slow and Fast Waves


Lower-frequency oscillations tend to recruit neurons in larger cortical areas, whereas higher frequencies, for instance beta/gamma rhythms, tend to be more spatially restricted. Thus,
whereas in gamma oscillation, the cortex appears to be functionally organized as a mosaic of neu-
ronal assemblies, the lower frequencies may be more widespread across the brain. A possible way
in which the brain at large scale can coordinate cortical processes at smaller scale is by modula-
tion of fast by slower waves. Canolty et al. (2006) reported coupling between theta band (4–8 Hz)
phase and high-gamma band (80–150 Hz) amplitude in ECoG data in various cognitive tasks.
Slow oscillatory activity can bias input selection, connect populations of neurons into assem-
blies, and facilitate synaptic plasticity (reviewed in Buzsaki and Draguhn 2004). Large-scale
networks are recruited during oscillations of low frequency (Steriade 2001). Slow rhythms syn-
chronize large spatial domains and connect local neuronal assemblies by orchestrating the timing
of high frequency oscillations (Buzsaki and Draguhn 2004).
Fast oscillatory activities, in particular gamma (>30 Hz) and beta (12–30 Hz) oscillations, which are considered important for, respectively, the emergence and maintenance of perceptual representations, both in models (Dehaene et al. 2006; Raffone and Wolters 2001) and in empirical studies (Gross et al. 2004; Kranczioch et al. 2007; Nakatani et al. 2005), can thus be coupled to slow oscillations. The coupling may therefore support the interaction between access control processors and
sensory information processing and maintenance in posterior areas.
Such cross-frequency coupling may play a key role for conscious access. Several models of con-
sciousness agree that conscious access involves large-scale cooperative and competitive interactions
in the brain, beyond specialized processing in segregated modules (e.g. Baars 1988; Block 2001; Dehaene et al. 1998; Dehaene et al. 2006; Maia and Cleeremans 2005; Tononi and Edelman 1998).
The principles for such a global processing architecture were proposed in the Global Workspace
Theory (Baars 1988, 2002); the conditions for the neurocomputational implementation of such
principles were further specified (Dehaene et al. 2006; Gaillard et al. 2009). These views have led
to the development of computational models with multi-modular and neurally inspired characteristics: Global Workspace (GW) models (Dehaene et al. 2006; Simione et al. 2012).
For instance, Simione et al.’s (2012) model accounts for a set of perceptual phenomena in which conscious access is involved, including the effect of partial report (Sperling 1960), the limited capacity of visual working memory (Luck and Vogel 1997), and the Attentional Blink effect (e.g. Raymond et al. 1992). The Attentional Blink effect arises in the model because the second target is pro-
cessed only at a first parallel (perceptual) stage, and therefore does not give rise to the global
self-sustained activity pattern of the GW supporting conscious access, as long as the GW is still
occupied. This is the result of interactive gating between lower perceptual processing modules and
higher access control modules. The access control modules consist of GW and visuospatial work-
ing memory (VSWM) modules, with maintenance of target information being largely distributed
and also involving perceptual processing modules.
The model suggests that coupling between the phase of theta and the amplitude of fast oscillations
in the beta and lower-gamma bands supports the interaction between the GW and distributed codes
in posterior cortex for processing and maintenance of target information. Nakatani et al. (in
press) investigated phase-amplitude coupling in AB conditions. They found coupling between
1000 van Leeuwen

the phase of access control-related slow oscillatory activity and the amplitude of fast oscillations
encoding perceptual contents for conscious access in a cognitive task. This coupling increased in
strength during practice of the task, corresponding with an increase in correct target recognition
under AB conditions.
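Such phase-amplitude coupling is commonly quantified with a mean-vector-length modulation index (Canolty et al. 2006). The following NumPy sketch illustrates the idea on synthetic data; the frequency bands, signal parameters, and brick-wall FFT filters are illustrative assumptions, not the analysis pipeline of the studies discussed above.

```python
import numpy as np

def bandpass(x, lo, hi, fs):
    """Crude brick-wall FFT band-pass filter (illustrative only)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, 1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(spec, x.size)

def analytic(x):
    """Analytic signal via the frequency-domain Hilbert construction."""
    spec = np.fft.fft(x)
    weights = np.zeros(x.size)
    weights[0] = 1
    weights[1:(x.size + 1) // 2] = 2
    if x.size % 2 == 0:
        weights[x.size // 2] = 1
    return np.fft.ifft(spec * weights)

def modulation_index(x, fs, slow=(4, 8), fast=(30, 50)):
    """Mean-vector-length coupling between slow-band phase and
    fast-band amplitude (after Canolty et al. 2006)."""
    phase = np.angle(analytic(bandpass(x, *slow, fs)))
    amp = np.abs(analytic(bandpass(x, *fast, fs)))
    return np.abs(np.mean(amp * np.exp(1j * phase)))

rng = np.random.default_rng(0)
fs = 500
t = np.arange(0, 10, 1.0 / fs)
theta = np.sin(2 * np.pi * 6 * t)
gamma = np.sin(2 * np.pi * 40 * t)
noise = 0.1 * rng.standard_normal(t.size)

coupled = theta + 0.5 * (1 + theta) * gamma + noise  # gamma amplitude rides the theta cycle
uncoupled = theta + 0.5 * gamma + noise              # constant gamma amplitude

mi_coupled = modulation_index(coupled, fs)
mi_uncoupled = modulation_index(uncoupled, fs)
```

On the coupled signal, where gamma amplitude waxes and wanes with the theta cycle, the index is large; on the uncoupled control it stays near zero.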

Conclusions and Open Issues


Oscillations control the excitability of neurons in a coordinated fashion. Different frequency
bands of oscillations appear to have different roles in information processing: alpha has
predominantly been associated with relaxation and inhibition. Its effect on processing is
indirect, insofar as peak alpha frequency provides ‘pulsed inhibition’, thereby establishing a
time window for perceptual integration. Beta activity reflects the maintenance of visual
information and the communication of the percept between areas, thus establishing a virtual
global workspace, a unified theatre of consciousness. Gamma arises when the percept emerges,
and may reflect initial feature binding and integration, albeit with somewhat shorter loops than
beta. The lower frequencies offer a mechanism for orchestrating the higher-frequency ones.
One way in which the organized activity manifests itself is in the coupling between activity
in different frequency ranges. In characterizing brain function, therefore, the precise timing of
activity plays an essential role. With existing methods for analyzing brain activity, it has been pos-
sible to track the flow of activity with high temporal resolution (Liu and Ioannides 1996). Doing
so in single trials reveals that results are not described well by the average. There is a great deal
of trial-to-trial variability in the spatiotemporal organization of brain activity. This suggests that
signal averaging can be misleading. Indeed, it was recently shown that trial-averaging techniques
lead to false positives in identifying static sources of brain activity, and to an underestimation of
moving, i.e. spreading, components of brain activity: traveling waves (Alexander et al. 2013). It
is this type of activity that we have emphasized here, as having a role in brain function in general,
and in conscious access in particular. In this account, consciousness does not belong to any spe-
cific region, but to the spatiotemporal organization of brain activity.
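The distorting effect of trial averaging can be seen in a toy simulation (the waveform and jitter values below are illustrative assumptions): every single trial contains a full-height transient, yet latency jitter smears the average into a low, broad bump that describes no individual trial.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 500
t = np.arange(0, 1, 1.0 / fs)  # 1 s at 500 Hz

def single_trial(latency, width=0.01):
    """A sharp Gaussian transient at a trial-specific latency (arbitrary units)."""
    return np.exp(-((t - latency) ** 2) / (2 * width ** 2))

latencies = 0.5 + 0.05 * rng.standard_normal(200)  # trial-to-trial latency jitter
trials = np.array([single_trial(l) for l in latencies])
average = trials.mean(axis=0)

peak_single = trials.max(axis=1).mean()  # every trial peaks near 1.0
peak_average = average.max()             # the averaged peak is far lower and broader
```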

Acknowledgments
The author is supported by an Odysseus research grant from the Flemish Organization for Science
(FWO) and wishes to thank Lee de-Wit, Michael Herzog, and Naoki Kogo for useful comments.

References
Achard, S., Salvador, R., Whitcher, B., Suckling, J., and Bullmore, E. (2006). A resilient, low-frequency,
small-world human brain functional network with highly connected association cortical hubs. Journal
of Neuroscience 26: 63–72.
Alexander, D.A., Jurica, P., Trengove, C., Nikolaev, A.R., Gepshtein, S., Zvyagintsev, M., Mathiak, K.,
Schulze-Bonhage, A., Rüscher, J., Ball, T., and van Leeuwen, C. (2013). Traveling waves and trial
averaging: the nature of single-trial and averaged brain responses in large-scale cortical signals. NeuroImage
73: 95–112.
Anokhin, A.P. and Vogel, F. (1996). EEG alpha rhythm frequency and intelligence in normal adults.
Intelligence 23: 1–14.
Arieli, A., Sterkin, A., Grinvald, A., and Aertsen, A. (1996). Dynamics of ongoing activity: explanation of
the large variability in evoked cortical responses. Science 273(5283): 1868–71.
Baars, B. J. (1988). A Cognitive Theory of Consciousness. Cambridge: Cambridge University Press.
Cortical Dynamics and Oscillations 1001

Baars, B. J. (2002). The conscious access hypothesis: origins and recent evidence. Trends in Cognitive
Sciences 6: 47–52.
Baars, B. J. and Franklin, S. (2003). How conscious experience and working memory interact. Trends in
Cognitive Sciences 7: 166–72.
Bassett, D. and Bullmore, E. (2006). Small-world brain networks. Neuroscientist 12: 512–23.
Bellgrove, M.A., Vance, A., and Bradshaw, J.L. (2003). Local-global processing in early-onset
schizophrenia: evidence for an impairment in shifting the spatial scale of attention. Brain and Cognition
51: 48–65.
Block, N. (2001). Paradox and cross purposes in recent work on consciousness. Cognition 79(1–2): 197–219.
Bressler, S. L. and Menon, V. (2010). Large-scale brain networks in cognition: emerging methods and
principles. Trends in Cognitive Sciences 14: 277–90.
Bressler, S. L., Coppola, R., and Nakamura, R. (1993). Episodic multiregional cortical coherence at
multiple frequencies during visual task performance. Nature 366(6451): 153–6.
Bruno, N., Bertamini, M., and Domini, F. (1997). Amodal completion of partly occluded surfaces: is there
a mosaic stage? Journal of Experimental Psychology: Human Perception & Performance 23: 1412–26.
Buffart, H., Leeuwenberg, E., and Restle, F. (1983). Analysis of ambiguity in visual pattern completion.
Journal of Experimental Psychology: Human Perception & Performance 9: 980–1000.
Bullmore, E. and Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and
functional systems. Nature Reviews Neuroscience 10: 186–98.
Buzsaki, G. and Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science 304 (5679):
1926–9.
Canolty, R. T., Edwards, E., Dalal, S. S., Soltani, M., Nagarajan, S. S., Kirsch, H. E., Berger, M. S., Barbaro,
N. M., and Knight, R. T. (2006) High gamma power is phase-locked to theta oscillations in human
neocortex. Science 313: 1626–8.
Coleman, M.J., Cestnick, L., Krastoshevsky, O., Krause, V., Huang, Z., Mendell, N.R., and Levy, D.L.
(2009). Schizophrenia patients show deficits in shifts of attention to different levels of global-local
stimuli: evidence for magnocellular dysfunction. Schizophrenia Bulletin 35: 1108–16.
Cordes, D., Haughton, V. M., Arfanakis, K., Wendt, G. J., Turski, P. A., Moritz, C. H., . . . Meyerand, M. E.
(2000). Mapping functionally related regions of brain with functional connectivity MR imaging. American
Journal of Neuroradiology 21(9): 1636–44.
Dehaene, S., Kerszberg, M., and Changeux, J. P. (1998). A neuronal model of a global workspace in
effortful cognitive tasks. Proceedings of the National Academy of Science, USA 95(24): 14529–34.
Dehaene, S., Changeux, J. P., Naccache, L., Sackur, J., and Sergent, C. (2006). Conscious, preconscious, and
subliminal processing: a testable taxonomy. Trends in Cognitive Sciences 10(5): 204–11.
Dennett, D. and Kinsbourne, M. (1991). Time and the observer: the where and when of time in the brain.
Behavioral and Brain Sciences 15: 183–247.
Dinnerstein, D. and Wertheimer, M. (1957). Some determinants of phenomenal overlapping. American
Journal of Psychology 70: 21–37.
Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., and Reitboeck, H. J. (1988).
Coherent oscillations: A mechanism of feature linking in the visual cortex. Biological Cybernetics
60: 121–30.
Eguiluz, V. M., Chialvo, D. R., Cecchi, G. A., Baliki, M., and Apkarian, A. V. (2005). Scale-free brain
functional networks. Physical Review Letters 94: 18102.
Engel, A.K. and Singer, W. (2001). Temporal binding and the neural correlates of sensory awareness. Trends
in Cognitive Sciences 5: 16–25.
Fell, J., Fernandez, G., Klaver, P., Elger, C. E., and Fries, P. (2003). Is synchronized neuronal gamma
activity relevant for selective attention? Brain Research: Brain Research Review 42: 265–72.
Fitzgibbon, S. P., Pope, K. J., Mackenzie, L., Clark, C. R., and Willoughby, J. O. (2004). Cognitive tasks
augment gamma EEG power. Clinical Neurophysiology 115: 1802–9.
Fox, M. D. and Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with functional
magnetic resonance imaging. Nature Reviews Neuroscience 8(9): 700–11.
Foxe, J.J. and Snyder, A.C. (2011). The role of alpha-band brain oscillations as a sensory suppression
mechanism during selective attention. Frontiers in Psychology 2: art. 154.
Freeman, W. J. (2005). Origin, structure, and role of background EEG activity. Part 3. Neural frame
classification. Clinical Neurophysiology 116(5): 1118–29.
Freeman, W. J. and Baird, B. (1987). Relation of olfactory EEG to behavior: spatial analysis. Behavioral
Neuroscience 101(3): 393–408.
Freeman, W. J., Burke, B. C., and Holmes, M. D. (2003). Aperiodic phase re-setting in scalp EEG
of beta-gamma oscillations by state transitions at alpha-theta rates. Human Brain Mapping
19(4): 248–72.
Fries, P. (2005). A mechanism for cognitive dynamics: neuronal communication through neuronal
coherence. Trends in Cognitive Sciences 9: 474–80.
Friston, K. J. (2000). The labile brain. I. Neuronal transients and nonlinear coupling. Philosophical
Transactions of the Royal Society of London. Series B: Biological Sciences 355(1394): 215–36.
Gaillard, R., Dehaene, S., Adam, C., Clemenceau, S., Hasboun, D., Baulac, M., et al. (2009). Converging
intracranial markers of conscious access. PLoS Biology 7: e61.
Gong, P. and van Leeuwen, C. (2003). Emergence of scale-free network with chaotic units. Physica A,
Statistical mechanics and its applications 321: 679–88.
Gong, P. and van Leeuwen, C. (2004). Evolution to a small-world network with chaotic units. Europhysics
Letters 67: 328–33.
Gong, P., Nikolaev, A. R., and van Leeuwen, C. (2003). Scale-invariant fluctuations of the dynamical
synchronization in human brain electrical activity. Neuroscience Letters 336: 33–6.
Gong, P., Nikolaev, A.R., and van Leeuwen, C. (2007). Dynamics of collective phase synchronization in
human electrocortical activity. Physical Review E, 76, art. 011904.
Gray, C. M., König, P., Engel, A. K., and Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit
intercolumnar synchronization which reflects global stimulus properties. Nature 338: 334–7.
Gross, J., Schmitz, F., Schnitzler, I., Kessler, K., Shapiro, K., Hommel, B., and Schnitzler, A. (2004).
Modulation of long-range neuronal synchrony reflects temporal limitations of visual attention in
humans. Proceedings of the National Academy of Science, USA 101: 13050–5.
He, Y., Chen, Z.J., and Evans, A.C. (2007). Small-world anatomical networks in the human brain revealed by
cortical thickness from MRI. Cerebral Cortex 17: 2407–19.
Hernández, A., Nácher, V., Luna, R., Zainos, A., Lemus, L., et al. (2010). Decoding a perceptual decision
process across cortex. Neuron 66: 300–14.
Hesselmann, G., Kell, C.A., Eger, E., and Kleinschmidt, A. (2008). Spontaneous local variations in ongoing
neural activity bias perceptual decisions. Proceedings of the National Academy of Sciences, USA
105: 10984–9.
Hulleman, J. and Humphreys, G.W. (2004). A new cue to figure-ground coding: top-bottom polarity.
Vision Research 44: 2779–91.
Ito, J., Nikolaev, A. R., and van Leeuwen, C. (2005). Spatial and temporal structure of phase synchronization of
spontaneous alpha EEG activity. Biological Cybernetics 92(1): 54–60.
Ito, J., Nikolaev, A. R., and van Leeuwen, C. (2007). Dynamics of spontaneous transitions between global
brain states. Human Brain Mapping 28(9): 904–13.
Iturria-Medina, Y., Canales-Rodríguez, E.J., Melie-García, L., Valdés-Hernández, P.A., Martínez-Montes, E.,
Alemán-Gómez, Y., and Sánchez-Bornot, J.M. (2007). Characterizing brain anatomical connections using
diffusion weighted MRI and graph theory. NeuroImage 36: 645–60.
Jaušovec, N. and Jaušovec, K. (2000). Correlations between ERP parameters and intelligence: A reconsideration.
Biological Psychology 55: 137–54.
Kaneko, K. and Tsuda, I. (2001). Complex systems: chaos and beyond—A constructive approach with
applications in life sciences. Berlin: Springer Verlag.
Karni, A., Tanne, D., Rubenstein, B.S., Askenasy, J.J.M., and Sagi, D. (1994). Dependence on REM sleep of
overnight improvement of a perceptual skill. Science 265: 679–82.
Kitzbichler, M. G., Smith, M. L., Christensen, S. R., and Bullmore, E. (2009). Broadband criticality of
human brain network synchronization. PLoS computational biology 5(3): e1000314.
Klimesch, W., Schimke, H., Ladurner, G., and Pfurtscheller, G. (1990). Alpha frequency and memory
performance. Journal of Psychophysiology 4: 381–90.
Koenis, M. M.G., Romeijn, N., Piantoni, G., Verweij, I., Van der Werf, Y. D., Van Someren, E. J.W., and
Stam, C. J. (2011). Does sleep restore the topology of functional brain networks? Human Brain Mapping
doi: 10.1002/hbm.21455.
Konen, Ch. and Kastner, S. (2008). Two hierarchically organized neural systems for object information in
human visual cortex. Nature Neuroscience 11: 224–31.
Kranczioch, C., Debener, S., Maye, A., and Engel, A. K. (2007). Temporal dynamics of access to
consciousness in the attentional blink. Neuroimage 37: 947–55.
Kubovy, M., Holcombe, A. O., and Wagemans, J. (1998). On the lawfulness of grouping by proximity.
Cognitive Psychology 35(1): 71–98.
Kwok, H.F., Jurica, P., Raffone, A., and van Leeuwen, C. (2007). Robust emergence of small-world structure
in networks of spiking neurons. Cognitive Neurodynamics 1: 39–51.
Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., and Schroeder, C. E. (2005). An oscillatory
hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of
Neurophysiology 94: 1904–11.
Lakatos, P., O’Connell, M. N., Barczak, A., Mills, A., Javitt, D. C., and Schroeder, C. E. (2009). The leading
sense: supramodal control of neurophysiological context by attention. Neuron 64: 419–30.
Latora, V. and Marchiori, M. (2001). Efficient behavior of small-world networks. Physical Review Letters
87: 198701.
Lee, K. H., Williams, L. M., Breakspear, M., and Gordon, E. (2003). Synchronous gamma activity:
A review and contribution to an integrative neuroscience model of schizophrenia. Brain Research: Brain
Research Review 41: 57–78.
Lehmann, D., Ozaki, H., and Pal, I. (1987). EEG alpha map series: brain micro-states by space-oriented
adaptive segmentation. Electroencephalography and Clinical Neurophysiology 67(3): 271–88.
Leopold, D. A. and Logothetis, N. K. (2003). Spatial patterns of spontaneous local field activity in the
monkey visual cortex. Reviews in the Neurosciences 14(1–2): 195–205.
Linkenkaer-Hansen, K., Nikouline, V.V., Palva, J.M. and Ilmoniemi, R.J. (2001). Long-range temporal
correlations and scaling behavior in human brain oscillations. Journal of Neuroscience 21: 1370–7.
Lisman, J. E. and Idiart, M. A. (1995). Storage of 7 ± 2 short-term memories in oscillatory subcycles.
Science 267: 1512–15.
Liu, L. and Ioannides, A.A. (1996). A correlation study of averaged and single trial MEG signals: the average
describes multiple histories each in a different set of single trials. Brain Topography 8: 385–96.
Liu, L., Plomp, G., van Leeuwen, C., and Ioannides, A.A. (2006). Neural correlates of priming on occluded
figure interpretation in human fusiform cortex. Neuroscience 141: 1585–97.
Livanov, M. N. (1977). Spatial Organization of Cerebral Processes. New York: John Wiley and Sons.
Lowe, M. J., Mock, B. J., and Sorenson, J. A. (1998). Functional connectivity in single and multislice
echoplanar imaging using resting-state fluctuations. Neuroimage 7(2): 119–32.
Lopes da Silva, F.H. (1991). Neural mechanisms underlying brain waves: from neural membranes to
networks. Electroencephalography and Clinical Neurophysiology 79: 81–93.
Lopes da Silva, F.H., van Rotterdam, A., Storm van Leeuwen, W., and Tielen, A.M. (1970). Dynamic
characteristics of visual evoked potentials in the dog. II. Beta frequency selectivity in evoked potentials
and background activity. Electroencephalography and Clinical Neurophysiology 29: 260–8.
Luck, S. J. and Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions.
Nature 390: 279–81.
Lutz, A., Lachaux, J. P., Martinerie, J., and Varela, F. J. (2002). Guiding the study of brain dynamics by
using first-person data: Synchrony patterns correlate with ongoing conscious states during a simple
visual task. Proceedings of the National Academy of Sciences, USA 99: 1586–91.
Lybrand, W.A., Andrews, T. G., and Ross, S. (1954). Systemic fatigue and perceptual organization.
American Journal of Psychology 67: 704–7.
Maia, T.V. and Cleeremans, A. (2005). Consciousness: Converging insights from connectionist modeling
and neuroscience. Trends in Cognitive Sciences 9: 397–404.
Makeig, S., Westerfield, M., Jung, T. P., Enghoff, S., Townsend, J., Courchesne, E., and Sejnowski, T. J.
(2002). Dynamic brain sources of visual evoked responses. Science 295(5555): 690–4.
Markram, H., Toledo-Rodriguez, M., Wang, Y., Gupta, A., Silberberg, G., and Wu, C. (2004).
Interneurons of the neocortical inhibitory system. Nature Reviews Neuroscience 5: 793–807.
Mathewson, K.E., Lleras, A., Beck, D.M., Fabiani, M., Ro, T., and Gratton, G. (2011). Pulsed out of
awareness: EEG alpha oscillations represent a pulsed-inhibition of ongoing cortical processing. Frontiers
in Psychology 2, 99: doi: 10.3389/fpsyg.2011.00099.
Milner, P. M. (1974). A model for visual shape recognition. Psychological Review 81: 521–35.
Murthy, V.N. and Fetz, E.E. (1992). Coherent 25–35 Hz oscillations in the sensorimotor cortex of awake
behaving monkeys. Proceedings of the National Academy of Science, USA, 89: 5670–4.
Nakatani, C., Ito, J., Nikolaev, A. R., Gong, P., and van Leeuwen, C. (2005). Phase synchronization analysis
of EEG during attentional blink. Journal of Cognitive Neuroscience 12: 343–54.
Nakatani, C., Raffone, A., and van Leeuwen, C. (in press). Increased efficiency of conscious access with
enhanced coupling of slow and fast neural oscillations. Journal of Cognitive Neuroscience.
Nakatani, H., Khalilov, I., Gong, P., and van Leeuwen, C. (2003). Nonlinearity in giant depolarizing
potentials. Physics Letters A, 319: 167–72.
Nakatani, H. and van Leeuwen, C. (2006). Transient synchrony of distant brain areas and perceptual
switching. Biological Cybernetics 94: 445–57.
Nikolaev, A. R., Gepshtein, S., Kubovy, M., and van Leeuwen, C. (2008). Dissociation of early evoked
cortical activity in perceptual grouping. Experimental Brain Research 186(1): 107–22.
Nikolaev, A. R., Gepshtein, S., Gong, P., and van Leeuwen, C. (2010). Duration of coherence intervals in
electrical brain activity in perceptual organization. Cerebral Cortex 20(2): 365–82.
Nikolaev, A. R., Gong, P., and van Leeuwen, C. (2005). Evoked phase synchronization between adjacent
high-density electrodes in human scalp EEG: duration and time course related to behavior. Clinical
Neurophysiology 116(10): 2403–19.
Osaka, M. (1984). Peak alpha frequency of EEG during a mental task: task difficulty and hemispheric
differences. Psychophysiology 21(1): 101–5.
Pecora, L.M. and Carroll, T. L. (1990). Synchronization in chaotic systems. Physical Review Letters 64: 821–4.
Peterson, M.A. and Gibson, B. S. (1991). Directing spatial attention within an object: Altering the
functional equivalence of shape description. Journal of Experimental Psychology: Human Perception and
Performance, 17: 170–82.
Peterson, M. A. and Hochberg, J. (1983), Opposed-Set Measurement Procedure: A Quantitative Analysis
of the Role of Local Cues and Intention in Form Perception. Journal of Experimental Psychology: Human
Perception and Performance 9: 183–93.
Peterson, M. A. and Skow-Grant, E. (2003). Memory and learning in figure-ground perception. In: B. Ross
and D. Irwin (eds.), Cognitive Vision: Psychology of Learning and Motivation, Vol. 42, pp. 1–34. San
Diego: Academic Press.
Pfurtscheller, G. and Lopes da Silva, F. H. (1999). Event-related EEG/MEG synchronization and
desynchronization: basic principles. Clinical Neurophysiology 110: 1842–57.
Plomp, G. and van Leeuwen, C. (2006). Asymmetric priming effects in visual processing of occlusion
patterns. Perception & Psychophysics 68: 946–58.
Plomp, G., Nakatani, C., Bonnardel, V., and van Leeuwen, C. (2004). Amodal completion as reflected in
gaze durations. Perception 33: 1185–200.
Plomp, G., Liu, L., van Leeuwen, C., and Ioannides, A. A. (2006). The mosaic stage in amodal
completion as characterized by magnetoencephalography responses. Journal of Cognitive Neuroscience
18: 1394–405.
Pöppel, E. (1970). Excitability cycles in central intermittency. Psychologische Forschung 34: 1–9.
Raffone, A. and Wolters, G. (2001). A cortical mechanism for binding in visual working memory. Journal of
Cognitive Neuroscience 13: 766–85.
Rauschenberger, R., Peterson, M. A., Mosca, F., and Bruno, N. (2004). Amodal completion in visual
search: preemption or context effects? Psychological Science 15: 351–5.
Raymond, J. E., Shapiro, K. L., and Arnell, K. M. (1992). Temporary suppression of visual processing
in an RSVP task: an attentional blink? Journal of Experimental Psychology: Human Perception and
Performance 18: 849–60.
Revonsuo, A., Wilenius-Emet, M., Kuusela, J., and Lehto, M. (1997). The neural generation of a unified
illusion in human vision. Neuroreport 8: 3867–70.
Rodriguez, E., George, N., Lachaux, J.-P., Martinerie, J., Renault, B., and Varela, F. J. (1999). Perception’s
shadow: Long distance synchronization of human brain activity. Nature 397: 430–3.
Roelfsema, P. R. (2006). Cortical algorithms for perceptual grouping. Annual Review of Neuroscience
29: 203–27.
Rosenblum, M.G., Pikovsky, A.S., and Kurths, J. (1996). Phase synchronization of chaotic oscillators.
Physical Review Letters 76: 1804–7.
Rubin, E. (1921). Visuell wahrgenommene Figuren. Kopenhagen: Gyldendalske Boghandel.
Rubinov, M., Sporns, O., van Leeuwen, C., and Breakspear, M. (2009a). Symbiotic relationship between
brain dynamics and architectures. BMC Neuroscience 10: 55.
Rubinov, M., Knock, S. A., Stam, C. J., Micheloyannis, S., Harris, A.W. F., Williams, L. M., and
Breakspear, M. (2009b). Small-world properties of nonlinear brain activity in schizophrenia. Human
Brain Mapping 30: 403–16.
Sekuler, A. B. and Palmer, S. E. (1992). Perception of partly occluded objects: a microgenetic analysis.
Journal of Experimental Psychology: General 121: 95–111.
Sergent, C., Baillet, S., and Dehaene, S. (2005). Timing of the brain events underlying access to
consciousness during the attentional blink. Nature Neuroscience 8: 1391–400.
Siegel, M., Donner, T. H., and Engel, A. K. (2012). Spectral fingerprints of large-scale neuronal
interactions. Nature Reviews Neuroscience 13: 121–34.
Simione, L., Raffone, A., Wolters, G., Salmas, P., Nakatani, C., Belardinelli, M. O., and van Leeuwen,
C. (2012). ViSA: a neurodynamic model for visuo-spatial working memory, attentional blink, and
conscious access. Psychological Review 119: 745–69.
Shaw, J. C. (2004). The Brain’s Alpha Rhythms and the Mind. Amsterdam: Elsevier Science.
Smit, D. J., Stam, C. J., Posthuma, D., Boomsma, D. I., and De Geus, E. J. (2007). Heritability of ‘small‐
world’ networks in the brain: A graph theoretical analysis of resting‐state EEG functional connectivity.
Human Brain Mapping 29(12): 1368–78.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs
74: 29.
Sporns, O. and Zwi, J. (2004). The small world of the cerebral cortex. Neuroinformatics 2: 145–62.
Stam, C. J. (2004). Functional connectivity patterns of human magnetoencephalographic
recordings: A ‘small-world’ network? Neuroscience Letters 355: 25–8.
Steriade, M. (2001). Impact of network activities on neuronal properties in corticothalamic systems. Journal
of Neurophysiology 86(1): 1–39.
Stroud, J. M. (1955). The fine structure of psychological time. In: H. Quasten (ed.), Information Theory in
Psychology, pp. 174–207. Glencoe, Illinois: Free Press.
Supp, G. G., Schlogl, A., Fiebach, C. J., Gunter, T. C., Vigliocco, G., Pfurtscheller, G., and Petsche, H.
(2005). Semantic memory retrieval: cortical couplings in object recognition in the N400 window.
European Journal of Neuroscience 21: 1139–43.
Tallon-Baudry, C. and Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object
representation. Trends in Cognitive Sciences 3: 151–62.
Tallon-Baudry, C., Bertrand, O., Delpuech, C., and Pernier, J. (1997). Oscillatory gamma-band (30–70 Hz)
activity induced by a visual search task in humans. Journal of Neuroscience 17: 722–34.
Tallon-Baudry, C., Bertrand, O., Peronnet, F., and Pernier, J. (1998). Induced gamma-band activity during
the delay of a visual short-term memory task in humans. Journal of Neuroscience 18: 4244–54.
Tallon-Baudry, C., Bertrand, O., and Fischer, C. (2001). Oscillatory synchrony between human extrastriate
areas during visual short-term memory maintenance. Journal of Neuroscience 21: RC177.
Tononi, G. and Edelman, G. M. (1998). Consciousness and complexity. Science 282: 1846–51.
Thut, G., Miniussi, C., and Gross, J. (2012). The functional importance of rhythmic activity in the brain.
Current Biology 22: R658–R663.
van den Berg, D. and van Leeuwen, C. (2004). Adaptive rewiring in chaotic networks renders small-world
connectivity with consistent clusters. Europhysics Letters 65: 459–64.
van den Berg, D., Gong, P., Breakspear, M., and van Leeuwen, C. (2012). Fragmentation: Loss of
global coherence or breakdown of modularity in functional brain architecture? Frontiers in Systems
Neuroscience 6: 20.
van Leeuwen, C. (2007). What needs to emerge to make you conscious? Journal of Consciousness Studies
14(1–2): 115–36.
van Leeuwen, C. and Bakker, L. (1995). Stroop can occur without Garner interference: strategic and
mandatory influences in multidimensional stimuli. Perception and Psychophysics 57(3): 379–92.
van Leeuwen, C. and Smit, D.J.A. (2012). Restless brains, wandering minds. In: S. Edelman, T. Fekete, and
N. Zach (eds.), Being in Time: Dynamical Models of Phenomenal Awareness. Advances in Consciousness
Research, pp. 121–47. Amsterdam: John Benjamins PC.
van Leeuwen, C. and van den Hof, M. (1991). What has happened to Prägnanz? Coding, stability, or
resonance. Perception & Psychophysics 50(5): 435–48.
van Lier, R. J., van der Helm, P. A., and Leeuwenberg, E. L. J. (1995). Competing global and local
completions in visual occlusion. Journal of Experimental Psychology: Human Perception and Performance
21: 571–83.
van Wassenhove, V., Grant, K. W., and Poeppel, D. (2007). Temporal window of integration in
auditory-visual speech perception. Neuropsychologia 45: 598–607. doi: 10.1016/j.neuropsychologia.2006.01.001.
Varela, F. J. (1995). Resonant cell assemblies: a new approach to cognitive functions and neuronal
synchrony. Biological Research 28(1): 81–95.
Varela, F., Lachaux, J.P., Rodriguez, E., and Martinerie, J. (2001). The brainweb: phase synchronization and
large-scale integration. Nature Reviews Neuroscience 2: 229–39.
Vecera, S. P., Vogel, E. K., and Woodman, G. F. (2002). Lower region: A new cue for figure-ground
assignment. Journal of Experimental Psychology: General 131: 194–205.
Vierling-Claassen, D., Cardin, J.A., Moore, C.I., and Jones S. R. (2010). Computational modeling of
distinct neocortical oscillations driven by cell-type selective optogenetic drive: separable resonant
circuits controlled by low-threshold spiking and fast-spiking interneurons. Frontiers in Human
Neuroscience 4:198. doi: 10.3389/fnhum.2010.00198.
von der Malsburg, C. (1985). Nervous structures with dynamical links. Berichte der Bunsengesellschaft für
physikalische Chemie 89: 703–10.
von Stein, A., Rappelsberger, P., Sarnthein, J., and Petsche, H. (1999). Synchronization between temporal
and parietal cortex during multimodal object processing in man. Cerebral Cortex 9: 137–50.
Watts, D. and Strogatz, S. (1998). Collective dynamics of ‘small-world’ networks. Nature 393: 440–2.
Wrobel, A. (2000). Beta activity: a carrier for visual attention. Acta Neurobiologiae Experimentalis
60: 247–60.
Yamada, T. and Fujisaka, H. (1983). Stability theory of synchronized motion in coupled oscillator systems.
Progress in Theoretical Physics 70: 1240–8.
Zylberberg, A., Dehaene, S., Roelfsema, P. R., and Sigman, M. (2011). The human Turing machine: a
neural framework for mental programs. Trends in Cognitive Sciences 15: 293–300.
Chapter 49

Bayesian models of perceptual organization

Jacob Feldman

Inference in Perception
One of the central ideas in the study of perception is that the proximal stimulus—the pattern of
energy that impinges on sensory receptors, such as the visual image—is not sufficient to specify the
actual state of the world outside (the distal stimulus). That is, while the image of your grandmother
on your retina might look like your grandmother, it also looks like an infinity of other arrange-
ments of matter, each having a different combination of three-dimensional structures, surface
properties, colour properties, etc., so that they happen to look just like your grandmother from a
particular viewpoint. Naturally, the brain generally does not perceive these far-fetched alternatives,
but rapidly converges on a single solution, which is what we consciously perceive. A shape on the
retina might be a large object that is far away, or a smaller one that is closer, or anything in between.
A mid-grey region on the retina might be a bright white object in dim light, a dark object in bright
light, or anything in between. An elliptical shape on the retina might be an elliptical object face-on,
a circular object slanted back in depth, or anything in between. Every proximal stimulus is consist-
ent with an infinite family of possible scenes, only one of which is perceived.
The central problem for the perceptual system is to quickly and reliably decide among all these
alternatives, and the central problem for visual science is to figure out what rules, principles, or
mechanisms the brain uses to do so. This process was called unconscious inference by Helmholtz,
perhaps the first scientist to appreciate the problem, and is sometimes called inverse optics to
convey the idea that the brain must in a sense invert the process of optical projection—to take the
image and recover the world that gave rise to it.
The modern history of visual science contains a wealth of proposals for how exactly this process
works, far too numerous to review here. Some are very broad, like the Gestalt idea of Prägnanz
(infer the simplest or most reasonable scene consistent with the image). Many others are nar-
rowly addressed to specific aspects of the problem, like the inference of shape or surface colour.
But historically, the vast majority of these proposals suffer from one or both of the following two
problems. First, many (like Prägnanz and many other older suggestions) are too vague to be real-
ized as computational mechanisms. They rest on central ideas, like the Gestalt term ‘goodness of
form’, that are subjectively defined and cannot be implemented algorithmically without a host
of additional assumptions. Second, many proposed rules are arbitrary or unmotivated, meaning
that it is unclear exactly why the brain would choose them rather than an infinity of other equally
effective ones. Of course, it cannot be taken for granted that mental processes are principled in
this sense, and some have argued for a view of the brain as a ‘bag of tricks’ (Ramachandran 1985).
Nevertheless, to many theorists, a mental function as central and evolutionarily ancient as percep-
tual inference seems to demand a more coherent and principled explanation.
Bayesian Models of Perceptual Organization 1009

Inverse Probability and Bayes’ Rule


In recent decades, Bayesian inference has been proposed as a solution to these difficulties, rep-
resenting a principled, mathematically well-defined, and comprehensive solution to the problem
of inferring the most plausible interpretation of sensory data. Bayesian inference begins with the
mathematical notion of conditional probability, which is simply probability restricted to some
particular set of circumstances. For example, the conditional probability of A conditioned on
B, denoted p(A|B), means the probability that A is true given that B is true. Mathematically, this
conditional probability is simply the ratio of the probability of A and B both being true, p(A and
B), divided by the probability that B is true, p(B), hence

p(A|B) = p(A and B) / p(B)    (1)

Similarly, the probability of B given A is the ratio of the probability that B and A are both true
divided by the probability that A is true, hence

p(B|A) = p(B and A) / p(A)    (2)

It was the Reverend Thomas Bayes (1763) who first noticed that these mathematically simple
observations can be combined to yield a formula1 for the conditional probability p(A|B) (A given
B) in terms of the inverse conditional probability p(B|A) (B given A)

p(A|B) = p(B|A) p(A) / p(B)    (3)

This formula is now called Bayes’ theorem or Bayes’ rule.2 Before Bayes, the mathematics of prob-
ability had been used exclusively to calculate the chances of a particular random outcome of a
stochastic process, like the chance of getting ten consecutive heads in ten flips of a fair coin [p(ten
heads|fair coin)]. Bayes realized that his rule allowed us to invert this inference and calculate the
probability of the conditions that gave rise to the observed outcome—here, the probability, having
observed ten consecutive heads, that the coin was fair in the first place [p(fair coin|ten heads)].
Of course, to determine this, we need to assume that there is some other hypothesis we might
entertain about the state of the coin, such as that it is biased towards heads. Bayes’ logic, often
called inverse probability, allows us to evaluate the plausibility of various hypotheses about the
state of the world (the nature of the coin) on the basis of what we have observed (the sequence of
flips). For example, it allows us to quantify the degree to which observing ten heads in a row might
persuade us that the coin is biased towards heads.

1  More specifically, note that p(B and A) = p(A and B) (conjunction is commutative). Substitute the latter for
the former in Eq. (2) to see that p(A|B)p(B) and p(B|A)p(A) are both equal to p(A and B) and thus to
each other. Divide both sides of p(A|B)p(B) = p(B|A)p(A) by p(B) to yield Bayes’ rule.
2  The rule does not actually appear in this form in Bayes’ essay. But Bayes’ focus was indeed on the underlying
problem of inverse inference, and he deserves credit for the main insight (see Stigler 1983).
1010 Feldman

Bayes and his followers, especially the visionary French mathematician Laplace, saw how
inverse probability could form the basis of a full-fledged theory of inductive inference (see Stigler
1986). As David Hume had pointed out only a few decades previously, much of what we believe in
real life—including all generalizations from experience—cannot be proved with logical certainty,
but instead merely seems intuitively plausible on the basis of our knowledge and observations. To
philosophers seeking a deductive basis for our beliefs, this argument was devastating. But Laplace
realized that Bayes’ rule allowed us to quantify belief—to precisely gauge the plausibility of induc-
tive hypotheses.
By Bayes’ rule, given any data D which has a variety of possible hypothetical causes H1, H2, etc.,
each cause Hi is plausible in proportion to the product of two numbers: the probability of the data
if the hypothesis is true p(D|Hi), called the likelihood, and the prior probability of the hypothesis,
p(Hi), that is, how probable the hypothesis was in the first place. If the various hypotheses are all
mutually exclusive, then the probability of the data D is the sum of its probability under all the
various hypotheses:

p(D) = p(H1)p(D|H1) + p(H2)p(D|H2) + ... = Σi p(Hi)p(D|Hi)    (4)


Plugging this into Bayes’ rule (with Hi playing the role of A, and D playing the role of B), this means
that the probability of hypothesis Hi given data D, called the posterior probability, P(Hi|D), is

p(Hi|D) = p(Hi)p(D|Hi) / p(D) = p(Hi)p(D|Hi) / Σi p(Hi)p(D|Hi),    (5)


or in words

posterior for Hi = (prior for Hi × likelihood of Hi) / (sum of (prior × likelihood) over all hypotheses)    (6)


The posterior probability p(Hi|D) quantifies how much we should believe Hi after considering the
data. It is simply the ratio of the probability of the evidence under Hi (the product of its prior and
likelihood) relative to the total probability of the evidence arising under all hypotheses (the sum
of the prior–likelihood products for all the hypotheses). This ratio measures how plausible Hi is
relative to all the other hypotheses under consideration.
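To make Eqs. (4)–(6) concrete, here is a minimal Python sketch of the coin scenario (not from the chapter; the two hypotheses and all prior values are invented for illustration): each posterior is the prior times the likelihood, normalized by the sum over all hypotheses.

```python
from math import prod

# Hypothetical version of the coin scenario (hypotheses and priors invented
# for illustration): 'fair' has p(heads) = 0.5, 'biased' has p(heads) = 0.9.
p_heads = {"fair": 0.5, "biased": 0.9}
prior = {"fair": 0.99, "biased": 0.01}   # assumed prior beliefs

data = ["H"] * 10                        # ten consecutive heads

def likelihood(ph, flips):
    """p(data | hypothesis): product of per-flip probabilities."""
    return prod(ph if f == "H" else 1 - ph for f in flips)

# Eq. (6): posterior = prior x likelihood, normalized over all hypotheses.
unnorm = {h: prior[h] * likelihood(ph, data) for h, ph in p_heads.items()}
evidence = sum(unnorm.values())          # p(D), as in Eq. (4)
posterior = {h: u / evidence for h, u in unnorm.items()}

print(posterior)
# Ten heads in a row overwhelm even a 99:1 prior in favour of 'fair'.
```

With these invented numbers, the data move the bulk of posterior belief to ‘biased’ despite the strongly sceptical prior.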
But Laplace’s ambitious account was followed by a century of intense controversy about the use
of inverse probability (see Howie 2004). In modern retellings, the critics’ objections to Bayesian
inference are often reduced to the idea that to use Bayes’ rule we need to know the prior probabil-
ity of each of the hypotheses (for example, the probability that the coin was fair in the first place),
and that we often don’t have this information. But their criticism was far more fundamental, and
relates to the meaning of probability itself. They argued that many propositions—those whose
truth value is fixed but unknown—can’t be assigned probabilities at all, in which case the use of
inverse probability to assign them probabilities would be nonsensical. This criticism reflects a
conception of probability, often called frequentism, in which probability refers exclusively to rela-
tive frequency in a repeatable chance situation. Thus, in their view, you can calculate the probabil-
ity of a string of heads for a fair coin because this is a random event that occurs on some fraction
of trials; but you can’t calculate a probability of a non-repeatable state of nature, like ‘this coin is
fair’ or ‘the Higgs boson exists’, because such hypotheses are either definitely true or definitely false,
and are not ‘random’. The frequentist objection was not just that we don’t know the prior for many
hypotheses, but that most hypotheses don’t have priors—or posteriors, or any probabilities at all.
But, in contrast, Bayesians generally thought of probability as quantifying the degree of belief,
and were perfectly content to apply it to any proposition at all, including non-repeatable ones. To
Bayesians, the probability of any proposition is simply a characterization of our state of knowl-
edge about it, and can freely be applied to any proposition as a way of quantifying how strongly
we believe it. This conception of probability, sometimes called subjectivist (or epistemic or some-
times just Bayesian), is thus essential to the Bayesian programme. Without it, one cannot calculate
the posterior probability of a non-repeatable proposition because such propositions simply don’t
have probabilities—and this would rule out most uses of Bayes’ rule to perform induction. But to
subjectivists, Bayesian inverse probability can be used to determine the posterior probability, and
thus the strength of belief, for any hypothesis at all.3

Bayesian Inference as a Model of Perception


The use of Bayesian inference as a model for perception rests on two basic ideas. The first, just
mentioned, is the basic idea of inverse probability as a general method for determining belief
under conditions of uncertainty. Bayesian inference allows us to quantify the degree to which dif-
ferent scene models—hypotheses about what is actually going on in the world—should be believed
on the basis of sensory data. Indeed, to many researchers, the subjectivist attitude towards prob-
ability resonates perfectly with the inherently ‘subjective’ nature of perception—that is, that by
definition it involves understanding belief from the observer’s point of view.
The other attribute of Bayesian inference that drives enthusiasm in its favour is its rationality.
Cox (1961) showed that Bayesian inference is unique among inference systems in satisfying basic
considerations of internal consistency, such as invariance to the order in which evidence is consid-
ered. If one wishes to assign degrees of belief to hypotheses in a rational way, one must inevitably
use the conventional rules of probability, and specifically Bayes’ rule. Later de Finetti (1970/1974)
demonstrated that if a system of inference differs from Bayesian inference in any substantive way,
it is subject to catastrophic failures of rationality. (His so-called Dutch book theorem shows, in
essence, that any non-Bayesian reasoner can be turned into a ‘money pump’.) In recent decades
these strong arguments for the uniqueness of Bayesian inference as a system for fixing belief were
brought to wide attention by Jaynes (2003). Though there are of course many subtleties surrounding
the putatively optimal nature of Bayesian inference (see Earman 1992), most modern statisticians
now regard Bayesian inference as a normative method for making inferences on the basis of data.
This characterization of Bayesian inference—as an optimal method for deciding what to believe
under conditions of uncertainty—makes it perfectly suited to the central problem of perception,
that of estimating the properties of the physical world based on sense data. The basic idea is to
think of the stimulus (e.g. the visual image) as reflecting both stable properties of the world (which

3  This philosophical disagreement underlies the recent debate between traditional statistics centred on null
hypothesis significance testing (NHST) and Bayesian inference (see Lee and Wagenmakers 2005). NHST was
invented by fervent frequentists (Fisher, Neyman, and Pearson) who insisted that scientific hypotheses, being
non-repeatable, cannot have probabilities. This position rules out the application of Bayes’ rule to estimate the
posterior probability of a hypothesis, leading them to propose alternative ways of evaluating hypotheses such
as ‘rejecting the null’.
we would like to infer) plus some uncertainty introduced in the process of image formation (which
we would like to disregard). Bayesian inference allows us to estimate the stable properties of the
world conditioned on the image data. The aptness of Bayesian inference as a model of perceptual
inference was first noticed in the 1980s by a number of authors, and brought to wider attention by
the collection of papers in Knill and Richards (1996). Since then the applications of Bayesian infer-
ence to perception have multiplied and evolved, while always retaining the core idea of associating
perceptual belief with the posterior probability as given by Bayes’ rule. Several excellent introduc-
tions are already available (e.g. Bülthoff and Yuille 1991; Kersten et al. 2004) each with a slightly
different emphasis or slant. The current chapter is intended as an introduction to the main ideas
of Bayesian inference as applied to human perception and perceptual organization. The emphasis
will be on central principles rather than on mathematical details or recent technical advances.

Basic Calculations in Bayesian Inference


All Bayesian inference involves a comparison among some number of hypotheses Hi, drawn from
a hypothesis set H, each of which has associated with it a prior probability p(H) and a likelihood
function p(X|H) which gives the probability of each possible dataset X conditioned on H.4 In many
cases, the hypotheses H are qualitatively distinct from each other (H is finite or countably infinite).
In other cases the hypotheses form a continuous family of hypotheses (the hypothesis space) dis-
tinguished by the setting of some number of parameters. In this case the problem is often called
parameter estimation, because the observer’s goal is to determine, based on the data at hand, the
most probable value of the parameter(s), or, more broadly, the distribution of probability over all
possible values of the parameter(s) (called the posterior distribution). The mathematics of discrete
hypothesis comparison and parameter estimation can look quite different (the former involving
summation while the latter involves integration) but the logic is essentially the same: in both cases
the goal is to infer the posterior assignment of belief to hypotheses, conditioned on the data.
The hypothesis with greatest posterior probability, the mode (maximum value) of the posterior
distribution, is called the maximum a posteriori (MAP) hypothesis. If we need to reduce our pos-
terior beliefs to a single value, this is by definition the most plausible, and casual descriptions of
Bayesian inference often imply that Bayes’ rule dictates that we choose the MAP hypothesis. But
Bayes’ rule does not actually authorize this reduction; it simply dictates how much to believe each
hypothesis—that is, the full posterior distribution. In many situations use of the MAP may be quite
undesirable: for example, with broadly distributed posteriors that have many other highly probable
values, or with multimodal posteriors whose multiple peaks are almost as plausible as the MAP.

4  Students are often warned that the likelihood function is not a probability distribution, a remark that in my
experience tends to cause confusion. In traditional terminology, likelihood is a property of the model or
hypothesis, not the data, and one refers, for example, to the likelihood of H (and not the likelihood of the
data under H). This is because the term ‘likelihood’ was introduced by frequentists (specifically Fisher 1925),
who insisted that hypotheses did not have probabilities, and sought a word other than ‘probability’ to express
the degree of support given by the data to the hypothesis in question. To Bayesians, however, the distinction
is unimportant, since both data and hypotheses can have probabilities. So Bayesians have tended (especially
recently) to refer to the likelihood of the data under the hypothesis, or the likelihood of the hypothesis, in both
cases meaning the probability p(D|H). In this sense, likelihoods are indeed probabilities. However, note that
the likelihoods of the various hypotheses do not have to sum to one; for example, it is perfectly possible for
many hypotheses to have likelihood near one given a dataset that they all fit well. In this sense, the distribution
of likelihood over hypotheses (models) is certainly not a probability distribution. But the distribution of likeli-
hood over the data for a single fixed model is, in fact, a probability distribution and sums to one.
Reducing the posterior distribution to a single ‘winner’ discards useful information, and it should
be kept in mind that only the full posterior distribution expresses the totality of our posterior beliefs.
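A small sketch of this point (not from the chapter; the data and hypothesis grid are invented): with only three flips observed, the posterior over a coin’s bias is broad, and values well away from the MAP retain substantial probability, so reporting only the MAP discards real information.

```python
# Hypothetical sketch (data and grid invented): posterior over a coin's bias
# theta after observing 2 heads in 3 flips, on a discrete grid of candidate
# values. With so little data the posterior is broad, so the MAP alone is a
# poor summary of posterior belief.
grid = [i / 100 for i in range(101)]          # candidate theta values, 0..1
prior = [1 / len(grid)] * len(grid)           # uniform prior (assumed)

heads, tails = 2, 1
like = [t**heads * (1 - t)**tails for t in grid]   # binomial likelihood

unnorm = [p * l for p, l in zip(prior, like)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

map_index = max(range(len(grid)), key=lambda i: posterior[i])
map_theta = grid[map_index]                   # mode of the posterior (MAP)
print(map_theta)
# theta = 0.5 is nowhere near ruled out; it retains most of the MAP's probability:
print(posterior[50] / posterior[map_index])
```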

Example: parameter estimation in motion


An example of parameter estimation drawn from perception is the estimation of motion based on
a sequence of dynamically changing images. In everyday vision, we think of motion as a property
of coherent objects plainly moving through space, in which case it is hard to appreciate the pro-
found ambiguity involved. But in fact dynamically changing images are generally consistent with
many motion interpretations, because the same changes can be interpreted as one visual pattern
moving at one velocity (speed and direction), or another pattern moving at another velocity, or
many options in between.
So the estimation of motion requires a comparison among a range of potential interpretations
of an ambiguous collection of image data. As such, it can be placed in a Bayesian framework if
one can provide (1) a prior over potential motions, indicating which velocities are more a priori
plausible and which less, and (2) a likelihood function allowing us to measure the fit between
each motion sequence and each potential interpretation. Weiss et  al. (2002) have shown that
many phenomena of motion interpretation—both under normal conditions as well as a range
of standard motion illusions—are predicted by a simple Bayesian model in which (1) the prior
favours slower speeds over faster ones, and (2) the likelihood is based on conventional Gaussian
noise assumptions. That is, the posterior distribution favours interpretations that minimize speed
while simultaneously maximizing fit to the observed data (leading to the simple slogan ‘slow and
smooth’). The close fit between human percepts and the predictions of the Bayesian model is par-
ticularly striking in that in addition to accounting for normal motion percepts, it also systemati-
cally explains certain illusions of motions as side-effects of rational inference.
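The flavour of such a model can be conveyed by a one-dimensional sketch (not the actual Weiss et al. model; all parameter values are invented): a zero-centred Gaussian prior over speed, combined with a Gaussian likelihood around the measured speed, yields a posterior whose peak is pulled towards slower values.

```python
from math import exp

# Hypothetical 1-D sketch of the 'slowness' prior (not the actual Weiss et
# al. model; all numbers invented). The image measurement suggests 4 deg/s
# but is noisy (likelihood sd = 2 deg/s); the prior favours slow speeds
# (zero mean, sd = 3 deg/s).
measured, sigma_like = 4.0, 2.0
sigma_prior = 3.0

def gauss(x, mu, sigma):
    """Unnormalized Gaussian density."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2)

speeds = [i / 10 for i in range(101)]         # candidate speeds, 0..10 deg/s
unnorm = [gauss(v, 0.0, sigma_prior) * gauss(measured, v, sigma_like)
          for v in speeds]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

map_speed = max(zip(speeds, posterior), key=lambda vp: vp[1])[0]
print(map_speed)
# The posterior peak sits below the measured speed: the slow prior 'pulls'
# the estimate towards zero.
```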

Example: discrete hypotheses in contour integration


An example of discrete hypotheses in perception comes from the problem of contour integration
(see Elder 2013; Singh 2013), in the question of whether two visual edges belong to the same
contour (H1) or different contours (H2). Because physical contours can take on a wide variety
of geometric forms, practically any observed configuration of two edges is consistent with the
hypothesis of a single common contour. But because edges drawn from the same contour tend to
be relatively collinear, the angle between two observed edges provides some evidence about how
plausible this hypothesis is relative to the competing hypothesis that the two edges arise from dis-
tinct contours. This decision, repeated many times for pairs of edges throughout the image, forms
the basis for the extraction of coherent object contours from the visual image (Feldman 2001).
To formalize this as a Bayesian problem, we need priors p(H1) and p(H2) for the two hypotheses,
and likelihood functions p(α|H1) and p(α|H2) that express the probability of the angle between
the two edges (called the turning angle) conditioned on each hypothesis. Several authors have
modelled the same-contour likelihood function p(α|H1) as a normal distribution centred on col-
linearity (0º turning angle; see Feldman 1997; Geisler et al. 2001). Figure 49.1 illustrates the
decision problem in its Bayesian formulation. In essence, each successive pair of contour elements
must be classified as either part of the same contour or as parts of distinct contours. The likelihood
of each hypothesis is determined by the geometry of the observed configuration, with the normal
likelihood function assigning higher likelihood to element pairs that are closer to collinear. The
prior (in practice fitted to subjects’ responses) tends to favour H2, presumably because most image
edges come from disparate objects. Bayes’ rule puts these together to determine the most plausible
Fig. 49.1  Two edges can be interpreted as part of the same smooth contour (hypothesis A, top) or
as two distinct contours (hypothesis B, bottom). Each hypothesis has a likelihood (right) that is a
function of the turning angle α, with p(α|A) sharply peaked at 0º but p(α|B) flat.

grouping. Applying this simple formulation more broadly to all the image edge pairs allows the
image to be divided up into a discrete collection of ‘smooth’ contours—that is, contours made up
of elements which Bayes’ rule says all belong to the same contour. The resulting parse of the image
into contours agrees closely with human judgments (Feldman 2001). Related models have been
applied to contour completion and extrapolation (Singh and Fulvio 2005).
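A minimal sketch of this two-hypothesis decision (the likelihood shapes follow the text, but the prior favouring two contours and the 30-degree spread are invented, not fitted values):

```python
from math import exp, pi, sqrt

# Hypothetical sketch of the same-contour vs. different-contour decision.
# Likelihood shapes follow the text: normal centred on 0 deg for one
# contour, flat for two. The prior and the 30-degree sd are invented.
p_one, p_two = 0.3, 0.7          # priors p(H1), p(H2)
sigma = 30.0                     # sd of the turning-angle likelihood (deg)

def like_one(alpha):
    """p(alpha | one contour): normal density centred on collinearity."""
    return exp(-0.5 * (alpha / sigma) ** 2) / (sigma * sqrt(2 * pi))

def like_two(alpha):
    """p(alpha | two contours): uniform over (-180, 180) degrees."""
    return 1.0 / 360.0

def posterior_one(alpha):
    """p(one contour | turning angle alpha), by Bayes' rule."""
    num = p_one * like_one(alpha)
    return num / (num + p_two * like_two(alpha))

print(posterior_one(0))   # near-collinear edges: grouping favoured
print(posterior_one(90))  # sharp turn: separate contours favoured
```

Applied to every edge pair in an image, a rule of this form yields the kind of parse into smooth contours described above.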

Bayesian Perceptual Organization


The problems of perceptual organization—how to group the visual image into contours, surfaces,
and objects—seem at first blush quite different from other problems in visual perception, because
the property we seek to estimate is not a physical parameter of the world but a representation of
how we choose to organize it. Still, Bayesian methods can be applied in a straightforward fashion
as long as we assume that each image is potentially subject to many grouping interpretations, but
that some are more intrinsically plausible than others (allowing us to define a prior over interpreta-
tions) and some fit the observed image better than others (allowing us to define a likelihood func-
tion). We can then use Bayes’ rule to infer a posterior distribution over grouping interpretations.
More specifically, many problems in perceptual organization can be thought of as choices
among discrete alternatives. Each qualitatively distinct way of organizing the image consti-
tutes an alternative hypothesis. Should a grid of dots be organized into vertical or horizontal
stripes (Zucker et al. 1983; Claessens and Wagemans 2008)? Should a configuration of dots be
grouped into distinct clusters, and if so in what way (Compton and Logan 1993; Cohen et al.
2008; Juni et al. 2010)? What is the most plausible way to divide a smooth shape into a set of
component parts (De Winter and Wagemans 2006; Singh and Hoffman 2001)? Each of these
problems can be placed into a Bayesian framework by assigning to each distinct alternative
interpretation a prior and a method for determining likelihood.
Fig. 49.2  Generative model for shape from Feldman and Singh (2006), giving: (a) prior over skeletons,
(b) likelihood function, (c) MAP skeleton, the maximum posterior skeleton for the given shape, and (d)
examples of the MAP skeleton.
Adapted from Jacob Feldman and Manish Singh, Bayesian estimation of the shape skeleton, Proceedings of
the National Academy of Sciences, USA, 103(47), pp. 18014–18019, Figures 1, 2a, and 5e, doi: 10.1073/
pnas.0608811103. Copyright (2006) National Academy of Sciences, U.S.A.

Each of these problems requires its own unique approach, but broadly speaking a Bayesian
framework for any problem in perceptual organization flows from a generative model for image
configurations (Feldman et al. 2013). Perceptual organization is based on the idea that the visual
image is generated by regular processes that tend to create visual structures with varying prob-
ability, which can be used to define likelihood functions. The challenge of Bayesian perceptual
grouping is to discover psychologically reasonable generative models of visual structure.
For example, Feldman and Singh (2006) proposed a Bayesian approach to shape representation
based on the idea that shapes are generated from axial structures (skeletons) from which the shape
contour is understood to have ‘grown’ laterally. Each skeleton consists of a hierarchically organ-
ized collection of axes, and generates a shape via a probabilistic process that defines a probability
distribution over shapes (Fig. 49.2). This allows a prior over skeletons to be defined, along with
a likelihood function that determines the probability of any given contour shape conditioned on
the skeleton. This in turn allows the visual system to determine the MAP skeleton (the skeleton
most likely to have generated the observed shape) or, more broadly, a posterior distribution over
skeletons. The estimated skeleton in turn determines the perceived decomposition into parts,
with each section of the contour identified with a distinct generating axis perceived as a distinct
‘part’. This shape model is certainly oversimplified relative to the myriad factors that influence real
shapes, but the basic framework can be augmented with a more elaborate generative model, and
tuned to the properties of natural shapes (Wilder et al. 2011). Because the framework is Bayesian,
the resulting representation of shape is, in the sense discussed above, optimal given the assump-
tions specified in the generative model.

Discussion
This section raises issues that often arise when Bayesian models of cognitive processes are
considered.

Bayesian updating
Bayesian inference is sometimes referred to as Bayesian updating because of the inherently pro-
gressive way that the arrival of new data leads the observer’s belief to evolve from the prior towards
the ultimate posterior. The initial prior represents the observer’s beliefs before any data have been
encountered. When data arrive, belief in all hypotheses is modified to reflect them: the likelihood
of each hypothesis is multiplied by its prior (Bayes’ rule) to yield a new, updated posterior belief
distribution. From there on, the state of belief continues to evolve as new data are acquired, with
the posterior at each step becoming the prior for the next step. In this way, belief is gradually
pushed by the data away from the initial prior and towards beliefs that better reflect the data.
More specifically, because of the way the mathematics works, the posterior distribution tends to
get narrower and narrower (more and more sharply peaked) as more and more data come in. That
is, belief typically evolves from a broad prior distribution (representing uncertainty about the state of
the world) towards a progressively narrower posterior distribution (representing increasingly well-
informed belief). In this sense, the influence of the prior gradually diminishes over the course of
inference—in a Bayesian cliché, the ‘likelihood swamps the prior’. Partly for this reason, though the
source of the prior can be controversial (see Where do the priors come from?), in many situations
(though not all) its exact form is not too important, because the likelihood eventually dominates it.
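The updating scheme can be sketched in a few lines (a hypothetical example; hypotheses, priors, and data are all invented): two observers with very different priors watch the same flips, and the accumulating likelihood swamps the difference between them.

```python
# Hypothetical updating sketch (hypotheses, priors, and data invented):
# two observers with very different priors about a coin's bias update flip
# by flip on the same data. Each step multiplies prior by likelihood and
# renormalizes; the posterior then serves as the prior for the next flip.
hypotheses = [0.5, 0.9]            # candidate p(heads): fair vs heads-biased

def update(prior, flip):
    """One Bayesian updating step over the two hypotheses."""
    like = [p if flip == "H" else 1 - p for p in hypotheses]
    unnorm = [pr * l for pr, l in zip(prior, like)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

flips = ["H"] * 20                 # an invented run of twenty heads

belief_a = [0.99, 0.01]            # observer A: strongly expects 'fair'
belief_b = [0.50, 0.50]            # observer B: indifferent prior
for f in flips:
    belief_a = update(belief_a, f)
    belief_b = update(belief_b, f)

print(belief_a[1], belief_b[1])
# Both end up nearly certain the coin is biased: the likelihood has
# 'swamped' their initially very different priors.
```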

Where do the priors come from?


As already mentioned, a great deal of controversy has centred on the epistemological status of
prior probabilities. Frequentists long insisted that priors were justified only in the presence of ‘real
knowledge’ about the relative frequencies of various hypotheses, a requirement that they argued
ruled out most uses. A  similar attitude is surprisingly common among present-day Bayesians
in cognitive science (see Feldman 2013), many of whom aim to validate priors with respect to
tabulations of relative frequency in natural conditions (e.g. Geisler et al. 2001; Burge et al. 2010;
see Dakin 2013). However, as mentioned above, this restriction would limit the application of
Bayesian models to hypotheses which (1) can be objectively tabulated and (2) are repeated many
times under essentially identical conditions; otherwise objective relative frequencies cannot be
defined. Unfortunately, these constraints would rule out many hypotheses which are of central
interest in cognitive science, such as interpreting the intended meaning of a sentence (itself a
belief, and not subject to objective measurement, and in any event unlikely ever to be repeated) or
choosing the ‘best’ way to organize the image (again subjective, and again dependent on possibly
unique aspects of the particular image). However, as already discussed, Bayesian inference is not
really limited to such situations if (as is traditional for Bayesians) probabilities are treated simply
as quantifications of belief. In this view, priors do not represent the relative frequency with which
conditions in the world obtain, but rather the observer’s uncertainty (prior to receiving the data in
question) about the hypotheses under consideration.
There are many ways of boiling this uncertainty down to a specific prior. Many descend from
Laplace’s principle of insufficient reason (sometimes called the principle of indifference), which
holds that a set of hypotheses, none of which one has any reason to favour, should be assigned equal
priors. The simplest example of this is the assignment of uniform priors over symmetric options,
such as the two sides of a coin or the six sides of a die. More elaborate mathematical arguments
can be used to derive specific priors from more generalized symmetry arguments. One is Jeffreys’
prior, which allows more generalized equivalences between interchangeable hypotheses (Jeffreys
1939/1961). Another is the maximum entropy prior (Jaynes 1982), which prescribes the prior that
introduces the least information (in the technical sense of Shannon) beyond what is known.
Bayesians often favour so-called uninformative priors, meaning priors that are as ‘neutral’ as
possible; this allows the data (via the likelihood) to be the primary influence on posterior belief.
Exactly how to choose an uninformative prior can, however, be problematic. For example, to
estimate the probability of success of a binomial process, like the probability of heads in a coin
toss, it is tempting to adopt a uniform prior over success probability (i.e. equal over the range 0
to 100 per cent).5 But mathematical arguments suggest that a truly uninformative prior should be
relatively peaked at 0 and 100 per cent (the beta(0,0) distribution, sometimes called the Haldane
prior; see Lee 2004). But recall that as data accumulate, the likelihood tends to swamp the prior,
and the influence of the prior progressively diminishes. Hence while the choice of prior may be
philosophically controversial, in some real situations the actual choice is moot.
More specifically, certain types of simple priors occur over and over again in Bayesian accounts.
When a particular parameter x is believed to fall around some value µ, but with some uncertainty
that is approximately symmetric about µ, Bayesians routinely assume a Gaussian (normal) prior
distribution for x, i.e. p(x) = N(µ, σ²). Again, this is simply a formal way of expressing what is
known about the value of x (that it falls somewhere near µ) in as neutral a manner as possible
(technically, this is the maximum entropy prior with mean µ and variance σ²). Gaussian error
is often a reasonable assumption because random variations from independent sources, when
summed, tend to yield a normal distribution (the central limit theorem).6 But it should be kept
in mind that an assumption of normal error along x does not entail an affirmative assertion that
repeated samples of x would be normally distributed—indeed in many situations (such as where
x is a fixed quantity of the world, like a physical constant) this interpretation does not even make
sense. Such simple assumptions work surprisingly well in practice and are often the basis for
robust inference. Another common assumption is that priors for different parameters that have
no obvious relationship are independent (that is, knowing the value of one conveys no informa-
tion about the value of the other). Bayesian models that assume independence among parameters

5  Bayes himself suggested this prior, now sometimes called Bayes’ postulate, but he was apparently uncertain of
its validity, which may have contributed to his reluctance to publish his essay (which was eventually published
posthumously; see Stigler 1983).
6  More technically, the central limit theorem says that the sum of random variables with finite variances tends
towards normality in the limit. In practice this means that if x is really the sum of a number of component
variables, each of which is random though not necessarily normal itself, then x tends to be normally distributed.
whose relationship is unknown are sometimes called naïve Bayesian models. Again, an assump-
tion of independence does not reflect an affirmative empirical assertion about the real-world rela-
tionship between the parameters, but rather is an expression of ignorance about their relationship.
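The two assumptions just described, a maximum-entropy Gaussian prior for a single parameter and independence between unrelated parameters, can be made concrete in a few lines. The following sketch (function names are mine, for illustration only) builds a naïve joint prior as the product of two Gaussian marginals:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of the maximum-entropy prior with mean mu and variance sigma**2."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def naive_joint_prior(x, y, prior_x, prior_y):
    """Naive (independence) assumption: the joint prior factorizes into marginals."""
    return prior_x(x) * prior_y(y)

# Two parameters with no known relationship, each with its own Gaussian prior.
p_x = lambda x: gaussian_pdf(x, mu=0.0, sigma=1.0)
p_y = lambda y: gaussian_pdf(y, mu=5.0, sigma=2.0)

joint = naive_joint_prior(1.0, 4.0, p_x, p_y)
```

Under this prior, knowing x conveys nothing about y: conditioning on either variable leaves the other’s marginal unchanged, which is exactly the expression of ignorance described above.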
In the context of perception, there are several ways to think of the source of the prior. Of course,
perceptual data arrive in a continuous stream from the moment of birth (or before). So in one sense
the prior represents belief prior to experience—that is, the innate knowledge about the environ-
ment with which evolution has endowed our brains. But in another sense it simply represents belief
prior to a given perceptual act, in which case it must also reflect the updated beliefs stemming from
learning over the course of life. Of course, there is a long history of controversy about the magni-
tude and specificity of innate knowledge (Elman et al. 1996; Carruthers et al. 2005). Bayesian the-
ory does not intrinsically take a position on this issue, easily accommodating either very broad or
uninformative ‘blank slate’ priors, more narrowly tuned ‘nativist’ priors representing more specific
knowledge about the environment, or anything in between. In any case because adult perceivers
benefit from both innate knowledge and experience, priors estimated by experimental techniques
(e.g. Girshick et al. 2011) must be assumed to reflect both evolution and learning in combination.

Computing the Posterior


In simple situations, it is sometimes possible to derive explicit formulae for the posterior distribu-
tion. For example, normal (Gaussian) priors and likelihoods lead to normal posteriors, allowing
for easy computation. (Priors and posteriors in the same model family are called conjugate.) But in
many realistic situations the priors and likelihoods give rise to an unwieldy posterior that cannot be
expressed analytically. Much of the modern Bayesian literature is devoted to developing techniques
to approximate the posterior in such situations. These include expectation maximization (EM),
Markov chain Monte Carlo (MCMC), and Bayesian belief networks (Pearl 1988), each appropriate
in somewhat different situations. (See Griffiths and Yuille (2006) for a brief introduction to these
techniques, or Hastie et al. (2001) or Lee (2004) for more in-depth treatments.) However it should
be kept in mind that all these techniques share a common core principle, the determination of the
posterior belief based on Bayes’ rule.
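For the conjugate Gaussian case mentioned above, the closed-form update can be written out directly. The sketch below (my own function name and toy numbers, not from the chapter) combines a Gaussian prior with several independent Gaussian-noise observations:

```python
def gaussian_posterior(mu0, var0, data, noise_var):
    """Conjugate update: Gaussian prior x Gaussian likelihood -> Gaussian posterior.
    Returns the posterior mean and variance in closed form."""
    n = len(data)
    xbar = sum(data) / n
    post_precision = 1.0 / var0 + n / noise_var   # precisions add
    post_var = 1.0 / post_precision
    post_mean = post_var * (mu0 / var0 + n * xbar / noise_var)
    return post_mean, post_var

# Broad prior around 0; three noisy measurements near 2.
mean, var = gaussian_posterior(mu0=0.0, var0=10.0, data=[1.9, 2.1, 2.0], noise_var=0.5)
```

With a broad prior and three consistent measurements, the posterior mean lands close to the data mean and the posterior variance shrinks well below the prior variance, all without any approximation machinery.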

Simplicity and likelihood from a Bayesian perspective


The likelihood principle in perceptual theory is the idea that the brain aims to select the hypothesis
that is most likely to be true in the world.7 Recently Bayesian inference has been held up as the
ultimate realization of this principle (Gregory 2006). Historically, the likelihood principle has
been contrasted with the simplicity or minimum principle, which holds that the brain will select the
simplest hypothesis consistent with sense data (Hochberg and McAlister 1953; Leeuwenberg and
Boselie 1988). Simplicity too can be defined in a variety of ways, which has led to an inconclusive
debate in which examples purporting to illustrate the preference for simplicity over likelihood, or
vice versa, could be dissected without clear resolution (Hatfield and Epstein 1985; Perkins 1976).

7  This should not be confused with what statisticians call the likelihood principle, a completely different idea.
The statistical likelihood principle asserts that the data should influence our belief in a hypothesis only via
the probability of those data conditioned on the hypothesis (i.e. the likelihood). This principle is universally
accepted by Bayesians; indeed the likelihood is the only term in Bayes’ rule that involves the data. But it is
violated by classical statistics, where, for example, the significance of a finding depends in part on the prob-
ability of data that did not actually occur in the experiment. For example, when one integrates the tail of a
sampling distribution, one is adding up the probability of many events that did not actually occur.
Bayesian Models of Perceptual Organization

More recently, Chater (1996) has argued that simplicity and likelihood are two sides of the same
coin, for several reasons that stem from Bayesian arguments. First, basic considerations from infor-
mation theory suggest that more likely propositions are automatically simpler in that they can be
expressed in more compact codes. Specifically, Shannon (1948) showed that an optimal code—
meaning one that has minimum expected code length—should express each proposition A in a
code of length proportional to the negative log probability of A, i.e. −log p(A). This quantity is some-
times referred to as the surprisal, because it quantifies how ‘surprising’ the message is (larger values
indicate less probable outcomes), or as the description length (DL), because it also quantifies how
many symbols it occupies in an optimal code (longer codes for more unusual messages). Just as in
Morse code (or for that matter approximately in English) more frequently used concepts should
be assigned shorter expressions, so that the total length of expressions is minimized on average.
Because the proposition with maximum posterior probability (the MAP) also has minimum nega-
tive log posterior probability, the MAP hypothesis is also the minimum DL (MDL) hypothesis. More
specifically, while in Bayesian inference the MAP hypothesis is the one that maximizes the product of
the prior and the likelihood p(H)p(D|H), in MDL the winning hypothesis is the one that minimizes
the sum of the DL of the model plus the DL of the data as encoded via the model [−log p(H)  −  log
p(D|H), a sum of logs having replaced a product]. In this sense the simplest interpretation is neces-
sarily also the most probable—though it must be kept in mind that this easy identification rests on
the perhaps tenuous assumption that the underlying coding language is optimal.
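The MAP/MDL equivalence is easy to verify numerically, since the logarithm is monotonic. In the toy example below (hypothesis labels and probabilities are invented), the hypothesis maximizing p(H)p(D|H) is also the one minimizing −log p(H) − log p(D|H):

```python
import math

priors = {"H1": 0.7, "H2": 0.2, "H3": 0.1}        # p(H)
likelihoods = {"H1": 0.05, "H2": 0.6, "H3": 0.3}  # p(D | H)

# MAP: maximize the product of prior and likelihood.
map_hyp = max(priors, key=lambda h: priors[h] * likelihoods[h])

# MDL: minimize the description length of the model plus the data given the model.
mdl_hyp = min(priors, key=lambda h: -math.log(priors[h]) - math.log(likelihoods[h]))

assert map_hyp == mdl_hyp  # maximizing a product == minimizing the summed code lengths
```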
More broadly, Bayesian inference tends to favour simple hypotheses even without any assump-
tions about the optimality of the coding language.8 This tendency, sometimes called ‘Bayes Occam’
(after Occam’s razor, a traditional term for the preference for simplicity), reflects fundamental
considerations about the way prior probability is distributed over hypotheses (see MacKay 2003).
Assuming that the hypotheses Hᵢ are mutually exclusive, then their total prior necessarily equals
one (∑ᵢ p(Hᵢ) = 1), meaning simply that the observer believes that one of them must be correct.
This in turn means that models with more parameters must distribute the same total prior over
a larger set of specific models (combinations of parameter settings) inevitably requiring each
model (on average) to be assigned a smaller prior. That is, more highly parametrized models—
models that can express a wider variety of states of nature—necessarily assign lower priors to each
individual hypothesis. Hence in this sense Bayesian inference automatically assigns lower priors
to more complex models and higher priors to simple ones, thus enforcing a simplicity metric
without any mechanisms designed especially for the purpose. This is really an instance of the
ubiquitous bias–variance tradeoff, that is, the tradeoff between the fit to the data (which benefits
from more complex hypotheses) and generalization to future data (which is impaired by more
complex hypotheses; see Hastie et al. 2001). Bayesians argue that Bayes’ rule provides an ideal
solution to this dilemma because it determines the optimal combination of data fit (reflected in
the likelihood) and bias (reflected in the prior).
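The bookkeeping behind ‘Bayes Occam’ is simple arithmetic. In the sketch below (the class sizes and the 50/50 split are arbitrary), a simple model class and a complex one receive equal total prior mass, so each specific hypothesis in the complex class necessarily gets a smaller share:

```python
def per_hypothesis_prior(class_prior, n_settings):
    """Spread a model class's prior mass uniformly over its parameter settings."""
    return class_prior / n_settings

simple_each = per_hypothesis_prior(0.5, n_settings=2)     # few parameter settings
complex_each = per_hypothesis_prior(0.5, n_settings=100)  # many parameter settings

assert simple_each > complex_each  # simpler hypotheses receive higher priors
```

The total prior still sums to one (2 × 0.25 + 100 × 0.005 = 1), so no explicit simplicity penalty was added; the preference for the simpler class falls out of the normalization alone.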
Indeed the link between probability and complexity is fundamental to information theory, and
also leads to an alternative ‘subjectivist’ method for constructing priors. Kolmogorov (1965) and
Chaitin (1966) introduced a universal measure of complexity (now usually called Kolmogorov
complexity) which in a technical sense is invariant to differences in the language used to express
messages (see Li and Vitányi 1997). This means that just as DL can be thought of as −log p(H),
p(H) can be defined as (proportional to) 2−K(H) where K(H) is the Kolmogorov complexity of the
hypothesis H (see Cover and Thomas 1991). Solomonoff (1964) first observed that this defines a

8  ‘The simplest law is chosen because it is most likely to give correct predictions’ (Jeffreys 1939/1961, p. 4).

‘universal prior’, assigning high priors to simple hypotheses and low priors to complex ones in a
way that is internally consistent and invariant to coding language—another way in which simplic-
ity and Bayesian inference are intertwined (see Chater 1996).
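Since K(H) is uncomputable, any runnable illustration has to substitute concrete code lengths. The sketch below (the hypothesis names and bit counts are arbitrary stand-ins for K(H)) normalizes weights of the form 2^−DL into a prior that favours simpler hypotheses:

```python
def complexity_prior(code_lengths_bits):
    """Prior proportional to 2**(-DL); DL here is a stand-in for the
    (uncomputable) Kolmogorov complexity K(H) of each hypothesis."""
    weights = {h: 2.0 ** -dl for h, dl in code_lengths_bits.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

prior = complexity_prior({"simple": 3, "medium": 6, "complex": 12})
```

Every extra bit of description length halves a hypothesis’s unnormalized weight, so the resulting prior is internally consistent in exactly the sense the universal prior requires.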
Though the close relationship between simplicity and Bayesian inference is widely recognized,
the exact nature of the relationship is more controversial. Bayesians regard the calculation of the
Bayesian posterior as fundamental, and the simplicity principle as merely a heuristic whose value
derives from its correspondence to Bayes’ rule. The originators of MDL and information-theoretic
statistics (e.g. Akaike 1974; Rissanen 1978; Wallace 2004) take the opposite view, regarding the
minimization of complexity (DL or related measures) as the more fundamental principle and
dismissing as naïve some of the assumptions underlying Bayesian inference (see Burnham and
Anderson 2002; Grünwald 2005). This debate roughly parallels the controversy in the perception
literature over simplicity and likelihood (see Feldman 2009; van der Helm 2013).

Decision Making and Loss Functions


Bayes’ rule dictates how belief should be distributed among hypotheses. But a full account of
Bayesian decision making requires that we also quantify the consequences of each potential deci-
sion, usually called the loss function (or utility function or payoff matrix). For example, misclas-
sifying heartburn as a heart attack costs money in wasted medical procedures, but misclassifying a
heart attack as heartburn may cost the patient his or her life. Hence the posterior belief in the two
hypotheses (heart attack or heartburn) is not sufficient by itself to make a rational decision: one
must also take into account the cost (loss) of each outcome, including both ways of misclassifying
the symptoms as well as both ways of classifying them correctly. More broadly, each combination
of an action and a state of nature entails a particular cost, usually thought of as being given by
the nature of the problem. Bayesian decision theory dictates that the agent select the action that
minimizes the (expected) loss—that is, the outcome which (according to the best estimate, the
posterior) maximizes the benefit to the agent.
Different loss functions entail different rational choices of action. For example, if all incorrect
responses are equally penalized, and correct responses not penalized at all (called zero–one loss) then
the MAP is the rational choice, because it is the one most likely to avoid the penalty. (This is presuma-
bly the basis of the canard that Bayesian theory requires selection of the maximum posterior hypoth-
esis, which is correct only for zero–one loss, and generally incorrect otherwise.) Other loss functions
entail other minimum-loss decisions: for example under some circumstances quadratic loss (e.g. loss
proportional to squared error) is minimized at the posterior mean (rather than the mode, which is
the MAP), and other loss functions are minimized at the posterior median (Lee 2004).
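These claims can be checked on a small discrete posterior. In the sketch below (the numbers are invented, and chosen so that the mode, mean, and median disagree), the minimum-expected-loss action differs across loss functions:

```python
# A skewed posterior over a discrete parameter; candidate actions are the same five values.
posterior = {1: 0.1, 2: 0.5, 3: 0.2, 4: 0.1, 5: 0.1}

def expected_loss(action, loss):
    return sum(p * loss(action, value) for value, p in posterior.items())

def best_action(loss):
    return min(posterior, key=lambda a: expected_loss(a, loss))

zero_one = best_action(lambda a, v: 0.0 if a == v else 1.0)  # picks the mode (MAP)
quadratic = best_action(lambda a, v: (a - v) ** 2)           # pulled toward the mean (2.6)
absolute = best_action(lambda a, v: abs(a - v))              # picks the median
```

Here zero–one loss selects 2 (the MAP), quadratic loss selects 3 (the candidate nearest the posterior mean of 2.6), and absolute loss selects 2 (the median), so the ‘rational’ response genuinely depends on the assumed loss function.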
Bayesian models of perception have primarily focused on simple estimation without considera-
tion of the loss function, but this is undesirable for several reasons (Maloney 2002). First, percep-
tion in the context of real behaviour subserves action, and for this reason in the last few decades
the perception literature has evolved towards an increasing tendency to study perception and
action in conjunction. Second, more subtly, it is essential to incorporate a loss function in order to
understand how experimental data speak to Bayesian models. Subjects’ responses are not, after all,
pure expressions of posterior belief, but rather are choices that reflect both belief and the expected
consequences of actions. For example, in experiments, subjects implicitly or explicitly develop
expectations about the relative cost of right and wrong answers, which help guide their actions.
Hence in interpreting response data we need to consider both the subjects’ posterior belief and
their perceptions of payoff. Most experimental data offered in support of Bayesian models actu-
ally show probability matching behaviour, that is, responses drawn in proportion to their posterior

probability, referred to by Bayesians as sampling from the posterior. Again, only zero–one loss
would require rational subjects to choose the MAP response on every trial, so probability match-
ing generally rules out zero–one loss (but obviously does not rule out Bayesian models more
generally). The choice of loss functions in real situations probably depend on details of the task,
and remains a subject of research.
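The contrast between MAP responding and probability matching can be sketched directly (the posterior values and random seed below are arbitrary):

```python
import random

posterior = {"A": 0.6, "B": 0.3, "C": 0.1}

def map_response(post):
    """Zero-one-loss responder: always the maximum-posterior hypothesis."""
    return max(post, key=post.get)

def matching_response(post, rng):
    """Probability matcher: sample a response in proportion to posterior belief."""
    return rng.choices(list(post), weights=list(post.values()))[0]

rng = random.Random(0)
trials = [matching_response(posterior, rng) for _ in range(10_000)]
share_a = trials.count("A") / len(trials)  # near 0.6, not 1.0
```

A MAP responder answers ‘A’ on every trial; the matcher answers ‘A’ on roughly 60 per cent of trials, which is the signature pattern in much of the experimental data described above.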
Loss functions in naturalistic behavioural situations can be arbitrarily complex, and it is not
generally understood either how they are apprehended or how human decision making takes
them into account. Trommershauser et al. (2003) explored this problem by imposing a moderately
complex loss function on their subjects in a simple motor task; they asked their subjects to touch a
target on a screen that was surrounded by several different penalty zones structured so that misses
in one direction cost more than misses in the other direction. Their subjects were surprisingly
adept at modulating their taps so that expected loss (penalty) was minimized, implying a detailed
knowledge of the noise in their own arm motions and a quick apprehension of the geometry of
the imposed utility function (see also Trommershauser et al. 2008).
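A one-dimensional caricature of this kind of task shows the logic: with asymmetric penalties flanking a target zone and Gaussian motor noise, the expected-loss-minimizing aim point shifts away from the costly side. All numbers below are illustrative and are not Trommershauser et al.’s actual stimulus values:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_penalty(aim, sigma=0.5, left_cost=5.0, right_cost=1.0):
    """Landing point = aim + Gaussian motor noise with std sigma; the target
    zone [-1, 1] costs nothing, missing left costs left_cost, missing right
    costs right_cost."""
    p_left = norm_cdf((-1.0 - aim) / sigma)
    p_right = 1.0 - norm_cdf((1.0 - aim) / sigma)
    return left_cost * p_left + right_cost * p_right

# Grid search for the aim point with minimum expected penalty.
aims = [i / 100.0 for i in range(-100, 101)]
best_aim = min(aims, key=expected_penalty)  # shifted right, away from the 5x penalty
```

Minimizing expected loss requires knowing one’s own motor noise (sigma) as well as the penalty geometry, which is exactly the knowledge the subjects’ behaviour implied.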

Where do the Hypotheses Come From?


Another fundamental problem for Bayesian inference is the source of the hypotheses. Bayesian
theory provides a method for quantifying belief in each hypothesis, but it does not provide the
hypothesis set H, nor any principled way to generate it. Traditional Bayesians are generally content to assume that some member of H lies sufficiently ‘close’ to the truth, meaning that it
approximates reality within some acceptable margin of error. Such assumptions are occasionally
criticized as naïve (Burnham and Anderson 2002).
But the application of Bayesian theory to problems in perception and cognition elevates this
issue to a more central epistemological concern. Intuitively, we assume that the real world has a
definite state which perception either does or does not reflect. If, however, the hypothesis set H
does not actually contain the truth—and Bayesian theory provides no reason to believe it does—
then it may turn out that none of our perceptual beliefs is literally true, because the true
hypothesis was never under consideration (cf. Hoffman 2009; Hoffman and Singh 2012). In this
sense, the perceived world might be both a rational belief (in that the assignment of posterior
belief follows Bayes’ rule) and, in a very concrete sense, a grand hallucination (because none of
the resulting beliefs are true).
Thus while Bayesian theory provides an optimal method for using all information available
to determine belief, it is not magic; the validity of its conclusions is limited by the validity of its
premises. Indeed this point is well understood by Bayesians, who often argue that all inference
is based on assumptions (see Jaynes 2003; MacKay 2003). (This is in contrast to frequentists,
who aspired to a science of inference free of subjective assumptions.) But it gains special signifi-
cance in the context of perception, because perceptual beliefs are the very fabric of subjective
reality.

Competence Versus Performance


Bayesian inference is a rational, idealized mathematical framework for determining perceptual
beliefs, based on the sense data presented to the system coupled with whatever prior knowledge
the system brings to bear. But it does not, in and of itself, specify computational mechanisms for
actually calculating those beliefs. That is, Bayesian inference quantifies exactly how strongly the
system should believe each hypothesis, but does not provide any specific mechanisms whereby

the system might arrive at those beliefs. In this sense, Bayesian inference is a competence theory
(Chomsky’s term) or a theory of the computation (Marr’s term), meaning it is an abstract specifica-
tion of the function to be computed rather than the means to compute it. Many theorists, concur-
ring with Marr and Chomsky, argue that competence theories play a necessary role in cognitive
theory, parallel to but distinct from that of process accounts. Competence theories by their nature
abstract away from details of implementation and help connect the computations that experi-
ments uncover with the underlying problem those computations help solve. Conversely, some
psychologists denigrate competence theories as abstractions that are irrelevant to real psychologi-
cal processes (Rumelhart et al. 1986), and indeed Bayesian models have been criticized on these
grounds (McClelland et al. 2010; Jones and Love 2011).
But to those sympathetic to competence accounts, rational models have an appealingly ‘explan-
atory’ quality precisely because of their optimality. Bayesian inference is, in a well-defined sense,
the best way to solve whatever decision problem the brain is faced with. Natural selection pushes
organisms to adopt the most effective solutions available, so evolution should tend to favour
Bayes-optimal solutions whenever possible (see Geisler and Diehl 2002). For this reason, any
phenomenon that can be understood as part of a Bayesian model automatically inherits an evo-
lutionary rationale.

Conclusions
In a sense, perception and Bayesian inference are perfectly matched. Perception is the process by
which the mind forms beliefs about the outside world on the basis of sense data combined with
prior knowledge. Bayesian inference is a system for determining what to believe on the basis of
data and prior knowledge. Moreover, the rationality of Bayesian inference means that perceptual
beliefs that follow the Bayesian posterior are, in a well-defined sense, optimal given the infor-
mation available. This optimality has been argued to provide a selective advantage in evolution
(Geisler and Diehl 2002), driving our ancestors towards Bayes-optimal percepts. Moreover opti-
mality helps explain why the perceptual system, notwithstanding its many apparent quirks and
special rules, works the way it does—because these rules approximate the Bayesian posterior.
Moreover, the comprehensive nature of the Bayesian framework allows it to be applied to any
problem that can be expressed probabilistically. All these advantages have led to a tremendous
increase in interest in Bayesian accounts of perception in the last decade.
Still, a number of reservations and difficulties must be noted. First, to some researchers a
commitment to a Bayesian framework seems to involve a dubious assumption that the brain is
rational. Many psychologists regard the perceptual system as a hodge-podge of hacks, dictated
by accidents of evolutionary history and constrained by the exigencies of neural hardware. While
to its advocates the rationality of Bayesian inference is one of its main attractions, to sceptics the
hypothesis of rationality inherent in the Bayesian framework seems at best empirically implausible and at worst naïve.
Second, more specifically, the essential role of the prior poses a puzzle in the context of percep-
tion, where the role of prior knowledge and expectations (traditionally called ‘top-down’ influ-
ences) has been debated for decades. Indeed there is a great deal of evidence (see Pylyshyn 1999)
that perception is singularly uninfluenced by certain kinds of knowledge, which at the very least
suggests that the Bayesian model must be limited in scope to an encapsulated perception module
walled off from information that an all-embracing Bayesian account would deem relevant.
Finally, many researchers wonder if the Bayesian framework is too flexible to be taken seriously,
potentially encompassing any conceivable empirical finding. However, while Bayesian accounts

are indeed quite adaptable, any specific set of assumptions about priors, likelihoods, and loss
functions provides a wealth of extremely specific empirical predictions, which in many specific
perceptual domains have been validated experimentally.
Hence notwithstanding all of these concerns, to its proponents Bayesian inference provides
something that perceptual theory has never really had before: a ‘paradigm’ in the sense of Kuhn
(1962)—that is, an integrated, systematic, and mathematically coherent framework in which
to pose basic scientific questions and evaluate potential answers. Whether or not the Bayesian
approach turns out to be as comprehensive or empirically successful as its advocates hope, this
represents a huge step forward in the study of perception.

Acknowledgments
I am grateful to Lee de-Wit, Vicky Froyen, Manish Singh, Johan Wagemans, and an anonymous
reviewer for helpful comments. Presentation of this article was supported by NIH EY0211494.
Please correspond directly with the author at jacob@ruccs.rutgers.edu.

References
Akaike, H. (1974). ‘A new look at the statistical model identification’. IEEE Trans Automat Contr
19(6): 716–723.
Bayes, T. (1763). ‘An essay towards solving a problem in the doctrine of chances’. Phil Trans R Soc. Lond
53: 370–418.
Bülthoff, H. H. and A. L. Yuille (1991). ‘Bayesian models for seeing shapes and depth’. Comm Theor Biol
2(4): 283–314.
Burge, J., C. C. Fowlkes, and M. S. Banks (2010). ‘Natural-scene statistics predict how the figure-ground
cue of convexity affects human depth perception’. J Neurosci 30(21): 7269–7280.
Burnham, K. P. and D. R. Anderson (2002). Model Selection and Multi-model Inference: a Practical
Information-theoretic Approach (New York: Springer).
Carruthers, P., S. Laurence, and S. Stich (2005). The Innate Mind: Structure and Contents (Oxford: Oxford
University Press).
Chaitin, G. (1966). ‘On the length of programs for computing finite binary sequences’. J Assoc Comput
Machin 13(4): 547–569.
Chater, N. (1996). ‘Reconciling simplicity and likelihood principles in perceptual organization’. Psychol Rev
103(3): 566–581.
Claessens, P. M. E. and J. Wagemans (2008). ‘A Bayesian framework for cue integration in multistable
grouping: proximity, collinearity, and orientation priors in zigzag lattices’. J Vision 8(7): 1–23.
Cohen, E. H., M. Singh, and L. T. Maloney (2008). ‘Perceptual segmentation and the perceived orientation
of dot clusters: the role of robust statistics’. J Vision 8(7): 1–13.
Compton, B. J. and G. D. Logan (1993). ‘Evaluating a computational model of perceptual grouping by
proximity’. Percept Psychophys 53(4): 403–421.
Cover, T. M. and J. A. Thomas (1991). Elements of Information Theory (New York: John Wiley).
Cox, R. T. (1961). The Algebra of Probable Inference (Oxford: Oxford University Press).
Dakin, S. (2013). ‘Statistical regularities’. In Handbook of Perceptual Organization, edited by J. Wagemans.
(This volume, forthcoming.)
de Finetti, B. (1970/1974). Teoria delle Probabilita 1 (Turin: Giulio Einaudi). [Translated by A. Machi and
A. Smith, 1990 as Theory of Probability 1 (Chichester: John Wiley and Sons).]
De Winter, J. and J. Wagemans (2006). ‘Segmentation of object outlines into parts: a large-scale integrative
study’. Cognition, 99(3): 275–325.

Earman, J. (1992). Bayes or Bust?: a Critical Examination of Bayesian Confirmation Theory (Cambridge,
MA: MIT Press).
Elder, J. (2013). ‘Contour grouping’. In Handbook of Perceptual Organization, edited by J. Wagemans. (This
volume, forthcoming.)
Elman, J., A. Karmiloff-Smith, E. Bates, M. Johnson, D. Parisi, and K. Plunkett (1996). Rethinking
Innateness: a Connectionist Perspective on Development (Cambridge, MA: MIT Press).
Feldman, J. (1997). ‘Curvilinearity, covariance, and regularity in perceptual groups’. Vision Res
37(20): 2835–2848.
Feldman, J. (2001). ‘Bayesian contour integration’. Percept Psychophys 63(7): 1171–1182.
Feldman, J. (2009). ‘Bayes and the simplicity principle in perception’. Psychol Rev 116(4): 875–887.
Feldman, J. (2013). ‘Tuning your priors to the world’. Top Cogn Sci 5(1): 13–34.
Feldman, J. and M. Singh (2006). ‘Bayesian estimation of the shape skeleton’. Proc Natl Acad Sci USA
103(47): 18014–18019.
Feldman, J., Singh, M., and Froyen, V. (2013). ‘Perceptual grouping as Bayesian mixture estimation’. In
Oxford Handbook of Computational Perceptual Organization edited by Gepshtein, Maloney and Singh,
forthcoming.
Fisher, R. (1925). Statistical Methods for Research Workers (Edinburgh: Oliver and Boyd).
Geisler, W. S. and R. L. Diehl (2002). ‘Bayesian natural selection and the evolution of perceptual systems’.
Phil Trans R Soc Lond B 357: 419–448.
Geisler, W. S., J. S. Perry, B. J. Super, and D. P. Gallogly (2001). ‘Edge co-occurrence in natural images
predicts contour grouping performance’. Vision Res 41: 711–724.
Girshick, A. R., M. S. Landy, and E. P. Simoncelli (2011). ‘Cardinal rules: visual orientation perception
reflects knowledge of environmental statistics’. Nat Neurosci 14(7): 926–932.
Gregory, R. (2006). ‘Editorial essay’. Perception 35: 143–144.
Griffiths, T. L. and A. L. Yuille (2006). ‘A primer on probabilistic inference’. Trends Cogn Sci 10(7).
Supplement to special issue on Probabilistic Models of Cognition. Available at:
<http://www.stat.ucla.edu/~yuille/pubs/ucla/A204_tgriffiths_chater2007.pdf>
Grünwald, P. D. (2005). ‘A tutorial introduction to the minimum description length principle’. In Advances
in Minimum Description Length: Theory and Applications, edited by P. D. Grünwald, I. J. Myung, and
M. Pitt (Cambridge, MA: MIT Press).
Hastie, T., R. Tibshirani, and J. Friedman (2001). The Elements of Statistical Learning: Data Mining,
Inference, and Prediction (New York: Springer).
Hatfield, G. and W. Epstein (1985). ‘The status of the minimum principle in the theoretical analysis of
visual perception’. Psychol Bull 97(2): 155–186.
Hochberg, J. and E. McAlister (1953). ‘A quantitative approach to figural “goodness” ’. J Exp Psychol
46: 361–364.
Hoffman, D. D. (2009). ‘The user-interface theory of perception: natural selection drives true perception to
swift extinction’. In Object Categorization: Computer and Human Vision Perspectives, edited by
S. Dickinson, M. Tarr, A. Leonardis, and B. Schiele (Cambridge: Cambridge University Press).
Hoffman, D. D. and M. Singh (2012). ‘Computational evolutionary perception’. Perception 41: 1073–1091.
Howie, D. (2004). Interpreting Probability: Controversies and Developments in the Early Twentieth Century
(Cambridge: Cambridge University Press).
Jaynes, E. T. (1982). ‘On the rationale of maximum-entropy methods’. Proc IEEE 70(9): 939–952.
Jaynes, E. T. (2003). Probability Theory: the Logic of Science (Cambridge: Cambridge University Press).
Jeffreys, H. (1939/1961). Theory of Probability, 3rd edn (Oxford: Clarendon Press).
Jones, M. and B. C. Love (2011). ‘Bayesian fundamentalism or enlightenment? On the explanatory status
and theoretical contributions of Bayesian models of cognition’. Behav Brain Sci 34: 169–188.

Juni, M. Z., M. Singh, and L. T. Maloney (2010). ‘Robust visual estimation as source separation’. J Vision
10(14): 2; doi: 10.1167/10.14.2.
Kersten, D., P. Mamassian, and A. Yuille (2004). ‘Object perception as Bayesian inference’. Ann Rev Psychol
55: 271–304.
Knill, D. C. and W. Richards (eds) (1996). Perception as Bayesian Inference (Cambridge: Cambridge
University Press).
Kolmogorov, A. N. (1965). ‘Three approaches to the quantitative definition of information’. Prob Inform
Transmission 1(1): 1–7.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions (Chicago: University of Chicago Press).
Lee, M. D. and E.-J. Wagenmakers (2005). ‘Bayesian statistical inference in psychology: comment on
Trafimow (2003)’. Psychol Rev 112(3): 662–668.
Lee, P. (2004). Bayesian Statistics: an Introduction, 3rd edn (Chichester: Wiley).
Leeuwenberg, E. L. J. and F. Boselie (1988). ‘Against the likelihood principle in visual form perception’.
Psychol Rev 95: 485–491.
Li, M. and P. Vitányi (1997). An Introduction to Kolmogorov Complexity and its Applications
(New York: Springer).
McClelland, J. L., M. M. Botvinick, D. C. Noelle, D. C. Plaut, T. T. Rogers, M. S. Seidenberg, et al. (2010).
‘Letting structure emerge: connectionist and dynamical systems approaches to understanding cognition’.
Trends Cogn Sci 14: 348–356.
MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms (Cambridge: Cambridge
University Press).
Maloney, L. T. (2002). ‘Statistical decision theory and biological vision’. In Perception and the Physical
World: Psychological and Philosophical Issues in Perception, edited by D. Heyer and R. Mausfeld,
pp. 145–189 (New York: Wiley).
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (San Mateo,
CA: Morgan Kauffman).
Perkins, D. (1976). ‘How good a bet is good form?’ Perception 5: 393–406.
Pylyshyn, Z. (1999). ‘Is vision continuous with cognition? The case for cognitive impenetrability of visual
perception’. Behav Brain Sci 22(3): 341–365.
Ramachandran, V. S. (1985). ‘The neurobiology of perception’. Perception 14: 97–103.
Rissanen, J. (1978). ‘Modeling by shortest data description’. Automatica 14: 465–471.
Rumelhart, D. E., J. L. McClelland, and G. E. Hinton (1986). Parallel Distributed Processing: Explorations in
the Microstructure of Cognition (Cambridge, MA: MIT Press).
Shannon, C. (1948). ‘A mathematical theory of communication’. Bell Syst Tech J 27: 379–423.
Singh, M. (2013). ‘Visual representation of contour geometry’. In Handbook of Perceptual Organization,
edited by J. Wagemans. (This volume, forthcoming.)
Singh, M. and J. M. Fulvio (2005). ‘Visual extrapolation of contour geometry’. Proc Natl Acad Sci USA
102(3): 939–944.
Singh, M. and D. D. Hoffman (2001). ‘Part-based representations of visual shape and implications for visual
cognition’. In From Fragments to Objects: Segmentation and Grouping in Vision, Advances in Psychology
Vol. 130, edited by T. Shipley and P. Kellman, pp. 401–459 (New York: Elsevier).
Solomonoff, R. (1964). ‘A formal theory of inductive inference: part II’. Inform Control 7: 224–254.
Stigler, S. M. (1983). ‘Who discovered Bayes’s theorem?’ Am Statistician 37(4): 290–296.
Stigler S. M. (1986). The History of Statistics: the Measurement of Uncertainty Before 1900 (Cambridge,
MA: Harvard University Press).
Trommershauser, J., L. T. Maloney, and M. S. Landy (2003). ‘Statistical decision theory and the selection of
rapid, goal-directed movements’. J Opt Soc Am A: Opt Image Sci Vis 20(7): 1419–1433.

Trommershauser, J., L. T. Maloney, and M. S. Landy (2008). ‘Decision making, movement planning and
statistical decision theory’. Trends Cogn Sci 12(8): 291–297.
van der Helm, P. (2013). ‘Simplicity in perceptual organization’. In Handbook of Perceptual Organization,
edited by J. Wagemans. (This volume, forthcoming.)
Wallace, C. S. (2004). Statistical and Inductive Inference by Minimum Message Length (New York: Springer).
Weiss, Y., E. P. Simoncelli, and E. H. Adelson (2002). ‘Motion illusions as optimal percepts’. Nat Neurosci
5(6): 598–604.
Wilder, J., J. Feldman, and M. Singh (2011). ‘Superordinate shape classification using natural shape
statistics’. Cognition 119: 325–340.
Zucker, S. W., K. A. Stevens, and P. Sander (1983). ‘The relation between proximity and brightness
similarity in dot patterns’. Percept Psychophys 34(6): 513–522.
Chapter 50

Simplicity in perceptual organization


Peter A. van der Helm

1 Introduction
Perceptual organization is the neuro-cognitive process that takes the light in our eyes as input
and that enables us to interpret scenes as structured wholes consisting of objects arranged in
space—wholes which, moreover, usually are sufficiently veridical to guide action. This auto-
matic process may seem to occur effortlessly, but by all accounts, it must be very complex and
yet very flexible. To organize meaningless patches of light into meaningfully structured wholes
within (literally) the blink of an eye, it must combine a high combinatorial capacity with a
high speed (notice that a recognition model that tests previously stored templates against the
visual input might avoid the combinatorics but would not achieve the required speed). To give
a gist (following Gray 1999, but many others have argued similarly), multiple sets of features
at multiple, sometimes overlapping, locations in a stimulus must be grouped simultaneously.
This implies that the process must cope with a large number of possible combinations in paral-
lel, which also suggests that these possible combinations are engaged in a stimulus-dependent
competition between grouping criteria. Hence, the combinatorial capacity of the perceptual
organization process must be very high. This, together with its high speed (it completes in
the range of 100–300 ms), reveals the truly impressive nature of the perceptual organization
process.
One of the great mysteries of perception is how the human visual system manages to do all this.
An intriguing idea in this context is that, from among all possible interpretations of a stimulus,
the visual system selects the one defined by a minimum number of parameters. This simplicity
principle has gained empirical support but is also controversial. Indeed, simplicity is obviously an
appealing property in many settings, but can it be the guiding principle of the intricate process
sketched above? To review this idea, this chapter focuses on underlying theoretical issues which
may be introduced by way of a brief history of this principle.

2  A brief history of simplicity


An early predecessor of the simplicity principle is what became known as Occam’s razor. Its
origins can be traced back to Aristotle (384–322 BC), and it entails the advice—expressed in
various forms by William of Occam (±1290–1349)—to keep theories and models as simple as
possible, that is, to not make them more complex than needed to account for the available data.
The underlying idea is that, all else being equal, the simplest of all possible interpretations of data
is the best one. A modern version of Occam’s razor is Rissanen’s (1978) minimum description
length principle (MDL principle) in the mathematical domain of algorithmic information the-
ory (AIT, a.k.a. the theory of Kolmogorov complexity; Li and Vitányi 1997). The MDL principle
applies to model selection and, more generally, to inductive inference (Solomonoff 1964a, 1964b).
It proposes a trade-off between the complexity of hypotheses as such and their explanatory
power, as follows:
The best hypothesis to explain given data is the one that minimizes the sum of
(a)  the information needed to describe the hypothesis; and
(b)  the information needed to describe the data with the help of the hypothesis.
For instance, in physics, Einstein’s theory as such is more complex than that of Newton, but
because it explains much more data, it is nevertheless considered to be better. Applied to percep-
tual organization, the two amounts of information above can be taken to refer to, respectively, the
view-independent complexity of hypothesized distal stimuli as such and their view-dependent
degree of consistency with the proximal stimulus at hand. The MDL principle then suggests that,
in the absence of further knowledge, the best interpretation of a stimulus is the one that mini-
mizes the sum of these two amounts of information.
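In computational terms, this trade-off amounts to minimizing a two-part code length. A minimal sketch follows; the hypotheses and bit costs are hypothetical, chosen only to illustrate the selection rule, not drawn from any model in the literature:

```python
def mdl_select(hypotheses, prior_complexity, conditional_complexity):
    """Two-part MDL selection: choose the hypothesis H minimizing
    C(H) + C(D|H), i.e. the bits needed to describe H plus the bits
    needed to describe the data D with the help of H."""
    return min(hypotheses, key=lambda h: prior_complexity[h] + conditional_complexity[h])

# Hypothetical bit costs for three candidate interpretations of the same data:
# more complex hypotheses leave fewer residual data bits to encode.
prior = {'H_simple': 2, 'H_medium': 6, 'H_complex': 15}        # C(H)
conditional = {'H_simple': 20, 'H_medium': 4, 'H_complex': 3}  # C(D|H)

print(mdl_select(prior, prior, conditional))  # 'H_medium': 6 + 4 = 10 bits
```

Here the moderately complex hypothesis wins: it costs more bits to describe than the simplest one, but it saves more bits in describing the data, mirroring the Einstein–Newton comparison above.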
Another predecessor of the simplicity principle is the law of Prägnanz. The early twentieth-century
Gestalt psychologists Wertheimer (1912, 1923), Köhler (1920), and Koffka (1935) proposed that this
law underlies perceptual groupings based on properties such as symmetry and similarity. It was
inspired by the minimum principle in physics, which holds that dynamic physical systems tend to
settle into relatively stable states defined by minimum energy loads. Applied to perceptual organiza-
tion, the law of Prägnanz suggests that, when faced with a stimulus, the human visual system tends
to settle into relatively stable neural states reflecting cognitive properties such as symmetry and sim-
plicity. This idea does not exclude the influence of knowledge represented at higher cognitive levels,
but it takes this influence to be subordinate to stimulus-driven mechanisms of a largely autonomous
visual system.
Nowadays, the neural side of the law of Prägnanz finds elaboration in connectionist and
dynamic-systems approaches to cognition. In the spirit of Marr’s (1982/2010) levels of descrip-
tion, these two kinds of approaches are complementary in that connectionism usually focuses on
the internal mechanisms of information processing systems, while dynamic systems theory (DST)
usually focuses on the physical development over time of whole systems. Also complementary,
but then usually focusing on the nature of outcomes of information processes, is representational
theory in which the cognitive side of the law of Prägnanz finds elaboration. This may be specified
as follows. For perceptual organization, Koffka formulated the law of Prägnanz as holding
‘of several geometrically possible organizations that one will actually occur which possesses the best, the
most stable shape’ (1935: 138),
and Hochberg and McAlister put this in information-theoretic terms by
‘the less the amount of information needed to define a given organization as compared to the other alterna-
tives, the more likely that the figure will be so perceived’ (1953: 361),
specifying descriptive information loads, or complexities, by
‘the number of different items we must be given, in order to specify or reproduce a given pattern’ (1953: 361).
Hochberg and McAlister dubbed this information-theoretic idea the descriptive minimum principle,
and nowadays it is also known as the simplicity principle.
Hence, just as the MDL principle in AIT, the simplicity principle in perception promotes sim-
plest codes as specifying the outcomes of an inference process based on descriptive codes of
things. Such descriptive codes are much like computer codes, that is, representations that can
be seen as reproduction recipes for things and whose internal structures are therefore enforced
by the internal structures of those things. Both the MDL principle and the simplicity principle
reflect modern information-theoretic approaches which contrast with Shannon’s (1948) classical
selective-information approach in communication theory. Shannon’s approach promotes optimal
codes, that is, nominalistic label codes (as in the Morse code) that minimize the long-time aver-
age burden on communication channels—assuming the transmission probabilities of codes are
known. The simplicity principle further contrasts with von Helmholtz’s (1909/1962) likelihood
principle. The latter holds that the internal neuro-cognitive process of perceptual organization
is guided by veridicality and yields interpretations most likely to be true in the external world—
assuming such probabilities are known. Shannon’s and von Helmholtz’s approaches are appealing
but suffer from the problem that, in many situations, the required probabilities are unknown if not
unknowable. A main objective of modern descriptive-information theory is to circumvent this
problem, that is, to make inferences without having to know the real probabilities.
An initial problem for modern information theory was that complexities depend on the
chosen descriptive coding language. However, both theoretical findings in AIT (Chaitin 1969;
Kolmogorov 1965; Solomonoff 1964a, 1964b) and empirical findings in perception (Simon 1972)
provided evidence that, regarding complexity rankings, it does not matter much which descrip-
tive coding language is employed. This evidence is not solid proof, but does suggest that descrip-
tive simplicity is a fairly stable concept.
The simplicity principle in perception agrees with ideas by Attneave (1954, 1982) and Garner
(1962, 1974), for instance, and it has been promoted most prominently in Leeuwenberg’s
(1968, 1969, 1971) structural information theory (SIT). SIT was developed independently of
AIT, but in hindsight, its current implementation of the simplicity principle can be seen as a
perception-tailored version of the MDL principle in AIT (van der Helm 2000). A notable dif-
ference, though, is that the MDL principle postulates that simplest interpretations are the best
ones (without qualifying what ‘best’ means), whereas the simplicity principle postulates that
they are the ones most likely to result from the internal neuro-cognitive process of perceptual
organization—which may not be interpretations most likely to be true in the external world.
This historical overview raises three questions which, below, are discussed in more detail.
The first question is whether the human visual system indeed organizes stimuli in the simplest
way; this is basically an empirical question, but because it has been plagued by unclarities, it is
addressed by looking at operationalizations of simplicity. The second question is whether simplest
stimulus organizations are sufficiently veridical; this is a theoretical question which is addressed
by using AIT findings in a comparison between the simplicity and likelihood principles. The third
question is whether the simplicity principle agrees with the putative high combinatorial capacity
and speed of perceptual organization; this is a tractability question which is addressed by relating
SIT to DST and connectionism to assess how the simplicity principle might be neurally realized.

3  Operationalizations of simplicity
Hochberg and McAlister (1953) introduced the simplicity principle in an article entitled A quan-
titative approach to figural ‘goodness’. Figural goodness is an intuitive Gestalt notion and the idea
behind the association between descriptive simplicity and goodness is that simplicity entails both
accuracy and parsimony. For instance, a square can be represented as if it were a rectangle, but
representing it as a square is both more accurate and more efficient in terms of memory resources
as it requires fewer descriptive parameters. Assuming that patterns are represented in the simplest
way, simpler patterns are thus expected to be better in the sense that they can be remembered or
reproduced more easily.
Hence, the motto here is ‘what is simple, is easy to learn’. Notice that this is the inverse of the
motto ‘what has been learned, is simple’, which expresses that patterns that have been seen often
are familiar so that they are experienced as being simple. The latter motto agrees with the likeli-
hood principle rather than with the simplicity principle, but it shows that simplicity has different
connotations which may be relevant in different settings (see also Sober 2002). Therefore, this
section first addresses this issue.

3.1  Classical vs modern information-theoretic simplicity


In classical selective-information theory, the idea is that simpler things are things that convey
less information because they belong to larger sets of actually occurring equivalent things (i.e.
identical things, or similar things if their dissimilarities can be ignored in the situation at hand).
A random dot cloud, for instance, is thus said to be simple: the set of random dot clouds is larger
than any set of more structured dot patterns, so that a randomly picked dot pattern has a relatively
high probability of being a random dot cloud. It therefore gets a shorter optimal code in Shannon’s
selective-information approach.
The objects in Figure 50.1, on the other hand, can be said to be simple in the sense that
they have a highly regular internal structure. This idea about simplicity agrees with modern
descriptive-information theory, in which individual things get shorter descriptive codes if they
contain more structural regularity. This time, things may also be simple for another reason, by the
way. For instance, the binary string 11111111111 is simple because it contains a structural regu-
larity as all bits are identical, while the binary string 01 is simple because it contains only two bits.
Shortest descriptive codes account for the simplicity of both cases, but for the rest, the two cases
are hardly comparable. This illustrates that the complexity of simplest codes is not always the most
appropriate property to be used in inter-stimulus comparisons (i.e. in comparisons between inter-
pretations of different stimuli). Indeed, the simplicity principle applies primarily to intra-stimulus
comparisons, that is, to comparisons between different candidate interpretations of an individual
stimulus. Furthermore, beside the complexity, also other properties of simplest codes may be
used in inter-stimulus comparisons. For instance, unlike optimal codes, simplest codes have a
hierarchical structure reflecting the hierarchical structure of simplest stimulus organizations, so
that classifications of different stimuli may be assessed on the basis of these hierarchical code
structures (see Figure 50.1; for more examples, see Leeuwenberg and van der Helm 2014).

(a) (b)

Fig. 50.1  Objects that are simple because they have a highly regular internal structure consisting
of a superstructure (visualized by thick dashes) that determines the positions of many identical
subordinate structures (visualized by thin dashes). The hierarchy in (a) is the inverse of that in (b),
and in both cases, the objects are presumably classified on the basis of primarily the perceptually
dominant superstructure.
These different ideas about simplicity are also reflected in the following. In classical informa-
tion theory, the length of an optimal code for an individual pattern is determined by the size of
the set of all actually occurring identical patterns. In modern information theory, conversely, the
length of the simplest descriptive code for an individual pattern determines the size of the set
of all theoretically possible equally complex patterns (as in AIT, which focuses on the algorith-
mically relevant complexities of simplest descriptive codes) or the set of all theoretically pos-
sible equally structured patterns (as in SIT, which focuses on the perceptually relevant structural
classes implied by simplest descriptive codes). The fact that descriptively simpler patterns belong
to smaller structural classes (Collard and Buffart 1983) agrees with Garner’s (1962, 1970) idea
of inferred subsets and his motto of ‘good patterns have few alternatives’. For instance, the set of
all imaginable squares is smaller than the set of all imaginable rectangles. In fact, in perception,
the structural class to which a pattern belongs is considered to be more relevant than its precise
metrical details (MacKay 1950), so that one could say that this class constitutes the generic repre-
sentation of the pattern (e.g. the mental representation of a particular square primarily represents
‘a square’ and its precise size is secondary). This suggests that a pattern should not be treated in
isolation, but in reference to its structural class (Lachmann and van Leeuwen 2005a, 2005b).
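The notion of a descriptive code can be made concrete with a toy coding language. The sketch below uses plain run-length coding as a crude stand-in for a perceptual coding model such as SIT’s; it only illustrates how counting descriptive parameters ranks the regular string 11111111111 as simpler than a less regular string of the same length:

```python
from itertools import groupby

def runlength_complexity(s):
    """Toy descriptive complexity: two parameters (symbol and run length)
    per run in a run-length code of the string."""
    return 2 * sum(1 for _ in groupby(s))

print(runlength_complexity('11111111111'))  # 2: a single run, '1' repeated 11 times
print(runlength_complexity('01101000110'))  # 14: seven runs, despite equal length
```

Because descriptively simpler strings have fewer parameters left free, they also belong to smaller structural classes, in line with Garner’s motto that good patterns have few alternatives.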
Hence, all in all, it is true that Shannon’s optimal codes have a flavour of simplicity. They are
shorter for more frequently occurring things, and thereby, minimize the long-term average length
of nominalistic label codes over many identical and different things. However, it is crucial to
distinguish this from the simplicity principle which minimizes the length of descriptive codes
for individual things. Furthermore, notice that the foregoing deals with view-independent prop-
erties only. Indeed, initially, both the simplicity principle and likelihood principle focused on
view-independent properties of hypothesized distal objects to predict the most likely outcome
of the perceptual organization process—that is, ignoring how well hypotheses fit the proximal
data. The latter issue is about view-dependencies, and as discussed next, the inclusion of this issue
boosted research on perceptual organization.

3.2 View-dependencies
Because descriptive simplicity is a fairly stable concept (see above), the assessment of complexities
of hypothesized distal objects (i.e. objects as hypothesized in candidate interpretations) as such
is not a big problem for the simplicity principle. For the likelihood principle, however, the assess-
ment of their probabilities is a problem. It predicts that the most likely outcome of the perceptual
organization process is the one that is also objectively most likely to be true in the world. However,
despite suggestions (Brunswik 1956), such objective probabilities in the world are unknown, if
not unknowable. This does not exclude that perception is guided by the likelihood principle, but
it does mean that this may not be verifiable (Leeuwenberg and Boselie 1988).
Be that as it may, in the 1980s, proponents of the likelihood principle switched to view-depend-
ent properties, that is, to properties that determine the degree of consistency between a candi-
date interpretation and the proximal stimulus (see, e.g., Gregory 1980). For these properties, fair
approximations of their objective probabilities in the world can be assessed better. This led to a
debate in which advocates of one principle presented phenomena that were claimed to be explained
by this principle but not by the other principle—however, advocates of the other principle were
generally able to counter such arguments (see, e.g., Boselie and Leeuwenberg’s 1986 reaction to
Rock 1983 and to Pomerantz and Kubovy 1986; Sutherland’s 1988 reaction to Leeuwenberg and
Boselie 1988; Leeuwenberg, van der Helm, and van Lier’s 1994 reaction to Biederman 1987). The
crux of this debate is illustrated by Figure 50.2, for which both principles—as formulated at the
time—would make the correct amodal-completion prediction. That is, the simplicity principle
could say that the preferred interpretation is the one in which, view-independently, the com-
pleted shape is the simplest one. The likelihood principle, conversely, could say that it is the one
without unlikely view-dependent coincidences of edges and junctions of the two shapes.
Both arguments seemed to be valid, and in both the simplicity paradigm and the likelihood
paradigm, the result of this debate was the insight that perceptual organization requires an inte-
grated account of both view-independent and view-dependent factors (see, e.g., Gigerenzer and
Murray 1987; Knill and Richards 1996; Tarr and Bülthoff 1998; van der Helm 2000; van Lier, van
der Helm, and Leeuwenberg 1994, 1995; van Lier 1999). For the simplicity principle, such an
integration implies compliance with the MDL principle in AIT (see above), and no matter which
underlying principle one adopts, it concurs with an integration of information from the ven-
tral and dorsal streams in the brain (Ungerleider and Mishkin 1982). These streams are believed
to be dedicated to object perception and spatial perception, respectively, and an integration of
view-independent and view-dependent factors can thus be said to reflect an interaction between
these streams, to go from percepts of objects as such to percepts of objects arranged in space.
Hence, the past few decades showed a convergence of ideas about the factors to be included in
perceptual organization. This convergence, however, does not mean that the two principles agree
on how these factors are to be quantified. As explicated next in Bayesian terms, the latter issue is
not just a matter of complexities vs probabilities.

3.3  Bayesian models


Thomas Bayes (1702–1761) proposed what became known as Bayes’ rule (Bayes 1763/1958). It
holds that the posterior probability p(H|D) of hypothesis H given data D is to be computed by
multiplying the prior probability p(H) of hypothesis H as such and the conditional probability
p(D|H) of data D given hypothesis H (it also involves a normalization factor, but this factor is cur-
rently irrelevant as it does not affect the ranking of hypotheses by their posterior probabilities).
Bayes’ rule is a powerful mathematical tool to model all kinds of things in terms of probabilities
(for more background information, see Feldman, this volume). Its general goal is to establish a
posterior probability distribution over hypotheses, but a specific goal is to select the most likely
hypothesis, that is, the one with the highest posterior probability under the employed prior and
conditional probabilities.

(a) (b) (c)

Fig. 50.2  The pattern in (a) is readily interpreted as a parallelogram partly occluding the shape in
(b) rather than the shape in (c). In this case, this preference could be claimed to occur either because,
unlike the shape in (b), the shape in (c) would have to take a rather coincidental position to yield the
pattern in (a), or because the shape in (b) is simpler than the shape in (c). In general, however, both
factors seem to play a role.

Notice, however, that Bayes’ rule does not prescribe where the prior and
conditional probabilities come from (cf. Watanabe 1969). The failure to recognize this crucial
point has led to overly strong claims (see also Bowers and Davis 2012a, 2012b). For instance,
Chater (1996) claimed that the simplicity and likelihood principles in perception are equiva-
lent, but this claim assumed implicitly—and incorrectly—that any Bayesian model automatically
implies compliance with the Helmholtzian likelihood principle (van der Helm 2000, 2011a). This
may be clarified further as follows.
In Bayesian terms, the above-mentioned convergence of ideas about the factors to be included
in perceptual organization means that both the likelihood paradigm and the simplicity para-
digm nowadays promote an integration of priors and conditionals—where the priors refer to
view-independent factors of candidate interpretations as such, while the conditionals refer to their
view-dependent degree of consistency with proximal stimuli. Hence, Bayes’ rule can be employed
to predict the most likely outcome of the human perceptual organization process. However, for a
modeller, the key question then is: where do I get the priors and conditionals from? If one wants
to model perceptual organization rather than explaining it, one might subjectively choose certain
probabilities, whether or not backed up by compelling arguments (for fine examples, see Knill and
Richards 1996). This is customary in Bayesian approaches, but notice that compliance with either
one of the explanatory simplicity and likelihood principles requires more specific probabilities.
The natural way to model the likelihood principle, on the one hand, is to use Bayes’ rule. After
all, this principle assumes that objective probabilities in the world (pw) determine the outcome of
the perceptual organization process. That is, for proximal stimulus D, the likelihood principle can
be formalized in Bayesian terms by:
Select the hypothesis H that maximizes pw(H|D) = pw(H) * pw(D|H)

where pw(H) is the prior probability of hypothesis H, while pw(D|H) is the probability that the
proximal stimulus D arises if the real distal stimulus is as hypothesized in H.
The natural way to model the simplicity principle, on the other hand, is to minimize the sum of
prior and conditional complexities (just as specified for the MDL principle in AIT). However, one
may also convert descriptive complexities C into artificial probabilities pa = 2^−C; these are called
algorithmic probabilities in AIT (Li and Vitányi 1997) and precisals in SIT (van der Helm 2000).
Under this conversion, minimizing the sum of prior and conditional complexities C is equivalent
to maximizing the product of prior and conditional probabilities pa. Normalization then is irrel-
evant, and these artificial probabilities thus imply that also the simplicity principle can be formal-
ized in Bayesian terms, namely, by:
Select the hypothesis H that maximizes pa(H|D) = pa(H) * pa(D|H)

Thus, both principles can be formalized in Bayesian terms to predict the most likely outcome of
the perceptual organization process. The crucial difference then still is, however, that the likeli-
hood principle employs probabilities pw based on the frequency of occurrence of things in the
world whereas the simplicity principle employs probabilities pa derived from the descriptive com-
plexity of individual things.
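The claimed equivalence is easy to verify computationally: since 2^−(a+b) = 2^−a * 2^−b, minimizing a sum of complexities and maximizing a product of precisals select the same hypothesis. A minimal sketch with hypothetical complexities:

```python
# Candidate interpretations with hypothetical (prior, conditional) complexities in bits.
candidates = {'H1': (3, 2), 'H2': (1, 5), 'H3': (4, 4)}

def precisal(c):
    """Artificial probability p_a = 2^(-C) for descriptive complexity C."""
    return 2.0 ** -c

# Minimizing the complexity sum C(H) + C(D|H) ...
by_sum = min(candidates, key=lambda h: sum(candidates[h]))
# ... selects the same hypothesis as maximizing p_a(H) * p_a(D|H),
# because 2^-(a+b) = 2^-a * 2^-b and normalization does not affect the ranking.
by_product = max(candidates, key=lambda h: precisal(candidates[h][0]) * precisal(candidates[h][1]))

assert by_sum == by_product
print(by_sum)  # 'H1': 3 + 2 = 5 bits beats 6 and 8
```

The two formalizations thus share Bayes’ machinery; what separates them is solely whether the probabilities come from frequencies in the world (pw) or from descriptive complexities (pa).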
Hence, to determine if the Bayesian formulation of the simplicity principle complies with the
likelihood principle, one should assess how close the latter’s objective probabilities pw and the
former’s artificial probabilities pa might be (van der Helm 2000, 2011a). This is discussed further
in the next section, but notice that a proof of equivalence of the principles is out of the question,
simply because the pw are unknown. The next two examples may illustrate various things dis-
cussed so far.

3.4  Example 1: Straight vs curved edges


The general viewpoint assumption is an assumption put forward in the likelihood paradigm
(Biederman 1987; Binford 1981; Rock 1983; Witkin and Tenenbaum 1983). It holds that a proxi-
mal stimulus is interpreted assuming it does not contain features that would arise only in an
accidental view of the distal stimulus. This suggests, for instance, that a proximal straight line can
safely be interpreted as a distal straight edge because it can be caused by a distal curved edge only
from an accidental viewpoint position. Straightness is therefore called a non-accidental prop-
erty: if such a property is present in the proximal stimulus, then it is most likely present in the
distal stimulus too.
The general viewpoint assumption is indeed plausible, but notice that it derives its plausibility
from favouring interpretations involving high conditional probabilities. For instance, a curved
distal edge yields a straight proximal line from hardly any viewpoint, so that a straight proximal
line has a low probability to occur if the curved distal edge hypothesis were true. A straight distal
edge, conversely, yields a straight proximal line from nearly every viewpoint, so that a straight
proximal line has a high probability to occur if the straight distal edge hypothesis were true. It is
true that Pomerantz and Kubovy (1986) argued that, in the case of a straight proximal line, the
preference for the straight distal edge hypothesis should be justified by showing that straight edges
occur more frequently in the world than curved edges. This, however, would be a justification in
terms of prior probabilities whereas, as just argued, it is justified better in terms of conditional
probabilities. Yet, according to Bayes’ rule, a high conditional probability may be suppressed by
a low prior probability, so it still remains to be seen if the prior probability in the world is high
enough to allow for a justification within the likelihood paradigm (Leeuwenberg, van der Helm,
and van Lier 1994).
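This interplay can be illustrated numerically. The probabilities below are purely hypothetical (they are not measured frequencies); they only show how Bayes’ rule weighs a prior against a conditional for a straight proximal line:

```python
# Purely hypothetical numbers for a straight proximal line; they only
# illustrate how Bayes' rule weighs priors against conditionals.
prior = {'straight edge': 0.3, 'curved edge': 0.7}           # p(H): frequency in the world
conditional = {'straight edge': 0.95, 'curved edge': 0.001}  # p(D|H): chance of projecting to a straight line

posterior = {h: prior[h] * conditional[h] for h in prior}    # up to normalization
print(max(posterior, key=posterior.get))  # 'straight edge': the high conditional outweighs the lower prior
# A sufficiently extreme prior could still reverse this preference,
# which is why the prior probability in the world remains relevant.
```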

3.5 Example 2: T-junctions
Each of the four configurations in Figure 50.3 can, in principle, be interpreted as consisting of
one object or as consisting of two objects. Going from left to right, however, the two-objects
interpretation (definitely preferred in a) gradually loses strength in favour of the one-object
interpretation (definitely preferred in d). By way of a clever experiment involving twelve of such
configurations, Feldman (2007) provided strong evidence for this. For instance, he found that,
just as the configuration in a, the T-junction in b is perceived as two objects, and that, just as the
configuration in d, the hook in c is perceived as one object.
T-junctions are particularly interesting because, in many models of amodal completion, they
are considered to be cues for occlusion (e.g. Boselie 1994; see also van Lier and Gerbino, this vol-
ume). That is, if the proximal stimulus contains a T-junction, this is taken as a strong cue that the
distal scene comprises one surface partly occluded by another (see, e.g., Figure 50.2). However,
before the visual system can infer this occlusion, it first has to segment the proximal stimulus into
the visible parts of those two surfaces, and Feldman’s (2007) data in fact suggest that T-junctions
are cues for segmentation rather than for occlusion. That is, they trigger segmentation even when
occlusion is not at hand.
To explain this, one may invoke van Lier, van der Helm, and Leeuwenberg’s (1994) empirically
successful amodal-completion model. It quantifies prior complexities of interpretations using
SIT’s coding model, and it quantifies conditional complexities under the same motto, namely,
that complexity reflects the effort to construct things. Thus, for an interpretation, the prior
complexity reflects the effort to construct the hypothesized distal objects, and the conditional
complexity reflects the effort to bring these objects in the relative position given in the proximal
stimulus. Notice that these conditional complexities are quantitatively equal to what Feldman
(2007, 2009) called co-dimensions—with the difference that Feldman (who assumed uniform
priors) took a high co-dimension to be an asset of an interpretation, whereas van Lier, van der
Helm, and Leeuwenberg (who assumed non-uniform priors) took a high conditional complexity to be a
liability. The latter agrees with the simplicity principle, and implies the following for Figure 50.3.
Going from left to right, the one-object interpretation has prior complexities of 5, 4, 3, and 1
(reflecting the number of line segments and angles needed to describe each configuration as one
object) and a conditional complexity of 0 in each case (i.e. no degree of positional freedom to be
removed to arrive at the proximal configurations). Likewise, the two-objects interpretation has a
prior complexity of 2 in each case (i.e. just two separate line segments to be described) and con-
ditional complexities of 0, 1, 2, and 3 (reflecting the degrees of positional freedom to be removed
to arrive at the proximal configurations). Hence, the one-object interpretation has posterior com-
plexities of 5, 4, 3, and 1, respectively, and the two-objects interpretation has posterior complexi-
ties of 2, 3, 4, and 5, respectively. This explains Feldman’s (2007) data that the hook is preferably
interpreted as one object whereas the T-junction is preferably interpreted as two objects (see also
van der Helm 2011a).
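This bookkeeping can be tabulated directly. A minimal sketch reproducing the complexity values just given for the four configurations in Figure 50.3:

```python
# Prior and conditional complexities for configurations (a)-(d) of Figure 50.3,
# using the values given in the text for van Lier et al.'s model.
one_object = {'prior': [5, 4, 3, 1], 'conditional': [0, 0, 0, 0]}
two_objects = {'prior': [2, 2, 2, 2], 'conditional': [0, 1, 2, 3]}

for i, label in enumerate('abcd'):
    post_one = one_object['prior'][i] + one_object['conditional'][i]
    post_two = two_objects['prior'][i] + two_objects['conditional'][i]
    winner = 'one object' if post_one < post_two else 'two objects'
    print(f'({label}): one-object {post_one} vs two-objects {post_two} -> {winner}')
# (a) and the T-junction in (b) come out as two objects;
# the hook in (c) and (d) come out as one object.
```

The selected interpretations match the pattern of Feldman’s (2007) data as described above.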
Hence, both examples stress the relevance of an interplay between non-uniform priors and
non-uniform conditionals. Notice that this still stands apart from the difference between the sim-
plicity and likelihood principles. This difference returns in the next section.

4  The veridicality of simplicity


Evolutionarily, a fair degree of veridicality in the world seems a prerequisite for any visual system
to survive. The likelihood principle yields highly veridical percepts by definition, but what about
the simplicity principle? It is true that Mach (1922/1959) suggested that simplicity and likeli-
hood are different sides of the same coin; that Perkins (1976) concluded that simplest interpreta-
tions run little risk of misinterpreting stimuli; and that the MDL principle postulates that simplest
interpretations are the best ones. However, it is not obvious at all that simplicity yields veridicality
(see also Sober 2002). For instance, the simplicity and likelihood principles cannot be proved to
be equivalent (see above). The next two preconsiderations set the stage for a further discussion of
this issue.



Fig. 50.3  Four configurations that can be interpreted as consisting of one object or as consisting
of two objects. Taken as one object, a simpler (i.e. more regular) one belongs to a smaller object
category; taken as two objects, a simpler (i.e. less coincidental) relative position of the two objects
belongs to a larger position category.
1036 van der Helm

4.1  Preconsideration 1: Feature extraction versus feature integration
In neuroscience, the perceptual organization process is believed to comprise three intertwined
subprocesses which, together, yield integrated percepts composed of selected features (Lamme
and Roelfsema 2000; Lamme, Supèr, and Spekreijse 1998). These subprocesses are feature extrac-
tion, feature binding, and feature selection (see next section for more details). As for feature
extraction, the visual system’s sensitivity to basic features such as line orientations seems to cor-
relate with their objective probabilities of occurrence in the world (Howe and Purves 2004, 2005;
Yang and Purves 2003, 2004). This is interesting as it suggests that the visual system’s capability
to extract features has adapted to the statistics in the world. This may even extend to features
like symmetry, and seems to be in the spirit of the likelihood principle rather than the simplicity
principle.
The simplicity principle is indeed silent about the visual system’s feature extraction capability,
but notice that it is in its spirit to assume that, via a two-way interaction between visual systems
and the world, feature extraction mechanisms obtained sufficient evolutionary survival value (see
below; see also van der Helm, this volume). Currently more important, however, is that the sim-
plicity and likelihood principles differ fundamentally regarding the selection of integrated per-
cepts, and that the issue at stake here is not the visual system’s feature extraction capability, but the
veridicality of integrated percepts.

4.2  Preconsideration 2: Occamian bias in Bayesian modelling


It has been noticed that Bayesian models tend to exhibit a bias towards simplicity (MacKay
2003), and this bias has been taken to reflect a rapprochement of the simplicity and likeli-
hood principles (Feldman 2009; Sober 2002). This bias, however, has nothing to do with the
Helmholtzian likelihood principle, and merely reflects a Bayesian implementation of the simplicity principle. This becomes clear if one looks more closely at MacKay’s (2003) explication of this
bias. MacKay argued that a category of more complex instances spreads probability mass over
a larger number of instances than a category of simpler instances does, so that individual
instances in such a smaller category tend to get higher probabilities. This, however, presup-
poses (a) a correlation between complexity and category size, and (b) that every category gets
an equal probability mass. These assumptions cannot be justified within the likelihood para-
digm, but are in line with the simplicity paradigm.
That is, MacKay seems to have had in mind a world in which objects are generated, each time, by first randomly selecting a complexity category, and then randomly selecting an instance from that category. Thus, in the first step, every category has the same probability of being selected, and in the second step, every instance in the selected category again has the same probability of being selected. The instances in a category of complexity C can be defined by C parameters, so that the category size is proportional to 2^C. This implies that the probability that a particular instance is selected is proportional to 2^−C which, notably, is nothing but the simplicity paradigm’s artificial probability pa (see previous section).
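This two-step generative process can be sketched concretely; the set of complexity categories below is made up for illustration:

```python
# Imagined world behind MacKay's bias: first select a complexity
# category C uniformly at random, then select one of its 2**C instances
# uniformly at random.  A particular instance of complexity C then has
# probability (1/number_of_categories) * 2**(-C), proportional to 2**(-C).

categories = [1, 2, 3, 4]   # complexities C (made-up example)

def p_instance(c):
    """Probability of one particular instance of complexity c."""
    return (1 / len(categories)) * 2 ** (-c)

# Each category holds 2**c instances, so every category gets an equal
# probability mass (1/4 here), and the total mass sums to one:
total = sum(p_instance(c) * 2 ** c for c in categories)
print(total)  # 1.0

# Simpler instances are individually more probable, by a factor of two
# per unit of complexity:
print(p_instance(1) / p_instance(2))  # 2.0
```

This makes explicit the two presuppositions named in the text: the 2^C correlation between complexity and category size, and the equal probability mass per category.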

4.3  The margin between simplicity and likelihood


In the just-sketched imagined world, the simplicity and likelihood principles would actually be equivalent (at least regarding the priors). This touches upon the heart of the veridicality issue, that is, it immediately raises the question of how close this imagined world might be to the actual world and, more generally, the question of how close the two principles might be in other imaginable worlds. Because the probabilities in the actual world are unknown, the first question
Simplicity in Perceptual Organization 1037

cannot be answered, but the second question found an answer in AIT’s Fundamental Inequality
(Li and Vitányi 1997) which, in my words, holds:
For any enumerable probability distribution P over things x with Kolmogorov complexities K(x), the difference between the real probabilities p(x) and the artificial probabilities 2^−K(x) is maximally equal to the complexity K(P) of the distribution P.

An enumerable distribution is (or can, with arbitrary precision, be approximated by) a rational-valued function of two nonnegative integer arguments (examples are the uniform distribution, the normal distribution, and the Poisson distribution). Furthermore, the complexity K(P)
is the length of a shortest descriptive code specifying the probabilities p(x), that is, it is roughly
given by the number of different categories to which P assigns probabilities. In other words, the
fewer different categories to be considered, the fewer different probabilities to be assigned, the
simpler the probability distribution is.
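Comparing probabilities on a logarithmic scale, as Li and Vitányi (1997) do, the statement can be sketched as follows (this rendering is ours; the constant term absorbs implementation-dependent overhead):

```latex
% Fundamental Inequality (sketch): for an enumerable probability
% distribution P with complexity K(P), and things x with Kolmogorov
% complexity K(x),
\[
  \bigl|\, -\log_2 p(x) \;-\; K(x) \,\bigr| \;\le\; K(P) + O(1),
\]
% i.e. the artificial probabilities $2^{-K(x)}$ deviate from the real
% probabilities $p(x)$ by a factor of at most roughly $2^{K(P)}$.
```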
The Fundamental Inequality is admittedly a very general finding. It is unknown if any actual
world exhibits an enumerable distribution over things, and Kolmogorov complexity is in fact
an incomputable theoretical construct. Nevertheless, this finding holds for both priors and con-
ditionals and suggests that, depending on the probability distribution in a world at hand, the
simplicity and likelihood principles might be close. The next question then is what this evidence
suggests regarding the veridicality of simplest interpretations in perception.
In this respect, notice first that natural environments like jungles exhibit larger shape diversities
than those exhibited by human-made environments like cities. The Fundamental Inequality then
suggests that simplicity-guided visual systems yield a higher degree of veridicality in human-made
environments than in natural environments. This makes sense considering that jungle inhabitants
rely on smell and sound rather than on sight. In fact, the Fundamental Inequality seems to explain
why organisms tend to create environments with reduced shape diversity (Allen 1879), that is, if
visual systems indeed are guided by simplicity, then reducing shape diversity enables them to yield
more veridical percepts. This would establish the above-mentioned two-way interaction between
visual systems and the world (van der Helm 2011b). To evaluate the relevance of the Fundamental
Inequality in perception in more detail, one has to consider priors and conditionals separately.
First, even in human-made environments, the shape diversity may be too large to allow for
a simple probability distribution. The Fundamental Inequality then suggests that the difference
between prior probabilities pw in the world and simplicity-based artificial prior probabilities pa
may well be large. In any case, there is no indication that the pa might be veridical. Another way
of looking at this is by considering structural-class sizes. That is, simpler objects (i.e. those with
higher pa) belong to smaller object categories (see Figure 50.3), which suggests that they probably
occur with lower pw in the world. Hence, the simplicity and likelihood principles seem far apart
regarding the priors. For instance, straight edges are simpler than curved edges, but there is no
reason to assume they occur more frequently.
Second, different views of a scene usually give rise to only a few qualitatively different spatial
arrangements of objects. This small diversity suggests, by the Fundamental Inequality, that the dif-
ference between conditional probabilities pw in the world and simplicity-based artificial conditional
probabilities pa may well be small, so that the pa may well be veridical. To look at this too in another
way, Figure 50.3 illustrates that simpler arrangements (i.e. those with higher pa) belong to larger sets of
position categories, which suggests that they probably also occur with higher pw in the world. Hence,
the simplicity and likelihood principles seem close regarding the conditionals. For instance, for the spa-
tial arrangements in Figure 50.3, the conditional complexities as formally quantified by van Lier, van
der Helm, and Leeuwenberg (1994) are in fact basically identical to the number of coincidences one

would count intuitively. Hence, taking high conditional complexities to be a liability (as the simplicity
principle does) agrees with Rock’s (1983) avoidance-of-coincidences principle which is in line with
the general viewpoint assumption as put forward in the likelihood paradigm (see previous section).
Thus, in sum, the simplicity principle’s priors are probably not veridical, but its conditionals
probably are. On the one hand, this suggests that attempts to assess if the human visual system
is guided by simplicity or by likelihood should focus on the priors, because the conditionals do
not seem to be decisive in this respect. On the other hand, the simplicity principle’s veridicality
difference between priors and conditionals might explain the experience that scenes look weird at first glance, but less so at subsequent glances. That is, by way of co-evolution, seeing organisms can usually move as well, and this allows them to get different views of the same scene to infer better what the scene entails. This inference process can be modelled neatly by a recursive application of
Bayes’ rule, which means that posteriors obtained for one glance are taken as priors for the next
glance. This implies that the effect of the first priors fades away and that the conditionals become
the decisive entities. Hence, although the simplicity principle’s priors probably are not veridical,
the fact that its conditionals probably are veridical seems sufficient to reliably guide actions in
everyday situations. In other words, a visual system that aims at internal efficiency seems to yield,
as a side-effect, an evolutionarily sufficient degree of veridicality in the external world.
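The fading of the initial priors under recursive Bayesian updating can be illustrated with a toy computation; the two hypotheses and the view likelihoods below are made up for illustration and are not from the chapter itself:

```python
# Recursive application of Bayes' rule: the posterior from one glance
# serves as the prior for the next glance.  After several glances, the
# conditionals (likelihoods) dominate and the initial prior hardly
# matters any more.

def bayes_update(prior, likelihoods):
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypotheses H0 and H1; two very different initial priors.
prior_flat   = [0.5, 0.5]
prior_biased = [0.9, 0.1]   # strongly favours H0

# Each successive view is four times as likely under H1 as under H0.
view_likelihoods = [0.2, 0.8]

post_flat, post_biased = prior_flat, prior_biased
for _ in range(10):            # ten glances at the same scene
    post_flat = bayes_update(post_flat, view_likelihoods)
    post_biased = bayes_update(post_biased, view_likelihoods)

# Both posteriors end up almost entirely on H1, regardless of the
# initial prior: the conditionals have become the decisive entities.
print(post_flat[1], post_biased[1])
```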

5  The neural realization of simplicity


The previous sections focused on the question of what is processed rather than on the question of
how things are processed. That is, the simplicity and likelihood principles predict which interpreta-
tions result from the perceptual organization process, but this does not yet indicate how candidate
interpretations are processed. Notice that any stimulus may give rise to a superexponential num-
ber of candidate interpretations, so that evaluating each of them separately may require more time
than is available in this universe (cf. van Rooij 2008). To allow for a tractable process, the likelihood
paradigm tends to rely on heuristics (see, e.g., Hoffman 1998), but this does not yet indicate how
candidate interpretations are mentally structured and represented. The simplicity paradigm relies on
descriptive coding schemes which do suggest how candidate interpretations are mentally structured
and represented, but this does not yet resolve the tractability question (cf. Hatfield and Epstein 1985).
What is clear, however, is that the simplicity principle requires a nonlinear process: in line with
the law of Prägnanz, it implies that a minor change in the input may give a dramatic change in
the output. This is also the case in connectionism and DST, and honoring ideas therein, findings
within SIT in fact open—in an explanatory or epistemological sense (cf. Jilk, Lebiere, O’Reilly,
and Anderson 2008)—a pluralist perspective on how the brain might arrive at simplest interpretations. This is explicated next in the context of (a) processing in the visual hierarchy in the brain and, perhaps surprisingly, (b) quantum computing.

5.1  The visual hierarchy in the brain


As mentioned, neurally, the perceptual organization process is believed to comprise three inter-
twined subprocesses, namely, feature extraction, feature binding, and feature selection (see Figure
50.4). Together, these subprocesses yield integrated percepts composed of selected features. For
instance, the exogenous (i.e. stimulus-driven) subprocess of feature extraction—which is also
called the feedforward sweep—codes more complex things in higher visual areas. Furthermore,
the recurrent subprocess of feature selection selects different features from feature constella-
tions and integrates them into percepts. Here, without excluding influences by endogenous (i.e.
attention-driven) recurrent processing starting from beyond the visual hierarchy (Lamme and

Roelfsema 2000; Lamme, Supèr, and Spekreijse 1998; Peterson 1994), the latter subprocess is taken
to be a predominantly exogenous subprocess within the visual hierarchy (Gray 1999; Pylyshyn
1999). Currently more relevant, those feature constellations are thought to be the result of the
exogenous subprocess of horizontal binding of similar features coded within visual areas. This
subprocess seems to be mediated by transient neural assemblies which also have been implicated
in the phenomenon of neuronal synchronization (Gilbert 1992). This phenomenon is discussed
next in more detail.
Neuronal synchronization is the phenomenon that neurons, in transient assemblies, temporar-
ily synchronize their activity. Not to be confused with neuroplasticity which involves changes in
connectivity, such assemblies are thought to arise when neurons shift their allegiance to different
groups by altering connection strengths (Edelman 1987), which may also imply a shift in the spec-
ificity and function of neurons (Gilbert 1992). Both theoretically (Milner 1974; von der Malsburg
1981) and empirically (e.g. Eckhorn et al. 1988, 2001; Finkel, Yen, and Menschik 1998; Fries 2005;
Gray and Singer 1989; Salinas and Sejnowski 2001), neuronal synchronization has been associated
with cortical integration and, more generally, with cognitive processing. Synchronization in the
gamma-band (30–70 Hz), in particular, has been associated with feature binding in perceptual
organization.
It is true that these associations are indicative of what neuronal synchronization is involved in,
but notice that they are not indicative of the nature of the underlying process. For instance, not
only inside but also outside connectionism, the neural network in the brain is taken to perform
parallel distributed processing (PDP). PDP, however, neither requires nor automatically implies
synchronization which, therefore, is likely to subserve a form of neuro-cognitive processing that
is more special than standard PDP. The question then is what this special form of processing
might be.
The neural side of this question has been investigated in DST. That is, by varying system param-
eters, DST has yielded valuable insights into the physical conditions under which networks may
exhibit synchronization (e.g. Buzsáki and Draguhn 2004; Campbell, Wang, and Jayaprakash 1999;
Hummel and Holyoak 2003, 2005; van Leeuwen, Steyvers, and Nooter 1997). The point now is
that SIT’s simplicity approach provides complementary insights, namely, into the cognitive side of
synchronization. To set the stage, the next subsection ventures briefly into the prospected applica-
tion of quantum physics in computing.

5.2  Quantum computing


Classical computers work with bits. A bit represents either a one or a zero, so that a classical computer with N bits can be in only one of 2^N states at any one time. Quantum computers, conversely, are prospected to work with qubits (Feynman 1982). A qubit can represent a one, a zero, or any quantum superposition of these two qubit states, so that a quantum computer with N qubits can be in an arbitrary superposition of up to 2^N states simultaneously. A final read-out gives one of these states, but, crucially, the superposition of all these states directly affects the outcome of the read-out. Such a superposition effectively means that, until the read-out, the up to 2^N superposed states can be processed in what van der Helm (2004) called a transparallel fashion, that is, simultaneously as if only one state were concerned. Hence, compared to naive computing methods,
quantum computing promises a dramatic reduction in the amount of work and time needed to
complete a computing task.
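The contrast between a classical register and a superposition can be made concrete with a minimal state-vector sketch (our own toy example in pure Python, not an actual quantum simulator):

```python
import itertools
import math

# A classical register of N bits occupies exactly ONE of 2**N states;
# N qubits are described by 2**N complex amplitudes that can all be
# nonzero at once (a superposition).

N = 3

# Classical: a single definite state out of 2**N possibilities.
classical_state = (1, 0, 1)

# Quantum: the uniform superposition over all 2**N basis states (what
# N Hadamard gates produce from |00...0>): every basis state carries
# amplitude 1/sqrt(2**N).
amplitude = 1 / math.sqrt(2 ** N)
state_vector = {bits: amplitude
                for bits in itertools.product((0, 1), repeat=N)}

# A read-out returns basis state b with probability |amplitude(b)|**2;
# here every one of the 2**N states is equally likely, and the
# probabilities sum to one.
probabilities = {bits: a * a for bits, a in state_vector.items()}
print(len(state_vector), sum(probabilities.values()))
```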
Inspired by this, quantum-physical phenomena like superposition have been proposed to
underlie consciousness in that they might be the source of neuronal synchronization (Penrose
1989; Penrose and Hameroff 2011; see also Atmanspacher 2011). It is true that this quantum

mind hypothesis does not seem tenable, because quantum-physical phenomena do not seem to
last long enough to be useful for neuro-cognitive processing (Chalmers 1995, 1997; Searle 1997;
Seife 2000; Stenger 1992; Tegmark 2000). However, a cognitive form of superposition still seems
needed to account for perceptual organization (see also Townsend, Wenger, and Khodadadi,
this volume, and Townsend and Nozawa’s (1995) similar call for what they coined a coactive
architecture yielding supercapacity). As discussed next, SIT provides such a cognitive option; it
is perhaps somewhat speculative and technical, but it is also mathematically sound and neurally
plausible.

5.3  The transparallel mind hypothesis


Within SIT, an algorithm has been developed to compute simplest codes of symbol strings (van
der Helm 2004). Symbol strings are not visual stimuli, but the objective of computing simplest
codes raises basically the same problems. To be more specific, this algorithm relies on distrib-
uted representations of transparent holographic regularities (see van der Helm, this volume), and
implements the three intertwined subprocesses that are believed to take place in the visual hierar-
chy in the brain (see Figure 50.4). For instance, it implements the subprocess of feature selection
by way of Dijkstra’s (1959) shortest path method. This method relates SIT’s algorithm to connec-
tionist modelling because it is comparable to computer implementations, in connectionist simula-
tions, of selection by activation spreading. A notable difference, though, is that it is not applied to
one fixed network suited for all possible inputs (as in standard connectionist modelling), but to
a hierarchy of input-dependent networks which represents all candidate interpretations for only
the input at hand.
Such an input-dependent network on N nodes at some hierarchical level forms a superposition of up to 2^N similar regularities extracted from the previous hierarchical level. These input-dependent networks therefore find neuronal counterparts in the transient neural assemblies that are thought to be responsible for binding similar features. Moreover, such an input-dependent network is provably a hyperstring, which means that the up to 2^N superposed regularities can be hierarchically recoded in a transparallel fashion, that is, simultaneously as if only one regularity were concerned (van der Helm 2004).
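For reference, Dijkstra's (1959) shortest path method itself is compact. The sketch below shows the selection-by-shortest-path idea on a made-up graph, which is only a toy illustration and not an actual SIT hyperstring:

```python
import heapq

# Dijkstra's (1959) shortest-path method: repeatedly settle the
# unvisited node with the smallest tentative distance, using a
# priority queue.  SIT's algorithm applies this kind of selection to
# its input-dependent networks.

def dijkstra(graph, start):
    """graph: {node: [(neighbour, weight), ...]}; returns distances."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; node already settled cheaper
        for neighbour, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist

graph = {"a": [("b", 2), ("c", 5)],
         "b": [("c", 1), ("d", 4)],
         "c": [("d", 1)],
         "d": []}
print(dijkstra(graph, "a"))  # {'a': 0, 'b': 2, 'c': 3, 'd': 4}
```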


Fig. 50.4  The process in the visual hierarchy in the brain is believed to comprise the three intertwined
subprocesses of feedforward feature extraction, horizontal feature binding, and recurrent feature
selection.

Hence, transparallel processing by hyperstrings is in fact as powerful as transparallel processing by quantum computers. A notable difference, though, is that quantum computers form a still
prospected hardware option to perform transparallel processing, whereas hyperstrings provide
an already feasible software option to perform transparallel processing on classical computers.
This challenges the alleged but unproved general superiority of quantum computers over classi-
cal computers (cf. Hagar 2011). By the way, more sophisticated computing methods usually have
more application restrictions, and the vast majority of computing problems cannot benefit from
either transparallel method. This does not detract from what they can do, however, and each
method is bound to find its own niche.
Currently more relevant, transparallel processing by hyperstrings not only enables a tractable
computation of simplest codes of symbol strings, but also provides a computational explanation of
neuronal synchronization (van der Helm 2012, 2014). That is, as said, neuronal synchronization is
something else than standard PDP, and it might well be a manifestation of transparallel recoding
of similar features. Whether this explanation is tenable remains to be seen, but for one thing, this
pluralist picture of transient hyperstring-like neural assemblies subserving transparallel feature
processing does justice to the high combinatorial capacity and speed of the human perceptual
organization process.

6 Conclusions
It remains to be seen if human perceptual organization is indeed guided by the Occamian simplicity principle, which aims at internal efficiency, but this chapter shows that this principle is a serious rival to the Helmholtzian likelihood principle, which aims at external veridicality. The controversy between these principles has been plagued by ambiguities, but, as reviewed, these ambiguities can be resolved—enabling a clear view of their fundamental differences. One insight then is that
empirical attempts to distinguish between them should focus on view-independent aspects of can-
didate stimulus interpretations, because view-dependent aspects do not seem to be decisive in this
respect. Their functional equivalence regarding view-dependent aspects, in turn, suggests that the
simplicity principle also has evolutionary survival value in that it yields sufficient veridicality in
everyday situations. Furthermore, the simplicity principle’s stance—that internal neuro-cognitive
mechanisms tend to yield parsimonious percepts—is not only in line with Gestalt psychology but
is also sustained by the computational explanation of neuronal synchronization as being a mani-
festation of transparallel feature processing. This explanation suggests that the simplicity principle
is neurally realized by way of flexible cognitive architecture implemented in the relatively rigid
neural architecture of the brain.

Acknowledgment
Preparation of this chapter was supported by Methusalem grant METH/08/02 awarded to Johan
Wagemans (www.gestaltrevision.be).

References
Allen, G. (1879). ‘The origin of the sense of symmetry’. Mind 4: 301–316.
Atmanspacher, H. (2011). ‘Quantum approaches to consciousness’. In The Stanford Encyclopedia of
Philosophy, edited by E. N. Zalta. Retrieved from http://plato.stanford.edu.
Attneave, F. (1954). ‘Some informational aspects of visual perception’. Psychological Review 61: 183–193.

Attneave, F. (1982). ‘Prägnanz and soap-bubble systems: A Theoretical Exploration’. In Organization and
Representation in Perception, edited by J. Beck, pp. 11–29. Hillsdale, NJ: Erlbaum.
Bayes, T. (1958). ‘Studies in the history of probability and statistics: IX. Thomas Bayes’ (1763) Essay “Towards
Solving a Problem in the Doctrine of Chances” (in modernized notation)’. Biometrika 45: 296–315.
Biederman, I. (1987). ‘Recognition-by-components: A theory of human image understanding’. Psychological
Review 94: 115–147.
Binford, T. (1981). ‘Inferring surfaces from images’. Artificial Intelligence 17: 205–244.
Boselie, F. (1994). ‘Local and global factors in visual occlusion’. Perception 23: 517–528.
Boselie, F. and E. L. J. Leeuwenberg (1986). ‘A test of the minimum principle requires a perceptual coding
system’. Perception 15: 331–354.
Bowers, J. S. and C. J. Davis (2012a). ‘Bayesian just-so stories in psychology and neuroscience’. Psychological
Bulletin 3: 389–414.
Bowers, J. S. and C. J. Davis (2012b). ‘Is that what Bayesians believe? Reply to Griffiths, Chater, Norris, and
Pouget (2012)’. Psychological Bulletin 3: 423–426.
Brunswik, E. (1956). Perception and the Representative Design of Psychological Experiments. Berkeley,
CA: University of California Press.
Buzsáki, G. and A. Draguhn (2004). ‘Neuronal oscillations in cortical networks’. Science 304: 1926–1929.
Campbell, S. R., D. L. Wang, and C. Jayaprakash (1999). ‘Synchrony and desynchrony in integrate-and-fire
oscillators’. Neural Computation 11: 1595–1619.
Chaitin, G. J. (1969). ‘On the length of programs for computing finite binary sequences: Statistical
considerations’. Journal of the Association for Computing Machinery 16: 145–159.
Chalmers, D. J. (1995). ‘Facing up to the problem of consciousness’. Journal of Consciousness Studies 2:
200–219.
Chalmers, D. J. (1997). The Conscious Mind: In Search of a Fundamental Theory. Oxford: Oxford University
Press.
Chater, N. (1996). ‘Reconciling simplicity and likelihood principles in perceptual organization’.
Psychological Review 103: 566–581.
Collard, R. F. A. and H. F. J. M. Buffart (1983). ‘Minimization of structural information: A set-theoretical
approach’. Pattern Recognition 16: 231–242.
Dijkstra, E. W. (1959). ‘A note on two problems in connexion with graphs’. Numerische Mathematik 1:
269–271.
Eckhorn, R., R. Bauer, W. Jordan, M. Brosch, W. Kruse, M. Munk, and H. J. Reitboeck (1988). ‘Coherent
oscillations: A mechanism of feature linking in the visual cortex?’ Biological Cybernetics 60: 121–130.
Eckhorn, R., A. Bruns, M. Saam, A. Gail, A. Gabriel, and H. J. Brinksmeyer (2001). ‘Flexible cortical
gamma-band correlations suggest neural principles of visual processing’. Visual Cognition 8: 519–530.
Edelman, G. M. (1987). Neural Darwinism: The Theory of Neuronal Group Selection. New York: Basic Books.
Feldman, J. (2007). ‘Formation of visual “objects” in the early computation of spatial relations’. Perception
and Psychophysics 69: 816–827.
Feldman, J. (2009). ‘Bayes and the simplicity principle in perception’. Psychological Review 116: 875–887.
Feynman, R. (1982). ‘Simulating physics with computers’. International Journal of Theoretical Physics 21:
467–488.
Finkel, L. H., S.-C. Yen, and E. D. Menschik (1998). ‘Synchronization: The computational currency of
cognition’. In ICANN 98, Proceedings of the 8th International Conference on Artificial Neural Networks
(Skövde, Sweden: 2–4 September 1998), edited by L. Niklasson, M. Boden, and T. Ziemke. New York:
Springer-Verlag.
Fries, P. (2005). ‘A mechanism for cognitive dynamics: Neuronal communication through neuronal
coherence’. Trends in Cognitive Sciences 9: 474–480.
Garner, W. R. (1962). Uncertainty and Structure as Psychological Concepts. New York: Wiley.

Garner, W. R. (1970). ‘Good patterns have few alternatives’. American Scientist 58: 34–42.
Garner, W. R. (1974). The Processing of Information and Structure. Potomac, MD: Erlbaum.
Gigerenzer, G. and Murray, D. J. (1987). Cognition as Intuitive Statistics. Hillsdale, NJ: Erlbaum.
Gilbert, C. D. (1992). ‘Horizontal integration and cortical dynamics’. Neuron 9: 1–13.
Gray, C. M. (1999). ‘The temporal correlation hypothesis of visual feature integration: Still alive and well’.
Neuron 24: 31–47.
Gray, C. M. and W. Singer (1989). ‘Stimulus-specific neuronal oscillations in orientation columns of cat
visual cortex’. Proceedings of the National Academy of Sciences USA 86: 1698–1702.
Gregory, R. L. (1980). ‘Perceptions as hypotheses’. Philosophical Transactions of the Royal Society of London
B 290: 181–197.
Hagar, A. (2011). ‘Quantum computing’. In The Stanford Encyclopedia of Philosophy, edited by E. N. Zalta.
Retrieved from http://plato.stanford.edu.
Hatfield, G. C. and W. Epstein (1985). ‘The status of the minimum principle in the theoretical analysis of
visual perception’. Psychological Bulletin 97: 155–186.
Hochberg, J. E. and E. McAlister (1953). ‘A quantitative approach to figural “goodness” ’. Journal of
Experimental Psychology 46: 361–364.
Hoffman, D. D. (1998). Visual Intelligence. New York: Norton.
Howe, C. Q. and D. Purves (2004). ‘Size contrast and assimilation explained by the statistics of natural
scene geometry’. Journal of Cognitive Neuroscience 16: 90–102.
Howe, C. Q. and D. Purves (2005). ‘Natural-scene geometry predicts the perception of angles and line
orientation’. Proceedings of the National Academy of Sciences USA 102: 1228–1233.
Hummel, J. E. and K. J. Holyoak (2003). ‘A symbolic-connectionist theory of relational inference and
generalization’. Psychological Review 110: 220–264.
Hummel, J. E. and K. J. Holyoak (2005). ‘Relational reasoning in a neurally-plausible cognitive architecture:
An overview of the LISA project’. Current Directions in Cognitive Science 14: 153–157.
Jilk, D. J., C. Lebiere, C. O’Reilly, and J. R. Anderson (2008). ‘SAL: An explicitly pluralistic cognitive
architecture’. Journal of Experimental and Theoretical Artificial Intelligence 20: 197–218.
Knill, D. C. and W. Richards (eds) (1996). Perception as Bayesian Inference. Cambridge: Cambridge
University Press.
Koffka, K. (1935). Principles of Gestalt Psychology. London: Routledge and Kegan Paul.
Köhler, W. (1920). Die physischen Gestalten in Ruhe und im stationären Zustand [Static and stationary
physical shapes]. Braunschweig: Vieweg.
Kolmogorov, A. N. (1965). ‘Three approaches to the quantitative definition of information’. Problems in
Information Transmission 1: 1–7.
Lachmann, T. and C. van Leeuwen (2005a). ‘Individual pattern representations are context-independent,
but their collective representation is context-dependent’. Quarterly Journal of Experimental Psychology:
Human Experimental Psychology 58: 1265–1294.
Lachmann, T. and C. van Leeuwen (2005b). ‘Task-invariant aspects of goodness in perceptual
representation’. Quarterly Journal of Experimental Psychology: Human Experimental Psychology 58:
1295–1310.
Lamme, V. A. F. and P. R. Roelfsema (2000). ‘The distinct modes of vision offered by feedforward and
recurrent processing’. Trends in Neuroscience 23: 571–579.
Lamme, V. A. F., H. Supèr, and H. Spekreijse (1998). ‘Feedforward, horizontal, and feedback processing in
the visual cortex’. Current Opinion in Neurobiology 8: 529–535.
Leeuwenberg, E. L. J. (1968). Structural Information of Visual Patterns: An Efficient Coding System in
Perception. The Hague: Mouton and Co.
Leeuwenberg, E. L. J. (1969). ‘Quantitative specification of information in sequential patterns’. Psychological
Review 76: 216–220.

Leeuwenberg, E. L. J. (1971). ‘A perceptual coding language for visual and auditory patterns’. American
Journal of Psychology 84: 307–349.
Leeuwenberg, E. L. J. and F. Boselie (1988). ‘Against the likelihood principle in visual form perception’.
Psychological Review 95: 485–491.
Leeuwenberg, E. L. J. and P. A. van der Helm (2013). Structural Information Theory: The Simplicity of Visual
Form. Cambridge: Cambridge University Press.
Leeuwenberg, E. L. J., P. A. van der Helm, and R. J. van Lier (1994). ‘From geons to structure: A note on
object classification’. Perception 23: 505–515.
Li, M. and P. Vitányi (1997). An Introduction to Kolmogorov Complexity and its Applications (2nd edn).
New York: Springer-Verlag.
Mach, E. (1959). The Analysis of Sensations and the Relation of the Physical to the Psychical.
New York: Dover. (Originally published 1922.)
MacKay, D. (1950). Quantal aspects of scientific information. Philosophical Magazine 41: 289–301.
MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge
University Press.
Marr, D. (2010). Vision. Cambridge, MA: MIT Press. (Originally published 1982 by Freeman.)
Milner, P. (1974). ‘A model for visual shape recognition’. Psychological Review 81: 521–535.
Penrose, R. (1989). The Emperor’s New Mind: Concerning Computers, Minds and the Laws of Physics.
Oxford: Oxford University Press.
Penrose, R. and S. Hameroff (2011). ‘Consciousness in the universe: Neuroscience, quantum space-
time geometry and orch OR theory’. Journal of Cosmology 14, http://journalofcosmology.com/
Consciousness160.html.
Perkins, D. (1976). ‘How good a bet is good form?’ Perception 5: 393–406.
Peterson, M. A. (1994). ‘Shape recognition can and does occur before figure-ground organization’. Current
Directions in Psychological Science 3: 105–111.
Pomerantz, J. and M. Kubovy (1986). ‘Theoretical approaches to perceptual organization: Simplicity and
likelihood principles’. In Handbook of Perception and Human Performance: Vol. 2. Cognitive Processes
and Performance, edited by K. R. Boff, L. Kaufman, and J. P. Thomas, pp. 36–46. New York: Wiley.
Pylyshyn, Z. W. (1999). ‘Is vision continuous with cognition? The case of impenetrability of visual
perception’. Behavioral and Brain Sciences 22: 341–423.
Rissanen, J. J. (1978). ‘Modelling by the shortest data description’. Automatica 14: 465–471.
Rock, I. (1983). The Logic of Perception. Cambridge, MA: MIT Press.
Salinas, E. and T. J. Sejnowski (2001). ‘Correlated neuronal activity and the flow of neural information’.
Nature Reviews Neuroscience 2: 539–550.
Searle, J. R. (1997). The Mystery of Consciousness. New York: The New York Review of Books.
Seife, C. (2000). ‘Cold numbers unmake the quantum mind’. Science 287: 791.
Shannon, C. E. (1948). ‘A mathematical theory of communication’. Bell System Technical Journal 27: 379–
423, 623–656.
Simon, H. A. (1972). ‘Complexity and the representation of patterned sequences of symbols’. Psychological
Review 79: 369–382.
Sober, E. (2002). ‘What is the problem of simplicity?’ In Simplicity, Inference, and Econometric Modelling,
edited by H. Keuzenkamp, M. McAleer, and A. Zellner, pp. 13–32. Cambridge: Cambridge University
Press.
Solomonoff, R. J. (1964a). ‘A formal theory of inductive inference, Part 1’. Information and Control 7: 1–22.
Solomonoff, R. J. (1964b). ‘A formal theory of inductive inference, Part 2’. Information and Control 7:
224–254.
Stenger, V. (1992). ‘The myth of quantum consciousness’. The Humanist 53: 13–15.
Simplicity in Perceptual Organization 1045

Sutherland, S. (1988). ‘Simplicity is not enough’. In Working Models of Human Perception, edited by
B. A. G. Elsendoorn and H. Bouma, pp. 381–390. London: Academic Press.
Tarr, M. J. and H. H. Bülthoff (1998). ‘Image-based object recognition in man, monkey and machine’.
Cognition 67: 1–20.
Tegmark, M. (2000). ‘Importance of quantum decoherence in brain processes’. Physical Review E61:
4194–4206.
Townsend, J. T. and G. Nozawa (1995). ‘Spatio-temporal properties of elementary perception: An
investigation of parallel, serial, and coactive theories’. Journal of Mathematical Psychology 39: 321–359.
Ungerleider, L. G. and M. Mishkin (1982). ‘Two cortical visual systems’. In Analysis of Visual Behavior,
edited by D. J. Ingle, M. A. Goodale, and R. J. W. Mansfield, pp. 549–586. Cambridge, MA: MIT Press.
van der Helm, P. A. (2000). ‘Simplicity versus likelihood in visual perception: From surprisals to precisals’.
Psychological Bulletin 126: 770–800.
van der Helm, P. A. (2004). ‘Transparallel processing by hyperstrings’. Proceedings of the National Academy
of Sciences USA 101(30): 10862–10867.
van der Helm, P. A. (2011a). ‘Bayesian confusions surrounding simplicity and likelihood in perceptual
organization’. Acta Psychologica 138: 337–346.
van der Helm, P. A. (2011b). ‘The influence of perception on the distribution of multiple symmetries in
nature and art’. Symmetry 3: 54–71.
van der Helm, P. A. (2012). ‘Cognitive architecture of perceptual organization: From neurons to gnosons’.
Cognitive Processing 13: 13–40.
van der Helm, P. A. (2014). Simplicity in Vision: A Multidisciplinary Account of Perceptual Organization.
Cambridge: Cambridge University Press.
von Helmholtz, H. L. F. (1962). Treatise on Physiological Optics, trans. by J. P. C. Southall. New York: Dover.
(Originally published 1909.)
van Leeuwen, C., M. Steyvers, and M. Nooter (1997). ‘Stability and intermittency in large-scale coupled
oscillator models for perceptual segmentation’. Journal of Mathematical Psychology 41: 319–344.
van Lier, R. (1999). ‘Investigating global effects in visual occlusion: From a partly occluded square to a tree-
trunk’s rear’. Acta Psychologica 102: 203–220.
van Lier, R. J., P. A. van der Helm, and E. L. J. Leeuwenberg (1994). ‘Integrating global and local aspects of
visual occlusion’. Perception 23: 883–903.
van Lier, R. J., P. A. van der Helm, and E. L. J. Leeuwenberg (1995). ‘Competing global and local
completions in visual occlusion’. Journal of Experimental Psychology: Human Perception and Performance
21: 571–583.
von der Malsburg, C. (1981). ‘The correlation theory of brain function’. Internal Report 81–2, Max-Planck-
Institute for Biophysical Chemistry, Göttingen, Germany.
van Rooij, I. (2008). ‘The tractable cognition thesis’. Cognitive Science 32: 939–984.
Watanabe, S. (1969). Knowing and Guessing. New York: Wiley.
Wertheimer, M. (1912). ‘Experimentelle Studien über das Sehen von Bewegung’. Zeitschrift für Psychologie
12: 161–265.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt’ [On Gestalt theory]. Psychologische
Forschung 4: 301–350.
Witkin, A. P. and J. M. Tenenbaum (1983). ‘On the role of structure in vision’. In Human and Machine
Vision, edited by J. Beck, B. Hope, and A. Rosenfeld, pp. 481–543. New York: Academic Press.
Yang, Z. Y. and D. Purves (2003). ‘Image/source statistics of surfaces in natural scenes’. Network-
computation in Neural Systems 14: 371–390.
Yang, Z. Y. and D. Purves (2004). ‘The statistical structure of natural light patterns determines perceived
light intensity’. Proceedings of the National Academy of Sciences USA 101: 8745–8750.
Chapter 51

Gestalts as ecological templates


Jan J. Koenderink

Visual Awareness
Open your eyes in bright daylight: what happens? Typically you will be immediately aware of the scene in front of you. There is nothing you can do about it; the ‘presentations’ simply happen to you. The presentations follow each other at a rate of about a dozen a second (Brown 1996; VanRullen and Koch 2003). Typically each one is similar to the immediately preceding one, though occasionally sudden changes occur. Changes appear to be of both endogenous and exogenous origin1.
You have no control over the presentations, except by way of voluntary eye movements, and
so forth. But many of the eye fixations are generated endogenously, rather than voluntarily. They
also ‘happen to you’, although you won’t notice. They are part of what I  propose to call your
‘zombie nature’2. Apart from immediate awareness you have a stream of cognitions and reflec-
tive thoughts. The latter are your doing; you largely have control over your thoughts, although a
minority ‘simply occur’ to you. In cases where you know you are experiencing an ‘illusion’, you usually can’t ‘correct’ your awareness3.
Your awareness is your reality in the sense that it is simply given to you4. Introspectively, a ‘cor-
rected illusion’ in reflective thought is much ‘less real’ than the illusion in your immediate visual
awareness. Thoughts may be right or wrong (your rational mind knows that), but awareness is
beyond this or that, right or wrong (your gut feelings depend on that).

Qualities and Meanings


The content of your presentations is exhausted by qualities and meanings. Here I use ‘meaning’ in
the sense of something like ‘good horse sense’ or ‘gut feeling’. A large dark something may appear
‘threatening’, even if you don’t know its why, what, or where. Taking a rope for a snake5 means
being aware of a ‘coiled elongatedness’ entering cognition. A meaning is not that different from a

1 The generic example is the depth flips of a ‘Necker cube’ (Necker 1832).
2 The reference is to ‘philosophical zombies’. See the entry on philosophical zombies in the Stanford Encyclopedia of Philosophy at <http://plato.stanford.edu/entries/zombies/>.
3 See <http://en.wikipedia.org/wiki/Optical_illusion> on ‘Optical Illusions’.
4 Notice that my use of ‘reality’ is phenomenological, and different from what is often called ‘physical reality’. The German distinction between Realität and Wirklichkeit does not seem to have an equivalent in English.
5 Mistaking a rope for a snake refers, of course, to the generic example of illusion from the Vedanta. See <http://en.wikipedia.org/wiki/Vedanta>.

quality like ‘redness’6, except that it carries an emotional load that pure qualities lack. But it is only
a matter of degree; it is not that redness is devoid of emotional charge.

The Physical and the Mental Realms: Bridging Hypotheses


The ‘physical world’ is a description of your habitat in terms that are publicly agreed on as the
rock-bottom truth. It is the most ‘objective’ description that humanity has been able to put
together. It is extremely effective in simple tasks, like setting a man on the moon (a matter of
straightforward engineering), perhaps less so for more complicated tasks, like predicting tomor-
row’s weather (involving chaotic systems). The physical world is something you can know (by
studying the sciences); it is not something you can be immediately aware of. You still see the
sun rise and set, even though you may prefer Copernicus’ interpretation. You may have had
this conviction since childhood, but it will not keep you from seeing the sun ‘set’. Likewise, you
experience the earth as ‘flat’ and the blue sky as a ‘dome’. It is the way ‘things are’, at least in your
presentations.
The ‘mental world’ is what you experience. For the larger part it is immediate awareness. There
are also confabulated thoughts that treat awareness as a mere moment in a ‘stream of conscious-
ness’. Reflective thought does not partake in the immediate qualities and meanings that make up
awareness, thus it is only about experience.
In discussions on psychological issues, like perception, one has occasion to deal with both men-
tal and physical objects. The alternative would be pure behaviourism, that is, physiology, which is
self-contradictory because any ‘science’ is by definition a social undertaking7. Mental and physical objects have to be treated on categorically distinct ontological levels. The round square in your
mind is just as round as it is square8, although there is no such object in the physical world. The
Higgs boson9 is a major player in the physical world (more so than your chair, an arbitrary and
continually changing conglomerate of molecules), although you may have no mental image of it,
or may not even have heard of it. Objects like the round square commonly occur in your thoughts,
and it is impossible to think without them (consider ‘heaven’, ‘honesty’, ‘common opinion’, and so
forth). Likewise, physics cannot do without many fictional entities like ‘entropy’, ‘wave functions’,
or ‘photons’.
The causal theory of perception has it that objects in the physical world are the causes of objects in your mental eye. It is a theory for which not the slightest shred of evidence can be found in the sciences. Nevertheless, it is a very popular theory (it may well be the most common), because mainstream thought subscribes to the notion of a ‘God’s Eye View’. This is a view of reality that would have you believe that:
•  There is a unique way things look (as seen by Him!).
•  This view is independent of the observer, thus fully objective (for He is never wrong!).
•  Physics is the unique way to come to know this objective reality.

6 On the notion of qualia see <http://en.wikipedia.org/wiki/Qualia>.
7 See <http://en.wikipedia.org/wiki/God%27s_eye_view>.
8 The ‘being’ of an object such as a ‘round square’ is famously discussed in Alexius Meinong’s theory of objects (Meinong 1899).
9 On the Higgs boson, see <http://en.wikipedia.org/wiki/Higgs_boson>.

•  Limited anatomical or physiological resources are the causes of illusion and error (your cat or
dog has only a limited view of the world).
•  Modern western man comes close(st) to seeing things as they really are (as widespread practices
like rain dancing, black magic, and so forth, suggest).
The core concept is objective reality. In western philosophy, Kant’s Copernican revolution10
replaced it with transcendental idealism. However, this never really influenced the attitude of
mainstream science in a serious way, nor that of generally accepted common sense. Holding such
convictions by default (perhaps unfortunately, few people question them or even think about
them) leads to numerous further misunderstandings. Fortunately, there are (and have always
been) thinkers who expressly reject the God’s Eye View (e.g.11). However, it is perhaps fair to say
that they represent a marginal stream of thought.
The causal theory of perception is an idea that purports to bridge the ontological chasm between
the two realms of the physical and the mental. I will call it a ‘bridging hypothesis’. This particular
bridging hypothesis is based on the God’s Eye View; I used it only to introduce the concept. But
the concept of ‘bridging hypothesis’ as such is something one cannot do without.
A number of distinct bridging hypotheses have been proposed. I mention only a few. A common
one is the causal theory of perception. Another is the notion of the Gestalt school that Gestalts
in visual awareness are isomorphic with certain brain activities (Kohler 1920). Eliminative mate-
rialists hold the notion that ‘pain’ is really nothing but the ‘firing of C-fibres’12. The notion that
consciousness is due to activity of an NCC (‘Neural Centre of Consciousness’) is hardly a bridging hypothesis, but more like a theory held by one of Molière’s physicians in the farce Le malade imaginaire (e.g. opium induces sleep due to its virtus dormitiva). One of the few bridging
hypotheses that makes any sense to me is the one proposed by Erwin Schrödinger (of quantum
mechanical fame).
Schrödinger proposed that awareness is generated when the organism learns (Schrödinger
1958). All learning is necessarily by mistake, that is, through the falsification of expectations by actual experience. This is an idea that finds wide acceptance in biology, psychology, and
philosophy. But Schrödinger gives it a novel twist: you ‘meet the world’ when your expectation is
suddenly exposed as wrong, thereby initiating a spark of enlightenment so to speak. Awareness
can be understood as a series of such micro-enlightenments. I find Schrödinger’s proposal attrac-
tive, although there is no way to prove it in the framework of the sciences because it is a pure
bridging hypothesis. However, it leads to interesting consequences, thus it has its value as a heuris-
tic device. Moreover, the alternatives (the NCC and so forth13) seem to me just silly in comparison.
I’ll refer to it as the Schrödinger principle.

10 See Kant’s Preface to the second edition of the Critique of Pure Reason (1787, a serious revision of the first edition of 1781).
11 Alfred Korzybski, ‘Science & Sanity’, available at <http://esgs.free.fr/uk/art/sands.htm>.
12 ‘Type physicalism’ proposes that mental event types (such as pain in an individual) are identical with specific event types in the brain. In this case the ‘C-fibre firings’ in the individual. Of course, this extends to all sentient beings and all times.
13 E.g. Francis Crick: ‘You, your joys and your sorrows, your memories and your ambitions, your sense of personal identity and free will, are in fact no more than the behaviour of a vast assembly of nerve cells and their associated molecules’. See <http://www.consciousentities.com/crick.htm>.

The Sherlock Method of Imposing Meaning on Chaos


The famous method of Sherlock Holmes14 is widely used in criminal investigation. It has to be,
for it is essentially the only method that lets you pursue open problems in an at least partially
understood domain. I  know of no other method except exhaustive search, which is usually
ruled out because of pragmatic reasons—it tends to be slow. In taking your important life’s deci-
sions (e.g. ‘am I going to have another beer?’) you aren’t going to wait eons for an algorithm to
complete.
Pure forensic research sometimes suffices to solve a crime by stumbling upon the solution more
or less by accident, but more frequently does not. If it does not, all forensic research can do is to
build a file of the scene of the crime in the widest sense. There is virtually no limit to what might
be (and often is!) collected, from DNA traces, to weather records, discarded cigarette butts, bro-
ken twigs, records of telephone conversations, and what have you. The sky is the limit. Very few of
these traces are likely to become relevant at any time in the investigation. The file is so voluminous
that it will never become exhausted. Most of the traces will never be considered at all. So much
for forensic science. Typically, it doesn’t solve crimes. It does not even supply ‘data’, but only an
(arbitrarily extensive) snapshot of the scene of the crime.
Solving crimes is of considerable importance to society. The method of Sherlock Holmes offers
at least a way to proceed. The detective comes up with a ‘plot’. If ideas are scarce he starts with
a random idea (‘the butler did it’ being as good as any). Given the plot, he generates questions.
Answers to many such questions can be searched for in the file delivered by forensic research. If
the file fails to yield an answer, perhaps additional forensic work is requested. In case the plot does
not work out, the detective swaps it for another one. Usually this one will be more focused, as the
previous work will have led to novel ideas and questions. The questions allow focused search in
the file. Even better, they allow him to ignore most of the ‘potential evidence’. The case is declared
‘solved’ when a sufficient number of unlikely expectations have been corroborated by the evi-
dence in the file. The probability of being wrong can be made almost arbitrarily small, because the
probabilities for the unlikely events have to be multiplied. This is not unlike the game of ‘Twenty
Questions’15. Starting with complete ignorance, twenty (yes/no) questions tend to be sufficient to
guess any word an opponent may have taken in mind. Small wonder, since 2⁻²⁰ is about one in a million. How many words can your opponent take in mind anyway?
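As a toy illustration of this arithmetic (my sketch, not part of the chapter): each yes/no answer halves the candidate set, so twenty answers suffice to distinguish roughly a million possibilities.

```python
import math

# Each yes/no answer halves the candidate set, so k questions
# distinguish up to 2**k items; isolating one of n items
# therefore needs ceil(log2(n)) questions.
def questions_needed(n_candidates: int) -> int:
    return math.ceil(math.log2(n_candidates))

print(questions_needed(1_000_000))  # 20: twenty questions cover a million words
print(2 ** -20)                     # blind-guess probability, about one in a million
```

The same multiplication of small probabilities underlies the detective’s confidence: every corroborated unlikely expectation multiplies down the chance that the plot is wrong.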
Perception is like playing Twenty Questions with nature (Richards 1982). The sensory systems
build a huge forensic file. This file fills the sensory front-end of the brain. It is a volatile buffer
that is continually overwritten. The agent may ignore most of this structure using the Sherlock
method. It actively looks for evidence in the mess delivered by the sensory systems. It does ‘reality
checks’, not ‘computations on the data’ in the sense of ‘inverse optics’ (Poggio 1985). If the organ-
ism reaches a dead end, it tends to switch behaviour. Consider an example: your keys lie on the
table in front of you, in full view. You need your keys and start looking for them. Since you never
leave keys on the table you ‘overlook’ what is in front of you, and start exploring key hooks, draw-
ers, keyholes, coat pockets, and so forth. You arrive at a dead end because you applied the wrong

14 On the fictional detective Sherlock Holmes see <http://en.wikipedia.org/wiki/Sherlock_Holmes>.
15 ‘Twenty questions’ is a spoken parlor game. It originated in the US, and became very popular in the late 1940s (through a radio quiz program). The game spread through Europe and was popular till (at least) the 1990s. An online version can be found at <http://20q.net/>.

plot. In this case the plot is template-like, a bit like von Uexküll’s16 ‘seek image’ (Suchbild) in animal
behaviour. Did you lose completely? No, you collected a long list of places where not to look, pos-
sibly a great time saver. More importantly, you detected the need for a paradigm shift.
The Sherlock model centres upon the framing of questions. Notice that the meaning of an answer
is defined by the question, for questions imply a set of acceptable answers. That is why a discarded
cigarette butt—otherwise an irrelevant object—may bring the butler to the gallows. Questions are
like computer formats that define whether a certain sequence of key presses will be interpreted
as a number, a password, a command, or what have you. The meaning is not in the sequence of
key presses, but in the currently active format. This is how awareness is generated (here I  use
Schrödinger’s bridging hypothesis!), and how awareness becomes composed of meanings.
I will refer to this important principle as Sherlock’s principle: ‘The meaning of an answer is in
the question; questions derive from a plot.’
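The format analogy can be made concrete with a small sketch (mine, not the chapter’s; the ‘command table’ is purely hypothetical): the same sequence of key presses yields a different answer under each currently active question.

```python
# Toy sketch of Sherlock's principle: identical key presses acquire
# different meanings under different active 'questions' (formats).
keystrokes = "1234"

meaning_as_number = int(keystrokes)                      # a quantity
meaning_as_password = (keystrokes == "0000")             # a yes/no verdict
meaning_as_command = {"1234": "unlock"}.get(keystrokes)  # an instruction (hypothetical table)

print(meaning_as_number, meaning_as_password, meaning_as_command)
```

The structure (‘1234’) never changes; only the active question does, and that is where the meaning resides.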
‘Meaning’ cannot be computed from mere structure, as the causal theory of perception implies.
Algorithms (of the ‘inverse optics’ kind in vision) merely transform meaningless structure into
equally meaningless structure. In most computer applications the meaning is provided by a user,
the computer simply computing a sequence of symbols or an array of pixels. In the case of sentient
beings, the meaning has to be intrinsic, that is to say, imposed by the agent’s intentionality. This does
not imply that the meaning is a mere arbitrary hallucination. It will be confronted with the structure
currently in the perceptual front-ends. Such ‘reality checks’ keep the system from free-wheeling.
‘Controlled hallucination’ is like ‘analysis by synthesis’, and very different from brute hallucination.
Although it is clear how meaning might be transferred so to speak, it remains unclear how the
agent might get at its plots. One a priori principle that appears rational is that any plot should ulti-
mately be due to repeated, uncontradicted experience. ‘Plots’ are similar to Searle’s (1983) ‘local
background’, Rumelhart’s (1980) ‘schemata’, or Minsky’s (1975) ‘frames’. The alternative would be
that plots might be present at birth, or might be revealed by some supernatural power. The lat-
ter possibility should be reserved to religion, as it certainly lies outside the sciences. The former
one is more interesting. It is certain that organisms are not born without structure, anatomically
and physiologically. No doubt certain abilities involving (even extensive) brain activity are pre-
sent prior to actual experiences. However, to hold that such actions would be accompanied by
immediate awareness would be to fall back on revelation. I will consider them part of the zombie
nature. Of course, one may (eventually) become aware of one’s actions, even automatic ones after
the fact. After all, the body and its movements are just another part of the physical world.

Animal Behaviour (Ethology)


At this point I make a connection to biology. Reasoning from Schrödinger’s notion, animals have
perceptual awareness, much like us, although they appear to have subhuman cognitive abilities,
and perhaps lack reflective thought completely, as their absence of linguistic abilities would sug-
gest. Indeed, few owners of cats, dogs, or horses would doubt that their pets are perceptually
aware; they hardly consider the possibility that they are caring for zombies. Thus, the study of ani-
mal behaviour is of some potential interest to the understanding of human perceptual awareness.
The absence of reflective thought might render such studies simpler, or perhaps ‘cleaner’, than is
possible in man. Since animals are behaviourally advanced with respect to human babies, animal
studies might be expected to complement human developmental studies.
The study of animal behaviour is ethology, a rather recent subfield of biology, whose founding
fathers Konrad Lorenz, Niko Tinbergen, and Karl von Frisch shared a Nobel Prize (in physiology

16 On Jakob von Uexküll see <http://en.wikipedia.org/wiki/Jakob_von_Uexk%C3%BCll>.

or medicine) in 1973¹⁷. A most important immediate forerunner was Jakob von Uexküll16, whose
marks are abundantly present in conceptual biology, psychology, and philosophy.
Important instances of animal behaviour over a wide range of species are Fixed Action Patterns
(FAPs), and Releasers18. These might be said to make up most of ‘instinctive behaviour’. What is
striking about the FAPs is that they occur even when the circumstances are not appropriate.
For instance, birds have been spotted feeding fish19, apparently because they ‘mistake the open
mouths of the fish for the open beaks of their chicks’. However, such an interpretation is no doubt
too anthropomorphic. Geese that roll eggs to their nest appear to act rationally20. When they do
the same with a potato one may suspect low visual acuity, and perhaps defective spectral resolu-
tion. However, the geese keep ‘rolling’ even when you remove the egg. Apparently, they can’t help
‘rolling’ once locked into the action pattern. The action can also be triggered by a brick placed in
the vicinity of the nest. The attempts of the bird to ‘roll the egg (brick) to the nest’ appear comi-
cal to the human observer. In many cases the ‘releasers’ trigger behaviour that even threatens the
survival of the species. A spectacular example involves male Australian jewel beetles mating with beer bottles left along the roads, to the point of exhaustion21. This in spite of the fact that the optical system of the
beetle easily resolves the difference between a beer bottle and a female. Such ‘mistakes’ can actu-
ally be quite useful, for instance, certain dairying ants appear to milk aphids (plant lice), yet the
ants ‘really’ mistake the rear ends of the lice for the heads of their fellow ants22.
What’s in these animals’ minds? Do they have any? Or is ‘mind’ synonymous with ‘human mind’
or even ‘my mind’? A major reference suggests that humans are unique (Genesis 1:26–27: ‘And God
said, Let us make man in our image. So God created man in his own image, in the image of God
created he him . . . ’23). However, the generic knowledge of medical doctors and veterinarians is
pretty much identical.
Such animal examples remind one of the fact that reflective thought often ‘knows’ that certain
spectacular ‘visual illusions’ in awareness are indeed illusory, whereas awareness cannot be
‘corrected’ at all. One says that ‘vision is cognitively impenetrable’. Thus the ‘fixed action patterns’
and ‘releasers’ of ethology have many features in common with the Gestalts in human vision in
that they appear to be prepackaged responses that cannot be circumvented by the animal. On
the level of immediate awareness humans are not that different from what ethology reveals in
animal behaviour. I give some striking examples of template-like phenomena in human percep-
tion below. Although the emphasis tends to be on the illusory character of such phenomena, the
positive side is their adaptive significance. All well-adapted user interfaces have to be illusory
(also below).

17 See <http://www.nobelprize.org/nobel_prizes/medicine/laureates/1973/>.
18 On ethology see <http://en.wikipedia.org/wiki/Ethology>.
19 A video clip of a bird feeding fish is at <http://www.youtube.com/watch?v=vEDBABFnpRQ>.
20 This is the example made famous by Konrad Lorenz. A video is at <http://www.youtube.com/watch?v=vUNZv-ByPkU>.
21 See ‘University of Toronto Mississauga professor wins Ig Nobel Prize for beer, sex research’, at <http://www.eurekalert.org/pub_releases/2011-09/uot-uot092911.php>.
22 Video clips at <http://www.youtube.com/watch?v=tE7UL2pAaL0>, <http://www.youtube.com/watch?v=IcdAgvroj5w>, <http://www.youtube.com/watch?v=NybgIxjlAGQ>, and <http://www.youtube.com/watch?v=43id_NRajDo>.
23 King James Bible (completed 1611). Available online at the Electronic Text Center of the University of Virginia: <http://www.rasmusen.org/special/USEFUL/CHILDREN/Things-to-do-words/literature-kids/Childrens.bible.book/bible_kjv/kjv/etext.lib.virginia.edu/kjv.browse.html>.

Von Uexküll and Gibson


Von Uexküll introduced the important notion of Umwelt (Uexküll 2011). The Umwelt describes
the world as it is relevant to the animal. For instance, to an animal without eyes the electromag-
netic field that humans know as ‘light’ is irrelevant. It is not part of their Umwelt. A human swim-
mer is not aware of the electric fields that sharks use to navigate and find their prey. Such fields
are part of the Umwelt of the shark, but not of that of the human. Ultrasonic sounds are part of
the Umwelt of bats, but not of that of humans. And so forth. What goes for the action of the world
on the agent also holds for the opposite (e.g. motor actions). Humans cannot change spectral
reflectivity of the skin, spread strong electric fields, or emit ultrasounds, like some animals can. Of
course, the body itself is an important part of the Umwelt.
The Umwelts of different species may or may not spatially overlap. All Umwelts—including that of humans—are only small parts of the physical world. Did you know that a hundred billion solar neutrinos24 pass through your thumbnail every second? You have zillions of such blindnesses. Von Uexküll uses the imaginative notion that each sentient being is enveloped in its own
sphere like a soap bubble. It never gets out of it, and it is fully unaware of anything outside of the
boundary of this sphere. In that sense the beings in their Umwelts are like the Leibnizian monads,
for ‘the monads have no windows’ (Leibniz 1991).
Despite their isolation, the life of sentient beings is somehow in harmony with that of all oth-
ers. Von Uexküll uses the image of droplets on a cobweb that all reflect each other, much like
‘Indra’s Net’ (Cook 1977). An example of such a harmony is that spiders build webs that have
exactly the right mechanical properties and mesh sizes to fit flies. In Leibniz’s monadology this
harmony is pre-established. In von Uexküll’s account it is due to the co-evolution of all spe-
cies and their terrestrial environment, although von Uexküll remained sceptical with respect to
Darwinism.
James Gibson25 must have been well aware of von Uexküll’s work, since it was regularly cited in
behaviourist psychology. He coined the concepts of ‘ecological niche’ and ‘affordance’ in analogy
to von Uexküll’s Umwelt and ‘functional tone’. However, where von Uexküll is very consistent in
defining Umwelt and qualities as intrinsic to the organism, Gibson is usually ambiguous, and often
locates meanings and qualities in the physical world. To von Uexküll a stone becomes a projectile
only when you pick it up with the intention to throw it, whereas to Gibson one property of a physi-
cal stone is its ‘throwability’:
An affordance is not bestowed upon an object by a need of an observer and his act of perceiving it. The
object offers what it does because it is what it is.
(Gibson 1986)

Gibson also hardly recognizes the Leibnizian harmony, which always has been a major source
of wonder to many researchers of the animal world. This appears to reflect the well-known dif-
ference in perspective between the Anglo-Saxon and continental European traditions in general.
An account in contemporary terms might run as follows. The physical world (Welt) is per-
haps the least clearly defined entity. For our (biologically inspired) purposes we certainly
don’t need reference to quarks26, Dirac’s equation27, and so forth. The ‘physical world’ is the

24 On solar neutrinos see <http://en.wikipedia.org/wiki/Solar_neutrino_problem>.
25 On James J. Gibson see <http://en.wikipedia.org/wiki/James_J._Gibson>.
26 On quarks <http://en.wikipedia.org/wiki/Quark>.
27 On Dirac’s equation <http://en.wikipedia.org/wiki/Dirac_equation>.
Gestalts as Ecological Templates 1053

everyday world as described by the applied sciences on scales relevant to humans. Although this is
very vague, one may simply include a generous chunk spanning a large variety of scales.
An overdose doesn’t hurt, because the physical world as such is irrelevant to the organism
(Turvey, Shaw, Reed, and Mace 1981). The Umwelt is a subset of the physical world that might
conceivably involve the organism, because it might be involved in actions on its body (in the
widest sense), or might be the target of its own actions. Thus the Umwelt is different from
a mere geographical niche (Umgebung), which is Gibson’s use. The body itself is part of the
Umwelt.
We need additional distinctions. The ‘sense world’ (Merkwelt) is a subset of the Umwelt that
might causally affect the organism’s sense organs. The ‘act world’ (Wirkwelt) is a subset of the
Umwelt that might be causally affected by the organism’s effectors. Sense world and act world
allow of dual descriptions. One is in terms of the causal nexus (mainly physics) of the Umwelt, the
other is in terms of neural activity in the body of the organism. In the latter case one thinks of the
act world as the ‘motor system’ (in the most general sense, including the glandular system, etc.),
and of the sense world as the ‘sensoria’ with their associated neural ‘front-ends’. All these above
distinctions in what is usually simply called ‘world’ are basic in discussing organisms, and are
commonly introduced in modern accounts (MacIver 2009).
In virtually all organisms one encounters closed loops of sensorimotor behaviour. An action in
the act world causes an action in the sense world, the chain being closed in the Umwelt. Activity in
the sense world causes actions in the act world, the chain being closed in the brain. Umwelt, sense
world, brain and act world are nodes in a single closed loop. The brain may complicate this loop in
numerous ways. For instance, an intended motor action yields an expectation of consequent
sensory activity, the so-called reafference signal (von Uexküll’s ‘new loop’ (Uexküll 1926), now usually
associated with von Holst and Mittelstaedt 1950). The reafference is an expectation that may, or
may not, happen to successfully predict sensory effects. Mismatches are informative, because the
organism ‘meets the Umwelt’ in the mismatch, thus this may again lead to awareness according to
Schrödinger’s principle.
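The reafference loop can be caricatured in a few lines of code. This is a toy of my own, not the von Holst and Mittelstaedt model: a one-dimensional ‘retina’ samples a circular scene, the intended eye movement yields a prediction of the next sensory sample (the efference copy), and only the mismatch between prediction and actual input is informative.

```python
# Toy illustration of reafference: a commanded eye movement predicts the next
# sensory sample; only the prediction mismatch signals the Umwelt rather than
# the organism's own activity.

scene = [((i * 37) % 11) / 10.0 for i in range(100)]   # arbitrary fixed scene

def sense(scene, eye_pos, width=20):
    """Sensory sample at the current eye position (circular world)."""
    return [scene[(eye_pos + i) % len(scene)] for i in range(width)]

eye = 0
command = 5                                  # intended eye movement (efference)
predicted = sense(scene, eye + command)      # efference copy, the reafference

eye += command                               # the movement is executed

# Case 1: a quiet world; prediction matches input, the mismatch vanishes.
quiet = max(abs(a - p) for a, p in zip(sense(scene, eye), predicted))

# Case 2: the world changes during the movement; the mismatch is non-zero,
# and in it the organism 'meets the Umwelt'.
scene[10] += 3.0
event = max(abs(a - p) for a, p in zip(sense(scene, eye), predicted))

print(quiet)   # 0.0
print(event)   # 3.0
```

The self-caused shift is fully ‘explained away’ by the efference copy; only the externally caused change survives the subtraction, in line with Schrödinger’s principle that awareness attends the falsified expectation.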
In these functional loops (Funktionskreisen) certain invariants eventually obtain a ‘functional
tone’, an envelope based on frequent uncontradicted experience. Since an invariant may occur
in many intertwined functional loops, such functional tones may acquire multiple degrees of
freedom. Eventually they become carriers of meaning. When traced to the Umwelt they are like
Gibson’s affordances, although that severs them from their roots in the functional loops, and
moves them from their proper ontological level.
One important point is that the functional tone derives from uncontradicted experience. I will
refer to this as von Uexküll’s principle: ‘The form of awareness reflects prior experience. There is
no awareness from “revelation” ’. Of course, this also involves Schrödinger’s bridging hypothesis
again. I return to this point later.
The ‘inner world’ (Innenwelt) of the organism can be thought of as a ‘projection’ of the func-
tional organization (as implemented by the whole body, including the brain) on the Umwelt. It
is the implementation of intentionality. Without the organism, inner world and Umwelt
disappear, and one is left with the meaningless chaos that is the physical world. This is a revolutionary
notion with, for many, perhaps shocking consequences. It implies, for instance, that even space
and time—as you know them—are your constructions, not pre-existing entities that you happen
to find yourself immersed in. There are indeed many instances of animals that lack space and/or
time as judged from the structure of their sense and act worlds. Humans also appear to construct
their own space-times (Koenderink, Richards, and van Doorn 2012).
As von Uexküll remarks, the inner world of an organism must forever remain a closed book
to us. It can only be experienced from within, and cannot possibly be revealed by external
1054 Koenderink

observation. This recommended him to the behaviourist movement28 in the United States of the
early twentieth century. The inner world is mental. It is ‘what it is like’ to be a certain being. He
recognizes that we will never be able to enter the inner world of other beings. This is echoed by
Thomas Nagel in his famous paper ‘What is it like to be a bat?’ (Nagel 1974).
Notice that von Uexküll’s account (and the consequent account from ethology) suggests logical
and mutually complementary tasks for anatomy, physiology, brain science, ethology,
behaviouristic psychology, and cognitive science. It also treats phenomenological research as beyond the
realm of the sciences. Of course, this assumes that the various disciplines ‘play fair’, and stick to
their assigned areas of endeavour and discourse. Perhaps unfortunately, brain scientists engaging in
‘mind talk’, and psychologists engaging in ‘brain talk’, are commonly overstepping their
boundaries (Manzotti and Moderato 2010).
In my view phenomenological research is not altogether ruled out as a science, as von Uexküll
implies, because it applies singularly to Homo sapiens, whereas he considers general, typically alien
phenomenology. In the human case a ‘shared subjectivity’ is possible due to the fact that individu-
als cannot be pried loose from their embedding in a social structure. This enables an empathic
or ‘silent’ understanding between individuals, a ‘pointing to the moon’29. Successful pointing, as
a silent communication device, implies empathic understanding (Montag, Gallinat, and Heinz
2008; Stein 1917). When ‘pointing to the moon’, your dog will look at your finger, and so do young
children. However, dogs will never ‘get it’, whereas little children soon will.
An example would be a ‘visual proof ’, as frequently used by the Gestaltists. ‘Kanizsa’s triangle’
(Kanizsa 1955) tells us something, ‘we know not what’, but we all agree. Is it a scientific fact? That
is a matter of definition, but it is definitely a fact of experimental phenomenology, because the
triangle belongs to the ‘inner world’. When neuro-cognition purports to explore it, it oversteps its
boundaries. There is a place for experimental phenomenology because we are humans. Neither
behaviourism nor cognitive science—by design—addresses the inner world.

The Human Condition: Awareness, Cognition, and Reflective Thought
Both humans and animals have perceptual awareness (in a sense they are awareness, in that it
exhausts their reality). It is likely that all vertebrates have a similar basic structure that is in place
at birth. The similarities between newly hatched chicks, fishes, and human babies are striking. The
so-called ‘core systems’ identified by Elizabeth Spelke in psychology30 and Giorgio Vallortigara31
in ethology comprise (at least):
•  inanimate manipulable objects;
•  animate agents;
•  numbers;

28 On behaviourism see <http://en.wikipedia.org/wiki/Behaviorism>.
29 On Hotei (‘the laughing Buddha’) pointing to the moon see <http://www.myrkothum.com/the-meaning-of-the-finger-pointing-to-the-moon/>.
30 Elizabeth S. Spelke’s website at the Department of Psychology of Harvard University has a good list of important publications: <http://www.wjh.harvard.edu/~lds/index.html?spelke.html>.
31 Giorgio Vallortigara’s website at the Center for Mind/Brain Sciences of the University of Trento has a useful list of publications: <http://www.unitn.it/en/cimec/11761/giorgio-vallortigara>.
•  geometrical shapes and space;
•  social partners.
In humans these are just a foundation, whereas in some animals they are all there is and will
ever be. However, without the foundation human development might well be impossible. Humans
are singular animals (Twain 1903) in that they have a highly developed cognitive system that
complements the basic awareness system, and—more importantly—have language and reflective
thought to complement this. It seems evident that at least some animals have appreciable cogni-
tive abilities32, whereas reflective thought appears to be singularly human.
It seems likely that there is an almost continuous spectrum between immediate perceptual
awareness and reflective thought based on vision. For a start, consider these ballpark temporal
ranges:
•  Tenth of a second:  immediate visual awareness (presentation, glimpse). Based on a single
fixation and the preceding moments. It happens autonomously.
•  A second:  a glance, involving a few involuntary fixations, perhaps a single voluntary one.
A glance is based on a number of presentations, some of these due to different fixations picked
automatically. The temporal order within a glance is not necessarily conserved.
•  Few seconds:  a good look, involving several glances and voluntary fixations. The voluntary
fixations are driven by cognition; good looks are due to your actions. In retrospect you may
know what you did; there is a notion of temporal order.
•  Many seconds:  scrutiny, involving as many good looks as necessary. Scrutiny is driven by
cognition and reflective thought. Typically you are looking for something, or trying to clear up
some issue by optical means. Rational processes are in control. You can explain what you are
doing, to yourself, or to others. This is a typically human action.
There is a gradual transition from mere awareness, generated by pre-aware microgenetic pro-
cesses, to rational thought on the basis of optical sampling.
The processes that lead to awareness are themselves pre-aware, thus subconscious. They are at
best proto-rational. Yet they are mainly top-down, that is, constructive, rather than bottom-up,
thus reflexive. Bottom-up processes certainly occur in microgenesis, but they cannot lead to
qualities and meanings by themselves, they have to be considered protopathic. Such proto-
pathic processes are important in injecting ‘gists’ that may help to launch microgenetic threads.
This yields a head start in the Twenty Questions game. The microgenetic process may be con-
sidered as an evolutionary game in which the final (fittest) winner decides on the awareness.
The game consists of generating plots and running reality checks of the plots through probing
the sensory front-ends. This allows most of the sensory input to be ignored, and some of it to
be promoted to the status of quality and meaning. It is an implementation of Sherlock’s method
(Brown 1972, 1977).
The elementary process is ‘poking’ with the intention to meet a ‘resistance’. In biological terms
one imagines that the most basic drive of organisms is to expand their world (a Nietzschean Wille
zur Macht33), leading to a ‘poking’ of their environment by any means possible. In the lower,
unicellular organisms such poking appears to be random. When the poking meets resistance,
the organism is in direct contact with the world. When the poking becomes ‘aimed’, searching

32 See <http://en.wikipedia.org/wiki/Animal_cognition>.
33 See <http://en.wikipedia.org/wiki/Will_to_power>.
for specific resistance, it is akin to questioning the world. Eventually this leads to ‘presentations’,
that is, awareness (Schopenhauer’s Die Welt als Wille (poking) und Vorstellung (presentation)
(Schopenhauer 1818/1819)). In humans this evolves into a nexus of qualities, meanings, and
emotions.
The process is systolic, the microgenesis of the next presentation going on, even as one experi-
ences the previous one. The timescale is largely limited by the fact that the perceptual front-end
buffers are continually being overwritten, thus there is only so much time for a reality check.
A natural termination is enforced when the volley of threads launched by the microgenesis has
been tried against front-end activity. Then a next systole is required in which some threads are
killed, others diversified (split into several independent threads). Thus the process is much like
any variety of genetic algorithm—for instance, ‘harmony search’ (Geem, Kim, and Loganathan
2001). One imagines the individual threads to be fairly simple, because any single presentation
cannot be very complicated. The general gist will be kept, and the focal structure is probably lim-
ited by the magical number seven plus or minus two.
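The analogy with a genetic algorithm can be made concrete in a toy sketch, entirely my own; ‘plots’ as bit-strings and a count of agreements as the reality check are invented stand-ins, not a model from the microgenetic literature.

```python
# Toy sketch of microgenesis as an evolutionary game. A 'plot' is a candidate
# presentation, here a bit-string; the reality check scores its agreement with
# front-end activity. Each systole a volley of variant threads is launched,
# the losers are killed, and the fittest survivor carries on.

EVIDENCE = [1, 0, 1, 1, 0, 1, 0, 0]         # stand-in for front-end activity

def reality_check(plot):                    # agreement with the front end
    return sum(p == e for p, e in zip(plot, EVIDENCE))

plot = [0] * len(EVIDENCE)                  # an initial, poorly fitting plot
for systole in range(10):                   # buffers overwrite: limited time
    volley = [plot[:i] + [1 - plot[i]] + plot[i + 1:]   # diversified threads
              for i in range(len(plot))]
    plot = max(volley + [plot], key=reality_check)      # kill the losers

print(reality_check(plot) == len(EVIDENCE))   # True: the winner fits the evidence
```

After a handful of systoles the surviving plot agrees with the evidence everywhere; most of the sensory input was never inspected, only probed where the competing plots disagreed.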
The cognitive processes are distinct from this as they have their own agenda of plots. These
plots may be injected to bias microgenesis, the resulting awareness again making its way to the
input of cognitive processes. Apart from triggering plots, the cognitive processes generate con-
cepts that may enter reflective thought. In a way, the world on and in which cognition works is
awareness.
Reflective thought, finally, may be expected to launch novel cognitive processes. It works on the
conceptual level to confabulate ‘stories’ that account for the sequence of good looks. The world on
and in which reflective thought works is cognition.
Thus the various levels are intertwined in complicated ways. To try to understand vision in
simple terms as ‘bottom up this—top down that’ is far too simplistic to have much explanatory
power.

Human Visual Awareness as a User Interface


A good way to summarize the above account is to say that human visual awareness34 is an ‘opti-
cal user interface’ (Hoffman 2008, 2009). This implies many things, several of great conceptual
importance. I’ll discuss only a few. Consider the implications of visual awareness being a ‘user
interface’. A user interface35 is a system designed to:
•  both disconnect the user from the world, and reconnect the user to a subset of the world. The
reconnection fully redefines the natural causal interactions between the agent and the world;
•  screen the user from complexities of the world that the user does not ‘need to know’. Thus, the
interface is by its very design non-veridical;
•  enable simple and efficacious interaction with the world in terms of the interface.
Thus the user ends up interacting with the interface, rather than the world per se. The world is
‘summarized’ in the interface in a way that promotes efficacious actions, rather than understand-
ing. This is definitely to the advantage of the user. It optimizes ‘fitness’ in the evolutionary sense,
at the expense of veridicality. What the user doesn’t need to know, the user will never know: the
interface is there to make sure of that.

34 I use ‘awareness’ as synonymous with ‘sentience’. See <http://en.wikipedia.org/wiki/Sentience>.
35 See <http://en.wikipedia.org/wiki/User_interface>.

Perhaps the best-known example is the ‘desktop’ paradigm of laptop computers36. Consider
the process of deleting a text file. The text file ‘is’ an icon on the desktop. You use the mouse to
‘drag it’ to the ‘trash’, which is another icon on the desktop. As you place the text file on top of the
trash, it magically disappears. What really happened? That depends. To the interface program-
mer you moved the mouse, thus defining a sequence of screen locations. The program writes
the empty desktop over the text file icon, then writes the text file icon in its new location over
the desktop. This process is terminated once the mouse is over the trash. The text file icon is not
redrawn; instead a message is sent to the file manager. The file manager is another program. It
manages nested lists of files. It deletes the text file from the list. This deletion generates a signal
to the ‘system’ (another program) that ‘frees’ the space on the disk (or somewhere else) where
the text file was stored. Nothing happened to the text file (a hacker may still ‘retrieve it’). Only
a reference was deleted and the desktop picture changed. The systems programmer has another
story. The electronics engineer another story still. The chips technician has yet another story, and
so forth. The user doesn’t have to know, nor does the user want to know. The fact that the text file
icon suddenly disappeared was encouraging (the ‘text file disappeared in the trash’). Are text files
like such icons? No way! The text file is different things to different people. Fortunately, the user
doesn’t need to know.
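The sense in which ‘nothing happened to the text file’ can be made concrete with a toy file manager. This is my own sketch; all names and structures are invented for illustration: deletion removes only the directory reference and frees the block, while the stored bytes survive until the block is reused.

```python
# Toy model: 'deleting' a file removes only the directory reference; the
# stored bytes survive until overwritten, which is why a hacker may still
# 'retrieve' the file.

disk = {}            # block number -> raw bytes
directory = {}       # file name -> block number (the nested list of files)
free_blocks = []     # blocks the 'system' may reuse

def create(name, data, block):
    disk[block] = data
    directory[name] = block

def delete(name):
    """The file manager deletes only the reference, then frees the block."""
    block = directory.pop(name)    # the icon disappears from the desktop
    free_blocks.append(block)      # the space is 'freed', but not erased

def undelete(name, block):
    """The hacker's trick: if the block was not reused, restore the reference."""
    if block in disk and block in free_blocks:
        free_blocks.remove(block)
        directory[name] = block

create('letter.txt', b'Dear stray cats...', block=7)
delete('letter.txt')
print('letter.txt' in directory)   # False: the icon is gone
print(disk[7])                     # b'Dear stray cats...': the bytes remain
undelete('letter.txt', 7)
print('letter.txt' in directory)   # True again
```

The user only ever sees the directory (the desktop); the disk (the ‘world’) is screened off, which is exactly the point of the interface.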
It is actually a good thing not to know what goes on in the physical environment you find your-
self in. You don’t want to be a systems programmer, an electronic technician, a chip specialist, a
solid-state physicist, a quantum mechanics expert . . . just to delete a text file! Moreover, you don’t
want to know what is inside the box you call ‘computer’ (vacuum tubes, transistors, silicon chips,
sawdust, empty beer cans, or what have you). Thus desktop interfaces are good. Everybody agrees on
that. The surprising thing is that people somehow hesitate when talking about perception and the
physical world. Most contemporary philosophers consider it problematic that we do not have the
kind of awareness that might be designated ‘veridical’. (Strangely enough, it is usually silently
understood that we all know what might be meant by ‘veridical’. Does it include string theory37? This is the
God’s Eye View again.)
Biological evolution38 doesn’t care about all this. It simply optimizes biological fitness39. As a
consequence strange things may happen, as is amply recorded in ethological research. It is not
that humans are exempt from such strange behaviours either. After all, rain dancing40, black magic41,
and various religious beliefs42 are still widespread. Many of these cases are beneficial to the agents,
some not. In all cases the agents are ‘fooled’ by their user interfaces.

Some Common Templates


The entities of the user interface are arbitrary ‘templates’. They are like the icon of the text file on
the desktop of your laptop computer. The icon has really nothing in common with the text file as
you understand it (which is probably a nested sequence of letters, words, sentences, paragraphs,

36 See <http://en.wikipedia.org/wiki/Desktop_metaphor>.
37 See <http://en.wikipedia.org/wiki/String_theory>.
38 See <http://en.wikipedia.org/wiki/Evolution>.
39 See <http://en.wikipedia.org/wiki/Fitness_%28biology%29>.
40 See <http://en.wikipedia.org/wiki/Rainmaking_%28ritual%29>.
41 See <http://en.wikipedia.org/wiki/Black_magic>.
42 See <http://en.wikipedia.org/wiki/Religion>.
and so forth, possibly organized on a page in some pleasant pattern, unless you are using the
UNIX vi editor), it simply stands for it like a name. The icon has nothing to do with what you
mean beyond a mere conventional association.
Most people are not aware of this, or prefer to forget it. When they have accidentally deleted
their text file, they start searching for its icon(!). Yet only the icon is really gone,
whereas the text file (at least immediately after the act of deletion) can probably still be recovered,
thus is still ‘on’ your computer. The icon is like the Gestalt, quality, or meaning, in your visual
awareness. Although the elements of your immediate awareness are not physical objects, they are
indeed your reality. But they are your reality, and nothing beyond that. That does not mean they
have no useful existence. As you change your text file (you probably wrote it in the first place),
it will have different effects when you send it as a letter. Using the internet—at considerable remove
from ‘daily reality’—you can donate half of your income to a house for stray cats. This will have
real consequences to your life, for instance, it may prevent you from paying your rent, causing
you to have to sleep in the streets. Although sleeping in the streets is tough (‘real’ reality), it is still
experienced in terms of your user interface. Everything is.
In immediate visual awareness you encounter qualities and meanings, packaged as Gestalts.
These are, no doubt, elements of your optical user interface. They are template objects. Consider
a few common templates:
•  figures and grounds43;
•  volumetric objects;
•  causal interactions (Michotte 1946);
and so forth.
What about them? The familiar phenomenon of ‘figure–ground reversal’ is sufficient evidence
for the volatile nature of this distinction. You know, no doubt, that you see only the frontal sur-
faces of ‘volumetric’ objects. The apple you see may actually turn out to be hollow on turning it
around. Causal interactions may be faked like in a magician’s show, or in the interaction of the
text file with the trash icon.
Here I will discuss a few fairly obvious and common reflections on the fact that human visual
awareness is a ‘user interface’. One spots this because the elements of the user interface tend to be
abiding templates, rather than ‘solutions of the inverse optics problem’. I simply give some obvious
examples. Many more can be found; one need only look for them. What is perhaps surprising
is that mainstream vision research has failed to notice these facts, for one is not talking of minor
effects! The reason is, no doubt, that they were never looked for.
External local sign. ‘Local sign’ is a concept due to Lotze (1852). It is a place label on fibres
of the optic nerve, a solution to the problem of how the brain ‘knows where the fibres are from’.
Tarachopia (Hess 1982) appears to be a visual disorder revealing a defective local sign. ‘External local
sign’ (Koenderink, van Doorn, and Todd 2009) assigns a ‘visual ray’ (Burton 1945), that is a direc-
tion in the world, in oculocentric coordinates, to fibres in the optic nerve. Early speculations
about the origin of external local sign are due to Berkeley (1709). Otherwise hardly any phenom-
enological research exists on the topic.
In a simple experiment we mapped external local sign throughout the field of view of a few
dozen observers. One simple overall measure of external local sign is the angular spread of the
visual rays over the full field of view. This is the diameter of the ‘visual field’, which is the subjec-
tive correlate of the field of view. Whereas the field of view of the human eye subtends about 180º,

43 See <http://en.wikipedia.org/wiki/Figure%E2%80%93ground_%28perception%29>.
we find a wide spectrum of visual field diameters. The distribution appears to be bimodal, most
observers having a visual field about 90º across. Thus most observers experience visual objects
as far more ‘in front of them’ than they are.
External local sign appears to be an important rigid ‘template’ that strongly influences human
awareness of visual space. We found that virtually all observers commit huge errors (exceeding
100º) when asked to rotate (under remote control) one of two congruent objects in the scene in
front of them so as to be geometrically parallel (Koenderink, van Doorn, de Ridder, and Oomes
2010). We also showed that visual observers make huge mistakes in judging whether a number of
people in front of them are arranged in strict military order. Such non-veridical observations are
due to the application of a rigid template that fails to implement the optical fact that visual direc-
tions fan out from the eye into a half-space.
Linear perspective of pictorial box spaces. Pictorial ‘box spaces’ are renderings of cubicles
(Panofski 1927). They were common in the woodcuts of the Middle Ages, but still in use today.
The early renderings are in a free style reminiscent of one-point perspective. Later one used true
one-point perspective, which is very simple in the case of cubes. The front and back faces of the
cube are rendered as squares, the image of the back face smaller than that of the front one. Then
corresponding vertices are joined (the ‘orthogonals’) so as to define the side faces. The front face
is left ‘transparent’, so the cubicle is open to the view. In a true linear perspective the orthogonals
would be concurrent lines. The construction is so simple that many draftsmen sketch it free hand.
The cubicle then acts as a ‘stage’ that the artist may fill with any content. The stage defines the
pictorial space; it acts as a scaffold or skeleton to the pictorial structure.
In linear perspective there is a well-defined viewpoint and thus angular size of the cube. Given
the viewpoint, the ratio of the sizes of front and back face is fixed. As you change this ratio, the
prediction is that the cubicle will either appear as a thin slab (ratio nearer to unity) or a deep
corridor (ratio larger than fiducial). In an experiment we asked observers to adjust the ratio such
that the awareness was of a true cubicle (Pont, Nefs, van Doorn, Wijntjes, te Pas, de Ridder, and
Koenderink 2012). We did this for a wide range of viewpoints, varying both distance and angular
size. The result was clear-cut in that the prediction was not borne out at all. What observers do is
set a fixed ratio. They impose a template, even when ‘not applicable’.
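The prediction the experiment tests can be made explicit with elementary central projection; the formula below is my own statement of the standard construction, not taken from the cited paper. With the eye at distance d in front of the transparent front face of a cube of side s, the back face lies at depth d + s, so its image is smaller than the front-face image by the factor d/(d + s).

```python
# The viewpoint dependence of the back-to-front size ratio in true one-point
# perspective: eye at distance d before a cube of side s, back face at depth
# d + s, hence an image scaled by d / (d + s) relative to the front face.

def back_to_front_ratio(d, s=1.0):
    """Back-face image size relative to the front-face image size."""
    return d / (d + s)

# A geometrically faithful rendering needs very different ratios at near and
# far viewpoints, which is the setting observers fail to reproduce.
for d in (0.5, 1.0, 2.0, 5.0, 20.0):
    print(f"viewing distance {d:5.1f} x cube side -> ratio {back_to_front_ratio(d):.3f}")
```

At a viewing distance of one cube side the correct ratio is 0.5; at twenty sides it approaches unity. Observers instead settle on one fixed ratio regardless of viewpoint, the template.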
The result may account for the fact that observers judge wide-angle or telephotographs
as ‘distorted’ as compared with photographs taken with a ‘normal lens’ (field of view about
40–50º). They do this, even when the viewpoint is perspectively ‘correct’. Apparently they apply
templates for familiar things, and experience obvious deviations from the template as distor-
tions. That is no doubt why artists ‘correct for distortions’ when depicting wide-angle scenes
(Pirenne 1970).
Shape from shading. ‘Shading’ is an important shape cue for visual artists. It has been used from
the earliest time on. An interpretation in terms of optics starts in Renaissance art, and becomes
a proper (applied) science in the seventeenth and eighteenth centuries. Shading was taught as a
discipline in western academies of art till the early twentieth century (Baxandall 1995).
The perception of shading was initially studied with the simplest patterns, designed to isolate
the ‘shading cue’ in its simplest form. The canonical stimulus has been a circular disk filled
with a linear lightness gradient. From an optical analysis one finds that such a pattern can
be due to the illumination of a curved surface in infinitely many ways. Assuming a uniform,
unidirectional illumination, the possible surfaces would be quadrics:  spherical, cylindrical,
saddle-shaped, and anything in between. From the phenomenology we know that observers
are only aware of spherical patches though. In order to become aware of a cylinder one needs
to change the shape of the patch from circular to square, whereas saddle shapes are never
reported. Perhaps surprisingly, an analysis reveals that the prior is biased towards saddle shapes
(about 57 per cent of the area of a Gaussian random surface is saddle-shaped; Koenderink and
van Doorn 2003).
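The ambiguity is easy to verify numerically. In this sketch (mine, using nothing beyond Lambert's cosine law) the shading of a spherical cap, a cylinder, and a saddle of equal curvature is compared over a shallow circular patch under a distant light in the x–z plane; the sphere and the saddle then shade identically, and the cylinder nearly so.

```python
import math

# Lambertian shading of a surface z(x, y) under a distant unit light L is
# n . L, with n the unit normal (-zx, -zy, 1) / sqrt(1 + zx^2 + zy^2).
# Take L in the x-z plane (ly = 0), so only zx and zy^2 enter the shading.

LX = 0.3
LZ = math.sqrt(1 - LX ** 2)                  # unit light direction, ly = 0

def shade(zx, zy):
    return (LZ - LX * zx) / math.sqrt(1 + zx * zx + zy * zy)

# A shallow disk of radius 0.1, sampled on a grid.
grid = [(x / 50, y / 50) for x in range(-5, 6) for y in range(-5, 6)
        if x * x + y * y <= 25]

sphere   = [shade(x,  y) for x, y in grid]   # z = (x^2 + y^2) / 2: zx = x, zy = y
cylinder = [shade(x,  0) for x, y in grid]   # z = x^2 / 2:         zx = x, zy = 0
saddle   = [shade(x, -y) for x, y in grid]   # z = (x^2 - y^2) / 2: zx = x, zy = -y

print(max(abs(a - b) for a, b in zip(sphere, saddle)))     # 0.0 exactly
print(max(abs(a - b) for a, b in zip(sphere, cylinder)))   # small: the patch is shallow
```

Since zy enters only as zy squared, the saddle is indistinguishable from the sphere for this light, and all three quadrics produce essentially the same near-linear gradient over the disk: the canonical stimulus simply does not decide between them.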
Apparently human visual observers apply templates that do not include a saddle shape. This
may be due to a general disregard of saddle shapes. For instance Alberti, writing in the fifteenth
century, proposes a ‘complete’ catalogue of shapes that lacks saddles (Alberti 1435). Apparently,
they never occurred to this highly educated intellectual. The correct taxonomy only came with
Gauss in the nineteenth century (Gauss 1828).
An interpretation might be that spheres and cylinders are ‘thing-like’ whereas saddle shapes
cannot be (you can’t have an object bounded by a saddle-like surface throughout). Thus the tem-
plate might be biased to ‘things’, that is to say, volumetric objects of manipulable size.

Conclusion
Human visual awareness is perhaps best characterized as an optical user interface. The elements of
the interface are template-like. They have qualities and meanings that derive from their functional
role in the interface. Thus, awareness is non-veridical by design. Evolution optimizes biological
fitness, rather than physical veridicality. In this, human visual awareness is not unlike the struc-
ture of animal vision as described by ethology.
Throughout the paper I have consistently used three principles that appear fundamental to the
understanding of visual awareness (the epithets are mine, and perhaps not entirely fair):
•  Sherlock’s principle: The meaning of an answer is in the question, questions derive from a plot.
•  Schrödinger’s principle:  The occurrence of awareness corresponds to the falsification of an
expectation.
•  Von Uexküll’s principle: The form of awareness reflects prior experience. There is no awareness
from ‘revelation’.
Many of the conceptual leads are due to von Uexküll, who has indeed left his marks on various
strands of modern biology, psychology, philosophy, semiotics, artificial intelligence and robotics,
and so forth.
Can the user interface be changed, or extended in the course of the life of an individual? The
quick answer appears to be ‘No!’, or at least ‘Hardly!’ Invertebrate animals appear to have
fixed interfaces, and the majority of vertebrates are not that far ahead. Even primates (including
humans) appear to have predominantly fixed interfaces, although these develop over a number
of years in the child. The human interface has many traits common to those of all vertebrates, is
still adapted to savannah hunter-gatherer life, and so forth. Yet it appears that the human interface
has at least some (very limited) flexibility. Most adaptations to the technological age are on the
level of reflective thought and novel sensorimotor and cognitive skills. They tend to be in
the margin of visual awareness per se, more like a layer of (painfully cognitive) ‘corrections’. Yet it
is obvious how novelty might arise. It has to be through the formation of novel functional loops,
slowly developing novel ‘functional tones’.
One might wonder why the ‘application of templates’ would lead to awareness at all. At first
blush it would seem to run counter to Schrödinger’s principle. But notice that the implementation
of the ‘application of a template’ would be the launching of a microgenetic thread that would still
have to pass a reality check. A standard template is likely to be violated in such checks, and to be
fine-tuned to fit (or be killed). Thus, the templates are more like plots, enabling the system to come
to terms with the optical structure impinging upon it. There is no reason to think they would not
lead to the falsification of expectations on various different levels.

References
Alberti, L. B. (1435). De Pictura. (On Painting, trans. C. Grayson, ed. M. Kemp. Harmondsworth: Penguin,
1972.)
Baxandall, M. (1995). Shadows and Enlightenment (London, New Haven: Yale University Press).
Berkeley, G. (1709). An Essay Towards a New Theory of Vision (Dublin: Pepyat).
Brown, J. W. (1972). Aphasia, Apraxia and Agnosia (Springfield: Charles C. Thomas).
Brown, J. W. (1977). Mind, Brain and Consciousness (New York: Academic Press).
Brown, J. W. (1996). Time, Will and Mental Process (New York: Plenum Press).
Burton, H. E. (1945). ‘The Optics of Euclid’. J Opt Soc Am 35: 357–372.
Cook, F. H. (1977). Hua-Yen Buddhism: The Jewel Net of Indra (University Park and
London: Pennsylvania State University Press).
Gauss, C. F. (1828). Disquisitiones generales circa superficies curvas (Gottingae: Typis
Dieterichianis).
Geem, Z. W., J. H. Kim, and G. V. Loganathan (2001). ‘A New Heuristic Optimization
Algorithm: Harmony Search’. Simulation 76(2): 60–68.
Gibson, J. J. (1986). The Ecological Approach to Visual Perception, pp. 138–139 (London: Routledge).
Hess, R. (1982). ‘Developmental sensory impairment: Amblyopia or tarachopia’. Human Neurobiology
1: 1–29.
Hoffman, D. (2008). ‘Sensory Experiences as Cryptic Symbols of a Multi-modal User Interface’
[Computer, Felsen, Gehirne und Sterne: Raetselhafte Zeichen einer multimodalen
Benutzerschnittstelle]. In Kunst und Kognition, ed. M. Bauer, F. Liptay, and S. Marschall, 261–279
(Munich: Wilhelm Fink).
Hoffman, D. (2009). ‘The Interface Theory of Perception: Natural Selection Drives True Perception to Swift
Extinction’. In Object Categorization: Computer and Human Vision Perspectives, ed. S. Dickinson,
M. Tarr, A. Leonardis, and B. Schiele, pp. 148–165 (Cambridge: Cambridge University Press).
Kanizsa, G. (1955). ‘Margini quasi-percettivi in campi con stimolazione omogenea’. Rivista di Psicologia
49(1): 7–30.
Koenderink, J. J. and A. J. van Doorn (2003). ‘Shape and shading’. In The Visual Neurosciences, ed.
L. M. Chalupa and J. S. Werner, pp. 1090–1105 (Cambridge, MA: MIT Press).
Koenderink, J. J., A. J. van Doorn, and J. T. Todd (2009). ‘Wide Distribution of External Local Sign in the
Normal Population.’ Psychological Research 73: 14–22.
Koenderink, J. J., A. J. van Doorn, H. de Ridder, and S. Oomes (2010). ‘Visual rays are parallel.’ Perception
39(9): 1163–1171.
Koenderink, J. J., W. A. Richards, and A. J. van Doorn (2012). ‘Space-time disarray and visual awareness.’
i-Perception 3: 159–165.
Köhler, W. (1920/1955). Die physischen Gestalten in Ruhe und im stationären Zustand. Abridged trans.
in A Source Book of Gestalt Psychology, ed. W. D. Ellis, pp. 71–88 (New York: The Humanities Press).
(Original work published in 1920.)
Leibniz, G. W. (1991). La Monadologie, ed. E. Boutroux (Paris: LGF).
Lotze, R. H. (1852). Medicinische Psychologie oder Physiologie der Seele (Leipzig: Weidmann’sche
Buchhandlung).
MacIver, M. A. (2009). ‘Neuroethology: From Morphological Computation to Planning’. In The Cambridge
Handbook of Situated Cognition, ed. P. Robbins and M. Aydede, pp. 480–504 (New York: Cambridge
University Press).
Manzotti, R. and P. Moderato (2010). ‘Is neuroscience the forthcoming “mind science”?’ Behaviour and
Philosophy 38(1): 1–28.
1062 Koenderink

Meinong, A. (1899). ‘Über Gegenstände höherer Ordnung und deren Verhältniss zur inneren
Wahrnehmung.’ Zeitschrift für Psychologie und Physiologie der Sinnesorgane 21: 187–272.
Michotte, A. (1946). La perception de la causalité (Louvain: Institut Supérieur de Philosophie).
Minsky, M. (1975). ‘A Framework for Representing Knowledge.’ In The Psychology of Computer Vision,
ed. P. H. Winston (New York: McGraw-Hill).
Montag, C., J. Gallinat, and A. Heinz (2008). ‘Theodor Lipps and the Concept of
Empathy: 1851–1914.’ Am J Psychiatry 165: 1261.
Nagel, T. (1974). ‘What is it Like to be a Bat?’ The Philosophical Review 83(4) (October): 435–450.
Necker, L. A. (1832). ‘Observations on some Remarkable Optical Phaenomena seen in Switzerland; and on
an Optical Phaenomenon which Occurs on Viewing a Figure of a Crystal or Geometrical Solid.’ London
and Edinburgh Philosophical Magazine and Journal of Science 1(5): 329–337.
Panofsky, E. (1927). Die Perspektive als ‘symbolische Form’. Vorträge in der Bibliothek Warburg 1924/1925
(Leipzig: Teubner).
Pirenne, M. H. (1970). Optics, Painting, and Photography (Cambridge: Cambridge University Press).
Poggio, T. (1985). ‘Early Vision: From Computational Structure to Algorithms and Parallel Hardware.’
Computer Vision, Graphics, and Image Processing 31: 139–155.
Pont, S. C., H. T. Nefs, A. J. van Doorn, M. W. A. Wijntjes, S. F. te Pas, H. de Ridder, and J. J. Koenderink
(2012). ‘Depth in Box Spaces.’ Seeing and Perceiving 25(3–4): 339–349.
Richards, W. A. (1982). ‘How to Play 20 Questions with Nature and Win.’ MIT A.I. Memo No. 660
(December).
Rumelhart, D. E. (1980). ‘Schemata: The Building Blocks of Cognition’. In Theoretical Issues in Reading
Comprehension, ed. R. J. Spiro et al., pp. 33–58 (Hillsdale, NJ: Lawrence Erlbaum).
Schopenhauer, A. (1818–1819/1966). The World as Will and Representation [Die Welt als Wille und
Vorstellung], vol. 1; vol. 2 (1844/1966) (New York: Dover Publications).
Schrödinger, E. (1958). Mind and Matter: The Tarner Lectures (Cambridge: Cambridge University Press).
Searle, J. (1983). Intentionality: An Essay in the Philosophy of Mind, vol. 9 (Cambridge: Cambridge
University Press).
Stein, E. (1917). Zum Problem der Einfühlung (Halle an der Saale). Reprinted in
Edith-Stein-Gesamtausgabe, vol. 5, ed. A. U. Müller (Freiburg: Herder, 2008).
Turvey, M. T., R. E. Shaw, E. S. Reed, and W. M. Mace (1981). ‘Ecological Laws of Perceiving and Acting: In
Reply to Fodor and Pylyshyn.’ Cognition 9: 237–304.
Twain, M. (1903/1997). ‘Was the World Made for Man?’ Reprinted in J. Carey, Eyewitness to Science,
p. 250 (Boston: Harvard University Press).
VanRullen, R. and C. Koch (2003). ‘Is Perception Discrete or Continuous?’ Trends in Cognitive Science
7(5): 207–213.
von Holst, E. and H. Mittelstaedt (1950). ‘The Reafference Principle: Interaction between the Central
Nervous System and the Periphery’. In Selected Papers of Erich von Holst: vol. 1: The Behavioural
Physiology of Animals and Man, trans. R. Martin, pp. 139–73 (London: Methuen). (From German.)
von Uexküll, J. J. (1926). Theoretical Biology (London: Kegan Paul, Trubner).
von Uexküll, J. J. (2011). A Foray into the Worlds of Animals and Humans, with A Theory of Meaning, trans.
Joseph D. O’Neil, introduction by Dorion Sagan (Minneapolis: University of Minnesota Press).
Index of Names

Note: page numbers in italics refer to figures. References to footnotes are indicated by the suffix ‘n’,
followed by the note number, for example 282n6.

Achard, S. 992 Appelbaum, L.G.  349
Adelson, E.H.  396, 397, 400, 400–1, 426 Araque, N.O.  656
Adelson, E.H. and Movshon, J.A.  507 Arden, G.B. and Weale, R.A.  825, 826
Adelson, E.H. and Pentland, A.P.  399 Arend, L.  395, 398
Aglioti, S.  675–6 Arieli, A. 993
Agostini, T. and Galmonte, A.  406 Ariely, D.  158–9
Agostini, T. and Proffitt, D.R.  404 Armstrong, L. and Mark, L.E.  627
Ahlström, V. 578 Arnheim, R.  10, 16, 281, 285, 864, 871, 877
Ahveninen, J. 611 Arno, P.  656, 663
Akino, A. 866 Arnold, D.H.  820–1
Akisato, R. 878 Asch, S.E.  130
Alain, C. 610 Ashby, F.G.  960
Alais, D.  507, 508, 779, 784, 790 Ashby, F.G. and Alfonso-Reese, L.A.  935
Alais, D. and Blake, R. 781–2, 802, 804 Ashby, F.G. and Townsend, J.T.  955
Alberti, L.B.  1060 Attneave, F.  160, 167, 236–8, 243, 1029
Albrecht, A.R.  290 Auvray, M.  660, 661, 662
Albright, T.D.  506 Avedon, R.  902, 910
Alexander, D.A.  1000 Avidan, G. 762
Alexander, D.M.  982
Alexander, L.T.  778 Baars, B.J.  997, 999
Algom, D. 980 Bach-y-Rita, P.  656, 660, 661
Allan, L.G.  821 Bahnsen, P. 109
Allard, R. and Cavanagh, P.  154, 156 Bair, W. 324
Allen, G.  1037 Baker, D.H. and Graf, E.W.  808
Allen, P.G. and Kolers, P.A.  646 Balas, B.J.  179
Allen, W.A.  856 Baldassi, S. and Burr, D.C.  154
Allik, J.  159, 160 Baloch, A.A. and Grossberg, S.  545
Allik, J. and Kreegipuu, K.  825 Barbosa, A. 850
Almeida, J. 807 Barenholtz, E. and Feldman, J.  264–5
Alpern, M. 825 Barenholtz, E. and Tarr, M.J.  265
Alpers, G.W. and Pauli, P.  788 Barense, M.G.  275–6
Alsius, A. and Munhall, K.G.  813 Barlow, H.B.  363, 941
Alvarez, G.A.  821 Barlow, H.B. and Levick, W.R.  489
Alvarez, G.A. and Oliva, A.  161 Barlow, H.B. and Reeves, B.C.  112, 113, 117
Amano, K. and Foster, D.H.  455 Barrow, H.G. and Tenenbaum, J.  399
Amedi, A.  658, 663, 664, 971 Barlow, H.B. and Tripathy, S.P.  157
Amir, O. 937 Barton, J.J.S.  762, 763
Amishav, R. and Kimchi, R.  141, 765, 766, 767, 769, 963 Barttfeld, P.  720
Bassett, D. and Bullmore, E.  992
Anderson, B.L.  296, 311, 407, 408, 423, 448, 450 Battelli, L. 582
Anderson, B.L. and Winawer, J.  426 Bauer, B. 152
Anderson, E. 788 Bauer, R. and Heinze, S.  977
Anderson, J.R.  935 Baxandall, M.  1059
Anderson, J.R. and Betz, J.  935 Bayes, T.  1009–11, 1032
Anderson, L.A.  609 Baylis, G.C. and Driver, J.  267–8, 289, 362–3, 979
Angelucci, A. and Bullier, J.  938 Beaudot, W.H.  822
Anokhin, A.P. and Vogel, F.  994 Beck, D.M.  806
Anstis, S.  448, 513, 550 Beck, D.M. and Palmer, S.E.  74
Anstis, S. and Kim, J.  805 Beck, J.  167, 172
Anstis, S., Vergeer, M., and Van Lier, R.  446 Behrens, R. 869
Antonioni, M. 901 Behrmann, M. 723
Behrmann, M. and Kimchi, R.  131 Branucci, A. and Tommasi, L.  802
Behrmann, M. and Tipper, S.P.  742 Brascamp, J.W. and Blake, R.  787
Beintema, J.P.  589 Braunstein, M.L. and Andersen, G.J.  532
Beintema, J.P. and Lappe, M.  581 Bregman, A.S.  602, 605
Beller, H.K.  132 Breitmeyer, B.G. and Ogmen, H.  800
Bellgrove, M.A.  992 Brentano, F.  21–4, 26–28, 32–3, 34
Benardete, E.A. and Kaplan, E.  826 Bressan, P.  400, 401, 404, 406, 408
Benary, W.  11, 404–5 Bressan, P. and Vallortigara, G. 527–8
Bendixen, A.  610, 611 Bressler, S.L.  997
Benedikt, M. 879 Bressler, S.L. and Menon, V.  991
Ben-Shahar, O. 976 Briscoe, E. 940
Benussi, V.  6, 30–1, 33, 832 Briscoe, E. and Feldman, J.  935
dispute with Koffka  31–2 Britz, J. 791
on stereokinetic effect 522–3 Brooks, J.L. and Driver, J.  78
Berdyyeva, T.K. and Olson, C.R.  821 Brown, A.M., Lindsey, D.T., and Guckes, K.M.  444
Berenthal, B.I. and Pinto, J.  578
Bergen, J.R. and Adelson, E.H.  172 Brown, C. 720
Bergen, J.R. and Julesz, B.  172 Brown, J.W.  1046, 1055
Bergenheim, M. 829 Bruno, N.  299, 990
Berger, H. 993 Bruno, N. and Bertamini, M.  514
Bergmann Tiest, W.M.  627 Bruno, N. and Gerbino, W.  309, 514
Bergström, S.S.  399, 407 Brunswik, E.  1031
Berkeley, G.  1058 Brunswik, E. and Kamiya, J.  692
Berkes, P. 974 Buffart, H.  297, 299, 989
Berlin, B. and Kay, P.  444 Bukach, C.M.  762
Bertamini, M.  284, 286 Bulf, H. 310
Bertenthal, B.I. and Pinto, J.  578 Bullier, J. 275
Bertamini, M. and Croucher, C.J.  289 Bülthoff, I. 578
Bertamini, M. and Farrant, T.  290 Bundesen, C.  737, 749
Bertamini, M. and Helmy, M.S.  286, 290 Burnham, K.P. and Anderson, D.R.  1021
Bertamini, M. and Hulleman, J.  286 Burns, B. and Shepp, B.E.  437
Bertamini, M. and Lawson, R.  288 Burr, D.C.  516
Bertamini, M. and Mosca, F.  290 Burton, H.E.  1058
Bhatt, R.S. 702–3 Bushnell, B.N.  301
Bhatt, R.S. and Quinn, P.C.  700, 708–9 Bushnell, B.N. and Pasupathy, A.  970
Biederman, I.  110, 568, 570, 920, 937 Busigny, T.  761–2
Biederman, I. and Kalocsai, P.  964 Busigny, T. and Rossion, B.  723
Bienenstock, E.S., Geman, S., and Potter, D.  922 Busse, L. 728
Binford, T. 936 Button, K.S.  729
Blaha, L.M.  953 Buzsáki, G. and Draguhn, A.  999
Blair, C.B.  548
Blair, M. and Homa, D.L.  935 Cai, M., Stetson, C., and Eagleman, D.M.  830
Blake, R.  515, 727, 790 Caldara, R. 762
Blakemore, C. and Tobin, E.A.  975 Calvert, G.A.  514
Bloj, M.G., Kersten, D., and Hurlbert, A.C.  456 Canolty, R.T.  999
Blum, H.  377, 877 Caparos, S.  721, 722, 723, 725, 729
Blum, H. and Nagel, R.N.  245–6 Capelle, C. 656
Blumenfeld, W. 632 Caplovitz, G.P.  546–8
Boenke, L.T.  980, 981, 982 Caplovitz, G.P. and Tse, P.U.  549, 550
Bolte, S.  724, 726, 727, 728 Carlson, E.T.  970
Bonato, F. 403 Carmel, D.  791, 803
Bond, A.B. and Kamil, A.C.  852 Carruthers, P.  1018
Bonneh, Y.S., Cooperman, A., and Sagi, D.  806 Carter, O. 631
Boring, E.G.  104 Casati, R. and Varzi, A.C.  287
Boselie, F. 299 Casco, C. 79
Bosten, J.M.  439 Casile, A. and Giese, M.A.  581
Botticelli (Alessandro di Mariano di Vanni Filipepi)  893n22 Castet, E. 510
Bouvet, L. 724 Castet, E. and Wuerger, S.  512
Bovill, C. 880 Cataliotti, J. and Gilchrist, A.L.  397
Boyaci, H. 402 Caudek, C. 552
Bozzi, P.  285, 286, 420 Cavanagh, P.  582, 583
Brainard, D.H. and Maloney, L.T.  470, 475 Cavedon, A. 286
Brainard, D.H., Brunt, W.A., and Speigle, J.M.  454 Cavina-Pratesi, C. 441
Chaitin, G.  1019, 1029 Das, A. and Gilbert, C.D.  975, 976
Chakravarthi, R. and Pelli, D.G.  200 Davidson, P.W.  622–3, 625
Chan, L.K.H. and Hayward, W.G.  215 Davis, G. and Driver, J.  979
Chandrasekaran, C. 582 Davis, R.A.O.  727
Chang, D.  631, 632 Daw, N.W.  334–5, 448
Charpentier, A. 629 Dawson, M.R.W.  494
Chater, N.  1019, 1033 Dayan, P. 791
Chebat, D.R.  661, 662 De Baene, W.  935
Cheek, L. 881 de Gardelle, V. and Summerfield, C.  160
Chen, L.  95, 283, 963 de Gelder, B.  764, 807
Chen, Y.C.  790 de Gelder, B. and Rouw, R.  762
Cheung, O.S.  768 DeGutis, J.  760, 762
Chiao, C.-C.  851 de Haan, B. and Rorden, C.  749
Chiao, C.-C. and Hanlon, R.T.  850 de Haas, B.  813
Chisholm, R.M.  34 Dehaene, S.  971, 998, 999
Chomsky, N.  1022 de la Torre, I.  875
Chong, S.C. and Blake, R.  787 Delogu, F.  418–9
Chong, S.C. and Treisman, A.  159 DeLucia, P.R. and Ott, T.E.  516
Chuang, J.  413n2 Del Viva, M.M.  727
Chubb, C. 152 Del Viva, M.M., Gori, M., and Burr, D.C.  826
Clark, V.P.  977 Dempster, A.P., Laird, N.M., and Rubin, D.B.  925
Clemons, J. 656 Denis, M.  887, 911
Clifford, C.W.G., Holcombe, A.O., and Pearson, J.  821 Dennett, D. and Kinsbourne, M.  823, 997
Cohen, L.G.  658 Desimone, R. and Duncan, J.  272–3, 970
Cole, R.E. and Diamond, A.L.  403 de Winter, J. and Wagemans, J.  236, 939
Coleman, M.J.  992 de Wit, T.  301
Collard, R.F.A. and Buffart, H.F.J.M.  1031 de-Wit, L.H.  721, 805
Collignon, O.  664, 665 Dickinson, S.J., Pentland, A.P., and Rosenfeld, A.  922
Conci, M. 811 Dijkstra, E.W.  1040
Conrad, V. 790 Dimitrov, P. and Zucker, S.W.  380
Cook, F.H.  1052 Dimitrova, M. 854
Copeland, A.M. and Wenger, M.J.  961 Dimitrova, M. and Merilaita, S.  851, 852
Corballis, M.C. and Roldan, C.E.  111, 117 Ding, N. and Simon, J.Z.  611
Cordes, D. 992 Dinnerstein, D. and Wertheimer, M.  990
Coren, S. and Girgus, J.S.  76, 674 Di Russo, F.  977
Cornes, K.  957, 959, 961 Dishon-Berkovitz, M. 980
Corthout, E. and Supèr, H.  974 Doherty, M.J.  728
Costello, P. 807 Domini, F. and Caudek, C.  534, 552–3
Cott, H.  843–4, 847, 853, 854–5 Donner, T. 806
Cox, R.T.  1011 Dorrenhaus, W. 779
Craft, E.  328, 355, 356, 376, 938 Dostmohamed, H. and Hayward, V.  624–5
Cragin, A.I.  99, 103 Dostoyevsky, F.M.  886
Crick, F.  1048n13 Driver, J. 721
Crick, F. and Koch, C.  334 Driver, J. and Baylis, G.C.  267–8, 272n6
Cronbach, L. 714 Dry, M. 112
Crook, A.C., Baddeley, R.J., and Osorio, D.  850 Duchaine, B. 763
Csathó, A.  117, 121 Duchaine, B. and Nakayama, K.  762
Culling, J.F. and Summerfield, Q.  605 Duchamp, M.  523, 886
Curby, K.M.  765–6 Dumoulin, S.O.  195, 196
Curby, K.M. and Gauthier, I.  765 Duncan, J. and Humphreys, G.W.  852, 970, 971
Cusack, R. 611 Duncan, R.O.  511
Cusack, R. and Carlyon, R.P.  613 Duncan, F.S.  523
Cuthill, I.C.  844, 851, 853–4 Duncker, K.  10, 96, 399, 400, 493–4
Cuthill, I.C. and Székely, A.  854 Dumoulin, S.O.  196
Cuthill, I.C. and Troscianko, T.S.  852, 853 Dumoulin, S.O. and Hess, R.F.  805
Cutting, J.E.  577–8 Dunn, B. and Leibowitz, H.  403
Dutton, D. 875
Dakin, S.C.  154, 180 Dyson, B.J.  610
Dakin, S.C. and Bex, P.J.  822
Dakin, S.C. and Herbert, A.M.  112 Eagleman, D.M.  825, 827
Dakin, S.C. and Watt, R.J.  118–9, 153 Earman, J.  1011
Dale, G. and Arnell, K.M.  729 Eckhorn, R. 998
Dali, S.  915n92 Economou, E.  406
Economou, E. and Gilchrist, A.  405 Field, D.J.  190, 191, 192, 197, 213
Edelman, G.M.  1039 Field, D.J., Hayes, A., and Hess, R.  215
Edelman, S. and Bülthoff, H.H.  921 Fific, M. and Townsend, J.T.  959–60
Egeth, H.E. and Yantis, S.  971 de Finetti, B.  1011
Eglash, R. 880 Fiorani, M. 975
Egly, R. 743 Fisher, R.  1011n3
Egner, T. and Hirsch, J.  980 Fishman, Y.I.  609
Eguilez, V.M.  993 Fitzgibbon, S.P.  996
Ehrenfels, C. von  5, 30, 871 Fitzpatrick, D.  938, 975
Ehrenstein, W.  302, 303 Forkman, B. and Vallortigara, G.  310
Eidels, A.  962, 963 Förster, J. and Higgins, E.  722
Eimer, M. 979 Foster, R.M. and Franz, V.H.  682
Ekroll, V. 399 Fox, K. 335
Ekroll, V. and Faul, F.  427, 455 Fox, M.D. and Raichle, M.E.  992
Elder, J.H.  218, 219, 228 Fox, R. and Check, R.  784
Elder, J.H. and Goldberg, R.M.  197, 212, 213, 214, 214–5, 215, 216 Foxe, J.J. and Simpson, G.V.  977
Foxe, J.J. and Snyder, A.C.  994
Elder, J.H. and Velisavljević, L.  207–8, 209, 224 Francis, J.E.  876
Elder, J.H. and Zucker, S.W.  215, 220–1, 378 Franconeri, S. 822
Elhilali, M. 611 Franz, V.H.  685
Elliot, J. 436 Fraser, S. 854
Elliot, M.A. and Müller, H.J.  720 Freeman, E.  748–9
Ellis, R.R. and Lederman, S.J.  629 Freeman, E. and Driver, J.  516, 832
Ellis, W.D.  399 Freeman, W.J.  982
Elman, J.  1018 Freeman, W.J. and van Dijk, B.W.  982
Endler, J.A.  852 Freiwald, W.A.  768
Engel, A.K. and Singer, W.  998 Friedman, H.S.  445
Enns, J.T.  135, 136 Fries, P. 993
Enns, J.T. and Rensink, R.A.  971 Friston, K. 722
Eriksen, B.A. and Eriksen, C.W.  980 Frith, U.  716, 727
Ernst, M.O. and Banks, M.S.  516, 657 Froyen, V.  355, 356
Ernst, U.A.  197–8 Fry, G.A. and Alpern, M.  403
Escera, C. 610 Fu, K.-S.  922
Estrada, F. and Elder, J.H.  226 Fujimoto, K. and Yagi, A.  582, 583
Evans, K.K. and Treisman, A.  832 Fujisaki, W. 829
Exner, S.  4, 488, 825 Fujisaki, W. and Nishida, S.  645–6
Fulvio, J.M. and Singh, M.  249–50
Fahrenfort, J.J.  275 Fulvio, J.M., Singh, M., and Maloney, L.T.  241–2
Faivre, N. 807
Faivre, N., Berthet, V., and Kouider, S.  807 Gabo, N., Constructed Head No. 2  906
Falconbridge, M. 809 Gaillard, R. 999
Fang, F.  721, 805, 812 Gamboni, D. 915
Fang, F., Boyaci, H., and Kersten, D.  349 Ganel, T.  676, 677, 681–2
Fantoni, C. 298 Ganel, T. and Goodale, M.A.  678
Fantoni, C. and Gerbino, W.  296, 299 Gao, Z. 765
Farah, M.J.  760 Garner, W.R.  99, 766, 953–5, 962, 963, 980–1, 1029, 1031
Farid, H. and Adelson, E.H.  822
Farroni, T.  696–7 Gasper, K. and Clore, G.L.  722
Faul, F. and Ekroll, V.  453, 469 Gauss, C.F.  1060
Fechner, G.T.  42, 117 Geem, Z.W., Kim, J.H., and Loganathan, G.V.  1056
Feldman, J.  222, 937, 941–2, 1013, 1014, 1015, 1034, 1035, 1036 Geisler, W.S.  197, 213, 216, 804, 1013
Geisler, W.S. and Diehl, R.L.  1022
Feldman, J. and Singh, M.  246, 249, 287–8, 939, 976, 1015–6 Geisler, W.S. and Perry, J.S.  215
Fell, J. 998 Gelb, A.  11, 394, 397, 398, 458
Felleman, D. and Essen, D.V.  377 Geman, S. 922
Felleman, D. and Van Essen, D.C.  969 Gentaz, E. 634
Felzenszwalb, P.F.  929 Gepshtein, S. and Kubovy, M.  72–3, 74, 76, 972
Felzenszwalb, P.F. and Huttenlocher, D.P.  926 Gerbino, W. 427
Fennema, C.L. and Thompson, W.B.  507 Gerbino, W. and Salmaso, D.  299, 300
Ferguson, G., Messenger, J., and Budelmann, B.  855–6 Ghim, H.R.  695
Feynman, R.  1039 Ghim, H.R. and Eimas, P.D.  694–5
Ffytche, D.H. and Zeki, S.  309 Gibson, B.S.  289
Gibson, J.J.  16, 167, 396, 625, 626, 872, 972, 1052–3 Häkkinen, J. and Nyman, G.  809
Giese, M.A.  579 Halko, M.A.  304
Giese, M.A. and Poggio, T.  586, 588 Hall, J.R.  856
Gilaie-Dotan, S. 975 Halliday, A. and Mingay, R.  829
Gilbert, A. 444 Hamers, J.F. and Lambert, W.E.  980
Gilbert, C.D.  1039 Han, S.  72, 79, 142, 749, 977–978
Gilbert, G.M.  646 Han, S. and Humphreys, G.  748
Gilchrist, A.  391, 394, 399, 400, 402, 407–8, 448, 455, 470, 938 Han, X. 383
Hanlon, R.T.  848
Gilchrist, I. 215 Hanlon, R.T. and Messenger, J.B.  850
Gillam, B.J.  305, 306 Hansmeyer, M. 875
Gillam, B.J. and Grove, P.M.  266, 286 Happé, F.G.  727
Gillam, B.J. and Nakayama, K.  809 Harbisson, N. 666
Gintautas, V. 224 Harding, G., Harris, J.H., and Bloj, M.  456
Giralt, N. and Bloom, P.  283 Harnad, S. 935
Girshick, A.R., Landy, M.S., and Simoncelli, E.P.  156 Harrar, V.  646–7
Glass, L. 114 Harrar, V. and Harris, L.R.  642–3n4, 829
Glass, L. and Switkes, E.  215 Harris, A. and Aguirre, G.K.  768
Glover, S. and Dixon, P.  676 Harris, J.J.  809
Glynn, A.J. 294n2 Harris, L.R.  825
Godfrey, D., Lythgoe, J.N., and Rumball, D.A.  857 Harrison, S. and Feldman, J.  938
Goethe, J.W. 5 Hartline, H.K.  363
Goffaux, V. 768 Hassenstein, B. and Reichardt, W.  489
Gogel, W.C. and Mershon, D.H.  403 Hatfield, G. and Epstein, W.  1018
Goldberg, R. 864 Hayden, A.  701, 703, 705–6
Goldberger, P. 881 Haynes, J.-D. and Rees, G.  800, 808
Goldmeier, E. 11 Haynes, J.-D., Driver, J., and Rees, G.  806
Goldreich, D. and Peterson, M.A.  265 He, D., Kersten, D., and Fang, F.  805
Goldsmith, M. and Yeari, M.  749 He, Y. 991
Goldstein, K. and Gelb, A.  443 Heath, M. 682
Gombrich, E.H.  880 Heath Robinson, W.  864
Gong, P.  993, 994, 997 Hebb, D.O.  692
Gong, P. and van Leeuwen, C.  982, 991, 993 Heeger, D.J. and Bergen, J.R.  174, 176
Gonzalez, C.L.R.  676–7 Heider, F. 15
Goodale, M.A. and Milner, A.D.  672, 972 Heider, F. and Simmel, M.  872
Goodbourn, P.T.  723–4 Helmholtz, H. von  295, 392, 395, 402, 415, 632, 786, 1008, 1029
Goodman, N.D.  935
Goodwin, A.W.  623 Henshilwood, C.S.  880
Gordon, I.A. and Morrison, V.  623 Hering, E.  24, 27, 393, 395, 396, 398, 400, 786
Gordon, J. and Shapley, R.  364 Hernandez, A. 995
Goryo, K. 788 Heron, J. 832
Gottschaldt, K.  9, 10, 14, 15 Hess, C.V.  825
Graf, E.W.  516 Hess, R.  1058
Graham, D.J. and Field, D.J.  875 Hess, R.F. and Dakin, S.C.  198
Granit, R. 9 Hess, R.F. and Field, D.J.  194
Grassmann, H.  438–9 Hesselmann, G.  974, 992
Gray, C.M.  998, 1039 Hildebrand, A.  869, 872
Gray, K.L.  789 Hillebrand, F. 632
Green, D.M. and Swets, J.A.  955 Hillier, B. and Hanson, J.  879
Gregory, R.  307, 674, 811, 1018 Hillyard, S.A.  974
Grelling, K. 11 Hiris, E. 583
Griffiths, T.D. and Warren, J.D.  603–4 Hochberg, J. and Hardy, D.  214
Grinter, E.J.  724, 727 Hochberg, J. and McAlister, E.  81, 1018, 1028, 1029
Grosof, D.H.  975 Hochstein, S. and Ahissar, M.  143, 973
Gross, J. 996 Hock, H.S. and Nichols, D.F.  561, 564–5, 570
Grossberg, S. 328 Hoffman, D.D.  1056
Grosseteste, R.  436–7 Hoffman, D.D. and Richards, W.A.  243
Grossman, E.D.  583 Hoffman, D.D. and Singh, M.  262
Gutschalk, A. 610 Hohmuth, A. 627
Holcombe, A.O. and Cavanagh, P.  823
Hafed, Z.M. and Krauzlis, R.J.  515 Holcombe, A.O. and Clifford, C.W.  820–1
Haffenden, A.M. and Goodale, M.A.  685–6 Holcombe, A.O., Kanwisher, N., and Treisman, A.  821
Holcombe, A.O., Linares, D.I., and Vaziri-Pashkam, M.  821–2 Jordan, C. 207
Jordan, G. 439
Hole, G.J.  759, 760 Jordan, H. 579
Horowitz, T.S. and Kuzmova, Y.  290 Julesz, B.  89, 150–1, 167, 170–1, 844, 847
Houston, A.I., Stevens, M., and Cuthill, I.C.  852 Jupp, J. and Gero, J.S.  876
Howe, C.Q. and Purves, D.  811, 1036
Hsieh, P.-J. and Tse, P.U.  548, 550 Kafaligonul, H. and Stoner, G.R.  832
Hu, B. and Knill, D.C. 515–6 Kahneman, D. 495
Huang, J. 647 Kahneman, D. and Henik, A.  971
Huang, P.-C.  199–200 Kahrimanovic, M.  628–9
Hubel, D.H. and Wiesel, T.N.  105, 363, 506, 969, 970 Kaiser, M.K.  549
Hubner, R. and Volberg, G.  132 Káldy, Z. and Kovács, I.  728
Huddleston, W.E.  646, 647 Kamphuisen, A. 791
Hugrass, L. and Crewther, D.  803 Kanai, R.  791, 803
Hulleman, J. and Humphreys, G.W.  264, 288, 989 Kanai, R., Bahrami, B., and Rees, G.  803
Humbert de Superville, D.P.G.  897 Kaneko, K. and Tsuda, I.  994
Hume, D.  1010 Kang, M.S.  783
Humphreys, G.W.  741 Kang, M.S. and Blake, R.  790
Humphreys, G.W. and Riddoch, M.  744 Kangas, A. 703
Hung, C.-C., Carlson, E.T., and Connor, C.E.  357 Kanizsa, G.  14, 31, 96, 222, 294, 296, 311, 421n9, 971, 1054
Hunt, A.R. and Halper, F.  582, 583
Hunt, J.J., Mattingley, J.M., and Goodhill, G.J.  804 on experimental phenomenology  23–4
Hunter, I.M.L.  622–3, 625 on modal completion  306–7, 308
Hupé, J.M.  640–2 on transparency  416–7, 418, 426–7
Husk, J.S., Huang, P.C., and Hess, R.F.  156 Kant, I.  1048
Husserl, E.  29, 890n14 Kapadia, M.K.  195, 975, 976, 977
Kappers, A.M.L.  632, 633, 634
Ikeda, M. and Uchikawa, K.  631 Kardos, L.  11, 394, 395, 400, 405
Indow, T. and Kanazawa, K.  437 Karmarkar, U.R. and Buonomano, D.V.  822
Indow, T. and Uchizono, T.  437 Karni, A. 992
Ingres, J.A.D., La Source  912 Kasai, T. 979
Ingvalson, E.M. and Wenger, M.J.  962–3 Kasai, T. and Kondo, M.  979
Intraub, H. and Richardson, M.  867 Kastner, S.  970, 975
Ito, J. 994 Katz, D.  395, 438
Itti, L. and Koch, C.  737 Kay, P. and Kempton, W.  444
Iturria-Medina, Y. 991 Keane, M.P.  865
Ivry, R.B. and Robertson, L.C.  131 Keetels, M. and Vroomen, J.  828
Ivry, R.B. and Schlerf, J.E.  822 Kelemen, O. 726
Kellman, P.J.  62, 298, 311
Jackson, S. and Blake, R.  578 Kellman, P.J. and Shipley, T.  214, 242, 296, 298,
Jacobs, A. and Shiffrar, M.  587 299, 310
Jacobs, D. 221 Kellman, P.J. and Spelke, E.S.  695–6
James, W. 821 Kelman, E.J.  851
Jaśkowski, P. 821 Kelman, E.J., Osorio, D., and Baddeley, R.J.  850
Jastorff, J. 583 Kelman, E.J., Tiptus, P., and Osorio, D.  849
Jastorff, J. and Orban, G.A.  585 Kennedy, J.R.  631
Jastrow, J. 627 Kentridge, R.W.  745
Jausovec, N. and Jausovec, K.  994 Kersten, D. 427
Jaynes, E.T.  1011, 1017 Khoe, W.  975, 977
Jeffreys, H.  1017 Kienker, P.K.  267, 272, 351, 353
Jehee, J.F.  328, 355 Kim, C.Y. and Blake, R.  779, 781
Jenkins, B.  112, 119–20 Kim, S.-H. and Feldman, J.  265, 938
Jepson, A. and Richards, W.A.  937 Kimchi, R.  72, 132–4, 135–6, 139–42, 744, 766–7
Jiang, Y. 789 Kimchi, R. and Amishav, R.  769
Jiang, Y., Costello, P., and He, S.  804 Kimchi, R. and Bloch, B.  973
Jin, F.-F. and Geman, S.  922 Kimchi, R. and Hadad, B.-S.  76
Jiroh, T. and Keane, M.P.  867 Kimchi, R. and Palmer, S.E.  138
Johansson, G.  399, 494, 575–7 Kimchi, R. and Razpurker-Apfeld, I.  746, 748
Johnson, S.C.  726 Kinchla, R.A.  130
Jones, L.A.  629 Kingdom, F.A.  421, 423, 456
Jones, M. and Love, B.C.  1022 Kingdom, F.A., Hayes, A., and Field, D.J.  151, 153
Jones, M.R.  611 Kinoshita, M., Gilbert, C.D., and Das, A.  805
Kitzbichler, M.G.  994, 997 Kunsberg, B. and Zucker, S.W.  372
Klatzky, R.L.  626 Kurylo, D.D.  725, 726
Klee, P.,  413 Kwok, H.F.  991
Klemm, O.  828–9
Klimesch, W. 994 Lachmann, T. and van Leeuwen, C.  1031
Klink, P.C.  787 Lack, L.C.  786
Knapen, T. 791 Lakatos, P. 999
Knierim, J.J. and Van Essen, D.C.  324 Lamme, V.A.F.  321, 324, 328, 347, 974
Knill, D.C. and Richards, W.  1012 Lamme, V.A.F. and Roelfsema, P.R.  275, 806, 1038
Koch, C. and Ullman, S.  737 Lamme, V.A.F., Supèr, H., and Spekreijse, H.  1038–9
Koenderink, J.J.  179, 833, 1059 Kardos, L.  11, 394, 395, 400, 405
Koenderink, J.J. and van Doorn, A.  1059–60 Karmarkar, U.R. and Buonomano, D.V.  822
Koenderink, J.J., Richards, W., and van Doorn, A.  820, 1053 Langfeld, H.S.  627
Koenderink, J.J., van Doorn, A., and Todd, J.T.  1058–9 Langridge, K.V.  855
Langridge, K.V., Broom, M., and Osorio, D.  850
Laplace, P.-S.  1010, 1017
Koenis, M.M.G.  992 Lappe, M. 581
Koffka, K.  4, 5, 6–7, 11, 15, 16, 294, 295, 691, 949, 972, 1028 Larkum, M. 335
Lasaga, M.I.  139
analysis of art  872 Lashley, K.S.  13
on colour  454–5 Latora, V. and Marchiori, M.  991
on constancy hypothesis  391 Laurinen, P.I.  403
dispute with Benussi  31–2 Lavie, N. 611
on edge classification  396 Lawson, L. (Twiggy)  901
on frameworks  400 Lawson, R.B. and Gulick, W.L.  308
on lightness  394, 398, 399, 401, 402, 407 Lazebnik, S., Schmid, C., and Ponce, J.  927, 928
Kogo, N.  311, 355, 356, 357, 810 Le Grand, R.  762, 763
Kogo, N. and van Ee, R.  938 Lederman, S.J. and Klatzky, R.L.  622
Kogo, N. and Wagemans, J.  303 Lee, K.H.  998
Kohler, P.J.  548, 550 Lee, P.  1017, 1020
Köhler, W.  4, 5, 6, 11, 15–16, 378, 398, 691, 823, 1028, 1048 Lee, S.H.  787
Lee, S.H. and Blake, R.  67, 822
‘physical Gestalten’ and isomorphism 7–9 Lee, T.S  367
Kok, P. 974 Lee, T.S. and Mumford, D.  973
Kok, P., Jehee, J.F.M., and de Lange, F.P.  805 Lee, T.S. and Nguyen, M.  309
Kolmogorov, A.N.  1019, 1029 Leeuwenberg, E.L.J.  110, 117, 297, 417, 419, 1029
Komar, V. and Melamid, A.  875 Leeuwenberg, E.L.J. and Boselie, F.  1018, 1031
Komatsu, H. 309 Leeuwenberg, E.L.J. and van der Helm, P.A.  143
Konen, Ch. and Kastner, S.  972, 989 Leeuwenberg, E.L.J., van der Helm, P.A., and
Kopfermann, H. 9 van Lier, R.J.  1034
Kopinska, A. and Harris, L.R.  824 Lehmann, D. 994
Korte, A. 488 Leibniz, G.W.  1052
Korzybski, A.  1048n11 Leibowitz, H. 403
Kourtzi, Z. 195 Leonardo da Vinci  913
Kourtzi, Z. and Kanwisher, N. 300–1, 351–2 Leopold, D.A.  579
Kovács, I.  779, 780 Leopold, D.A.  803–4
Kovács, I. and Julesz, B.  220 Leopold, D.A. and Logothetis, N.K.  790–1
Kovács, I., Fehér, A., and Julesz, B.  877 Lescroart, M.D. and Biederman, I.  357
Koyama, S. 549 Lettvin, J.Y.  177
Kozaki, A. and Noguchi, K.  399 Levelt, W. 784
Krishna, A. 628 Levi, D.M.  103
Kruger, N. 213 Levi, D.M. and Carney, T.  180
Kubilius, J. 105 Levine, D.N. and Calvanio, R.  761
Kubilius, J., Wagemans, J., and Op de Beeck, H.P.  952 Levinthal, B.R. and Franconeri, S.L.  60
Kubovy, M.  60, 211, 972 Lewin, K.  10, 10, 15
Kubovy, M. and Pomerantz, J.R.  949 Leyton, M. 878
Kubovy, M. and Van Valkenburg, D.  95 Li, F.-F. and Perona, P.  927, 928
Kubovy, M. and Wagemans, J.  60, 76, 570 Li, G. and Zucker, S.W.  371–2
Kubovy, M. and Yu, M.  642, 646 Li, M. and Vitányi, P.  1033, 1037
Kuffler, S.W.  363 Li, Z.  197, 327–8, 328–9
Kuitert, W. 865 Liberman, A.M.  587
Kumada, T. and Humphreys, G.  741 Liebmann, S. 9
Likova, L.T. and Tyler, C.W.  276, 349 Martelli, M. 177
Lin, Z. and He, S.  495n2, 807 Martinez, A.  743–4, 977
Ling, S. and Blake, R.  808 Maruya, K.  515, 790
Linkenkaer-Hansen, K. 993 Maruya, K. and Blake, R.  804
Lisman, J.E. and Idiart, M.A.  998 Masin, S.C.  425–6, 428
Liu, L.  990 Massironi, M. 877
Liu, L. and Ioannides, A.A.  1000 Masuda, T. and Nisbett, R.E.  722
Liu, L., Stevenson, S.B., and Schor, C.M.  809 Masuno, S. 881
Liu, Z., Jacobs, D.W., and Basri, R.  221 Mathewson, K.E.  994
Livanov, M.N.  996 Mattingley, J. 740–1
Livingstone, M. and Hubel, D.  969 Mattingley, J.B., Davis, G., and Driver, J.  811
Lobmaier, J.S.  763 Matussek, P. 716
Loffler, G. and Orbach, H.S.  509 Maunsell, J.H.  826
Loftus, G.R.  764 May, K.A. and Hess, R.F.  191, 198–9, 200, 201
Loomis, J.M.  630–1 McCann, J.J. and Savoy, R.L.  403
Loos, A. 870 McCarthy, J.D.  551
Lopes da Silva, F.H.  994, 995 McClelland, J.L.  1022
Lorenz, K.  1050–1 McClelland, J.L. and Rumelhart, D.E.  275
Losciuto, L.A. and Hartley, E.L.  788 McCollough, C. 448
Lotze, R.H.  1058 McDermott, J. 516
Lowe, D.G.  921, 936 McDougal, W. 497
Lowe, M.J.  992 McGugin, R.W.  760
Lu, H. and Liu, Z.  581 McGurk, H. and McDonald, J.  813
Luck, S.J.  974, 979 McIlhagga, W.H. and Mullen, K.T.  193–4, 446–7
Luck, S.J. and Hillyard, S.A.  979 McKeefry, D.J., Laviers, E.G., and McGraw, P.V.  448
Lumer, E.D.  790 McLachlan, G.J. and Basford, K.E.  941
Lumer, E.D., Friston, K.J., and Rees, G.  803 McLeod, P. and Jenkins, S.  826
Lund, J.S.  976 McLeod, P., McLaughlin, C., and Nimmo-Smith, I.  826
Lunghi, C. 790 McMains, S. and Kastner, S.  747–8, 749
Lunghi, C. and Alais, D.  790 Medin, D.L. and Schaffer, M.M.  935
Luria, A. 736 Mefferd, R.B., Jr.  522
Lutz, A. 998 Meijer, P.B.L.  656
Lybrand, W.A.  992 Meinong, A.  29, 32, 33, 1047n8
Melara, R.D. and Mounts, J.R.  981
Mach, E.  30, 109, 372, 393, 394, 521–2, 1035 Meng, M. and Tong, F.  787, 803
Machilsen, B., Pauwels, M., and Wagemans, J.  222 Merabet, L.B.  664, 665
MacIver, M.A.  1053 Meredith, M.A.  830
Mack, A.  78, 169–70, 744–5 Merilaita, S.  852, 853
MacKay, D.  1031, 1036 Merilaita, S. and Lind, J.  851
Macknik, S.L. and Livingstone, M.S.  800 Mesgarani, N. and Chang, E.F.  612
Maddox, W.T.  960 Messerschmidt, F.X., character heads  899, 903
Maehara, G. 804 Mestry, N. 961
Magee, L.E. and Kennedy, J.M.  631 Metelli, F.  14, 399, 426n15, 468, 479–80
Magritte, R. 915 on transparency  416–7, 418, 421–3, 424
Malach, R. 976 Metzger, W.  9, 13–15, 372, 428n17
Malevich, K.S.  893n20 on camouflage  843, 844
Malik, J. and Perona, P.  172 on experimental phenomenology  24
Malmierca, M.S.  609 on kinetic depth effect  528–30
Maloney, L.T.  1020 on transparency 414–5
Maloney, R.K.  116, 119 Mevorach, C. 132
Mangun, G.R.  979 Michotte, A.  15, 298
Manzoni, P.  887n7 Michotte, A. and Burke, L.  294n2
Merda d’artista  889 Miles, W.R.  529
Manzotti, R. and Moderato, P.  1054 Miller, A.L. and Sheldon, R.  153
Marey, E.-J.  575 Milne, E. 727
Markram, H.  995–6 Milne, E. and Szczerbinski, M.  724–5, 727, 729
Marks, L.E.  981 Milne, J.L. 679–80
Marlow, P.J.  477 Milner, A.D. and Goodale, M.A.  682, 684
Marr, D.  363, 399, 820, 1022, 1028 Milner, P.M.  993
Marr, D. and Nishihara, H.K.  248 Minsky, M.  1050
Marr, D. and Vaina, L.  587 Mirenzi, A. and Hiris, E.  579
Marshall, N.J. and Messenger, J.B.  850, 851 Mitchell, J.F.  787
Mitchell, P. 728 Nishida, S. and Johnston, A.  823
Mitchison, G.J. and Westheimer, G.  221 Nitschke, G.  866–7, 877, 880
Mohan, R. and Nevatia, R.  921 Noguchi, K. and Kozaki, A.  399
Mondloch, C.J.  136 Norman, D.A.  880–1
Mondrian, P.C.  893n19 Norman, H.F.  784
Monge, G. 442 Norman, J.F.  626
Monroe, M. 901 Norman, J.F., Phillips, F., and Ross, H.E.  236, 237
Montag, C., Gallinat, J., and Heinz, A.  1054 Norman, L.J.  743
Montaser-Koushari, L. 309 Nosofsky, R.M.  935
Monteiro, A., Brakefield, P.M., and
French, V. 851 O’Craven, K.M.  802
Moore, B.C.  605 Ogawa, K. 868
Moore, C.M.  743 Öğmen, H.  498, 499
Moore, C.M. and Egeth, H.  745 O’Leary, A. and Rhodes, G.  640, 641
Moore, C.M. and Enns, J.T.  495n2 Oliva, A. and Torralba, A.  768, 927, 928
Morales, D. and Pashler, H.  111 Olkkonen, K.  403–4
Morein-Zamir, S. 832 Olson, R.K. and Attneave, F.  167
Morgan, M.J.  153–4, 831 Ooi, T.L. and He, Z.J.  779, 786–7, 803
Morgan, M.J. and Glennerster, A.  160 Op de Beeck, H.  970
Morgan, M.J., Chubb, C., and Solomon, J.A.  153 Oppenheim, P. 11
Motoyoshi, I.  152–3, 476 Oram, M.W.  825, 826
Mottron, L. 721 Orban de Xivry, J.J.  762
Moutoussis, K. 824 Ortiz, T. 659
Moutoussis, K. and Zeki, S.  808 Osaka, M. 994
Movshon, J.A.  510 O’Shea, R.P. and Corballis, P.M.  782–3
Muckli, L. 721 Osorio, D. and Srinivasan, M.V.  854
Mudrik, L.  789, 807 Ostrovsky, Y. 536
Mullen, K.T.  193–4 O’Toole, A.J. and Walker, C.L.  288
Müller, G.E. 8 Otto, T.U.  497, 498
Mumford, D. 323 Overvliet, K. 632
Murray, S.O.  79, 721, 805, 974 Oyama, T.  15, 59, 211, 212
Murray, S.O., Boyaci, H., and Kersten, D.  812
Murthy, V.N. and Fetz, F.E.  995 Pack, C.C.  510, 549
Musatti, C.  14, 399, 407, 427n16, 428n17 Paffen, C.L.E.  787, 790
on stereokinetic effect  523–5 Paffen, C.L.E. and Van der Stigchel, S.  787
Muybridge, E. 575 Palmer, G. 436
Myczek, K. and Simons, D.J.  159 Palmer, S.E.  65–66, 284, 287, 309, 660, 701
Palmer, S.E. and Beck, D.M.  74
Näätänen, R. 611 Palmer, S.E. and Brooks, J.L.  69–70, 78, 263, 264
Nagel, T.  1054 Palmer, S.E. and Ghose, T.  263–4
Nager, W. 610 Palmer, S.E., Neff, J., and Beck, D.  78
Naito, A. and Nishikawa, T.  866 Palmer, S.E. and Nelson, R.  78
Nakajima, Y. 605 Palmer, S.E. and Rock, I.  78
Nakatani, C.  970, 991, 996, 999–1000 Palmer, T.D. and Ramsey, A.K.  813
Nakatani, C. and van Leeuwen, C.  996 Pan, Y. 309
Nakayama, K.  308, 420 Panday, V.  626–7
Nakayama, K. and Silverman, G.H.  970 Panofski, E.  1059
Nam, J.H. and Chubb, C.  152 Parent, P. and Zucker, S.W.  213
Navon, D.  129–30, 131, 138–9, 270 Parise, C.V. and Spence, C.  648–9
Neisser, U.  89, 970 Parkes, L. 154
Nelson, R. 287 Parkkonen, L. 349
Nelson, R. and Palmer, S.E.  286 Pascual-Leone, A. and Hamilton, R.  657–8
Neri, P.  578–9 Pastukhov, A. and Braun, J.  787
Neuhaus, W. 488 Patching, G.R. and Quinlan, P.T.  981
Neurath, O.  910 Pawluk, D. 622
Newell, A. 713 Paz, R. 612
Newson, L.J.  403 Pearson, J. 788
Nieder, A. 310 Pecora, L.M. and Caroll, T.L.  993
Nijhawan, R. 826 Peelen, M.V. and Downing, P.E.  585
Nikolaev, A.R.  720, 972, 978–9, 997–8 Pelli, D.G.  200
Nikolaev, A.R. and van Leeuwen, C.  969, 974 Pellicano, E. 727
Nisbett, R.E. and Miyamoto, Y.  722 Pellicano, E. and Burr, D.  722
Penrose, R.  1039 Quinn, P.C. 696, 698–9
Penrose, R. and Hameroff, S.  1039 Quinn, P.C. and Bhatt, R.S. 697–8, 699–700, 703–4
Pepperell, R., Succulus 914, 916n93 Quinn, P.C. and Eimas, P.D.  694
Perenin, M.T. and Vighetto, A.  677 Quinn, P.C. and Schyns, P.G. 706–7
Perkel, D.J.  334 Quiroga, R.Q.  970
Perkins, D.  1018, 1035
Perrett, D.  584–5 Radonjić, A.  394, 475, 480
Perrinet, L.U. and Masson, G.S.  509 Radonjić, A. and Gilchrist, A.  402
Peters, R.A. and Evans, C.S.  856 Rahne, T. 645
Peters, R.A., Hemmi, J.M., and Zeil, J.  856 Rainville, S.J.M. and Kingdom, F.A.A.  112
Peterson, M.A.  72, 260, 268, 270, 272n7, 276, 350, Ramachandran, V.S.  849–50, 1008
351, 1038–9 Ramachandran, V.S. and Gregory, R.L.  806
Peterson, M.A. and Enns, J.T. 262, 270–1, 272n6 Ramon, M.  761–2, 762
Peterson, M.A. and Gibson, B.S.  267, 269, 270, 994 Ramon, M. and Rossion, B.  762
Peterson, M.A., Harvey, E.H., and Weidenbacher, Ramsden, B. 309
H.L. 268, 269 Rao, R.P.N. and Ballard, D.H.  722, 974
Peterson, M.A. and Hochberg, J.  994 Raphael 888
Peterson, M.A. and Lampignano, D.L.  262, La Madonna di San Sisto  888
270, 272n7 Rappaport, M. 116
Peterson, M.A. and Rhodes, G.  260 Rausch, E.  10, 14, 15
Peterson, M.A. and Salvagio, E.  265 Rauschenberger, R.  301, 990
Peterson, M.A. and Skow, E.  269, 270, 273–4, 989 Reed, S.K.  935
Petter, G.  295, 307–8, 511 Rees, G., Keiman, G., and Koch, C.  806
Pfurtscheller, G. and Lopes da Silva, F.H.  994 Reeves, A. and Sperling, G.  822
Piéron, H.  497, 498 Remondino, C.  424, 425
Pikler, J. 491 Ren, X. 938
Pinto, J. and Shiffrar, M.  578–9 Renier, L.  662, 663
Pitts, M.A.  349 Rensink, R.A.  33
Pizlo, Z. 938 Rensink, R.A. and Enns, J.T.  215, 970
Plaisier, M.A.  622 Renvall, P. 525
Platteau, J. 4 Restle, F. 494
Plaza, P.  660, 663 Revonsuo, A. 998
Plomp, G.  301, 970, 990 Richards, W.A.  1049
Plomp, G. and van Leeuwen, C.  990 Richards, W.A. and Bobick, A.  941–2
Poggio, T.  964, 1049 Richler, J.J.  760, 764, 765, 769, 961
Poirier, C.C.  660, 661, 663, 664 Riddoch, M. 742
Polat, U.  976, 977 Riesenhuber, M. 764
Poljac, E.  579, 582 Riesenhuber, M. and Poggio, T.  963
Pollock, J.  880, 889, 911n78, 912 Ringach, D. 977
Echo No. 25 912 Ripamonti, C. 402
Pomerantz, J.R.  103, 138, 284, 693, 973, 980, 981 Rissanen, J.J.  1027
Pomerantz, J.R. and Garner, W.  955 Ritter, W. 610
Pomerantz, J.R. and Kubovy, M.  1034 Rivest, J. 762
Pomerantz, J.R. and Portillo, M.C.  90, 102, 104 Rizzolati, G. and Craighero, L.  872
Pont, S.C. 623–4, 625, 1059 Roach, N.W.  830
Poole, A. 870 Roberts, B. 674
Poort, J.  324, 325, 326, 329–30 Roberts, B., Harris, M.G, and Yates, T.A.  811
Pöppel, E. 997 Roberts, K. and Humphreys, G.W.  749
Portilla, J. and Simoncelli, E.P.  151, 174, 176, 179, 180 Robertson, C.E.  721, 727
Portillo, M.C.  103 Robles-De-La-Torre, G. and Hayward, V.  626
Porway, J., Wang, Q., and Zhu S.-C.  922 Rock, I.  305, 401–2, 426, 744, 1038
Posner, M.I. and Keele, S.W.  935 Rock, I. and Brosgole, L. 77–8
Powell, G., Bompas, A., and Sumner, P.  448 Rock, I. and Gutman, D.  419
Preston, S.D. and de Waal, F.B.M.  872 Rock, I. and Palmer, S.  701
Proulx, M.J.  662 Rodriguez, E. 998
Prusinkiewicz, P. and Lindenmayer, A.  878 Roelfsema, P.R. 325–7, 347–8, 356, 974
Psotka, J. 877 Roelfsema, P.R., Lamme, V.A., and Spekreijse, H.  806
Ptito, M. 663 Romei, V. 720
Pylyshyn, Z.  495n2, 1022, 1039 Ropar, D. and Mitchell, P.  727
Qiu, F.T.  331 Rosch, E.H.  935
Qiu, F.T. and von der Heydt, R.  977 Rosebloom, W. and Arnold, D.H.  830–2
Qiu, F.T., Sugihara, T., and von der Heydt, R.  348 Rosenbach, O. 295
Rosenblum, M.G.  993 Sekuler, A.B. and Palmer, S.  299, 300, 990
Rosenholtz, R.  173, 179, 199, 201, 938 Sekuler, R.  515, 790
Rosenthal, J., Raising the Flag on Iwo Jima  907, 906 Self, M.W.  321, 329, 335
Rossetti, Y. 633 Self, M.W. and Roelfsema, P.R.  347–8
Rossi, A.F.  327–8 Sergent, C. 999
Rossion, B. 762 Serre, T., Oliva, A., and Poggio, T.  964
Rossion, B. and Boremanse, A.  764 Seymour, K. 79
Roufs, J.A.J.  826 Shamma, S.A.  602, 612, 613
Rowland, H.M.  856 Shams, L., Kamitani, Y., and Shimojo, S.  813, 832
Rubens, P.P.  893n21 Shannon, C.  1019, 1029
Rubin, E.  267, 287, 295, 363, 989 Shapeley, R.M. and Victor, J.D.  826
Rubinov, M.  991, 992 Shaw, J.C.  994
Rumelhart, D.E.  1022, 1050 Shepard, F. 810
Ruskin, J. 915 Shepard, R.N.  442, 941, 962
Russell, B.C.  926 Sherrington, C.S.  363
Russell, C. and Driver, J.  745 Sheth, B.R. and Pham, T.  788
Rutherford, E. 714 Shevell, S.K., St Clair, R., and Hong, S.W.  449
Ruxton, G.D.  844 Shiffrar, M. and Pavel, M.  514
Shimojo, S. 511
Sacks, O. 761 Shingen 867, 869
Sadato, N. 658 Shin-tsu-Tai, S. 869
Saenz, M. 744 Shohet, A.J.  850
Safford, A.S.  582 Shomstein, S. 745–6
Saidel, W.M.  849 Shrimpton, J. 901
Sakuteiki 867 Si, Z. and Zhu, S.-C.  929
Salapatek, P. 693 Siegel, M. 996
Salin, P.A.  334 Sigman, E. and Rock, I.  427
Salvagio, E.M.  276 Sigman, M.  212, 213
Sampaio, A.C.  308 Silverstein, S.M. and Keane, B.P.  726
Sanabria, D. 644 Simione, L.  996, 999
Sato, M. 642 Simmons, D.R.  383
Saund, E. 921 Simon, H.A.  1029
Saunders, B. 444 Singer, W. 998
Sawada, T. 110 Singh, M.  296, 308, 311
Schenk, T. and Milner, A.D.  684 Singh, M. and Anderson, B.L.  401, 468
Scherf, K.S.  137, 727 Singh, M. and Fulvio, J.M.  239–40, 1014
Schira, M.M.  549 Singh, M. and Hoffman, D.D.  418, 939
Schirillo, J.A. and Shevell, S.K.  455–6 Skarda, C.A. and Freeman, W.J.  982
Schneider, K.A. and Bavelier, D.  825 Slawson, D.A.  867, 868, 869
Schölvinck, M.L. and Rees, G.  806 Smeets, J.B. and Brenner, E.  683
Schopenhauer, A.  1056 Smit, D.J.  994
Schrödinger, E.  1048 Smith, E. and Medin, D.  935
Schulz, M.F. and Sanocki, T.  78 Smith, J.T.  867
Schulze, F. and Windhorst, E.  870 Smith, W.S.  821
Schumann, F. 8 Smithson, H. and Mollon, J.  821
Schurger, A.  808, 809 Snyder, J.S.  611
Schwaninger, A. 765 Sobel, K.V. and Blake, R.  809
Schwarzkopf, D.S.  719, 725, 806 Sober, E.  1037
Schwarzkopf, D.S., and Rees, G.  809 Solomon, J.A.  154
Schwarzkopf, D.S., Song, C., and Rees, G.  812 Solomon, J.A., Morgan, M., and Chubb, C.  159
Schweickert, R. 950 Solomonoff, R.  1019–20, 1029
Schweickert, R. and Townsend, J.T.  950 Song, C., Schwarzkopf, D.S., and Rees, G.  812
Schyns, P.G.  706, 935 Soska, K.C. and Johnson, S.P.  301
Scott-Samuel, N.E.  857 Soto-Faraco, S. 644
Searcy, J.H. and Bartlett, J.C.  962 Spehar, B.  215, 875
Searle, J.  1050 Spelke, E.S.  695–6, 1054
Sebastian, T. and Kimia, B.  248, 250 Spence, C.  645, 832
Seghier, M. 309 Spence, C. and Chen, Y.-C.  648
Seghier, M. and Vuilleumier, P.  309 Spencer, J. 727
Sejnowski, T.J. and Hinton, G.E.  267 Spencer, K.M.  720, 726
Sekuler, A.B.  298, 764 Sperandio, I., Chouinard, P.A., and Goodale, M.A.  812
Sekuler, A.B. and Bennett, P.J.  66–7 Sperandio, I., Lak, A., and Goodale, M.A.  812
Sperry, R.W.  13 Thornton, I.M.  578, 582
Sporns, O. and Zwi, J.  991 Thorpe, S. 324
Stam, C.J.  992 Thurman, S.M.  581
Stanley, D.A. and Rubin, N.  309, 810 Thurstone, L.L.  45
Stein, E.  1054 Tinbergen, N.  1050–1
Stein, T. 789 Todd, J.  49, 283
Steiner, G. 890 Todorović, D. 405
Steriade, M. 999 Tokunaga, R. and Logvinenko, A.D.  454
Sterzer, P. and Kleinschmidt, A.  791 Tommasi, L. 308
Sterzer, P., Haynes, J.-D., and Rees, G.  808 Townsend, J.T.  89
Stevens, M.  854, 855, 857 Townsend, J.T. and Altieri, N.  964
Stevens, M. and Cuthill, I.C.  844, 848, 854 Townsend, J.T. and Ashby, F.G.  949
Stevens, M. and Merilaita, S.  853 Townsend, J.T., Houpt, J.W., and Silbert, N.D.  964
Stevens, S.S.  45 Townsend, J.T. and Nozawa, G.  950, 959
Stevens, S.S. and Stone, G.  627 Townsend, J.T. and Thomas, R.D.  949
Stevin, S. 47 Treder, M.S.  118
Stewart, L.H.  807 Treder, M.S. and van der Helm, P.A.  111
Stilp, C.E.  935 Treisman, A.  970
Stins, J. and van Leeuwen, C.  978 Treisman, A. and DeSchepper, B.  272n7
Stone, J.V.  827 Treisman, A. and Gelade, G.  89, 823, 970, 973
Stöttinger, E. 676 Treisman, A. and Sato, S.  970
Stoughton, C.M. and Conway, B.R.  441–2 Treisman, M. 822
Stronmeyer, C.F. and Martini, P.  826 Troje, N.F.  579
Stroop, J.R.  980 Troje, N.F. and Westhoff, C.  581
Strother, L. and Kubovy, M.  60, 68 Trommershauser, J.  1021
Struber, D. and Stadler, M.  787 Troscianko, T. 447
Stuart-Fox, D. and Moussali, A.  844 Tse, P.U.  296, 298–9, 301–2, 543, 544, 563, 565, 566
Stumpf, C.  5–6, 29 Tsermentseli, S. 727
Stumpf, P. 504 Tsuchiya, N. 789
Stupina, A.I.  103 Tsuchiya, N. and Koch, C.  804
see also Cragin, A.I. Tsunoda, K. 970
Sugita, Y.  301, 972, 975 Turatto, M., Sandrini, M., and Miniussi, C.  806
Sullivan, L.H.  870 Tversky, A. 103
Sumner, P. 445 Tversky, T., Geisler, W.S., and Perry, J.S.  220
Sun, L. 720 Twain, M.  1055
Supèr, H.  321, 331 Twardowsky, K.  28–9
Supp, G.G.  995
Susilo, T. 762 Uhlhaas, P.J.  720, 723, 724, 726
Sussman, E.S.  610, 611 Uhlhaas, P.J. and Mishara, A.L.  726
Suzuki, T.  865, 879 Ulanovsky, N. 609
Synek, E. 875 Ullman, S.  531–3, 564, 938
Szalárdy, O.  610, 611 Ungerleider, L.G. and Mishkin, M.  972, 1032
Unrein, S., Rapid East  914
Takeichi, H. 299 Usher, M. and Donelly, N.  822
Tallon-Baudry, C.  995, 998
Tallon-Baudry, C. and Bertrand, O.  720, 993 Valenza, E. and Bulf, H.  310
Tampieri, G. 530 Vallortigara, G.  527, 1054
Tanaka, J.W. and Farah, M.J.  759 Vallortigara, G. and Bressan, P.  511–2
Tanaka, J.W. and Sengco, J.A.  758 Vallortigara, G. and Regolin, L.  581
Tanaka, K. 970 Van de Cruys, S.  722
Tanizaki, J.  864, 877 van de Kamp, C. and Zaal, F.T.  683–4
Tansey, M., The Innocent Eye Test 891 van den Berg, D.  991, 992
Tarr, M.J. and Bülthoff, H.H.  921 van den Berg, R.  200
Taylor, R.P., Micolich, A., and Jonas, D.  880 van der Helm, P.A.  1029, 1033, 1039, 1040, 1041
Terada, K. 627 van der Helm, P.A. and Leeuwenberg, E.L.J.  115, 116,
Ternus, J.  10, 490–1 117, 121
Tetreault, N.A.  383 van der Horst, B.J.  626
Thayer, A.H.  843 van der Horst, B.J. and Kappers, A.M.L.  626
Thayer, G.H.  843, 851–2, 853, 857 van der Vloed, G.  120–1
Theusner, S.  579, 581 van Doorn, A.J.  820
Thirkettle, M. 581 van Ee, R.  785, 787, 790, 802
Thompson, D.  227, 873, 876 Vangeneugden, J. 584
van Leeuwen, C.  982, 991, 996 Watt, R.J. and Phillips, W.A.  561n1
van Leeuwen, C. and Bakker, L.  981, 996 Watts, D. and Strogatz, S.  991
van Leeuwen, C. and Smit, D.J.A.  994 Weber, E.H.  42, 117, 629, 680
van Lier, R.  298, 304–5, 989 de Weert, C.M.M. and van Kruysbergen, N.A.W.H.  449
van Lier, R. and De Weert, C.M.M.  779 Weil, R.S.  806
van Lier, R. and Wagemans, J.  298 Weiss, Y.  1013
van Lier, R.J., van der Helm, P.A., and Leeuwenberg, Wenger, M.J. and Ingvalson, E.M.  961
E.L.J. 1034–5 Wenger, M.J. and Townsend, J.T.  959
Van Loon, A.M.  714 Werner, H. 872
van Noorden, L.P.A.S.  605–6 Wertheimer, M.  3–5, 6, 488, 871, 1028
van Polanen, V.  622 ‘Gestalt laws’  9–10
Vanrie, J. 578 good continuation principle  239
VanRullen, R. and Koch, C.  1046 on perceptual grouping  57, 60, 61–2, 66, 76, 79–80,
Van Tonder, G.J.  877, 878 560, 562–3
Van Tonder, G.J. and Lyons, M.J.  867 on transparency  417
van Wassenhove, V.  997 on wholes and parts  29–30
Varela, F.  997, 998 Westland, S. and Ripamonti, C.  452–3
Vecera, S.P.  264, 268, 989 Westwood, D.A. and Goodale, M.A.  676
Vecera, S.P. and Farah, M.J.  270 Weyl, H. 880
Vecera, S.P. and O’Reilly, R.C.  275, 350, 351 Wheatstone, C. 777
Vecera, S.P. and Palmer, S.E.  264 White, A.L., Linares, D., and Holcombe, A.O.  825, 826
Vickery, T.J.  71 White, M. 405
Vickery, T.J. and Jiang, Y.V.  75–76 White, S.J. and Saldaña, D.  729
Vierling-Claassen, D.  995, 996 Whittle, P.  778, 779
Vischer, R. 872 Wijntjes, M.W.A.  625, 631
Vladusich, T. 480 Wilder, J.  1016
Vogels, I.M.L.C.  625–6 Wilder, J., Feldman, J., and Singh, M.  251–2
Vogels, R. 937 Williams, C.B. and Hess, R.F.  199
von der Heydt, R.  309, 343, 356, 366 Williams, K. 337
von der Heydt, R. and Peterhans, E.  975 Williams, M.A.  807
von der Malsburg, C.  989, 993 Wilson, H.R.  507, 783
von Frisch, K.  1050–1 Wilson, J.A. and Anstis, S.M.  825
von Hildebrand, A.  902 Windmann, S. 349
von Holst, E. and Mittelstaedt, H.  1053 Winkler, I.  603, 603–4, 607, 608, 610–1, 612
von Skramlik, E.  632 Winkler, I. and Cowan, N.  603
von Stein, A.  995 Witkin, H.  716, 724
von Uexküll, J.  1050n16, 1051, 1052–4, 1060 Wittman, M. 821
Vrins, S.  301–2 Witzel, C. and Gegenfurtner, K.R.  444
Vroomen, J. and Keetels, M.  828 Wohlschlager, A. 515
Vuilleumier, P.  747, 807 Wokke, M.E.  810
Wolfe, J.M.  804, 970
Wagemans, J.  15, 16, 21, 48, 61, 88, 89, 108, 110, 111, Wolfe, J.M. and Cave, K.R.  972
114, 118, 119, 120, 121, 129, 139, 169, 195, 262, Wolfe, J.M. and Horowitz, T.S.  89, 103
294, 298, 364, 398, 488, 530, 569, 602, 607, 639, Wolff, W. 394
691, 714, 717, 723, 871, 936, 937, 938 Wolff, W. 394
Walker, P. 788 Wolpert, D.M.  586, 587
Wallace, M.T., Wilkinson, L.K., and Stein, B.E.  830 Wong, Y.K. and Gauthier, I.  765
Wallach, H.  13, 392–3, 428, 504, 511, 512, 530–1, 547 Wood, G., American Gothic  909, 908
biographical notes  10 Wouterlood, D. and Boselie, F.  296, 298
Wallach, H. and O’Connell, D.N.  531 Wrobel, A. 996
Wandell, B.A., Dunmoulin, S.O., and Brewer, A.A.  809 Wu, T. and Zhu, S.-C.  921
Wang, B. 283 Wuerger, S.M., Maloney, L.T., and Krauskopf, J.  437–8
Wang, J. 928 Wulf, F. 9
Wang, L. and Jiang, J.  582
Wang, L., Weng, X., and He, S.  809 Xian, S.X. and Shevell, S.K.  449
Wang, S., Wang, Y., and Zhu, S.-C.  926
Wanning, A. 979 Yabe, H. 611
Ward, J. and Meijer, P.  658–9 Yabe, Y. 516
Ward, R. 741 Yamada, T. and Fujisaka, H.  993
Watanabe, K. and Shimojo, S.  643 Yanagi, S. 865
Watkins, S. 813 Yang, E.  726, 789
Watt, R. 197 Yang, E. and Blake, R.  804, 807
Yang, Z.Y. and Purves, D.  1036 Zaretskaya, N.  721, 791, 803, 805
Yang,E., Zald, D.H., and Blake, R.  807 Zemel, R.S.  76
Yao, R. 645 Zhang, N. and von der Heydt, R.  938
Yarbus, A.L.  397 Zhaoping, L. 355
Yarrow, K. 825 Zhou, H.  267, 328, 345, 977
Yazdanbakhsh, A. and Livingstone, M.S.  358 Zhou, H., Friedmann, H. and
Yen, S.-C. and Finkel, L.H.  197 von der Heydt, R.  366, 367
Yin, C. 299 Zhou, K. 283
Yin, R.K.  758 Zhou, W. 790
Yo, C. and Wilson, H.R.  508 Zhou, W. and Chen, D.  802
Yokoi, I. and Komatsu, H.  972 Zhu, S.-C.  922, 928, 929
Yong, E. 729 Zhu, S.-C. and Mumford, D.  922
Young, A.W.  742, 760 Zimba, L.D. and Blake, R.  807
Young, M.P. and Yamane, S.  970 Zipf, G.K.  875
Young, T. 436 Zipser, K. 974
Yovel, G. and Duchaine, B.  763 Zuckerman, C.B. and Rock, I.  691–2
Zuidhoek, S. 633
Zaidi, Q. and Li, A.  447 Zylinski, S.  310, 851
Zanforlin, M.  525–7 Zylinski, S., Osorio, D., and
Zangenehpour, S. and Zatorre, R.J.  663 Shohet, A.J.  850, 856
Subject Index

Note: page numbers in italics refer to figures. References to footnotes are indicated by the suffix ‘n’,
followed by the note number, for example 267n4.

3-dimensional object completion  298–9 identity hypothesis  310–11
3-dimensional perception, influence on colour infant research  301–2
perception 456 influence of knowledge  311
3-dimensional shape local completions  296
camouflage of  855–6 neural correlates  300–1
interaction with perceived gloss  477–8 in non-primate animals  302
tunnel effect  302
Abney effect  437 AMPA receptor  334
absolute orientation, effect on symmetry detection  111 role in figure–ground modulation  335
accident, role in design  871 analysis-by-synthesis approach to body motion
achromatic transparency  413–15 perception 587–8
see also transparency anchoring theory of lightness  400, 407–8, 455, 470–1
action and perception dissociation evaluation of 475
and size illusions  673–7 And-Or Graph  922
studies of configural processing of shape  678–80 And-Or-Tree (AOT)  922–4, 923
studies of object size resolution  680–5 human figures case study  929–30, 931
adaptation 76 mathematical formalism  924
adaptationist approach to perception  466–7 scene case study  926–29
adaptive windows  400 structure learning by parameter estimation  924–6
advancing region motion  263, 265 animal awareness  1050–1, 1052, 1054–5
aesthetic experience  893 animal detection, role of contour shape  207–8,
affect, effect on binocular rivalry  787–9 209, 223–4
affine structure-from-motion theorem  533 anti-symmetry 119
affinity, and dynamic grouping  564 aperture problem  504–5
affinity networks, role in object recognition  572 and figure–ground relationships  511–14
affordances  872, 972, 1052 and kinesthetic information  515–16
after-effects and multiple sensory interactions  514–15
asynchrony 830–2 structure-blind strategies  507–9
in body motion perception  579 and terminator classification  511
colour and form  448–9 top-down factors  516
of curvature  625–6 apparent motion  4, 488
agnosia 743 emergence 89
algorithmic information theory (AIT)  1027, perceptual grouping  72–4
1029, 1031 apparent rest phenomenon  513–14
algorithmic probabilities  1033 APV, effect on figure–ground
Fundamental Inequality  1037 modulation 335–7, 336
see also Kolmogorov complexity arbitrariness of features  935, 941–3
Alhambra  897, 896 Aristotle illusion  812–13
allocentric neglect  742 art
allocentric reference frames  633–4 definition of  886–92
alpha activity  993–4 see also visual art
pattern dynamics  994–5 aspect ratio, effect on lattice perception  978–9
alternating-motion display  820–1, 822–3 assimilation, and contrast  407, 408
ambiguous stimuli association field concept  213, 214, 782–3
cross-modal interactions  649 cellular physiology  194–5
see also bistable perception association field models  197–8
amodal complements  294n2 association fields  190, 191–2
amodal completion  294–6, 812, 1034–5 as integration fields  200
2D versus 3D  298–9 linking process, nature and site  192–4
and dynamic grouping method  568, 569–70 associative grouping  76
experimental paradigms  299–300 astrocytes, role in neuronal function  382–3
global completions  296–8 asymmetric matching task  454
asynchrony after-effects  830–2 the human condition  1054–6
asynchrony-tuned neurons  831 inner world  1053–4
attention mental world  1047
and auditory perception  611 Umwelt 1052–3
and averaging  160–1 as a user interface  1056–60
and awareness  738 see also consciousness
and binocular rivalry  786–7 axial-based shape representation  245–6, 248–9,
and BOWN-sensitive activity  348–9 1014–15
and figure assignment  267–8 comparison of animal and leaf categories  250–2, 251
and figure–ground organization  329–31, 330, 351
and perceptual grouping  748–9 background matching  846
and perceptual organization  736–7 Balint’s syndrome  68
and spatial experience  821–2 band patterns  894–6
and transformational apparent motion  543 barber-diamond displays  511–12
visual 737–9 barberpole effect  509
and visual holes  290 edge classification and occlusion  511
attention, lack of, and perceptual grouping  744–8 edge classification beyond disparities  511–12
attentional blink  999 psychophysics of orthogonal and terminator
beta activity  996 signals 509–10
attentional enhancement, infants  719 sliding effect  512
attentional priorities, influence of perceptual base grouping  974
grouping 739–44 basic features  89
attentional priority map  737–8 Bauhaus design school  869–70
attentional selection, individual differences  721 Bayesian inference  467, 1009–11, 1021–2
attention deployment, time course of  977–981 basic calculations  1012–14
attention spreading  979 and binocular rivalry  791
inhibition of 994 competence versus performance  1020–1
audition, emergent features  104 computation of the posterior  1017–19
auditory bistability  775 decision making and loss functions  1019–20
auditory distance, compensation for  827–8 and global bias  722–3
auditory event-related brain potentials as a model of perception  1011–12
(AERPs) 610–11 and perceptual organization  1014–16
auditory perception, sensory substitution  656 priors 1015–17
auditory perceptual objects  603–4 probabilistic features  936, 937
neuroscience view of  611–12 and simplicity principle  1032–5, 1036
auditory perceptual organization  601 source of hypotheses  1020
conclusions and future directions  612–13 Bayesian updating  1015
extraction and binding of features  604–5 Bayes Occam  1018
grouping, cross-modal effects  643–6 Bayes’ postulate  1016n5
grouping principles  602–3 Bayes’ rule (Bayes’ theorem)  1009–10
interaction with visual perception  640–3 Benary effect  404–5
perception as inference  603 Berkeley Segmentation Dataset (BSD)  207
stimulus specific adaptation and differential Berlin versus Graz  34
suppression 609–10 Benussi–Koffka dispute  31–2
auditory pitch, and object size  648–9 descriptive and genetic inquiries  32–3
auditory scene analysis  605 beta activity  995–6, 1000
competition/selection stage  606–8 coupling with slow waves  999–1000
grouping stage  605–6 evoked activity  996–7
perceptual organization  608–9 beta motion  89, 488
auditory stimuli Bezold–Brücke effect  437
binaural rivalry  802 biased competition, in figure–ground perception  273
influence on binocular rivalry  790 bias–variance tradeoff  1018
McGurk illusion  813 bilateral symmetry, in human design  880
auditory streaming paradigm  605–7
autism  716, 720, 726–8 binary (Boolean) features  933
attentional selection  721 binaural rivalry  802
enhanced local processing  721 binocular rivalry  776, 777, 801
and predictive coding  722 adapting reciprocal inhibition model  784–5
averaging of dimensions  158–60 Bayesian view  791
avoidance-of-coincidences principle  1038 continuous flash suppression studies  788–9, 804
awareness 738, 1046 dynamics of  783–6
animals  1050–1, 1052, 1054–5 effect of noise  785, 804
bridging hypotheses  1048 effects of interpretation and affect  787–9
figure–ground segregation  778–9 boundaries
multisensory interactions  790 and classical model of features  935
perceptual grouping  779–83 and probabilistic model of features  940–1
phase durations  785–6 boundary detection  323, 322, 324
predominance 786 and figure–ground modulation  328
role of attention  786–7 latency 326
and study of unconscious processing  802–5 lateral and feedforward inhibition  329
tipping factors  784 rapid detection tasks  324–5
underlying cortical networks  790–1 boundary inference, from contour
biological motion perception  589–590 geometry 367–70
bottom-up versus top-down processing  582–3 bounding contours  207
computational and neural models  585–9, 586 role in animal detection  207–8, 209
historical background  575–7 BOWN see border-ownership
neural mechanisms  584–5 Boynton illusion  445–6
perceptual spaces  579 Braille reading  658
phenomenological studies  577–9 Brainport device  656
recognition of body motion  580–2 brain time, and time-scales  824–5
relevance of learning  583–4 brain time theory  823
bistable perception  775–7, 776 breathing illusions  513, 514
cross-modal interactions  649 bridging hypotheses  1048
of figure–ground organization  347, 348, 349, 357,
358, 359 C1 activity, effect of aspect ratio changes  979
neural processes  803 camouflage 843
and study of unconscious processing  800–5 concealing motion  856–7
see also binocular rivalry cryptic coloration and background
blackshot 152 matching 847–52
blindness in cuttlefish  845, 850–1
congenital, restoration of sight  536 facial make-up  899
see also sensory substitution in flatfish  845, 849–50
blindsight historical studies  843–4
attention/awareness dissociation  745 multiple backgrounds problem  851–2
perceptual grouping  66 obscuring 3D form  855–6
Block Design task  715, 716, 724 obscuring edges  852–5
and autism  727 principles of  846–7
body functions, awareness of  799–800 recent research  844
body image, fashions in  901, 910, 912 and symmetry  851
body motion perception  575, 589–590 camouflage patterns  848, 850
bottom-up versus top-down carryover effects  76
processing 582–3 categorical perception  935
computational and neural models  585–9, 586 category-specific grouping rules  921–2
historical background  575–7 causal theory of perception  1047
neural mechanisms  584–5 Celtic symbols  893
perceptual spaces  579 central limit theorem  1016n6
phenomenological studies  577–9 change blindness paradigms  805–6
recognition of biological motion  580–2 change detection task studies  745–6
relevance of learning  583–4 chaotic itinerancy  994
bootstrap model of symmetry  119–21 chopstick illusion  513
border-ownership (BOWN)  248, 249, 328–9, 357–9, chronotopy, lack of  821
363–4, 366 closure 284
BOWN-sensitive neurons  343, 346–7, 348, 367 role in perceptual grouping  62, 220–1
brain activity and feedback involvement  347–9 and sound perception  603, 606–7
competitive signal processing  345, 347, 348 and visual search  378
computational modelling  354–7 CNQX, effect on figure–ground
extra fast processing mechanism  343, 345 modulation 335–7, 336
hierarchical organization and involvement of co-dimensions 1035
top-down feedback  349–51, 353 coding theory  494
levels of organization  365–6 coherence intervals  996–7
properties of  342–3, 344–5 relationship to stimulus pattern
border-ownership models  374–5 information 997–8
enclosure fields  378–80, 379, 381 coincidence, method of  49–52
feedback effects via LFP  380–3 coincident disruptive coloration  854
network propagation models  375–7 collateral sulcus  441
collinear contours  715 influence of knowledge  311
collinearity modal 302–10
as an emergent feature  94, 95 complexity 871
as a non-accidental feature  937–8 complex shapes, BOWN signals  356
colour components, recognition by  920
as a feature  933 composite face paradigm  759–60, 768–9
as a Gestalt  103–4 studies in prosopagnosia  762
role in figure-ground assignment  265–6 composition, paintings  906
colour and form Computer Aided System for Blind People
in after-effects  448–9 (CASBliP) 656, 667
availability of colour-and luminance-defined computer vision
contours 445 and contour grouping  208–9
combination of colour-defined features  447–8 use of texture descriptors  179
organization imposed by colour  446–7 concavities, information content of  238
organization imposed by luminance-defined conceptual art  893–4
contours 445–6 concurrent grouping, of sound events  605
processing of colour- and luminance-defined conditional probability  1009
contours 444–5 cone excitation ratios, invariance of  452–3
colouration, camouflage patterns  848–9 cones, asymmetries in organization  440–1
physiologically controlled  849–51 configural cues  266n4
colour averaging  160 configural inferiority effect  90, 91
colour constancy  438, 450, 454–5 configural processing  955
colour contrast, relationship to colour dissociation between action and perception  678–80
constancy 454–5 face perception  766–8
colour conversions  451–3 versus featural processing  962
coloured after-effects  439–40 working axioms  952–3, 957, 959
colour grouping, interaction with symmetry configural properties  260–1, 263
detection 111 advancing region motion  265
colourimetry 436 articulating motion  263, 264–5
colour induction, and perceptual grouping  449 direct and indirect methods of
colour perception  436–7, 455–6, 456–8 experimentation 261–2
asymmetries in organization  440–1 edge-region grouping  264
configural effects  455–6 extremal edges and gradient cuts  263–4
correlates of material properties  453 lower region  264
dichromatic 439 part salience  262–3
dimensionality 437–9, 454 top–bottom polarity  264
later colour processing  441–2 configural superiority effect  90–1, 141, 143, 284
opponent-process theory  439–40 demonstration in infants  693–4, 695
organization imposed by cultural and linguistic establishment and quantification of emergent
factors 443–4 features 97–113
organization imposed by environmental connectedness principle, demonstration in infants  701
factors 442–3 connectivity 991–3
post-receptoral organization  440 as an emergent feature  94, 95
Colour Wagon Wheel illusion  447 consciousness
common fate principle  612, 613 access to  799–806
grouping by illumination  404 of bodily functions  799–800
perceptual grouping  58, 60, 66–7 change blindness paradigms  805–6
and sound events  602–3 masking procedures  800
common motion grouping, demonstration in and multistable perception  800–5
infants 696 neural correlates  806, 808–9
common region principle phenomenological contents of  811–13
demonstration in infants  701–3 unconscious perceptual organization  806–11
perceptual grouping  64–66, 65 see also awareness
communication theory  1029 constancy
comparison, methods of  45 of colour  438, 450, 454–5
competence theories  1021 of lightness  450
competition, in figure–ground perception  272–5 in shadows and layers  426
complete identification paradigm  957 constancy hypothesis  391
complete mergeability principle  299 constructivism 903
completions 294–6 context sensitivity, probabilistic features  942, 943
amodal 296–302 contextual constraints, features  935
DISC model  311 contextual modulation  975–6
identity hypothesis  310–11 long-range 976
Subject Index 1081

time course of  976–7 cross-modal perceptual organization  639, 650


continuous flash suppression (CFS)  788–9, 804 interactions between modalities  640–3
contour extrapolation  239–41, 240 intersensory Gestalten  646–9
contour fragmentation  208–9, 210 and intramodal perceptual grouping  643–6
contour grouping  207 and rate of stimuli presentation  645–6
computational framework  209–10 sensory substitution  657–8
computational models  226 see also sensory substitution
computer vision problems  208–9 crowding
generative models of shape  226–8, 229 and contour integration  200
global contour extraction  216–18 definition of  177n2
global shape cues  220–2, 229 see also visual crowding
local orientation coding  210–11 cryptic camouflage  843, 846, 847–52
pairwise association  211–16 Cubism 912
role in object perception  207–8, 209 cue combination, in contour grouping  216
role of feedback  222–6, 228–9 cueing paradigm, studies of holes  290
timing of events  223–6 cultural differences, and local versus global bias  722
contour integration  62, 63, 200–201 culture, interaction with colour perception  444
association field concept  191–2 curvature
Bayesian inference  1013–14 force-induced perception of  626
cellular physiology  194–5 haptic after-effects of  625–6
and crowding  200 haptic illusions of  625
functional imaging  195–6 haptic perception of  622–5
linking process, nature and site  192–4 role in contour extrapolation  239–41
and psychophysical flanker facilitation  199–200 role in perceptual grouping  68–9
quantification 189–191 curvature maxima
snake, ladder, and rope configurations  191 information content of  236–8, 252
spatial extent  199 minima rule  243–5
contour integration models positive and negative  243
Association Field models  197–8 cutaneous rabbit illusion, modulation by visual
filter-overlap models  198–9 stimuli 645
contour interpolation  241–2 cuttlefish, camouflage  844, 845, 850–1
contours
generative model of  237–8 Dadaism 912
information content of  236–8, 252 Dalmatian dog picture  913
interactions with region geometry  246–52 Daseinsform 902
part-based representations  242–6 dazzle coloration  856–7
contrast polarity, role in contour grouping  215 decaying curvature behaviour, contour
contrast suppression  810, 811 extrapolation 240
convergence 970 decisional separability (DS)  956, 961–2
converging operations  99–100 decisional stopping rules  950–1
convexity decision criteria, and temporal experience  831
as a grouping cue  221–2 decomposition models of lightness  399, 407
and holes  289–290 delta activity  998–9
information content of  238 depth, as an emergent feature  94, 95–6
role in figure–ground assignment  265–6, 355–6, 357 role in contour extrapolation  239–41
coordination 871 depth cues  267n4
coplanarity, and lightness perception  394, 401–2 in perception of holes  286
core systems  1054–5 depth estimation, pictorial shapes  49
corpus callosum, role in perceptual depth order perception
grouping 782–3 computational modelling  356
correspondence problem  72 functional imaging  351–2
cortical hierarchy, individual differences  720–1 depth perception
cortical rhythms, role in perceptual bias  719–20 relationship to lightness  394
cortical size, role in perceptual bias  719, 725 and sensory substitution  662, 663
countershading, role in camouflage  855–6 and transparency  418–19
craftsmanship 864 depth proximity, effect on lightness  403
crime-solving methods  1049 depth segregation, interaction with symmetry
criterion shifts  831 detection 111
croquis 908 description length (surprisal)  1018
cross-modal correspondences, as intersensory descriptive minimum principle  142–3, 1028
Gestalten 648–9 see also simplicity principle
cross-modal dynamic capture task  643–4 descriptive psychology  27–8
design 863–4 double-pointing model of grasping  683
future study directions  875–6 dual-task experiments, perceptual grouping  744–5
Gestalt principles  869–72 duck-rabbit illusion  913
Japanese 864–9 dungeon illusion  406
natural images  873–5, 874 duo organization  294
designed structure, measures of duplex perception, sound components  603
bilateral symmetry and self-symmetry  880 dynamical interactive models, of figure–ground
isovist theory  879 perception 275–6
natural mappings  880–1 dynamic causal modelling studies, in blind
structured empty space and medial axis individuals 664–5
representation 877–8 dynamic grouping  560, 561, 562–3, 563
stylistic visual signature  876–7 affinity and the surface correspondence
deuteranomalous colour vision  439 problem 564
development of perceptual organization  708–9 and amodal completion  568, 569–70
demonstrations of organizational phenomena in compositional structure  565
infants 693–6 direction of motion  563–4
flexibility of grouping principles  706–8 identifying new grouping variables  567, 569
Gestalt accounts  691–2 implications for object recognition  570–2
of hierarchical structure perception  135–8 state-dependence and super-additivity  564–5
initial eye movement evidence  693 dynamic grouping motion, versus transformational
learning accounts  692–3 apparent motion  565–7, 566
perceptual grouping via classical organizational dynamic occlusions  496
principles 696–700 dynamic systems theory (DST)  1028, 1039
perceptual grouping via modern organizational dyslexia, and magnocellular neurons  724
principles 700–3
relations among grouping principles  703–6 eating-hand illusion  971
dichromatic colour vision  439 ébauche 908
differential latency problems  824 Ebbinghaus illusion  715, 716, 810, 811, 812
compensation failures  825–6 dissociation between action and perception  674–7
differential suppression, in auditory perception  609–10 eccentricity, effect on symmetry detection  112
dimensions, separable versus integral  953–4 edge-assignment computation  353
directed tension  871 edge classification  396, 505–6, 511
direction, judgement of  156–7 edge-region grouping  69–70, 263, 264
direction-selective neurons  506–7 edge relatability  242
direct magnitude estimation  45 edges 189
DISC (Differentiation-Integration for Surface camouflage of  852–5
Completion) model  311, 355 egocentric neglect  742
modelling of bistability  357, 358 egocentric reference frames  633–4
Discobolus 890n13 electric field theory, Köhler  7, 13
discrete hypotheses, Bayesian inference  1013–14 electroencephalography (EEG), studies of translational
disjoint allocation principle, sound components  603 apparent motion perception  545
disorder 871 element connectedness  67–8
disruptive camouflage  846–7 embedded figures test (EFT)  715–16, 729
obscuring edges  853–5 and autism  727
distributed systems  989 emergence 88–9
connectivity issues  991–3 emergent features (EFs)  88
dorsal attention network  738 candidates in human vision  91–97, 93–4
dorsal stream  672, 972, 989, 1032 conclusions 104–5
unilateral lesions of  677 constraints 91
dot lattices establishment and quantification via configural
studies of apparent motion  72–3 superiority 97–113
studies of perceptual grouping  59–60, 68, 76, 715, and Gestalts  90–1
978–9, 997–8 hierarchy of  102–3
dots, modification of illusory shapes  302–3 in modalities other than vision  104
double-anchoring theory  400 unresolved issues and challenges  105
double-belongingness 415 emotional content, effect on binocular rivalry  787–9
figural conditions  416–19 empathy theory  872
topological condition  416, 417, 418 empty space, use in design  877–8
see also transparency enclosure fields  378–80, 379, 381
‘double blind’ experiments  44 end-stopped cells, as T-junction detectors  358–9
double flash illusion  657 environmental influences, role in perceptual bias  718, 725
double-intentionality 890n14 episcotister model of transparency  421–3, 424, 468–9
equivalent illumination models (EIMs)  450, 470 feature specificity  333–4
evaluation of 475 gating by feedforward activity  334–5
equivalent noise paradigms  159–60, 161–2 feedforward processing  963
esquisse 908 boundary detection  325, 326, 329
‘Etch a Sketch’ toy  51 figural goodness  1029
ethological, core systems  1054–5 figural parsing, and transformational apparent
ethology 1050–1 motion 543
Euclidean metric  963 figure–ground assignment  248, 249, 259–60, 989
event-related desynchronization (ERD)  994 and the aperture problem  511–14
event-related potentials, studies in blind and deaf and binocular rivalry  778–9
individuals 658 competition 272–5
event time reconstruction  827 computational modelling  353–7, 358
compensation for auditory distance  827–8 configural properties  263–5
compensation for the length of tactile nerves  828–9 direct and indirect methods of
intersensory adaptation  829–32 experimentation 261–2
event time theory  824 dynamical interactive models  275–6
and simultaneity constancy  824 high-level influences  267–76
evoked activity  996–7 and holes  283, 284–5, 289–290
expectation-maximization (EM) algorithm  925 image-based ground properties  265–6
experimental phenomenology  23–4, 34, 41–3, influence of attention  351
46–7, 52–3 influence of familiarity  350, 351
methodologies 47–52 traditional view  260–6
external local sign  1058–9 figure–ground modulation (FGM)  323, 322
extinction, studies of perceptual grouping  740–2 and border-ownership  328–9
extrastriate human body area (EBA)  585 and boundary detection  328
extremal edges (EE)  263–4 effect of attention  329–31, 330
eye make-up  899 feature-specific feedback signals  333–4
eye movement studies, infants  693 gating of feedback effects by feedforward activity  334–5
laminar circuitry  331–3, 332
face perception  758–9, 768–9 latency 326
as an automatized attentional strategy  765–6 pharmacology of  335–7
binocular rivalry  788, 789 and region growing  325–7
configural versus featural  962 figure–ground organization experiments  66
evidence for holistic nature  759–63 filling-in 806
holistic account  763–5 models of lightness  471
interactivity between features and film grain  913n84
configuration 766–8 filter models of lightness  471
in prosopagnosia  761–3 filter-overlap models  198–9, 200–201
systems factorial technology studies  959–60 filter-rectify-filter mechanism, contour integration  200
unconscious processing  807 FINST theory  495n2
faces fixation, influence on figure assignment  267
as an emergent feature  94, 96 fixed action patterns (FAPs)  1051
holistic primacy  141 flanker facilitation, and contour integration  199–200
symmetry of  897 flank transparency  428
facial make-up  899, 899–900
familiarity flash suppression  788–9, 804
influence on figure assignment  268–70 flatfish, camouflage  845, 849–50
influence on figure–ground perception  350, 351 flat organization  922
fashion 897–902, 910, 912 flavour perception  648n7
featural processing, face perception  766–8 flicker, as an emergent feature  94, 96
feature attribution, role of reference frames  496–7 flowers, symmetries  118
feature extraction  1036 font design  869–70
features 933–4 forensic science  1049
classical versus probabilistic models  934–5 form-based grouping
see also probabilistic features development of  699–700
feature-tracking (FT) strategy  507–9 in infants  698–9
feedback connections form-based information, body motion perception  580
region growing  327 formlets 227
reverse hierarchy theory  973 form–motion interactions  491, 492–3, 541, 553
feedback effects, border-ownership  349, 376, 377 perceived rotation speed, size and shape effects  546
role of LFP  380–3 perceptually grouped objects  550–3, 551
feedback effects, region growing  326, 329 transformational apparent motion  542–6
form pathway  588 perceptual separability  953, 956, 957, 961
Fourier spectra, natural images  875 systems factorial technology  949–51, 959–60
fractal patterns  875, 922 working axioms for configural perception  952–3,
frameworks 400 957, 959
creation of illusions  407 Gestalt psychology  3, 488, 714–15, 972
versus layers  407–8 adoption of phenomenological methods  24
as perceptual groups  401–3 and anatomical constraints  718–19
frequentism 1010–11 and design  869–72
frieze groups  894 on development of perceptual organization  691–2
functional imaging early history of  3–6
of amodal completion  300–1 and figure–ground assignment  260–1
in autism  721 grouping principles  57–64, 79–80, 92, 602, 921
of body motion perception  582, 585 historical evaluation  15–16
of connectivity  992–3 internal laws of perceptual organization  871
of contour integration  195–6 Köhler’s ‘physical Gestalten’ and isomorphism  6–7
of depth order perception  351–2 Prägnanz, law of  494
of face perception  768 rise and fall of  9–15
of figure–ground perception  349 role of likelihood  717–18
of modal completion  309 theory of lightness  398–401
of multistable perception  790–1, 803 on transparency  415, 417
of perceptual grouping  747–8, 974 and visual awareness  1048
of rotation perception  549 Wertheimer’s ‘Gestalt laws’  8–9
of sensory substitution  663–5 wholes and parts  28–30, 129, 139
of transformational apparent motion Gestalt qualities  30–1
perception 544–6 Gestalts
of unconscious perceptual organization  808–9 colour as  103–4
functional networks  993 and emergent features  90–1
functional tone  1053 intersensory 646–9
fundamental inequality  1037 memes 900
fusiform body area (FBA)  585 as templates  1058–60
in visual art  898
gamma activity  996, 1000 Glass patterns  114–15
coupling with slow waves  999–1000 representation models of detection  115–16
event-related 998 spatial filtering  119
evoked activity  996–7 glia, role in neuronal function  382–3
gamma distribution, bistable perception  776–7 global advantage  130
gap transfer illusion  605 boundary conditions  130–1
garden design, Japanese  864–5 brain localization  131–2
Garner Interference (GI)  99–100, 766–7, 769, 962, source of 131
963, 980–1 global broadcasting  998
Garner speeded classification task, dissociation global completions  296–8
between action and perception  678 global contour extraction  216–18
gauge figures  50–2 global features  938
Gaussian prior distribution  1016 global motion thresholds, in autism  727
Gelb effect  397–8 global precedence effect  129–32
generalized common fate  66–7 demonstration in infants  694–5
general recognition theory (GRT)  955–7, 964 global properties
analysis of Thatcher illusion  957–9, 958 global–local paradigm  138–9
experimental evidence  960–1 versus holistic properties  141–3
general viewpoint assumption  1034 Global Workspace Theory  999–1000
generative models of shape  226–7, 377 gloss perception  153, 476–9, 480
evaluation  227–8, 229, 229 glutamate receptors  334–5
generic grouping rules  921 role in figure–ground modulation  335
genetic approach, Meinong  33 ‘God’s Eye View’  1047–8
geometry 963 good continuation principle  58, 61–2, 63, 213–14, 297,
geons 110 366–7, 632
Gestalt processing in 3-D  371–2
decisional separability  956, 961–2 contour extrapolation  239–41
Garnerian approach  953–5 and contour geometry  370
general recognition theory  955–7, 982–3 demonstration in infants  697–8
perceptual independence  949, 955, 961 development in infants  696
shading analysis  372–3, 374 and convexity coding  289–290
and sound events  602 as ground regions  284–5
Gothic style  876 memory of  287–8
graphlets 921 ontology 281–3
grasping perception of, influencing factors  285–6
adjustment to object size  683–4 perception of underlying surfaces  286–7
double-pointing model of  683 topology  282, 283–4
and illusions of size  674–7 and visual search  288
and visual form agnosia  684 holism, intrinsic  975–7
and Weber’s law  681–3 holistic dominance  143
grey-level patterns, kinetic transparency  428 holistic processing, face perception  763–5, 768–9
grey levels, sensitivity to  152–3 holistic properties
ground properties, role in figure–ground holes as  284–5
assignment 265–6 primacy of  139–41
ground regions, holes as  284–5 holographic approach to symmetry  115–17, 118
Ground-Up Constant Signal Method  101–3 holographic bootstrap approach to
grouping cells  356 symmetry 121, 122
homogeneity 871
Haldane prior  1016 Hongatte 865
hand, anisotropy of  625 hue naming  46
haptic perception  621–2 human body, hierarchical organization  920,
of curvature  622–6 929–30, 931
and design  863 hybrid image templates (HITs)  929
emergent features  104 hypercolumn model of object representation  964
of length  627–8 hyper-emergent features  104
of line drawings  630–1 hyperstrings 1040–1
of shape  626–7 hypothesis space, Bayesian inference  1012
of spatial patterns  631–2 hysteresis 76
of spatial relations  632–4
of volume  628–9 identity hypothesis of completions  310
of weight  629–30 identity imposition see orientation stability
hard boundaries, and classical model of features  935 ifenprodil, effect on figure–ground
Hebbian learning, and sensory substitution  663 modulation 335–7, 336
hemispatial neglect  738, 741 illumination, grouping by  401, 402–5, 402
perceptual grouping without attention  745–7 illumination edges  394–6, 397
hermeneutics 890n15 illumination field  467
Hess effect  825–6 illusory contours  222, 295, 302–3, 812
hierarchical letters  130 effect of TMS  224–5
hierarchical organization  969–71, 981–2, 1038–9, 1040 influence of region-based geometry  249–50
background review  922 kinetic 308–9
human figures case study  929–30, 931 and stereokinetic effect  527–8, 529
opposing approaches  972–4 and unconscious processing  809–11
And-Or-Tree framework  922–4 illusory flash  657
scene case study  926–29 illusory line motion  542, 543
structure learning by parameter estimation in illusory volumes  303, 304
AOT 924–6 image-based models of perception  921
tiling method  919–920 image processing, models of texture
unresolved problems  971–2 segmentation 172–3
hierarchical structure  129, 129–32, 143 Impressionism 912
development of perception of  135–8 inattention paradigm  744–5
global versus holistic properties  141–3 incongruence effects  980
levels of structure and holistic properties  138 incremental grouping  974
microgenesis of perception  132–4 incremental rigidity scheme  532–3
primacy of holistic properties  139–41 individual differences in global–local
simplest stimulus organizations  1030–1 paradigms 714
hierarchy, in design  871 in autism  726–8
hMT+ construct validity  723–5
role in motion parsing  546 future research areas  728–9
transformational apparent motion perception  544 in schizophrenia  725–6
holes 281, 291–2 underlying principles  717–23
and attention  290 induced grouping  71
infant research Japanese garden design  864–5
accounts of development of perceptual medial axes  877, 878
organization 692–3 symmetry 880
on amodal completion  301–2 visual organization  867–9
demonstrations of organizational phenomena  693–6 Japanese interior design  865–7, 866
eye movement studies  693 Jeffreys’ prior  1016
flexibility of grouping principles  706–8 Jesus, images of  889
on modal completion  310 jitter, effect on symmetry detection  112
perceptual grouping, relations among the Jordan Curve Theorem  207
principles 703–6 Julesz conjecture  171
perceptual grouping via classical organizational just-noticeable difference (JND)
principles 696–700 dissociation between action and perception  681–3
perceptual grouping via modern organizational Weber’s law  680–1
principles 700–3
inference 1008 Kanizsa figures  472–3
Bayes’ rule  1009–10 as emergent features  94, 96
see also Bayesian inference Kanizsa triangle  294, 295
information-integration theory  629 and unconscious processing  809–11
information theory, application to perceptual Kanizsa-type modal completions  306–7
grouping 80–1 katagami 869
innate knowledge  1017, 1050 kinesthetic information, role in solution of aperture
inner world  1053–4 problem  515–16
insufficient reason (indifference), principle of  1016 kinesthetic sense  621
integral dimensions  953–4, 980 kinetic depth effect  10
integral properties  973 Metzger’s work  528–30, 529
integral superiority effect  973 Wallach’s work  530–1
integration fields  200 see also structure from motion (SfM)
integration process  150–1 kinetic transparency  428
integration ratio (IR)  112 knowledge, influence on completion  311
intention, influence on figure assignment  267 Koffka cross  302, 303
intentionality, in definition of art  886n3, 890 Kolmogorov complexity  1018, 1028, 1029
interior design, Japanese  865–7, 866 Korte’s laws  488
intermodal comparison method  45
internal laws of perceptual organization  870–1 ladder contours  191
interpolation models  62 language, interaction with colour perception  443–4
interposition cue to depth  295n3 lateral endpoint offset, as an emergent feature  94, 95
interpretation, effect on binocular lateral geniculate nucleus, chromatic tuning  440
rivalry 787–9 lateral inhibition  393
intersection, as an emergent feature  94, 95 lateral occipital complex (LOC)  441, 971
intersection of constraints (IOC) strategy  507–8 role in amodal completion  301
intersensory adaptation  829–32 role in contour integration  195
timing-selective neurons versus criterion shifts and role in depth order perception  351–2
expectations 830–2 role in symmetry detection  110
intersensory Gestalten  646–8 and sensory substitution  658
cross-modal correspondences as  648–9 transformational apparent motion perception  544
intrinsic holistic representation  975–7 lateral parietal sulcus (LIP)  972
invariance 876 lattice method of grouping  560–1
inverse optics  399, 1008 layer decomposition  421
inverse probability  1009–10 layers (scission) models of lightness  470
inverse problem  602 evaluation of  472–4
inverted faces L-cones 440, 441
processing of  764–5, 769, 789, 962 learning, impact on perceptual grouping  75–76
Thatcher illusion  579 length
isomorphism 7, 13 haptic perception of  627–8
iso-orientation excitation, region growing  325 illusions of  627–8
iso-orientation inhibition, boundary detection  324 levels of perceptual organization  364–6, 365
isovist theory  879 ‘life detection’, body motion perception  580–2
iterative learning, hierarchical organization by lightness  408–9, 450, 469–70, 480–1
AOT 925–6 and 3-D structure  393–4
ambiguity of luminance  391–2
Japanese design, influence on Bauhaus  869–70 anchoring theory  400, 407–8, 455, 470–1
definition of 391 Wallach experiment  392–3
equivalent illumination models  470 luminance ratios  392–3, 470–1, 475, 480
evaluation of theories  471–5 limitations 393–8
filtering and filling-in models  471 luminance statistics  152–3
frameworks as perceptual groups  401–3 and texture segmentation  170
frameworks that create illusions  407
frameworks versus layers  407–8 magnitude of experience  42–3
Gestalt theory  398–401 magnocellular neurons, and dyslexia  724
grouping principles  403–5 make-up, facial  899, 899–900
limitations of ratio theory  393–8 Markov Random Field (MRF) frameworks  922
local versus remote ratios  396–8 Markov assumption  217
in models of transparency  424 limitations 218
perceived relationship to transmittance  468–9 masking procedures  800
reflectance versus illuminance edges  394–6, 397 masquerade 843, 846
relative luminance  392 material properties
reverse contrast illusions  405–6 perceptual correlates  453
scission models  470 perceptual representation of  479–81
structure-blind approach  391 material–weight illusion  629
Wallach experiment  392–3 maximum a posterior (MAP) hypothesis  1012, 1018
lightness-illumination invariance hypothesis  399 and shape representation  1014–15
lightness perception, benefits of  466 maximum entropy prior  1016
lightness similarity grouping, demonstration in maximum likelihood estimators (MLEs)  157
infants 696–7 and cross-modal interactions  657
likelihood function, Bayesian inference  1010, 1012n4 McGurk illusion  813
likelihood principle  1029, 1033–4 M-cones 440, 441
Bayesian inference  1017–18 meanings 1046–7
statistical 1017n7 computation of  1050
veridicality 1036–8 medial axis representation  877–8
and view dependencies  1031–2 medial-axis-transform (MAT)  245–6, 248–9
linearity, as an emergent feature  93, 98, 102 Meissner corpuscles  621
linear-nonlinear-linear (LNL) models, texture memes 900
segmentation 172, 173 memory, of holes  287–8
linear perspective, pictorial box spaces  1059 mental world  1047
line bisection tasks, studies in hemispatial mereological essentialism  34
neglect 746–7 mereology 28–9
line drawings, haptic perception of  630–1 Merkel nerve endings  621
L-junctions, in modal completions  306 metacontrast 497–8
local and global biases  713–14 metacontrast masking  800
in autism  726–8 metamerism 438
construct validity  723–5 metamodal theory of brain  657–8
future research areas  728–9 method of adjustment  681
historical background  714–17 Michigan Visual Sonification System
in schizophrenia  725–6 (MVSS) 656, 667
underlying principles  717–23 micropattern textures  168
local completions  296 mid-level theories of lightness  400–1
local features  938 mimicry 843
local field potential (LFP), role in minimal mapping  564
border-ownership 380–2 minima rule  243–4
local orientation coding  210–11 limitations 244–5
local sign concept  1058 minimum description length (MDL)
location-and-gradient mapping  239 principle  1018–19, 1027–8, 1035
long-range connectivity, visual cortex  194, 222, 334,
368, 369, 376, 783, 976 mirror neurons  872
loss functions, Bayesian inference  1019–20 role in body motion perception  585
lower regions, configural properties of  264 mirror stereoscope  776, 777
luminance mirror symmetry  61
ambiguity of  391–2 see also symmetry
effect on direction judgements  157 mismatch negativity (MMN)  610
in models of transparency  424, 425 mistuned partial phenomenon  605
non-local nature  938 Mitate 865
relative 392 mixture distributions  941
modal completion  294–6, 302–4 multidimensional affinity spaces, role in object
in animals  310 recognition 571
identity hypothesis  310–11 multidimensional scaling, investigation of colour
incompleteness as a local cue  304–6 perception 437
in infants  310 multimodal neurons  830
Kanizsa-type versus Petter-type  306–8 multiple objects tracking, studies of holes  290
kinetic illusory contours  308–9 multiple symmetry, representation models of
neural correlates  309 detection 117–18
in stereopsis  308 multisensory interactions  514–15
modalities of existence  25 and binocular rivalry  790
modes, and probabilistic structure  941–2 multistable perception
modular small-world structure  991–2 neural processes  803
monocular rivalry  802 of sound events  606–7
MOSAIC model  586, 587 and study of unconscious processing  800–5
motion see also bistable perception
barberpole effect  509–12 musical metre, intermodal perception  647
Bayesian inference  1013
as an emergent feature  94, 96 naïve Bayesian models  1016
form–motion interactions  492–3 natural images
interaction with transparency  428 in design  873–5, 874
non-local nature  938 edges 189
non-retinotopic feature attribution  497–9 unconscious processing  807–8
phenomenal identity problem  491 natural mapping  880–1
reference frames  493–7 natural tasks, identification of  466, 467, 479
terminology 488 natural textures  168
Zeno’s paradox  487 Navon letters  715, 716
motion after-effect, chromatic selectivity  448 and autism  727
motion ambiguity  504 limitations 728–9
aperture problem  504 and schizophrenia  726
edge classification problem  505–6 validity in demonstration of global bias  723
motion blur problem  496, 497 Nazi regime, impact on Gestalt psychology  11
motion camouflage  856–7 Necker cube  775, 776, 777, 800, 801
motion coherence tasks  157 neglect 738, 741
motion correspondence problem  491 egocentric versus allocentric  742
motion detection perceptual grouping without attention  745–7
as a fundamental perceptual dimension  489–490 network propagation models, of
as orientation detection in space-time  489, 490 border-ownership 375–7
motion fading ‘Neural Centre of Consciousness’ hypothesis  1048
effect of trackable features  548–9 neuronal synchronization  1039, 1041
effects of grouping  550 neurophysiological studies, of contour
motion features, body motion perception  580 integration 194–5
motion–form interactions  541, 542–6, 553 neurophysiology, relationship to
perceived rotation speed, size and shape effects  546 phenomenology 34
perceptually grouped objects  550–3, 551 NMDA receptor  335
transformational apparent motion  542–6 role in figure–ground modulation  335
motion-induced blindness  806 noise, role in rivalry dynamics  785, 804
motion neurons  584 non-accidental features  936–8, 1034
motion parallax  535 non-accidentalness, perceptual grouping  68–9
motion pathway  588 non-retinotopic feature attribution
motion perception sequential metacontrast  497–8
Gestalt psychology  10 Ternus–Pikler displays  498–9
phi motion  3–4 normalization 110
motion perspective  535 Nōtan 864
motion processing motion-sensitive neurons  489–490
structure-blind strategies for overcoming the null hypothesis significance testing (NHST)  1011n3
aperture problem  507–9
two stages of  506–7 object file theory  495
motion statistics  156–8 objective reality  1048
motor signals, influence on binocular rivalry  790 objectivity 41–2, 47
moving ghosts problem  496 shared 46
Müller-Lyer illusion  627, 810, 811 object localization, studies of sensory
dual-task experiments  745 substitution  661, 662
object recognition and visual awareness  903
implications of dynamic grouping  570–1 pairwise association
studies of sensory substitution  660–2 cue combination  216
objects good continuation principle  213–14
auditory 603–4, 611–12 proximity principle  211–2
haptic properties  622–30 similarity grouping  214–15
object segmentation, role of colour  447 palm boards  52n16
object size resolution, dissociation between action and parallelism
perception 680–5 as an emergent feature  94, 95, 98
objet trouvés  886n2, 889, 892 haptic perception of  632–3
oblique effect  634 problems with contour-based
Occam’s razor  1027–8 representations  247, 248
occluded figure interpretation  266, 989–91 role in contour grouping  222
amodal completion  296–302, 569–70 role in perceptual grouping  58, 62, 69
contour extrapolation  239–41 parallel processing  821, 949–50, 953, 955, 962,
contour interpolation  241–2 990–1, 1039
and edge classification  511–12 parameter estimation, Bayesian inference  1012, 1013
good continuation  62, 632 pareidolia 897n44
infants 695–6 parietal extinction  811
and local convexity  250 parse trees  923
modal completion  302–10 mathematical formalism  924
occluding layers, in Japanese interior design  866–7 partial movement  488
odd quadrant task  97–8 part salience  262–3
odours part segmentation  242–3
binaral rivalry  802 Medial-Axis-Transform (MAT)  245–6
influence on binocular rivalry  790 minima rule  243–5
open environments  913 part-whole effects, face perception  759
opponent colour processing  439–40 in prosopagnosia  761–2
optic flow, and structure from motion  533–5 part–whole paradigm  959
optimal codes  1018, 1030–1 past experience  1017, 1050
order 871 influence on figure assignment  268–72
orientation past experience, role in perception  536
as an emergent feature  92, 93, 98, 102 patterns, ornamental  893–7
illusions of 634 payoff matrix (loss function), Bayesian
representation in visual cortex  367–9, 368 inference 1019–20
orientation-dependent coloured after-effects  448 perception-action modelling  872
orientation-linking  190 perceptual grouping  57–8
linking process, nature and site  192–4 and binocular rivalry  779–83
orientation stability  523–4 ceteris paribus rules  62, 64
orientation statistics  153–6 changes in schizophrenia  725–6
discrimination and detection experiments  155 and colour induction  449
orthographic projection  529–30 common fate principle  60
oscillatory activity  993, 1000 common region principle  64–66
alpha activity  993–5 cross-modal interactions  643–6
beta and gamma activity  995–7 edge-region grouping  69–70
coherence intervals  996–8 element connectedness  67–8
coupling of slow and fast waves  999–1000 frameworks 401–3
event-related gamma activity  998 generalized common fate  66–7
slow wave modulations  998–9 generic versus category-specific rules  921–2
outline patterns, transparency  419–20 Gestalt qualities  30–1
good continuation principle  61–2
Pacinian corpuscles  621 grouping by illumination  402–5
pain perception  1048 grouping in dynamic patterns  72–4
painting haptic perception of spatial patterns  631–2
definition of  903, 906 induced grouping  71
gist 908 in infants  696–708
iconic images  908 influence on attentional priorities  739–44
microgenesis 913–6 interaction with attention  736–7, 748–9
paradigm shifts  911 lattice method  560–1
preliminary depictions  908 learning, associative grouping, and carryover
scales of structure  910 effects 75–76
‘space of images’ ,  911 motion correspondence problem  491
1090 Subject Index

perceptual grouping (Cont.)
  non-accidentalness and regularity  68–9
  operation without selection by attention  744–8
  proximity principle  59–60
  role of probability  74–5
  similarity grouping  60–1
  of sounds  602–3, 605–6
  symmetry grouping  61, 110–11
  synchrony grouping  67
  and temporal experience  832–3
  theoretical issues  76–81
  time course of  977–979
  uniform connectedness  72, 73
  wholes and parts  28–30
  see also dynamic grouping
perceptual independence (PI)  949, 955
  violation of  957, 959, 961
perceptually grouped objects, perceived motion  550–3, 551
perceptual scaffolding  703–5
perceptual separability (PS)  953
  violation of  956, 957, 961
perceptual switching
  functional imaging  349
  oscillatory activity  996
perfect symmetry, representation models of detection  115–16
peripheral vision, texture processing  177, 179–80
perirhinal cortex, role in figure-ground assignment  275–6
personality traits, and local versus global bias  722
perturbed symmetry, representation models of detection  116–17
Petter-type modal completions  307–8
phase-of-firing code  382
phenomenal identity problem  491
  object file theory  495
phenomenal reality  25
phenomenology
  experimental, see experimental phenomenology  23–4
  first-person accounts  27–8
  information content of presentation  24–5
  origins of  21
  physical versus psychic phenomena  26–7
  presentations  22–3
  relationship to neurophysiology  34
phi motion  3–4, 89, 488
photometer metaphor  391
physical Gestalten  6–7
physical phenomena  26
physical world  1047
physics  41–2
physiognomical perception  872
pictorial box spaces, linear perspective  1059
pictorial reliefs  49
pictorial shapes, experimental phenomenology  51–5
pictorial surfaces  49
pictures
  definition of  889–90
  see also painting; visual art
piecemeal perception of objects  994
piecemeal rivalry  781, 802
pixel count, as an emergent feature  94, 95
pixel statistics, and texture segmentation  170–4
planarity, grouping by  401–2
plasticity, role in sensory substitution  657–8
point-light studies of body motion  577–9, 578
poking  1055
polarized gamma motion  542, 543
Ponzo illusion  715, 716, 810, 811
  dissociation between action and perception  676–7
  dual-task experiments  745
  and sensory substitution  662, 663
position averaging  160
posterior distribution  1012
posterior probabilities  1010
  computation of  1017–19
power spectrum, and texture perception  174, 175
Prägnanz principle  8, 9, 79–81, 494, 972, 1008, 1028
predictive coding  722–3, 974
predominance, binocular rivalry  786
presentations  22–3
  information content of  24–5
  prototypical durations  30–1, 33
primed matching paradigm  132
  studies of amodal completion  299–300
prior probabilities  1010, 1015–17, 1021
probabilistic features  936
  local versus global  938–40
  non-accidental  936–8
  shape features  938–40, 939, 941
  and the statistical structure of the environment  940–43
probability
  role in perceptual grouping  74–5
  see also Bayesian inference
probability matching behaviour  1020
process models of symmetry detection  118–19
  bootstrapping  119–21
proportionality  876–7
prosopagnosia  761–3
  and Garner interference  766–7
Prosthesis Substituting Vision by Audition (PSVA) device  656, 660
proto-objects
  and auditory perception  612, 613
  and sound perception  607–9
proximity
  effect on symmetry detection  112–13
  as an emergent feature  92, 93, 96–7, 98, 102
proximity dot lattice  715
proximity grouping  57, 58, 59–60, 972
  demonstration in infants  697
  grouping by illumination  403
  and haptic perception  631–2
  learning of  692
  in pairwise association  211–12
  time course of  977–979
pruning criterion, skeletal shape representation  246
psychic phenomena  26–7
psychophysical flanker facilitation, and contour integration  199–200
psychophysics  42–3
  measurement in  43–5

pure distance law  60

qualities  1047
quantum computing  1039–40
qubits  1039

radial-tangential illusion  627
random dot kinematograms (RDK), motion transparency  428
random-dot stereogram, colour effects  447
rapid detection tasks, boundary detection  324–5
ratio theory of luminance  392–3
  limitations  393–8
receptive fields  363, 383, 938, 969
  surround regions  975
reciprocal inhibition, in binocular rivalry  784
recognition-by-components theory  568, 570, 920, 937
recognition from partial information (RPI)  311
redundancy  150
redundancy facilitation effect  954
redundancy gains and losses  100, 101
reference frames  493–5
  classification of  497n3
  and haptic perception  633–4
  importance of  495–7
reflectance  391
  grouping by  401, 402
  in models of transparency  424
reflectance edges  396, 397
region geometry, interactions with contour geometry  246–52
region growing  323, 322, 325
  computational modelling  325–7
  feature-specific feedback signals  333–4
  feedback connections  329
region segmentation  207
regularity
  influence on amodal completion  298
  role in perceptual grouping  69
relatability  62, 296, 298
relational systems, Metzger’s work  12
relative motion  576–7
relative size, and hierarchical structure  138–9
releasers, ethological  1051
repetition discrimination task (RDT)  65–66, 74
representation models of symmetry detection  115–18
retinal ganglion cells  323, 440
retinex algorithms  450
retinotopic maps, intrinsic constraints  719
retinotopy, and temporal experience  822
reverse contrast illusions  405–6
  and assimilation  407
reverse hierarchy theory  143, 973–4
reversible figures  363, 364
  see also bistable perception
rhythm, intermodal perception  647
rigidity assumption  531–2
rocks, Japanese garden design  865, 867–9
rolling wheel illusion  493, 494
rope contours  191
Rorschach inkblot figures  898, 896
rotating snakes illusion  812
rotation, perceived speed of, effects of size and shape  546–9
Rotoreliefs, Duchamp  523
Rubin’s face/vase  776
  functional imaging studies  992–3
Ruffini nerve endings  621
Ryoanji garden  867, 868
  medial axes  877, 878

saddle shapes, disregard of  1059–60
Sakuteiki  867
salience  970, 971–2
salience hierarchy, grouping principles  705–6
saliency map  737
sampled motion  488n1
Sansui manual  867
Sapir-Whorf hypothesis  444
Saturn illusion  527–8
savannah landscape, visual appeal of  875
scale invariance, functional networks  993
scarification  895
scene recognition, role of texture processing  178, 179
schizophrenia  716, 725–6
  comparison with autism  727
  and connectivity  991–2
Schrödinger’s principle  1048, 1060
Schroeder’s stairs  776
scission models of lightness  470
  evaluation of  472–4
S-cones  440–1, 445
sculpture  902–3, 912, 914
segmentation  405, 450, 468
segmentation problems
  gloss perception  476–9
  lightness perception  469–75
  transparency perception  468–9
selective grouping, scission induction  473–4
self-organization of brain structure  991
self-splitting figures  307–8
self-symmetry, in human design  880
semantic information, unconscious processing  807
senses  41
sensory substitution  655, 667–8
  and aesthetics  665–7
  as a cross-modal interaction  657–8
  and depth perception  662, 663
  functional imaging studies  663–5
  future research areas  663
  historical background  656
  network possibilities  665, 666
  object localization studies  661, 662
  object recognition studies  660–2
  subjective perceptual experiences  658–60
  technical overview  656–7
separable dimensions  953, 980
sequential grouping, of sound events  605–6
sequential metacontrast  497, 498
serial processing  949–50
shading information
  and good continuation  372–3, 374
  in perception of holes  286
  shape-from-shading  48, 456, 457, 1059–60
shadows, distinction from transparency  426

shape
  effect on perceived speed of rotation  546–9
  haptic perception of  626–7
  interaction with volume perception  628–9
  part-based representations  242–6
shape averaging  160
shape completion  238
  contour extrapolation  239–41
  contour interpolation  241–2
shape features  938–40, 939, 941
shape-from-shading  48, 456, 457, 1059–60
shape interference  290–1
shape neurons  585
shape processing
  Bayesian model  1014–15
  dissociation between action and perception  678–80
shared objectivity  46
Shepard Tables  810, 811–12
Sherlock’s principle  1049–50, 1060
Shin-Gyō-Sō  865, 876
signal detection theory  955
similarity, as an emergent feature  96
similarity grouping  58, 60–1, 214–15
  development in infants  696
  grouping by illumination  403–4
  and haptic perception  632
  of sounds  602
  time course of  977–979
simple pooling model of texture perception  180
simplicity (minimum) principle  81, 720, 1029–30
  Bayesian inference  1017–19, 1032–5
  classical versus modern information-theoretic simplicity  1030–1
  historical background  1027–9
  neural realization of  1038–41
  veridicality of  1035–8
  view-dependencies  1031–2
simultanagnosia  68, 736, 739
  perceptual grouping without attention  746–7
simultaneity  820, 823
simultaneity constancy  824, 827
  compensation for auditory distance  827–8
  compensation for the length of tactile nerves  828–9
  intersensory adaptation  829–32
simultaneous lightness constancy  393
simultaneous lightness contrast  404–5, 810, 811
  unconscious perception  809
simultaneous matching task, studies of amodal completion  299, 300
size, effect on perceived speed of rotation  547, 548
size averaging  158–60
size illusions
  dissociation between action and perception  673–7
  relationship to visual cortex activity  812
size–weight illusion  629
skeletal shape representation  245–6, 248–9
  comparison of animal and leaf categories  250–2, 251
sleep deprivation, and connectivity  992
sliding effect  512
small-world structure  991–2
smell
  binaral rivalry  802
  emergent features  104
  influence on binocular rivalry  790
snake contours  191
soft boundaries, probabilistic model of features  940–1
somatosensory perception, Aristotle illusion  812–13
sound events, definition of  604
sound organization  608–9
space–time coupling  73–4
space-times  1053
space–time trade-off  73–4
spatial attitude estimation, pictorial shapes  49–52
spatial filtering models of symmetry detection  118–19
spatial patterns, haptic perception of  631–2
spatial pyramid matching (SPM) model  928
spatial relations, haptic perception of  632–4
spatial relationships perception, serial processing  821–2
spatiotemporal boundary formation  308
spatiotemporal models of motion  489, 490
specular reflection, in gloss perception  476–7
speech recognition, motor theory of  587
speed, judgement of  158
spiral patterns  893–4
spot-shadow experiment  396
staircase Gelb effect  397–8
state-dependence, dynamic grouping  564–5
stationarity assumption  535
statistics  151–2
  and attention  160–1
  commonalities of averaging  161
  locus of computation  162
  luminance statistics  152–3
  motion statistics  156–8
  orientation statistics  153–6
  size averaging  158–60
stereo correspondence, good continuation  371–2
stereokinetic effect  535–6
  Benussi’s work  522–3
  height of the stereokinetic cone  524–5
  and illusory contours  527–8, 529
  Mach’s work  521–2
  minimum-relative-motion principle  525–7, 526
  Musatti’s work  523–5
  orientation stability  523–4
  Renvall’s explanation of  525
  stereokinesis on inadequate basis  524
stereoscopic noise patterns, scission induction  473
stick insects  846
stimulus intensity, Weber’s law  680–1
stimulus specific adaptation (SSA), in auditory perception  609
stopping rules  950–1
stream/bounce illusion, cross-modal effects  643
stroboscopic motion  488n1
  Benussi–Koffka dispute  31–2
strong Gestalten  6
Stroop Interference (SI)  99–100
Stroop task  980
structural classes  1031
structural information theory (SIT)  297, 1029, 1031
  precisals  1033, 1040
Structuralism  129, 260

structure from motion (SfM)
  Euclidean versus affine space  533
  incremental rigidity scheme  532–3
  integration with other cues  535
  optic-flow components and projection types  533–5
  rigidity assumption  531–2
  see also kinetic depth effect
style spaces, body motion  579
stylistic visual signature  876–7
subjective contours  14
  perception by infants  695
subjective experience  27–8
  descriptive psychology  27–8
subjective figures, as emergent features  94, 96
subjectivism  1011
super-additivity, dynamic grouping  564–5
super-capacity processing  951, 953, 962
superior temporal sulcus, role in body motion perception  584–5
superposition method  97–8
supra-threshold phenomena  45
surface correspondence problem  564, 565
surface disruption  854
surface properties
  perceptual representation of  479–81
  see also gloss perception; lightness; transparency
surprisal (description length)  237, 238, 1018
Surrealism  912
surroundedness, as an emergent feature  93, 103
symmetry  108–9
  and 3D object completion  298
  and camouflage  851
  as an emergent feature  93, 97, 102–3
  in human design  880
  local  249n9
  in nature  873
  in patterns  894, 897, 906f
  problems with contour-based representations  248
  role in contour grouping  222
  role in perceptual organization  109–11
symmetry detection
  formal models  113–15
  modulating factors  111–13
  process models  118–21
  representation models  115–18
symmetry grouping  58, 61
synaesthesia  649n9
  and sensory substitution  666
synchrony, temporal correlation hypothesis  79
synchrony grouping  67
systems factorial technology (SFT)  949–51, 964
  experimental evidence  959–60

tactile nerve length, compensation for  828–9
tactile perception  621–2
  ambiguity  775
  Aristotle illusion  812–13
  of curvature  622–6
  interaction with visual perception  645
  of length  627–8
  of line drawings  630–1
  sensory substitution  656, 659
  of shape  626–7
  of spatial patterns  631–2
  of spatial relations  632–4
  of volume  628–9
  of weight  629–30
tactile stimuli, influence on binocular rivalry  790
Tactile Visual Substitution System (TVSS)  656
tangent bundles  198
Tangrams  919, 920
taste, emergent features  104
templates, Gestalts as  1058–60
temporal coherence  602, 612, 613
  see also common fate principle
temporal correlation hypothesis  79
temporal experience
  brain time  824–5
  brain time theory versus event time theory  823–4
  chronotopy, lack of  821
  differential latency problems  824, 825–6
  event time reconstruction  827–9
  event time theory and simultaneity constancy  824
  intersensory adaptation  829–32
  ordered timeline view  820
  and perceptual grouping  832–3
  and retinotopy  822
  time-stamping  822
  undefined temporal relationships  820–1, 822–3
temporal induction  603
temporal ventriloquism effect  643
terminator count, as an emergent feature  94, 95
Ternus–Pikler displays  491, 498–9
tessellations  897
tetrachromatic colour vision  439
texton theories  171, 847
textural crowding  865
texture, nature of  167
texture analysis/synthesis techniques  175
texture descriptors
  comparison of information encoding  174, 175–6
  use in computer vision  179
texture-orientation statistics  154
texture perception  175–6, 177, 179
  high-dimensional models  180–1
  non-local nature  938
  in peripheral vision  177, 179–80
  and visual crowding  177, 180
textures
  micropatterns  168–9
  natural  168
  nature of  169
texture segmentation  150, 168–7
  combined statistical and image processing-based models  173–4
  image processing-based models  172–3
  phenomena  169–70
  role of glutamate receptors  335
  statistics of pixels  170–1
  statistics of textons  171
  terminology  170n1
texture-segmentation tasks  321–2, 322
texture statistics  150
  order of  150–1

texture tiling model  179–80
Thatcher illusion  579
  GRT analysis  957–9, 958
theatrical make-up  900
theory of visual attention (TVA)  737, 749
theta activity  998
  coupling with fast waves  999, 999–1000
thirds, rule of  867
threshold mean orientation offset  154
thresholds, measurement of  43–5
tiling method of organization  919–20
tilt illusion  811
timescale of visual perception  1055–6
time-stamping  822
tipping factors, binocular rivalry  784
T-junctions  1034–5
  as cue for occlusion  296, 356
  detection in visual cortex  358–9
  in Japanese interior design  866
  role in perceptual grouping  405
tone combinations, emergent features  104
top–bottom polarity  263, 264
topological consistency  375–6
topological equivalence  282
topology  283–4, 366
  condition for transparency  416, 417, 418
  as an emergent feature  94, 95
  and Gestalt perception  963
trackable features hypothesis  546–7, 548–9
transcranial magnetic stimulation (TMS)
  studies of Braille reading  658
  studies of contour grouping  224–5
  studies of multistable perception  791, 803
  studies of sensory substitution  659, 664, 665
transfer of organization across grouping principles  703
transformational apparent motion (TAM)  542–4
  and dynamic grouping  562, 563
  and dynamic grouping motion  565–7, 566
  model of  545–6
  neural correlates  544–5
transformational approach to symmetry  113–15, 114
translational apparent motion  542, 543–4
transmittance  413
  perceived relationship to lightness  468–9
transparallel mind hypothesis  1040–1
transparallel processing  1039
transparency  413, 468–9, 479–80
  achromatic  413–15
  chromatic  452–3, 456
  contrast attenuation theory  423
  distinction from shadows  426
  effects  426–7
  effects on motion  428
  figural conditions  416–19
  and invariance of cone excitation ratios  452–3
  Metelli’s model  421–3
  in outline patterns  419–20
  photometric conditions  420–6
  topological condition  416, 417, 418
  X-junctions and four regions, indispensability  425–6
transversality principle  243–4
travelling waves, rivalry dominance  782, 783
triadic rock groupings, Japanese gardens  868, 869
trichromacy  436–7
  anomalous  439
  asymmetries in organization  440–1
tunnel effect, amodal completion  302
“Twenty Questions” game  1049, 1055
two-point threshold  621
two-thirds power law, arm and finger movements  582
typography design  869–70

Umwelt  1052–3
unconscious perceptual organization  806–7
  face perception  807
  of natural images  807–8
  neural representation  808–9
  of semantic information  807
uniform connectedness  72, 73
uninformative priors  1016
unique hues  439
  encoding of  441–2
unitization, infants  719
user interfaces
  awareness as  1056–60
  templates  1057–60
utility functions (loss functions), Bayesian inference  1019–20

vector analysis  493, 494, 576–7
vector average (VA) strategy  507–9
vector field combination  296
veiling luminance  423n13
ventral stream  672, 969, 972, 989, 1032
ventral stream damage  742–3
  visual form agnosia  684
verbal transformation effect  642
veridicality  1057
  and simplicity principle  1035–8
vertical–horizontal illusion, and sensory substitution  662
view-dependencies  1031–2
viewpoint generalization  110
visible persistence  496
vision-for-action  672
  comparison with vision-for-perception  673
  see also action and perception dissociation
visual art
  definition of  886–92
  fashion  897–902, 910, 912
  ornamental patterns  893–7
  painting  903, 908–16
  sculpture  902–3, 909, 914
  stratified structure of  892–3, 898
visual attention
  attentional priority map  737–8
  and awareness  738
  neuropsychological deficits  738–9
visual cortex
  chromatic tuning  441
  detection of local boundary signals  367–9, 368
  hierarchical organization  969–71
  and intrinsic holism  975–6

  motion processing  506–7
  rotation perception  549
  transformational apparent motion perception  544–6
visual crowding  154
  and texture perception  177, 180
visual extinction  739
visual field  1058–9
visual form agnosia  684–5
visual Gestalten  7
visual literacy  890–1
visually-evoked potentials, studies of translational apparent motion perception  545
visual perception, Gestalt psychology  9
visual processing, timing of perceptual grouping  77–9
visual proofs  46–7, 1054
visual regularities  109
visual search
  and holes  288
  power of closure  378
  role of texture processing  178, 179
voice device  656, 657
  object recognition studies  660, 662
  subjective perceptual experiences  658–9
volume, haptic perception of  628–9
volumetric sculpture  902, 912
von Uexküll’s principle  1053–4, 1060

Wabi-Sabi  865
wallpaper patterns  896, 897
weak Gestalten  7
Weber–Fechner law  117
Weber’s law  680–1
  dissociation between action and perception  681–3
weight
  awareness of  41–2
  illusions of  629–30
  perception of  629
White’s illusion  404, 405, 407
wholes and parts, theory of  28–30
Wirkungsform  902
within-object illusions  679
  dissociation between action and perception  678–9
workload capacity  951, 955

X-junctions, and transparency  418, 420, 425–6, 456

yardsticks  49–50

zebra stripes, function of  857
Zeno’s paradox  487
zero-bounded response distributions (ZBRs)  198
zero–one loss, Bayesian inference  1019, 1020
zombie nature  1046, 1050
