Approaches to Phonologization
Edited by
ALAN C. L. YU
OXFORD
UNIVERSITY PRESS
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University's objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© editorial matter and organization Alan C. L. Yu 2013
© the chapters their several authors 2013
The moral rights of the authors have been asserted
First Edition published in 2013
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
ISBN 978-0-19-957374-5
Printed in Great Britain by
MPG Books Group, Bodmin and King's Lynn
Contents
Preface vii
Acknowledgements xii
Notes on Contributors xiii
References 285
Language Index 331
Subject Index 333
Preface
The content of this volume grew out of a workshop on phonologization held at the
University of Chicago in April 2008. The majority of the chapters in this
volume are based on papers presented at the workshop. To broaden the range of
perspectives represented, however, additional chapters were included.
The term 'phonologization', which Larry Hyman defined in 1976 as 'what begins as
an intrinsic byproduct of something, predicted by universal phonetic principles, ends
up unpredictable, and hence, extrinsic' (Hyman 1976: 408), gained prominence as a
result of the publication of Hyman's seminal article under the same name. As Hyman
reviews in his contribution to this volume, however, defining 'phonologization' is not
so straightforward given the complexity in delineating the boundary between what
is phonetic and intrinsic and what is phonological and extrinsic. He considers the
role of contrast in the phonologization process and suggests that the term 'phonol-
ogization' needs to be extended to cover other ways that phonological structure
either changes or comes into being. He ultimately concludes that phonologization is
but one aspect of the larger issue of how (phonetic, semantic, pragmatic) substance
becomes linguistically codified into form. Elizabeth Hume and Frédéric Mailhot, on
the other hand, seek to conceptualize the phenomenon of phonologization from the
perspective of information theory (Shannon 1948). In particular, they argue that
information-theoretic concepts such as entropy (which models a cognitive state of
the language user associated with the amount of uncertainty regarding the outcome
of some linguistic event) and surprisal (which is context-dependent and is associ-
ated with individual elements of the system) are useful tools for understanding how
external factors, individually and together, influence the progression of sound change.
Phonologization, for example, is predicted to preferentially affect elements linked to
extreme degrees of surprisal.
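For reference, the two information-theoretic quantities mentioned above have standard definitions (Shannon 1948); the formulation below is the textbook one and is not necessarily the exact notation Hume and Mailhot adopt:

```latex
% Surprisal of an individual outcome x: its context-dependent unexpectedness
s(x) = -\log_2 p(x)

% Entropy of a random variable X: the expected surprisal, i.e. the overall
% uncertainty about the outcome of the linguistic event
H(X) = -\sum_{x} p(x)\,\log_2 p(x)
```

On these definitions, an element with high surprisal is one that is improbable in its context, which is the sense in which phonologization is predicted to preferentially affect elements at extreme degrees of surprisal.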
Many issues are intertwined when discussing the phenomenon of phonologization.
As such, the task of arranging the chapters into coherent sections was made all the
more difficult. In the end, I have settled on four broad themes, corresponding to
different facets of phonologization research. It is important to point out, however,
that many chapters touch on themes that would have made them just as appropriate
under a different heading.
Much energy has been dedicated to understanding sound change by identifying
the very early inception of change, that is, the identification of perturbations of the
speech signal, conditioned by physiological constraints on articulatory and/or audi-
tory mechanisms, which affect the way sounds are analyzed by the listener. While
this emphasis on identifying the intrinsic variation in speech has provided important
insights into the origins of widely attested cross-linguistic sound changes, the nature
of phonologization has remained largely unexplored. Several factors, however, have
been implicated in the phonologization process, chief among them channel and
analytic biases (Wilson 2006; Zuraw 2007; Moreton 2008, 2010; Yu 2011). Channel
bias refers to the relative likelihood of a phonetic precursor to sound change becom-
ing phonologized into full-fledged sound patterns (e.g. Hyman 1976; Ohala 1993;
Lindblom et al. 1995; Hume and Johnson 2001; Blevins 2004). The four chapters in
Part II consider the nature of the channel bias. Andrew Garrett and Keith Johnson
review the state of the art of channel bias research, showing that most typologies of
sound change have drawn either a two-way distinction between changes grounded in
articulation and perception or a three-way distinction among perceptual confusion,
hypocorrective changes, and hypercorrective changes. Heike Lehnert-LeHouillier
explores the role of language-specific perceptual cues in sound changes involving
vowel length and tone/accent on the one hand, and vowel length and vowel height
on the other. Based on the results of a cross-linguistic perception experiment, which
tested the influence of a falling f0 and vowel height on the perception of vocalic
length, she argues that spectral differences (as acoustic correlates of vowel height)
are more tightly linked to the perception of vowel duration than f0 (as the acoustic
correlate of tone/accent). Sam Tilsen, on the other hand, focuses on the contribution
of motor planning in sound change. He argues that contrast-maintaining inhibitory
interactions during contemporaneously planned articulation play a role in contrast
maintenance on diachronic timescales and bias productions toward maximal con-
trast. Sound change is often assumed to result from listeners having little a priori
assumptions about the language to which they are exposed (e.g. Ohala 1993). Such
an approach emphasizes the role of first language acquisition in shaping the course
of phonologization. Chandan Narayan presents a survey of work addressing devel-
opmental processes and the nature of phonological systems and change. He argues
that the types of phonetic contrasts that infants fail to discriminate are those that
are rare in the world's sound systems, which is in part due to their fragile acoustic-
perceptual salience. He also surveys recent research into the fine-grained phonetics
of infant-directed speech in English, which shows acoustic conditions similar to those
targeted in well-known sound changes in the world's languages. These findings suggest
that the ambient language input to infants has the potential to provide the seeds of
phonological change.
Analytic biases are limitations in computation or markedness relations and con-
straints imposed by the Universal Grammar. An analytic bias might render certain
patterns difficult to acquire even from perfect learning data. The nature of ana-
lytic biases is a matter of much debate. The three chapters in Part III wrestle with
this debate. Abby Kaplan argues for the importance of phonological markedness in
shaping the nature of the lexicon. She examines two cases of 'underphonologiza-
tion', one where a phonetic pattern is known to influence phonological patterns, and
one where it doesn't. She concludes that phonology rather than phonetics directly
influences patterns of lexical frequency. While Kaplan argues for the primacy of
phonology over phonetics, Jeff Mielke argues that phonological features are deriva-
tive of phonetic effects that are phonologized into sound patterns. He measures
the crosslinguistic frequency of occurrence of classes defined by particular features
and examines the phonological behavior of these classes. The characteristic behav-
ior profiles of features suggest that different features behave differently (e.g. more
or less assimilation or dissimilation, different behavior of + and - values, etc.),
often because the need for a particular feature is dominated by a particular type
of phonetically-motivated phonological pattern (e.g. voicing assimilation for classes
defined by [voice] and [sonorant]). He argues that the prevalence of these charac-
teristic phonological patterns is best attributed to the phonologization of phonetic
effects.
Phonological patterns often show effects of non-derived environment blocking.
That is, some sound alternations only obtain at morphological boundaries but not
in non-derived environments. How phonetic precursors to sound patterns come to
be phonologized only at morphological boundaries has not been previously explored.
Rebecca Morley tests the ability of participants to learn an association that was con-
ditioned on a morphological boundary, but that consisted of acoustic information
that was sub-phonemic in nature (degree of nasalization on a pre-nasal vowel, which
is never contrastive in English), using an artificial grammar learning paradigm. The
results show that listeners are successful in learning the morphological association
with novel phonetic cues even over short time periods and that grammatical and sub-
grammatical components of the linguistic system have the ability to interact. These
results thus offer supportive evidence for a historical phonetic origin for phonological
processes that only apply (or only fail to apply) in derived environments.
Understanding the emergence of new speech norms requires more than under-
standing the constraints and biases that shape the trajectory of change. The phonetic
and systematic bias factors delineate the preconditions for change, but they do not
explain why a change emerges at a particular moment in history, in one community
and not others.
The last part of this volume contains chapters that address the issue of the social
and computational dynamics of variation and change, a crucial facet of the phonolo-
gization process. To bridge the gap between the emergence of new variants and their
eventual propagation, a linking theory is needed. Two perspectives are offered in
this volume. Alan Yu argues for the potential role systematic individual differences
in modes of speech perception may play in the initiation and propagation of sound
change. He contends that individuals with different cognitive processing styles, and by
extension, different social and personality traits, might arrive at different perceptual
and production norms in speech. He suggests that individuals who are most likely
to introduce new variants in a speech community (the 'innovators' à la Milroy and
Milroy 1985) might also be the same individuals who are most likely to be imitated
by the rest of the speech community due to their personality traits and other social
characteristics. Conversely, individuals with yet other cognitive processing styles and
personality traits might be more susceptible to the linguistic influence of others (the
so-called 'early adopters' à la Milroy and Milroy 1985) and might lead the early phase
of linguistic convergence. Andrew Garrett and Keith Johnson, on the other hand,
attribute the point of entry to differences in sociolinguistic awareness, that is, how
individuals may differ in how they assign social meaning to linguistic differences. They
hypothesize that some individuals in a language community, but crucially not others,
may attend to linguistic variation within their own subgroup but not to variation in
other subgroups. If such individuals become aware of a particular phonetic variant
in their subgroup, but are unaware that it is also present in other subgroups, they
may interpret the variant as a group identity marker, and they may then use it more
often.
While the fact that language change requires variation is undisputed, how vari-
ation leads to change is a matter of much debate. Three authors investigate the
diachronic dynamics of linguistic variation from a computational perspective. At the
level of phonetic cues, the phonologization process often results in transphonolo-
gization (Hyman 1976). That is, the phonologization of one phonetic cue is often
accompanied by the dephonologization of another. Given that most phonological
distinctions are supported by multiple phonetic cues, what factors determine which
cues are selected for phonologization and which cue should dephonologize? James
Kirby argues for the role of probabilistic enhancement in phonologization through
computational simulation of an ongoing sound change in Seoul Korean. He proposes
that cues are targeted for enhancement as a probabilistic function of their statisti-
cal reliability in signaling a contrast. Simulation results using empirically derived
cue values are taken to support the idea that loss of contrast precision may drive
transphonologization.
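Kirby's notion of statistical cue reliability can be illustrated with a toy computation (the cue names, distributions, and all numeric values below are hypothetical illustrations, not values from the Seoul Korean study):

```python
import random
import statistics

random.seed(1)

def reliability(samples_a, samples_b):
    """Accuracy of an idealized single-cue classifier: how well a midpoint
    threshold on this cue dimension separates the two categories."""
    thresh = (statistics.mean(samples_a) + statistics.mean(samples_b)) / 2
    correct = sum(x < thresh for x in samples_a) + sum(x >= thresh for x in samples_b)
    return correct / (len(samples_a) + len(samples_b))

# Hypothetical cue distributions for a two-way laryngeal contrast.
# Cue 1 (a VOT-like dimension) overlaps heavily between categories;
# cue 2 (an f0-like dimension) separates them well.
cue1_a = [random.gauss(20, 10) for _ in range(200)]
cue1_b = [random.gauss(25, 10) for _ in range(200)]
cue2_a = [random.gauss(100, 5) for _ in range(200)]
cue2_b = [random.gauss(120, 5) for _ in range(200)]

r1 = reliability(cue1_a, cue1_b)
r2 = reliability(cue2_a, cue2_b)

# Probabilistic enhancement: a cue is selected for enhancement with
# probability proportional to its reliability in signaling the contrast.
p_enhance_cue2 = r2 / (r1 + r2)
print(f"cue 1 reliability: {r1:.2f}")
print(f"cue 2 reliability: {r2:.2f}")
print(f"P(enhance cue 2) = {p_enhance_cue2:.2f}")
```

In a sketch like this, the poorly separating cue loses precision, and enhancement probability shifts toward the well-separating cue, caricaturing how loss of contrast precision on one dimension could feed transphonologization onto another.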
In addition to the transfer of linguistic contrast from one cue dimension to another,
phonologization often leads to the establishment of sound patterns. A prime example
is the emergence of vowel harmony from vowel-to-vowel coarticulation. Frédéric
Mailhot shows that the emergence of a categorical pattern of lexical harmony from
vowel-to-vowel coarticulation can be simulated using a simple model of a language
transmission/acquisition feedback loop iterated over multiple generations. The pro-
gression of sound change does not stop at the introduction of a new variant. Under-
standing the behavior of a new variant once it is introduced in the speech stream
is crucial to explaining the trajectory of sound change. From this perspective, it
is intriguing that linguistic systems are replete with cases where multiple variants
coexist within the system. Why do some new variants coexist with old ones, while
others take over and become the dominant patterns? Morgan Sonderegger and Partha
Niyogi explore this issue of stability of variation computationally, using dynamic
modeling. Through a case study of stress shift in English noun/verb pairs, they show
that changes in stability of variation (i.e. bifurcation in dynamic modeling) occur only
under certain models of learning by individuals in a linguistic population.
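The dependence of stability on the learning model can be caricatured in a minimal iterated-learning sketch (entirely hypothetical parameters and update rules, far simpler than Sonderegger and Niyogi's dynamical-systems models): a learning rule that exaggerates the majority variant drives the population deterministically to a single winner, whereas probability matching changes variant frequencies only by sampling drift.

```python
import random

random.seed(0)

def next_generation(p, n=500, boost=0.0):
    """One generation of learners: each cohort hears n tokens produced with
    variant-A probability p and adopts the frequency it heard. boost > 0
    pushes the estimate away from 0.5 (frequency boosting); boost == 0 is
    probability matching."""
    heard = sum(random.random() < p for _ in range(n)) / n
    heard += boost * (heard - 0.5)
    return min(1.0, max(0.0, heard))

p_match = p_boost = 0.6
for _ in range(200):
    p_match = next_generation(p_match, boost=0.0)
    p_boost = next_generation(p_boost, boost=0.1)

print(f"probability matching after 200 generations: {p_match:.2f}")
print(f"frequency boosting after 200 generations: {p_boost:.2f}")
```

With a finite population, probability matching is itself a random walk that may eventually drift to fixation; the qualitative contrast is that the boosting rule reaches fixation systematically rather than by chance, which is the kind of model-dependent (in)stability the dynamical analysis makes precise.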
Phonologization has emerged as one of the central topics in phonological research
in recent years. Many of the recent advances are made possible by researchers cross-
ing disciplinary boundaries and drawing on ideas from other research traditions to
address difficult questions previously thought unanswerable. The original call for
papers stated that the goal of this workshop is 'to facilitate collaboration among
phonologists as well as specialists from neighboring disciplines seeking unified the-
oretical explanations for the origins of sound patterns in language, as well as to
move toward a new and improved synthesis of synchronic and diachronic phonology'.
The present collection includes perspectives from phonetics, laboratory and theoret-
ical phonology, computer science, psycholinguistics, language acquisition, cognitive
neuroscience, cognitive and social psychology, and sociolinguistics. I hope that this
volume will serve as a stimulus to furthering the discussion and cross-pollination of
ideas.
This volume is dedicated to the memory of Partha Niyogi, a highly esteemed colleague
and a contributor to this volume, who passed away unexpectedly during the course
of preparation of the volume.
Chicago, IL Alan Yu
December 2011
Acknowledgements
Many thanks to the following reviewers of chapters for their valuable comments:
Adam Albright, Matt Carlson, Cynthia Clopper, Katie Drager, Edward Flemming,
Andrew Garrett, Peter Graff, David Harrison, Vsevolod Kapatsinski, Jelena Kri-
vokapić, Roger Levy, Lauren Hall-Lew, Björn Lindblom, Fang Liu, Alexis Michaud,
Andrew Nevins, Lisa Pearl, Anne Pycha, Yvan Rose, Joe Salmons, Ryan Shosted,
Morgan Sonderegger, Rachel Walker, Dominic Watts, Charles Yang, and Kie Zuraw.
Caroline Crouch and Alison Thumel also provided much-appreciated assistance with
preparing this manuscript. Thanks also go to Julia Steer and John Davey, linguistics
editors at Oxford University Press, for their continued support during the preparation
of this volume.
Notes on Contributors
ANDREW GARRETT is Professor of Linguistics and Nadine M. Tang and Bruce L. Smith
Professor of Cross-Cultural Social Sciences at the University of California, Berkeley,
where he also directs the California Language Archive. In historical linguistics he has
published on general topics in sound change and morphological change as well as the
dialectology, diversification, and prehistory of Yurok (an Algic language of California)
and Western Numic (Uto-Aztecan), the dialectology and diachronic syntax of English,
and the syntax and morphology of Anatolian, Greek, and Latin.
ELIZABETH HUME is Professor of Linguistics at the University of Canterbury,
New Zealand, formerly of the Department of Linguistics at The Ohio State University.
She has published on topics including consonant/vowel interaction, feature theory,
information theory and phonology, language variation, metathesis, markedness, seg-
mental structure, and the interplay of speech perception and phonology.
LARRY M. HYMAN received his PhD in Linguistics from UCLA in 1972. He taught
at the University of Southern California from 1971 to 1988. He came to Berkeley's
Department of Linguistics in 1988, which he chaired from 1991 to 2002. He has
worked extensively on phonological theory and other aspects of language structure,
concentrating on the Niger-Congo languages of Africa, especially Bantu. He has pub-
lished several books as well as over 120 articles in both theoretical and Africanist
journals.
KEITH JOHNSON is Professor of Linguistics and Director of the Phonology Laboratory
at the University of California, Berkeley. He has published two phonetics textbooks, a
textbook on quantitative linguistics, and two edited collections on speech perception
and phonology. His research focuses on the effects of phonetic and social experience
on speech perception.
ABBY KAPLAN is Assistant Professor (Lecturer) at the University of Utah. Her research
focuses on the phonology-phonetics interface, using a combination of experimen-
tal and corpus data to study the phonetic grounding of phonological patterns. She
received her PhD in 2010 from the University of California, Santa Cruz; her disserta-
tion research investigated the perceptual and articulatory basis of lenition.
FRÉDÉRIC MAILHOT received his PhD in Cognitive Science from Carleton University,
and now works in the Speech team at Google. He is interested in information-theoretic
and modeling-based accounts of sound change, as well as exemplar-based modeling
of generalization in phonological acquisition and use.
PARTHA NIYOGI was Professor of Computer Science and Statistics at The University
of Chicago. He obtained his undergraduate degree from IIT Delhi and SM and PhD
from MIT, and worked at Bell Laboratories before joining the University of Chicago.
His research spanned statistical inference, machine learning, speech and signal pro-
cessing, computational linguistics, and artificial intelligence. He wrote two books
(including The Computational Nature of Language Learning and Evolution) and many
journal and conference papers on these subjects.
MORGAN SONDEREGGER is a PhD candidate in Computer Science and Linguistics at
the University of Chicago. He received his BS from MIT and a master's degree from
Cambridge University. His research addresses stability and change in phonetics
and phonology, both within individuals and at the population level, using corpora
and computational and mathematical methods. He is also interested in quantitative
approaches to linguistics more generally, particularly phonetics, phonology, language
change, and sociolinguistics.
SAM TILSEN is Assistant Professor in the Department of Linguistics at Cornell Univer-
sity. He received his PhD from the University of California, Berkeley in 2009. He is
interested in how speech movements are represented, planned, and coordinated, with
What is phonologization?
1
Enlarging the scope
of phonologization*
LARRY M. HYMAN
"... the original cause for the emergence of all alternants is always purely anthropophonic"
Baudouin de Courtenay (1895 [1972a: 184])
1.1 Introduction
It is hard to remember a time, if ever, when phonologists were not interested in the
relation between synchrony and diachrony. From the very founding of the discipline,
a constant, if not always central issue has been the question of how phonology comes
into being. As can be seen in the above quotation from Baudouin de Courtenay, the
strategy has usually been to derive phonological structure from phonetic substance.
The following list of movements dating from the early generative period provides a
partial phonological backdrop of the wide-ranging views and interest in the relation
between synchrony and diachrony, on the one hand, and phonetics and phonology,
on the other:
(1) a. classical generative phonology (Chomsky and Halle 1968)
b. diachronic generative phonology (Kiparsky 1965, 1968; King 1969)
c. natural phonology (Stampe 1972, Donegan and Stampe 1979)
d. natural generative phonology (Vennemann 1972a, b, 1974; Hooper 1976a)
* Earlier versions of this chapter were presented at the Symposium on Phonologization at the University
of Chicago, UC Berkeley, the Laboratoire Dynamique du Langage (Lyon), MIT, SOAS, and the Univer-
sity of Toronto. I would like to thank the audiences there, and especially my colleagues, Andrew Garrett,
Sharon Inkelas, and Keith Johnson, for their input and helpful discussions of the concepts in this chapter.
Thanks also to Paul Newman and Russell Schuh for discussions on Chadic.
e. variation and sound change in progress (Labov 1971; Labov et al. 1972)
f. phonetic explanations of phonological patterning and sound change (Ohala
1974, 1981; Thurgood and Javkin 1975; Hombert, Ohala and Ewan 1979)
g. intrinsic vs. extrinsic variations in speech (Wang and Fillmore 1961; Chen
1970; Mohr 1971)
For some of the above scholars the discovery of phonetic and/or diachronic moti-
vations of recurrent phonological structures entailed the rejection of some or all of
the basic tenets of classical generative phonology, as represented by Chomsky and
Halle's (1968) Sound Pattern of English (SPE). As a generative phonologist, I found
myself conflicted between a commitment to the structuralist approach to phonology
as reflected in the Prague School (e.g. Trubetzkoy 1939; Martinet 1960) and in SPE,
and a desire to explain this structure in terms of its phonetic and historical under-
pinnings. The resolution I opted for was to focus on the process of phonologization,
which is concerned not only with these underpinnings, but also with what happens
to phonetic properties once they become phonological. Thus, although resembling
Jakobson's (1931) term phonologization (Phonologisierung), which is better translated
as phonemicization (whereby an already phonological property changes from allo-
phonic to phonemic), I intended the term to refer to the change of a phonetic property
into a phonological one. Definitions of phonologization from this period include the
following:
Since a clear distinction was not always made at the time between allophonic varia-
tions which might be captured by phonological rule and language-specific phonetics,
the two were often lumped together. The result is a potential ambiguity, depending on
whether one makes a distinction between allophonics and language-specific phonet-
ics and, if so, whether the latter is identified as 'phonology' or as phonetics.
I have two goals in this chapter. First, I wish to explore the above notion of phonol-
ogization further, specifically addressing the role of contrast in the phonologization
process. Second, I wish to show how phonologization fits into the overall scheme
of the genesis and evolution of grammar. Extending the concept of phonologization
to a wider range of phonological phenomena, I shall propose that it be explicitly
considered as a branch of grammaticalization or what Hopper (1987:148) refers to as
'movements toward structure'.
As seen, phonetics and phonology can have very different properties. As one pro-
ponent of the distinction puts it, 'The relationship of phonology to phonetics is
(4) [Ikalanga tonal examples a-d; tone diagrams not reproduced]
In (4b), however, the H of the L-H noun cl-po 'gift' spreads onto the pronoun, pro-
ducing a HL-H sequence. In (4c), the corresponding plural of (4a), there again is no
tone change, as expected, since the input is a L-L + L-H sequence. In (4d), the plural
and tonal correspondent to (4b), we do expect the H of -po to spread onto the plural
prefix zvn-, as it did in the singular in (4b). However, this does not occur, because the
voiced obstruent [zv] belongs to the class of depressor consonants which block H tone
spreading in Ikalanga. Since the depressor effect must be referred to by a categorical
phonological rule (H tone spreading) the second diagnostic has been met. As is well
known to Africanist tonologists, there is a tug-of-war between the natural tendency
for tone to spread vs. the intrinsic effects of consonants on pitch:
Since L-H and H-L tend to become L-LH and H-HL as a natural horizontal assimilation
[tone spreading], it can now be observed that the natural tendency of tones to assimilate
sometimes encounters obstacles from intervening consonants. Voiceless obstruents are adverse
to L-spreading, and voiced obstruents are adverse to H-spreading. The inherent properties
of consonants and tones are thus often in conflict with one another. In some languages (e.g.
Nupe, Ngizim, Ewe, Zulu), the consonants win out, and tone spreading occurs only when the
consonants are favorably disposed to it. In other languages (e.g. Yoruba, Gwari), the tones
win out, as tone spreading takes place regardless of the disposition of intervening consonants.
(Hyman 1973: 165-6)
In the terms of Archangeli and Pulleyblank (1994: 211), voiced obstruents are 'antag-
onistic' to H tone spreading, while other consonants are 'sympathetic'.
Two questions concerning what phonologization was (is) supposed to be are:
(i) Does 'intrinsic' mean unavoidable, i.e. 'universally present', or 'universal tendency'?
(ii) Does phonologization require that the phonetic feature of the trigger be con-
trastive? As mentioned earlier, it is widely accepted that one must distinguish between
universal and language-specific phonetics (Keating 1988,1990; Cohn 1993; Kingston
and Diehl 1994, etc.). What this means is that there are two diachronic reanalyses
which need to be recognized, as in (5):
(5)  a. universal phonetics  >  b. language-specific phonetics  >  c. phonology
        ('automatic')               ('speaker-controlled')             ('structured')
First, a perhaps unavoidable universal phonetic property takes on a language-specific
form which cannot be said to be strictly automatic or mechanical. The result is still
phonetic in the sense of (3), e.g. it may still be gradient rather than categorical. The
second diachronic reanalysis occurs when the language-specific property becomes
phonological in the traditional sense, i.e. structured, categorical.
This brings us to the question: What does it mean to be 'phonological'? This will
determine where 'phonology' begins in (5). For some, anything language-specific,
hence (5b), is phonology by definition: '... any rule, gradient or binary, phonologized
or categorical, to the extent that it appears in the grammar is fully phonological'
(Hajek 1997:16). The generative approach is to view phonology as a module of gram-
mar. However, there is a notoriously fuzzy boundary between postlexical phonology
(Kiparsky 1982) and phonetic implementation (Pierrehumbert 1980): 'The fact that
it is difficult to draw a line follows in part from the conception of phonologization
(Hyman 1976), whereby over time low-level phonetic details are enhanced to become
phonological patterns' (Cohn 2006: 30). Even some of the basic distinctions in (3)
have come under scrutiny. Cohn (2006) and Chitoran and Cohn (2009) consider the
possibility of categorical phonetics and gradient phonology, while Silverman (2006a:
214) apparently considers all of phonology to be gradient:
... there is no such thing as 'phonologization': at the proper level of description, all phonological
patterns are sound changes in progress, as they are all gradiently and variably implemented, and
they are all ever-changing... gradience and variation are the very stuff of phonology and sound
change...
If the boundary between phonetics and phonology is elusive, perhaps one can less
ambiguously characterize phonologization in terms of contrastiveness, the hallmark
of structuralist phonology. Here the central question is: What does it mean to be
'contrastive'? As summarized in (6), the term has been used to refer to different levels
of representation and to different domains:
Even if we limit ourselves to the quest for minimal pairs, hence words, it is still
necessary to distinguish between underlying and surface contrasts. Many of the exam-
ples of phonologization discussed in the 1970s concerned the 'redundant' effects of
contrastive features, e.g. [voice] in the following two examples:
(7a) concerns the oft-reported vowel length difference observed before voiced vs.
voiceless stops in English (see Purnell et al. 2005 for updated findings and more
subtle discussion). Since vowels are also longer before fricatives and sonorants, e.g.
gas [gæːs], man [mæːn], the process appears to be one of shortening before voiceless
stops (House 1961). Be that as it may, the durational differences are first phonologized
and then potentially phonemicized by final devoicing, as seen in the outputs. Concep-
tualized this way, the underlying voice contrast would correspond to a surface length
contrast in English.
The second case, (7b), has been much discussed in both the phonologization and
tonogenesis literature. Here we start with a H tone on syllables whose obstruent
onset differs in voicing. As seen, the intrinsic lowering effect of voicing on f0 is first
phonologized to create a rising tone on [b], whose consonant subsequently under-
goes devoicing. The result is a 'tonal bifurcation' whereby the rising tone becomes
phonemic.
Much of the work on phonologization concerns such cases of re- or transphonol-
ogization of contrasts (Jakobson 1931; Hagège and Haudricourt 1978). There are at
least two possible interpretations of the voicing effects on duration and f0. The first is
that the phonologizations in (7) represent an enhancement of phonetic voicing. The
second is that they instead enhance the phonological [voice] CONTRAST. The latter
view of phonologization is explicitly adopted by a number of researchers:
... because no other articulation is likely to produce the F0 depression as an automatic byprod-
uct, the depression must itself be a product of an independently controlled articulation, whose
purpose is to enhance the [voice] contrast. (Kingston and Diehl 1994: 425)
Enhancement of the type we are considering here can be considered as a form of 'fine-tuning'
of a basic phonological contrast. (Keyser and Stevens 2001: 287)
I will use the term phonologization throughout to mean specifically the innovation of changes
to phonological representations, whether these result in neutralization of contrasts or not.
(Barnes 2006: 16)
In (8b) these H tones become rising after [d] and [g], a phonologization which could
be seen as an enhancement either of phonetic voicing or of their contrast with /t/ and
/k/. The real question is what would happen in (8c), where /b/ is phonetically voiced,
but does not contrast with /p/. Would the redundant voicing of [b] have an f0 effect,
as shown, or would this phonologization be blocked because there is no contrast with
[p]? The phonological enhancement theories of Kingston and Diehl (1994) and Keyser
and Stevens (2001) would need to be tweaked by some notion of phonetic analogy
(Vennemann 1972a) if (8c) does develop the rising tone. On the other hand, (8c)
seems to be allowed, if not predicted, by Ohala's (1981, 1992, 1993b) theory of sound
change.
10 Larry M. Hyman
To account for the relation between consonant types and tone in synchronic phonologies,
Halle and Stevens (1971) and Halle (1972) proposed the following distinctive
feature analysis, where [stiff] = stiff vocal cords and [slack] = slack vocal cords:
(10)         tones           voiceless obstruents   sonorants    voiced obstruents
             H    M    L     p t k f s              m n l w y    b d g v z
     stiff   +    -    -     +                      -            -
     slack   -    -    +     -                      -            +
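Read as a data structure, (10) amounts to a small lookup from category to laryngeal feature values, with natural classes falling out from shared values. A minimal sketch (Python; the labels and function name are mine, not from the text):

```python
# Laryngeal feature values from (10): (stiff vocal cords, slack vocal cords).
FEATURES = {
    "H tone": ("+stiff", "-slack"),
    "M tone": ("-stiff", "-slack"),
    "L tone": ("-stiff", "+slack"),
    "voiceless obstruent": ("+stiff", "-slack"),
    "sonorant": ("-stiff", "-slack"),
    "voiced obstruent": ("-stiff", "+slack"),
}

def pattern_together(a: str, b: str) -> bool:
    """Two categories form a natural class iff they share both feature values."""
    return FEATURES[a] == FEATURES[b]

print(pattern_together("H tone", "voiceless obstruent"))  # True
print(pattern_together("L tone", "voiced obstruent"))     # True
print(pattern_together("M tone", "sonorant"))             # True
print(pattern_together("H tone", "voiced obstruent"))     # False
```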
As seen, both H tone and voiceless obstruents are [+stiff, -slack], while L tone and
voiced obstruents are [-stiff, +slack]. Both M tone and sonorants are [-stiff, -slack].
Like vowels, sonorant consonants readily accept any tone, while obstruents have the
tonal affinities indicated above. While these features are often assumed to this day,
there are additional complications, as noted in the observations in (11).
(11) a. The above three-way distinction is not sufficient for tone (there can be a
fourth or fifth contrastive pitch level).
b. The above three-way distinction is not complete for consonants (Hombert
1978), e.g.:
c. While the 'best' pitch depressors are fully or breathy voiced obstruents,
and although the phonetics of voice is complex (Kingston and Diehl 1994),
depressor consonants readily become unvoiced, e.g. in Nguni (Schachter
1976; Traill 1990; Downing 2009).
d. Prenasalized voiced stops [mb, nd, ŋg] are sometimes depressors, sometimes
not.
It is the observation in (11d) which potentially bears on the question with which we
are concerned: Is it phonetic voicing or enhancement of CONTRASTIVE [voice] that
causes depressor effects? The following quotations show that there is a widespread
belief that the voicing on depressor consonants is necessarily contrastive:
... F0 will only vary with the presence of voicing in stops that contrast for [voice].... (Kingston
and Diehl 1994: 436)
Since implosives and prenasalized stops are not contrastively voiced [in Suma], they are
assumed to be unspecified for the feature [voice] and, therefore, naturally excluded from the
depressor consonant group. (Bradshaw 1995: 263)
It should be emphasized that only phonologically voiced consonants, i.e. those opposed to
voiceless consonants of the same place and manner of articulation, exert a lowering effect
[in Yulu]; this is never the case for the phonetically voiced consonants of the (partially)
glottalized, prenasalized, nasal, continuant, and vibrant series. This state of affairs proves,
if proof were needed, the relevance of a phonological approach to articulatory units.
[translated from the French] (Boyeldieu 2009: 199n; emphasis my own)
While Bradshaw and Boyeldieu assume that implosives fail to lower pitch because they
are non-contrastively voiced, the prevalent view has been that rapid lowering of the
larynx and tensing of the vocal cords provide quite adequate phonetic explanations
for why implosives tend to pattern with voiceless obstruents and H tone.² On the
other hand, the ambivalent behavior of the voiced prenasalized stops [mb, nd, ŋg],
which are sometimes depressors and sometimes not, is indeed puzzling. The question
is whether their ambivalence has anything to do with contrastiveness.
As a practicing structuralist phonologist, my initial hypothesis was that /mb, nd, ŋg/
would function as depressor consonants only in languages where they contrast with
/mp, nt, ŋk/. In order to test this hypothesis, I examined the relatively small group
of African tone languages which have both depressor consonant effects and voiced
prenasalized consonants (ND), whether contrastive with their voiceless counterparts
(NT) or not. The results are presented in the following table:
² More recently, Tang (2008) has argued that the tonal effects of implosives can pattern with those
of voiceless obstruents, voiced obstruents, or sonorants in different languages. While implosives do not
contrast in voicing in these languages, it is yet to be determined to what extent these differences can be
attributed to differences in phonetic production. The same conclusion will be reached with respect to voiced
prenasalized stops.
As seen, three of the four logical combinations of the two properties ([±contrast],
[±depressor]) were found. Setting aside borrowings (see below), the only languages
with a /NT, ND/ contrast were the Bantu Nguni languages of Southern Africa. Of
the remaining languages, all of those in the upper right quadrant are Chadic, as are
the first four languages of the lower right quadrant. Yulu is Central Sudanic, Suma is
Ubangian, and the Mijikenda languages are Bantu.
From (12) we conclude the following: (i) If /ND/ contrasts with /NT/, /ND/ will
have the same f0 effects as /D/. (ii) If ND does not contrast with NT, ND may have
the same f0 effects as /T/ or /D/. As mentioned, the first group consists solely of the
Nguni languages, e.g. Swati:
In all cases [in Swati], the prenasalized counterparts of depressor consonants are themselves
depressor consonants, while the prenasalized counterparts of non-depressor consonants are
themselves nondepressors. (Schachter 1976: 213)
It may be relevant to note that the Nguni languages have a rule of postnasal
deaspiration (NTʰ > NT). The alternations in (13) illustrate the application of this rule in
Ndebele (Pelling 1971; Galen Sibanda, pers. comm.):
(13) a. u-phondo  'horn'        pl. im-pondo    cf. impisi 'hyena'
        u-phawu   'sign, mark'  pl. im-pawu         imbizi 'pot, pan'
     b. u-thango  'fence'       pl. in-tango    cf. intaba 'hill, mountain'
        u-thungo  'rafter'      pl. in-tungo        indaba 'matter, news'
     c. u-khuni   'firewood'    pl. iŋ-kuni     cf. inkalo 'waists, hill passes'
        u-khalo   'waist'       pl. in-kalo         ingalo 'arm'
As seen in the forms to the right, this distributional constraint produces (near) minimal
pairs involving unaspirated [mp, nt, ŋk] vs. voiced [mb, nd, ŋg]. The latter's
depressor effect on tone may therefore be a welcome cue for the voicing contrast.
It is interesting to note in this context that a much larger group of Bantu languages
have a rule of postnasal aspiration (NT > NTʰ), e.g. Mwiini, Zigula, Pokomo, Pare,
Shambala, Ngulu, Bondei, Namwanga, Chichewa. This process may then lead to the
(14)                   H     M          L
     ND = depressor    T     N          ND   D
     ND ≠ depressor    T     N    ND    D
The problem is that we do not know what the intrinsic effects of ND on f0 really are.
The hierarchy in (14) suggests that ND has more of a depressor effect than N, but less
than D. We don't really know this other than from the phonological facts, which are
inconsistent. What are needed are instrumental studies of ND in languages which have
not phonologized depressor consonant effects. We need to do this both for languages
which have a phonetic NT/ND contrast, e.g. Luganda, and those which don't, e.g. Kinande,
ultimately establishing what the intrinsic effects of ND are expected to be even in
non-tone languages.
The fourth and last account seeks an explanation in terms of contrast, but in the
absence of/NT/ suggests that it is a different contrast that is being enhanced: /ND/ vs.
/N/. Languages which treat ND as a depressor do so to distinguish it further from N.
Particularly if the oral phase is minimal, there could be perceptual confusion between
ND and N, and hence transphonologization via the tone of the following vowel. Such
has happened in Masa, a Chadic language closely related to Musey. While /H/ tone
can occur after any consonant, there is a (near-) predictability of L vs. M tones as in
(15) (Caïtucoli 1978: 77):
As seen in (15a), L tone appears after a voiced obstruent, while M tone appears if
the root-initial segment is a voiceless obstruent, an implosive, or an oral sonorant,
including vowels. While several Chadic languages have similar distributions of L and
M tones, the originality of Masa is that it has a L vs. M contrast after nasals. The
reason, of course, is that there has been a sound change of *mb, *nd, *ŋg > m,
n, ŋ, with the original contrast being transphonologized in terms of L vs. M pitch.
Crucially, those roots which had historical *ND now have L tone, while those which
began with *N have M tone. Since closely related Musey treats ND as a depressor
(cf. (12)), we can be reasonably certain that the same was true in pre-Masa before
the prenasalized consonants lost their oral release. While we cannot predict which
nasals will be depressors, it is possible to say that contrastive [+voice] necessarily
conditions L tone: 'Mid tone is incompatible with voiced consonants that have a
voiceless counterpart...' [translated from the French] (Caïtucoli 1978: 77).
[feature trees: [+prenasal] vs. [-prenasal]; [+voice] vs. [-voice]]
... if the last consonant in a word is an obstruent, it must be followed by /a/, whereas if the
last consonant is a sonorant, nasal, it cannot... Here, prenasalized consonants pattern with
obstruents (gmbd 'gourd' vs. gwgm 'dove').
³ Louis Goldstein has suggested to me that when the voicing of ND is non-contrastive, speakers need
not invoke articulatory mechanisms that result in lowered pitch, whereas such mechanisms are unavoidable
when there is a contrast with NT. It is significant that all of the examples cited by Lee (2008) involve
depressor consonants whose voicing is contrastive. Most striking is Tsonga (Baumbach 1987), where NDs
do not contrast with NT and are not depressors, but their contrastive breathy counterparts ND̤ are. In such
a case, there is a disincentive for ND to exploit the gesture(s) which result in the lowering of f0. Thanks to
both Louis Goldstein and Maria-Josep Solé for helpful discussions of these matters.
(21)
             underlying vowels    derived vowels
             i    u    ɛ    ɔ    a    e    o    ə
   ATR   A   x    x                   x    x
   Front F   x         x              x
   Round R        x         x              x
   Open  O             x    x    x    x    x
As seen, the postradical process /a/ > [ə] would be interpreted as the deletion of the
Open feature (which technically yields [ɨ], from which Punu [ə] is non-distinct).
The crucial point concerns the assimilation of /ɛ/ and /ɔ/ to [e] and [o] before /i/
and /u/. This clearly has to be viewed as a phonologization of the common tendency
to tense mid vowels when they are followed by a high vowel in the next syllable.
However, it can be observed from the feature specifications in (21) that the ATR
feature, although active, is non-contrastive on the input vowels: without ATR, /i/ and
/u/ would still be distinct from /ɛ, ɔ, a/ in not having an Open feature. Thus, the tensing
process involves the phonologization of a non-contrastive feature.
Recall from section 1.2.1 that we allowed for the possibility that non-contrastively
voiced ND might exert a depressor effect by virtue of its contrast with plain nasals. It is
hard to make a similar case for Punu. Since post-radical /i/ and /u/ contrast only with
/a/, which is realized as [ə], there seems to be little, if any, need to enhance this highly
redundant, minimal contrast. In fact, there are additional processes which further
obscure post-radical vowels. The first two in (22a, b) concern R- and F-VH, while
/a/-reduction is repeated in (22c).
(22) a. a, i > u / __ Cu             i     u     a
     b. a > i / __ Ci            i   i-i   u-u   i-ə
     c. a > ə                    u   u-i   u-u   u-ə
The rules in (22a, b) result in considerable loss of contrast. As seen in the distribu-
tions to the right, nine phonological inputs result in only six distinct outputs. What's
worse, when /CɛC-aC-i/ and /CɛC-aC-u/ are realized as CeCiCi and CeCuCu, the
input /a/ is no longer recoverable. The inescapable conclusion is that phonologiza-
tion is not necessarily triggered by contrastiveness, nor does it necessarily lead to
transphonologization (cf. Blevins 2004: 43). While Punu may ultimately develop an
underlying seven- (or eight-) vowel system, the mid-vowel ATR harmony appears to
have been phonologized as a 'mere' articulatory convenience!
In the following section we will extend these findings to other phonological phe-
nomena and then turn to their relation to grammaticalization in general.
I would like to suggest that the 'pronunciation in isolation' form of a word is its lexical repre-
sentation. At the pause... words may undergo phonetic modifications; in particular, final oral
stops may become unreleased as in English and thereby lose their aspiration, and vocal cord
vibration may cease early, leading to devoicing. Since they occur at the pause, and the ad-
pausal variants are registered in the lexicon according to my proposal, these ad-pausal variants
may next appear in connected speech and may cause or undergo further changes in their new
context. (Vennemann 1974: 364)
Even a cursory glance over (23a-f) will reveal that contrastiveness is involved in
some aspects of the above phonological issues, but not others. Thus, it has long been
observed that syllable structure is never underlyingly contrastive; its very redundancy
or predictability in fact kept syllable boundaries (and syllable constituents) out
of early generative phonology:
One argument which has been raised against phonological syllables is that, unlike segments,
the location of a syllable boundary within a morpheme can never be phonemic. That is, two
morphemes such as /a$pla/ and /ap$la/ cannot differ only in their syllable structure.... Because
syllable boundaries can be determined automatically from universal principles and language-
specific facts about the segments contained in the syllables, generative phonologists have largely
worked under the assumption that the syllable is unnecessary in phonology. (Hyman 1975:192)
The syllable would thus appear to have more of an organizational function than
a contrastive one, as presumably do also the metrical foot and higher-level prosodic
domains. The phonologization of prepausal effects is perhaps less clear. It is tempting
to interpret languages which insert prepausal glottal stops as having phonologized
utterance-final creakiness, as in British English (Henton and Bladon 1988):
... final GS may be conditioned by a number of disparate factors from all parts of the grammar.
Since the common denominator appears to be 'before pause in declarative utterances', it is
tempting to conclude that such GS's result, historically, from the PHONOLOGIZATION of an
intrinsic variation in the speech signal. In the case of prepausal vowels, the speaker is expected
to cease voicing with the completion of the vowel. When GS is not present, this cessation is
smooth, in many cases giving the impression of a final slight breathiness. On the other hand,
when GS is present, the cessation of voicing is abrupt, giving the impression of a non-syllabic
articulation, i.e. a final 'consonant'. (Hyman 1988: 124)
While some languages suspend the final glottal stop in questions, suggesting a con-
trastive function between declaratives and interrogatives, the situation can be much
more complex. Thus, in Dagbani (Gur; Ghana), a prepausal glottal stop is inserted if
a complex set of conditions is met (Hyman 1988: 122):
In fact, final glottal stops do not always derive from prepausal phonologizations. In
certain Akan and Guang languages to the south of Dagbani glottal stops transparently
derive from apocope:
In Tikar, glottal stops are restricted to prepausal position (Jackson and Stanley 1977,
Stanley 1991). As proposed in Hyman (2008b), these final glottal stops result from the
debuccalization of coda *t and *k, which are realized as glottal stops before a pause,
but as Ø before a consonant. As part of the process, back vowels were fronted before
*t, while front vowels became backed before *k, hence transphonologizing the F2
properties of the two coda consonants as per Thurgood and Javkin (1975).
Concerning boundary narrowing, although Luganda must originally have short-
ened bimoraic long vowels before pause, present-day final vowel shortening is subject
to a number of complex factors and no longer requires pause (Hyman and Katamba
1990). It seems that while contrast can become implicated in a phonologization pro-
cess, it is typically not the driving force of the phenomena enumerated in (23). If the
analysis of Punu in (21) is correct, even a redundant distinctive feature, e.g. ATR, may
first become activated for allophonic effects and only later become contrastive.
While a phonetic motivation has been assumed for all of the phonologizations in
sections 1.1 and 1.2, at least some of the phonological properties in (23) raise the
question of whether phonetics is the only source of phonology, i.e. the only input to
phonologization. At least three other sources of phonology have been proposed in
the literature. First, phonology has been claimed to occasionally arise from frequency
distributions:
... it is possible for a phonological generalization to arise from frequency distributions in the
lexicon rather than from pure coarticulation effects. However, the former type are much less
frequent, since the conditions for coarticulation effects are always present in spoken language.
(Bybee 2001: 94-5)
Second, certain phonological properties have been said to derive from analogical
processes:
... new phonemes can arise through morphophonemic analogy.... In all such cases... no new
distinctive features are added.... morphophonemic genesis merely leads to a combination of
distinctive features which had not previously been used. (Moulton 1967: 1405)
... phonetically unnatural patterns can also arise by analogical processes. Since they are
phonetically unnatural, they do not have purely phonological origins, but reflect instead the
generalization of fortuitous morphological patterns... even the most regular morphophonological
patterns may lack phonetic origins. (Garrett and Blevins 2009: 543)
(27) phonetic > phonologized > phonemicized > morphologized > lexicalized >
LOSS
(28) pragmatic > syntactic > morphological > morphophonemic > lexical > LOSS
As seen, Givón was primarily concerned with the development of syntax from pragmatics,
which he refers to as 'syntacticization'. Once a property has become syntactic,
it can then become morphological, as when an original independent word becomes
a concatenated affix, perhaps with phonological reduction or erosion. Givón's
morphophonemic stage arises when the original source is obscured, ultimately producing
a phonological alternation which is morphologically conditioned or morphologized.
This alternation may then become lexicalized and lost as in (26).
While phonology plays a role in Givón's view of the rise and fall of grammar, he
is mainly interested in the first three stages of (28), for which he had established
the mantra, 'Today's morphology is yesterday's syntax' (Givón 1971: 413). In fact,
the parallel in (29) is something that phonologists readily acknowledged during this
period:
... it is... very much part of the business of phonologists to look for 'phonetic explanation'
of phonological phenomena just as when syntacticians look for pragmatic
accounts of aspects of sentence structure, the reason is to determine what sorts of facts
the linguistic system proper is not responsible for... (Anderson 1981: 497)
Phonetics provides much of the substance of phonology, and pragmatics provides much
of the substance of syntax. However, the ever-present phenomena of phonologization
and grammaticalization cannot be explained by reference to the origin of the substance.
(Hyman 1984: 83)
Example (31b) shows that when informational or contrastive focus is placed on the
adverb n 'today', it appears in the IAV position that would otherwise be occupied
by the direct object. Similarly, when the subject is in focus in (31c), it too appears
in the IAV position, with an expletive subject holding its place. WH-elements also
normally go in the IAV position, as expected, as do other constituents of the sentence,
particularly when they are singled out for exclusive focused information.
The above examples are intended to illustrate the similarities involved in quite
different domains when 'substance' becomes grammaticalized as 'form': The phonol-
ogization of phonetics and the syntacticization of pragmatics are exactly parallel.
Interestingly, reinforcement of a paradigmatic contrast, which has been assumed in
enhancement versions of phonologization and transphonologization, does not seem
applicable here. When the grammar requires a subject to be definite or a focused
element to appear in the superficial object slot, there is the suppression of a paradig-
matic contrast in the one case (subjects no longer contrast in definiteness) vs. the
establishment of a syntagmatic contrast in the other. (To simplify considerably, an
element in the IAV is in a privileged position vis-à-vis other elements in the sentence.
For recent statements on the IAV and focus in Aghem, see Hyman 2010b and Hyman
and Polinsky 2009.)
Having established that phonologization bears resemblance to Givn's syntacticiza-
tion, it seems reasonable to incorporate it under the general heading of grammatical-
ization. In (33) I have added phonologization at the bottom of the list of the common
linguistic effects of grammaticalization presented by Heine et al. (1991: 213):
1.4 Conclusion
In the preceding sections I have established that phonologization need not involve
contrast, nor even be limited to cases where something phonetic becomes phono-
logical. Taken literally to mean 'the processes by which phonology comes into being',
phonologization becomes one branch of the more general phenomenon of grammat-
icalization: 'the processes by which grammar comes into being', i.e. Hopper's 'move-
ments toward structure'. Unfortunately this is not the usual meaning of 'grammatical-
ization', which often refers to the historical development of grammatical morphemes:
'Grammaticalization consists in the increase of the range of a morpheme advancing
from a lexical to a grammatical or from a less grammatical to a more grammatical
status.' (Kuryłowicz 1965 [1972: 52], cited by Heine et al. 1991: 3). Thus, the linguistic
effects of grammaticalization indicated above in (32) mostly have to do with what
happens when a lexical morpheme (e.g. a word) becomes a grammatical morpheme
(e.g. an enclitic or affix). In my use of the term, grammaticalization refers more gener-
ally to the development of any aspect or component of grammar (syntax, morphology,
phonology).
This is but one of two terminological problems. The first is that there is no generally
accepted term meaning 'conversion of substance to form'. While 'grammaticalization'
would have been an excellent and transparent choice, it has been pre-empted
for specific phenomena, namely, the creation of grammatical morphemes.
Other terms I have heard are either inexplicit or awkward, e.g. codification, cod-
ing strategies, linguistification, grammatogenesis, movements toward structure. The
second terminological problem is that terms such as phonologization, grammati-
calization, syntacticization, lexicalization, etc. are potentially ambiguous, since they
only indicate the end product, not the source. This issue arose in the discussion in
section 1.3.1 of whether the possible development of phonology from non-phonetic
sources should be included under phonologization. As has been pointed out by others,
alternative terminology might instead refer to the source, hence dephoneticization,
dephonologization, demorphologization, etc. (Dressier 1985; Janda 2003; Joseph and
Janda 1988).
I would like therefore to conclude by making the following modest and totally
impractical proposals: (i) We should create terms which indicate both the input and
the output of the process, (ii) The input should be indicated by the prefix de- (indi-
cating a change in status) or re- (indicating a restructuring with the same status), (iii)
The output should be indicated by a prefix placed on the base -grammaticalization
(or -grammatogenesis?). (iv) Grammaticalization should be taken to mean that the
output is grammar, whether phonology, morphology, or syntax. With these propos-
als, a systematic terminology of a catalogue of different types of grammaticalization
(in the broader sense) might look like (34).
... the concerns of Grammar... are not derivable from extragrammatical factors. (Hyman
1984: 71)
"What are the laws of motion but the expectations of reason concerning the posi-
tion of bodies in space? We are thus justified, not only in saying that all complete
knowledge involves anticipation, but also in affirming that all rational expectation is
knowledge." (Hitchcock 1903: 673)
2.1 Introduction
Traditionally, the term phonologization has been used to describe a diachronic
change within a given language system from a state of phonetic variation to that of
phonological generalization (Hyman 1976). More specifically, we take this to mean a
diachronic shift from variation across a large number of uncorrelated dimensions to
correlated variation of lower dimensionality. Such transitions are relevant both to the
creation of new categories and patterns (e.g. phoneme, stress pattern), as well as to the
change from one existing category into another. Many factors external to a language's
grammatical system have been shown to play an influential role in this process. Some
of these external factors are listed below (for relevant discussion see Archangeli and
Pulleyblank 1994; Blevins 2004; Bybee 2001; Culicover and Nowak 2002; Davidson
2007; Guion 1998; Hayes and Londe 2006; Hume and Johnson 2001a; Hyman 1976;
Joseph and Janda 2003; Jeffers and Lehiste 1979; Lindblom 1990; Moreton and
* We owe a debt of gratitude to Kathleen Currie Hall, Dahee Kim, Adam Ussishkin and Andrew Wedel
for much lively discussion regarding the ideas in this chapter. We would also like to thank the following
people for their input on aspects of this research: Paul Boersma, Chris Brew, Joan Bybee, Jennifer Cole,
Peter Culicover, Alex Francis, John Goldsmith, John Hale, Ilana Heintz, Robert Kirchner, Kate Kokhan, Jeff
Mielke, William Schuler, Andrea Sims, Rory Turnbull, Mike White, Alan Yu, members of the Ohio State
phonetics/phonology and socio-historical linguistics discussion groups, and two anonymous reviewers.
30 Elizabeth Hume and Frédéric Mailhot
Thomas 2007; Ohala 1981, 1993b, 2003; Peperkamp, Vendelin and Nakamura 2008;
Yu 2007, inter alia).
Grammar-external factors influencing phonologization include:
prone to change than elements occurring away from the extremes. Phonologization is
thus predicted to preferentially affect elements linked to extreme degrees of surprisal,
i.e. that have a small entropic contribution. Interestingly, while the mechanisms that
affect elements with very low or very high surprisal may differ, they pattern together
in being prone to change given their low contribution to predicting outcomes in a
system.
The current approach also speaks to the nature of change. Unstable elements with
high surprisal are biased to change in the direction of a similar element or pattern
with lower surprisal, consistent with observations regarding analogical change (see
e.g. Phillips 2006; Wedel 2007). In other words, change affecting high surprisal ele-
ments is predicted to preserve structures that the speaker-hearer is already familiar
with. Conversely, as developed below, change in patterns with low surprisal need not
be structure preserving, and such patterns are typically prone to production-based
reduction processes (Bybee 2001), which can introduce novel patterns into a speaker-
hearer's linguistic system.
Before delving into these points in more detail, we define the information-theoretic
concepts of surprisal and entropy more rigorously, then briefly discuss the cognitive
state modeled by surprisal, which we call 'expectedness'. With this groundwork in
place, we turn to the heart of the chapter: the relevance of entropic contribution and
surprisal for phonologization and language change. Section 2.3 outlines in general
terms the effects of surprisal on language systems. The section also focuses on the
linguistic consequences of two key properties of our approach: instability and bias. In
doing so, we take a closer look at the potential for a given element to undergo change
or be the outcome of change given the degrees of surprisal associated with it.
(1) S(X = xᵢ) = −log₂ P(X = xᵢ)

where X is an event ranging over a set of possible outcomes {x₁, x₂, ..., xᵢ, ...}, each
with an associated probability P(X = xᵢ). In the general case, these probabilities are
defined contextually, e.g. phonologically, morphologically, etc.
Figure 2.1 illustrates the relation between probability and surprisal. Surprisal varies
continuously between zero and positive infinity; the occurrence of a highly likely
event (e.g. observing some vowel in a context where it is the only permissible one)
has low surprisal, while a highly unlikely event (e.g. observing some phonotactically
prohibited sequence of segments) has high surprisal. This reflects the intuition that the
occurrence of improbable events is highly surprising, while the occurrence of highly
likely events is not surprising.
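The definition in (1) is a one-liner in code. The sketch below (Python; the function name is mine) reproduces the behaviour just described:

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits of an outcome with probability p: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

print(surprisal(0.99))  # ~0.014 bits: a near-certain outcome is unsurprising
print(surprisal(0.01))  # ~6.64 bits: an improbable outcome is highly surprising
print(surprisal(1.0))   # 0.0: certainty carries no surprisal
```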
As noted above, an element's contribution to the uncertainty (i.e. entropy) asso-
ciated with predicting the outcome of an event is its probability multiplied by its
surprisal, as given in Equation 2,
(2) Hc(xᵢ) = −P(X = xᵢ) log₂ P(X = xᵢ)

where X, as above, is an event whose outcome can take one of several values in
the vocabulary set V_X (e.g. outcomes of X could be any vowel in a language under
consideration), P(X = xᵢ) is the probability that outcome xᵢ will be observed, and the
quantity −log₂ P(X = xᵢ), as discussed above, is the surprisal of outcome X = xᵢ. We
label Hc(xᵢ) the entropic contribution of xᵢ.
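The entropic contribution can be sketched the same way (Python; function name mine). Note that the contribution vanishes for both near-impossible and near-certain outcomes and peaks in between, at p = 1/e ≈ 0.37, which is the behaviour plotted in Figure 2.3:

```python
import math

def entropic_contribution(p: float) -> float:
    """Hc(x) = -p * log2(p): one outcome's contribution to the entropy."""
    if p == 0.0:
        return 0.0  # limit of -p * log2(p) as p -> 0
    return -p * math.log2(p)

print(entropic_contribution(0.001))       # ~0.010: near-impossible, tiny contribution
print(entropic_contribution(0.999))       # ~0.001: near-certain, tiny contribution
print(entropic_contribution(1 / math.e))  # ~0.531: the maximum, at p = 1/e
```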
¹ We follow convention here and use a logarithmic base of 2, which allows us to express surprisal and
entropy in units of bits. Using a different logarithmic base is equivalent to a multiplicative scaling.
² Formally, a random variable.
2. Entropy and surprisal in phonologization and language change 33
(3) H(X) = Σ_{xᵢ ∈ V_X} Hc(xᵢ) = −Σ_{xᵢ ∈ V_X} P(X = xᵢ) log₂ P(X = xᵢ)
Probabilistic notions are clearly relevant to the study of language acquisition, use,
change and representation, as discussed in works such as: Bod, Hay, and Jannedy
(2003); Boersma and Hayes (2001); Bybee (1985, 2001); Coleman and Pierrehumbert
(1997); Frisch, Pierrehumbert and Broe (2004); Goldsmith (2007); Greenberg (1966);
Hooper (1976b); Hume (2004a, b); Jurafsky et al. (2001); Phillips (1984, 2006); Luce
and Pisoni (1998); Pitt and McQueen (1998); Trubetzkoy (1969); Vitevitch and Luce
(1999); Zipf (1932), inter alia. Hence, the cognitive state modeled by surprisal correlates
with probability. The notion 'probability' here may be approximately equated
to subjective degree of belief, as in a Bayesian approach to cognition (Pearl 1988;
Chater, Tenenbaum and Yuille 2006), in which prior states of knowledge are taken
into consideration when computing the probability of some future event or state.
To illustrate, consider a hypothetical language ℒ, with the following vowels:
V_ℒ = {i, e, a, o, u, ə}. We wish to compute the entropy of ℒ's system of vowels;
more specifically, we want a measure of the amount of uncertainty associated with e.g.
predicting the observation of some vowel in a given phonological context, an event
we label L. First we take the case where each vowel is assumed to be, ceteris paribus,
equiprobable; then each v ∈ V_ℒ has a probability of observation of 1/6. The
entropy computation is then as follows:
(4) H(L) = -Σ_v P(L = v) log2 P(L = v) = -6 × (1/6) log2(1/6) = log2 6 ≈ 2.585 bits
Of course, since the entropy of a system is its average surprisal, and each vowel in
this case has the same surprisal value (since they are equiprobable), the entropy of
this system is equal to each vowel's surprisal. To illuminate the relationship between
surprisal and entropy more clearly, we can examine how the entropy of this system
changes as we alter the probability estimates for particular vowels. As a simple initial
case, assume that one vowel, e.g. {ə}, is more probable in some context than the others,
which are all equiprobable. For concreteness let us assume that the probability of
observing a schwa, P(L = ə), is 3/8, hence the surprisal S(L = ə) = -log2(3/8) ≈ 1.4.
Then the surprisal of observing any of the remaining vowels is S(L = v ≠ ə) =
-log2(1/8) = 3. The entropy of the system under this distribution is then
(5) H(L) = -(3/8) log2(3/8) - 5 × (1/8) log2(1/8) ≈ 0.53 + 1.88 = 2.41 bits
Note that the entropy in this case is lower than when all vowels are equiprobable. This
is because there is now less uncertainty about which vowel will occur in the context
under consideration, due to schwa's higher probability of observation. We state here
without proof the theorem that the entropy of a system is maximized when all of its
outcomes are equally probable (Shannon 1948: 11).
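The calculations in (4) and (5) can be sketched computationally. The following Python snippet is our own illustration (not part of the chapter): it recomputes the entropy of ℒ's vowel system under the uniform and skewed distributions discussed above.

```python
import math

# The hypothetical six-vowel language ℒ from the text.
vowels = ["i", "e", "a", "o", "u", "ə"]

def surprisal(p):
    """Surprisal in bits: -log2 of an outcome's probability."""
    return -math.log2(p)

def entropy(dist):
    """Entropy = average (probability-weighted) surprisal."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# Case 1, as in (4): all vowels equiprobable, P = 1/6 each.
uniform = {v: 1 / 6 for v in vowels}
print(round(entropy(uniform), 3))  # 2.585 bits, i.e. log2(6)

# Case 2, as in (5): schwa has P = 3/8, the remaining five vowels
# share the rest equally (P = 1/8 each).
skewed = {v: (3 / 8 if v == "ə" else 1 / 8) for v in vowels}
print(round(entropy(skewed), 3))  # 2.406 bits: lower, as the text notes
```

Note that the skewed system's entropy is lower than the uniform system's, matching the observation that concentrating probability on one outcome reduces overall uncertainty.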
Consider finally a slight generalization of the previous case, where we examine
all possible values for the probability of schwa occurring in some context, assuming
the remaining vowels are equiprobable. In lieu of additional calculations of entropy,
consider the graphs in Figure 2.2 and Figure 2.3: the first is of the entropy of ℒ's vowel
system versus the probability of observing schwa, the second is of schwa's contribution
to the entropy of ℒ versus its probability of observation.
Note that entropic contribution goes to zero in Figure 2.3 for both low and high
probabilities. That is, outcomes known to be either (near) certain or (near) impossible
contribute little to the entropy of the system. As will be discussed further below, the
fact that surprisal extremes contribute little to system entropy is crucial to our model
FIGURE 2.2 Entropy of ℒ's vowel system, as a function of the probability of observing {ə}, assuming equiprobability of other vowels
FIGURE 2.3 Contribution of {ə} to the entropy of ℒ, as a function of its probability of observation, assuming equiprobability of other vowels
of phonologization. In Figure 2.2, the entropy of the system does not go to zero for
P(L = ə) = 0, since there is still maximal uncertainty about which of the remaining five
vowels will be observed. Before turning to the details of our model, we discuss more
specifically the measures relevant to the calculation of surprisal.
(6) H_c(v_i) = -P(L = v_i) log2 P(L = v_i)
A segment's entropic contribution, H_c(v_i), provides a measure of the degree to which
that element is a factor in ℒ's effectiveness as a system of communication.
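The shape of the entropic contribution function, and in particular its vanishing at both probability extremes (cf. Figure 2.3), can be sketched directly. The snippet below is our own illustration, not the authors' code.

```python
import math

def entropic_contribution(p):
    """H_c(p) = -p * log2(p): an outcome's share of system entropy.

    Defined as 0 at p = 0 and p = 1, the limits of -p log2 p.
    """
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p)

# Near-certain and near-impossible outcomes contribute almost nothing
# to entropy, while mid-range probabilities contribute the most; a bit
# of calculus shows the contribution peaks at p = 1/e ≈ 0.368.
for p in (0.001, 0.1, 1 / math.e, 0.9, 0.999):
    print(f"p = {p:.3f}  H_c = {entropic_contribution(p):.4f}")
```

Running this shows H_c near zero at p = 0.001 and p = 0.999, and largest at p = 1/e, which is the point made in the text: surprisal extremes contribute little to system entropy.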
How the various factors interact and contribute to the overall surprisal associated
with a particular system is an important line of research that lies beyond the scope of this
chapter (though see Hume et al. 2011). As we discuss below, however, it is surprisal
extremes that are of particular relevance to the present discussion, since elements
at these ends are least stable and thus good candidates for phonologization. In this
regard, it is reasonable to assume that extreme degrees of surprisal typically arise when
the impact of several factors point to a common end of the continuum, although
a single factor could potentially contribute sufficiently to determine the surprisal
on its own.
One might ask why we need to talk about surprisal and entropic contribution,
rather than simply limiting our discussion to probability itself. We can think of at
least three reasons. First, although it is a formal measure, the quasi-metaphoric term
'surprisal' helps to evoke and preserve the intuition that we are discussing human cog-
nition, and the impact of (socio)cognitive factors on phonologization and language
change. Second, surprisal is a key component of the entropy of a set of possible
outcomes (e.g. in a linguistic system), and it is the notion of entropy that allows us
to provide a unified account of those elements that are prone to change. Third, Hume
et al. (2011) show that probabilities based on confusability and frequency alone cannot
predict the quality of the epenthetic vowel or deleted vowel in French. Rather, it is the
entropic contribution based on these combined measures that correctly predicts the
observed patterns.
Expectation refers to the cognitive function that helps fine-tune our minds and bodies to
upcoming events... The biological purpose of expectation is to prepare an organism for the
future... The capacity for forming accurate expectations about future events confers significant
biological advantages. Those who can predict the future are better prepared to take advantage of
opportunities and sidestep dangers. Over the past 500 million years or so, natural selection has
favored the development of perceptual and cognitive systems that help organisms to anticipate
future events... Accurate expectations are adaptive mental functions that allow organisms to
prepare for appropriate action and perception.
Such a resonance develops when bottom-up signals that are activated by environmental events
interact with top-down expectations, or prototypes, that have been learned from prior experi-
ences. The top-down expectations carry out a matching process that selects those combinations
of bottom-up features that are consistent with the learned prototype while inhibiting those
that are not. In this way, an attentional focus starts to develop that concentrates processing on
those feature clusters that are deemed important on the basis of past experience. The attended
feature clusters, in turn, reactivate the cycle of bottom-up and top-down signal exchange. This
reciprocal exchange of signals eventually equilibrates in a resonant state that binds the attended
features together into a coherent brain state. Such resonant states, rather than the activations
that are due to bottom-up processing alone are proposed to be the brain events that represent
conscious behavior.
on those elements (e.g. auditory cues) considered important on the basis of past
experience (cf. Kirby (this volume) for a model of a diachronic shift in the weights
given to various acoustic cues). Given that attentional focus is a crucial component
of learning (e.g. Kruschke 2003; McKinley and Nosofsky 1996), it is directly relevant
to phonologization, since for change to take place, the user must learn to associate
phonological meaning with some phonetic detail. Further, since the resonant states
that result from the interaction of expected outcomes and perceptual input are 'the
brain events that represent conscious behavior', it is instrumental in shaping the
form that behavior takes. This is of particular relevance for our understanding of
phonologization, since although we often refer to the way that languages behave, it
is in fact the behavior of the language user that is at issue. It is the individual who,
for example, perceives the auditory cues that are subsequently phonologized as an
epenthetic vowel, or fails to produce the gestures involved in making one sound as
opposed to another.
It is perhaps worthwhile pointing out that while the discussion above has focused
on phonetic, processing and usage factors, an additional advantage of the approach
developed here is that it can be easily expanded to take into account other factors
including e.g. sociolinguistic attributes and attitudes. For example, if a language vari-
able, such as the pronunciation of [n] in e.g. running, has a specific social meaning
(Campbell-Kibler 2005), there are expectations associated with when and by whom
the variable is used which can influence behavior including an individual's attitudes
regarding its usage. We leave this topic open for future consideration.
In order to answer the question of why this might be so, we take any token
of language use (i.e. any speaker-hearer interaction) to be an instantiation of a
communication system striving (perhaps implicitly) to meet the competing demands
of efficiency and reliability. The reliability of a communication system is a function
of the degree of redundancy in transmitted elements. If symbols are on average
highly redundant (i.e. recapitulating information available elsewhere), then they
are more predictable/probable, and hence less informative (i.e. lower surprisal).
Efficiency, conversely, is a function of a communication system's rate of transmission
of information; increasing efficiency corresponds to transmitting more informative
(i.e. higher surprisal) items on average. Consider now the effects of noise; a reliable
system will in general be able to recover from an error in transmission, as the
built-in redundancy ensures that the information lost is likely to be predictable
from context, whereas a maximally efficient system, being non-redundant, makes
no such guarantees, and hence is more adversely affected by transmission errors.
The net result of striking a balance between the demands of reliability (maximal
redundancy/predictability) and efficiency (minimal redundancy/predictability) is
that elements that contribute significantly to the entropy of the system, those that
are neither too surprising nor too expected, are most important for effective or
successful communication (see Lindblom 1990; Aylett and Turk 2004; Levy and
Jaeger 2007; Jaeger 2010, for related discussion). Interestingly, while elements at
opposite ends of the continuum pattern together in terms of being unstable, the
cause of the instability differs, as discussed below.
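The reliability/efficiency trade-off described above can be made concrete with a toy simulation, which is our own construction rather than anything from the chapter (the 3x repetition code, noise level, and message length are all illustrative assumptions): a redundant transmission is less efficient but recovers from channel noise that corrupts a bare, maximally efficient one.

```python
import random

random.seed(0)  # deterministic demo

def transmit(bits, flip_prob):
    """Send bits through a noisy channel: each bit flips independently."""
    return [b ^ (random.random() < flip_prob) for b in bits]

def encode(bits):
    """Add redundancy: send each bit three times (inefficient, reliable)."""
    return [b for b in bits for _ in range(3)]

def decode(bits):
    """Recover each bit by majority vote over its triple."""
    return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

message = [random.randint(0, 1) for _ in range(1000)]
noise = 0.05

bare = transmit(message, noise)                     # efficient, fragile
coded = decode(transmit(encode(message), noise))    # redundant, robust

bare_errors = sum(a != b for a, b in zip(message, bare))
coded_errors = sum(a != b for a, b in zip(message, coded))
print(bare_errors, coded_errors)  # the redundant code makes far fewer errors
```

Around 5% of the bare bits arrive corrupted, while the repetition code fails only when two of three copies flip, so its error rate is an order of magnitude lower. This is the sense in which redundancy (predictability) buys reliability at the cost of transmission rate.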
2.3.1.1 Low surprisal Low surprisal elements are associated with high frequency,
weak perceptual distinctiveness and simple articulations, among other properties. As
is well documented, elements associated with these properties tend to be unstable.
We acknowledge that isolating the effects of these properties may be a non-trivial
enterprise.
In terms of perception, elements with poor perceptual distinctiveness can result
in a failure to correctly parse the signal, which may result in assimilation or deletion
(Jun 1995) and subsequent sound change. This is consistent with Ohala's (1981) thesis
that an ambiguous signal can cause misperception giving rise to language change.
In fact, the present account subsumes Ohala's proposal as a special case, given that
low surprisal, on our account, can result not only from confusability, but from any of
the factors listed immediately above, presumably among others. Production-related
instability in cases of low surprisal may lead to, for example, reduction, deletion, or
assimilation, a claim supported by the phonetic, phonological and psycholinguistic
literature.
For example, words that occur frequently tend to be reduced, and high frequency
sounds and sequences are prone to processes such as lenition, deletion, and assimi-
lation, among others (cf. Bybee 2001, 2002; Bybee and Hopper 2001; Fosler-Lussier
and Morgan 1999; Frank and Jaeger 2008; Hooper 1976b; Jurafsky, Bell, Gregory, and
Raymond 2001; Jurafsky 2003; Munson 2001; Neu 1980; Patterson and Connine 2001;
Phillips 1984, 2001, 2006; Pierrehumbert 2001a; Raymond, Dautricourt, and Hume
2006; Tabor 1994; Zuraw 2003). Further, high frequency function words in English
such as just and and have been found to undergo deletion of /t, d/ at significantly
higher rates than less frequent words containing alveolar stops in comparable contexts
(cf. Bybee 2001, 2002; Guy 1992; Jurafsky et al. 2001; Raymond et al. 2006). The
results of phonological processes such as metathesis are also conditioned by frequency
(Hume 2004b). Consistent with the current approach, changes often have their start in
high frequency forms, subsequently spreading to other similar forms (see, e.g., Bybee
2001; Phillips 2006, inter alia).
It is worth pointing out that this approach is consistent with the observation that
the more a routine is used, the more fluent it becomes (Bybee 2001, 2002; Phillips
2006; Zipf 1932). However, in the current approach changes are viewed as more than
a practice effect. On our view, production, perception, and processing are guided by
surprisal and expectedness, and we hypothesize that this grounds the physiological
reflexes of practice in a cognitive explanation.
2.3.1.2 High surprisal High surprisal is associated with elements that occur with
very low frequency, have complex articulations, and/or have extremely noticeable
perceptual cues, among other factors. Given the link between surprisal and expected-
ness, when an element has high surprisal, its realization will correspondingly be only
weakly expected by the language user. This, we suggest, gives rise to instability from
both the speaker's and hearer's perspectives.
From a production perspective, it is well established that articulatory complexity
can create instability, with phonological consequences taking the form of deletion,
metathesis, assimilation, or other repairs to the unstable form. We provide an example
from metathesis further below.
Very low frequency sequences are also unstable. Treiman et al. (2000), for example,
found that English speakers made more errors in pronouncing syllables with less
common rimes than those with more common rimes. Similarly, Dell (1990) reports
that low frequency words are more vulnerable to errors in production than high
frequency ones. Interestingly, when a form is unstable because aspects of its realization
are unexpected, a speaker may also 'choose' to compensate by producing it more
slowly and carefully. In this regard, Whalen (1991) found that infrequent words were
longer in duration than frequent ones. The current approach is also consistent with
the observation that low frequency is a factor associated with forms that undergo
analogical change.⁴ Phillips (2001, 2006), for example, presents numerous examples
of change affecting low frequency items such as the case of [h] deletion in Old English
⁴ In her study of analogical change in Croatian morphology, Sims (2005) shows frequency as well as social salience to be contributing factors, findings that are consistent with the current approach.
(Toon 1978): low frequency words underwent deletion first, giving rise to nut, ring,
loaf, from OE hnutu, hring, hlaf.
With respect to frequency, an interesting consequence of the current approach is
that it provides a unified account of the observation that high and low frequency
elements tend to lead language change (Bybee 2001; Phillips 1984, 2000). As discussed
in subsection 2.2.1, frequency is a determinant of, and in direct proportion to, the
probability assigned to a linguistic outcome, hence to its surprisal. To the extent that,
all else being equal, low frequency correlates with high surprisal and high frequency
corresponds to low surprisal (recall Figure 2.1), the current theory makes the strong
and apparently correct prediction that high and low frequency elements will both be
prone to change.
Metathesis provides an apt example showing low frequency and articulatory complexity
contributing to instability, thus promoting change. In Hume's (2004b) study
of 37 cases of consonant/consonant metathesis, low frequency of occurrence and
similarity emerged as significant predictors of metathesis. In all cases, a consonant
sequence that underwent metathesis was a non-occurring or infrequent structure
in the language. In some cases, the word in which the sequence occurred was also
uncommon, contributing an additional layer of surprisal to the sequence. Further,
in over a third of the cases, the sounds involved were similar. Some shared the same
manner or place of articulation, or agreed in sonorancy, differing only in place and/or
manner, as attested in Georgian (Hewitt 1995; Butskhrikidze and Van de Weijer
2001), Chawchila (Newman 1944), and Aymara and Turkana (Dimmendaal 1983),
among other languages. The significance of similarity in the present context relates
to the probability of accurate production. To the extent that sounds in a sequence are
articulatorily similar, it is reasonable to expect an increase in the effort required to
accurately produce and thus render each sound distinct.
A further prediction of the current approach is that elements with extremely dis-
tinctive cues will also be unstable. Clicks would seem to be an example of this type. The
observation that clicks are typologically rare and do not seem to be spreading among
language communities may provide some evidence for this prediction (A. Miller,
p.c.).⁵ However, our understanding of variable processes involving clicks and other
high surprisal elements is incomplete at this time and thus, we leave this issue for
future consideration. It is worth noting, however, that the patterning of sequences that
are neither overly noticeable nor unnoticeable lends support to the present approach in
that they are predicted to be more stable than sounds/sequences at the extreme ends
of the noticeability pole. We thus hypothesize that common sound sequences such
as stop+vowel, sC, and other perceptually well-formed sequences, would be situated
away from surprisal extremes.
⁵ It is likely that articulatory complexity is also a factor, meaning that both articulatory and perceptual factors contribute to their high surprisal.
consonant; subjects were biased toward the fricative with the highest transitional
probability. This is also consistent with the findings of Vitevitch and Luce (1999),
which reveal segment and sound sequence probabilities to be most influential when
listeners are presented with unfamiliar words; that is, high surprisal words. The obser-
vation that bias is especially strong in cases of high surprisal is of particular relevance
to understanding phonologization. It predicts that if an item is unstable because of
high surprisal, it will be prone to subsequent change to a pattern with lower surprisal;
that is, it will be biased in the direction of a more expected pattern. This is exactly the
pattern of change observed in cases of analogical change.
The study of metathesis once again provides an appropriate example. As noted
above, sequences prone to metathesis are those associated with high surprisal due to
a low probability of accurate production, and the user's limited experience or lack of
experience with the sequence (and perhaps the word it occurs in as well). As predicted,
the direction of change is biased toward a more expected structure with lower sur-
prisal. As the study of metathesis shows, the resultant structure is not only more com-
mon than the form that undergoes metathesis, but it has a higher probability of being
accurately produced, resulting in better perceptual cues. Building on Hume (2004b),
the reason why improved perceptual salience is a characteristic of so many results of
metathesis is thus simply an artifact of the nature of sequences that undergo metathe-
sis (those associated with high surprisal) and those that influence how the speech
signal is parsed (those associated with low surprisal); in short, unstable sequences
that undergo metathesis are biased toward phonologically similar patterns with lower
surprisal. Variable pronunciations of the word chipotle provide a simple illustration:
The influence of native language patterns on metathesis can also be heard in some varieties
of American English in the variable pronunciation of t-l in the word, chipotle, the Náhuatl
name for a particular kind of pepper and, recently, for a chain of Mexican restaurants. Both
orders of the final two consonants can be heard, even in the speech of the same individual:
chipotle (the original order) or chipolte (the innovative order) [... ] The two sounds involved are
archetypical 'metathesis sounds' and thus contribute to indeterminacy: /t/ with perceptually
vulnerable cues and /l/ with stretched-out features [...] Another factor [...] is unfamiliarity
with the borrowed word [...] With indeterminacy, the order of sounds is inferred based on
experience, with the bias towards the most robust order. As predicted, although both /tl/ and
/lt/ occur intervocalically in English [...] /tl/, in the original form, occurs in 67 words, while
the innovative /lt/ sequence occurs in 356 words. (Hume 2004b: 223)
analogical change and the observation that the output of metathesis is an existing
structure in the relevant language support this view.
Conversely, the result of change involving unstable patterns with low surprisal
need not be structure preserving. In such cases, the linguistic consequence of high
expectedness is under-realization; that is, a pattern contributes little to the entropy of
the system and is thus less crucial to the message. As discussed above, these elements
can thus be reduced in the interests of communicative efficiency without sacrificing
reliability.
An example of non-structure-preservation comes from the observation that reduc-
tion processes involving low surprisal segments, such as English schwa, can create
syllable structures not otherwise occurring in the language. Schwa can be considered
a low surprisal element given its simple articulation, its poor distinctiveness, its pre-
dictability in unstressed syllables, and its overall high frequency of occurrence in the
language (Hume and Broomberg 2005). As such, a native speaker will have strong
expectations concerning the occurrence of schwa in the initial unstressed syllable of a
word such as telepathy, thus licensing its omission, i.e. [tlɛpəθi]. While schwa deletion
can result in phonotactically licit syllable onsets (e.g. police [plis]), it can also create
onsets such as [tl], which do not otherwise occur word-initially in the language.
2.3.3 Summary
The ideas presented above are summarized in Table 2.1. It is proposed that a language
pattern is prone to change when, as listed in column I, it has a very low or very high
degree of surprisal and thus contributes little to the entropy of the linguistic system.
Column II identifies some of the factors that can give rise to the relevant level of
surprisal. The rightmost column summarizes the discussion above concerning bias
and the nature of the outcome of language change. For patterns that are unstable due to
high surprisal, bias influences the direction of change, while for unstable low surprisal
elements, the outcome of change takes some form of reduction which can result in an
increase in entropic contribution.
2.4 Conclusion
As we hope to have shown in the preceding pages, taking into account communicative
effectiveness, as formally expressed in terms of surprisal and entropy, allows us a
deeper understanding of phonologization and language change. To the extent that
this approach is on the right track, it has the potential to provide a unified model
of the factors conditioning an individual's language system. Given that the preceding
pages offer only a sketch of the current theory, many important aspects remain unre-
solved. These include at least the following fundamental issues: (a) understanding how
the diverse factors interact and contribute to cognitively and linguistically plausible
estimates of an element's surprisal, and (b) identifying the consequences of differing
degrees of surprisal and entropy for language systems, at the segmental level and
beyond.
This page intentionally left blank
Part II
Phonetic considerations
This page intentionally left blank
3
3.1 Introduction
Interest in the phonetics of sound change is as old as scientific linguistics (Osthoff
and Brugman 1878).¹ The prevalent view is that a key component of sound change is
what Hyman (1976) dubbed PHONOLOGIZATION: the process or processes by which
automatic phonetic patterns give rise to a language's phonological patterns. Sound
patterns have a variety of other sources, including analogical change, but we focus here
on their phonetic grounding.² In the study of phonologization and sound change, the
three long-standing questions in (1) are especially important.
(1) a. Typology: Why are some sound changes common while others are rare or
nonexistent?
b. Conditioning: What role do lexical and morphological factors play in sound
change?
c. Actuation: What triggers a particular sound change at a particular time and
place?
In this chapter we will address the typology and actuation questions in some detail;
the conditioning question, though significant and controversial, will be discussed only
briefly (in section 3.5.3).
¹ For helpful discussion we thank audiences at UC Berkeley and UC Davis, and seminar students in 2008 (Garrett) and 2010 (Johnson). We are also very grateful to Juliette Blevins, Joan Bybee, Larry Hyman, John Ohala, and Alan Yu, whose detailed comments on an earlier version of this chapter have saved us from many errors and injudicious choices, though we know they will not all agree with what they find here.
² Types of analogical change that yield new sound patterns include morphophonemic analogy (Moulton 1960, 1967) and analogical morphophonology (Garrett and Blevins 2009). Of course, the source of a pattern is not always clear. For example, patterns like the linking [r] of many English dialects have been attributed to a type of analogical change called 'rule inversion' (Vennemann 1972a), perhaps not phonetically grounded, but work by Hay and Sudbury (2005) and others calls this into question. Note that some phonological patterns, while phonetically grounded in a broader sense, correspond to no specific phonetic patterns because they arise through the telescoping of multiple phonetically grounded sound changes. Again, it is not always easy to identify such cases confidently.
52 Andrew Garrett and Keith Johnson
The typology question concerns patterns like those in (2-3). In each pair of exam-
ples in (2), one is a common sound change while the other is nonexistent. The ultimate
causes of these patterns are clear enough where there are obvious phonetic correlates,
but the mechanisms explaining the relationship (that is, the precise mechanisms of
phonologization) are still disputed.
(2) Typologically common vs. nonexistent sound changes
a. Common: [k] > [tʃ] before front vowels (Guion 1998)
Nonexistent: [k] > [q] before front vowels
b. Common: vowel harmony involving rounding (Kaun 2004)
Nonexistent: vowel harmony involving length
c. Common: vowel reduction restricted to unstressed syllables (Barnes 2006)
Nonexistent: vowel reduction restricted to stressed syllables
d. Common: consonant metathesis involving sibilants (Blevins and Garrett
2004)
Nonexistent: consonant metathesis involving fricatives generally
Our typological point can be sharpened further. Not only are there generalizations
about patterns of sound change, but the typology is overwhelmingly asymmetric. For
example, the inverse of each of the common changes in (3) is nonexistent.
(3) Asymmetries in sound change
a. Common: [k] > [tʃ] before front vowels
Nonexistent: [tʃ] > [k] before front vowels
b. Common: intervocalic stop voicing (Kirchner 2001; Lavoie 2001)
Nonexistent: intervocalic stop devoicing
c. Common: [t] > [ʔ] word-finally (Blevins 2004: 120-1)
Nonexistent: [ʔ] > [t]
It is uncontroversial that such asymmetries in sound change must (somehow) reflect
asymmetries in phonetic patterns. We will refer to these as BIASES.
Our approach to the typology question, then, is grounded in processes of speech
production and perception and in the phonetic knowledge of language users. The bulk
of our chapter is devoted to an evaluation of various components of speech production
and perception, with an eye to identifying asymmetries (biases) that should be associ-
ated with each component. Our hypothesis is that various types of sound change can
be grounded in the various speech components based on their typological profiles.
We hope this approach yields a useful framework for discussing the relation between
patterns of sound change and their phonetic correlates.
From a broader perspective the typology question can be seen as a facet of what
Weinreich et al. (1968) call the CONSTRAINTS PROBLEM: determining 'the set of
3. Phonetic bias in sound change 53
possible changes and possible conditions for change' (p. 183). The second main ques-
tion we address in this chapter is what they call the ACTUATION PROBLEM: Why does
a change take place in one language where its preconditions are present, but not
in another? Historical linguists sometimes defer this question to sociolinguists by
assuming that its answer involves contingencies of social interaction, but a compre-
hensive model of phonologization should explain how phonetic patterns uniformly
characterizing all speakers of a language can give rise to phonological patterns that
serve as speech variants or norms for some of them.
Our approach highlights the three elements of phonologization shown in (4).
(4) a. Structured variation: Speech production and perception generate variants
(see sections 3.3-3.4)
b. Constrained selection: Linguistic factors influence the choice of variants (see
section 3.5)
c. Innovation: Individuals initiate and propagate changes (see section 3.6)
Processes of speech production and perception generate what Ohala (1989) memorably
describes as a 'pool of variation' from which new phonological patterns emerge;
we emphasize that this variation is structured in ways that help determine phonological
typology. Other processes contribute to the phonologized outcome; for example,
Kiparsky (1995) and Lindblom et al. (1995) refer to 'selection' from the pool
of variants. But our first goal is to understand how the underlying variation itself
is structured by bias factors, even if selectional processes also contribute bias (see
section 3.5). Finally, actuation begins with innovation; our second goal is to under-
stand why individual innovators would increase their use of certain speech variants
from the pool of variation.
This chapter is organized as follows. In sections 3.2-3.5, we address the constraints
problem of Weinreich et al. (1968). We begin with a review of sound change typologies
in section 3.2; despite differences of detail, many share a taxonomy inherited from
the neogrammarians. In section 3.3, we examine elements of speech production and
perception and evaluate possible bias factors in each case; we suggest in section 3.4
that certain patterns of sound change may be correlated with certain bias factors based
on their phonological typology. We discuss selection in section 3.5, describing facets
of phonologization that are system-dependent; they may involve bias factors, but only
relative to language-specific or universal systematic constraints.
In section 3.6 we turn to the actuation question, sketching a theory of mecha-
nisms that link bias factors and sound changes. While the former are often universal,
the latter are language-specific and at first perhaps even speaker-specific. Successful
changes must propagate from innovators before eventually becoming community
speech norms; we present the results of simulating aspects of this process. We con-
clude in section 3.7 with a brief summary and some questions for future research.
regular.³ This theory was couched by Paul (1880, 1920) in a surprisingly modern
exemplar-based view of phonological knowledge (see section 3.6 below).
More recently, a similar two-way scheme has been defended by Kiparsky (1995).
He writes that the first sound change type originates as speech variation with artic-
ulatory causes; certain variants are then selected by linguistic systems, subject to
further (linguistic) constraints.4 The residual type consists of changes that originate
as perceptually based reinterpretations, possibly in the course of language acquisition.
The role of the listener was already crucial for Paul (1880, 1920), according to
whom the major type of sound change occurs when articulatory processes create
variants that are heard by listeners, stored in exemplar memory, and in turn give rise to
new, slightly altered articulatory targets. But in emphasizing the articulatory basis of
sound change, neither the neogrammarians nor their successors explored the possible
details of listener-based innovation. In recent decades, two influential accounts of
sound change have done precisely this. These accounts, due to John Ohala and Juliette
Blevins, share comparable three-way typologies. We highlight the similarities between
them in Table 3.2, though they also have important differences.
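Paul's feedback loop, in which produced variants are stored by listeners and then shift later articulatory targets, can be caricatured in a few lines of code. This is our illustrative reconstruction, not Paul's own formulation; the bias size, noise level, and memory window are arbitrary.

```python
import random

def exemplar_drift(start=100.0, bias=-0.5, noise=2.0,
                   n_tokens=500, window=50, seed=0):
    """Each produced token is the mean of recently stored exemplars,
    plus random noise and a constant articulatory bias; the token is
    then stored and feeds later productions. Units are arbitrary
    (think of a formant value or a VOT). Returns the final token."""
    random.seed(seed)
    exemplars = [start]
    for _ in range(n_tokens):
        recent = exemplars[-window:]
        target = sum(recent) / len(recent)
        exemplars.append(target + bias + random.gauss(0.0, noise))
    return exemplars[-1]
```

Even a small per-token bias accumulates: with the noise removed, the category drifts steadily in the direction of the bias, which is the sense in which articulatory bias factors can give sound change its direction.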
For Ohala, most explicitly in a 1993 paper (Ohala 1993b), there are three main
mechanisms of sound change.5 The one corresponding most closely to the traditional
category of articulatorily grounded change is what he calls HYPOCORRECTION. This is
rooted in correction, the normalization that listeners impose on a signal (for example, factoring out coarticulatory effects to recover a talker's intention). In hypocorrection, a listener undercorrects for some coarticulatory effect, assuming that it is
3
Bloomfield (1933) suggests with some uncertainty that articulatory simplification may underlie the
major type of sound change; he expresses no view of the cause(s) of the residual type.
4
The role of articulatory reduction in sound change has also been emphasized by other modern linguists
(e.g. Mowrey and Pagliuca 1995; Bybee 2001, 2007), but they have not yet presented an overall account of
how various types of sound change fit together.
5
It is hard to select one or even a few of Ohala's contributions from within his influential and insightful
oeuvre in this area; see linguistics.berkeley.edu/phonlab/users/ohala/index3.html for a full list.
6
Ohala (1993b: 258) suggests that this can be viewed as a type of hypocorrection; the difference 'is
whether the disambiguating cues that could have been used by the listener (but were not) are temporally
co-terminous with the ambiguous part [as in the confusion of acoustically similar sounds] or whether they
are not', as in hypocorrection.
7
On this sound change see section 3.5.1 below. A potential criticism is that of Blevins's three mech-
anisms, only CHANGE is intrinsically asymmetric (assuming that perceptual biases and constraints on
misperception are asymmetric). By contrast, nothing about CHOICE or CHANCE per se predicts any direc-
tionality; for example, Blevins (2004: 35) notes, in CHANCE 'there is no language-independent phonetic
bias' and 'the signal is inherently ambiguous'. Therefore the explanation for any observed asymmetries
must be sought elsewhere. This criticism is not germane to Ohala's system. In that system, however, since
hypocorrection and hypercorrection are mirror-image processes, there is no immediate explanation for
their many asymmetries (for example, nonlocal laryngeal-feature dissimilation is common but nonlocal
laryngeal-feature assimilation is rare).
3 Phonetic bias in sound change 57
Perhaps the fullest typology is that of Grammont (1939), the first author to present a theory based on a survey of all known sound change patterns.8 For him, sound changes emerge through competition between constraints, which he called 'laws' (Grammont 1939: 176), favoring effort reduction and clarity, as well as other factors. Given in (5) is his full scheme; he distinguishes changes where the conditioning environment is adjacent or local (5b) from those where it is nonlocal (5c). Grammont's
typology cannot readily be adapted to the present day, but it is notable that he invoked
articulatory reduction, perceptual clarity, and motor planning as key ingredients
in sound change. His theory of nonlocal dissimilation is especially interesting (see
already Grammont 1895): he argues that the segment undergoing dissimilation is
always in a 'weaker' position than the trigger; positional strength is defined with
reference to accent, syllable position, and, if all else is equal, linear order, in which
case the first segment is weaker. He suggests that nonlocal dissimilation occurs when
planning for a segment which is in a more prominent position distracts a talker who
is producing a similar segment in a weaker position.
(5) Grammont's (1939) typology of sound changes
a. Unconditioned changes: explanation unclear (in some cases language
contact?)
b. Locally conditioned changes
ASSIMILATION: motivated by articulatory ease
DISSIMILATION: motivated by perceptual clarity
METATHESIS: motivated by perceptual clarity and phonotactic
optimization
c. Nonlocally conditioned changes
ASSIMILATION: explanation unclear, but evidently articulatory
in origin
DISSIMILATION: originates in motor-planning errors
METATHESIS: motivated by perceptual clarity and phonotactic
optimization
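Grammont's positional-strength account of nonlocal dissimilation can be stated as a tiny decision procedure. The sketch below is ours, with made-up prominence scores standing in for accent, syllable position, and linear order; it simply encodes the claim that the less prominent of two similar segments is the predicted target.

```python
def dissimilation_target(occurrences):
    """Given occurrences of a repeated segment as (label, prominence)
    pairs, return the label predicted to dissimilate: the one with the
    lowest prominence. Ties go to the earlier occurrence, mirroring
    Grammont's rule that, all else equal, the first segment is weaker."""
    return min(occurrences, key=lambda seg: seg[1])[0]

# Hypothetical scoring for Latin peregrinum > pelegrinum: the first /r/
# (farther from the accent, score 1) dissimilates rather than the
# second (adjacent to the accented syllable, score 3).
target = dissimilation_target([("r1", 1), ("r2", 3)])
```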
Our own presentation draws much from the approaches of earlier authors, but it
crucially differs from them. With its reference to 'articulatory' reduction and variabil-
ity, the traditional dichotomy inherited from the neogrammarians is too simplistic,
even in its modern avatars, and fails to reflect the true complexity of speech produc-
tion. On the other hand, the listener-oriented typologies of Ohala and Blevins leave
8
That is, all patterns known to him over 75 years ago. The only comparable works are by Hock (1991), whose textbook classifies surface patterns without a theory of causes, Blevins (2004), whose broad coverage is exhaustive for certain patterns but is not meant to be complete for all types of sound change, and Kümmel (2007), whose coverage is restricted to a few language families. Today it would be almost impossible to be as thorough as Grammont tried to be; useful modern sources are Blevins's (2008b) 'field guide' and Hansson's (2008) overview.
essential questions about speech production unanswered; for example, what processes
generate and constrain the variable input to Blevins's CHOICE? Finally, while thorough
and replete with interesting observations, Grammont's account is too inexplicit and
stipulative to be used without change today.
The typology we present is deductive rather than inductive. That is, rather than
surveying sound changes, we examine components of speech production and percep-
tion, seeking relatively complete coverage, and we ask what biases each component is
likely to yield. We propose that biases emerging from the various systems of speech
production and perception, respectively, underlie various types of sound change with
corresponding phonological profiles. What emerges from this approach has elements
of previous typologies, therefore, but cannot be directly mapped onto any of them.
A second (defining) property of bias factors in sound change is that bias is direc-
tional. For example, given that [i] is most often misperceived as [e], one might suppose
that [e] would reciprocally be misperceived as [i]. As the second line in Table 3.3 shows, this is not the case. Although [e] is misperceived as [i] at a rate that is greater than chance, the most common misperception of [e] was as [æ]. Table 3.4 indicates that the lax back vowels tested by Peterson and Barney showed a similar asymmetric confusion pattern, where [ʊ] was confused with [ʌ] while [ʌ] was more often confused with [ɑ]. Labov (1994) observed that in vowel shifts, lax vowels tend to fall in the
vowel space; the perceptual data in Tables 3.3-3.4 suggest that one source of the
directionality of the sound change may be a perceptual asymmetry. In any case, our
main point is that phonetic bias factors are directional.
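The directionality claim can be made concrete with a toy confusion matrix. The counts below are invented for illustration (they are not the Peterson and Barney figures summarized in Tables 3.3-3.4); the point is only how asymmetry and modal misperception are read off such a table.

```python
# Toy confusion counts (rows = intended vowel, columns = heard vowel).
# The numbers are invented; "ae" stands in for the vowel [æ].
conf = {
    "i":  {"i": 920, "e": 60,  "ae": 20},
    "e":  {"i": 30,  "e": 880, "ae": 90},
    "ae": {"i": 5,   "e": 45,  "ae": 950},
}

def modal_misperception(matrix, v):
    """Most frequent percept other than the intended category."""
    others = {w: n for w, n in matrix[v].items() if w != v}
    return max(others, key=others.get)

def is_asymmetric(matrix, a, b):
    """True if a->b confusions outnumber b->a confusions."""
    return matrix[a][b] > matrix[b][a]

assert modal_misperception(conf, "i") == "e"   # [i] most often heard as [e]
assert modal_misperception(conf, "e") == "ae"  # [e] most often heard as [æ]
assert is_asymmetric(conf, "i", "e")           # i -> e exceeds e -> i
```

A matrix with this shape yields exactly the directional pattern described in the text: each misperception has a preferred direction, and the preferred directions need not be mutual.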
Phonetic bias factors thus produce a pool of synchronic phonetic variation (Ohala
1989; Kiparsky 1995; Lindblom et al. 1995) which forms the input to sound change;
this is sketched in Figure 3.1. The structure imposed on the phonetic input to sound
change, via the directionality of phonetic variation, is a key source of the typological
patterns of sound change.
In the following subsections, we will consider potential bias factors arising from
the phonetics of speaking and listening, and the extent to which they may provide
both non-randomness and directionality in sound change. Speaking and listening
as a whole can be said to contain four elements that might provide bias factors in
sound change. We will discuss these in turn: motor planning (3.3.1); aerodynamic
constraints (3.3.2); gestural mechanics (3.3.3), including gestural overlap and gestu-
ral blend; and perceptual parsing (3.3.4). The order roughly mimics the order from
thought to speech, and from a talker to a listener. In Section 3.4 we will turn to
discuss representative types of sound change that stem from the various bias factors
we identify in this section.
FIGURE 3.1 Phonetic bias factors produce a pool of synchronic phonetic variation which can be taken up in sound change
9
On motor plan blending see Boomer and Laver (1968), MacKay (1970), Fromkin (1971), Fromkin
(1973), Dell (1986), and Shattuck-Hufnagel (1987). Pouplier and Goldstein (2010) have also shown that
speech planning and articulatory dynamics interact with each other in complex ways, so that the specific
phonetic results of some speech errors may be outside the speaker's ordinary inventory of articulatory
routines.
In addition to blending and inhibition, bias may also emerge from what Hume (2004b) calls ATTESTATION, suggesting that some metathesis patterns point to 'a bias towards more practiced articulatory routines' (p. 229). Undoubtedly, there is a tendency for the articulators to be drawn to familiar, routinized
patterns. This can be seen in loan word adaptation as words are nativized, and probably also exerts a type of
phonotactic leveling as Hume suggests. We consider attestation to be a systematic constraint (section 3.5),
different in kind from the phonetic bias factors, though in this case the difference between linguistically
universal and language-specific biases is particularly fine.
perseverations (waking rabbits → waking wabbits). Blending of segmental plans due to adjacency of those plans results in bias toward non-randomness in speech production errors. People are more likely to blend plans that are in proximity to each other in time, in phonetic similarity, and in articulatory planning structure (that is, onsets interact with onsets, nuclei with nuclei, etc.).
The effects of motor plan inhibition can be seen in tongue twisters where an alternating pattern is interrupted by a repetition (Goldinger 1989). In the sequence unique New York we have a sequence of onset consonants [j . . . n . . . n . . . j], and when the phrase is repeated the sequence is thus [. . . j n n j j n n j j n n j j . . . ], an aa bb pattern. Other tongue twisters are like this as well. For example, she sells sea shells by the sea shore is [ʃ . . . s . . . s . . . ʃ . . . s . . . ʃ]. Typically, in these sequences the error is toward an alternating pattern [j . . . n . . . j . . . n] instead of the repetition of one of the onsets.
It may be worth noting in this context that repeated tongue motion is dispreferred
in playing a brass instrument like a trombone or trumpet. With these instruments
(and perhaps others) rapid articulation of notes is achieved by 'double tonguing', alternating between coronal and dorsal stops, rather than 'single tonguing', using a sequence of coronal stops to start notes.
In both motor plan blending and motor plan inhibition, it is likely that rhythm
and stress may play a significant role in determining that prominent segments will be
preserved while non-prominent segments will be altered, because the prosodic organization of language is extremely important in motor planning (Port 2003; Saltzman et al. 2008).
aerodynamic voicing constraint as a constraint against voicing, providing a phonetic bias toward the elimination of voicing in segments where voicing is difficult.
11
On prenasalization and voicing see e.g. Iverson and Salmons (1996). It may be helpful to note also that contrast maintenance, a basic factor that we appeal to in accounting for sound change, is similar to 'faithfulness' constraints in Optimality Theory, whose 'markedness' constraints likewise correspond almost exactly to our phonetic bias factors.
This mechanical effect (the overlap of consonant and vowel tongue gestures) does not
ordinarily lead to sound change, due to a perceptual mechanism that compensates
for coarticulation (Mann and Repp 1980); coarticulation is perceptually corrected.
But if, for some reason, the listener fails to correct for coarticulation, a change may
result: /u/ > [y] / __ [cor], with no change before other consonants. Something just
like this seems to have happened in Central Tibetan (Dbus), as illustrated in (6).
Final consonants were debuccalized or lost; the examples in (6a) show vowels that
were unaffected, while those in (6b) show fronting when the final consonant was a
coronal.
(6) Central Tibetan precoronal vowel fronting (Tournadre 2005: 28-32)
a. Written Tibetan (WT) brag 'rock' > Central Tibetan (CT) ʈʂaʔ
WT dgu 'nine' > CT gu
WT phjugpo 'rich' > CT tʰukpo
b. WT kar 'wool' > CT kɛː
WT bod 'Tibet' > CT pʰøʔ
WT khol 'to boil' > CT kʰøː
WT bdun 'seven' > CT dy
WT sbrul 'snake' > CT ɖyː
Hypocorrection is a key ingredient of change, both in Ohala's and our account, but it
is important to add that hypocorrection per se does not involve a specific bias factor.
The bias factor in cases like (6), the phonetic force that introduces variability and determines the direction of the change, is gestural. It is coarticulation that determines whether /u/ will drift toward [y] or [a] in coronal contexts. Hypocorrection
helps determine whether or not a change will occur on a specific occasion, and as
such it is part of a model of actuation; cf. section 3.6.
14
In any theory positing occasional events (e.g. misperceptions or failures of perceptual correction)
as sources of the variation that becomes conventionalized in change, it is hard to see what would exclude
occasional speech errors from contributing to the same variation.
The same asymmetry is found in speech errors (Shattuck-Hufnagel and Klatt 1979).
Stemberger (1991) relates this to 'addition bias' (Stemberger and Treiman 1986),
whereby complex segments are anticipated in planning simple segments; [ʃ] is more complex because it uses the tongue blade and body.
As Hansson (2010) notes, consonant harmony patterns also resemble speech errors
in being typically similarity-based: more similar segments interact with each other.
In view of this and their other parallels (the nonlocality of consonant harmony, its typically anticipatory nature, and addition bias), Hansson suggests, and we agree, that
phonological consonant harmony patterns are likely to have originated diachronically
in motor planning errors.
Long-distance displacement (nonlocal metathesis) is a second type of sound change
that may have its origin in motor planning. In the typology of metathesis sound
changes (Blevins and Garrett 1998, 2004), it is notable that long-distance displacement
commonly affects only some segment types. Often, for example, liquids undergo
displacement leftward to the word-initial syllable onset. This is especially well doc-
umented in Romance varieties and languages influenced by Romance; Old Sardinian
examples are shown in (9).15
(9) Latin (L) > Old Sardinian (OS) liquid displacement (Geisler 1994: 110-11)
L castrum 'fort' > OS crastu
L cochlea 'snail' > OS clocha
L complēre 'fill' > OS clompere
L dextra 'right (hand)' > OS dresta
L februārium 'of February' > OS frevariu
L pigrum 'slow' > OS prigu
L pūblicum 'public' > OS plubicu
Such displacements are usually anticipatory and tend to involve comparable sylla-
ble positions. For example, as in (9), displacement is often restricted to interchange
between obstruent-liquid clusters. We take it that such phonologized patterns are
rooted in motor planning. Independent support for this view comes from the fact
that such displacements are a well-documented speech error pattern, as in German
Brunsenbenner for Bunsenbrenner 'Bunsen burner' (Meringer and Mayer 1895: 91).16
15
In Old Sardinian, as Geisler (1994: 112) notes, the displacement is restricted to adjacent syllables. In
modern dialects, longer-distance displacements are also found: Latin fenestra 'window' > Old Sardinian
fenestra > modern dialectal fronsta. This chronological difference between one-syllable and longer dis-
placement patterns undermines an argument by Blevins and Garrett (2004: 134-5), based on comparable
data in southern Italian dialects of Greek, that certain details of the longer displacement patterns favor the
view that such changes originate through misperception.
16
While displacements of this type are not rare in speech error corpora, we have not studied the data
carefully enough to judge whether other displacement patterns that are unattested as sound changes might
also correspond to rarer speech error patterns. If they do, as Juliette Blevins points out to us, we would face
the problem of explaining why such errors do not sometimes yield sound changes.
17
Other changes indirectly attributable to this constraint are noted in section 3.5.1.
Note that (as the modern ou, au spellings indicate) all three English words in (12) had
a round vowel [u] before [x]. We follow Luick (1940: vol. 2, pp. 1046-53) and Catford
(1977) in assuming that en route from [x] to [f ] there was a realization like [xw], result-
ing from overlap of the round vowel and [x]. Catford notes that a strongly rounded
pronunciation can still be heard in southern Scotland: [lɑːxw] 'laugh', [rʌuxw] 'rough',
etc. The remaining [xw] > [f] change is not due to gestural mechanics and will be
discussed in section 3.5.1 below.
Typical changes due to gestural blend are coronal or velar palatalization (see further
3.4.4 below), the Tibetan precoronal vowel fronting pattern in (6) above, and vowel
coalescence. Shown in (13), for example, are Attic Greek coalescence patterns for non-
high non-identical short vowels. Here the coalescence of mid vowels preserves height;
directionality is relevant in some but not all cases.18
18
Omitted in (13) are the coalescence of identical vowels as long vowels and of glide sequences as diphthongs. Note in relation to palatalization that not all 'palatalization' is the same: whereas coronal palatalization can be interpreted as an effect of gestural blend, labial palatalization would reflect gestural overlap.
(13) Selected Attic Greek vowel contraction patterns (Rix 1992: 52-3, Smyth
1956: 19)
INPUT CONTRACTION EXAMPLE
in favor of this account, to be sure, is that experimental studies (Miller and Nicely 1955; Babel and McGuire 2010) show that [θ] is misperceived as [f] significantly more often than the reverse; this is consistent with the fact that a [f] > [θ] change is unknown.23 But we suspect that the change may involve first the development of labialization on [θ], i.e. [θ] > [θw], with a further [θw] > [f] change that is similar to the English [xw] > [f] change mentioned in section 3.4.3. We have three reasons for our suspicion. First, in Glasgow, to which the English [θ] > [f] change has spread in recent decades, there is a variant that Stuart-Smith et al. (2007) describe as a labialized dental fricative, perceptually intermediate between [θ] and [f].24 Second, in South Saami and Latin there are cases where an interdental > labiodental fricative change is limited to labial contexts (Kümmel 2007: 193); we interpret these as shifts targeting phonetically labialized interdentals, equivalent to the [θw] > [f] step that we assume for [θ] > [f] shifts generally. Third, within Northern Athabaskan, as analyzed by Howe and Fulop
(2005) and Flynn and Fulop (2008), a reconstructible series of interdental fricatives
and affricates has the outcomes in (15):
(15) Selected reflexes of Northern Athabaskan interdental fricatives and affricates
a. Interdentals: Dene Tha dialect of South Slavey
b. Labials ([p], [pʰ], [pʼ], [f], [v]): Tulita Slavey
c. Labial-velars (e.g. [kw], [kwʰ], [kwʼ], [ʍ], [w]): Dogrib, Hare, Gwich'in
d. Velars: Dene Tha and Gwich'in dialects
e. Pharyngealized sibilants: Tsilhqot'in
Howe and Fulop (2005) argue that the Tsilhqot'in development in (15e) was as in (16), and that all the outcomes in (15b-15e) passed through a labialized interdental stage.
(16) Northern Athabaskan interdental fricatives and affricates in Tsilhqot'in
[*tθ, *tθʰ, *tθʼ, *θ, *ð] > [*tθw, *tθwʰ, *tθwʼ, *θw, *ðw] > [tsˤ, tsʰˤ, tsʼˤ, sˤ, zˤ]
If so, two of the best-documented [θ] > [f] cases (in English and Scots dialects, and in Athabaskan) show evidence for an intermediate [θw] stage. Howe and Fulop (2005)
and Flynn and Fulop (2008) suggest that the reason labialization emerges is that it
enhances the acoustic feature [grave], which, they contend, characterizes interden-
tals; in their Jakobsonian formulation, [flat] enhances [grave]. In short, on this view
of [θ] > [f] shifts, the initial bias factor driving them is not perceptual parsing but
perceptual enhancement (section 3.5.1).
23
As Nielsen (2010: 10) points out, however, if it is asymmetric misperception that explains [θ] > [f] shifts, we might expect [θ] > [f] substitutions in English second-language learning; in fact other substitutions appear to be more common.
24
We are not aware of detailed phonetic studies of the ongoing [θ] > [f] change in other dialects. Note that an independent earlier [θw] > [f] change is documented in Scots dialects: Old English θwiːtan > Buchan Scots fəit 'cut' (Dieth 1932). Of course this does not prove that the same change happened later, but it establishes the change as a natural one within the phonological context of English and Scots.
A final common type of sound change where asymmetric misperception has been
assumed is the 'fusion of obstruent + [w] sequences as labial obstruents. In the typical
examples in (17), sequences with stops fuse as bilabial stops and those with frica-
tives fuse as labiodental fricatives.25 Two other examples were mentioned above: the
Buchan Scots Ow >/change in note 24 and the hypothesized Tulita Slavey labiodental
> labial shift in (isb).
(17) a. Stop-glide fusion: Latin dw > b / #__
dwellum > bellum 'war'
dwenos > bonus 'good'
*dwis > bis 'twice'
b. Stop-glide fusion: Ancient Greek kw > p
*wekwos > epos 'word'
*leikwoi > leipoi 'I leave'
*kwolos > polos 'pivot'
c. Fricative-glide fusion: Old English xw > Buchan Scots f (Dieth 1932)
xwaː > faː 'who'
xwæt > fat 'what'
xwiːt > fəit 'white'
xwonne > fan 'when'
Significantly, the fricative changes involve a bilabial > labiodental place of articulation
shift. Note also that the Slavey change is non-neutralizing (the phonological inventory
previously lacked labials) while the others are neutralizing.
In essence, the perceptual parsing account of changes like these is that [kw] is
sufficiently likely to be misheard as [p], and [θw] or [xw] is sufficiently likely to be
misheard as [f], for such misperceptions occasionally to give rise to a new phono-
logical representation. Though we do not know of any relevant experimental work,
we would not be surprised to learn that asymmetric misperception patterns such as
these can be confirmed in the laboratory. Still, one or two points are worth making.
First, competing with the perceptual parsing account is one based on articulatory
change: an account in which the glide [w] becomes a stop or fricative before the
immediately preceding stop or fricative articulation is lost. For example, according to
the competing view, [kw] > [p] via intermediate [kp] (or the like) and [xw] > [f ] via
intermediate [x<f>] (or the like). That such an intermediate stage is possible has support
from several sources. For the stop changes in (17), Catford (1977) mentions examples
like that of Lak and Abkhaz, where, for example in Lak, /kw?/ is realized as [kp ? ].
Catford writes that the 'the labial element is an endolabial stop: the lips are pushed
25
In some cases the glide is printed as a secondary articulation, in other cases as a distinct segment.
This reflects the standard phonological analyses of the languages and probably does not signify any relevant
phonetic difference.
forward, but kept flat (not rounded)', and suggests that the Greek change in (17b) may
have passed through the same stage. As Larry Hyman reminds us, labialized velar >
labial-velar changes are also well documented in Africa, for example in the Eastern
Beboid (Niger-Congo) language Noone (Hyman 1981; Richards 1991). To confirm
the perceptual parsing account of [kw] > [p] changes, it would be desirable to identify
an ongoing case where such a change involves no intermediate variants.
For fricative changes such as [xw] > [f], Catford (1977) compares Scots dialects:
The labialisation becomes quite intense, towards the end of the sound, and, intervocalically,
almost completely masks the sound of the velar component. Anyone who heard a South Scot
saying 'What are you laughing at', ['xwʌt a r i 'lɑːxwən at] can have no further doubts about how
[x] developed to [f ].
It is important to note the difference between [ɸ] and [f]. It may be that the shift to a labiodental place of articulation is due to perceptual parsing, but since labiodental fricatives are noisier than bilabial fricatives it may alternatively be possible to assume auditory enhancement (of continuancy). In any case, for the stop changes (e.g. [kw] > [kp]) and the fricative changes (e.g. [xw] > [xɸ]), we are left with the question of whether the emergence of [p] and [ɸ] respectively is due to perceptual parsing (e.g. [kw] misperceived as [kp]), articulatory variability (e.g. [w] occasionally pronounced with lip closure or near-closure), or some other cause.26 The question strikes us
as unresolved, and with it the role of perceptual parsing in sound changes of the
three broad types examined in this section, which target palatalized and labialized
obstruents. We turn in the next section to a final type of sound change that has been
attributed to perceptual parsing.
28
An additional pattern is that with an l-initial base, the infix undergoes assimilation and surfaces as -al-: litik 'little' -> plural l<al>itik.
29
The examples in (22-23) include the complete dossier of reasonably persuasive cases in the published
corpora of Meringer (Meringer and Mayer 1895; Meringer 1908) and Fromkin (2000).
30
We do not know of speech error studies for languages with phonological glottalization, aspiration,
etc. The motor-planning account of dissimilation predicts the existence in such languages of dissimilatory
speech errors involving those features.
Grammont (1895, 1939) argues that dissimilation tends to target segments in unac-
cented positions and in 'weaker' syllable positions (e.g. onsets rather than codas).
The idea that typical targets of dissimilation are 'weak' positions and perhaps 'weak'
features (secondary features such as aspiration) is consistent with a motor-planning
approach. In interactions between nearby segments with identical features, motor
plan inhibition (section 3.3.1) eliminates repetition by preserving the more salient
(anticipated or positionally 'stronger') segment.31
3.5.1 Enhancement
The initial stages of sound changes that emerge from the bias factors discussed in
sections 3.3-3.4 are either categorical or incremental. They are categorical if they are
already phonetically complete in their initial stage. For example, if motor planning
errors are a source of sibilant harmony, the erroneous pronunciation of [s] may
already have been a fully changed [J]. Our expectation is that changes rooted in motor
planning and perceptual parsing are often categorical.
By contrast, in changes emerging from aerodynamic constraints and gestural
mechanics, the structured variation found in the initial stage of phonologization may
involve pronunciation variants that differ considerably from the eventual outcome.
For example, the first stages of adjacent-vowel coalescence might involve only partial
gestural overlap, with complete coalescence resulting only after several generations or
longer. Similarly, there is apparently a range of intermediate pronunciations between
[Vwx] and [Vxw], or between the latter and [f]. We use the term ENHANCEMENT
to refer to processes by which a relatively small initial bias effect is amplified to its
eventual categorical result.32 This in turn has two distinct profiles.
31
Tilsen (this volume) proposes a connection between motor-planning inhibition and dissimilatory
effects, grounded in the following experimental observations (from areas outside language): 'when movement A to one target location is prepared in the context of planning a distractor movement B to a sufficiently different target location, then the executed trajectory of movement A deviates away from the target of movement B... In addition, more salient distractors induce greater deviations'.
32
This use of the term is not what Stevens and Keyser (1989) meant when they wrote about featural
enhancement, but there are parallels. Some phonetic property is made more recoverable by changes in
suggests that this pattern (which amounts to long-distance agreement) arose directly
from the purely local vowel-consonant coarticulation found in closely related Interior
Salish languages. She writes that the root cause of the shift is 'that faucal features
are maximally compatible with vocalic rather than consonantal structure... [T]he
phonologisation of local coarticulation [in related languages] lays the ground for
a more general assignment of faucal features to vocalic structure, so that faucal
features appear on any preceding vowel' (Bessell 1998: 30). Note that in this as in
other cases of articulatory enhancement, the basic direction of change is determined
by articulatory factors; the bias emerges from gestural mechanics, not perceptual
enhancement.
Second, in what we call AUDITORY ENHANCEMENT, a new articulatory feature is
introduced with the effect of enhancing the auditory distinctness of a contrast. A
classic example is lip rounding on back vowels, which positions vowels in the acoustic
vowel space in a maximally dispersed way (Liljencrants and Lindblom 1972), thus
enhancing the overall perceptual contrast in the vowel system. Other redundant sec-
ondary features that can be analyzed in a similar way include the labialization of [ʃ].
In our discussions of individual sound changes above, we have also identified several
developments, listed in (25), that may be attributable to auditory enhancement.
(25) Possible examples of sound change due to auditory enhancement
a. Prenasalization in voiced stops enhances voicing (section 3.3.2)
b. [θ] > [θw] enhances [flat] (section 3.4.4)
c. [xɸ] > [f] enhances continuancy (section 3.4.4)
The emergence of auditory enhancement could be envisioned in at least two ways.
One possibility is that talkers possess linguistic knowledge of acoustic targets, and that
new articulatory features are sometimes introduced in speech when a contrast is insuf-
ficiently salient. Such new features then spread like any other linguistic innovations.
Another possibility is that features that emerge through auditory enhancement are
occasionally present in natural speech, simply by chance along with other phonetic
variants, but that because they enhance a contrast they have a privileged status in
listeners' exemplar memories, and are then more frequently propagated. We cannot
judge which account is likelier. But whether the speaker-oriented or the listener-
oriented approach ultimately proves more satisfactory, it is worth noting that auditory
enhancement, unlike articulatory enhancement, does define a set of bias factors for
linguistic change: new features may arise that auditorily enhance existing contrasts.
This is a bias factor, but unlike those described in sections 3.3-3.4, it is system-
dependent.34
34
Note that enhancement need not be regarded as teleological. For example, Blevins and Wedel's (2009)
account of anti-homophony effects may generate articulatory (and perhaps even auditory) enhancement
effects as an automatic by-product of phonetic categorization.
3. Phonetic bias in sound change 81
selection in particular, and that the discovery of selectional bias patterns with no other
explanation may be evidence for universal constraints. But the details are debated; on
final voicing compare Yu (2004), Blevins (2006a, b), and Kiparsky (2006). Finally,
Moreton's suggestion of general constraints on learning seems reasonable
a priori, but requires more investigation to be securely established as a source of
linguistic asymmetries (cf. Yu 2011).
described in changes such as English vowel reduction (Fidelholtz 1975) and flapping
(Rhodes 1992), among others summarized by Bybee (2001, 2002) and Phillips (2006),
but three problems remain. First, many well-studied leniting changes show no fre-
quency effects; examples include Latin rhotacism, Verner's Law, and the degemination
of Latin geminate stops in languages like Spanish.36 If word frequency effects are
implicated in sound changes from their earliest stages, the difference between changes
where these effects vanish and changes where they persist is unexplained. Second,
the nature of the effects identified experimentally (a gradient relationship between
frequency and duration) and in studies of phonological patterns (where words may
fall into two frequency-determined groups, only one of which shows a change) are not
precisely the same, and the relationship between them is not clear. And third, more
than one sociolinguistic study has found, echoing the classical view of Bloomfield
(1933: 352-362), that ongoing changes tend to exhibit lexical irregularities only late
in their development, after they have become sociolinguistically salient, whereas 'the
initial stages of a change' are regular (Labov 1994: 542-3; cf. Labov 1981; Harris 1985).
In our judgment not enough is understood yet about the emergence of frequency
effects in sound change to build a coherent picture out of the contradictory facts.
In any case, the role played by lexical and morphological patterns in grammar and
usage is independent of the role played by bias factors for asymmetric sound change.
Important as the question is, it falls outside the scope of this chapter.
Of course it is hard to observe innovators in the wild, but we can still ask the crucial
question: What causes them to deviate from the norm? Why do some individuals
speak differently from all the people around them?
To this first part of the actuation question there are several possible answers.37
One answer, following Yu (2010a, this volume), appeals to individual differences in
perceptual compensation. As discussed in section 3.3.4, perceptual compensation
ordinarily leads listeners to ignore coarticulation effects. In an exemplar model of
linguistic knowledge, this would have the effect of focusing an exemplar cloud more
closely on its phonological target. Individuals with systematically attenuated per-
ceptual compensation would therefore have more divergent exemplars in memory,
mirroring the bias patterns discussed in section 3.4, and might then produce such
variants more often.
A second possible answer would appeal to individual differences in linguistic devel-
opment and experience. For example, language learners may develop different articu-
latory strategies for realizing the 'same' acoustic target. It may be that two such strate-
gies yield perceptibly different outcomes in some contexts, such as coarticulation; this
could be the point of entry of a sound change.38 Or perhaps small random differences
in experience (differences in what are sometimes called 'primary linguistic data')
yield differences in the phonetic systems that learners develop.39
A third possible answer, which we explore here, appeals to differences in sociolin-
guistic awareness. The basic idea is that individuals (or groups) may differ in how they
assign social meaning to linguistic differences. We speculate that some individuals
in a language community, but crucially not others, may attend to linguistic variation
within their own subgroup but not to variation in other subgroups. If such individuals
become aware of a particular phonetic variant in their subgroup, but are unaware that
it is also present in other subgroups, they may interpret the variant as a group identity
marker, and they may then use it more often. One social parameter that may give
rise to such a dynamic is power; Galinsky et al. (2006: 1071) suggest that power may
'inhibit the ability to pay attention to and comprehend others' emotional states'. To this
we might add a converse linguistic principle: lack of power sharpens one's attention to
linguistic variation (Dimov et al. 2012). What follows is meant as a proof of concept.
37
The truth may involve a combination of answers. Or perhaps there is no answer; Labov (2010: 90-
91) compares mass extinctions caused by a meteor: there is nothing biologically interesting about the causes
of a meteor collision. But for linguistic innovation, we can at least hope to find some underlying linguistic
or psychological causes.
38
Individual phonetic differences without sociolinguistic salience have been identified in English vowel
production (Johnson et al. 1993b), rhotic production (Westbury et al. 1998), and coarticulation patterns
(Mielke et al. 2010); other such differences undoubtedly exist.
39
This view of how change is triggered is common in the historical syntax literature (Lightfoot 1999);
cf. Blevins's (2006a: 126) comment that sound change of the type she calls CHOICE can depend on 'simple
frequency changes of variants across generations, as well as differential weightings of variants based on
social factors...'.
40
Within the framework of Optimality Theory the two assumptions correspond generally to faithful-
ness and markedness constraints (Prince and Smolensky 2004).
41
Another mechanism that has been utilized recently in multi-agent modeling of sound change is the
'probabilistic enhancement' proposed by Kirby (this volume).
86 Andrew Garrett and Keith Johnson
The rest of section 3.6 has three parts, first establishing some parameters for the
multi-agent modeling of sound change and then presenting a set of simulations. In
subsection 3.6.1, we review exemplar models of linguistic memory and relate them to
the study of sound change. In subsection 3.6.2, we review research on imitation and a
variety of factors that influence it. Finally, subsection 3.6.3 presents the simulations.
Even after the physical excitement [the direct experience of articulation and perception] has
disappeared, an enduring psychological effect remains, representations in memory, which are
of the greatest importance for sound change. For it is these alone that connect the intrinsically
separate physiological processes and bring about a causal relation between earlier and later
production of the same utterance.
In his view, random variation in the cloud of representations yields gradual articula-
tory drift. Similarly, Hockett (1965: 201) wrote about a density distribution in acoustic
space measured over years:
In the long run (measured in years), the time-dependent vector that is the speech signal for
a given speaker – both for what he himself says and for what he hears from others – spends
more time in some regions of acoustic space than in others. This yields a density distribution
defined for all points of the space. The density distribution is also time-dependent, since the
speech signal keeps moving; we may also imagine a decay effect whereby the importance for
the density distribution of the position of the speech signal at a given time decreases slowly as
that time recedes further and further into the past.
The key aspect of exemplar memory models for sound change is that, in such
models, the representation of a category includes variants. This is important because
the cloud of exemplars may gradually shift as new variants are introduced. Exemplar
theory provides an explicit model of how variability maps to linguistic categorization,
and for sound change this model is important because it permits the accumulation
of phonetically biased clouds of exemplars that serve as a basis for sound change.
Exemplars retain fine phonetic details of particular instances of speech, so phonetic
drift or sudden phonological reanalysis are both possible (as will be discussed in more
detail below). Other models of the mapping between phonetic detail and linguistic
categorization assume that phonetic detail is discarded during language use, and
therefore these theories offer no explanation of how phonetic detail comes to play
a role in sound change.
There is a central tension in exemplar theory, however, which relates directly to
sound change. We mentioned above several mechanisms (categorical perception,
compensation for coarticulation, and mispronunciation detection) that lead listeners
to disregard exemplars. More generally, it has become evident that not all exemplars
have the same impact on speech perception or production. One particularly obvious
point concerns differences between the phonetic space for listening and the phonetic
space for speaking. Listeners may be perfectly competent in understanding speech
produced in accents or dialects that they cannot themselves produce. For example,
we are perfectly well able to understand our young California students at Berkeley,
but neither of us can produce a plausible imitation of this variety of American English.
The space of familiar exemplars utilized for speech perception is thus, evidently, larger
and more diverse than the space of exemplars utilized for speech production. When
we say, as above, that specific exemplars may be disregarded by listeners, this can be
interpreted to mean that the variants introduced by bias factors are not added to the
set of variants used in speech production.
Building on this idea that speech production and perception are based on different
sets of phonetic exemplars, following Johnson (1997a) we posit that the perceptual
phonetic space is populated with word-size exemplars for auditory word recogni-
tion. We follow Wheeldon and Levelt (1995) and Browman and Goldstein (1990a)
in assuming that the speech production phonetic space is populated with smaller
(segmental or syllabic) exemplars used in calculating speech motor plans. These
articulatory exemplars are also recruited in certain speech perception tasks, and in
imitation.
Evidence for this dual-representation model comes from a number of different
areas of research. For example, in neurophonetics Blumstein et al. (1977) noted
the dissociation of segment perception from word recognition in certain forms of
aphasia. Hickok and Poeppel (2004) fleshed out a theory of speech reception in
which two streams of processing may be active. A DORSAL stream involves the
speech motor system in perception (Liberman et al. 1967; Liberman and Mattingly
1985), and is engaged in certain segment-focussed listening tasks. More commonly
in speech communication, speech reception is accomplished by a VENTRAL stream
of processing that involves more direct links between auditory and semantic areas of
representation.
Speech errors and perceptual errors differ qualitatively as a dual representation
model would predict. In the most common type of (sound-based) slips of the tongue,
segments in the speech plan interact with each other, to transpose or blend, with the
main factors being the articulatory similarity and structural position similarity of the
interacting segments. For example, the [f] and [t] in the speech error delayed auditory
feedback → ...audif- auditory... share voicelessness and are in the onsets of adjacent
stressed syllables. Slips of the ear, on the other hand, do not usually involve interaction
of segments in an utterance, but are much more sensitive to whole-word similarity
and availability of an alternative lexical parse (Bond 1999). For example, He works in
an herb and spice shop was misheard as He works at an urban spice shop and at the
parasession was misheard as at the Paris session.
Another source of support for a dual-representation model comes from the study of
phonetic variation in conversational speech (Pitt and Johnson 2003). Johnson (2004)
studied phonetic variation in conversational speech and found that segment and
syllable deletion is extremely common. He concluded that auditory word recognition
models that rely on a prelexical segment processing stage would not actually be able to
perform accurate (human-like) word recognition and that whole-word matching is a
better approach to deal with the massive phonetic variation present in conversational
speech.
Proponents of the Motor theory of speech perception (Liberman et al. 1967)
argued for a special SPEECH MODE of segment perception. We can now hypothe-
size that in experiments that require listeners to pay careful attention to phonetic
segments, this mode will dominate (Burton et al. 2000). But when listeners are
mainly attuned to the meaning of utterances, the speech mode of listening will not
be engaged (as much) and a LANGUAGE MODE of word perception will dominate.
Lindblom et al. (1995) refer to the contrast as the 'how'-mode vs. the 'what'-mode of
perception.
A dual-representation model of phonology is also consistent with several strands
of thinking in psycholinguistics. For example, Cutler and Norris's (1979) dual-route
model of phoneme monitoring (as implemented in Norris 1994) holds that phonemes
may be detected by a phonetic route, in a speech mode of listening, or via a lexical
route where the presence of the phoneme is deduced from the fact that a word
containing the phoneme has just been detected. They identified a number of factors
that influence which of these two routes will be fastest. Two modes of perception
were also implemented in Klatt's (1979) model of speech perception. Ordinary word
recognition in his approach was done using a whole-word matching system that he
called LAFS (lexical access from spectra), and new words were incorporated into the
lexicon using a segmental spell-out system that he called SCRIBER. This approach
recognizes that reception of speech may call on either of these systems (or perhaps
both of them in a race).
Dual representation is important in our model of sound change because articu-
latory targets tend to be resistant to change, and in particular sound change is not
dominated by pronunciations found in conversational speech, as a naive exemplar
model might predict given the predominance of 'massive reduction' (Johnson 2004)
in conversational speech. This resistance to change is consistent with the idea that the
speech mode of perception (and the consequent activation of articulatory represen-
tations) is somewhat rare in most speech communication.
3.6.2 Imitation
Laboratory studies of phonetic accommodation have shown that speakers adjust
their speech on the basis of recent phonetic experience, i.e., that phonetic targets
are sensitive to variation. In phonetic accommodation studies, subjects simply repeat
words that they hear and are seen to adopt phonetic characteristics of words pre-
sented to them (Babel 2009 on vowel formant changes; Nielsen 2008 on consonant
aspiration changes). Speech motor plans are maintained by feedback, comparing
expected production with actual production, and evidently in phonetic accommo-
dation the expected production (the target) is computed on the basis of one's prior
speech exemplars, together with phonetic representations derived from hearing other
speakers.
The feedback tuning of speech motor control can also be seen in the laboratory in
studies of altered auditory feedback (Katseff et al. 2012). In altered feedback exper-
iments, the talker hears (in real time) re-synthesized copies of his/her speech with
the pitch (Jones and Munhall 2000), formants (Purcell and Munhall 2006; Houde and
Jordan 1998; Katseff et al. 2012), or fricative spectra (Shiller et al. 2009) altered. Talkers
respond by reversing the alterations introduced by the experimenter, even though
they don't notice that a change was introduced. In both phonetic accommodation
and altered auditory feedback studies, we see the operation of a phonetic mechanism
that may be responsible for sound change: a feedback control mechanism that incor-
porates phonetic exemplars that the speaker hears others produce, or in other words
a subconscious phonetic imitation mechanism.
Studies of phonetic accommodation and altered auditory feedback have found a
number of parameters that are relevant for a theory of imitation in sound change.
First, imitation is constrained by prior speaking experience. People do not imitate
perfectly and do not completely approximate their productions to those of others
(Pardo 2006; Babel 2009). Some of the inhibition is due to the speaker's own personal
phonetic range; Babel (2009) found that vowels with the most variation in a subject's
own speech showed the greatest accommodation. We speculate, though this has not
been tested, that the degree of match between voices may influence imitation.
Second, imitation is socially constrained. People do not automatically or uncon-
trollably imitate others, but are more likely to imitate someone they identify with
at some level (Bourhis and Giles 1977; Babel 2009). This has implications for sound
change because it indicates that the use of bias variants in speech production is socially
conditioned.
Third, imitation generalizes. Thus instances of long VOT influence speech in words
or segments not heard; for example, /p/ with long VOT produces long (imitative)
VOT in /k/ (Nielsen 2008). This finding has important implications for the regular-
ity of sound change. The 'speech mode' system that we propose, by virtue of using
segment-sized chunks, provides an account of the regularity of sound change (where
the receptive whole-word exemplar space would not). Interestingly, Nielsen's results
never form part of the pool of tokens that are used', so that if a listener 'fail[s] to
comprehend [a] word and the sentence it contains... this token will not contribute to
the mean value' of the target segment.43 According to this view, perceptual confusion
may result in conservation of a boundary between confusable phonemes, by limiting
the exemplars of adjacent categories to only those that are correctly identified. The
results of the simulation, shown in Figure 3.2, illustrate this. We created hypothetical
vowel formant distributions that overlapped slightly and took a random sample of
one thousand tokens from each distribution. Each vowel token was classified as an
43
Simulations by Pierrehumbert (2001a) and Wedel (2006) echo in various ways the simulations pre-
sented here; see also Kirby (this volume). Like many authors, Labov assumes that the mean value of a cloud
of exemplars is a rough indicator of a vowel target. This view may not be accurate (Pierrehumbert 2001a),
but serves as a viable simplifying assumption in our model.
example of one of the three vowel categories based on its distance to the category
centers. The category centers were then recomputed, with the misrecognized vowel
tokens removed, and a new random sample of one thousand tokens was then drawn
from each vowel category. In order to make the simulation more realistic we limited
the possible vowel space and started the simulation with the back vowel (lowest F2
value) located at the back edge of the space. This essentially fixed it in place with
a mean of about 1200 Hz. As the figure indicates, after several cycles of selective
exclusion of exemplars in the vowel categories, the category centers of the front vowel
and the mid vowel shift so that they no longer overlap. This simulation illustrates a
mechanism in speech perception that results in vowel dispersion (Liljencrants and
Lindblom 1972).
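The selective-exclusion loop just described can be rendered as a small simulation. The following Python sketch is our own illustration of the mechanism, not the code behind Figure 3.2; the formant values, standard deviation, and number of cycles are hypothetical, with a single F2-like dimension standing in for the vowel space.

```python
import random

def simulate_dispersion(centers, sd=150.0, n=1000, cycles=30,
                        back_edge=1200.0, seed=1):
    """Resample three vowel categories on a one-dimensional (F2-like)
    axis, discard tokens misclassified as a neighboring category, and
    recompute each category center from the surviving tokens."""
    rng = random.Random(seed)
    centers = list(centers)
    for _ in range(cycles):
        new_centers = []
        for i, c in enumerate(centers):
            tokens = [rng.gauss(c, sd) for _ in range(n)]
            # keep only tokens whose nearest category center is i
            kept = [t for t in tokens
                    if min(range(len(centers)),
                           key=lambda j: abs(t - centers[j])) == i]
            new_centers.append(sum(kept) / len(kept))
        # the back vowel is held at the back edge of the limited space
        new_centers[0] = max(new_centers[0], back_edge)
        centers = new_centers
    return centers

back, mid, front = simulate_dispersion([1200.0, 1400.0, 1600.0])
# the back vowel stays pinned at the edge; the mid and front
# category centers drift upward until they no longer overlap
```

Each cycle trims the overlapping tails of adjacent distributions, so the surviving means move away from their neighbors: dispersion emerges from exemplar selection alone, without any explicit dispersion constraint.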
In extending this style of simulation to study how bias factors result in sound
change we included a social component. This was because we wanted to study not
only how sound change might emerge from simple assumptions about exemplar-
based phonological categories, but also to gain a better understanding of the
normal case where bias does not result in sound change. Therefore, the remaining
simulations in this section track the development of phonetic categories in adjacent
speech communities, where a sound change occurs in the system for one group while
the other group does not experience the change. For both groups of speakers, we
constructed phonetic categories that were represented by clouds of exemplars which
include both normal variants and, crucially in both communities, a few exemplars (ten
per cent) that have been altered by a bias factor. The key difference between the groups
is whether or not the bias variants are disregarded. It seems reasonable to assume that
variants produced by phonetic bias factors are usually 'corrected', either by perceptual
processes like compensation or by rejection of speech errors. Stability of phonetic
categories is thus the norm. As we shall discuss, we assumed that these correction
processes were not implemented to the same degree by all speakers; one group of
speakers more actively applied perceptual compensation mechanisms than the other.
Thus, the difference between groups is modeled as a difference in the exemplars
selected by group members to define the phonetic category.
The top row of Figure 3.3 shows the starting phonetic and social distributions of
our first simulation of social stratification and sound change. The simulation tracks
the pronunciation of /z/ in two social groups. As discussed above, voiced fricatives
like /z/ are biased by aerodynamic constraints, and sometimes are realized with
reduced frication (more like an approximant). This simulation of a gradient phonetic
effect is appropriate for modeling many types of sound change, including context-free
vowel shifts, the despirantization of voiced fricatives, vowel fronting near coro-
nal consonants, vowel nasalization, and vowel coalescence, among other changes. In
this simulation, a bias factor produced a slightly skewed phonetic distribution. Most
productions (ninety per cent) clustered around the phonetic target value, which was
arbitrarily set to zero. A few productions (ten per cent), however, were a little biased
FIGURE 3.3 Simulation of a gradient phonetic bias. The starting phonetic and social identity
distributions are shown in the histograms. The results of a bivariate random selection from
these distributions are shown in the top right panel. Social group differences are indicated on
the vertical axis, which measures an arbitrary 'social identity' parameter. Phonetic output is
shown on the horizontal axis, where a value of zero indicates a voiced fricative production, and
a value of four indicates a voiced approximant production. The bottom panels show the gradual
phonetic drift, from iteration 0 to iteration 50 of the simulation, as the phonetic target includes
approximated variants for one social group, and persistent phonetic instability for the other
group, who do not allow the inclusion of approximated variants to influence the target.
so that the phonetic distribution has a longer tail in one direction than it does in the
other. The speech community in this simulation was also characterized by a bimodal
social stratification with fifty per cent of exemplars produced by one social group and
fifty per cent by another group of talkers. Each dot in the top right graph represents an
exemplar in the sociophonetic space defined by phonetic output and social identity.
At the start of the simulation there is no correlation between the phonetic and social
values; the bias factor is equally likely to affect the speech of each population group.
The bottom row of graphs shows how this phonetic system evolved over the course of
fifty iterations of simulated imitation.
As seen in Figure 3.3, the phonetic output of the two simulated groups of speakers
diverges. One group (centered around social identity index value 0) maintained the
starting phonetic realization: a situation of persistent phonetic instability, where an
aerodynamic bias factor influences about ten per cent of all /z/ productions, but
this bias factor does not induce phonetic drift. The other group (centered around
social identity index value 6) shows gradual phonetic drift, so that by the end of the
simulation the original /z/ is now /r/. Speakers in both groups are assumed to base
their productions on a cloud of exemplars (using the mean value of a set of exemplars
as a target). The difference is in the selection of exemplars to include in the cloud.
The '0' group, who did not experience a sound change, disregarded the phonetic
bias variants: they successfully compensated for the bias and removed it from their
exemplar-based phonetic definition of /z/. The '6' group, who did experience the
sound change, INCLUDED the bias variants in /z/, and thus the phonetic target was
changed by the bias.
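The logic of this simulation can be summarized in a brief sketch. The code below is an illustrative reconstruction, not the simulation behind Figure 3.3: the ten per cent bias rate follows the text, but the bias size, production noise, and compensation threshold are hypothetical values of our own choosing.

```python
import random

def produce(target, rng, bias_prob=0.10, bias_shift=1.0, sd=0.25):
    """One production of /z/: usually near the target, but ten per cent
    of tokens are shifted toward the approximant end of the scale."""
    token = rng.gauss(target, sd)
    if rng.random() < bias_prob:
        token += bias_shift  # gradient aerodynamic bias
    return token

def simulate_group(compensates, iterations=50, n=100, seed=3):
    """Track one group's /z/ target (0 = voiced fricative, higher =
    more approximant-like) over repeated production-storage rounds."""
    rng = random.Random(seed)
    target = 0.0
    for _ in range(iterations):
        cloud = [produce(target, rng) for _ in range(n)]
        if compensates:
            # perceptual compensation: bias variants are recognized
            # as deviant and excluded from the exemplar cloud
            cloud = [t for t in cloud if abs(t - target) < 0.5]
        target = sum(cloud) / len(cloud)  # next target = cloud mean
    return target

stable = simulate_group(compensates=True)    # stays near 0
drifted = simulate_group(compensates=False)  # drifts toward ~5
```

Both groups hear the same skewed distribution; the only difference is whether bias variants enter the exemplar cloud that defines the target, which is enough to yield persistent stability in one group and gradual drift in the other.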
Why would different groups of speakers treat bias variants in different ways?
Although bias variants occur with equal frequency for both groups of speakers, we
assume that phonetically unusual productions may take on indexical meaning for the '6'
group. Speakers who seek to identify with the group may be more likely to notice
phonetic variation among group members and thus include it as a group index-
ical property, even though that same variation exists in the population as a whole.
Prospective group members may thus notice variants when they are produced by the
target group even though they disregard those same variants when produced by other
speakers. Considered from another point of view, a group that is aware of some social
distance from another group may attend to phonetic deviations from the norm as
marks of social differentiation.
It has to be admitted, though, that change caused by gradient bias may also be more
inevitable than change induced by more discontinuous bias factors, in that listeners
may be less likely to disregard bias variants that are only very minimally different from
unbiased variants. Thus, variation introduced by a gradient phonetic bias may be less
useful for social differentiation than a more discontinuous bias factor because it may
fuel sound change regardless of social identity factors.44 It is important, therefore, to
study the link between discontinuous bias factors (such as those introduced by speech
production or perception errors) and sound change.
To model more discontinuous phonetic bias factors such as the motor planning
errors that we posited for cases of consonant harmony, the same basic model can
be used. However, discontinuous bias is often structure preserving in the sense that
speech errors often result in sounds already present in the language, so we assume
that the basic mechanism is one of probability or frequency matching (Vulkan 2000;
Gaissmaier 2008; Koehler 2009; Otto et al. 2011). For example, we can model the
harmony process that results in a change from [s] to [ʃ] by assuming that one group
includes harmonized instances of [ʃ] in the exemplar cloud for /s/ while the other
group does not. Then, following Pierrehumbert (2001a), we assume that speech pro-
duction involves a process that results in frequency matching, so that the likelihood of
drawing from one or the other mode in the phonetic distribution (that is, [s] or [ʃ])
matches the frequency of exemplars in those regions of phonetic space.
44
But note that this is definitely not Labov's (1994) view.
FIGURE 3.4 Simulation of a sound change caused by a discontinuous phonetic bias (such as a
motor planning error that results in a consonant harmony)
The simulation (Figure 3.4) was structured in much the same way as the previous
one. We have a population of individuals who are evenly divided into two social
groups. We also have a phonetic distribution in which ten per cent of the output tokens
are mutated by a phonetic bias factor. In this case, though, the bias factor produces a
discontinuous jump in phonetic space. However, we cannot suppose that acceptance
of the bias variants into a phonological category would result in gradual phonetic
drift because the intermediate phonetic space may be unpronounceable, or the bias
variants are good instances of an existing phonetic category. So the average phonetic
target centered around /s/ (phonetic output equal to zero in the model) stays as it was,
as does the average phonetic target centered around /ʃ/ (the bias variant, modeled with
phonetic output equal to 6). However, speakers in one group are willing to accept bias
variants as acceptable ways to say forms with an /s...ʃ/ sequence, while speakers in the
other group do not accept bias variants. Thus with a frequency-matching production
model, where the speaker's produced distribution of variants matches the distribution
of the exemplar cloud, the bias factor may lead to wholesale change.45
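The frequency-matching mechanism can likewise be sketched in a few lines. Again this is our own illustrative reconstruction rather than the simulation behind Figure 3.4; the ten per cent bias rate follows the text, while the epoch and sample counts are arbitrary.

```python
import random

def simulate_harmony(accepts_bias, epochs=200, n=200, seed=7):
    """Frequency-matching production of /s/ under a discontinuous bias:
    ten per cent of [s] productions surface as [ʃ] (a motor-planning
    harmony error). Variants are coded 0 for [s] and 1 for [ʃ]."""
    rng = random.Random(seed)
    p_sh = 0.0  # proportion of [ʃ] exemplars in the /s/ cloud
    for _ in range(epochs):
        cloud = []
        for _ in range(n):
            variant = 1 if rng.random() < p_sh else 0  # frequency matching
            if variant == 0 and rng.random() < 0.10:
                variant = 1  # discontinuous bias mutates [s] to [ʃ]
            cloud.append(variant)
        if not accepts_bias:
            # this group rejects [ʃ]-for-/s/ tokens as errors
            cloud = [v for v in cloud if v == 0] or [0]
        p_sh = sum(cloud) / len(cloud)
    return p_sh

print(simulate_harmony(accepts_bias=False))  # 0.0: /s/ is stable
print(simulate_harmony(accepts_bias=True))   # near 1.0: wholesale change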
These simulations of the link between phonetic bias factors and sound change have
shown that exemplar-based models provide a useful, explicit method for studying the
45
This simulation provides a useful reminder of the importance of compensation mechanisms, for
phonetic stability. If the simulation is allowed to run over thousands of epochs the frequency matching
mechanism, plus the phonetic bias factor, leads to oscillation between [s] and [/]. The model does not
stabilize unless the group who shifted from [s] to [J] begin to treat instances of [s] as errors which should
be corrected and thus removed from the exemplar cloud.
96 Andrew Garnit and Keith Johnson
role of bias factors in sound change. We have also shown, with citations from Paul,
Hockett, and Labov, that an exemplar-based conception of human phonetic memory
is the mainstream view.46
The simulations also identified a crucial role for exemplar selection in sound
change, and in particular concluded that socially motivated exemplar selection rules
make it possible to model both sound change and phonetic stability. Building on this
finding, we speculate that a group who tend to accept bias variants (phonetic variants
caused by bias factors) is likely to be engaged in a project of social differentiation,
and are looking for cultural material that could be of value in this project. Thus,
bias variants, though phonetically confusing, may be socially useful. Although this
is stated as if it is a phonetically conscious activity, it need not be. To the extent that
changes are 'involuntary' and 'unconscious' (Paul 1880; Paul 1920: ch. 2; Strong et al.
1891: ch. i), we can speculate that a low status group who seek social identity with
each other, against some other group, may be more attentive to phonetic detail than a
group who feel secure in their social standing.
Finally, although we used an exemplar memory in all of the simulations, we used
two kinds of mechanism to model sound change: phonetic target recalculation for
gradient bias factors (Figure 3.3) and frequency matching for discontinuous bias
factors (Figure 3.4). This difference relies on what Hockett (1965) called the 'Quanti-
zation hypothesis': the idea that the continuous range of phonetics is, for speakers,
divided into discontinuous quanta of phonetic intentions. In the exemplar model, the
difference boils down to whether the bias factor should be interpreted as changing
the articulatory plan for a specific gesture, or changing the production rule used to
select gestures in word production. One is tempted to associate this difference also
with neogrammarian sound change, as against lexical diffusion (as Labov 1981 did).
But there is no reason to believe that frequency matching is any less regular than
target changing; that is to say, there is no reason to think that the shifting frequency
distributions of [s] and [ʃ] would not affect all tokens of [s].
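For concreteness, the frequency-matching mechanism with a discontinuous bias factor can be sketched in a few lines of Python. This is a toy illustration under assumed parameters (cloud size, bias rate, epoch count), not the simulation reported above; the function and variable names are ours.

```python
import random

def simulate(epochs=50, n_cloud=500, bias=0.05, seed=1):
    """Frequency matching with a discontinuous bias factor: each epoch
    the speaker samples productions from the exemplar cloud (so the
    produced [s]/[sh] proportions match the cloud), a small fraction of
    intended [s] tokens surface as [sh] through the bias, and the
    productions are stored as the next epoch's cloud."""
    rng = random.Random(seed)
    cloud = ['s'] * n_cloud              # start from a uniform [s] cloud
    history = []
    for _ in range(epochs):
        productions = []
        for _ in range(n_cloud):
            target = rng.choice(cloud)   # frequency matching
            if target == 's' and rng.random() < bias:
                target = 'sh'            # the bias variant
            productions.append(target)
        cloud = productions
        history.append(cloud.count('sh') / n_cloud)
    return history

history = simulate()
# the [sh] proportion drifts upward across epochs
```

Without an error-correction (compensation) mechanism, the drift runs to completion, which is exactly the instability noted in footnote 45.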
3.7 Conclusion
In this chapter we have outlined a framework for categorizing and understanding
some key features of sound change. Much remains to be examined from this point
of view, of course, including questions only touched on above. For example, how do
processes of enhancement (section 3.5.1) work? How do we interpret lexical and mor-
phological effects in sound change (section 3.5.3)? And what actual sociolinguistic
and psychological evidence bears on the specific theories of actuation discussed in
section 3.6?
46 That is to say, the exemplar approach is mainstream in that part of linguistic research that Strong et al.
(1891: 1) called the 'science of language', as opposed to 'descriptive grammar'.
3. Phonetic bias in sound change 97
We have described two broad classes of bias factors that may help explain asym-
metries in sound change. The first, our main focus (sections 3.3-3.4), consists of
bias factors emerging in speech production and perception through motor plan-
ning, aerodynamic constraints, gestural mechanics, and perceptual parsing. Despite
its familiarity, we suggested that perceptual parsing is the least securely established
factor; its prototypical examples may have other interpretations. More research is in
order on this and all the other production and perception bias factors we discussed.
Systemic constraints (section 3.5) are a second broad class of bias factors, aris-
ing from language-specific or universal features of a phonological system. This class
includes perceptual enhancement and in particular auditory enhancement, which
can yield asymmetries in sound change; selectional bias (favoring certain variants,
universally or in certain phonological systems); and perhaps lexical effects. Since some
of the bias factors in this broad class are less well established at this point, the eventual
dossier may be smaller than what we have identified. In Table 3.5 we summarize some
of the best established bias factor types in both broad classes, with a few representative
sound changes that we have mentioned.
Finally, since any full account of phonologization must address the emergence of
speech norms (in an individual or community) from occasional phonetic variants,
we have sketched the outline of a linking theory that relates them (section 3.6).
Whether this sketch and our discussion of bias factors are on the right track or in
need of substantial revision, we hope in any case to stimulate further discussion of
the phonetic bases of phonologization.
4
Perceptual motivations for changes in vocalic length
Heike Lehnert-LeHouillier
4.1 Introduction
The fact that sound change can be motivated by phonetic factors is rather uncon-
troversial (e.g. Ohala 1993). In particular, perceptual motivations have been invoked
and proven useful in the study of phonologization patterns (Ohala 1981, 1992, 1993;
Hume and Johnson 2001; Kavitskaya 2002).
According to Ohala's (1981) proposal, which has been widely adopted, sound
change may arise in cases when listeners misparse certain properties of the speech
signal and reinterpret what has been heard. For example, a listener may misperceive
a vowel with a falling tone, which is phonetically longer than other vowels, as phone-
mically long (see section 4.2.2), or a vowel length contrast may be reinterpreted by a
listener as a tonal contrast because certain tonal patterns consistently co-occur with
vowels of a certain quantity (see section 4.2.1). In this example, the sound change
involving vocalic length and tonal pattern may go in either direction (i.e. from tonal
contrast to length contrast or from length contrast to tonal contrast). I will call this
scenario bidirectional sound change.
However, not all sound changes are bidirectional. For example, sound changes
involving vowel height and vowel length seem to be unidirectional; accounts of
a difference in vowel length developing into a difference of vowel height do exist
(see section 4.2.3), however, a difference in vowel height has not been shown to
develop into a length contrast.1
The current study investigates this asymmetry in directionality of sound changes
involving vocalic length and tone on the one hand, and vocalic length and vowel
1 Possible counterexamples, which are extremely rare, seem to be instances of hypercorrection rather
than to be motivated by phonetic factors.
4. Perceptual motivations for changes in vocalic length 99
height on the other hand. In particular, the hypothesis that this asymmetry arises
from differences in the perception of tonal and spectral cues will be investigated by
drawing on the results of a cross-linguistic perception study. This perception study
was designed to test how tightly spectral cues (as acoustic correlates of vowel height)
and fundamental frequency (as the acoustic correlate of tone and pitch accent) are
associated with the perception of vowel duration. The rationale of the experiment was
that if listeners are sensitive to a cue regardless of whether or not that cue is used in
vowel length perception in their native language, this cue is intrinsically more tightly
associated with vowel duration than a cue that impacts only those listeners with a
specific language background; namely a language in which the cue is known to co-
occur systematically with vowel duration (i.e. extrinsically associated). The associ-
ation strength of two cues (intrinsically vs. extrinsically), in turn, can be linked to
phonologization patterns in the following way: If a cue impacts the perception of a
given dimension, such as vocalic length, in the same way for all listeners regardless
of language background, phonologization patterns will presumably reflect this by
allowing changes only in the direction that does not force tightly linked cues to
separate. For two cues that are less tightly associated, we would expect more variability
in phonologization patterns, hence allowing for bidirectionality in sound changes.
The remainder of the chapter is organized as follows: Examples of changes in vocalic
length and their interaction with tone and pitch accent as well as vowel height are
discussed in section 4.2. Section 4.3 reports on the cross-linguistic perception exper-
iment, and section 4.4 discusses the results and argues that the difference between
perceptual cues that are intrinsically linked and those that are extrinsically linked
at the very least correlates with, if not motivates, the asymmetry in sound change
patterns found in sound changes involving vowel length.
and Hu level tones rather than falling or rising tones interact with vowel length, and
that the interaction of different f0 heights and vowel duration is not yet very well
understood. Whatever the exact mechanism underlying the pattern that short vowels
are often associated with high tones and long vowels with low tones (see Yu 2010c
for more examples and speculations), Ohala's view of sound change may still apply to
the tonogenesis in the two Mon-Khmer languages, since the length distinction must
have been reinterpreted at some point as a tonal distinction. Otherwise the consistent
occurrence of high tones in syllables containing historically short vowels and low
tones in syllables containing historically long vowels cannot be explained at all.
(4)  /i/                /u/
        ↘                  ↘
     /e:/ → /e/      /o:/ → /o/
     /e/                /o/
        ↘                  ↘
         /ɛ/                /ɔ/
Changes in vowel length and vowel quality similar to those described for Latin are
also attested in Iranian Persian, albeit with a remaining vowel length contrast in
the low vowels /a:/ and /a/ (Windfuhr 1997: 687).
This sound change can also easily be accounted for assuming the listener as source
of the change. Since shorter vowels often tend to be more centralized and, therefore,
somewhat lower in vowel height than the corresponding long vowels, listeners may
come to reinterpret the vowel quality rather than the vowel length as the most promi-
nent feature. Consequently, listeners will adjust the production of these vowels such
that they are not produced with a shorter duration any more, which, in turn, will then
result in a sound change of the type observed in Latin.
4.2.4 Summary
In analogy to the interaction between tone/pitch accent and vowel quantity, we would
expect to find languages in which a quality contrast has developed into a vowel length
contrast. Such a change could also be phonetically motivated by the well known
fact that high vowels are intrinsically shorter than mid vowels, which in turn are
intrinsically shorter than low vowels (Lehiste 1970). Given this, it would be reason-
able to expect a scenario in which listeners come to reinterpret the length difference
between a high vowel and a mid vowel to be the most prominent characteristic that
distinguishes these vowels, and consequently adjust the production of the vowels such
that the original high vowel turns into a short high vowel and the vowel that was
originally a mid vowel turns into a long high vowel, as illustrated in (5):
(5)  /i/ → /i/      /u/ → /u/
     /e/ → /i:/     /o/ → /u:/
for the production of a given vowel, resulting in a formant structure that places
the shorter vowel in a more central position in the acoustic vowel space. Lindblom
(1963) found that the amount of undershoot as determined by the first three formants
was directly related to the duration of a vowel: the shorter the vowel the more the
target undershoot. The original target undershoot model, which was inspired by a
damped mass-spring model of the articulators (Lindblom 1983), is rather automatic
in nature, as it assumes that the target undershoot is the result of power limitations on
the movement of the articulators. In other words, target undershoot occurs because
more articulatory effort would be required in order to reach a given target in less
time.
Target undershoot is often linked to vowel reduction processes, such as reduction
of vowels in unstressed syllables (Lindblom 1963; Engstrand 1988; van Bergem 1993;
Crosswhite 2004). In addition to being linked to vowel reduction processes, the target
undershoot model has also been called upon to account for the quality differences
between long and short vowels in languages with a vowel length contrast. For example,
Johnson and Martin (2001: 82) note about the vowels in the Muskogean language
Creek that 'short vowels are centralized relative to long vowels because of vowel target
"undershoot" in short vowels'. Although it has been shown that target undershoot is
neither a completely automatic coarticulatory process (Manuel 1987; Whalen 1990)
nor a mechanism that can be found in all languages to the same degree (Delattre
1969), many languages with a vowel length contrast exhibit vowel centralization of
the short vowel in a long/short vowel pair. Studies on the influence of spectral cues
on vowel length perception have found that listeners (Heike 1972; Sendlmeier 1981
for German; Abramson and Ren 1990; Roengpitya 2001 for Thai) are influenced in
their judgment of vowel length by spectral cues, such that the more central vowels are
judged shorter than the corresponding peripheral vowels.
Investigations of the perception of dynamic f0, such as Lehiste (1976), Pisoni
(1976), and Wang et al. (1976), found that listeners perceive vowels with a dynamic
f0 (i.e. a falling, a rising, or a falling-rising f0) as longer than vowels with a level f0.
All these studies used synthetic stimuli consisting of a single vowel (Lehiste 1976 and
Pisoni 1976) or isolated vowels and non-speech (Wang et al. 1976). While Lehiste's
stimuli compared the perception of a vowel with either a rising-falling or a falling-
rising f0 contour to the perception of a vowel with a level f0 of the same length, Pisoni
(1976) and Wang et al.'s (1976) stimuli compared vowels with a falling and a rising
f0 to stimuli with a level f0. Wang et al. found that vowels with a rising f0 contour
are perceived as longer than those with a falling f0. This can be accounted for by the
results found in production studies where vowels with falling tones are shorter than
those with rising tones. Vowels with falling tones were perceived in Wang et al.'s study
as longer than the vowels with a level fundamental frequency. This result was recently
replicated by Yu (2010c), with the additional finding that vowels with a low level tone
were perceived as shorter than those with a high level tone.
However, other perception studies either failed to replicate these results (Rosen
1977) or found that an increase in perceived vowel duration due to a dynamic f0
was context dependent (van Dommelen 1993). Using monosyllabic and disyllabic
words, presented either in isolation or embedded in a sentence, van Dommelen (1993)
found that German listeners only perceived vowels with a dynamic f0 as longer when
they occurred in isolated monosyllabic words. In all other conditions the perceptual
lengthening effect was reversed.
been found in numerous studies since the first investigation of spectral differences
between long and short vowels by Jørgensen in 1969. Acoustic measurements on Thai
long and short vowels have also shown that short vowels are more centralized than
long vowels (Abramson 1962; Abramson and Ren 1990; Roengpitya 2001). While
Japanese is traditionally viewed as a language in which the contrast between long and
short vowels is exclusively durational (cf. Vance 1987: 13), a few acoustic studies that
measured formant values of Japanese long and short vowels did find slight differences
in the quality between long and short vowels (Nishi et al. 2008; Hirata and Tsukada
2003). The fourth language investigated, Spanish, does not have a vowel length
contrast.
The four languages also differ in the co-occurrence restrictions on falling f0 and
vowel length. In Japanese, the occurrence of a falling f0 is restricted by the phonology:
long vowels consist of two morae while short vowels consist only of one mora. Since
each mora can maximally be specified for one f0 target, a falling f0 contour (high
f0 target on the first mora and low f0 target on the second mora) may only occur
with long vowels (McCawley 1968; Vance 1987). While the phonological restrictions
on the distribution of tones in Thai look very similar to those of Japanese on the
surface (a falling tone may occur only in CV syllables containing long vowels but not
short unstressed vowels: Abramson 1962; Morén and Zsiga 2006), the phonetic realization
of the tones of Thai reveals an important difference between Thai and Japanese. The
falling tone (HL) is phonetically realized by a rising-falling f0 contour, and the low
tone (L) is realized by a falling f0 from the mid range to the low range (Abramson
1962; Gandour et al. 1991). This means that the only tone in Thai that is realized
by a falling f0 contour is the low tone, which may occur on both long and short
vowels in any syllable context (Abramson 1962; Morén and Zsiga 2006). For German
and Spanish, the occurrence of a falling f0 is not restricted to either long or short
vowels.
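The Japanese co-occurrence restriction amounts to a one-f0-target-per-mora check, which can be stated as a trivial predicate. A sketch only; the function name is ours:

```python
def falling_f0_allowed(num_morae):
    """A falling contour needs a H f0 target on one mora and a L target
    on the next, so it requires at least two morae, i.e. a long vowel."""
    return num_morae >= 2

# short vowel = 1 mora: disallowed; long vowel = 2 morae: allowed
```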
4.3.2.2 Study design The participants in this study were twelve native speakers of
Thai, twelve native speakers of German, twelve native speakers of Japanese, and twelve
native speakers of Latin American Spanish. All participants were presented with vowel
continua progressing from a short to a long vowel in three different experimental
conditions. All listeners performed a categorial AXB forced choice task. The stimuli in
the first condition differed in duration only, those in the second condition contained
in addition a falling f0 from 260 Hz to 180 Hz over each of the vowels. The third
condition contained conflicting cue stimuli, in which the spectral cues remained that
of the short vowel throughout the continuum while the long comparison vowel had
different spectral properties.
The stimuli for this experiment were based on the speech of a 22-year-old female
Estonian talker, who produced the vowels in the context of CV(i) syllables, where the
initial consonant was a voiceless unaspirated alveolar stop. Estonian was chosen as the
language from which the stimuli were drawn in order to avoid a native language
bias for the listeners of any of the investigated languages. Vowel continua for each of
the vowel pairs [ta]-[ta:], [te]-[te:], and [ti]-[ti:] were created for each experimental
condition, such that there were seven stimuli on each continuum. Stimulus 1 on each
continuum was equivalent in duration to the original short vowel, and stimulus 7 was
equivalent in duration to the original long vowel, as produced by the Estonian talker.
The vowels were lengthened in equidistant steps from the duration of the short vowel
to that of the long vowel using PSOLA (Moulines and Charpentier 1990).
The stimuli for the Duration Only condition were based on the short vowel in the
original [ta], [te], and [ti] utterances. For each of the stimuli, the f0 contour of the
original short vowels was manipulated by removing the original f0 contour and by
replacing it with a level f0 of 180 Hz. Then these stimuli were lengthened in equidistant
steps.
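The equidistant lengthening amounts to linear interpolation between the two endpoint durations. The sketch below uses hypothetical endpoint values, since the measured durations of the Estonian vowels are not reported here:

```python
def continuum_durations(short_ms, long_ms, n_steps=7):
    """Durations for a 7-step continuum, lengthened in equidistant
    steps from the original short vowel to the original long vowel."""
    step = (long_ms - short_ms) / (n_steps - 1)
    return [short_ms + i * step for i in range(n_steps)]

durs = continuum_durations(80.0, 200.0)   # hypothetical endpoints (ms)
# durs[0] is the short-vowel duration, durs[6] the long-vowel duration
```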
The stimuli for the continua testing the influence of a falling f0 were also based on
the short vowel in the original [ta], [te], and [ti] utterances. However, unlike in the
Duration Only condition, the f0 contour of all stimuli in these continua was replaced
by a falling f0 from 260 to 180 Hz. This means that all seven stimuli on each of the
three continua in this condition differed in the duration of the vowel and in the steepness
of the f0 contour. Since the slope of the f0 contour depends on the duration of the
vowel over which the fall from 260 Hz to 180 Hz is realized, the shortest stimulus on
each continuum had the steepest f0 slope and the longest stimulus on the continuum
had the slope with the least degree of steepness.
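Because the 80 Hz fall is fixed while duration varies, the slope of the f0 contour falls out arithmetically. A sketch with hypothetical shortest and longest durations:

```python
def f0_slope_hz_per_s(duration_ms, f0_start=260.0, f0_end=180.0):
    """Average slope (Hz/s) of the falling f0 realized over a vowel:
    the fall is fixed at 80 Hz, so shorter vowels get steeper slopes."""
    return (f0_end - f0_start) / (duration_ms / 1000.0)

steep = f0_slope_hz_per_s(80.0)     # hypothetical shortest stimulus
shallow = f0_slope_hz_per_s(200.0)  # hypothetical longest stimulus
# roughly -1000 Hz/s vs -400 Hz/s: the shorter vowel has the steeper fall
```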
The design of the stimuli in the vowel height condition was a conflicting-cue design.
It tested whether listeners judge vowel length by durational cues alone or whether they
also take the quality of the vowel into account to make their judgments. If the spectral
cues did not influence the listeners, we would expect the same category boundary
judgments as in the Duration Only condition. However, if listeners are influenced
by the spectral cues, we should see a difference in the category boundary judgments
between the Duration Only condition and the Vowel Height condition. Just as for the
continua in the other two conditions, the stimuli in this condition were based on the
short vowel in the original CV utterances. The lengthening procedure for the vowels
was the same as in the previous two conditions. The stimuli for the continua in the
Vowel Height condition had the spectral properties of the short vowels, and a level f0.
However, unlike in the Duration Only condition, the vowel quality of the long flanking
vowel presented in either the A or B position of the AXB triad had the spectral cues of
the original long vowels, which were more peripheral in the acoustic vowel space. In
other words, the cues that conflicted were the duration and the quality of the vowel,
such that the stimuli in steps 4, 5, 6, and 7 (the stimuli with longer durations) had
the quality of the short vowel but the duration of a longer vowel.
The three experimental conditions (Duration Only, Falling f0, Vowel Height) were
presented in randomized order in three separate experimental blocks. Within each
108 Heike Lehnert-LeHouillier
block, the AXB triads containing the stimuli from the three continua for each of the
vowel pairs [a]-[a:], [e]-[e:], and [i]-[i:] were also presented in randomized order.
Each stimulus was presented six times, yielding 126 trials for each block. Participants
completed a practice block of seven trials before completing the experimental blocks.
All stimuli were played over headphones directly from a PC. Each participant was
instructed to complete the task as accurately and quickly as possible. All participants
were instructed to press 'A' on the computer keyboard if they felt that the first and the
second stimuli in the triad sounded more alike, and 'B' if the second and the third
stimuli sounded more alike.
4.3.2.3 Results For the analysis of the results, the number of 'short' responses, i.e. the
number of times out of the six repetitions that participants identified the stimulus as
a short vowel, was recorded. The data were then expressed in terms of a percentage of
'short' responses, as a function of the stimulus. For each subject the crossover point
from the 'short' category to the 'long' category on each continuum was determined by
first transforming the sigmoid function yielded by the raw data into a probit function.
This was done using SPSS. The 50 per cent crossover point, which is taken to be the
location of the category boundary, was then calculated using the formula in (6):
(6) x = (y - b) / m
In this formula, x is the point along the stimulus continuum where y (the percentage
of 'short' responses) is 50 per cent, b is the intercept with the y-axis, and m is the slope
of the probit function. A two-way (language × vowel continuum) repeated measures
ANOVA was performed on the category boundary results from each experimental
condition, and post-hoc tests of significance using a Bonferroni paired t-test proce-
dure were performed where significant interactions were found.
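Applied to a fitted probit line, formula (6) is a one-liner. The slope and intercept below are hypothetical, for illustration only:

```python
def crossover(m, b, y=50.0):
    """Category boundary from a probit fit y = m*x + b, solved for the
    50 per cent point: x = (y - b) / m, as in formula (6)."""
    return (y - b) / m

boundary = crossover(m=12.5, b=10.0)   # hypothetical fitted values
# the boundary falls between stimulus steps 3 and 4 on the continuum
```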
In order to assess the impact of the spectral cues and the f0 cues on the perception
of vowel duration, the location of the category boundary (50 per cent cross-over)
in each of the two test conditions (Vowel Height or f0) was respectively compared
to the category boundary in the baseline (Duration Only) condition by subtracting
the category boundary values from the Duration Only condition from the respective
values in the test condition. If the difference in category boundary was significant (as
assessed by a post-hoc Bonferroni paired t-test on the category boundary results in
the two conditions being compared), it was concluded that the cue had a significant
impact on the perception of vocalic length.
Comparing the results of the category boundaries in the Vowel Height and the
Duration Only conditions, we find a significant influence of spectral cues on vowel
length perception for all listeners (F(3, 44) = 3.37; p = .02), regardless of language
background. All listeners judged vowels that they had judged to be long
in the Duration Only condition as short vowels in the Vowel Height condition.
In other words, they judged the vowels in the middle of the continuum predomi-
nantly based on spectral cues rather than duration. The paired t-tests comparing the
category boundary in the baseline condition to those in the Vowel Height condition
yielded statistical significance for all four languages: Thai (p < .0001), Japanese (p <
.0001), German (p < .0001), and Spanish (p < .0001). These results are shown in
Figure 4.1.
However, there is some language specificity with respect to how much spectral
cues influenced listeners' judgments of vowel length: the German listeners were
affected most by the spectral cues, while Japanese listeners showed the least sensitivity
to spectral cues, and Thai listeners were influenced somewhat more than Japanese
listeners. Post-hoc Bonferroni paired t-tests showed that German was significantly
different from Thai (p = .003) and from Japanese (p = .002), while Spanish was not
significantly different from either of the other languages. These language-specific
differences in the exploitation of spectral cues in the perception of vowel duration
suggest that listeners did not simply respond in a psychoacoustic mode. In partic-
ular the fact that the spectral cues also influenced the vowel duration perception of
the Spanish listeners (a group whose native language does not have a vowel length
contrast) lends strong support to the hypothesis that spectral cues are more tightly
associated with vowel duration, and that no experience with a phonemic long/short
vowel contrast is needed in order to exploit this cue for the perception of vowel
length.
If we now turn to the impact of a falling f0 on the perception of vowel duration, we
find a quite different state of affairs. As shown in Figure 4.2, a falling f0 significantly
affected the perception of vocalic length only for the Japanese listeners (p < .001).
The Japanese listeners judged the vowels in the mid-region of the continuum (the ones
they had judged as short in the Duration Only condition) as long in the f0 condition.
FIGURE 4.1 The difference in the location of the category boundary between long and short
vowels in the Duration Only and the Vowel Height conditions averaged across the three vowel
continua [a]-[a:], [e]-[e:], and [i]-[i:]. Asterisks indicate significance at the .0001 level
FIGURE 4.2 The difference in the location of category boundary between long and short vowels
in the Duration Only and the f0 conditions averaged across the three vowel continua [a]-[a:],
[e]-[e:], and [i]-[i:]. Asterisk indicates significance at the .001 level
Unlike spectral cues, a falling f0 seems to impact the perception of vowel duration
only for those listeners whose native language associates a falling f0 with vowel length.
As discussed in section 4.3.2.1, in Japanese the occurrence of a falling f0 is restricted
such that it may only occur with long vowels. This co-occurrence restriction seems
to bias listeners towards a long-vowel judgment when a vowel of ambiguous duration
contains a falling f0.
Furthermore, we notice that although the difference between the Duration Only
and the f0 condition was not significant for the other language groups, not even the
same trend is apparent across groups in the direction of how f0 impacts length perception.
While Thai and German listeners, like the Japanese listeners with their statistically
significant effect, tend to interpret the vowels with a falling f0 as longer than those
with a level f0, the Spanish listeners show a (non-significant) trend in the other
direction.
motivations. In the example at hand, the fact that there is some inherent directionality
in how vowel height and duration pattern (the more central a vowel the shorter it is;
see 4.3.1) motivates to some degree the patterns we see in sound changes involving
vowel height and vowel length. However, a similar phonetic motivation exists for
vowel length and tonal contour (vowel duration is longer in vowels with a falling f0
contour, compared to vowels with a rising or level f0; see 4.3.1), yet we find a different
pattern in sound changes involving f0 and vowel length as well as different (although
statistically non-significant) trends in how f0 impacts vowel length perception in the
experimental study (see 4.3.2.3). In other words, while association strength in cue
perception might not be solely responsible for the asymmetry in the directionality of
sound change, it is certainly one factor in explaining the puzzle.
The question that remains is why f0 is less tightly associated with the perception
of vowel duration than spectral cues. A possible explanation for why some cues are
readily perceived by all listeners, regardless of language background, and other cues
only impact the judgment of listeners with a specific language background, may be
rooted in the articulatory organization of speech. In particular, an explanation for the
difference in the influence of f0 and spectral differences on the perception of vowel
length could be grounded in articulation. Spectral differences arise from a difference
in the shape of the vocal tract. In vowel production, these differences are predomi-
nantly caused by gestures of the tongue body. In other words, a tongue body gesture
is an intrinsic requirement for vowel production (with the exception of a targetless
schwa). If we assume (as proposed, for example, by Goldstein and Fowler 2003) that
perception tracks articulation, we would expect that all listeners are sensitive to slight
spectral differences in vowels. A dynamic f0 (unlike an intrinsic f0) is not essential
to the articulation of a vowel, and, therefore, the implicit knowledge that a certain f0
pattern is associated with a vowel or syllable has to be established, maybe by means of
categorizing speech events via an exemplar mechanism.
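One way to make 'categorizing speech events via an exemplar mechanism' concrete is a similarity-weighted vote over stored exemplars. The sketch below is a generic exemplar classifier in the style of Nosofsky's generalized context model, not a proposal from this chapter; all values are hypothetical:

```python
import math

def categorize(token, exemplars, sensitivity=1.0):
    """Each stored exemplar (value, category) votes for its category
    with weight exp(-sensitivity * distance); the category with the
    largest summed weight wins. Tokens are 1-D acoustic values."""
    votes = {}
    for value, category in exemplars:
        weight = math.exp(-sensitivity * abs(token - value))
        votes[category] = votes.get(category, 0.0) + weight
    return max(votes, key=votes.get)

# hypothetical vowel-duration exemplars (ms)
cloud = [(60, 'short'), (70, 'short'), (150, 'long'), (160, 'long')]
# a 100 ms token sits closer to the stored short exemplars
```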
5
Inhibition functions to maintain contrast
5.1 Introduction
This chapter proposes that an inhibitory speech planning mechanism is involved in
the maintenance and maximization of phonological contrast. The maintenance of
contrast is of central importance to the understanding of phonologization. Generally
speaking, assimilatory coarticulation will, unchecked, lead to contrast neutralization.
Yet loss of contrast is far from the inevitable consequence of coarticulation; this
implies that there exist cognitive mechanisms that oppose the phonologization of
coarticulation. A complete theory of phonological change requires an account not
only of the mechanisms that lead to loss of contrast, but also the ones that preserve
contrast.
Limits on coarticulatory variation are commonly attributed to forces or constraints
that maximize the perceptual distinctiveness of contrast. Dispersion theories (Lil-
jencrants and Lindblom 1972; Lindblom 1986, 1990, 2003; Flemming 1996, 2004)
assert that there exist cognitive mechanisms which function to make speech targets
less perceptually similar. The reader should keep in mind that sound systems never
literally maximize perceptual differences between sounds, because other things, like
coarticulation, often oppose the maximization of perceptual distinctiveness.
Recent experimental work on speech motor planning suggests an alternative view
of how contrast is maintained: inhibitory interactions between contemporaneously
planned articulatory targets result in dissimilatory effects, and over time these effects
* Thanks to Keith Johnson for discussions of this research. Two anonymous reviewers contributed to
the improvement of this chapter. Thanks to Yao Yao and Ron Sprouse for assistance in the University of
California, Berkeley Phonology Lab. This work was supported by the National Science Foundation under
Award No. 0817767.
5. Inhibition functions to maintain contrast 113
can prevent speech targets from becoming perceptually indistinct. For example,
experimental observations show that speakers tend to produce an [i] with more
peripheral F1 and F2 values when they have very recently planned an [a] (Tilsen
2009b). Likewise, experimental results presented in this chapter show that Mandarin
speakers dissimilate tones that are planned in parallel. Findings of this sort suggest
that the planning of a speech target is influenced by other simultaneously planned
targets. These dissimilatory effects can be understood to arise from inhibitory
motor planning mechanisms, and can explain how speakers maintain and maximize
contrast.
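The dissimilatory effect of a co-planned target can be caricatured as a repulsive adjustment of the planned value away from the competitor. This is a toy sketch with hypothetical F1 values and an assumed inhibition gain, not the model developed in this chapter:

```python
def dissimilate(target, competitor, inhibition=0.1):
    """Produced value shifts away from a simultaneously planned
    competitor, in proportion to an assumed inhibition gain."""
    return target - inhibition * (competitor - target)

f1_i, f1_a = 300.0, 700.0           # hypothetical F1 values (Hz)
produced = dissimilate(f1_i, f1_a)
# produced F1 falls below 300 Hz: [i] becomes more peripheral
# when [a] is co-planned
```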
Here the phonologization of vowel-to-vowel coarticulation into vowel harmony
will serve as a representative example of phonologization processes associated with
assimilatory phonetic patterns. This sort of phonologization falls under a general cate-
gory of sound changes considered to arise from hypocorrection (Ohala 1981, 1993b).
Section 5.2 describes how Ohala's listener-oriented theory of hypocorrective sound
change applies to co articulation, contextualizes this theory in an exemplar-based
model of memory, and discusses how dispersion theories model the forces counter-
acting this process via maximization of perceptual contrast. Section 5.3 will describe
experimental evidence for dissimilation between contemporaneously planned vowels
in speech, and will present new experimental evidence that indicates tones in Man-
darin exhibit the same effect. Section 5.4 discusses these experimental results, argues
that they arise from an inhibitory mechanism in the planning of articulatory targets,
and explains the importance of this mechanism for understanding phonologization:
i.e. inhibition functions to maintain and maximize contrast.
5.2 Background
To exemplify how hypocorrection leads to sound change, and how dispersion theory
models the forces opposing this process, we use carryover vowel-to-vowel coarticulation
as an example. Vowel-to-vowel coarticulation is an assimilatory influence
upon the articulatory movements of one vowel due to the presence of a nearby vowel.
Vowel-to-vowel (henceforth V-V) coarticulation is either anticipatory or carryover,
and both types have been observed in a variety of languages (Öhman 1966; Gay
1974, 1977; Bell-Berti and Harris 1976; Fowler 1981; Parush et al. 1983; Recasens
1984; Recasens et al. 1997; Manuel and Krakow 1984; Manuel 1990). Carryover
coarticulation in V1-V2 sequences may arise from a combination of several factors.
Mechanical constraints on the movement from the articulatory posture for V1 to
the posture for V2 may give rise to coarticulation (Recasens 1984; Recasens et al.
1997). Another potential source of coarticulation is gestural overlap, which in the
task dynamic framework of articulatory phonology (Saltzman and Munhall 1989;
Browman and Goldstein 1986, 1988, 1990b) would arise when the gestural activation
interval for V1 extends into the time during which V2 is active.
114 Sam Tilsen
However, mechanical constraints and gestural overlap cannot be the only sources of
V-V coarticulation because they are not expected over the observed temporal range of
V-V coarticulation, which can span up to three syllables (Fowler 1981; Magen 1997;
Grosvald 2009). A third possibility is that when the articulatory targets for V1 and V2
are planned contemporaneously those targets may interact, resulting in assimilatory
shifts in the target of V2 toward V1, or vice versa (cf. Whalen 1990). In other words,
prior to articulation, there may be variation in the formation of vowel targets that
is influenced by other vowel targets in the preceding and subsequent utterance con-
text, which are planned in parallel. Interestingly, the experimental evidence indicates
that these interactions are predominantly dissimilatory in nature, and hence tend to
oppose the effects of mechanical factors and gestural overlap.
In the highly influential model developed by Ohala (1981, 1993b, 1994b), V-V
coarticulation, and more generally any form of assimilatory coarticulation, can lead
to sound change through hypocorrection. In this process, sound change begins with
a 'phonetic perturbation' that frequently occurs in a given linguistic context. The
sources of such perturbations can be mechanical, aerodynamic, motoric, and/or
perceptual. Carryover V-V coarticulation is one example. The normal functioning of the
perceptual apparatus, in this view, is to compensate for the contextually conditioned
perceptual similarity of V2 to V1. In a sense, compensation 'corrects' or 'normalizes'
for the perturbation in V2, undoing its effects on the perception and memory of the
sound.
Hypocorrection occurs when the compensatory mechanism under-corrects for
phonetic perturbations: 'in the vast majority of cases the listener (somehow) parses
the signal correctly and infers the speaker's intended pronunciation. But occasionally a
listener may misparse the signal' (Ohala 1994b). The key idea here is that the perturbation
is 'parsed as independent of the perturbing vowel'. The correction mechanism fails
to compensate for coarticulation, and so a subtle phonetic assimilation is reinterpreted
as a new pronunciation norm. In the case of V-V coarticulation, hypocorrection leads
to vowel harmony, a contrast neutralization in which the vowels in some structural
domain (e.g. a root, stem, or word) covary in some of their features (cf. Vergnaud
1980; Rennison 1990; Krämer 2001; Finley 2008).
It is important to note that for phonologization to occur a new 'pronunciation norm'
must be established both within an individual speaker and across a group of speakers.
Exemplar theories (Goldinger 1992, 1996, 1998; Johnson 1997b, 2006; Pierrehumbert
2001a, 2002) provide a useful way to understand how sound change occurs within a
given speaker. In the exemplar model of perception developed in Johnson (1997b),
every perceived speech sound is stored in memory as a separate exemplar. The exemplars
incorporate phonetic details of the particular instantiation of the sound, along
with a variety of contextual information and associations to categorial labels. Each
exemplar is assumed to have an activation level (its relative salience in memory),
which is influenced by its recency and potentially many other contextual factors, such
as the word in which it occurred, nearby segments, the listener, speaker, etc. Hence
the memory of a sound is not an abstract category, but a large collection of detailed
exemplars that include, among other things, spectrotemporal information.
On the production side, the exemplar model described in Pierrehumbert (2001a,
2002) uses the collection of stored exemplars to form a production target in the
following way. First, an exemplar is randomly selected, then a weighted average of the
phonetic values of similar exemplars is taken in order to form a production target.
The activation level is a factor in the weighting, and hence more recent exemplars
will play a greater role in target formation. The phonetic values are considered to be
perceptually or articulatorily relevant variables, which for vowels includes formant
values. Moreover, the categorial labels and phonetic values can be used to define a
similarity metric, allowing for a notion of 'similar' exemplars.
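The target-formation procedure just described can be sketched in a few lines of Python. This is a toy illustration rather than Pierrehumbert's published implementation: the single phonetic dimension, the similarity window, and the function and parameter names are all assumptions introduced here.

```python
import random

def production_target(exemplars, similarity_window=0.5):
    """Form a production target from stored exemplars (a sketch in the
    spirit of Pierrehumbert 2001a). Each exemplar is a (phonetic_value,
    activation) pair. An exemplar is selected at random, and the target
    is the activation-weighted mean of all exemplars within a similarity
    window around the selected one, so more activated (e.g. more recent)
    exemplars pull the target more strongly."""
    anchor_value, _ = random.choice(exemplars)
    similar = [(v, a) for v, a in exemplars
               if abs(v - anchor_value) <= similarity_window]
    total_activation = sum(a for _, a in similar)
    return sum(v * a for v, a in similar) / total_activation
```

With a wide window the random anchor choice is irrelevant: two equally activated exemplars at 1.0 and 1.2 yield a target of 1.1, while raising the activation of one exemplar pulls the target toward it, which is how recency enters the model.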
In the context of this model, regularly present phonetic perturbations can gradually
shift the distribution of exemplars in phonetic space. For example, frequent carryover
V-V coarticulation will tend to assimilate the target of V2 to V1 in that context.
This happens because each time a production target is formed, previously stored
exemplars influence the weighted averaging. Furthermore, the exemplar memory of a
given speaker is part of a network of interacting agents, each with their own exemplar
memory. If the phonetic perturbations occur with sufficient frequency across the
population, then memories of both self-generated and other-generated sounds will
feed into the sound change (cf. Oudeyer 2006a; Pierrehumbert 2004; Wedel 2004a).
Left unchecked, this will lead to partial contrast neutralization, and in the present
example, vowel harmony. What, then, opposes these tendencies?
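Under stated assumptions (one phonetic dimension, a constant assimilatory pull toward the coarticulating vowel, exponential decay of activation with recency), the feedback loop just described can be simulated: each production is formed from the exemplar pool, perturbed toward the perturbing category, and stored back. All constants below are illustrative.

```python
def simulate_drift(n_steps=200, pull=0.1, attractor=1.0, decay=0.9):
    """Toy simulation of exemplar drift under a constant coarticulatory
    perturbation. Targets are activation-weighted means of the stored
    exemplars; each produced token is shifted a fraction `pull` toward
    `attractor` (the coarticulating vowel's value) before being stored
    with full activation, while older activations decay. Returns the
    last produced value, which creeps toward `attractor` over time."""
    exemplars = [(0.0, 1.0)]                  # (phonetic value, activation)
    produced = 0.0
    for _ in range(n_steps):
        total = sum(a for _, a in exemplars)
        target = sum(v * a for v, a in exemplars) / total
        produced = target + pull * (attractor - target)     # assimilation
        exemplars = [(v, a * decay) for v, a in exemplars]  # recency decay
        exemplars.append((produced, 1.0))
    return produced
```

Because each stored token lies between the current mean and the attractor, the exemplar-derived target drifts monotonically toward the perturbing category when nothing opposes the pull, which is the contrast-neutralizing scenario the text describes.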
Dispersion theories describe a formal approach to understanding the maintenance
and maximization of contrast, but these approaches do not explain how speakers
accomplish these things. There are two prominent dispersion theories we consider
here. The adaptive dispersion theory of Liljencrants and Lindblom (1972; cf. also
Lindblom 1986, 1990, 2003) models vowels as mutually repelling objects in a perceptual
space (e.g. a 2-D F1, F2 space), and models vowel system organization as an
optimization problem. In contrast, the constraint-based approach of Flemming (1996,
2004) employs three goals, implemented as constraints: minimize articulatory effort,
maximize the number of contrasts, and maximize the perceptual distinctiveness of
contrasts.
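A loose computational analogue of adaptive dispersion can make the 'mutually repelling objects' idea concrete. In this sketch, vowel points repel one another with an inverse-square force inside a unit square standing in for a normalized F1, F2 space; the box, step sizes, and repulsion law are illustrative assumptions, not the published optimization procedure.

```python
import random

def clamp(x, lo, hi):
    return min(max(x, lo), hi)

def disperse(n_vowels=5, steps=2000, step=0.005, max_move=0.02, seed=1):
    """Repulsion dynamics in the spirit of Liljencrants and Lindblom
    (1972): n vowel points repel each other (inverse-square) inside a
    unit square; per-step displacement is clamped for stability."""
    rng = random.Random(seed)
    pts = [[rng.random(), rng.random()] for _ in range(n_vowels)]
    for _ in range(steps):
        for i in range(n_vowels):
            fx = fy = 0.0
            for j in range(n_vowels):
                if i == j:
                    continue
                dx = pts[i][0] - pts[j][0]
                dy = pts[i][1] - pts[j][1]
                d2 = dx * dx + dy * dy + 1e-9
                fx += dx / d2           # repulsion grows as vowels approach
                fy += dy / d2
            pts[i][0] = clamp(pts[i][0] + clamp(step * fx, -max_move, max_move), 0.0, 1.0)
            pts[i][1] = clamp(pts[i][1] + clamp(step * fy, -max_move, max_move), 0.0, 1.0)
    return pts

def min_distance(pts):
    """Smallest pairwise distance, a proxy for minimum perceptual contrast."""
    return min(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
               for i, a in enumerate(pts) for b in pts[i + 1:])
```

Running this with a larger `n_vowels` yields smaller achievable separations, which parallels the prediction that larger vowel inventories leave less room for coarticulatory variation per vowel.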
Both approaches have in common an appeal to a cognitive mechanism which
functions to make perceptual contrasts maximally distinct, and both require that this
mechanism coexists with factors that indirectly reduce perceptual distinctiveness. In
the case of V-V coarticulation, both theories correctly predict that in languages with
more vowels, those vowels will exhibit a lower degree of V-V coarticulation because
there is more pressure to maximize perceptual contrast (cf. Manuel and Krakow
1984; Manuel 1990, 1999; Magen 1989). However, adaptive dispersion and constraint-
based dispersion do not explain, nor purport to explain, how speakers implement
the repulsive forces or constraints in real time; rather, they describe patterns that are
fairly removed from individual speakers and utterances. In that regard, dispersion
theories fall short of describing how contrast is maintained. Experimental evidence
presented in the next section points to an alternative understanding of contrast
maintenance and maximization, one that utilizes a well-motivated motor planning
mechanism.
FIGURE 5.1 Comparisons of primed vowel shadowing responses on concordant and discordant
trials. Ellipses represent 95 per cent confidence regions for within-speaker normalized F1,
F2 bivariates averaged over the middle third of each response
5.3.2.2 Results Eight of the twelve subjects exhibited significant or marginally signif-
icant dissimilation on discordant trials compared to concordant trials. However, the
interpretation of dissimilation is sometimes ambiguous due to the dynamic nature
of f0 in contour tones. Figure 5.2 shows within-speaker comparisons of f0- and
duration-normalized tone contours for each of the three tone combinations. Average
concordant trial contours are shown with a solid line, average discordant trial contours
with a dotted line. Both contours are accompanied by 95 per cent confidence
standard error regions. Statistical tests comparing f0 on concordant and discordant
trials were conducted for the first, middle, and last third of each tone. Significant
differences (p < 0.05) are indicated with '*', marginally significant differences (p < 0.15)
are indicated with '+'.
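The normalization pipeline behind these comparisons can be sketched as follows. The resampling resolution and the z-scoring of f0 are assumptions for illustration; the chapter does not spell out its exact normalization procedure.

```python
def normalize_contour(f0_values, n_points=30):
    """Duration-normalize an f0 contour by linear-interpolation resampling
    to a fixed number of points, then z-score the values so that contours
    from speakers with different pitch ranges are comparable."""
    n = len(f0_values)
    resampled = []
    for k in range(n_points):
        pos = k * (n - 1) / (n_points - 1)   # fractional index into input
        i = int(pos)
        frac = pos - i
        j = min(i + 1, n - 1)
        resampled.append(f0_values[i] * (1 - frac) + f0_values[j] * frac)
    mean = sum(resampled) / n_points
    sd = (sum((v - mean) ** 2 for v in resampled) / n_points) ** 0.5 or 1.0
    return [(v - mean) / sd for v in resampled]

def thirds(contour):
    """Averages over the first, middle, and last third of a contour:
    the summary values that are then compared across conditions."""
    n = len(contour) // 3
    return [sum(contour[i * n:(i + 1) * n]) / n for i in range(3)]
```

A rising input contour stays rising after normalization, so the by-thirds averages preserve the shape distinctions (high, rising, falling) that the tone comparisons depend on.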
Figure 5.2a shows results for subjects who produced Tone 1 (high) and Tone 2
(rising). Subjects s05, s15, and s06 show dissimilation in one or both tones, i.e. the
discordant contour for a given tone is less similar to the other tone than the concordant
contour. Subject s11 exhibits an anomalous average discordant trial contour, in
which the high tone responses appear to initially assimilate to the non-target rising
tone (which begins lower), and then subsequently dissimilate from the rising tone.
Since the non-target tone rises toward the end, it is possible to see the dissimilation
in Tone 1 as a form of assimilation to the rising pattern of Tone 2. In other words, the
similarity between tones can be assessed on the basis of relative f0 values, or on the
basis of a pattern of change in f0. However, this latter form of assimilation does not
appear to occur generally across the subject population.
Figure 5.2b shows results for Tone 1 (high) and Tone 4 (falling). Subjects s10 and
s12 exhibit dissimilatory patterns, while s08 and s14 exhibit assimilatory patterns.
Note that s14, who had the largest assimilatory pattern in the experiment, produced
anomalously short tones. The interpretation of dissimilation in s12 is based upon the
observation that the f0 in the final third of the falling discordant trial contour is further
away from the high tone contour than the concordant trial one. This is more suggestive
of dissimilation than the pattern produced by s08, for whom the discordant falling
tone both begins and ends lower than the concordant one. In the s08 case, the contour
is most readily viewed as the consequence of an assimilatory contour-wide lowering
of f0; in the s12 case, the relative fall in f0 in the final third of the falling tone is more
straightforwardly interpreted as a propensity to exaggerate the fall in f0.
Figure 5.2c shows results for Tone 2 (rising) and Tone 4 (falling). Subjects s03,
s07, and s09 exhibit a dissimilatory pattern in one of the tones. Subject s13 exhibited
no differences between the discordant and concordant conditions for either tone.
Subjects s07 and s09 tended to dissimilate Tone 2 from Tone 4 on discordant trials
by lowering f0; the effect was highly significant for s07, but marginally significant for
s09 and localized to the middle third of the contour. The dissimilation observed in s03
is of the sort identified in s12, where the final third of the falling contour falls lower on
discordant trials, making it less similar to the rising pattern of the non-target
Tone 2.
Table 5.1 shows mean duration and RT data by subject, for each tone-concordance
condition. There were no significant differences in duration or RT between concordant
and discordant trials. One subject, s07, appears to have responded anomalously
slowly compared to the others. The absence of any effects of discordance on
FIGURE 5.2 Within-speaker comparisons of f0- and duration-normalized tone contours for
each of the three combinations of Mandarin Tone 1 (H), Tone 2 (LH), and Tone 4 (HL). Average
concordant trial contours are shown with a solid line, and average discordant trial contours are
shown with a dotted line. Statistical tests comparing f0 on concordant and discordant trials
were conducted for averages taken over the first, middle, and last third of each tone. Significant
differences (p < 0.05) are indicated with '*', marginally significant differences (p < 0.15) are
indicated with '+'. For each tone combination, all panels employ the same normalized f0 and
duration scales
TABLE 5.1 Mean durations and RTs for each tone and concordance condition;
values are mean (s.d.) in seconds

                             Tone A                        Tone B
           Tone A-B  concordant     discordant     concordant     discordant

DUR. (s)   s05  1-2  0.321 (0.029)  0.316 (0.029)  0.314 (0.027)  0.311 (0.028)
           s06  1-2  0.137 (0.019)  0.135 (0.020)  0.142 (0.019)  0.138 (0.018)
           s11  1-2  0.272 (0.027)  0.269 (0.032)  0.241 (0.028)  0.247 (0.023)
           s15  1-2  0.341 (0.027)  0.342 (0.030)  0.327 (0.025)  0.334 (0.029)
           s03  1-4  0.325 (0.028)  0.332 (0.028)  0.310 (0.037)  0.312 (0.040)
           s07  1-4  0.351 (0.029)  0.373 (0.034)  0.409 (0.042)  0.394 (0.040)
           s09  1-4  0.289 (0.023)  0.296 (0.023)  0.271 (0.035)  0.273 (0.034)
           s13  1-4  0.276 (0.030)  0.277 (0.031)  0.218 (0.031)  0.226 (0.025)
           s08  2-4  0.278 (0.063)  0.287 (0.062)  0.245 (0.057)  0.242 (0.056)
           s10  2-4  0.260 (0.032)  0.274 (0.033)  0.239 (0.021)  0.238 (0.025)
           s12  2-4  0.270 (0.024)  0.275 (0.022)  0.263 (0.024)  0.255 (0.021)
           s14  2-4  0.165 (0.017)  0.160 (0.013)  0.152 (0.012)  0.146 (0.011)

RT (s)     s05  1-2  0.434 (0.065)  0.417 (0.064)  0.400 (0.058)  0.428 (0.062)
           s06  1-2  0.304 (0.062)  0.295 (0.069)  0.298 (0.068)  0.307 (0.067)
           s11  1-2  0.231 (0.074)  0.230 (0.076)  0.228 (0.071)  0.221 (0.072)
           s15  1-2  0.245 (0.066)  0.249 (0.062)  0.247 (0.056)  0.253 (0.064)
           s03  1-4  0.262 (0.088)  0.268 (0.088)  0.266 (0.082)  0.269 (0.089)
           s07  1-4  0.513 (0.052)  0.510 (0.051)  0.505 (0.055)  0.521 (0.049)
           s09  1-4  0.284 (0.072)  0.281 (0.061)  0.285 (0.065)  0.284 (0.067)
           s13  1-4  0.294 (0.077)  0.299 (0.081)  0.288 (0.085)  0.283 (0.074)
           s08  2-4  0.423 (0.053)  0.412 (0.048)  0.394 (0.046)  0.410 (0.050)
           s10  2-4  0.304 (0.083)  0.307 (0.080)  0.292 (0.078)  0.286 (0.071)
           s12  2-4  0.307 (0.044)  0.308 (0.043)  0.304 (0.047)  0.308 (0.042)
           s14  2-4  0.346 (0.059)  0.344 (0.059)  0.340 (0.058)  0.343 (0.071)
5.4 Discussion
To summarize, a majority of subjects exhibited dissimilation on discordant trials, in
at least one of the tones. However, substantial inter-subject variability was observed
in this regard, along with instances of assimilatory patterns. Section 5.4.1 will address
the potential sources of this variability. Section 5.4.2 will argue that the dissimilatory
patterns arise from an inhibitory motor planning mechanism, and section 5.4.3 will
explain how this inhibitory mechanism may be responsible for the maintenance and
maximization of contrast.
FIGURE 5.3 Simulation of the effects of intergestural inhibition on concordant and discordant
trials with an /i/ target. Stage 1 shows excitation functions after the prime vowel. Stage 2 shows
excitation and intergestural inhibition functions after the target stimulus. Stage 3 shows the
activation function from which a production target is derived. For comparison, the concordant
(black •) and discordant (white ○) centers of activation are shown in both activation functions
In Stage 2, when the target is known, intergestural inhibition is applied and the
target exemplars are fully excited. The inhibition function, shown to the right of the
Stage 2 excitation function, is modeled as a bivariate Gaussian located on the center of
mass/activation of the non-target excitation function for /a/-exemplars. There are two
important aspects of the inhibition. First, inhibition of the non-target /a/-exemplars
is greater on the discordant trial than on the concordant trial. This is justified by
the observation that more salient distractors produce stronger dissimilatory effects
(Tipper et al. 2000). In other words, more inhibition is necessary on the discordant
trials because the non-target prime was more highly excited. Second, the inhibition
function is non-zero throughout the region of F1, F2 space where the target /i/-
exemplars are located, and crucially, the inhibition is greater on the side of that region
closer to the non-target /a/-exemplars. From these two characteristics, it follows that
the center of mass of the activation function (excitation minus inhibition, shown in
stage 3) is shifted further away from the non-target on the discordant trial, compared
to the concordant trial. In Stage 3, the concordant (black •) and discordant (white ○)
centers of activation are shown in both activation functions, for purposes of comparison.
The F1, F2 difference between discordant and concordant trials is about [-30,
55] Hz. Model equations and further details of implementation can be found in Tilsen
(2007).
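A one-dimensional toy version of this excitation-minus-inhibition computation can reproduce the qualitative effect (the model itself, detailed in Tilsen 2007, is two-dimensional over F1 and F2). All numeric values below (the frequency grid, Gaussian widths, gains, and formant centers) are illustrative assumptions, not the chapter's parameters.

```python
import math

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def activation_center(target_mu, nontarget_mu, inhibition_gain,
                      sigma=150.0, inh_sigma=600.0):
    """Center of mass of the rectified (excitation - inhibition) function
    on a 1-D frequency grid. Excitation is a Gaussian over the target
    exemplars; inhibition is a broader Gaussian centered on the non-target,
    so it is strongest on the side of the target region nearer the
    non-target, shifting the center of mass away from it."""
    xs = [5.0 * i for i in range(601)]       # 0-3000 Hz grid
    act = [max(gaussian(x, target_mu, sigma)
               - inhibition_gain * gaussian(x, nontarget_mu, inh_sigma), 0.0)
           for x in xs]
    total = sum(act)
    return sum(x * a for x, a in zip(xs, act)) / total

# Stronger inhibition on discordant trials (the non-target was more highly
# excited) pushes the activation center of a high-F2 /i/ further from a
# low-F2 /a/ non-target:
concordant = activation_center(2300.0, 1200.0, inhibition_gain=0.2)
discordant = activation_center(2300.0, 1200.0, inhibition_gain=0.6)
```

Both centers land above the uninhibited target value, and the discordant center lands further away from the non-target than the concordant one, mirroring the Stage 3 comparison in Figure 5.3.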
A more complicated version of this model would treat a larger number of phonetic
variables, as well as dynamical aspects of speech targets. After all, vowel formants
are often dynamic, and Mandarin tones exhibit substantial change over time; this
must be incorporated into target planning and should therefore be subject to dis-
similation. Exemplar theory allows for modeling time as an additional dimension of
exemplar space (cf. Johnson 1997b), so that memories incorporate spectrotemporal
information. Hence the model proposed above should be generalizable to higher-
dimensional exemplar spaces with a temporal dimension. It is also noteworthy that
the model does not require one to commit to representation in either perceptual
or motoric coordinate space. Acoustic coordinates were used here for expository
purposes only.
time, if unconstrained, this situation could lead to loss of contrast, i.e. phonologization
of vowel harmony.
However, the inhibition model also predicts that as V1 and V2 exemplar distributions
shift closer in phonetic space, the strength of intergestural inhibition will become
greater on the region of V2 exemplar space (this follows as long as the inhibition
function remains constant over time). In other words, closer targets are more strongly
dissimilated. In some cases, this stronger inhibition will not dissimilate the target of
V2 enough to prevent loss of contrast, but in other cases, the dissimilation may be
strong enough to do so. The exemplar distribution in the latter case will come to reflect
a balance between the assimilation from coarticulatory forces and the dissimilation
from inhibitory ones. This balance is precisely what is described by dispersion theo-
ries. Indeed, intergestural inhibition can be seen as a mechanism through which the
speaker attempts to maximize contrast on an utterance-by-utterance basis. Whether
or not a relatively stable balance occurs in any given language is likely to depend on
many factors, particularly on vowel and consonant inventories of a language and co-
occurrence frequencies of the units in VXV sequences. Ultimately, what intergestural
inhibition provides is a real-time, utterance-anchored mechanism for maintaining
and maximizing contrast. Contrast is never fully maximized because highly variable
coarticulatory forces are always influencing the exemplar distribution, but dispersion
theories likewise do not predict that a phonetic space is actually maximally used;
they only posit a tendency toward this.
Hence intergestural inhibition is not a priori mutually exclusive with perceptual
dispersion or perceptual correction. It can be seen in two ways, either as operating
alongside perceptual mechanisms, or as the underlying basis for them. It is also
reasonable to see inhibition both as an intrinsic aspect of how working memory
operates and as something modulated by experience. Whenever articulatory plans
are brought into working memory, the serial ordering of those plans is accomplished
by interacting excitatory and inhibitory processes; the production of one articulation
requires the simultaneous suppression of others, yet the extent to which inhibition
is exerted between plans is inferred and learned from the linguistic experience of a
speaker.
One problem with dispersion theories is that they lack an account of how articu-
latory targets are planned so as to maximize perceptual contrast. These theories hold
that the speaker, for functional reasons, produces sounds that maximize perceptual
contrast. However, there is limited evidence for a real-time perceptual dispersion
mechanism. The most suggestive evidence to date is the hyperspace effect reported
in Johnson, Flemming, and Wright (1993a) and in Johnson (2000). In Johnson et al.
(1993a), listeners identified the 'best' examples of a range of synthetic vowel stimuli
as the ones that were more peripheral than their own productions. The source of this
difference can be interpreted as a consequence of target undershoot in production, or
as the result of an active perceptual process. An alternative account of the hyperspace
Developmental perspectives
on phonological typology
and sound change
CHANDAN NARAYAN
6.1 Introduction
The relationship between first language acquisition and phonologization lies at the
crossroads of developmental psychology and historical phonology, disciplines not
often considered in the same breath when addressing the nature of sound patterns
and change. Despite these traditional boundaries, I believe that this combined research
program can make significant contributions to a more nuanced understanding of why
sound systems look the way they do and change in particular directions. The present
chapter deals with the relationship between the earliest stages of language acquisition
and the shape of phonological systems and phonological processes including sound
change. The term 'developmental' encompasses both the dynamic nature of the cogni-
tive mechanisms underlying infants' and very young children's emerging organization
of their acoustic-phonetic environment as well as the nature of the linguistic environ-
ment itself. Of particular significance here is the potential contribution of develop-
mental processes (infant speech perception and caregiver speech production) to the
phonologization of acoustic variance in the input. The scope of these developmental
contributions is not limited to the infant and her abilities but also includes characteristics of
the unique register used by caregivers when interacting with infants. This research
program asks two questions:
Others have viewed the relationship between children's productive phonology and
phonological change more sympathetically and directly (e.g. Labov 1989). Grammont
(1933) suggests that children's productions are a 'microcosm' of historical change,
while Stampe went a step further in suggesting that children are the prime agents
in phonological change (Stampe 1972). Greenlee and Ohala (1980) argue that both
children and adults are responsible for the type of phonetic variation that can lead to
sound change. Under the rubric of Ohala's misperception-based sound change (Ohala
1981), where physical constraints on articulatory and perceptual dynamics lead to the
phonologization of variation, Greenlee and Ohala (1980) outline the shifts of child
phonology that are similar to diachronic processes (e.g. French Ṽ > Vŋ in French
loans in Vietnamese and children learning French).
More recently, linguists have suggested that the relationship between child phonol-
ogy and historical phonology should be played down precisely because 'typical or
potential sound changes' do not match observed phonological states in children's
production (Kiparsky 1988). Blevins (2004) argues that the mismatch between chil-
dren's productive phonology and sound changes (i.e. the types of production mistakes
made by children do not always look like typical sound changes) is a non-problem,
as the enterprise of child phonology does not necessarily assess competence (in the
form of perceptual acuity) but rather performance factors very likely governed by
physiological development (see also Hale and Reiss 1998). She outlines children's
productions, described in terms of phonological rules, as falling under two categories:
those resulting from immature articulatory development, and, secondly, true 'mini-
sound changes' which may spread through a community of speakers. As Blevins and
others have argued, the problem with looking to children's productive phonology for
clues to directions in phonologization is that articulatory and perceptual capacities
in the first few years of life mature along differing time scales, with motor control
and oral tract development lagging behind the shaping of perceptual competence.
At the earliest stages of language acquisition, production is not necessarily a reflec-
tion of competence (perceptual discrimination and categorization), with perceptual
acuity becoming honed well before infants' production of their first word at around
twelve months.
The clearest demonstration of the connection, and perhaps influence, of children's
productions and phonological phenomena can be seen in typological inventories.
Sound change aside, linguists have recognized the connection between the age of
productive acquisition of phonologically relevant phones and the relative rarity of
these sounds in phonological systems (Ferguson 1973). In general, age of successful
production can be described as exponentially related to frequency of occurrence in
the world's sound systems, that is, the more rare the consonant, the later its pro-
ductive acquisition.1 Figure 6.1 plots the age of productive 'mastery'2 of consonants
1 While there is certainly a relationship between the frequency of occurrence and the emergence of
certain phonological structures (see Levelt et al. 1999; Demuth and Johnson 2003; Rose 2009), the relation-
ship between accurate production of individual phones and the frequency of those phones in the ambient
language of the child is less clear than the overall typological frequency across languages. Table 6.3 (p. 146
below) provides a table of the frequency of consonants in the Brent corpus of infant-directed speech (Brent
and Siskind 2001).
2 Mastery of English consonants in Templin's (1957) study is described as 75 per cent accuracy, while
a more strict criterion of 90 per cent is used in Hua and Dodd's (2000) and Amayreh and Dyson's (1998)
studies.
FIGURE 6.1 Age of production mastery according to frequency in the UPSID (Maddieson
1984) in American English (Templin 1957), Putonghua (Hua and Dodd 2000), and Jordanian
Arabic (Amayreh and Dyson 1998). Values are jittered within each year
FIGURE 6.2 Proportion RMS energy change from nasal murmur to post-nasal vowel in
Bark 5-7 and 11-14 for [ma], [na], and [ŋa] in Filipino. Used with permission from
Narayan (2008)
languages are more likely to exhibit a two-way /m/-/n/ contrast than a three-way
/m/-/n/-/ŋ/ contrast in syllable-initial position (Maddieson 1984; Anderson 2008).
I argued, based on static (F1 × F3 frequencies at the onset of the NV transition) and
dynamic acoustic properties (RMS energy change from nasal murmur to V) of the
three nasal places in Filipino (Figure 6.2) and corresponding discrimination tests with
adult Filipino-speaking listeners, that the acoustic-perceptual salience of the /m/-/n/
distinction is more robust than that of /n/-/ŋ/. Both static and dynamic acoustic measurements
showed better classification (with discriminant analyses) of the /m/-/n/ and
/m/-/ŋ/ distinctions than of the /n/-/ŋ/ contrast, where tokens showed significant overlap
along the critical acoustic dimensions. Consequently, the /n/-/ŋ/ distinction is
disproportionately affected by adverse listening conditions. In the noisiest listening
condition (~5 dB SNR), discrimination of the [na]-[ŋa] contrast fell to chance while
discrimination of both [ma]-[na] and [ma]-[ŋa] remained near ceiling.
In a follow-up study I examined the perception of nasal place contrasts in Filipino-
and English-learning infants (Narayan et al. 2010) using the Visual Habituation
technique.3 Following on from the typological and acoustic-perceptual results from
Narayan (2008), the [na]-[ŋa] contrast proved difficult for both groups of infants.
English-hearing infants at 10-12 and 6-8 months discriminated the acoustically
robust and typologically common [ma]-[na] contrast. English-learning infants did
not reliably discriminate the acoustically fragile [na]-[ŋa] contrast, even at 6-8
months, an age when other non-native (oral) consonant contrasts are successfully
3 See Werker et al. 1998 for details regarding infant speech perception methods.
discriminated. Even very young English-learning infants (4-5 months) were unable to
discriminate [na]-[ŋa] while they successfully discriminated the acoustically robust
[ma]-[na] contrast. When 10-12- and 6-8-month-old Filipino-learning infants' discrimination
of the native [na]-[ŋa] contrast was tested, only the older group showed
discrimination.
The results from Narayan et al. (2010) are suggestive of a role for acoustic salience in
developmental speech perception. The [na]-[ŋa] contrast, which is acoustically fragile
(relative to the robust [ma]-[na] contrast), is poorly discriminated in early infancy
and only successfully discriminated with appropriate language experience by the end
of the first year. I would suggest that infants' difficulty discriminating the perceptually
similar syllable-initial [n]-[ŋ] contrast contributes to the typological restrictions on
nasal onsets and the directions of sound change patterns observed in nasals in the
world's languages (i.e. Proto-Austronesian syllable-initial *m, *n, *ŋ > Thao, Malagasy,
Tetun, Hawaiian, Tahitian m, n).
Fricative contrasts: /f/-/9/ and /s/-/z/Dental, non-sibilant fricatives are rare in the
world's sound systems, occurring in only 3.99 per cent of the languages surveyed in the
UPSID (Maddieson 1984). In the WALS database of 567 genetically diverse languages,
they occur in just 43 (7.6 per cent) (Maddieson 2008). Correspondingly, contrasts
involving dental fricatives have been shown to pose problems for infants in speech
discrimination tasks. In a series of studies in the 1970s, Eilers and colleagues showed
that English-hearing infants at both 6-8 months and 10-12 months fail to accurately
discriminate the English labiodental-interdental fricative place distinction ([fa]-[θa])
(Eilers 1977) using the Conditioned Head Turn procedure (CHT). The older group
showed discrimination of the contrast only when the fricative was followed by [i].
This result proved highly controversial and led to two subsequent studies, both of
which showed English-learning infants discriminating the [fa]-[θa] contrast. While
Holmberg et al. (1977) showed that six-month-olds discriminated the contrast, they
noted that subjects required twice as many trials to reach criterion (an indirect
measure of perceptual difficulty) as they did on the /s/-/ʃ/
contrast. Further, at two months, infants were shown to successfully discriminate
[fa]-[θa] using the High-Amplitude Sucking (HAS) procedure (Levitt et al. 1988).
The conflicting reports of labiodental-interdental fricative discrimination in English-
hearing infants suggest the perceptual difficulty of the contrast relative to plosive
obstruent place contrasts.
I suggest that there is an acoustic source for infants' difficulty in discriminating
/f/-/θ/, which potentially contributes to the relative rarity of the contrast in sound
systems. In a recent acoustic study of twenty American English speakers, the fricative
noise in both sounds was shown to have similar duration (165 ms), spectral peak
locations (8 kHz), mean spectral moments (5.1 kHz), kurtosis, and skewness (Jongman
et al. 2000), all of which contribute to place perception in fricatives (Behrens and
6. Developmental perspectives on phonological typology and sound change 135
Blumstein 1988; Jongman 1988; Hedrick and Ohde 1993). In Jongman et al.'s (2000)
study, when 21 acoustic predictors were used in a discriminant analysis classification,
27 per cent of labiodental tokens were classified as interdentals, and 26 per cent of
interdentals as labiodentals. This rate of confusion is consistent with human percep-
tual confusions between the two fricative places. In Miller and Nicely (1955), at the
highest signal-to-noise ratio and with the broadest band of frequency information
(+12 dB SNR, 200-6500 Hz), listeners identified /θ/ as /f/ at a rate of 26 per cent.
Further, in several varieties of English (e.g. working-class London speech), /f/ and /θ/
are merging.
Another fricative contrast that has proved difficult for infants to discriminate is the
alveolar voicing contrast. In a HAS procedure, English-learning infants (1-4 mos.)
failed to discriminate the [sa]-[za] contrast (Eilers and Minifie 1975; Eilers et al.
1977). There is corresponding asymmetry in the distribution of voiced and voiceless
alveolar fricatives in the world's languages, as well. In UPSID, 69 per cent of alveolar
fricatives are voiceless. While there is a clear articulatory/aerodynamic reason behind
the preference for voiceless (over voiced) fricatives4 (Ohala 1983) and corresponding
devoicing of /z/ (Smith 1997), there is no clear acoustic-perceptual reason for infants'
failure to discriminate /s/-/z/ at such an early age.5 Indeed English-speaking adults'
perception of the contrast leads to little confusion (Miller and Nicely 1955), perhaps
owing in part to differences in voice onset time and fricative duration and amplitude
(Jongman et al. 2000).
The results of infants' perception of lead vs. short-lag VOT are quite different, how-
ever. The overwhelming majority of studies investigating this distinction suggest that
infants' discrimination is quite poor. Only two studies (Eimas 1974; Streeter 1976)
have shown infants' successful discrimination of the prevoicing/short-lag contrast
(Table 6.1). Kikuyu-learning infants discriminated both a prevoiced/simultaneous
(−30/0 ms) VOT distinction as well as the short/long-lag distinction. It remains
unclear, however, whether the prevoiced discrimination results from experience with
Kikuyu or the psychophysical salience of the contrast, for English-learning infants
do not show discrimination of a similar distinction (Eimas et al. 1971; Eilers et al.
1979). These studies suggest that the lead/short-lag implementation of voicing is
disadvantageous from the infant's point of view (but see Aslin et al. 1981). The lag
region of the VOT continuum is most likely privileged by the perceptual system for
psychophysical reasons (Pisoni 1977) as it provides more robust cues to a voicing
contrast (aspiration, F1 onset) than does the lead/short-lag distinction.
The perceptual advantage afforded to the short/long lag distinction in infancy has
an analogue in production as well, where mastery of prevoicing occurs relatively
late compared to short-lag VOT in languages like Spanish (Eilers and Benito-Garcia
1984), French (Allen 1985), and Thai (Gandour et al. 1986) (but see Whalen et al. 2007
for VOT in babbling). The connection between infants' greater success at discriminat-
ing short/long-lag contrasts versus lead/short-lag contrasts and typological patterns
remains unclear, owing to the lack of a comprehensive cross-linguistic survey (similar
to UPSID or WALS) of voicing implementation along the VOT dimension. Keating et al.'s
(1983) survey of 51 languages shows that a voicing contrast always utilizes a 'voiceless
unaspirated' (short-lag) stop. Keating (1984) suggests that, in contrast to the short-lag
implementation of VOT, languages which feature stop voicing contrasts are equally
likely to use 'fully voiced' (lead) or 'voiceless aspirated' stops. The perceptual patterns
of infants would predict, however, that languages more often utilize a short/long-lag
implementation of voicing than lead/short-lag VOT.
/l/-/ɹ/ Perception of the [la]-[ɹa] (English 'r') contrast has recently been shown to
be facilitated by native language experience (Kuhl et al. 2006). Kuhl et al. (2006)
investigated English- and Japanese-learning infants' perception of the (naturally produced)
contrast at 6-8 and 10-12 months of age using the CHT procedure. At 6-8 months,
both English- and Japanese-learning infants discriminated the contrast at a rate of
65 per cent correct, well below native levels of discrimination ability (approximately
80 per cent correct for synthetic stimuli in Miyawaki et al. 1975). By the end of their
first year, English-learning infants' perception of the contrast improved to approx-
imately 75 per cent correct. Further supporting the relative difficulty of /l/-/ɹ/
discrimination in infancy, Kuhl et al.'s (2006) results revealed a directional asymmetry,
where facilitation of the contrast occurs only when infants are conditioned to
discriminate a change in one direction but not in the other.
The English 'r' is rare among the world's sound systems (occurring in roughly two
per cent of the languages in UPSID, compared with 39 per cent of languages with /l/)
and notoriously difficult to produce and perceive for non-native speakers (e.g. Goto
1971; Miyawaki et al. 1975; Polka and Strange 1985). Acoustically, /l/ and /ɹ/ have very
similar spectral profiles, differing primarily in F3, which is characteristically low in /ɹ/
(Fant 1960; Dalston 1975; Espy-Wilson 1992).
the control. By 10-12 months, mean A' scores for English-learning infants increased
slightly, while remaining unchanged for French-learning infants. Adult English speak-
ers showed A' scores reflecting ceiling levels of discrimination. Adult French speakers'
A' scores remained unchanged from the infant groups. These results are suggestive
of the interpretation that language experience serves to facilitate (or improve) native
contrast discrimination. Further, they also show that the initial state of /d/-/ð/ per-
ception is less accurate than that of a similar stop-fricative contrast (here /b/-/v/).
In clean listening conditions, English-speaking adults discriminate the /d/-/ð/ con-
trast quite well (Polka et al. 2001), but with added noise, confusion
patterns result that are consistent with both the infants' relatively poor discrimination
and also the substitution patterns observed in L2 speakers attempting to produce /ð/
(Miller and Nicely 1955). Taken together, results from native infant and adult percep-
tion are suggestive of a low-level acoustic source for relatively poor discrimination of
the /d/-/ð/ contrast and the substitutions observed.
6.2.2 Implications
Infants' perceptual sensitivities are far from language universal. The outline presented
above, highlighting instances where infants' perceptual performance falls short of the
language-general perceptual specification often cited by linguists and psychologists,
corresponds to the typological regularities found across the world's languages. I would
suggest that these contrasts, which are fragile in terms of their acoustic distinctiveness,
are prone to misperception at the earliest stages of phonological development.
Another stage in the acoustics/development/typology story is type frequency in the
lexicon and token frequency in ambient speech. It is often the case that phones in a
weak acoustic-perceptual salience relationship are rather infrequent exemplars in the
lexicon (such as /ð/ restricted to demonstrative articles in English) as well as token
frequency (as in syllable-initial /ŋ/ in Filipino) in a language (Narayan 2008). I would
suggest that if infants have minimal evidence (in terms of a stochastic mechanism
for category formation) (Johnson 1997b; Pierrehumbert 2001a; Maye et al. 2002) for
an already acoustically weak contrast, which is then coupled with a low functional
load (Martinet 1933), they have the potential to effect misperception-based change
(Greenlee and Ohala 1980). This argument is further bolstered by the fact that, in
some children, production patterns suggest an effect of perception on early lexical
representations (Macken 1980; Rose 2009).
Liu et al. 2003; Werker et al. 2007). Much of this work is driven by recent models
of category learning as a function of the frequency of the input, where infants are
shown to discriminate phonetic categories when familiarized to tokens comprising
different modes in an artificially created stimulus continuum (Maye et al. 2002, 2008).
Researchers have found such modally distributed cues in the acoustic input to infants.
For example, Werker et al. (2007) showed that Japanese- and English-speaking moth-
ers, when teaching new words to their young infants, consistently produced acousti-
cally distinct modes of vowel quality (/i/ vs. /ɪ/ and /e/ vs. /ɛ/ for English) and vowel
duration (/i/ vs. /i:/ and /e/ vs. /e:/ for Japanese). Much of the research examining IDS
has highlighted its enhancing hallmarks, where categorical phonetic distinctions are
exaggerated (i.e. vowel duration, vowel quality, tone: Tang and Maidment 1996; Liu
et al. 2007). The present section considers an often overlooked acoustic consequence
of the IDS register, namely the reduced clarity of contrast in the speech to very
young infants (Baran et al. 1977; Malsheen 1980; Sundberg and Lacerda 1999), and
its implications for the directions of sound change.
the only cue to word boundaries. Only infants exposed to the IDS input were able to
distinguish words from part-words (Thiessen et al. 2005).
Interestingly, acoustic features of IDS at the level of the segment also seem to change
over the course of an infant's development. Malsheen (1980) examined voicing in a
longitudinal study of English IDS spoken to children ranging from six months to five
years of age, and found that only when infants were 15-16 months old did mothers
significantly separate the voiced and voiceless categories along the VOT dimension.
At 15-16 months, mothers implemented longer VOTs in voiceless tokens than they did
in voiceless tokens addressed to younger infants. Baran et al. (1977) found no significant
differences in VOT between IDS and ADS when infants were twelve months old.
Sundberg and Lacerda (1999) found that in the IDS addressed to three-month-old
Swedish infants, VOT was significantly shorter in both voiced and voiceless stops
than in ADS. This resulted in more overlap between the voicing categories in IDS. The
authors provide a developmental account of their findings by suggesting that acoustic
properties of obstruents are less 'specified' in the IDS to young infants and gradually
reach adult-directed VOT values at around the time infants produce their first word.
More recently, in a study of Norwegian IDS, Englund (2005) found that alveolar and
velar stops have longer VOTs during infants' first six months than in ADS. While
there were no differences in the voiced/voiceless distinction along the VOT dimension
between the two registers, the developmental profile of the data suggested that VOT
in IDS becomes more like ADS as infants get older. The developmental account is
consistent with studies of IDS vowel production as well, where acoustic clarity is found
only in those lexical categories used by the child (Bernstein Ratner 1984).
What I argue in the case study below is that the not-so-careful speech to very young
infants has acoustic consequences which have the potential to become phonologized
by infants in this perceptually sensitive stage of development (Werker and Tees 1984).
The interaction between the socially driven imperatives of early IDS and contrastive
phonetic salience can provide the learner with the kind of structured acoustic vari-
ability associated with misperception-based sound changes (Ohala 1981).
listeners of a previously intrinsic cue after recession and disappearance of the main
cue.' While the primary cue to voicing in English (VOT) has not 'disappeared' as hap-
pened in many cases of tonogenesis, I would argue that the IDS register contributes
to acoustic ambiguity in voicing that is consistent with the development of tone.
Previous studies have shown that the distribution of voiced and voiceless tokens
along VOT are more similar in the IDS to infants under twelve months than in the
IDS to older infants or in ADS (in American English and Swedish). Voiceless VOTs
are generally shorter in IDS, resulting in more overlap with voiced VOTs (Baran et al.
1977; Malsheen 1980; Sundberg and Lacerda 1999) compared to ADS or IDS to older
infants. In a recent study of word-initial voicing in American English IDS and ADS we
(myself together with Kyle Gorman and Daniel Swingley from the University of Penn-
sylvania) examined VOT and post-consonantal f0 in the hope of understanding (i)
the regularity of the acoustic features of voicing available to young infants and (ii) the
relative weights of VOT and f0 in predicting voicing in IDS and ADS. In examining the
covariation of VOT and f0 in voicing in two different registers we hope to shed light
upon the history of the interaction between these features as providing potentially
ambiguous and ultimately misinterpretable cues.
1981). In order to control for varying speech rate, which is known to be slower in IDS
compared to ADS (Kuhl et al. 1997), VOT was normalized by dividing the raw VOT
measurement (ms) by the duration of the following vowel. This ratio has been shown
to serve as a perceptual criterion for voicing category affiliation (Boucher 2002).
Voiced regions inside the post-stop vowel region were extracted and pitch tracks
obtained (at 1 ms time steps) using SWIPE′ (Camacho 2007). The pitch extraction
algorithm required that the voiced region be at least 10 ms. Tokens with less than
10 ms of post-stop voicing were discarded. The procedure yielded 1200 IDS and 1058
ADS CV tokens. A visual inspection of all the pitch tracks confirmed that there were
no obvious halving errors in the extraction. In order to control for individual speakers'
pitch ranges, raw f0 measurements were normalized by speaker using the standard z
calculation. Following Umeda (1981), peak (or maximum) f0 (in the first half of the
post-stop vowel) was computed for analysis.
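The normalization steps described above can be sketched in a few lines. This is an illustrative sketch only: the function names and the toy measurement values are mine, not from the study.

```python
from statistics import mean, stdev

def vot_ratio(vot_ms, vowel_ms):
    # Rate-normalize VOT by dividing by the following vowel's duration,
    # the ratio shown to serve as a voicing criterion (Boucher 2002).
    return vot_ms / vowel_ms

def zscore_by_speaker(f0_hz):
    # Normalize one speaker's raw f0 measurements (Hz) to z-scores,
    # controlling for individual pitch range.
    m, s = mean(f0_hz), stdev(f0_hz)
    return [(x - m) / s for x in f0_hz]

def peak_f0_first_half(pitch_track):
    # pitch_track: f0 samples at 1 ms steps over the post-stop vowel;
    # following Umeda (1981), take the maximum over its first half.
    return max(pitch_track[:max(1, len(pitch_track) // 2)])

# Toy token: 50 ms VOT before a 200 ms vowel -> VOT ratio 0.25.
ratio = vot_ratio(50, 200)
peak = peak_f0_first_half([100, 120, 110, 90])
```

Per-speaker z-scoring leaves each speaker's normalized f0 values centered on zero, so high- and low-pitched speakers can be pooled in one model.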
6.3.4 Results
Analyses of mean VOTs according to register and voicing were consistent with pre-
vious reports: there was a voicing × register interaction suggesting that voiced and
voiceless stops in IDS showed more overlap along VOT than in ADS (F(1, 2258) =
1552, p < 0.0001); that is, the modes of VOT were more separable for voiced and
voiceless tokens in ADS than in IDS. There was also a general pitch perturbation effect
(with no interaction of register) suggesting that voiced stops were followed by a lower
pitch than voiceless stops.8
The logistic regression models of IDS and ADS were fitted using VOT, f0, and
their interaction as predictors of voicing. Table 6.2 presents regression models of
voicing in IDS and ADS. Both registers show a significant main effect of VOT, with a
negative slope indicating that an increase in VOT lowers the predicted probability of
voicing. Fundamental frequency is significant in both registers as well, again with a
negative slope confirming the pitch perturbation effect.
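To make the role of the interaction term concrete, the sketch below hand-computes voicing probabilities from a logistic model of this form. The coefficient values are invented for illustration (they are not the Table 6.2 estimates); only their signs follow the text: negative VOT and f0 slopes, plus a negative interaction that makes f0 count for more as VOT grows.

```python
import math

def p_voiced(vot_ratio, f0_z, b0, b_vot, b_f0, b_int):
    # Logistic regression: logit(p) = b0 + b_vot*VOT + b_f0*f0 + b_int*VOT*f0
    eta = b0 + b_vot * vot_ratio + b_f0 * f0_z + b_int * vot_ratio * f0_z
    return 1.0 / (1.0 + math.exp(-eta))

# Invented coefficients with the qualitative signs reported in the text.
b = dict(b0=1.0, b_vot=-6.0, b_f0=-0.3, b_int=-2.5)

# At an ambiguous VOT ratio (0.25), compare a low f0 (z = -1, roughly a
# low quantile) with a median f0 (z = 0):
low_f0 = p_voiced(0.25, -1.0, **b)     # low pitch -> more likely voiced
median_f0 = p_voiced(0.25, 0.0, **b)   # median pitch -> less likely voiced
```

With a negative interaction coefficient, the same shift in f0 moves the predicted probability further the longer the (normalized) VOT, which is exactly the pattern the IDS model shows and the ADS model lacks.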
The interaction between VOT and f0 is significant in only the IDS model. The
interactions (plotted in Figure 6.3) suggest that as VOT increases, f0 has a greater effect
on voicing prediction. In IDS, f0 becomes more and more predictive of voicing as VOT
increases. As a result, where VOT is most ambiguous in the signal (VOT ratio between
0 and 0.5), f0 becomes more useful as an indicator of voicing. No such effect is present
in ADS. For example, given a VOT ratio of 0.25 (where there is significant overlap
between voiced and voiceless tokens) and an f0 value at the 10 per cent quantile
8
Both the VOT and f0 analyses were conducted using 2 (register: IDS, ADS) × 2 (voicing: voiced,
voiceless) × 3 (place of articulation: velar, apical, bilabial) ANOVAs. There were considerably more pre-
voiced tokens in the ADS sample than in IDS. We explored the possibility that the interaction between
voicing and register was driven by the more negative mean VOT for voiced tokens in the ADS sample.
This interpretation was not supported, as the interaction was also significant when prevoiced tokens were
removed from the analysis. There was an expected effect of place on VOT, with velars having the longest,
followed by alveolars, then bilabials.
FIGURE 6.3 VOT × f0 interaction in voicing prediction in IDS and ADS. The two panels
show probability curves overlaid on a jittered VOT × voicing scatterplot. The curves represent
quantile values of the distributions of f0. For example, the median curve (solid line) represents
the f0 value below which 50 per cent of the data lie in each register. 'High' and 'Low' f0 represent
the highest and lowest f0 values in each corpus
(a relatively low pitch), the probability of the token being voiced is approximately
55 per cent in IDS. Given the same VOT and a median (50 per cent quantile) f0
value, the probability of the token being voiced is approximately 30 per cent. In the
ADS model, similar shifts in f0 do not substantially change voicing predictions.9
Voicing is therefore more consistently implemented with VOT in the ADS corpus,
thus minimizing the effect of f0 as a cue to voicing.
6.3.5 Implications
At nine months, infants in English-speaking environments are being exposed to
highly variable VOT, with considerable overlap between categories, thus highlighting
the regularity of pitch perturbation as a reliable cue to voicing. While we do not expect
English-learning infants to reinterpret the regularity of pitch perturbation as tone,
what this study points up is the acoustic instability of the IDS to young infants, par-
ticularly at an age when they are thought to be developing and honing their perceptual
sensitivity (Werker and Tees 1984; Narayan et al. 2010). The models suggest that, at
this young age, consistency of the pitch perturbation effect essentially prevents the
learner from making incorrect predictions of voicing.
VOT as the primary cue to voicing in English is salvaged, however, for by the
time infants produce their first words, VOT as a cue to voicing in IDS resembles the
consistency of ADS, thereby precluding infants' reinterpretation of f0 as the primary
cue to voicing.
6.4 Conclusions
Linguists have long speculated on the potential role for infants and children in histor-
ical phonology processes (Baudouin de Courtenay 1895 [1972a]). For the most part,
these speculations have been relegated to the domain of speech production and the
nuances of child phonology. As Blevins (2004) has noted, the enterprise of locating
phonetically motivated historical processes in child phonology is inherently con-
founded with inter- and intra-speaker variability associated with the quickly maturing
vocal apparatus. That is, language-specific phonological patterns may be obscured in
children's productions by changing physical constraints. Unfettered by these non-
cognitive limitations, behavioral studies of speech perception offer a window into
the earliest sensitivities of infants, which in turn allow us to assess infant biases and
connections to directions of phonological change and typological regularities.
9
Voicing implementation was also modeled by considering individual speaker variation using hierar-
chical logistic regression (Gelman and Hill 2007; Gorman 2009). Results similar to the models presented in
Table 6.2 were obtained, with three out of the four IDS speakers showing a significant interaction between
VOT and f0. The interaction was not significant for any of the ADS speakers.
Despite the remarkable cognitive abilities exhibited by young infants, the earli-
est stages of their speech perception are less than ideal. Infants, rather than being
'citizens of the world', are more likely members of the majority party, with initial
perceptual abilities reflecting acoustically robust, and typologically common, pho-
netic contrasts. I argued that ubiquitous contrasts in the world's languages reflect,
to some degree, the natural perceptual biases infants bring to the language-learning
table. Conversely, contrasts which are typologically rare reflect the relative difficulty of
their discrimination by young infants. The consistency between patterns in develop-
mental speech perception and phonological typology is consistent with functionally
based approaches to the phonetics-phonology interface such as Lindblom's disper-
sion theory (Liljencrants and Lindblom 1972; Lindblom 1986; Johnson et al. 1993a),
which proposes that phonological contrasts are sufficiently distinctive perceptually
(in order to be learned and remain stable). While the question of how and why
certain perceptually difficult contrasts remain in sound systems cannot be answered
within the present research program, we appeal to general learning mechanisms (i.e.
statistical learning in terms of frequency of occurrence, cf. Maye et al. 2002) for their
persistence.
Directions for future research might include exploring infant biases in the per-
ception of typologically asymmetric distributions of suprasegmental features such
as tone. Recent work suggests that infants' discrimination of certain tone contrasts
follows the typical profile of perceptual reorganization (Werker and Tees 1984). Mat-
tock and Burnham (2006) showed that English-learning infants discriminated Thai
rising vs. falling and rising vs. low tones more accurately at six months than at nine
months, suggesting that infants' perceptual sensitivities have reorganized in the direc-
tion of privileging native contrasts. Given the connections between the earliest biases
in infant speech perception and typological patterns, we might next ask if infants
discriminate acoustically similar tones (e.g. tones 22 vs. 33 from Cantonese) as well
as tone contrasts with a robust salience (e.g. Cantonese 21 vs. 25) (Khouw and Ciocca
2007).
Finally, the connection between development, typology and phonologization is
not limited to child behavior. This chapter also outlined the type of variability
associated with infant-directed speech in English and the similarity between its
acoustic characteristics and the phonetic conditions giving rise to tone from the
loss of voicing contrasts. While this analysis does not claim to capture a sound
change in progress, it provides evidence for treating infant-directed speech as criti-
cal input to the developing speech perception system, particularly when emotional
affect results in either hyper- or hypo-articulation (Lindblom 1990) which can
potentially be reinterpreted as phonologically different from the intended linguistic
gesture.
t                  0.13
n                  0.10
ɹ, s               0.07
k, d, l            0.06
m, j, , w          0.05
b, g, h            0.04
z, p               0.03
f, ŋ               0.02
v, ð, ʃ, tʃ, et;   0.01
ʒ                 <0.01
Part III
7.1 Introduction
Cross-linguistically, some phonological structures are preferred over others. For
example, voiceless obstruents are favored relative to voiced obstruents in that many
languages place special restrictions on voiced obstruents or lack them altogether.1
When confronted with asymmetries such as these, linguists have long referred to the
dispreferred structure as 'marked' (Croft 1990).
Markedness is often closely tied to phonetic facts. Voiced obstruents, for example,
are disfavored for well-documented reasons to do with the aerodynamic properties
of the vocal tract (Ohala 1983: 194). However, some linguists would argue for the
existence of an abstract cognitive representation of markedness, one that may reflect
phonetic realities but cannot ultimately be reduced to them. For example, this is the
role played by markedness constraints in the view of some, but not all, practitioners of
Optimality Theory (Prince and Smolensky 2004). The role of these abstract represen-
tations is often to mediate between gradient phonetic patterns and their categorical
analogues. In this chapter, I will refer to low-level causes such as the aerodynamics of
the vocal tract as 'phonetic' pressures on marked structures, and to the hypothesized
coarse-grained cognitive representations of markedness as 'phonology'. Although the
* Many thanks to Elliott Moreton, whose questions led to this chapter; Colin Wilson, for statistical advice
and other helpful discussion; Paul Willis, for collaboration on the Korean database (and Yongeun Lee for
technical assistance), and Aaron Kaplan, Grant McGuire, Armin Mester, Jeremy O'Brien, Jaye Padgett, Matt
Tucker, and audiences at the 2008 OCP and the Symposium on Phonologization for helpful comments at
various stages of the project. All shortcomings are my own. This research was funded by an NSF Graduate
Research Fellowship and conducted at the University of California, Santa Cruz.
1
This is, of course, a simplification; there are also certain environments in which voiced obstruents are
preferred to voiceless ones, for example postnasally (Pater 2004).
150 Abby Kaplan
7.2 Method
7.2.1 Corpora
In each of the two cases of underphonologization discussed below, the phonetic and
phonological patterns are tested against actual patterns of lexical frequency in seven
languages: English, German, Dutch, French, Spanish, Serbo-Croatian, and Korean.
2
Note that the term 'underphonologization' is also used in the literature to refer to cases in which some
phonetic pattern is realized as a categorical phonological pattern less often than expected. To avoid circu-
larity (cross-linguistic frequency defines markedness, which is in turn found to affect lexical frequency), I
restrict my attention to phonetic patterns that appear never to be phonologized.
Data for English, German, and Dutch was obtained from the CELEX lexical database
(Baayen et al. 1996); data for French from Lexique (New et al. 2001); data for Spanish
from BuscaPalabras (Davis and Perea 2005); and data for Serbo-Croatian from the
Ukrstenko corpus (Sipka 2002).
Data for Korean was obtained from the Korean National Database (Lee 2006).
The entries in this database are listed in Korean orthography, which is largely mor-
phophonemic. However, for the sake of consistency with the results from the other
databases (which contain broad phonetic transcriptions), the database was postpro-
cessed with a basic SPE-style phonology of Korean with the goal of yielding more
surface-like representations of the lexical entries. The phonological grammar was
written in collaboration with Paul Willis at UC Santa Cruz and based on the descrip-
tions of Sohn (1999). There is reason to be cautious in interpreting the results for
Korean: since morphological boundaries were not available in the original database,
we were unable to implement any morphophonological rules.
For each language, lexical frequency was calculated over monomorphemic lemmas.
None of the languages has phonologized any of the patterns discussed below. (Korean
does have vowel harmony in affixes, but affixes were not included in the corpus and
thus did not affect the results.)
1. Extract all relevant sequences from the lexicon (e.g. all vowel-vowel sequences
in which the two vowels are separated only by consonants).
2. Build an LRM predicting the dependent variable (e.g. vowel height) from
other potentially relevant factors (e.g. vowel tenseness) and their two-way
interactions.
3. Run an ANOVA on the model to identify non-significant factors or
interactions.
7. Lexical sensitivity to phonetic and phonological pressures 153
4. Remove the least significant factor or interaction and rebuild the model.
=> Exception: never remove the factor of interest; where this factor is not sig-
nificant, this fact is reflected in its p-value.
5. Repeat steps 3-4 until all factors are significant at p < .05.
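As a control-flow sketch, the elimination loop in steps 3-5 might look like the following. This is a toy illustration only: the `pvalues` callable stands in for refitting the LRM and running the ANOVA at each pass, the factor names and p-values are invented, and the refinement of not dropping a main effect while its interaction remains is omitted for simplicity.

```python
def backward_select(factors, keep, pvalues, alpha=0.05):
    # Backward elimination: repeatedly remove the least significant factor
    # or interaction, but never the factor of interest (`keep`).
    current = list(factors)
    while True:
        ps = pvalues(current)                 # step 3: p-value per factor
        removable = [f for f in current
                     if f != keep and ps[f] >= alpha]
        if not removable:                     # step 5: all remaining significant
            return current
        worst = max(removable, key=lambda f: ps[f])
        current.remove(worst)                 # step 4: rebuild without it

# Toy run with fixed (invented) p-values:
fixed = {'v1_height': 0.20, 'v2_tense': 0.30,
         'v2_stress': 0.01, 'v2_tense:v2_stress': 0.70}
final = backward_select(list(fixed), keep='v1_height',
                        pvalues=lambda fs: {f: fixed[f] for f in fs})
# 'v1_height' survives despite p = .20 because it is the factor of
# interest; the non-significant tense term and interaction are dropped.
```

The interaction (p = .70) is removed first, then the tense term (p = .30), leaving the factor of interest and the significant stress term in the final model.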
When the resulting models are reported below, I indicate which factors were actually
included in the final version of each model. For reasons of space, the individual inter-
actions included in each final model are not listed. See the appendix for information
on how the vowels and consonants of these corpora were coded for the relevant
features.
The phonetic and phonological patterns therefore make distinct predictions for lex-
ical frequency: there are phonetic precursors for both fronting and backing next to
coronals, but only fronting matches attested phonological patterns.
FIGURE 7.2 Coefficients of C place factors ('*': p < .05; '.': p < .1)
Although both fronting and backing next to coronals have phonetic precursors,
we do not know whether those precursors are equally strong. Ohala does not pre-
dict how often dissimilation will occur, only that it can result from misperception;
indeed, a reviewer suggests that since knowledge of the articulatory fronting effect is
a prerequisite for hypercorrection, we might expect the former to be stronger. Thus,
an alternative interpretation of these results is that lexical frequency reflects phonetic
precursors in proportion to their strength; further research would be required to rule
out (or confirm) this possibility. The case of underphonologization in the next section
is less susceptible to this kind of explanation.
Language         Vowels    V2 Front  V1 Front  V2 Tense  V2 Stress     Mid       p      High       p
English          All          ✓         ✓         ✓         ✓        -.4740   .0000   -.1824   .0185
English          Nonlow       ✓         ✓         ✓         ✓          -               .3768   .0001
German           All          ✓         ✓         ✓         ✓        -.3438   .0000   -.2616   .0000
German           Nonlow       ✓         ✓         ✓         ✓          -               .0707   .2758
Dutch            All          ✓         ✓         ✓         ✓        -.0384   .1333    .1902   .0000
Dutch            Nonlow       ✓         ✓         ✓         ✓          -               .2213   .0000
French           All          ✓         ✓         ✓         NA       -.2721   .0000   -.2765   .0000
French           Nonlow       ✓         ✓                   NA         -               .6906   .0000
Spanish          All          ✓         ✓         NA        NA       -.3150   .0000   -.3318   .0000
Spanish          Nonlow       ✓         ✓         NA        NA         -               .1067   .0936
Serbo-Croatian   All          ✓                   NA        NA        .1701   .0000    .6461   .0000
Serbo-Croatian   Nonlow                           NA        NA         -               .1503   .0000
Korean           All          ✓         ✓         ✓         NA        .7998   .0000    .9541   .0000
Korean           Nonlow       ✓         ✓         ✓         NA         -               .1277   .0001
FIGURE 7.3 Coefficients of V1 height factors, all vowels ('*': p < .05; '.': p < .1)
The results for the models with all vowels are summarized graphically in Figure 7.3.
They paint a surprising picture: most of the coefficients are negative, suggesting some-
thing like an OCP effect over vowel height. In addition, the pattern of Korean, which is
the only language to display effects in the predicted direction, may well be a relic of the
vowel harmony system that existed in older forms of the language. This apparent anti-
harmony pattern is predicted neither by the attested phonological patterns of vowel
height harmony nor by their well-documented phonetic precursors.
7. Lexical sensitivity to phonetic and phonological pressures 159
FIGURE 7.4 Coefficients of V1 height factors, non-low vowels (*: p < .05; ·: p < .1)
4
'Voicing' is here a cover term for a range of laryngeal contrasts: voiced/voiceless, unaspirated/aspirated,
lax/tense. In each pair, obstruents of the former type are expected to raise preceding vowels. Korean data
is reported for the unaspirated/aspirated obstruent series.
FIGURE 7.5 Coefficients of voiceless C factors (*: p < .05; ·: p < .1)
obstruent voicing does not consistently affect lexical frequency, unlike its phonolo-
gized cousin involving height-height interactions. If we examine the various combi-
nations of levels of other factors and consider the effects of interaction coefficients, the
difference between the height-height pattern and the height-voicing pattern remains:
the height-height trends are far more consistent with the phonetic pattern than the
height-voicing ones.
that some phonetic patterns are 'stronger' than others, and it is only the strong
phonetic patterns that are able to influence other areas of natural language – among
them, phonology and the lexicon. Although a priori plausible, it is likely that this
idea cannot be the whole story, given Moreton's (2008a) finding that the acoustic
effects of vowel height and obstruent voicing on vowel height are of comparable
magnitudes.
It is certainly possible that this case of underphonologization could be accounted
for in terms of cue strength – for example, by showing that although the two
precursors are acoustically equivalent, one is perceptually stronger than the other. (See
Yu 2011 for an approach along these lines.) However, even if it proves possible to
construct a universal measure of the strength of phonetic precursors, we must still find
a way of identifying which precursors are ineligible for phonologization. For example,
are they precursors that fall below a certain threshold of strength? Or are they the
precursors that are the weakest within some small comparison set? It is by no means
clear that this criterion, whatever it is, emerges spontaneously from purely phonetic
considerations; even specifying a simple threshold seems to require some kind of
higher-level, more coarse-grained cognitive mechanism – in other words, what I here
call 'phonology'.
the lexicon contains a pattern that is never realized in phonology is good for these
models.5 Of course, the reason why some phonetic patterns influence the lexicon
while others do not is an important topic for future research within exemplar-type
frameworks, just as the reason why some phonetic patterns influence the phonology
while others do not is a topic for research in the more traditional framework adopted
above (for example, see Moreton 2008a for a proposal in the latter framework). The
contribution of this chapter is merely to rule out a model in which phonetics has one
relationship with phonology and an independent relationship with the lexicon:
these results show that whatever mechanisms selectively allow some phonetic patterns
but not others to be reflected in phonology, the same selectivity applies to which
phonetic patterns are reflected in the lexicon.
5
Pierrehumbert (2001b) argues that if a pattern of lexical frequency is sufficiently non-robust, it can fail
to surface as a phonological pattern because not enough speakers will happen to have a lexicon that contains
the right information. Thus, an exemplar theory with this assumption could handle cases in which patterns
of lexical frequency do not match phonological patterns – but only if the lexical patterns are sufficiently
weak.
104 Abby Kaplan
Consonants were coded for voicing, place, and manner. Place and manner coding
are shown in Table 7.5. Sonorants (not listed) play no role in the analyses above.
8
Phonologization and the typology of feature behavior
8.1 Introduction
One of the successes of distinctive feature theory has been the identification of a num-
ber of phonetically defined features which are able to describe the groups of sounds
that are phonologically active in many unrelated languages. This study measures the
crosslinguistic frequency of occurrence of classes defined by particular features which
have been proposed and examines the phonological behavior of these classes. The
characteristic behavior profiles of particular features are explored in terms of two
approaches to feature effects, one of which draws on representations for explanation,
and the other of which draws upon phonologizable phonetic effects for explanation.
Innatist approaches to distinctive features have accounted for the crosslinguistic
recurrence of particular types of sound patterns by building crosslinguistic gener-
alizations into the representations used for phonological patterns (Chomsky and
Halle 1968; Clements 1985; Sagey 1990). In this view, representations are explanatory.
Recurrent classes are definable using innate features, and the behavior of particular
classes is attributed to the organization of the mental representation of phonology.
The observation that only some logically possible classes of sounds are frequently
active in sound patterns is accounted for by positing that only the features which
define the active classes exist in a universal feature set. More specific observations
are accounted for by positing that certain feature values do or do not exist, and that
features are organized in a hierarchy that restricts the ways in which they can interact.
This approach may be summarized with the slogan 'Things happen because of features'.
Another view is emergent features (Mielke 2008), in which feature effects are
accounted for in terms of the historical development of sound patterns, as marked-
ness generalizations and other patterns are accounted for in Evolutionary Phonology
(Blevins 2004). Recurrent phonologically active classes are defined by features whose
phonetic correlates are involved in commonly-phonologized phonetic effects, and the
166 Jeff Mielke
FIGURE 8.1 Relationships between phonetics, features, and phonological patterns (Mielke
2008: 8)
behavior of particular classes is attributed to the nature of the phonetic effects from
which they developed. This approach can be summarized with the slogan 'Features
happen because of things'.
These two approaches to feature effects are schematized in Figure 8.1. In innate
feature theories, recurrent sound patterns are built out of distinctive features from the
universal feature set, which are in turn grounded in phonetics, so that features serve
as a link between phonetics and sound patterns. In the case of sound patterns which
are not easily captured using a particular feature set (e.g. sound patterns involving
unnatural classes of sounds), recourse can be made to phonetic effects or historical
accidents (the dotted line connecting 'sound pattern' and 'phonetics'). In emergent
feature theory, this is the only connection between sound patterns and phonetics, i.e.
all sound patterns are historical accidents, but some of these accidents, such as the
phonetically natural ones which form the primary data for innate feature theories,
are more frequent than others. Features, in emergent feature theory, are posited by
learners in response to observed sound patterns.
The purpose of this study is to investigate whether the features that are frequently
required to describe sound patterns are attributable to frequently phonologized pho-
netic effects, i.e. to the dotted lines in Figure 8.1. Among the goals is to tease apart
features from their phonetic correlates as sources of explanation. Emergent feature
theory predicts that features are needed in rules only insofar as they are related to the
origin of the particular sound patterns they are involved in. In this view, features are a
component of a language user's grammar, as the formalization of a sound pattern that
is evident in the ambient language. Only features that are useful for characterizing
rules that occur due to real diachronic changes are useful for the grammar. This is
8. Phonologization and the typology of feature behavior 167
8.2 Methods
Counting and categorization of classes defined by particular features were conducted
on the sound patterns included in P-base1 (Mielke 2008), a database of sound patterns
found in language grammars available on library shelves. It includes 628 language
varieties, which are grouped into 549 languages. Dialects were considered to be one
language if they shared an entry in Ethnologue (Grimes, Grimes, and Pittman 2000).
All phonologically active classes involving more than one but fewer than all of the
segments in an inventory reported in these grammars were recorded. The definition of
a phonologically active class given in (1) is based entirely on phonological patterning,
as opposed to the traditional definition of natural class which also involves phonetic
or featural naturalness.
(1) Phonologically active class (Mielke 2008: 48–9): any group of sounds which, to
the exclusion of all other sounds in a language's inventory, do at least one of the
following:
a. undergo a phonological process;
b. trigger a phonological process; or
c. exemplify a static distributional restriction.
1
P-base is freely available on the web: <http://www.oup.com/uk/companion/mielke>. See Mielke (2008:
ch. 3) for a more detailed description of the survey methods.
The 6077 phonologically active classes matching (1a, b) (excluding the static distri-
butional restrictions2) were classified according to features based on those proposed
in The Sound Pattern of English (Chomsky and Halle 1968), listed in (2). The features
[syllabic], [long], and [extra (long/short)] were used to capture prosodic distinctions
which are not considered to be the responsibility of the segmental feature system, and
the classes defined by these features are not discussed here.
(2) Features used for categorization of classes and changes:
[consonantal] [anterior] [delayed primary release]
[vocalic] [distributed] [delayed release of secondary closure]
[sonorant] [strident] [glottal (tertiary) closure]
[continuant] [lateral] [heightened subglottal pressure]
[voice] [back] [movement of glottal closure]
[nasal] [low]
[tense] [high] ([syllabic])
[coronal] [round] ([long])
[covered] ([extra (long/short)])
Features are being used as a familiar descriptive labeling convention in order to group
together different phonologically active classes of sounds from different languages,
for the purposes of counting them. The use of features for classificatory purposes is
orthogonal to the question of whether features are primitives in phonological pat-
terns. The particular feature set in (2) was chosen because it was able to represent
the greatest number of phonologically active classes using conjunctions of distinctive
feature values (Mielke 2008: ch. 7), compared to features from Preliminaries to Speech
Analysis (Jakobson, Fant, and Halle 1952) and Unified Feature Theory (Clements and
Hume 1995). Some of these features are no longer widely used, but this is of little
consequence here, because the primary concern of this study is the behavior of very
frequent classes, which are easily handled, often in the same way, by many different
feature systems. See Mielke, Magloughlin, and Hume (2011) for a comparison of six
feature systems using the same database.
Featural descriptions for phonologically active classes were generated by an algo-
rithm that constructs a feature matrix for the segment inventory of a language and
produces the minimal set of feature values that can define the class, if the class is defin-
able in this way. See Mielke (2008: 47-55) for details of how this algorithm works.3
Of the 6077 classes, 4313 (71.0 per cent) could be represented by a conjunction of the
2
Static distributional restrictions have been excluded in order to avoid mixing productive phonolog-
ical patterns with patterns that are more likely to be fossilized remnants. Distributional restrictions are
nonetheless an interesting object of study, and this is set aside for future investigation.
3
The algorithm selects an analysis which requires the minimum number of features. In cases where one
feature is implied by another one (e.g. [+lateral] implies [+coronal] in the SPE system), both features were
counted. In cases where more than one minimal feature bundle was possible, one of these was selected
arbitrarily.
features in (2) (Mielke 2008: 147), and these classes are considered further. The residue
of 'unnatural' classes, discussed in Mielke (2008: 118–33), is a mixture of phonetically
unnatural classes and phonetically natural classes that are not handled well by the
feature theory. Since distinctive features were used to classify the sound patterns, the
analysis in this chapter focuses on the classes that were handled well by SPE features.
In addition to defining the classes automatically, the changes involved in the sound
patterns were defined featurally by hand.
Occurrences of features were categorized according to the types of behavior in
(3). These examples illustrate the four types of feature behavior using [+voice] as an
example. In (3a), [+voice] is involved in the change and also present in the environ-
ment triggering the change. This is classified as spread. In (3b), [+voice] is involved
in the change, and its opposite value ([−voice]) defines the environment triggering
the change, so this would be classified as dissimilation. In (3c(i)), [+voice] defines a
class of sounds undergoing a change but is not involved in the change itself, making
this an example of a feature being used to partition an inventory into undergoers
and nonundergoers of a sound pattern. Partitioning an inventory into triggers and
nontriggers of a sound pattern without being involved in the change is also classified as
partitioning, as in (3c(ii)), where [+voice] sounds trigger a change which is unrelated
to voicing. Any use of a feature that does not fit into one of these three categories, such
as being involved in a change that is neither dissimilatory nor assimilatory, as in (3d),
is classified as other.
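The four-way coding in (3) amounts to a small decision procedure, which can be sketched as follows. This is a hypothetical reconstruction for illustration; the flag names are mine, not the survey's.

```python
# Sketch of the behavior coding in (3): each occurrence of a feature value
# in a sound pattern is coded by whether the value is part of the change
# itself and how it relates to the triggering environment.

def classify_occurrence(in_change, env_relation):
    """in_change: True if the feature value is involved in the change.
    env_relation: 'same' if the same value appears in the triggering
    environment, 'opposite' if the opposite value does, None otherwise."""
    if in_change and env_relation == 'same':
        return 'spread'          # (3a): assimilation to the environment
    if in_change and env_relation == 'opposite':
        return 'dissimilation'   # (3b): opposite value defines the trigger
    if not in_change:
        return 'partition'       # (3c): value only delimits (non)undergoers or (non)triggers
    return 'other'               # (3d): in the change, but neither assimilatory nor dissimilatory

# [+voice] spreading from a voiced environment, as in (3a):
print(classify_occurrence(True, 'same'))   # spread
# [+voice] merely delimiting the triggers of an unrelated change, as in (3c(ii)):
print(classify_occurrence(False, None))    # partition
```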
8.3 Results
The results are presented here in terms of the features used to define classes and
changes, beginning with a look at the frequency of the most frequently-used features,
and proceeding to their behavior.
Figure 8.2 shows the eighteen features that are used in the descriptions of most
sound patterns, and the activity of their + and − values. The dark bars represent cases
where a single feature value defines a class or change, and the light bars represent
cases where the feature value is used in conjunction with other feature values to
define a class or change. The features [voice] and [high] are used the most, followed
by [back], [nasal], [continuant], and [sonorant]. Occurrences of [voice] are divided
roughly equally into [+voice] alone, [−voice] alone, [+voice] as part of a larger
feature bundle, and [−voice] as part of a larger feature bundle. Occurrences of [high],
however, are dominated by cases where it is used as part of a larger bundle (such as
[+high, +vocalic]).
The features are sorted according to the total number of occurrences of either fea-
ture value, although it is apparent that some features are more symmetrical than others
in the occurrence of their + and − values. The feature [voice] is quite symmetrical, but
other features are used more for one value than for the other. For instance, [+nasal] is
more than three times as frequent as [−nasal], and [−sonorant] and [+distributed]
are also much more frequent than their opposites.
Figure 8.3 shows the number of occurrences of spreading, dissimilating, parti-
tioning, and other behavior for classes defined using the most frequently-used fea-
ture values. The chart is based on all of the occurrences of each value, not just
FIGURE 8.3 Total number of occurrences of each type of feature behavior for the most frequent
feature values
the ones where the feature is used by itself. Both are interesting things to count.
Counting only single-feature bundles (instances where a feature is used by itself)
yields more differences between features, but counting all occurrences of each fea-
ture (as in the figure) provides results that are more applicable to whether phonetic
effects can account for the need for a particular feature, because the figure shows
all of the instances where the feature is needed to describe a phonological pattern.
Figures 8.6–8.7, in the appendix, display the same information for single-feature
bundles.
As seen in Figure 8.3, much of the spreading is concentrated among a small number
of feature values. The feature [+voice] is involved in 20.1 per cent of all spreading,
followed by [+nasal] (12.1 per cent), [+back] (8.7 per cent), and [+continuant]
(8.2 per cent). Other feature values, such as [−sonorant] and [−continuant], seldom
if ever spread, but are required in other capacities (partitioning, other). Dissimila-
tion is much less frequent than assimilation, and is concentrated primarily among
[−sonorant] and [+consonantal], followed by [−continuant], [−voice], [±nasal],
and [+high]. All of the features are used extensively in partitioning, and there are
features, many of them major class features, which do almost nothing else.
Figure 8.4 shows the rates of spreading, dissimilating, and partitioning for the
same features, as percentages of the occurrences of each feature. The overall average
rates of spreading, dissimilating, partitioning, and other behavior are indicated.
FIGURE 8.4 Rate of spreading, dissimilating, partitioning, and other behavior for the most
frequently-used feature values
Some feature values, such as [+sonorant], [−low], and [−consonantal], have high rates of
spreading, although they do not account for a very large proportion of all the instances
of spreading, because the feature values are used less overall.
8.4 Discussion
There are several ways in which the results show different behavior for different fea-
tures, including differences in spreading, dissimilation, partitioning, and in the way
particular values of the same feature behave.
The feature values that are responsible for the most spreading depicted in Figure 8.3
have phonetic correlates that are known to be involved in coarticulatory effects. These
features include [voice], [+nasal], as well as [+high] and [±back], and [+sonorant]
and [+continuant], which are involved in intervocalic lenition. The feature values that
almost never spread ([−son], [−voc], and [+cons]) do not have phonetic correlates
that are involved in coarticulation, so they lack phonologization precursors for assim-
ilatory patterns. By attributing the difference between spreading and non-spreading
values to the phonetic effects that tend to get phonologized, it is possible to account
for this distinction while leaving a role in the theory for non-spreading feature values.
The features that have the highest ratios of partitioning to spreading are
[−sonorant], [−vocalic], and [+consonantal]. These features are frequently used
to define phonologically active classes, partitioning inventories into undergoers and
non-undergoers or triggers and non-triggers. These features have a history of being
difficult to define. For example, Kenstowicz and Kisseberth (1979: 21) observe that
some features are hard to define phonetically but are still necessary to describe sound
systems:
There are no truly satisfactory articulatory or acoustic definitions for the bases of these two
different partitions [consonant and sonorant]. Nevertheless, they are crucial for the description
of the phonological structure of practically every language.
Chomsky and Halle (1968: 318) observe that it is not obvious how laterals should
be defined with respect to the feature [continuant]. Mielke (2005) gives evidence that
they pattern as continuants and as noncontinuants, and suggests that what has been
treated as a single feature [continuant] may be better treated as a bundle of related
phonetic parameters that oppose stops to fricatives and/or vowels but treat phonet-
ically ambiguous sounds (e.g. laterals and nasals) differently. This is consistent with
Kenstowicz and Kisseberth's (1979) comments about [consonantal] and [sonorant].
Since there is often more than one way to define a class, crucial evidence about the
definitions of partitioning features is hard to find. Consequently, it is more difficult
to make a case for the universality of partitioning features, and Mielke (2005) argues
that [continuant] is only as predictable crosslinguistically as the phonetic properties of
the sounds involved, and that the feature effects which are the most consistent across
languages seem to be the ones with the most direct phonetic basis.
The connection between being hard to define phonetically and not spreading can
be treated as an issue of analysis, i.e. that it is easier to identify phonetic correlates
of features that spread, because the assimilated and unassimilated segments can be
compared directly. When features define a partition rather than spread, it is often
in conjunction with other features, and often there are multiple alternative feature
bundles which can define the same class.
The murkier phonetic dimensions are also less straightforwardly involved in coar-
ticulation, or less straightforwardly reinterpreted as phonological patterns when they
are involved in coarticulation. The spreading–partitioning distinction coincides with
8.5 Conclusion
In summary, the survey has provided evidence that different features have different
behavior, which in many cases can be attributed to their phonetic correlates' involve-
ment in phonologization precursors. This is expected if features are abstractions from
sound patterns, and different features have different reasons for existing (Mielke
2008), but it is surprising if features are treated as explanatory primitives. Understand-
ing how phonologization gives rise to certain sound patterns is key to understanding
the sound patterns themselves. Representational approaches have been developed
with the purpose of accounting for feature behavior, but universal representation is
often too blunt an object to account for the behavior of features, often placing too
much emphasis on whether a particular feature or value does or does not exist. Rather,
the life of features seems to be richer than can be compressed into a model based on
presence or absence of universal features.
9
Rapid learning of morphologically conditioned phonetics
9.1 Introduction
The speed and ease with which young children converge on seemingly complicated
and abstract linguistic knowledge has long been taken as support for the hypothesis
that many aspects of grammar must be innate (Chomsky 1986). However, there has
been a growing body of work showing success in pattern extraction by associationist
models (cf. Rumelhart and McClelland 1986; Elman 2003), as well as statistical learn-
ing by infants and adults (cf. Jusczyk et al. 1999; Maye et al. 2002; Newport and Aslin
2004; Wilson 2006). This work has re-opened the question of how much information
is in fact contained within the auditory input, and how much of that can be attended
to and extracted by listeners.
The fact that many aspects of phonetics cannot be attributed to universal processes
of articulation and motor planning, but that individual languages adopt individual
phonetic implementations, is evidence that these phonetic facts must be learned, and
therefore, that learners must be able to induce them from the speech stream. Research
suggests that among the relations that listeners must encode are the degree to which
their language nasalizes vowels before nasal consonants, and lengthens low vowels
relative to high ones (Keating 1985; Solé 1992b; Beddor and Krakow 1999). There is,
additionally, considerable evidence that speakers can, at least for certain tasks, access
highly detailed representations of particular words and sounds (Summerfield 1981;
Goldinger 1996; Remez et al. 1997; Clopper and Pisoni 2004; Allen and Miller 2004).
* This work was supported by a Department of Education Javits fellowship as well as a National Science
Foundation IGERT grant to the Johns Hopkins Cognitive Science department. I would like to thank Paul
Smolensky, Colin Wilson, and the members of the Johns Hopkins IGERT lab.
182 Rebecca Morley
mismatch between perception and expectation. There is no source in the signal for
the nasality perceived on the vowel, other than the vowel itself. Thus, in the absence
of the originally conditioning nasal consonant, the nasality cues on the vowel may
first become more salient to the listener, leading them to be encoded as part of the
underlying representation of the vowel, and ultimately to an oral/nasal vowel contrast
in the language as a whole. This argument gains support from experimental work by
Kawasaki (1986), who found that, for a series of one-syllable stimuli, English-speaking
participants rated the vowel as more nasal, the lower the amplitude (and thus the lower
the perceptibility) of the final nasal consonant.1
The ultimate story, of course, must be significantly more complicated than this,
given the multifarious character of natural language. As one example of this com-
plexity, consider the prosodic hierarchy, nested levels of constituent structure each
of which may have its own associated phonetic and phonological rules (see Selkirk
1980; Byrd and Saltzman 1998). Furthermore, a large body of psycholinguistic work
on processing of sentence, word, and morpheme size units exists to motivate a cog-
nitively plausible model of phonological competence. As far as I am aware, there
exists no study of how these known factors should or could be incorporated into a
diachronically-based theory. The current work represents a first step in integrating
these linguistic areas, laying the groundwork for consideration of phonetic cues within
structured domains and how they might become phonological cues over time.
The particular cue investigated in this article is vowel nasalization, and the
methodology is experimental – an artificial grammar learning task in which vowel
nasality is linked with morphological inflection. In sections 9.2 and 9.3, I will describe the
experiment in detail, and I will argue that success in learning phonetically conditioned
alternations across a morpheme boundary provides necessary support for a phonetic
origin of this type of domain restricted process. In section 9.4 I will consider more
detailed analyses of the experimental results, looking carefully at the degree of pho-
netic nasality in individual stimulus items. Finally, in section 9.5 I will summarize my
conclusions, and situate them within an Evolutionary Phonology account, suggesting
a special role that morphological decomposition might play in the process of phonol-
ogization.
9.2 Experiment
The present experiment is designed to investigate the relation between phonetic and
phonological patterns, in particular, the hypothesis that phonologization is initiated
due to a discrepancy in perception or production, either of which can lead to a
difference between the listener's analysis of their input and the speaker's intended output.
An example of this, as discussed previously, is the genesis of phonemically nasal
vowels (/Ṽ/) from phonetically nasal vowels ([Ṽn]) that have lost their final nasals. A
clearly necessary pre-condition for a failure of expectation is the establishment of such
an expectation in the first place. The experiment described here will be concerned
with testing the hypothesis that such an expectation can be induced – that listeners
are able to attend to sub-phonemic vowel nasalization, as well as learn novel rules
that link those cues with the grammatical structure of morpheme boundaries.
9. Rapid learning of morphologically conditioned phonetics 185
Previous experimental work has found both production and perception differences
related to the presence or absence of morphological boundaries. Measurements of
Korean speakers' productions show that there is a difference in amount of variability
in gestural timing with regard to palatalization across versus within a morpheme
boundary (Cho 2001). Work on English has demonstrated a correlation between
morphological boundary strength and degree of phonetic reduction (Hay 2003). And
work by Frazier (2005), also in English, shows reliable differences in vowel length
for monosyllabic words depending on whether they are monomorphemic or bimor-
phemic (e.g. passed/past). These differences have also been shown, in some cases, to be
accessible to hearers. Frazier reports a perceptual effect correlated with vowel length
in terms of participants' likelihood of selecting the mono- or bimorphemic variants in
a forced choice task. Another set of experiments has examined the effect of different
boundary types (phonological phrase, prosodie word) on word selection (Salverda
et al. 2003; Christophe et al. 2004), providing evidence that listeners can make use of
phonetic cues associated with domain structure, cues which can include differences
in segment length, pitch accent, and degree of coarticulation.
These perception experiments employed experimental tasks that were centered
around the explicit disambiguation of semantically distinct minimal pairs. The cur-
rent experiment, on the other hand, involves explicit training on a novel mor-
phological alternation (the presence or absence of the suffix -/m/), and only
implicit training on the associated (redundant) phonetic difference of interest (the
degree of nasalization on pre-nasal vowels). Furthermore, the task is discrimina-
tion between two words which are phonologically identical, but one of which is
the correct word phonetically (given the participants' training), and one of which
is not.
Similarly, the work described above has shown that listeners are sensitive to dif-
ferences in the realizations of the phonetic cues associated with the productions
of different speakers, and in different environments. But that work dealt with cues
which were otherwise contrastive in the given language (such as VOT), or were
robust phonetic indicators of phonemic contrast (such as vowel length differences,
which signal voicing distinctions on stops in English). In the current case, how-
ever, the feature under investigation is nasalitynever contrastive on English vowels,
and, as far as I am aware, almost always redundant in signaling a subsequent nasal
consonant.
The present paradigm is an approach that combines the power of the statistical
learning paradigm (cf. Newport and Aslin 2004) – the ability to carefully control the
listener's input, and test associations learned implicitly – with the type of experimental
phonology advocated by Ohala (1974, 1981) and exemplified in Kawasaki's (1986)
work investigating the acoustic-level correlates of language change. This combined
approach is a very promising avenue to testing a number of hypotheses about language
learning and change.
9.2.1 Procedure
The experimental design was as follows. Participants were told that they would be
hearing words in a new language, words spoken by somebody named Frank,2 and
that they would later be asked questions about those words. Each word would appear
in the singular, accompanied by a picture of a single object, and then in the plural,
accompanied by a picture of two of the same object. What followed was a passive train-
ing stage in which participants listened to words over the headphones and looked at
pictures on the computer monitor. There was a 1200 ms pause between each picture-
word pair. All training items were presented in such pairs, with the singular appearing
first. The singular and the plural differed in that the plural ended with the suffix -/m/.
For example, participants heard the word 'skimtu' over the headphones at the same
time they saw a picture on the screen of a single key; this was followed by the word
'skimtum' heard over the headphones, accompanied by a picture of two keys.
Participants were trained on twelve distinct singular-plural word pairs, repeated in
six randomized blocks. Once midway through training, and again after training had
completed, a practice block occurred. Each practice block consisted of presentation
of twelve pictures (a random selection from the set of all singular and plural items
seen during training). Each of these pictures was presented once, and accompanied
by two auditorily presented words. Six hundred ms after the picture appeared, the
first word was played; this was followed 800 ms later by the second word. Participants
were instructed to select (via key press) the spoken word that matched the picture
('1' for the first word; '2' for the second). This was a test of the singular/plural dis-
tinction, such that, of the two word choices per picture, one was a singular inflec-
tion, and the other a plural inflection. As soon as the participants pressed a key,
the picture disappeared. Participants received feedback during these practice trials,
seeing either 'correct' or 'incorrect' appear on the screen, and hearing a buzzer noise
in the latter case. Two hundred ms later the next picture appeared. Participants'
performance in the second practice block was used as a criterion test for inclusion of
their results.
The alternation of interest related to the behavior of pre-nasal vowels. Since all
stems were vowel-final, all plural words contained such a vowel before the plural suffix
(-/m/), that is, at the morpheme boundary. Half of the words also contained stem-
internal nasals. In both conditions the degree of regressive nasalization on the vowel
contrasted in these two environments. In training for the ORAL-NASAL condition,
there was 0 per cent regressive nasalization within morphemes and 100 per cent
across; in the NASAL-ORAL condition, those values were reversed. See Table 9.1
for example stimuli. It should be noted that the ORAL-NASAL condition presents
an alternation to the learner on the word final vowel, whereas the NASAL-ORAL
condition shows no alternation.
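The nasalization pattern that distinguishes the two training conditions can be captured in a one-line rule. The following is a hypothetical sketch for clarity only; the function name and the string labels are mine, not part of the experimental materials:

```python
def vowel_nasalized(condition, tautomorphemic):
    """Whether a pre-/m/ vowel was nasalized in training.

    condition:      "ORAL-NASAL" (0% nasalization within morphemes,
                    100% across the morpheme boundary) or
                    "NASAL-ORAL" (the reverse).
    tautomorphemic: True for a stem-internal /Vm/ sequence, False for
                    the stem-final vowel before the plural suffix -/m/.
    """
    if condition == "ORAL-NASAL":
        return not tautomorphemic
    if condition == "NASAL-ORAL":
        return tautomorphemic
    raise ValueError(f"unknown condition: {condition}")
```

Note that only in the ORAL-NASAL condition does the rule produce an alternation on the word-final vowel (oral in the singular, nasalized in the plural); in NASAL-ORAL that vowel is oral in both forms.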
2 The speaker was identified in the hopes of priming speaker identification, a task known to support
the encoding of sub-phonemic cues (Remez et al. 1997; Allen and Miller 2004).
9. Rapid learning of morphologically conditioned phonetics 187
Half of the stems ended in /i/ and half in /u/.3 Half of the words also contained
/m/ in the stem, which was preceded in three stems by /i/ and in the other three by
/u/. Thus, in both conditions, subjects heard six instances of /im/ and six of /um/
word-internally, and six instances of each word-finally; what differed was whether
the vowel was nasalized in the word-internal (tautomorphemic) or word-final (het-
eromorphemic) /Vm/ sequences.
At test, subjects were asked to identify which of two words was the one spoken
by Frank. A two-alternative forced-choice task consisted of two auditorily presented
words, one with the high degree of nasal coarticulation, and one without any nasal-
ization on the pre-nasal vowel, but otherwise identical. Test items included both old
words (heard during training), and new words, both singular and plural. These words
were accompanied by pictures, as in the training phase. The six stems that lacked an
internal nasal were tested only in the plural; the six nasal stems were tested both in
the singular and plural. Each test item was presented twice in either order, for a total
of twelve singular test items and twenty-four plural test items each for old and new
words. The order of items was randomized across participants. Table 9.1 gives example
stimuli for each condition.
The phonetic cues present in the stimuli are natural (regressive nasalization asso-
ciated with nasal consonants) and redundant (only occurring with the cue of the
accompanying nasal consonant). Furthermore, vowel nasalization of any degree is
non-contrastive in English, the native language of the experimental participants. For
these reasons, we might not even expect listeners to reliably hear differences along
this dimension. To check for accuracy of perception, the final task of the experiment
was an AXB test to assess participants' auditory discrimination of the phonetic
nasalization cue.

TABLE 9.1 Example training and test items for the two experimental conditions
ORAL-NASAL NASAL-ORAL

3 The full list of stems:
Old Stems (heard both in training and test): 'hæzi, 'tʃu, 'skimtu, aɪ'gimdu, 'oski,
'spi, jə'tumbi, ə'dʒumpu, 'hu, 'gəu, 'θumzi, 'twimtʃi
New Stems (heard only at test): 'di, 'fi, jə'dimfu, 'tʃumgu, 'ploksu, 'ipi, 'humʃi,
glaʊ'dumki, 'nɪdu, 'stu, frə'bimsi, 'imdʒi

Test items consisted of a subset of the words participants had been
tested on earlier in the experiment. For each triplet, participants had to choose which
two words were identical; either the first word was the same as the second, or the third
word was the same as the second. The non-identical token differed only in degree of
nasalization, e.g. [skimtu] [skĩmtu] [skĩmtu].
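The scoring logic of such an AXB trial can be sketched as follows (an illustrative toy only; the function name and response labels are mine):

```python
def axb_correct_response(a, x, b):
    """Correct response in an AXB trial: the middle token X is
    identical to either the first (A) or the third (B) token."""
    if a == x and b != x:
        return "first"
    if b == x and a != x:
        return "third"
    raise ValueError("ill-formed AXB triplet")
```

For the triplet in the text, where the second and third tokens carry vowel nasalization and the first does not, the correct response is that the third token matches the middle one.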
9.2.2 Stimuli
The stimuli consisted of words of varying length and syllable structure. The nasal
vowel at the morpheme edge was always post-tonic (except for the one-syllable roots);
the nasal vowel within the morpheme was always tonic, but sometimes occurred in
the first and sometimes in the second syllable. Natural speech tokens were used; each
token was recorded separately. All words were produced by a phonetically trained
male native American English speaker who was instructed to pronounce unstressed
vowels as full vowels (rather than reducing them). All stimuli were recorded in a
sound-attenuated booth at a 22 kHz sampling rate. Also recorded were monomor-
phemes with final oral consonants, and monomorphemes with final nasal consonants,
e.g. /hæzim/, /hæzi/, /zib/, /zim/. Nasalized vowel tokens were created by splicing
a portion of the vowel from a stressed nasal coda environment ([zĩm], generating
[hæzĩm]). Since these stressed vowels were longer in duration it was possible to
select only the part of the vowel that was nasalized. This was determined by visual
inspection, verifying that a nasal formant (around 1000 Hz) was visible throughout
the duration of the vowel. See Figure 9.1 for an example spectrogram. Non-nasalized
tokens were created by splicing from a non-nasalized environment, e.g. [zib],
generating [hæzim]. For these items, no nasal formant was visible in any part of the
vowel. Vowels were normalized for length, such that root-internal front vowels did not
significantly differ across oral and nasal tokens (similarly for root-final front vowels,
root-internal back vowels, and root-final back vowels). To reduce auditory artifacts,
splicing was always done at zero crossings, and effort was taken to produce a smooth
intensity and frequency contour by splicing from multiple parts of the replacement
vowel (beginning, middle, and end). Intensity was adjusted where necessary to avoid
the percept of stress on the spliced vowel.
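The zero-crossing constraint on splice points can be illustrated with a minimal sketch. This is my own toy code under stated assumptions (lists of floats as waveforms), not the tool actually used to edit the stimuli:

```python
def nearest_zero_crossing(samples, index):
    """Return the index closest to `index` at which adjacent samples
    change sign (or touch zero), searching outward in both directions."""
    n = len(samples)
    for offset in range(n):
        for i in (index - offset, index + offset):
            if 0 < i < n and samples[i - 1] * samples[i] <= 0:
                return i
    raise ValueError("no zero crossing found")

def splice_at_zero(carrier, replacement, start):
    """Replace the tail of `carrier` from (roughly) `start` onward with
    the tail of `replacement`, cutting both at zero crossings so that
    the joint introduces no amplitude discontinuity (audible click)."""
    cut_c = nearest_zero_crossing(carrier, start)
    cut_r = nearest_zero_crossing(replacement, 0)
    return carrier[:cut_c] + replacement[cut_r:]
```

Cutting both waveforms where they pass through zero is the standard way to avoid the click artifacts that splicing at an arbitrary sample would produce; the intensity and frequency smoothing described above would be done on top of this.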
9.2.3 Participants
Fifty-eight undergraduates at Johns Hopkins University were given course credit
to complete the thirty-minute experiment. All participants reached criterion in the
second practice block (>95 per cent correct). Only the results from the thirteen
participants in each condition who reached threshold on the AXB task (>70 per
cent, singular and plural items separately) are plotted. See Figure 9.2. Analyses were
subsequently carried out for the entire set of participants (adding back in the thirty-
two who fell below threshold).
9.2.4 Results
The results discussed here are for responses in the 2-Alternative Forced-Choice task.
Only participants who performed above threshold on the AXB discrimination task are
included for the first analyses. Test items consisted of two types: singular and plural.
The choice at test was always between a nasalized variant and a non-nasalized variant
(that is, the two possible responses differed only in nasality on the critical vowel).
Two conditions were contrasted: ORAL-NASAL: nasalization across boundary, none
within; and NASAL-ORAL: no nasalization across boundary, nasalization within.
Each participant was run in only one condition.
The dependent variable in the first analysis was the percentage of responses for
which participants selected the nasalized variant (as opposed to the non-nasalized).
Under successful learning, the value of this variable should be large for plural items
in the ORAL-NASAL condition, and small for singular items (small for plural items
in the NASAL-ORAL condition, large for singular items). As just described, the
prediction is that there should be a significant interaction between condition and test
type. See Figure 9.2a.
Since the dependent variable was a proportion response, varying between 0
and 1, a logistic regression analysis was performed (Jaeger 2008; Agresti 1996). Each
model term was assessed for its reduction in the residual deviance of the logistic fit as
compared to the model without that term. The significance of the reduction was then
evaluated using a chi-square test of significance, producing the p values shown here
and in the final column of the tables in the Appendix.
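As a sanity check, the reported p-values can be reproduced from the deviance reductions alone: for a single degree of freedom, the chi-square survival function reduces to a complementary error function. The following stdlib-only sketch recomputes p-values from the χ2 statistics reported in the analyses below; it does not re-fit the regressions:

```python
import math

def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom,
    via the identity sf(x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Deviance reductions for the critical Condition x Type interaction
# term (old items and new items, from the analyses in the text):
for dev in (39.92, 11.46):
    print(f"deviance reduction {dev}: p = {chi2_sf_1df(dev):.2e}")
```

A reduction of 39.92 on one degree of freedom yields p on the order of 10^-10, and 11.46 yields p below .001, matching the significance levels reported in the text and in the Appendix tables.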
For the first analysis two separate regressions were performed, one for old items,
to establish firstly discriminability and attention, and one for new items, to show true
generalization. Model terms were Condition (ORAL-NASAL or NASAL-ORAL) and
FIGURE 9.2 Experiment 1: (a) Percent Nasal variant chosen by Condition (= ORAL-NASAL
or NASAL-ORAL) and Type (= Singular or Plural), plotted separately for old and new words,
(b) Percent Correct chosen by trained degree of nasality, plotted separately for old and new
words; Conditions and inflection types combined
Type (plural or singular), as well as a term for the interaction between Condition and
Type (the critical test of difference). Old Items: There was no main effect of Type,
but Condition was significant (χ2(1) = 6.38, p < .05). Adding the critical interaction
term improved the model fit by a significant factor (χ2(1) = 39.92, p < .0005). New
Items: There was no main effect for Condition or Type. Adding the critical interaction
term improved the model fit by a significant factor (χ2(1) = 11.46, p < .005).
Alternatives to the morphological hypothesis were also considered. One possibility
is that the participants were learning a single degree of nasalization (rather than
two differing by morphological location); another possibility is that nasalization was
encoded by stress rather than morphological condition (since stress location is largely
confounded). To test these hypotheses, a second regression model was run by item for
plurals alone, with participant accuracy as the dependent variable, and model terms
for stem type (nasal stem/oral stem, which provides a test of the single-association
hypothesis by comparing performance on nasal stem plurals with other plurals), and
stress location (pre-final/final). No effect was found for the interaction of stem type
with stress position.
As can be seen from Figure 9.2b, response levels hovered near chance for items
that were nasalized during training. A regression model of participant accuracy with a
term for training type showed that responses were significantly more likely to be accu-
rate for trained Oral (ORAL-NASAL singular and NASAL-ORAL plural) than for
trained Nasal (ORAL-NASAL plural and NASAL-ORAL singular) (χ2(1) = 12.33,
p < .0005). No effect, however, was found for condition, and the results of the two
conditions are combined in Figure 9.2b.
A final analysis involved pooling the data from all fifty-eight participants (including
those previously excluded due to their below-threshold performance on the AXB
task). The results of analysis 1 hold when performed over all participants, implying
that learning has taken place. This in turn suggests that the 70 percent discrimination
threshold is an unnecessarily stringent criterion. Discrimination level (averaged over
all test types in the AXB task) was added as a continuous term to the regression
model of accuracy. This term was a significant predictor of accuracy over all test
items (χ2(1) = 7.48, p < .05). This shows, perhaps unsurprisingly, that the better
participants were at the AXB task (at reliably discriminating the difference to be
learned), the more accurately they performed at test.
9.2.5 Discussion
A robust interaction effect (Condition x Type) indicates an effect of training on
participant response. For old items, learners were able to encode detailed phonetic
representations for words they had heard before, representations that included infor-
mation about sub-phonemic vowel quality features for at least two different positions
in the word. This same interaction effect for new items indicates that learners were able
to make an association between those phonetic features and some other property of
the training words such that they were able to correctly generalize to novel test items.
A main effect of Condition for old items indicates that the nasalized variant was
less likely to be chosen overall for the NASAL-ORAL condition than the ORAL-
NASAL condition. This main effect goes away, however, for new items. As a result,
it is not entirely clear how to interpret this finding. Furthermore, accuracy modeled
against condition for both old and new items indicates that there was no difference in
performance between the ORAL-NASAL and the NASAL-ORAL condition (p > .5).
There was, however, a consistent difference in accuracy between oral and nasal
items, an asymmetry such that learners were more permissive of oral forms when they
were trained on nasal, but were less likely to accept nasal forms when trained on oral.
This might reflect a bias such that, prior to any training, there is an expectation for a
low (or close to zero) degree of nasal coarticulation in all contexts. To test this hypoth-
esis, a control experiment was run. The results are described in the next section.
FIGURE 9.3 Experiment 2: Percent Nasal variant chosen by Type (= Singular or Plural), plotted
separately for old and new words
training for those items, and the learning effect found in Experiment 1 is carried by
the trained Oral items. In order to determine the reason for this asymmetry, I took a
closer look at the phonetic characteristics of the experimental stimuli in question.
9.4.1 Measurements
Coarticulation with a neighboring nasal segment can be observed in nasal formants
which are often visible in the spectrogram of a nasalized vowel. The amplitude of these
9.5 Conclusions
This chapter has been a description of initial experimental work within a paradigm
that combines the strengths of the artificial grammar learning apparatus with the
insights of work investigating the phonetic bases of phonological sound changes. The
results of Experiments 1 and 2 suggest that listeners can perceive and encode different
phonetic associations for boundary versus non-boundary environments. This might
be characterized as the development of an expectation for a degree of coarticulation,
or perhaps degree of variability of coarticulation. What is more, adult speakers can
learn new associations of this type with very little training (less than thirty minutes).
4 The measured values fall within the range observed by Chen in her study of the natural productions
of eight English speakers (monosyllables, either completely oral or nasal in context, e.g. 'bed' vs. 'men').
FIGURE 9.4 (a) Box plot of Degree of nasalization: Δ(A1-P1), separately by vowel (i or u)
and by Old and New words. The extreme outlier tokens are indicated. (b) Example
spectral slice from token 'oskim' (see Figure 9.1). Top: Nasal; Bottom: Oral. The location
of the nasal formant is indicated by P1
This result may seem surprising for a couple of reasons. One of these is the body of
work that suggests that discriminating and producing phonemic distinctions in a sec-
ond language which are allophonic in the native language is quite difficult (Goto 1971;
Dupoux et al. 1998). In the second case, making an association based on morphology
requires a level of abstraction on the part of the learner, one that is not always observed
in artificial grammar learning experiments in which participants fail to generalize to
novel segments or novel words, or unseen members of a natural class (see Peperkamp
2003 for a review of some of the literature).
For these reasons alone, this result is a significant one. However, it also has a
broader relevance. Listeners may use the phonetic cue of nasality on a preceding
vowel to predict the nasality of a following consonant, or the nasality of a following
consonant to assess the quality of the preceding vowel. And on an Evolutionary
Phonology account it is a mismatch between an expected and an observed degree of
this coarticulation which can lead to the genesis of a phonemically nasal vowel over
time. The experimental results presented here satisfy a necessary condition for this
basic misparsing story: the ability to develop such an expectation in the first place,
in particular, an association tied to the linguistically active domain of the morpheme
boundary. However, there is much further to go in developing a complete theory of
the route from phonetics to phonology.
One question that immediately comes to mind is under what circumstances we
might expect compensation for coarticulation to fail. In other words, what factors
might make it more likely for a mismatch to arise between the listener's expectation
and their perception of the speech signal? Intuitively, these circumstances seem to
be provided in cases in which the conditioning environment is somehow lost. But
what about situations in which the conditioning environment remains? The derived
environment effects in Korean, discussed at the beginning of this chapter in (1), would
be an example of this type.
We could start by assuming, for the moment, that with perfect knowledge of the
correct class of element (the correct boundary) and the expected range of feature
spread (or coarticulation) due to that boundary, we are practically perfect in our
ability to correctly reconstruct the constituent phonemes. If, on the other hand, our
ability to recognize the correct boundary is diminished in some way, then our ability
to apply the appropriate degree of compensation for coarticulation is automatically
compromised. In this way might a listener attribute a phoneme to a different category
than that intended by the speaker, either by interpreting its degree of coarticulation
with the neighboring segments as insufficient (and thus subtracting the relevant fea-
ture), or analyzing the degree as exceeding expectation (and thus adding the relevant
feature).
The way in which morphological junctures might become important to this story
beyond demarcating a specific class of phonological processes is by providing a
mechanism for initiating the process of phonologization. That is, we could advance
the preliminary hypothesis that all internal phonological change originates at the
morpheme boundary. This move allows us to make use of independent work in
psycholinguistics related to the question of word level processes. In this framework,
the ability to reconstruct a morpheme boundary can be related to the representational
status of the morphologically complex word. If a particular complex form, due to its
high frequency of use, achieved a lexicalized, undecomposed status, then the original
morpheme boundary could be thought of as weakening or disappearing.
Lexical access models of morphologically complex words often describe a competi-
tion between two routes to word meaning. One route achieves access via composition
of the constituent morphemes, and the other through the word as a whole. Sensitivity
of response times to word frequency in lexical decision tasks is taken as evidence for
access via the whole-word route. This is expected to occur above a certain frequency
threshold, such that the compositional route only wins when the whole word fre-
quency is relatively low (see Gordon and Alegre 1999 for discussion of these models).
We don't yet know how sub-phonemic information might enter into the picture. But
we can imagine a situation in which a speaker produces the appropriate high degree
of nasalization across a boundary, but a listener for whom the whole-word route, rather
than the compositional route, has become the predominant one fails to compensate for the
boundary effect. This story may not be the right one, but it provides us with some
account of how this original failure to correctly reconstruct the source of the speech
signal might systematically occur, an element often missing from historically-based
accounts. Furthermore, it raises an intriguing hypothesis about the relation between
domain-limited effects in linguistics and general processes. If the former is a necessary
stage on the way to the latter, then a more careful study of a seemingly marginal
phenomenon like derived environment effects might prove fruitful for insights into
linguistic phenomena of all kinds.
Appendix
TABLE 9.2 Experiment 1: % nasal response fit to Condition, Test Type, and Condition x Type; above-threshold participants only

OLD        Df  Dev    Resid. Df  Resid. Dev  P(>|Chi|)
NULL                  51         120.23
Cond       1   6.38   50         113.85      0.01*
Type       1   2.73   49         111.11      0.09
Cond:Type  1   39.92  48         71.18       2.64e-10*

NEW        Df  Dev    Resid. Df  Resid. Dev  P(>|Chi|)
NULL                  51         61.91
Cond       1   0.61   50         61.30       0.43
Type       1   0.98   49         60.32       0.32
Cond:Type  1   11.46  48         48.85       0.001*
TABLE 9.3 Experiment 1: % correct response by item for plurals only; model fit to Stress location (pre-final/final), Nasality of Stem (2 nasals/1 nasal); above-threshold participants only

TABLE 9.4 Experiment 1: % correct response fit to Condition, Test Type, Training type (Oral/Nasal), and AXB results; all participants
Individual differences
in socio-cognitive processing
and the actuation of sound change
ALAN C. L. YU*
10.1 Introduction
What motivates the introduction of new linguistic variants, such as a new sound or
a new sound pattern, and how do these variants flourish and propagate throughout the
speech community? These questions are at the heart of research in phonologization
and the origins of sound change. Many theorists have drawn inspiration from biological
evolution and conceptualize the actuation of sound change in terms of a two-step
process of variation and selection (Lindblom et al. 1995; Kiparsky 1995; Mufwene
2001; Blevins 2004; Mufwene 2008). New variants propagate across a speech com-
munity as a result of a process of selection and rejection by language users who
evaluate all variations with respect to their social, articulatory, perceptual, and lexical-
systematic dimensions. The sources of variation are many (Ohala 1993b; Lindblom
et al. 1995; Mufwene 2008; Beddor 2009). Setting aside the influence of language
contact, new variants are commonly assumed to be introduced as the results of the
effects of channel biases that are inherent in the modalities of speech communication
(e.g. biases in motor planning, speech aerodynamics, gestural dynamics, perceptual
parsing; see Garrett and Johnson's chapter in this volume for more discussion) and
analytic biases that come from presumed universal computational mechanisms such
as Universal Grammar (Wilson 2006; Moreton 2008a). When members of a speech
* I thank Penny Eckert, Andrew Garrett, Peter Graff, Lauren Hall-Lew, Tyler Schnoebelen, and Tom
Wasow for their insightful comments and discussions. Attendees of the Variation and Language Processing
workshop at University of Chester and audiences at the Chinese University of Hong Kong, University of
Ottawa, and University of California, Berkeley provided useful feedback. Naturally, all errors are my own.
This material is based upon work partially supported by the National Science Foundation under Grant no.
0949754. Any opinions, findings, and conclusions or recommendations expressed in this material are those
of the author(s) and do not necessarily reflect the views of the National Science Foundation.
202 Alan C. L. Yu
community come to share these new perceptual and production targets, sound change
obtains. How a speech community, or a community of practice (Eckert 2000), comes
to adopt a new norm is a matter of much debate, however. Proponents of exemplar-
based models of sound change, for example, argue that sound change may be mod-
eled in terms of 'drifts of exemplar clouds' (e.g. Pierrehumbert 2001a; Wedel 2006,
2007; see also Garrett and Johnson, this volume). That is, assuming that exemplars
in such models retain fine phonetic details of particular instances of speech, new
variants introduced by persistent bias factors would accumulate in such a fashion
that eventually moves the distributions of exemplars in the direction of the biased
variants, presumably as a consequence of convergence via imitation. That is, speak-
ers' production targets are altered along some phonetic dimensions to become more
similar to those of their fellow interlocutors (Babel 2009; Goldinger 1998; Nielsen
2007; Pardo 2006; Shockley et al. 2004). While the ability to imitate is assumed to be
innate (Dijksterhuis and Bargh 2001), imitation is not likely to be the lone driving
force behind the systematic propagation of new variants throughout the speech com-
munity, since phonetic imitation is not an entirely automatic or unrestricted process.
Social factors have been suggested as important motivators for imitation (Giles and
Powesland 1975; Clark and Murphy 1982; Bell 1984; Dijksterhuis and Bargh 2001;
Babel 2009). Gender difference is the one that is most commonly observed, although
there are conflicting results regarding which gender is more likely to imitate. Pardo
(2006), for example, found that men were more likely to converge in a map task than
women, yet Namy et al. (2002) found female participants converged more than male
participants in a shadowing experiment. Speaker attitude toward the interlocutor
(Babel 2010; Abrego-Collier et al. 2011) and perceived sexual orientation (Yu et al.
2011) have also been associated with degree of phonetic convergence and divergence.
Rather than propagating aimlessly and blindly as implied by a simplistic concep-
tion of an exemplar-based model of sound change, these findings suggest that new
variants are spread across the speech community when they come to be associated
with social significance (Eckert 2000; Labov 2001). It is often argued that social sig-
nificance may be associated with new variants via the influence of socially-relevant
innovators within the speech community (Labov 2001). That is, the propagation of
change happens when the sound patterns of an individual or a group of linguistic
innovators (i.e. the 'leader(s)' of change) who occupy sociolinguistically influential
positions within the community are adopted by members of the speech community.
Given that the question of selection hinges on the role of the innovator, research in
the selection aspect of sound change actuation has focused on uncovering the social
dynamics that facilitate the promotion of an innovator (e.g. the network configuration
of the social group, the social profile of the innovator, the stylistic practice of the
individual, etc.).
The twin questions of where variants come from and how they come to acquire
social significance via the role of the linguistic innovator within the speech
10. Individual differences in socio-cognitive processing and sound change 203
10.2 Background
Individual differences in cognitive processing styles are evident at all levels of human
cognition, including vision (Stoesz and Jakobson 2008), learning (Riding and Rayner
2000), and sentence processing (Daneman and Carpenter 1983; King and Just 1991).
Note that 'individual differences' here are taken to mean variability in cognitive pro-
cessing that is systematic (i.e. governed by some fixed factors), rather than the results
of chance. Before diving into the effects of cognitive processing style on speech pro-
cessing, I briefly consider individual-level factors that could contribute to variation
in phonetic and phonological processing. Broadly speaking, there are two primary
sources: experiential and cognitive-biological.
stimulus change, followed by the process to 'read' the neural signal and to create
new perceptual categories: Näätänen 2001; Tremblay et al. 1998), Díaz et al. (2008)
interpreted this to mean that, while both groups are equally able to represent the
phonetic auditory sensory information and to integrate this information into mem-
ory representations (i.e. processing at Stage 2), they may differ in the strength and
sensitivity of Stage 1 processing such that the activation of the neural code necessary
for the processing at the temporal areas might be hampered.
Individual variability may also come from differences in the regulation of neuro-
chemistry across individuals. Motivated by the association of striatal function and
phonological processing, as evidenced in the linguistic performance of patients with
Parkinson's Disease (Abdullaev and Melnichuk 1997), Tettamanti et al. (2005) mea-
sured modulations of the dopaminergic system using [11C]raclopride and positron
emission tomography while (Italian-speaking) participants judged the acceptability
of pseudowords that were made to either conform to or violate the phonotactics of
Italian. Crucially, participants in Tettamanti et al.'s (2005) study were drawn from
a healthy non-pathological population (eight healthy right-handed male university
students, ranging from 22 to 29 years old). Nonetheless, they found significant corre-
lations between performance in the pseudoword judgement task and dopaminergic
input to the left dorsal basal ganglia. In particular, better individual performances
correlate with less dopamine release in the left dorsal caudate nucleus while faster
response time correlates negatively with dopamine release in the left dorsal putamen.
toward a known word is shown to relate to the 'Attention Switching' and 'Imagination'
components of the AQ in particular. These findings suggest that individuals with cer-
tain 'autistic traits' are less likely to be affected by lexical knowledge in their phonetic
perception, possibly due to their heightened sensitivity to actual acoustic differences.
The authors ruled out higher auditory sensitivity, retardation of lexical access, and
verbal intelligence as potential alternative explanations for the observed correlation.
They found no correlation of AQ with the performance in a VOT discrimination task,
accuracy and speed in a lexical decision task, or individual verbal IQ. Similar findings
have been reported for native speakers of Mandarin Chinese from Taiwan (Huang 2007).
To further examine the extent of the association between 'autistic traits' and vari-
ability in human speech perception abilities, Yu (2010) investigated the association
between 'autistic traits' and the perceptual compensation for vocalic context and talker
voice. Previous studies show that listeners generally perceive more instances of [s]
than [ʃ] in the context of [u] than in the context of [a] (Mann and Repp 1980; Mitterer
2006), presumably because listeners take into account the lowered noise frequencies
of /s/ in a rounded vowel context. Similarly, when listeners encounter ambiguous
sibilants, they more often report hearing /s/ when the talker is male than when the
talker is female (Strand 1999), possibly due to the lower peak frequency of /s/ (i.e.
more /ʃ/-like) when produced by male talkers than by female talkers.
In Yu's (2010) study, sixty subjects (32 females; ages ranging from 18 to 47, with a
mean of 22 (SD = 4.7)) performed a 2-Alternative Forced-Choice task by listening
to a series of CV syllables (C = a synthesized 7-step /s/-/ʃ/ continuum; V = /a/ or /u/
in either a female or a male voice) and deciding whether the fricative was /s/ or /ʃ/.
After the identification task, participants took the Autism-Spectrum Quotient (AQ;
Baron-Cohen et al. 2001b), Empathy Quotient (EQ; Baron-Cohen and Wheelwright
2004) and Systemizing Quotient (SQ; Baron-Cohen et al. 2003). All three quotients
are short, self-administered scales for identifying the degree to which any individual
adult of normal IQ may have traits associated with Autism-Spectrum Condition. Only
the effects of AQ were reported in Yu 2010, given that that article was focused on
establishing, for the first time, a significant association between 'autistic traits' and
perceptual compensation in speech.
Yu (2010) found that the magnitude of the compensation (i.e. context-dependent
identification shifts akin to that of the 'Ganong effect') is modulated by the listener's
sex as well as by the level of 'autistic traits' s/he exhibits. In particular, individuals with
low AQ, particularly women with low AQ, show the least amount of identification
shift, but this effect of overall AQ score on identification shift is only evidenced in
the perceptual compensation for vocalic coarticulation, not in the case of talker voice
compensation. That is, individuals' overall AQ scores mediate the processing of lin-
guistic information (i.e. vocalic context), but do not seem to influence the processing
of socio-indexical information such as the (perceived) sex of the talker. The author
did observe that the magnitude of talker voice compensation is modulated by the
208 Alan C. L. Yu
A positive D score indicates a brain type of Type S (i.e. D scores between the 65th and 97.5th
percentile), or Extreme Type S (ES; the top 2.5 per cent), while a negative score
indicates a brain type of Type E (scores between the 2.5th and 35th percentiles) or
Extreme Type E (EE; the lowest scoring 2.5 per cent). Scores close to zero indicate
a balanced brain type (i.e. Type B; D scores between the 35th and 65th percentile).
Females are said to have a stronger drive to empathize than to systemize (E > S,
also referred to as Type E), while males have a stronger drive to systemize than to
empathize (S > E, or Type S). According to this typology, individuals with Autism-
Spectrum Condition (ASC) have an extreme male brain cognitive profile (S ≫ E, or
Extreme Type S: Baron-Cohen 2002). Of particular interest here are findings
suggesting that individual differences in empathizing and systemizing abilities also closely
associate with differences in personality traits. Nettle (2007), for example, found that
EQ correlates significantly with agreeableness as well as with extraversion. SQ is found
to correlate moderately with openness. Such differences in personality traits may have
consequences for how an individual might interact with other members of his/her
social network. EQ, for example, has been shown to be a significant predictor of social
network characteristics (Nettle 2007). Individuals with higher EQ are associated with
a larger sympathy group (i.e. close friends) and a larger support clique (i.e. individuals
to whom one turns in a time of major personal problems), as measured by a self-
reported amount of social contacts and social support.
The connection between personality traits and empathy, systemizing drive, and
brain type is further strengthened in light of the results of a recent survey study
conducted with 116 respondents (70 females, age range = 18-36) at the Univer-
sity of Chicago. As shown in Figure 10.1, the EQ scores of the respondents were
found to significantly correlate with four personality traits, in order of decreas-
ing magnitude of correlation: Agreeableness (r = 0.606, p < 0.0001), Conscien-
tiousness (r = 0.324, p < 0.001), Extraversion (r = 0.248, p < 0.01), and Openness
(r = 0.198, p < 0.05). EQ is also weakly correlated with respondents' sympathy
group (r = 0.185, p = 0.053) and support clique (r = 0.208, p < 0.05). Unlike what is
observed in Nettle's findings, SQ only significantly correlates with Conscientiousness
(r = 0.238, p < 0.05). Of particular interest are the significant correlations between
Brain Type and personality traits. D scores correlate significantly negatively with
Agreeableness (r = −0.484, p < 0.0001) and Extraversion (r = −0.252, p < 0.01),
suggesting that individuals who are Type E (i.e. low D score) are more likely to be
more agreeable and extraverted, while Type S (high D score) individuals are likely to
be less agreeable and more introverted. Individuals with a balanced brain type, which
comprises the bulk of the respondents, tend to exhibit more neutral personality traits,
at least with respect to agreeableness and extraversion.
[Footnote: ... the difference between the score and the population mean is divided by the maximum
possible score of the quotient (80 for the EQ and 150 for the SQ). The original EQ and SQ axes were
then rotated by 45°, essentially factor-analyzing S and E, and normalized by a factor of 1/2 to produce
the new measure, D (= 1/2((SQ − ⟨SQ⟩)/150 − (EQ − ⟨EQ⟩)/80)).]
10. Individual differences in socio-cognitive processing and sound change 211
FIGURE 10.1 Significant correlations between individual-difference dimensions (EQ, SQ, and
D) and personality traits. A = Agreeableness, N = Neuroticism, C = Conscientiousness,
E = Extraversion, O = Openness, SG = Sympathy Group, SC = Support Clique
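The D measure described in the footnote and the percentile cut-offs for brain types can be made concrete with a short sketch. This is a minimal illustration under stated assumptions (known population means for EQ and SQ; percentile ranks computed within a sample); the function names are mine, not the author's.

```python
import numpy as np

def d_score(eq, sq, eq_mean, sq_mean):
    """D measure from the footnote: half the difference between the
    standardized systemizing and empathizing scores
    (maximum SQ = 150, maximum EQ = 80)."""
    return 0.5 * ((sq - sq_mean) / 150.0 - (eq - eq_mean) / 80.0)

def brain_type(d, d_sample):
    """Assign a brain type from a D score's percentile rank within a
    sample, using the cut-offs in the text (2.5 / 35 / 65 / 97.5)."""
    pct = (np.asarray(d_sample) < d).mean() * 100.0  # percentile rank
    if pct < 2.5:
        return "Extreme Type E"
    if pct < 35.0:
        return "Type E"
    if pct < 65.0:
        return "Type B"  # balanced
    if pct < 97.5:
        return "Type S"
    return "Extreme Type S"
```

An individual whose EQ and SQ both sit at the population means receives D = 0 and, in a roughly symmetric sample, falls in the balanced Type B band.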
Given the associations between individual-difference dimensions such as EQ,
SQ, and brain type, which capture individual differences in cognitive processing
styles, personality traits, and other social characteristics, might they also covary
with differences in perceptual compensation responses across individuals, as in the
case of the AQ? If such an association were established, it would go a long way
to establishing a firm link between individual differences in cognitive processing
This section lays out the results of a linear mixed-effects model testing for the effects,
if any, that EQ, SQ, and brain type might have on sibilant perception. As reviewed in
section 10.2.3, the data comes from Yu 2010, which tested sixty native speakers of
American English (32 females; age range from 18 to 47, with a mean of 22 (SD = 4.7))
on the classification of an /s/-/ʃ/ continuum by identifying each initial sibilant as
either /s/ or /ʃ/. The experiment was implemented in E-Prime. Subjects heard the
test stimuli over headphones in a soundproof booth. Subjects made their selection
by pressing one of two labeled keys on a response box. The session consisted of
three trial blocks. In each block, all 28 tokens (= 2 vowels x 2 talkers x 7 steps)
were presented four times in random order. Each subject categorized 336 tokens
(= 2 vowels x 2 talkers x 7 steps x 3 blocks x 4 times). After the identification task,
participants took the Autism-Spectrum Quotient questionnaire (AQ: Baron-Cohen
et al. (2001b)), the Empathy Quotient (EQ: Baron-Cohen and Wheelwright (2004)),
and the Systemizing Quotient (SQ: Baron-Cohen et al. (2003)). A more detailed
account of the setup of the experiment and the preparation of the stimuli can be found
in the Materials and Methods section in Yu 2010.
Half the items on each questionnaire are worded so that a high scorer will agree with
the item, to avoid response bias. The EQ comprises 40 items and the SQ 75 questions;
two points are given for a 'strongly' response and one point for an appropriate 'slightly'
response. The maximum scores for EQ and SQ are 80 and 150 respectively, while their
minimum is zero.
The distribution of AQ scores was typical of normally developing populations. As
a general comparison, the mean total AQ of individuals with ASC (N = 58) in Baron-
Cohen et al.'s (2001) study was 35.8 (SD = 6.5), while the mean total AQ of the Cam-
bridge University students they surveyed (N = 840) was 17.6 (SD = 6.4). Applying
FIGURE 10.2 Correlations between individual-difference dimensions as measured by the AQ subcomponents, EQ, and SQ for all participants with
regression lines superimposed. The Pearson correlation coefficient, given on top of each subplot, corresponds to the overall correlation irrespective
of sex
Baron-Cohen et al.'s scoring method (they did not calculate the AQ on a Likert-scale
as in the present study), subjects in the present study have a mean total AQ of 18.45
(SD = 8.25). The distributions of EQ and SQ scores are typical of normally developing
populations as well. Wheelwright et al. (2006) reported that the average AQ, EQ, and
SQ of the neurotypicals in their study were 16.3 (SD = 5.9), 44.3 (SD = 12.2), and 55.6 (SD = 19.7)
respectively. Figure 10.2 summarizes the correlation between individual quotients.
SQ correlates significantly only with Attention-to-detail (r = 0.496, p < 0.001) and
marginally so with Imagination (r = 0.226, p = 0.08). EQ correlates significantly negatively with
Attention-Switching (r = −0.391, p < 0.01), Social Skills (r = −0.645, p < 0.001),
Imagination (r = −0.535, p < 0.001), and Communication (r = −0.679, p < 0.001).
SQ and EQ do not correlate significantly (r = 0.169, p = 0.193).
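The pairwise values in Figure 10.2 are ordinary Pearson correlations, with significance assessed via the usual t statistic on n − 2 degrees of freedom. A minimal sketch of the computation (toy data; not the study's own data or code):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_stat(r, n):
    """t statistic for testing H0: rho = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# The reported SQ-EQ value (r = 0.169, n = 60) gives |t| of about 1.3,
# well short of the roughly 2.0 needed for p < 0.05 at df = 58.
```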
Subjects' /ʃ/-responses were modeled using a mixed-effects model with a logit link
function. The model was fitted in R (R Development Core Team 2010), using the
lmer() function from the lme4 package for mixed-effects models. Positive regression
weights indicate a positive correlation between a predictor variable and the
likelihood of a /ʃ/ response. The
current model was selected from a full model containing all individual-difference
predictors and their interactions with vocalic context and the subject's biological
sex by eliminating predictors that do not significantly improve model likelihood. In
addition to EQ, SQ, BRAIN TYPE, and the subject's biological sex, the five AQ subscores
were entered into the model, in lieu of the overall AQ, to determine whether the effects
of EQ, SQ, and BRAIN TYPE, whatever they may be, are independent of the effects of the
AQ components on perceptual compensation. Given that the number of individuals
with extreme brain types (EE and ES) was small in this sample population, only
three brain types were considered (i.e. B, E, and S). Exploratory data analysis further
revealed that only the contrast between balanced (B) and imbalanced brain types
(E and S) was relevant, thus the BRAIN TYPE predictor was recoded as a binary pre-
dictor (balanced vs. imbalanced). With the changes to the model predictors described
above, three AQ subscores (IM, AD, CM) dropped out. The final model contains
ten fixed input variables: TRIAL (1-336), STEP (1-7), SUBJECT.SEX (male vs. female),
VOWEL (/a/ vs. /u/), TALKER (male vs. female), AS (1-50), SS (1-50), EQ (0-80), SQ
(0-150), BRAIN TYPE (balanced vs. imbalanced), as well as a by-subject random slope
for TRIAL.
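To make the direction of such effects concrete, the following sketch shows how sum-coded predictors combine with regression weights under a logit link. The weights are illustrative stand-ins loosely based on the reported main effects of vocalic context and talker voice, not the fitted model itself:

```python
import math

def inv_logit(eta):
    """Inverse logit link: maps a linear predictor to P(/sh/ response)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Illustrative weights only; a positive weight raises the likelihood of
# a /sh/ response (the study's convention for positive regression weights).
beta = {"Intercept": 0.0, "VOWEL": 0.466, "TALKER": 0.644}

def p_sh(vowel, talker):
    """Sum coding as in the study: a = 1, u = -1; female = 1, male = -1."""
    eta = (beta["Intercept"]
           + beta["VOWEL"] * (1 if vowel == "a" else -1)
           + beta["TALKER"] * (1 if talker == "female" else -1))
    return inv_logit(eta)

# /sh/ responses come out most likely before /a/ with a female talker,
# and least likely before /u/ with a male talker.
```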
Categorical variables were sum-coded (i.e. female = 1, male = −1; a = 1, u = −1;
balanced = 1, imbalanced = −1). Following Gelman (2008), EQ, SQ, and the AQ
subscores were centered and standardized by dividing the difference between the
input variable and its mean by two times its standard deviation in order to facilitate the
comparison of the magnitude of effects across categorical and continuous factors.
Each unit of difference in a standardized quotient score corresponds to a difference
of two standard deviations. Overall collinearity of predictors was low. The average
partial correlation of fixed effects was 0.014 and the highest variance inflation factor
was 2.479.
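The Gelman-style rescaling just described can be sketched in a few lines (an illustration only; the variable handling is mine):

```python
import numpy as np

def standardize_2sd(x):
    """Center a predictor and divide by two standard deviations
    (Gelman 2008). A one-unit change in the result equals a two-SD
    change in the original, putting continuous coefficients on the same
    footing as sum-coded (+1/-1) binary predictors."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (2.0 * x.std())
```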
Cognitive factors            β̂        SE(β̂)
AS                           0.040     0.461
SS                           0.154     0.596
SQ                           0.464     0.404
EQ                           0.350     0.528
BRAIN TYPE                   0.189     0.213
AS × SUBJECT.SEX             0.043     0.385
VOWEL × AS                   0.206     0.052    ***
VOWEL × SS                   0.251     0.067    ***
VOWEL × SQ                   0.217     0.044    ***
VOWEL × EQ                   0.512     0.059    ***
VOWEL × BRAIN TYPE           0.096     0.023    ***
VOWEL × AS × SUBJECT.SEX     0.162     0.044    ***
Table 10.2 summarizes the parameter estimate for each of the fixed effects
in the model, as well as the estimate of its standard error SE(β̂), and the signifi-
cance level. Consistent with previous studies on the perceptual compensation for
vocalic coarticulation (Mann and Repp 1980; Mitterer 2006) and the sex of the
talker (Strand 1999), the model shows the expected main effects of vocalic context
and talker voice on sibilant perception. There is approximately a 20 per cent drop
in /ʃ/ response when the following vowel is /u/ (β̂ = 0.466, SE = 0.023, p < 0.0001:
Figure 10.3a), rather than /a/, while the drop in /ʃ/ response is about 30 per cent
when the talker is male rather than female (β̂ = 0.644, SE = 0.022, p < 0.0001: see
Figure 10.3b). There is an interaction effect of vocalic context and talker voice
(β̂ = 0.152, SE = 0.021, p < 0.0001); /ʃ/-response is least likely when the talker is
male and the following vowel is /u/ (see Figure 10.3c). There are also significant
FIGURE 10.3 Effects of (a) vocalic context, (b) talker, and (c) their interaction, on sibilant
identification
effects of the continuum step on vocalic context and talker voice gender compen-
sation. Beyond these canonical effects, individuals with low AS subscore (i.e. bet-
ter attention-switching skills: Figure 10.4a) or low SS subscore (better social skills:
Figure 10.4b) are less influenced by the effects of vocalic context in sibilant identi-
fication. The model likelihood is improved significantly in a model with a VOWEL
× ATTENTION-SWITCHING interaction (χ²(2) = 16.161, p < 0.001) or with a VOWEL
× SOCIAL SKILLS interaction (χ²(2) = 21.473, p < 0.001) relative to a model with-
out these interactions. The interaction between VOWEL and ATTENTION-SWITCHING
was mediated by SUBJECT.SEX. Unlike Yu (2010), the three-way interaction between
VOWEL, SOCIAL SKILLS, and SUBJECT.SEX did not improve data likelihood significantly
(χ²(4) = 8.325, p = 0.080).
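The χ² comparisons above are likelihood-ratio tests: twice the log-likelihood difference between nested models is referred to a chi-squared distribution with degrees of freedom equal to the number of parameters added. For even degrees of freedom the chi-squared upper tail has a closed form, which suffices to check the reported p-values (a sketch, not the original analysis code):

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-squared upper-tail probability for even df:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!"""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    h = x / 2.0
    return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
```

For example, the reported χ²(2) = 16.161 corresponds to p ≈ 0.0003, and χ²(4) = 8.325 to p ≈ 0.080, matching the values in the text.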
Drive to empathize and to systemize. To test the effects of EQ and SQ on the
perceptual compensation for vocalic coarticulation, the significance of data likeli-
hood improvement of models with and without two-way interactions between these
cognitive traits and vocalic contexts was examined. The interaction between EQ and
VOWEL significantly improved the model's likelihood (χ²(2) = 74.472, p < 0.001:
Figure 10.4c); individuals with lower EQ (i.e. poor empathizers) are less affected
by the vocalic context in sibilant classification (β̂ = 0.512, SE = 0.059, p < 0.001).
The interaction between SQ and VOWEL significantly improves data likelihood as
well (χ²(2) = 24.128, p < 0.0001: Figure 10.4d); this interaction indicates that the
lower SQ an individual scores (i.e. the less driven an individual is to systemize),
the less affected the person is by the vocalic context during sibilant perception
(β̂ = 0.217, SE = 0.044, p < 0.001).
Recall in Figure 10.2 that EQ correlates significantly negatively with both the
Attention-Switching (AS) and Social Skills (SS) subcomponents of the AQ score.
This suggests that poor empathizers (individuals with low EQ) tend to be highly
focused (high AS score) and have poor social skills (high SS score). Yet, the results
of our statistical analysis thus far suggest that individuals who are less influenced by
vocalic contexts in sibilant perception (the minimal compensator) tend to be poor
empathizers with good social skills (low SS), and are also easily distractible (i.e. low
AS score). These cognitive traits thus appear to be in conflict with each other. That is,
a minimal compensator is not likely to be simultaneously a poor empathizer with low
social skills and distracted attention (and vice versa). This conflict is resolved once
BRAIN TYPE is taken into account.
The interaction between BRAIN TYPE and VOWEL significantly improved the
model's likelihood (χ²(2) = 20.983, p < 0.001: Figure 10.4e), suggesting that individuals
with imbalanced empathy and systemizing traits (i.e. Types E, EE, S, and ES) are
less affected by the vocalic context in sibilant classification than those with a more
balanced brain type. This finding helps to explain the puzzle above, since it suggests
that not all strong empathizers compensate for vocalic coarticulation equally robustly.
Strong empathizers with a weak systemizing drive are less likely to engage in percep-
tual compensation for vocalic context, as are poor empathizers with a strong system-
izing drive. On the other hand, individuals with a balanced drive toward empathy
and systemizing (i.e. strong empathizers with a strong drive to systemize or poor
empathizers with a weak drive to systemize) are more likely to compensate for vocalic
coarticulation.
FIGURE 10.4 Perceptual compensation for vocalic coarticulation as mediated by (a) attention-
switching skills, (b) social skills, (c) empathy, (d) systemizing drive, and (e) brain type (balanced
(B) vs. unbalanced (U))
change, since many experimental investigations have shown that listeners are on aver-
age quite effective in compensating for the effects of coarticulation (Mann and Repp
1980; Mitterer and Blomert 2003; Mitterer 2006; Beddor and Krakow 1999; Beddor
et al. 2002; Viswanathana et al. 2010). This has led to the hypothesis that only listeners
with minimal knowledge of the language, such as children and second language
learners, are likely to repeatedly commit such perceptual errors (Ohala 1993b; see
also Kiparsky 1995).
The H & H theory of phonetic variation (Lindblom 1990; Lindblom et al. 1995), on
the other hand, advocates a more speaker-oriented approach to sound change. The
H & H theory proposes that speakers adaptively tune their performance along the
H(yper)-H(ypo) continuum according to their estimates of the listener's needs in that
particular situation. These needs include preferences to maximize the distinctiveness
of contrasts and to minimize articulatory effort. Speakers hyper-articulate when lis-
teners require maximum acoustic information; they reduce articulatory efforts, hence
hypo-articulate, when listeners can supplement the acoustic input with information
from other sources. From this perspective, sound change occurs when intelligibility
demands are redundantly met or when the listeners focus their attention on the
'how' (signal-dependent) mode rather than the 'what' (signal-independent) mode
of listening (Lindblom et al. 1995). New phonetic variants accumulate during the
'how' mode of listening. When these newly accumulated variants are selected by the
listener-turned-speaker, sound change obtains. However, little is known about the
circumstances under which individuals would focus their attention on the signal-
dependent 'how' mode of listening and away from the signal-independent 'what'
mode.
The discovery of individuals with different 'autistic traits' exhibiting variable
degrees of lexical influence in speech perception and perceptual compensation for
coarticulation provides a promising solution to the seemingly opposing views of the
H & H and the listener-misperception approaches to sound change. Recall that indi-
viduals who exhibit minimal compensation for coarticulation (i.e. low AQ individ-
uals) also exhibit strong lexical effects in speech perception (Stewart and Ota 2008),
while those who compensate for coarticulations strongly (high AQ individuals) tend
to exhibit weak lexical influence. This trade-off between the influence from low-level
phonetic variation and higher order lexical information is in concert with cognitive
theories of autism that argue that autistic individuals have superior abilities with
respect to the processing of low-level perceptual information but exhibit difficulties
with the integration of higher-order information (Bonnel et al. 2003, Happé and Frith
2006, Mottron et al. 2006). In light of these findings, from the perspective of the H &
H model, high AQ individuals can be seen as individuals whose cognitive processing
style favors attending to lower order information (i.e. the 'how' mode of listening),
while low AQ individuals tend to focus more on higher order information, such as
lexical information, and place less emphasis on the low-level detail of the incoming
signal (i.e. the so-called 'what' mode of listening).
[Footnote: ... is due to the rounding of the high front vowel, rather than frontness alone. Note, however,
that this change is only triggered by high vowels, since non-high rounded vowels do not trigger this
palatalization (e.g. [so:] 'comb'). The high back rounded vowel /u/ is not permitted after coronals in
Cantonese.]
From this point of view, individuals
who favor attending to the 'what' mode of listening should be the ones who register
more new variants in their phonetic memory 'pool', contrary to Lindblom et al.'s
assumption, since the 'what' mode listeners (i.e. low AQ individuals) exhibit lesser
perceptual compensation for coarticulation. That is, when a speaker produces /su/,
perhaps intending to call out for her dog, but the utterance ends up sounding more
like [ʃu], a high AQ individual (the 'how' mode listener) would compensate for the
vocalic coarticulation and categorize the [ʃ] as another instance of /s/, as intended by
the speaker. On the other hand, a low AQ individual (the 'what' mode listener) might
be inclined to accept the percept [ʃu] at face value and treat /ʃu/ as an acceptable
phonological variant for the name of this dog. Under this scenario, two individuals,
one with high AQ and the other with low AQ, upon hearing the same utterance, might
arrive at very different conclusions as to the name of the dog being called. For the low
AQ individual, who starts calling the dog /ʃu/ regularly, this might be seen as a mini-
sound change.
Recall that the individual-difference dimensions considered in this study are also
significant indicators of personality traits and other social characteristics. For exam-
ple, AQ is correlated positively with neuroticism and conscientiousness and negatively
with extraversion and agreeableness (Austin 2005; Wakabayashi et al. 2006; see also
discussion regarding Figure 10.1). Jobe and White (2007) found that, with a sample
of non-clinical undergraduate students from a large, urban university (N = 97; mean
age = 19.4 ± 2 years), overall AQ significantly negatively correlates with length of
best friendship (r = −0.23, p = 0.02) and total AQ score is also a valid predictor
in a linear regression of loneliness (β = 0.48, p < 0.001), as measured by the UCLA
loneliness scale (version 3: Russell 1996). Given that Yu (2010) found that individu-
als with low AQ are more likely to compensate less for coarticulatory influences in
speech, it suggests that such minimal compensators tend to be less neurotic and less
conscientious but more extraverted and agreeable. They also tend to have longer
best friendships and weaker feelings of loneliness.
Similar inferences might be made with respect to other individual-difference
dimensions. In the correlation study with 116 respondents discussed above
(Figure 10.1), the Attention-Switching (AS) and Social Skills (SS) subcomponents of
the AQ correlate significantly with various personality and social traits (Figure 10.5).
The AS subscore, for example, significantly correlates positively with neuroticism but
negatively with extraversion, suggesting that individuals who are easily distracted (a
trait of minimal compensators) are less neurotic and more extraverted. AS scores
also correlate marginally significantly with agreeableness (r = 0.202,p = 0.053).
The SS subscores significantly correlate negatively with agreeableness, conscientious-
ness, extraversion, openness, the size of sympathy group and the size of support clique.
SS subscores also positively correlate with neuroticism. Taken together, individuals
with low SS subscores (another trait of minimal compensators) tend to be more
agreeable, less neurotic, more conscientious, more extraverted and more open to new
ideas. Crucially, such individuals also have more social contacts (as measured by the
size of the sympathy group) and more close friends (as measured by the size of the
support clique).
Likewise, EQ correlates positively with agreeableness, conscientiousness, extraver-
sion, and openness; SQ correlates positively with conscientiousness and openness but
negatively with neuroticism (Nettle 2007). Recall also that individuals with higher
EQ are also associated with a larger sympathy group and a larger support clique (see
discussion with respect to Figure 10.1; see also Nettle 2007).
Finally, minimal compensators generally have imbalanced brain types, that is, of
Type E/EE and Type S/ES. Type E and EE individuals, who have a stronger drive
to empathize, are likely to be highly agreeable, extraverted, and neurotic, but may
also be less conscientious and open; Type S and ES individuals, who are superb
systemizers, are not likely to be neurotic and are likely to be conscientious and
open, even though they might be quite introverted. To the extent that personality
FIGURE 10.5 Significant correlations between the Attention-Switching (AS) and Social Skills
(SS) subcomponents of the AQ and personality traits. Only significant correlations (p < 0.05)
are shown here. A = Agreeableness, N = Neuroticism, C = Conscientiousness, E = Extraver-
sion, O = Openness, SG = Sympathy Group, SC = Support Clique
traits have consequences for how individuals interact in the social world, it seems
at least plausible that individuals with imbalanced brain types might have different
social network profiles than individuals with balanced brain types. In particular,
I would conjecture that minimal compensators who are superior empathizers might
be at an advantage in exerting their speech patterns on others within their social
network(s).
That women have been argued to be better empathizers than men (Baron-Cohen
2003) is, for example, consistent with the general characteristics of leaders in linguistic
change. The fact that good empathizers tend to have a larger sympathy group and
support clique is also consistent with the observation that leaders in change often
have more contacts and have access to a wider network. What is not clear is to what
extent highly systemizing individuals (i.e. Type S or ES individuals) also contribute to
the propagation of sound change. Might the fact that Type S or ES individuals tend to
be more introverted and less agreeable (on account of their low EQ) lead them to have
fewer close friends and fewer social contacts with others? If so, the speech patterns
of Type S or ES minimal compensators are not likely to influence the speech patterns
of the rest of the speech community. On the other hand, Type S/ES individuals are
also likely to be more conscientious and open. Labov (1973) suggests that the 'lames'
(i.e. individuals who are social outcasts or isolates during their formative years) tend
to carry fewer local features in their speech and are least capable of evaluating the
complexity of the in-group features on account of their exposure to more features of
other dialects and varieties. Could these characteristics (i.e. using fewer local features
and diminished capabilities in evaluating the complexity of the in-group features) be
a reflection of their Type S/ES brain type? Perhaps paradoxically, Labov concludes
that, to the extent that they are the kinds of 'lames' who eventually manage to break
out of their own niche and succeed in life, they might still manage to propagate their
speech patterns by virtue of having a wider network of contacts (cf. Milroy and Milroy
1985). It should also be noted that the innovators ultimately do not need to be socially
central themselves. Provided that they play the right role in a social network and exert
an effect on the influential individual(s) in that network, their innovations might still
spread.
10.6 Conclusion
In this work, I have offered support for the idea that, in addition to differences in
individual experiences, a major source of variability in speech comes from inherent
differences in the individual's cognitive makeup (as measured by individual-difference
dimensions such as AQ, EQ, and SQ). Crucially, variation in cognitive processing style
can be shown to covary with differences in listener's response pattern during speech
perception, particularly in the case of perceptual compensation for coarticulation.
To the extent that such differences in perceptual response may ultimately lead to
11.1 Introduction
PHONOLOGIZATION, the process by which intrinsic phonetic variation gives rise to
extrinsic phonological encoding, is often invoked to explain the acquisition and
transmission of sound patterns (Jakobson 1931; Hyman 1976; Ohala 1981; Blevins
2004). A familiar example is the idea that lexical tone contrasts can trace their origins
to the pitch perturbations conditioned by differences in obstruent voicing (Matisoff
1973; Hombert et al. 1979). A phonologization account of tonogenesis is sketched in
Table 11.1. First, intrinsic differences in vowel f0 (Stage I) become a perceptual cue to
the identity of the initial consonant (Stage II). If other cues to the contrast between
initial consonants are lost, the contrast may be maintained solely by differences in
f0 (Stage III), setting the stage for a reanalysis of pitch as a contrastive phonological
feature.
* Portions of this work have appeared previously in Kirby (2010). I would like to thank Bob Ladd,
Bob McMurray, Morgan Sonderegger, Alan Yu, and Yuan Zhao for helpful comments and suggestions on
previous versions of this chapter.
11. Probabilistic enhancement in phonologization 229
TABLE 11.2 The three-way stop contrast in Seoul Korean
fortis       뿔    ppul    [pul]     'horn'
lenis        불    pul     [pʰul]    'fire'
aspirated    풀    pʰul    [pʰul]    'grass'
This process can be observed in vivo in Seoul Korean, a language which maintains
a three-way phonological contrast between initial stops (Table 11.2). While studies of
Korean stop acoustics conducted during the 1960s and 1970s found this contrast to
be signaled primarily by differences in voice onset time (VOT: Lisker and Abramson
1964, Kim 1965, Han and Weizman 1970), subsequent studies have reported that lenis
and aspirated stops are no longer distinguished solely by VOT in either production
or perception, but rather that f0 has come to play a more central role (Kim et al.
2002; Silva 2006a, b; Wright 2007; Kang and Guion 2008). One way to describe
this change is as the phonologization of previously intrinsic, mechanical phonetic
variation, conditioned here by initial obstruent voicing.
While the phonologization model provides a useful descriptive framework for this
type of sound change, it also raises several new questions. First, while it is known that
multiple acoustic-phonetic cues are available to signal any given phonological contrast
(Lisker 1986), there has been relatively little discussion of how and why certain cues
are targeted for phonologization. In Seoul Korean, for instance, it has been established
that, in addition to VOT and f0, spectral tilt and the amplitude of the release burst
are relevant perceptual cues to the initial onset contrast (Cho et al. 2002; Kim et al.
2002; Wright 2007). So why was f0, and not some other
cue, phonologized in this case?
A related issue is Hyman's (1976) observation that the phonologization of one
cue often entails dephonologization of another, a process sometimes referred to
as TRANSPHONOLOGIZATION (Hagège and Haudricourt 1978). In the case of Seoul
Korean, as f0 has become an increasingly important acoustic correlate of the contrast
between lenis and aspirated stops, VOT has become correspondingly less informa-
tive. Given that contrasts are almost always redundantly cued, this shift is somewhat
unexpected. What might cause an increase in the informativeness of one cue to be
accompanied by a decrease in the informativeness of another?
This chapter proposes to answer these questions by arguing that phonologiza-
tion is an emergent consequence of adaptive enhancement in speech (Lindblom
1990; Diehl 2008). In particular, it is proposed that as contrast precision is reduced,
cues are enhanced to compensate. The degree of enhancement is argued to be a
230 James Kirby
probabilistic function of contrast precision, while the probability with which a given
cue is enhanced is related directly to its informativeness, the degree to which it con-
tributes to accurate identification of a speech sound (what Hume and Mailhot, this
volume, refer to as CUE QUALITY). To explore this hypothesis, phonetic categories are
modeled as finite mixtures (Nearey and Hogan 1986; Toscano and McMurray 2010),
and a case study, the phonologization of f0 in Seoul Korean, is explored in detail
through the use of agent-based computational simulations. The results suggest that
both probabilistic enhancement and loss of contrast precision interact to drive the
process of phonologization.
The remainder of this chapter is structured as follows. Section 11.2 reviews the
roles of the speaker and listener in sound change and motivates an adaptive notion
of enhancement. Section 11.3 discusses the mixture model of phonetic categories, and
section 11.4 describes the algorithm used to simulate speaker-hearer interaction.
These are used to explore the phonologization of f0 in Seoul Korean in section 11.5.
The results and implications are discussed in section 11.6, and section 11.7 provides
a general conclusion.
respond to loss of precision more generally. Researchers such as Ohala (1981 et seq.)
often assume, tacitly or otherwise, that speakers produce phonetic targets more or less
as they are intended (modulo contextual effects such as coarticulation). The response
to a loss of precision may then be a reanalysis on the part of the listener. For example,
on this view, phonologization of a cue such as f0 might come about due to listeners'
failure to compensate for the intrinsic perturbation effects of an initial consonant
on the pitch contour of the following vocalic segment. After these effects have been
phonologized, the initial conditioning environment (here, obstruent voicing), now a
redundant cue to the contrast, is free to dephonologize. However, it is not clear what
motivates this dephonologization, given that phonetic distinctions are rarely signalled
by a single cue. It is also not immediately clear why listeners would fail to compensate
for intrinsic variation along one dimension but not another.
A different account is suggested by more broadly functional approaches to sound
change, which hypothesize a more active role for the speaker (Liljencrants and Lind-
blom 1972; Kingston and Diehl 1994; Boersma 1998). A common theme in these
treatments is the idea that the acoustic realization of a phonetic target may be mod-
ulated both by TALKER-ORIENTED constraints enforcing efficiency in speech commu-
nication ('be efficient') as well as LISTENER-ORIENTED constraints requiring speech
sounds to be sufficiently distinctive ('be understood'). Talker-oriented constraints
are often implemented by penalizing gestures in terms of the energy or precision
required for their realization. Listener-oriented constraints are usually implemented
in such a way as to maximize distinctiveness between contrasts, although this takes
on a variety of forms: combining articulatory gestures which have mutually reinforc-
ing acoustic consequences (Kingston and Diehl 1994), adding redundant features
or secondary gestures to reinforce contrast perception (Stevens 1989; Keyser and
Stevens 2006), encoding a preference for accuracy in the approximation of phonetic
targets (Lindblom 1990; Johnson et al. 1993a; Boersma 1998), or imposing systemic
constraints to maximize the distance between contrasts (Liljencrants and Lindblom
1972; Flemming 2002).
A common thread in all of these treatments is the notion of enhancement of
phonetic targets. In this chapter, the term ENHANCEMENT will be used specifically to
refer to those actions taken on the part of the speaker which increase the precision of
a phonetic contrast. For example, a talker might enhance the contrast between two
initial obstruent categories by producing them with hyperarticulated VOT values,
or by reducing the variability in their productions of those values. These notions of
enhancement and precision will be more rigorously formalized in sections 11.3 and
11.4 below.
Functional approaches predict enhancement to be more likely in situations where it
would improve intelligibility for the listener. This suggests at least a partial explanation
for why any particular phonetic property might be phonologized: all else being equal,
cues which more reliably signal a difference between categories are more likely to be
enhanced. However, it is still not clear why phonologization should be accompanied
(1)  p(x | θ) = Σ_{k=1}^{K} π_k N_k(x; μ_k, Σ_k)

where the structure θ = ((π_1, μ_1, Σ_1), …, (π_K, μ_K, Σ_K)) contains the component
weights π_k, mean vectors μ_k, and covariance matrices Σ_k of the D-variate compo-
nent Gaussian densities N_1, …, N_K. Figure 11.1a shows how these three parameters
describe a given mixture component.
To make this more concrete, think of x as a bundle of cue values representing an
instance of phonetic category c; of D as representing the number of cue dimensions
(m_1, m_2, …, m_D) relevant to the perception of that category; and of K as representing
the total number of category labels (c_1, c_2, …, c_K) competing over the region of
FIGURE 11.1 (a) Parameters of a Gaussian distribution for a single component (adapted from
McMurray et al. 2009). (b) Two class-conditional Gaussians (dotted grey lines) and their mix-
ture (solid black line)
phonetic space defined by D. For example, for a language like Korean with three
initial stops (K = 3) cued along five dimensions (D = 5), we might have c_1 = /p/, c_2 =
/pp/, c_3 = /pʰ/ and m_1 = VOT, m_2 = burst amplitude, m_3 = f0, m_4 = spectral tilt, and
m_5 = following vowel length. A given observation x will thus consist of five elements,
each one providing a value for one of these cues.
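In code, this setup might be sketched as follows; the numeric means are invented for illustration, and the cues are sampled independently (i.e. a diagonal covariance), a simplification of the full D-variate Gaussian components:

```python
import random

random.seed(0)

# Five cue dimensions (D = 5) and three category labels (K = 3);
# all numeric values are invented placeholders, not empirical data.
CUES = ["VOT", "burst_amp", "f0", "spectral_tilt", "vowel_len"]
CATEGORY_MEANS = {
    "/p/":  {"VOT": 60, "burst_amp": 18, "f0": 110, "spectral_tilt": 4, "vowel_len": 220},
    "/pp/": {"VOT": 10, "burst_amp": 20, "f0": 120, "spectral_tilt": 1, "vowel_len": 200},
    "/ph/": {"VOT": 90, "burst_amp": 22, "f0": 135, "spectral_tilt": 6, "vowel_len": 190},
}

def sample_token(category, sd=5.0):
    """An observation x: one sampled value on each of the D cue dimensions."""
    return {m: random.gauss(CATEGORY_MEANS[category][m], sd) for m in CUES}

x = sample_token("/ph/")
assert set(x) == set(CUES)  # five elements, one value per cue
```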
Figure 11.1b illustrates a GMM where K = 2 and D = 1. The individual component
densities are shown in gray, while the mixture density is outlined in black. Although
more difficult to visualize, the mixture modeling approach extends straightforwardly
to the multivariate case where D > 1.
In the GMMs for phonetic categories used in this chapter, experience forms the
basis for both production and perception. The speaker's task is to produce an instance
of a phonetic category; this may be modeled by sampling cue values from the relevant
class-conditional mixture component N_k. The listener's task is to assign this utterance
a category label c. If we assume that listeners weight information in the speech signal
by its quality (informativeness), we can construct a model of their behavior that would
optimize this task. Such models are sometimes referred to as IDEAL OBSERVER models
(Geisler 2003; Clayards 2008). The following section provides a brief overview; for a
more in-depth treatment, see Clayards (2008) or Kirby (2010).
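As a minimal sketch of this ideal-observer setup, consider a single cue (say, VOT) and two categories; the means, standard deviations, and priors below are illustrative placeholders, not the empirical Korean values:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical one-cue, two-category setup (values are made up).
params = {"lenis": (60.0, 15.0), "aspirated": (90.0, 15.0)}  # (mean, sd) of VOT in ms
priors = {"lenis": 0.5, "aspirated": 0.5}

def posterior(x):
    """Ideal-observer posterior P(c | x), computed with Bayes' rule."""
    joint = {c: priors[c] * gauss_pdf(x, *params[c]) for c in params}
    z = sum(joint.values())  # normalizing constant
    return {c: joint[c] / z for c in joint}

post = posterior(60.0)
# A token at the lenis mean is most probably lenis, and the posterior sums to 1.
assert post["lenis"] > post["aspirated"]
assert abs(sum(post.values()) - 1.0) < 1e-9
```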
evidence that cue m takes on value x can then be evaluated using Bayes' rule, as
shown in (2):

(2)  P(c | m = x) = P(m = x | c) P(c) / Σ_{k=1}^{K} P(m = x | c_k) P(c_k)
(3)
FIGURE 11.2 (a) Probability distributions of a cue dimension m for two categories c_1 (dark
lines) and c_2 (light lines). Solid lines show a mixture where there is little overlap between the
components, dashed lines a mixture with more overlap. (b) Optimal categorization functions
given the distributions in (a). (Adapted from Clayards et al. 2008)
(4)
The informativeness ω_m for an individual cue can then be expressed as
(5)
11.3.3 Categorization and contrast precision
Equation 3 allows the listener to compute the probability of category membership, but
it does not determine how such information should be used to assign a category label.
The approach taken here is to assign utterances a category label with probability pro-
portional to their relative strength of group membership (Nearey and Hogan 1986).
For example, an utterance which has probability 0.9 of belonging to category c_1 and
probability 0.1 of belonging to category c_2 will be assigned label c_1 90 per cent of the
time, and label c_2 10 per cent of the time. However, the statistically optimal classifier,
the model which maximizes classification accuracy, assigns the category label with
the highest maximum a posteriori probability. To continue with the previous example,
an utterance which has probability 0.9 of belonging to category c_1 and probability 0.1
of belonging to category c_2 will always be assigned label c_1 by the optimal classifier.
Although optimal classifiers make strong assumptions and their predictions are not
always in line with human classification behavior (Ashby and Maddox 1993), they
provide a lower bound on the error rate that can be obtained for a given classification
problem. In this work, contrast precision ε is defined as the current error rate of the
optimal classifier for that contrast, i.e.:
(6)  ε = P(ĉ(x) ≠ c),  where ĉ(x) = argmax_k P(c_k | x)
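The two categorization strategies just contrasted can be sketched directly; the posterior values below are the illustrative 0.9/0.1 example from the text:

```python
import random

random.seed(1)

def prob_match(posterior):
    """Probabilistic categorization (Nearey and Hogan 1986-style): assign a
    label with probability proportional to its posterior probability."""
    r, acc = random.random(), 0.0
    for label, p in posterior.items():
        acc += p
        if r < acc:
            return label
    return label  # guard against floating-point shortfall

def map_classify(posterior):
    """Statistically optimal (MAP) classifier: always pick the argmax."""
    return max(posterior, key=posterior.get)

post = {"c1": 0.9, "c2": 0.1}
labels = [prob_match(post) for _ in range(10000)]
share_c1 = labels.count("c1") / len(labels)

# Probability matching assigns c1 roughly 90% of the time; MAP always does.
assert 0.88 < share_c1 < 0.92
assert map_classify(post) == "c1"
```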
11.4.1 Architecture
Simulations are run for a fixed number of iterations. Each agent is characterized by
a lexicon, a set of exemplar lists L_1, …, L_K corresponding to their experience with
phonetic categories c_1, …, c_K. Before the simulation begins, these lists are populated
by sampling from the conditional densities of a GMM representation of each category.
For simplicity, here we consider agents with lexica containing just two categories.
Subsequently each iteration consists of a single interaction between two agents,
one acting as speaker and the other as listener (the framework can also be extended
to accommodate more than two agents). Each iteration contains four steps: produc-
tion, enhancement, bias, and categorization. All agents use the same production and
categorization strategies described in section 11.3. However, the strength of bias and
the degree of enhancement can be altered by manipulating two tuning parameters:
1. Production. In the production phase, the talker agent selects a target category
c_k based on the mixture weights π_k, and samples a series of values x_1, …, x_D
from the conditional densities N_d(x | k, θ) to form a PRODUCTION TARGET
x = (x_1, …, x_D).
values stay within a well-defined range, each bias term λ_d may be scaled relative
to the distance between category means before being applied, approaching
zero when the means become identical (i.e. when the dimension is no longer
informative in distinguishing the contrast).
4. Categorization. Finally, the modified production target x' is presented to the
listener agent for classification, who assigns it a category label as described
in section 11.3.1. Once labeled, x' is added to the appropriate exemplar list.
Both agents then recompute the memory decay weights for each exemplar in
their lexicon, and delete exemplars whose weights have fallen below the decay
threshold. In the next iteration, the role of speaker is assumed by the listener
agent and vice versa.
In summary, the architecture provides two tuneable parameters (λ and β), corre-
sponding to phonetic bias and functional load, respectively. Varying these parameters
allows us to explore the effects of probabilistic enhancement in different scenarios, and
to see what parameter values best approximate observed data patterns. In the follow-
ing section, the probabilistic enhancement hypothesis is explored in this framework
using empirical data from the phonologization of f0 in Seoul Korean.
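The four-step iteration can be condensed into a one-cue toy loop. All parameter values below are illustrative, bias is implemented as attraction toward the category midpoint, and listener categorization is simplified to using the intended label, so this is a sketch of the loop's shape rather than the chapter's model:

```python
import random

random.seed(0)

# Two categories with 50 exemplars each along a single cue (values invented).
exemplars = {"c1": [random.gauss(40.0, 8.0) for _ in range(50)],
             "c2": [random.gauss(80.0, 8.0) for _ in range(50)]}

def mean(xs):
    return sum(xs) / len(xs)

lam = 0.05        # phonetic bias: drags tokens toward the category midpoint
p_enhance = 0.3   # probability that this cue is enhanced on a given turn

for step in range(2000):
    cat = random.choice(["c1", "c2"])
    x = random.gauss(mean(exemplars[cat]), 8.0)          # 1. production
    if random.random() < p_enhance:                      # 2. enhancement
        x += 3.0 if cat == "c2" else -3.0                #    hyperarticulate
    mid = (mean(exemplars["c1"]) + mean(exemplars["c2"])) / 2
    x += lam * (mid - x)                                 # 3. bias toward midpoint
    exemplars[cat].append(x)                             # 4. store labeled token
    exemplars[cat].pop(0)                                # oldest exemplar decays

sep = mean(exemplars["c2"]) - mean(exemplars["c1"])
# Enhancement counteracts the bias, so the contrast is maintained.
assert sep > 10.0
```

With these settings the expected outward pull of enhancement (0.3 × 3.0 per token) balances the inward bias when the category means sit roughly 36 units apart, so the contrast stabilizes rather than collapsing.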
The simulations described here considered five cues which have been argued to
be relevant for the perception of the Korean stop contrast: voice onset time (VOT),
f0 and the duration of the following vowel (VLEN), the difference in amplitude between
the first two harmonics of the vowel (H1−H2), and the amplitude of the burst (BA).
Data on each of these cues reported in Cho et al. (2002), Kim et al. (2002), Silva
(2006a), and Kang and Guion (2008) were used to seed the initial exemplar lists of
two ideal observer agents with a simple lexicon consisting of just two syllables, lenis
/pa/ and aspirated /pʰa/. This state corresponds to the cue distributions reported for
Seoul Korean speakers in the 1960s. The initial parameters and their correspond-
ing informativeness values are shown in Table 11.3; two-dimensional scatterplots
showing the joint distributions of VOT and each of the cues are shown in the first
row of Figure 11.3. The second row of Figure 11.3 shows distributions based on the
parameters shown in the second half of Table 11.3, estimated on the basis of the speech
of younger speakers gathered in the 2000s. It is to these distributions that the state of
the agents will be compared at the end of each simulation run. In other words, we
want to see under what circumstances the agents' states will evolve from the top row
of Figure 11.3 to the bottom row.
Three series of simulations are reported, each seeded with the same initial configu-
ration. The first round of simulations considered the effects of applying probabilistic
enhancement in the absence of phonetic bias (section 11.5.1); the second considered
the effect of applying phonetic bias to the production of a single cue, but without
enhancement (section 11.5.2); and the third explored the effects of applying both
enhancement and bias (section 11.5.3).
The simulations reported here are representative runs of 25,000 iterations, at which
point the statistical reliability of the cue targeted by the bias factor and/or the probabil-
ity of enhancement approached zero. Goodness of fit between the target distributions
and the results of the various simulations was quantified by the KULLBACK-LEIBLER
(KL) DIVERGENCE (Kullback and Leibler 1951) between each target and simulated
cue dimension. This is a non-symmetric measure of the dissimilarity between two
distributions; KL divergence equals zero when two distributions are identical and
grows with the dissimilarity between them.
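For univariate Gaussian cue distributions the KL divergence has a closed form, which could be used to quantify fit in the way just described; this is a generic sketch, not the chapter's evaluation code:

```python
import math

def kl_gauss(mu1, sd1, mu2, sd2):
    """KL( N(mu1, sd1^2) || N(mu2, sd2^2) ) for univariate Gaussians."""
    return (math.log(sd2 / sd1)
            + (sd1 ** 2 + (mu1 - mu2) ** 2) / (2 * sd2 ** 2)
            - 0.5)

# Zero for identical distributions, growing with dissimilarity,
# and non-symmetric, as the text notes.
assert kl_gauss(0, 1, 0, 1) == 0.0
assert kl_gauss(0, 1, 2, 1) > kl_gauss(0, 1, 1, 1)
assert kl_gauss(0, 1, 0, 2) != kl_gauss(0, 2, 0, 1)
```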
FIGURE 11.4 Cue distributions (gray = lenis /pa/, black = aspirated /pʰa/) after 25,000 iter-
ations. Row 1: enhancement without bias. Row 2: bias without enhancement. Row 3: bias and
enhancement. Row 4: empirical targets. Captions give cue informativeness as computed by Eq.
(5). VOT = voice onset time (in ms); VLEN = vowel length (in ms); H1−H2 = spectral tilt (in
dB); BA = burst amplitude (in dB)
The results of a representative simulation run are shown in the second row of
Figure 11.4. As evidenced both by the scatterplots as well as the ω values, VOT has
ceased to be informative in distinguishing this contrast; to the extent that a contrast
between the two categories still exists, it is supported chiefly by a difference in f0 (row
2, panel 4). This differs slightly from the attested modern Korean situation (row 4) in
FIGURE 11.5 Comparison of contrast precision as measured by classification error rate at each
simulation timestep for simulations reported in sections 11.5.1-11.5.3
that the actual parameters characterizing the distributions of f0 have not changed for
either category: f0 has become the most informative cue simply because all other cues
have become less informative. However, the empirical Korean data indicate that the f0
means for aspirated and lenis obstruents have shifted slightly away from one another,
suggesting that they have been enhanced both in terms of a shift in means as well as
a reduction in variance (compare rows 1 and 2 of Figure 11.3).
As shown in panel 2 of Figure 11.5, in the absence of any kind of enhancement,
the precision of the contrast degrades steadily over time as bias is applied. These
simulation results indicate that while a redundant or covert contrast may become
exposed by a systemic production bias, at least in the present case, bias alone cannot
account for the shifts in cue distributions that are empirically observed.
dimension. At no point was f0, or any other cue, specifically targeted for enhancement.
As seen in panel 3 of Figure 11.5, while the error rate increased slightly in the early
iterations of this simulation, it was quickly reduced by the countervailing force of
probabilistic enhancement.
11.7 Conclusion
This chapter has argued for the role of probabilistic enhancement in phonologization
through computational simulation of an ongoing sound change in Seoul Korean. Two
challenges faced by a phonologization model of sound change were addressed: deter-
mining how cues are selected, and explaining why phonologization is often accompa-
nied by dephonologization. It was proposed that cues are targeted for enhancement
as a probabilistic function of their informativeness, so a cue which may be targeted
for enhancement in one language may be ignored in another. Simulation results using
empirically derived cue values were presented, providing strong support for the idea
that loss of contrast precision may drive the phonologization process. Depending on
the distribution of cues, the interaction of phonetic bias and probabilistic enhance-
ment can set the stage for a reorganization of the system of phonological contrasts.
12

Modeling the emergence of vowel harmony through iterated learning
12.2 Background
12.2.1 Vowel harmony
Across a wide variety of languages and in virtually all language families, one finds
vowel co-occurrence restrictions operating over particular phonological domains.
These constraints on which vowels may appear together in a word are typically con-
sidered a unitary phenomenon and called vowel harmony. The vowels in a language
with vowel harmony can be classified into disjoint sets2 such that vowels from only
one of the sets are found within the relevant domain, typically a phonological word.
A standard example from the literature involves the front/back distinction in Finnish
vowels (van der Hulst and van de Weijer 1995):
The point to note in Table 12.1 is that the root vowels are either all front {y, ä} or
all back {u, a}, and that the elative case has two exponents, [-stä] and [-sta], whose vowel
backness depends on whether the stem has front or back vowels.
12.2.2 Phonologization
Phonologization is a term used to describe the diachronic process whereby linguistic
variation that is under physical/physiological (i.e. 'phonetic') control comes to be
under cognitive (i.e. 'phonological') control. The term was introduced by Jakobson
and was most recently reintroduced by Hyman (1972).
2
I am glossing over the issue of neutral vowels, as they are not addressed by the simulations reported
here. See section 12.6 for some discussion of ongoing work dealing with neutrality.
12. Modeling the emergence of vowel harmony through iterated learning 249
For the purposes of this chapter, I will take phonologization to simply mean that
some detectable variation that is not due to any properties of the target phonological
grammar (i.e. the grammar that produces the data that the acquirer learns from),
becomes encoded in the acquirer's phonology.
12.2.3 Coarticulation
Coarticulation is the label given to the predictable effects that segments have on their
neighbours in running speech. Coarticulation may affect adjacent consonants, as
when an English speaker says [limbejkən] for lean bacon, anticipating the bilabial
closure (Kingston 2007), or between vowels and consonants, as when an English
speaker produces a nasalised vowel before a nasal consonant, as in pit ~ /pɪt/ ~ [pʰɪt]
versus pin ~ /pɪn/ ~ [pʰĩn]. Finally, it has been known since the work of Öhman (1966)
that vowels may coarticulate with other vowels across intervening consonants.
This vowel-to-vowel (V-to-V, henceforth) coarticulation underlies one of the best-
known explanations for the existence and typological distribution of vowel harmony.
Ohala (1994b) proposes that vowel harmony is a result of the phonologization of this
V-to-V co articulation. In particular, he argues that harmony results when listeners
are unable to 'parse out' or compensate for the acoustic effects of distal segments (viz.
neighbouring vowels) and misattribute contextual variation to the proximal segment.
presence of two alternating sets of vowels in the inventory. When one set induces
the other to change, vowel harmony exists in that language' (Mahanta 2007: 14,
emphasis my own).3
As stated in section 12.1, the work presented here focuses on the diachronic emer-
gence of lexical harmony. Although this may seem surprising in light of the preceding
discussion of the perceived importance of synchronic alternations, I believe the work
described here is nonetheless a valuable first step in getting a computational handle
on diachronic explanations. Moreover, there is at least some evidence that lexical har-
mony in the absence of alternations may be used by the phonological system (Denis
2010). In the closing sections I discuss ongoing work addressing the acquisition and
emergence of productive alternations in vowel harmony.
item 5 leads to difficulties. There is no obvious way to verify or test the diachronic
dimension which is crucial to this kind of explanation. To be sure, one can make
and record some predictions and trust that their confirmation or refutation will
be followed up on by future generations, but this is a rather unrewarding way of
doing research. Moreover, it is almost impossible that specific, falsifiable predictions
would ever pan out, given the sheer number of uncontrollable factors, e.g. patterns
of connectivity and communication in social networks, language contact situations,
etc.4 Of course, rather than making predictions about specific occurrences of change,
diachronic explanations make typological predictions and retrodictions that are in
principle open to verification. In other words, if a particular change is predicted to be
likely or frequent, one assumes that its outcome will be typologically well-represented.
Of course, typological data are as subject to noise and extraneous factors as any others,
and in fact are probably more subject to arbitrary types of noise that are difficult to take
into account (e.g. how funding gets distributed, and which languages are considered
'interesting' or worthy of study, which language groups are accessible, etc.).
An implicit claim of this chapter is that computational modeling is a viable, useful,
and perhaps soon necessary tool to have in a diachronic linguist's arsenal. Modeling
gives the researcher a 'virtual lab' in which to test explanations, with tight control
over parameters of interest, as well as perfect repeatability. In addition, computational
models generate quantitative data, which at least in principle allows for the possibility
of theory comparison and choice. Finally, implementation of a particular diachronic
explanation or model forces a rarely seen degree of explicitness and precision with
respect to the necessary auxiliary assumptions and parameters.
possess some kind of learning algorithm, which can be viewed as a function mapping
the internal state to itself (Russell and Norvig 1995).
Synthetic models can be further subclassified according to constraints on the flow
of information between agents. In a horizontal model, any pair of agents can interact
and all agents can update their internal state. An example of this type of model in a
language-based context is in de Boer (2001). In a model with vertical information flow,
there are restrictions on which pairs of agents may communicate and which agents
may change their internal state.
Kirby (1999) introduced and popularised linguistic agent-based models with ver-
tical information flow as iterated learning models.5 In an iterated learning model, the
population of agents is partitioned into two disjoint classes, one with fixed internal
state (modeling 'adults') and the other with modifiable internal state (modeling
'children' or 'learners'). Agents may only communicate across classes, and most typically,
children are listeners and adults speakers. The adult grammars serve as approximate
targets to which the child grammars are meant to converge. Upon convergence, or
after some predetermined amount of time, the adults are replaced by the children,
whose internal states become fixed, a new generation of children is introduced, and
the process is repeated. This feedback loop, iterated over several generations, is meant
to explicitly capture the interaction between I- and E-language (Chomsky 1986) in
language transmission and acquisition. There are two potential drivers of change in
these models: noisy data transmission and the information bottleneck that obtains
when learners are exposed to only a subset of the data.
The model presented here is in a sense the simplest possible iterated learning model,
with one adult and one child per generation. Notwithstanding this simplicity, the
model shows how noisy language transmission coupled with a form of probabilistic
learning can change a gradient pattern of V-to-V co articulation into a pattern of lexical
vowel harmony.
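The one-adult, one-child transmission loop can be reduced to a few lines. The 'grammar' below is just a single continuous value and learning is averaging over observed tokens; it is a deliberately toy illustration of vertical transmission with noise and a finite-sample bottleneck, not Mailhot's full model:

```python
import random

random.seed(42)

def produce(grammar, noise=0.1):
    """Noisy production: a token near the adult's internal state, on [0, 1]."""
    return min(1.0, max(0.0, grammar + random.gauss(0, noise)))

def learn(tokens):
    """Toy learner: infer the internal state as the mean of observed tokens."""
    return sum(tokens) / len(tokens)

adult = 0.8                                      # initial adult internal state
for generation in range(20):
    data = [produce(adult) for _ in range(30)]   # bottleneck: finite sample
    child = learn(data)                          # child updates from data only
    adult = child                                # child becomes the next adult

# The state stays well-formed across generations, but drifts under noise.
assert 0.0 <= adult <= 1.0
```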
pressures derived from the production-perception feedback loop, coupled with the
dynamics of lexically-biased exemplar models, lead to (i) the appearance of cate-
goricity from initially gradient phenomena, (ii) general patterns of contrast main-
tenance, and (iii) something akin to the strict constraint domination in Optimality
Theory.
Dras and Harrison (2002) create multi-agent simulations of the emergence of
backness harmony in the Turkish lexicon, with a particular focus on modeling the
'S-shaped' trajectory that has been claimed to characterize historical language change
(Kroch 1989). They model a population of interacting Turkic speakers (i.e. horizon-
tal information flow) initiated with 50 per cent harmonic 1,000-word lexicons. At
each interaction, an agent can choose with some fixed probability to harmonize or
disharmonize a word that is transmitted to it. The single parameter which controls an
agent's decision to (dis)harmonize a word conflates several properties, some of which
I am interested in keeping apart (coarticulation, lexicon structure, …). Additionally,
children in this model directly inherit a subset of their parents' lexicon, eliminating
the particular interaction that is key in the account developed here. Although these
simulations are clearly related to the present work with respect to content (vowel
harmony), the choices that the authors make in designing their models prevent them
from addressing the issues with which I am concerned here.
Choudhury (2007) is concerned with creating computational models of real-world
phonological change, specifically changes in Bengali verbal inflections, and the devel-
opment of a schwa-deletion rule in Hindi. One of these models is a multi-agent simu-
lation of the development of a schwa-deletion rule in Hindi. Choudhury's agents have
a stochastic bias toward schwa-reduction, and interact by means of an 'imitation game'
(de Boer 2000), in which there is explicit feedback about communicative success
or failure. This is plainly an unrealistic (albeit not uncommon) model of linguistic
interaction, especially between parent and child. Perhaps most troubling, however,
is that the built-in stochastic tendency for context-free schwa reduction seems to
build the looked-for behaviour right into the model. Given a steady stochastic bias
towards the shortening of schwa, it seems inevitable that schwa-deletion should be
the outcome (cf. section 12.5 on the 'actuation problem'). In sum, Choudhury's model
provides a good example of synthetic modeling, but still does not address the issues
that I hope to explore.
6
The models were programmed in Python, making heavy use of the numerical and scientific packages
NumPy and SciPy (Oliphant et al. 2001). Source code may be obtained from the author.
254 Frédéric Mailhot
FIGURE 12.1 Architecture of a linguistic agent (adapted from Russell and Norvig 1995)
and production modules in lieu of sensors and effectors, and an internal knowledge
state, which essentially models a lexicon (cf. Figure 12.1).
The chief building blocks of the lexicon are two binary phonological features which
model the standard phonological features [HIGH] and [BACK].7 Lexical items are
sequences of four vowels,8 and there is no morphophonology. This is clearly a highly
impoverished 'grammar', and yet it will be shown to suffice for the induction of lexical
harmony, given the learning algorithm discussed below. To model more sophisticated
aspects of harmony (e.g. alternations, neutrality) will require additional (or different)
entities and operations, cf. the discussion in section 12.5.
In producing outputs, discrete phonological features are transduced to continu-
ous articulatory parameters [HIGH], [BACK], and [ROUND] on the real interval [0, 1].
These articulatory specifications are Beta distributed (see the appendix for details
concerning the parameters) over the front/back space, modeling individual-level
hypo/hyperarticulation (Lindblom 1990). The articulatory parameters are in turn fed
to the following equations from de Boer (2001) to synthesize F1 and F2 formant
values.9
7
Whether these features are learned or innate is orthogonal to the discussion here, although I find the
arguments by Mielke (2008) persuasive. I assume their availability here for convenience.
8
I abstract away from consonants, since the focus here is on vowel-to-vowel coarticulation and
harmony.
9
[ROUND] was unused and consistently set to zero.
Here, v represents the learner's hypothesis about the underlying structure (i.e. feature
description) of the vowel under consideration, P(D = d | V = v) is the likelihood of
the observed acoustic form given the learner's hypothesis, P(V = v) is the prior prob-
ability of that hypothesis being correct, and z is a normalizing constant to ensure that
the calculation generates well-behaved probabilities. Since I only investigated uniform
priors (i.e. each underlying representation is equally probable, a priori), this algorithm
reduces to Maximum Likelihood learning, whereby the underlying representation
that gives highest likelihood to the observed acoustic form is the one chosen.
Given the articulatory specifications for the vowel cluster centres, the learner then
assigns underlying representations to entire lexical entries by means of a simple vector
quantization algorithm; each vowel in a word is assigned the underlying representa-
tion of the acoustic prototype nearest to it.
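The quantization step can be sketched directly; the ([HIGH], [BACK]) feature bundles and the prototype (F1, F2) centres below are invented illustrative values, not the model's learned ones:

```python
# Feature bundles are ([HIGH], [BACK]); prototype (F1, F2) centres in Hz
# are invented for illustration.
prototypes = {
    (1, 0): (300.0, 2200.0),  # high front
    (1, 1): (300.0, 800.0),   # high back
    (0, 0): (700.0, 1800.0),  # low front
    (0, 1): (700.0, 1000.0),  # low back
}

def quantize(token):
    """Assign a vowel token the features of its nearest prototype (Euclidean)."""
    return min(prototypes,
               key=lambda f: sum((a - b) ** 2
                                 for a, b in zip(prototypes[f], token)))

# A word's underlying form is assigned vowel by vowel.
word = [(310.0, 2150.0), (680.0, 1050.0)]
underlying = [quantize(v) for v in word]
assert underlying == [(1, 0), (0, 1)]
```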
10
Because this noise models a sum of presumably independent sources, a Gaussian is a reasonable
hypothesis for its shape.
11
The variable k was set explicitly to 4 in these simulations. The simplification of essentially telling the
learners how many vowels to look for was mainly in the interests of computational tractability, although it
is also not implausible given the assumed availability of two binary features. Some attempts were made at
clustering the acoustic data using a mixture of Gaussians trained with the EM algorithm, and finding the
appropriate number of clusters with the Bayesian Information Criterion, but the addition of coarticulatory
effects renders the data non-Gaussian and so the number of clusters was consistently overestimated.
12 This is clearly an unrealistic assumption, which could presumably be addressed in future research with an 'analysis-by-synthesis' approach (Stevens and Halle 1967).
256 Frédéric Mailhot
Algorithm 1 sketches the sequence of steps carried out for each generation of the
iterated learning model incorporating the production and comprehension modules
discussed above.
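Since Algorithm 1 itself is not reproduced here, the following is a hedged sketch of what one teacher-to-learner generation might look like, collapsing production (a coarticulatory shift plus Gaussian noise on an F2 target) and comprehension (nearest-prototype categorization) into a single function. All names, the one-dimensional F2 space, and the simplifications are mine, not the chapter's.

```python
import random

def one_generation(lexicon, protos, coart, noise_sd, rng):
    """One teacher-to-learner cycle.

    lexicon: list of words; each word is a list of back-feature values
             (-1 = front, +1 = back). Production realizes each vowel as
             its prototype F2, shifts it by `coart` Hz toward the next
             vowel's target (anticipatory coarticulation), and adds
             Gaussian noise. The learner re-categorizes each token by
             the nearest prototype, yielding the next generation's lexicon.
    """
    new_lexicon = []
    for word in lexicon:
        targets = [protos[f] for f in word]
        heard = []
        for i, t in enumerate(targets):
            shift = 0.0
            if i + 1 < len(targets):          # anticipatory V-to-V pull
                nxt = targets[i + 1]
                shift = coart * (1 if nxt > t else -1 if nxt < t else 0)
            heard.append(t + shift + rng.gauss(0, noise_sd))
        # Comprehension: nearest prototype in F2 space.
        new_lexicon.append([min(protos, key=lambda f: abs(protos[f] - h))
                            for h in heard])
    return new_lexicon

rng = random.Random(1)
protos = {-1: 2000.0, +1: 900.0}              # front/back F2 centres
lex = [[-1, +1], [+1, -1], [-1, -1]]
print(one_generation(lex, protos, coart=200.0, noise_sd=30.0, rng=rng))
```

Iterating this function, feeding each output lexicon back in as the next input, is the transmission/acquisition feedback loop the chapter describes.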
12.4.2 Simulations
For each degree of coarticulation (anticipatory or perseveratory, from 0 Hz to 400 Hz
in 50 Hz increments), the model was run 15 times for 250 iterations. The graphs in
Figure 12.2 show the results for some of these parameter settings with anticipatory
coarticulation.13 In particular, they show the increase over time (measured in 'gen-
erations') of the proportion of lexical items in the learners' lexicons that have fully
harmonic underlying feature specifications, i.e. full agreement of [BACK] across all
vowels in a word.
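The plotted quantity, the proportion of fully harmonic lexical items, can be computed as in this short sketch; the helper and its encoding of the lexicon are my own, not the chapter's.

```python
def harmonic_proportion(lexicon):
    """Proportion of lexical items whose underlying [BACK] values agree
    across all vowels of the word: the quantity tracked over
    generations in Figure 12.2."""
    fully_harmonic = [w for w in lexicon if len(set(w)) <= 1]
    return len(fully_harmonic) / len(lexicon)

lex = [['+B', '+B'], ['+B', '-B'], ['-B', '-B', '-B']]
print(harmonic_proportion(lex))   # two of the three items are harmonic
```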
Of interest is the fact that there appear to be two stable levels of harmony between
absence of harmony and full harmony. Figure 12.2a shows the lexicon asymptoting
toward a harmonic proportion in the neighbourhood of 0.33, while Figure 12.2c
and Figure 12.2d show another region of stability around 0.66. Additionally, in
Figure 12.2a and Figure 12.2c we see clearly that for any given parametric set-
tings, a subset of the runs may 'escape' the principal region of stability and end
13 The results with perseverative coarticulation were qualitatively and quantitatively similar and will not be discussed. Also, runs with intermediate coarticulatory values are not shown or discussed here, as they had qualitatively similar dynamics, and varied only in the speed at which they achieved stability.
FIGURE 12.2 Effects of varying degrees of anticipatory coarticulation. Fifteen runs per figure.
Gaussian noise ~ N(μ = 0, σ = 30) on post-articulatory outputs
with a higher proportion of harmonic forms in the lexicon (or else reach a plateau
much more quickly than other runs with the same parametric specification). This
variability across different runs of a particular parametric configuration is due to
post-coarticulatory Gaussian noise. The randomness of the distribution in acous-
tic space interacts synergistically with coarticulation, increasing the likelihood that
in the assignment of underlying forms, any particular vowel will be categorized
in an 'incorrect' cluster, i.e. assigned to an acoustic prototype different from that
which generated it. Because coarticulatory noise is anisotropic and biased in the
direction of the opposite articulatory specification, this misclassification is more
likely to happen in the direction of increased local harmony (viz. in agreement with
the immediately preceding or following vowel). Depending on where a particular
speaker lies along the hypo/hyperarticulatory continuum (recall that the parameters
α, β which control this are normally distributed), several of these misclassifications
may occur together within a generation and conspire to drive a language toward
harmony much more quickly than is typical, as seen in e.g. five of the runs in
Figure 12.2a.
of coarticulation characteristic of human speech (see Beddor et al. 2002 for some
data). This in turn results (by hypothesis) in only sporadic opportunities, abetted
by anisotropic noise from other factors, for phonologization of the variety described
here.
There remains much work to be done in fleshing out this model to more accurately
reflect the conditions that obtain in real-world examples of sound change and phonol-
ogization. The model as presented here fits broadly into the view of phonological
diachrony espoused by Hale (2007), whereby sound changes and phonologization
are initiated within the heads of individual (particular) speaker-hearers. Of course,
individuals acquire their language from multiple sources (hence more variable input
forms), and children's language is often shaped as much by their peers as their parents
(Labov 1994), so even for a diachronician who subscribes to Hale's viewpoint, it seems
unwise to ignore the influence of external actors. An incarnation of the model currently
in development incorporates acquisition from multi-source data.
12.6 Conclusions
The work presented in this chapter represents a first step in demonstrating that
computational modeling can support, and even be a crucial component of,
diachronic explanation of synchronic phonological patterns. Given the recently
increasing focus on this style of explanation (Blevins 2004; Hale and Reiss 2008), and
the obstacles to empirically investigating phenomena which arise over timescales
potentially spanning centuries or millennia, the usefulness of computational models
in putting diachronic functional explanations on a sound theoretical and empirical
footing is clear.
In this chapter, I focused on a particular instance of diachronic explanation: Ohala's
(1994b) claim that vowel harmony emerges from the phonologization of vowel-to-
vowel coarticulation. Using a simple model of the language transmission/acquisition
feedback loop iterated over multiple generations, I showed how a gradient pattern of
front/back coarticulation coupled with anisotropic noise arising from external factors
(fatigue, noise, etc.) could eventually become phonologized as a categorical pattern of
lexical harmony.
Beddor and colleagues (Beddor et al. 2002, 2007; Beddor 2009) have recently
demonstrated that coarticulation (in V-to-V and VN sequences) and perceptual
compensation for coarticulation are highly language-specific, in particular that antic-
ipatory and perseverative coarticulation vary widely in degree across languages, and
that compensation for coarticulation is largely attuned to a language's amount of
coarticulation. This immediately puts the 'phonologization of coarticulation' account,
at least as it has been implemented here, on a less certain footing. If listeners gen-
erally compensate as much as speakers tend to coarticulate, it is unclear whether
failures of compensation happen frequently enough for phonologization to gain any
traction.
Independently of this, there is a line of research giving increasing evidence that
language users have access to highly detailed episodes of linguistic experiences
(Goldinger 1996; Johnson 1997b; Pierrehumbert 2001a; Hawkins 2003, inter alia),
and in particular that language users store acoustically-detailed 'word-sized' exem-
plars of linguistic experiences (Silverman 2006a; Johnson 2007; Välimaa-Blum 2009).
But if humans' lexical representations are acoustic and word-sized, then there is no
meaningful sense in which coarticulation, within words at least, happens at all. Con-
sider a very basic example, in which the difference between the (relatively palatal)
[k] in keep versus the (relatively velar) [k] in coop is highlighted as an example of
anticipatory coarticulation. According to the 'phonetically detailed exemplars' view,
this difference is (at least synchronically) not attributable to coarticulation, but instead
is a product of the fact that these forms have only ever been heard in their respective
palatalized and velarized forms by the language learner.
In ongoing research (Mailhot 2010), I am modeling the synchronic acquisition and
diachronic emergence of vowel harmony within such an exemplar-based approach
to phonetics/phonology. In these models, agents explicitly store word-sized for-
mant (F1, F2) sequences. Individual word tokens are synchronically subject only
to isotropic (Gaussian) noise, modeling the sum of 'external' noise sources, and
the emergence of vowel harmony comes about due to synchronic perceptual biases.
Synchronically, the model acquires productive alternations (e.g. in affixal morphol-
ogy) successfully, and preliminary results on the diachronic model indicate that
this acquisition model embedded into an iterated learning simulation can in some
instances give rise to such alternations over time, from an initial state lacking such
alternations.
The Beta distribution (Weisstein 2009) models events which are constrained to take
place within an interval, e.g. the probability density of hitting an instance of an artic-
ulatory target.
FIGURE 12.3 The Beta distribution, for various values of shape parameter α (β = 5)
The shape parameters were distributed α ~ N(40, 5) and β ~ N(5, 1) for the
simulations discussed here.
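Under the stated α ~ N(40, 5), β ~ N(5, 1) setup, sampling articulatory-target hit locations might look like the following sketch; the clipping that keeps the shape parameters positive is my own safeguard, not part of the chapter's specification.

```python
import random

def sample_target(rng):
    """Sample an articulatory-target hit location on [0, 1] from a Beta
    distribution whose shape parameters are themselves drawn (per
    speaker/token) from alpha ~ N(40, 5) and beta ~ N(5, 1)."""
    a = max(rng.gauss(40, 5), 0.1)   # clip to keep shapes positive
    b = max(rng.gauss(5, 1), 0.1)
    return rng.betavariate(a, b)

rng = random.Random(0)
xs = [sample_target(rng) for _ in range(1000)]
print(0.0 <= min(xs) and max(xs) <= 1.0)   # → True: Beta support is [0, 1]
print(sum(xs) / len(xs))                   # mean near a/(a+b) ≈ 0.89
```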
13
Variation and change in English noun/verb pair stress
Morgan Sonderegger and Partha Niyogi
13.1 Introduction
In every language, change is ubiquitous and variation is widespread. Their interac-
tion is key to understanding language change because of a simple observation: every
linguistic change seems to begin with variation, but not all variation leads to change.
What determines whether, in a given linguistic population, a pattern of variation leads
to change or not? This is essentially the actuation problem (Weinreich et al. 1968),1
which we rephrase as follows: why does language change occur at all, why does it
arise from variation, and what determines whether a pattern of variation is stable
or unstable (leads to change)? This chapter addresses these questions by combining
two approaches to studying the general problem of why language change occurs: first,
building and making observations from datasets, in the tradition of sociolinguists and
historical linguists (such as Labov and Wang); second, building mathematical models
of linguistic populations, to model the diachronic, population-level consequences
of assumptions about the process of language learning by individuals (Niyogi and
Berwick 1995 et seq.; Niyogi 2006).
We describe the diachronic dynamics of an English stress shift, based on a
diachronic dataset (1600-2000) which shows both variation and change. This stress
shift has several interesting properties which can be explored using computational
models. We focus here on a pattern characterizing much language change, the
* We thank two anonymous reviewers for comments on an earlier draft of this chapter, John Goldsmith,
Jason Riggle, and Alan Yu for insightful discussion, and Max Bane for both. Audiences at LabPhon 11, the
University of Chicago, and Northwestern University provided useful feedback.
1 'Why do changes in a structural feature take place in a particular language at a given time, but not in other languages with the same feature, or in the same language at other times?' (p. 102).
13. Variation and change in English noun/verb pair stress 263
13.2 Data
The data considered here are English disyllabic noun-verb pairs such as convict,
concrete, exile, referred to as N/V pairs throughout. As a rough count of the number of
N/V pairs in current use, 1143 are listed in CELEX (Baayen et al. 1996).3 N/V pairs are
a productive class (YouTube, google).
All current N/V pairs for which N and V have categorical stress follow one of the
three patterns shown in Table 13.1.4 The fourth logically possible pattern, {2,1}, does
not occur; as discussed below, this pattern is also never observed diachronically. At
any given time, variation exists in the pronunciation of some N/V pairs, e.g. research,
address in present-day American English.
Variation and change in the stress of N/V pairs have a long history. Change in N/V
pair stress was first studied in detail by Sherman (1975), and subsequently by Phillips
(1984). Sherman (1975) found that many words have shifted stress since the first
dictionary listing stress appeared (1570), largely to {1,2}.5 On the hypothesis that this
was lexical diffusion to {1,2}, he counted 149 pairs listed with {1,2} or possible {1,2}
pronunciation in two contemporary dictionaries, one British and one American, and
2 These models are sampled from a larger project (Sonderegger 2009; Sonderegger and Niyogi 2010), whose goal is to determine which model properties lead to dynamics consistent with the stress data, and with observations about variation and change more generally.
3 The number of N/V pairs in current use depends on the method used to count. Many examples are clear, but others have rarely-used N or V forms (e.g. collect) which are still listed in dictionaries.
4 We use curly brackets to denote N and V stress, where 1 = initial stress and 2 = final stress.
5 However, most words are not first listed until 1700 or later.
264 Morgan Sonderegger and Partha Niyogi
examined when the shift for each N/V pair took place. We call these 149 words List 1
(Appendix A). Sherman found the stress of all words in List 1 for all dictionaries listing
stress information published before 1800, and concluded that many words were {1,2}
by 1800, and those that were not must have shifted at some point by 1975. We will
revisit the hypothesis of lexical diffusion to {1,2} below, after examining the dynamics
of an expanded dataset.
Because Sherman's study only considers N/V pairs which are known to have
changed to {1,2} by 1975, it does not tell us about the stability of the {1,1}, {2,2},
and {1,2} pronunciations in general. Over a random set of N/V pairs in use over
a fixed time period, is it the case that most pairs pronounced {1,1} and {2,2} shift
stress to {1,2}?
List 2 (Appendix B) is a set of 110 N/V pairs, chosen at random from all N/V pairs
which (a) have both N and V frequency of at least one per million words in the British
National Corpus; (b) have both N and V forms listed in a dictionary from 1700 (Boyer
1700); (c) have both N and V forms listed in a dictionary from 1847 (James and Mole
1847). These criteria serve as a rough check that the N and V forms of each word have
been in use since 1700.
In List 2, only 11.8 per cent of the words have changed stress at all between 1700 and
2007. Those stress shifts observed are mostly as described by Sherman, from {2,2} to
{1,2}, and mostly for words from List i. But this quick look suggests that when the set
of all N/V pairs is sampled over a 300 year period, most words do not change stress:
{i, i}, {i, 2}, and {2, 2} are all 'stable states,' to a first approximation. From this perspec-
tive, both sides of the actuation problem are equally puzzling for the dataset: why do
the large majority of N/V pairs not change, and what causes change in those that do?
6 The dictionary list is in Sonderegger (2009); the stress data are available on the first author's web page.
TABLE 13.2 Observed complete stress shifts, ordered by decreasing frequency
{2,2} → {1,2}   concert, content, digest, escort, exploit, increase, permit, presage, protest, suspect
{1,1} → {1,2}   combat, dictate, extract, sojourn, transfer
{1,2} → {1,1}   collect, prelude, subject
{1,2} → {2,2}   cement
population, either due to variation within individuals (e.g. the dictionary's author(s))
or variation across individuals (each using initial or final stress exclusively). At a given
time, the N or V forms for many words in List 1 are rare, archaic, or not in use. The
pattern {2,1} is never observed.
Changes in individual N/V pairs' pronunciations can be visualized by plotting the
moving average of their N and V form stresses. To represent averages of reported
stresses on a scale, we need to map reported stresses s to numbers f(s) in [1,2].
We use f(1) = 1, f(2) = 2, and f(1/2) = f(2/1) = 1.5.
This measure overestimates variation between 1 and 2 by interpreting 1/2 and 2/1 as
meaning equal variation between 1 and 2.7
For a word w at time t, the average of pronunciations reported in the time window
(t − 25, t + 25) (years) was plotted if at least one dictionary in this time window listed
pronunciation data for w. So that the trajectories would reflect change in one dialect
of English, only data from British dictionaries were used. Figures 13.1-13.2 show a
sample of the resulting 149 stress vs. time trajectories.8
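The mapping and windowed average behind these trajectories can be sketched as follows; the entry format (a list of dated dictionary reports) is my own assumption, not the chapter's data structure.

```python
def f(stress):
    """Map a dictionary-reported stress to a number in [1, 2]:
    categorical initial stress -> 1, final -> 2, and the variable
    listings '1/2' and '2/1' -> 1.5 (treating both as equal variation
    between 1 and 2, as the text notes)."""
    return {'1': 1.0, '2': 2.0, '1/2': 1.5, '2/1': 1.5}[stress]

def moving_average(entries, t, window=25):
    """Average f(s) over dictionary entries dated within (t-25, t+25);
    returns None if no dictionary in the window lists the word.
    entries: list of (year, reported_stress) pairs."""
    vals = [f(s) for (year, s) in entries if t - window < year < t + window]
    return sum(vals) / len(vals) if vals else None

data = [(1755, '2'), (1770, '2/1'), (1790, '1')]
print(moving_average(data, 1775))   # → 1.5
```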
Four types of complete stress shift, defined as a trajectory moving from one end-
point ({1,1}, {1,2}, or {2,2}) to another, are observed, ordered by decreasing fre-
quency in Table 13.2. The types differ greatly in frequency: {2,2} → {1,2} is by far the
most common, while there are only 1 or 2 clear examples of {1,2} → {2,2}. For both
the {1,1} and {2,2} patterns, change to {1,2} occurs more frequently than change from
{1,2}. Change directly between {1,1} and {2,2} never occurs. A sample of each type
is shown in Figure 13.1.
7 In fact, dictionary authors often state that the first listed pronunciation is 'primary,' so that 1/2, 2/1, and 1.5 could represent different types of variation in the population, in view of which we might want to set f(1/2) < 1.5 and f(2/1) > 1.5. In practice, 1/2 and 2/1 are uncommon enough that trajectories plotted with f(1/2) and f(2/1) changed look similar, at least with respect to the qualitative terms in which we describe trajectory dynamics below.
8 All trajectories are given in Sonderegger (2009), and posted on the first author's web page.
FIGURE 13.1 Sample trajectories 1: change between endpoints. Solid/dotted lines are moving
averages of N/V stress respectively
For a given N or V stress trajectory, variation (a moving average value greater than
1 and less than 2) could either be due to dictionary entries reporting variation, or
a mix of dictionary entries without variation reporting (exclusively) initial or final
stress. To give an idea of how often variation is reported in individual dictionary
entries, Table 13.3 shows the percentages of entries (with both N and V stresses listed)
reporting variation in N, V, or neither. Variation occurs within N or V in 13.9 per cent
of entries, but variation in both N and V at once is relatively uncommon (2.2 per cent
of entries).
What is the diachronic behavior of the variation observed in the stress trajec-
tories? Examining all trajectories, we can make some impressionistic observations.
Short-term variation near endpoints (converse; Figure 13.2a) is relatively common.
Long-term variation in one of the N or V forms (exile; Figure 13.2b) is less common;
long-term variation in both the N and V forms at once (rampage; Figure 13.2c) is rare.
The pattern {2,1} is never observed in the dataset, and we argue it is in fact 'unstable'
in the following sense. Entries 'near' {2,1}, such as (N=2/1, V=1/2), are very rare (nine
entries), and are scattered across different words and dictionaries. This means that
the few times the N form of an N/V pair comes close to having a higher probability of
final stress than the V form, its trajectory quickly changes so this is no longer the case.
In the language of dynamical systems (section 13.4.1), this suggests that the region
pron_N > pron_V contains an unstable fixed point (one which repels trajectories), {2,1}.
We can summarize the observed diachronic facts as follows:
1. {1,1}, {1,2}, {2,2} are 'stable states', but short-term variation around them often
   occurs.
2. Long-term variation occurs, but rarely in both N and V forms simultaneously.
3. Trajectories largely lie on or near a 1D axis in the 2D (pron_N, pron_V) space:
   {1,1} ↔ {1,2} ↔ {2,2}. Both variation and change take place along this axis.
4. Changes to {1,2} are much more common than changes from {1,2}.
5. {2,1} never occurs, and is an 'unstable state'.
Returning to the question of what kind of change is taking place, we see that to a
first approximation and restricted to List 1, Sherman was correct: most change takes
FIGURE 13.2 Sample trajectories 2. (a) Short-term variation; (b) long-term variation in the V
form; (c) long-term variation in both N and V forms. Solid/dotted lines are moving averages of
N/V stress respectively
FIGURE 13.3 Schematic of observed changes. Each oval represents a stable state: {1,1}, {1,2},
and {2,2} are the observed N/V pair stress patterns, and {1,0} and {0,2} indicate disyllabic
words without V and N forms, respectively. Solid lines indicate observed N/V pair stress shifts,
with line thickness indicating the relative frequency of each shift; e.g. {2,2} → {1,2} is the most
frequent and {1,2} → {2,2} the least frequent. Dotted lines indicate all ways in which an N or V
form can come into or fall out of use
place to {1,2}. But taking into account that change from {1,2} also occurs, and that
most words in stable states never change, the diachronic picture is more completely
schematized as in Figure 13.3. The observed dynamics are thus more complicated
than diffusion to {i, 2}. To understand their origin, we consider below (section 13.3)
proposed mechanisms driving stress shift in N/V pairs.
research 9 6 2
perfume 2 3 4
address 2 2 1
9 The terminology is slightly misleading because the structure of variation (the α stored) differs between speakers in 'within-speaker' variation as well.
10 See Sonderegger (2009) for details, including the list of sources.
13.3.1 Mistransmission
An influential line of research holds that many sound changes are based in asymmetric
transmission errors: because of articulatory factors (e.g. coarticulation), perceptual
biases (e.g. confusability between sounds), or ambient distortion between produc-
tion and perception, listeners systematically mishear some sound α as β, but rarely
mishear β as α.11 Such asymmetric mistransmission is argued to be a necessary con-
dition for the change α → β at the population level, and an explanation for why the
change α → β is common, while the change β → α is rarely (or never) observed.
Mistransmission-based explanations were pioneered by Ohala (1981 et seq.), and
have been the subject of much recent work (reviewed by Hansson 2008).
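The population-level consequence of this asymmetry can be illustrated with a toy transmission channel; the probabilities, the iteration scheme, and all names are illustrative only, not the chapter's model.

```python
import random

def transmit(sound, p_ab, p_ba, rng):
    """Toy asymmetric mistransmission channel: alpha ('a') is misheard
    as beta ('b') with probability p_ab, and 'b' as 'a' with a much
    smaller probability p_ba."""
    if sound == 'a' and rng.random() < p_ab:
        return 'b'
    if sound == 'b' and rng.random() < p_ba:
        return 'a'
    return sound

rng = random.Random(42)
pop = ['a'] * 1000
for _ in range(50):                  # iterate the channel over "generations"
    pop = [transmit(s, p_ab=0.05, p_ba=0.001, rng=rng) for s in pop]
share_b = pop.count('b') / len(pop)
print(share_b > 0.5)                 # → True: 'b' comes to dominate
```

Because p_ab far exceeds p_ba, the population drifts toward β even though every single transmission is usually faithful, mirroring the argument that asymmetric mishearing biases the direction of change.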
Although N/V pair stress shifts are not sound changes, their dynamics are
potentially amenable to mistransmission-based explanation. There is significant
experimental evidence for perception and production biases in English listeners
consistent with the most commonly-observed diachronic shifts ({2,2}, {1,1} → {1,2}).
English listeners strongly prefer the typical stress pattern (N=1 or V=2) in novel
English disyllables (Guion et al. 2003), show higher decision times and error rates
(in a grammatical category assignment task) for atypical (N=2 or V=1) than for
typical disyllables (Arciuli and Cupples 2003), and produce stronger acoustic cues
for typical stress in (real) English N/V pairs (Sereno and Jongman 1995).12 It is also
known that for English disyllables, word stress is misperceived more often as initial
in 'trochaic-biasing' contexts, where the preceding syllable is weak or the following
syllable is heavy; and more often as final in analogously 'iambic-biasing' contexts.
This effect is more pronounced for nouns than for verbs; and nouns occur more
frequently in trochaic contexts (Kelly and Bock 1988; Kelly 1988, 1989). Michael Kelly
and collaborators have argued these facts are responsible for both the N/V stress
asymmetry and the directionality of N/V pair stress shifts.
13.3.2 Frequency
Stress shift in English N/V pairs, in particular the most common change, the dia-
tonic stress shift (DSS; {2,2} → {1,2}), has been argued to be a case of analogical
11 A standard example is final obstruent devoicing, a common change cross-linguistically. Blevins (2006) summarizes the evidence that there are several articulatory and perceptual reasons why final voiced obstruents could be heard as unvoiced, but no motivation for the reverse process (final unvoiced obstruents heard as voiced).
12 For example, Sereno and Jongman find that the ratio of amplitudes of the first and second syllables (an important cue to stress) is greater for initially-stressed N/V pairs (e.g. polic) read in noun context, compared to verb context.
change (Hock 1991; Kiparsky 1995) or lexical diffusion (Sherman 1975; Phillips 1984,
2006); indeed, the relationship between the two is controversial (see Phillips 2006
vs. Kiparsky 1995; Janda and Joseph, 2003). For both types of change, frequency has
been argued to play a role in determining which forms change first; in particular,
lower-frequency forms are said to be more susceptible to analogical change (e.g.
Manczak 1980), or to change first in cases of lexical diffusion which require 'lexical
analysis' (Phillips 2006), such as N/V stress shifts. This type of effect has been demon-
strated for the most common N/V stress shift: words with lower frequencies are more
likely to undergo the DSS (Phillips 1984; Sonderegger in press). More precisely, among
a set of N/V pairs pronounced as {2,2} in 1700, those with lower present-day combined
N+V frequency are more likely to have changed to {1,2} by the present.13
There is, however, an important ambiguity to this finding: present-day frequencies
are used, under the implicit assumption that they have changed little diachroni-
cally. We must therefore distinguish between (at least) two hypotheses for why low-
frequency words change (on average) earlier:
Under Hypothesis 2, the reason present-day frequencies are on average lower for
words which have changed is that their frequencies have decreased diachronically.
We can begin to differentiate between these hypotheses by examining diachronic
frequency trajectories for N/V pairs which have changed, and checking whether they
show negative trends. Real-time frequency trajectories (combined N+V frequencies)
were found for six N/V pairs (combat, decrease, dictate, perfume, progress, protest)
which have shifted stress since 1700.14 Figure 13.4 shows frequency trajectories along-
side pronunciation trajectories for these pairs.
Frequencies were found by sampling from prose written by British authors in the
Literature Online (LiOn) database, then normalizing against frequency trajectories
for a set of four reference words. Details and some justification for this normalization
step are given in Appendix C.15
13 Sonderegger (in press) argues that frequency and phonological structure interact to influence which words undergo the DSS first. Here we refer to the finding that there is a significant main effect of frequency once prefix class is taken into account.
14 A reviewer suggests that either N or V frequency alone would be a more relevant measure for particular changes, i.e. change in the stress of the N form might be triggered by change in its frequency or in the V form's frequency. This seems plausible, and we plan to consider frequency trajectories more carefully in future work; here we consider N+V frequency rather than N or V frequency alone for compatibility with previous work (Phillips 1984; Sonderegger in press), where N+V frequency is used.
15 lion.chadwyck.com. Only six words/four reference words were considered because finding trajectories is time-intensive.
FIGURE 13.4 Frequency (lower) and pronunciation (upper) trajectories for combat, decrease,
dictate, perfume, progress, protest
All words show negative correlations between year and N+V frequency, four
out of six of which are significant (p < 0.05).16 Although any conclusion must be
tentative in view of the small number of frequency trajectories considered, these
16 Alphabetically: r = −0.78 (p < 0.001), r = −0.78 (p < 0.1), r = −0.79 (p < 0.01), r = −0.32 (p > 0.25), r = −0.76 (p < 0.05), r = −0.74 (p < 0.01).
negative correlations lend support to Hypothesis 2, and rule out the hypothesis that
the frequency trajectories for N/V pairs show no long-term trends. We thus adopt
the working hypothesis that change occurs in an N/V pair when its frequency drops
below a critical level.
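The trend check underlying this hypothesis is a plain correlation between year and frequency; the following is a self-contained sketch with toy numbers, not the LiOn counts.

```python
def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient, used here to test a
    frequency trajectory for the negative year-vs-frequency trend
    described in the text."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

years = [1700, 1750, 1800, 1850, 1900, 1950, 2000]
freq = [12.0, 10.5, 9.0, 8.2, 6.1, 5.5, 4.0]   # per-million, declining
print(pearson_r(years, freq) < 0)               # → True: negative trend
```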
13.3.3 Analogy/Coupling
A very broad explanation often invoked in language change is analogy: linguistic
elements which are similar by some criterion change to become more similar. In the
case of N/V pairs, it has been suggested that the most common stress shift, from {2,2}
to {1,2}, could be due to analogical pressure: given the strong tendency in English for
nouns to have earlier stress than verbs (e.g. Ross 1973), speakers 'regularize' {2,2} pairs
to follow the dominant pattern of stress in the lexicon (Phillips 2006: 37-9).
In the context of our N/V diachronic pronunciation trajectories, we restate analogy
as coupling between trajectories. We can check for coupling effects at two levels: within
N/V pairs, and within prefix classes:
Within prefix classes Impressionistically, over all N/V pair trajectories, those for
pairs sharing a prefix often seem more similar than would be expected by chance.
For example, many re- pairs were historically {2,2}, then began to change sometime
between 1875 and 1950. We would like a principled way to test the hypothesis of
coupling between the trajectories for words in the same prefix class; to do so, we
need a way to test how much two words 'change like' each other, or how similar
their trajectories are. We use a simple distance metric over trajectories' dissimilarity
('distance'), denoted d(w, w′) (for N/V pairs w and w′).17
Finding d(w, w′) for all possible word pairs defines a graph G(d) with nodes
w1, …, w149, and edges d(wi, wj) equal to the distance between wi's and wj's trajecto-
ries. This structure suggests a way of testing whether, given a group of words which are
linguistically related, their trajectories are similar: check the goodness of the cluster
formed by their vertices in G. For a subset of vertices C ⊆ [n] of G = (V, E), define
17 Over both N and V trajectories, the sum of the mean trajectory difference and the mean difference between trajectory first differences. Details are given in Sonderegger (2009).
R(C) to be the mean in-degree of C minus the mean out-degree of C.18 R(C) will be
high if most vertices of C are on average closer to each other than to vertices in V \ C.
This quantity is adapted from a common metric for finding community structure in
networks (Newman and Girvan 2004), with the important difference that here we
are only evaluating one hypothesized community rather than a partitioning of G into
communities.
As a measure of the goodness of a cluster C, let p(C) ∈ [0, 1] be the empirical
p-value, defined as the location of R(C) on the distribution of R for all communities
of size |C| in G. The closer the value of p(C) to zero, the more similar the trajectories
for words in C are, compared to trajectories of a random set of words of size |C|. This
setup can be used to test whether words in List 1 which share a prefix have similar
trajectories. Table 13.5 shows p(C) for all prefix classes of size |C| ≥ 2.
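A hedged sketch of the cluster score and empirical p-value follows. Since the formal definition in footnote 18 is not available here, this version phrases R(C) directly as mean outside-distance minus mean inside-distance, which preserves the intended reading (high R(C) = C is a tight cluster) but is my own reconstruction; all names are illustrative.

```python
import itertools
import random

def R(C, d):
    """Cluster score for vertex set C given a symmetric distance dict
    d[(i, j)] (keys are sorted pairs): mean distance to outside vertices
    minus mean distance within C. High R(C) means C's trajectories are
    closer to each other than to the rest of the graph."""
    nodes = {i for pair in d for i in pair}
    inside = [d[(i, j)] for i, j in itertools.combinations(sorted(C), 2)]
    outside = [d[tuple(sorted((i, j)))] for i in C for j in nodes - set(C)]
    return sum(outside) / len(outside) - sum(inside) / len(inside)

def empirical_p(C, d, trials=2000, seed=0):
    """Location of R(C) in the distribution of R over random vertex sets
    of the same size: the fraction of random sets scoring at least as
    high. Small p(C) means C's trajectories are unusually similar."""
    nodes = sorted({i for pair in d for i in pair})
    rng = random.Random(seed)
    r_obs = R(C, d)
    hits = sum(R(rng.sample(nodes, len(C)), d) >= r_obs
               for _ in range(trials))
    return hits / trials

# Toy graph: {0,1} and {2,3} are tight pairs, everything else is distant.
d = {(0, 1): 0.1, (0, 2): 1.0, (0, 3): 1.0,
     (1, 2): 1.0, (1, 3): 1.0, (2, 3): 0.1}
print(round(R({0, 1}, d), 3))       # the tight pair scores high
print(empirical_p({0, 1}, d, trials=600))
```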
Many potential prefix classes have small p(C), confirming the initial intuition that
N/V pairs sharing a prefix tend to have more similar trajectories. The com-/con- and
im-/in- categories are particularly interesting because they suggest that it is a shared
morphological prefix rather than simply shared initial segments which correlates with
trajectory similarity. The value of p(C) for combined com- and con- is lower than for
either alone, and the same holds for im-/in-; this makes sense under the assumption
that in- and im- are allomorphs of a single underlying prefix.
TABLE 13.5 Prefix class p(C) values, |C| ≥ 2. 'Bound' = re-μ, where
μ is a bound morpheme

C              |C|   p(C)       C               |C|   p(C)
a-             10    0.270      out-            10    0.055
com-            5    0.067      per-             3    0.263
comp-           3    0.032      pre-             5    0.065
con-           17    0.001      pro-             4    0.078
cont-           4    0.266      re-             24    0.011
conv-           4    0.033      re- (bound)      8    0.576
com-/con-      22    0.0005     re- (unbound)   16    0.0017
de-             7    0.285      sub-             3    0.710
de- w/o des-    5    0.050      sur-             2    0.475
dis-            5    0.746      trans-           3    0.173
ex-             6    0.981      up-              7    0.196
im-             4    0.021
in-            12    0.029
im-/in-        16    0.004
18
We also find that larger classes have lower p(C): there is a significant negative rela-
tionship between |C| and log(p(C)) (r = −0.72, p < 10⁻⁴) for the data in Table 13.5.
That is, larger classes show stronger analogical effects, in the sense of trajectory simi-
larity considered here.
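This relationship can be checked directly from the values in Table 13.5. The sketch below is a re-computation over the table's rows, not the authors' code; since it is not stated whether the published r = −0.72 used exactly this set of rows (e.g. whether the combined com-/con- and im-/in- classes were included), the value it produces is indicative rather than a replication:

```python
import math

# (|C|, p(C)) pairs transcribed from Table 13.5
table = [(10, 0.270), (5, 0.067), (3, 0.032), (17, 0.001), (4, 0.266),
         (4, 0.033), (22, 0.0005), (7, 0.285), (5, 0.050), (5, 0.746),
         (6, 0.981), (4, 0.021), (12, 0.029), (16, 0.004), (10, 0.055),
         (3, 0.263), (5, 0.065), (4, 0.078), (24, 0.011), (8, 0.576),
         (16, 0.0017), (3, 0.710), (2, 0.475), (3, 0.173), (7, 0.196)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

sizes = [s for s, _ in table]
logps = [math.log(p) for _, p in table]
r = pearson(sizes, logps)  # strongly negative: larger classes, smaller p(C)
```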
13.4 Modeling
We have so far described the diachronic dynamics of variation and change in the
stress of N/V pairs, and proposed causes for these dynamics. We now build dynamical
systems models to test whether some proposed causes, implemented in the learn-
ing algorithm used by individuals, lead to one aspect of the observed dynamics at
the population level: change following long-term stability, which in the language of
dynamical systems corresponds to the presence of a bifurcation. This is only one of the
multiple patterns observed in the data; the remainder are in part addressed elsewhere
(Sonderegger 2009; Sonderegger and Niyogi 2010) and in part left to future work.
These are idealizations, adopted here to keep models relatively simple. The effects of
dropping each assumption are explored in Niyogi (2006) and Sonderegger (2009).
We also assume here that probabilities of producing initial vs. final stress for nouns
and verbs are learned separately: that is, there is no 'coupling' between them. However,
a range of models for the N/V case incorporating coupling are considered in Son-
deregger (2009) and Sonderegger and Niyogi (2010).
²¹ This model of the learner is similar in spirit to the idea of 'input filtering' suggested in Lisa Pearl's computational studies of English acquisition and change (Pearl 2007 et seq.), where learners consider only examples relevant to the cue currently being set.
280 Morgan Sonderegger and Partha Niyogi
examples (N=1, V=2) are stronger than in atypically-stressed examples (N=2, V=1)
for at least some speakers (Sereno and Jongman 1995), some atypically-stressed exam-
ples might be discarded by learners.
We define discarding probabilities r₁ and r₂ with which form 1 and form 2 examples
are discarded, and define p_{i,t} as above. For learner i in generation t + 1, the algorithm is:

Draw N examples from generation t, of which k⁽²⁾_{i,t+1} are heard as form 2, k⁽¹⁾_{i,t+1}
as form 1, and N − k⁽¹⁾_{i,t+1} − k⁽²⁾_{i,t+1} are discarded.

Set p_{i,t+1} = k⁽²⁾_{i,t+1} / (k⁽¹⁾_{i,t+1} + k⁽²⁾_{i,t+1}) if any examples remain, and
p_{i,t+1} = r otherwise, where r ∈ [0, 1].

That is, the learner's default strategy when all examples are discarded is to set p = r
(for r fixed). For any N and non-zero discarding probabilities, there is always some
chance (though possibly very small) that all examples are discarded. Where r comes
from is left ambiguous; for example, it could be the percentage of known disyllabic
words with final stress.
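A minimal simulation of a single such learner might look as follows (a sketch, assuming, consistently with the bifurcation analysis that follows, that r₁ and r₂ name the discarding probabilities for form 1 and form 2 examples):

```python
import random

def model2_learner(p_parent, N, r1, r2, r, rng):
    """One Model 2 learner: draw N examples from the parent generation
    (form 2 with probability p_parent), discard form 1 / form 2 examples
    with probability r1 / r2, and estimate p from what survives,
    falling back on the default value r if everything was discarded."""
    k1 = k2 = 0
    for _ in range(N):
        if rng.random() < p_parent:      # a form 2 example...
            if rng.random() >= r2:       # ...survives discarding
                k2 += 1
        else:                            # a form 1 example...
            if rng.random() >= r1:
                k1 += 1
    return k2 / (k1 + k2) if k1 + k2 > 0 else r
```

Averaging the outputs of many such learners approximates the population-level update that the evolution equation describes.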
The evolution equation (13.2), giving α_{t+1} as a function of α_t, is derived in
Sonderegger (2009: section 5.5.1). In the high-frequency (N → ∞) limit, it
reduces to:

α_{t+1} = α_t(1 − r₂) / [α_t(1 − r₂) + (1 − α_t)(1 − r₁)]    (13.3)

In practice, the long-term dynamics of Eqn 13.3 (in particular the location of the
unique stable fixed point) are extremely similar to those of the true (frequency-dependent)
evolution equation (Eqn 13.2) for N greater than a small value (≈3-5, depending on
the values of r₁ and r₂). That is, the long-term dynamics are only affected by frequency
for very small N. We thus only consider Eqn 13.3 here.
Solving α_{t+1} = α_t in Eqn 13.3 gives two fixed points: α⁻ = 0 and α⁺ = 1. There
is a bifurcation at r₁ = r₂: for r₁ < r₂, α⁻ is stable and α⁺ is unstable; for r₂ < r₁, α⁻
is unstable and α⁺ is stable. Intuitively, the form with a higher probability of being
discarded is eliminated.
where H = 'heard' and I = 'intended'. Values a and b are now the probabilities of not
hearing a form 1 or form 2 example, respectively, as form 1 or form 2. When this occurs,
the probability that the example is heard as the wrong form, rather than being discarded, is R.
The learning algorithm for member i of generation t + 1 is the same as in Model 2,
but now k⁽²⁾_{i,t+1} may include some mistransmitted form 1 examples (and similarly
for k⁽¹⁾_{i,t+1}).
Analysis of the resulting evolution equation (Sonderegger 2009: section 5.5.2)
shows there is a single fixed point, α*, and thus no bifurcations. Similarly to Model
2, there is essentially no effect of frequency on long-term dynamics for N above a
relatively small value, and we thus consider the high-frequency limit of the evolution
equation. The location of α* as a function of a − b as R is varied is plotted in Fig-
ure 13.5. R controls how 'bifurcation-like' the curve is: for R small, α* changes rapidly
near a − b = 0.
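One way to make this concrete is the following illustrative reconstruction (a sketch under stated assumptions, not the chapter's exact equations): suppose a form i example fails to be heard as form i with probability a (form 1) or b (form 2), and a failed example is mis-heard as the other form with probability R, otherwise discarded. With R = 0 this reduces to pure discarding (Model 2, bifurcation at a = b); with R > 0 the fixed point moves continuously, and for small R it changes rapidly near a = b:

```python
def model3_step(a_t, a, b, R):
    """Assumed high-frequency Model 3 update, combining mistransmission
    (probability R of mis-hearing a failed example) with discarding."""
    heard2 = a_t * (1 - b) + (1 - a_t) * a * R   # mass heard as form 2
    heard1 = (1 - a_t) * (1 - a) + a_t * b * R   # mass heard as form 1
    return heard2 / (heard1 + heard2)

def model3_fixed_point(a, b, R, a0=0.5, generations=500):
    """Approximate the fixed point by iterating the map from a0."""
    x = a0
    for _ in range(generations):
        x = model3_step(x, a, b, R)
    return x
```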
13.5 Discussion
We have described a diachronic corpus of N/V pair stress, the dynamics of stress
shifts observed in the corpus, and several proposed factors driving this change: mis-
transmission, word frequency, and analogy. We then determined the population-level,
diachronic dynamics of three models of learning, to explore which models show
bifurcations, i.e. which give stability followed by sudden change as system parameters
are varied. We did not evaluate models with respect to the frequency or analogical
effects observed in the corpus (sections 13.3.2-13.3.3); however, both are considered
in the larger set of models described elsewhere (Sonderegger 2009; Sonderegger and
Niyogi 2010).
Following an idea proposed by Niyogi (2006), we suggest that bifurcations in the
diachronic dynamics of a linguistic population are a possible explanation for the core
of the actuation problem: how and why does language change begin in a community,
following long-term stability? This viewpoint suggests a powerful test of theories of the
causes of language change: do their diachronic dynamics show bifurcations? We found
that mistransmission alone (Model 1) does not give bifurcations, while discarding
alone (Model 2) does: the form more likely to be discarded is eliminated from the
population. Combining mistransmission and discarding (Model 3) eliminates bifur-
cations, but gives more or less bifurcation-like behavior as the relative probability of
mistransmission and discarding is varied.
In line with other computational work on population-level change where several
models are considered (e.g. Liberman 2000; Daland et al. 2007; Baker 2008; Troutman
et al. 2008), the different dynamics of Models 1-3 illustrate that different proposed
causes for change at the individual level, each of which seems plausible a priori, can
have very different population-level diachronic outcomes. Among models tested here,
only those including discarding showed bifurcations (Model 2) or bifurcation-like
behavior (Model 3); the model including only mistransmission (Model 1) did not.
Given the popularity of mistransmission-based explanations of phonological change,
this result illustrates an important point: because of the non-trivial map between
individual learning and population dynamics, population-level models are necessary
to evaluate any theory of why language change occurs.
(that the N/V pairs considered decrease in frequency over time) to be valid, it must be
the case that this set of reference words remains approximately constant in frequency
over time. We checked that these words' frequencies show no time trends in two ways.
First, when normalized by the LION frequency trajectory of one extremely frequent
word (the), whose frequency presumably is approximately constant diachronically, the
sum frequency of the reference words shows no time trend (p > 0.1 for both Pearson
and Spearman correlations). Second, the summed relative frequencies of the set of
reference words (i.e. occurrences per million) show no time trends (p > 0.15, Pear-
son and Spearman) in the Corpus of Historical American English (COHA), which
includes 400 million words from 1810-2000.²² Although COHA covers a different
dialect of English and a somewhat different time period than the N/V pair frequency
trajectories, it is the largest available diachronic corpus of English, and thus provides
some reassurance that the summed frequency of the set of reference words chosen is
not especially volatile.
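The first of these checks can be sketched as follows. The data here are synthetic: `the_counts` and `ref_counts` stand in for the real yearly corpus counts, and a plain Pearson correlation with year stands in for the full Pearson/Spearman trend tests:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def trend_after_normalization(years, ref_counts, the_counts):
    """Correlate the reference words' summed frequency, normalized by
    the trajectory of 'the', against time; |r| near 0 suggests the
    reference set has no diachronic frequency trend."""
    normalized = [rc / tc for rc, tc in zip(ref_counts, the_counts)]
    return pearson(years, normalized)
```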
Arvaniti, Amalia (2007). Greek phonetics: The state of the art. Journal of Greek Linguistics, 8,
97-208.
Ashby, F. Gregory and Maddox, W. Todd (1993). Relations between prototype, exemplar,
and decision bound models of categorization. Journal of Mathematical Psychology, 37,
372-400.
Aslin, Richard N. and Pisoni, David B. (1980). Some developmental processes in speech perception. In Child phonology: Perception (eds. G. Yeni-Komshian, J. Kavanaugh, and C. Ferguson), Volume 2, pp. 67-96. Academic Press, New York.
Hennessy, Beth L., and Perey, Alan J. (1981). Discrimination of voice onset time by human infants: New findings and implications for the effects of early experience. Child Development, 52, 1135-45.
Ausburn, Lynna J. and Ausburn, Floyd B. (1978). Cognitive styles: Some information and implications for instructional design. Educational Communication and Technology, 26(4), 337-54.
Austin, Elizabeth J. (2005). Personality correlates of the broader autism phenotype as assessed
by the autism spectrum quotient (AQ). Personality and Individual Differences, 38, 451-60.
Aylett, Matthew and Turk, Alice (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1), 31-56.
Baayen, R. Harald (2008). Analyzing linguistic data: A practical introduction to statistics. Cam-
bridge University Press, Cambridge.
Piepenbrock, Richard, and Gulikers, Leon (1996). CELEX2 (CD-ROM). Linguistic Data
Consortium, Philadelphia.
Babel, Molly (2009). Phonetic and social selectivity in speech accommodation. PhD thesis, University of California, Berkeley.
(2010). Dialect convergence and divergence in New Zealand English. Language in Soci-
ety, 39(4), 437-56.
and McGuire, Grant (2010). A cross-modal account for synchronic and diachronic patterns of /s/ and /θ/. Unpublished manuscript, University of British Columbia and University of California, Santa Cruz.
Bailey, Anthony, Couteur, Ann Le, Gottesman, Irving, Bolton, Patrick, Simonoff, Emily, Yuzda,
E., and Rutter, Michael (1995). Autism as a strongly genetic disorder: evidence from a British
twin study. Psychological Medicine, 25, 63-77.
Baker, Adam (2008). Addressing the actuation problem with quantitative models of sound
change. Penn Working Papers in Linguistics, 14(1), 1-13.
Baldi, Pierre and Itti, Laurent (2010). Of bits and wows: A Bayesian theory of surprise with
applications to attention. Neural Networks, 23(5), 649-66.
Baran, Jane A., Zlatin Laufer, Marsha, and Daniloff, Ray (1977). Phonological contrastivity in
conversation: a comparative study of voice onset time. Journal of Phonetics, 5, 339-50.
Barnes, Jonathan (2006). Strength and weakness at the interface: Positional neutralization in
phonetics and phonology. Mouton de Gruyter, Berlin.
Baron-Cohen, Simon (2002). The extreme male brain theory of autism. Trends in Cognitive Sciences, 6, 248-54.
(2003). The essential difference: Men, women and the extreme male brain. Penguin, London.
References 287
Baron-Cohen, Simon, Richler, Jennifer, Bisarya, Dheraj, Gurunathan, Nhishanth, and Wheel-
wright, Sally (2003). The Systemising Quotient (SQ): An investigation of adults with
Asperger Syndrome or high functioning autism and normal sex differences. Philosophical
Transactions of the Royal Society, Series B, 358, 361-74.
and Wheelwright, Sally (2004). The Empathy Quotient: An investigation of adults with
Asperger Syndrome or High Functioning Autism and normal sex differences. Journal of
Autism and Developmental Disorders, 34(2), 163-75.
Hill, Jacqueline, Raste, Yogini, and Plumb, Ian (2001a). The Reading the Mind in the Eyes Test revised version: a study with normal adults, and adults with Asperger syndrome or high-functioning autism. Journal of Child Psychology and Psychiatry, 42, 241-51.
Skinner, Richard, Martin, Joanne, and Clubley, Emma (2001b). The Autism-Spectrum Quotient (AQ): evidence from Asperger syndrome/high-functioning autism, males, females, scientists and mathematicians. Journal of Autism and Developmental Disorders, 31, 5-17.
Baudouin de Courtenay, Jan N. (1972a). An attempt at a theory of phonetic alternations [originally published in 1895]. In A Baudouin de Courtenay anthology: The beginnings of structural linguistics (ed. E. Stankiewicz), Indiana University Studies in the History and Theory of Linguistics, pp. 144-212. Indiana University Press, Bloomington. Edited and translated by E. Stankiewicz (1972).
(1972b). The difference between phonetics and psychophonetics [originally published in 1927]. In A Baudouin de Courtenay anthology: The beginnings of structural linguistics (ed. E. Stankiewicz). Indiana University Press.
Baumbach, Ernst J. M. (1987). Analytical Tsonga grammar. University of South Africa, Pretoria.
Bayliss, Andrew P. and Tipper, Steven P. (2005). Gaze and arrow cueing of attention reveals individual differences along the autism spectrum as a function of target context. British Journal of Psychology, 96, 95-114.
Beckman, Jill (1997). Positional faithfulness, positional neutralization, and Shona vowel harmony. Phonology, 14, 1-46.
Beddor, Patrice S. (2009). A coarticulatory path to sound change. Language, 85(4), 785-832.
Brasher, Anthony, and Narayan, Chandan (2007). Applying perceptual methods to the study of phonetic variation and sound change. In Experimental approaches to phonology (eds. M.-J. Solé, P. S. Beddor, and M. Ohala), pp. 125-43. Oxford University Press.
Harnsberger, James D., and Lindemann, Stephanie (2002). Language-specific patterns of
vowel-to-vowel coarticulation: acoustic structures and their perceptual correlates. Journal of
Phonetics, 30, 591-627.
and Krakow, Rena A. (1999). Perception of coarticulatory nasalization by speakers of
English and Thai: Evidence for partial compensation. Journal of the Acoustical Society of
America, 106(5), 2868-87.
and Goldstein, Louis (1986). Perceptual constraints and phonological change: a study of nasal vowel height. In Phonology yearbook 3 (eds. C. Ewen and J. Anderson), pp. 197-217. Cambridge University Press.
and Lindemann, Stephanie (2001). Patterns of perceptual compensation and their
phonological consequences. In The role of perceptual phenomena in phonology (eds. E. Hume
and K. Johnson), pp. 55-78. Academic Press.
Behrens, Susan and Blumstein, Sheila (1988). On the role of amplitude of the fricative noise in the perception of place of articulation in voiceless fricative consonants. Journal of the Acoustical Society of America, 84(3), 861-7.
Bell, Allan (1984). Language style as audience design. Language in Society, 13, 145-204.
Bell, Alan, Brenier, Jason M., Gregory, Michelle, Girand, Cynthia, and Jurafsky, Dan (2009).
Predictability effects on durations of content and function words in conversational English.
Journal of Memory and Language, 60, 92-111.
Bell-Berti, Fredericka and Harris, Katherine (1976). Some aspects of coarticulation. Haskins Laboratories Status Report on Speech Research, SR45/46, 197-204.
Berg, Thomas (1998). Linguistic structure and change: An explanation from language processing.
Oxford University Press, Oxford.
Bergem, Dick R. Van (1993). Acoustic vowel reduction as a function of sentence accent, word
stress, and word class. Speech Communication, 12, 1-23.
Bernstein Ratner, N. (1984). Patterns of vowel modification in mother-child speech. Journal of
Child Language, 11, 557-78.
Bessell, Nicola J. (1998). Local and non-local consonant-vowel interaction in Interior Salish.
Phonology, 15, 1-40.
Bladon, Richard A. W. and Al-Bamerni, Ameen (1976). Coarticulation resistance in English /l/. Journal of Phonetics, 4, 137-50.
Blevins, Juliette (2004). Evolutionary phonology: The emergence of sound patterns. Cambridge
University Press, Cambridge.
(2005). Understanding antigemination: Natural or unnatural history. In Linguistic diver-
sity and language theories (eds. Z. Frajzyngier, D. Rood, and A. Hodges), pp. 203-34.
Benjamins, Amsterdam.
(2006a). A theoretical synopsis of Evolutionary Phonology. Theoretical Linguistics, 32(2), 117-66.
(2006b). Reply to commentaries. Theoretical Linguistics, 32, 245-56.
(2008a). Consonant epenthesis: Natural and unnatural histories. In Proceedings of the Workshop on Explaining Linguistic Universals (ed. J. Good), pp. 79-107. Oxford University Press, Oxford.
(2008b). Natural and unnatural sound patterns: A pocket field guide. In Naturalness and iconicity in language (eds. K. Willems and L. D. Cuypere), pp. 121-48. John Benjamins, Amsterdam.
and Garrett, Andrew (1998). The origins of consonant-vowel metathesis. Language, 74,
508-56.
(2004). The evolution of metathesis. In Phonetically based phonology (eds. B. Hayes, R. Kirchner, and D. Steriade), pp. 117-56. Cambridge University Press, Cambridge.
and Wedel, Andrew (2009). Inhibited sound change: An evolutionary approach to lexical competition. Diachronica, 26, 143-83.
Bloomfield, Leonard (1933). Language. H. Holt and Company, New York.
Blumstein, Sheila E., Baker, Errol, and Goodglass, Harold (1977). Phonological factors in auditory comprehension in aphasia. Neuropsychologia, 15, 19-30.
Bod, Rens, Hay, Jennifer, and Jannedy, Stephanie (2003). Probabilistic linguistics. MIT Press.
Browman, Catherine P. and Goldstein, Louis (1988). Some notes on syllable structure in articulatory phonology. Phonetica, 45, 140-55.
(1990a). Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics, 18, 299-320.
(1990b). Tiers in articulatory phonology, with some implications for casual speech. In Papers in laboratory phonology I: Between the grammar and the physics of speech (eds. M. Beckman and J. Kingston), pp. 341-76. Cambridge University Press, Cambridge.
Bullock, Daniel (2004). Adaptive neural models of queuing and timing in fluent action. Trends
in Cognitive Sciences, 8(9), 426-33.
and Rhodes, Bradley J. (2003). Competitive queuing for serial planning and performance.
In Handbook of brain theory and neural networks (ed. M. Arbib), pp. 241-4. MIT Press,
Cambridge, MA.
Burton, Martha W., Small, Steven L., and Blumstein, Sheila E. (2000). The role of segmentation in phonological processing: An fMRI investigation. Journal of Cognitive Neuroscience, 12, 679-90.
Butskhrikidze, Marika and van de Weijer, Jeroen (2001). On v-metathesis in modern Georgian. In Surface syllable structure and segment sequencing, pp. 91-101. Holland Institute of Linguistics.
Bybee, Joan (1985). Morphology: a study of the relation between meaning and form. John Ben-
jamins, Amsterdam.
(2001). Phonology and language use. Cambridge University Press, Cambridge.
(2002). Word frequency and context of use in the lexical diffusion of phonetically condi-
tioned sound change. Language Variation and Change, 14, 261-90.
(2007). Frequency of use and the organization of language. Oxford University Press, New
York.
Chakraborti, Paromita, Jung, Dagmar, and Scheibman, Joanne (1998). Prosody and segmental effect: Some paths of evolution for word stress. Studies in Language, 22, 267-314.
and Hopper, Paul (eds.) (2001). Frequency and the emergence of linguistic structure. John
Benjamins, Amsterdam.
Bye, Patrik (2011). Dissimilation. In The Blackwell companion to phonology (eds. M. van Oos-
tendorp, C. J. Ewen, E. Hume, and K. Rice), Chapter 63, pp. 1408-33. Wiley-Blackwell,
Oxford.
Byrd, Dani (1994). Articulatory timing in English consonant sequences. Volume 86, Working
Papers in Phonetics. Department of Linguistics, UCLA, Los Angeles.
and Saltzman, Elliot (1998). Intragestural dynamics of multiple prosodic boundaries. Journal of Phonetics, 26, 173-99.
Caïtucoli, Claude (1978). Schèmes tonals et morphologie du verbe en masa. In Préalables à la reconstruction du proto-tchadique (eds. J.-P. Caprile and H. Jungraithmayr), pp. 67-93. SELAF, Paris.
Camacho, Arturo (2007). SWIPE': A sawtooth waveform inspired pitch estimator for speech and
music. PhD thesis, University of Florida.
Campbell, Lyle (2004). Historical linguistics: An introduction (2nd edn). MIT Press, Cambridge,
Mass.
Choudhury, Monojit (2007). Computational models of real world phonological change. PhD
thesis, Indian Institute of Technology, Kharagpur, India.
Christophe, Anne, Peperkamp, Sharon, Pallier, Christophe, Block, Eliza, and Mehler, Jacques
(2004). Phonological phrase boundaries constrain lexical access: I. Adult data. Journal of
Memory and Language, 51, 523-47.
Clark, Herbert H. and Murphy, George L. (1982). Audience design in meaning and reference. In Language and comprehension (eds. J. F. Le Ny and W. Kintsch), Vol. 9, pp. 287-99. North-Holland, Amsterdam.
Clayards, Meghan (2008). The ideal listener: Making optimal use of acoustic-phonetic cues for
word recognition. PhD thesis, University of Rochester.
Tanenhaus, Michael K., Aslin, Richard, and Jacobs, Robert A. (2008). Perception of speech
reflects optimal use of probabilistic speech cues. Cognition, 108, 804-9.
Clements, George N. (1985). The geometry of phonological features. Phonology yearbook, 2,
225-52.
(2005). Universal trends vs. language-particular variation in feature specification: Com-
ments on a paper by Elan Dresher. Handout of presentation at the Workshop on Phonolog-
ical Features, CUNY, New York, March 10-11, 2005.
and Hume, Elizabeth V. (1995). The internal organization of speech sounds. See Gold-
smith (1995), pp. 245-306.
Clopper, Cynthia G. and Pisoni, David B. (2004). Some acoustic cues for the perceptual cate-
gorization of American English regional dialects. Journal of Phonetics, 32(1), 111-40.
Cohn, Abigail C. (1992). The consequences of dissimilation in Sundanese. Phonology, 9,
199-220.
(1993). Nasalisation in English: phonology or phonetics. Phonology, 10, 43-81.
(1998). The phonetics-phonology interface revisited: where's phonetics? Texas Linguistic
Forum, 41, 25-40.
(2006). Is there gradient phonology? In Gradience in grammar: generative perspectives (eds. G. Fanselow, C. Féry, and M. Schlesewsky), pp. 25-44. Oxford University Press.
(2007). Phonetics in phonology and phonology in phonetics. Working Papers of the
Cornell Phonetics Lab, 16, 1-31.
and Riehl, Anastasia (2008). The internal structure of nasal-stop sequences: Evidence
from Austronesian. Paper presented at Laboratory Phonology 11, post-conference draft,
August 22, 2008.
Coleman, John and Pierrehumbert, Janet B. (1997). Stochastic phonological grammars and
acceptability. In Proceedings of the 3rd Meeting of the ACL Special Interest Group in Com-
putational Phonology, pp. 49-56. Association for Computational Linguistics.
Cooper, Robin P. and Aslin, Richard N. (1989). The language environment of the young
infant: implications for early perceptual development. Canadian Journal of Psychology, 43,
247-65.
Court, Christopher (1970). Nasal harmony and some Indonesian sound laws. In Pacific Lin-
guistics, Series C, No. 13 (eds. S. Wurm and D. Laycock). Australian National University,
Canberra.
Cover, Thomas and Thomas, Joy (2006). Elements of information theory (2nd edn). Wiley-Interscience, New York.
Crewther, David, Crewther, Daniel, Ashton, Melanie, and Kuang, Ada (2010). Left global visual hemineglect in high Autism-spectrum Quotient (AQ) individuals. Journal of Vision, 10, 358.
Croft, William (1990). Typology and universals, Chapter 3. Markedness in typology, pp. 64-94.
Cambridge University Press, Cambridge.
Crosswhite, Katherine M. (2004). Vowel reduction. In Phonetically based phonology (eds. B. Hayes, R. Kirchner, and D. Steriade), pp. 191-231. Cambridge University Press, Cambridge.
Crowley, Terry and Bowern, Claire (2009). Introduction to historical linguistics. Oxford Univer-
sity Press, Oxford.
Culicover, Peter W. and Nowak, Andrzej (2002). Markedness, antisymmetry and complexity of
constructions. In Variation yearbook, pp. 5-30. John Benjamins, Amsterdam.
(2003). Dynamical grammar: Minimalism, acquisition, and change. Oxford University Press, Oxford.
Cutler, Anne and Norris, Dennis (1979). Monitoring sentence comprehension. In Psycholin-
guistic studies presented to Merrill Garrett (eds. W. E. Cooper and E. C. T. Walker),
pp. 113-34. Erlbaum, New Jersey.
Daland, Robert, Sims, Andrea D., and Pierrehumbert, Janet B. (2007). Much ado about nothing: A social network model of Russian paradigmatic gaps. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 936-43. Association for Computational Linguistics, Prague, Czech Republic.
Dalston, Rodger M. (1975). Acoustic characteristics of English /w,r,l/ spoken correctly by young
children and adults. Journal of the Acoustical Society of America, 57(2), 462-9.
Daneman, Meredyth and Carpenter, Patricia A. (1983). Individual differences in integrating
information between and within sentences. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 9, 561-84.
D'Ausilio, Alessandro, Pulvermüller, Friedemann, Salmas, Paola, Bufalari, Ilaria, Begliomini, Chiara, and Fadiga, Luciano (2009). The motor somatotopy of speech perception. Current Biology, 19, 381-5.
Davidson, Lisa (2005). Addressing phonological questions with ultrasound. Clinical Linguistics
and Phonetics, 19, 619-33.
(2006a). Phonology, phonetics, or frequency: Influences on the production of non-native sequences. Journal of Phonetics, 34(1), 104-37.
(2006b). Phonotactics and articulatory coordination interact in phonology: evidence from non-native production. Cognitive Science, 30(5), 837-62.
(2007). The relationship between the perception of non-native phonotactics and loanword
adaptation. Phonology, 24, 261-86.
(2011). Phonetic, phonemic, and phonological factors in cross-language discrimination
of phonotactic contrasts. Journal of Experimental Psychology: Human Perception and Perfor-
mance, 37(1), 270-82.
Davis, Colin J. and Perea, Manuel (2005). Buscapalabras: A program for deriving orthographic
and phonological neighborhood statistics and other psycholinguistic indices in Spanish.
Behavior Research Methods, 37(4), 665-71.
de Boer, Bart (2000). Self-organization in vowel systems. Journal of Phonetics, 28, 441-65.
(2001). The origins of vowel systems. Oxford University Press, Oxford.
and Kuhl, Patricia (2003). Investigating the role of infant-directed speech with a computer model. Acoustics Research Letters On-line, 4(4), 129-34.
Delattre, Pierre (1969). An acoustic and articulatory study of vowel reduction in four languages. International Review of Applied Linguistics and Language Teaching, VII, 295-325.
Dell, Gary S. (1986). A spreading-activation theory of retrieval in sentence production. Psycho-
logical Review, 93, 283-321.
(1990). Effects of frequency and vocabulary type on phonological speech errors. Language
and Cognitive Processes, 5(4), 313-49.
Demuth, Katherine and Johnson, Mark (2003). Truncation to subminimal words in early
French. Canadian Journal of Linguistics, 48(3/4), 211-41.
Denis, Derek (2010). Passive diagnostics of contrast. Presented at Montreal-Ottawa-Toronto
Phonology Workshop, Carleton University, Ottawa, Ontario, March 2010.
Díaz, Begoña, Baus, Cristina, Escera, Carles, Costa, Albert, and Sebastián-Gallés, Núria (2008). Brain potentials to native phoneme discrimination reveal the origin of individual differences in learning the sounds of a second language. Proceedings of the National Academy of Sciences, 105(42), 16083-8.
Diehl, Randy L. (2008). Acoustic and auditory phonetics: The adaptive design of speech sound
systems. Philosophical Transactions of the Royal Society, 363, 965-78.
Dieth, Eugen (1932). A grammar of the Buchan dialect (Aberdeenshire), descriptive and historical. W. Heffer and Sons, Cambridge.
Dijksterhuis, Ap and Bargh, John A. (2001). The perception-behavior expressway: Automatic
effects of social perception on social behavior. In Advances in experimental social psychology
(ed. M. P. Zanna), Volume 33, pp. 1-40. Academic Press, San Diego.
Dimmendaal, Gerrit J. (1983). The Turkana language. Foris Publications, Dordrecht.
Dimov, Svetlin, Katseff, Shira, and Johnson, Keith (in press). Social and personality variables
in compensation for altered auditory feedback. In The initiation of sound change: Percep-
tion, production, and social factors (eds. M. J. S. Sabater and D. Recasens). John Benjamins,
Amsterdam.
Donegan, Patricia and Stampe, David (1979). The study of natural phonology. In Current
approaches to phonological theory (ed. D. Dinnsen), pp. 126-73. Indiana University Press,
Bloomington.
Downing, Laura J. (2009). On pitch lowering not linked to voicing: Nguni and Shona group
depressors. Language Sciences, 31(2-3), 179-98.
Doyle, Melanie and Walker, Robin (2001). Curved saccade trajectories: Voluntary and reflexive saccades curve away from irrelevant distractors. Experimental Brain Research, 139, 333-44.
Dras, Mark and Harrison, K. David (2002). Emergent behavior in phonological pattern change. In Artificial Life VIII (eds. R. K. Standish, M. A. Bedau, and H. A. Abass), pp. 390-3. Oxford University Press, Oxford.
Dresher, Elan (2003). Contrast and asymmetry in inventories. In Asymmetry in gram-
mar: Morphology, phonology, acquisition (ed. A. di Sciullo), pp. 237-59. John Benjamins,
Amsterdam.
Fant, Gunnar (1960). Acoustical theory of speech production. Mouton, The Hague.
Feather, Norman T. (1982). Expectations and actions: Expectancy-value models in psychology.
Lawrence Erlbaum, Hillsdale, New Jersey.
Feldman, Naomi H., Griffiths, Thomas L., and Morgan, James L. (2009). Learning phonetic categories by learning a lexicon. In Proceedings of the 31st Annual Conference of the Cognitive Science Society (eds. N. Taatgen and H. van Rijn), pp. 2208-13. Cognitive Science Society, Austin, TX.
Ferguson, Charles A. (1973). Fricatives in child language acquisition. Papers and Reports on
Child Language Development, 6, 61-85.
Fernald, Anne (1992). Human maternal vocalizations to infants as biologically relevant signals: An evolutionary perspective. In The adapted mind (eds. J. Barkow, L. Cosmides, and J. Tooby), pp. 391-428. Oxford University Press, New York.
Taeschner, T., Dunn, J., Papousek, M., de Boysson-Bardies, B., and Fukui, I. (1989). A cross-language study of prosodic modifications in mothers' and fathers' speech to preverbal infants. Journal of Child Language, 16(3), 477-501.
Fidelholtz, James L. (1975). Word frequency and vowel reduction in English. CLS, 11, 200-13.
Finley, Sara (2008). Formal and cognitive restrictions on vowel harmony. PhD thesis, Johns
Hopkins University.
Flemming, Edward (1996). Evidence for constraints on contrast: the dispersion theory of contrast. UCLA Working Papers in Phonology, 1, 86-106.
(2001). Scalar and categorical phenomena in a unified model of phonetics and phonology.
Phonology, 18, 7-44.
(2002). Auditory representations in phonology. Routledge, New York.
(2004). Contrast and perceptual distinctiveness. In The phonetic bases of markedness (eds. B. Hayes, R. Kirchner, and D. Steriade). Cambridge University Press, Cambridge.
Fletcher, Janet (2004). An EMA/EPG study of vowel-to-vowel articulation across velars in
Southern British English. Clinical Linguistics and Phonetics, 18(6), 577-92.
Flynn, Darin and Fulop, Sean (2008). Dentals are grave. Unpublished manuscript, University
of Calgary and California State University, Fresno.
Fontaney, Louise (1980). Le verbe. In Éléments de description du punu (ed. F. Nsuka-Nkutsi), pp. 51-114. CRLS, Université Lyon II.
Fosler-Lussier, Eric and Morgan, Nelson (1999). Effects of speaking rate and word frequency on pronunciations in conversational speech. Speech Communication, 29(2-4), 137-58.
Foulkes, Paul, Docherty, Gerry, and Watt, Dominic (2005). Phonological variation in child-
directed speech. Language, 81(1), 177-206.
Fowler, Carol A. (1981). A relationship between coarticulation and compensatory shortening.
Phonetica, 38, 35-50.
Francis, Alexander L. and Nusbaum, Howard C. (2002). Selective attention and the acquisi-
tion of new phonetic categories. Journal of Experimental Psychology: Human Perception and
Performance, 28(2), 349-66.
Frank, Austin F. and Jaeger, T. Florian (2008). Speaking rationally: Uniform information density as an optimal strategy for language production. In Proceedings of the 30th Annual Meeting
References 297
of the Cognitive Science Society (eds. B. C. Love, K. McRae, and V. M. Sloutsky), pp. 933-8.
Cognitive Science Society, Austin, TX.
Frazier, Melissa (2005). Output-output faithfulness to moraic structure: evidence from Ameri-
can English. In North East Linguistics Conference, U Mass, Amherst (eds. C. Davis, A. R. Deal,
and Y. Zabbal), pp. 1-14. GLSA, Amherst, Mass.
Frisch, Stefan A. (2004). Language processing and segmental OCP effects. In Phonetically based
phonology (eds. B. Hayes, R. Kirchner, and D. Steriade), pp. 346-71. Cambridge University
Press, Cambridge.
Pierrehumbert, Janet B., and Broe, Michael B. (2004). Similarity avoidance and the OCP.
Natural Language and Linguistic Theory, 22, 179-228.
Fromkin, Victoria A. (1971). The non-anomalous nature of anomalous utterances. Language, 47, 27-52.
(ed.) (1973). Speech errors as linguistic evidence. Mouton, The Hague.
(2000). Fromkin's speech error database. Online database, Max Planck Institute
for Psycholinguistics, Nijmegen (http://www.mpi.nl/resources/data/fromkins-speech-
error-database/).
Gahl, Susanne (2008). Time and thyme are not homophones: The effect of lemma frequency on
word durations in spontaneous speech. Language, 84, 474-96.
Gaissmaier, Wolfgang (2008). The smart potential behind probability matching. Cognition, 109,
416-22.
Galantucci, Bruno (2005). An experimental study of the emergence of human communication
systems. Cognitive Science, 29, 737-67.
Fowler, Carol A., and Goldstein, Louis (2009). Perceptuomotor compatibility effects in
speech. Attention, Perception, and Psychophysics, 71(5), 1138-49.
and Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13, 361-77.
Galinsky, Adam D., Magee, Joe C., Inesi, M. Ena, and Gruenfeld, Deborah H. (2006). Power
and perspectives not taken. Psychological Science, 17, 1068-74.
Gallagher, Gillian (2010). The perceptual basis of long-distance laryngeal restrictions. PhD thesis,
MIT.
Gallese, Vittorio, Fadiga, Luciano, Fogassi, Leonardo, and Rizzolatti, Giacomo (1996). Action
recognition in the premotor cortex. Brain, 119, 593-609.
Gandour, Jack, Petty, Soranee H., Dardarananda, Rochana, Dechongkit, Sumalee, and Mukn-
goen, Sunee (1986). The acquisition of the voicing contrast in Thai: A study of voice onset
time in word-initial stop consonants. Journal of Child Language, 13, 561-72.
Potisuk, Siripong, and Dechongkit, Sumalee (1994). Tonal coarticulation in Thai. Journal
of Phonetics, 22(4), 477-92.
Garnica, Olga Kaunoff (1977). Some prosodic and paralinguistic features of speech to young
children. In Talking to children (eds. C. Snow and C. Ferguson), pp. 63-88. Cambridge
University Press, Cambridge.
Garrett, Andrew and Blevins, Juliette (2009). Analogical morphophonology. In The nature of
the word: Studies in honor of Paul Kiparsky (eds. K. Hanson and S. Inkelas), pp. 527-45. MIT
Press, Cambridge, Mass.
Goldstein, Louis and Fowler, Carol (2003). Articulatory phonology: a phonology for public lan-
guage use. In Phonetics and phonology in language comprehension and production: Differences
and similarities (ed. A. Meyers and N. Schiller), pp. 159-207. Mouton de Gruyter, Berlin.
Gordon, Peter and Alegre, Maria (1999). Is there a dual system for regular inflections? Brain
and Language, 68, 212-17.
Gorman, Kyle (2009). Hierarchical regression modeling for language research. Technical
report, Institute for Research in Cognitive Science, University of Pennsylvania.
Goto, Hiromu (1971). Auditory perception by normal Japanese adults of the sounds 'L' and 'R'.
Neuropsychologia, 9, 317-23.
Goudaillier, Jean-Pierre (1987). Einige Spracheigentümlichkeiten der Lëtzebuergeschen
Mundarten im Licht der instrumentellen Phonetik. In Aspekte des Lëtzebuergeschen (ed. J.-P.
Goudaillier), pp. 197-230. Buske Verlag, Hamburg.
Grammont, Maurice (1895). La dissimilation consonantique dans les langues indo-européennes
et dans les langues romanes: Thèse présentée à la Faculté des Lettres de Paris. Darantière, Dijon.
(1933). Traité de phonétique. Delagrave, Paris.
(1939). Traité de phonétique (2nd edn.). Delagrave, Paris.
Green, David M. and Swets, John A. (1966). Signal detection theory and psychophysics. Wiley,
New York.
Greenberg, Joseph H. (1966). Language universals, with special reference to feature hierarchies.
Mouton de Gruyter, Berlin.
Greenlee, Mel and Ohala, John J. (1980). Phonetically motivated parallels between child
phonology and historical sound change. Language Sciences, 2(2), 283-308.
Grimes, Barbara F., Grimes, Joseph E., and Pittman, Richard S. (eds.) (2000). Ethnologue:
Languages of the world, 14th Edition. Summer Institute of Linguistics, Dallas, TX.
Grinter, Emma J., Maybery, Murray T., Van Peek, Pia L., Pellicano, Elizabeth, Badcock,
Johanna C., and Badcock, David R. (2009). Global visual processing and self-rated autistic-
like traits. Journal of Autism and Developmental Disorders, 39, 1278-90.
Grossberg, Stephen (1978). A theory of human memory: Self-organization and performance
of sensory-motor codes, maps, and plans. In Progress in theoretical biology (eds. R. Rosen
and F. Snell), Volume 5, pp. 233-374. Academic Press, New York.
(2003). Resonant neural dynamics of speech perception. Journal of Phonetics, 31(3-4),
423-45.
Grosvald, Michael (2009). Interspeaker variation in the extent and perception of long-distance
vowel-to-vowel coarticulation. Journal of Phonetics, 37(2), 173-88.
Guion, Susan G. (1995). Word frequency effects among homonyms. In Texas Linguistic Forum
35: Papers in Phonetics and Phonology (eds. T. C. Carleton, J. Elorrieta, and M. J. Moosally),
pp. 103-15. Department of Linguistics, University of Texas at Austin, Austin.
(1998). The role of perception in the sound change of velar palatalization. Phonetica,
55, 18-52.
Clark, J. J., Harada, Tetsuo, and Wayland, Ratree P. (2003). Factors affecting stress place-
ment for English nonwords include syllabic structure, lexical class, and stress patterns of
phonologically similar words. Language and Speech, 46(4), 403-27.
Gussenhoven, Carlos (2004). The phonology of tone and intonation. Cambridge University Press,
Cambridge.
Harries-Delisle, Helga (1978). Contrastive emphasis and cleft sentences. In Universals of human
language, Vol. 4: Syntax (ed. J. H. Greenberg), pp. 419-86. Stanford University Press.
Harrington, Jonathan, Kleber, Felicitas, and Reubold, Ulrich (2008). Compensation for coar-
ticulation, /u/-fronting, and sound change in standard southern British: An acoustic and
perceptual study. Journal of the Acoustical Society of America, 123(5), 2825-35.
Harris, John (1985). Phonological variation and change: Studies in Hiberno-English. Cambridge
University Press, New York.
Hasegawa, Yoko (1999). Pitch accent and vowel devoicing in Japanese. In Proceedings of the
XIVth International Congress of Phonetic Sciences, San Francisco, 1-7 August 1999 (eds.
J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, and A. C. Bailey), pp. 523-6. ICPhS.
Hawkins, Sarah (2003). Roles and representations of systematic fine phonetic detail in speech
understanding. Journal of Phonetics, 31, 373-405.
Hay, Jennifer (2003). Causes and consequences of word structure. Routledge, New York and
London.
and Sudbury, Andrea (2005). How rhoticity became /r/-sandhi. Language, 81, 799-823.
Hayes, Bruce, Kirchner, Robert, and Steriade, Donca (2004). Phonetically based phonology.
Cambridge University Press, Cambridge.
and Londe, Zsuzsa C. (2006). Stochastic phonological knowledge: the case of Hungarian
vowel harmony. Phonology, 23(1), 59-104.
and Wilson, Colin (2008). A maximum entropy model of phonotactics and phonotactic
learning. Linguistic Inquiry, 39(3), 379-440.
Hayward, Richard J. (1990). Notes on the Aari language. In Omotic language studies (ed.
R. J. Hayward), pp. 425-93. School of Oriental and African Studies, University of London.
Hedrick, M. and Ohde, R. N. (1993). Effect of relative amplitude of frication on perception of
place of articulation. Journal of the Acoustical Society of America, 94(4), 2006-26.
Heijmans, Linda (2003). The relationship between tone and vowel length in two neighbor-
ing Dutch Limburgian dialects. In Development in prosodic systems (eds. P. Fikkert and
H. Jacobs), pp. 7-45. Mouton de Gruyter, New York.
Heike, Georg (1972). Quantitative und qualitative Differenzen von /a(:)/-Realisationen im
Deutschen. In Proceedings of the VIIth International Congress of Phonetic Sciences, Prague,
pp. 725-9.
Heine, Bernd, Claudi, Ulrike, and Hünnemeyer, Friederike (1991). Grammaticalization: a con-
ceptual framework. University of Chicago Press.
Henton, Caroline and Bladon, Anthony (1988). Creak as a sociophonetic marker. In Language,
speech and mind (eds. L. M. Hyman and C. N. Li), pp. 3-29. Routledge, London and New
York.
Herzog, Eugen (1904). Streitfragen der romanischen Philologie. M. Niemeyer, Halle.
Hewitt, B. George (1995). Georgian: A structural reference grammar. John Benjamins, Amster-
dam and Philadelphia.
Hickok, Gregory and Poeppel, David (2004). Dorsal and ventral streams: A framework for
understanding aspects of the functional anatomy of language. Cognition, 92, 67-99.
Hillenbrand, James M., Clark, M. J., and Nearey, Terence M. (2001). Effects of consonantal
environment on vowel formant patterns. Journal of the Acoustical Society of America, 109,
748-63.
Hirata, Yukari and Tsukada, Kimiko (2003). The effects of speaking rate and vowel length on
the formant movements in Japanese. In Proceedings of the 2003 Texas Linguistics Society
Conference: Coarticulation in Speech Production and Perception (eds. A. Agwuele, W. Warren,
and S.-H. Park), Somerville, pp. 73-85. Cascadilla Proceedings Project.
Hitchcock, Clara (1903). The psychology of expectation. The Psychological Review, 5(3), 1-78.
Hock, Hans Henrich (1991). Principles of historical linguistics (2nd edn). Mouton de Gruyter,
Berlin.
and Joseph, Brian D. (1996). Language history, language change, and language rela-
tionship: An introduction to historical and comparative linguistics. Mouton de Gruyter,
Berlin.
Hockett, Charles F. (1955). A manual of phonology. International Journal of American Linguis-
tics, memoir 11.
(1965). Sound change. Language, 41, 185-202.
Holmberg, Tristan L., Morgan, Kathleen A., and Kuhl, Patricia K. (1977). Speech perception in
early infancy: Discrimination of fricative consonants. Presented at the 94th Meeting of the
Acoustical Society of America.
Holt, Lori L. and Lotto, Andrew J. (2006). Cue weighting in auditory categorization: Implica-
tions for first and second language acquisition. Journal of the Acoustical Society of America,
119, 3059-71.
and Kluender, Keith (2000). Neighboring spectral context influences vowel identifi-
cation. Journal of the Acoustical Society of America, 108(2), 710-22.
Hombert, Jean-Marie (1977). Development of tones from vowel height. Journal of Phonetics 5,
9-16.
(1978). Consonant types, vowel quality, and tone. In Tone: a linguistic survey (ed. V. A.
Fromkin), pp. 77-111. Academic Press, New York.
Ohala, John J., and Ewan, William G. (1979). Phonetic explanations for the development
of tones. Language, 55, 37-58.
Hooper, Joan B. (1976a). Introduction to natural generative phonology. Academic Press, New
York.
(1976b). Word frequency in lexical diffusion and the source of morphophonological
change. In Current progress in historical linguistics (ed. W. Christie), pp. 95-105, North-
Holland, Amsterdam.
Hopper, Paul (1987). Emergent grammar. Berkeley Linguistics Society, 13, 139-57.
Hosmer, David W. and Lemeshow, Stanley (1989). Applied logistic regression. John Wiley and
Sons, New York.
Houde, John F. and Jordan, Michael I. (1998). Sensorimotor adaptation in speech production.
Science, 279, 1213-16.
Houghton, George and Tipper, Steven (1996). Inhibitory mechanisms of neural and cognitive
control: Applications to selective attention and sequential action. Brain and Cognition, 30,
20-43.
House, Arthur S. (1961). On vowel duration in English. Journal of the Acoustical Society of
America, 33, 1174-8.
Howe, Darin and Fulop, Sean (2005). Acoustic features in Athabascan. Unpublished
manuscript, University of Calgary and California State University, Fresno.
Hruschka, Daniel, Christiansen, Morten, Blythe, Richard, Croft, William, Heggarty, Paul,
Mufwene, Salikoko, Pierrehumbert, Janet B., and Poplack, Shana (2009). Building social
cognitive models of language change. Trends in Cognitive Sciences, 13, 464-9.
Hua, Zhu and Dodd, Barbara (2000). The phonological acquisition of Putonghua (Modern
Standard Chinese). Journal of Child Language, 27(1), 3-42.
Huang, Hui-Chun (2007). Lexical context effects on speech perception in Chinese people with
autistic traits. Master's thesis, University of Edinburgh.
Hume, Elizabeth (2004a). Deconstructing markedness: A predictability-based approach. In
Proceedings of the Berkeley Linguistic Society 13, pp. 182-98.
(2004b). The indeterminacy/attestation model of metathesis. Language, 80, 203-37.
(2006). Language specific and universal markedness: An information-theoretic approach.
Paper presented at the 80th Linguistic Society of America Annual Meeting, Symposium on
Information Theory and Phonology, Albuquerque.
(2008). Markedness and the language user. Phonological Studies, 11, 295-310.
and Bromberg, Ilana (2005). Predicting epenthesis: An information-theoretic account.
Paper presented at the 7th Annual Meeting of the French Network of Phonology, Aix-en-
Provence.
and Johnson, Keith (2001a). A model of the interplay of speech perception and phonology.
In The role of perception in phonology (eds. E. Hume and K. Johnson), pp. 3-26. Academic
Press, New York.
(2001b). The role of speech perception in phonology. Academic Press, New York.
Mailhot, Frédéric, Wedel, Andrew, Hall, Katherine C., Kim, D., Ussishkin, Adam, Adda-Decker, Martine, Gendrot, Cédric, and Fougeron, Cécile (2011). Anti-markedness patterns
in French declension and epenthesis: an information-theoretic account. In Proceedings of
the 37th Annual Meeting of the Berkeley Linguistics Society. Berkeley, CA.
and Odden, David (1996). Reconsidering [consonantal]. Phonology, 13, 345-76.
Huron, David (2006). Sweet anticipation: Music and the psychology of expectation. MIT
Press.
Huttenlocher, P. R. (2002). Neural plasticity: The effects of environment on the development of
the cerebral cortex. Harvard University Press.
Hyman, Larry M. (1972). Nasals and nasalization in Kwa. Studies in African Linguistics, 4,
167-206.
(1973). The role of consonant types in natural tonal assimilations. In Consonant types
and tone (ed. L. M. Hyman), Southern California Occasional Papers in Linguistics 1,
pp. 151-79. University of Southern California, Los Angeles.
(1975). Phonology: theory and analysis. Rinehart and Winston, New York.
(1976). Phonologization. In Linguistic studies presented to Joseph H. Greenberg (ed. A. Juil-
land), pp. 407-18. Anna Libri, Saratoga, Calif.
(1981). Noni grammatical structure, with special reference to verb morphology. Department
of Linguistics, University of Southern California, Los Angeles.
(1984). Form and substance in language universals. In Explanation of language universals
(eds. B. Butterworth, B. Comrie, and O. Dahl), pp. 67-85. Stanford University Press.
(1988). The phonology of final glottal stops. In Proceedings of W.E.C.O.L. 1988,
pp. 113-30. CSU, Fresno.
Hyman, Larry M. (2002). Is there a right-to-left bias in vowel harmony? Paper presented at
9th International Phonology Meeting, Vienna, Nov. 1, 2002. To appear in John R. Rennison,
Friedrich Neubarth, and Markus A. Pöchtrager (eds.), Phonologica 2002. Berlin: Mouton.
(2003). 'Abstract' vowel harmony in Kàlɔ̀ŋ: A system-driven account. In Typologie
des langues d'Afrique et universaux de la grammaire (eds. P. Sauzet and A. Zribi-Hertz),
pp. 85-112. L'Harmattan, Paris.
(2008a). Directional asymmetries in the morphology and phonology of words, with spe-
cial reference to Bantu. Linguistics, 46, 309-49.
(2008b). Universals in phonology. The Linguistic Review, 25, 83-137.
(2010a). Affixation by place of articulation: the case of Tiene. In Rara and rarissima:
Collecting and interpreting unusual characteristics of human languages (eds. M. Cysouw and
J. Wohlgemuth), pp. 145-84. Mouton de Gruyter, Berlin and New York.
(2010b). Focus marking in Aghem. In Information structure in African languages: Typolog-
ical studies in language (TSL) (eds. I. Fiedler and A. Schwartz), pp. 95-116. John Benjamins,
Amsterdam and Philadelphia.
and Katamba, Francis X. (1990). Final vowel shortening in Luganda. Studies in African
Linguistics, 21, 1-59.
and Mathangwane, Joyce (1998). Tonal domains and depressor consonants in
Ikalanga. In Theoretical aspects of Bantu tone (eds. L. M. Hyman and C. Kisseberth),
pp. 195-229. C.S.L.I., Stanford.
and Polinsky, Maria (2009). Focus in Aghem. In Information structure: theoret-
ical, typological, and experimental perspectives (eds. M. Zimmermann and C. Féry),
pp. 206-33. Oxford University Press.
Idiatov, Dmitry (2008). Antigrammaticalization, antimorphologization and the case of Tura.
In Theoretical and empirical issues in grammaticalization (eds. E. Seoane and M. J. López-Couso), pp. 151-69. John Benjamins, Amsterdam.
Itô, Junko, Mester, Armin, and Padgett, Jaye (1995). Licensing and redundancy: underspecification in optimality theory. Linguistic Inquiry, 26, 571-614.
Iverson, Gregory K. and Salmons, Joseph C. (1996). Mixtec prenasalization as hypervoicing.
International Journal of American Linguistics, 62,165-75.
Jackson, Ellen and Stanley, Carol (1977). Description phonologique du tikar (parler de
Bankim). Ms., S.I.L., Yaoundé.
Jaeger, T. Florian (2008). Categorical data analysis: Away from ANOVAs (transforma-
tion or not) and towards logit mixed models. Journal of Memory and Language, 59(4),
434-46.
(2010). Redundancy and reduction: Speakers manage syntactic information density. Cog-
nitive Psychology, 61(1), 23-62.
and Tily, Harry (2011). On language 'utility': Processing complexity and communicative
efficiency. WIREs: Cognitive Science, 2(3), 323-35.
Jakobson, Roman (1931). Prinzipien der historischen phonologie. Travaux du Cercle Linguis-
tique de Prague, 4, 247-67.
(1931 [1972]). Principles of historical phonology. In A reader in historical and comparative
linguistics (ed. A. R. Keiler), pp. 121-38. Rinehart and Winston, New York.
(1968). Child language aphasia and phonological universals. Mouton, The Hague.
Jakobson, Roman, Fant, G. Gunnar M., and Halle, Morris (1952). Preliminaries to speech anal-
ysis: the distinctive features and their correlates. MIT Press, Cambridge, Mass.
and Waugh, Linda (1979). The sound shape of language. Indiana University Press, Bloom-
ington.
James, William and Mole, A. (1847). Dictionary of the English and French languages for
general use with the accentuations and a literal pronunciation of every word in both lan-
guages, comp. from the best and most approved English and French authorities. B. Tauchnitz,
Leipzig.
Janda, Richard (2003). 'Phonologization' as the start of dephoneticization - or, on sound
change and its aftermath: Of extension, generalization, lexicalization, and morphologization.
In Handbook of historical linguistics (eds. B. Joseph and R. Janda), pp. 401-22. Blackwell,
Malden, MA.
and Joseph, Brian (2003). On language, change, and language change - or, of history,
linguistics, and historical linguistics. In Handbook of historical linguistics (ed. B. Joseph and
R. Janda), pp. 3-180. Blackwell, Oxford.
Jansen, Wouter (2004). Laryngeal contrast and phonetic voicing: A laboratory phonology
approach to English, Hungarian, and Dutch. PhD thesis, University of Groningen.
Jeffers, R. and Lehiste, I. (1979). Principles and methods for historical linguistics. MIT Press,
Cambridge, MA.
Jescheniak, Jörg D. and Levelt, Willem J. M. (1994). Word frequency effects in speech produc-
tion: Retrieval of syntactic information and of phonological form. Journal of Experimental
Psychology: Learning, Memory and Cognition, 20(4), 824-43.
Jobe, Lisa E. and White, Susan Williams (2007). Loneliness, social relationships, and a broader
autism phenotype in college students. Personality and Individual Differences, 42, 1479-89.
John, Oliver P., Naumann, Laura P., and Soto, Christopher J. (2008). Paradigm shift to the inte-
grative big-five trait taxonomy: history, measurement, and conceptual issues. In Handbook of
personality: Theory and research (eds. O. P. John, R. W. Robins, and L. A. Pervin), pp. 114-58.
Guilford Press, New York, NY.
Johnson, Keith (1997a). The auditory/perceptual basis for speech segmentation. Ohio State
University Working Papers in Linguistics, 50, 101-13.
(1997b). Speech perception without speaker normalization: an exemplar model. In Talker
variability in speech processing (eds. K. Johnson and J. Mullennix), pp. 145-66. Academic
Press, San Diego.
(2000). Adaptive dispersion in vowel perception. Phonetica, 57, 181-8.
(2003). Acoustic and auditory phonetics (2nd edn). Blackwell, Malden, Mass.
(2004). Massive reduction in conversational American English. In Spontaneous speech:
Data and analysis. Proceedings of the 1st Session of the 10th International Symposium
(eds. K. Yoneyama and K. Maekawa), pp. 29-54. National Institute for Japanese Language,
Tokyo.
(2006). Resonance in an exemplar-based lexicon: The emergence of social identity and
phonology. Journal of Phonetics, 34, 485-99.
(2007). Decision and mechanisms in exemplar-based phonology. In Experimental
approaches to phonology (eds. M.-J. Solé, P. Beddor, and M. Ohala), Chapter 3, pp. 25-40.
Oxford University Press, Oxford.
Johnson, Keith, Flemming, Edward, and Wright, Richard (1993a). The hyperspace effect: Pho-
netic targets are hyperarticulated. Language, 69, 505-28.
Ladefoged, Peter, and Lindau, Mona (1993b). Individual differences in vowel production.
Journal of the Acoustical Society of America, 94, 701-14.
and Martin, Jack (2001). Acoustic vowel reduction in Creek: Effects of distinctive length
and position in the word. Phonetica, 58, 81-102.
Jones, J. A. and Munhall, K. G. (2000). Perceptual calibration of F0 production: Evidence from
feedback perturbation. Journal of the Acoustical Society of America, 108, 1246-51.
Jones, M. R., Johnston, H. M., and Puente, J. (2006). Effects of auditory pattern structure on
anticipatory and reactive attending. Cognitive Psychology, 53, 59-96.
Jongman, Allard (1988). Duration of frication noise required for identification of English
fricatives. Journal of the Acoustical Society of America, 85(4), 1718-25.
Wayland, Ratree, and Wong, Serena (2000). Acoustic characteristics of English fricatives.
Journal of the Acoustical Society of America, 108(3), 1252-63.
Jørgensen, Hans Peter (1969). Die gespannten und ungespannten Vokale in der norddeutschen
Hochsprache mit einer spezifischen Untersuchung der Struktur der Formantenfrequenzen.
Phonetica, 19, 217-45.
Joseph, Brian and Janda, Richard (1988). The how and why of diachronic morphologization
and demorphologization. In Theoretical morphology (eds. M. Hammond and M. Noonan),
pp. 193-210. Academic Press, San Diego.
(eds.) (2003). The handbook of historical linguistics. Blackwell, Oxford.
Jun, Jongho (1995). Place assimilation as the result of conflicting perceptual and articula-
tory constraints. In Proceedings of the West Coast Conference on Formal Linguistics, Volume 14,
pp. 221-37.
Jurafsky, Dan (2003). Probabilistic modeling in psycholinguistics. In Probabilistic Linguistics
(eds. R. Bod, J. Hay, and S. Jannedy), pp. 39-95. MIT Press, Cambridge, Mass.
Bell, Alan, Gregory, Michelle, and Raymond, W. (2001). Probabilistic relations between
words: Evidence from reduction in lexical production. In Frequency and the emergence of
linguistic structure (eds. J. Bybee and P. Hopper), pp. 229-54. John Benjamins, Amsterdam.
Jusczyk, Peter W., Goodman, Mara B., and Baumann, Angela (1999). Nine-month-olds' attention to sound similarities in syllables. Journal of Memory and Language, 40, 62-82.
Kabak, Baris and Idsardi, William J. (2007). Perceptual distortions in the adaptation of English
consonant clusters: Syllable structure or consonantal contact constraints. Language and
Speech, 50, 23-52.
Kang, Kyoung-Ho and Guion, Susan G. (2008). Clear speech production of Korean stops:
Changing phonetic targets and enhancement strategies. Journal of the Acoustical Society of
America, 124(6), 3909-17.
Kataoka, Reiko (2010). Individual variation in speech perception as a source of 'apparent'
hypo-correction. Paper presented at the 12th Conference on Laboratory Phonology, Albuquerque, New Mexico, July 10.
(2011). Phonetic and cognitive bases of sound change. PhD thesis, University of California,
Berkeley.
Katseff, Shira, Houde, John, and Johnson, Keith (in press). Partial compensation for altered
auditory feedback: A tradeoff with somatosensory feedback? Language and Speech.
Kaun, Abigail R. (2004). The typology of rounding harmony. In Phonetically based phonology
(eds. B. Hayes, R. Kirchner, and D. Steriade), pp. 87-116. Cambridge University Press,
Cambridge.
Kavitskaya, Darya (2002). Compensatory lengthening: Phonetics, phonology, and diachrony.
Routledge, New York.
Kawasaki, Haruko (1986). Phonological universals of vowel nasalization. In Experimental
phonology (eds. J. J. Ohala and J. J. Jaeger), pp. 81-98. Academic Press, Orlando, FL.
Kaye, Jonathan (1974). Morpheme structure constraints live! In Montreal Working Papers in
Linguistics, Volume 3, pp. 55-62.
Keating, Patricia A. (1984). Phonetic and phonological representations of stop consonant
voicing. Language, 60(2), 286-319.
(1985). Universal phonetics and the organization of grammars. In Phonetic linguistics:
Essays in honor of Peter Ladefoged (ed. V. A. Fromkin), pp. 115-32. Academic Press,
Orlando.
(1988). The phonology-phonetics interface. In Linguistics: The Cambridge survey, Vol-
ume I: Grammatical theory (ed. F. J. Newmeyer), pp. 281-302. Cambridge University
Press.
(1990). Phonetic representations in a generative grammar. Journal of Phonetics 18,
321-34.
(1996). The phonology-phonetics interface. In Interfaces in phonology (ed. U. Kleinhenz),
pp. 262-78. Akademie Verlag, Berlin.
Cho, Taehong, Fougeron, Cécile, and Hsu, Chai-Shune (2003). Domain-initial
articulatory strengthening in four languages. Papers in Laboratory Phonology, 6,
143-61.
Linker, Wendy, and Huffman, Marie (1983). Patterns of allophone distribution for voiced
and voiceless stops. Journal of Phonetics, 11, 277-90.
Mikos, M. J., and Ganong III, W. F. (1981). A cross-language study of range of voice onset
time in the perception of initial stop voicing. Journal of the Acoustical Society of America, 70(5),
1261-71.
Keenan, Edward L. (1976). Towards a universal definition of 'subject'. In Subject and topic
(ed. C. N. Li), pp. 303-33. Academic Press.
Kelly, Michael H. (1988). Rhythmic alternation and lexical stress differences in English. Cogni-
tion, 30, 107-37.
(1989). Rhythm and language change in English. Journal of Memory and Language,
28, 690-710.
and Bock, J. Kathryn (1988). Stress in time. Journal of Experimental Psychology: Human
Perception and Performance, 14(3), 389-403.
Kenstowicz, Michael and Kisseberth, Charles (1979). Generative phonology. Academic Press,
San Diego.
Kertész, Zsuzsa (2003). Vowel harmony and the stratified lexicon of Hungarian. In The odd
yearbook, 7. ELTE Press, Budapest.
Keyser, Samuel Jay and Stevens, Kenneth N. (2001). Enhancement revisited. In Ken Hale: A life
in language (ed. M. Kenstowicz), pp. 271-91. MIT Press, Cambridge, MA.
(2006). Enhancement and overlap in the speech chain. Language, 82(1), 33-63.
Khouw, Edward and Ciocca, Victor (2007). Perceptual correlates of Cantonese tones. Journal
of Phonetics, 35, 104-17.
Kim, Chin-Wu (1965). On the autonomy of the tensity feature in stop classification (with special
reference to Korean stops). Word, 21, 339-59.
King, Jonathan and Just, Marcel Adam (1991). Individual differences in syntactic processing:
The role of working memory. Journal of Memory and Language, 30, 580-602.
King, Robert D. (1967). Functional load and sound change. Language, 43, 831-52.
(1969). Historical linguistics and generative grammar. Prentice-Hall, Englewood Cliffs, N.J.
Kingston, John (2007). The phonetics-phonology interface. In The Cambridge handbook of
phonology (ed. P. de Lacy), pp. 401-34. Cambridge University Press, Cambridge.
and Diehl, Randy L. (1994). Phonetic knowledge. Language, 70, 419-54.
Kirk, Cecilia J., and Castleman, Wendy A. (2008). On the internal perceptual struc-
ture of distinctive features. Journal of Phonetics, 36, 28-54.
Kiparsky, Paul (1965). Phonological change. PhD thesis, M.I.T.
(1968). Linguistic universals and language change. In Universals in linguistic the-
ory (eds. E. Bach and R. T. Harms), pp. 171-202. Rinehart and Winston, New
York.
(1982). Lexical phonology and morphology. In Linguistics in the morning calm (ed. In-Seok Yang), pp. 3-91. Hanshin, Seoul.
(1985). Some consequences of lexical phonology. Phonology Yearbook, 2, 85-138. Cambridge
University Press.
(1988). Phonological change. In Linguistics: The Cambridge Survey (ed. F. Newmeyer),
Volume 1: Theoretical foundations, pp. 363-415. Cambridge University Press, Cambridge.
(1995). The phonological basis of sound change. In Handbook of phonological theory (ed.
J. Goldsmith), pp. 640-70. Basil Blackwell, Oxford.
(2006). Amphichronic linguistics vs. Evolutionary Phonology. Theoretical Linguistics, 32,
217-36.
Kirby, James P. (2010). Cue selection and category restructuring in sound change. PhD thesis,
University of Chicago.
(2011). Modeling the acquisition of covert contrast. In Proceedings of the Seventeenth
International Congress of Phonetic Sciences, Hong Kong.
Kirby, Simon (1999). Function, selection and innateness: The emergence of language universals.
Oxford University Press, Oxford.
Kirchner, Robert (1998). An effort-based approach to consonant lenition. PhD thesis, UCLA.
(2001). An effort based approach to consonant lenition. Routledge, New York.
Kirsch, Irving (1999). Response expectancy: an introduction. In How expectancies shape
experience, (ed. I. Kirsch), pp. 3-13. American Psychological Association, Washington,
DC.
Klatt, Dennis H. (1979). Speech perception: A model of acoustic-phonetic analysis and lexical
access. In Perception and production of fluent speech (ed. R. A. Cole), pp. 243-88. Erlbaum,
Hillsdale, N.J.
Klein, Sheldon (1966). Historical change in language using Monte Carlo techniques. Mechani-
cal Translation and Computational Linguistics, 9, 67-82.
Klein, Sheldon, Kuppin, Michael, and Meives, Kirby (1969). Monte Carlo simulation of language
change in Tikopia and Maori. In Proceedings of the 1969 Conference on Computational
Linguistics (COLING), pp. 699-729.
Koehler, Derek J. (2009). Probability matching in choice under uncertainty: Intuition versus
deliberation. Cognition, 113, 123-7.
Komarova, N. L., Niyogi, Partha, and Nowak, M. A. (2001). The evolutionary dynamics of
grammar acquisition. Journal of Theoretical Biology, 209(1), 43-60.
and Nowak, Martin (2003). Language dynamics in finite populations. Journal of Theoret-
ical Biology, 221, 445-57.
Kornai, András (1990). Hungarian vowel harmony. In Approaches to Hungarian, Volume Three:
Structures and Arguments (ed. I. Kenesei), pp. 183-240. JATE, Szeged.
Krämer, Martin (2001). Vowel harmony and correspondence theory. PhD thesis, University of
Düsseldorf.
(2003). Vowel harmony and correspondence theory. Mouton de Gruyter, Berlin.
(2009). The phonology of Italian. Oxford University Press, Oxford.
Kroch, Anthony (1989). Reflexes of grammar in patterns of language change. Language Varia-
tion and Change, 1, 199-244.
Kruschke, John (2003). Attention in learning. Current Directions in Psychological Science, 12(5),
171-5.
Khl, Patricia K. (1991). Human adults and human infants show a 'perceptual magnet effect'
for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50,
93-107.
, Andruski, Jean E., Chistovich, Inna A., Chistovich, Ludmilla A., Kozhevnikova, Elena V.,
Ryskina, Viktoria, Stolyarova, Elvira I., Sundberg, Ulla, and Lacerda, Francisco (1997).
Cross-language analysis of phonetic units in language addressed to infants. Science, 277,
684-6.
, Stevens, Erica, Hayashi, Akiko, Deguchi, Toshisada, Kiritani, Shigeru, and Iverson, Paul
(2006). Infants show a facilitation effect for native language phonetic perception between 6
and 12 months. Developmental Science, 9, F13-F21.
, Williams, Karen A., Lacerda, Francisco, Stevens, Kenneth N., and Lindblom, Björn
(1992). Linguistic experience alters phonetic perception in infants by 6 months of age.
Science, 255, 606-8.
Kuipers, Aert H. (1974). The Shuswap language. Mouton, The Hague.
Kullback, Solomon and Leibler, Richard A. (1951). On information and sufficiency. Annals of
Mathematical Statistics, 22(1), 79-86.
Kümmel, Martin (2007). Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und
ihre Konsequenzen für die vergleichende Rekonstruktion. Reichert, Wiesbaden.
Kurylowicz, Jerzy (1965 [1972]). The evolution of grammatical categories. In Esquisses linguis-
tiques II, pp. 38-54. Fink, Munich.
Kutas, Marta and Hillyard, Steven A. (1984). Brain potentials during reading reflect word
expectancy and semantic association. Nature, 307, 161-3.
Kwenzi Mikala, J. (1980). Esquisse phonologique du punu. In Éléments de description du punu
(ed. F. Nsuka-Nkutsi), pp. 51-114. CRLS, Université Lyon II.
Kwon, Kyung-Keun (2003). Prosodic change from tone to vowel length in Korean. In Devel-
opment in prosodic systems (eds. P. Fikkert and H. Jacobs), pp. 67-89. Mouton de Gruyter,
New York.
Labov, William (1971). Methodology. In A survey of linguistic science (ed. W. O. Dingwall),
pp. 412-97. University of Maryland.
(1973). The linguistic consequences of being a lame. Language in Society, 2(1), 81-115.
(1981). Resolving the Neogrammarian controversy. Language, 57, 267-308.
(1989). The child as linguistic historian. Language Variation and Change, 1, 85-97.
(1990). The intersection of sex and social class in the course of linguistic change. Language
Variation and Change, 2(2), 205-54.
(1994). Principles of linguistic change, Volume 1: Internal factors. Blackwell, Oxford.
(2001). Principles of linguistic change, Volume 2: Social factors. Blackwell, Oxford.
(2010). Principles of linguistic change, Volume 3: Cognitive and cultural factors. Wiley-
Blackwell, Malden, Mass.
Yaeger, Malka, and Steiner, Richard (1972). A quantitative study of sound change in
progress. U.S. Regional Survey, Philadelphia.
Ladefoged, Peter and Maddieson, Ian (1996). The sounds of the world's languages. Blackwell
Publishers, Oxford.
Lashley, Karl S. (1951). The problem of serial order in behavior. In Cerebral mechanisms in
behavior (ed. L. Jeffress). Wiley, New York.
Lasky, R., Syrdal-Lasky, A., and Klein, D. (1975). VOT discrimination by four- to six-month-
old infants from Spanish environments. Journal of Experimental Child Psychology, 20,
215-25.
Lavoie, Lisa (2001). Consonant strength: Phonological patterns and phonetic manifestations.
Routledge, New York.
Lee, Seunghun Julio (2008). Consonant-tone interaction in optimality theory. PhD thesis, Rutgers
University.
Lee, Yongeun (2006). Sub-syllabic constituency in Korean and English. PhD thesis, Northwestern
University.
Lehiste, Ilse (1970). Suprasegmentals. MIT Press, Cambridge.
(1976). Influence of fundamental frequency pattern on the perception of duration. Journal
of Phonetics, 4,113-17.
(2003). Prosodic change in progress: from quantity language to accent language. In Devel-
opment in prosodic systems (eds. P. Fikkert and H. Jacobs), pp. 47-65. Mouton de Gruyter,
New York.
(2004). Bisyllabicity and tone. In Proceedings of the International Symposium on Tonal
Aspects of Language, pp. 111-14.
Lehnert-LeHouillier, Heike (2010). A cross-linguistic investigation of cues to vowel length
perception. Journal of Phonetics, 38(3), 472-82.
Levelt, C., Schiller, N., and Levelt, W. (1999). The acquisition of syllable types. Language Acqui-
sition, 8, 237-64.
Levitt, Andrea G., Jusczyk, Peter W., Murray, Janice, and Carden, Guy (1988). Context effects
in two-month-old infants' perception of labiodental/interdental fricative contrasts. Journal
of Experimental Psychology: Human Perception and Performance, 14(3), 361-8.
Liu, Huei-Mei, Kuhl, Patricia K., and Tsao, Feng-Ming (2003). The association between
mothers' clarity and infants' speech discrimination skill. Developmental Science, 6,
F1-F10.
Tsao, Feng-Ming, and Kuhl, Patricia K. (2007). Acoustic analysis of lexical tone in Man-
darin infant-directed speech. Developmental Psychology, 43(4), 912-17.
Lloyd, Paul M. (1987). From Latin to Spanish: Historical phonology and morphology of the
Spanish language. The American Philosophical Society, Philadelphia.
Lombardo, Michael V., Barnes, Jennifer L., Wheelwright, Sally J., and Baron-Cohen, Simon
(2007). Self-referential cognition and empathy in autism. PLoS One, 2, 883.
Luce, Paul and Pisoni, David (1998). Recognizing spoken words: The neighborhood activation
model. Ear and Hearing, 19,1-36.
Luick, Karl (1921-40). Historische Grammatik der englischen Sprache. Tauchnitz, Leipzig.
MacKain, Kristine S., Best, Catherine T., and Strange, Winifred (1981). Categorical perception
of English /r/ and /l/ by Japanese bilinguals. Applied Psycholinguistics, 2, 369-90.
MacKay, Donald G. (1970). Spoonerisms: The structure of errors in the serial order of speech.
Neuropsychologia, 8, 323-50.
MacKay, David J. (2002). Information theory, inference and learning algorithms. Cambridge
University Press, Cambridge.
Macken, Marlys A. (1980). The child's lexical representation: The 'puzzle-puddle-pickle' evi-
dence. Journal of Linguistics, 16,1-17.
Mackenzie, Sara (2008). Contrast and similarity in consonant harmony processes. PhD thesis,
University of Toronto.
Maddieson, Ian (1984). Patterns of sounds. Cambridge University Press, Cambridge.
(2008). Presence of uncommon consonants. In The world atlas of language structures
online (eds. M. Haspelmath, M. Dryer, D. Gil, and B. Comrie), Chapter 19. Max Planck
Digital Library, Munich.
and Precoda, Kristin (1992). Syllable structure and phonetic models. Phonology, 9,45-60.
Magen, Harriet S. (1989). An acoustic study of vowel-to-vowel coarticulation in English. PhD
thesis, Yale University.
(1997). The extent of vowel-to-vowel coarticulation in English. Journal of Phonetics,
25,187-205.
Mahanta, Shakuntala (2007). Directionality and locality in vowel harmony. PhD thesis, Utrecht
University.
Mailhot, Frédéric (2010). Modelling the acquisition and evolution of vowel harmony. PhD thesis,
Carleton University.
Malsheen, Bathsheba J. (1980). Two hypotheses for phonetic clarification in the speech of moth-
ers to children. In Child phonology (eds. G. Yeni-Komishan, S. Kavanaugh, and C. Ferguson),
Volume 2: Perception. Academic Press, San Diego.
Mańczak, Witold (1980). Laws of analogy. In Historical morphology (ed. J. Fisiak), pp. 283-8.
Mouton, The Hague.
Mann, Virginia A. and Repp, Bruno H. (1980). Influence of vocalic context on perception of
the [ʃ]-[s] distinction. Perception and Psychophysics, 28, 213-28.
Manuel, Sharon (1987). Acoustic and perceptual consequences of vowel-to-vowel coarticulation
in three Bantu languages (Zimbabwe). PhD thesis, Yale University.
Manuel, Sharon (1990). The role of contrast in limiting vowel-to-vowel coarticulation in differ-
ent languages. Journal of the Acoustical Society of America, 88,1286-98.
(1999). Cross-linguistic studies: relating language-particular coarticulation patterns to
other language-particular facts. In Coarticulation: Theory, data and techniques (eds. W. Hard-
castle and N. Hewlett), pp. 179-98. Cambridge University Press, Cambridge.
and Krakow, Rena (1984). Universal and language particular aspects of vowel-to-vowel
coarticulation. Haskins Laboratories Status Report on Speech Research, SR-77/78, 69-78.
Martin, Andrew Thomas (2007). The evolving lexicon. PhD thesis, University of California, Los
Angeles.
Martinet, André (1933). Remarques sur le système phonologique du français. Bulletin de la
Société de Linguistique de Paris, 34, 191-202.
(1952). Function, structure, and sound change. Word, 8(1), 1-32.
(1955). Économie des changements phonétiques. Francke, Berne.
(1960). Éléments de linguistique générale. Colin, Paris.
Massaro, Dominic W. and Cohen, Michael M. (1983). Evaluation and integration of visual
and auditory information in speech perception. Journal of Experimental Psychology: Human
Perception and Performance, 9, 753-71.
Matisoff, James (1973). Tonogenesis in Southeast Asia. In Consonant types and tone (ed.
L. Hyman), pp. 71-95. University of Southern California.
Mattock, Karen and Burnham, Denis (2006). Chinese and English infants' tone perception:
Evidence for perceptual reorganization. Infancy, 10(3), 241-65.
Maye, Jessica, Weiss, Daniel J., and Aslin, Richard N. (2008). Statistical phonetic learning in
infants: facilitation and feature generalization. Developmental Science, 11(1), 122-34.
Werker, Janet F., and Gerken, LouAnn (2002). Infant sensitivity to distributional infor-
mation can affect phonetic discrimination. Cognition, 82(3), B101-B111.
McCarthy, John J. (1988). Feature geometry and dependency: a review. Phonetica, 43, 84-108.
(2002). Comparative markedness. In Papers in optimality theory II (eds. A. Coetzee,
A. C. Carpenter, and P. de Lacy), pp. 171-246. Amherst, MA.
McCawley, James D. (1968). The phonological component of a grammar of Japanese. Mouton,
The Hague.
McDonough, Joyce (1991). On the representation of consonant harmony in Navajo. Proceedings
of WCCFL, 10, 319-35.
McKinley, Stephen C. and Nosofsky, Robert (1996). Selective attention and the formation
of linear decision boundaries. Journal of Experimental Psychology: Human Perception and
Performance, 22, 294-317.
McLachlan, Geoffrey J. and Peel, David (2000). Finite mixture models. Wiley, New York.
McMurray, Bob, Aslin, Richard N., and Toscano, Joseph C. (2009). Statistical learning of pho-
netic categories: Insights from a computational approach. Developmental Science, 12(3),
369-78.
Meringer, Rudolf (1908). Aus dem Leben der Sprache: Versprechen, Kindersprache, Nach-
ahmungstrieb. Behr, Berlin.
and Mayer, Karl (1895). Versprechen und Verlesen: Eine psychologisch-linguistische Studie.
Göschen, Stuttgart.
Messick, Samuel (1976). Individuality in learning. Jossey-Bass, Oxford.
Mielke, Jeff (2005). Ambivalence and ambiguity in laterals and nasals. Phonology, 22(2),
169-203.
(2008). The emergence of distinctive features. Oxford Studies in Typology and Linguistic
Theory. Oxford University Press.
Baker, Adam, and Archangeli, Diana (2010). Variability and homogeneity in American
English /ɹ/ and /s/ retraction. In Laboratory phonology 10 (eds. C. Fougeron, B. Kuehnert,
M. D'Imperio, and N. Vallée), pp. 699-719. Mouton de Gruyter, Berlin.
Magloughlin, Lyra, and Hume, Elizabeth (2011). Evaluating the effectiveness of Unified
Feature Theory and three other feature systems. In Tones and features: In honor of G. Nick
Clements. Mouton de Gruyter, Berlin.
Miller, George A. and Nicely, Patricia (1955). An analysis of perceptual confusions among some
English consonants. Journal of the Acoustical Society of America, 27, 338-52.
Milroy, James and Milroy, Lesley (1985). Linguistic change, social network and speaker inno-
vation. Journal of Linguistics, 21(2), 339-84.
Mitchener, W. Garrett (2005). Simulating language change in the presence of non-idealized
syntax. In Proceedings of the Workshop on Psychocomputational Models of Human Language
Acquisition, Ann Arbor, Michigan, pp. 10-19. Association for Computational Linguistics.
Mitterer, Holger (2006). On the causes of compensation for coarticulation: Evidence for phono-
logical mediation. Perception and Psychophysics, 68(7), 1227-40.
and Blomert, Leo (2003). Coping with phonological assimilation in speech perception:
Evidence for early compensation. Perception and Psychophysics, 65(6), 956-69.
Miyawaki, Kuniko, Strange, Winifred, Verbrugge, Robert, Liberman, Alvin M., Jenkins,
James J., and Fujimura, Osamu (1975). An effect of linguistic experience: The discrimination
of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics, 18,
331-40.
Mochizuki, Michiko (1981). The identification of /r/ and /l/ in natural and synthesized speech.
Journal of Phonetics, 9, 283-303.
Mohr, Burkhart (1971). Intrinsic variations in the speech signal. Phonetica, 23, 69-93.
Morén, Bruce and Zsiga, Elisabeth (2006). The lexical and post-lexical phonology of Thai tones.
Natural Language and Linguistic Theory, 24, 113-78.
Moreton, Elliott (2008a). Analytic bias and phonological typology. Phonology, 25(1), 83-127.
(2008b). Learning bias as a factor in phonological typology. In Proceedings of the 26th
Meeting of the West Coast Conference on Formal Linguistics (WCCFL) (eds. C. Chang and
A. Haynie), pp. 393-401. Cascadilla Proceedings Project, Somerville, MA.
(2010). Underphonologization and modularity bias. In Phonological argumentation:
Essays on evidence and motivation (ed. S. Parker). Equinox, London.
and Thomas, Erik R. (2007). Origins of Canadian Raising in voiceless-coda effects: A
case study in phonologization. In Laboratory phonology 9 (eds. J. S. Cole and J. I. Hualde),
pp. 37-64. Mouton, Berlin.
Morgan, James L., White, Katherine, Singh, Leher, and Bortfield, Heather (under review).
DRIBBLER: A developmental model of spoken word recognition. Psychological Review.
Mottron, Laurent, Dawson, Michelle, Soulières, Isabelle, Hubert, Bénédicte, and Burack, Jake
(2006). Enhanced perceptual functioning in autism: An update, and eight principles of
autistic perception. Journal of Autism and Developmental Disorders, 36, 27-43.
Moulines, Eric and Charpentier, Francis (1990). Pitch synchronous waveform processing tech-
niques for text-to-speech synthesis using diphones. Speech Communication, 9, 453-67.
Moulton, William (1960). The short vowel systems of northern Switzerland: A study in struc-
tural dialectology. Word, 16,155-82.
(1967). Types of phonemic change. In To honor Roman Jakobson: Essays on the occasion
of his seventieth birthday, Volume 2, pp. 1393-407. Mouton, The Hague.
Mowrey, Richard and Pagliuca, William (1995). The reductive character of articulatory evolu-
tion. Rivista di Linguistica, 7, 37-124.
Mufwene, Salikoko S. (2001). The ecology of language evolution. Cambridge University Press,
Cambridge.
(2008). Language evolution: Contact, competition, and change. Continuum Press, London
and New York.
Munson, Benjamin (2001). Phonological pattern frequency and speech production in adults
and children. Journal of Speech, Language, and Hearing Research, 44, 778-92.
Näätänen, Risto (2001). The perception of speech sounds by the human brain as reflected by the
mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology, 38,
1-21.
Namy, Laura L., Nygaard, Lynne C., and Sauerteig, Denise (2002). Gender differences in vocal
accommodation: The role of perception. Journal of Language and Social Psychology, 21(4),
422-32.
Narayan, Chandan R. (2008). The acoustic-perceptual salience of nasal place contrasts. Journal
of Phonetics, 36, 191-217.
Werker, Janet F., and Beddor, Patrice S. (2010). The interaction between acoustic salience
and language experience in developmental speech perception: Evidence from nasal place
discrimination. Developmental Science, 13(3), 407-20.
Nearey, Terrance and Hogan, John T. (1986). Phonological contrast in experimental phonetics:
Relating distributions of production data to perceptual categorization curves. In Experimen-
tal phonology (eds. J. J. Ohala and J. J. Jaeger), pp. 141-62. Academic Press, Orlando.
Nettle, Daniel (2007). Empathizing and systemizing: What are they, and what do they con-
tribute to our understanding of psychological sex differences? British Journal of Psychol-
ogy, 98, 237-55.
Neu, Hélène (1980). Ranking of constraints on /t,d/ deletion in American English. In Locating
language in time and space (ed. W. Labov), pp. 37-54. Academic Press.
New, Boris, Pallier, Christophe, Ferrand, Ludovic, and Matos, Rafael (2001). Une base
de données lexicales du français contemporain sur internet: LEXIQUE. L'Année Psy-
chologique, 101(3-4), 447-62.
Newman, Mark E. J. and Girvan, Michelle (2004). Finding and evaluating community structure
in networks. Physical Review E, 69(2), 026113.
Newman, Stanley (1944). Yokuts language of California. Viking Fund Publication in Anthro-
pology, no. 2, New York.
Newport, Elissa L. and Aslin, Richard N. (2004). Learning at a Distance I. Statistical Learning
of Nonadjacent Dependencies. Cognitive Psychology, 48,127-62.
Nielsen, Jimmi (2010). Lexical frequency effects in the spread of TH-fronting in Glaswegian:
A cue to the origins of sound change? Master's thesis, University of Edinburgh.
Peperkamp, Sharon (2003). Phonological acquisition: Recent attainments and new challenges.
Language and Speech, 46(2-3), 97-113.
Vendelin, Inga, and Nakamura, Kimihiro (2008). On the perceptual origin of loanword
adaptations: experimental evidence from Japanese. Phonology, 25(1), 129-64.
Peterson, Gordon E. and Barney, Harold L. (1952). Control methods used in the study of
vowels. Journal of the Acoustical Society of America, 24, 175-84.
Phillips, Betty S. (1984). Word frequency and the actuation of sound change. Language, 60(2),
320-42.
(2000). Fast words, slow words. American Speech, 75(4), 414-16.
(2001). Lexical diffusion, lexical frequency, and lexical analysis. In Frequency and the
emergence of linguistic structure (eds. J. L. Bybee and P. J. Hopper), pp. 123-6. John Ben-
jamins, Amsterdam.
(2006). Word frequency and lexical diffusion. Palgrave Macmillan, New York.
Pierrehumbert, Janet B. (1980). The phonetics and phonology of English intonation. PhD thesis,
M.I.T.
(1990). Phonological and phonetic representation. Journal of Phonetics, 18, 375-94.
(2001a). Exemplar dynamics: Word frequency, lenition and contrast. In Frequency and the
emergence of linguistic structure (eds. J. L. Bybee and P. Hopper), pp. 137-57. John Benjamins,
Amsterdam.
(2001b). Why phonological constraints are so coarse-grained. Language and Cognitive
Processes, 16(5-6), 691-8.
(2002). Word-specific phonetics. In Laboratory phonology (eds. C. Gussenhoven and
N. Warner), Vol. VII, Phonology and phonetics, pp. 101-39. Mouton de Gruyter, Berlin.
(2004). Phonetic diversity, statistical learning, and acquisition of phonology. Language
and Speech, 46(2-3), 115-54.
Piggott, Glyne (1992). Variability in feature dependency: The case of nasality. Natural Language
and Linguistic Theory, 10, 33-77.
Pisoni, David B. (1976). Fundamental frequency and perceived vowel duration. Journal of the
Acoustical Society of America, 59(S1), S39.
(1977). Identification and discrimination of the relative onset time of two component
tones: Implications for voicing perception in stops. Journal of the Acoustical Society of Amer-
ica, 61(5), 1352-62.
and Aslin, Richard N. (1982). Some effects of laboratory training on identification and
discrimination of voicing contrasts in stop consonants. Journal of Experimental Psychology:
Human Perception and Performance, 8, 297-314.
Pitt, Mark (1998). Phonological processes and the perception of phonotactically illegal conso-
nant clusters. Perception and Psychophysics, 60(6), 941-51.
Dilley, Laura, Johnson, Keith, Kiesling, Scott, Raymond, William, Hume, Elizabeth,
and Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release).
www.buckeyecorpus.osu.edu. Columbus, OH: Department of Psychology, Ohio State Uni-
versity (distributor).
Pitt, Mark A. and Johnson, Keith (2003). Using pronunciation data as a starting point in
modeling word recognition. Manuscript, The Ohio State University.
and McQueen, James (1998). Is compensation for coarticulation mediated by the lexicon?
Journal of Memory and Language, 39, 347-70.
Polka, Linda (1991). Cross-language speech perception in adults: Phonemic, phonetic and
acoustic contributions. Journal of the Acoustical Society of America, 89(6), 2961-77.
Colantonio, Connie, and Sundara, Megha (2001). A cross-language comparison of
/d/-/ð/ perception: Evidence for a new developmental pattern. Journal of the Acoustical
Society of America, 109(5), 2190-201.
and Strange, Winifred (1985). Perceptual equivalence of acoustic cues that differentiate
/r/ and /l/. Journal of the Acoustical Society of America, 78(4), 1187-97.
and Werker, Janet F. (1994). Developmental changes in perception of non-native vowel
contrasts. Journal of Experimental Psychology: Human Perception and Performance, 20,
421-35.
Port, Robert F. (2003). Meter and speech. Journal of Phonetics, 31, 599-611.
Pouplier, Marianne and Goldstein, Louis (2010). Intention in articulation: Articulatory timing
of alternating consonant sequences and its implications for models of speech production.
Language and Cognitive Processes, 25, 616-49.
Prince, Alan and Smolensky, Paul (2004). Optimality Theory: Constraint interaction in genera-
tive grammar. Blackwell, Malden, Mass.
Przezdziecki, Marek A. (2005). Vowel harmony and coarticulation in three dialects of Yoruba:
phonetics determining phonology. PhD thesis, Cornell.
Pulvermüller, Friedemann, Huss, Martina, Kherif, Ferath, Moscoso del Prado Martin, Fer-
min, Hauk, Olaf, and Shtyrov, Yury (2006). Motor cortex maps articulatory features of
speech sounds. In Proceedings of the National Academy of Sciences, USA, Volume 103,
pp. 7865-70.
Purcell, David W. and Munhall, Kevin G. (2006). Adaptive control of vowel formant frequency:
Evidence from real-time formant manipulation. Journal of the Acoustical Society of Amer-
ica, 120, 966-77.
Puri, Amrita and Wojciulik, Ewa (2008). Expectation both helps and hinders object perception.
Vision Research, 48, 589-97.
Purnell, Thomas, Salmons, Joseph, Tepeli, Dilara, and Mercer, Jennifer (2005). Structured
heterogeneity and change in laryngeal phonetics. Journal of English Linguistics, 33,
307-38.
Quam, Carolyn, Yuan, Jiahong, and Swingley, Daniel (2008). Relating intonational pragmatics
to the pitch realizations of highly frequent words in English speech to infants. In Proceedings
of the 30th Annual Conference of the Cognitive Science Society (eds. B. C. Love, K. McRae, and
V. M. Sloutsky), pp. 217-22. Cognitive Science Society, Austin, TX.
R Development Core Team (2010). R: A language and environment for statistical computing.
Technical report, R Foundation for Statistical Computing, Vienna.
Raymond, William, Dautricourt, Robin, and Hume, Elizabeth (2006). Word-medial /t, d/ dele-
tion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonologi-
cal factors. Language Variation and Change, 18, 55-97.
Reading, Anthony (2004). Hope and despair. How perceptions of the future shape human behav-
ior. The Johns Hopkins University Press.
Sagey, Elizabeth C. (1986 [1990]). The representation of features and relations in non-linear
phonology. Garland Publishing, New York.
Saltzman, Elliot and Munhall, Kevin G. (1989). A dynamical approach to gestural patterning
in speech production. Ecological Psychology, 1, 333-82.
Nam, Hosung, Krivokapic, Jelena, and Goldstein, Louis (2008). A task-dynamic toolkit
for modeling the effects of prosodic structure on articulation. In Proceedings of the Speech
Prosody 2008 Conference, Campinas, Brazil (eds. P. A. Barbosa, S. Madureira, and C. Reis).
Salverda, Anne Pier, Dahan, Delphine, et al. (2003). The role of prosodic boundaries in the
resolution of lexical embedding in speech comprehension. Cognition, 90, 51-89.
Schachter, Paul (1976). An unnatural class of consonants in Siswati. Studies in African Linguis-
tics, Supplement 6, 211-20.
and Fromkin, Victoria (1968). A phonology of Akan. UCLA Working Papers in
Phonetics, 9.
Schilling-Estes, Natalie (2002). American English social dialect variation and gender. Journal
of English Linguistics, 30(2), 122-37.
Schuh, Russell G. (1998). A grammar of Miya. University of California Press, Berkeley.
Selkirk, Elizabeth O. (1980). Prosodic domains in phonology: Sanskrit revisited. In Juncture
(eds. M. Aronoff and M.-L. Kean), pp. 107-29. Anma Libri, Saratoga.
Sendlmeier, Werner F. (1981). Der Einfluß von Qualität und Quantität auf die Perzeption
betonter Vokale im Deutschen. Phonetica, 38, 291-308.
Sereno, Joan A. and Jongman, Allard (1995). Acoustic correlates of grammatical class. Language
and Speech, 38(1), 57-76.
Shannon, Claude (1948). A mathematical theory of communication. The Bell System Technical
Journal, 27, 379-423, 623-56.
Shattuck-Hufnagel, Stefanie (1987). The role of word-onset consonants in speech production
planning: New evidence from speech error patterns. In Motor and sensory patterns in lan-
guage (eds. E. Keller and M. Gopnik). Erlbaum, Englewood Cliffs, N.J.
and Klatt, Dennis H. (1979). The limited use of distinctive features and markedness in
speech production: Evidence from speech error data. Journal of Verbal Learning and Verbal
Behavior, 18, 41-55.
Sheldon, Amy and Strange, Winifred (1982). The acquisition of /r/ and /l/ by Japanese learn-
ers of English: Evidence that speech production can precede speech perception. Applied
Psycholinguistics, 3, 243-61.
Sheliga, Boris M., Riggio, Lucia, and Rizzolatti, Giacomo (1994). Orienting of attention and eye
movements. Experimental Brain Research, 98, 507-22.
Shen, Xiaonan (1990). Tonal coarticulation in Mandarin. Journal of Phonetics, 18(2), 281-95.
Sherman, Donald (1975). Noun-verb stress alternation: An example of the lexical diffusion of
sound change in English. Linguistics, 159, 43-71.
Shiller, Douglas M., Sato, Marc, Gracco, Vincent L., and Baum, Shari R. (2009). Perceptual
recalibration of speech sounds following speech motor learning. Journal of the Acoustical
Society of America, 125,1103-13.
Shockley, Kevin, Sabadini, Laura, and Fowler, Carol A. (2004). Imitation in shadowing words.
Perception and Psychophysics, 66(3), 422-9.
Shriberg, Elizabeth E. (1992). Perceptual restoration of filtered vowels with added noise. Lan-
guage and Speech, 35,127-36.
Sievers, Eduard (1898). Angelsächsische Grammatik (3rd edn.). Max Niemeyer, Halle.
Silva, David J. (1992). The phonetics and phonology of stop lenition in Korean. PhD thesis, Cornell
University.
(1993). A phonetically based analysis of [voice] and [fortis] in Korean. In Japanese/Korean
Linguistics (ed. P. M. Clancy), Volume 2, pp. 164-74. CSLI, Stanford.
(2006a). Acoustic evidence for the emergence of tonal contrast in contemporary Korean.
Phonology, 23, 287-308.
(2006b). Variation in voice onset time for Korean stops: A case for recent sound change.
Korean Linguistics, 13,1-16.
Silverman, Daniel (2006a). A critical introduction to phonology: Of sound, mind, and body.
Continuum.
(2006b). The diachrony of labiality in Trique, and the functional relevance of gradience
and variation. In Papers in laboratory phonology VIII (eds. L. M. Goldstein, D. H. Whalen,
and C. T. Best), pp. 135-54. Mouton de Gruyter, Berlin.
Sims, Andrea (2005). Declension hopping in dialectal Croatian: Two predictions of frequency.
In Yearbook of Morphology 2005 (eds. G. Booij and J. van Marle), pp. 201-25. Springer,
Dordrecht.
Šipka, Danko (2002). Enigmatski Glosar. Alma, Belgrade.
Smith, Caroline L. (1997). The devoicing of /z/ in American English: effects of local and
prosodic context. Journal of Phonetics, 25(4), 471-500.
Smith, Jennifer L. (2002). Phonological augmentation in prominent positions. PhD thesis, Uni-
versity of Massachusetts.
Smyth, Herbert Weir (1956). Greek grammar. Revised by Gordon M. Messing. Harvard Uni-
versity Press, Cambridge, Mass.
Snider, Keith L. (1986). Apocope, tone and the glottal stop in Chumburung. Journal of African
Languages and Linguistics, 8,133-44.
Sohn, Ho-Min (1994). Korean. Routledge, New York.
(1999). The Korean Language. Cambridge University Press, Cambridge.
Solé, Maria-Josep (1992a). Experimental phonology: The case of rhotacism. In Phonologica
1988 (eds. W. U. Dressler, H. C. Luschützky, O. E. Pfeiffer, and J. R. Rennison), pp. 259-71.
Cambridge University Press, Cambridge.
(1992b). Phonetic and phonological processes: the case of nasalization. Language and
Speech, 35(1-2), 29-43.
Sonderegger, Morgan (2009). Dynamical systems models of language variation and change:
An application to an English stress shift. Master's thesis, Department of Computer Science,
University of Chicago.
(in press). Testing for frequency and structural effects in an English stress shift. In Pro-
ceedings of the Berkeley Linguistics Society 36 (eds. J. Cleary-Kemp, C. Cohen, S. Farmer,
L. Kassner, J. Sylak, and M. Woodley). Berkeley Linguistics Society.
and Niyogi, Partha (2010). Combining data and mathematical models of language change.
In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics,
Uppsala, Sweden, pp. 1019-29. Association for Computational Linguistics.
Stampe, David (1972). A dissertation on natural phonology. PhD thesis, University of Chicago.
Stanley, Carol (1991). Description morpho-syntaxique de la langue tikar (parlée au Cameroun).
SIL, Epinay-sur-Seine.
Stemberger, Joseph Paul (1991). Apparent anti-frequency effects in language production: The
Addition Bias and phonological underspecification. Journal of Memory and Language, 30,
161-85.
and Treiman, M. (1986). The internal structure of word-initial consonant clusters. Journal
of Memory and Language, 25,163-80.
Steriade, Donca (2000). Paradigm uniformity and the phonetics-phonology boundary. In
Papers in laboratory phonology V: Acquisition and the lexicon (eds. M. Broe and J. Pierre-
humbert), pp. 313-34. Cambridge University Press, Cambridge.
(2001). Directional asymmetries in place assimilation: A perceptual account. In Perception
in phonology (eds. E. Hume and K. Johnson), pp. 219-50. Academic Press, San Diego.
(2008). The phonology of perceptibility effects: The P-map and its consequences for constraint
organization. In The nature of the word: Essays in honor of Paul Kiparsky (eds. K. Hanson and
S. Inkelas), pp. 151-80. MIT Press, Cambridge, Mass.
Sternberg, Saul, Knoll, Ronald, Monsell, Stephen, and Wright, Charles E. (1988). Motor
programs and hierarchical organization in the control of rapid speech. Phonetica, 45, 175-97.
Sternberg, Saul, Monsell, Stephen, Knoll, Ronald, and Wright, Charles E. (1978). The latency and
duration of rapid movement sequences: Comparisons of speech and typing. In Information
Processing in Motor Control and Learning (ed. G. E. Stelmach), pp. 117-52. Academic Press,
New York.
Stevens, Kenneth N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-46.
and Halle, Morris (1967). Remarks on analysis by synthesis and distinctive features. In
Models for the perception of speech and visual form (ed. W. Wathen-Dunn), pp. 88-102. MIT
Press.
and House, Arthur S. (1963). Perturbation of vowel articulations by consonantal context:
An acoustical study. Journal of Speech and Hearing Research, 6,111-28.
and Keyser, Samuel Jay (1989). Primary features and their enhancement in consonants.
Language, 65, 81-106.
Stewart, Mary E. and Ota, Mitsuhiko (2008). Lexical effects on speech perception in individuals
with 'autistic' traits. Cognition, 109,157-62.
Stoesz, Brenda M. and Jakobson, Lorna S. (2008). The influence of processing style on face
perception. Journal of Vision, 8(6), 1138.
Strand, Elizabeth A. (1999). Uncovering the role of gender stereotypes in speech perception.
Journal of Language and Social Psychology, 18, 86-99.
Strange, Winifred, Verbrugge, Robert R., Shankweiler, Donald P., and Edman, Thomas R.
(1976). Consonant environment specifies vowel identity. Journal of the Acoustical Society
of America, 60, 213-24.
Streeter, Lynn A. (1976). Language perception of two-month-old infants show effects of both
innate mechanisms and experience. Nature, 259, 39-41.
Strogatz, Steven H. (1994). Nonlinear dynamics and chaos. Addison-Wesley, Reading, MA.
Strong, Herbert A., Logeman, Willem S., and Wheeler, Benjamin Ide (1891). Introduction to
the study of the history of language. Longmans, Green, & Co., New York.
Stuart-Smith, Jane and Timmins, Claire (2009). The role of the individual in language variation
and change. In Language and Identities (eds. C. Llamas and D. Watt), pp. 39-54. Edinburgh
University Press, Edinburgh.
and Tweedie, Fiona (2007). 'Talkin jockney'? Variation and change in Glaswegian
accent. Journal of Sociolinguistics, 11, 221-60.
Summerfield, Quentin (1981). Articulatory rate and perceptual constancy in phonetic perception.
Journal of Experimental Psychology: Human Perception and Performance, 7, 1074-95.
Sundberg, Ulla and Lacerda, Francisco (1999). Voice onset time in speech to infants and adults.
Phonetica, 56, 186-99.
Svantesson, Jan-Olof (1989). Tonogenetic mechanisms in Northern Mon-Khmer. Phonetica, 46,
60-79.
Sweet, Henry (1913). Collected papers of Henry Sweet, ed. by H. C. Wyld. Clarendon Press,
Oxford.
Tabor, Whitney T. (1994). Syntactic innovation: A connectionist model. PhD thesis, Stanford
University, Stanford, CA.
Tang, Joanne S.-Y. and Maidment, John A. (1996). Prosodic aspects of Cantonese child-directed
speech. Speech, Hearing and Language, 9, 257-76.
Tang, Katrina Elizabeth (2008). The phonology and phonetics of consonant-tone interaction.
PhD thesis, UCLA.
Tanowitz, Jill and Beddor, Patrice Speeter (1997). Temporal characteristics of coarticulatory
vowel nasalization in English. The Journal of the Acoustical Society of America, 101, 3194A.
Tees, Richard C. and Werker, Janet F. (1984). Perceptual flexibility: Maintenance or recovery
of the ability to discriminate nonnative speech sounds. Canadian Journal of Psychology, 38,
579-90.
Templin, Mildred C. (1957). Certain language skills in children: Their development and interre-
lationships. Greenwood, Westport, Conn.
Tettamanti, Marco, Moro, Andrea, Messa, Cristina, Moresco, Rosa M., Rizzo, Giovanna,
Carpinelli, Assunta, Matarrese, Mario, Fazio, Ferruccio, and Perani, Daniela (2005). Basal
ganglia and language: phonology modulates dopaminergic release. Neuroreport, 16(4),
397-401.
Thiessen, Erik D., Hill, Emily A., and Saffran, Jenny R. (2005). Infant-directed speech facilitates
word segmentation. Infancy, 7(1), 53-71.
Thompson, Laurence C. and Thompson, M. Terry (1985). A Grassmann's Law for Salish.
Oceanic Linguistics Special Publications, 20, 134-47.
Thurgood, Graham and Javkin, Hector (1975). An acoustic explanation of a sound change:
*-ap to -o, *-at to -e, and *-ak to -ae. Journal of Phonetics, 3, 161-5.
Tilsen, Sam (2007). Vowel-to-vowel coarticulation and dissimilation in phonemic-response
priming. In UC-Berkeley Phonology Lab Annual Report, pp. 416-58. Berkeley Phonology
Laboratory.
(2009a). Interactions between speech rhythm and gesture. PhD thesis, University of
California, Berkeley.
Tilsen, Sam (2009b). Subphonemic and cross-phonemic priming in vowel shadowing: evidence
for the involvement of exemplars in production. Journal of Phonetics, 37(3), 276-96.
Tipper, Steven P., Howard, Louise A., and Houghton, George (2000). Behavioral consequences
of selection from neural population codes. In Attention and performance XVIII: Control of
cognitive processes (eds. S. Monsell and J. Driver), pp. 223-45. MIT Press, Cambridge, MA.
Toon, Thomas E. (1978). Lexical diffusion in Old English. In Chicago Linguistics Society: Papers
from the parasessions on the lexicon, pp. 357-64. Chicago Linguistics Society.
Toscano, Joseph C. and McMurray, Bob (2010). Cue integration with categories: Weighting
acoustic cues in speech using unsupervised learning and distributional statistics. Cognitive
Science, 34, 434-64.
Tournadre, Nicolas (2005). L'Aire linguistique tibétaine et ses divers dialectes. LALIES, 25,
7-56.
Townsend, David and Bever, Thomas (2001). Sentence comprehension: The integration of habits
and rules. MIT Press, Cambridge, MA.
Trager, George L. (1940). One phonemic entity becomes two: The case of 'short a'. American
Speech, 15, 255-8.
Traill, Anthony (1990). Depression without depressors. South African Journal of African Lan-
guages, 10, 166-72.
Trehub, Sandra E. (1976). The discrimination of foreign speech contrasts by infants and adults.
Child Development, 47, 466-72.
Treiman, Rebecca, Kessler, Brett, Knewasser, Stephanie, Tincoff, Ruth, and Bowman, Margo
(2000). English speakers' sensitivity to phonotactic patterns. In Papers in laboratory phonol-
ogy V: Acquisition and the lexicon (eds. M. Broe and J. Pierrehumbert), pp. 269-82.
Cambridge University Press, Cambridge.
Tremblay, Kelly, Kraus, Nina, and McGee, Thérèse (1998). The time course of auditory percep-
tual learning: Neurophysiological changes during speech-sound training. NeuroReport, 9,
3557-60.
Troutman, Celina, Goldrick, Matthew, and Clark, Brady (2008). Social networks and
intraspeaker variation during periods of language change. University of Pennsylvania Work-
ing Papers in Linguistics, 14(1), 325-38.
Trubetzkoy, Nikolai Sergeevich (1969). Principles of phonology [originally published in 1939;
English translation by Christiane A. M. Baltaxe]. University of California Press, Berkeley,
CA.
Tsushima, Teruaki, Takizawa, Osamu, Sasaki, Midori, Shiraki, Satoshi, Nishi, Kanae, Kohno,
Morio, Menyuk, Paula, and Best, Catherine T. (1994). Discrimination of English /r-l/ and
/w-y/ by Japanese infants at 6-12 months: Language-specific developmental changes in
speech perception abilities. In Proceedings of the International Conference on Spoken Lan-
guage Processing, Volume 4, pp. 1695-8. Acoustical Society of Japan.
Umeda, N. (1981). Influence of segmental factors on fundamental frequency in fluent speech.
Journal of the Acoustical Society of America, 70(2), 350-5.
Välimaa-Blum, Riitta (2009). The phoneme in cognitive phonology: episodic memories of both
meaningful and meaningless units? CogniTextes, 2. Retrieved from http://cognitextes.revues.
org/211 on 2010-07-16.
Vallabha, Gautam K., McClelland, James L., Pons, Ferran, Werker, Janet F., and Amano, Shigeaki
(2007). Unsupervised learning of vowel categories from infant-directed speech. Proceedings
of the National Academy of Sciences, 104(33), 13273-8.
van der Hulst, Harry and van de Weijer, Jeroen (1995). Vowel harmony. In Handbook of phono-
logical theory (ed. J. Goldsmith). Blackwell, Cambridge, MA and Oxford.
Van der Stigchel, Stefan, Meeter, Martijn, and Theeuwes, Jan (2006). Eye movement trajectories
and what they tell us. Neuroscience & Biobehavioral Reviews, 30(5), 666-79.
and Theeuwes, Jan (2005). The influence of attending to multiple locations on eye move-
ments. Vision Research, 45(15), 1921-7.
van Dommelen, Wim A. (1993). Does dynamic F0 increase perceived duration? New light on
an old issue. Journal of Phonetics, 21, 367-86.
Vance, Timothy J. (1987). An introduction to Japanese phonology. State University of New York
Press, Albany.
Vennemann, Theo (1972a). Phonetic analogy and conceptual analogy. In Schuchardt, the
Neogrammarians, and the transformational theory of phonological change: Four essays by
Hugo Schuchardt, Theo Vennemann, Terence H. Wilbur (eds. T. Vennemann and T. H.
Wilbur), No. 26 in Linguistische Forschungen, pp. 115-79. Athenäum, Frankfurt am Main.
(1972b). Rule inversion. Lingua, 29, 209-42.
(1974). Words and syllables in natural generative phonology. In Parasession on natural
phonology (eds. A. Bruck, R. Fox, and M. La Galy), pp. 346-74. Chicago Linguistic Society.
Vergnaud, Jean-Roger (1980). A formal theory of vowel harmony. In Issues in vowel harmony
(ed. R. M. Vago), pp. 49-62. John Benjamins, Amsterdam.
Verner, Karl (1877). Eine Ausnahme der ersten Lautverschiebung. Zeitschrift für vergleichende
Sprachforschung, 23, 97-130.
Viswanathan, Navin, Magnuson, James S., and Fowler, Carol A. (2010). Compensation for
coarticulation: Disentangling auditory and gestural theories of perception of coarticula-
tory effects in speech. Journal of Experimental Psychology: Human Perception and Perfor-
mance, 36(4), 1005-15.
Vitevitch, Michael and Luce, Paul (1999). Probabilistic phonotactics and neighborhood activa-
tion in spoken word recognition. Journal of Memory and Language, 40, 374-408.
Charles-Luce, Jan, and Kemmerer, David (1997). Phonotactics and syllable stress:
Implications for the processing of spoken nonsense words. Language and Speech, 40(1),
47-62.
von dem Hagen, Elisabeth A. H., Nummenmaa, Lauri, Yu, Rongjun, Engell, Andrew D.,
Ewbank, Michael P., and Calder, Andrew J. (2010). Autism spectrum traits in the typical
population predict structure and function in the posterior superior temporal sulcus. Cerebral
Cortex, 21(3), 493-500.
Vulkan, Nir (2000). An economist's perspective on probability matching. Journal of Economic
Surveys, 14, 101-18.
Wakabayashi, Akio, Baron-Cohen, Simon, and Wheelwright, Sally (2006). Are autistic traits an
independent personality dimension? A study of the Autism-Spectrum Quotient (AQ) and
the NEO-PI-R. Personality and Individual Differences, 41, 873-83.
Walter, Mary Ann (2008). Heading toward harmony? Vowel cooccurrence in the Croatian
lexicon. Paper presented at the Symposium on Phonologization, University of Chicago.
Wang, William S.-Y. and Fillmore, Charles J. (1961). Intrinsic cues and consonant perception.
Journal of Speech and Hearing Research, 4, 130-6.
Lehiste, I., Chuang, C. K., and Darnovsky, N. (1976). Perception of vowel duration. Journal
of the Acoustical Society of America, 60, 892.
Watkins, Kate and Paus, Tomáš (2004). Modulation of motor excitability during speech
perception: The role of Broca's area. Journal of Cognitive Neuroscience, 16(6), 978-87.
Strafella, A., and Paus, T. (2003). Seeing and hearing speech excites the motor system
involved in speech production. Neuropsychologia, 41, 989-94.
Watters, John Robert (1979). Focus in Aghem: A study of its formal correlates and typology. In
Aghem grammatical structure, Southern California Occasional Papers in Linguistics 7, pp.
137-97. University of Southern California, Los Angeles.
Wedel, Andrew (2004a). Category competition drives contrast maintenance within an
exemplar-based production/perception loop. In Proceedings of the seventh meeting of the ACL
special interest group in computational phonology, Barcelona, Spain, pp. 1-10. Association for
Computational Linguistics.
(2004b). Self-organization and categorical behavior in phonology. PhD thesis, UC Santa
Cruz.
(2006). Exemplar models, evolution and language change. The Linguistic Review, 23,
247-74.
(2007). Feedback and regularity in the lexicon. Phonology, 24, 147-85.
Weinreich, Uriel, Labov, William, and Herzog, Marvin I. (1968). Empirical foundations for a
theory of language change. In Directions for historical linguistics: A symposium (eds. W. P.
Lehmann and Y. Malkiel), pp. 95-188. University of Texas Press, Austin, TX.
Weiss, Michael (2010). Outline of the historical and comparative grammar of Latin. Beech Stave
Press, Ann Arbor, Mich.
Weisstein, Eric (2009). Beta distribution. From MathWorld-A Wolfram Web Resource.
Retrieved on 2009-03-15.
Welsh, Timothy and Elliott, Digby (2005). The effects of response priming on the planning
and execution of goal-directed movements in the presence of a distracting stimulus. Acta
Psychologica, 119, 123-42.
Werker, Janet F. and McLeod, P. J. (1989). Infant preference for both male and female infant-
directed talk: A developmental study of attentional and affective responses. Canadian Journal
of Psychology, 43(2), 230-46.
Pons, Ferran, Dietrich, Christiane, Kajikawa, Sachiyo, Fais, Laurel, and Amano, Shigeaki
(2007). Infant-directed speech supports phonetic category learning in English and Japanese.
Cognition, 103, 147-62.
Shi, R., Desjardins, R., Pegg, J. E., Polka, L., and Patterson, M. (1998). Three meth-
ods for testing infant speech perception. In Perceptual development: Visual, auditory, and
speech perception in infancy (ed. A. Slater), pp. 389-420. Psychology Press, Hove, East
Sussex, UK.
and Tees, Richard C. (1984). Cross-language speech perception: Evidence for perceptual
reorganization during the first year of life. Infant Behavior and Development, 7, 49-63.
Westbury, John and Keating, Patricia A. (1986). On the naturalness of stop consonant voicing.
Journal of Linguistics, 22, 145-66.
Westbury, John R., Hashi, Michiko, and Lindstrom, Mary J. (1998). Differences among speakers
in lingual articulation for American English /r/. Speech Communication, 26, 203-26.
Wetzels, W. Leo (2007). On the representation of nasality in Maxacalí: evidence from Por-
tuguese loans. Appeared (2006) in Portuguese translation, Sobre a representação da nasal-
idade em Maxacalí: evidências de empréstimos do Português. In Descrição, História e
Aquisição do Português Brasileiro, pp. 217-40. Pontes/FAPESP, Campinas.
Whalen, Douglas H. (1990). Coarticulation is largely planned. Journal of Phonetics, 18, 3-35.
(1991). Subcategorical phonetic mismatches and lexical access. Perception and Psy-
chophysics, 50, 351-60.
Levitt, Andrea G., and Goldstein, Louis M. (2007). VOT in the babbling of French- and
English-learning infants. Journal of Phonetics, 35(3), 341-52.
Wheeldon, Linda R. and Lahiri, Aditi (1997). Prosodic units in speech production. Journal of
Memory and Language, 37, 356-81.
(2002). The minimal unit of prosodic encoding: prosodic or lexical word. Cognition,
85(2), 631-41.
and Levelt, Willem J. M. (1995). Monitoring the time course of phonological encoding.
Journal of Memory and Language, 34, 311-34.
Wheelwright, Sally, Baron-Cohen, Simon, Goldenfeld, Nigel, Delaney, Joe, Fine, Debra, Smith,
Richard, Weil, Leonora, and Wakabayashi, Akio (2006). Predicting Autism Spectrum Quo-
tient (AQ) from the Systemizing Quotient-Revised (SQ-R) and Empathy Quotient (EQ).
Brain Research, 1079, 47-56.
Wilson, Colin (2006). Learning phonology with substantive bias: An experimental and com-
putational study of velar palatalization. Cognitive Science, 30, 945-82.
Windfuhr, Gernot L. (1997). Persian phonology. In Phonologies of Africa and Asia (ed. A. Kaye),
Volume 2, pp. 675-90. Eisenbrauns, Winona Lake.
Witkin, Herman A., Moore, Carol A., Goodenough, Donald R., and Cox, Patricia W. (1977).
Field-dependent and field-independent cognitive styles and their educational implications.
Review of Educational Research, 47(1), 1-64.
Wong, Patrick C. M. and Perrachione, Tyler K. (2007). Learning pitch patterns in lexical iden-
tification by native English-speaking adults. Applied Psycholinguistics, 28, 565-85.
and Parrish, Todd B. (2007). Neural characteristics of successful and less successful
speech and word learning in adults. Human Brain Mapping, 28, 995-1006.
Wright, Jonathan (2007). Laryngeal contrasts in Seoul Korean. PhD thesis, University of Penn-
sylvania, Philadelphia, PA.
Wright, Richard (1996). Consonant clusters and cue preservation in Tsou. PhD thesis, UCLA.
and Ladefoged, Peter (1997). A phonetic study of Tsou. Bulletin of the Institute of History
and Philology, Academia Sinica, 68, 987-1028.
Xu, Yi (1997). Contextual tonal variation in Mandarin. Journal of Phonetics, 25(1), 65-83.
Yamada, Reiko A. and Tohkura, Yoh'ichi (1992). The effects of experimental variables on
the perception of American English /r/ and /l/ by Japanese listeners. Perception and
Psychophysics, 52, 376-92.
Yang, Charles D. (2001). Internal and external forces in language change. Language Variation
and Change, 12(3), 231-50.
(2002). Knowledge and learning in natural language. Oxford University Press, New York.
Yu, Alan C. L. (2004). Explaining final obstruent voicing in Lezgian: Phonetics and history.
Language, 80(1), 73-97.
(2007). Understanding near mergers: The case of morphological tone in Cantonese.
Phonology, 24(1), 187-214.
(2010a). Perceptual compensation is correlated with individuals' 'autistic' traits: Implica-
tions for models of sound change. PLoS ONE, 5(8), e11950.
(2010b). Tonal effects on perceived vowel duration. In Laboratory Phonology 10 (eds.
C. Fougeron, B. Kühnert, M. D'Imperio, and N. Vallée), pp. 151-68. Mouton de Gruyter,
Berlin.
(2011). On measuring phonetic precursor robustness: A response to Moreton 2008.
Phonology, 28(3), 491-518.
Abrego-Collier, Carissa, Baglini, Rebekah, Grano, Tommy, Martinovic, Martina, Otte,
Charles III, Thomas, Julia, and Urban, Jasmin (2011). Speaker attitude and sexual orientation
affect phonetic imitation. Penn Working Papers in Linguistics, 17(1), 235-42.
Zimmer, Karl (1985). Arabic loanwords and Turkish phonological structure. International Jour-
nal of American Linguistics, 51, 623-5.
Zipf, George Kingsley (1932). Selected studies of the principle of relative frequency in language.
Harvard University Press, Cambridge, MA.
Zuraw, Kie (2003). Probability in language change. In Probabilistic Linguistics (eds. Rens Bod,
Jennifer Hay, and Stefanie Jannedy), pp. 139-76. MIT Press, Cambridge, MA.
(2007). The role of phonetic knowledge in phonological patterning: Corpus and survey
evidence from Tagalog infixation. Language, 83(2), 277-316.
Language Index
Aari 66
Aghem 24-5
Akuapem/Asante 21
Athabaskan 66, 72
Bole 12-13
Bondei 12
Cantonese 145, 153, 219-20
Central Tibetan 65
Chichewa 12
Chumburung 21
Cokwe 13
Creek 104
Czech 132, 204
Dagbani 21
Digo 12-13
Dutch 100, 137, 151-2, 155-6, 158-61
  Limburgian 100
English 6, 8, 20, 22, 24, 42, 45-6, 51, 58, 73-8, 75, 79, 83-5, 87, 117, 123, 130-41, 144-5, 151-2, 154-61, 168, 182-3, 185, 187-8, 192, 194, 204-5, 212, 219, 249, 262-3, 265, 267, 269, 271-3, 275, 277, 279, 281, 283-4
  African American Vernacular 137
  British 20, 123, 263-5
  Cockney 137
  Middle English 68-9
  Old English 42, 68-9, 72-3
Estonian 101, 106-7
Ewe 6, 10
Fante 21
Filipino 133-4, 138
Finnish 248-9
French 22, 36-8, 130, 132, 136-9, 151-2,
German 67-8, 75, 83, 104-6, 109-10, 139, 151-2, 155-6, 158-61
Giryama 12-13
Gonja 21
Greek 67, 69-71, 73-6
Hindi 132, 204, 253
Hu 101-2
Ikalanga 6, 15
Japanese 15, 63, 105-6, 109-10, 132, 137, 139, 204
Kauma 12
Kinande 14
Korean 99-100, 149, 151-2, 155-6, 158-61, 184-5, 196, 204, 229-30, 233, 238-43, 245-6
  Middle 99-100
  Modern Seoul 99-100
Latin 67-8, 72-3, 75-6, 83, 103
  Late Spoken 102
Luganda 14, 21
Makua 13
Malagasy 24, 134
Masa 14-15
Mentu Land Dayak 15
Mijikenda 12-13
Miya 12, 16
Musey 12, 14
Mwiini 12
Namwanga 12
Rihe 12
Xhosa 12
Subject Index
inhibition 60-1, 66, 77-8, 89, 113, 115, 117, 119, 123-7
innovator 53, 83-4, 202, 219, 221, 224
input filtering 279
instability 31, 40-3, 93, 144, 209
intonation 20, 139
Italian 63, 67, 76, 139, 206
iterated learning 252, 256, 260
iterated maps 277
Kullback-Leibler (KL) divergence 241, 243-5
leader 202, 208, 221, 224, 226
levels (of representation) 7-8, 85
lexical diffusion 96, 263-4, 273
lexicon 20-1, 88, 138, 150-2, 154, 157, 159-63, 236, 238-9, 249-50, 253-4, 256, 258, 275
linguistic population 262-3, 278
listener-based misperception 263
Literature Online 273
long-distance displacement (nonlocal metathesis) 66-7
[nasal] 167-8, 170-2, 174, 176-7, 179-80
nasals 5, 9, 11-16, 18, 20, 62, 64, 69, 76, 80, 85, 92, 132-4, 149, 154, 167-8, 170-2, 174, 176-7, 179-98, 249-50
nasal assimilation 167
natural classes 166-7, 169, 196
noun/verb pairs (English) 263-6, 268-73, 275-7, 279, 282-4
obstruent-glide fusion 73
P-base 167
palatalization 5, 54, 64, 69-71, 76, 97, 184-5, 219-20
partitioning 169-70, 172-5, 178, 180, 276
perception 38-9, 41-2, 44, 52-4, 56, 58-60, 63-4, 66-7, 70-3, 78, 81, 85-8, 90, 92, 94, 97, 99, 101, 103-5, 108-11, 114, 127-38, 140, 144-5, 153-4, 156, 183-5, 187, 196, 203-7, 209, 212, 216-20, 224-7, 229-33, 236, 239, 245-7, 253, 263, 272
perceptual compensation 64, 84-5, 92, 127, 154, 207-9, 211, 215-18, 220-1,