Haskins Laboratories Status Report on Speech Research
1993, SR-113, 51-62

Dynamics and Articulatory Phonology*

Catherine P. Browman and Louis Goldstein†

This work was supported by NSF grant DBS-9112198 and NIH grants HD-01994 and DC-00121 to Haskins Laboratories. Thanks to Alice Faber and Jeff Shaw for comments on an earlier version.

1. INTRODUCTION

Traditionally, the study of human speech and its patterning has been approached in two different ways. One way has been to consider it as mechanical or bio-mechanical activity (e.g., of articulators or air molecules or cochlear hair cells) that changes continuously in time. The other way has been to consider it as a linguistic (or cognitive) structure consisting of a sequence of elements chosen from a closed inventory. Development of the tools required to describe speech in one or the other of these approaches has proceeded largely in parallel, with one hardly informing the other at all (some notable exceptions are discussed below). As a result, speech has been seen as having two structures, one considered physical, and the other cognitive, where the relation between the two structures is generally not an intrinsic part of either description. From this perspective, a complete picture requires 'translating' between the intrinsically incommensurate domains (as argued by Fowler, Rubin, Remez, & Turvey, 1980).

The research we have been pursuing (Browman & Goldstein, 1986; 1989; 1990a,b; 1992) ('articulatory phonology') begins with the very different assumption that these apparently different domains are, in fact, the low and high dimensional descriptions of a single (complex) system. Crucial to this approach is identification of phonological units with dynamically specified units of articulatory action, called gestures. Thus an utterance is described as an act that can be decomposed into a small number of primitive units (a low dimensional description), in a particular spatio-temporal configuration. The same description also provides an intrinsic specification of the high dimensional properties of the act (its various mechanical and bio-mechanical consequences).

In this chapter, we will briefly examine the nature of the low and high dimensional descriptions of speech, and contrast the dynamical perspective that unifies these with other approaches in which they are separated as properties of mind and body. We will then review some of the basic assumptions and results of developing a specific model incorporating dynamical units, and illustrate how it provides both low and high dimensional descriptions.

2. DIMENSIONALITY OF DESCRIPTION

Human speech events can be seen as quite complex, in the sense that an individual utterance follows a continuous trajectory through a space defined by a large number of potential degrees of freedom, or dimensions. This is true whether the dimensions are neural, articulatory, acoustic, aerodynamic, auditory, or other ways of describing the event. The fundamental insight of phonology, however, is that the pronunciation of the words in a given language may differ from (that is, contrast with) each other in only a restricted number of ways: the number of degrees of freedom actually employed in this contrastive behavior is far fewer than the number that is mechanically available. This insight has taken the form of the hypothesis that words can be decomposed into a small number of primitive units (usually far fewer than one hundred in a given language) which can combine in different ways to form the large number of words required in human lexicons. Thus, as argued by Kelso, Saltzman, and Tuller (1986), human speech is characterized not only by a high number of potential (microscopic) degrees of freedom, but also by a low dimensional (macroscopic) form. This macroscopic form is
usually called the 'phonological' form. As will be suggested below, this collapse of degrees of freedom can possibly be understood as an instance of the kind of self-organization found in other complex systems in nature (Haken, 1977; Kauffmann, 1991; Kugler & Turvey, 1987; Madore & Freedman, 1987; Schoner & Kelso, 1988).

Historically, however, the gross differences between the macroscopic and microscopic scales of description have led researchers to ignore one or the other description, or to assert its irrelevance, and hence to generally separate the cognitive and the physical. Anderson (1974) describes how the development of tools in the nineteenth and early twentieth centuries led to the quantification of more and more details of the speech signal, but "with such increasingly precise description, however, came the realization that much of it was irrelevant to the central tasks of linguistic science" (p. 4). Indeed, the development of many early phonological theories (e.g., those of Saussure, Trubetzkoy, Sapir, Bloomfield) proceeded largely without any substantive investigation of the measurable properties of the speech event at all (although Anderson notes Bloomfield's insistence that the smallest phonological units must ultimately be defined in terms of some measurable properties of the speech signal). In general, what was seen as important about phonological units was their function, their ability to distinguish utterances.

A particularly telling insight into this view of the lack of relation between the phonological and physical descriptions can be seen in Hockett's (1955) familiar Easter egg analogy. The structure serving to distinguish utterances (for Hockett, a sequence of letter-sized phonological units called phonemes) was viewed as a row of colored, but unboiled, Easter eggs on a moving belt. The physical structure (for Hockett, the acoustic signal) was imagined to be the result of running the belt through a wringer, effectively smashing the eggs and intermixing them. It is quite striking that, in this analogy, the cognitive structure of the speech event cannot be seen in the gooey mess itself. For Hockett, the only way the hearer can respond to the event is to infer (on the basis of obscured evidence, and knowledge of possible egg sequences) what sequence of eggs might have been responsible for the mess. It is clear that in this view, the relation between cognitive and physical descriptions is neither systematic nor particularly interesting. The descriptions share color as an important attribute, but beyond that there is little relation.

A major approach that did take seriously the goal of unifying the cognitive and physical aspects of speech description was that in the Sound Pattern of English (Chomsky & Halle, 1968), including the associated work on the development of the theory of distinctive features (Jakobson, Fant, & Halle, 1951) and the quantal relations that underlie them (Stevens, 1972, 1989). In this approach, an utterance is assigned two representations: a 'phonological' one, whose goal is to describe how the utterance functions with respect to contrast and patterns of alternation, and a 'phonetic' one, whose goal is to account for the grammatically determined physical properties of the utterance. Crucially, however, the relation between the representations is quite constrained: both descriptions employ exactly the same set of dimensions (the features). The phonological representation is coarser in that features may take on only binary values, while the phonetic representation is more fine-grained, with the features having scalar values. However, a principled relation between the binary values and the scales is also provided: Stevens' quantal theory attempts to show how the potential continuum of scalar feature values can be intrinsically partitioned into categorical regions, when the mapping from articulatory dimensions to auditory properties is considered. Further, the existence of such quantal relations is used to explain why languages employ these particular features in the first place.

Problems raised with this approach to speech description soon led to its abandonment, however. One problem is that its phonetic representations were shown to be inadequate to capture certain systematic physical differences between utterances in different languages (Keating, 1985; Ladefoged, 1980; Port, 1981). The scales used in the phonetic representations are themselves of reduced dimensionality, when compared to a complete physical description of utterances. Chomsky and Halle hypothesized that such further details could be supplied by universal rules. However, the above authors (also Browman & Goldstein, 1986) argued that this would not work: the same phonetic representation (in the Chomsky and Halle sense) can have different physical properties in different languages. Thus, more of the physical detail (and particularly details having to do with timing) would have to be specified as part of the description of a particular language. Ladefoged's (1980) argument cut even deeper. He argued that there is a system of scales that is useful for characterizing the measurable articulatory and acoustic properties of utterances,
but that these scales are very different from the features proposed by Chomsky and Halle.

One response to these failings has been to hypothesize that descriptions of speech should include, in addition to phonological rules of the usual sort, rules that take (cognitive) phonological representations as input and convert them to physical parameterizations of various sorts. These rules have been described as rules of 'phonetic implementation' (e.g., Keating, 1985; Keating, 1990; Klatt, 1976; Liberman & Pierrehumbert, 1984; Pierrehumbert, 1990; Port, 1981). Note that in this view, the description of speech is divided into two separate domains, involving distinct types of representations: the phonological or cognitive structure and the phonetic or physical structure. This explicit partitioning of the speech side of linguistic structure into separate phonetic and phonological components which employ distinct data types that are related to one another only through rules of phonetic implementation (or 'interpretation') has stimulated a good deal of research (e.g., Cohn, 1990; Coleman, 1992; Fourakis & Port, 1986; Keating, 1988; Liberman & Pierrehumbert, 1984). However, there is a major price to be paid for drawing such a strict separation: it becomes very easy to view phonetic and phonological (physical and cognitive) structures as essentially independent of one another, with no interaction or mutual constraint. As Clements (1992) describes the problem: "The result is that the relation between the phonological and phonetic components is quite unconstrained. Since there is little resemblance between them, it does not matter very much for the purposes of phonetic interpretation what the form of the phonological input is; virtually any phonological description can serve its purposes equally well" (p. 192). Yet, there is a constrained relation between the cognitive and physical structures of speech, which is what drove the development of feature theory in the first place.

In our view, the relation between the physical and cognitive, i.e., phonetic and phonological, aspects of speech is inherently constrained by their being simply two levels of description (the microscopic and the macroscopic) of the same system. Moreover, we have argued that the relation between microscopic and macroscopic properties of speech is one of mutual or reciprocal constraint (Browman & Goldstein, 1990b). As we elaborated there, the existence of such reciprocity is supported by two different lines of research. One line has attempted to show how the macroscopic properties of contrast and combination of phonological units arise from, or are constrained by, the microscopic, i.e., the detailed properties of speech articulation and the relations among speech articulation, aerodynamics, acoustics, and audition (e.g., Lindblom, MacNeilage, & Studdert-Kennedy, 1983; Ohala, 1983; Stevens, 1972; 1989). A second line has shown that there are constraints running in the opposite direction, such that the (microscopic) detailed articulatory or acoustic properties of particular phonological units are determined, in part, by the macroscopic system of contrast and combination found in a particular language (e.g., Keating, 1990; Ladefoged, 1982; Manuel & Krakow, 1984; Wood, 1982). The apparent existence of this bi-directionality is of considerable interest, because recent studies of the generic properties of complex physical systems have demonstrated that reciprocal constraint between macroscopic and microscopic scales is a hallmark of systems displaying 'self-organization' (Kugler & Turvey, 1987; see also discussions by Langton in Lewin, 1992 [pp. 12-14; 188-191], and work on the emergent properties of "co-evolving" complex systems: Hogeweg, 1989; Kauffman, 1989; Kauffman & Johnsen, 1991; Packard, 1989). Such self-organizing systems (hypothesized as underlying such diverse phenomena as the construction of insect nests and evolutionary and ecological dynamics) display the property that the 'local' interactions among a large number of microscopic system components can lead to emergent patterns of 'global' organization and order. The emergent global organization also places constraints on the components and their local interactions. Thus, self-organization provides a principled linkage between descriptions of different dimensionality of the same system: the high-dimensional description (with many degrees of freedom) of the local interactions and the low-dimensional description (with few degrees of freedom) of the emergent global patterns. From this point of view, then, speech can be viewed as a single complex system (with low-dimensional macroscopic and high-dimensional microscopic properties) rather than as two distinct components.

A different recent attempt to articulate the nature of the constraints holding between the cognitive and physical structures can be found in Pierrehumbert (1990), in which the relation between the structures is argued to be a 'semantic' one, parallel to the relation that obtains between concepts and their real world denotations. In this view, macroscopic structure is constrained by the microscopic properties of speech and by the principles guiding human cognitive category-formation. However, the view fails to account for the apparent bi-directionality of the constraints. That is, there is no possibility of constraining the microscopic properties of speech by its macroscopic properties in this view. (For a discussion of possible limitations to a dynamic approach to phonology, see Pierrehumbert & Pierrehumbert, 1990.)

The 'articulatory phonology' that we have been developing (e.g., Browman & Goldstein, 1986, 1989, 1992) attempts to understand phonology (the cognitive) as the low-dimensional macroscopic description of a physical system. In this work, rather than rejecting Chomsky and Halle's constrained relation between the physical and cognitive, as the phonetic implementation approaches have done, we have, if anything, increased the hypothesized tightness of that relation by using the concept of different dimensionality. We have surmised that the problem with the program proposed by Chomsky and Halle was instead in their choice of the elementary units of the system. In particular, we have argued that it is wrong to assume that the elementary units are (1) static, (2) neutral between articulation and acoustics, and (3) arranged in non-overlapping chunks. Assumptions (1) and (3) have been argued against by Fowler et al. (1980), and (3) has also been rejected by most of the work in 'nonlinear' phonology over the past 15 years. Assumption (2) has been, at least partially, rejected in the 'active articulator' version of 'feature geometry' (Halle, 1982; Sagey, 1986; McCarthy, 1988).

3. GESTURES

Articulatory phonology takes seriously the view that the units of speech production are actions, and therefore that (1) they are dynamic, not static. Further, since articulatory phonology considers phonological functions such as contrast to be low-dimensional, macroscopic descriptions of such actions, the basic units are (2) not neutral between articulation and acoustics, but rather are articulatory in nature. Thus, in articulatory phonology, the basic phonological unit is the articulatory gesture, which is defined as a dynamical system specified with a characteristic set of parameter values (see Saltzman, in press). Finally, because the tasks are distributed across the various articulator sets of the vocal tract (the lips, tongue, glottis, velum, etc.), an utterance is modeled as an ensemble, or constellation, of a small number of (3) potentially overlapping gestural units.

As will be elaborated below, contrast among utterances can be defined in terms of these gestural constellations. Thus, these structures can capture the low-dimensional properties of utterances. In addition, because each gesture is defined as a dynamical system, no rules of implementation are required to characterize the high-dimensional properties of the utterance. A time-varying pattern of articulator motion (and its resulting acoustic consequences) is lawfully entailed by the dynamical systems themselves: they are self-implementing. Moreover, these time-varying patterns automatically display the property of context dependence (which is ubiquitous in the high dimensional description of speech) even though the gestures are defined in a context-independent fashion. The nature of the articulatory dimensions along which the individual dynamical units are defined allows this context dependence to emerge lawfully.

[Figure 1 diagram: intended utterance -> LINGUISTIC GESTURAL MODEL -> TASK DYNAMIC MODEL -> VOCAL TRACT MODEL -> output speech]

Figure 1. Computational system for generating speech using dynamically-defined articulatory gestures.

The articulatory phonology approach has been incorporated into a computational system being developed at Haskins Laboratories (Browman & Goldstein, 1990a,c; Browman, Goldstein, Kelso, Rubin, & Saltzman, 1984; Saltzman, 1986; Saltzman & Munhall, 1989). In this system, illustrated in Figure 1, utterances are organized ensembles (or constellations) of units of articulatory action called gestures. Each gesture is modeled as a dynamical system that characterizes the formation (and release) of a local constriction within the vocal tract (the gesture's functional goal or 'task'). For example, the word "ban" begins with a gesture whose task is lip closure. The formation of this constriction entails a change in the distance between upper and lower lips (or
Lip Aperture) over time. This change is modeled using a second order system (a 'point attractor,' Abraham & Shaw, 1982), specified with particular values for the equilibrium position and stiffness parameters. (Damping is, for the most part, assumed to be critical, so that the system approaches its equilibrium position and doesn't overshoot it.) During the activation interval for this gesture, the equilibrium position for Lip Aperture is set to the goal value for lip closure; the stiffness setting, combined with the damping, determines the amount of time it will take for the system to get close to the goal of lip closure.

The set of task or tract variables currently implemented in the computational model are listed at the top left of Figure 2, and the sagittal vocal tract shape below illustrates their geometric definitions. This set of tract variables is hypothesized to be sufficient for characterizing most of the gestures of English (exceptions involve the details of characteristic shaping of constrictions, see Browman & Goldstein, 1989). For oral gestures, two paired tract variable regimes are specified, one controlling the constriction degree of a particular structure, the other its constriction location (a tract variable regime consists of a set of values for the dynamic parameters of stiffness, equilibrium position, and damping ratio). Thus, the specification for an oral gesture includes an equilibrium position, or goal, for each of two tract variables, as well as a stiffness (which is currently yoked across the two tract variables). Each functional goal for a gesture is achieved by the coordinated action of a set of articulators, that is, a coordinative structure (Fowler et al., 1980; Kelso, Saltzman, & Tuller, 1986; Saltzman, 1986; Turvey, 1977); the sets of articulators used for each of the tract variables are shown on the top right of Figure 2, with the articulators indicated on the outline of the vocal tract model below. Note that the same articulators are shared by both of the paired oral tract variables, so that altogether there are five distinct articulator sets, or coordinative structure types, in the system.
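
The point-attractor description of the lip closure gesture given above can be made concrete with a small sketch. It is only an illustration, not the Haskins task-dynamic implementation: it assumes a unit mass, uses the textbook closed-form solution of a critically damped second-order system, and all names and numerical values (`lip_aperture`, `time_to_goal`, the start and goal apertures, the stiffness values, the 5% tolerance) are hypothetical.

```python
import math


def lip_aperture(t, start, goal, stiffness, v0=0.0):
    """Lip Aperture at time t for a critically damped, unit-mass point attractor.

    Solves x'' = -stiffness * (x - goal) - 2 * sqrt(stiffness) * x' in closed form.
    Unit mass and the parameter values used below are illustrative assumptions.
    """
    omega = math.sqrt(stiffness)   # natural frequency for unit mass
    a = start - goal               # initial displacement from the equilibrium position
    b = v0 + omega * a             # coefficient fixed by the initial velocity
    return goal + (a + b * t) * math.exp(-omega * t)


def time_to_goal(start, goal, stiffness, tolerance=0.05, dt=0.001):
    """First time at which the remaining distance is within `tolerance` of the initial distance."""
    t = 0.0
    while abs(lip_aperture(t, start, goal, stiffness) - goal) > tolerance * abs(start - goal):
        t += dt
    return t


# Hypothetical values: lips start 10 mm apart, goal of -2 mm (compression).
slow = time_to_goal(start=10.0, goal=-2.0, stiffness=400.0)
fast = time_to_goal(start=10.0, goal=-2.0, stiffness=1600.0)
print(f"time to goal: stiffness 400 -> {slow:.3f} s, stiffness 1600 -> {fast:.3f} s")
```

With zero initial velocity the trajectory approaches the goal without crossing it, and a higher stiffness shortens the time needed to get close to the goal, which is the qualitative behavior the text attributes to the stiffness and damping settings.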

tract variable                              articulators involved

LP     lip protrusion                       upper & lower lips, jaw
LA     lip aperture                         upper & lower lips, jaw
TTCL   tongue tip constrict location        tongue tip, tongue body, jaw
TTCD   tongue tip constrict degree          tongue tip, tongue body, jaw
TBCL   tongue body constrict location       tongue body, jaw
TBCD   tongue body constrict degree         tongue body, jaw
VEL    velic aperture                       velum
GLO    glottal aperture                     glottis

[Sagittal vocal tract outline marking the articulators (upper lip, lower lip, jaw, tongue body center, velum, glottis) and the VEL and GLO tract variables.]

Figure 2. Tract variables and their associated articulators.
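
The listing in Figure 2 can also be written down as a small lookup table, which makes the count of five distinct articulator sets easy to check. This is a minimal sketch: the dictionary mirrors the figure, while the names `TRACT_VARIABLES`, `coordinative_structure_types`, and `tract_variables_using` are hypothetical and not part of the Haskins system.

```python
# Tract variables and articulator sets as listed in Figure 2.
# The data structure and helper names are illustrative assumptions.
TRACT_VARIABLES = {
    "LP":   ("lip protrusion",                 ("upper lip", "lower lip", "jaw")),
    "LA":   ("lip aperture",                   ("upper lip", "lower lip", "jaw")),
    "TTCL": ("tongue tip constrict location",  ("tongue tip", "tongue body", "jaw")),
    "TTCD": ("tongue tip constrict degree",    ("tongue tip", "tongue body", "jaw")),
    "TBCL": ("tongue body constrict location", ("tongue body", "jaw")),
    "TBCD": ("tongue body constrict degree",   ("tongue body", "jaw")),
    "VEL":  ("velic aperture",                 ("velum",)),
    "GLO":  ("glottal aperture",               ("glottis",)),
}


def coordinative_structure_types(table):
    """Distinct articulator sets; paired oral tract variables share one set."""
    return {frozenset(articulators) for _, articulators in table.values()}


def tract_variables_using(articulator, table=TRACT_VARIABLES):
    """Tract variables whose articulator set includes the given articulator."""
    return sorted(name for name, (_, arts) in table.items() if articulator in arts)


print(len(coordinative_structure_types(TRACT_VARIABLES)))
print(tract_variables_using("jaw"))
```

Listing the tract variables that share an articulator, such as the jaw, shows where one gesture can affect a tract variable that it does not itself control.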


In the computational system the articulators are those of a vocal tract model (Rubin, Baer, & Mermelstein, 1981) that can generate speech waveforms from a specification of the positions of individual articulators. When a dynamical system (or pair of them) corresponding to a particular gesture is imposed on the vocal tract, the task-dynamic model (Saltzman, in press; Saltzman, 1986; Saltzman & Kelso, 1987; Saltzman & Munhall, 1989) calculates the time-varying trajectories of the individual articulators comprising that coordinative structure, based on the information about values of the dynamic parameters, etc., contained in its input. These articulator trajectories are input to the vocal tract model which then calculates the resulting global vocal tract shape, area function, transfer function, and speech waveform (see Figure 1).

Defining gestures dynamically can provide a principled link between macroscopic and microscopic properties of speech. To illustrate some of the ways in which this is true, consider the example of lip closure. The values of the dynamic parameters associated with a lip closure gesture are macroscopic properties that define it as a phonological unit and allow it to contrast with other gestures such as the narrowing gesture for [w]. These values are definitional, and remain invariant as long as the gesture is active. At the same time, however, the gesture intrinsically specifies the (microscopic) patterns of continuous change that the lips can exhibit over time. These changes emerge as the lawful consequences of the dynamical system, its parameters, and the initial conditions. Thus, dynamically defined gestures provide a lawful link between macroscopic and microscopic properties.

While tract variable goals are specified numerically, and in principle could take on any real value, the actual values used to specify the gestures of English in the model cluster in narrow ranges that correspond to contrastive categories: for example, in the case of constriction degree, different ranges are found for gestures that correspond to what are usually referred to as stops, fricatives and approximants. Thus, paradigmatic comparison (or a density distribution) of the numerical specifications of all English gestures would reveal a macroscopic structure of contrastive categories. The existence of such narrow ranges is predicted by approaches such as the quantal theory (e.g., Stevens, 1989) and the theory of adaptive dispersion (e.g., Lindblom, MacNeilage, & Studdert-Kennedy, 1983), although the dimensions investigated in those approaches are not identical to the tract variable dimensions. These approaches can be seen as accounting for how microscopic continua are partitioned into a small number of macroscopic categories.

The physical properties of a given phonological unit vary considerably depending on its context (e.g., Kent & Minifie, 1977; Ohman, 1966; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). Much of this context dependence emerges lawfully from the use of task dynamics. An example of this kind of context dependence in lip closure gestures can be seen in the fact that the three independent articulators that can contribute to closing the lips (upper lip, lower lip, and jaw) do so to different extents as a function of the vowel environment in which the lip closure is produced (Macchi, 1988; Sussman, MacNeilage, & Hanson, 1973). The value of lip aperture achieved, however, remains relatively invariant no matter what the vowel context. In the task-dynamic model, the articulator variation results automatically from the fact that the lip closure gesture is modeled as a coordinative structure that links the movements of the three articulators in achieving the lip closure task. The gesture is specified invariantly in terms of the tract variable of lip aperture, but the closing action is distributed across component articulators in a context-dependent way. For example, in an utterance like [ibi], the lip closure is produced concurrently with the tongue gesture for a high front vowel. This vowel gesture will tend to raise the jaw, and thus, less activity of the upper and lower lips will be required to effect the lip closure goal than in an utterance like [aba]. These microscopic variations emerge lawfully from the task dynamic specification of the gestures, combined with the fact of overlap (Kelso, Saltzman, & Tuller, 1986; Saltzman & Munhall, 1989).

4. GESTURAL STRUCTURES

During the act of talking, more than one gesture is activated, sometimes sequentially and sometimes in an overlapping fashion. Recurrent patterns of gestures are considered to be organized into gestural constellations. In the computational model (see Figure 1), the linguistic gestural model determines the relevant constellations for any arbitrary input utterance, including the phasing of the gestures. That is, a constellation of gestures is a set of gestures that are coordinated with one another by means of phasing, where for this purpose (and this purpose only), the dynamical regime for each gesture is treated as if it were a cycle of an
undamped system with the same stiffness as the actual regime. In this way, any characteristic point in the motion of the system can be identified with a phase of this virtual cycle. For example, the movement onset of a gesture is at phase 0 degrees, while the achievement of the constriction goal (the point at which the critically damped system gets sufficiently close to the equilibrium position) occurs at phase 240 degrees. Pairs of gestures are coordinated by specifying the phases of the two gestures that are synchronous. For example, two gestures could be phased so that their movement onsets are synchronous (0 degrees phased to 0 degrees), or so that the movement onset of one is phased to the goal achievement of another (0 degrees phased to 240 degrees), etc. Generalizations that characterize some phase relations in the gestural constellations of English words are proposed in Browman and Goldstein (1990c). As is the case for the values of the dynamic parameters, values of the synchronized phases also appear to cluster in narrow ranges, with onset of movement (0 degrees) and achievement of goal (240 degrees) being the most common (Browman & Goldstein, 1990a).

An example of a gestural constellation (for the word "pawn" as pronounced with the back unrounded vowel characteristic of much of the U.S.) is shown in Figure 3a, which gives an idea of the kind of information contained in the gestural dictionary. Each row, or tier, shows the gestures that control the distinct articulator sets: velum, tongue tip, tongue body, lips, and glottis. The gestures are represented here by descriptors, each of which stands for a numerical equilibrium position value assigned to a tract variable. In the case of the oral gestures, there are two descriptors, one for each of the paired tract variables. For example, for the tongue tip gesture labelled {clo alv}, {clo} stands for -3.5 mm (negative value indicates compression of the surfaces), and {alv} stands for 56 degrees (where 90 degrees is vertical and would correspond to a midpalatal constriction). The association lines connect gestures that are phased with respect to one another. For example, the tongue tip {clo alv} gesture and the velum {wide} gesture (for nasalization) are phased such that the point indicating 0 degrees (onset of movement) of the tongue tip closure gesture is synchronized with the point indicating 240 degrees (achievement of goal) of the velic gesture.

Each gesture is assumed to be active for a fixed proportion of its virtual cycle (the proportion is different for consonant and vowel gestures). The linguistic gestural model uses this proportion, along with the stiffness of each gesture and the phase relations among the gestures, to calculate a gestural score that specifies the temporal activation intervals for each gesture in an utterance. One form of this gestural score for "pawn" is shown in Figure 3b, with the horizontal extent of each box indicating its activation interval, and the lines between boxes indicating which gesture is phased with respect to which other gesture(s), as before. Note that there is substantial overlap among the gestures. This kind of overlap can result in certain types of context dependence in the articulatory trajectories of the invariantly specified gestures. In addition, overlap can cause the kinds of acoustic variation that have been traditionally described as allophonic variation. For example, in this case, note the substantial overlap between the velic lowering gesture (velum {wide}) and the gesture for the vowel (tongue body {narrow phar}). This will result in an interval of time during which the velo-pharyngeal port is open and the vocal tract is in position for the vowel, that is, a nasalized vowel. Traditionally, the fact of nasalization has been represented by a rule that changes an oral vowel into a nasalized one before a (final) nasal consonant. But viewed in terms of gestural constellations, this nasalization is just the lawful consequence of how the individual gestures are coordinated. The vowel gesture itself hasn't changed in any way: it has the same specification in this word and in the word "pawed" (which is not nasalized).

The parameter value specifications and activation intervals from the gestural score are input to the task dynamic model (Figure 1), which calculates the time-varying response of the tract variables and component articulators to the imposition of the dynamical regimes defined by the gestural score. Some of the time-varying responses are shown in Figure 3c, along with the same boxes indicating the activation intervals for the gestures. Note that the movement curves change over time even when a tract variable is not under the active control of some gesture. Such motion can be seen, for example, in the LIPS panel, after the end of the box for the lip closure gesture. This motion results from one or both of two sources. (1) When an articulator is not part of any active gesture, the articulator returns to a neutral position. In the example, the upper lip and the lower lip articulators both are returning to a neutral position after the end of the lip closure gesture. (2) One of the articulators linked to the inactive tract variable may also be linked to some active tract variable, and thus cause passive changes in the inactive tract variable.
[Figure 3, panels (a)-(c): gestural score for "pawn" on the tiers VELUM, TONGUE TIP, TONGUE BODY, LIPS, and GLOTTIS. Descriptors shown: velum {wide}; tongue tip {clo alv}; tongue body {narrow phar}; lips {clo lab}; glottis {wide}. Panel (c) horizontal axis: TIME (ms), 100 to 400.]

Figure 3. Various displays from computational model for "pawn." (a) Gestural descriptors and association lines. (b) Gestural descriptors and association lines plus activation boxes. (c) Gestural descriptors and activation boxes plus generated movements of (from top to bottom): Velic Aperture, vertical position of the Tongue Tip (with respect to the fixed palate/teeth), vertical position of the Tongue Body (with respect to the fixed palate/teeth), Lip Aperture, Glottal Aperture.
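
The phasing and activation relations displayed in Figure 3 can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each gesture's virtual cycle is that of an undamped, unit-mass oscillator, so that the period follows from the stiffness alone. The stiffness values, the active proportion of 0.5, and all names (`Gesture`, `phased_onset`, `stiffness_for_period`) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gesture:
    name: str
    stiffness: float        # hypothetical value; unit mass assumed, units 1/s^2
    active_fraction: float  # hypothetical proportion of the virtual cycle that is active

    @property
    def period_ms(self) -> float:
        # Assumption: the virtual cycle is that of an undamped unit-mass oscillator.
        omega = math.sqrt(self.stiffness)  # angular frequency in rad/s
        return 1000.0 * 2.0 * math.pi / omega

    def time_at_phase(self, onset_ms: float, phase_deg: float) -> float:
        # Time at which this gesture reaches a given phase of its virtual cycle.
        return onset_ms + (phase_deg / 360.0) * self.period_ms

    def activation_interval(self, onset_ms: float):
        # The gesture is active for a fixed proportion of its virtual cycle.
        return (onset_ms, onset_ms + self.active_fraction * self.period_ms)


def phased_onset(anchor, anchor_onset_ms, anchor_phase_deg, target, target_phase_deg):
    # Onset of `target` such that its `target_phase_deg` point coincides in time
    # with the `anchor_phase_deg` point of `anchor`.
    sync_ms = anchor.time_at_phase(anchor_onset_ms, anchor_phase_deg)
    return sync_ms - (target_phase_deg / 360.0) * target.period_ms


def stiffness_for_period(period_ms: float) -> float:
    # Helper for the example only: the stiffness that yields a given period.
    return (2.0 * math.pi / (period_ms / 1000.0)) ** 2


# Hypothetical periods of 300 ms (velum) and 200 ms (tongue tip).
velum = Gesture("velum {wide}", stiffness_for_period(300.0), 0.5)
tongue_tip = Gesture("tongue tip {clo alv}", stiffness_for_period(200.0), 0.5)

# 0 degrees of the tongue tip closure is synchronized with 240 degrees of the velic gesture.
tt_onset = phased_onset(velum, 0.0, 240.0, tongue_tip, 0.0)
print(round(tt_onset, 3))
print(tuple(round(t, 3) for t in tongue_tip.activation_interval(tt_onset)))
```

Under these assumptions, the onset of one gesture is fully determined by the onset of the gesture it is phased to, the two phase angles, and the two stiffness values, which is the sense in which the activation intervals are calculated rather than stipulated.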

In the example, the jaw is part of the coordinative structure for the tongue body vowel gesture, as well as part of the coordinative structure for the lip closure gesture. Therefore, even after the lip closure gesture becomes inactive, the jaw is affected by the vowel gesture, and its lowering for the vowel causes the lower lip to also passively lower.

The gestural constellations not only characterize the microscopic properties of the utterances, as discussed above, but systematic differences among the constellations also define the macroscopic property of phonological contrast in a language. Given the nature of gestural constellations, the possible ways in which they may differ from one another are, in fact, quite constrained. In other papers (e.g., Browman & Goldstein, 1986; 1989; 1992) we have begun to show that gestural structures are suitable for characterizing phonological functions such as contrast, and what the relation is between the view of phonological structure implicit in gestural constellations and that found in other contemporary views of phonology (see also Clements, 1992, for a discussion of these relations). Here we will simply give some examples of how the notion of contrast is defined in a system based on gestures, using the schematic gestural scores in Figure 4.

One way in which constellations may differ is in the presence vs. absence of a gesture. This kind of difference is illustrated by two pairs of subfigures in Figure 4: (4a) vs. (4b) and (4b) vs. (4d). (4a) "pan" differs from (4b) "ban" in having a glottis {wide} gesture (for voicelessness), while (4b) "ban" differs from (4d) "Ann" in having a labial closure gesture (for the initial consonant). Constellations may also differ in the particular tract variable/articulator set controlled by a gesture within the constellation, as illustrated by (4a) "pan" vs. (4c) "tan," which differ in terms of whether it is the lips or the tongue tip that performs the initial closure. A further way in which constellations may differ is illustrated by comparing (4e) "sad" to (4f) "shad," in which the value of the constriction location tract variable for the initial tongue tip constriction is the only difference between the two utterances. Finally, two constellations may contain the same gestures and differ simply in how they are coordinated, as can be seen in (4g) "dab" vs. (4h) "bad."
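
The four kinds of contrast just listed can be made concrete with a short sketch. It is an illustration under stated assumptions, not part of the computational model: a constellation is encoded as an ordered tuple of gestures, the tuple order is only a crude stand-in for phasing, and the encodings of the eight words are simplified readings of Figure 4. The names `Gesture` and `classify_contrast` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Gesture(NamedTuple):
    tier: str           # tract variable / articulator set, e.g. "LIPS"
    degree: str         # constriction degree descriptor, e.g. "clo"
    location: str = ""  # constriction location descriptor, if any


def classify_contrast(a: Tuple[Gesture, ...], b: Tuple[Gesture, ...]) -> str:
    # Gestures found in one constellation but not the other (multiset difference).
    only_a = list((Counter(a) - Counter(b)).elements())
    only_b = list((Counter(b) - Counter(a)).elements())
    if not only_a and not only_b:
        # Same gestures: any remaining difference is one of coordination.
        return "coordination" if a != b else "identical"
    if len(only_a) + len(only_b) == 1:
        return "presence/absence of a gesture"
    if len(only_a) == 1 and len(only_b) == 1:
        if only_a[0].tier != only_b[0].tier:
            return "tract variable/articulator set"
        return "descriptor value"
    return "multiple differences"


# Simplified, hypothetical encodings based on the descriptors in Figure 4.
LIPS_CLO = Gesture("LIPS", "clo", "lab")
TT_CLO = Gesture("TONGUE TIP", "clo", "alv")
TT_CRIT_ALV = Gesture("TONGUE TIP", "crit", "alv")
TT_CRIT_ALVPAL = Gesture("TONGUE TIP", "crit", "alvpal")
TB_WIDE = Gesture("TONGUE BODY", "wide", "phar")
VEL_WIDE = Gesture("VELUM", "wide")
GLO_WIDE = Gesture("GLOTTIS", "wide")

pan = (LIPS_CLO, GLO_WIDE, TB_WIDE, TT_CLO, VEL_WIDE)
ban = (LIPS_CLO, TB_WIDE, TT_CLO, VEL_WIDE)
tan = (TT_CLO, GLO_WIDE, TB_WIDE, TT_CLO, VEL_WIDE)
ann = (TB_WIDE, TT_CLO, VEL_WIDE)
sad = (TT_CRIT_ALV, GLO_WIDE, TB_WIDE, TT_CLO)
shad = (TT_CRIT_ALVPAL, GLO_WIDE, TB_WIDE, TT_CLO)
dab = (TT_CLO, TB_WIDE, LIPS_CLO)
bad = (LIPS_CLO, TB_WIDE, TT_CLO)

for label, x, y in [("pan/ban", pan, ban), ("ban/Ann", ban, ann),
                    ("pan/tan", pan, tan), ("sad/shad", sad, shad),
                    ("dab/bad", dab, bad)]:
    print(label, "->", classify_contrast(x, y))
```

Because a constellation contains only a handful of gestures, each with a tier and at most two descriptors, the space of minimal differences is small, which is one way of seeing why the possible contrasts are so constrained.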
[Figure 4, panels (a)-(h): schematic gestural scores, each on the tiers VELUM, TONGUE TIP, TONGUE BODY, LIPS, and GLOTTIS.]

Figure 4. Schematic gestural scores exemplifying contrast. (a) "pan" (b) "ban" (c) "tan" (d) "Ann" (e) "sad" (f) "shad"
(g) "dab" (h) "bad."

This chapter described an approach to the description of speech in which both the cognitive and physical aspects of speech are captured by viewing speech as a set of actions, or dynamic tasks, that can be described using different dimensionalities: low dimensional or macroscopic for the cognitive, and high dimensional or microscopic for the physical. A computational model that instantiates this approach to speech was briefly outlined. It was argued that this approach to speech, which is based on dynamical description, has several advantages. First, it captures both the phonological (cognitive) and physical regularities that minimally must be captured in any description of speech. Second, it does so in a way that unifies the two descriptions as descriptions with different dimensionality of a single complex system. The latter attribute means that this approach provides a principled view of the reciprocal constraints that the physical and phonological aspects of speech exhibit.

REFERENCES
Abraham, R. H., & Shaw, C. D. (1982). Dynamics: The geometry of behavior. Santa Cruz, CA: Aerial Press.
Anderson, S. R. (1974). The organization of phonology. New York, NY: Academic Press.
Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219-252.
Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201-251.
Browman, C. P., & Goldstein, L. (1990a). Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics, 18, 299-320.
Browman, C. P., & Goldstein, L. (1990b). Representation and reality: Physical systems and phonological structure. Journal of Phonetics, 18, 411-424.
Browman, C. P., & Goldstein, L. (1990c). In J. Kingston & M. E. Beckman (Eds.), Tiers in articulatory phonology, with some implications for casual speech (pp. 341-376).
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49, 155-180.
Browman, C. P., Goldstein, L., Kelso, J. A. S., Rubin, P., & Saltzman, E. (1984). Articulatory synthesis from underlying dynamics. Journal of the Acoustical Society of America, 75, S22-S23 (A).
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York: Harper & Row.
Clements, G. N. (1992). Phonological primes: Features or gestures? Phonetica, 49, 181-193.
Cohn, A. C. (1990). Phonetic and phonological rules of nasalization. UCLA WPP, 76.
Coleman, J. (1992). The phonetic interpretation of headed phonological structures containing overlapping constituents. Phonology, 9, 1-44.
Fourakis, M., & Port, R. (1986). Stop epenthesis in English. Journal of Phonetics, 14, 197-221.
Fowler, C. A., Rubin, P., Remez, R. E., & Turvey, M. T. (1980). Implications for speech production of a general theory of action. In B. Butterworth (Ed.), Language production (pp. 373-420). New York, NY: Academic Press.
Haken, H. (1977). Synergetics: An introduction. Heidelberg: Springer-Verlag.
Halle, M. (1982). On distinctive features and their articulatory implementation. Natural Language and Linguistic Theory, 1, 91-105.
Hockett, C. (1955). A manual of phonology. Chicago: University of Chicago.
Hogeweg, P. (1989). MIRROR beyond MIRROR, Puddles of LIFE. In C. Langton (Ed.), Artificial life (pp. 297-316). New York: Addison-Wesley.
Jakobson, R., Fant, C. G. M., & Halle, M. (1951). Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, MA: MIT.
Kauffmann, S. (1989). Principles of adaptation in complex systems. In D. Stein (Ed.), Sciences of complexity (pp. 619-711). New York: Addison-Wesley.
Kauffmann, S. (1991). Antichaos and adaptation. Scientific American, 265, 78-84.
Kauffmann, S., & Johnsen, S. (1991). Co-evolution to the edge of chaos: coupled fitness landscapes, poised states, and co-evolutionary avalanches. In C. Langton, C. Taylor, J. D. Farmer, & R. Rasmussen (Eds.), Artificial life II (pp. 325-369). New York: Addison-Wesley.
Keating, P. A. (1985). CV phonology, experimental phonetics, and coarticulation. UCLA WPP, 62, 1-13.
Keating, P. A. (1988). Underspecification in phonetics. Phonology, 5, 275-292.
Keating, P. A. (1990). Phonetic representations in a generative grammar. Journal of Phonetics, 18, 321-334.
Kelso, J. A. S., Saltzman, E. L., & Tuller, B. (1986). The dynamical perspective on speech production: Data and theory. Journal of Phonetics, 14, 29-59.
Kent, R. D., & Minifie, F. D. (1977). Coarticulation in recent speech production models. Journal of Phonetics, 5, 115-133.
Klatt, D. H. (1976). Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America, 59, 1208-1221.
Kugler, P. N., & Turvey, M. T. (1987). Information, natural law, and the self-assembly of rhythmic movement. Hillsdale, NJ: Lawrence Erlbaum Associates.
Ladefoged, P. (1980). What are linguistic sounds made of? Language, 56, 485-502.
Ladefoged, P. (1982). A course in phonetics (2nd ed.). New York, NY: Harcourt Brace Jovanovich.
Lewin, R. (1992). Complexity. New York, NY: Macmillan.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, M., & Pierrehumbert, J. (1984). Intonational invariance under changes in pitch range and length. In M. Aronoff, R. T. Oehrle, F. Kelley, & B. Wilker Stephens (Eds.), Language sound structure (pp. 157-233). Cambridge, MA: MIT Press.
Lindblom, B., MacNeilage, P., & Studdert-Kennedy, M. (1983). Self-organizing processes and the explanation of phonological universals. In B. Butterworth, B. Comrie, & O. Dahl (Eds.), Explanations of linguistic universals (pp. 181-203). The Hague: Mouton.
Macchi, M. (1988). Labial articulation patterns associated with segmental features and syllable structure in English. Phonetica, 45, 109-121.
Madore, B. F., & Freedman, W. L. (1987). Self-organizing structures. American Scientist, 75, 252-259.

Manuel, S. Y., & Krakow, R. A. (1984). Universal and language particular aspects of vowel-to-vowel coarticulation. Haskins Laboratories Status Report on Speech Research, SR-77/78, 69-78.
McCarthy, J. J. (1988). Feature geometry and dependency: A review. Phonetica, 45, 84-108.
Ohala, J. (1983). The origin of sound patterns in vocal tract constraints. In P. F. MacNeilage (Ed.), The production of speech (pp. 189-216). New York, NY: Springer-Verlag.
Ohman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.
Packard, N. (1989). Intrinsic adaptation in a simple model for evolution. In C. Langton (Ed.), Artificial life (pp. 141-155). New York: Addison-Wesley.
Pierrehumbert, J. (1990). Phonological and phonetic representation. Journal of Phonetics, 18, 375-394.
Pierrehumbert, J. B., & Pierrehumbert, R. T. (1990). On attributing grammars to dynamical systems. Journal of Phonetics, 18.
Port, R. F. (1981). Linguistic timing factors in combination. Journal of the Acoustical Society of America, 69, 262-274.
Rubin, P. E., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70, 321-328.
Sagey, E. C. (1986). The representation of features and relations in non-linear phonology. Doctoral dissertation, MIT.
Saltzman, E. (in press). Dynamics in coordinate systems in skilled sensorimotor activity. In T. van Gelder & B. Port (Eds.), Mind as motion. Cambridge, MA: MIT Press.
Saltzman, E. (1986). Task dynamic coordination of the speech articulators: A preliminary model. In H. Heuer & C. Fromm (Eds.), Generation and modulation of action patterns (pp. 129-144). Berlin/Heidelberg: Springer-Verlag.
Saltzman, E., & Kelso, J. A. S. (1987). Skilled actions: A task dynamic approach. Psychological Review, 94, 84-106.
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333-382.
Schoner, G., & Kelso, J. A. S. (1988). Dynamic pattern generation in behavioral and neural systems. Science, 239, 1513-1520.
Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In E. E. David & P. B. Denes (Eds.), Human communication: A unified view (pp. 51-66). New York, NY: McGraw-Hill.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-45.
Sussman, H. M., MacNeilage, P. F., & Hanson, R. J. (1973). Labial and mandibular dynamics during the production of bilabial consonants: Preliminary observations. Journal of Speech and Hearing Research, 16, 397-420.
Turvey, M. T. (1977). Preliminaries to a theory of action with reference to vision. In R. Shaw & J. Bransford (Eds.), Perceiving, acting and knowing: Toward an ecological psychology. Hillsdale, NJ: Lawrence Erlbaum Associates.
Wood, S. (1982). X-ray and model studies of vowel articulation. Working Papers, Lund University, 23.

FOOTNOTES
*In T. van Gelder & B. Port (Eds.), Mind as motion. Cambridge, MA: MIT Press (in press).
†Also Department of Linguistics, Yale University.
