
Element Theory
and the
Structure of English Vowels

Phillip Backley
Tohoku Gakuin University, Japan


February 2009




Contents


Chapter 1. Background and Introduction

Chapter 2. Representing Segmental Structure
2.1 Segments have internal structure
2.2 Articulation versus perception
2.3 Elements as patterns in the speech signal
2.4 Monovalency versus bivalency
2.5 Elements and the grammar
2.6 Summary

Chapter 3. Element Theory and the Representation of Vowels
3.1 Introduction
3.2 What makes |A I U| special?
3.3 |A I U| as simplex expressions
3.4 |A I U| in compounds
3.4.1 Phonetic evidence for element compounds
3.4.2 Phonological evidence for element compounds
3.5 Central vowels
3.5.1 Phonetic evidence for empty vowels
3.5.2 Phonological evidence for empty vowels

Chapter 4. English Vowel Structure
4.1 Introduction
4.2 Front rounding in vowels
4.3 Element dependency
4.4 The representation of English vowels
4.4.1 Introduction
4.4.2 Short vowels
4.4.3 Long monophthongs
4.4.4 Weak vowels
4.4.5 Diphthong structure
4.4.6 |I| diphthongs
4.4.7 |U| diphthongs
4.4.8 |A| diphthongs

Chapter 5. Summary



1: Background and Introduction

This paper is concerned with the internal structure of language sounds. It considers the nature
of segmental structure in general and the representation of English vowel sounds in particular.
Although the phonology of English has been the focus of many descriptive studies in the past,
the present paper has a novel contribution to make to the literature. The novelty of its
approach comes from the way it rejects a number of basic assumptions associated with the
mainstream view of segmental phonology. One of its aims is to highlight problems arising
from the use of traditional distinctive features; specifically, it questions the value of a feature
system which, firstly, is biased towards articulation, and secondly, employs binary features
such as [+son] and [−son]. As an alternative to traditional features, this paper will motivate an
Element Theory view of melodic representation in which a small number of single-valued
elements constitute the basic units of segmental structure.
In spite of its age, The Sound Pattern of English (Chomsky & Halle 1968), or SPE,
continues to exert its influence on phonological theory. And although the original SPE model
has undergone significant changes over the years, on the whole it has developed in a way that
retains close ties with its earlier roots. Its most widespread, present-day form is what I shall
call the 'standard approach'. This is intended as a cover term for those theoretical approaches
which follow SPE in viewing phonology as a derivational device and which represent
segmental structure using SPE-type distinctive features. This paper challenges both of these
characteristics of the standard approach (derivations and distinctive features), though it
will focus on the problematic nature of a feature system that can be traced back to the time of
SPE but which still dominates the study of segmental phonology today.
Most proponents of distinctive feature theory see no need to look for alternative ways
of describing segmental structure, because features offer a rich and descriptively powerful
model of representation that has more than withstood the test of time. Indeed, distinctive
features continue to show a remarkable endurance. While other aspects of phonological
theory have undergone radical revision (notably in the move from specific rewrite rules
towards more general constraints on grammaticality), the set of features for expressing
segment-internal structure has remained largely unchanged. The use of features even extends
beyond the standard approach, for example to Optimality Theory (Prince & Smolensky
2004). Of course, Optimality Theory and the standard approach differ in fundamental ways,
in terms of grammaticality, for instance. The Optimality view sees grammaticality as
being determined by a once-only evaluation of some lexical input, whereas in standard theory
a grammatical form corresponds to the final stage of a serial derivation process. Yet when it
comes to segmental structure, the two approaches usually converge in the sense that they both
employ distinctive features and they both admit lexical forms comprising linear strings of
segments from which prosodic structure is largely predictable.
Distinctive features are undeniably part of the fabric of mainstream phonology. This
is not an a priori reason to accept their validity as units of linguistic structure, however. In
fact this paper claims that features do not provide the most suitable means of representing the
internal structure of language sounds. Instead, I argue that segmental representations are built
from an alternative set of units called elements, which are mapped onto patterns that humans
perceive in the speech signal. Clearly this departs from the standard view that features are
associated with the articulatory properties of speech production. Below I illustrate the use of
elements in representations by analysing the internal structure of vowels.
The discussion is organised as follows. Section 2 considers some of the problems
associated with distinctive features. In particular, it questions two common assumptions
about the nature of features: their bias towards articulation and their reliance on binary values.
(Readers who are already familiar with these issues and with the thinking behind the Element
Theory approach may skip this section altogether, and proceed to section 3.) Then section 3
introduces Element Theory as an alternative way of describing segmental structure. It focuses
on the representation of vowels using the elements |A I U|. Section 4 offers an Element
Theory analysis of the vowel system(s) of English. It shows how an approach based on
phonological elements can shed light on patterns that characterise the shape and behaviour of
vowels in present-day English. Finally, section 5 summarises the main points.

2: Representing Segmental Structure

2.1 Segments have internal structure
There is a long tradition of using segments to describe language sounds. For example,
dictionaries provide segmental (i.e. phonemic) information to show the pronunciation of a
word (e.g. segmental /sɛɡˈmɛntəl/), and linguists refer to inventories of segments when
comparing one language with another, or when discussing the set of contrastive sounds in a
language. Yet there is overwhelming evidence that segments are not the primary units of
sound structure. Rather, by observing how sounds behave in languages we can uncover a set
of more basic sound properties which collectively describe the internal make-up of segments;
and it is this assumption which has driven the study of segmental phonology since the time of
Trubetzkoy.
According to this view, segments with one or more of the same basic properties in
common are expected to show similar phonological behaviour, whereas segments with little
or no shared internal structure should show quite different behaviour. Identifying these basic
sound properties is therefore central to the task of explaining segmental patterns and
groupings. As any introductory course in phonology attempts to show, understanding the
nature of segment-internal properties should reveal why segments regularly cluster together
only in certain combinations and why segments interact in predictable ways as a result of
coming into contact with each other. So, although the term 'segment' continues to serve as a
convenient label for referring to language sounds, segments themselves should not be seen as
having the status they once had as formal units of linguistic structure.
The standard approach views segments as bundles of co-occurring features, where
each feature picks out one aspect of a segment's behaviour. This means that one feature alone
cannot define any individual segment; in order to characterise a segment in full we must refer
to its combined feature specification, that is, to the sum of its phonological properties.
Nevertheless, single features do have a role in representation systems: each defines an entire
class of segments, where every member of the class shares the same phonological property by
virtue of having the same feature in its representation. With this one property in common,
segments from the same class should, in principle, display similar phonological behaviour
with respect to this property. For example, the feature [+coronal] unites a range of otherwise
disparate sounds including [tʃ θ l z t n], all of which may follow the vowel [aʊ] in English:
the words couch, mouth, owl, blouse, shout, count contain well-formed sequences of [aʊ]
plus coronal, whereas a segment from any other class is banned from this position (*[laʊb],
*[saʊɡ], *[aʊpl̩], etc.).
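
The class-based logic of this approach is easy to make concrete. The following sketch (in Python, purely for exposition; the toy inventory and its feature values are my own illustrative assumptions, not a claim of the standard model) computes the natural class picked out by [+coronal], and hence the set of consonants predicted to be able to follow [aʊ]:

# A toy illustration: segments as bundles of feature values, with a
# natural class defined as the set of segments sharing one value.
segments = {
    "tS": {"+coronal", "-voice"},    # final affricate of 'couch'
    "T":  {"+coronal", "-voice"},    # final fricative of 'mouth'
    "l":  {"+coronal", "+lateral"},  # 'owl'
    "z":  {"+coronal", "+voice"},    # 'blouse'
    "t":  {"+coronal", "-voice"},    # 'shout'
    "n":  {"+coronal", "+nasal"},    # 'count'
    "b":  {"-coronal", "+voice"},    # non-coronal: excluded after [aU]
    "p":  {"-coronal", "-voice"},    # non-coronal: excluded after [aU]
}

def natural_class(value):
    """Return every segment whose bundle contains the given feature value."""
    return sorted(s for s, bundle in segments.items() if value in bundle)

print(natural_class("+coronal"))    # ['T', 'l', 'n', 't', 'tS', 'z']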

2.2 Articulation versus perception
Because every language shows distributional regularities of the kind just described, there is
little reason to doubt that segments have internal structure. What still remains unresolved,
however, is the question of the nature of this internal structure. In particular, what are the
linguistic units which represent the sub-segmental properties of speech sounds? As I have
noted, the standard approach assumes a set of features adapted from those employed in SPE.
From their labels alone (e.g. [high], [voice], [lateral], etc.) it is clear that features can be
traced back to phonetic properties, primarily to properties referring to articulation such as
glottal state and tongue position. When they are used to analyse linguistic patterns in speech,
however, they are also associated with the kinds of phonological properties that describe
segmental contrasts and dynamic processes. So there is an underlying assumption that
phonological phenomena are motivated by phonetics, and more specifically by speech
production, that is, by articulation.
Yet the association between phonology and articulation is not a necessary one. The
authors of Fundamentals of Language (Jakobson & Halle 1956) argued that phonological
features should be defined in auditory-acoustic terms, and this view had a major influence on
phonological studies until the time of SPE. For instance, they propose the feature pair
[compact]/[diffuse], where these labels reflect the acoustic properties of the sound classes
they represent. Specifically, these features describe how acoustic energy is distributed across
the spectrum. In compact sounds such as low vowels and back consonants it is concentrated
in the central area of the spectrum, that is, the energy has a [compact] distribution in this
acoustic region; whereas in diffuse sounds such as high vowels and front consonants it
extends more widely across the spectrum, in other words, the energy has a [diffuse]
distribution. The other eight feature pairs proposed in Fundamentals of Language have a
similar acoustic or hearer-oriented characterisation.
The tradition of describing segmental structure in auditory-acoustic terms came to an
abrupt end with the publication of SPE. This was despite the authors of SPE having given
little justification for rejecting auditory-acoustic features or for adopting articulatory features
instead. But such was the influence of SPE on the development of phonological theory that its
preference for articulatory features quickly caught on. And to this day most analyses of
segmental structure are described in terms of articulation, that is, they model the
characteristics of the vocal tract (Browman & Goldstein 1992). Recently, however, this view
has been challenged by alternative, non-articulatory approaches to segmental representation.
After all, if phonology aims to maintain the original generative goal of modelling the
internalised knowledge of an ideal speaker-hearer, then why should it, like the standard
approach, focus exclusively on a speaker's knowledge of speech production? By the same
token, why should the pre-SPE tradition of employing auditory-acoustic features focus
exclusively on the linguistic knowledge of a hearer?
This paper adopts neither of these views. Instead, it takes up a position first
introduced in Jakobson et al. (1952) and later developed by Element Theory (Harris &
Lindsey 2000), which attempts to capture the linguistic knowledge that is common to both
speakers and hearers. It does this by associating phonological structure directly with the
speech signal, this being the only aspect of the communication process involving both parties:
speakers use their vocal organs to create patterns in the speech signal, while hearers perceive
those same patterns in the speech signal and decode them into meaningful language. On this
basis, a theory of segmental structure which focuses on properties of the speech signal, rather
than on articulation or on auditory properties, concerns itself with the linguistic knowledge
shared by all language users, speakers and hearers alike, and in doing so approximates
more closely to that neglected notion of the ideal speaker-hearer.
Although the standard model assumes a set of features based largely on speech
production, there is no clear motivation for describing segmental structure with a bias
towards the speaker. In fact, the use of articulatory features raises at least one basic problem:
such features may reflect what speakers know about sound structure, but they are less well
equipped to capture linguistic knowledge from the perspective of the hearer. As Harris (2007)
notes, a feature-based model such as the standard approach relies on the assumption that
hearers understand input speech by matching what they hear with the articulatory movements
of the speaker who produced it. Direct Realist Theory claims that this is achieved by hearers
perceiving those movements directly (Fowler 1986), which seems an unlikely scenario in
view of the fact that the speech organs are mostly obscured during speech production.
Meanwhile the Motor Theory of Speech Perception claims that hearers translate the perceived
input into what they take to be the correct articulatory movements for reproducing that form
as a speaker (Liberman & Mattingly 1985); this position seems equally unlikely, in view of
the way infants acquire the ability to perceive speech much earlier than the ability to produce
it themselves.

In short, there seems little support for the assumption that speech sounds should be
represented in terms of articulatory properties. If anything, the arguments point towards
speech perception as being primary and speech production only secondary. This was indeed
the accepted position before SPE, as documented in the work of Sapir and Jakobson. It is also
the position that Element Theory attempts to revive. As just indicated, the acquisition facts
suggest that infant learners begin by perceiving adult input forms; on the basis of these input
forms they build mental representations, which serve as the beginnings of their native
lexicon; and only later do they go on to reproduce these stored forms as spoken language. But
while the former (perception) stage is necessary for successful acquisition, the latter
(production) stage is not, as confirmed by the ability of mutes and those with abnormalities of
the vocal apparatus to acquire a native grammar; evidently, the inability to articulate normally
is not a bar to perceiving speech. Conversely, speech production in the profoundly deaf rarely
develops to a native-like level, presumably because their means of perceiving language lacks
the necessary input from the speech signal.
Having argued that speech perception is more fundamental to the grammar than
speech production, it is natural to assume that segments should be formally described in
terms of their perceptual (i.e. auditory) properties, that is, from the hearer's point of view.
Recall, however, that this paper is attempting to develop a representation system which
favours neither the speaker nor the hearer, but which instead models the linguistic knowledge
common to both. As suggested above, this means focusing on the speech signal: the set of
acoustic events which involves the transmission of sound waves through the air and which
acts as an intermediary between the origin of a sound (the vocal organs of the speaker) and its
target (the auditory system of the hearer). This approach is motivated in Harris & Lindsey
(2000), where it is proposed that the speech signal be understood as 'a channel through which
speakers transmit and monitor [linguistic] information and listeners receive it' (Harris &
Lindsey 2000: 185).
As a physical phenomenon, the speech signal is something that can be measured in
concrete terms. So when an utterance is transmitted between speaker and hearer it is possible
to describe its acoustic properties (e.g. amplitude, formant values). However, it seems that
most of these properties are irrelevant to the grammar, and as such, need not be encoded by
features in phonological representations. Indeed, the extensive literature on segmental
structure gives no indication that raw acoustic data such as formant values or voice onset
measurements have any place in formal phonological theory. A simple parallel can be found
in music: although the notes of a musical phrase can be described by referring to their
physical attributes (e.g. frequency in hertz), a musician does not need precise information of
this kind in order to perceive that phrase, store it in memory, or reproduce it as a melody. Nor
do these physical characteristics need to be written on the page of a musical score. A musical
note is identified not by raw acoustic values, but rather, by its overall acoustic shape and its
relation to other notes in the musical context.
Like musicians, language users do not classify sounds according to their acoustic
properties. It is true that phoneticians may use phonetic data such as formant frequency to
describe the sounds of a language, or to compare different languages; importantly, however,
these data do not constitute linguistic information, and as such, do not identify segmental
features. But if the speech signal is the medium by which language is transferred between
speaker and hearer, then which aspects of the signal are relevant to the grammar and to the
communication process? The claim made by Element Theory is that humans perceive specific
information-bearing patterns in the speech signal, and that each pattern is represented by an
element, where an element is taken to be the smallest unit of segmental structure present in
mental representations. This is the position motivated in Harris & Lindsey (2000) and
summarised in Nasukawa & Backley (2008).

2.3 Elements as patterns in the speech signal
The Element-based approach assumes that hearers instinctively seek out linguistic
information: when decoding speech, they ignore most of the incoming acoustic stream and
focus only on the specifically linguistic information contained within the speech signal. Thus
Element Theory recognizes the human ability to extract from running speech only those
acoustic patterns that are relevant to language. And, as just mentioned, it further assumes that
the mental phonological categories represented by elements are mapped directly on to those
same acoustic patterns. So although elements are associated with certain physical patterns in
the speech signal, they exist primarily as mental constructs that is, as units of phonological
structure in the internalized grammar. In order to highlight the way the term 'element' can
refer to both the physical and the mental, Harris & Lindsey (2000) describe elements as
'auditory images'. This label suggests that an element is primarily a grammar-internal object
(a mental image of some linguistically significant information), but that it is also a
grammar-external object (a physical pattern in the speech signal which hearers use to cue
that mental image). The defining characteristics of these speech signal patterns are described
in section 3 below.

So far, the discussion has given only a hearer-oriented view of elements, in which
hearers perceive the speech signal, recover information-bearing patterns from it, and then
associate those patterns with particular elements in phonological structure. But the speech
signal is a neutral medium, and must therefore carry linguistic information which is also
relevant to speakers. In the case of speakers, the same information-bearing patterns function
not as perceptual cues but as production (i.e. articulation) targets. It must be assumed that a
speaker's internalized grammar includes knowledge of the mapping between elements in
lexical representation and their associated acoustic patterns in the speech signal. So in order
to phonetically interpret a word, speakers must access the lexical form of that word, associate
the elements it contains with their corresponding speech signal patterns, and use the vocal
organs to reproduce those target acoustic patterns in an utterance.
Importantly, this process of reproducing an acoustic target succeeds without the need
for an element to contain information about speech production. For the grammar to specify
any mapping between elements and articulation would be at best unnecessary, and at worst
counter-productive, since there is not always a one-to-one correspondence between the shape
of the vocal tract and the resulting sound. Consider a trained ventriloquist, for example, who
can reproduce the speech signal pattern associated with bilabial stops but without using
conventional lip closure. Even untrained speakers typically have available to them a choice of
different articulatory configurations for creating the same acoustic result. For example, to
bring about a general downward shift in vowel formant values, creating a 'flattening' of
the sound spectrum (Jakobson, Fant & Halle 1952: 31), speakers may employ lip rounding,
or a contraction of the pharynx, or a combination of the two.¹ In sum, an element in
phonological representation establishes which signal pattern a speaker must aim for, but it
does not prescribe what the speaker must do to reach the target. A suitable articulation is
something that speakers master only through being experienced users of their native language.
Before returning to the issue of distinctive features, let us review the way some basic
phonological concepts should be (re)defined in light of the preceding discussion on the nature
of Element Theory. First, the elements themselves are to be seen as acoustic images:
primarily as cognitive objects which are present in lexical representations and which serve to
encode contrasts and alternations. However, elements also connect to the external world by
having a direct physical interpretation: they are mapped onto certain acoustic patterns in
the speech signal which carry linguistic information. Thus a phonological representation may
be thought of as a code which allows language users to store and identify these mental
acoustic patterns.

¹ For further examples, see Harris & Urua (2001: 79).

In contrast, speech production is an aspect of language use which is not controlled by
the grammar. Tongue position, glottal state, lip attitude and the like do not constitute
linguistic information; rather, they provide a way of delivering the speech signal. So
articulation serves as a vehicle for carrying the linguistic message, but it does not constitute
the message itself. To reinforce this point, we need only consider the communication process:
when a hearer perceives information-bearing patterns in the speech signal, each pattern acts
consistently and reliably as a cue to its associated element: it makes no difference whether
the signal originates from the articulation of an actual utterance, or from the recording of an
actual utterance, or from a synthesized, unarticulated voice on a computer. In each case the
linguistic message is the same, regardless of whether the vocal organs are involved or not,
since articulation is not a component of the mental grammar.
In conclusion, there is little evidence to support the prevailing view that the basic
units of segmental structure are defined in articulatory terms. For this reason, section 3 will
argue for an alternative view of phonological representations in which features (or elements)
are mapped onto certain patterns in the speech signal. Although these patterns can be
characterized by their acoustic properties, they are to be understood primarily as cognitive
units which carry linguistic information about the identity of morphemes.

2.4 Monovalency versus bivalency
Before going on to introduce the elements in detail, this section addresses another issue
concerning the use of distinctive features: should features (or elements) in representations be
monovalent (single-valued) or bivalent (binary-valued)? The standard model follows a
tradition of employing bivalent features, meaning that the grammar marks the presence of a
phonological property by specifying a positive feature value, while the absence of that
property is shown by the corresponding negative value. For example, l-sounds are specified
as [+lateral] while all other sounds are [−lateral]; this creates an equipollent distinction
between lateral and non-lateral, according to which [+lateral] and [−lateral] appear to have
equal status because the grammar is able to refer to either category. But alongside bivalent
features such as [±lateral] we also find a number of monovalent features being used in some
versions of the standard model (Steriade 1995). Unlike [±lateral], a monovalent feature such
as [round] can only refer to the presence of a given property, not to its absence. This creates a
privative distinction between the opposing categories, because only a single value of the
feature can be expressed in representational terms.

(1)                        [u] vs. [ɯ]                [n] vs. [l]
     a. bivalency          [+round] vs. [−round]      [−lateral] vs. [+lateral]
     b. monovalency        [round] vs. ∅              ∅ vs. [lateral]


As (1) shows, there are two ways of referring to the same phonological contrast,
because there are two ways of expressing the absence of a certain property. For example, to
describe a back unrounded vowel such as [ɯ] we can either use [−round] (i.e. the negative
value of the bivalent feature [±round]) or we can choose to make no reference to rounding, as
indicated in (1) by ∅ (i.e. the monovalent feature [round] is absent from the segment's
representation). At first sight, the difference between [−round] and ∅ seems trivial, because
the same contrast can be expressed in both systems. However, several authors including
Durand (1995), Kaye (1989), Harris (1994) and Roca (1994) have noted that the choice
between bivalency and monovalency affects our predictions about how language sounds are
grouped into natural classes and how they participate in phonological processes. That is, the
two systems make different grammatical statements.
To illustrate this point, consider the representation of nasal vowels such as [ẽ] and [ã].
These belong to a natural class: a non-random group whose members all share some
physical characteristic (nasal resonance) and, more importantly, some pattern of phonological
behaviour (e.g. vowel lowering, trigger of nasal harmony). It is assumed that these shared
physical and phonological characteristics are an indication that the same structural property
(in this case, nasality) is specified in the representation of each member of the natural
class. In other words, the common structural property defines the natural class. Furthermore,
most theories of segmental structure assume that this class-defining property corresponds to a
basic, indivisible unit in phonological representation, typically a feature or an element. In this
example the basic property is nasality, so it follows that every nasal vowel must have a
nasality feature/element in its segmental make-up.
The Amerindian language Warao (Osborn 1966) illustrates how monovalency and
bivalency make different grammatical predictions (data from Botma 2005):

(2)   a. [ja] 'sun'            c. [ĩnãw̃ã hã] 'summer'
      b. [jã] 'walking'        d. [õĩhõ ĩõ] 'kind of tree'

As (2a-b) show, this language has a lexical contrast between oral and nasal vowels. So in a
monovalent system of representation, the feature [nasal] appears in the structure of [ã] in (2b),
while [a] in (2a) makes no reference to [nasal] and is therefore interpreted as an oral vowel.
Alternatively, under bivalency [ã] is specified as [+nasal] while [a] has [−nasal]. (2c-d) show
that Warao also has a process of nasal harmony, where the presence of a nasal trigger (a nasal
vowel or nasal consonant) causes all target sounds (vowels, laryngeals, glides) to its right to
be nasalised within the word domain. Any harmonic trigger in Warao is characterised as a
segment with [nasal]/[+nasal] in its lexical representation, where this feature defines a natural
class of nasals all united by similar (harmonic) behaviour.
As expected, oral vowels do not act as harmonic triggers in this language, because
they have no [nasal]/[+nasal] specification. Moreover, they do not constitute a natural class
because they display no unified active behaviour.² Importantly, the fact that [−nasal] (i.e.
oral) vowels collectively do not do something provides no justification for grouping them
together as a natural class. Yet this is exactly what the bivalent feature system does. Allowing
[−nasal] to appear in representations gives it a grammatical status equal to that of [+nasal],
making it possible for the phonology to refer to [−nasal] as well as [+nasal] as an active
property in some phonological process. However, the evidence does not support this position:
for example, we find no comparable process of 'oral harmony' in which [−nasal] acts as a
harmonic trigger and oralises nasal vowels. In short, it is difficult to motivate the bivalency
prediction that [−nasal] and [+nasal] both exist as basic structural properties, and hence, as
two separate natural classes.
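
The asymmetry can be stated almost mechanically. In the following sketch (Python again, with a hypothetical four-segment fragment standing in for the full Warao inventory), the bivalent encoding makes the unattested class of [−nasal] triggers just as statable as the attested [+nasal] class, whereas the monovalent encoding can only ever pick out the nasals:

# Hypothetical Warao fragment; 'a~' stands for the nasal vowel.
bivalent = {
    "a":  {"-nasal"},      # oral vowel
    "a~": {"+nasal"},      # nasal vowel (harmonic trigger)
    "n":  {"+nasal"},      # nasal consonant (harmonic trigger)
    "h":  {"-nasal"},      # laryngeal: a harmony target, never a trigger
}
monovalent = {
    "a":  set(),           # orality = mere absence of [nasal]
    "a~": {"nasal"},
    "n":  {"nasal"},
    "h":  set(),
}

def statable_class(inventory, spec):
    """The class a grammar can refer to via the specification 'spec'."""
    return sorted(s for s, feats in inventory.items() if spec in feats)

print(statable_class(bivalent, "+nasal"))    # ['a~', 'n']: the attested trigger class
print(statable_class(bivalent, "-nasal"))    # ['a', 'h']: predicts unattested 'oral harmony'
print(statable_class(monovalent, "nasal"))   # ['a~', 'n']: only the attested class is statable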
It seems, then, that the problem with bivalent features arises from their ability to refer
to negative categories, that is, to properties which are absent from a segment's structure.
To reinforce this point, consider other negative features besides [−nasal] that characterise oral
vowels under a bivalent feature system. Oral vowels are all non-lateral, for example. But the
feature [−lateral] does not define a natural class either, because it identifies a whole range of
sound classes besides oral vowels (e.g. obstruents, nasal stops, rhotics) which cannot be
unified by the presence of even a single common property. Compare this with a true natural
class such as [+nasal], whose members comprise nasal vowels and consonants; all and only
these sounds act as harmonic triggers in Warao because only these sounds possess the active,
class-defining feature [+nasal].

² Note that [−nasal] fails to capture the class of segments targeted by nasal harmony in Warao, because this set
includes some non-nasals (e.g. glides) but excludes other non-nasals (e.g. obstruents).

By contrast, the use of monovalent features makes it possible for the segmental
structure itself to show that nasal vowels form a grammatical set whereas oral vowels do not.
[nasal] identifies the nasal vowels as a natural class, while the lack of any equivalent feature
specification for oral vowels indicates that they have no common behaviour; furthermore, it
prevents the grammar from referring to them as a unified set. In more general terms, the
monovalent feature [nasal] groups together nasal vowels and consonants as a natural class, as
evidenced by Warao nasal harmony, whereas the arbitrary set of non-nasal segments (oral
vowels and all non-nasal consonants) displays no common properties and consequently has
no feature specification to indicate natural class status.
The conclusion to be drawn from this comparison between monovalent and bivalent
features is that bivalency makes for an altogether less restrictive system. Since bivalency
forces representations to specify either the presence or the absence of a given property, the
number and nature of specified and therefore potentially active phonological properties
exceeds what is actually observed in natural languages. In other words, it predicts the
possibility of many phonological processes and therefore many grammars that would
presumably be ruled out by a more constrained theory. Of course, the notion of restrictiveness
now plays a relatively minor role in theory building. By contrast, in early generative theory
the issue of restricting the generative capacity of the grammar was of central concern, when
the focus was on developing a model that could generate any possible grammar and at the
same time rule out any impossible one.
Even the authors of SPE recognised that the use of bivalent features did not square
easily with the generative ideal. This is clear from the final chapter of SPE, where they
acknowledge an asymmetry between the two values of a feature which cannot be expressed
simply by plus or minus. Their response was to propose a theory of markedness: an
independent mechanism for calculating the grammatical significance of different feature
values, these calculations being based on cross-linguistic generalisations about the choice of a
default or unmarked value over its opposite value. According to their proposal, the relative
markedness of [+feature] or [−feature] could be determined on the basis of, for example, how
widely a feature value was distributed across languages and the stage of acquisition when a
feature value is first used. However, the elaborate way in which markedness theory was
formulated does little to disguise its true identity as a repair strategy and an admission that
bivalent features do not provide the most appropriate means of representing basic units of
sound structure.
The distinction between the default value and the marked value of a bivalent feature
later fuelled the development of Underspecification Theory (Archangeli 1988, Mohanan
1991), which attempted to exclude inactive or redundant features from lexical representations.
This left only the feature specifications that were required for making contrasts. The result, at
the lexical level at least, was something akin to monovalency, since the phonology could
refer to only one value of a bivalent feature. However, because the underspecification model
was essentially a variant of the standard approach, it was tied to the notion of derivation. And
the process of deriving a pronounceable phonetic representation from this abstract lexical
form involved the application of redundancy rules which supplied missing (default) feature
values. The reasoning behind this move was straightforward: it was assumed that only a fully
specified representation could be phonetically interpreted. In hindsight, however, it is clear
that the addition of default feature values had the effect of reintroducing bivalency and its
associated problems. In models which relied on rule ordering, for example, a later rule could
target the default feature values that had already been introduced by redundancy rules.
In another development of the standard approach, Feature Geometry (Clements 1985,
Sagey 1986, Halle 1992) has also attempted to overcome problems arising from the use of
bivalent features. A number of proposals within Feature Geometry have posited traditional
bivalent features that are then grouped into monovalent constituents such as CORONAL,
DORSAL and LARYNGEAL, as the geometric fragments in (3) show:

(3)        CORONAL                           DORSAL

       [±ant]   [±distr]         [±high]  [±low]  [±back]  [±ATR]

These structures are typical of feature geometry trees, in that bivalency is employed for
features such as [±ant], where both the positive and negative values are thought to be
phonologically active, whereas monovalency is used for other features, particularly for
articulator nodes such as CORONAL and DORSAL, where only the positive value is
recognised as a natural class. Clearly Feature Geometry attempts to address the problematic
issue of an asymmetry between the two values of a feature. However, it seems an arbitrary
matter as to whether a feature is treated as single-valued or binary. In other words, the
valency of a feature appears to be an inherent and unpredictable property of that feature,
simply an observation about its behaviour in the phonology.
But if the task of identifying the basic units of segmental structure comes down to one
of observing active properties, then it is logical to assume that we can observe only what is
there, not what is absent. This means that if [+ant] and [−ant] are both active in the grammar,
they must represent two distinct, equal and independent (albeit complementary) properties
that are both in some sense positive. As such, they are better expressed as a pair of
monovalent features such as [anterior] and [posterior].³ Moreover, if the same idea can also
be extended to other cases where polar values are typically used, then it becomes feasible to
dispense with bivalency altogether: each negative feature displaying active phonological
behaviour is replaced with an equivalent monovalent feature, as illustrated by the
hypothetical example [−ant] → [posterior], while redundant negative features are simply
ignored because they are linguistically insignificant. The result is a wholly monovalent
approach to the representation of segmental properties. This is the position taken in Element
Theory. The following sections will show how the notion of element is entirely consistent
with the theoretical conclusions drawn above: units in segmental representation should be
monovalent and should map onto linguistically significant patterns in the speech signal.

2.5 Elements and the grammar
From the way phonological representations are formulated in the standard approach, it is easy
to gain the impression that features occupy a separate and autonomous level of structure. Of
course they do show a direct relation with prosodic structure, by virtue of being associated to
syllabic constituents or to intervening timing units. But they appear to play no role in
determining or even influencing other aspects of the phonology. This is clear from the fact
that features have been transferred from the standard approach to quite different theoretical
models like Optimality Theory (Kager 1999, McCarthy 2002) without the need for any
modification. In the case of elements, however, the same is not true: here I show how the
decision to employ elements in representations goes hand in hand with other decisions about
the shape of the grammar. In 2.3 it was argued that elements should map onto patterns in the
acoustic signal, and in addition, in 2.4 it was claimed that they should be single-valued. Let
us now consider the effects of these two conditions on the phonological model as a whole. It

will emerge that these characteristics of the elements affect how we understand the role of
derivation, which in turn has implications for the theoretical issue of abstractness.

³ To my knowledge, [posterior] has never been seriously considered as a member of the feature set. However,
we do find legitimate cases where the standard approach has recast a single bivalent feature in monovalent
terms: for example, [±ATR] may be redefined as [ATR] and [RTR].

In the standard approach, representations need to be fully specified before they can be
articulated or perceived. This requires that every feature be assigned a positive or negative
value. The same is true of Underspecification Theory, where missing feature values are filled
in by rule in order to arrive at a full specification that is pronounceable at the phonetic level.
By contrast, Element Theory recognises only monovalent units in segmental structure, which
prevents it from referring to negative values at any level of representation. From a traditional
viewpoint, this absence of unmarked values makes element-type representations look
structurally incomplete, and therefore not ready to be phonetically interpreted. It should be
clear by now, however, that the Element Theory approach has little respect for theoretical
tradition. This even extends to its definition of the notion 'segment'. In feature-based models,
a segment is the smallest pronounceable unit of structure; and a segment's representation
must be fully specified before it can be pronounced. As Harris & Lindsey (1995) have noted,
a single feature like [+ant] or [−lateral] on its own provides language users with no linguistic
information; rather, for a feature to mean something it must be accompanied by a battery of
other features which can then collectively describe a whole segment. In other words, a feature
can contribute to a segment's identity, but it cannot constitute the segment itself.
With its commitment to monovalency, on the other hand, Element Theory takes a
quite different approach to the phonetic interpretation of segmental structure. It assumes that
every element has its own phonetic identity, allowing it to stand alone in a segment and be
pronounced without the need for support from other elements. This makes it possible for a
segment to contain just a single element in its representation. Furthermore, if elements can be
pronounced in isolation, then there is no need for the phonology to complete a segment's
representation by supplying default values. Indeed, under monovalency it is not clear how a
default value could be specified, given the impossibility of referring to the absence of a
marked property.
Below, the phonetic interpretability of elements will be demonstrated at length, so a
simple example will suffice here. The element |I| represents a pattern in the speech signal
characterized by a spectral peak at the top of the frequency band for vowels; this peak marks
the convergence of the second and third formants, which speakers typically associate with
front or palatal articulations. When |I| appears alone it is interpreted phonetically as the vowel
[i]. No other elements are needed in the representation of [i] because |I| (or high F2) is its only
marked or positive property. Although [i] does have other phonetic qualities including (in
traditional feature terms) [+high] and [−round], Element Theory treats these as unmarked and
phonologically inactive;⁴ as such, they are not specified in this vowel's structure. When a
speaker interprets |I| as [i] the result in phonological terms is 'pure' frontness (a high F2), since
no other elements are present to indicate other marked properties. This is also the reason why [i]
is interpreted with the default phonetic qualities [+high] and [−round]: a [+high] vowel results
from the absence of the 'open' element |A| (see footnote 4), while a [−round] vowel is the
phonetic byproduct of there being no 'round' element |U| in the representation of [i]. The
elements |I A U| are discussed fully in the following section.

⁴ To capture the height dimension in vowels, Element Theory posits |A| as the marked property. The element
|A| loosely equates with the feature [+low], therefore high (i.e. non-low) vowels like [i] make no reference to |A|
in their representations. Section 3 describes the vowel elements in detail.
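
The contrast with feature bundles can be sketched as follows (Python again; the pattern descriptions are informal glosses of the characterisations given above, not measured acoustic values). A one-element expression is already a complete, interpretable representation, so no default values ever need to be filled in:

# Each element maps onto the signal pattern it cues (informal glosses).
SIGNAL_PATTERN = {
    "I": "spectral peak high in the vowel range (F2-F3 convergence)",
    "A": "energy concentrated in the centre of the spectrum",
    "U": "overall drop in spectral energy (lowered formants)",
}

def interpret(expression):
    """An expression of any size is pronounceable as it stands:
    interpretation just retrieves each element's target pattern."""
    return [SIGNAL_PATTERN[e] for e in sorted(expression)]

print(interpret({"I"}))    # bare |I|, phonetically the vowel [i]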
The discussion above has outlined one of the distinguishing properties of Element
Theory, namely the independent phonetic interpretability of elements. Yet an element's
ability to be interpreted in isolation is something which relates not only to segmental structure
but more generally to the organization of the phonology as a whole. If phonological
representations are pronounceable as they stand, then in principle Element Theory needs no
separate level of phonetic representation. In other words, the use of elements implies a
monostratal organisation of the phonology. Once again this marks a significant departure
from the standard approach, which assumes a bi-stratal (or multi-stratal) model in which two
(or more) levels of representation are required because each serves a different function:

(4)   underlying representation        function: lexical storage
                   |                   (units: abstract, contrastive)
         structure-changing operations
                   ↓
      surface representation           function: input to articulation/perception
                                       (units: concrete, phonetic)

The traditional arrangement in (4) presents phonology as a device for creating
phonetic objects, that is, for taking abstract phonological forms and converting them into
concrete phonetic forms that can serve as the input to external language processes such as
articulation and perception. As Harris (1994) points out, however, this renders phonology a
performance system, its purpose being to generate phonetic representations and check the
grammaticality of utterances. In effect, it places phonology outside linguistic competence and
thus outside the confines of the grammar. Yet treating phonology as extra-grammatical
clearly goes against our understanding of what language users know. We assume, for instance,
that linguistic knowledge includes knowledge of certain phonological generalisations like
patterns of alternation and distribution, which are evidently part of linguistic competence
because they exist independently of articulation and/or perception.⁵

So by assuming a derivational model as in (4), the standard approach gives phonology
a somewhat ambiguous status with respect to its role in the grammar. At best, we might say
that the standard approach allows phonology to straddle both sides of the traditional division
between competence and performance: by capturing a language's structure-changing
operations (i.e. rules or constraints) it relates to competence, whereas by preparing lexical
forms for articulation and/or perception (i.e. derivational output) it relates to performance.
Clearly, however, this situation is at odds with the general assumption that phonology should
be treated as part of the core grammar.
In response, Element Theory avoids this ambiguity by keeping phonology entirely
within the domain of linguistic competence. In an element-based phonology, therefore,
phonological processes do not create phonetic or pronounceable forms; in fact, they have no
direct connection with utterances. Unlike in derivational models, their role is not to take an
abstract representation and convert it into something more physical; rather, they take an
abstract phonological form, such as a stored lexical representation, and impose structural
regularities on it so that it conforms to the grammar of a given language. For example, they
may force contiguous consonants to agree in voicing, or they may cause vowels to shorten in
closed syllables. In other words, phonological processes control grammaticality by generating
the set of grammatical phonological structures of a language. Importantly, however, the
output of such processes will be no less abstract than the input: an element-based process can
only change a phonological object into another phonological object.
Of course, the inability of an element-based phonology to generate phonetic forms is
countered by the phonetic interpretability of elements. As discussed above, it is proposed that
any element expression can be mapped onto its corresponding physical pattern in the speech
signal; moreover, this can take place at any stage of derivation, since lexical representations
and derived representations are assumed to be of the same type. In principle, then, any lexical
form may be interpreted by a speaker or hearer as it stands. In practice, however, the result is
likely to be an ungrammatical string, because in such cases the phonology has not imposed its
characteristic effects on the grammaticality of the structure in question. So although lexical
forms in Element Theory have much in common with derived forms (for example, both
involve abstract phonological representations, both employ the same structural units, and
both can be pronounced as they are), it is derived forms which are consistently grammatical
and thus relevant to the process of information exchange via the speech signal.

⁵ The traditional bi-stratal model in (4) is also motivated by the supposed advantage of separating idiosyncratic
information (in lexical storage) from predictable information (in the structure-changing component). As Harris
(1994) points out, however, this position has never been strongly defended in the psycholinguistics literature.

2.6 Summary
What this discussion has shown is that the Element Theory approach to representation takes a
more abstract view of phonology than we find in the standard approach, in the sense that
phonology itself is seen as being concerned only with abstract or cognitive objects. On the
one hand, the standard approach operates primarily as a performance system, generating
phonetic forms and thereby bridging the divide between the cognitive and the physical. On
the other hand, the element-based approach operates exclusively within the cognitive domain,
providing a system for organising language users' knowledge about phonological strings and
about the internal structure of morphemes. So Element Theory incorporates phonology into
the competence grammar as follows:

(5)
      component       controls               determining
      syntax          sentence structure     how words behave in sentences
      morphology      word structure         how morphemes behave in words
      phonology       morpheme structure     how elements behave in morphemes


As a component of the cognitive grammar, phonology in Element Theory has little to
say about raw phonetics. Like other theoretical approaches, it does recognise the role of
phonetic factors such as ease of articulation and/or perception in shaping the phonology; but
unlike most other approaches, it does not see any place for phonetic factors in mental
phonological representations. Similarly, speech production is viewed as a grammar-external
process, specifically as a system for transmitting linguistic information; this effectively
puts articulation on a par with writing, since both of these media function as vehicles for
delivering language but neither actually constitutes the linguistic information itself. After all,
the inability to write does not prevent a person from acquiring a normal grammar, and neither
does the inability to speak.
Taking all these points into consideration, this paper develops a model of segmental
representation which uses monovalent elements as the basic units of phonological structure.
Elements represent the cognitive categories that are responsible for conveying linguistic
information about the structure of morphemes. For the purposes of communication, elements
also connect to the physical world by mapping onto information-bearing patterns that humans
perceive in the speech signal. However, their cognitive function remains primary. This means
that the process of identifying elements should begin with an analysis of phonological
behaviour (e.g. distribution, alternation, natural classes); only after an element has been
identified as a grammatical unit can it be associated with a particular speech signal pattern. In
other words, phonological structure is determined primarily through data analysis, and only
secondarily through listening.


3: Element Theory and the Representation of Vowels

3.1 Introduction
Section 2 considered some of the problems inherent in the standard feature-based approach to
segmental representation. It also claimed that these problems could be overcome by imposing
certain conditions on the way the basic units of segmental structure are formulated. In
particular, it advocated single-valued features which stand for abstract phonological
categories. These features, which I will refer to as 'elements', are the units which characterize
the lexical shape of morphemes but which also map onto information-bearing acoustic
patterns in the speech signal.
Element Theory claims that the segmental properties of all languages are described
using the set of six elements |A I U ʔ H N|. These fall naturally into two subgroups, |A I U|
and |ʔ H N|, the former being associated primarily with vowel structure and the latter with
consonant structure. Admittedly, this split between vocalic and consonantal elements is
something of an oversimplification, since vowel elements do appear in the representation of
consonants, and vice versa. Indeed, as a consequence of abandoning distinctive features, it
becomes possible to play down the importance of the traditional categories 'vowel' and
'consonant' and instead treat these terms simply as informal labels. So for the sake of
convenience I will continue to refer to vowels and consonants as segment types, but this does
not imply any formal bifurcation in terms of their segmental structure. This paper will focus
on vowel representations and therefore on the role of the elements |A I U|. For a description
of consonant representations and the remaining elements |ʔ H N|, see Backley (in prep).
Before discussing the structure of vowels in detail, it is worth making the point that
the set of vowel elements in (6a) is smaller than an equivalent set of features such as (6b):

(6) a. elements for vowels: |A|, |I|, |U|
b. features for vowels: [high], [low], [back], [round], [ATR]

In fact, this difference reflects a more general divergence between the two approaches over
the issue of generative capacity: namely, feature systems tend to over-generate while element
systems tend to under-generate. A single feature usually represents a very specific segmental
(typically articulatory) property, so in order to describe (the articulation of) a segment in full,
the grammar must call upon a sizeable number of different features. For example, Odden
(2005) uses 17 features to describe English consonants and a further 5 features to describe the
vowels. Unfortunately, however, having so many features available opens the door to serious
levels of over-generation, where the set of possible combinations of feature values (and
thus the set of possible segmental contrasts) is far larger than that required by the grammar
of any one language. To address this problem, the phonology must restrict combinability in
some way; restrictions have come in the form of feature-geometric relations (see 2.4 above)
or negative constraints such as *[+ATR, +low] (Archangeli & Pulleyblank 1994).⁶

⁶ Although the filter *[+ATR, +low] succeeds in capturing a distributional regularity, it is nonetheless arbitrary
in that it fails to explain why this combination is ungrammatical whereas, for example, [+ATR, −low] is
widespread. Even 'illogical' combinations such as *[+high, +low] cannot simply be dismissed as ungrammatical
if the features in question really do stand for abstract phonological categories rather than articulatory properties.
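
The difference in generative capacity is easy to quantify. Assuming free combination (i.e. before any geometry or filters are imposed, which is of course an idealisation), the five vowel features in (6b) already define 2⁵ = 32 possible vowels, whereas the three elements in (6a) define only 2³ − 1 = 7 non-empty expressions. A short sketch:

from itertools import combinations

n_features = 5                       # [high], [low], [back], [round], [ATR]
feature_space = 2 ** n_features      # every +/- assignment: 32 'vowels'

elements = ["A", "I", "U"]
element_space = [c for r in range(1, len(elements) + 1)
                 for c in combinations(elements, r)]
# |A| |I| |U| |A I| |A U| |I U| |A I U| -- seven expressions in all

print(feature_space, len(element_space))    # 32 7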

In contrast to feature theories, which generate too many segmental expressions and
thus have to impose constraints on their output, Element Theory takes the opposite position
of first generating a minimal set of contrasts capable of describing only the simplest and most
common segmental inventories. As (6) shows, this is made possible by recognizing a
relatively small number of basic structural units. Now, with only a small set of elements to
hand, the phonology must have ways of expanding its generative capacity to accommodate
larger and more complex systems of contrast. Yet according to Element Theory this is the
preferred position: this under-generation approach is more restrictive because it
gives the grammar greater control over the size and shape of segmental systems. So the
function of an element-based grammar is to generate a small set of attested forms rather than
to eliminate a potentially large set of unattested ones. In this way, the set of vowel elements
in (6a) is intentionally small, a fact which reflects the way Element Theory is committed to
addressing the issue of excessive generative capacity that continues to characterize feature-
based models.

3.2 What makes |A I U| special?
For the reasons just outlined, the set of vowel elements should initially be capable of
generating vowel systems that are typologically unmarked, that is, structurally simple and
cross-linguistically widespread. Why then should |A|, |I|, and |U| qualify as the most basic
segmental properties in such systems? Crothers (1978) and other vowel typology surveys
confirm that the universally preferred inventory has the following five-vowel arrangement:




(7)        i          u
               e    o
                 a

Yet despite the unmarked status of (7), it cannot be assumed that this system of five vowels
corresponds to the presence of five basic phonological properties. For instance, we cannot
automatically treat [a i u e o] as the phonetic instantiation of a corresponding set of elements
such as |A I U E O|. In fact, there are strong arguments to indicate that the mid vowels [e o]
belong to more than one natural class (Harris 1994), which in turn suggests that [e] and [o]
are each represented by more than one element. In other words, the phonological structure of
the mid vowels [e o] is apparently not as basic as that of the corner vowels [a i u].
Treating [a i u] as the least marked vowels follows naturally from their unique properties. In describing these properties, let us begin with language typology, and with the fact that [a i u] are cross-linguistically very common, indeed present in almost every known language. When we examine the smallest attested vowel systems, which usually comprise only three vowels, we find such systems regularly employing only these corner vowels. The examples in (8) are from Lass (1984):

(8)  [i u a] (Tamazight)      [i u ɑ] (Quechua)      [i u a] (Moroccan Arabic)
     [i u ɑ] (Greenlandic)    [e o ɑ] (Amuesha)      [i u ɨ] (Gadsup)

A comment is in order about phonetic vowel quality. On the understanding that the
vowel symbols in (8) stand for phonological categories rather than phonetic tokens, we do
expect to find some cross-linguistic variation in the way the same contrastive system is
interpreted phonetically. This applies not only to the systems in (8) but also to 5-vowel
systems. Take Spanish [a i u e o] and Zulu [a i u ɛ ɔ], for example. A comparison of, say, Spanish [e] with Zulu [ɛ] would show that these sounds have similar phonological properties
and play the same role in their respective systems. What counts in Element Theory (and in
related theories such as Dependency Phonology) is the behaviour of a sound with respect to
(i) natural classes and (ii) other contrastive sounds in the same system. Phonetic values are
not taken to be the main criterion for identifying melodic representations which, of course,
refer only to phonological categories. Although this purely phonological approach is not
universally accepted (Lass 1984, Crothers 1978), it does have support from various quarters
including Anderson (1985) in his discussion of the work of Trubetzkoy:

In general, it is not possible to tell from the phonetic properties of a segment
in isolation... how it should be characterized phonemically. This is because it
is not merely its phonetic identity that matters phonologically but, more
importantly, what other segments it is opposed to in the language in question.
(Anderson 1985: 96)

The same three corner vowels [a i u] also appear in larger systems, where additional vowels are accommodated by making use of other areas of the vowel space rather than by encroaching on the apparently primary areas occupied by [a i u]. Typological patterning shows no evidence that [a i u] are ever replaced by other segments when the vowel system becomes more complex and the vowel space becomes more crowded. So, the (near-)universal distribution of [a i u] gives these vowels an unmarked status, which suggests that they are phonologically basic in any vowel system where they appear. Looking at this another way, the absence of [a], [i] or [u] would render any system highly unusual and thus typologically marked. In terms of segmental representation, this casts [a i u] (more precisely, the speech signal patterns characterizing [a i u]) as the three most basic properties of vowel structure. In other words, [a i u] are the result of phonetically interpreting three of the basic units of phonological structure, specifically the vowel elements |A I U|, respectively.
Support for the primary status of |A I U|, then, comes in part from language typology patterns. But the decision to identify |A I U| as elements, that is, as basic properties of segmental structure, does not rest solely on the shape of vowel systems. Other evidence, including the active participation of |A I U| in phonological processes, must also be taken into account, and this will be discussed below. It will be shown that each of the vowel elements is regularly involved in phonological processes, either as an active property or as a target. Before turning to the phonological evidence, however, let us consider some points relating to the phonetic properties of the vowels [a i u].
Quantal Theory (Stevens 1989) claims that the areas of the vowel space which correspond to [a i u] display clearer, more stable phonetic qualities than do those areas occupied by other vowels. It claims that each of the [a i u] regions is associated with a strong and easily identifiable acoustic pattern created by the convergence of different vowel formants (see 3.3 below); this makes it possible for language users to tolerate a substantial amount of phonetic variation and/or signal distortion when perceiving or reproducing the acoustic patterns associated with [a i u]. By contrast, this is not the case with more marked sounds such as mid and central vowels, which, in order to be transmitted successfully, require a higher degree of accuracy on the part of speakers and listeners. For example, an overly loose interpretation of the vowel [e] can easily cause it to be confused with an acoustically similar sound like [i] or [ɛ]. In comparison, it is unusual for [a i u] to be involved in segmental confusions of this kind.
An alternative explanation for the predominance of triangular vowel systems is offered by Dispersion Theory (Lindblom 1990), reported in Johnson (2003: 112). According to Lindblom's adaptive dispersion view, languages favour [a i u] because the acoustic properties of these vowels (specifically, their F1 and F2 values) are maximally distinct. Assuming that vowel contrasts can be defined primarily with reference to formant values, this means that [a i u] mark out the extreme points of the vowel space; in other words, they are maximally dispersed. Now, given that the primary purpose of speech is to communicate linguistic information, it is reasonable to expect spoken languages to have evolved in such a way as to make the communication process as efficient as possible. Thus, following Lindblom (1990), we anticipate that any vowel system should show a natural tendency to exploit those vowels with the most distinctive acoustic characteristics, as this would optimize perceptibility.
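The dispersion calculation can be made concrete with a short sketch. The Python code below is my own illustration, not Lindblom's model itself: the formant values are rough textbook-style figures, and distance is measured crudely in raw Hertz rather than on the auditory scale that dispersion models properly assume.

from itertools import combinations

F1_F2 = {'i': (280, 2250), 'u': (310, 870), 'a': (700, 1220),
         'e': (400, 2000), 'ə': (500, 1500)}   # approximate (F1, F2) in Hz

def dist(v, w):
    """Euclidean distance between two vowels in the raw F1/F2 plane."""
    (f1a, f2a), (f1b, f2b) = F1_F2[v], F1_F2[w]
    return ((f1a - f1b) ** 2 + (f2a - f2b) ** 2) ** 0.5

def min_separation(system):
    """The closest pair in a system limits its perceptual safety margin."""
    return min(dist(v, w) for v, w in combinations(system, 2))

print(round(min_separation(['i', 'u', 'a'])))   # corner system: ~524
print(round(min_separation(['i', 'e', 'ə'])))   # crowded system: ~277
# The triangular [a i u] system keeps its members further apart, which is
# the sense in which it optimizes perceptibility.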
In sum, there is strong phonetic and typological evidence to indicate that [a i u] behave as basic vowels cross-linguistically. Reflecting this special status, Element Theory takes these vowels to be the interpretation of the elements |A I U|, three primary phonological units which act as the building blocks of vowel structure. As the term element suggests, these units are compositionally basic; they are structural primitives, and as such, cannot be broken down into smaller units. When a single element from the set |A I U| is phonetically interpreted, it maps onto a pattern in the speech signal corresponding to one of the three basic vowels [a i u]. In the majority of languages, however, elements regularly appear in combination too. By combining the signal patterns of two or more elements and interpreting these simultaneously, a set of non-basic or compound vowel expressions is created, such as the mid vowels [e o]. Below I shall demonstrate how single elements (3.3) and also element compounds (3.4.1) are mapped onto the speech signal.

At this point it is worth noting how |A I U| compares with traditional features. By treating |A I U| as primary structural units, Element Theory emphasizes the special status of the vowels [a i u] and, more generally, highlights a cross-linguistic preference for triangular vowel systems. In comparison, traditional features based on articulation are unable to express these properties in any natural way. For example, using the minimal set of distinctive features in (9), the standard approach cannot differentiate between the class of basic vowels [a i u] and the class of non-basic vowels [e o]:

(9)
                a     i     u     e     o
    [high]      -     +     +     -     -
    [back]      +     -     +     -     +
    [round]     -     -     +     -     +
As (9) shows, the same number of feature values is needed to describe a (basic) corner vowel as is required for a (non-basic) mid vowel. By comparison, in element-based representations the corner vowels comprise a single element each, whereas the mid vowels (as 3.4.2 will explain) contain two elements. So, unlike in Element Theory, the special status of [a i u] is not encoded directly in feature-based representations. Furthermore, the feature matrices representing [a i u] in (9) give no indication that these sounds form a natural class of basic vowel properties. As already noted, in the standard approach it is necessary to look beyond the features themselves, to some external markedness metric, for example, in order to capture typological generalizations like these.
On the question of the shape of vowel systems, too, elements and features make
different predictions. By taking the three phonological primes |A I U| to mark the extreme
points in the (acoustic) vowel space we naturally arrive at a triangular arrangement as the
default pattern. And this seems consistent with the typological facts. On the other hand, in
feature-based vowel representations it is the two features [high] and [back] that define the
limits of the vowel space. Since both are binary, they mark out a vowel space approximating
to a square:

(10)    [+hi]          [+hi]
        [-bk]          [+bk]


        [-hi]          [-hi]
        [-bk]          [+bk]

The arrangement in (10) has an articulatory bias, as it reflects tongue position, specifically the height and degree of backness of the tongue needed to produce different vowel sounds. However, a vowel square fails to capture the special status of [a i u], thereby missing an important generalization concerning typological markedness. Moreover, if Dispersion Theory is correct in assuming that languages prefer vowels which are maximally distinct, then from (10) we can infer that the vowels at each of the four corners of the vowel square are equally unmarked. Yet this is clearly not the case: the [-hi,-bk] vowel [æ] is cross-linguistically less common than [i] ([+hi,-bk]) or [u] ([+hi,+bk]), for example.
Here I have reviewed some of the reasons for treating [a i u] as basic vowels. Element Theory characterizes the special status of these vowels by equating each with an element from the set |A I U|, where these elements function as active phonological units in vowel contrasts and vocalic processes. It should be noted that Element Theory is by no means the first to recognize the significance of |A I U| as phonological primes. The vowel elements are pre-dated by the particles of Particle Phonology (Schane 1984) and by the components of Dependency Phonology (Anderson & Ewen 1987), both of which can be traced back to the three principal underlying and abstract 'characteristics' involved in vowel formation (|u| 'roundness', |i| 'frontness', and |a| 'lowness') first proposed by Anderson & Jones (1974: 16).
What sets Element Theory apart from these other models of vowel representation, however,
is its claim that elements are associated specifically with properties of the speech signal.
Further discussion of the motivation for |A I U| can be found in Rennison (1986).

3.3 |A I U| as simplex expressions
Elements are primarily abstract units of linguistic structure: they determine the lexical shape
of morphemes, and they behave as active properties in phonological processes such as
assimilation and lenition. So we identify individual elements by studying language data: by analyzing sound contrasts, distributional patterns and dynamic phonological changes. But in
addition, elements connect to the physical world through their association with certain
patterns in the acoustic speech signal. Once an element has been identified through its
phonological properties, an analysis of its phonetic characteristics may be carried out in order
to establish its unique acoustic signature. The typological evidence reviewed in 3.2 pointed
to the existence of three vowel elements |A I U|. This section examines the speech signal
patterns represented by these elements; then, to reinforce the status of |A I U| as phonological
primes, it considers their roles in linguistic structures and dynamic phenomena.
Element Theory assumes that language users focus on three specific patterns in the speech signal when producing or perceiving vowels. These patterns are revealed by analysing the distribution of energy across the frequency band from zero to around 3kHz, the frequency range which contains the first three formants and which is therefore crucial for perceiving vowel sounds. The figures in (11) show the signal patterns that speakers and hearers associate with the three abstract phonological categories |A I U|. Spectrograms of the corresponding vowel sounds [a i u] are given in (12).

(11) Spectral patterns for |I|, |A| and |U|

Figure 1: |I| as a dIp Figure 2: |A| as a mAss Figure 3: |U| as a rUmp


(12) Spectrograms of [a i u] showing the first three formants

Figure 4: [i]          Figure 5: [a]          Figure 6: [u]

The pattern for |I| in figure 1 consists of two energy peaks with a characteristic dip in
between. One peak is located at the lower end of the vowel spectrum at around 500Hz (on the
horizontal axis), and the other is at the upper end at approximately 2.5kHz. The peaks
themselves represent bands of energy, typically resulting from the convergence of two
formants; so the same pattern can also be extracted from the spectrogram for [i] in figure 4.
This figure shows a low F1 value for the high vowel [i], as indicated by the concentration of energy in the 0-500Hz range (cf. the leftmost peak in figure 1). This vowel also has a high F2 converging with F3 at around 2.5kHz, which creates a concentration of energy at the top of the spectrum (cf. the rightmost peak in figure 1). The sharp drop in energy in the middle of the spectrum, corresponding to the lighter area between 1-2kHz in figure 4, gives |I| its mnemonic label dIp.[7]

The signal pattern for the element |A|, on the other hand, has the informal label mAss. This term describes a mass of energy located in the centre of the spectrum, peaking at around 1kHz. As figure 2 shows, there is a drop in energy on either side of this mass. The same characteristic mAss pattern is reflected in the spectrogram for [a] in figure 5, where the energy peak results from a high F1 value converging with F2 in the 1kHz region. Finally, the speech signal pattern for the element |U| is characterised by a concentration of energy at the lower end of the spectrum. In figure 3 the energy peaks are contained within the 0-1kHz band, while across the higher frequency range we observe a steady fall. This falling spectral shape has been dubbed rUmp. Again, the pattern is visible in the spectrogram for the corresponding vowel: figure 6 shows how [u] involves a lowering of all formants, with F1 at around 500Hz and F2 at around 1kHz.
Of course, the formant patterns in figures 4-6 are subject to some inter-speaker (as well as intra-speaker) variation. Nevertheless, the above samples taken from my own speech should illustrate the general physical correlates of the phonological categories |A I U| when each element is interpreted in isolation. In fact, from an Element Theory point of view such variation is of no linguistic consequence, since the theory defines elements only in terms of their overall spectral pattern (i.e. dIp, mAss and rUmp) and not by referring to raw acoustic data such as precise formant values. In the preceding paragraphs I have used specific frequency values to describe each spectral pattern in a precise way; but it must be stressed that numerical data of this kind is for descriptive purposes only: it has no formal place in the Element Theory grammar.[8] A fuller description of the spectral properties of |A I U| can be found in Harris & Lindsey (1995).
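Since the theory ties each element to a gross spectral shape rather than to exact formant readings, the mapping from signal to element can be mimicked with a toy procedure. The following Python sketch is purely illustrative and forms no part of the Element Theory grammar; the thresholds are invented to suit the sample values quoted above.

def element_signature(f1, f2):
    """Map an (F1, F2) pair in Hz onto one of the simplex spectral patterns."""
    if f2 - f1 > 1200:               # energy peaks at both spectral edges
        return '|I| (dIp)'
    if f1 > 600:                     # F1 risen to meet F2 mid-spectrum
        return '|A| (mAss)'
    if f1 <= 550 and f2 < 1100:      # all formants lowered together
        return '|U| (rUmp)'
    return 'no simplex pattern (compound or empty)'

print(element_signature(280, 2250))  # |I| (dIp)
print(element_signature(700, 1220))  # |A| (mAss)
print(element_signature(500, 1000))  # |U| (rUmp)
print(element_signature(500, 1500))  # no simplex pattern (schwa-like)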

3.4 |A I U| in compounds
3.4.1 Phonetic evidence for element compounds

[7] The labels dIp, mAss and rUmp are taken from Harris (1994: 139).
[8] Not all models of segmental structure take this position. For example, Flemming (2002) proposes that scales of formant values be incorporated directly into vowel representations.

The definition of elements as speech signal patterns appears to be consistent with the Quantal
Theory explanation for why languages favour triangular vowel systems bounded by |A I U|.
As noted above, Quantal Theory assumes that each corner of the vowel triangle is associated
with a unique and unambiguous acoustic pattern, which is exactly what the vowel elements
represent. The original Quantal Theory descriptions, which refer to patterns of converging
vowel formants, are redefined in (13) in terms of the impressionistic spectral shapes shown in
figures 1-3:

(13)
                             |I|             |A|             |U|
    position of peak(s)      low + high      centre          low
    position of trough(s)    centre          low + high      centre + high
The summary in (13) shows that each vowel element has a pattern which is not only unique
but also highly distinct, given the small number of variables involved. So the three-way
contrast between [i], [a] and [u] should be easy to recognise, and moreover, difficult to confuse, just as the quantal approach predicts. However, most languages have vowel systems containing more than just [i a u], which means they must allow elements to combine into
compound expressions. Let us now look at compounding in more detail. We first examine the
effects of compounding on the speech signal, and then consider the phonological properties
of compounds.
It will be recalled from 3.2 that the universally unmarked vowel system consists of the corner vowels [a i u] plus the mid vowels [e o]. It has already been argued that [a i u] have a special status as basic vowels, which is reflected in the way each corresponds to a primary unit of phonological structure (i.e. an element). In contrast, the mid vowels do not share this status. Instead, the phonological evidence indicates that [e o] are each the result of combining two elements and interpreting these simultaneously: [e] is represented by the compound |I A| while [o] comes from |U A|. Now, assuming that every element is associated with a spectral pattern, and further assuming that all information relating to element structure is transmitted via the speech signal, we can expect the speech signal itself to contain complex spectral patterns when a mid vowel is interpreted. The spectral patterns for mid vowels are shown in (14) and (15):




(14) Spectral pattern for |I A| (versus |I|)

Figure 7: |I A| ([e])       versus       Figure 8: |I| ([i])

The mid vowel [e] results from the interpretation of the compound expression |I A|, with both elements contributing to the overall shape of the composite spectral pattern in figure 7. In the centre of the spectrum we find the dip between F1 and F2 that characterises |I|, though this is both narrower and shallower than in the pure dIp pattern in figure 8 (repeated from figure 1). The difference introduced in figure 7 is accounted for by the presence of |A|, which produces an energy mass in the same central region with troughs on either side. In short, the |I A| compound creates a dIp within a mAss: a large central mass of energy containing a dip inside it.

(15) Spectral pattern for |U A| (versus |U|)

Figure 9: |U A| ([o]) versus Figure 10: |U| ([u])

The mid vowel [o] is the result of interpreting the compound expression |U A|. In figure 9 the presence of |U| ensures that a concentration of energy is maintained at the lower end of the spectrum, as we find with the pure rUmp pattern in figure 10 (repeated from figure 3). Unlike [u], however, where the energy peak is located very near the bottom of the spectrum, the mid vowel [o] shows a concentration of energy somewhat closer to the central region; as Harris & Lindsey (2000) point out, the energy peak in [o] is 'far enough above the bottom of the frequency range to constitute a mAss, with troughs above and below' (Harris & Lindsey 2000: 196). So the |U A| compound produces a rUmp within a mAss: a centralised mass of energy which falls as the frequency increases.

3.4.2 Phonological evidence for element compounds
So there is phonetic evidence to indicate that mid vowels are complex structures: the spectral pattern for |I A| (= [e]) combines mAss and dIp, while the pattern for |U A| (= [o]) combines mAss and rUmp. But structural complexity is primarily a phonological property, which means
that support for the existence of element compounds like |I A| and |U A| should come
primarily from phonological evidence. In the case of mid vowels, the evidence focuses on the
way the individual elements in a compound become visible under certain phonological
conditions. In other words, the phonology allows us to see inside complex expressions and
observe their internal composition.
The following examples are, above all, intended to support the existence of element
compounds in the grammar. Additionally, however, they reinforce the status of |A I U| as
phonological primes, since they demonstrate how these elements regularly participate as
active units in various dynamic phenomena. In this section I shall discuss examples of vowel
processes which make reference only to the five vowels [a I u c o] introduced so far. In
general, these processes cause the internal (element) structure of a vowel to be reorganised or
reinterpreted in some way. This is illustrated by processes such as monophthongisation,
diphthongisation and vowel coalescence. Other process types that demonstrate the workings
of element-based representations include vowel harmony and vowel reduction; I shall touch
on these below, after having discussed the structure of element compounds in more detail.
The history of English provides numerous cases of monophthong formation and
diphthong formation. Following Harris (1994: 100), I describe these two processes together,
since one is essentially a reversal of the other. Many dialects of late Middle English had the diphthongs [ai] (~[ɛi]) and [au] in the following words (data from Jones 1989):

(16) a. Middle English [ai]/[ɛi]              b. Middle English [au]
     day     [dai]      'day'                 law      [lau]       'law'
     eight   [aixt]     'eight'               dauhter  [dauxtər]   'daughter'
     vain    [vain]     'vain'                naught   [nauxt]     'not'
     pay     [pai]      'pay'                 baul     [baul]      'ball'

During the sixteenth and seventeenth centuries, however, these diphthongs began to develop the monophthongal realisations [ɛː] and [ɔː], respectively, which survive in some dialects of Modern English: for example, British English retains [ɔː] in law [lɔː] and ball [bɔːl], while some regions in northern England also pronounce [ɛː] in eight [ɛːt] and pay [pɛː]. Expressed in |A I U| terms, this monophthongisation process involves a simple reorganisation of the elements in the original diphthong:

(17) a. [ai] → [ɛː]                     b. [au] → [ɔː]

         N          N                       N          N

       x   x      x   x                   x   x      x   x

      |A|         |A|                    |A|         |A|
          |I|     |I|                        |U|     |U|

(17a) shows how the interpretation of the expression |A I| has changed during the development of the English vowel system. In late Middle English |A| and |I| were interpreted separately, resulting in a diphthong [ai]. In this case, speakers distributed |A| and |I| across the two prosodic positions in the nuclear domain. Later, however, language users began to interpret the same elements simultaneously, thereby producing a mid vowel [ɛː].[9] Segmental reconfiguration of this kind typically leaves the prosodic structure untouched, so the later interpretation [ɛː] is still tied to a long nucleus. (17b) shows how back diphthongs also underwent a similar reconfiguration process.
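The logic of this reanalysis can be stated procedurally. The Python sketch below is an illustration of my own (all names hypothetical): the element content of the nucleus is held constant, and only its distribution over the two skeletal slots decides between a diphthong and a long monophthong.

def interpret(nucleus):
    """Render a two-slot nucleus, given the element set linked to each slot."""
    slot1, slot2 = nucleus
    if slot1 == slot2:                        # simultaneous interpretation
        return {('A', 'I'): 'ɛː', ('A', 'U'): 'ɔː'}[slot1]
    vowel = {('A',): 'a', ('I',): 'i', ('U',): 'u'}
    return vowel[slot1] + vowel[slot2]        # sequential interpretation

print(interpret((('A',), ('I',))))            # Middle English: ai
print(interpret((('A', 'I'), ('A', 'I'))))    # later reanalysis: ɛː
print(interpret((('A',), ('U',))))            # au
print(interpret((('A', 'U'), ('A', 'U'))))    # ɔː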
Importantly, monophthong formation comes about as a result of speakers and hearers
adjusting their interpretation of the original diphthong structures. The lexical structures
themselves are unchanged: nothing has been added or removed. In the absence of any representational changes, then, what we see in (17) is the mid vowel interpretations [ɛː ɔː] of
the compound expressions |A I| and |A U|, respectively. On this basis, it should come as no
surprise that other ways of reinterpreting the same structures have also emerged. For example,

[9] The compound expression |A I| can be interpreted as either [eː] or [ɛː]. Clearly, in languages with an [eː]~[ɛː] contrast these vowels must have distinct representations. This will be discussed below.

Estuary English (South-East England) has since reverted to a diphthong realisation of |A I|: day [dai], eight [ait]. By contrast, in RP and many other dialects we also find a diphthongal reinterpretation: day [dei], eight [eit]. These are illustrated in (18):

(18) a. Estuary English: day [dai]          b. RP English: day [dei]

            N                                       N

        x   x   x                               x   x   x

        d  |A| |I|                              d  |A| |I|
                                                   |I|

So, historical and dialectal evidence indicates that mid vowels are represented by
compound element expressions. Further support for the structures |A I| and |A U| comes from
other cases of English dialect variation, and in particular from the simplification (in effect,
monophthongisation) patterns found in various African Englishes. The examples in (19) are
taken from Simo Bobda (2007):

(19) a. [ai] → [e] diphthong simplification
     like      [lek]          Sierra Leone, Liberia
     finding   ['fendiŋ]      Zambia
     primary   ['preməri]     Kenya
     tribe     [treb]         Uganda

     b. [au] → [ɔ] diphthong simplification
     round     [rɔn(d)]       Kenya
     mouth     [mɔt]~[mɔf]    West African Pidgin
     town      [tɔn]          Liberia
     house     [hɔs]          Krio

The process of diphthong simplification in African Englishes seems to be accompanied by
concomitant vowel shortening, as these cases of monophthongisation tend to result in a short
vowel. Nevertheless, as far as their segmental structure is concerned they reinforce the
patterns described in (17), and provide additional evidence for (i) the primary status of the
vowel elements |A I U| as active phonological units, and (ii) the representation of mid vowels
as the compounds |A I| and |A U|.
Looking beyond English, we see further evidence for the mid vowel structures |A I|
and |A U| in languages as diverse as Japanese and Maga Rukai. Kubozono (2001) describes
two processes of monophthong formation in Japanese, one historical and the other synchronic.
Towards the end of the Middle Japanese period, the diphthong [au] in Sino-Japanese words
underwent monophthongisation to [o:]:

(20) Middle Japanese monophthongisation
     [au]   → [oː]     'cherry tree'
     [kau]  → [koː]    'high', 'fidelity'
     [kjau] → [kjoː]   'capital', 'home town'

The output forms in (20) are subject to an analysis similar to that shown in (17b) for early
English. Meanwhile, in present-day Tokyo Japanese the reinterpretation process described in
(17a) has become a characteristic of casual speech (Kubozono 2001: 63), with [ai] being monophthongised to [eː]. The diphthong [ai] is retained in formal speech, however, resulting in the alternations shown in (21):[10]


(21) Tokyo Japanese monophthongisation
     [taigai]~[teːgeː]      'usually'
     [kjoːdai]~[kjoːdeː]    'siblings'
     [itai]~[iteː]          'painful'

In view of the Japanese patterns in (20) and (21), it is clear that analysing [e o] as the element compounds |A I| and |A U|, respectively, does not just capture mid vowel behaviour
in English; rather, it describes a property of the vowel elements themselves. This point is
reinforced by the fact that similar behaviour is also observed in other, unrelated languages. In
Maga Rukai, an Austronesian language spoken in Taiwan, a synchronic process of vowel

[10] Hirayama (2003) analyses the Japanese data in (21) using traditional features.

coalescence has created mid vowels that were not present in the proto-language (Hsin 2003). The nouns in (22a) have the heterosyllabic vowel sequence [a]...[u] in the root of the negative form, which corresponds to [o] in the positive. This [o] is the result of merging the phonological properties of [a] and [u]. In (22b) we find a parallel alternation between [a]...[i] and [e]:

(22) a. [a]...[u] coalescence                b. [a]...[i] coalescence
        negative      positive                  negative      positive
        i-k-valuu     vloo    'bee'             i-k-damlii    dmele   'hemp'
        i-k-talquu    tloqo   'bridge'          i-k-valsii    vlese   'tooth'
        i-k-palpuu    plopo   'pan'             i-k-cakii     ckee    'excrement'

Maga Rukai has a pattern of vowel syncope determined by its iambic foot structure (Hsin
2003: 64). In (22) this is shown as the loss of [a] in the root-initial syllable of the positive
form. Yet although the nuclear position itself is suppressed, its segmental content |A| is
retained; this stray element is then interpreted in the adjacent nucleus:

(23) Maga Rukai vowel coalescence: [cakii] ~ [ckee]

         N     N                   N     N

      x  x  x  x  x             x  x  x  x

      c |A| k |I| |I|           c  k |A| |A|
                                     |I| |I|

So Maga Rukai provides another example of a process which reconfigures a representation in such a way as to reveal the internal structure of mid vowels. The merger of |A| and |I| in (23) produces [e] in [ckee], while the same analysis also applies to the merger of |A| and |U| to create [o] (e.g. [plopo]).
The representations shown here follow the conventions of autosegmental phonology
in having individual elements occupy separate structural levels or tiers; in (23) for instance,
|A| and |I| reside on independent tiers. Although this arrangement is not crucial, it does offer a
convenient way of expressing a difference between possible element combinations (between
two elements on different tiers) and impossible ones (between two elements sharing the same
tier). In the canonical 5-vowel system [a i u e o] only |A I| and |A U| exist as compounds, so
in languages which have this system the elements |I| and |U| do not combine; that is, |I| and |U| are assumed to occupy the same tier.
Evidence similar to that provided by Japanese and Maga Rukai can also be found in
many other languages to support the analysis of mid vowels as element compounds. On this
basis, the element-based representation of the unmarked 5-vowel system in (7) appears to be
fairly robust. By and large this is true, though in section 4 I will show that the system for
creating element compounds must be refined if it is to account for larger and more marked
vowel systems. Before that, let us turn to another property of many unmarked systems: the presence of schwa.

3.5 Central vowels
3.5.1 Phonetic evidence for empty vowels
In some ways [ə] behaves like the definitive unmarked vowel, as its presence seems to make little difference to the overall markedness of the system in which it appears. As (24) shows, [ə] (or an acoustically similar (central) vowel such as [ɨ]) is found in systems of varying size and shape (length and nasality not shown here):

(24) [a i u] + [ə]                          Wapishana
     [a i u e o] + [ə]                      Chukchi
     [æ ɪ ʊ e ʌ ɒ] + [ə]                    RP English (short vowels)
     [a i u e o ɛ ɔ] + [ə]                  Wolof
     [a i u e o y ø] + [ɯ]                  Turkish
     [a i u e o ɛ ɔ ɪ ʊ] + [ə][11]          Bari

The ability of [ə] to apparently slot into any vowel system suggests that it does not participate in the usual segmental relationships with other vowels. For example, it is not obvious where [ə] stands with regard to natural class membership or the formation of element compounds. It is as though [ə] lies outside the |A I U| system altogether; and indeed, this is the position taken in the Element Theory approach assumed here. As I shall show below, the phonetic and phonological evidence points to the analysis of schwa as an unspecified vowel.

[11] Steinberger & Vago (1987) describe this sound as a centralised low vowel. Here I use the symbol [ə], though [ɐ] may be equally appropriate.
Let us first consider the phonetic evidence. In 3.3 each of the vowel elements |A I U|
was associated with a distinctive spectral pattern in the speech signal. It was argued that these
patterns are used as acoustic cues by listeners and as production targets by speakers when
they communicate linguistic information about the internal structure of morphemes. Recall
from (11) that |I| was characterised by a central dIp pattern, |A| by a central mAss pattern and
|U| by a falling rUmp pattern. Also recall that dIp, mAss and rUmp are the result of formants
coming together; when they converge, the formants create a concentration of energy (i.e. a
spectral peak) at a given frequency. Now compare those spectral shapes with the pattern for
schwa:

(25) Spectral pattern for [ə]               Formant pattern for [ə]

Figure 11: | | ([ə])                        Figure 12: [ə]

Unlike the spectra for |A I U|, the pattern for schwa in figure 11 is regular: the peaks are equally spaced. The same pattern is also visible in the spectrogram for [ə] in figure 12, where the formants are equally spaced at around 0.5kHz (F1), 1.5kHz (F2) and 2.5kHz (F3). In other words, the formants for [ə] do not converge. Importantly, the absence of converged formants (or alternatively, the presence of equally spaced spectral peaks) translates into an absence of linguistic information, assuming that only irregular patterns such as dIp, mAss and rUmp characterise linguistic categories like |A I U|.
From the speaker's viewpoint, producing a vowel with the pattern in figure 11 is achieved by adopting a 'neutral' or 'relaxed' tongue/lip position, the position associated with a vocal tract that has a fairly uniform shape throughout. The position of the articulators for [ə] is effectively the antithesis of those positions for producing peripheral vowels like [a i u e o], all of which require the articulators to be markedly displaced from the neutral position. This difference between schwa and other vowels is to be expected, however, if Element Theory is correct in its claim that vowel properties are mapped onto the acoustic signal. The presence of |A|, |I| or |U| is associated with a strong, characteristic spectral pattern; and to produce such a pattern speakers must adopt a distinct, non-neutral vocal tract shape. On the other hand, the absence of any characteristic spectral pattern, such as we find in [ə], is naturally paired with a vocal tract configuration lacking any distinct shape. A uniformly shaped tube is unable to manipulate formant values in any linguistically meaningful way, and the phonetic result is schwa: a central vowel of a neutral or indistinct quality.
So the spectral shape for [ə] shows none of the characteristic vocalic patterns dIp, mAss or rUmp, suggesting that [ə] has no vowel elements in its representation. The absence of |A I U| effectively leaves an unspecified or representationally empty vowel. As indicated above, the Element Theory literature also considers schwa to be informationally empty (Harris & Lindsey 2000), in the sense that having no element structure means it contains no linguistic information. In Element Theory, representational emptiness and informational emptiness amount to the same thing.
But if schwa has no element structure, how can it be heard and pronounced? Harris &
Lindsey (1995) argue that the spectral pattern in figure 11 may be viewed as a baseline
resonance that exists latently in all vowels. Usually this pattern is not heard, because in the
presence of |A I U| it is overridden by the more marked patterns dIp, mAss and rUmp. In the
case of most vowels, these marked patterns are superimposed onto the baseline resonance and
have the effect of masking it entirely. In the case of schwa, however, which has no elements,
the baseline resonance is exposed. Language users associate this resonance with the central region of the acoustic space; more specifically, with the only area of the vowel space not occupied by |A I U|:

(26) |A I U| areas of the vowel space

         |I|                     |U|

                  ɨ     ə

                  |A|

It has already been noted that any vowel system may contain a neutral vowel, which can vary phonetically between [ə] and [ɨ].[12] Now consider the stylised vowel space in (26), which demonstrates why this phonetic variation is possible, or perhaps even expected. The absence of |A I U| corresponds to a central area of the vowel space covering a sizeable range of different vowel qualities, any of which may be targeted by individual languages as the interpretation of an unspecified vowel. Importantly, phonetic differences such as [ə] versus [ɨ] are trivial in most languages,[13] because these variants refer to the same linguistic object, namely a phonologically empty vowel. Harris & Lindsey (1995) liken the empty vowel to a blank canvas: a neutral background which becomes hidden when different colours are painted on to it. And no matter what shade of white or grey the original canvas may be, it is still interpreted as having no colour as long as it remains empty (i.e. unpainted).

3.5.2 Phonological evidence for empty vowels
It has been stressed that elements should be treated primarily as units of phonological
structure, and that their existence should therefore be supported by evidence from the
phonology. At first sight, however, it seems that a different approach may be needed in the
case of schwa, the empty expression | |, because it contains no elements in its representation
and thus amounts to nothing in phonological terms. In fact this is not the case. Although [ə] has no segmental content, it is still linked to the prosodic structure (specifically, to a syllable nucleus), which is clearly within the scope of phonology. If [ə] is to be viewed as
the interpretation of an empty nucleus, then it should receive a phonological analysis like any
other nucleus. Another reason for treating | | as a phonological object is that this empty
expression is often the result of a phonological process that removes element structure (e.g.
from weak syllables). If elements are removed from a vowel expression until nothing remains,
then it becomes possible for the baseline resonance of an empty nucleus to be interpreted.
The following examples from Bulgarian and Turkish illustrate the phonological identity of
empty nuclei.
Like English, Bulgarian (Pettersson & Wood 1987) has a full set of vowel contrasts in stressed positions but only a reduced set in unstressed positions, as shown in (27). Examples of these alternating vowels are given in (28):


[12] Other realisations of an unspecified vowel are also possible: e.g. [ɯ] in the Jivaro system [a i u ɯ].
[13] In 4.4.4 it will be argued that this is not true of English.

(27) Vowel system(s) of Bulgarian
     stressed:    i  ɛ  u  ɔ  a  ə

     unstressed:  i  u  ə

(28) Vowel reduction in Bulgarian (data from Crosswhite 2004)
     stressed                  unstressed
     [sɛlu]     'village'      [sila]        'villages'
     [rɔgut]    'of horn'      [rugat]       'horned'
     [rabutə]   'work'         [rəbɔtnik]    'worker'

Bulgarian illustrates a common pattern whereby unstressed syllables support only a subset of the vowel contrasts that are possible in stressed syllables: [i ɛ] are neutralised to [i] in weak syllables, [u ɔ] become [u], and [a ə] merge as [ə]. Using traditional features it is not easy to express these vowel reduction effects as a single process: [ɛ]→[i] and [ɔ]→[u] are captured by [-high]→[+high], whereas the same feature [high] is irrelevant to [a]→[ə] as both are [-high]; instead, the change from [a] to [ə] must be described as [+low]→[-low]. Yet it is clear that the alternations in (28) are all motivated by the same conditioning factor, namely the inability of an unstressed nucleus to support certain vowel properties. Restated in terms of Element Theory, however, the generalisation becomes formally simple: |A| is not licensed in unstressed syllables. As such, the element |A| is suppressed in those contexts but language users still interpret any remaining elements.

(29) a. high vowels are unchanged (|A| not present)
        [i] → [i]      |I| → |I|
        [u] → [u]      |U| → |U|
     b. mid vowels are raised (|A| suppressed)
        [ɛ] → [i]      |A I| → |I|
        [ɔ] → [u]      |A U| → |U|
     c. central vowels become unspecified (|A| suppressed)
        [ə] → [ə]      | | → | |
        [a] → [ə]      |A| → | |

Bulgarian vowel reduction is a process that targets |A|, and because the high vowels in (29a) lack |A|, they are unaffected. By contrast, the mid vowel compounds [ɛ ɔ] in (29b) do contain |A|; this element is interpreted in stressed positions, but is suppressed in weak positions; the loss of |A| leaves a sole |I| or |U| remaining, which is interpreted as the high vowel [i] or [u] respectively. Turning to the patterns in (29c), these provide evidence to support the analysis of [ə] as an unspecified vowel. As a structurally empty vowel, [ə] has no |A| and is thus unaffected by vowel reduction: [ə]→[ə]. On the other hand, [a] has |A| in its representation, this element being interpreted in stressed syllables. But in unstressed positions [a] loses its entire element structure through the |A|-suppression process, leaving behind an empty nucleus which is interpreted phonetically as baseline resonance: [a]→[ə]. What (29) shows is that these vowel reduction effects can be unified as a single process only if the grammar allows for an unspecified vowel to appear in representations. In the absence of any positive vowel properties (i.e. elements), this vowel is interpreted as neutral or baseline resonance, typically [ə].
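Stated procedurally, the whole of (29) reduces to a single operation. The following Python sketch is illustrative only (the dictionaries and names are my own); it suppresses |A| in unstressed nuclei and interprets whatever remains.

INTERPRET = {
    frozenset('I'): 'i',  frozenset('U'): 'u',
    frozenset('IA'): 'ɛ', frozenset('UA'): 'ɔ',
    frozenset('A'): 'a',  frozenset(): 'ə',      # empty nucleus: baseline [ə]
}
STRUCTURE = {v: k for k, v in INTERPRET.items()}  # vowel -> element set

def reduce_vowel(vowel, stressed):
    """Unstressed nuclei cannot license |A|: suppress it, interpret the rest."""
    elements = STRUCTURE[vowel]
    return INTERPRET[elements if stressed else elements - {'A'}]

for v in 'iuɛɔaə':
    print(v, '->', reduce_vowel(v, stressed=False))
# i -> i, u -> u, ɛ -> i, ɔ -> u, a -> ə, ə -> ə : the neutralisations in (27)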
The interpretation of phonologically empty nuclei is also observed in Turkish. This
language, like a number of other Altaic systems, has a well-documented process of vowel
harmony in which suffix vowels agree in backness with root vowels. In traditional analyses
the active property is assumed to be the feature [back], whereas in Element Theory it is the
element |I|. Recall that |I| identifies those vowels with a dIp spectral pattern; these have a
relatively high second formant, which places them in the front area of the vowel space. In
Turkish vowel harmony, when a root vowel contains |I| then the same element is also
interpreted in suffixes. For example, the genitive singular suffix in (30a) has a lexically
empty vowel, so the suffix is pronounced [ɯn]. Under harmony conditions, shown in (30b), it copies |I| from the root and the suffix vowel is interpreted as [in]:

(30) |I| harmony in Turkish
         Nom. sg.    Gen. sg.    Nom. pl.
     a.  kız         kız-ın      kız-lar     'girl'
         sap         sap-ın      sap-lar     'stalk'

     b.  ip          ip-in       ip-ler      'rope'
         ev          ev-in       ev-ler      'house'


The nominative plural suffix also alternates, between its lexical form [lar] (with a vowel containing |A|) and its harmonising form [ler] (with an additional |I|). Example structures are shown in (31):

(31) a. kız-ın              b. ip-in               c. ev-ler

         N     N                N     N                N     N

      x  x  x  x  x          x  x  x  x             x  x  x  x  x

      k | | z | | n            |I| p |I| n          |A| v  l |A| r
                                                    |I|      |I|
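The spreading analysis lends itself to a procedural statement. The Python sketch below is a deliberately simplified illustration of my own: it covers only |I| (backness) harmony, ignores the rounding harmony of the fuller Turkish facts, and uses hypothetical helper names.

def harmonise(root_elements, suffix_elements):
    """Copy |I| from the root vowel into the suffix vowel, if present."""
    return suffix_elements | ({'I'} & root_elements)

def interpret(elements):
    return {frozenset(): 'ɯ', frozenset('I'): 'i',
            frozenset('A'): 'a', frozenset('AI'): 'e'}[frozenset(elements)]

GEN = set()        # genitive suffix: lexically empty vowel
PL = {'A'}         # plural suffix:   lexically |A|

for root, word in [(set(), 'kız'), ({'I'}, 'ip')]:
    gen = interpret(harmonise(root, GEN))
    pl = interpret(harmonise(root, PL))
    print(f'{word}: gen. -{gen}n, pl. -l{pl}r')
# kız: gen. -ɯn, pl. -lar   |   ip: gen. -in, pl. -ler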

The forms in (30) present a somewhat simplified picture of the facts relating to vowel harmony in Turkish.[14] Nevertheless, they are consistent with the analysis of [ə]/[ɨ] given
above, and with the claim that some grammars allow representations to contain structurally
empty nuclei. But if | | really has no element content, then why is it not interpreted as
silence? Having no elements means that | | cannot be mapped on to any linguistically
significant patterns in the acoustic signal; that is, it cannot carry segmental information.
However, | | is associated with a nuclear position, and this nucleus plays an important role in
the formation of prosodic structure. In combination with other nuclei, it contributes to the
construction of higher prosodic domains such as feet and words units which convey
linguistic information deemed essential for speech perception and efficient lexical access
(Cutler & Norris 1988). There is evidence, for example, that listeners pay particular attention
to the beginnings of foot and word domains when processing running speech. So, one
consequence of not interpreting an empty nucleus is to reduce the amount of linguistic
(specifically, prosodic) information being transmitted via the speech signal.
This is not to say that empty nuclei can never be silent. In fact, uninterpreted empty nuclei are a grammatical possibility in many languages, including English (e.g. [ʌnøkliːə] unclear, where ø marks a silent nucleus). Importantly, however, their appearance needs to be controlled in order to avoid the emergence of unmanageable sequences of consonants.
Grammars which allow silent empty nuclei must therefore impose restrictions on their
distribution (Charette 1991, Scheer 2004). But if a nucleus is silent, how can we be sure it is
there at all? English provides an answer to this question by showing how the same nucleus is

[14] See Charette & Göksel (1996) for a more detailed account.

silent under certain conditions but phonetically interpreted under other conditions. The
following example illustrates the point.
According to one innovative approach to syllable structure, all well-formed lexical
representations end in a nucleus (Kaye 1990). Some languages such as Italian require this
final nucleus to be interpreted, with the result that words must end phonetically in a vowel.
For example, all native Italian words are vowel-final: casa 'house', case 'housing', caso 'chance' (but *cas); additionally, many loanwords in Italian have become vowel-final through adaptation: gallon (English) → gallone (Italian). By contrast, other languages allow a final empty nucleus to be silent. As a result, they admit words ending phonetically in a consonant: peach [piːtʃ] (English), schlimm [ʃlɪm] 'bad' (German), rhad [r̥aːd] 'cheap' (Welsh). Following Kaye (1990), the structure of the English word peach is shown in (32a), where the word-final empty nucleus is licensed to remain silent.

(32) a. peach               b. plural          c. peaches

     O N O N                O N                O N O N O N

     x x x x                x x                x x x x x x

     p |I| tʃ | |           z | |              p |I| tʃ | | z | |

As an independent lexical structure, the plural suffix in (32b) also has a final empty
nucleus which is not phonetically interpreted; in segmental terms, the plural marker consists
solely of its onset fricative [z].[15] And when a language user constructs the plural noun
peaches by concatenating the two forms (32a) and (32b), the result is the structure in (32c).
Since resyllabification is not permitted in Kaye's model, the plural noun peaches ends up with two empty nuclei: one from the stem peach, the other from the suffix. It also contains the two sibilant consonants [tʃ] and [z], which are phonetically adjacent and thus create an unmanageable sequence of the kind mentioned above. Specifically, when these sounds are adjacent, their similar acoustic properties make them perceptually almost indistinguishable. Yet the perceptibility of [tʃ] and [z] (and therefore the linguistic information associated with these segments) can be recovered by exploiting the lexical structure itself. By phonetically interpreting the intervening empty nucleus | | as a neutral vowel [ɪ], as was

[15] The voicing properties of English obstruents are discussed in Backley (in prep).

observed for Turkish in (31a), important acoustic cues carried by the C-to-V [tʃɪ] transition and the V-to-C [ɪz] transition can be easily perceived; as a result, the linguistic information carried by [tʃ] and [z] is transmitted in full. So, without recourse to arbitrary measures such as the insertion of an epenthetic vowel, we get the form peaches [piːtʃɪz]. This analysis of the [ɪz] plural departs from the usual textbook explanation in two respects. First, [ɪ] is seen here as a product of the existing representation rather than as a newly introduced addition to the structure. This is presumably a gain for restrictiveness, in that the distribution of empty nuclei is strictly controlled by the grammar, whereas epenthesis can in principle be applied anywhere. Second, interpreting | | as a neutral vowel has a clear linguistic motivation, since it enhances the perceptibility and recoverability of linguistic information. By contrast, the traditional vowel epenthesis account is typically concerned with notions such as 'ease of articulation' which, following the discussion in 2.3 above, is best seen as non-linguistic in nature.
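The recoverability logic of this analysis also lends itself to a procedural statement. The Python code below is a sketch of my own, not Kaye's formalism: an empty nucleus remains silent unless its flanking consonants are both sibilants, in which case it is interpreted as [ɪ].

SIBILANTS = {'s', 'z', 'ʃ', 'ʒ', 'tʃ', 'dʒ'}

def interpret_word(pairs):
    """pairs: (onset, nucleus) units; nucleus None = empty nucleus."""
    out = []
    for i, (onset, nucleus) in enumerate(pairs):
        out.append(onset)
        if nucleus is not None:
            out.append(nucleus)              # filled nucleus: pronounce it
        elif (i + 1 < len(pairs)
              and onset in SIBILANTS
              and pairs[i + 1][0] in SIBILANTS):
            out.append('ɪ')                  # rescue two adjacent sibilants
    return ''.join(out)

peach = [('p', 'iː'), ('tʃ', None)]          # final empty nucleus: silent
plural = [('z', None)]

print(interpret_word(peach))                 # piːtʃ
print(interpret_word(peach + plural))        # piːtʃɪz
print(interpret_word([('d', 'ɒ'), ('g', None)] + plural))   # dɒgz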
The behaviour of the English plural suffix provides further evidence for the existence
of empty nuclei in representations. It also shows how linguistic conditions can cause an
empty nucleus to be phonetically interpreted in a language-specific way. One aspect of the
analysis of [ɪz] should be clarified, however. I have claimed that | | is interpreted as [ɪ] in English, which contradicts the general assumption that [ə] is actually the default English vowel. In section 4 I shall argue that English has both [ɪ] and [ə], but that these differ phonologically and therefore structurally: while [ɪ] is an empty expression, [ə] has segmental
structure in the form of the element |A|. In order to describe the element structure of larger
vowel systems such as English, I shall introduce the idea of head-dependency relations
holding between elements in the same expression.


4: English Vowel Structure

4.1 Introduction
Section 3 introduced the vowel elements |A I U| and showed how these represent unmarked
triangular vowel systems comprising three and five vowels. In addition, it argued that some
grammars allow nuclear positions to be empty, that is, to exist without segmental content. With no associated elements, an empty nucleus contains no linguistic information; as such, it corresponds to a vowel with a neutral or baseline speech signal pattern (i.e. with equally spaced formants), usually interpreted as a central vowel in the [ə]~[ɨ] region. A vowel that is representationally empty and phonologically neutral may appear in a vowel system of any type, since it does not participate in natural class groupings like other vowels. So, using the elements |A I U| it is possible to represent symmetrical vowel systems with up to six vowels, such as [a i u e o ə].
This accounts for a large number of grammars (more than 60 percent of the languages surveyed in Maddieson 1984, for example); and in these languages the basic elements |A I U| are presumably the only segmental units involved in vowel representation
(ignoring nasality and tone). However, the element-based model must also be able to describe
larger and more marked vowel systems, which suggests that element structure will have to be
enriched in order to increase the number and type of vowel contrasts that it can formally
express. In view of the special unmarked status of |A I U|, there seems little advantage in
attempting to expand the element set by introducing new elements, as it is unlikely that any
additional elements would be unmarked in the same way as |A I U|. This leaves element
combination. By allowing elements to combine in different ways (for example, in unequal proportions) the model can capture a larger number of vowel categories yet still maintain
the ability to refer to natural classes. This section first describes the mechanism of element
combination, and then analyses the vowel system of English in Element Theory terms.

4.2 Front rounding in vowels
So far, element combination has been restricted to those cases where |A| combines with either
|I| or |U|, producing the mid vowel compounds |A I| and |A U|. Two further combinations are
also possible, however, both of which result in relatively marked expressions. The first of these is |I U|, a combination of dIp and rUmp interpreted as the high front rounded vowel [ü] (= IPA [y]). Figure 13 shows the spectral pattern for this vowel, which is characterised by a
dIp within the falling rUmp pattern:

(33) Figure 13: |I U| ([ü])          Figure 14: |I A U| ([ö])


Although the spectral pattern for this vowel appears to show its internal composition (i.e. |I|+|U|) quite clearly, [ü] is generally regarded as being perceptually less distinct than other compounds. This is explained by its formant structure, in which |I| causes an increase in the value of F2 while |U| has the counteractive effect of bringing about a decrease in F2 (by means of lip rounding). By lowering the second formant, and thereby suppressing the salient acoustic characteristics of |I|, the element |U| has the effect of masking the linguistic information associated with |I|. So it is not unexpected that any phonological expression that combines |I| and |U| should be marked cross-linguistically. After all, the communication process succeeds by transmitting linguistic information, not by concealing it.
For the same reason, the mid front rounded vowel [ö] (= IPA [ø]) in figure 14 is also a marked sound.[16] It is represented as |I A U|, as it combines the phonological properties of the front vowel [e], expressed as |I A|, with those of |U|. The identity of this vowel as |I A U| is confirmed by the pattern of vowel harmony found in Mongolian (and in several other Altaic systems):

(34) |U| and |I| harmony in Mongolian
         Nominative    Instrumental
     a.  gal           gal-aːr         'fire'
     b.  deːl          deːl-eːr        'coat'
     c.  döröː         döröː-göːr      'stirrup'
     d.  nöxör         nöxör-öːr       'comrade'

[16] According to the World Atlas of Language Structures database (Haspelmath et al. 2005), front rounded vowels such as [ü] and [ö] are found in less than 7% of the world's languages.

The instrumental suffix in Mongolian has a vowel specified lexically with |A|: its lexical (non-harmonised) form is [aːr]. But after a stem vowel containing |I|, as in (34b), this |I| is interpreted throughout the harmonising domain, creating the |I A| compound in the form [eːr]. The element |U| is also harmonically active in this language. As (34c) shows, when |U| is present in the stem vowel, it too is interpreted in the suffix, resulting in the |U A| compound interpreted as [oːr]. Finally, in (34d) |I| and |U| harmony operate simultaneously: when both elements are present in the stem vowel, both harmonise to the suffix, as (35b) shows:

(35) a. instrumental [aːr]         b. [nöxör-öːr]

           N                             N       N       N

        x     x                       x  x   x   x   x   x   x

       |A|    r                     n |A| x |A|  r  |A|  r
                                      |U|   |U|     |U|
                                      |I|   |I|     |I|

In some languages, then, it is possible for all three vowel elements to co-exist in a
single expression. Yet despite the prevalence of front rounded vowels such as [ü] in some language groups (especially Altaic and Uralic), they remain relatively rare cross-linguistically.
For the reasons given above relating to perceptual distinctiveness, combining |I| and |U| is not
the preferred way of increasing the expressive power of the |A I U| model. Instead, some
grammars allow elements to form unequal or asymmetric combinations, as 4.3 shows.

4.3 Element dependency
As a general characteristic of linguistic structure, when two units in a representation are
formally related (e.g. through licensing) they create an asymmetric relation. This asymmetry
is commonly expressed by the notion of head-dependency, where the presence of a dependent
unit is sanctioned by the presence of its corresponding head unit, but not vice versa. For
example, Di Sciullo (2005) argues that head-dependency is central to morphological structure,
where the relation holds between units such as a (head) root and a (dependent) affix. The
structural head-dependency relation often manifests itself in terms of strong versus weak,
though these properties are interpreted differently according to the domain/level in question.
In the case of morphology, a dependent affix may show its relative weakness by supporting a
narrower range of lexical contrasts than its head, or by acting as a harmonic target but not as
a trigger.
Head-dependency also operates within the phonological domain, most notably among
units of prosodic structure. For example, when two syllable (rhymal) heads combine to form
a foot, the relation between them is always asymmetric: in some systems the foot constituent
is right-headed, in others left-headed. Either way, the syllable acting as the head of the foot is
expected to display certain strong characteristics, the nature of which will differ between
languages. For instance, prosodic headedness may determine whether a syllable can bear
stress, or whether it allows a particular segmental property to be interpreted (e.g. aspiration in
English); it may also influence the number or type of lexical contrasts that a syllable can
support.
Clearly, then, the notion of head-dependency is an established aspect of linguistic
structure. The question is whether it should apply to segmental structure in the same way that
it applies to prosodic and morphological structure. Models of segmental structure which
follow the |A I U| approach, including Element Theory, assume that it should; and for some
time these models have claimed that elements (or their equivalents in other frameworks) form
head-dependency relations with one another. To illustrate the role of headedness, consider
how Element Theory might represent the vowel system of the native American language
Tunica (an underlined element is the head of the expression).

(36) i  u        |I|       |U|
     e  o        |I A|     |U A|
     ɛ  ɔ        |I A|     |U A|
     a           |A|

Because this system has two pairs of mid vowels, the basic compounds |I A| and |U A| will no longer suffice; rather, a structural distinction must be made between the close mid vowels [e o] and the open mid vowels [ɛ ɔ]. Taking [e] vs. [ɛ] as an illustration, it has been remarked in the Element Theory literature that [e] has acoustic properties which make it more palatal or |I|-like, whereas [ɛ] has characteristics that give it a more resonant or |A|-like quality. Both vowels involve a combination of |I| and |A|, but differ in the relative salience of
each element in the overall expression. As (37) shows, the relative prominence of one
element over the other is revealed when we compare the spectral patterns of the two vowels.

(37) Figure 15: |I̲ A| ([e])          Figure 16: |I A̲| ([ɛ])


In figure 15 the dIp pattern associated with |I| dominates the expression. The mAss
pattern from |A| is also visible, especially at the higher end of the frequency range, but it is
dIp which occupies the central region and gives [e] its palatal or |I|-like character. This
asymmetry is expressed in phonological terms by making |I| the head of the head-dependent
compound |I̲ A| (head element underlined). It will be recalled that the head-dependency
relation usually manifests itself as a difference in strength or prominence, as noted in the case
of foot structure. This is also true for |I̲ A|, where the head element |I| makes a greater
contribution to the spectral pattern of the overall expression than does its dependent |A|.
By contrast, in the pattern for [ɛ] (figure 16) |I| does not show the same degree of
prominence. Although the dIp pattern is still visible in the 1-2kHz range, it does not dominate
the expression; this indicates that |I| is not the head of the compound. On the other hand, |A|
creates a central mAss pattern in the 500-2500Hz range which is more prominent in [ɛ] than
in [e]. Some proponents of Element Theory claim that elements always combine in unequal
proportions, meaning that every compound expression must have a head. According to this
view, [ɛ] would be represented as |I A̲|. On the other hand, other versions of the model follow
Dependency Phonology in assuming that non-headed expressions are also possible, where
each element contributes equally to the resulting compound. In such a case, [ɛ] may be
represented as |I A|. As discussed in 2.2, the choice between |I A̲| and |I A| (or any other
representation) cannot be based solely on a vowel's physical properties, since elements
encode the phonological properties of a segment, not its precise phonetic qualities. In other
words, the internal structure of [ɛ] can only be established by studying the behaviour of this
vowel in phonological processes and distributional patterns. It is therefore possible for this
structure to vary between different languages.
Without the appropriate phonological data to hand, a conclusive analysis of the mid
vowels cannot be given here. Instead, I simply offer a revised representation (cf. (36) above)
of the Tunica vowel system:

(38)  i u    |I̲|      |U̲|
      e o    |I̲ A|    |U̲ A|
      ɛ ɔ    |I A̲|    |U A̲|
      a      |A̲|

As (38) shows, I assume here that elements always combine asymmetrically: the close mid
vowels [e o] contain headed |I| while their open counterparts [ɛ ɔ] have headed |A|. I further
assume that vowels containing only a single element are, by default, also headed expressions;
so [i] is represented as |I̲| rather than |I|. This assumption is not unreasonable: if headed status
gives an element prominence in the acoustic signal, then we should expect single elements to
be always headed because they make a 100 percent contribution to the expressions in which
they occur.17
On the other hand, the description of English vowels in the following section
will show that this default headedness can be overridden, for example when a
phonological distinction is made between headed and non-headed forms of the same element.
In the case of English, I show that |A̲| and |A| exist as independent phonological expressions.
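
The workings of headed and non-headed expressions can be made concrete with a small
sketch. The following Python fragment is given purely as an illustration and forms no part of
the theory itself: the class name Expression and the asterisk convention for marking heads
(standing in for underlining) are my own.

# Illustrative sketch: an element expression with at most one head.
class Expression:
    def __init__(self, elements, head=None):
        # elements: iterable of element names, e.g. {'I', 'A'}
        # head: one of the elements, or None for a non-headed expression
        assert head is None or head in elements
        self.elements = frozenset(elements)
        self.head = head

    def __repr__(self):
        parts = [e + ('*' if e == self.head else '') for e in sorted(self.elements)]
        return '|' + ' '.join(parts) + '|'

close_mid = Expression({'I', 'A'}, head='I')   # [e]: |I| is head
open_mid  = Expression({'I', 'A'}, head='A')   # [ɛ]: |A| is head
headless  = Expression({'I', 'A'})             # non-headed |I A|
print(close_mid, open_mid, headless)           # |A I*| |A* I| |A I|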

4.4 The representation of English vowels
4.4.1 Introduction
All dialects of English have relatively large vowel systems; for example, RP contrasts
nineteen full vowels, and has at least two weak vowels in addition to these. So clearly,
representing such a system requires structural properties beyond those of straightforward
element combination. Here I propose an element-based analysis of English vowels which
rests on two (not uncontroversial) assumptions. First, I assume that element dependency or
headedness is relevant to vowel structure in English, but not in the sense described in 4.3.
So far, the difference between headed and non-headed has been used to distinguish between
individual vowel sounds; in English, however, I propose that headedness relates to vowel
strength: headed expressions represent full vowels, both short and long, whereas non-headed
expressions account for reduced vowels and for the weak members of diphthongs. Second, I
assume that vowel length is the distinguishing property in contrasts such as [uː] versus [ʊ]. In
this case the same phonological expression |U̲| has different qualitative interpretations
according to whether it is associated with a short or long nucleus. This is consistent with the
Element Theory view, discussed in 3.2, that segmental representations typically show some
degree of phonetic latitude.
Here I focus primarily on the British English dialect of RP, as this is the dialect most
familiar to me. However, I also consider other dialects from around the English-speaking
world in order to highlight some of the structural differences responsible for dialectal
variation.

17. Similarly, a verb stem which stands as the head of a morphological domain displays its
head-like properties regardless of whether or not it has a dependent suffix.

4.4.2 Short vowels
The RP dialect has the six short full vowels [ɪ e æ ʌ ɒ ʊ]. The term 'full' (as opposed to
'reduced') refers to the ability of these vowels to contrast in prosodically strong positions
such as stressed syllables, while the term 'short' indicates that they are associated with single
(as opposed to branching) nuclei. I propose the following representations:

(39)  [ɪ]   |I̲|      miss, building
      [e]   |I̲ A|    send, defend
      [æ]   |I A̲|    gas, action
      [ʌ]   |A̲|      love, subtle
      [ɒ]   |U A̲|    wrong, tonic
      [ʊ]   |U̲|      good, cooking

It is no coincidence that the vowels in (39) are all represented by headed expressions. Like
many other languages, English reveals a natural association between headedness and prosodic
strength, such that expressions with headed elements show an affinity for strong/stressed
positions.
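
To make the proposal in (39) easier to manipulate, it can be recast as a simple lookup table.
The sketch below is again only an illustration (the asterisk marks the head, as in the earlier
fragment); it also shows how a natural class, here the set of vowels containing |A|, falls out
directly from the representations.

# The RP short full vowels of (39); '*' marks the head element.
short_vowels = {
    'ɪ': {'I*'},           # miss
    'e': {'I*', 'A'},      # send
    'æ': {'I', 'A*'},      # gas
    'ʌ': {'A*'},           # love
    'ɒ': {'U', 'A*'},      # wrong
    'ʊ': {'U*'},           # good
}

# A natural class is the set of vowels sharing an element, headed or not.
def natural_class(element):
    return {v for v, expr in short_vowels.items()
            if any(e.rstrip('*') == element for e in expr)}

print(sorted(natural_class('A')))   # ['e', 'æ', 'ɒ', 'ʌ'] — the |A| class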
As for the element structures themselves, |I̲| and |U̲| need no explanation since their
interpretations [ɪ] and [ʊ] involve the simple patterns dIp and rUmp respectively. As shown
in figure 17, [ʌ] also involves a simple acoustic pattern: the mAss pattern created by the
convergence of a high F1 with F2. Yet this is also the pattern for the archetypal low vowel [ɑː],
given in figure 18 for comparison. The similarity between figures 17 and 18 suggests that [ʌ]
and [ɑː] have the same element structure: both are interpretations of |A̲|. Of course, they are
distinguished phonologically by the length of their associated nucleus (see 4.4.3).

(40)  Figure 17: |A̲| (as short [ʌ])        Figure 18: |A̲| (as long [ɑː])


The difference between RP [e] and [æ] is presumably based on headedness, as it
appears to parallel the [e]~[ɛ] difference in Tunica proposed in (38). In the RP vowel [e] the
dIp pattern dominates the central region of the spectrum, suggesting the |I|-headed expression
|I̲ A|. By contrast, mAss dominates in [æ], which points to |A|-headedness and to the
representation |I A̲|. The spectral patterns for both vowels are given in (41):

(41)  Figure 19: |I̲ A| (as short [e])        Figure 20: |I A̲| (as short [æ])


Unfortunately, short vowels in RP are generally inert; that is, they are rarely involved
in dynamic alternations of the kind that could reveal something about their internal structure.
Nevertheless, by observing the behaviour of English loanwords in Japanese, we gain an
insight into the structural properties of the vowels in the original English words.


(42) English loanwords in Japanese
         Japanese  RP
      a. [e]       [e]    [ketʃappɯ] 'ketchup'     [gesɯto] 'guest (on TV)'
      b. [a]       [ʌ]    [kappɯɾɯ] 'couple'       [gatsɯ] 'guts, pluck'
      c. [ʲa]      [æ]    [kʲampasɯ] 'campus'      [gʲaŋgɯ] 'gang'

As (42a) shows, when Japanese adopts an English word with [e], the loanword keeps
the original vowel [e]: e.g. [gesɯto] 'guest'. This is because both languages have the same
vowel [e] with the same representation |I̲ A|. On the other hand, in (42b) the English vowel
[ʌ] is reinterpreted as [a] in Japanese: e.g. [gatsɯ] 'guts'. Assuming that [ʌ] is represented as
|A̲| in English, this is the expected outcome because the same expression |A̲| is interpreted as
[a] in Japanese. Then in (42c) the RP vowel [æ] is reinterpreted in Japanese as the light
diphthong [ʲa], which may be analysed as the vowel [a] with a dIp onglide: e.g. [gʲaŋgɯ]
'gang'. What this indicates is that Japanese listeners perceive [æ] as a compound of |I| and |A|,
but one that is phonologically distinct from |I̲ A| (which is interpreted as [e]). Specifically,
[æ] is perceived as |I A̲|, an expression in which |A| dominates (i.e. acts as a head) while |I|
has a dependent role. Because Japanese has only five vowels [i e a o ɯ], the light diphthong
[ʲa] is the only feasible interpretation of |I A̲| which (i) is permitted by the phonology of
Japanese and (ii) closely reflects the original English vowel structure.18 So, support for the
vowel representations in (39) comes not directly from English but from the forms in (42)
which show how English loanwords are interpreted in Japanese.
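
The adaptation logic behind (42) can be sketched as follows. The fragment assumes, purely
for illustration, that the five Japanese vowels have representations parallel to the English
ones given above; the function name adapt and the onglide step are my own.

# Sketch of (42): Japanese interprets an English vowel expression using its
# own five-vowel inventory; a dependent |I| that cannot be housed in the
# vowel itself is realised as a dIp onglide. Representations are assumed.
japanese = {
    frozenset({'I*'}): 'i', frozenset({'I*', 'A'}): 'e',
    frozenset({'A*'}): 'a', frozenset({'U*', 'A'}): 'o',
    frozenset({'U*'}): 'ɯ',
}

def adapt(expr):
    expr = frozenset(expr)
    if expr in japanese:                      # direct match: RP [e] -> [e]
        return japanese[expr]
    if expr == frozenset({'I', 'A*'}):        # RP [æ]: |A| survives as the
        return 'ʲ' + japanese[frozenset({'A*'})]   # vowel, |I| as an onglide
    raise ValueError('no interpretation available')

print(adapt({'I*', 'A'}))   # 'e'  : guest -> [gesɯto]
print(adapt({'A*'}))        # 'a'  : guts  -> [gatsɯ]
print(adapt({'I', 'A*'}))   # 'ʲa' : gang  -> [gʲaŋgɯ]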
To complete the discussion of RP short vowels, let us consider the properties of [ɒ].
Undeniably this is a compound of |U| and |A|, though the head-dependency relation between
them is not immediately clear. The historical evidence suggests the representation |A̲ U|,
indicating that this is primarily a mAss vowel onto which the rUmp pattern is superimposed
(via lip rounding). For example, in present-day RP where [ɒ] has the spelling a (e.g. watch,
swan, quantity) the original Middle English short [a] has been rounded to [ɒ] after a labial
glide [w]. In acoustic terms, the rUmp pattern associated with [w] came to be interpreted on
the adjacent vowel: |A̲| + |U| → |A̲ U|. Another source of RP [ɒ] is the Old English short
vowel [ɔ], corresponding to the spelling o (e.g. dog, song, long). Once again, the relatively
open quality of [ɔ] (as opposed to [o]) suggests that the original vowel was represented by an
expression with a dominant (i.e. headed) |A|.

18. Below I argue that the glide [j] is the interpretation of a simple dIp pattern in a
non-nuclear position.
On this basis, can we conclude that RP [ɒ] is represented by |A̲ U|? This would be a
little premature. In one sense, identifying the head-dependency relation in [ɒ] is not crucial,
since it does not affect the contrastive system: English has no other short vowels with |A| and
|U|. But on the other hand, I assume that this does not prevent the language learner from
imposing some kind of headedness relation on the |A U| compound anyway, because head-
dependency seems to be an intrinsic aspect of element structure. The question then arises as
to how the language learner determines headedness in an expression. Following McMahon
(2000), I assume it is necessary to keep the historical properties of a segment quite separate
from its linguistic properties; this is because the phonological grammar should ideally reflect
just the synchronic properties of a sound system. As McMahon notes, historical facts can
sometimes help us to understand how grammars have come to be the way they are, but these
facts do not form part of a language user's phonological knowledge.
According to this view, historical evidence of the kind described above (for example,
the process of [a] → [ɒ] rounding after [w]) can never be conclusive. It could even be viewed
as irrelevant, because we have to assume that infant language learners do not have access to
historical information when they are building their phonological grammars. They have no
means of referring to spelling, for example, or to the inputs of earlier sound changes. Instead,
infants build segmental representations purely on the basis of what they perceive. So, the task
of acquiring [ɒ] will involve identifying from the input language the distributional, relational
and acoustic characteristics of [ɒ] and nothing else. We can speculate that, in the earliest
stages of acquisition, learners pay particular attention to the acoustic properties of the input.
In this case, it would be the spectral characteristics of [ɒ], such as a predominant mAss
pattern, which would motivate a headed representation such as |A̲ U|. And presumably, this
representation would remain in the infant's developing grammar as long as it is consistent
with the phonological behaviour of this vowel.

4.4.3 Long monophthongs
As mentioned in 4.4.1, I shall assume that contrasts such as English [uː] versus [ʊ] are based
on a difference in vowel length rather than segmental structure. This means it is possible to
find the same phonological expression associated with either a short (non-branching) or a
long (branching) nucleus; in the case of long [uː] versus short [ʊ], both are represented by the
same expression |U̲|. On the other hand, they do differ in their phonological behaviour: [uː]
patterns with other long vowels (e.g. long vowels do not precede [ŋ]: *[suːŋ] but [suːn])
while [ʊ] behaves like other short vowels (e.g. short vowels do not appear word-finally in
open stressed syllables: *[bʊ] but [bʊk]). In most dialects, the length difference between [uː]
and [ʊ] is accompanied by a qualitative difference; and in some theories of representation this
difference in vowel quality is taken to be the primary distinguishing property. However, I
shall assume here that any qualitative differences, traditionally expressed by referring to
vowel 'tenseness', are a matter of phonetic interpretation rather than phonological structure.
That is, the same phonological expression |U̲| can be interpreted differently according to
whether it is associated with a short or long nucleus. As discussed in 3.2, this kind of
phonetic variation is a characteristic of the Element Theory model.
Although [uː] versus [ʊ] will be treated here as a prosodic difference (in vowel
quantity) rather than as a segmental difference (in vowel quality), it should be pointed out
that not all versions of Element Theory take the same view. For example, Harris (1994: 114)
distinguishes the English tense vowels ([iː], [eɪ]) from the lax vowels ([ɪ], [e]) using
differences in both prosodic and segmental structure: tense is linked to long nuclei and lax to
short; and in addition, tense is represented by headed expressions and lax by non-headed
expressions. This dual approach has the advantage of expressing a direct connection between
headedness in the prosodic domain and headedness at the segmental level: head-dependency
between elements co-occurs naturally with head-dependency between the two positions in the
branching nucleus. Evidently, Harris treats these things as two sides of the same coin: a
headed vowel expression acts as the head of a prosodic domain (i.e. a long nucleus), and at
the same time contains a headed element which acts as the head of a segmental domain (i.e. a
headed compound expression).
There are clear advantages in being able to capture the link between vowel length and
tenseness in representational terms. And I support Harris's view that a split between headed
and non-headed expressions offers one way of encoding distinctions such as [uː] versus [ʊ].
However, I shall claim that this does not hold for all languages employing (what appears to
be) the same tenseness contrast. Harris himself notes that the behaviour of tenseness in
Germanic languages like English and Dutch is quite different from that found in African
languages such as Akan and Wolof. In the latter type, tenseness behaves unmistakably as a
segmental property; for instance, it is often harmonically active, causing other vowels within
the harmonic domain to agree in tenseness.19 Moreover, tenseness in these systems operates
quite independently of vowel length or syllable weight. For instance, in Wolof (Ka 1994) the
vowel of a tense verb root can be either long ([toːy] 'to be shy') or short ([tox] 'to smoke'),
as can the vowel of a lax verb root. In cases like Wolof it makes sense to represent segmental
tenseness in the segmental structure itself, and for this purpose the headedness distinction
employed by Harris works well. The same headedness-based approach to vowel harmony is
also adopted in Harris & Lindsey (1995), Backley & Takahashi (1998) and elsewhere.
By contrast, tenseness in Germanic languages is tied to prosodic structure,
particularly to the branching or non-branching status of the nucleus. Furthermore, in these
systems where tenseness is conditioned by prosodic structure, the tenseness property itself
rarely displays any active phonological behaviour. For example, it is possible to find
instances of vowel tensing in English, but these are never the result of a segmental process
such as tenseness assimilation between adjacent vowels. On this basis, it seems that
tenseness of the Germanic type is more appropriately viewed as a property of the
prosodic structure, specifically the length of the nucleus, and not as a property of the
segmental (element) structure. I therefore propose that the lexical distinction between a short
and a long nucleus is not matched by any corresponding segmental distinction; so [uː] and [ʊ]
have the same element structure |U̲|. However, the difference between short and long nuclei is
reflected in the way language users phonetically interpret the vowel associated with the
nucleus: in a branching nucleus |U̲| is interpreted as a peripheral or tense vowel, while in a
non-branching nucleus the same expression has a more lax interpretation. In this way,
language users employ differences in the interpretation of an expression as a means of
conveying linguistic information, in this case regarding the branching status of the
associated nucleus.
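
This division of labour, one expression with two length-conditioned interpretations, can be
stated very compactly. The sketch below is illustrative only; the function name is mine and
the quality symbols are those used in the text.

# Sketch: the phonetic interpretation of |U*| is conditioned by nucleus
# length; the expression itself is constant, only its interpretation varies.
def interpret_U(branching):
    # a branching nucleus yields the tense, peripheral quality
    return 'uː' if branching else 'ʊ'

print(interpret_U(True), interpret_U(False))   # uː ʊ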

(43)  long vowels:  monophthongs         [iː uː ɑː ɔː ɜː]
                    rising diphthongs    [aɪ eɪ ɔɪ], [əʊ aʊ]
                    centring diphthongs  [ɪə eə ʊə ɔə]


19. In the context of vowel harmony systems, tenseness is more commonly referred to as
Advanced Tongue Root (ATR).

In traditional descriptions of the RP vowel system, the long vowels are divided as in
(43). This organisation is of course based on the phonetic characteristics of these sounds, with
the steady state (monophthong) vowels separated from the contour (diphthong) articulations.
Then the diphthongs further subdivide according to the direction of articulatory movement
(rising versus centring). In this paper I follow the same arrangement, although my analysis of
the long vowels will be motivated primarily by their phonological and representational
properties. For example, I analyse the diphthongs as a natural grouping because, unlike
monophthongs, their representations specify two distinct element structures, one in each
position of the branching nucleus. These will be discussed in 4.4.5 below. Before that, I
consider the class of long monophthongs [iː uː ɑː ɔː ɜː] in RP, which I represent as follows:

(44)  a. [iː]   b. [uː]   c. [ɑː]   d. [ɔː]    e. [ɜː]

         N         N         N         N          N

        x x       x x       x x       x x        x x

        |I̲|       |U̲|       |A̲|      |A̲ U|       |A|

The vowels [iː uː ɑː ɔː ɜː] in (44) are taken to be the long vowel equivalents of the short
vowels [ɪ ʊ ʌ ɒ ə], respectively; therefore [iː] has the same element structure as [ɪ], and so on.
This raises the question of what it means for an expression such as |I| to be interpreted in a
wider domain, that is, across a domain consisting of more than one nuclear position. The
standard assumption in autosegmental frameworks is that |I| is lexically a property of the head
nuclear position, as shown in (45a), but that languages typically extend its association to the
dependent position too, as in (45b).

(45)  a.    N           b.    N           c.    N

           x x               x x               x x

          |I|                |I|             |I̲|  |I|

Some approaches treat (45b) as the output of a derivational process that presumably
would be expressed as a phonological rule or as a condition on structural well-formedness.
Yet (45b) need not be seen as being structurally any different from (45a). Instead, (45b)
merely gives us a clearer picture of how (45a) is interpreted by language users. Recall that it
is through the phonetic interpretation of element structure that language users convey
linguistic information. One piece of information contained in (45a) is the fact that there is a
branching nucleus; this is linguistically significant because vowel length is lexically
distinctive in English. But because the dependent nuclear position has no segmental content
of its own, language users must interpret it using segmental material from elsewhere; this is
the only reliable way of expressing the presence of the branching nucleus.20 To avoid a
situation in which a randomly chosen segment is interpreted in this dependent position, the
source of this segmental material should be local, that is, from a neighbouring position.
Furthermore, I shall assume that segmental material is shared between positions that stand in
a head-dependency relation. In (45b) the right position is a dependent of the left position,
which allows it to interpret element structure belonging to its head: the result is a long [iː].
The figure in (45c) shows an alternative way of modelling the way language users
interpret (45a). Again, the expression |I| belongs to the nuclear head position and extends to
the dependent position. The difference here is that head-dependency at the prosodic level is
reflected in a similar head-dependency relation at the segmental level: headed |I̲| is interpreted
in the nuclear head while non-headed |I| appears in the dependent slot. Generally speaking,
we find no significant difference in phonetic interpretation between |I̲| and |I|, since both are
mapped on to the same dIp signal pattern. But they do differ in their distribution. In English,
non-headed expressions like |I| are usually restricted to prosodically weak positions.21 These
positions tend to be characterised by vowel neutralisation, or, at the very least, by an inability
to support a full range of lexical contrasts. Prime examples include the weak member of a
foot (e.g. mátter, tépid) and the dependent position in a branching nucleus (e.g. [aɪ], [aʊ] but
*[ae], *[aɒ]). In this sense, (45c) provides a more accurate representation of long vowels in
English, although (45b) expresses the same linguistic information.
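
The difference between (45a) and (45b) is, on this view, one of interpretation rather than
structure: the dependent slot interprets material belonging to its head instead of having
content of its own. In programming terms this is sharing rather than copying, as the purely
illustrative fragment below shows (the dictionary encoding of the nucleus is mine).

# Sketch of (45): both slots of a branching nucleus point to one and the
# same expression object; nothing is duplicated in the representation.
head_expression = ['I*']                        # content of the head slot
nucleus = {'head': head_expression,             # (45a): lexical association
           'dependent': head_expression}        # (45b): interpretation extends it

assert nucleus['head'] is nucleus['dependent']  # one object, two associations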
It should be noted that, while non-headed element structures show an affinity for
weak positions in English, this is a tendency rather than a universal. As already observed, in
ATR (or tenseness) harmony languages such as Wolof, for example, segmental headedness
behaves independently of prosodic structure; its role is to encode the lexical distinction
between tense and lax. Nevertheless, it will be interesting to see whether other Germanic
languages besides English, or perhaps other languages with vowel reduction, follow the
English tendency to restrict non-headed expressions to prosodically weak positions. This
pattern will be in evidence below, where I describe the structure of English weak vowels and
diphthongs.

20. Vowel duration is a notoriously unreliable indicator of prosodic length in English: for
example, the vowels in peach and pitch differ in phonological length but sound similar in
duration.
21. An exception is the simplex expression |A|. In (44e), |A| appears to occupy the head
position of a branching nucleus despite being a non-headed structure. I shall discuss the
representation of [ɜː] below, after examining the nature of the |A| element in more detail.

4.4.4 Weak vowels
I have already argued that non-headed vowel expressions are typically associated with weak
positions in the grammar of English. This being the case, it may be further suggested that
reduced vowels such as [ə] (and, to a lesser extent, [i] and [u]) are represented by non-headed
structures, since these are the vowels that regularly occur in weak positions. Here I shall
claim that the three post-reduction vowels [ə i u] are the phonetic interpretation of the three
basic vowel elements in their non-headed forms, |A|, |I| and |U| respectively. Before
discussing the representations themselves, however, let us consider the nature of reduced
vowels in more detail.
In present-day English we find a direct relationship between stress and the distribution
of full/reduced vowels: full vowels appear freely in stressed syllables but they are typically
absent from unstressed syllables. To put it another way, stressed positions tend to support a
range of vowel contrasts involving full vowels, as in (46a), while the same contrasts tend to
be neutralised in unstressed positions. Depending on the extent of this neutralisation, a weak
position may still be able to support a limited number of contrasts; however, these will
invariably involve reduced vowels, as in the [ə] versus [i] contrast in (46b).

(46) RP vowel contrasts in relation to stress
     a. stressed syllables:   e.g. b[ɪ]tter, b[e]tter, b[ʌ]tter, b[æ]tter, b[ɑː]ter
     b. unstressed syllables: e.g. bett[ə], Bett[i]

It has already been noted that the inherent weakness of unstressed positions is
mirrored by a corresponding weakness at the segmental level. In other words, weak vowels
belong in weak positions. But in what sense is a vowel to be understood as weak? Here I shall
claim that segmental weakness is encoded phonologically by at least two aspects of a
segments representation: a weak vowel is non-headed, and in addition, it comprises just a
single element. The claim that weak vowels are represented by non-headed expressions
follows from the discussion in 4.3, where it was argued that heads display a greater degree
of prominence and/or salience than dependents. In this respect, weak vowels show typical
dependent characteristics: for example, in terms of phonetic interpretation they are recessive
(i.e. unstressed) rather than prominent, and in linguistic terms they play a less significant role
(i.e. they contain less linguistic information) than full vowels because they appear in
neutralising contexts. On this basis I assume that weak vowels in English, such as those in
(46b), are represented by non-headed structures.
The second feature of weak vowels is their simplex structure: each has only one
element in its representation. Though this is evident in the case of [i] and [u], which are
associated with the single patterns dIp and rUmp respectively, it is less obvious that [ə]
should be represented only by the mAss pattern |A|. It will be recalled that English shows
evidence of having two default vowels, [ɪ] and [ə]; and in 3.5.2 I argued that an empty
nucleus is phonetically interpreted as [ɪ] rather than [ə]. The example in (32) showed how the
plural form peaches ([piːtʃ_z_]) contains two empty nuclei, the first of which is phonetically
interpreted in order to break up the sibilant sequence [tʃ...z] and enhance perceptibility; this
results in the interpretation [piːtʃɪz] in RP and other dialects. But if [ɪ] signals an empty
nucleus in English, then [ə] must have a different (i.e. non-empty) representation because [ɪ]
and [ə] have different phonetic qualities as well as distinct phonological properties.
The phonological identity of English [ə] becomes apparent when we observe vowel
alternations conditioned by stress. To restate the basic pattern: in a stressed position it is
always possible to interpret a full vowel, whereas in a weak position a full vowel is typically
reduced to one of the weak vowels [ə i u]. Leaving aside the phonetic implications of the
term 'reduced', it is evident that reduction refers to a decrease in a vowel's contrastive
potential; or, recast in Element Theory terms, a reduced vowel carries a reduced amount of
linguistic information. Here I follow Harris & Urua (2001) in assuming that information loss
of this kind goes hand in hand with the loss of structural material from a segment's
representation; by removing or suppressing22 segmental material, the linguistic information
associated with that material is also suppressed. But what kind of segmental material is
targeted in such cases? Vowel representations contain structural information of three different
kinds: the branching versus non-branching status of the nucleus, the elements themselves, and
the head-dependency relation between the elements. It appears that all three of these are
potential targets for suppression in unstressed positions.

22. Recently, the Element Theory literature has shown a preference for the suppression rather
than the outright loss of representational material. This suggests that the suppressed material
remains in the representation but is not phonetically interpreted.

(47) Vowel reduction in English

         stressed                         unstressed
         vowel   structure  example      vowel   structure  example
      a. [iː]    |I̲|        defect       [i]     |I|        defective
         [ɪ]                history                         historical
      b. [uː]    |U̲|        beauty       [u]     |U|        beautician
         [ʊ]                wood                            Hollywood
      c. [ɑː]    |A̲|        drama        [ə]     |A|        dramatic
         [ʌ]                sulphur                         sulphuric

The table in (47) shows that a branching nucleus is shortened in unstressed positions;
e.g. long [iː] reduces to short [i]. It also reflects the assumption made above that a headed
expression in a stressed position becomes non-headed when unstressed; e.g. |I̲| reduces to |I|.
And if vowel reduction should involve a loss of headed status in |I̲| and |U̲|, then we may
further assume the same applies to |A̲|: in (47c) headed |A̲| is interpreted as [ɑː]/[ʌ] in stressed
positions, but it reduces to non-headed |A| in weak positions where it is interpreted as [ə]. So,
it seems that, contrary to general opinion, [ə] is not an unspecified vowel in English; rather, it
has segmental content in the form of |A|. This allows [ə] to be distinguished from the
structurally empty vowel [ɪ], as illustrated by word pairs (in non-rhotic RP English) such as
badgers versus badges:

(48)  a. badgers ['bædʒəz]              b. badges ['bædʒɪz]

            N     N       N                   N     N       N

            x     x       x                   x     x       x

         b |A|   |A|  + z | |              b |A|   | |  + z | |

Turning to compound expressions, we see the same effects of vowel reduction in
weak positions. In (49), long vowels become shortened and headed structures are
reinterpreted as non-headed. In addition, part of the original element structure is suppressed
to leave just a single element. Note that it is typically the head element which remains, thus
reinforcing the claim that heads play a dominant role.23


(49) Vowel reduction in element compounds

         stressed                          unstressed
         vowel   structure  example       vowel   structure  example
      a. [æ]     |A̲ I|      malice        [ə]     |A|        malicious
      b. [e]     |A I̲|      desperate     [i]     |I|        despair
      c. [ɒ]     |A̲ U|      congress      [ə]     |A|        congressional
         [ɔː]               Gregorian                        Gregory

As the examples in (49) show, language users interpret whatever segmental material is
available to them. In stressed positions the compound expression appears in full, so they may
interpret the structure in its entirety. By contrast, in weak positions part of the structure is
suppressed, so language users can interpret only what remains. In this way, the quality of a
reduced vowel is largely predictable from the structure of its corresponding full vowel, given
that language users can only interpret what is there.
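
The reduction pattern in (47) and (49) can be summed up in a few lines. The sketch below is
an idealisation: the function name and tuple encoding are mine, and the preference for
retaining the head element is only a tendency, as the following paragraph notes.

# Sketch of vowel reduction as suppression: keep (typically) the head
# element, discard its dependents, and strip the survivor's headed status.
def reduce_vowel(expr):
    head = next((e for e in expr if e.endswith('*')), None)
    survivor = head or expr[0]          # fall back if no head is marked
    return (survivor.rstrip('*'),)      # a single, non-headed element

print(reduce_vowel(('A*', 'I')))   # ('A',)  malice [æ]   -> malicious [ə]
print(reduce_vowel(('A', 'I*')))   # ('I',)  desperate [e] -> despair [i]
print(reduce_vowel(('A*', 'U')))   # ('A',)  congress [ɒ]  -> congressional [ə]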
This approach departs significantly from some traditional feature-based descriptions
of vowel reduction, where feature values for the full vowel are simply (and arbitrarily)
replaced by a different set of values for the reduced vowel. For example, in feature terms we
can describe reduction effects by referring to vowel height: [e] → [i] in (49b) involves
[-high] → [+high], while [æ] → [ə] in (49a) involves [+low] → [-low]. But in fact vowel
reduction itself is unrelated to vowel height. As Element Theory is able to show, it is better
understood as a process involving the suppression of segmental structure. I have argued that
there is a tendency for head elements to remain while dependent elements are suppressed
(though this is not always the case; see 4.4.6). Additionally, I have claimed that the
remaining element loses its headed status, which may be taken as a further indication of
prosodic weakness. A weakened vowel is effectively a minimal vowel: weakening causes
long to become short, compound to become simplex, and headed to become non-headed.
These same properties of weak vowels will be in evidence in the following section, where I
discuss the phonological characteristics of diphthongs.


23. This may be subject to dialect variation: e.g. despair [dɪs'peə] ~ [dəs'peə].


4.4.5 Diphthong structure
The distribution of diphthongs in English is similar to that of long monophthongs, both being
associated with branching nuclei. For example, diphthongs occupy stressed word-final open
syllables, as do long vowels: tree [triː], try [traɪ] (but *[trɪ]). With respect to their segmental
properties too, diphthongs and long vowels have much in common. Recall that in long vowels
there is a prosodic head-dependency relation holding between the two positions of the
branching nucleus, and that this is matched by a similar asymmetry at the segmental level.
Specifically, the strong nuclear head position supports a headed expression, and the element
in this headed expression extends to the weaker dependent slot where it is interpreted as a
non-head, as shown in (50a).24


(50)  a. long monophthong [ɑː]      b. diphthong [aɪ]

            N                             N

           x x                           x x

         |A̲| |A|                       |A̲| |I|

The same is generally true of diphthongs, which tend to show a similar asymmetry in
their element structure.25 For instance, in (50b) the nuclear head position contains headed |A̲|
while its dependent has non-headed |I|. In fact, on this basis it is possible to make a more
general prediction concerning the shape of English diphthongs: if the second position of a
branching nucleus acts as a typical weak position, it follows that the second portion of a
diphthong can contain only a weak vowel. This effectively means a choice between [ɪ ʊ ə],
which indeed covers all the possibilities for RP English:


24. Note that a slightly diphthongised interpretation of (50a), such as [ɑə] (e.g. father
['fɑəðə]), is not only attested but also predicted by the segmental structure, since the typical
interpretation of non-headed |A| in English is [ə] (see 4.4.4 above). By contrast, the long
vowels [iː] and [uː] tend not to diphthongise, since English non-headed |I| and |U| have
phonetic interpretations which do not differ significantly in vowel quality from those of their
headed counterparts |I̲| and |U̲|.
25. An exception is [əʊ], which (together with the monophthong [ɜː]) will be discussed below.


(51)  [ɪ]-target        [ʊ]-target        [ə]-target
      [aɪ] time         [aʊ] now          [ɪə] beer
      [eɪ] say          [əʊ] low          [eə] fair
      [ɔɪ] coin                           [ʊə] tour

Note that the diphthongs are grouped as in (51) merely for descriptive convenience. I shall
refer to these groupings in my description, but it should be stressed that the groups
themselves do not stand for separate phonological categories. In fact, the only relevant
phonological class here is the set of diphthongs as a whole, which is unified by the presence
of a weak lexical vowel in the dependent position.
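
The generalisation underlying (51) can also be phrased as a simple well-formedness check
on branching nuclei. The sketch below is illustrative only; the exceptional status of [əʊ],
discussed in 4.4.7, is built in here as a special case.

# Sketch: an English diphthong pairs a strong first half with a dependent
# slot containing exactly one non-headed element ([ɪ], [ʊ] or [ə]).
def well_formed_diphthong(strong, weak):
    weak_ok = len(weak) == 1 and not weak[0].endswith('*')
    strong_ok = any(e.endswith('*') for e in strong) or strong == ('A',)  # [əʊ]
    return weak_ok and strong_ok

print(well_formed_diphthong(('A*',), ('I',)))    # True : [aɪ]
print(well_formed_diphthong(('A',), ('U',)))     # True : [əʊ], special case
print(well_formed_diphthong(('A*',), ('I*',)))   # False: *headed dependent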

4.4.6 |I|-diphthongs
The diphthongs [aɪ eɪ ɔɪ] all have a non-headed |I| element in their second position, since
they share the same end point [ɪ]. As already argued, this is exactly the kind of expression we
expect to find in a prosodically weak position. By contrast, in the strong positions of these
diphthongs we only find expressions which are headed. For the internal structure of these
compounds we may refer back to 4.4.2, since [aɪ eɪ ɔɪ] begin with the element structures
already proposed for the short vowels [ʌ e ɒ] respectively.
First consider the diphthong [aɪ]. It has been observed (Cruttenden 2001: 132) that
[aɪ] has a starting point which varies noticeably between dialects: for example, [aɪ] (RP),
[ɒɪ] (Australia), [əɪ] (South-West England), [ʌɪ] (Scotland). In spite of this phonetic variation,
however, I shall follow the convention of referring to this vowel as [aɪ], in an attempt to
focus on its value as a phonological unit. Common to these dialectal variants of [aɪ] is the
presence of |A| in the first portion; and in keeping with the proposed link between headedness
and strong positions, I assume that in [aɪ] this element is present in its headed form |A̲|. The
main historical source of [aɪ] is Middle English [iː], which first diphthongised to [əɪ]26 in the
sixteenth century; then later [əɪ] was reinterpreted as [ʌɪ] and more recently as [aɪ]. Again,
what unites these earlier diphthongal forms is the presence of |A|, either as a non-head in [əɪ]
or as a head in [ʌɪ]/[aɪ]. The representation for present-day RP [aɪ] is given in (52a).

26. En route from [iː] to [əɪ] this vowel went through an intermediate pronunciation [ɪi] in
the early 1500s. I take [ɪi] and [iː] to be different phonetic interpretations of the same
phonological expression. The motivation for [iː] → [ɪi] may have come from the desire to
avoid a merger with the output of another change, [eː] → [iː], which affected words such as
sweet at around the same time.

(52)  a. [aɪ] time       b. [eɪ] say        c. [ɔɪ] coin

           N                  N                  N

          x x                x x                x x

        |A̲| |I|           |I̲ A| |I|          |A̲ U| |I|

Alternations such as der[aɪ]ve~der[i]vation show that [aɪ] is subject to stress-sensitive
vowel weakening of the sort illustrated in (49). Again, in an unstressed position the full
vowel reduces to a minimal expression: a single, non-headed element in a shortened
nucleus. In the case of [aɪ] it may be either the element |I| (as in der[i]vation) or the element
|A| (as in der[ə]vation) which is retained. Surprisingly, some phonological descriptions treat
this process of [aɪ] → [i] weakening in parallel with the historical (and no longer productive)
process of tri-syllabic shortening that is referred to in the description of pairs such as
div[aɪ]ne~div[ɪ]nity and cr[aɪ]me~cr[ɪ]minal. Yet the two processes are different in kind.
The output of tri-syllabic shortening is a stressed vowel, which means that this is not a
weakening process: for example, the full vowel [ɪ] in div[ɪ]nity is represented by the headed
element |I̲|.
Furthermore, the [aɪ] in div[aɪ]ne and the [ɪ] in div[ɪ]nity have come about through
two independent historical processes: [aɪ] (div[aɪ]ne) is a product of the [iː] → [aɪ]
diphthongisation process associated with the Great Vowel Shift, while [ɪ] (div[ɪ]nity) is a
genuine case of [iː] → [ɪ] tri-syllabic shortening. Because [aɪ] and [ɪ] represent two distinct
phonological developments, there is no reason to assume that one should be derived from the
other. To reinforce this point, I suggest that div[aɪ]ne~div[ɪ]nity should be treated not as an
alternating pair but as two separate lexical forms. After all, it is conceivable that language
learners acquire these two words at different times; in all probability, divine will enter the
lexicon earlier than divinity. Thus again we see how historical evidence can shed light on
patterns in the present-day language without necessarily playing any active role in the
synchronic grammar.

Like [aɪ], the diphthongs [eɪ] and [ɔɪ] have endpoints represented by |I|. Also like [aɪ]
they show a certain amount of variation in their initial vowel quality. In the case of [eɪ],
Cruttenden (2001: 130) notes a range of starting points from [eɪ] in RP through [ɛɪ] in refined
RP to [ʌɪ] in Australian English. Although these variants are united by the presence of a
compound containing |I| and |A| in the first portion, in some cases they are distinguished by
the headedness relation between these two elements: compare for instance the |I|-headed RP
interpretation [eɪ] in (52b) with the |A|-headed Australian English interpretation [ʌɪ]27 in
(53a).

(53) Dialectal variation in the interpretation of [eɪ]
      a. [ʌɪ] (Australia)   b. [eː] (Scotland)   c. [eə] (Tyneside)

           N                     N                    N

          x x                   x x                  x x

        |A̲ I| |I|             |I̲ A|                |I̲ A| |A|

In addition, there are some dialects which interpret the [eɪ] vowel as a monophthong; for
example, [eː] in Scottish English and [ɛː] in many dialects of northern England. As illustrated
in (53b), this variant contains the same elements |I| and |A|, but lacks any vowel specification
in the dependent position. Meanwhile, in Tyneside English (North-East England) we find yet
another interpretation of the [eɪ] vowel, one where the diphthong ends in [ə] rather than
[ɪ]: e.g. take [teək]. The structure for [eə] is shown in (53c).
On the question of alternation, I treat cases such as ch[eɪ]ste~ch[æ]stity in parallel
with div[aɪ]ne~div[ɪ]nity discussed above, that is, as involving separately stored lexical
entries. Yet it is possible to find genuine instances of vowel reduction affecting [eɪ]. For
example, necklace (originally [nek]+[leɪs]) evolved into the morphologically simplex form
[nekləs] when the second portion [leɪs] lost its stress and its vowel became weak. In Element
Theory, weakening has the effect of reducing segmental structure to a minimum (i.e. to a
single non-headed element); in this case only |A| remains from the original vowel expression,
interpreted as [ə] in the weak syllable of present-day [nekləs].

27. In Australian English and also in London/Estuary English, the interpretation of the [eɪ]
vowel as [ʌɪ] (e.g. main [mʌɪn]) does not overlap with the diphthong [aɪ], since the latter
has the quality [ɒɪ] (e.g. mine [mɒɪn]).

The diphthong in words like coin and joy shows less phonetic variation, most dialects
having an interpretation [ɔɪ] which corresponds to the representation in (52c). However, in
some varieties, including London English, we find a diphthong with a closer starting point
approximating to [oɪ]: e.g. point [poɪnt]. This suggests that the vowel structure contains the
same elements but with headedness reversed (i.e. |A U̲| in the strong position), although
phonological evidence would be needed to confirm this. Finally, in West Indian English no
contrast is made between the vowels of point and pint, the diphthongs [ɔɪ] and [aɪ] having
converged as [ɑɪ] (Cruttenden 2001: 134). I take the dark quality at the beginning of [ɑɪ] to
be a variant interpretation of the expression |A̲|, such as we find in the long monophthong [ɑː].

4.4.7 |U|-diphthongs
Most dialects of English have two diphthongs ending in [ʊ], namely [aʊ] (e.g. now) and [əʊ]
(e.g. go). Predictably, both have a dependent slot containing an expression with only a single
non-headed element, in this case |U|.

(54)  a. [aʊ]        b. [əʊ]

          N               N

         x x             x x

       |A̲| |U|         |A| |U|

For many English users, the starting point for [aʊ] is the same as that for [aɪ],
suggesting the representation given in (54a). In fact, the historical development of [aʊ]
during the Great Vowel Shift parallels that of [aɪ] described in 4.4.6 above:

(55)     ME      1400s    15-1600s   1700s
      a. [iː]    [ɪi]     [əɪ]       [aɪ]
      b. [uː]    [ʊu]     [əʊ]       [aʊ]

As (55) shows, diphthongisation came about through the introduction of the element |A|. In
fact, the unnatural coexistence of |I| and |U| makes this the only feasible way of bringing
about a (non-reduction) change to the segmental structure of [iː] and [uː]. And it is the presence
of |A| that is the common factor across different interpretations of [aʊ] in modern dialects,
such as [æʊ] in London English and [ɑʊ] in refined RP. The diphthong [aʊ] does not usually
weaken in unstressed positions.
So far I have maintained a useful generalisation regarding segmental structure in
branching nuclei, namely that headed element expressions belong in the nuclear head position
while non-headed ones go in the dependent position.28 Unfortunately, the vowel [əʊ] appears
to break this otherwise robust pattern, as (54b) shows. The vowels [əʊ] and [aʊ] are very
similar in terms of their element structure, with the headedness of |A| being the only property
separating them. On the strength of synchronic and historical evidence, we can be reasonably
certain that [əʊ] contains |A|. Historically, [əʊ] derives from ME [ɔː] (e.g. no, home, ghost)
and from ME [ɔu] (e.g. know, soul, own), both ME vowels containing |A| and |U|. Without
evidence to the contrary, I shall assume these same two elements have survived into the
present-day grammar, now interpreted sequentially as in (54b).
Meanwhile, synchronic evidence for the presence of |A| in [əʊ] comes from vowel
reduction. In some regional dialects like Tyneside English, [əʊ] reduces to [ə] in weak
syllables: for example, pillow ['pɪlə] (versus RP ['pɪləʊ]). Then in all dialects we find the
stress-conditioned alternation [əʊ]~[ə] in word pairs such as cust[əʊ]dian~cust[ə]dy and
pr[əʊ]sody~pr[ə]sodic. Now, assuming that the process of vowel reduction involves stripping
away segmental material to leave a minimal expression, and assuming also that [ə] in English
is the interpretation of non-headed |A| (see 4.4.4), then we must conclude that |A| is present
in the representation of the full vowel [əʊ]. This |A| cannot be headed, because [əʊ] is
distinct from [aʊ] (which does have headed |A̲|, in parallel with [aɪ]). Therefore, the first half
of the diphthong [əʊ] apparently contains just the single non-headed element |A|, despite this
structure being untypical of English phonology. It is worth noting that the structure in (54b) is
avoided in some dialects. For instance, the interpretation of [əʊ] in the West Midlands area of
Britain is [aʊ] (e.g. no [naʊ]), which presumably has headed |A̲| as in (54a).29


28. The long monophthong [ɜː] has already been noted as an exception. I will discuss this
vowel below.
29. In this dialect no [naʊ] is kept distinct from now [nɛʊ], the latter having a compound
expression in the strong (left-hand) position of the nucleus.

At first sight, the analysis of the full vowel [əʊ] as a non-headed expression seems an
undesirable result. After all, it means that [əʊ] fails to conform to a general pattern
controlling the distribution of headed and non-headed element structures. Yet at the same
time it gives us a useful insight into the nature of the elements themselves. The notion of
headedness was introduced in 4.3 as an asymmetric relation which usually manifests itself
as strong versus weak. And on the basis of the analyses given above, we can generalise that
headed elements are stronger than non-headed ones, because a headed element can occupy a
prominent (head) position whereas a non-headed element is usually restricted to a recessive
(dependent) position. But in the case of [əʊ] in (54b) the asymmetric relation holds between
two non-headed elements |A| and |U|; and in this situation it is |A| which occupies the strong
(initial) position, not |U|. From this we may infer that, in English, |A| is inherently stronger
than |U|, all else being equal. Of course English diphthongs do not, by themselves, provide
enough evidence for reaching any conclusions on strength differences between elements; but
they do indicate that the three vowel elements are not necessarily of equal status.
I close this discussion of |U|-diphthongs by returning to the issue of the representation
of [ɜː]. Although [ɜː] is not a diphthong, it has one structural property in common with the
diphthong [əʊ], namely the non-headed element expression |A| in the strong position of
the nucleus. The relevant structures are repeated here:

(56)  a. [əʊ]        b. [ɜː]

          N               N

         x x             x x

       |A| |U|           |A|

It has just been shown how the relative strength of non-headed |A| allows this element to
pattern with headed expressions (which are strong by virtue of their headed status) in the first
position of a branching nucleus. This has already been noted in the case of the diphthong [əʊ]
in (56a), and it can now be seen in (56b) too. Recall from (44) that the long monophthongs
[iː uː ɑː ɔː] are all represented by headed expressions, but that the same is not true of the one
remaining long vowel [ɜː], which has only non-headed |A|. Yet evidently |A| is strong enough
to occupy the head position in this long vowel domain, which renders the structure in (56b)
grammatical.
As for its phonetic interpretation, it was established in 4.4.4 that non-headed |A| is
interpreted as [ə] in English; so in a long nucleus the expected interpretation must be [əː].
Again this seems correct if we follow Cruttenden and others in viewing [ɜː] as the long vowel
equivalent of [ə]: 'The quality of /ɜː/ often coincides with that of /ə/, the difference between
the two being only one of length' (Cruttenden 2001: 125). And in terms of acoustic mapping,
if the same expression |A| is present in both [ə] and [ɜː], then we should expect it to map onto
the speech signal in a similar way in both cases. So the case for representing [ɜː] as a non-
headed element expression is not without support.
In some descriptions of English, [ɜː] is referred to as the 'rhotic' vowel, not only
because it has a unique r-like acoustic quality but also because it invariably appears in words
containing historical r in the spelling. Indeed, the reason why [ə] lengthened to [ɜː] is linked
directly to the loss of r in non-rhotic dialects during the 1700s. Most present-day instances of
[ɜː] derive from a ME sequence of a short vowel plus r. The examples in (57) are from
Cruttenden (2001):

(57)  [e] + r   earth, heard, fern
      [ɪ] + r   shirt, birth, myrrh
      [ʊ] + r   word, journey, spur

By the end of the 1800s the vowels in (57) had neutralised to [ə] before r# and rC, the
sequences [er ɪr ʊr] all being interpreted as [ər]. This was followed by the suppression of r in
non-rhotic dialects which, in turn, allowed [ə] (= |A|) to extend into the vacant slot to create a
long |A| vowel. So it is through these historical changes that we arrive at the anomaly in
present-day RP of having a long central vowel [ɜː] that regularly appears in strong positions
but is nonetheless structurally non-headed.
Historical r also provides the context for other RP vowels besides [ɜː]; these are
referred to in the following section as the set of |A|-diphthongs.

4.4.8 |A|-diphthongs
The diphthongs terminating in [ə] are characteristic of RP and other non-rhotic systems, and
invariably appear in words with r in the spelling. The examples in (58) illustrate the set of
[ə]-final diphthongs, or |A|-diphthongs, associated with some non-rhotic dialects. As before,
the symbols are to be understood as phonological units and not as interpreted vowel sounds.


(58) |A|-diphthongs in some non-rhotic systems
      a. [ɪə]   beer [bɪə], idea [aɪ'dɪə]
      b. [eə]   fair [feə], scarce [skeəs]
      c. [ʊə]   poor [pʊə], insurance [ɪn'ʃʊərəns]
      d. [ɔə]   score [skɔə], pour [pɔə]

As expected, the |A|-diphthongs in (58) have a lone |A| in the dependent slot, which is what
unites them as a set. Following 4.4.4, this non-headed |A| element is interpreted as [ə]. This
gives us the representations below:

(59)  a. [ɪə]       b. [eə]         c. [ʊə]       d. [ɔə]

          N              N               N             N

         x x            x x             x x           x x

       |I̲| |A|       |I̲ A| |A|        |U̲| |A|      |A̲ U| |A|

A notable characteristic of |A|-diphthongs is their liability to be interpreted as
monophthongs. Apparently language users show a tendency to avoid interpreting |A| in the
dependent position of the nucleus, perhaps owing to the mismatch between the inherent
strength of |A| and the inherent weakness of this syllabic context. Sometimes the dependent
|A| is simply suppressed and the result is a long monophthong. As (60a) shows, this is true for
[ɪə], which is regularly interpreted as [ɪː] in Australian English (and increasingly so in RP):
e.g. beer [bɪː]. A monophthong interpretation of [eə] is also typical of most non-rhotic
dialects: e.g. fair [fɛː]. This is illustrated in (60b).

(60)  a. [ɪə] ~ [ɪː]/[iː]             b. [eə] ~ [eː]/[ɛː]

          N          N                    N           N

         x x        x x                  x x         x x

       |I̲| |A|      |I̲|               |I̲ A| |A|     |I̲ A|
Another way of avoiding the interpretation of |A| in the weak position of the nucleus
is to interpret it in the strong position instead. This happens in the case of the (phonological)
diphthong [ʊə], which is regularly interpreted as [ɔː] in Australia, New Zealand and southern
England. As (61a) shows, [ɔː] results from the coalescence of |U| and |A|. Additionally, we
find a more general pattern among non-rhotic dialects in which the diphthong [ɔə] is also
interpreted as [ɔː]. Indeed this pattern has become so widespread that the original diphthong
interpretation [ɔə] has become something of a rarity. As with [ɪə] and [eə], the [ɔː]
interpretation of [ɔə] comes from the suppression of |A| in the dependent slot. This is shown
in (61b).

(61)  a. [ʊə] ~ [ɔː]                  b. [ɔə] ~ [ɔː]

          N          N                    N           N

         x x        x x                  x x         x x

       |U̲| |A|     |A̲ U|              |A̲ U| |A|     |A̲ U|

As a result of the variation shown in (61), words such as pour [pɔə] and paw [pɔː] are now
homophones for many non-rhotic speakers, both being realised as [pɔː]. Note, however, that
it is still necessary to maintain [ɔə] and [ɔː] as distinct categories in the grammar, since they
differ in terms of their phonological behaviour with respect to linking r, for instance. The
representation of consonants in Element Theory, including that of linking r, is discussed in
Backley (in prep).
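
Before closing this section, the two repair strategies seen in (60) and (61), suppression of
the dependent |A| and coalescence with it, can be summarised in the following illustrative
sketch (function names are mine; the head reassignment under coalescence is assumed on
the basis of (44d)).

# Sketch of the two monophthongisation routes for |A|-diphthongs.
def suppress_dependent(strong, weak):
    # (60a), (60b), (61b): the dependent |A| is suppressed and the strong
    # expression extends across the nucleus, e.g. [eə] -> [ɛː]
    return (strong, strong)

def coalesce(strong, weak):
    # (61a): [ʊə] -> [ɔː]; |U*| fuses with the dependent |A| to give the
    # headed compound of (44d). Head reassignment to |A| is assumed here.
    merged = ('A*', 'U')
    return (merged, merged)

print(suppress_dependent(('I*', 'A'), ('A',)))   # fair: [eə] -> [ɛː]
print(coalesce(('U*',), ('A',)))                 # tour: [ʊə] -> [ɔː]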



5: Summary

In this paper I have introduced Element Theory as a novel approach to the representation of
segmental structure. I began by addressing some fundamental questions about the nature of
native speaker phonology, arguing that our phonological knowledge is primarily concerned
with linguistic information carried by the speech signal rather than with information about
how to articulate speech sounds. On this basis I proposed a rejection of traditional distinctive
features, which are largely speaker-oriented, in favour of a small set of monovalent elements.
The elements are linguistic categories, in the sense that they identify linguistic information
needed by language users either speakers or hearers to distinguish one morpheme from
another. Unlike traditional features, elements do not directly encode the physical properties of
spoken language; so in adopting an element-based approach to representations we must
abandon the underlying notion that phonology tells you how to pronounce things.
Despite having a cognitive base, however, language must also relate to the physical
world, otherwise it could never be acquired by learners or be produced/perceived by users.
This means that the phonological units of language, the elements, must make reference
to some physical aspect of the communication process. Element Theory assumes that a
phonological element maps on to a pattern in the speech signal. This pattern corresponds not
to a specific acoustic property (such as a formant frequency value) but to one of a small
number of broad spectral shapes that language users equate with linguistic information. In
section 3 it was argued that vowel distinctions rely on the identification of three such shapes,
dIp, mAss and rUmp, indicating that only three elements are needed to represent vowel
contrasts. These were introduced as the elements |I|, |A| and |U| respectively, each of which
defines a natural class of segments.
A vowel element is interpretable either alone or in combination with other elements.
And in large vowel systems such as that of English, elements combine asymmetrically to
form head-dependent relations within the segment. I argued that, in the case of English, the
absence of a headedness relation in a segment indicates a weak expression such as a reduced
vowel or the second (i.e. non-head) position in a diphthong. In other words, the difference
between heads and non-heads in segmental structure parallels a similar difference between
heads and non-heads in the prosodic structure. This interdependency between segmental and
prosodic structure is an important feature of the Element Theory approach. By assuming that
the distribution of elements is, at least to some extent, controlled by the strength (headedness)
of prosodic positions, we progress towards a non-arbitrary account of segmental effects such
as phonotactic patterning and vowel reduction.
This paper has discussed the motivation for using elements in representations, and has
shown how this approach relates to the structure of vowels. However, Element Theory can
also describe consonants, and the reader is referred to Backley (in prep) for an introduction to
the consonant elements |H L ?| and their role in the representation of non-nuclear expressions.




References
Anderson, J.M. & C.J. Ewen (1987). Principles of Dependency Phonology. Cambridge:
Cambridge University Press.
Anderson, J.M. & C. Jones (1974). Three theses concerning phonological representations.
Journal of Linguistics 10, 1-26.
Anderson, S.R. (1985). Phonology in the Twentieth Century: Theories of Rules and Theories
of Representations. Chicago: University of Chicago Press.
Archangeli, D. (1988). Aspects of Underspecification Theory. Phonology 5, 183-207.
Archangeli, D. & D. Pulleyblank (1994). Grounded Phonology. Cambridge, Mass.: MIT
Press.
Backley, P. (in prep). Phonology: an Introduction to Element Theory. Edinburgh: Edinburgh
University Press.
Backley, P. & T. Takahashi (1998). Element activation. In Cyran (ed.), 13-40.
Botma, B. (2005). Nasal harmony in Yuhup: a typological anomaly? In N. Kula & J. van de
Weijer (eds.), Papers in Government Phonology: special issue of Leiden Papers in
Linguistics 2.4, 1-21.
Bright, W. (ed.) (1992). International Encyclopedia of Linguistics 3. Oxford: Oxford
University Press.
Browman, C.P. & L. Goldstein (1992). Articulatory phonology: an overview. Phonetica 49,
155-180.
Burton-Roberts, N., P. Carr & G. Docherty (eds.) (2000). Phonological Knowledge:
Conceptual and Empirical Issues. Oxford: Oxford University Press.
Charette, M. (1991). Conditions on Phonological Government. Cambridge: Cambridge
University Press.
Charette, M. & A. Göksel (1996). Licensing constraints and vowel harmony in Turkic
languages. SOAS Working Papers in Linguistics and Phonetics 6, 1-23.
Chomsky, N. & M. Halle (1968). The Sound Pattern of English. New York: Harper and Row.
Clements, G.N. (1985). The geometry of phonological features. Phonology Yearbook 2, 223-
250.
Crosswhite, K.M. (2004). Vowel Reduction. In B. Hayes, R. Kirchner & D. Steriade (eds.),
191-231.
Crothers, J. (1978). Typology and universals of vowel systems. In Greenberg et al. (eds.), 93-
152.
Cruttenden, A. (2001). Gimson's Pronunciation of English, sixth edition. London: Arnold.
Cutler, A. & D. Norris (1988). The role of strong syllables in segmentation for lexical access.
Journal of Experimental Psychology: Human Perception and Performance 14(1),
113-121.
Cyran, E. (ed.) (1998). Structure and Interpretation: Studies in Phonology. Lublin: Folium.
Di Sciullo, A.-M. (2005). Asymmetry in Morphology. Linguistic Inquiry Monograph 46.
Cambridge, Mass.: MIT Press.
Durand, J. (ed.) (1986). Dependency and Non-Linear Phonology. London: Croom Helm.
Durand, J. (1995). Universalism in phonology: atoms, structures and derivations. In J. Durand
& F. Katamba (eds.), 267-288.
Durand, J. & F. Katamba (eds.) (1995). Frontiers of Phonology: Atoms, Structures,
Derivations. Harlow, Essex: Longman.
Flemming, E.S. (2002). Auditory Representations in Phonology. New York: Routledge.
Fowler, C.A. (1986). An event approach to the study of speech perception from a direct-
realist perspective. Journal of Phonetics 14, 3-28.
Goldsmith, J. (ed.) (1995). The Handbook of Phonological Theory. Cambridge: Blackwell.
Greenberg, J., C.A. Ferguson & E. Moravcsik (eds.) (1978). Universals of Human Language,
vol. 2: Phonology. Stanford: Stanford University Press.
Halle, M. (1992). Phonological features. In W. Bright (ed.), 207-212.
Hardcastle, W.J. & A. Marchal (eds.) (1990). Speech Production and Speech Modeling.
Dordrecht: Kluwer.
Harris, J. (1994). English Sound Structure. Oxford: Blackwell.
Harris, J. (2007). Representation. In P. de Lacy (ed.). The Cambridge Handbook of
Phonology, 119-138. Cambridge: Cambridge University Press.
Harris, J. & G. Lindsey (1995). The elements of phonological representation. In J. Durand &
F. Katamba (eds.), 34-79.
Harris, J. & G. Lindsey (2000). Vowel patterns in mind and sound. In N. Burton-Roberts et al.
(eds.), 185-205.
Harris, J. & E-A. Urua (2001). Lenition degrades information: consonant allophony in Ibibio.
Speech, Hearing and Language: Work in Progress 13, 72-105.
Haspelmath, M., M.S. Dryer, D. Gil & B. Comrie (eds.) (2005). The World Atlas of
Language Structures. Oxford: Oxford University Press.
Hayes, B., R. Kirchner & D. Steriade (eds.) (2004). Phonetically Based Phonology.
Cambridge: Cambridge University Press.
Hirayama, M. (2003). Contrast in Japanese vowels. Toronto Working Papers in Linguistics
20, 115-132.
Hsin, T.-S. (2003). The mid vowels of Maga Rukai and their implications. Journal of East
Asian Linguistics 12, 59-81.
Jakobson, R., G. Fant & M. Halle (1952). Preliminaries to Speech Analysis. Cambridge, Mass.:
MIT Press.
Jakobson, R. & M. Halle (1956). Fundamentals of Language. The Hague: Mouton.
Johnson, K. (2003). Acoustic and Auditory Phonetics. Oxford: Blackwell.
Jones, C. (1989). A History of English Phonology. London: Longman.
Ka, O. (1994). Wolof Phonology and Morphology. Lanham, Maryland: University Press of
America.
Kager, R. (1999). Optimality Theory. Cambridge: Cambridge University Press.
Kaye, J.M. (1989). Phonology: a Cognitive View. Hillsdale, NJ: Lawrence Erlbaum.
Kaye, J.M. (1990). Coda licensing. Phonology 7, 301-330.
Kubozono, H. (2001). On the markedness of diphthongs. Ms., Kobe University.
Lass, R. (1984). Vowel system universals and typology: prologue to theory. Phonology
Yearbook 1, 75-112.
Liberman, A.M. & I.G. Mattingly (1985). The motor theory of speech perception revised.
Cognition 21, 1-36.
Lindblom, B. (1990). Explaining phonetic variation: a sketch of the H&H theory. In W.J.
Hardcastle & A. Marchal (eds.), 403-439.
McCarthy, J. (2002). A Thematic Guide to Optimality Theory. Cambridge: Cambridge
University Press.
McMahon, A. (2000). Lexical Phonology and the History of English. Cambridge: Cambridge
University Press.
Maddieson, I. (1984). Patterns of Sounds. Cambridge: Cambridge University Press.
Mohanan, K.P. (1991). On the bases of radical underspecification. Natural Language and
Linguistic Theory 9, 285-325.
Nasukawa, K. & P. Backley (2008). Affrication as a performance device. Phonological
Studies 11, 35-46.
Odden, D. (2005). Introducing Phonology. Cambridge: Cambridge University Press.
Osborn, H. (1966). Warao I: phonology and morphophonemics. International Journal of
American Linguistics 32, 108-132.
Pettersson, T. & S. Wood (1987). Vowel reduction in Bulgarian and its implications for
theories of vowel reduction: a review of the problem. Folia Linguistica 21, 261-279.
Prince, A.S. & P. Smolensky (2004). Optimality Theory: Constraint Interaction in
Generative Grammar. Oxford: Blackwell.
Rennison, J.R. (1986). On tridirectional feature systems for vowels. In J. Durand (ed.), 281-
303.
Roca, I. (1994). Generative Phonology. London: Routledge.
Sagey, E. (1986). The representation of features and relations in nonlinear phonology.
Doctoral dissertation, MIT, Cambridge, Mass.
Schane, S.A. (1984). The fundamentals of Particle Phonology. Phonology Yearbook 1, 129-
155.
Scheer, T. (2004). A Lateral Theory of Phonology. Studies in Generative Grammar, 68.1.
Berlin and New York: Mouton de Gruyter.
Simo Bobda, A. (2007). Patterns of segment sequence simplification in some African
Englishes. World Englishes 26(4), 411-423.
Steinberger, K.E. & R.M. Vago (1987). A multileveled autosegmental analysis of vowel
harmony in Bari. In D. Odden (ed.), Current Approaches to African Linguistics 4,
357-368.
Steriade, D. (1995). Underspecification and markedness. In J. Goldsmith (ed.), 114-174.
Stevens, K.N. (1989). On the quantal nature of speech. Journal of Phonetics 17, 3-45.
