
Voicing in Japanese

Studies in Generative Grammar 84


Harry van der Hulst

Jan Koster
Henk van Riemsdijk

Mouton de Gruyter
Berlin New York

Voicing in Japanese

Edited by

Jeroen van de Weijer

Kensuke Nanjo
Tetsuo Nishihara

Mouton de Gruyter
Berlin New York

Mouton de Gruyter (formerly Mouton, The Hague)

is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.

The series Studies in Generative Grammar was formerly published by

Foris Publications Holland.

Printed on acid-free paper which falls within the guidelines

of the ANSI to ensure permanence and durability.

Library of Congress Cataloging-in-Publication Data

Voicing in Japanese / edited by Jeroen van de Weijer, Kensuke Nanjo,
Tetsuo Nishihara.
p. cm. (Studies in generative grammar ; 84)
Includes bibliographical references and index.
ISBN-13: 978-3-11-018600-0 (cloth : alk. paper)
ISBN-10: 3-11-018600-4 (cloth : alk. paper)
1. Japanese language - Phonetics. I. Weijer, Jeroen Maarten van
de, 1965- . II. Nanjo, Kensuke. III. Nishihara, Tetsuo, 1961- .
IV. Series.
PL541.V65 2005

Bibliographic information published by Die Deutsche Bibliothek

Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data is available in the Internet at http://dnb.ddb.de.

ISBN-13: 978-3-11-018600-0
ISBN-10: 3-11-018600-4
ISSN 0167-4331
Copyright 2005 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin.
All rights reserved, including those of translation into foreign languages. No part of this
book may be reproduced in any form or by any means, electronic or mechanical, including
photocopy, recording, or any information storage and retrieval system, without permission
in writing from the publisher.
Cover design: Christopher Schneider, Berlin.
Printed in Germany.


Preface

Most of the work on this book was done while the first editor was a Research
Fellow at the Netherlands Institute for Advanced Study in the Humanities
and Social Sciences (NIAS) in Wassenaar in the period 2002-2003. We are
extremely grateful to NIAS for the tranquil yet productive environment in
which the ideas expressed in this book could be conceived and reflected upon.
A first version of some of the papers in this volume was presented at
a workshop at the Linguistics and Phonetics 2002 (LP2002) conference,
held from September 2 to 6, 2002 at Meikai University in Urayasu, Japan.
We are grateful to the organisers for giving us the opportunity to have this
workshop, and to the audience for helpful discussion and suggestions.
Jeroen van de Weijer
Kensuke Nanjo
and Tetsuo Nishihara

Leiden, Summer 2005


Contents

Preface

Voicing in Japanese
Jeroen van de Weijer, Kensuke Nanjo and Tetsuo Nishihara

Part I Consonant voice

Rendaku: Its domain and linguistic conditions
Haruo Kubozono

Sequential voicing, postnasal voicing, and Lyman's Law revisited
Keren Rice

Sei-daku: diachronic developments in the writing system
Kazutoshi Ohno

The representation of laryngeal-source contrasts in Japanese
Kuniya Nasukawa

Rendaku in inflected words
Timothy J. Vance

Ranking paradoxes in consonant voicing in Japanese
Haruka Fukazawa and Mafuyu Kitahara

The implicational distribution of prenasalized stops in Japanese
Noriko Yamane-Tanaka

The correlation between accentuation and Rendaku in Japanese surnames: a morphological account
Hideki Zamma

A survey of Rendaku in loanwords
Tomoaki Takayama

Recognizing Japanese numeral-classifier combinations
Keiichiro Suzuki

Part II Vowel voice

Corpus-based analysis of vowel devoicing in spontaneous Japanese: an interim report
Kikuo Maekawa and Hideaki Kikuchi

Syllable structure and its acoustic effects on vowels in devoicing environments
Mariko Kondo

The effect of speech rate on devoiced accented vowels in Osaka Japanese
Miyoko Sugito

Where voicing and accent meet: their function, interaction, and opacity problems in phonological prominence
Shin-ichi Tanaka

Bibliography
Index of authors
Index of languages
Index of subjects


Voicing in Japanese
Jeroen van de Weijer, Kensuke Nanjo
and Tetsuo Nishihara

This book presents a number of studies which focus on the [voice] grammar
of Japanese, paying particular attention to historical background, dialectal
diversity, phonetic experiment, and phonological analysis. Both voicing processes in consonants (such as Sequential Voicing, henceforth Rendaku) and
vowels (such as vowel devoicing) are examined. A number of new analyses
are presented, focusing on well-known data that have been controversial in
phonological debate in the past, but the book also presents new (or rediscovered)
data, partly through the work of Japanese scholars that hitherto went mostly
unnoticed, partly through new database research, and partly through phonetic
experiment. In this introduction, we will briefly introduce the different contributions and point out their respective interests.
There are two parts to the book: (1) consonant voice, (2) vowel voice. In
the consonant part, the contribution by Kubozono presents a point of departure by introducing many of the voicing phenomena in Japanese, and also
pointing out some of the relevant dialectal differences. Let us briefly review
the most important of these in very general terms. For details and refinements, we refer to the contributions that follow. Rendaku is a rule of
Japanese which voices the initial consonant of the second member of a
compound, if certain phonological and syntactic conditions are satisfied.
Consider the following examples (taken from various standard sources):


(1) shima + kuni → shima-guni 'island country'
    maki + sushi → maki-zushi 'rolled sushi'
    oo + tanuki → oo-danuki 'large badger'


However, if the second member of the compound already contains a voiced obstruent, Rendaku
is not allowed to occur, as the following examples show (again, we will not
go into various important issues, but only illustrate the received wisdom):


(2) doku + tokage → doku-tokage, *doku-dokage 'poisonous lizard'
    oo + kaze → oo-kaze, *oo-gaze 'big wind'
    (Vance 1987)

The first non-Japanese researcher who wrote about this blocking effect was
Lyman (1894), which is why the condition is commonly referred to as
'Lyman's Law'. In other literature, the condition is referred to as 'Motoori
Norinaga's Law'. A point of controversy in the literature is whether this
Law has exceptions or not. In recent work, Haraguchi (2003) points out
that the exceptions can all be analysed by making reference to independently
motivated principles of grammar, such as morphological constituency.
A number of issues are distinguished with respect to Rendaku: is it an exceptionless rule? Tamamura (1989) points out that Rendaku actually occurs in only 60% of the noun-noun compounds in which it could occur. If
Rendaku is not a regular rule, how should the exceptions be accounted for?
How should the phonological and syntactic conditions be formalized? To
what part(s) of the lexicon does the rule apply? How long has it been part
of the grammar of Japanese? Do loanwords undergo it? What does the rule
tell us about the specification of voicing on obstruents and on sonorants in
the phonology of Japanese? All these issues are dealt with at length in the
contributions that follow, but let us pick out three major issues here. First, it
appears to be the case that some lexical items that would at first glance be
expected to undergo Rendaku do not undergo it, or undergo the rule in
some compounds but not in others. This variable or unpredictable behaviour can be approached from a number of viewpoints. Has the rule been
completely lexicalized? Are there still subregularities? Does analogy play a
role? How are new formations treated? These questions play a major role in
the contributions by Kubozono and Ohno, while certain syntactic conditions that were assumed up to now are scrutinized and dismissed in the contribution by Vance. The fact that a rule is variable presents obvious problems for speech recognition software. This problem is dealt with in the
paper by Suzuki.
A second issue concerns the stratification of the Japanese lexicon. As is
well known, the Japanese language incorporates a number of layers, or
vocabulary from different sources, which are subject to partly different
restrictions. An obvious question is how many layers there are, about which
there is some controversy in the literature. A first layer comes from native
Japanese, and is known as Yamato. A second layer comes from early
Chinese well-incorporated loans, known as Sino-Japanese. Borrowings may
be either fairly well incorporated (in which case they constitute the Loan
stratum) or not (in which case they constitute the Foreign stratum). Finally,
onomatopoeic forms constitute a class of their own, which is usually
referred to as the Mimetic stratum. The question is whether Rendaku applies in all of these strata, or rather, since it does not, how the difference
in application among the strata should be formalized, especially in the light
of the idea that the grammars of languages consist of a single constraint
hierarchy. This issue is discussed in detail in the contribution by Fukazawa
and Kitahara, while the question of whether Rendaku applies in loanwords is explored by Takayama.
A final issue touched upon here is the specification of the distinctive
feature [voice]. The nature of this feature has been the topic of fierce debate
in the literature: is it binary or unary? Is it specified for obstruents and
sonorants in the same way? Recall that Rendaku voices obstruents in compounds under certain conditions. Nasals (or other sonorants) do not play a
role, which suggests that they are not (underlyingly) specified for voice. In
another voicing process, postnasal voicing, however, nasals seem to impose their voicedness on a following stop, so that sequences such as nt
are not allowed (again, making provisions for stratumhood). In this respect,
therefore, nasals do seem to bear a specification for voicing, which presents
an interesting paradox, especially, again, in the light of the idea that the
grammars of languages consist of a single constraint hierarchy, which maps
inputs directly onto outputs, without intermediary stages. This paradox
is the primary focus of the contribution by Rice.
Of particular interest are the historical papers in the volume. Two articles shed light on the historical dimension: where does Rendaku come from
and was it always an irregular process? Ohno investigates this question
using the earliest sources and finds, among other things, that Rendaku has
always been irregular. Yamane-Tanaka investigates the close relation between voicing and prenasalization (which is still evident in some varieties
that have prenasalized stops or nasal vowels; cf. Nishihara 2002) and offers
an OT-style analysis both of the dialect-geographical situation and the
historical development.
It is well known that Japanese is a pitch-accent language, and this raises
the question whether there is a relation between accent on the one hand,
and Rendaku on the other. Zamma investigates precisely this relation in
personal names and in place names, giving an Optimality-theoretic account which
again bears on the stratification of the lexicon.
The second part of the book deals with vowel voice, and here vowel devoicing, the second process which Japanese is well known for, is the main
topic. Again, let us briefly illustrate without going into details. In most
Japanese dialects, there is a phonological rule of high vowel devoicing. If
the high vowels /i, u/ appear between voiceless sounds, or between a voiceless sound and a word boundary, these vowels are devoiced. Furthermore, Maekawa (1988) points out
that devoicing of the low vowel /a/ and the mid vowel /o/ sometimes takes
place, although not as frequently as that of the high vowels. Both the facts, which
differ greatly between dialects, speech styles, etc., and their phonological
interpretation are topics of debate in the contributions that follow. The
stage is set by the article by Maekawa and Kikuchi, who present a great
deal of factual data concerning vowel devoicing, including information on
which vowels are devoiced, in which environments, whether consecutive
vowels can be devoiced, etc. These facts are taken from new database research. The contribution by Kondo looks at vowel devoicing from a syllable/mora perspective: does vowel devoicing affect syllable structure, or,
alternatively, does syllable structure constrain the process? The paper by
Tanaka looks at the concept of prominence in general, which bears on the
voiced-voiceless distinction, which is relevant to consonants as well as
vowels, but also on accent. Sugito presents the results of an experiment on
vowel devoicing: how well do speakers recognise the accent that was present on the devoiced vowel?
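The high vowel devoicing rule described above can be illustrated with a minimal sketch. It is not taken from any of the contributions: the set of "voiceless" letters, the use of plain romanized strings, and the function name `devoice_high_vowels` are all assumptions made for illustration, and the sketch ignores the dialectal and stylistic variation mentioned above as well as the treatment of consecutive devoicing sites.

```python
HIGH_VOWELS = set("iu")
# Assumption: a rough set of romanized letters standing for voiceless consonants.
VOICELESS = set("ptkshfc")

def devoice_high_vowels(word: str) -> str:
    """Return the word with devoiced high vowels written in upper case.

    A high vowel is marked as devoiced when it is preceded by a voiceless
    consonant and followed by a voiceless consonant or the end of the word.
    """
    out = []
    for i, ch in enumerate(word):
        preceded = i > 0 and word[i - 1] in VOICELESS
        followed = i == len(word) - 1 or word[i + 1] in VOICELESS
        if ch in HIGH_VOWELS and preceded and followed:
            out.append(ch.upper())
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    for w in ["kusa", "kuni", "desu"]:
        print(w, "->", devoice_high_vowels(w))
```

Upper case is used here only as a convenient plain-text marking of devoiced vowels; it is not a transcription convention of the volume.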
We hope that this volume succeeds in putting some of the received wisdom with respect to voicing in Japanese to the test from both a
phonological and a phonetic perspective.

Rendaku: Its domain and linguistic conditions

Haruo Kubozono

1. Two kinds of rendaku voicing

The main purpose of this paper is to review past work on rendaku, or sequential voicing, with a focus on its domain and linguistic conditions,
and to summarize remaining questions for future work.
One of the most fundamental questions regarding Japanese rendaku concerns its linguistic nature: is it a productive process or is it no more than a
property of specific lexical entries? The first hypothesis emphasizes the productivity of rendaku and defines it as a productive phonological (or morphophonological) process of voicing that permits lexical exceptions (Otsu 1980,
Itô and Mester 1986, 2003; cf. Kuroda 2002). On the other hand, the second
hypothesis focuses on the extremely large number of lexical exceptions and
attributes rendaku to a lexical property of certain words (Ohno 2000).
Let us consider the pair hiragana and katakana (the two types of kana
letters), for example. Etymologically, these words are made up of two morphemes, /hira+kana/ and /kata+kana/. In the course of history, the first
word underwent voicing as in (1), while the second did not.
(1) hira + kana → hiragana
According to the first hypothesis mentioned above, this historical process
of voicing remains a productive process in modern Japanese by which
/hira+kana/ turns into /hiragana/. The word katakana is regarded as an exception to this synchronic process. The second hypothesis, in contrast, posits
hiragana and katakana as underlyingly /hiragana/ and /katakana/, respectively: the presence of voicing in the first word and the lack of voicing in
the second are lexical properties of the respective words.
These two hypotheses are difficult to assess because rendaku is extremely
productive in modern Japanese, on the one hand, and, on the other, it admits
an extremely large number of exceptions whose exceptionality is difficult to
explain. One important study that has tackled this difficult issue is the experimental work by Shinji and Suzy Fukuda (Fukuda and Fukuda 1999). They
looked at children with a language disorder called specific language impairment (henceforth SLI). People with SLI are linguistically normal
in every respect except that they cannot apply productive grammatical rules
to morphological/syntactic strings. For example, native English speakers
with SLI are unable to produce plural forms for countable nouns (2a) and to
add the ending /s/ to a verb to mark the third person singular form (2b).

(2) a. I have three apple.
    b. Mary walk in the yard.

Fukuda and Fukuda (1999) examined how native Japanese speakers with the
same language impairment produce Japanese utterances. Specifically, they
looked at the way their eight- to twelve-year-old subjects produced voicing
in compound nouns. If the subjects should fail to produce voicing in words
like /hiragana/, then it would mean that voicing is a productive rule in modern Japanese, hence supporting the first hypothesis mentioned above. If, on
the other hand, the subjects should produce voicing in words like /hiragana/
just as normal native speakers do, then it would suggest that the phonological form with voicing is a lexical form of the word, namely, that voicing has
been lexicalized and is not produced by rule in the synchronic grammar.
What Fukuda and Fukuda (1999) found falls between
the two predictions. On the one hand, their subjects showed voicing in some
basic compound nouns like nagagutu 'long + shoes; boots', suggesting that
voicing in these words is part of their underlying representation. On the
other hand, they also showed lack of voicing in non-frequent and novel
compounds such as those in (3a), which were pronounced with voicing by
normal native speakers of the same age group as shown in (3b).

(3) a. kotoba + tukai → kotoba-tukai 'language use'
       kotoba + hon → kotoba-hon 'language book'
    b. kotoba + tukai → kotoba-dzukai 'language use'
       kotoba + hon → kotoba-bon 'language book'

This latter result reveals a contrast between normal speakers and speakers
with SLI, with the former but not the latter group of speakers being able to
produce voicing in non-frequent and novel compounds. This suggests that
voicing in non-frequent and novel compounds should be attributed to a
productive rule and, hence, that there exists a productive process of voicing
in normal speakers' grammars.
Fukuda and Fukuda's experimental data are interesting in that they reveal
that some instances of rendaku voicing are lexicalized while others are due
to a productive rule. Native speakers of Japanese deal with the first type of
voicing by memorizing the form with voicing as a lexical entry. In contrast,
they deal with the second type of voicing by acquiring a voicing rule, or
rendaku rule, and applying it to unfamiliar and novel compounds. What
remains unclear is the boundary between the two kinds of voicing, more
specifically, between 'frequent' and 'non-frequent' compounds. This will
be an intriguing empirical question for future research.
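The division of labour between memorized forms and a productive rule can be pictured with a small sketch. It is only an illustration under stated assumptions, not a model proposed by Fukuda and Fukuda: the mini-lexicon `LEXICON`, the flag `has_rule`, and the function names are hypothetical, and the onset alternations simply follow the transcriptions used in (1) and (3).

```python
from typing import Dict, Tuple

VOICED_OBSTRUENTS = set("bdgz")
# Onset alternations as transcribed in this chapter; "tu" -> "dzu" follows (3b).
ONSET_VOICING = [("tu", "dzu"), ("k", "g"), ("s", "z"), ("t", "d"), ("h", "b")]

# Hypothetical mini-lexicon of stored compounds (forms taken from the text).
LEXICON: Dict[Tuple[str, str], str] = {
    ("hira", "kana"): "hiragana",   # voicing stored in the lexical entry
    ("kata", "kana"): "katakana",   # stored without voicing
    ("naga", "kutu"): "nagagutu",   # frequent compound, stored with voicing
}

def voice_initial(morpheme: str) -> str:
    """Voice the initial obstruent unless the morpheme already has a voiced obstruent."""
    if any(ch in VOICED_OBSTRUENTS for ch in morpheme):
        return morpheme
    for plain, voiced in ONSET_VOICING:
        if morpheme.startswith(plain):
            return voiced + morpheme[len(plain):]
    return morpheme

def produce(first: str, second: str, has_rule: bool = True) -> str:
    """Look the compound up first; fall back on the rule only if it is available."""
    stored = LEXICON.get((first, second))
    if stored is not None:
        return stored
    if has_rule:
        second = voice_initial(second)
    return f"{first}-{second}"

if __name__ == "__main__":
    print(produce("naga", "kutu", has_rule=False))   # stored form survives
    print(produce("kotoba", "hon", has_rule=True))   # rule applies to a novel compound
    print(produce("kotoba", "hon", has_rule=False))  # no rule, no voicing
```

With `has_rule=False` the sketch reproduces the pattern in (3a), and with `has_rule=True` the pattern in (3b); where the boundary of the stored lexicon lies is exactly the open question raised above.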


2. Lyman's Law revisited

2.1. Original version

One of the best-known conditions concerning the domain of rendaku is the
so-called Lyman's Law, which can be defined as in (4). Representative
examples are given in (5) in contrast to those that are not subject to the law.

(4) Rendaku is blocked in a compound word [AB] if B already contains a dakuon, or a voiced obstruent.


(5) a. aka + huda → aka-huda, *aka-buda 'red tag'
       cf. uwa + huta → uwa-buta 'top lid'
           roten + huro → roten-buro 'outdoor, bath; outdoor bath'
    b. ai + kagi → ai-kagi, *ai-gagi 'duplicate key'
       cf. ama + kaki → ama-gaki 'sweet, persimmon; sweet persimmon'
           umi + kame → umi-game 'sea, turtle; sea turtle'
    c. yama + kazi → yama-kazi, *yama-gazi 'forest fire'
       cf. wa + kasi → wa-gasi 'Japanese cake'
           temuzu + kawa → temuzu-gawa 'Thames, river; River Thames'
As these examples indicate, Lyman's Law represents a case of the OCP
(Obligatory Contour Principle) by which an identical element or feature is
prohibited from occurring more than once within a certain domain. In
rendaku, the feature in question is [+voice, +obstruent], with the relevant
domain of the OCP being the morpheme or the second member of a compound.
While this is a well-known fact in Japanese phonology, there are certain
cases where Lyman's Law requires a larger domain. This can be seen rather
clearly in the data provided by Sugito (1965), which we will consider in
detail in the next section.
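The blocking condition in (4) can be stated as a short procedure. The following is a minimal sketch rather than an analysis: it treats rendaku as if it applied without exception whenever Lyman's Law permits it, which the discussion in section 1 shows to be an idealization. The names `rendaku`, `VOICED_OBSTRUENTS` and `ONSET_VOICING` are hypothetical, and the input is assumed to be in the romanization used in (5).

```python
VOICED_OBSTRUENTS = set("bdgz")
# Assumed onset alternations in the romanization of (5): k~g, s~z, t~d, h~b.
ONSET_VOICING = {"k": "g", "s": "z", "t": "d", "h": "b"}

def rendaku(first: str, second: str) -> str:
    """Combine two morphemes, voicing the second unless Lyman's Law blocks it."""
    # Lyman's Law (4): a voiced obstruent anywhere in the second member blocks voicing.
    if any(ch in VOICED_OBSTRUENTS for ch in second):
        return f"{first}-{second}"
    voiced = ONSET_VOICING.get(second[0])
    if voiced is None:
        # Initial segment is not a voiceless obstruent; nothing to voice.
        return f"{first}-{second}"
    return f"{first}-{voiced}{second[1:]}"

if __name__ == "__main__":
    for a, b in [("aka", "huda"), ("uwa", "huta"), ("ai", "kagi"), ("ama", "kaki")]:
        print(f"{a} + {b} -> {rendaku(a, b)}")
```

Note that the check inspects only the second member, which corresponds to the morpheme-sized OCP domain of the original version of the law.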
2.2. Sugito's data and Lyman's Law
Sugito (1965) looked at the alternation between /ta/ and /da/ shown by the
morpheme ta 'rice field' as it is combined with a bimoraic morpheme to
form a personal name: e.g. /siba-ta/ vs. /ima-da/. This particular morpheme
exhibits a rather clear pattern of alternation, which is more or less predictable from the consonant in the immediately preceding mora.1 The results of
Sugito's analysis can be summed up as follows.

(6) a. The morpheme is usually realized as [da] when it is immediately preceded by a mora containing either /s/, /m/, /n/, /t/, or /k/, as well as when it is preceded by a heavy syllable (except a syllable containing a moraic obstruent).
    b. The morpheme is invariably realized as [ta] when it is preceded by a mora containing either /d/, /b/, /g/, /z/ or /y/, or when it is preceded by a moraic obstruent.
    c. The morpheme is predominantly realized as [ta] but permits [da] occasionally when it is preceded by a mora containing either /r/ or /w/.

Representative examples are given below.


(7) a. asa-da, hama-da, sana-da, kata-da, huku-da; soo-da, sai-da, kan-da
    b. kubo-ta, kado-ta, naga-ta, mizu-ta, haya-ta
    c. ari-ta vs. hara-da, iwa-ta vs. sawa-da

We can develop Sugito's analysis one step further and reinterpret the data
in terms of natural classes.2 This reanalysis leads to the generalization in
(8). The contrast between (8a) and (8b) is illustrated in (9).

(8) a. /da/ is preferred after voiceless obstruents and nasals.
    b. /da/ is prohibited after voiced obstruents.
    c. /ta/ is preferred after approximants.

(9) a. huku-da, kasi-da, kusu-da, asi-da, kase-da, kaku-da, sima-da, naka-da (or naka-ta), kata-da
    b. hugu-ta, kazi-ta, kuzu-ta, azi-ta, kaze-ta, kagu-ta, siba-ta, kubo-ta, naga-ta, sugi-ta, kado-ta

In terms of the markedness of voicing, this generalization means that /ta/ is
chosen if the consonant in the immediately preceding mora has the feature
[+voice], whereas /da/ may be chosen if the consonant in question is [-voice]
or unspecified with respect to voicing (as in nasals). Here the variation between /ta/ and /da/ for some words like naka + ta (/nakata/~/nakada/) does
not directly concern us. What is of interest is the fact that /da/ is never permitted if the immediately preceding mora already contains a voiced obstruent.
This is a clear case of application of the OCP, or an extension of Lyman's
Law (see (4)). In (5), rendaku voicing is blocked if the second element of
the compound already contains a voiced obstruent. In (9), the same process
is blocked if the first element ends in a mora containing a voiced obstruent.
The similarity between the two cases is obvious: presence of a voiced obstruent in the neighborhood prevents rendaku from creating another voiced
obstruent. The OCP effect in (9b) is particularly interesting because voiced
obstruents block rendaku across a morpheme boundary.
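The generalization in (8) lends itself to a small lookup sketch. It is offered only as an illustration: the function names, the returned labels "categorical" and "preferred", and the simplified way of finding the onset of the final mora are assumptions, and stems ending in a heavy syllable or a moraic nasal (the second half of (6a)) are deliberately left uncovered.

```python
from typing import Optional, Tuple

VOWELS = set("aiueo")
VOICED_OBSTRUENTS = set("bdgz")     # (8b): /da/ prohibited
APPROXIMANTS = set("rwy")           # (8c): /ta/ preferred
VOICELESS_OR_NASAL = set("kstmn")   # (8a): /da/ preferred; segments listed in (6a)

def final_mora_onset(stem: str) -> Optional[str]:
    """Return the consonant of a final CV mora, or None if the stem does not end in CV."""
    if len(stem) >= 2 and stem[-1] in VOWELS and stem[-2] not in VOWELS:
        return stem[-2]
    return None

def predict_ta_form(stem: str) -> Optional[Tuple[str, str]]:
    """Predict the form of ta 'rice field' after a stem, following (8)."""
    onset = final_mora_onset(stem)
    if onset in VOICED_OBSTRUENTS:
        return ("ta", "categorical")
    if onset in APPROXIMANTS:
        return ("ta", "preferred")
    if onset in VOICELESS_OR_NASAL:
        return ("da", "preferred")
    return None  # not covered by this sketch

if __name__ == "__main__":
    for stem in ["huku", "kazi", "sima", "iwa", "hara"]:
        print(stem, predict_ta_form(stem))
```

Because (8a) and (8c) state preferences rather than absolute requirements, the sketch returns "preferred" for them; attested forms such as hara-da and naka-ta are therefore compatible with its output even though they depart from the preferred form.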
This extended effect of OCP in rendaku is not a new finding, however.
Kindaichi et al. (1988: 264), citing examples like /maga-tama/ 'ancient accessory' and /mizu-tama/ 'polka dot', note that this law has existed in
Japanese since the ancient period. According to Itô and Mester (2003), this
was originally reported by Tatsumaro Ishizuka in 1801. Itô and Mester
claim that the domain of Lyman's Law has narrowed from the word (prosodic word) in Old Japanese to the morpheme in modern Japanese. It remains unclear why this domain change has taken place and how the old
effect of Lyman's Law leaves its trace in modern Japanese. These are very
interesting questions for future work.
Returning to Sugito's data regarding the /ta/-/da/ alternation, there are
several additional facts that are worthy of special attention. One is the fact
that approximants (/r/, /w/ and /y/) pattern more or less with voiced obstruents, while nasals (/m/ and /n/) pattern with voiceless obstruents. These two
groups of sounds form a natural class in Japanese phonology in that they
are all voiced and lack voiceless counterparts. It is puzzling that they pattern differently with respect to the /ta/-/da/ alternation. Particularly mysterious is the behavior of approximants which, as summarized in (10), tend to
show the same behavior as voiced obstruents.

(10) [Table: -/ta/ vs. -/da/ after approximants]

In terms of the markedness of voicing, approximants should pattern with
voiceless obstruents and nasals since they involve an unmarked value of
voice. This unmarkedness shows itself very clearly in the general cases of
Lyman's Law which we saw in (5) above. Namely, unlike voiced obstruents,
approximants do not block the voicing process when they occur in the second member of compound nouns, and they pattern exactly with voiceless
obstruents and nasals in this respect. It is very strange to find that approximants display the same pattern as voiceless obstruents and nasals with respect to Lyman's Law in the original sense, while they pattern with voiced
obstruents in the extended version of the same law.
Another interesting fact about the /ta/-/da/ alternation concerns the peculiar behavior of /r/. As shown in (10), /r/ predominantly prefers /ta/ rather
than /da/. The three exceptions to this in Sugito's data are /hara-da/, /tera-da/ and /tora-da/, in all of which /da/ is preceded by the low vowel /a/.3 This
suggests that the choice between /ta/ and /da/ after /r/ is also influenced by
the quality (or height, to be more exact) of the immediately preceding
vowel, i.e. the final vowel of the preceding morpheme. This possibility is
also worth exploring.
A final noteworthy fact about Sugito's data is that /k/ behaves somewhat
differently from other voiceless obstruents. While /t/ and /s/ invariably
choose /da/, /k/ admits quite a few exceptions as the following statistics and
examples show.

(11) [Table: counts of -/ta/ and -/da/]

(12) a. /ta/: iku-ta, aki-ta, oki-ta, kaki-ta, maki-ta, saka-ta

b. /da/: huku-da, oka-da, taka-da, toku-da, oku-da, kaku-da, ike-da,
take-da, huka-da
It is true that /k/ prefers /da/ rather than /ta/, but it is obviously different
from other voiceless obstruents in the extent to which it tolerates /ta/. The
reason for this peculiar behavior of /k/ remains unclear.



2.3. Summary
In the preceding section we have seen Sugito's data concerning the /ta/-/da/
alternation in personal names consisting of three moras. It should be clear
now that in this particular type of compound noun, Lyman's Law exerts
its effect in a wider domain than is usually assumed. The same effect is
found in many pairs of personal names including /naga-sima/ vs. /naka-zima/, /naga-sawa/ vs. /naka-zawa/ and /naga-saki/ vs. /naka-zaki/, which fluctuate between these two forms. This being said, it is also important to point
out that not all morphemes or personal names exhibit the same extended
effect of Lyman's Law. Restricting ourselves to personal names, we find
that some morphemes invariably undergo rendaku even when they are preceded by a voiced obstruent. sono 'garden' and huti 'the depth, an abyss',
for example, get voiced regardless of what morpheme they are combined
with. Indeed, these morphemes invariably undergo rendaku as long as they
are in a non-initial position of a compound.
(13) a. hoka-zono, mae-zono, kubo-zono, azi-zono, naka-zono, naga-zono, sugi-zono, eno-ki-zono
b. naka-buti, naga-buti, sugi-buti
On the other hand, some morphemes tend to resist rendaku voicing in any
context. hara 'field' and saka 'slope' may be such morphemes, which are
realized as /hara/ and /saka/, respectively, in most cases:
(14) a. oo-hara, o-hara, naga-hara, naka-hara, saka-hara; cf. kanbara
b. e-saka, oo-saka, naka-saka, no-saka, ta-saka
Most morphemes including ta discussed in the preceding subsection fall
between these two extremes. A closer examination of compound nouns
may reveal a more general nature of the extended effect of Lyman's Law
sketched in (8)-(9) as well as the degree to which rendaku voicing is morpheme-dependent.

3. Branching constraint
A second major condition on rendaku voicing in Japanese is the so-called
branching constraint (Otsu 1980; Kubozono 1988). This constraint can be
defined as follows.

(15) Rendaku is blocked in the second member of a right-branching compound.
Otsu (1980) gives the following pairs to illustrate the effect of this constraint.
(16) a. Right-branching compounds
nuri + [hasi + ire] → nuri-hasi-ire
'lacquered, chopstick, case; chopstick case which is lacquered'
nise + [tanuki + siru] → nise-tanuki-ziru
'pseudo, raccoon dog, soup; raccoon dog soup that is not authentic'
b. Left-branching compounds
[nuri + hasi] + ire → nuri-basi-ire
'lacquered, chopstick, case; case for lacquered chopsticks'
[nise + tanuki] + siru → nise-danuki-ziru
'pseudo, raccoon dog, soup; soup made from a pseudo raccoon dog'
In (16a), the second member forms a constituent with the third rather than
the first member. Corresponding to this morphosyntactic structure, rendaku
voicing is blocked between the first and second members although it is not
blocked between the second and third. In contrast, rendaku is not blocked
in (16b), where the second as well as the third member can undergo the
process. Sato (1989) adds the following pair to illustrate the same effect:
(17) a. mon + [siro + tyoo] → mon-siro-tyoo, *mon-ziro-tyoo
'armorial bearing, white, butterfly; white cabbage butterfly'
b. [o + siro] + wasi → o-ziro-wasi 'tail, white, eagle; white-tailed eagle'
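The effect of the constraint in (15) on the pairs in (16) can be sketched by representing a compound as a nested pair. This is a simplified illustration under several assumptions: compounds are strictly binary, rendaku is treated as applying whenever it is not blocked (which ignores lexical exceptions and stratum effects such as the unvoiced tyoo in (17a)), and the names `realize`, `compound` and `voice_initial` are hypothetical.

```python
from typing import List, Tuple, Union

Node = Union[str, Tuple["Node", "Node"]]

VOICED_OBSTRUENTS = set("bdgz")
ONSET_VOICING = {"k": "g", "s": "z", "t": "d", "h": "b"}

def voice_initial(morpheme: str) -> str:
    """Voice the initial obstruent unless Lyman's Law (4) blocks it."""
    if any(ch in VOICED_OBSTRUENTS for ch in morpheme):
        return morpheme
    voiced = ONSET_VOICING.get(morpheme[0])
    return morpheme if voiced is None else voiced + morpheme[1:]

def realize(node: Node) -> List[str]:
    """Return the surface morphemes of a binary-branching compound."""
    if isinstance(node, str):
        return [node]
    left, right = node
    if isinstance(right, str):
        # Non-branching second member: rendaku may apply.
        right_forms = [voice_initial(right)]
    else:
        # Branching second member: its initial element is not voiced, as in (15),
        # but rendaku still applies inside it.
        right_forms = realize(right)
    return realize(left) + right_forms

def compound(node: Node) -> str:
    return "-".join(realize(node))

if __name__ == "__main__":
    print(compound(("nuri", ("hasi", "ire"))))   # right-branching
    print(compound((("nuri", "hasi"), "ire")))   # left-branching
```

The only structural information the sketch consults is whether the second member of each pair is itself a pair, which is all that (15) requires.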
The status of the branching constraint may be questioned, however, despite
the examples in (16) and (17). For one, it is difficult to find clear cases
showing its effect. The compound nouns in (16) are novel compounds for
many native speakers of Japanese, who do not necessarily have clear-cut
intuitions about the presence or absence of voicing in the pairs of expressions. The compounds in (17) are existing expressions, but it is difficult to
find more examples showing a similar effect. Moreover, the branching constraint may be questioned by the existence of expressions that apparently
defy its effect. Some of these counterexamples are given in (18).

(18) oo + [huro + siki] → oo-buro-siki 'big, bath, carpet; big talk'
mati + [hi + kesi] → mati-bi-kesi 'town, fire, to extinguish; fire brigade for common people'
While the status of the branching constraint may thus be questioned, it can be
supported by more general phonological considerations. Kubozono (1988)
provided evidence that the process of accentual phrasing characteristic of
compound formation exhibits essentially the same prosodic asymmetry
between right-branching and left-branching structures. This is illustrated in
(19), where // denotes a lexical accent and is placed immediately after the
accented mora. Words without this mark are so-called unaccented words,
which involve no abrupt pitch fall at the phonetic output. { } indicates an
accentual phrase, or a prosodic word (PrWd).
(19) a. Right-branching compounds
doitu + [bungaku + kyookai] → {doitu}{bungaku-kyookai}
'Germany, literature, association; German Association of literature'
b. Left-branching compounds
[doitu + bungaku] + kyookai → {doitu-bungaku-kyookai}
'Association of German literature'
In right-branching compounds, accentual phrasing is blocked between the
first and second members with the result that the first member forms an
accentual phrase independent of the second and third members. Their left-branching counterparts, in contrast, do not exhibit such an accentual split
and, consequently, constitute one unified accentual unit. This contrast between left-branching and right-branching compounds is equivalent to the
situation of rendaku blocking shown in (16) and (17). Unlike the case of
rendaku voicing, there are a number of compound nouns in Japanese that
exhibit an accentual contrast as shown in (19).
More significantly, the right-branching structure is subject to a similar
branching constraint at the phrasal level, where intonational phrases called
minor phrases are formed. This post-lexical process, too, is blocked in
right-branching structure, but not in left-branching structure (Kubozono
1988, 1995). All these observations indicate that a branching constraint of
the sort in (15) is independently motivated in Japanese phonology. All we
need to do is to define the constraint in a more general form as in (20):
(20a) and (20b) are synonymous in descriptive terms.4

(20) Branching constraint
a. Phonological unification is blocked in the right-branching structure.
b. Phonological unification is blocked between two constituents, A
and B, if B does not c-command A.
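The formulation in (20b) can be made concrete with a small sketch. The Python fragment below is an illustration only and not part of the original analysis: it assumes that a compound is written as nested pairs in which every internal node branches, it uses the definition of c-command given in note 4, and it starts a new phonological unit between two adjacent members A and B whenever B does not c-command A. The function names are hypothetical.

```python
def leaf_paths(tree, path=()):
    """Yield (leaf, path) pairs from left to right; a path is a tuple of child indices."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for i, child in enumerate(tree):
            yield from leaf_paths(child, path + (i,))


def c_commands(path_a, path_b):
    """Leaf A c-commands leaf B if the node immediately above A also dominates B.

    Assumption: every internal node branches, so the parent of A is the
    first branching node dominating A.
    """
    parent = path_a[:-1]
    return path_a != path_b and path_b[:len(parent)] == parent


def phrase(tree):
    """Group the members of a compound into units according to (20b)."""
    leaves = list(leaf_paths(tree))
    groups = [[leaves[0][0]]]
    for (_, left_path), (word, right_path) in zip(leaves, leaves[1:]):
        if c_commands(right_path, left_path):
            groups[-1].append(word)   # unification is allowed
        else:
            groups.append([word])     # unification is blocked
    return "".join("{" + "-".join(g) + "}" for g in groups)


print(phrase(("doitu", ("bungaku", "kyookai"))))
print(phrase((("doitu", "bungaku"), "kyookai")))
```

Under these assumptions the right-branching input is split after its first member, while the left-branching input forms a single unit, which mirrors the accentual phrasing in (19).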
An equally interesting fact about the branching constraint thus redefined is
that it also applies to phonological processes in languages other than Japanese. In English, for example, compound nouns exhibit an asymmetry between left-branching and right-branching constructions, with the latter but
not the former failing to conform to the general strong-weak pattern of
compound stress of the language (Chomsky and Halle 1968; Liberman and
Prince 1977). Essentially the same asymmetry is observed in Chinese. In
this tone language, right-branching phrases fail to undergo the well-known
tone sandhi rule whereby a sequence of two tones 3 (falling-rising tone) is
converted into a sequence of tone 2 (rising tone) and tone 3 (falling-rising
tone) (Hirose et al. 1994). Thus, a string of 3-3-3 tones turns into 2-2-3 via
2-3-3 if it forms a left-branching structure, but the same string tends to
yield 3-2-3 in a right-branching structure. A similar effect is observed in the
tone sandhi rule in Ewe, a tone language in Africa (Clements 1978). Moreover, it is also reported that consonant lengthening in Italian is blocked in
right-branching constructions (Napoli and Nespor 1976). It is an open empirical question if this structural constraint is observed in a wider range of
languages, but it obviously represents a rather general constraint on
phonological processes that has a cross-linguistic significance.
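The Chinese pattern described above can be reproduced by applying the sandhi rule from the innermost constituent outwards. The following Python sketch is only one way of modelling the description, under the assumption that the rule applies once at each constituent boundary and changes a tone 3 to tone 2 when the next tone is still tone 3; the function name and the tuple representation are hypothetical.

```python
def apply_sandhi(tree):
    """Return the surface tones of a binary-branching structure.

    A leaf is an integer tone; an internal node is a pair (left, right).
    Inner constituents are processed first (an assumed cyclic application).
    """
    if isinstance(tree, int):
        return [tree]
    left, right = (apply_sandhi(child) for child in tree)
    # Tone 3 becomes tone 2 before another tone 3 at the constituent boundary.
    if left[-1] == 3 and right[0] == 3:
        left = left[:-1] + [2]
    return left + right


print(apply_sandhi(((3, 3), 3)))  # left-branching
print(apply_sandhi((3, (3, 3))))  # right-branching
```

With these assumptions the left-branching string comes out as 2-2-3 (by way of 2-3-3) and the right-branching string as 3-2-3. The text describes the latter only as a tendency, so the sketch is more categorical than the data.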
One last question that remains unanswered is why phonological processes
in Japanese and other languages are subject to the structural constraint formulated in (20) or, equivalently, why right-branching structure exhibits
such a marked phonological pattern. Kubozono (1995) proposed two hypotheses. One is that the right-branching structure displays irregular phonological behavior in languages where the right-branching structure is syntactically/morphologically marked. This interpretation is consistent with the
fact that the left-branching structure is unmarked, at least statistically, in
Japanese compounds and phrases as well as in English compounds. If this
interpretation is correct, it is expected that left-branching rather than right-branching structures will show marked, exceptional behavior in right-branching languages. A second hypothesis put forward by Kubozono
(1995) is that the right-branching constraint in (20) is universal and applies
to compound nouns irrespective of whether the left-branching or right-branching structure is syntactically/morphologically unmarked in a particular
language. These two hypotheses must be compared and evaluated by examining phonological markedness in a wider range of languages. This is certainly another interesting topic for future work that will require detailed
cross-linguistic comparisons.


4. Mora constraint

4.1. The alternation between /hon/ and /bon/

We have seen in the preceding section that the constituent structure of a
compound noun can serve as a condition on rendaku voicing and, accordingly, determine the domain of this productive process. In addition to this,
there are cases where the domain of the rule is determined by the phonological length of the compound. One such case is compounds that contain
the morpheme hon 'book' as their second member. According to Ohno
(2000), this morpheme exhibits an alternation between the underlying form
/hon/ and the rendaku form /bon/, depending on the phonological length of
the element with which it is combined, or N1.5 A crucial boundary lies between bimoraic and trimoraic N1. If the N1 is monomoraic or bimoraic,
hon fails to undergo rendaku and manifests itself as /hon/. If the N1 is three
moras long or longer, on the other hand, hon undergoes the voicing rule
and yields /bon/. These two patterns are formulated in (21) and illustrated
in (22).6,7
(21) N1+hon undergoes rendaku voicing if N1 is longer than two moras.
(22) a. e-hon 'picture book', aka-hon 'red book', or a brand name of a
publisher's books for entrance examinations, ero-hon 'erotic book',
huru-hon 'secondhand book'
b. bunko-bon 'paperback book', manga-bon 'comic book', tyuuko-bon
'secondhand book', etti-bon 'erotic book', tankoo-bon 'independent
book', pinku-bon 'pink, book; pornographic book', karaa-bon
'colored book'
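The generalization in (21) lends itself to a short worked example. The Python sketch below is illustrative only: the mora counter assumes the romanization used in this chapter (long vowels written double, a moraic nasal written n before a consonant or word-finally, geminates written as doubled consonants), and both function names are hypothetical.

```python
VOWELS = set("aiueo")


def count_moras(word):
    """Count moras in a romanized form such as 'karaa' or 'etti'."""
    word = word.lower()
    moras = 0
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            moras += 1   # each vowel letter counts as one mora
        elif ch == "n" and nxt not in VOWELS and nxt != "y":
            moras += 1   # moraic nasal before a consonant or word-finally
        elif ch == nxt:
            moras += 1   # first half of a geminate consonant
    return moras


def hon_form(n1):
    """Apply (21): hon surfaces as bon if N1 is longer than two moras."""
    return n1 + "-" + ("bon" if count_moras(n1) > 2 else "hon")


for n1 in ["e", "aka", "huru", "bunko", "pinku", "karaa", "etti"]:
    print(n1, count_moras(n1), hon_form(n1))
```

The sketch counts /pinku/ and /karaa/ as three moras each, so they pattern with /bunko/ rather than with /aka/. It does not model the apparent exception /bini-bon/ discussed in note 6.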
Two additional points should be emphasized here. First, the phonological
length of N1 must be defined in terms of the mora and not the syllable. This
is clearly shown by compounds such as /pinku-bon/ and /karaa-bon/ whose
N1 consists of three moras but two syllables. These bisyllabic morphemes
do not pattern with bimoraic and bisyllabic morphemes like /aka/ 'red' and
/ero/ 'erotic'. Another noteworthy point is that the morphological complexity
of N1 does not matter. The monomoraic and bimoraic N1s in (22a) are all
monomorphemic while N1 in (22b) consists of more than one morpheme in
most cases. This reflects the fact that hon, a Sino-Japanese (SJ) morpheme,
tends to be combined with another SJ morpheme (or morphemes) and that
each SJ morpheme is up to two moras long. However, the morphological
structure of the N1 does not directly concern the boundary between /hon/
and /bon/. This is shown by monomorphemic N1s such as /pinku/ 'pink'
and /karaa/ 'color', which clearly pattern with bimorphemic words like
/bunko/ 'bibliotheca, papeterie' and /manga/ 'cartoon' and not with monomorphemic words like /aka/ 'red' and /ero/ 'erotic'.
Having justified the generalization in (21), it is necessary to point out
that this rule applies specifically to compound nouns with hon, and not to
other compounds. Indeed, many morphemes other than hon do not conform
to the pattern in (21). We saw in section 2 above that the morpheme ta 'rice
field' can undergo voicing even when it is combined with a bimoraic noun.
Moreover, some morphemes like ha 'tooth' and kame 'turtle' undergo
rendaku even when they are combined with bimoraic nouns as in (23a),
while others are not subject to voicing whether they are combined with
bimoraic or trimoraic nouns, as shown in (23b).
(23) a. musi + ha → musi-ba 'a decayed tooth'
umi + kame → umi-game 'sea, turtle; turtle'
mayu + ke → mayu-ge 'eyebrow, hair; eyebrow'
b. migi + te → migi-te 'right, hand; the right hand'
hidari + te → hidari-te 'left, hand; the left hand'
kasegi + te → kasegi-te 'to earn, hand; bread winner'

While the rule in (21) is not a general constraint on rendaku in Japanese
compound nouns, it does not follow that the mora-based generalization
represents an idiosyncratic rule in Japanese phonology. The rule in (21) can
be reinterpreted as follows if we consider the phonological length of the
whole word rather than the length of individual components.
(24) hon undergoes rendaku if the entire word consists of more than four moras.
This generalization means that rendaku does not occur in Noun-hon if this
whole word is up to four moras long. In other words, the morpheme hon
preserves its underlying form /hon/ in four-mora or shorter words, while it
undergoes some phonological process characteristic of compounds in
words of five or more moras. Interestingly, essentially the same contrast
between words up to four moras and those composed of five or more moras
is observed in several other phonological processes independent of rendaku
voicing in Japanese. Let us first consider the process that Itô and Mester
(1996) called contraction in SJ compounds.

4.2. Contraction in SJ compounds

One type of SJ morpheme has a (C)VC structure with an optional onset.
Morphemes of this type can only take a voiceless obstruent, /t/ or /k/, in the
coda position, and exhibit two phonological patterns in SJ compounds, depending on the initial segment of the following morpheme. In many cases,
they undergo vowel epenthesis in order to avoid closed syllables or voiced
geminates. This is illustrated in (25), where < > and /./ denote an epenthetic
vowel and a syllable boundary, respectively.
(25) gak + bu → ga.k<u>.bu, *gak.bu, *gab.bu 'learning, part; faculty'
but + ri → bu.t<u>.ri, *but.ri, *bur.ri 'substance, law; physics'
On the other hand, (C)VC morphemes do not undergo epenthesis if the
following morpheme begins with a voiceless consonant. This is the pattern
that Itô and Mester (1996) termed contraction. The only minor change
that the morphemes in question undergo is place assimilation, whereby the
morpheme-final consonant becomes homorganic with the initial consonant
of the following morpheme.8 This is illustrated in (26).
(26) gak + kai → gak.kai, *ga.k<u>.kai 'learning, party; academic society'
but + si → bus.si, *bu.t<u>.si 'Buddha, teacher; a sculptor of Buddhist images'
gak + ka → gak.ka, *ga.k<u>.ka 'learning, department; department'
but + ka → buk.ka, *bu.t<u>.ka 'thing, price; commodity prices'
The contraction process in (26) has the effect of combining the two morphemes in a straightforward manner. This process, however, is blocked if
there is a word boundary between the two morphemes. In other words, the
contraction in (26) occurs only if the two adjacent morphemes form a constituent. This is illustrated in (27a), where the constituency is shown by [ ].
SJ compounds in (27b), in contrast, readily undergo the contraction since
they do not involve a word boundary between the second and third elements.
(27) a. [dai + but] + si → dai.bu.t<u>.si, *dai.bus.si
'great, Buddha, teacher; a sculptor of big Buddhist images'
[sin + gak] + ka → sin.ga.k<u>.ka, *sin.gak.ka
'god, learning, department; department of religion'
b. dai + [but + si] → dai.bus.si, *dai.bu.t<u>.si
'great, Buddha, teacher; a great sculptor of Buddhist images'
sin + [gak + ka] → sin.gak.ka, *sin.ga.k<u>.ka
'new, learning, department; a new department'
Itô and Mester (1996) interpret the constituency effect illustrated in (27) as
a constraint on the domain of the contraction process: Contraction occurs
within a PrWd, which consists of one or two morphemes. Since every SJ
morpheme is at most two moras long, this domain constraint can be reinterpreted as in (28).
(28) Contraction occurs in the domain of up to four moras.
Contraction has taken place in (26) and (27b) since the (C)VC morphemes
in question are embedded in a word of up to four moras. In (27a), by contrast,
(C)VC morphemes are combined with the following CV(C) morphemes in
a larger word. In terms of phonological length, this fact can be reduced to a
constraint requiring that the maximal domain of contraction be a constituent
consisting of four moras. In other words, two morphemes can be combined
without undergoing vowel epenthesis if they form a four-mora or shorter
word. This is precisely the same domain constraint that we saw for hon
above, which does not undergo the compound rule of rendaku voicing if it
is embedded in a four-mora or shorter word.
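The choice between epenthesis and contraction in (25) to (27) can be summarized in a few lines of code. The sketch below is a simplification under stated assumptions: the set of voiceless initials is limited to those attested in the examples, the epenthetic vowel is always written u (the only one shown in the text), the restriction on morpheme-final /k/ follows note 8, and the flag `same_word` (a hypothetical name) stands for the requirement that the two morphemes form a constituent of up to four moras as in (28).

```python
VOICELESS = set("kst")  # initials attested in (26) and note 8; an assumed, incomplete set


def combine(first, second, same_word=True):
    """Join a (C)VC morpheme ending in /t/ or /k/ with the following morpheme."""
    coda, onset = first[-1], second[0]
    if same_word and onset in VOICELESS:
        if coda == "t":
            # contraction with place assimilation: but + si -> bussi
            return first[:-1] + onset + second
        if coda == "k" and onset == "k":
            # contraction: gak + kai -> gakkai
            return first + second
    # vowel epenthesis elsewhere: gak + bu -> gakubu
    return first + "u" + second


print(combine("but", "si"))                           # as in (26)
print("dai" + combine("but", "si", same_word=False))  # as in (27a)
print("dai" + combine("but", "si"))                   # as in (27b)
```

The second call corresponds to [dai + but] + si, where but and si do not form a constituent, so the sketch falls back on epenthesis.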

4.3. /p/-/h/ alternation in SJ compounds

/p/-/h/ alternation in SJ compounds shows the same domain effect as the process of vowel epenthesis. This is illustrated with the morpheme hitu 'pencil'
here. It is generally assumed that the underlying form of this alternation is
/p/, which alternates with /h/ in a predictable way. Itô and Mester (1996)
showed that morphemes involving this alternation preserve the underlying
form with /p/ when they follow a morpheme ending in a moraic nasal /N/:
(29) eN + pitu → em-pitu 'lead, pen; pencil'
haN + patu → ham-patu 'opposite, start; rebel'
However, this does not happen in the two environments in (30) even if they
are preceded by a moraic nasal. In (30a), the /p/-morpheme is combined
with a SJ compound; in (30b), it forms a constituent with the following
morpheme before it does with the preceding one. In these two cases, /p/-initial morphemes do not keep their underlying /p/ and, hence, take /h/ instead ([ ] denotes a constituent):
(30) a. [maN + neN] + pitu → mannen-hitu, *mannem-pitu
'one thousand, year, pen; fountain pen'
b. siN + [patu + mei] → sin-hatumei, *sim-patumei
'new, invention; new invention'
Since SJ morphemes are maximally bimoraic, the boundary between the /p/
pattern in (29) and the /h/ pattern in (30) can be defined as follows:
(31) /p/ is preserved if it is in the non-initial position of words consisting
of up to four moras; otherwise, it is realized as /h/.
This domain effect is identical to the one we saw in the preceding section
as well as the ones we will see in the next sections.

4.4. Accent of mimetics

Japanese exhibits some accentual processes that are sensitive to the four-mora domain. One of them is the accentuation of reduplicative mimetic
expressions. The base form of Japanese mimetic expressions is largely bimoraic with accent on the initial syllable. When these bimoraic bases are
reduplicated to form four-mora words, they are usually accented on their
initial syllable. In other words, only the first member of the reduplicated
form preserves its accent. This is an accent pattern characteristic of reduplicative mimetics (32) and reduplicative nouns (33) as well as dvandva, i.e.
coordinate, compound nouns (34) (Nasu 2001).

(32) yura + yura → yura-yura '(to sway) gently'
suru + suru → suru-suru '(to climb) smoothly'
bata + bata → bata-bata '(to fall) noisily, one after another'
(33) mura + mura → mura-mura 'village, village; villages'
kazu + kazu → kazu-kazu 'number, number; in a great number'
(34) yoru + hiru → yoru-hiru 'night and day'
hiru + yoru → hiru-yoru 'day and night'
asa + ban → asa-ban 'morning and evening'

Accent deletion of the second member does not seem to occur, however, if
the bimoraic base is reduplicated after being combined with a mimetic ending such as /ri/ and the moraic nasal /N/.9
(35) yurari + yurari → yurari yurari '(to sway) in slow motion'
sururi + sururi → sururi sururi '(to dodge) swiftly'
bataN + bataN → bataN bataN 'thumpety thump'
The contrast between (32) and (35) indicates that four-mora mimetics constitute a prosodic word (PrWd), or one accentual unit, whereas six-mora
mimetics form two PrWds. This provides further support to the claim that
the maximal length of a PrWd is four moras.

4.5. Accent of numeral sequences

Similar accentual evidence can be found with the pronunciation of numeral
sequences. SJ morphemes for numbers are underlyingly monomoraic or bimoraic, but they are invariably pronounced with a bimoraic length when
enumerated in a string of numbers (Itô 1990). Thus, monomoraic morphemes, /ni/ 'two' and /go/ 'five', are pronounced with a long vowel: [ni:]
and [go:]. What is interesting here is that numeral sequences are divided
into prosodic words each consisting of two morphemes, or four moras. This
shows up very clearly in citing telephone numbers, as exemplified in (36).
{ } denotes the domain of PrWd, while H and L stand for high and low
tones, which are assigned to every mora here for the sake of description.

(36) a. 03-6825-7194 {reesan} {rokuhati} {niigoo} {nanaiti} {kyuuyon}
b. 721-2875 {nananii} {iti} {niihati} {nanagoo}
In (36b), the string of three numbers, 721, is realized in two PrWds, with the
first two numbers forming a four-mora unit, and the last number constituting
a separate PrWd. This clearly demonstrates that the optimal length of PrWds
is maximally four moras.
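The grouping in (36) can be reproduced mechanically. The Python sketch below is an illustration under two assumptions: the digit readings are the lengthened bimoraic forms that appear in (36), and within each hyphen-delimited block the digits are paired from left to right, with a leftover digit forming a PrWd of its own, as in (36b). The names `READINGS` and `prosodic_words` are hypothetical, and tones are not modelled.

```python
# Bimoraic readings as they appear in (36); monomoraic /ni/ and /go/ are lengthened.
READINGS = {
    "0": "ree", "1": "iti", "2": "nii", "3": "san", "4": "yon",
    "5": "goo", "6": "roku", "7": "nana", "8": "hati", "9": "kyuu",
}


def prosodic_words(number):
    """Split a hyphenated digit string into PrWds of at most two digits (four moras)."""
    words = []
    for block in number.split("-"):
        for i in range(0, len(block), 2):
            words.append("".join(READINGS[d] for d in block[i:i + 2]))
    return words


print(prosodic_words("03-6825-7194"))
print(prosodic_words("721-2875"))
```

For 721-2875 the sketch returns a four-mora unit for 72 followed by a separate two-mora unit for 1, in line with the description of (36b).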
Interestingly, the same maximality constraint operates in other dialects,
too. (37) shows how the string in (36b) is pronounced in Kinki (Kyoto/
Osaka) dialects (Fukui 1990). In fact, Tokyo and Kinki dialects differ only
in the tonal pattern of four-mora PrWds: four-mora strings are pronounced
with the tonal pattern of LHHL in Tokyo, and with the pattern of LLHL in
Kinki.10 Two-mora PrWds are pronounced with the original (or lexical)
accentual pattern of the relevant morpheme in both dialects.
(37) 721-2875 {nananii}{iti} {niihati} {nanagoo}
The facts in (36) and (37) clearly show that the maximal size of a PrWd is
four moras in number enumerations. Compound nouns can form a longer
PrWd, as exemplified in (19), but this is due to a morphological requirement demanding correspondence between morphological and prosodic
words (or edges). The facts discussed in this and the preceding sections
reveal an emergence of the unmarked, or an optimal phonological shape of
PrWds in Japanese.

4.6. Morphological evidence

Finally, morphological evidence reinforces our claim that the optimal form
of a PrWd in Japanese is up to four moras long and, hence, that the mora
condition on the voicing of /hon/ in (24) is of a rather general nature in
Japanese phonology. Let us consider truncation, first. One of the most basic
characteristics of loanword truncation in Japanese is that long words are
converted into four-mora or shorter forms: e.g. /irasutoreesyon/ → /irasuto/
'illustration' (Itô 1990). Truncation of compounds is subject to essentially
the same condition to yield four-mora outputs in most cases: e.g. /poketto
monsutaa/ → /pokemon/ 'pocket monster', or Pokémon. This process admits three-mora outputs in some contexts, but never permits five-mora or
longer outputs.
The same maximality constraint applies to other morphological processes
such as the formation of zuzya-go (a jazz musician's secret language).
Zuzya-go formation involves metathesis by which the final two moras in
the input are combined with the initial two moras to yield four-mora outputs. Here again, three-mora outputs are allowed in some contexts, but five-mora or longer outputs are absolutely illicit (Itô et al. 1996; Kubozono
2002b). This input-output correspondence, too, reveals a tremendous difference between structures with four and structures with five moras. All in
all, the fact that five-mora or longer outputs are never tolerated in these
morphological processes supports the idea that the optimal word form in
Japanese is up to four moras long.
There are, of course, quite a few words, mostly loanwords, that are morphologically simplex but phonologically longer than four moras: e.g.
/irasutoreesyon/ 'illustration', /animeesyon/ 'animation'. But there is
some accentual evidence suggesting that five-mora or longer loanwords are
processed as phonological compounds, i.e. that five-mora or longer words
are split into two four-mora or shorter substrings to which accent is assigned by the compound accent rule (Sato 2002; Kubozono 2002a). This,
too, lends support to the idea that PrWds in Japanese are optimally up to
four moras long.

5. Concluding remarks
In this paper I have first considered Fukuda and Fukuda's (1999) neurolinguistic data suggesting that rendaku voicing falls into two kinds: voicing in
some words is lexicalized, while voicing in other words is due to a productive synchronic process of voicing. In the rest of the paper, I have discussed
three constraints on rendaku voicing: an extended version of Lyman's Law,
the branching constraint and the mora constraint. These three constraints define the
domain in which the productive process of rendaku voicing occurs in contemporary Japanese. The extended version of Lyman's Law and the mora
constraint apply only to a specific type of compound noun, while the
branching constraint applies in a wider context. Despite this difference, all
these constraints represent quite general conditions on phonological and morphological processes in Japanese. In this sense, the constraints on rendaku
voicing should be interpreted in a wider context. These constraints, if examined in more detail, might uncover more interesting aspects and principles of (Japanese) phonology.

1. What counts here is the consonant in the preceding mora, not in the preceding
syllable. This is clearly shown by den and goo, which yield /den-da/ and /goo-da/, not /den-ta/ and /goo-ta/, respectively, even though they contain a voiced
obstruent (/d/ or /g/).
2. Sugito was mainly concerned with the relationship between the /ta/-/da/ distribution and the accentual pattern of the whole personal name. She found out
that three-mora names ending in /ta/ are usually accented on their initial mora
in Tokyo Japanese (e.g. siba-ta, kubo-ta), while those containing /da/, e.g.
/ima-da/ and /sima-da/, tend to be unaccented. This is an interesting fact that
needs to be explained.
3. We can add the word /kuro-da/ to Sugito's list of exceptions.
4. Node A c(onstituent)-commands node B if neither A nor B dominates the other
and the first branching node which dominates A dominates B (Reinhart 1976:
32). In the right-branching structure [[A][[B][C]]], [A] c-commands [B], but
[B] does not c-command [A] because [B] forms a constituency with [C] rather
than with [A]. In the left-branching structure [[[A][B]][C]], on the other hand,
both [B] and [C] c-command [A].
5. The morpheme hon 'book' should be clearly distinguished from the numeral
classifier hon which is used to count the number of objects such as fingers and
pencils (e.g. /go-hon no yubi/ 'five-hon Gen finger = five fingers'). This numeral morpheme alternates between three phonemic forms, /hon/, /bon/ and
/pon/, depending on the phonetic property of the immediately preceding sound
(Tanaka and Kubozono 1999).
6. An apparent exception to the generalization illustrated in (21) is the word /bini-bon/ 'vinyl book, a book enclosed in vinyl'. This particular instance will not
count as an exception since /bini-bon/ does not come directly from /bini-hon/,
but from /biniiru-bon/ via shortening: namely, /biniiru + hon/ → /biniiru-bon/ → /bini-bon/.
7. There are some compounds which contain the morpheme hon but have lost its
original meaning 'book', e.g. /mi-hon/ 'a sample for sale', /syoo-hon/ 'an extract', /hyoo-hon/ 'a sample'. Interestingly, these lexicalized compounds conform to the generalization in (21).
8. Contraction is generally blocked if the first morpheme ends in /k/. In this case,
vowel epenthesis instead of contraction occurs except when the second morpheme also begins with /k/. Thus, /hak+ti/ and /hak+sai/ undergo vowel epenthesis and turn into /hakuti/ 'imbecility' and /hakusai/ 'Chinese cabbage', respectively, while /hak+kyuu/ turns into /hakkyuu/ 'white ball'. The fact that
the morpheme-final /k/ blocks contraction reveals an interesting asymmetry
between /t+k/ and /k+t/, which is called 'coronal asymmetry' by Itô and
Mester (1996: 30). Thus, the former but not the latter triggers contraction: e.g.
/bet+kak/ → /bekkaku/ 'different style' vs. /hak+ti/ → /hakuti/ 'imbecility'. A
similar asymmetry is observed in the morphophonology of native verbs, where
a stem-final /k/ triggers vowel epenthesis rather than contraction when it is followed by a /t/-initial ending like the past marker /ta/. Thus, /kak + ta/ 'to write
(past)' turns into /kakita/ (and subsequently /kaita/), whereas /yor + ta/ 'to approach (past)' and /hasir + ta/ 'to run (past)' turn into /yotta/ and /hasitta/, respectively.
9. We occasionally observe reduplicative mimetic forms that are five moras long.
These five-mora forms seem to be split into two PrWds: e.g. {yura} {yurari}
'(to sway) gently'.
10. This tonal pattern is different from the typical pattern of nouns. In Tokyo
Japanese, nouns are usually accented, if accented at all, on the third mora from
the end of the word: the word Nagasaki, for example, is accented on /ga/ as
in /nagasaki/. However, the tonal pattern characteristic of numeral sequences
is also found in four-mora acronyms consisting of two letters. Thus, the
words for PC (personal computer) and OL (office lady) are pronounced
with an accent on the penultimate mora: /piisii/, /ooeru/. Alphabetic acronyms are different from numeral sequences, though, in that three-letter and
longer acronyms form one unified PrWd that is longer than four moras: e.g.
PTA /piitiiee/, IBM /aibiiemu/, YMCA /waiemusiiee/.

Sequential voicing, postnasal voicing, and Lyman's Law revisited
Keren Rice

Japanese exhibits several processes that involve voicing. In this article I
examine three of these: rendaku, Lyman's Law, and post-nasal voicing.
Rendaku and post-nasal voicing are generally considered together under the
rubric of sequential voicing (e.g., Martin 1952; Vance 1987, 1996). Martin
(1952: 48) defines sequential voicing, or voicing alternations, as the replacement of a voiceless consonant with the corresponding voiced consonant
and Vance (1987: 133) gives a similar definition, saying that sequential
voicing or rendaku refers to the replacement of a morpheme-initial voiceless obstruent with a voiced obstruent. These definitions do not consider
the environment for voicing, nor the nature of the element that triggers
voicing, but simply view voicing as a substitution of a voiced obstruent for a
voiceless one. In the theoretical literature on voicing alternations in Japanese,
it has often been assumed that the voicing feature that is active in sequential
voicing is of a single type; see, for instance, Itô and Mester (1986, 1995a,
1999a), Itô, Mester, and Padgett (1995, 1999), Calabrese (1995), Labrune
(1999), and Clements (2001) for some recent treatments. Others have argued
that two types of voicing features exist phonologically in Japanese; see
Rice (1993), Avery and Idsardi (2002), and Kuroda (2002). In the first part
of this article, working within a representational framework, I address this
issue, arguing that two types of phonological voicing are needed in Japanese,
what I will call laryngeal voicing (LV) and what has been identified as
sonorant voicing, or SV (e.g., Avery 1996; Piggott 1992; Rice 1993; Rice
and Avery 1991). I argue that LV is the feature involved in what I will call
rendaku, to be further defined below, and SV the feature involved in post-nasal voicing in a non-rendaku environment. My concern in this article is
with representations more than grammar, and I leave open exactly how the
grammar is to be formalized.
While the identification of two types of voicing accounts for one set of
problems in the phonology of Japanese, namely the different patterning of
voiced obstruents and nasals with respect to rendaku and Lyman's Law, the
solution that I offer runs counter to the claim that the lexicon of Japanese is
synchronically stratified, with a constraint against post-nasal voiceless obstruents holding of the Yamato, or native, stratum of Japanese and not in
the Sino-Japanese part of the lexicon.1 In the second part of the article, I
review the problems with stratification between the Yamato and Sino-Japanese vocabularies based on the proposed constraint that the Yamato
stratum of the lexicon requires that post-nasal obstruents be voiced while
the Sino-Japanese stratum does not have this restriction, extending the discussion in Rice (1997).

1. Some background
Itô and Mester (1986) pose a conundrum in Japanese. First consider the
much-studied process of rendaku, or sequential voicing. Several examples
are given in (1).2 I use the Romanization used in the source.

(1) a. voicing of initial consonant when no consonants follow
take + sao → take-zao 'bamboo pole'
bamboo pole
(Vance 1987: 133)
kan + sya → kan-zya
illness person
(Labrune 1999: 123)
b. voicing of initial consonant when voiceless obstruent follows
de + kuči → de-guči
leave mouth
(Itô and Mester 1986: 52)
c. voicing of initial consonant when sonorant follows
iši + tooroo → iši-dooroo 'stone lantern'
stone lantern
(Vance 1987: 133)
ike + hana → ike-bana
arrange flower
(Itô and Mester 1986: 53)
hon + tana → hon-dana 'book shelf'
book shelf
(Labrune 1999: 123)
d. no voicing of initial consonant when voiced obstruent follows
doku + tokage → doku-tokage 'poisonous lizard, Gila monster'
poison lizard
(Vance 1987: 137)
hyootan + kago → hyootan-kago
gourd basket
(Itô and Mester 1986: 69)

As the examples in (1) illustrate, rendaku fails to apply just in the case that
a voiced obstruent follows within the morpheme that is the target of voicing.
This blocking of rendaku by a voiced obstruent is conditioned by a process
known as Lyman's Law; see, for instance, Martin (1952), McCawley (1966),
Vance (1987), Itô and Mester (1986, 1995a, 2003), and Fukazawa and
Kitahara (2001) for discussion. Lyman's Law disallows more than one
voiced obstruent within a morpheme. This constraint holds whether both
obstruents are lexically contained within a single morpheme (see (2)) or
one would be derived through rendaku (as in (1d)).

(2) Lyman's Law: morpheme internal
futa 'lid'
fuda 'sign'
buta 'pig'
(Itô and Mester 1995a: 819)
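The interaction between rendaku and Lyman's Law illustrated in (1) can be sketched in a few lines of Python. This is an illustration rather than an analysis: it assumes a plain romanization, covers only the voiceless-voiced pairs that occur in (1), and treats rendaku as applying in every compound unless a voiced obstruent follows within the second member, which sets aside lexical exceptions. The names used are hypothetical.

```python
VOICING = {"k": "g", "s": "z", "t": "d", "h": "b"}  # pairs attested in (1)
VOICED_OBSTRUENTS = set("bdgz")


def rendaku(first, second):
    """Voice the initial obstruent of `second` unless Lyman's Law blocks it."""
    initial = second[0]
    # Lyman's Law: a voiced obstruent later in the morpheme blocks voicing.
    blocked = any(seg in VOICED_OBSTRUENTS for seg in second[1:])
    if initial in VOICING and not blocked:
        second = VOICING[initial] + second[1:]
    return first + "-" + second


print(rendaku("take", "sao"))
print(rendaku("hon", "tana"))
print(rendaku("doku", "tokage"))
```

Nasals and other sonorants are deliberately absent from `VOICED_OBSTRUENTS`, so a form like hon + tana still undergoes voicing, as in (1c).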

Itô and Mester (1986) treat Lyman's Law as an OCP effect against two
voiced segments within a particular domain. If the feature [voice] is contrastive for obstruents and redundant for sonorants, the exclusion of sonorants as blockers is easily explained.
The term sequential voicing is also used to describe voicing that is
found in a post-nasal environment; see, for instance, Martin (1952), Itô and
Mester (1986), and Vance (1987, 1996). I refer to this as post-nasal voicing. An example is given in (3), showing alternations in form of the past
tense morpheme.

(3) post-nasal voicing: morphologically derived environments

look at, past
(Vance 1987: 176)
die, past
(Vance 1987: 177)

Itô and Mester (1995a, 2003) and Itô, Mester and Padgett (1995) also propose that post-nasal voicing is at play within morphemes. They argue that
within the Yamato, or native, vocabulary of Japanese, voicing is redundant
on post-nasal obstruents, as in (4).

(4) post-nasal voicing: within a morpheme


(Itô and Mester 1995a: 819)

(Itô and Mester 1995a: 819)

The facts presented in this section leave us with the conundrum mentioned
at the start: the voicing illustrated in (1) is blocked by Lyman's Law and
requires that voicing be absent from nasals (1c), but the process shown in
(4), post-nasal voicing, requires the presence of voicing on nasals to trigger
voicing assimilation. Itô and Mester (1986) lay out a variety of solutions,
and suggest that rule ordering might offer the best solution, while Itô,
Mester, and Padgett (1995) use antagonistic constraints to account for the
facts; Clements (2001) offers an alternative analysis using a single voicing
feature, as does Calabrese (1995).
In this article, I take up my earlier proposal (Rice 1993, 1997; see also
Avery and Idsardi 2002; Ohno this volume), that the voicing that underlies
rendaku and the voicing that underlies post-nasal voicing are different in
nature, with one being laryngeal voice and the other sonorant voice. I refer
to this as the dual mechanism hypothesis, and to the alternative as the single
mechanism hypothesis. I begin by briefly summarizing the account of
rendaku and then examine post-nasal voicing.

2. Rendaku
It will be useful to begin by defining some terms. I use the term sequential
voicing to mean the overall effects discussed in section 1, namely the substitution of a voiced obstruent for a voiceless one without regard for details.
I use the term rendaku or rendaku element in a very narrow sense, to
refer to a voicing feature that functions as a compound formative (see Itô
and Mester 1986, for instance); it is this feature that causes voicing in compounds. Finally, I use the term post-nasal voicing to refer to the voicing
caused by a nasal in the absence of the rendaku element, as in (4).
With respect to rendaku, then, I assume an analysis like that proposed
by Itô and Mester (1986): the voicing feature, i.e., the rendaku element, is
part of a segment that occurs in some compounds (to be elaborated below)
and provides voicing to the initial obstruent of the second member of the
compound. Following the standard analysis, the association of the rendaku
element is blocked by the occurrence of a voiced obstruent later in the morpheme. I refer to the feature involved as LV; representationally, I assume
that it is dominated by a root node; see also Labrune (1999). Other than
calling the feature LV rather than [voice], this analysis follows Itô and
Mester (1986); see Labrune (1999) and Clements (2001) for alternative analyses.

Sequential voicing, postnasal voicing, and Lymans Law revisited



3. Post-nasal voicing

I examine post-nasal voicing in two environments, between morphemes
(section 3.1) and within a morpheme (section 3.2).

3.1. Post-nasal voicing between morphemes

Post-nasal voicing between morphemes is illustrated in (3). What is the
source of post-nasal voicing? In Rice (1993) I argue that post-nasal voicing
in a morphologically derived environment in Japanese is a consequence of a
second type of voicing, sonorant voicing (SV). Under the assumption that SV is
inherent in nasals, post-nasal voicing in morphologically derived forms
results from the sharing of this feature by the nasal at the end of one morpheme and an adjacent obstruent at the beginning of the following morpheme. I made this proposal in order to
account for the fact that nasals are transparent with respect to rendaku voicing but share their voicing with a following obstruent, arguing that two
types of voicing are required to account for this chameleon-like patterning
of nasals. If this analysis is correct, what would be expected if underlying
obstruent voicing (LV) and post-nasal voicing (SV) were to interact?
Assuming these two types of voicing, one would expect that Lyman's
Law would not block voicing when the targeted morpheme contains an LV
segment and sequential voicing is a consequence of post-nasal voicing, or
the feature SV. Under the single mechanism hypothesis, on the other hand,
with only a single type of voicing, both underlying voiced obstruents and
voiced obstruents derived from post-nasal voicing involve the same type of
voicing, and one would expect that Lyman's Law would block post-nasal
voicing if a voiced obstruent is already present in the morpheme.3
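
The diverging predictions of the two hypotheses can be laid out as a small decision function. This is only an illustrative sketch of the reasoning in this section, not a formalization taken from any of the works cited; the names `lyman_blocks`, `hypothesis`, and `trigger` are assumptions introduced here.

```python
def lyman_blocks(hypothesis, trigger, target_has_voiced_obstruent):
    """Predict whether Lyman's Law blocks voicing of a morpheme-initial obstruent.

    hypothesis: "single" (one voicing feature) or "dual" (LV vs. SV)
    trigger:    "rendaku" (the compound formative) or "nasal" (a preceding nasal)
    target_has_voiced_obstruent: True if the target morpheme already
        contains a voiced obstruent
    """
    if hypothesis not in ("single", "dual"):
        raise ValueError("hypothesis must be 'single' or 'dual'")
    if trigger not in ("rendaku", "nasal"):
        raise ValueError("trigger must be 'rendaku' or 'nasal'")
    if not target_has_voiced_obstruent:
        # Nothing in the target for Lyman's Law to react to.
        return False
    if hypothesis == "single":
        # One voicing feature: any voiced obstruent blocks any added voicing.
        return True
    # Dual mechanism: only LV (the rendaku element) is subject to Lyman's Law;
    # SV shared from a nasal is not.
    return trigger == "rendaku"
```

The two hypotheses agree everywhere except in the case where a nasal is the trigger and the target morpheme already contains a voiced obstruent, which is the case the verb compounds discussed below are meant to test.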
Recall that sequential voicing (in the general sense) affects the first consonant of the second element of a compound. In examples of sequential
voicing, noun compounds where the first item ends in a vowel are usually
used, and examples of post-nasal voicing are discussed separately (e.g.,
Martin 1952; Vance 1987). Itô and Mester (1986) supply one example of
Lyman's Law at work when the first noun of a noun-noun compound ends
in a nasal, (1d). I now examine compounds of other types than noun-noun
to see what is found there.
Vance (1987: 144) and Vance (this volume), in a discussion of compounds of other types (1987: 142–146), observe that sequential voicing
does not usually apply in compounds where both members are verbs, and
semantic and phonological conditions hold such that it sometimes applies
in direct object-verb compounds and sometimes not; see also Labrune
(1999), among others. Vance further recognizes a source for rendaku voicing, proposing that it comes from a reduced form of the genitive particle no
(Vance 1982, 1987: 136; also Labrune 1999) that occurred between nouns;
thus verb compounds would be excluded from being affected by sequential
voicing due to the absence of this particle. Based on Vance's discussion,
and on the examples used to illustrate rendaku, it appears that the rendaku
element is not found in all compound types. Without going into detail, the
rendaku element is not found in most compounds involving verbs as a second member. Assuming then that the rendaku morpheme is present in some
noun compounds but that compounds with a verb as the second member (I
will call these verb compounds) do not contain this element, it might be
possible to sort out whether the rendaku element and post-nasal voicing
have the same feature through an examination of verb compounds.
First consider cases where the rendaku element is present and the second element of the compound contains a voiced obstruent. In the rendaku
environment, rendaku LV is not licensed as it would lead to a violation of
Lyman's Law, as discussed above. The example in (5), repeated from (1d),
illustrates this.

(5) hyootan + kago   'gourd basket'   (Itô and Mester 1986: 69)

In this case, the initial obstruent of the second member of the noun compound (k) fails to voice due to Lyman's Law; the final nasal of the first
member of the compound cannot trigger voicing of the /k/ because it is not
adjacent, as illustrated in (6).





When the rendaku element is present, then, it takes precedence over a nasal
in terms of being a trigger for voicing on the initial consonant of the second
element of the compound. In (6), Lyman's Law blocks the association of
the rendaku LV to /k/, producing the unvoiced form. Nasal-final and vowel-final first elements of a compound pattern together, as it is the rendaku LV
and not the nasal SV that has the opportunity to be implemented here.

Consider now forms with verbs as the second member. The prefixed forms
in (7) are verbs which contain an initial element with a final nasal and a
second element with an initial obstruent. The first morpheme is identified
by Itô and Mester (1999a: 68) and Vance (this volume) as fum- 'to step on'
(and hence these forms are treated as compounds) and by Vance (1987:
137) as fuN-, an unproductive emphatic prefix.

(7) Post-nasal voicing in verb compounds

tsukeru 'attach'   fun-dzukeru (*fun-tsukeru) 'trample on'   (Itô and Mester 1999a: 68)
'give up; take decisive action'   (Itô and Mester 1999a: 68; Vance this volume)

In these forms, the post-nasal obstruent voices. This can be accounted for
by either the single voicing mechanism or the dual voicing mechanism hypothesis; under the single mechanism hypothesis, voicing on the initial
consonant of the verb would be triggered not only by the rendaku element
but also by a nasal (the same is true of forms with the verb suffix in (3));
under the dual mechanism hypothesis, rendaku in (1) is triggered by the LV
of the rendaku element and post-nasal voicing in (3) and (7) by the SV of
the final nasal of the first morpheme. In the environment where the rendaku
element is missing, nasal-final and vowel-final first elements do not pattern
together: the final nasal of the first element causes post-nasal voicing, but if
these same verb stems follow a vowel-final morpheme, no voicing occurs;
see (11) below.
Some additional examples of verb compounds are given in (8) and (9).
These items have in common that their second element is suru, a verb
meaning 'do' (Martin 1952: 49). When suru follows an element ending in a
nasal, its initial is generally voiced (8); when it follows an element ending
in a vowel, its initial is generally voiceless (9); there are lexical exceptions
to both statements.4

(8) Voicing of the initial consonant of suru 'do' following a nasal

a. karon-zuru   'to esteem, treat lightly'   (Martin 1952: 51; formal register, Bill Poser, p.c., June 2002)
   cf. karu(o)-si, adjective form   (Kazutoshi Ohno, p.c., August 2002)
b. omon-zuru   'to esteem, treat highly'   (Martin 1952: 51; formal register, Bill Poser, p.c., June 2002)
c. kin-zuru   (Vance 1986: 139, formal register)   [kin-jiru (normal register)]
d. uton-zuru   'is indifferent, neglects'   (Martin 1952: 51)
   uto-   'distant, estranged'   (Martin 1952: 51)
e. sakin-zuru   'ahead' + n- intensive

(9) No voicing of the initial consonant of suru following a vowel

a. kae-suru   'to make change, to exchange'   (Parker 1939: 115)
b. ai-suru   'to love'   (Manami Hirayama, p.c.)
c. maru.arai-suru   'to do all the washing' ([all.washing]-do)   (Kazutoshi Ohno, p.c., August 2002)
d. sanpo-suru   'to take a walk'   (Poser 2002: 3)
e. tatiuti-suru   'to cross swords'   (Poser 2002: 3)

Martin (1952: 49) comments on compounds with suru, remarking that the
form usually begins with the voiceless obstruent; it is voiced following some
long vowels. Vance (1987: 140) identifies these long vowels as coming from
vowel-nasal sequences. Martin (1952: 50) further notes that the majority of
S[ino] morphs ending in n which occur in this sort of compound are attached
to the alternant -zu.ru rather than -su.ru. Vance (1987: 140) echoes this
observation, pointing out that there are counterexamples to the tendency to
voice following a nasal, but it seems to reflect a very old pattern. The overall tendency with suru, then, is that its initial consonant voices after a nasal,
but not after a vowel. This can be accounted for if these compounds do not
involve the rendaku element, but simply show post-nasal voicing. Either
hypothesis could account for these forms.
Other compounds show similar patterning. The examples in (10) should
be compared with those in (7). (10a, b) illustrate initial voicing of the verb
kiru 'cut' as the second element of a compound after a nasal (10a), but not
after a vowel (10b); (10c) shows the verb tsukeru 'attach, add' after a
vowel. Some of these forms contain the morpheme fum shown in (7), here
in its continuative form fum-i (Vance, this volume). In the second example
in (10a), the second part of the compound is a deverbal noun (Vance 1987:
145); this is also the case in the third example in (10b).

(10) a. voicing after a nasal
        fuN-giru   'give up'   (Itô and Mester 1999a: 68)
        mijin-giri   'mincing'   (Vance 1987: 146)
           'bit' + kiru 'cut'
     b. no voicing after a vowel
        fumi-kiru   'to step out of bounds' (cf. fuN-giru)   (Parker 1939: 39)
        kami-kiru   'to bite off'   (Manami Hirayama, personal communication, July 2002)
        garasu-kiri   'glass cutter'   (Vance 1987: 146)
           'glass' + kiru 'cut'
     c. no voicing after a vowel
        fumi-tsukeru   'to trample (something) under foot'   (Manami Hirayama, p.c., July 2002)
        cf. fun-dzukeru in (7)
In the forms in (10), the initial consonant of the second element is voiced
when it follows a nasal, but not when it follows a vowel. (Note that Vance
1987 discusses the pair 'mincing' and 'glass cutter' as examples where
accent and the grammatical relationship between the two pieces are the
major determinants in whether post-nasal voicing occurs. Clearly more work
is necessary to sort out the various phonological, syntactic, and semantic
factors involved. Vance (1987: 40–41) points out that while a preceding
nasal favoured the development of sequential voicing in Sino-Japanese, the
correlation is not perfect; counterexamples exist in both directions: both
nasal-voiceless obstruent and vowel-voiced obstruent sequences also exist.
See also Ohno (2002) for discussion. As mentioned in note 2, there are
lexical exceptions where the patterns discussed here simply do not hold,
and lexical listing is required.)
Now consider another situation, namely when the second morpheme in
a verb compound begins with a voiceless obstruent, and a voiced obstruent
follows later in the word, i.e., the environment in which Lyman's Law
might be expected to operate. Under this condition, the initial consonant of
the second morpheme still voices, as in (11).
(11) a. voicing following a nasal
        'tie up, immobilize'   (Vance 1987: 137; Itô and Mester 1999a: 68)
     b. no voicing following a vowel
        kui-šibaru   'clench one's teeth'   (Manami Hirayama, p.c., July 2002)
           'to eat'


In this case, post-nasal voicing occurs despite the presence of the voiced
obstruent later in the word, unlike in (1), where rendaku is blocked in similar
circumstances. (Note that (11) is reported to be the only example of its type
found in Japanese.)
At this point, I have argued the following. First, the rendaku morpheme,
a compound formative, occurs in compounds such as those in (1), but is not
found in the verb compounds in examples like (7) through (11), nor in affixation environments as in (3). When the rendaku element appears, it surfaces so long as its realization does not lead to a violation of Lyman's Law
(1d). Because this morpheme does not occur in all morphological concatenations, we can look to verb compounds and affixation structures to see
what the effect of the preceding segment is on the initial consonant of the
second morpheme in the absence of the rendaku element. In this environment (7, 8, 10), post-nasal voicing is found, and this voicing is not blocked
by Lyman's Law (11a). These facts are difficult to account for under the
single voicing mechanism hypothesis: why would realization of the voicing
from the rendaku element be blocked by the presence of a voiced obstruent
in the second morpheme (1d), but voicing from a nasal be allowed (11a)?
The dual mechanism hypothesis renders such forms explicable: the voicing
triggered by the nasal and the voicing of the morpheme-internal obstruent
have different sources, and there is no violation of Lyman's Law since
there is but a single laryngeally voiced obstruent present.
To summarize, I have proposed that in studying sequential voicing in
Japanese, one must sort out two things that are often conflated. First, what I
call rendaku is restricted to the voicing triggered by a compound formative.
This compound formative has the feature LV, and the surface implementation of LV is blocked by the presence of a voiced (LV) obstruent later in
the target morpheme. This compound formative is found in noun-noun
compounds (with exceptions, both lexical and principled), but does not
generally occur in compounds headed by a verb, nor in affixation structures. In the non-rendaku environment, nasals generally trigger voicing of
the initial consonant of the second morpheme. Voicing triggered by the
rendaku element interacts with Lyman's Law, but post-nasal voicing does
not, creating apparent violations of Lyman's Law. My conclusion is that
post-nasal voicing is triggered by SV, while rendaku voicing is LV. See
also Pater (1999: 332–334), Rice (1993), Steriade (1995: 185), and Ohno
(this volume), among others, for discussion.

3.2. Post-nasal voicing within morphemes

So far we have seen post-nasal voicing between morphemes. Itô and Mester
(1986, 1995a, 1999a, 2003) and Itô, Mester, and Padgett (1995) also argue
that post-nasal voicing is found within morphemes, and they state this as a
constraint, *NT. Examples are given in (4).
This claim is falsified when one considers the Japanese lexicon as a
whole, as Itô and Mester (1995a, 1999a) recognize: while in many cases a
voiced obstruent follows a nasal (4), there are also nasal-voiceless obstruent
sequences, as in (12).5
(12) N-T (Sino-Japanese vocabulary)

(Itô and Mester 1995a: 819)
(Itô and Mester 1995a: 819)
(Itô and Mester 1999a: 71)
(Itô and Mester 1999a: 71)

Such words can be found in the rendaku environment, where rendaku occurs.
(13) fuufu + keNka → fuufu-geNka   'domestic quarrel'   (Vance 1987: 114)
        'husband & wife' 'quarrel'
     onna + teNka → onna-deNka   'petticoat government'   (Itô and Mester 1999a: 70)
        'woman' 'empire'

Itô, Mester, and Padgett (1995) set aside this class of words containing NT
sequences, and argue that post-nasal voicing, in the form of a constraint *NT,
is a constraint only within the Yamato vocabulary; with Sino-Japanese (12)
and other vocabulary, NT sequences are allowed. Morphemes such as those
in (12) are Sino-Japanese, so the constraint *NT does not hold.
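
The stratum-specific reading of *NT can be sketched as a simple check over romanized forms. The sketch below is illustrative only and is not taken from the works cited: the function names `violates_star_nt` and `is_licit`, the stratum labels, and the simplified segment classes (single-letter romanization, with N for the moraic nasal) are assumptions made here.

```python
NASALS = {"N", "n", "m"}
VOICELESS_OBSTRUENTS = {"p", "t", "k", "s"}  # simplified; digraphs are not handled


def violates_star_nt(form):
    """Return True if a nasal is immediately followed by a voiceless obstruent."""
    return any(a in NASALS and b in VOICELESS_OBSTRUENTS
               for a, b in zip(form, form[1:]))


def is_licit(form, stratum):
    """Evaluate *NT under the stratification assumption.

    The constraint is taken to hold only of the Yamato stratum; in other
    strata NT sequences are allowed.
    """
    if stratum == "Yamato":
        return not violates_star_nt(form)
    return True
```

On this view the second members in (13), keNka and teNka, contain NT sequences and are licit only because they are assigned to the Sino-Japanese stratum; the check itself has no way of assigning a form to a stratum, which is the point taken up in the discussion of learnability.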

4. Sequential voicing and within-morpheme post-nasal voicing

The claim that tautomorphemic post-nasal voicing is predictable is problematic for both the single mechanism and dual mechanism hypotheses. If post-nasal voicing is due to SV voicing, as under the dual mechanism hypothesis,
one would expect that a post-nasal voiced obstruent would be transparent
with respect to sequential voicing in the presence of the rendaku element,
as in (1b). If, on the other hand, post-nasal voicing is due to redundant LV
voicing, as under the single mechanism hypothesis, that voicing should not
be present at the time that rendaku voicing takes place. See Itô and Mester
(1986) and Itô, Mester, and Padgett (1995) for extensive discussion. Both
hypotheses thus predict that post-nasal voicing should not block voicing
triggered by the rendaku element. However, post-nasal tautomorphemic
voiced obstruents are not transparent with respect to Lyman's Law, as one
might expect, but instead are blockers of rendaku, as in (14).
(14) širooto + kaNgae → širooto-kaNgae   'layman's idea'   (Itô and Mester 1995a: 576)
     aka + tombo   'red dragonfly'   (Kawasaki 1996: 4)
        'red' 'dragonfly'

Within-morpheme post-nasal voicing is visible with respect to Lyman's Law,
serving to block rendaku, unlike derived environment post-nasal voicing
(11), where Lyman's Law does not block voicing of a morpheme-initial obstruent.
Let me summarize what we have seen so far.
(15) Lyman's Law blocks rendaku in noun compounds (1d)
     Lyman's Law does not block post-nasal voicing in a morphologically derived environment in verb compounds (11) or in affixation structures (3)
     Lyman's Law blocks rendaku if the target morpheme contains an ND sequence (14)
The dual mechanism account given so far, that rendaku involves LV and
post-nasal voicing SV, handles the between-morpheme facts. Morpheme-internal ND sequences are problematic, however, under the assumption that
the voicing on these post-nasal obstruents is predictable and that the voicing is SV, and I now turn to these forms.

5. On the representation of tautomorphemic NC clusters

There are three recent accounts of tautomorphemic NC clusters that treat
post-nasal voiced obstruents as underlyingly voiced rather than voiceless:
Avery and Idsardi (2002), Kuroda (2002), and Rice (1993). In Rice (1997),
I argue that post-nasal voicing is not redundant, abandoning Itô and
Mester's *NT constraint. Instead, I propose that Japanese does not provide
appropriate cues to stratify the lexicon into Yamato and Sino-Japanese
vocabulary based on this constraint, and that voicing, namely LV, is contrastive in post-nasal position. If LV is distinctive in this position, then the
surface facts are as expected: LV blocks Lyman's Law and SV is transparent with respect to Lyman's Law.7
Avery and Idsardi (2002) pursue the line of thinking that voicing is
contrastive after a nasal. They further examine another constraint proposed
by Itô and Mester. Itô and Mester (1995a: 821–822) identify two
constraints that are relevant to the representation of nasal-obstruent
clusters. First is the now familiar *NT, and second is a constraint against
voiced geminates, *DD. They link these constraints with stratification as in (16).
(16) Yamato
Avery and Idsardi (2002) pick up on the constraint *DD and propose that
the underlying clusters allowed in the Yamato vocabulary are the following:
(17) TT


TT clusters are realized as voiceless geminates, and DD clusters are realized
as prenasalized stops in standard Japanese. This account explains the lack
of contrast between DD and ND clusters and at the same time allows a uniform account of the interaction between ND clusters and rendaku: voicing
is distinctive rather than redundant on this D, and thus serves to trigger
Lyman's Law, blocking rendaku. (See also Hamada (1952) and Ohno (this
volume), among others, for discussion of the historical development of
Japanese. Ohno, like Avery and Idsardi, argues that what I have called post-nasal voicing is historically pre-voiced obstruent nasalization.)
By this account then, the Yamato and Sino-Japanese vocabulary allow
the following underlying clusters.8
(18) Yamato




As Avery and Idsardi point out, the difference between the lexical strata
concerns the distribution of N before a consonant.


I do not try to decide between the two accounts here; critically in both cases
post-nasal obstruent voicing is non-contrastive between morphemes but
contrastive morpheme internally. Instead I turn next to the issue of stratification between the Yamato and the Sino-Japanese vocabulary with respect
to the constraint *NT. These two accounts converge on the point that voicing is distinctive in tautomorphemic clusters. Either treatment accounts for
the range of patterns found in the language, as summarized in (19).
(19) observation: Lyman's Law blocks rendaku when a singleton voiced obstruent follows (1d)
        Lyman's Law [LV from the rendaku morpheme cannot be realized because of following LV]
     observation: Lyman's Law does not block rendaku when a nasal follows (1c)
        nasals have SV [LV from the rendaku morpheme can be realized because no LV segment follows]
     observation: Lyman's Law does not block post-nasal voicing between morphemes [derived environment post-nasal voicing in a non-rendaku environment] (11)
        Derived environment post-nasal voicing is marked by SV, and thus Lyman's Law is not violated [realization of SV from nasal is not blocked by later LV]
     observation: Lyman's Law blocks rendaku when tautomorphemic ND follows (14)
        Voicing is distinctive in tautomorphemic post-nasal obstruents [LV from the rendaku morpheme cannot be realized because of following LV]
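
The four observations in (19) can be put together in a short derivational sketch. It is meant only as an illustration of the dual mechanism account, under several simplifying assumptions that are mine rather than the sources': forms are single-letter romanizations (N is the moraic nasal; digraphs such as ts are not handled), every voiced obstruent inside a morpheme is treated as carrying LV, and the presence of the rendaku element is supplied as a flag rather than predicted from the compound type. The names `has_lv`, `voice_initial`, and `derive` are hypothetical.

```python
VOICED = {"k": "g", "t": "d", "s": "z", "š": "j"}  # voiceless -> voiced obstruent
LV_OBSTRUENTS = set("gdzbj")  # voiced obstruents inside a morpheme carry LV
NASALS = set("mnN")           # nasals carry SV only, never LV


def has_lv(morpheme):
    """True if the morpheme contains a laryngeally voiced (LV) obstruent."""
    return any(seg in LV_OBSTRUENTS for seg in morpheme)


def voice_initial(morpheme):
    """Voice the morpheme-initial obstruent, if it has a voiced counterpart."""
    return VOICED.get(morpheme[0], morpheme[0]) + morpheme[1:]


def derive(first, second, rendaku):
    """Combine two morphemes under the dual mechanism account.

    rendaku=True:  the compound formative supplies LV; Lyman's Law blocks
                   it if the second morpheme already contains LV. A final
                   nasal in the first morpheme plays no role.
    rendaku=False: a final nasal shares its SV with the following obstruent;
                   LV later in the second morpheme is irrelevant.
    """
    if rendaku:
        target = second if has_lv(second) else voice_initial(second)
    elif first[-1] in NASALS:
        target = voice_initial(second)
    else:
        target = second
    return first + "-" + target
```

With the rendaku flag set, the sketch leaves hyootan + kago (5) and širooto + kaNgae (14) unvoiced but voices fuufu + keNka (13), where the target contains only a nasal. With the flag unset, it voices kiru after fuN but not after fumi (10), and it voices the stem of (11b) after a nasal-final first member despite the later voiced obstruent, which is the pattern described for (11a).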
6. On stratification in the Japanese lexicon
While stratification and post-nasal voicing are logically independent of one
another, it is nevertheless worth pursuing whether the stratification analysis,
that *NT holds of Yamato but not of Sino-Japanese vocabulary, is reasonable to maintain. In this section I examine why one might choose to abandon
stratification with respect to the properties discussed here.
That the Japanese lexicon is stratified is a general assumption in the
literature. Martin (1952) divides the lexicon into three groups, Native,
Sino-Japanese and Onomatopoeia, and Foreign, as does McCawley (1968).

Martin (1952: 9), in an interesting discussion about his purpose, states that
this is the first attempt to make a systematic study of Japanese morphophonemics on a synchronic level. He argues that a study of compounds
shows a definite cleavage of morphs into two classes, here called class S
(for Sino-Japanese, the historical original of the class) and class Y (for
Yamato, or native Japanese, the presumed original of most members of the
class). There are numerous hybrid compounds, to be sure; but on the basis
of selectivity within immediate constituents which contain only two morphs,
for the overwhelming majority of cases, each morph and morph group may
be placed clearly in one of the two classes (24). In a comparison between
Native and Sino-Japanese morphemes, McCawley (1968: 64) states about
the Sino-Japanese morphemes that they are borrowed from Chinese in
medieval times and which function in Japanese chiefly as elements of compounds which usually have a somewhat learned flavor; their role in Japanese
is much like that of the Latin and Greek morphemes found in the learned
vocabulary of English. Since Sino-Japanese morphemes are syntactically
distinct from the other morphemes of Japanese in that they and only they
are the bound morphemes from which two-element compounds such as
they above are formed, the syntactic information in the dictionary entry of a
Japanese morpheme must indicate (directly or indirectly) whether the morpheme is Sino-Japanese or not. McCawley shows that Sino-Japanese and
native morphemes have a slightly different vowel inventory (Cyu and Cyo
are excluded in the native vocabulary but not in the Sino-Japanese vocabulary); Sino-Japanese items obligatorily have no fewer than two nor more
than four mora, as also discussed by Itô and Mester (1995a, 2003). While
McCawley differentiates Sino-Japanese and Native vocabularies, he has a
number of rules marked [-foreign], including Native and Sino-Japanese
together, but only one marked [+native] (restrictions on diphthongs) and
one marked [+Sino] (a deletion/epenthesis rule).
Looking at the distribution of obstruent voicing following a nasal, the
following divisions into strata then are proposed by Itô and Mester (1995a,
post-nasal voicing between morphemes
post-nasal voicing within morphemes



I consider three issues: learnability, a comparison with the English Germanic/Latinate split (cited by McCawley 1968; Itô and Mester 1995a, 1996, 1999a),
and the writing system.


6.1. Learnability issues

So far we have seen the following. In the surface lexicon of Japanese, NT
and ND contrast tautomorphemically; heteromorphemically ND is generally found (see Martin 1952 and Vance 1987 for discussion of exceptions)
except in the rendaku environment where Lyman's Law blocks it (hyootan
kago 'gourd basket'; Itô and Mester 1986: 61–70).9 Tautomorphemic ND
clusters block sequential voicing triggered by the rendaku element (14),
while NT clusters allow sequential voicing triggered by this element (13).
In this section, I consider these facts with respect to the hypothesis that the
lexicon of Japanese is stratified on the basis of the constraint *NT which
holds of the Yamato vocabulary. In Rice (1997), I question the stratification hypothesis with respect to post-nasal voicing, raising the question of
how a child would come to stratify the lexicon based on exposure. I ask
why, when a child hears ND and NT both, s/he would choose to place these
lexical items in different parts of the lexicon rather than abandon the generalization that Yamato post-nasal obstruents are voiced in one portion of the
vocabulary. A parallel problem exists in English, one that we might term
the font-fond problem: why is it not proposed for English that these terms
occupy different strata rather than showing a post-nasal contrast? This then
is the learnability issue: what would cause the learner to place tautomorphemic NT and ND clusters in different strata?10

6.2. Comparison with the Germanic/Latinate split in English

McCawley (1968) and Itô and Mester (1995a, 1999a, 2003), in arguing for
stratification, cite the well-established tradition in Japanese of distinguishing
native, Sino-Japanese, other foreign, and mimetic vocabulary. They compare
Japanese with English, referring to the Germanic-Latinate split in English.
Let us look more carefully at the comparison between English and Japanese
with respect to the criteria used to establish strata.
First consider English. The Latinate/Germanic vocabularies in English
are differentiated in several phonological ways. First consider Latinate vocabulary such as divine, sane, and obscene, the set of words that participate
in Trisyllabic Laxing. Notice that there is nothing inherent in these lexical
items themselves that tells us that they deserve a special marking in the
lexicon; rather, it is their patterning under affixation. So, for instance, sane
takes the nominalizing suffix -ity, and its vowel laxes in the presence of this
suffix; vain is parallel in its patterning. The very similar-sounding adjective
plain, on the other hand, takes the suffix -ness, and the vowel of this stem
does not lax. It is thus the interaction of these stems with affixes that allows
us to divide them into two classes; knowledge of the adjective alone does
not allow them to be divided into two groups. Similar are Latinate verbs
such as permit and resign. These verbs are grouped together into a class
based on phonological properties such as unexpected stress assignment
(e.g., permít vs. édit), consonantal shifts under suffixation (e.g., permit/permission, remit/remission), and the presence of /s/-voicing intervocalically
(consign vs. resign); see, for instance, Kiparsky (1985). It is the different
patterning with respect to the grammar that allows one to distinguish classes
of lexical items.
Now consider Japanese. Stem-internal voiced obstruents, whether singletons or post-nasal, pattern in an identical way with respect to rendaku:
they block its application. There is thus no phonological patterning that
allows one to distinguish between two classes of ND clusters, those that
contrast with NT and those that do not. Turning to post-nasal voicing in the
non-rendaku environment, morpheme-initial obstruents in this environment
are different. Alternations show that these are lexically voiceless (e.g., the
suffixes begin with T in most environments and with D only following a
nasal; the stems subject to post-nasal voicing begin with a voiceless obstruent when they are in a non-nasal environment), and they are voiced by post-nasal voicing. But this voicing is not blocked by Lyman's Law. Thus we
can distinguish two types of surface ND clusters, the derived environment
clusters where the voicing is predictable and the within-morpheme clusters
where the voicing is contrastive. Non-rendaku derived environment voiced
obstruents also distinguish themselves by their failure to participate in
Lyman's Law, again showing that their voicing is of a different type.

6.3. On the status of alternations

Itô and Mester (1999a, 2003) and Itô, Mester, and Padgett (1999) remark
that Rice (1997) incorrectly assumes that there are no alternations associated with the constraints that are involved in lexical stratification. For the
purposes of the Yamato and Sino-Japanese strata, the only constraint that
differentiates them is *NT. As argued in Rice (1993) and summarized in
section 4, the derived environment effects are a result of SV and thus are
not relevant to the question here.


Again a comparison with English is in order. In English morphemes, NT
and ND are contrastive, as in items such as font-fond, ant-and, brant-brand,
pint-kind, grant-grand, tent-tend. However, in a derived environment (past
tense), only ND occurs (e.g., fanned, canned, banned, pined). One would
not claim that words in English with ND always have predictable voicing
on the D; rather, within morphemes post-nasal obstruent voicing is unpredictable, while between morphemes post-nasal obstruent voicing is predictable.

6.4. Writing system

It is sometimes suggested that the writing system of Japanese aids in stratification. For instance, Itô and Mester (1999a: 63) point out that this stratification corresponds in kind to the distinction in English between the Germanic versus the Latinate vocabulary, but is more accessible and conscious
to the non-specialists because of its reflection in the writing system. The
orthography of Japanese is complex, using three different systems. As Kess
and Miyamoto (1999: 14) discuss, there is no strict one-to-one correspondence between type of vocabulary item and script type, although one
usually sees Chinese borrowings in kanji characters, native Japanese content words in kanji or hiragana, native Japanese function words in hiragana,
and the borrowings from other languages in katakana. See also Itô and
Mester (1999a), Kess and Miyamoto (1999), and Vance (1987), among
others, for discussion and further references.
Let us examine the assumption that the writing system of Japanese aids
in stratification in more detail. The following discussion summarizes Vance
(1987: 23). Of the three types of writing systems found in Japanese, two
are relevant to the distinction between the Yamato and Sino-Japanese
strata, namely kanji and hiragana. Kanji, the Chinese characters, are used
for Sino-Japanese morphemes and for some Yamato morphemes as well.
Hiragana is a syllabic system developed from a small set of simplified
kanji. In modern Japanese orthography, grammatical endings are generally
written in hiragana while noun, verb, and adjective stems are written in
kanji. Native and Sino-Japanese morphemes for which kanji are no longer
in general use are commonly written in hiragana. Children's books ordinarily use hiragana for native and Sino-Japanese morphemes for which the
readers are not likely to know the kanji (children learn slightly under ninety
kanji in the first year of school; Poser, personal communication, June 2002).
Recent borrowings, not discussed in this article, are generally written in
katakana. Thus a typical text contains both kanji and kana; see Kess and
Miyamoto (1999) for detailed discussion.
Many words of Sino-Japanese origin are written with two kanji. This
might suggest that two morphemes are actually involved, and that the use
of two kanji provides orthographic evidence that *NT holds of the Yamato
stratum but not of the Sino-Japanese stratum. Martin (1952) can be viewed
as providing evidence for the position that the Sino-Japanese forms are
morphologically complex: he points out that morphs in Japanese are limited
to certain shapes, and that n.C sequences (where the dot represents a morpheme boundary) are nearly always indicative of morph boundaries (17).
Vance (1996: 23), on the other hand, suggests that many Sino-Japanese
words written with two kanji probably should not be analyzed as consisting
of two morphemes. He further notes that kana spelling provides a clear
indication in some cases that an etymological compound is no longer recognized as a compound (27). In discussion of kanji, Kess and Miyamoto
(1999: 68) point out that compound kanji are often used for common vocabulary items in literary Japanese, and that many two-kanji compound
words are stored and accessed as whole word units (68–69).
Based on the studies of the writing system cited above, it appears that
orthography is not necessarily a useful tool to demarcate the Yamato and
Sino-Japanese vocabularies. First, both strata employ kanji and hiragana.
Second, discussion in Kess and Miyamoto (1999) and Vance (1996) suggests that etymological compounds are not necessarily analyzable as compounds synchronically. While the writing system allows recent borrowings
to be identified through the use of katakana, it is not necessarily helpful in
sorting out the Yamato and Sino-Japanese strata.

7. Conclusion
In this article I have made three points. First, I have argued that not all
compounds take the rendaku element, and that post-nasal voicing can be
studied best outside of the rendaku environment. Second, I have added
support to the position that two voicing mechanisms are found in Japanese,
LV and SV. Post-nasal voicing is contrastive within a morpheme, marked
by LV, and it is predictable between morphemes, marked by SV, in the non-rendaku environment. The contemporaneous inclusion of tautomorphemic
post-nasal voiced stops in Lyman's Law in the rendaku context and the
failure of derived environment post-nasal voicing to be blocked by Lyman's
Law in the non-rendaku environment provide evidence for this claim. Third,
I have argued that NC sequences provide little evidence for stratifying the
Japanese lexicon into Yamato and Sino-Japanese vocabulary. The kinds of
alternations that one would hope to find to distinguish the two strata with
respect to this criterion do not appear to be available. Stratification with
respect to post-nasal voicing seems to be tangential as no appropriate alternations exist to trigger the placement of words in different strata. One certainly does not want to deny the possibility of stratification in grammar.
Within-morpheme requirements, without the benefit of alternations, are not
clear evidence for stratification, however, as there is no evidence available
to the learner for making the morpheme other than what it appears to be.

Thank you to Bill Poser for helpful discussion of the Japanese facts discussed in this article. I could not have completed this work without his assistance. Thank you also to Kazutoshi Ohno for detailed comments and
discussion on an earlier draft, to an anonymous reviewer, and to Manami
Hirayama for help with the data. Misunderstandings are my own.

1. Additional strata, mimetic and foreign, are also argued for. See Itô and Mester
1995a and Itô and Mester 1999a for recent work that deals explicitly with this
classification, and Martin 1952 and McCawley 1968 for foundational work in
English. See Vance 1987 for a discussion of some of the older literature.
2. There are many lexical exceptions to the processes discussed in this article;
see, for instance, Martin 1952, Vance 1987, Labrune 1999, Ohno 2002, and
Kubozono this volume for discussion. For instance, some words always undergo rendaku as the second element of a compound, some never undergo rendaku (e.g., tuti 'soil, ground', himo 'string'), and some are variable, undergoing rendaku in some but not all compounds (e.g., hune 'boat'); see Ohno 2002
and others for discussion. Some exceptions can be accounted for by
phonological, syntactic, and semantic factors; see Lyman 1894 and Ogura
1910 (cited in Martin 1952: 49) as well as Otsu 1980, Itô and Mester 1986,
Vance 1987, Labrune 1999 and Kubozono this volume, among others, for different perspectives. In examining these processes, it is necessary to sort out
what is predictable from what is listed; I concentrate here on what is predictable, and am not concerned with words that are generally considered to be
lexical exceptions.
3. In testing this prediction, I do not consider an environment of the following
type. A morpheme-initial obstruent is voiced by post-nasal voicing (as in (3)).
This form is then put in the rendaku environment. What happens? Such cases
perhaps do not exist. Even if they do, Lyman's Law is usually believed to have
as its domain a single morpheme. In addition, other structural factors can block
rendaku (see Otsu 1980, Itô and Mester 1986), perhaps rendering any findings
uninterpretable as support for one position or another.
4. This morpheme often occurs in an alternative form, jiru, which Martin 1952:
52 identifies as more colloquial.
5. Itô and Mester 1986: 69 treat these Sino-Japanese words as compounds
(sam+po, *sam+bo 'stroll'; han+tai, *han+dai 'opposition'), as do Martin 1952
and McCawley 1968. In Itô and Mester 1995a: 819, these words are written
without a morpheme boundary, and are not treated as bimorphemic. Vance
1996 remarks that many Sino-Japanese items that are written with two kanji
probably should not be analyzed as consisting of two morphemes, and provides evidence for this claim. See section 6.4 and Vance 1996 for discussion.
6. Itô, Mester, and Padgett 1995 present an Optimality Theory solution to this
problem. See especially Pater 1999 for discussion of problems with the details
of their account.
7. A reviewer raises the interesting question of whether the post-nasal voiced obstruents that are LV (morpheme-internal) and the post-nasal voiced obstruents
that are SV (following a nasal in another morpheme) are phonetically identical.
There are preliminary indications that there are some differences between
them; see Avery and Idsardi 2002 for discussion.
8. Note that in recent borrowings, DD clusters are also found; see Itô and Mester
1995a, 1999a and Kuroda 2002.
9. I am assuming, following Vance 1996, that these Sino-Japanese items consist
of a single morpheme. See the discussion in section 6.4 on writing.
10. Vance 1996: 26 notes: "As Varden (1994) points out, typical Japanese children
have already acquired many Sino-Japanese binoms before they learn to read,
but their vocabularies are unlikely to provide much basis for further analyzing
these words. In many cases, of course, the relationship between the meaning of
a Sino-Japanese binom and the meanings of its constituent morphemes is
opaque even to an educated adult."

Sei-daku: diachronic developments in the writing system
Kazutoshi Ohno



One issue regarding voicing in Japanese concerns the sei-daku (lit. 'clear-muddy'1) distinction, which correlates with the voicing opposition in contemporary Japanese. Various aspects of the sei-daku distinction are represented
in the history of the writing system. This article presents the historical development of this distinction within Japanese orthography and comments
on the nature of such distinctions. This article, therefore, chiefly presents
the facts of sei-daku as represented in the writing system, and introduces
prior proposals that attempt to account for the inconsistent representation of
this phenomenon in the orthographic history. It provides neither new data
nor new findings.
The contents are as follows: Section 2 illustrates the three diachronic
stages of the sei-daku distinction in writing (distinguished, not distinguished, and distinguished again); Section 3 introduces hypotheses that
explain the transitions of the three stages; Section 4 displays one of the
important remaining issues: sei-daku and nasality; finally, Section 5 concludes with some discussion points.


2. Sei-daku in writing systems

2.1. Issue 2
In the current usage of the kana syllabary (hiragana or katakana), sei-daku
is distinguished by the addition of two dots at the top right corner of a
given kana (see Appendix). These dots are called daku-ten (ten 'dot'),
which change a sei-on (on 'sound') character to a daku-on character. That
is, daku-on are not represented by independent kana characters, but rather are
created by adding a diacritic to a sei-on character. This convention is due to
the fact that hiragana and katakana materialized as systems without a sei-daku distinction. One kana could represent either sei-on or daku-on. The
source of hiragana and katakana, manyougana (see section 2.2 below for
details), actually had daku-on characters, and sei-daku was
quite well distinguished by different characters at some point in the past.
In terms of the writing conventions, then, the diachronic transition of the
sei-daku distinction can be roughly divided into three stages as given in (1)
below (see sections 2.2 through 2.4 below for further details and clarification).

(1) Sei-daku in kana system

                  sei-daku distinction             kana system
Earliest Stage:   yes, by different characters     manyougana
Middle Stage:     no                               hiragana, katakana
Current Stage:    yes, by diacritic [daku-ten]     hiragana, katakana

The two periods in which sei-daku was distinguished within the kana systems
are interrupted by a period in which sei-daku was not distinguished in the
writing system. This is a fact of the history of kana usage.
Explanations for this fact will differ greatly depending on whether
we regard it as a reflection of the actual spoken language or not. We will address
this issue in section 3. In the remaining part of section 2, we will discuss
the diachronic development of the kana systems in Japanese in more detail.
2.2. Earliest Stage: manyougana
Chinese characters in Japan, or kanji, were (and still are) read in two ways:
the Chinese way (on reading), and the Japanese way (kun reading).3 For
example, the Chinese character for 'four' could be read either as si (on
reading) or as yo (kun reading).4 These readings were utilized to write down
Japanese pronunciation. Such Chinese characters, which represent sound information rather than logographic information, are called manyougana
(lit. 'kana used in Manyoushuu') because their use is most diversified in
Manyoushuu (see below).5
The earliest written works in Japanese can be traced back to the eighth
century, or the Nara Era [710–784], represented by writings such as Kojiki
(712), Nihonshoki (720), and Manyoushuu (759?).6 Some parts of these
official documents or collections are written in manyougana.7 For example,
waka (Japanese traditional songs) were usually written in manyougana.



The following is an example of manyougana use in these works. In order
to represent the native vocabulary item kamo (admiration marker), it may be
written with the kanji for 'duck', whose kun reading is kamo, or with two
kanji for ka-mo. In the latter case, a kanji is chosen from multiple candidates whose on or kun reading is ka, and another is chosen for mo. The
choice of kanji is totally dependent on the author and varies among authors
and even within an author's work. Consequently, the same pronunciation
could be represented by different kanji,8 while the same kanji could be read
differently, such as the kanji for 'four' (si or yo) mentioned above.9
Studies of manyougana reveal that there had been separate characters
used specifically for sei-on and specifically for daku-on in this period.10,11
Modern studies by Kasuga (1941), Ōno (1947–8, 1953), Nishimiya (1960),
Tsuru (1960), among others, confirm that sei-daku in those ancient works
cited above was generally distinguished by different manyougana.12 There
is little room to doubt the existence of the sei-daku distinction in this body
of literature.


2.3. Middle Stage: development of hiragana/katakana

2.3.1. Simplification of the kana system

The use of manyougana made it possible to write down the Japanese language.
As the use of manyougana spread, simplification of the kana system
advanced in two respects. First, it was achieved by the creation of a
one-to-one correspondence between each kana and sound, i.e. one kana
represents one sound, and one sound is represented by one kana. It is simply redundant for a sound-based writing system (syllabary) to have
multiple different characters for a single sound (syllable) or multiple ways
of representing a sound,13 and incomplete to use the same kana for different
sounds. Second, it was achieved by simplifying the characters themselves. Manyougana, or full Chinese characters, were unnecessarily complex for the purpose of representing each Japanese sound.14 A cursive or
partial (or mixture of cursive and partial) representation of manyougana
was thus employed.15
Hiragana and katakana were developed through such simplification
processes. Interestingly, the two kana systems were established without the
sei-daku distinction by characters, despite the fact that sei-daku had been
distinguished by using different manyougana previously. We will see how
this occurred below, but before moving on, two things must be kept in
mind with regard to the development of hiragana and katakana. One is that
the development was gradual. The other is that the sei-daku distinction was
not associated with the writing systems, i.e. it is not accurate to say that
manyougana had the sei-daku distinction, while hiragana and katakana
did not.
2.3.2. Simplified kana in chaos

In the early Heian Era, or during approximately the first 100 years of the
Heian Era [794–1192], the kana system was rather chaotic in a sense because the simplification processes noted above were just beginning to be
implemented: the system was not yet standardized.
In this period, a single sound was still represented by multiple different
kana. The number of kana for a given sound was rapidly reduced
after Manyoushuu (759?), but the system was not yet a one-to-one system.
Moreover, the degree of simplification of characters varied. Some simplified characters were as simple as hiragana or katakana currently used, or
simpler, but some were as complex as full kanji, and yet others were
somewhere in between.16 One manyougana (kanji) could be seen in various
degrees of simplified forms.
The choice of which kana represented a given sound, as well as how it
was written (cursive or partial) and how much the kana was simplified
(from hiragana/katakana-like to kanji-like) was determined by the author
and the type of work being written. Many variants of simplified characters
were created. Some were repeatedly used (and would eventually develop
into hiragana or katakana), while many were merely forgotten. It is important to note that simplified characters in this time period were not yet
clearly separated into hiragana and katakana.17
In this chaotic period, some simplified characters for daku-on were actually used (Ōtsubo 1977: 257). This was a natural transition from the period
in which sei-daku was distinguished by different manyougana. However,
those simplified kana for daku-on were eventually abandoned, and hiragana
and katakana were developed without having kana for daku-on later on.



2.3.3. Sei-daku confusion in manyougana

The orthographic sei-daku distinction, which surely existed at some time,
rapidly disappeared in this chaotic period. Generally speaking, sei-daku is
not distinguished in the literature in and after the ninth century (Hamada
1971: 44; Nakata and Tsukishima 1980: 586, etc.). Tsuru (1977: 238) discusses the decline of the number of manyougana used only for daku-on
(daku-on senyougana).

Manyougana used only for daku-on18 (Tsuru 1977: 238)

a. Shokunihongi (797) 12 [9]: ga, g, za, za, zi, zi, zu, z, di, di, d, b
b. Nihonkouki (840) 6 [6]: ga, g, gu, zu, ze, b
c. Shokunihonkouki (869) 3 [3]: ga, g, b

The sei-daku distinction is rather confused in Shokunihongi (797), which
retains 12 manyougana for daku-on (for 9 different sounds). Nihonkouki
(840) retains 6 manyougana for daku-on. The sei-daku distinction is rarely
seen in Shokunihonkouki (869), which retains just 3 manyougana for daku-on, which are used only in a particular volume.19
What the statement "the sei-daku distinction is [rather] confused" means
is that manyougana (or its simplified variant) previously used only for sei-on
is used where daku-on is expected, and manyougana previously used only
for daku-on is used where sei-on is expected. In other words, the same
manyougana character began to be used to represent both sei-on and daku-on. This confusion, or lack of distinction, can in fact already be seen in
the newer volumes of Manyoushuu to a large degree, and in other earlier
literature of the eighth century to varying degrees.
It is worth noting that some manyougana widely used for both sei-on
and daku-on in Nihonkouki and Shokunihonkouki are the matrices of hiragana or katakana (Tsuru 1977: 238, who lists 12 such manyougana).

2.3.4. Transition to hiragana/katakana

Eventually, hiragana and katakana developed into two separate writing systems.20 Roughly speaking, cursive characters rapidly developed into hiragana
after the chaotic period, i.e. after around 900, while partial characters evolved
into katakana in or a little before the second half of the Insei Period [1068–1221]
(Ōtsubo 1977: 257–264), both without a character for daku-on.

To summarize, it is clear that the systems of manyougana and hiragana/
katakana represent a continuum, illustrating that the manyougana system
gradually shifted to hiragana/katakana. The loss of the sei-daku distinction
in writing can already be seen in the manyougana system (see 2.3.3), and
yet some daku-on characters were preserved in simplified forms, if only temporarily (see 2.3.2). It is more natural to assume that
the tendency not to keep the orthographic sei-daku distinction became
stronger and stronger regardless of the kana system between the Earliest
Stage and the Middle Stage. The transition from manyougana to hiragana/
katakana happened to overlap with this tendency. It is thus perhaps no surprise that hiragana/katakana developed without the sei-daku distinction.

2.4. Current Stage: diacritic for daku-on

In the current use of hiragana and katakana, sei-daku is represented and recognized by the absence or presence of a two-dot diacritic, or daku-ten.
However, daku-on have not been consistently represented in the writing system until quite recently, though the history of diacritics for daku-on is long.
The sei-daku distinction in literature began declining rapidly in the ninth
century (cf. 2.3.3 above). Yet there was still a need for the sei-daku distinction, e.g. when describing the precise pronunciation of Chinese characters.
Chinese dictionaries, commentaries on Chinese literature, commentaries on
Buddhist scriptures brought from China, and so forth, all demanded a sei-daku distinction. Hence, some texts were written in manyougana allowing
for the sei-daku distinction, even when the simplified kana (which have no
daku-on characters) were being widely used.21
The most popular means to represent sei-daku, however, was to mark
the character with some diacritic symbol. Nakata and Tsukishima (1980:
586) give a brief history of the development of daku-ten, on which the following discussion draws. A daku-on marker, indicating that certain Chinese
characters must be read as daku-on, can be traced back
to as early as the end of the ninth century.22 Diacritical symbols for daku-on on
katakana began to be used in the 11th century or the end of the previous
century, but they were normally added to the left of kana. The diacritic for
daku-on began to be placed in the top right corner in the 14th century.23
At first, the use of diacritical symbols was actually not limited to the
representation of daku-on. Some symbols, especially the dots or circles on
the left of a given kana, overlap in function with accent [tone] marking.24
Some were also used to represent nasal sounds. Placed to the (top) right of a
kana, they could be distinguished from tone marks, i.e. function as daku-on
markers (Komatsu 1981: 63–71).
The use of daku-on markers was fairly well established and gradually spread
to fields other than those directly related to Chinese in the Muromachi Era
[1338–1573], but they were not yet popularized. In the first half of the Edo
Era [1600–1867], the diacritic was standardized as the two-dot form at the top
right corner (i.e. the same as the daku-ten currently used) and the appearance of the
diacritic increased greatly (Ono 1995: 80). The early Edo Era, therefore, is
often considered to be the era in which stabilization of the sei-daku distinction in writing (by daku-ten) occurred. However, even in the Edo Era,
daku-on were not consistently marked by daku-ten.25 Generally speaking,
daku-ten was added only when the author thought it necessary. Hence, kana
with daku-ten would be daku-on, but kana without daku-ten could be sei-on
or daku-on. Though not as common as daku-ten, fudaku-ten (fu 'not') was sometimes added to represent sei-on in the Edo Era (Komatsu 1981: 71).26 It can
safely be said that kana were still common to both sei-on and daku-on.
The use of daku-ten was finally incorporated into the modern education
system in the Meiji Era [1868–1912]. Even so, some documents were still
written without daku-ten (Maruyama 1967: 1122).27 Official (governmental)
documents, such as laws and the constitution, were written
without daku-ten, and this style continued until the end of World
War II (Kamei 1970: 44–45). The rigid sei-daku distinction by daku-ten,
then, is much more recent than people normally think.


3. Interpretations of the three stages

3.1. Two approaches 28

Although the transitions are gradual, the three stages given in (1) (sei-daku
distinction by characters, no sei-daku distinction, and sei-daku distinction by
diacritic) are clearly observed. Two types of approach are available
to address this issue.
The first approach hypothesizes that the stage transitions, or varying
degrees of orthographic representation, are reflections of the actual language.
That is, sei-daku forms were recognized as distinct at first, but indistinct
later, and currently they are considered distinct once again. The question
here is: why was the distinction once made, then abandoned, and then
seemingly revived? Under this assumption, therefore, it must be explained how and why such changes occurred, including changes of the
sound values of daku-on and/or sei-on.
The second approach hypothesizes that the stage transitions merely reflect the facts of writing. Under this hypothesis, it becomes reasonable to
claim that the recognition of the sound values of sei-daku remained the
same throughout the history of Japanese. The question here is: why were such
different writing conventions, in terms of sei-daku, adopted?
In the remainder of section 3, we will discuss two proposals along the
lines of the second approach (sections 3.2 and 3.3), and one along the lines
of the first approach (section 3.4). In these subsections, we focus on the
transition from the Earliest Stage to the Middle Stage, since the explanation
of this transition is the key for each proposal. Finally, we will discuss the
transition from the Middle Stage to the Current Stage (section 3.5).
3.2. Second approach 1: sei-daku has been distinctive
The second approach introduced above hypothesizes that the sei-daku distinction, or the lack of this distinction, is merely a matter of writing practice. This approach can be further divided into two positions, depending on
whether or not we assume that sei-daku has remained distinctive throughout
the history of Japanese. In 3.2, we will discuss the first position that was
adopted within this second approach, i.e. sei-daku has been phonologically
distinctive but the distinction was not reflected in writing (in the Middle Stage).
This assumption must be accompanied by a satisfactory account of the
question mentioned in 3.1 above: why were different writing conventions
adopted? More precisely, the following question must be answered: Why is
there a stage in which sei-daku was not distinguished in writing if it was
distinctive in the [spoken] language? A possible answer to the question is:
Because sei-daku was rarely contrasted for the purpose of interpretation
(even if contrasted in pronunciation), the distinction was simply ignored in
writing conventions, as seen in Takagi et al. (annotated) (1960: 42–46),
Anonymous (1963: 375–388), etc. As long as the sei-daku distinction seldom triggered semantic confusion, it did not have to be reflected in the
writing system (e.g. hasituma and hasiduma would be the same word
meaning 'loving wife'; naturally context played a role as well).
The additional explanation above would explain the Current Stage as
well: because sei-daku began to trigger semantic confusion, the distinction
was employed within the writing convention. However, it does not explain
why sei-daku was relatively well-distinguished in the earliest literature.
Hence, further explanation is added as follows (cf. Takagi et al. (annotated)
1960: 43–44): Manyougana were used to represent precise pronunciation,
while hiragana and katakana were created to represent sound conveniently
(simply and quickly). That is, the appearance and the loss of the sei-daku
distinction in writing resulted from the different functions of manyougana
and hiragana/katakana. Manyougana would distinguish hasituma from
hasiduma because they are pronounced differently, while hiragana and katakana would not because they are the same word for 'loving wife'.29 For
convenience, it would have been better to represent hasituma and hasiduma
together, i.e. with a character that represents both sei-on and daku-on (tu/du).
After all, the main claim of this position is that hiragana and katakana
were established without the sei-daku distinction because the users felt it
most convenient for their writing systems. This claim itself is quite reasonable, if we do not assume an association between the kana system and the
sei-daku distinction.
There remains an important question, which is not detrimental to the
claim made above. Would people really give up the sei-daku distinction
just for convenience in writing despite the fact that they were well aware of
the distinction? If people hear and pronounce two sounds distinctly, would it
not seem natural to write them in different ways?30 In order to answer this
question, or avoid answering it, some have sought to explain the three stage
transitions without assuming that sei-daku was distinctive.
3.3. Second approach 2: sei-daku was indistinctive
The second approach does not necessarily assume that sei-daku was distinctive throughout the history of Japanese. Some researchers take a radical
position by assuming that there was no sei-daku distinction in the past at
all, but many simply assume that the auditory distinction was generally
extremely hard to make and was thus confused quite often in the writing system, even
though the distinction existed in speech. They assume that sei-daku was
in practice indistinctive in the past, and gradually became distinctive
later (in the Current Stage). In order to justify this assumption, it must be
explained why sei-daku was relatively well-distinguished in the earliest
literature.
Hamada (1960, 1971) proposes that the knowledge or skills of the
authors or editors made the distinction possible. That is, exceptional writers
could distinguish sei-daku both by hearing and in writing. One of the reasons
for this is that they are assumed to have been familiar with Chinese (characters, language, and literature),31 so that they could utilize manyougana for
distinct phonetic phenomena. As discussed in section 2.4 above, the rigid
sei-daku distinction was generally required in reading Chinese. Hence, being
familiar with Chinese would have resulted in being able to distinguish sei-daku.
There are several pieces of evidence to support Hamada's hypothesis.
Let us discuss two of them. First, even in the eighth century, in which sei-daku was well-distinguished in official documents, sei-daku was not distinguished in private documents and/or by lower-class people (see also Kamei
1970, 1985: 228). For example, the two personal letters of Shousouin kana
monjo (762?)32 do not show sei-daku distinctions at all in their manyougana use. Second, Shinsen jikyou, a set of the oldest existing Chinese-Japanese dictionaries, is relatively rigid in the sei-daku distinction, though
it was written in 892.33 This is easily understood if we assume that the distinction was the result of an educated editor writing for educational purposes.34
Because there were people who could distinguish sei-daku, it is natural
to assume that sei-on and daku-on would have been pronounced differently.
However, it is possible that common people did not pay attention to the
sei-daku distinction. There appears to be room to suspect that they would
even have been unaware of such a distinction, similar to the nasal alternations in contemporary Japanese.35 Assuming so does not require any correction of the argument presented above; rather, it accounts for things more
Let us note two ways in which this is so. First, it explains why hiragana
and katakana were established without the sei-daku distinction within their
systems. This would have been because people generally had trouble in
distinguishing sei-daku. Originally, manyougana was used by the elite, who
could distinguish sei-daku. As manyougana were popularized, those who
were not educated enough to distinguish sei-daku started using manyougana
or their simplified forms, confusing sei-daku. Hiragana and katakana were
not established or issued by one particular person, institute, or authority, at
some particular moment but by various people, over time. Second, even
the writers of the earliest literature, who could distinguish sei-daku, sporadically confused sei-daku in writing. This might have been due to the fact
that the pronunciation of sei-daku in that period was rather more indistinctive than is currently believed.



3.4. First approach (sei-daku revived)

The first approach hypothesizes that the various transitions seen in the writing
conventions reflect phonological perceptions of the spoken language. That is,
the sei-daku distinction existed but disappeared later, and then was revived
in the spoken language. Endō (1989) is one of the rare scholars who try to justify this approach.36 His argument does not seem to be well supported, but it is worth discussing because parts of it may lead to a fuller understanding of this phenomenon.
Endō (1989) begins his argument by questioning Hamada's account of the sei-daku distinction by the elite (see section 3.3 above). Endō points out that kakekotoba37 'pun(s)' which ignore the sei-daku distinction are
regularly found in the Middle Stage,38 while such kakekotoba are rarely
seen in the Earliest Stage. The authors of the kakekotoba verse are supposed to be of a similar social class (e.g. aristocrats) in both stages, so the sei-daku (in)distinctiveness may not actually be due to educational background. He further points out that kakekotoba, or verse in general, are appreciated not visually but aurally. That is, the sei-daku (in)distinctions in kakekotoba are in reality reflections of the spoken language. Here the possibility arises that sei-on and daku-on were in fact similar in sound quality and rather indistinct in the Middle Stage (and therefore kakekotoba did not observe the sei-daku distinction either), while they were distinct in the Earliest Stage (and therefore kakekotoba did not ignore the distinction).
Endō (1989) hypothesizes that the sound quality of daku-on actually changed in (or a little before) the Middle Stage, which blurred the sei-daku distinction. Nasality plays an important role in his proposal. It is well known among scholars that daku-on were very likely to be accompanied by nasality in the past. This is indicated in literature written in languages other than Japanese,39 such as works in Portuguese,40 Chinese,41 and Korean42 (see Hamada 1952a). Assuming that daku-on were generally nasalized in the past,43,44 Endō proposes an ambitious hypothesis that the nasality neutralized the sei-daku distinction. That is, daku-on without nasality were distinct from sei-on, but daku-on with nasality were not.45
Endō (1989) says that the nasalization of daku-on can be traced back to around 800,46 based on his previous work in Endō (1973), which argues that the sound quality of [ b ] changed to [ mb ] around that time.47 It is
reported that [m]-column sounds (ma-gyou on, i.e. / ma, mi, mu, me, mo / )
changed to [b]-column sounds (ba-gyou on, i.e. / ba, bi, bu, be, bo / ) in
many words in this period (Matsumoto 1965, etc.). Another piece of evidence, out of several he provides, is the appearance of nasality in the inflections of the b-final verbs in this period; e.g. tob+ta 'fly+PAST' > tonda 'flew', which is actually similar to the sound alternation of yom+ta 'read+PAST' > yonda 'read (past tense)'. These will be naturally understood if the sound quality of [ b ] was closer to [ m ], such as [ mb ].48 He further assumes a similar change for other daku-on as well. See Endō (1973) for other evidence and discussion.
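The parallel between the two inflections cited above can be made concrete with a small Python sketch. It is only a toy illustration under simplifying assumptions: the function name past_tense and the single rewrite rule are hypothetical conveniences introduced here, not part of the analysis being reviewed, and the rule covers only b-final and m-final stems.

```python
# Toy illustration only: past-tense formation for b-final and m-final stems.
# The rule is a simplification assumed for this sketch; it is not a full
# account of Japanese verb morphology.
NASALIZING_FINALS = {"b", "m"}  # stem-final consonants covered by this sketch


def past_tense(stem: str) -> str:
    """Return the past form of a b- or m-final stem (e.g. tob -> tonda)."""
    final = stem[-1]
    if final not in NASALIZING_FINALS:
        raise ValueError(f"stem class not covered by this sketch: {stem!r}")
    # The stem-final consonant surfaces as n, and the suffix -ta surfaces as -da.
    return stem[:-1] + "n" + "da"


for stem in ("tob", "yom"):
    print(f"{stem}+ta > {past_tense(stem)}")
```

That one rule suffices for both stem classes is the point of the comparison: the b-final and m-final stems behave alike, which is what would be expected if [ b ] was once close to [ m ].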
To summarize, Endō (1989) hypothesizes that nasalization of daku-on blurred the sei-daku distinction, which is reflected in the writing of the Middle Stage, and that this nasality appeared around 800. Before that, daku-on were not accompanied by nasality and were distinguished from sei-on, which is reflected in the writing of the Earliest Stage. Thus, according to Endō, revival of the sei-daku distinction must be closely related to the disappearance of nasality (which will be discussed in 3.5 below).
The hardest part in supporting this hypothesis is the lack of a satisfactory explanation for why nasalized daku-on and sei-on are indistinct, while daku-on without nasality are distinct from sei-on. Another difficulty is generalizing the change seen in [ b ] to other daku-on. Endō (1989) discusses the change of [ b ] to [ mb ] (i.e. ba-column daku-on) extensively, but other daku-on only briefly. Yet another remaining issue is the time when nasality appeared with daku-on. The change of [ b ] to [ mb ] may have occurred around 800, in the transitional span from the Earliest Stage to the Middle Stage, but it may not have been indicative of the change in daku-on in general. It is important to remember that the nasalization of daku-on is one of the hardest issues to deal with in the study of the history of Japanese (see also fn. 43 and section 4 below).

3.5. Sei-daku in contemporary Japanese

We have reviewed three possible arguments for the transition from the Earliest Stage to the Middle Stage. In this section, we briefly address the transition from the Middle Stage to the Current Stage, and then discuss the
status of the sei-daku distinction in contemporary Japanese.
Each position provides its own motivation for the transition to the Current Stage. The first argument (discussed in section 3.2) holds that it became more convenient to distinguish sei-daku in writing (e.g. because the lack of a distinction started triggering semantic confusion). The second argument (section 3.3) holds that the sei-daku distinction, which had been non-existent, became contrastive. The third argument (section 3.4) explains the phenomenon in a similar fashion to the second argument, but further claims that the contrast became clearer due to the disappearance of nasality from daku-on.
However, few scholars will disagree with the assumption that the transition took place around the 16th–17th century. Some refer to Christian literature (kirishitan shiryou, see fn. 40) written around 1600. The sei-daku differences seen in the vocabulary there are mostly the same as the sei-daku differences seen in the contemporary Japanese vocabulary (Takagi et al. 1960: 42–43), though it is not so hard to find exceptions. These scholars assume that the sei-daku distinction became fairly stabilized a little before such literature was written, i.e. sometime in the 16th century. Others refer to the fact that the use of daku-ten increased in the first half of the Edo Era [1603–1867] (see section 2.4 above). Such scholars assume that the sei-daku distinction was tentatively stabilized in the early Edo Era, i.e. in the 17th century.49
There is a view that the sei-daku distinction today is not as stable as normally believed,50 despite the fact that Japanese speakers can clearly distinguish sei-daku auditorily and that kana (hiragana, katakana) has a solid way of representing daku-on (by daku-ten). To support this view, we can point out that loan words in contemporary Japanese, which have a history of no more than 150 years, are sometimes pronounced with different voicing values. For example, forms such as betto 'bed', zyanbaa 'jumper', batominton 'badminton', amezisuto 'amethyst', etc. (all from English) are still used by many people.51 Even recently introduced words such as zyaguzii 'jacuzzi' show such confusion. The functional load of voicing in Japanese may be lower than normally thought. In other words, Japanese may still be in transition toward establishing the sei-daku distinction, or the consciousness of it (as a voicing distinction).
As Komatsu (1971: 26–37) proposes, the recognition of sei-daku may have been like accent (= pitch patterns) in Japanese. In the major dialects of Japanese, accent is lexically assigned to each lexical item and distinguished from other patterns, but its functional load is very low (cf. Vance 1987: 107), i.e. if someone uses an 'incorrect' accent, the listener will find it strange but can understand it easily.52 Similarly, if someone spoke with different sei-daku values, the listeners could align them with their own sei-daku distinction. This may also be one explanation of why sei-daku was not reflected in the writing system in the past.53 The sei-daku difference may have been very small diachronically, and possibly synchronically as well.

60 Kazutoshi Ohno
4. Remaining issues
We have illustrated the sei-daku distinctions in the history of Japanese orthography, and seen possible explanations for differences in the various stages. When we investigate the data found in literature, we must try to reconstruct the sound values of sei-daku or make assumptions about the consciousness of sei-daku among Japanese speakers of the past. There remain many unresolved issues regarding this exploration. In this section, however, only the most important remaining issue is addressed: the sound values of sei-daku in relation to nasality.
In the major dialects of contemporary Japanese, sei-daku is opposed in voicing. Most sei-daku pairs differ not only in voicing but also in place and/or manner of articulation (see appendix), but the statement that 'sei-on are all voiceless, while daku-on are all voiced' holds. The sei-daku opposition, therefore, is usually assumed to be a voicing opposition, implicitly or explicitly. Basically this article has taken this position as well. However, when the sound values of sei-daku are considered diachronically, nasality must also be taken into account. That is, at least three sounds differing in manner must be considered for sei-daku, e.g. [ t ], [ d ], and [ nd ]. Since sei-daku is a binary distinction, how to group the three into two is an important issue.
Considering that [ ŋ ] is regarded as a variant of [ g ] in contemporary Japanese54 and that Rodriguez noted at the beginning of the 17th century that [ b ] is occasionally accompanied by nasality (see fn. 43), and so forth, voiced obstruents55 (e.g. [ d ]) are perceptually grouped together with (pre)nasalized obstruents (e.g. [ nd ]) and distinguished from voiceless obstruents (e.g. [ t ]). This distinction is the sei-daku distinction. This is probably the most popular view.
Endō (1989) hypothesizes that voiceless obstruents (e.g. [ t ]) can be grouped together with nasalized obstruents (e.g. [ nd ]) due to their perceptual closeness, but distinguished from voiced obstruents (e.g. [ d ]), though the motivation is not convincingly given. The sei-daku distinction assumed by him, however, is parallel to the view just given above, i.e. in terms of voicing. This is obvious from his statement that 'the sound values of daku-on changed' (from [ d ] to [ nd ]), etc.
There is another view that we have not discussed yet. Some support the idea of grouping oral obstruents (e.g. [ t ] and [ d ]) together and distinguishing them from nasalized obstruents (e.g. [ nd ]) (M. Takayama 1992a,b, among others56). This division is based on nasality (oral vs. nasal), rather than voicing (voiceless vs. voiced). They propose that the sei-daku distinction had been based on this distinction, i.e. non-nasal obstruents are sei-on and [partially] nasal obstruents are daku-on. Voicing of the non-nasal obstruents (i.e. sei-on) had been allophonic, e.g. voiceless word-initially and voiced word-internally. That is, the sei-daku distinction in the past is similar to that of the dialects currently spoken in Tohoku (north-eastern Japan) or part of southern Kyushu.57
We did not, and will not, explore this view, mainly because it is not a hypothesis to explain the diachronic transitions of the sei-daku distinction in writing. It is unclear how this view could convincingly account for the transitions in written representation. Also, this view, so far, does not explain how, why, and when the sei-daku distinction in the past (by nasality) changed to the distinction now (by voicing). According to their assumption, voiced obstruents (e.g. [ d ]) in the past were sei-on; but now they are categorized as daku-on. We would like to have a persuasive explanation of this fact.
This, of course, does not mean that this view is not worth exploring. As noted in section 3.4 and in fn. 44, much more consideration is required before anything can be concluded about the diachronic background of the appearance of nasality with obstruents. Until we reach a solid consensus, it is best to keep our eyes open to various possibilities.58 It is actually worth investigating various proposals, such as Takayama (1992b), who extensively discusses sei-daku in relation to nasality, to consider the diachronic development or change of the sound values of sei-daku. It must also be noted that the real value of Endō's proposals (section 3.4) will become clearer as the nasalized obstruents are studied in greater detail.
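The three ways of reducing [ t ], [ d ], and [ nd ] to a binary distinction that are discussed in this section can be laid side by side in a short Python sketch. This is merely an illustrative aid under stated assumptions: the class Obstruent, the two boolean features, the view labels, and the function partition are all hypothetical names introduced here, and the feature values simply restate the descriptions given in the text.

```python
from typing import Callable, Dict, List, NamedTuple


class Obstruent(NamedTuple):
    symbol: str
    voiced: bool
    nasal: bool  # True for (pre)nasalized obstruents


# The three sounds used as examples in the text.
SOUNDS = [
    Obstruent("t", voiced=False, nasal=False),
    Obstruent("d", voiced=True, nasal=False),
    Obstruent("nd", voiced=True, nasal=True),
]

# Each view is a criterion that splits the three sounds into two groups.
# The labels are ad hoc names assumed for this sketch.
VIEWS: Dict[str, Callable[[Obstruent], bool]] = {
    "voicing": lambda s: s.voiced,                # [t] vs. [d], [nd]
    "endo": lambda s: s.voiced and not s.nasal,   # [t], [nd] vs. [d]
    "nasality": lambda s: s.nasal,                # [t], [d] vs. [nd]
}


def partition(sounds: List[Obstruent],
              criterion: Callable[[Obstruent], bool]) -> List[List[str]]:
    """Group sound symbols by the value the criterion assigns to them."""
    groups: Dict[bool, List[str]] = {}
    for sound in sounds:
        groups.setdefault(criterion(sound), []).append(sound.symbol)
    return sorted(groups.values())


for name, criterion in VIEWS.items():
    print(name, partition(SOUNDS, criterion))
```

Each criterion yields a different two-way split of the same three sounds, which is why the choice of grouping matters for any reconstruction of the earlier sei-daku values.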

5. Summary
In the first half of this article (section 2), it was illustrated that in writing sei-daku was distinguished at first (Earliest Stage), then was not distinguished (Middle Stage), and is now distinguished again (Current Stage). The transitions of the kana systems, including daku-ten usage, were also illustrated.
In the second half (section 3), three possible explanations for the sei-daku representations were introduced. One approach assumed that sei-daku in literature was a reflection of the actual [spoken] language, and the other approach assumed that sei-daku in literature was merely a fact within the writing system. The latter assumed that the sei-daku distinction existed phonologically in a different manner from that in writing, and allowed for a position either that sei-daku was actually distinctive or that sei-daku was indistinctive. Finally (section 4), it was briefly noted that the correlation between sei-daku and nasality could be a key for the further development of sei-daku study in terms of the history of Japanese.

Sei-daku (lit.) 'clear-muddy'

From the left column: hiragana (cursive/rounded syllabary), manyougana from which the hiragana on the left was developed [only for sei-on], katakana (partial/angular syllabary), manyougana from which the katakana on the left was developed [only for sei-on], romanization faithful to kana syllabary (i.e. phonemic romanization, employed in this article to describe the data), romanization faithful to pronunciation (i.e. phonetic romanization, typically seen in the spelling of Japanese proper names), and [broad] transcription of actual pronunciation. [ ŋ ] is observed only word-internally, if it appears (dialectal). See section 2 for discussion on manyougana.

The completion of this paper was supported by the MOE (Ministry of Education) project of the Center for Linguistics and Applied Linguistics of Guangdong University of Foreign Studies, Guangzhou, China. The revised version of this article was written while I was working at the Institute of Cognitive Science, Hunan University, Changsha, China, after considerable restructuring of its original version, entitled 'Sei-daku: More than a voicing difference - toward a better understanding of the rendaku phenomenon', written in June 2002 while I was studying at the University of Arizona, USA. Reviews from two anonymous scholars were very helpful, especially the detailed comments and suggestions from the second reviewer. Takayama Tomoaki provided me not only with helpful suggestions but also with materials that I could not obtain. This article could not have been completed without continuous help from the editors of this volume. Many thanks go to those who have commented on various drafts of this article. All remaining errors are my own.

Notes (Japanese, Chinese and Korean names are given in the order last-first)
1. (lit.) = (literal translation)
2. Discussion in Section 2.1 is largely dependent on Hamada (1971: 44–45).
3. To add a little more explanation, the on reading is based on Chinese pronunciation, while the kun reading is actually native vocabulary (e.g. a word) assigned to kanji. In contemporary Japanese, not all, but most of the frequently used kanji have the two readings. Moreover, there may be multiple on readings and/or multiple kun readings for one kanji. It is not surprising that one kanji has several readings.
4. cf. 'Four' in contemporary Mandarin Chinese is sì in Pinyin representation.
5. There was another way of reading manyougana called gisho 'fun reading', which relies on association or imagination by the reader. For instance, two kanji [bee]-[sound] represented the sound bu (onomatopoetic), two kanji [ten]-[six] (i.e. sixteen) represented the sound sequence sisi because 4×4 (called si-si 'four-four' in Japanese) makes 16, and so forth.

6. Kojiki (Record of Ancient Matters) is a history book written by Ō no Yasumaro (who recorded what Hieda no Are said) by Imperial request. The preface to this work is written in Chinese (seikaku kanbun 'regular Chinese'), while the text is written in highly Japanized Chinese (hentai kanbun 'irregular Chinese'). Nihonshoki (Chronicle of Japan) is another history book written by Toneri Shinnō (Shinnō 'Imperial Prince'), etc. and is the first official document compiled by Imperial command. The text of the book is basically written in Chinese. Manyoushuu (Collection of Myriad Leaves) is an anthology of Japanese traditional songs (waka) written or edited by various anonymous authors in 759, or perhaps a little later than that (around 770; the complete editorial work may have been done even later than this).
7. Using Chinese characters to show Japanese pronunciation is in fact already seen in inscriptions (kinseki-bun) a few centuries earlier. However, those lexical items are limited to proper nouns, whose original pronunciations are often hard to reconstruct.
8. At least 35 different manyougana are used for the native sound of si in Nihonshoki (Tsuru 1977: 242).
9. Thus, reading manyougana was already difficult in the next Era (Heian Era [794–1192]).
10. According to Tsukishima (1972: 384), this had already been recognized by Keichū (Waji shouin, 1691). Motoori Norinaga (Kojiki-den 1: Karina no koto, 1767) studied this issue, and his work was considerably expanded by his student Ishizuka Tatsumaro (Kogen seidaku kou, published in 1801).
11. Although data are limited, it is generally agreed that manyougana in inscriptions (see fn. 7 above) in or around the Suiko Era [592–628] also had the sei-daku distinction by using different characters (cf. Tsuru 1977).
12. Kasuga (1941) reexamined manyougana in Kojiki which had been considered dubious regarding the sei-daku distinction at that time, and concluded that they were distinguished well by different characters, with a few exceptions. Ōno (1947–48, 1953) argued that manyougana in Nihonshoki, which made Motoori Norinaga wonder why there were many exceptions, also generally had the sei-daku distinction by characters. Nishimiya (1960) and Tsuru (1960) argued that the sei-daku distinction was well represented by not only on-gana (manyougana read in on reading) but also kun-gana (manyougana read in kun reading).
13. An example of multiple ways of representing a sound is provided in section
2.2 above (kamo (admiration marker) could be represented by one kanji or
two ka-mo). Another way is gisho use of manyougana (see fn. 5 above).
14. This difficulty was remarkable especially when taking supplemental notes on
the Chinese literature. For reading help, difficult or special pronunciations, morphemes that Chinese lacked (e.g. particles such as case markers, inflectional
endings, etc.), annotations, and so forth, were added in the margin.

15. In fact, some simplified kana had already been sporadically used earlier. For example, a few of them are seen in Shousouin komonjo: Minokoku [Mino no kuni] Kamogun Hanifuri (Hanyuuri) koseki-chou 'Register book of Hanyuuri, Kamo County, Mino State (currently part of Gifu Prefecture)', which is the earliest existing family register book, written in 702. The claim in the text is that kana simplification was actively under way in this period.
16. These variants of simplified characters were sometimes mixed with manyougana (i.e. full kanji) in the literature.
17. Any simplified form of manyougana is, thus, often grouped together and called ryakutaigana 'simplified kana', in contrast to magana 'real kana', which refers to non-simplified manyougana (i.e. full kanji).
18. The umlaut shows that the sound belongs to the otsu type, not the actual sound
quality such as lip rounding.
19. Tsuru (1977) chose this literature for discussion because it represents official
documents (history books compiled by Imperial command), which are descended from Nihonshoki. They are written in the Chinese style, and
manyougana are included.
20. They were developed separately because each was preferred in a different field. Cursive characters were preferred in writing the native literature, while partial characters were preferred in reading (annotating, commenting, etc.) Chinese literature.
21. For example, Japanese pronunciations of Chinese characters are written in manyougana in Konkoumyou saishououkyou ongi, a commentary on Buddhist scriptures, copied in 1079.
22. This was actually simplified (partially represented) kanji for daku noted in
red ink (seen in Kongouchou yugarengebu shinnenju giki, 889).
23. Komatsu (1981: 70) notes that the primitive use of daku-ten on the top right
corner is seen in the literature in the mid-thirteenth century ([Kanchiin] Ruijuu
myougishou, written [copied] in 1241).
24. Originally they were tone marks, which later started to represent sei-daku.
25. Maruyama (1967: 1122) notes that in the Edo Era [1603–1867] people tended
to think that sei-on were sophisticated and daku-on vulgar. For example, Japanese-studying scholars often wrote sei-on but no daku-on. This might be one of
the reasons for the inconsistency.
26. This is a single small circle on the top right corner of kana, i.e. the same diacritic currently used for handaku-on ([ p ]-initial syllables).
27. e.g. Nihonshoki Tsuushaku (Commentary on Nihonshoki) by Iida Takesato
published in 1902.
28. Discussions in sections 3.1 and 3.2 are largely dependent on Hamada (1971).
29. Of course, this is not identical to the statement 'manyougana is phonetic (sei-daku is distinct) and hiragana and katakana are phonemic (sei-daku is indistinct)', which contradicts the basic assumption of this position, 'sei-daku has been distinctive'.

30. But see the last paragraph in section 3.5 below for a possible answer to this
31. Mori (1991) points out that some volumes of Nihonshoki were actually written by Chinese scholars (at least two different Chinese scholars). Ide (1989: 241) says, citing M. Inoue (1932: 223–225), that some manyougana usage in Manyoushuu requires knowledge of the Chinese literature.
32. There are two of them: kou-monjo and otsu-monjo. They were (independently) written before 762 at the latest, according to Komatsu (1981: 57). They are generally regarded as written by a person from the low class (among intellectuals). M. Tanaka (1995: 196) says that they are written in a rough style with sloppy characters, so the author was presumably not highly educated.
33. See also the second paragraph of section 2.4.
34. Wamyou ruiju-shou, which was edited a little later in 934 (by Minamoto no Shitagō), has no sei-daku distinction, though it is also a set of Chinese-Japanese dictionaries. Hamada (1971: 44) notes that this is also due to the differences between the attitudes of the authors/editors toward the sei-daku distinction in writing, rather than temporal factors.
35. The syllabic/moraic nasal in Japanese undergoes place assimilation. It becomes [ m ] before a bilabial sound, [ n ] before an alveolar sound, [ ŋ ] before a velar sound, and [ ɴ ] elsewhere (word-final, etc.). However, they are all recognized as the same sound and written with the same hiragana or katakana by native speakers of Japanese (Komatsu 1981: 58–59). Sometimes [ ɴ ] (i.e. unassimilated) is observed in the environments of assimilation (e.g. before a bilabial sound). Native Japanese speakers will not recognize it as special or different.
36. Endō (1989) is a collection of his papers published in 1971–1988.
37. Kakekotoba is a rhetorical device mainly used in verse, by which two (or more) readings are available from one expression. That is, homonymic expressions are exploited there, e.g. matu > 'wait(ing)', 'pine (tree)'.
38. For example, in the traditional song (waka) in Kokinwakashuu (edited around 913): wasurenanto omohukokorono tukukarani arisiyorigeni maduzokohisiki (14: 718), two readings are available from madu: madu 'first (of all)' and matu 'wait(ing)'.
39. Such nasality is not indicated in Japanese literature at all.
40. Around 1600, several books and dictionaries related to Japan or Japanese were written by missionaries of the Society of Jesus. They are called Kirishitan shiryou 'Christian literature', represented by Arte da Lingoa de Iapam [Nihon daibunten] (1604–1610) by João Rodriguez. These are most reliable for discussion of sound values of sei-daku because they are relatively new and most of them are written in alphabets which distinguish voicing by using different
What Rodriguez actually says is that the preceding vowel is nasalized, but Hamada (1952a: 21, note 9 on p. 31) says that the nasality must be accompanied with daku-on because: (i) the nasality is also expected word-initially; (ii) the pronunciation of daku-on shares the property of nasals in youkyoku (singing Noh), heikyoku (singing Heike monogatari 'Tale of Heike'), citing Iwabuchi (1934).
Helinyuli [Kakuringyokuro] edited by Luo Dajing [Ra Taikei] (13c, 1252?), Ribenjiyu [Nihonkigo] edited by Xue Jun [Setsu Shun] (1523), etc. In these works, daku-on are usually preceded by a coda nasal, e.g. f n-zh for hude '(writing) brush'. Pronunciations are given in contemporary Mandarin Chinese here for convenience.
Iropa [Iroha] (author unknown) (1492), Cheophaesineo [Shoukaishingo] by Gang Useong [Kō Gūsei] (1676), etc. They spell daku-on in a similar fashion to the Chinese literature (see fn. 41 above), i.e. they put a nasal coda before daku-on. One might think that this shows the voicing of the obstruent rather than the nasalization of daku-on, on the grounds that Hangul (Korean syllabary) itself has no way to show voicing. In fact, Hangul had a letter for / Z / and it was used for the Japanese / z / sound in the works cited above; nonetheless, it is preceded by a nasal coda (see also Hamada 1952b).
In Nosondang Ilbonhaengnok [Roushoudou Nihonkouroku] by Song Huigyeong [Kikei Sō] (1420?), Haedongjegukgi [Kaitoushokokuki] edited by Sin Sukju [Shin Shukushū] (1471), etc., Japanese place names are written with Chinese characters, using the same representation for daku-on as seen in the Chinese literature mentioned in fn. 41 above.
Special thanks go to Choi Kyung-Ae, who helped me to transliterate the authors and titles using the new Korean romanization system.
Precisely speaking, Rodriguez writes in Arte da Lingoa de Iapam [Nihon daibunten] (see fn. 40 above) that nasality is always observed before D, DZ, G (/ d, g /, where / d / includes [ dz ] before / u /) and occasionally before B (/ b /). However, considering examples in Korean literature (see fn. 42 above) and k- n-zh for kaze 'wind' (Helinyuli), hung-bng for obou 'monk' (Helinyuli), y n-b-j for ibiki 'snore' (Ribenjiyu), etc. in Chinese literature, it is reasonable to assume that daku-on were generally accompanied by nasality around the 14th century.
It is fair to note that an anonymous reviewer kindly pointed out to me that this is not a well-established or uncontroversial view. Rodriguez does not say that daku-on are consistently accompanied by nasality (see fn. 43 above). Some argue that the coda nasal before daku-on may not represent nasality but rather emphasize the voicing of the following obstruent (e.g. Fukushima 1959). However, it is also a fact that there is no strong evidence to refute the assumption that daku-on were generally accompanied by nasality; therefore, not many scholars seem to reject it. This issue, whether daku-on were consistently nasalized or not, is still debated.
Endō (1989) does not provide any convincing explanation or evidence for what he calls 'neutralized'. However, his hypothesis itself is actually very suggestive, especially when we compare the onomatopoeias that contrast in sei-on, daku-on, and nasals. It is well known that the sei-daku contrast in Japanese onomatopoeia results in contrastive meanings, such as positive and negative, clear and dirty, light and heavy, etc., respectively. For example, one's eyes are kirakira when enjoying something, while they are giragira when in hunger.
Let us consider the following examples: surusuru/zuruzuru vs. nurunuru and torotoro/dorodoro vs. noronoro. (All examples here are mine.) Surusuru refers to smooth action such as sliding down a rope, while zuruzuru refers to the friction caused by moving a heavy object; and nurunuru refers to an oily and/or slippery surface, which is much closer to surusuru (lubricious) and the opposite of zuruzuru (frictional). Moreover, turuturu also refers to a slippery surface. Moving on to the other set, torotoro refers to slow action or transition, while dorodoro refers to a state or movement which is jelly-like or pulpy; and noronoro describes slow movement, which is closer to the meaning of torotoro.
These are interesting, but it is unknown how these impressions/feelings of modern people can serve as evidence for determining the intuitions of people hundreds of years ago. See Komatsu (1981: 107–110) for an interesting discussion of the [nasal]-[daku-on] ([ n ]-[ d ]) contrast that functions similarly to the sei-daku contrast in meaning as described above. (His example is nora vs. dora, both of which have a meaning of 'stray'.)
Hamada (1952a: 27f) also assumes the change (nasalization of daku-on) in the same period, but for different reasons. He assumes that the change was triggered by influence from Chinese, which had pre-nasalized obstruents around this period. However, many others take a cautious attitude toward this assumption. Considering that education, or knowledge of Chinese, in that period was strictly limited to a small set of people, it is difficult to accept the assumption that the knowledge (and preference) of nasalized obstruents by such a limited group changed the sound quality of the entire Japanese sound system.
Its existence is confirmed by foreign literature (see fn. 40–43). However, not much is actually known about the emergence of the nasalization of daku-on (see also section 4 below).
Then the nasality weakened to something like [ mb ] a few centuries later, to be realized as described by Rodriguez (see fn. 40 above).
Mabuchi (1971) also indicates the sound change of [ b ] in this period.
In the previous paragraph, the sei-daku distinction is said to have been 'rather' or 'tentatively' stabilized. This is because daku-on had been inconsistently specified in the Japanese literature throughout the Edo Era, as mentioned in section 2.4 above.
Many do not deny this view. Hamada (1960: 78) notes this, and so does M. Takayama (1992b: 45).
Many fix their pronunciations after they study the spellings of those words. That is, for Japanese speakers, the voicing distinction is still hard just through
52. In other words, it is social rather than linguistic (Vance 1987: 107).
53. It is also suggestive in that most Japanese speakers cannot easily name the
accent pattern, e.g. HLLL, LHL, etc. (L = Low and H = High), if they can
distinguish them. The sei-daku distinction in the past may be similar to this:
even if they could distinguish them in speech, they could not write them down
54. [ g ] is supposed to be [ Ng ] in the past, and [ N ] would have been developed
from [ Ng ] because [ N ] was not a phoneme. It is sometimes taught that / g / is
prescriptively [ N ] word internally, but this is actually dialectal. The velar nasal
is not clearly observed in the dialects of the western half of Japan. Even in some
dialects of the eastern half of Japan, the younger generations do not produce
the nasal consistently (e.g. Tokyo dialect).
55. Whenever nasality is unspecified, e.g. for voiced obstruents, the sounds in
question are oral (non-nasal).
56. Takayama (1992a, b) says that he followed Hayata (1977a,b), although Hayata,
in fact, just mentioned it without providing any evidence in his articles.
57. Yamane Tanaka (this volume) discusses voicing and nasality of obstruents in
Tohoku dialects.
58. This is one of the reasons this article focuses on introducing different views,
rather than proposing or concluding something new.

The representation of laryngeal-source contrasts in Japanese
Kuniya Nasukawa

1. Introduction
With a focus on generative restrictiveness and the significance of cross-language variation in source contrasts, this paper identifies those phonological primes responsible for creating laryngeal-source contrasts in Japanese.
The arguments will be based on Element Theory (Kaye, Lowenstamm and
Vergnaud 1990; Harris 1994; Harris and Lindsey 1995, 2000). Unlike theories based on SPE-type distinctive features, this theory of melodic representation recognizes primes which are monovalent and which can therefore be
interpreted separately without needing to be combined with other primes.
The theory admits only two autonomous melodic categories for cross-linguistic source contrasts: one contributes aspiration and the other prevoicing.
This paper claims that Japanese exploits only the prevoicing element in the
representation of phonation-type contrasts, this position being supported by
evidence from assimilatory processes, early language acquisition and aphasia.
The argument leads to the further claim that vowel devoicing is not a process
triggered by the laryngeal element for aspiration, but by a manner element
called noise. Furthermore, in accordance with the general trend towards
reducing the size of the element inventory, the paper will discuss the validity
of a recent proposal to merge the prevoicing and nasal elements.

2. Phonetic characteristics of the laryngeal-source distinction

It is widely acknowledged that Japanese exhibits a two-way source contrast
between voiced and voiceless. The term 'voiced consonants' refers to
the set of sounds represented by the phonetic symbols b, d, g, z, while
'voiceless' corresponds to p, t, k, s. These labels are variously exploited in
the literature, and usually suffice as a means of identifying the contrasts in
phonemic and pedagogic terms.

However, it has been acknowledged that the use of these terms is insufficient for describing the varied phonetic manifestations of the source contrasts across different languages. In articulatory terms, for example, the so-called voiced obstruent plosives b, d, g in most Slavic (such as Polish and
Russian) and Romance languages (such as Spanish and French) are typically produced in word-initial position with glottal pulsing during articulatory closure: in precise phonetic terms, they are described as voiced unaspirated. The voiceless plosives p, t, k in those languages are articulated
without glottal pulsing and are described as voiceless unaspirated. On the
other hand, the 'voiced' plosive series of some Germanic languages such as
English and Swedish is typically produced in word-initial
contexts without glottal pulsing, and is identified as voiceless unaspirated;
it is thus identical to the 'voiceless' series of most Slavic and Romance languages.
The 'voiceless' series in those Germanic languages is also articulated without glottal
pulsing, but with aspiration: it is described as voiceless aspirated.
The cross-linguistic differences in the phonetic realization of the two-way
contrast are often captured by voice onset time (VOT), which is the word-initial interval between the release of the stop closure and the onset of vocal-fold vibration (Lisker and Abramson 1964; Abramson and Lisker 1970).
Members of the voiced unaspirated (truly voiced) series, as found in Spanish
and French, are characterized by a relatively long lead time between the
onset of glottal pulsing and stop release. On the other hand, in the voiceless
aspirated (fortis) series found in languages such as English and Swedish,
there is a relatively long time lag between closure release and the onset of
glottal pulsing. In the voiceless unaspirated (lenis or neutral) series, which is
common to all languages of the world, there exists either a relatively short
or zero time lag between closure release and the onset of glottal pulsing.
It is possible to identify the perceptual source distinction using experimental phonetic methods. Spectrographic analysis reveals how the VOT
value in an initial CV context is reflected in the cutback of the onset of the
first formant (F1) relative to the higher formants. According to experimental
tests involving the discrimination of contrasting initial CV contexts (Lisker
and Abramson 1970), for English speakers the perceptual boundary of the
b-p distinction is observed in the VOT value from +20 to +30 msec,
whereas the b-p boundary for Spanish speakers lies in the VOT value +10
to +20 msec.
To avoid confusion arising from the use of the cover terms voiced and
voiceless, the experimental phonetics literature often employs the terms
voiced unaspirated, voiceless aspirated and voiceless unaspirated to refer to
long voicing lead (negative VOT), long voicing lag (positive VOT) and
short or zero voicing lag (zero VOT) in the description of the interval between the stop release and the onset of voicing. With respect to these three
VOT categories, languages are classified into at least four groups (there
exists a fifth type which will be discussed in §5) as follows:

(1) VOT systems of two-way contrast

                Short lag    Long lead    Long lag
    Type I          +
    Type II         +            +
    Type III        +                         +
    Type IV         +            +            +

Types II and III are systems exhibiting a two-way source contrast: Type II,
to which Spanish and French belong, displays short-lag and long-lead
plosives; Type III, to which English and Swedish belong, shows short-lag and long-lag plosives. There are two other types of source contrasts:
Type I, which is found in Finnish, exhibits only the short-lag series; on
the other hand, Type IV, which is the system observed in Thai and Burmese, employs all three source contrasts. It should be noted that the short-lag series is always present in every language system.
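
The four-way classification just described can be restated as a small lookup, offered here only as an illustrative sketch. The constant names and the function `classify` are hypothetical labels introduced for this example; the only content taken from the text is which VOT categories each type employs.

```python
# Hypothetical labels for the three VOT categories discussed in the text.
SHORT_LAG, LONG_LEAD, LONG_LAG = "short lag", "long lead", "long lag"

# Each type is identified by the set of VOT categories it employs.
VOT_TYPES = {
    frozenset({SHORT_LAG}): "Type I",
    frozenset({SHORT_LAG, LONG_LEAD}): "Type II",
    frozenset({SHORT_LAG, LONG_LAG}): "Type III",
    frozenset({SHORT_LAG, LONG_LEAD, LONG_LAG}): "Type IV",
}


def classify(categories):
    """Return the type label for a set of VOT categories, or None if the set is not listed."""
    return VOT_TYPES.get(frozenset(categories))


# Spanish, French and Japanese are described as short lag plus long lead.
print(classify({SHORT_LAG, LONG_LEAD}))
```

A set lacking the short-lag series, such as `{LONG_LEAD}`, returns `None`, which mirrors the observation that the short-lag series is present in every system.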
Japanese, like Spanish and French, is considered as belonging to Type
II, since it exhibits a contrast between short voicing lag and long voicing
lead in word-initial plosives: in initial consonants, the language does not
show the aspiration which is a property of Type III languages. Also, experimental tests for the discrimination of source contrasts in initial CV contexts
in Japanese (Shimizu 1977) reveal that the perceptual boundary of the b-p
distinction lies in the VOT value from +15 to +20 msec, which is almost
identical to the result for Spanish.

3. The representation of laryngeal-source distinctions

Current phonological theories are agreed that speech sounds are decomposable into smaller categories. The existence of these categories is widely
supported by the notion of natural classes; furthermore, they are considered
to be part of the linguistic aspects of the human genetic endowment. These
categories have been investigated within a number of different theoretical
frameworks, and have been variously labeled as distinctive features (SPE,
et passim), components and gestures (in the framework of Dependency
Phonology: Anderson & Ewen 1987, van der Hulst 1989), particles
(Schane 1984, 1995) and elements (Kaye, Lowenstamm and Vergnaud 1985;
Harris 1990, 1994; Harris and Lindsey 1995, 2000).
In order to represent cross-linguistic source distinctions, various types of
phonological primes have been proposed within this range of theories. Distinctive feature theories, for example, use the bivalent features [±voice] and
[±tense] (see Jakobson, Fant & Halle 1952, Chomsky and Halle 1968, and
others). In addition to these, we find references to [heightened subglottal
pressure] and [glottal constriction] in Chomsky and Halle (1968), [spread
glottis], [constricted glottis], [stiff vocal cords] and [slack vocal cords]
in Halle and Stevens (1971), and [fortis] in Kohler (1984).
In frameworks employing monovalent distinctive features, the single-valued prime [voice] can be found (see Itô, Mester & Padgett (1995),
Lombardi (1995) and others). Also, [stiff vocal cords] is employed in Halle
& Stevens (1991) while [spread glottis] is used in Jessen & Ringen (2001).
In Dependency Phonology (Anderson & Ewen 1987), which employs
only monovalent primes, the addition of the |V| component in the phonatory
sub-gesture represents voicing while its absence indicates voicelessness. In
the variant of Dependency Phonology known as Radical CV phonology
(van der Hulst 1995), C under the phonatory sub-gesture is for constricted
glottis, Cv for spread glottis, V for obstruent voicing and the absence of
these components for voicelessness in obstruents.
Particle Phonology investigates mainly vocalic systems and does not
provide any significant details about the melodic representation of source
contrasts in consonants.
Like Dependency Phonology, Element Theory can be traced back to
Anderson & Jones (1974). However, more recent developments in Element
Theory have necessitated a change of name, and the label Government/
Licensing-based Phonology (Kaye, Lowenstamm & Vergnaud 1985, 1990)
has become the accepted term. In this framework the two elements [L] and
[H] are employed to express laryngeal-source contrasts: [L] for long voicing
lead (truly voiced); [H] for long voicing lag (voiceless aspirated); the absence of these elements stands for short or zero voicing lag (neutral).
Most studies investigating the phonological phenomena of Japanese have
traditionally employed the bivalent feature [±voice] to represent the two-way source contrast (Itô & Mester 1986, et passim): [+voice] and [−voice]
for long voicing lead and short voicing lag respectively. In recent theories
which claim that source contrasts involve a privative prime (Rice 1992, Itô
& Mester 1993; Itô, Mester & Padgett 1995), the monovalent feature [voice]
is typically employed: the existence of [voice] refers to long voicing lead
and its absence to short voicing lag.
The first Element Theory analysis of source contrasts in Japanese was
given in Shohei Yoshida (1990, 1996), where both elements [L] and [H] are
employed. Without taking into consideration any of the cross-linguistic
differences regarding source contrasts, he utilises [L] for voiced obstruents and [H] for voiceless cognates. An alternative approach such as
Nasukawa (1995), however, succeeds in incorporating the cross-linguistic
facts concerning source distinctions into the melodic analysis of Japanese.
Specifically, it is claimed that [L] is the only element required for the
source contrast in Japanese: its existence is interpreted as long voicing lead
(voiced series of obstruents) and its absence as short voicing lag (voiceless series of obstruents). Furthermore, by merging [L] and the murmur/nasal element [N], Nasukawa (1998) proposes that long voicing lead
phonetically manifests itself when [N] is the head of a given melodic expression.
In the context of Element Theory, this paper will consider (i) how the
phonological primes involved in laryngeal-source contrasts contribute to
the internal organisation of individual sounds, and (ii) how the representation of those primes succeeds in incorporating implicational universals as
well as the phonological properties associated with source contrasts.

4. Oppositions, interpretation and definition in melodic representation

Let me first identify the theoretical characteristics which define Element
Theory. In comparison with the other sub-segmental theories, elements are
defined by at least the following three tenets:
(2) An element is
a. monovalent in terms of phonological oppositions,
b. the minimal unit of interpretation by the sensorimotor system, and
c. an information-bearing pattern that humans perceive in speech signals.
Firstly, (2a) is concerned with the way melodic information is expressed. In
element-based theory, the binary nature of phonological contrasts is represented through the presence (activeness) or absence (inactiveness) of a given
monovalent prime. For instance, like some melodic theories employing
gestures or particles, voice contrasts are encoded by the presence versus
the absence of the voice prime in a given expression. Some feature-based
theories such as Itô, Mester & Padgett (1995) and Lombardi (1995) take the
same theoretical stance. In contrast, the competing notion of bivalent oppositions is exploited by orthodox distinctive-feature theories (SPE, et passim),
where an opposition is derived by specifying plus and minus values to a
given prime. For instance, the voice contrast is captured by the voice prime,
to which a plus or minus value is specified. Under this view, bivalency
produces at least three possibilities: [+voice] is active in processes; [−voice]
is active; and both [+voice] and [−voice] are simultaneously active.
As the literature indicates, however, the processes involving laryngeal-source contrasts in Japanese do not employ these three options in equal
measure: [+voice] is traditionally said to trigger most dynamic processes
such as postnasal voicing and compounding; [−voice] rarely triggers processes except for high vowel devoicing between voiceless consonants; and
no simultaneous participation of both features is attested. Furthermore, in
the context of a rule-based multi-stratal model, the bivalent format substantially over-generates the number of unattested processes. In a model exploiting the notion of monovalency, on the other hand, the voice contrast
is captured by the presence or absence of the voice prime. In this case,
only the prime which is present in a given context can be active for processes such as voicing assimilation. Its absence means a failure to participate in any processes. This adequately describes the asymmetric processes
involving the voice prime, yet it does not generate processes which are unattested.
Secondly, (2b) states that an element can be interpreted separately without needing to be combined with other elements. Indeed, this property is
common to all theories which adopt the notion of monovalency. This suggests that information from phonological representations is accessible by
the sensorimotor systems. As Harris & Lindsey (1995) discuss, this approach succeeds in eliminating redundancy rules of the kind which fill in
predictable feature values, and instead pursues a mono-stratal approach to
phonology. In orthodox feature theories, on the other hand, a single prime
cannot be interpreted without being harnessed to the signatures of other
primes. This implies that the minimal units of phonetic interpretation are
segments, not features.
The element-based approach (Harris & Lindsey 2000) is also characterised by (2c), which states that elements are not defined by properties such
as tongue height or formant height: rather, they are sound images which
comprise information-bearing patterns that humans perceive in speech
sounds. They should be detectable through the traditional method of determining the manner in which sounds are organized into systems and natural
classes. This view is based upon the assumption that speech sounds are
represented cognitively as auditory images, primary media which are neutral between speaker and hearer: speakers transmit and monitor such information and listeners receive it. This position is rarely touched upon in the
literature of other representational approaches such as orthodox feature
theory, where features are chiefly defined in terms of articulation or raw
acoustics (Chomsky & Halle 1968; Clements & Hertz 1991) or otherwise
in terms of coexisting articulatory and acoustic specifications (Flemming
1995). As a challenge to non-element-based theories, Harris & Lindsey
claim that articulation and raw acoustics are not information-bearing categories: articulation is a delivery system for linguistic information and raw
acoustics is a mere outcome delivered by articulation.
5. Laryngeal-source elements
Element Theory employs two autonomous melodic primes: the low source
element labelled [L] and the high source element labelled [H], which are
phonetically interpreted in obstruents as long voicing lead (true voicing,
prevoicing) and long voicing lag (aspiration) respectively. The auditory
images of these elements are, as the names imply, low source and high
source, the acoustic patterns of which appear in spectrograms as lowered
fundamental frequency (F0 down) and raised fundamental frequency (F0 up)
respectively. The nearest corresponding features might be taken to be [slack
vocal cords] and [stiff vocal cords] respectively (Halle & Stevens 1971,
1991), although the equivalence is found only in terms of glottal execution.
Whereas [slack vocal cords] and [stiff vocal cords] are defined in terms of
articulation, [L] and [H] may be treated as phonologically defined auditory
images. In addition, [slack vocal cords] and [stiff vocal cords] are intended
only for representing laryngeal activity (that is, phonation-type distinctions
in consonants and also tonal distinctions in syllable nuclei) (Bao 1990,
Halle & Stevens 1991, cf. Yip 1980), whereas [L], as we will see in §7, is
also relevant to the representation of nasality (Nasukawa 1995, 1998,
2005a; Ploch 1999). Harris (1998) claims that the availability of the two
monovalent elements [L] and [H] implies the possibility of the following
four combinations:

(3) Element specification of source contrasts

    Source elements    Phonetic manifestation
    (none)             Short or zero voicing lag (neutral)
    [L]                Long voicing lead (truly voiced)
    [H]                Long voicing lag (voiceless aspirated)
    [L, H]             Long voicing lead and lag, murmur (breathy)

Besides the specification of [L] alone or [H] alone, there are two further
possibilities: both elements are unspecified in obstruents, or both elements
are simultaneously specified in a given melodic expression. The former option, where no source category is specified, manifests itself phonetically as
short or zero voicing lag in obstruents. The latter option, where both source
categories are specified together, is interpreted as long voicing lead and lag
(breathy voice), which is associated with the voiced aspirated plosives
found in, for instance, Hindi and Gujarati.
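
As a compact restatement of the four combinations, the following sketch maps each possible set of source elements to the phonetic manifestation given in the text. The dictionary `MANIFESTATION` and the function `interpret` are hypothetical names used only for illustration, and the string labels for the elements are an assumption of the example.

```python
# Phonetic manifestation of each combination of the source elements [L] and [H],
# as described in the text. Elements are represented by the strings "L" and "H".
MANIFESTATION = {
    frozenset(): "short or zero voicing lag (neutral)",
    frozenset({"L"}): "long voicing lead (truly voiced)",
    frozenset({"H"}): "long voicing lag (voiceless aspirated)",
    frozenset({"L", "H"}): "long voicing lead and lag (breathy)",
}


def interpret(source_elements):
    """Return the phonetic manifestation of an obstruent's source specification."""
    return MANIFESTATION[frozenset(source_elements)]


print(interpret(set()))        # no source element specified
print(interpret({"L", "H"}))   # both elements specified together
```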
These combinatorial specifications are selected by parameter on a language by language basis. The typology of source-element specifications is
illustrated below:

(4) Typological specification of [L] and [H]

                (none)    [L]    [H]    [L, H]
    Type I         +
    Type II        +       +
    Type III       +              +
    Type IV        +       +      +
    Type V         +       +      +       +

In presenting the VOT typology in (4), Harris (1994, 1998) claims that its
arrangement straightforwardly captures implicational universals and allows
us to identify those segmental classes that are active in processes involving
laryngeal source.
With respect to implicational universals, the relative markedness of
source contrasts is explained in terms of compositional complexity. The
unmarked setting is represented by the absence of source elements and corresponds to the short or zero voicing lag found in all languages. This non-specification of source elements is regarded as the baseline on to which
source elements are superimposed: so the existence of any source elements
implies the existence of short or zero voicing lag, the manifestation of
non-specification of source elements. In addition, the most complex (Type
V) systems allow the combination of [L] and [H], which results in a four-way contrast. In this case, the existence of the [L]-[H] combination implies
those parameter settings which permit the existence of sole [L], sole [H]
and also the absence of both [L] and [H].
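
One way to make these implicational universals concrete is to treat a language's system as a collection of source specifications and to require that every simpler specification contained in an attested one is attested too. This is a formalisation assumed for the purpose of illustration, not one stated in the text, and the names `subsets`, `respects_implications`, `TYPE_II`, `TYPE_V` and `NO_NEUTRAL` are hypothetical.

```python
from itertools import combinations


def subsets(spec):
    """Return all subsets of a source specification, including the empty one."""
    items = sorted(spec)
    return {
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    }


def respects_implications(system):
    """True if every subset of every specification in the system is also in the system."""
    system = {frozenset(spec) for spec in system}
    return all(subsets(spec) <= system for spec in system)


TYPE_II = [set(), {"L"}]                      # neutral versus [L]
TYPE_V = [set(), {"L"}, {"H"}, {"L", "H"}]    # four-way contrast
NO_NEUTRAL = [{"L"}, {"H"}]                   # hypothetical system lacking the baseline

print(respects_implications(TYPE_II))
print(respects_implications(TYPE_V))
print(respects_implications(NO_NEUTRAL))
```

Under this reading, a system with [L] or [H] but no unspecified series fails the check, as does a system with the [L]-[H] combination but without sole [L] or sole [H].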
Furthermore, both the active and the inert members of a segmental class
are straightforwardly captured by the presence or absence, respectively, of a
particular source element. For instance, true-voicing assimilation (spreading
of [L]) is only observed in languages which employ [L], such as Polish and
Serbo-Croatian; and similarly, voiceless assimilation (to voiceless aspirated
obstruents) (spreading of [H]) is only observed in languages such as English
which employ [H]. Members of the neutral series of obstruents may undergo
processes such as neutralisation and lenition in certain contexts, but they
never participate in processes as triggers: for example, final devoicing in
Polish, Northern German and Catalan is explained in terms of the suppression of [L]; and in the phonology of English-speaking children the favoured
neutral state in obstruents is accounted for in terms of the suppression of [H].
In Element Theory, the source elements [L] and [H] play a dual role:
while they contribute the phonation-type contrasts in non-nuclear positions,
as just discussed, in syllable nuclei [L] and [H] are interpreted as low tone
and high tone respectively (Harris 1998). This dual role stems from the close
correlation in tone languages between voicing in non-nuclear positions and
register differences in nuclear positions. In diachronic terms, tone on nuclei
has developed partially or fully from phonation-type contrasts in neighbouring non-nuclear sites (Yip 1980; Bao 1990; Halle & Stevens 1991).
It should here be noted that [L] and [H] represent tonal contrasts but not
voice activity in nuclei. This reflects the fact that voice in nuclei is characterised by spontaneous voicing, which the theory considers to be an innocent by-product of its manner characteristics (Harris 1994: 135–136).
(Sonorant devoicing such as high vowel devoicing in Japanese is a different
issue. As we will see in §6.3, it has nothing to do with source elements.) In
contrast, the unmarked (lexically unspecified) laryngeal state for obstruents
is voiceless unaspirated. The state of true voicing or aspiration is not spontaneous since the presence of a source element [L] or [H] is superimposed
upon the unmarked state.

6. Japanese as a Type II language
6.1. Phonetic evidence
There are several pieces of evidence to support the claim that Japanese belongs to the group of Type II languages in (4).1 One piece of phonetic evidence, as §2 has already mentioned, comes from the observation that, as in
Spanish and other languages belonging to the Type II category, aspiration
(which is a characteristic of two-way source contrasts in Type III languages) cannot be detected in typical contexts for VOT measurement in
Japanese. The voiceless series of two-way source contrasts in Japanese
comprises voiceless unaspirated consonants. This is supported by the results of tests for source discrimination in initial CV contexts. Shimizu (1977)
claims that the perceptual boundaries of the b-p, d-t and g-k distinctions lie
in the VOT value from +15 to +20 msec, from +20 to +30 msec, and from
+20 to +30 msec, respectively. These characteristics of VOT perception are
almost identical to those found in a Type II language like Spanish. In contrast, as Shimizu argues, Type III languages (showing another two-way
source contrast) such as English exhibit rather different values for the perceptual boundaries in the same b-p, d-t and g-k distinctions: from +20 to
+30 msec, from +30 to +40 msec, and from +30 to +40 msec, respectively.

6.2. Active processes of true voicing

Phonological evidence for true voicing in Japanese obstruents comes
mainly from two areas: assimilatory and concatenating processes (which
will be discussed here), and early language acquisition and aphasia (which
will be discussed in §6.4).
First, as confirmed by the literature in general (see also the other papers
in this volume), Japanese displays active processes involving true voicing:
postnasal voicing and sequential voicing known as Rendaku. Postnasal
voicing is generally described as categorical voicing assimilation occurring
in nasal-obstruent clusters (e.g. kaŋgae 'thought' within a single morpheme
and kande 'chew' ← kam + te across a morpheme boundary). This phenomenon is also attested in some Type II languages such as Campa
(Arawak), Quichua and Zoque. (Some Type IV languages such as Thai,
which also employ [L] as the category for true voicing, also exhibit this sort
of pattern.)

The true voicing process of Rendaku is also regarded as categorical. Under
Rendaku, an initial voiceless (voiceless unaspirated) consonant of the
second member of a non-coordinate compound is realised as its voiced
(truly voiced) counterpart (e.g. onna + kokoro → onnagokoro 'woman's
heart')2. This process has its origins in the diachronic change which derived
voicing from the genitival particle no (or ni) by eliding the vowel and then
absorbing the nasal into the following voiceless obstruent (Unger 1977;
Vance 1983, 1987). In light of this fact, Itô and Mester (1986), for example,
assume that the source of voicing is a compounding conjunctive morpheme, which consists of no melodic content other than a voice
feature appearing at the left edge of the second member of a compound.
In terms of elements, the sole independent source category [L] participates in both of the phenomena outlined above. In postnasal voicing, [L] in
the nasal of a nasal-obstruent cluster spreads to the following obstruent. In
the case of Rendaku, the compounding conjunctive morpheme consisting of
[L] docks on to the initial obstruent position of the second member of a
compound unless the process violates Lyman's Law.
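
The two processes can be sketched on romanised strings as follows. This is a deliberately simplified illustration, not the element-based analysis itself: the voiceless/voiced pairing in `VOICED`, the treatment of Lyman's Law as "no docking if the second member already contains a voiced obstruent", and the function names `rendaku` and `postnasal_voicing` are all assumptions of the example. The compound kami + kaze, which surfaces without voicing, is added here as a familiar illustration and does not come from the text.

```python
# Assumed romanised voiceless/voiced obstruent pairs (a simplification).
VOICED = {"k": "g", "s": "z", "t": "d", "h": "b"}
VOICED_OBSTRUENTS = set("gzdb")


def rendaku(first, second):
    """Dock [L] on the initial obstruent of the second member unless Lyman's Law blocks it."""
    blocked = any(ch in VOICED_OBSTRUENTS for ch in second)
    if second[0] in VOICED and not blocked:
        second = VOICED[second[0]] + second[1:]
    return first + second


def postnasal_voicing(stem, suffix):
    """Spread [L] from a stem-final nasal to a suffix-initial obstruent."""
    if stem[-1] in "mn" and suffix[0] in VOICED:
        # The nasal is simply written n here; place assimilation is not modelled.
        return stem[:-1] + "n" + VOICED[suffix[0]] + suffix[1:]
    return stem + suffix


print(rendaku("onna", "kokoro"))       # voicing docks on the initial k
print(rendaku("kami", "kaze"))         # blocked: kaze already contains z
print(postnasal_voicing("kam", "te"))  # nasal-obstruent cluster across a boundary
```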

6.3. Devoicing
Turning to the so-called voiceless members of the two-way laryngeal contrast in Japanese, these involve no source specification and are therefore
predicted to be phonologically inert. That is, the members of this neutral
series of obstruents can undergo laryngeal-source assimilation as in postnasal voicing but cannot trigger the process.
At first sight, however, phenomena such as devoicing do seem to contradict this prediction. Both this volume and also the wider literature report
that Japanese exhibits the notable exception of vowel devoicing, which most
frequently occurs when a high vowel is flanked by voiceless obstruents: e.g.
aki̥ta 'Akita' (place name). According to one view, as explained by Tsujimura
(1996: 27–28), the voiceless properties of both flanking consonants affect
the intervening vowel: the value of [voice] in the vowel changes from plus
to minus. For this reason alone, it is often assumed that the voiceless
property in obstruents can be phonologically active in Japanese.
Within the framework of Element Theory, however, unlike distinctive
feature theories, the voiceless (neutral) obstruents in Japanese, as well as
all vowels and sonorant consonants, have no source specification which can
act as a trigger for laryngeal assimilation. Instead, I assume that vowel devoicing in Type II languages is related to the active status of the noise element [h], which manifests itself acoustically as aperiodic energy. The closest corresponding feature might be [+continuant], although unlike
[+continuant], [h] is present not only in fricatives and affricates but also in
plosives (Harris and Lindsey 1995: 73–74). It is always specified in the internal organisation of obstruents (except the glottal stop ʔ). It will be recalled
that the context which triggers high vowel devoicing makes crucial reference to obstruents: segments flanking high vowels must be not only voiceless but also obstruents. In element terms, this condition is described by
the statement that [h] (which is required in obstruents) in non-nuclear positions flanking a high vowel nucleus affects the nuclear position only if [h]
is not combined with any laryngeal-source specification (that is, no [L] in
Japanese which indicates a neutral obstruent).
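
The condition just stated can be sketched as a simple check over romanised strings. The segment classes below are assumptions made for illustration: `NEUTRAL_OBSTRUENTS` stands in for obstruents containing [h] without [L], `HIGH_VOWELS` for the high vowels, and the function name `devoice` is hypothetical. The sketch encodes only the flanking condition and ignores every other factor that may affect devoicing.

```python
HIGH_VOWELS = set("iu")
# Obstruents with [h] but no [L], in a simplified romanisation (an assumption).
NEUTRAL_OBSTRUENTS = set("ptksh")
RING_BELOW = "\u0325"  # combining ring below, marking a devoiced vowel


def devoice(word):
    """Mark high vowels flanked on both sides by neutral obstruents as devoiced."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        flanked = (
            0 < i < len(word) - 1
            and word[i - 1] in NEUTRAL_OBSTRUENTS
            and word[i + 1] in NEUTRAL_OBSTRUENTS
        )
        if ch in HIGH_VOWELS and flanked:
            out.append(RING_BELOW)
    return "".join(out)


print(devoice("akita"))  # i stands between k and t, so it is marked
print(devoice("abida"))  # b and d carry [L], so nothing is marked
```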
Following the analyses of the interpolation of nasality in Cohn (1993)
and Nasukawa (1995, 2005a), I assume that the extension of [h]'s phonetic
signature over a flanked nucleus results from the phonetic interpolation of
the two [h]s in the flanking obstruents. This can be basically attributed to
the quality of Japanese high vowels, which are relatively centralised and
are frequently involved in various processes. For example, they are often
used as an epenthetic vowel because of their less salient profile (e.g. tas 'to
add' + ta 'past tense suffix' → tasita in Yamato Japanese, and ski 'ski' →
s kii in lexical borrowing), and often undergo assimilatory processes (e.g.
eiga 'film' → eega and ma 'horse' → M ma). Spontaneous voicing of
such phonetically weak vowels tends to be overshadowed by the neighbouring [h]s, and the result is generally perceived as devoicing.
This kind of interpolation is not achieved when high vowels are flanked
by obstruents with long-lead voicing, that is, when [h] co-exists with [L]
in a single expression. This is due to an acoustic effect whereby the characteristics of aperiodic energy are partially suppressed by the co-existence of
long-lead glottal pulsing.

6.4. Early language acquisition and aphasia

The zero-[L] source contrast straightforwardly captures the early stages of
language acquisition in Japanese. In the speech of very young Japanese
children (before the age of about 1 year 6 months), the laryngeal properties
of plosives are, as Jakobson (1968: 14) claims, neutralized into the region
of zero or short voicing lag (Nasukawa 2005b). Then at a later stage the long-lead and short-lag contrasts emerge. Assuming that unmarked melodic representations are acquired before marked ones, we predict that acquisition
is characterized by two separate stages: first, the non-specification of [L]
(neutral laryngeal state), then later the specification versus non-specification of [L]. That is to say, the neutral state (baseline) of laryngeal-source
specification is preferred in the earlier stages of plosive production. This unmarked status of short-lag plosives in the speech production of very young
children is also backed up by acquisition studies involving other languages
(see Harris 1998: 179, for a detailed discussion and references therein).
Reports of aphasic deficit in Japanese can also be accounted for in similar terms. For instance, in Broca's and global deficit the laryngeal-source
contrasts are collapsed, and the production of stops tends to converge on
the short-lag region (Itoh, Tatsumi and Sasanuma 1986). This convergence
can be regarded as a loss of the categorical representation [L]. Similar patterns are attested across different languages: e.g. the loss of [H] in English
(cf. Blumstein, Cooper, Statlender, Goodglass and Gottlieb 1980) and the
loss of [L] and [H] in Thai (cf. Gandour and Dardarananda 1982, 1984).

7. Reductionism: merger of [L] and [N]

Within established versions of Element Theory (Harris & Lindsey 1995),
the laryngeal-source contrast in Japanese is represented by assuming a non-specified neutral baseline versus the specification of [L]. While maintaining
this basic assumption, an alternative representation of long-lead voicing
(henceforth voicing) is proposed by Nasukawa (1995, 1998, 1999). This
move is intended to contribute to a general trend towards reducing the
number of elements available to the phonology.3 Reflecting the strong correlation between voicing and nasality in melodic representations, Nasukawa
proposes a merger of voicing and nasality under a single element [N]
(murmur element), which has been used for representing nasality.4 Then,
the notion of melodic headedness contributes to the dual phonetic interpretation of the single element:5 if [N] is headless, it is interpreted as nasality;
on the other hand, if [N] is headed, it is interpreted as voicing.6


(5) Element                   Phonetic manifestation
    [N] (non-headed [N])      nasality
    [N] (headed [N])          (long-lead) voicing

84 Kuniya Nasukawa
The first formal evidence that voicing and nasality are two instantiations of
the same category is provided in Nasukawa (1995, 1998), which present an
integrated approach to the paradoxical behaviour of voice and nasality:
nasals appear to be specified for voice in postnasal voicing assimilation
(e.g. ʃin + ta → ʃinda 'died'), while they behave as if they have no voice in
Lyman's Law, which allows only a single voiced obstruent in a particular
domain (e.g. kanade 'a play, a dance', *ganade). By adopting the representations in (5), postnasal voicing is treated as the extension of the [N] across
both positions of an NC cluster, where only the element in the second position is promoted to a headed status.7 On the other hand, the transparency of
nasal obstruents to Lyman's Law follows from [N] failing to be headed.
Furthermore, the dual interpretation of [N] is supported by some robust
correlations between voicing and nasality. A typical instance of such a relation is postnasal voicing assimilation, found not only in Yamato Japanese,
but also in many languages such as Quichua and Zoque, where an obstruent
preceded by a nasal is obligatorily voiced. Another example of the relation
between voice and nasal is found in processes involving alternations between voiced obstruents and their nasal reflexes such as fully-nasalised
and prenasalised voiced cognates. This kind of process is often observed in
intervocalic contexts. For example, voiced obstruent prenasalisation is witnessed in Northern Tohoku Japanese, languages of the Reef Island-Santa
Cruz family, those in the Pacific area and several Bantu languages; in the
intervocalic context, conservative Tokyo Japanese exhibits voiced-velar-obstruent nasalisation. Furthermore, in the verbal inflexion of Yamato
Japanese, the stem-final b in a verbal stem such as tob 'to fly' is realised as
a nasal that is homorganic with the initial obstruent of a suffix such as -te.
According to Nasukawa (2005a), the assignment of head status to voicing
rather than to nasality can be justified as follows. First, the representations
in (5) can encode an implicational universal between voicing and nasality.

(6)  Typology of (long-lead) voicing and nasal

     Nasal    Voicing    Languages e.g.
     yes      no         Finnish, English
     yes      yes        Spanish, Thai
     no       yes        (unattested)



The representation of laryngeal-source contrasts in Japanese


As illustrated in (6), we never encounter a system which displays truly voiced plosives without also having nasals. This observation allows us to
express the implication that the existence of voicing implies the existence
of nasal. This implicational universal is straightforwardly captured by the
representations in (5).
Second, the representations in (5) encode the optional status of voicing.
Almost all languages exploit contrastive nasality, whereas voicing is parametrically controlled. The optional status of voicing is reflected in the non-integral nature of headedness: some systems permit [N] to be headed, while
others disallow this as a structural possibility.
Furthermore, the representations in (5) reflect differences in complexity
between voicing and nasality. In the analysis of prenasalisation and velar
nasalisation by Nasukawa (1999), nasality must be less complex structurally
than voicing, since the latter property is often suppressed in intervocalic
contexts and instead nasality is interpreted (e.g. in some dialects of Japanese,
some Western Indonesian languages and several Bantu languages). According to Harris (1994, 1997), segmental structure is less complex in weak
positions than in strong positions, a state of affairs predicted by the proposed representations in (5).
Finally, the elimination of [L] from the melodic inventory for phonation-type contrasts complements a recent analysis of tone and intonation. Instead
of [L] representing low pitch in nuclear positions, for example, the non-specification of [H] (which is interpreted as high tone) in nuclear positions
phonetically manifests itself as low pitch in the analysis of Japanese pitch
accentuation (Yuko Yoshida 1995). Also, in the analysis of intonation, as
an alternative to the specification of [L], a prosodic boundary unassociated
to [H] is interpreted as low pitch (Cabrera-Abreu 2000).
According to this revised approach, the specification of source contrasts
can be summarized as follows:

(7)  Source elements       Phonetic manifestation
     (none)                Short or zero voicing lag (neutral)
     [N] (headed [N])      Long voicing lead (truly voiced)
     [H]                   Long voicing lag (voiceless aspirated)
     [N, H]                Long voicing lead and lag, murmur (breathy)
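
The interpretation of source elements summarised in this section can be sketched as a small lookup. This is only an illustrative sketch, not part of the analysis itself: the names `Source` and `manifestation` are hypothetical, the bare/headed distinction is encoded as a string, and treating [N] as headed in the [N, H] combination is an assumption that the text does not spell out.

```python
from typing import NamedTuple


class Source(NamedTuple):
    """Hypothetical encoding of a laryngeal-source specification."""
    n: str = "absent"   # status of [N]: "absent", "bare" or "headed"
    h: bool = False     # whether [H] is present


def manifestation(src: Source) -> str:
    """Return the phonetic interpretation of a source specification."""
    if src.n == "headed":
        if src.h:
            # Assumption: [N] is headed when it combines with [H].
            return "long voicing lead and lag, murmur (breathy)"
        return "long voicing lead (truly voiced)"
    if src.n == "bare" and not src.h:
        # Headless [N] is interpreted as nasality, not as voicing.
        return "nasality"
    if src.n == "absent":
        if src.h:
            return "long voicing lag (voiceless aspirated)"
        return "short or zero voicing lag (neutral)"
    raise ValueError(f"combination not described in the text: {src}")
```

On this encoding, the two-way contrast of Japanese corresponds to `Source()` versus `Source(n="headed")`.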

The two-way source contrast of Japanese (a type II language) is then represented by the contrast between a non-specified neutral baseline and headed

8. Summary
The main purpose of this paper has been to present an analysis of laryngeal-source contrasts in Japanese within the scope of Element Theory. Exhibiting
cross-linguistic variation in VOT values, the contrasts are derived by the
non-specification of any source elements versus the specification of the low
source element, which correspond phonetically to the laryngeal properties
of neutral and long voicing lead, respectively. This is supported by a number
of phonologically-active phenomena involving true voicing, the preference
for the neutral laryngeal state in early language acquisition and the convergence of VOT values on the neutral region in cases of aphasia in Japanese.
Finally, following a recent proposal to merge the low source and nasal
elements, this paper has shown how long voicing lead is represented by a
headed nasal element ([N]) in a given expression. To support this position,
evidence has come from the correlations observed between voice and nasal,
as well as from implicational universals. To conclude, the Japanese laryngeal-source contrast is represented by the specification of two laryngeal states:
the bare source baseline versus the same baseline with a headed nasal
element superimposed on to it.

1. The first Element Theory analysis of source contrasts in Japanese was given in
Shohei Yoshida (1991, 1996), where both elements [L] and [H] are employed.
Without taking into consideration any of the cross-linguistic differences regarding source contrasts, he utilises [L] for long voiced obstruents and [H]
for voiceless cognates.
2. An exception arises when a voiced obstruent is already specified in a given
lexical form. In such cases Lyman's Law requires the original voiceless
consonant to remain unchanged.
3. There have been several proposals to extend the element-reducing programme
in various ways, for instance by merging aspiration with noise and coronality
with openness (van der Hulst 1995; Marten 1996; Charette & Göksel 1998;
Kula & Marten 1998). The conceptual advantages of this approach are clear.
However, the empirical consequences have yet to be fully worked out.
4. Instead of [N], Ploch (1999) and others use [L] and eliminate [N] from the
element inventory. However, I have opted for [N] rather than [L] to represent
the correlation between voicing and nasality, since the bare element without
headship status contributes nasality.
5. In this treatment, the headedness of a given element is regarded as an intrinsic
property which enhances the acoustic image of the element (Harris 1994; Harris & Lindsey 1995; Backley 1998; Nasukawa 1998, 1999).
6. Within a geometry-based version of Element Theory (Backley 1998; Backley
& Takahashi 1998), Nasukawa (2005a) proposes that the contrast between
voicing and nasality is represented using the same idea: if the element [N] licenses its [comp], then it is interpreted as (long-lead) voicing, while the same
element without a licensed [comp] is interpreted as nasality.
7. See Nasukawa (2005a: §4.5) for a detailed discussion.
8. See also Ploch (1999) for further arguments to support the merger of long-lead
voicing and nasal elements.

Rendaku in inflected words

Timothy J. Vance

1. Rendaku
The Japanese term rendaku, which Martin (1952: 48) translates as 'sequential voicing', refers to a morphophonemic phenomenon found in compounds
and in prefix+base combinations. A morpheme that shows rendaku has one
allomorph beginning with a voiceless obstruent and another allomorph beginning with a voiced obstruent. The rendaku allomorph (i.e., the allomorph beginning with a voiced obstruent) of such a morpheme appears
only when it is a non-initial morph in a word. The examples in (1) illustrate
the pairs of phonemes that can alternate.








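(1)  Alternation   Allomorphs           Gloss of compound
     /f/~/b/       /fune/~/bune/        'river boat'
     /h/~/b/       /hako/~/bako/        'chopstick case'
     /s/~/z/       /sora/~/zora/        'starry sky'
     /š/~/j/       /širuši/~/jiruši/    'arrow symbol'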

Because of well-known historical changes, some of the alternations in modern Japanese involve more than just a difference in voicing. Notice that
/b/ alternates not with /p/ but with /f/ ([ɸ]), as in /fune/~/bune/, and with /h/
([h] or [ç]), as in /hako/~/bako/.1 Notice also that /z/ ([dz] or [z]) alternates
both with /c/ ([ts]), as in /cuka/~/zuka/, and with /s/, as in /sora/~/zora/, and
that /j/ ([ɟʑ]) alternates both with /č/ ([cɕ]), as in /či/~/ji/, and with /š/ ([ɕ]),
as in /širuši/~/jiruši/.2
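
The phoneme pairings just described can be sketched as a lookup from the initial voiceless obstruent to its voiced rendaku counterpart. This is a minimal sketch under stated assumptions: the function name `rendaku_allomorph` is hypothetical, the palatal phonemes are written č and š (an assumed transcription), and the pairs /t/~/d/ and /k/~/g/ are taken from examples elsewhere in the chapter rather than from this paragraph.

```python
import unicodedata

# Initial voiceless phoneme -> voiced counterpart in the rendaku allomorph.
_PAIRS = {
    "f": "b", "h": "b",          # both alternate with /b/
    "t": "d", "k": "g",
    "c": "z", "s": "z",          # both alternate with /z/
    "\u010d": "j", "\u0161": "j",  # č and š both alternate with /j/
}


def rendaku_allomorph(morph: str) -> str:
    """Return the voiced-initial allomorph of a phonemic form, if one exists."""
    # Normalise so that č and š are single code points before indexing.
    morph = unicodedata.normalize("NFC", morph)
    if not morph or morph[0] not in _PAIRS:
        return morph  # vowel-, sonorant- or voiced-initial: nothing to change
    return _PAIRS[morph[0]] + morph[1:]
```

For instance, `rendaku_allomorph("fune")` gives `"bune"` and `rendaku_allomorph("sora")` gives `"zora"`. The sketch only produces the alternant; whether a given compound actually uses it is lexically irregular.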



2. Historical development
The oldest substantial texts in Japanese date from the 8th century, and the
language they represent presumably reflects a variety spoken by the aristocracy in the contemporary capital of Nara. There is general agreement that
word-medial voiced obstruents were prenasalized in Old Japanese: [ᵑg ⁿdz ⁿd ᵐb]3 (Vance 1983: 335–337). As Unger (1977: 89) first pointed out, if
we make the plausible assumption that such prenasalization was present in
prehistoric Japanese as well, a satisfying explanation for the origin of sequential voicing is available. Hamada (1952: 23) cites the examples in (2)
to illustrate the historical process of interest in some items that developed
after the 8th century.4

(2)  /sumi+sur-i/ → /suzuri/
     /ika ni ka/ → /ikaga/
     ... → /fude/ 'writing brush'

In each case, it looks as if a sequence of the form N (nasal consonant) + V (vowel) + O[-vce] (voiceless obstruent) was replaced by O[+vce], i.e., the
voiced counterpart of the original obstruent. This replacement would have
been a natural consequence of vowel syncope, given that word-medial voiced
obstruents were prenasalized at the time. Vowel syncope alone would have
yielded a phonotactically anomalous nasal+obstruent cluster in each case.6
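
The surface effect of this replacement can be sketched as a single rewrite of the first nasal + vowel + voiceless obstruent sequence. This is a shorthand only: the historical account derives the voiced obstruent through vowel syncope and prenasalization, not by direct substitution, and the function name `contract` and the simplified vowel and obstruent sets are assumptions made for illustration.

```python
import re

# Voiceless obstruent -> voiced counterpart.
VOICED = {"p": "b", "t": "d", "s": "z", "k": "g"}

# N (nasal consonant) + V (vowel) + voiceless obstruent.
NVO = re.compile(r"[mn][aiueo]([ptsk])")


def contract(form: str) -> str:
    """Replace the first N+V+O[-vce] sequence with the voiced obstruent."""
    # Drop morpheme boundaries, hyphens and spaces before matching.
    segments = re.sub(r"[^a-z]", "", form)
    return NVO.sub(lambda m: VOICED[m.group(1)], segments, count=1)
```

Applied to the long forms cited above, `contract("sumi+sur-i")` yields `"suzuri"` and `contract("ika ni ka")` yields `"ikaga"`. The rewrite applies blindly to the first matching sequence, so it is only meaningful for forms where that sequence is the one that historically contracted.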
It is a plausible inference that this process was involved in the origin of
rendaku. As the example in (3) shows, this account requires us to posit an
earlier syllable of the form NV between the two elements of a compound
that showed rendaku in Old Japanese.7


(3)  /yama/ + POJ NV + POJ/ta/ → OJ/yama+da/
     'mountain' + ?? + 'paddy' → 'mountain paddy'

The obvious candidate for the mystery syllable is the genitive particle
/nö/, the ancestor of OJ/nö/ and modern /no/. Attested Old Japanese vocabulary items like those in (4) suggest why rendaku was irregular (as it
continues to be in modern Japanese).8




(4)  'autumn leaf'
     'bamboo leaf'
     OJ/sasa nö pa/ 'bamboo-grass leaf'



As expected, lexicalized phrases that retained genitive OJ/nö/ (as in 4a) did
not show rendaku.9 Noun+noun compounds could have originated either by
simple juxtaposition, in which case rendaku did not occur (as in 4b), or by
contraction of a phrase, in which case rendaku did occur (as in 4c).10

3. Inflected words
Verb+verb compound verbs are abundant in Japanese, but they rarely show
rendaku. An example is /kak-i+tor-u/ 'write down', which contains the roots
of /kak-u/ 'write' and /tor-u/ 'take'. The first component verb in such a
compound is invariable; it must appear in its continuative form.11 The
second component verb bears whatever inflectional ending is required for
the compound as a whole; the citation form of a verb is the nonpast indicative. The account in §2 of the origin of rendaku provides a natural explanation for the rarity of rendaku in compounds of this type (Vance 1983).
There is no reason to suppose that the components of a verb+verb compound verb were ever connected by a genitive particle or any other NV
syllable in earlier stages of Japanese.
As noted in the previous paragraph, the first element of a verb+verb
compound verb appears in its continuative form. The continuative of any
verb is an inflectional form, and as a word on its own it functions to connect its clause to a following clause. The example in (5) illustrates with
/hanaš-i/ (romanized hanashi), the continuative of /hanas-u/ 'speak'.

(5)  Tomodachi  to    hanashi,     sore  kara  nemashita.
     friend     with  speak-CONT   that  from  sleep-POLITE-PAST
     '(I) spoke with (my) friend, and after that (I) went to bed.'

According to the traditional Japanese analysis of verb morphology, almost all verbs fall into one of two regular conjugation classes.12 Assuming the
widely adopted morphological segmentation of verb forms proposed by
Bloch (1946), every verb in the first of these two classes has at least one
stem allomorph that ends in a consonant.13 The verb meaning 'speak' in the
first clause in (5) is an example of such a consonant-stem verb: the stem
allomorph in the citation form /hanas-u/ ends in /s/, and the stem allomorph
in the continuative /hanaš-i/ ends in /š/.14 The continuative of every consonant-stem verb has the inflectional ending /i/. Every verb in the other regular conjugation class has an invariant stem ending in a vowel (either /i/ or /e/). An example of such a vowel-stem verb is /tabe-ru/ 'eat', with the nonpast indicative marked by /ru/ rather than by the /u/ of consonant-stem
verbs. The continuative of this verb is /tabe/, with no inflectional ending,
since the continuative of every vowel-stem verb is identical to its stem.15
Many verbs have a corresponding deverbal noun that is segmentally identical to the continuative, although it may be accented on a different syllable
(Martin 1952: 34). The examples in (6) illustrate, pressing English gerunds
into service as translations of the continuative forms.

(6)  /yasum-u/ 'rest'16
     /yasum-i/ 'resting'
     /yasum-i/ 'vacation, break'

     /kikoe-ru/ 'be audible'
     /kikoe/ 'being audible'

Okumura (1955) claims that rendaku does not occur in compounds of inflected word plus inflected word.17 In fact, we do find examples of rendaku
in such compounds, but as mentioned above, rendaku is rare in verb+verb
compound verbs. Okumura's illustrative examples actually suggest a more
interesting generalization. Two of those examples appear in (7).

(7)  a. /wakač-i+kak-u/ 'write with spaces between words' (a verb)
     b. /wakač-i+gak-i/ 'writing with spaces between words' (a noun)

Both examples in (7) derive from the verbs /wakac-u/ 'divide' and /kak-u/
'write', and the former, like all non-final verbal elements in compounds,
appears in its continuative form /wakač-i/. The verb /wakač-i+kak-u/ (7a:
V1+V2=V) is given in its citation form, with the second element bearing the
nonpast affirmative ending /u/. The noun /wakač-i+gak-i/ (7b: V1+V2=N),
on the other hand, does not inflect; the second element is fixed in form.
Okumura's precise claim thus appears to be that rendaku will not occur in a
compound which consists of two inflected words and is itself an inflected
word. At the same time, the second example suggests that we should expect
rendaku in a compound that consists of two verb stems but is itself a noun.
The examples just considered involve verbs. The other major class of
inflected words in Japanese is adjectives.18 Just as in the case of a verb, the
citation form of an adjective is the nonpast indicative. The adjectival nonpast indicative suffix has the invariant form /i/. The continuative form of an
adjective is always marked by the suffix /ku/ and is never identical to the
stem. When a compound contains an adjective as its initial element, the
adjective always appears as a bare stem. The examples in (8) illustrate.




(8)  'being heavy'
     'oppressive' (cf. /kuruši-i/ 'strained')

     'being early'
     'early rising' (cf. /oki-ru/ 'get up')


Some adjective stems can be used as nouns (Martin 1975: 399), as the examples in (9) show.

(9)  /maru-i/ 'round'
     /ča+iro-i/ 'brown' (literally 'tea-colored')
     /ča+iro/ 'brown' (literally 'tea-color')

4. Verb+verb compounds
A set of verb+verb compounds was collected to assess the notion that rendaku does not occur in a compound that consists of two inflected words and
is itself an inflected word. The first step in the collection procedure was to
make a list of all the non-compound verbs beginning with a voiceless obstruent that appear in Kazama (1979), a reverse dictionary that has a separate section for each part of speech. There is no point in considering verbs
that do not begin with a voiceless obstruent, since rendaku cannot affect a
vowel (as in /oboe-ru/ 'remember'), a sonorant (as in /nom-u/ 'drink'), or an
obstruent that is already voiced (as in /de-ru/ 'leave'). In order to limit the
investigation to words in common use in modern Japanese, each verb on the
list was checked in a medium-size Japanese-English dictionary (Hasegawa
et al. 1986). Every verb on the list that does not appear as a headword in
this dictionary was eliminated from further consideration. Also eliminated
was every verb that contains a medial voiced obstruent (e.g., /sage-ru/
'lower'). A compound containing such a second element is subject to a
well-known constraint on rendaku called Lyman's Law (Vance 1987: 136–139): rendaku almost never affects an initial obstruent of an element that
already contains a voiced obstruent. Consequently, it would not be appropriate to cite a verb+verb compound verb such as /hik-i+sage-ru/ 'pull
down' as support for the claim that compounds of this form resist rendaku.
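
The selection criteria for candidate second elements described above can be sketched as a simple filter. This is an illustrative sketch, not the procedure actually used: the function name `eligible_as_v2`, the segment inventories, and the optional `headwords` set standing in for the dictionary check are all assumptions.

```python
import unicodedata

# Assumed phonemic inventories for the purposes of the filter.
VOICELESS_OBSTRUENTS = set("ptkcsfh") | {"\u010d", "\u0161"}  # plus č, š
VOICED_OBSTRUENTS = set("bdgzj")


def _segments(verb: str) -> list:
    """Return the letters of a phonemic form, ignoring hyphens and '+'."""
    return [ch for ch in unicodedata.normalize("NFC", verb) if ch.isalpha()]


def eligible_as_v2(verb: str, headwords=None) -> bool:
    """Check whether a verb could be a rendaku-susceptible second element."""
    segs = _segments(verb)
    # Rendaku can only affect an initial voiceless obstruent.
    if not segs or segs[0] not in VOICELESS_OBSTRUENTS:
        return False
    # Lyman's Law: a voiced obstruent later in the element blocks rendaku.
    if any(seg in VOICED_OBSTRUENTS for seg in segs[1:]):
        return False
    # Optional check against a set of dictionary headwords.
    return headwords is None or verb in headwords
```

With these definitions, `eligible_as_v2("kak-u")` is true, while `"oboe-ru"`, `"nom-u"`, `"de-ru"` and `"sage-ru"` are all rejected, for the reasons given in the text.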
The next step in the data collection process was to find compounds of
the form V1+V2=V or V1+V2=N in which V2 is one of the verbs remaining
on the list described in the preceding paragraph. The original intent was to select every compound of the appropriate form that appears in either or
both of two reverse dictionaries (Y. Kitahara 1990 and Iwanami Shoten
Henshūbu 1992). However, this step in the process turned out to be tremendously time-consuming, and it was completed only for items in which V2 is a
consonant-stem verb.19 Each compound was then checked in an unabridged
dictionary produced by a major Japanese publisher (Matsumura 1988). To
avoid missing any relevant items, even if, for a particular V1 and a particular
V2, only one of V1+V2=V and V1+V2=N was found in the reverse dictionaries, both were checked in the unabridged dictionary. Those compounds
that appear in the unabridged dictionary were retained for further consideration.20
The next step in the process was to eliminate compounds that probably
should not be analyzed as containing two verbs in modern standard Japanese.
For some of these excluded items, the reason is simply that the entry in the
unabridged dictionary identifies them as obsolete or dialectal. In other cases,
the unabridged dictionary does not list V1 as an independent verb in the
modern language. In still other cases, a compound contains an etymological
verb form that seems to have lost its verb-form status. Examples of this last
type include /tatami+kae/ 'replacing mats', in which the etymological continuative /tatam-i/ seems to be functioning as a simple noun meaning 'mat'
with no connection to the verb /tatam-u/ 'fold'. Another example is /kumori+gači/ 'tending toward overcast', in which /gači/ is etymologically the rendaku allomorph of the continuative of /kac-u/ 'win'. In modern Japanese,
/gači/ is simply a derivational suffix that derives nouns from verbs and no
longer has any connection to the verb meaning 'win'.21
Next, coordinate compounds such as /yom-i+kak-i/ 'reading and writing'
were eliminated, since it is well known that rendaku generally does not
occur in coordinate compounds (Vance 1987: 144–145). It would not be appropriate to cite /yom-i+kak-i/ as evidence that verb+verb compound nouns
do not show rendaku.
The last step in the process was to search the remaining examples for
pairs in which a verb+verb compound verb (V1+V2=V) and a verb+verb
compound noun (V1+V2=N) both involve the same two verbs in the same
order. The final data set consists of every such pair (a total of 234 pairs)
and the pronunciation(s) given in the unabridged dictionary for each paired
The data set includes some pairs that exemplify the pattern described in
§3, including (10). Notice that the verb does not show rendaku while the
noun does.


(10) V1+V2=V [−rendaku]: /toor-i+kakar-u/ 'pass by'
     V1+V2=N [+rendaku]: /toor-i+gakar-i/ 'passing by'

But there are also pairs in which both the verb and the noun show rendaku
and other pairs in which neither shows rendaku. The examples in (11) illustrate.
(11) a. V1+V2=V [+rendaku]: /kaer-i+zak-u/ 'bloom again'
        V1+V2=N [+rendaku]: /kaer-i+zak-i/ 'second blooming'
     b. V1+V2=V [−rendaku]: /mi+toos-u/ 'foresee'
        V1+V2=N [−rendaku]: /mi+tooš-i/ 'prospect'


In some pairs, the pronunciation of one or both members has the mora obstruent /Q/ following the continuative form of V1, as in (12a), or in place of
the last syllable of the continuative form of V1, as in (12b).22
(12) a. V1+V2=V [−rendaku]: /hane+kaer-u/ 'rebound'
        V1+V2=N [−rendaku]~[mora obstruent]:
        /hane+kaer-i/~/hane-Q+kaer-i/ 'rebound'
     b. V1+V2=V [mora obstruent]: /yoQ+para-u/ 'get drunk'
        V1+V2=N [mora obstruent]: /yoQ+para-i/ 'drunken person'
The continuative form of /yo-u/ 'get drunk' (V1 in 12b) is /yo-i/. Since the
mora obstruent pre-empts rendaku (Vance 1987: 148), if the unabridged
dictionary gives a pronunciation with /Q/ as the only pronunciation of either
member of a pair, that pair was excluded from the statistics reported below.
In other pairs, the pronunciation of one or both members has the mora nasal
/N/ in place of the last syllable of the continuative form of V1, as in (13).
(13) V1+V2=V [mora nasal]: /fuN+gir-u/ 'take decisive action'
     V1+V2=N [mora nasal]: /fuN+gir-i/ 'taking decisive action'
The continuative form of /fum-u/ 'step' (V1 in these examples) is /fum-i/. Since the mora nasal seems to induce rendaku in compounds of this kind, if
the unabridged dictionary gives a pronunciation with /N/ as the only pronunciation of either member of a pair, that pair was excluded from the statistics
reported below.
The unabridged dictionary gives alternative pronunciations for several
of the words in the data set. The three examples in (14) illustrate.



(14) a. V1+V2=V [−rendaku]~[+rendaku]: /ši+kum-u/~/ši+gum-u/ 'set up'
     b. V1+V2=N [−rendaku]~[+rendaku]: /ne+kom-i/~/ne+gom-i/
        '(time of) sound sleep'
     c. V1+V2=V [−rendaku]~[mora obstruent]: /saš-i+hik-u/~/saQ+pik-u/
In the statistics reported below, if a verb+verb compound verb has one pronunciation with rendaku and another pronunciation without rendaku, the
pronunciation without rendaku was counted. For example, (14a) was counted
simply as not showing rendaku. On the other hand, if a verb+verb compound noun has one pronunciation with rendaku and another pronunciation
without rendaku, the pronunciation with rendaku was counted. Consequently,
(14b) was counted simply as showing rendaku. Treating verbs and nouns
differently in this way biases the statistics in favor of the putative pattern
described in §3, i.e., that rendaku does not occur in verb+verb compound
verbs but does occur in verb+verb compound nouns. Since the statistics
reported below will be used to deny the existence of this pattern, the deliberate bias in counting will strengthen the argument. As for items having
one pronunciation with /Q/ or /N/ and another pronunciation without, the
pronunciation without a mora consonant was counted. For example, (14c)
was counted simply as not showing rendaku.
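
The counting conventions described in this section can be sketched as follows. This is a minimal sketch under stated assumptions: each dictionary pronunciation is reduced to one of the hypothetical tags `"plain"` (no rendaku), `"rendaku"`, `"Q"` (mora obstruent) or `"N"` (mora nasal), and the function names are illustrative only.

```python
from typing import Optional, Set, Tuple


def counted_value(variants: Set[str], is_noun: bool) -> Optional[bool]:
    """Return True (rendaku), False (no rendaku) or None (item unusable)."""
    # Pronunciations with a mora consonant are set aside when an
    # alternative without one exists.
    usable = variants & {"rendaku", "plain"}
    if not usable:
        return None  # only /Q/ or /N/ pronunciations are listed
    if len(usable) == 2:
        # Deliberate bias: a variable noun counts as showing rendaku,
        # a variable verb counts as not showing it.
        return is_noun
    return "rendaku" in usable


def classify_pair(verb: Set[str], noun: Set[str]) -> Optional[Tuple[bool, bool]]:
    """Classify a verb/noun pair, or return None if the pair is excluded."""
    verb_value = counted_value(verb, is_noun=False)
    noun_value = counted_value(noun, is_noun=True)
    if verb_value is None or noun_value is None:
        return None
    return (verb_value, noun_value)
```

For example, a verb listed with both `"plain"` and `"rendaku"` pronunciations, as in (14a), is counted as `False`, whereas a noun with the same two variants, as in (14b), is counted as `True`.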
A few pairs in the data set were particularly problematic, including the
two examples shown in (15).
(15) a. V1+V2=V [−rendaku]: /mi+tor-u/ 'comprehend by looking at'
        V1+V2=N [−rendaku]: /mi+tor-i/ 'comprehending by looking at'
        V1+V2=N [+rendaku]: /mi+dor-i/ 'looking over and selecting'
     b. V1+V2=V [+rendaku]~[−rendaku]~[extra mora obstruent]:
        /de+bar-u/~/de+har-u/~/deQ+par-u/ 'protrude'
        V1+V2=N [+rendaku]~[−rendaku]~[extra mora obstruent]:
        /de+bar-i/~/de+har-i/~/deQ+par-i/ 'protruding; protrusion'
Corresponding to the verb in (15a), the unabridged dictionary has two separate noun entries, one with rendaku and another without, and the definitions
for these two entries are different. Although the noun without rendaku
matches the verb semantically, in keeping with the deliberate bias explained
just above, (15a) was counted as a verb not showing rendaku and a noun
showing rendaku. The unabridged dictionary gives three pronunciations each for the verb and noun in (15b). Maintaining the bias again, (15b) was
counted as a verb not showing rendaku and a noun showing rendaku.
The data set contains a total of 234 verb/noun pairs, and the table in (16)
shows how these pairs pattern in terms of rendaku.







As the lower right cell in (16) shows, in the great majority of the pairs
(202/234 = 86%), neither the verb nor the noun shows rendaku. In other
words, pairs like /mi+toos-u/ 'foresee' and /mi+tooš-i/ 'prospect' (11b) are
the norm. By comparison, despite the deliberately biased counting described above, only a small fraction of the pairs in the data set (22/234 = 9%) exhibit the behavior that Okumura (1955) suggests is typical. This
means that pairs like /toor-i+kakar-u/ 'pass by' and /toor-i+gakar-i/ 'passing by' (10) are actually quite unusual.
Needless to say, a data set consisting of entries in an unabridged dictionary certainly will not match the relevant portion of a representative native speaker's actual vocabulary. To get some idea of how serious this
shortcoming might be, a well-educated native speaker went through the 234
verb/noun pairs in the data set, discarded those that were unfamiliar to her,
and noted pronunciations (with or without rendaku) that differed from her
own.23 Applying the same counting bias as above to this revised data set,
the pairs in this speaker's vocabulary pattern as in (17).






The total number of pairs in this revised data set is 208, and their distribution
in the four cells of the table in (17) differs very little from the distribution of the pairs in (16). Here again, in most of the pairs (188/208 = 90%), neither
the verb nor the noun shows rendaku, and only a small fraction of the pairs
(13/208 = 6%) show rendaku in the noun but not in the verb. In short, the
revised data set suggests that simply relying on the dictionary entries does
not lead us astray. Consequently, no attempt was made to go beyond dictionary entries for the counts reported below in §5.
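
The comparison between the dictionary-based data set and the native speaker's revised data set can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the running text; the dictionary keys and the helper names `percent` and `summarize` are hypothetical, and the two remaining cells of each table are omitted because their values are not stated in the prose.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Counts quoted in the text for the two data sets.
dictionary_pairs = {"total": 234, "neither": 202, "noun_only": 22}
speaker_pairs = {"total": 208, "neither": 188, "noun_only": 13}


def summarize(data: dict) -> dict:
    """Express each quoted cell as a percentage of the pair total."""
    return {
        cell: percent(count, data["total"])
        for cell, count in data.items()
        if cell != "total"
    }
```

Here `"neither"` stands for pairs in which neither member shows rendaku, and `"noun_only"` for pairs with rendaku in the noun but not in the verb.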

5. Compounds involving adjectives

As Kikuda (1971) notes, adjectival elements in compounds actually pattern
very differently from verbal elements. Compounds of four additional types
will be considered in this section: adjective+adjective compound adjectives
(A+A=A), verb+adjective compound adjectives (V+A=A), adjective+verb
compound verbs (A+V=V), and adjective+verb compound nouns
(A+V=N). Although some adjective stems can be used as nouns, as noted
in §3, compound nouns ending with an adjectival element (A+A=N or
V+A=N) are very rare and will not be considered further.
A set of relevant compounds involving adjectival elements was collected
by following a procedure parallel to the one described in §4 for verb+verb
compounds. In this case, of course, the first step was to make a list of all
the non-compound adjectives (rather than verbs) beginning with a voiceless
obstruent that appear as headwords in the medium-size Japanese-English
dictionary (Hasegawa et al. 1986). The result of the collection procedure was
a data set consisting of compounds that appear in the unabridged dictionary
(Matsumura 1988). The verbal elements in items of the form A+V=V or
A+V=N were restricted to consonant-stem verbs, since these items were
collected in tandem with verb+verb compounds. As explained above in §4,
this part of the process was so time-consuming that it was completed only
for examples ending in a consonant-stem verbal element.
The number of adjective+adjective compound adjectives in the data set
is small, but almost half (8/18 = 44%) show rendaku, as in (18).
(18) A+A=A [+rendaku]: /usu+gura-i/
     Cf. /usu-i/ 'thin', /kura-i/


Rendaku also appears in nearly all verb+adjective compound adjectives (17/20 = 85%) and in all adjective+verb compound verbs (7/7 = 100%), as
in (19).
(19) V+A=A [+rendaku]: /utaga-i+buka-i/ 'suspicious'
     Cf. /utaga-u/ 'doubt', /fuka-i/ 'deep'
     A+V=V [+rendaku]: /naga+bik-u/ 'be prolonged'
     Cf. /naga-i/ 'long', /hik-u/ 'pull'
Most adjective+verb compound nouns in the data set also show rendaku
(43/47 = 91%), as in (20). Just like the second element in a verb+verb compound noun, the verbal element in an adjective+verb compound noun appears in its continuative form.
(20) A+V=N [+rendaku]: /waka+gaer-i/
     Cf. /waka-i/ 'young', /kaer-u/


The table in (21) summarizes the data collected for the six categories of
two-element compounds in which both elements are verbal or adjectival.
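(21)  Type     +rendaku   −rendaku   Total   +rendaku %
      V+V=V        16        716      732         2
      V+V=N       211        258      469        45
      A+A=A         8         10       18        44
      V+A=A        17          3       20        85
      A+V=V         7          0        7       100
      A+V=N        43          4       47        91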













Unlike the table in (16), the table in (21) includes unpaired items in the two
V+V categories. Of the 732 (16+716) V+V=V items tabulated in (21), only
234 are those tabulated in (16). The remaining 498 are V+V=V compounds
for which no corresponding V+V=N compound is listed as a headword in
the unabridged dictionary. For example, the verb /okur-i+kaes-u/ 'send back'
(cf. /okur-u/ 'send' and /kaes-u/ 'return') is listed, but there is no entry for a
corresponding noun (which would be either /okur-i+kaeš-i/ or /okur-i+gaeš-i/).
Similarly, of the 469 (211+258) V+V=N items tabulated in (21), 235 are
V+V=N compounds for which no corresponding V+V=V compound is
listed as a headword in the unabridged dictionary. For example, the noun
/oboe+gak-i/ 'memo' (cf. /oboe-ru/ 'remember' and /kak-u/ 'write') is listed,
but there is no entry for a corresponding verb (which would be either
/oboe+kak-u/ or /oboe+gak-u/).
Including such unpaired items makes it clear that V+V=N compounds are
much more likely to show rendaku than V+V=V compounds. Nonetheless,
this difference is just a strong statistical tendency, not an inviolable principle.
Furthermore, the fact that compounds containing adjectival elements are so likely to show rendaku means that only verbal elements exhibit this tendency. It is not a generalization that applies to all inflected-word elements.
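
The figures reported in this section can be cross-checked with a short calculation. This is a sketch under one stated assumption: in the totals 732 (16+716) and 469 (211+258), the first addend is taken to be the number of items with rendaku, which the text implies for V+V=V but does not state outright for V+V=N. The names `COUNTS` and `rendaku_rate` are hypothetical.

```python
# (items with rendaku, total items) for each compound type.
COUNTS = {
    "V+V=V": (16, 732),    # 16 + 716
    "V+V=N": (211, 469),   # 211 + 258; order of addends is an assumption
    "A+A=A": (8, 18),
    "V+A=A": (17, 20),
    "A+V=V": (7, 7),
    "A+V=N": (43, 47),
}


def rendaku_rate(category: str) -> int:
    """Return the share of items showing rendaku, as a rounded percentage."""
    with_rendaku, total = COUNTS[category]
    return round(100 * with_rendaku / total)


# Paired versus unpaired V+V items, as described in the text.
unpaired_verbs = COUNTS["V+V=V"][1] - 234   # verbs with no listed noun
paired_nouns = COUNTS["V+V=N"][1] - 235     # nouns that do have a listed verb
```

On these counts the V+V=V rate is about 2% against roughly 45% for V+V=N, which is the statistical tendency the text describes, while every category with an adjectival element sits at 44% or above.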

6. Conclusion
As shown in §4, only a small minority of paired V+V=N and V+V=V items
show rendaku in the noun but not in the verb (as in /toor-i+gakar-i/ 'passing
by' and /toor-i+kakar-u/ 'pass by'). On the other hand, when unpaired items
are taken into consideration (as in §5), it is clear that V+V=N compounds
are much more likely to show rendaku than V+V=V compounds. Nonetheless, this difference is just a strong statistical tendency, not an inviolable
principle. Furthermore, the fact that compounds containing adjectival elements are so likely to show rendaku (as demonstrated in §5) means that
only verbal elements exhibit this tendency. It is not a generalization that
applies to all inflected-word elements. Incidentally, if rendaku originated as
described in §2 above, the behavior of adjectival elements is a mystery,
since there is no reason to suppose that the two elements in a compound
containing an adjectival element were linked by a syllable of the form NV
at some time in the past.

1. Many linguists prefer to analyze [h], [ç], and [ɸ] as allophones of a single phoneme except in borrowings. I am assuming a uniform phonemic inventory for
all vocabulary strata and a split that has resulted in a contrast between [ɸ] and
[h]/[ç] (with [h] appearing before /e/, /a/, or /o/ and [ç] appearing before /i/ or
/y/). Either way, the rendaku alternation is not simply a matter of voicing. The
ancestor of modern /h/ and /f/ was pronounced [p], although there is some controversy about how long the [p] pronunciation persisted in the central dialects
(Kiyose 1985). See also Ohno (this volume, §4.2).
2. In Vance (1987: 24), I said that the two allophones of /z/, [dz] and [z], are
distributed as follows: [dz] word-initially or immediately following the mora
nasal /N/ and [z] elsewhere. The actual distribution is certainly not this clean,
but there is no contrast, and the two are unquestionably allophones of a single
phoneme. The modern rendaku pairing of /z/ with /c/ and /s/ reflects the historical merger of a voiced affricate and a voiced fricative, and so does the pairing of /j/ with /č/ and /š/.

3. Old Japanese also had a corresponding series of phonemes realized as voiceless obstruents, two phonemes realized as nasals, and two phonemes realized
as semivowels. The entire Old Japanese consonant inventory is typically transcribed phonemically as /p t s k b d z g m n y w/. For a recent attempt at phonetic
reconstruction of the entire Old Japanese phonological system, see Miyake
4. The etymologies in (2) are reasonably secure and are given in Nihon Daijiten
Kankōkai 1972–76, although Miller (1967: 213–14) is dubious about this etymology for /fude/. The earliest attestations for the shortened forms range from
ca. 900 for /ikaga/ to ca. 1000 for /fude/. Although the earliest attestation for
/sumi+suri/ is from the tenth century, the other two long forms are attested
from the eighth century.
5. The hyphen in /sur-i/ separates what are commonly analyzed as a verb stem and
an inflectional ending. See the discussion below in §3 for details.
6. The mora nasal /N/, which occurs syllable-finally in modern Japanese, is a
later development (Hamada 1955; Vance 1987: 56–57).
7. All Old Japanese examples are marked with a superscript OJ. The transcription
conventions follow Miller's (1986: 198) slightly modified version of the system
first adopted by Mathias (1973) and endorsed by Martin (1987: 50). The transcription reflects the fact that many modern standard syllables with one of the
vowels /i e o/ correspond to two distinct eighth-century syllables. (For details,
see Lange 1973 and Shibatani 1990: 125–139.) For each such eighth-century
pair, it is standard practice to label one syllable type A (kō-rui) and the other
type B (otsu-rui), following Hashimoto (1917: 173–186). Some researchers construe the phonological differences between the type-A and type-B syllables as
vowel-quality distinctions; others construe them as distinctions between syllables
with and without a glide: CV vs. CGV. In any case, the transcription adopted
here represents type-A syllables with a circumflex over the vowel /î ê ô/, type-B syllables with a diaeresis over the vowel /ï ë ö/, and syllables for which there
was no A/B distinction with no diacritic /i e o/. A capitalized vowel /I E O/ indicates a syllable for which there was an A/B distinction but for which the
category is unknown. The source for all Old Japanese forms is the Jōdaigo Jiten Henshū Iinkai 1967, the definitive dictionary of Old Japanese. Hypothetical pre-Old Japanese forms are marked with a superscript POJ.
8. On the fundamental irregularity of rendaku, see Vance 1987: 146–148, Ohno
2000, and several of the papers in this volume. It is curious, to say the least,
that these irregularities have not been leveled out over the course of the last
millennium, but they have not. This is not to say that the situation has been
static. Many individual vocabulary items that used to have rendaku no longer
do and vice versa. But these changes do not seem to have any discernible direction. To give just one set of examples, Hepburn's 1867 dictionary lists the
verb /ki+kae-ru/ 'change clothes' and the corresponding noun /ki+gae/ 'changing clothes', and it also lists the verb /nor-i+kae-ru/ 'change horses' and the
corresponding noun /nor-i+gae/ 'changing horses'. The descendants of these
items for most modern Tokyo speakers are /ki+gae-ru/ (a gain for rendaku),
/ki+gae/ (no change), /nor-i+kae-ru/ (no change), and /nor-i+kae/ (a loss for
rendaku).
On the other hand, there are puzzling examples of rendaku in phrasal items of
this form, including OJ/ama+n+gapa/ 'Milky Way' (cf. OJ/kapa/ 'river'), with
genitive OJ/n/, and OJ/ma+tu+g/ 'eyelash' (cf. OJ/k/ 'hair'), with genitive OJ/tu/
(which has not survived into modern Japanese).
As explained in note 7, forms marked with a superscript POJ are hypothetical
pre-Old Japanese. The genitive POJ/n/ in POJ/sasa n pa/ in (4c) is not attested;
it is merely an inference. Interestingly, the form now in use in modern Japanese is /sasa no ha/, not /sasaba/. The other two items in (4) are also obsolete. I
have nothing illuminating to say about why certain composite items in the pre-Old Japanese vocabulary contained a genitive particle while others did not. I
assume the situation was much the same as it is in modern Japanese. For example, I have no explanation to offer for why the notion 'toe' is expressed by
the phrase /ai no yubi/ 'foot's digit' whereas the notion 'ankle' is expressed
by the compound /ai+kubi/ 'foot-neck'.
The term 'continuative' is Kuno's (1973: 195). The traditional term in Japanese
grammar is ren'yōkei 'adverbial form', and Bloch (1946: 6) calls it the infinitive.
The two classes are called godan-katsuyō-dōshi 'five-row inflection verbs' and
ichidan-katsuyō-dōshi 'one-row inflection verbs'. For details, see Vance (1987:
178–184).
This rather clumsy characterization is necessary because of verbs such as /ka-u/
'buy', which has a consonant-final allomorph only before /a/, as in the negative
/kaw-ana-i/. The Old Japanese citation form of this verb was OJ/kap-u/, and the
modern forms reflect a well-known sequence of historical changes. The standard
account is that, in word-medial position, [p] > [w], and then [w] > ∅ except before /a/. I use Bloch's (1946) morphological segmentations as a convenience,
not as an endorsement of the analysis behind them.
Many linguists prefer not to analyze [s] and [] as contrastive, treating [] before /i/ as a realization of /s/ and [] before any other vowel as a realization
of /sy/.
Parallelism with consonant-stem verbs would dictate a zero morph marking
the continuative of a vowel-stem verb (as in /tabe+∅/), but I will not clutter the
transcriptions in this paper with zero morphs. For an argument that the very
notion of a zero morph is incoherent, see Matthews (1974: 117).
The distinctive part of the pitch-accent pattern on a Japanese word is a fall
from high pitch to low pitch. I mark the location of a fall with a downward-pointing arrow (↓). Some words are unaccented, i.e., contain no pitch fall, and
no arrow appears in the transcription of an unaccented word. Standard references on Japanese accent include McCawley (1977), Haraguchi (1977), and
Pierrehumbert and Beckman (1988). Aside from these examples in (5), I have
not bothered to mark accent in this paper, since accent does not figure in the
Sakurai (1966: 41) makes a similar claim about compounds of inflected word
plus inflected word, but he qualifies it by saying that if the first element is used
as a noun, sequential voicing can occur. However, since the first element must
appear in its stem form, it is not clear how to determine whether it is being
used as a noun (Vance 1987: 143).
The other class of inflected words in Japanese contains only a single member:
the copula (Bloch 1946: 21–24). We will not consider it here, since even if it
occurred in forms that could be construed as compounds, its citation form /da/
and most of its other forms begin with the voiced obstruent /d/, making rendaku
inapplicable (or vacuous).
I have no reason to think that examples containing vowel-stem verbal elements
would significantly change the overall picture that emerges. I could be wrong,
of course.
A compound could appear either as a headword itself or as a subentry under its
first element.
To be more precise, the continuative of a verb followed by /gai/ is either an
adjectival noun (keiyōdōshi) or what Martin (1975: 179) calls a 'precopular
noun'. See Martin (1975: 418–419) for discussion and examples.
For details on the mora obstruent in compounds like (12b), see Vance (2002).
I am grateful to my research assistant, Mieko Kawai, for her painstaking work.

Ranking paradoxes in consonant voicing in Japanese
Haruka Fukazawa and Mafuyu Kitahara

1. Introduction
Quite often phonological phenomena are found only in a certain vocabulary
class but not in others within a single language. However, the basic tenet of
OT, a single invariant ranking, seems incompatible with multiple vocabulary classes that show inconsistent phonological phenomena. Recent OT
analyses have developed useful notions to approach this problem.
First, multiple sub-lexica are defined when their phonological properties
are distinct enough. For instance, Japanese has at least four phonological
sub-lexica: Yamato, Sino-Japanese, Mimetics, and Foreign (Itô and
Mester 1995, 1999; Fukazawa et al. 1998).
Second, those sub-lexica are organized in a core-periphery structure (Itô
and Mester 1995). Generally (and historically), the native vocabulary tends
to form the core while non-native vocabularies tend to form the periphery. A constraint-based implementation of the core-periphery structure is
to assume that the more native the sub-lexicon is, the more markedness
constraints it obeys. So, for example, in the most native sub-lexicon Z,
constraints *F, *G, and *H are all respected. In the least native sub-lexicon
X, however, only the constraint *F is satisfied. Figure 1 shows these relations
in a set of concentric ellipses.

[Figure: concentric ellipses with the region labels "*H respected", "*G respected", and "*F respected"]

Figure 1. A schematic diagram of the core-periphery structure in a constraint-based system. X, Y, and Z are sub-lexica and *F, *G, *H are markedness constraints.

Faithfulness constraints must be ranked between any two markedness constraints for this core-periphery structure to work. For example, in the sub-lexicon X, *F is respected but *G and *H are violated. What allows the
latter two to be violated is a faithfulness constraint specific to the sub-lexicon X that outranks them. By the same token, the sub-lexicon Y has its own
faithfulness constraint, under which *F and *G are respected but *H is violated.1
The overall ranking for this example is shown in (1).

(1) *F >> Faith-X >> *G >> Faith-Y >> *H >> Faith-Z
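
The way ranking (1) yields the nested pattern can be checked with a few lines of Python. This is only an illustrative sketch: the list `RANKING` transcribes (1), while the function name `respected_markedness` and the convention that markedness constraints begin with `*` are assumptions made for the example, not part of the original analysis.

```python
# Ranking (1) from the text, highest-ranked constraint first.
RANKING = ["*F", "Faith-X", "*G", "Faith-Y", "*H", "Faith-Z"]


def respected_markedness(sublexicon, ranking=RANKING):
    """Return the markedness constraints that dominate Faith-<sublexicon>.

    A markedness constraint is respected in a sub-lexicon only if it
    outranks that sub-lexicon's faithfulness constraint.
    """
    cutoff = ranking.index(f"Faith-{sublexicon}")
    return [c for c in ranking[:cutoff] if c.startswith("*")]


for name in ("X", "Y", "Z"):
    print(name, respected_markedness(name))
```

Each sub-lexicon respects exactly the markedness constraints ranked above its own faithfulness constraint, so X comes out as the periphery and Z as the core.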

In this system, the ranking of markedness constraints is what determines the
order of posited sub-lexica. In other words, the ranking of markedness constraints must be consistent throughout the phonology of the language. Any
evidence of a ranking paradox in the markedness hierarchy therefore
undermines the sub-lexicon analysis. However, we have found three apparent
ranking paradoxes involving consonant voicing phenomena in Japanese. The
markedness hierarchy for consonant voicing in Japanese established in
Itô and Mester (2001) cannot account for the data, which were first brought
up in Tateishi (2001) and (2002).
In this paper, we re-examine the constraint ranking regarding consonant voicing in Japanese. Through the analysis in the following sections,
our goal is to make three theoretical claims. First, building on Fukazawa,
Kitahara, and Ota (2002), we argue that no etymology-motivated sub-lexica exist in the phonological grammar of Japanese.2 We will
introduce a new system of sub-lexica which is based solely on grammatical/
morphological information. The gist of our claim is that the Japanese phonological lexicon is classified not into native, non-native, etc. but into
marked and unmarked groups with respect to a particular markedness constraint.
Second, these groups are defined by relativized faithfulness constraints. The trigger for relativizing the set of faithfulness constraints is again
grammatical/morphological information. Historical/etymological categories
cannot serve as evidence for relativizing the faithfulness constraints. We will
propose that the relativization is triggered by the stem/affix distinction in the
case of consonant voicing in Japanese.
Third, we address a question: can markedness constraints be relativized
as well as faithfulness constraints? In Fukazawa and Kitahara (2001), we
argued that only the faithfulness constraints can be relativized. This
point is further reinforced by the analysis in the present paper.

The organization of this paper is as follows. In section 2, we will show that
the constraint ranking proposed by Itô and Mester (2001) leads to three
cases of ranking paradoxes against the data presented in Tateishi (2001).
Section 3 will then give our solution to these cases. In section 4, we will
discuss theoretical implications of the present analysis in OT, such as sub-lexicalization of the lexicon without etymological knowledge, removing domains
from markedness constraints, and relativization of faithfulness constraints.


2. Data and issue

2.1. Consonant voicing and Japanese phonological lexicon

As reviewed in the introduction, recent OT analyses of grammars with
phonological sub-lexica assume that a core-periphery structure arises
through the interaction of faithfulness constraints and markedness constraints. A partial constraint ranking relevant for consonant voicing in
Japanese has been proposed as in (2).

(2) Ranking for consonant voicing (adapted from Itô and Mester 2001)


In this ranking, *VoiObs2stem, EXPRESSAFFIX, *NT, and *VoiObs are markedness constraints whose definitions are given in (3).

(3) Definition of relevant markedness constraints

a. *VoiObs2stem: no double obstruent voicing in a stem. (Lyman's Law)
b. EXPRESSAFFIX: affixes must be realized in the output.
c. *NT: no voiceless obstruent after a nasal.
d. *VoiObs: no voiced obstruents.

These markedness constraints are respected most fully in the Yamato sub-lexicon
and not at all in the Foreign sub-lexicon, a pattern enforced by the intervening
faithfulness constraints. The ranking in (2) further shows that there are two
sub-lexica between Yamato and Foreign: the Sino-Japanese (SJ) sub-lexicon
and the Common-Sino-Japanese (CSJ) sub-lexicon. The only difference between them is whether Rendaku occurs or not. Itô and Mester (2001) posit
EXPRESSAFFIX as the regulating markedness constraint for Rendaku since,
in their treatment, Rendaku is the insertion of a [voice] feature as an affix in
compounding (see Itô and Mester 1986, 1998).3
Because Rendaku is typically characteristic of Yamato
words, CSJ words count as more nativized than SJ words. In a constraint-based view, this is represented by ranking EXPRESSAFFIX
between the faithfulness constraint for CSJ and that for SJ. Let us
look at just one example from Itô and Mester's analysis in which all the markedness constraints that appear in (3) are relevant.

(4) Tableau for [oyako geNka] 'parent-child quarrel' in Itô and Mester (2001)

Constraint columns, left to right: *VoiObs2stem, EXPRESSAFFIX, ID[voice]CSJ, *NT, *VoiObs

☞ a. oyako-geNka
   b. oyako-keNka
   c. oyako-keNga
   d. oyako-geNga





The word [oyako-geNka] is a Yamato-CSJ compound, so only
the IDENT[voice]CSJ constraint monitors voicing changes in /keNka/. The
violation of that constraint by Rendaku in candidate (a) is not fatal since
other markedness constraints are ranked higher. Candidates without Rendaku (b and c) are penalized by the EXPRESSAFFIX constraint. Candidate (d)
has both Rendaku and a voiced obstruent after [N], and it is ruled out by
the highest ranked *VoiObs2stem. Note that the stem domain specified for
this self-conjoined constraint is crucial in the analysis. The constraint only
sees two voiced obstruents within a single stem, as in [geNga]: another voiced obstruent in the
first stem does not matter, as in, for instance, [mizu-geNka] 'fight for water'.
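
The evaluation just described can be sketched in Python. The constraint functions below are simplified stand-ins written for this illustration, not Itô and Mester's formal definitions: candidates are plain romanized strings for the second stem only (the text notes that the first stem is irrelevant to the stem-bounded constraint), EXPRESSAFFIX is approximated as "the stem-initial obstruent is voiced", and IDENT[voice] simply counts changed letters.

```python
VOICED = set("bdgzj")      # voiced obstruents in this romanization (assumption)
VOICELESS = set("ptksh")   # voiceless obstruents (assumption)


def voi_obs2_stem(inp, out):
    """*VoiObs2stem: violated once if the stem has two or more voiced obstruents."""
    return 1 if sum(ch in VOICED for ch in out) >= 2 else 0


def express_affix(inp, out):
    """EXPRESSAFFIX, approximated: Rendaku voicing must surface stem-initially."""
    return 0 if out[0] in VOICED else 1


def ident_voice(inp, out):
    """IDENT[voice]: one violation per segment whose voicing differs from the input."""
    return sum(a != b for a, b in zip(inp, out))


def no_nt(inp, out):
    """*NT: one violation per voiceless obstruent directly after the mora nasal N."""
    return sum(1 for a, b in zip(out, out[1:]) if a == "N" and b in VOICELESS)


def voi_obs(inp, out):
    """*VoiObs: one violation per voiced obstruent."""
    return sum(ch in VOICED for ch in out)


# Column order of tableau (4), highest-ranked first.
RANKING = [voi_obs2_stem, express_affix, ident_voice, no_nt, voi_obs]


def profile(inp, out):
    return tuple(constraint(inp, out) for constraint in RANKING)


def winner(inp, candidates):
    # Lexicographic comparison of violation profiles models strict domination.
    return min(candidates, key=lambda out: profile(inp, out))


candidates = ["geNka", "keNka", "keNga", "geNga"]
for cand in candidates:
    print(cand, profile("keNka", cand))
print("winner:", winner("keNka", candidates))
```

Comparing violation tuples lexicographically is a convenient way to model strict domination: a single violation of a higher-ranked constraint outweighs any number of lower-ranked ones.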


2.2. Ranking paradoxes

2.2.1. Paradox between markedness and faithfulness

Tateishi (2001) provides data on loanwords from English plural forms
which pose problems for the ranking in (2). These words are team names from
Major League Baseball and the National Hockey League in the US, which have
become popular quite recently in Japan. The data in (6) are added to show
that words in other areas are also relevant.
Tateishi points out that English plural forms are not simply borrowed into
Japanese as they are in English but are altered so as to fit Japanese phonology. In English, the voicing of the plural morpheme -s depends on
that of the last segment of the stem. However, the voicing contrast in plural
forms does not necessarily follow the English pattern when they are
taken into Japanese.

(5) Data from Tateishi (2001)

[howaito sokkusu] 'White Socks'
[reddo uingusu] 'Red Wings'

(6) Additional data



Words in (5a–c) show that loanwords copy the voicing value of the
plural morpheme in the original. However, those in (5d–e) and (6c–d) suggest that the situation is not that simple. The pronunciation of the plural
morpheme of those words in English is always [z] since the stem ends in a
voiced segment. However, the corresponding Japanese loanwords have [-su]. It
is obvious that this is not a simple final devoicing phenomenon because of
the existence of [-zu] forms in (5b) and (5c). The pattern here seems to be that (i)
plural -s is voiced after a nasal as in (5b), (5c) and (6b), (ii) plural -s is
voiceless when the stem contains at least one voiced obstruent as in (5d),
(5e), (6c) and (6d), (iii) otherwise, plural -s copies the voicing of the
original pronunciation as in (5a) and (6a). Thus, the data suggest that there
is a phonological alternation of some sort.
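
The three-way generalization in (i) to (iii) can be written as a small decision procedure. The sketch below is only a descriptive restatement of the pattern, not an analysis: the function name `plural_suffix`, the romanization with `N` for the mora nasal, and the set of letters treated as voiced obstruents are all assumptions made for this illustration. Clause (i) is checked before clause (ii) because a form like [iNdiaN-zu] contains a voiced obstruent and still takes [-zu].

```python
VOICED_OBSTRUENTS = set("bdgzj")  # assumed romanization of voiced obstruents


def plural_suffix(stem, english_voiced):
    """Predict [-zu] or [-su] for a borrowed English plural.

    stem           -- romanized Japanese stem, with N for the mora nasal
    english_voiced -- True if the English plural is pronounced [z]
    """
    if stem.endswith("N"):                            # (i) voiced after a nasal
        return "zu"
    if any(ch in VOICED_OBSTRUENTS for ch in stem):   # (ii) voiced obstruent in stem
        return "su"
    return "zu" if english_voiced else "su"           # (iii) copy the English voicing


print(plural_suffix("iNdiaN", True))         # Indians
print(plural_suffix("kabu", True))           # Cubs
print(plural_suffix("reddo uingu", True))    # Red Wings
print(plural_suffix("howaito sokku", False)) # White Socks
```

The first two calls use forms discussed in the text, [iNdiaN-zu] and [kabu-su]; the last two use the team names listed under (5).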
Tateishi points out that the relevant markedness constraints for this phenomenon are *NT and *VoiObs2stem.4 In Japanese, especially in the native
vocabulary, these two markedness constraints are considered to be high-ranked due to the phenomena of post-nasal voicing (PNV) and Lyman's
Law, respectively. There is no voicing contrast after nasals in Yamato since
voiceless obstruents are not allowed in that environment. A morpheme does
not include more than one voiced obstruent in Yamato (Lyman's Law).
These phenomena lead to the following constraint ranking.

(7) *NT, *VoiObs2stem >> IDENT[voice]-Yamato

In contrast to Yamato, a voicing contrast after a nasal is observed and a morpheme can contain more than one voiced obstruent in the Foreign sub-lexicon, which leads to the following constraint ranking.

(8) IDENT[voice]-Foreign >> *NT, *VoiObs2stem

It is evident that Tateishi's data in (5) contradict the ranking for Foreign
words in (8) although those words are undoubtedly Foreign. It is true that
there are some assimilated-foreign words5 in the Japanese lexicon, such as
[karuta] 'card' (borrowed from Portuguese carta in the 16th century). Assimilated-foreign words are phonologically quite close to Yamato words. For
example, [karuta] shows Rendaku in the compound [iroha garuta] 'cards of
the Japanese syllabary' (see Takayama, this volume, for similar examples).
However, we cannot say that the words in (5) are well assimilated into Japanese because they are not widely used and are not familiar to people other
than sports fans.
Looking closely, because -s must be voiced after a nasal as
in (5b–c), *NT needs to be ranked higher than the faithfulness constraint
for the Foreign sub-lexicon. Thus, the data suggest that we must either abandon
the membership of the words in (5) in the Foreign sub-lexicon or admit the
paradoxical ranking in (9).

(9) *NT >> IDENT[voice]-Foreign

Also, *VoiObs2stem must be ranked higher than the faithfulness constraint
for Foreign to account for the data in (5e–g), which is shown in (10).

(10) Tableau for Cubs → /kabusu/

a. kabu-zu
b. kabu-su

These are the first two problematic cases for the current OT analyses of
multiple sub-lexica in Japanese. The ranking paradoxes here occur between
a markedness constraint and a faithfulness constraint. The next subsection
introduces a more serious case where the ranking paradox arises between
two markedness constraints.

2.2.2. Paradox within markedness

As we have seen in (2), Itô and Mester (2001) propose a ranking where
*VoiObs2stem is ranked higher than *NT. However, we need the reversed
ranking, *NT >> *VoiObs2stem, to account for the data in (5b) and (6b). In
those words, the English plural morpheme -s is pronounced as [zu] after a
nasal although there is a voiced obstruent in the stem.
Tableau (11) shows that the reversed ranking is justified by the data.
Candidate (a) has two voiced obstruents, violating the *VoiObs2stem constraint. However, it wins over candidate (b), which has a voiceless obstruent
after a nasal. To get the correct output, the *NT constraint must be ranked
higher than the *VoiObs2stem constraint.

(11) Tableau for Indians → /iNdiaNzu/

☞ a. iNdiaN-zu
   b. iNdiaN-su




2.2.3. Summary of ranking paradoxes
We have seen three paradoxical cases for the ranking in (2) from the data in
Tateishi (2001). To account for Tateishi's data, we need the rankings in (12).

(12) Partial rankings conforming to Tateishi's data
a. *NT >> IDENT[voice]-Foreign           for (5b) [iNdiaN-zu]
b. *VoiObs2stem >> IDENT[voice]-Foreign  for (5d) [kabu-su]
c. *NT >> *VoiObs2stem                   for (5b) [iNdiaN-zu]

On the other hand, Itô and Mester (2001) proposed the ranking in (2) to
account for consonant voicing in Japanese. The relevant partial rankings
are summarized in (13).

(13) Partial rankings proposed by Itô and Mester (2001)
a. IDENT[voice]-Foreign >> *NT           for e.g. [furaNsu] 'France'
b. IDENT[voice]-Foreign >> *VoiObs2stem  for e.g. [gyagu] 'gag'
c. *VoiObs2stem >> *NT                   for (4) [oyako-geNka]

Itô and Mester's rankings thus contradict the rankings required by the
new data introduced by Tateishi (2001). In the following section, we will
give a solution to these ranking paradoxes.
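
The contradiction between (12) and (13) can be made explicit by treating each partial ranking as an ordered pair (higher, lower) and looking for pairs that appear in opposite orders. The sketch below is a bookkeeping aid only; the names `TATEISHI`, `ITO_MESTER`, and `paradoxes` are introduced for this illustration.

```python
# Each pair (A, B) stands for the partial ranking A >> B.
TATEISHI = {                                   # rankings in (12)
    ("*NT", "IDENT[voice]-Foreign"),
    ("*VoiObs2stem", "IDENT[voice]-Foreign"),
    ("*NT", "*VoiObs2stem"),
}
ITO_MESTER = {                                 # rankings in (13)
    ("IDENT[voice]-Foreign", "*NT"),
    ("IDENT[voice]-Foreign", "*VoiObs2stem"),
    ("*VoiObs2stem", "*NT"),
}


def paradoxes(required, proposed):
    """Return every pair A >> B in `required` whose reverse B >> A is in `proposed`."""
    return sorted((hi, lo) for hi, lo in required if (lo, hi) in proposed)


for hi, lo in paradoxes(TATEISHI, ITO_MESTER):
    print(f"{hi} >> {lo}  conflicts with  {lo} >> {hi}")
```

All three rankings in (12) turn out to be reversed in (13): two conflicts pit a markedness constraint against IDENT[voice]-Foreign, and one holds between the two markedness constraints themselves.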

3. Solution
Two of the three paradoxes in the previous section stem essentially
from conflating etymological knowledge with phonological knowledge.
We want to put IDENT[voice]-Foreign higher than *NT because we know etymologically that 'Indians' is a foreign word in Japanese. Meanwhile, we
cannot identify 'Indians' as a foreign word on phonological grounds because the obstruent after the nasal is voiced.
Fukazawa, Kitahara and Ota (2002) show the necessity of reconsidering
etymology-based Japanese sub-lexica. The previous literature has claimed
that sub-lexica are phonologically motivated and that the etymology-oriented labelling of sub-lexica is just a convention (Itô and Mester 1995; Fukazawa
et al. 1998). Subscript numbers or letters are often used instead. However,
just replacing labels with anonymous numbers or letters does not guarantee
the independence of phonology from etymology. Fukazawa et al. (2002)
proposed a concrete alternative in which phonological sub-lexica are totally
independent of etymological information. In lieu of etymology-based categorization, lexical items are classified into a marked or an unmarked group
with respect to a particular markedness constraint. In other words, there are
neither items with [+Yamato] diacritics nor faithfulness constraints labelled
Yamato that are sensitive to such diacritics.
(14) Markedness-driven system (Fukazawa, Kitahara and Ota 2002)
IDENT[voice]-X → Marked (both NT and ND possible)
                 [no alternations: PNV contrastive]
IDENT[voice]-Y → Unmarked (only ND possible)
                 [alternations in all morphemes: PNV redundant]
In the schematic partial ranking in (14), the upper IDENT constraint designates the marked sub-lexicon where there is no alternation in voicing after a
nasal. That is, both voiced and voiceless obstruents are possible after a nasal
for words in this sub-lexicon. When a word has a voicing alternation in post
nasal position, such as verb roots and the past tense /ta/, it belongs to the
unmarked sub-lexicon designated by the lower IDENT constraint. *NT is the
determining constraint in this case.
But what can X and Y in (14) be? Our proposal in the present paper is
that general morpho-phonological domains and categories, such as stem,
affix, and word, can replace those letters. This is not a new trick of any sort
but a quite standard approach to the relativization of faithfulness. Morpho-phonologically natural domains are basic to Correspondence Theory
(McCarthy and Prince 1995), where base, reduplicant and the like are specified
as the domains of faithfulness constraints. By contrast, we argue against
any relativization and domain specification of markedness constraints. This
is the approach advocated in Fukazawa and Kitahara (2001), where we tried
to eliminate the domain specification of the Obligatory Contour Principle
(OCP) constraint. The *VoiObs2stem constraint is the equivalent of the OCP in
the present analysis for the case of Rendaku. As we have seen in (3), the
domain stem is the crucial part of its definition. However, there is no
restriction on which domain can be specified for a self-conjoined markedness
constraint.6 Introducing arbitrary domains for self-conjunction leads to the
relativization of markedness if copies of the same markedness constraint with different
domains are placed in a single ranking. Therefore, we will use a plain *VoiObs2
without the stem domain in the present analysis.

With those considerations in mind, let us analyze the data in (5) and (6).
We assume that -zu, transferred from the English plural morpheme -s, belongs
to the unmarked sub-lexicon in Japanese because Japanese speakers are
tacitly aware of the voicing alternation of that morpheme.7 We assume that only
morphological alternation is the driving factor for the split of a faithfulness
constraint. IDENT[voice]stem and IDENT[voice]affix are thus introduced, and
they are ranked in that order. With *VoiObs2 ranked at the top, these
split IDENT constraints produce the correct output [kabu-su], as shown in
tableau (15).8 The ranking essentially says: avoid two voiced obstruents,
but a voicing change in the stem is worse than one in the affix, so candidate (b) wins over candidate (c).
(15) Tableau for Cubs → /kabusu/

   a. kabu-zu
☞ b. kabu-su
   c. kapu-zu

As in the schematic ranking in (14), *NT is ranked below IDENT[voice]stem,
which is evident from a case without an affix, such as /furaNsu/ 'France', as in (16).

(16) Tableau for France → /furaNsu/

   a. furaNzu
☞ b. furaNsu
Thus, we have established a partial ranking: *VoiObs2 >> IDENT[voice]stem
>> *NT >> IDENT[voice]affix. However, this ranking cannot account for
[iNdiaN-zu] in (5b). The highest ranked *VoiObs2 kills the desired output
(a), leaving candidate (b), without voicing change in the stem, as the selected output.

(17) Tableau for Indians → /iNdiaNzu/

Constraint columns, left to right: *VoiObs2, IDENT[voice]stem, *NT

a. iNdiaN-zu
b. iNdiaN-su
c. iNtiaN-zu
d. iNtiaN-su
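
The behaviour of the partial ranking *VoiObs2 >> IDENT[voice]stem >> *NT >> IDENT[voice]affix on the three words discussed so far can be reproduced with a short Python sketch. It is a simplification made for illustration: candidates are (stem, affix) pairs of romanized strings, every voiced obstruent is assumed to carry its own [voice] feature (no fusion), and the constraint functions are stand-ins named for this example rather than formal definitions from the paper.

```python
VOICED = set("bdgzj")      # voiced obstruents in this romanization (assumption)
VOICELESS = set("ptksh")   # voiceless obstruents (assumption)


def voi_obs2(inp, out):
    """Plain *VoiObs2: violated once if the whole word has two or more voiced obstruents."""
    word = out[0] + out[1]
    return 1 if sum(ch in VOICED for ch in word) >= 2 else 0


def ident_stem(inp, out):
    """IDENT[voice]stem: voicing changes inside the stem."""
    return sum(a != b for a, b in zip(inp[0], out[0]))


def no_nt(inp, out):
    """*NT: voiceless obstruents directly after the mora nasal N."""
    word = out[0] + out[1]
    return sum(1 for a, b in zip(word, word[1:]) if a == "N" and b in VOICELESS)


def ident_affix(inp, out):
    """IDENT[voice]affix: voicing changes inside the affix."""
    return sum(a != b for a, b in zip(inp[1], out[1]))


PARTIAL_RANKING = [voi_obs2, ident_stem, no_nt, ident_affix]


def winner(inp, candidates):
    # Lexicographic comparison of violation profiles models strict domination.
    return min(candidates, key=lambda out: tuple(c(inp, out) for c in PARTIAL_RANKING))


cubs = winner(("kabu", "zu"), [("kabu", "zu"), ("kabu", "su"), ("kapu", "zu")])
france = winner(("furaNsu", ""), [("furaNzu", ""), ("furaNsu", "")])
indians = winner(
    ("iNdiaN", "zu"),
    [("iNdiaN", "zu"), ("iNdiaN", "su"), ("iNtiaN", "zu"), ("iNtiaN", "su")],
)
print(cubs, france, indians)
```

Under these assumptions the sketch selects [kabu-su] and [furaNsu], as in (15) and (16), but it also selects [iNdiaN-su] rather than the attested [iNdiaN-zu], which is the failure shown in (17).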



The problem here is that the highest ranked *VoiObs2 constraint immediately kills the desired output because there are apparently two [voice] features
for obstruents in /iNdiaN-zu/. However, is this really a problem? The answer
is no, since a single [voice] feature can be linked to two segments. In
other words, a fusion of two [voice] features is a viable representation for
/iNdiaN-zu/. In Fukazawa and Kitahara (2001), we proposed that UNIFORMITY[F] can be relativized to a morpheme to regulate the fusion of features as
a repair strategy for the OCP violation. In the present paper, we will explore
more candidates with featural fusion for /iNdiaN-zu/ and other examples.
In addition to IDENT[F] and UNIFORMITY[F], relativized MAX[F] will be
necessary in the present analysis. For the sake of brevity, the definitions are
all given in (18) and the proposed overall ranking of the relevant constraints is
shown in (19).
(18) Definition of relativized faithfulness constraints relevant for the present
analysis (Abbreviations in parentheses)
a. IDENT[voice]stem (IDst): the correspondent segments in a stem in the
input and the output have identical values for the feature [voice].
b. IDENT[voice]affix (IDaff): the correspondent segments in an affix in
the input and the output have identical values for the feature [voice].
c. UNIFORMITY[voice]stem (UNIst): no feature [voice] in a stem in the
output has multiple correspondents in the input (i.e., no coalescence
regarding the feature [voice] in a stem).
d. UNIFORMITY[voice]word (UNIwd): no feature [voice] in a word in
the output has multiple correspondents in the input (i.e., no coalescence regarding the feature [voice] in a word).
e. MAX[voice]stem (MAXst): every feature [voice] linked to a segment
in a stem in the input has a correspondent in the output.
f. MAX[voice]affix (MAXaff): every feature [voice] linked to a segment
in an affix in the input has a correspondent in the output.


(19) Proposed ranking for consonant voicing in Japanese (Abbreviations
in parentheses)
MAX[voice]stem (MAXst)
UNIFORMITY[voice]stem (UNIst)
ExpressAffix (EXPAFF)
IDENT[voice]stem (IDst)
UNIFORMITY[voice]word (UNIwd)
IDENT[voice]affix (IDaff)
MAX[voice]affix (MAXaff)
With this new ranking, not only the problem in (17) but also all the ranking
paradoxes mentioned in the previous section are solved.9 First, as shown in
(20), the fused candidate (a) is selected in spite of its violation of UNIwd.10

(20) Tableau for Indians → /iNdiaNzu/
[Tableau: surviving constraint columns are *VoiObs2, IDst, *NT, UNIwd, IDaff; candidates are shown with their [voi] feature linkage.]

In (20), candidate (b) has two [voice] features resulting in a violation of the
highest ranked constraint *VoiObs2. The violation of MAX[voice]stem penalizes candidates (d) and (e). In those candidates, the feature [voice] attached
to the segment [d] in the input is lost in the output: [iNtiaN-zu/su]. Devoicing in the stem is worse than that in the affix because MAX[voice]stem is
ranked far higher than MAX[voice]affix. In the optimal candidate (a), two
[voice] features are fused into one. Therefore, it does not violate *VoiObs2.
Coalescence of the features in the word violates the faithfulness constraint,
the low-ranked UNIFORMITY[voice]word, but does not violate
UNIFORMITY[voice]stem since one of the [voice] features belongs to the stem
but the other belongs to the affix. That is, the coalescence takes place not
within a stem but within a word. Candidate (a) wins over candidate (c)
since *NT outranks UNIFORMITY[voice]word.
(21) Tableau for oyako-geNka
[Tableau: constraint columns are *VoiObs2, MAXst, UNIst, IDst, *NT, UNIwd, IDaff; candidates are shown with their [voi] feature linkage.]

Now, let us reanalyze example (4) from Itô and Mester (2001). In (21),
candidate (b) loses due to the violation of *VoiObs2. Candidate (a) in which
two [voice] features are fused within a stem violates UNIFORMITY[voice]stem .
Candidates (d) and (e) lose since Rendaku does not take place, resulting in
the violation of EXPRESSAFFIX. Consequently, candidate (c) in which Rendaku
takes place becomes optimal. It violates both IDENT[voice]stem and *NT, but
neither violation is more serious than those in other candidates.

(22) Tableau for bibs → /bibu-su/
[Tableau: constraint columns are *VoiObs2, MAXst, UNIst, IDst, *NT, UNIwd, IDaff; the candidates are segmentally [bibu-su] or [bibu-zu] and differ in their [voi] feature linkage.]
In (22), both candidates (a) and (c) have the same segmental structure
[bibu-su], but their featural structures are different. Candidate (c) has two
independent [voice] features, violating the highest ranked constraint
*VoiObs2. By contrast, candidate (a) violates UNIFORMITY[voice]stem,
since two features are fused within a stem. Similarly, we can consider three
different featural structures for [bibu-zu], as shown in candidates (b), (f),
and (g). All of them lose because they violate the highest ranked
constraint *VoiObs2 regardless of their featural structures. Candidate (e) violates MAX[voice]stem because the [voice] feature in the stem in the input loses
its correspondent in the output.
By contrast, the loss of the [voice] feature in candidate (a) occurs in the
affix, resulting in a violation of low-ranked IDENT[voice]affix. Consequently,
candidate (a) becomes optimal.

Ranking paradoxes in consonant voicing in Japanese 119

(23) Tableau for 'Cubs' /kabu-su/
[Tableau not recoverable from the source. Surviving constraint columns: *VoiObs2, IDst, *NT, UNIwd, IDaff.]

In (23), candidate (c) loses due to its violation of *VoiObs2. Candidate (d)
loses since devoicing takes place in the stem, resulting in the violation of
high-ranked MAX[voice]stem. The violation of UNIFORMITY[voice]word in
candidate (a) is more serious than that of IDENT[voice]affix in candidate (b)
although both of them are relatively low-ranked. Two [voice] features are
fused not in the stem but in the word in candidate (a). Devoicing in the affix makes candidate (b) violate IDENT[voice]affix. However, (b) becomes
optimal since other candidates commit more serious violations. 11

4. Conclusion
We have seen paradoxical cases for the previously proposed system of multiple phonological sub-lexica in Japanese. Our proposal to resolve the paradoxes is simple: relativize faithfulness constraints with standard morphophonological categories. The patterns brought up in Tateishi (2001) in (5)
and the additional data of our own in (6) are all accounted for in our analysis.
The analysis so far brings up some theoretical implications. First, as we
claimed earlier in the introduction, etymological information should
not be mixed up with phonological information in setting up sub-lexica.
This position is reinforced by a simple consideration about language acquisition. Children have no a priori knowledge that a certain item belongs
to a particular sub-lexicon. At the early stage of acquisition, the grammar,
vocabulary, and the structure of the lexicon are all acquired through phonological input.
Second, we have argued elsewhere that relativization of a markedness
constraint is not a viable idea (Fukazawa and Kitahara 2001). That is the
background reason why we eliminate the stem domain from the self-conjoined *VoiObs2 constraint. If we allow arbitrary domain specification
in a markedness constraint and allow the same markedness constraint with
different domain specifications to co-exist in a single ranking, the result will
be a relativization of markedness.
Finally, we would like to point out that relativization of faithfulness is a
fairly standard and well-motivated idea in recent OT studies. What domain-relative faithfulness constraints represent is, we believe, that different domains have different phonological tightness. For example, a stem is less
vulnerable to modification than an affix is, since it is more tightly woven.

We thank the editors of this volume, Jeroen van de Weijer, Kensuke Nanjo
and Tetsuo Nishihara for giving us an opportunity to contribute our paper.
We also thank Shigeto Kawahara, Linda Lombardi, Mits Ota, and Koichi
Tateishi for helpful inputs. Of course, all errors are our own.

1. Sub-lexicon specific faithfulness constraints are derived from a general faithfulness constraint, which is called relativization, or split of faithfulness (Fukazawa 1999).
2. Of course, etymology was not part of the phonological grammar in the previous analyses either. It has been repeatedly pointed out that historical origins of
morphemes do not necessarily coincide with the identification of lexical subclasses (Itô and Mester 1999; Tateishi 2003). However, what blurs the picture is
native speakers' intuition about lexical classification. As Takayama (this volume) argues, we do have an intuition about lexical classes and it may interact
with the phonological grammar. Our point is that the former is theoretically
distinct from the latter.

3. Rendaku is known for many exceptions and there have been a number of analyses
dealing with its irregular nature. Articles in this volume cover most (if not all)
aspects of irregularity in Rendaku. In accordance with Kubozono (this volume),
we admit Rendaku has a hybrid nature: some words are lexicalized but there
also exists a productive synchronic process. Ohno (this volume) and Takayama
(this volume) focus on the lexicalized part of Rendaku and argue against productivity. Meanwhile, Haraguchi (2002) and Rice (this volume) try to capture
generalizations in the productive part. Following the thread of research on
Rendaku in the generative literature, we assume productivity in Rendaku
while being aware of its limitations.
4. Tateishi suggests that a faithfulness constraint called NEIGHBORHOOD[voice]
might be relevant here. It essentially bans a change in a non-derived environment. His argument is rather directed to the free ranking of faithfulness constraints among multiple sub-lexica, such as Yamato and Foreign. We will not
go into this issue because our solution in Section 3 does not require those labelled sub-lexica anymore.
5. This terminology is adopted from Itô and Mester (1995).
6. Self-conjunction is an extension of Local Conjunction (Smolensky 1994, 1995,
1997) where the domain is given a priori.
7. Japanese speakers are certainly aware of the fact that English plural -s is a
suffix. Suzuki (1990) found that -s is often dropped in loanwords (e.g., [on
za rokku] 'on the rocks') due to the low functional load of plurality in Japanese
grammar. As for the unmarked status of -zu, the sole motivation for this assumption is that it obeys the post-nasal voicing effect, which is commonly seen
in native suffixes, such as the -ta/-da alternation in [kaN-da] 'bite-Past' and
[kai-ta] 'write-Past'. Tateishi (2003) argues that plural -zu is indeed a Yamato
8. The input form /-zu/ is postulated for the plural suffix following Tateishi
(2001, 2002, 2003). If we take /-z/ as an input instead, constraints on licit
syllable structure in Japanese force the final /u/ to be epenthesized. This follows the notion of the Richness of the Base (Prince and Smolensky 1993).
9. MAX[voice]affix is skipped in the following tableaux because it plays no role in
selecting the winning candidate.
10. For a detailed ranking argument for UNIFORMITY[voice] and markedness constraints, see Fukazawa and Kitahara (2001).
11. We have analyzed other data such as [gyagu] 'gag', a usual Rendaku word like
[tabi-bito] 'travelers', and a usual Lyman's Law case like [kita-kaze] 'north winds'
with the proposed ranking, and have found that they are all accounted for
without any paradox. We do not put the analyses in this paper due to space

The implicational distribution of prenasalized stops in Japanese
Noriko Yamane-Tanaka

This paper gives an account of the regular affinity among voicing, nasality and
place of articulation, focusing on intervocalic stop consonants in Japanese.
There is agreement that voiced obstruents appeared only intervocalically and
were prenasalized [Ng, ndz, nd, mb] in prehistoric and Old Japanese (Unger
1977; Vance 1983), and that these prenasalized stops (henceforth, PNS) are
still retained only in some of the Tohoku dialects in northern Japan.1 It has
been suggested that the phonemic inventory of the Tohoku system is similar
to that of Old Japanese (F. Inoue 2000), in which only prenasalization
played a distinctive role for the intervocalic obstruents (Hamano 2000;
M. Takayama 2002; among others).
The loss of PNS spans several centuries and is still in progress cross-dialectally, but the variations are not random. It is well known, in the National Language Study in Japan, that the existence of [mb] implies that of
[nd] but not vice versa, and that [nd] implies [N] but not vice versa (Hashimoto
1932; Hirayama et al. 1992; Kamei et al. 1997; Kindaichi 1941; Oohashi
2002; M. Takayama 2002; T. Takayama 1993; Uwano 1989; Yanagita 1930;
among others). In the framework of Optimality Theory (henceforth OT;
Prince & Smolensky 1993; McCarthy & Prince 1995), the [N]~[g] alternation
in current Tokyo Japanese has been taken up (McCarthy & Prince 1995; Itô
& Mester 1997; Hibiya 1999; among others). However, there has been little discussion
of the related alternation at other places of articulation, its
dialectal variance, and its relation to the historical shift of the voice contrast.
This paper will try to shed light on these issues, adopting the FAITH reranking model developed by Itô & Mester (1995ab, 1997, 1998, 1999ab,
2000, 2003). Focusing on the interface between synchronic variation and
diachronic change emphasized by Anttila & Cho (1998), Cho (1998), and others (references cited in Yamane & Tanaka 2002), we will show that the
minimal demotion of FAITH leads to the gradual loss of PNS. It will also be
discussed that the shift of the distinctive role from prenasal to voice
may follow from the grammatical force.

1. Prenasalized stops in Japanese dialects

1.1. Tohoku dialects

Tohoku dialects, which are spoken in the northeast area of Japan (cf. Appendix 1; No. 27, and North of No. 15), have a unique phonological character,
namely what is called a 'synchronic chain shift' in intervocalic stop consonants.2
In their systems, especially in Yamato lexical items, i) /k/ becomes /g/, and
/g/ becomes /N/, ii) /t/ becomes /d/, and /d/ becomes /nd/, and iii) /b/ becomes /mb/, in a non-neutralizing way.3 Thus, unlike in other dialects, PNS appear in the surface forms.4 It should be
noted, however, that PNS tend to be replaced by plain voiced stops, due to
the ongoing loss of the synchronic chain shift in the younger generation.
This section describes how PNS are realized in a system which retains
PNS. For expository convenience, the corresponding lexical output of Tokyo
Japanese will be used as the underlying representation (cf. Itô &
Mester 1997: 430, fn. 14). The prenasalization facts are given below:5 (Examples are from Aomori dialects.)

(1) Intervocalic (Pre-)Nasalization
    (i) g → N
        [N] / V _ V
        [g] / elsewhere
        a. kagami   b. kagi   c. uguisu 'bush warbler'   d. toge   e. kago
    (ii) d → nd 6
        [nd] / V _ V
        [d] / elsewhere
        a. hada   b. ude   c. mado
    (iii) b → mb
        [mb] / V _ V
        [b] / elsewhere
        a. saba   b. ebi   c. abura   d. kabe

PNS appear only in intervocalic position. Besides word-initial position, the
elsewhere contexts are (a) V1. _ V2, where V1 = long; (b) C1V1. _ V2, where
C1 = [−voice], V1 = [+high], V2 = [−high]; (c) V1. _ V2. C3V3, where V2 =
[+high], C3 = [−voice], V3 = [−high]; and (d) after a nasal (see note 5).
Unlike [mb] and [nd], [N] is a simple nasal stop. This idiosyncrasy of the
velar nasal may be interpreted as a gap-filling action of phonemic systems:
[N] resulted from [Ng] when [m] and [n] already existed. However, from
the viewpoint of OT, the early loss of [Ng] would be expected. Not only is a prenasalized stop structurally complex, but velar place is also the most marked
among the three places of articulation (henceforth, PoA).7 This can be characterized as a 'banning the worst of the worst' effect (Prince & Smolensky
1993: 180).
The direction of the shift of [Ng] to [N] can also be considered articulatorily natural. The difference between the two is in the relative timing
of a gesture of velic opening and a gesture of oral closure (Maddieson &
Ladefoged 1993: 255). This is illustrated below.

(2) a. Plain nasal
       velic aperture
       oral constriction |-------------|
    b. Prenasalized stop: shortening of velar lowering gesture
       velic aperture
       oral constriction |-------------|
    c. Prenasalized stop: lengthening of oral closing gesture
       velic aperture
       oral constriction |----------------------|

In a simple nasal, the oral and velic gestures are closely
coordinated in their relative timing, as in (2a), but in a prenasalized stop, the nasal passage has to
be closed before the oral articulation is released. According to this view, a
prenasalized stop seems articulatorily unstable. There are several different
ways of producing PNS: one way is to shorten the duration of the velic
lowering gesture, as illustrated in (2b), and the other is to extend the duration of the oral closure, as in (2c). Furthermore, the total duration of the
prenasalized stop is longer in case (2c), which would be consistent with
the idea that 'the durations of complex segments are greater than the duration of simpler ones' (Maddieson & Ladefoged 1993: 255). However, it
seems that the Japanese system would choose option (2b) rather than (2c);
otherwise the contrast between moraic nasals (e.g., kaN N go 'nursing') and
nonmoraic nasals (e.g., kaN go 'basket') would be hard to maintain. It
follows that in Japanese PNS, the gestural shift of the velum (i.e., lowering and raising) is forced into a rather short span. Under this condition, [Ng]
in Japanese does not seem to be as clearly articulated (and probably perceived) as [mb] and [nd], since the velum has to be raised before the constriction at the velum is released. This conflict could be resolved by the parallel
overlap of the oral and velic gestures, as in (2a).8
The shift of [Ng] to [N] is also supported by the geographical distribution. Among the variations, [N] is attested most widely in Tohoku dialects.
The distribution is summarized based on the previous reports and linguistic
atlas, as shown below (Oohashi 2002: 225).

(3) a. [N]: All Tohoku areas except below
    b. [ )g] (= [Ng]): Midsouth in Akita, Midsouth of the inland in Yamagata,
       north of the lower part in Niigata, a part of Fukushima
    c. [g]: Coastal area of Sanriku in Iwate, Midwest in Fukushima, periphery of Murakami city in Niigata

Furthermore, Oohashi's phonetic experiments indicate that only [N] was
observed even in some dialects in (3b). This fact may also suggest that [Ng]
has already been replaced by [N].10
In contrast to intervocalic positions, word-initial plain voiced stops appear as they are. Examples are as follows.

(4) a. [g]: [gakko] 'school', [geda] 'clogs', [go] 'five'
    b. [d]: [daigu] 'carpenter', [degiru] 'be able to', [dogu] 'poison'
    c. [b]: [baSa] 'carriage', [biwa] 'Japanese lute', [budo] 'grape',
       [boro] 'rag'

PNS never appear word-initially, which might erroneously lead one to assume that PNS are mere positional variants. However, voiced stops also appear
intervocalically, and in this sense the two are not in complementary distribution. Furthermore, the intervocalic voiced stops are synchronically
derived from voiceless stops, for instance saka → saga 'slope', kaki →
kagi 'persimmon', doko → dogo 'place', geta → geda 'clogs', mato →
mado 'target', and kata → kada 'shoulder'. As a result, minimal pairs are
easily found, as follows.

(5) Minimal pairs
    (i) /g/ vs. /N/ (k → g, g → N):
        ageru 'open' vs. aNeru 'raise', kagi 'oyster' vs. kaNi 'key'
    (ii) /d/ vs. /nd/ (t → d, d → nd):
        mado 'target' vs. mando 'window', hada 'flag' vs. handa 'skin'

This fact suggests that the contrast between voiced stops and PNS is phonemic rather than allophonic.

1.2. Cross-dialectal variation

Although Tohoku dialects could generally prenasalize all voiced stops b, d,
g, this pattern does not simply extend to other dialects. Cross-dialectal
comparison leads us to divide prenasalization systems into four distinct
types A, B, C and D. Let us call Tohoku system A, which prenasalizes all
three voiced stops. If we incorporate the details in (3a, b), system A may be
divided into A1 and A2.9 System B prenasalizes only d and g, system C
(pre)nasalizes only g, and system D prenasalizes none of them. Thus, as far
as PNS and nasals are concerned, the output inventory of each system
would look like this.


(6) Intervocalic PNS and nasals
    System   Inventory
    A1       {Ng, mb, nd; (N,) m, n}
    A2       {mb, nd; N, m, n}
    B        {nd; N, m, n}
    C        {N, m, n}
    D        {m, n}

The regional dialects described as A through D are regarded as synchronic
variation, in that all systems above are attested as regional dialects in Japan.
Regional information is below.

(7) Regional information
    A: Aomori, parts of Iwate, Miyagi, Akita, parts of Yamagata, parts of Fukushima, parts of Niigata, parts of Mie,
       parts of Ehime, parts of Nagasaki, parts of Kagoshima
    B: Parts of Nara, parts of Wakayama, Kochi
    C: Parts of Ibaragi, parts of Tochigi, parts of Chiba, Tokyo,
       Kanagawa, Toyama, Ishikawa, Fukui, Yamanashi, Nagano, Gifu, Shizuoka, parts of Shiga, parts of Kyoto,
       Osaka, parts of Hyogo, parts of Tottori, parts of Okayama, parts of Okinawa
    D: The rest of the regions above

The linguistic atlas (cf. Appendix 2) shows the geographical distribution of
intervocalic PNS in Japan. Regions marked with black shading are those that
have the voiced PNS. Map [1] represents the areas with (pre)nasalized
g, map [2] the areas with prenasalized d, and map [3] those with prenasalized b.
Notice that the marked areas shrink in the order [1] > [2] > [3].
This means that [N] survives in the most extensive areas, including Kinki,
Kanto, Shikoku and Tohoku, [nd] is seen in limited regions of Kinki,
Shikoku and Tohoku, and [mb] is found only in some regions of Tohoku. More
importantly, areas [1]–[3] are not distributed at random, but are in an inclusion relation: area [3] can be included in area [2], and area [2] can be included in area [1] (i.e., [3] ⊂ [2] ⊂ [1]). This suggests that the existence of
[mb] implies that of [nd], the existence of [nd] implies [N], and similarly [mb]
implies [N]. In this paper I refer to this kind of regularity as the 'implicational relation in the geographical continuum'.
The next section will show that such a regularity is also observed in the
chronological continuum.

1.3. Historical change

The assumption that remnants of the past survive in remote regions (Yanagita 1930) is widespread. This is known as hoogen shuuken ron or
'theory of peripheral distribution of dialectal forms'.10 Yanagita found that i) the
innovative forms are seen in the areas of Kyoto or Nara, ex-capitals of Japan, while the older forms are seen outside those areas, and ii) the new forms
are diffused in gradual succession, just like ripples made
by a stone thrown into a pond.
This assumption will be supported if the forms in peripheral areas existed in the past. As far as PNS are concerned, there is agreement that [Ng,
ndz, nd, mb] existed intervocalically in prehistoric and Old Japanese (Unger
1977; Vance 1983).13 More importantly, PNS were lost gradually, as is
attested by historical materials in the central and other dialects. According to Hashimoto (1932), the loss of [mb] took place in the late Muromachi
era, and the loss of [nd] took place in Modern Japanese.14
It is not known when [Ng] was lost, and there is no clear phonetic evidence
to prove its existence in OJ or even in current Tohoku dialects. Although it
would be reasonable to assume that the loss of [Ng] occurred prior to the loss
of [mb], as for the period of the loss I can only speculate that it was in OJ.
The scenario would be that [Ng] was replaced by [N] in OJ, and that [N] is in turn
being replaced by [g]. In fact, the loss of [N] is a well-known ongoing
change in present-day Japanese. Kindaichi (1941) reports that [N] started to
be lost and replaced by [g] among the younger generation in Tokyo.
Summarizing, the loss of PNS affected velar, labial, and coronal in this
order, and further affected [N]. This is shown in the rightmost column of the
table below.

(8) Loss of PNS
    System   Intervocalic PNS and Nasal   Period                      Major Change
    A        {mb, nd; N, m, n}            Old Japanese (OJ)           Loss of Ng
    B        {nd; N, m, n}                Middle Japanese (MJ)        Loss of mb
    C        {N, m, n}                    Modern Japanese (ModJ)      Loss of nd
    D        {m, n}                       Present day Japanese (PJ)   Loss of N

Following the generalization above, we could specify OJ as system A, MJ
as system B, ModJ as system C, and PJ as system D. Each system historically turns up in the order A → B → C → D. I will refer to this kind of regularity as the 'implicational relation in the chronological continuum'.
Notice that the historically reconstructed contextual phonemic systems
A–D all match those in the cross-dialectal distribution A–D which I described in the previous section. That is, the implicational relation in the
chronological continuum and that in the geographical continuum are perfectly matched. It is clear that this regularity would strengthen Yanagita's
assumption that the current geographical variation mirrors a series of grammars that existed at past historical stages.
In the next section, I will show that such a parallelism between synchrony and diachrony is not just a matter of coincidence, but would be expected under an OT analysis.11 I will propose that the demotion of
a Faithfulness constraint (henceforth, F) is responsible.


2. OT analysis

2.1. Subset structure and harmonic completeness

The parallelism between the chronological and geographical continuums
suggests that it is unlikely that prenasalization patterns deviate from either
of these systems; for example, no system can be found whose inventory is
{Ng, mb; N, m, n} or {mb; N, m, n}, or {nd; m, n} and so on. This is because
PNS are realized in accordance with a certain markedness scale of place of
articulation.
Now let us suppose five constraints relevant to this phenomenon: the Markedness constraints in (9a-d), and the Faithfulness constraint in (9e).

(9) a. *Ng: Ng is prohibited in the output.
    b. *mb: mb is prohibited in the output.
    c. *nd: nd is prohibited in the output.
    d. *N: N is prohibited in the output.
    e. MAX(NAS): Velic aperture of the input has an identical correspondent in the output.
       (No deletion of velic aperture)

MAX(NAS) requires that every PNS of the input be realized in the
output, so the change of a prenasalized stop to a simple voiced stop in the output would incur a violation of this constraint (this will be explained in
(19ii) and (20b)). The random permutation of these constraints could generate 120 possible dominance hierarchies. However, as far as the PNS of OJ
through present-day Japanese are concerned, the hierarchies should be limited to only 4. This can be achieved by adopting the following two
hypotheses.

(10) a. Fixed markedness ranking hypothesis
        (Prince & Smolensky 1993: ch. 9)
     b. Faith-reranking hypothesis
        (Itô & Mester 1995ab, 1999, 2002)
The details will be described in the following.
First, there are reasons to set the fixed markedness hierarchy relevant
here as *mb >> *nd >> *N. The original version of OT (Prince & Smolensky
1993: ch. 9) claims that possible segmental inventories can be captured
with the fixed ranking of universal Markedness constraints (henceforth, M).
Crucial to PNS in Japanese are the rankings pertaining to two kinds of harmonic scales. One is the PoA markedness scale, COR ≻ LAB, i.e., coronal is
more harmonic than labial, which leads to the constraint hierarchy *Lab
>> *Cor. There has been little agreement as to the ranking between *Lab
and *Dor.16 But based on the assumption that *Lab and *Dor may be equivalent but that the ranking is determined by language-specific choice (Rice
2003: 418), we can posit *Dor >> *Lab so that it fits the facts of
Japanese PNS. Thus the overall ranking among the three M concerning
PoA would be *Dor >> *Lab >> *Cor. The other harmonic scale is SIMPLEX
≻ COMPLEX, i.e., a simple segment is more harmonic than a complex one,
which would roughly lead to *Complex >> *Simplex. Since PNS are structurally complex, the constraints against PNS would have to be ranked above
the constraint against simple nasals, as below. (The constraint on the top is
most dominant among those in the same box.)

(11) [box 1]  *Ng
              *mb
              *nd
     [box 2]  *N
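
As a rough check on the counts mentioned above, the following sketch enumerates every ordering of the five constraints in (9) and keeps those that respect the fixed markedness order *Ng >> *mb >> *nd >> *N. The names `MARKEDNESS`, `FAITH`, `respects_fixed_markedness` and `inventory` are illustrative only and are not taken from the source; "N" stands for the velar nasal, as in the text. Under these assumptions the fixed order alone leaves five hierarchies, and the further requirement that *Ng dominates MAX(NAS), which is an assumption read off the early loss of [Ng], leaves the four that correspond to systems A through D.

```python
from itertools import permutations

# Constraint names follow (9); "N" stands for the velar nasal, as in the text.
MARKEDNESS = ["*Ng", "*mb", "*nd", "*N"]  # assumed fixed order, most dominant first
FAITH = "MAX(NAS)"


def respects_fixed_markedness(ranking):
    """True if the markedness constraints appear in the fixed order."""
    return [c for c in ranking if c != FAITH] == MARKEDNESS


def inventory(ranking):
    """Segments that survive: those whose ban is ranked below MAX(NAS)."""
    faith_position = ranking.index(FAITH)
    return [c.lstrip("*") for c in ranking[faith_position + 1:]]


all_rankings = list(permutations(MARKEDNESS + [FAITH]))
admissible = [r for r in all_rankings if respects_fixed_markedness(r)]
# Assumption: in systems A-D, *Ng already dominates MAX(NAS), since [Ng] is lost.
systems_a_to_d = [r for r in admissible if r.index("*Ng") < r.index(FAITH)]

if __name__ == "__main__":
    print(len(all_rankings), len(admissible), len(systems_a_to_d))
    for ranking in systems_a_to_d:
        print(" >> ".join(ranking), "->", inventory(ranking))
```

The fifth admissible hierarchy, with MAX(NAS) above *Ng, would retain [Ng] as well, which is comparable to the A1 inventory in (6).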



Once M hierarchy is fixed in this way, the rerankable constraint would necessarily be only F. This assumption allows us to capture the so-called harmonic completeness.
The definition of harmonic completeness is given below.
(12) Harmonic completeness (Prince & Smolensky 1993; Prince 1998)
Let S be a system and α, β elements that are markedness-wise comparable, with α ≻ β. Then, if S contains β, it must also contain α: (β ∈ S & α ≻ β) → α ∈ S

The output inventory of Japanese PNS is harmonically complete in this
sense.
(13) a. Harmonically complete systems: {mb, nd, N}, {nd, N}, {N}, { }
     b. Harmonically incomplete systems: {mb, nd}, {mb, N}, {mb}, {nd}
We already observed the regularity of harmonically complete systems, in
terms of the implicational relation in the geographical and the chronological
continuum. None of the harmonically incomplete systems was found in any
of the continuums.
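
The classification in (13) can be reproduced mechanically. The sketch below, offered only as an illustration, assumes the markedness scale N ≻ nd ≻ mb (least marked first) and tests every subset of {mb, nd, N} against the definition in (12). The names `SCALE`, `is_harmonically_complete` and `all_systems` are hypothetical helpers, not part of the original analysis.

```python
from itertools import combinations

# Assumed markedness scale, least marked first: N is more harmonic than nd,
# which is more harmonic than mb. "N" stands for the velar nasal.
SCALE = ["N", "nd", "mb"]


def is_harmonically_complete(system):
    """A system is complete if containing a segment implies containing
    every less marked segment on the scale."""
    for index, segment in enumerate(SCALE):
        if segment in system and not all(s in system for s in SCALE[:index]):
            return False
    return True


def all_systems():
    """Every subset of the scale, from the empty system to the full one."""
    for size in range(len(SCALE) + 1):
        for subset in combinations(SCALE, size):
            yield frozenset(subset)


complete = [s for s in all_systems() if is_harmonically_complete(s)]
incomplete = [s for s in all_systems() if not is_harmonically_complete(s)]

if __name__ == "__main__":
    print("complete:  ", [sorted(s) for s in complete])
    print("incomplete:", [sorted(s) for s in incomplete])
```

Of the eight logically possible systems, four pass the test and four fail it, in line with (13a) and (13b) respectively.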
Harmonically complete patterns are very common crosslinguistically,
and the F reranking model would appropriately capture many hierarchical inclusions between areas of constraint activity in the phonological lexicon.
Given the fixed M ranking M1 >> M2 >> M3 with rerankable F, it would
follow that items observing M1 may also observe M2, but not vice versa;
items observing M2 may also observe M3, but not vice versa. Given the
hierarchy *Ng >> *mb >> *nd >> *N for Japanese segment inventories, the
hierarchical inclusion can also be expressed visually with a constraint domain map (Itô & Mester 1995ab), as below. The allowable segments are
shown in { }.
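(14) a. [Figure: constraint domain map with nested domains *Ng, *mb, *nd, *N (outermost to innermost). Systems and allowable segments: A {mb, nd, N}; B {nd, N}; C {N}; D { }.]
     b. [Figure: direction of shift, from more marked to less marked.]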
As we go toward the periphery, more structures are allowed, while as we go
inward, the allowable structures are minimized due to the obligation to
observe more constraints. Based on the observation in the preceding
sections, we could state that system A falls outside of the circle of any domain; system B is inside the domain of *mb; system C is inside the domain
of *nd, and system D is in the innermost circle of *N.
From a historical perspective, Japanese starts from the outer circle allowing the full range of instantiations of PNS, and as time goes by, the system goes inward, eliminating marked segments gradually. This seems to be
intuitively right, in the sense that the historical shift proceeds toward the
less marked structure. Such a direction could be expressed as an inward
shift or a shift from implicans to implicatum, as shown in (14b).17
Before representing how variation between grammars is accounted for,
let us clarify some differences between the constraint domain map developed by Itô and Mester and the one here. Itô and Mester's map is about
lexicon-internal variation within a grammar of one speaker of current Tokyo
Japanese. What I am addressing is variation between grammars in the PNS
inventory of speakers across both time and space. But in both models, the
emphasis would be placed on the observance of the hypotheses in (10ab).

2.2. Three-dimensional constraint map

Given that a diachronic grammar of a language consists of a series of synchronic grammars, I will present a three-dimensional constraint map. This
new image would only make sense if each synchronic grammar would be
aligned in a chronological order. Systems A through D are arranged as vertical planes as if we cut across a tree trunk, and also arranged from bottom
to top as if the tree grows, along the line of a diachronic continuum.
The shaded area in (15) indicates inactive constraint domain in each
system, while the white area indicates the active constraint domains. As is
clear, the inactive domain is minimized step by step.18 In system A, as OJ
is in the outermost ellipse, the inner constraint domain *mb does not have
any force, so that {mb, nd, N} is possible. In system B, domain *mb starts to
be active, so only {nd, N} is attained. In system C, the white area encroaches
on the domain *nd, then only {N} is attained. In system D, the white area
further reaches *N, making the whole domain active, where PNS and even
{N} are not allowed. As the white domain augments, the segment inventory
is minimized.

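(15) Three dimensional domain map
[Figure: four stacked domain maps, from bottom to top OJ (= A), MJ (= B), ModJ (= C), PJ (= D), each showing the nested domains *Ng, *mb, *nd; the shading is not recoverable from the source.]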

This image shows that every system is harmonically complete, and that the
sound change proceeds in a specific direction step by step. Again, the
direction of the diachronic change here can be characterised as from implicans
to implicatum.
(16) a. Implicational relation: D ⊂ C ⊂ B ⊂ A
     b. Unmarked direction of diachronic change: A → B → C → D
In terms of OT, thanks to the fixed ranking of M, harmonically incomplete
systems such as those in (13b) would never be generated. Furthermore, a
diachronic prediction can be made. That is, [mb] may be lost before [nd], but
[nd] will never be lost before [mb]. Likewise, [nd] may be lost before [N],
but [N] will never be lost before [nd]. This prediction holds true for both the
synchronic and the diachronic continuum, as seen in the previous sections.

2.3. Max demotion analysis on prenasal loss

We have seen that if the rerankable constraint is only MAX, there are four
possible hierarchies (i.e., the rankings in A, B, C and D shown below). MAX
prohibits deleting a segment or feature of the input in the output. If all
inputs are taken to be diachronically old forms (Yamane-Tanaka 2003), then candidates which have lost PNS incur violations of MAX(NAS).
In this section, I will argue that the rerankability of MAX(NAS) is also
not random. If FAITH(NAS) could be reranked freely, then even if only 4 systems
were allowable, the orders in which the systems could
emerge would add up to 24 (e.g., D → C → B → A, A → C → B → D,
C → A → B → D), but this is not the case. In order to derive the correct diachronic change (i.e., A → B → C → D), F has to be demoted minimally within
the relevant constraint system. Thus minimal F demotion limits the possible orders to only one, as shown below. (The constraint at the top is most highly ranked.)
(17) Possible permutations and direction
     A              B              C              D
     *Ng            *Ng            *Ng            *Ng
     MAX(NAS)       *mb            *mb            *mb
     *mb       →    MAX(NAS)  →    *nd       →    *nd
     *nd            *nd            MAX(NAS)       *N
     *N             *N             *N             MAX(NAS)
     {mb, nd, N}    {nd, N}        {N}            { }

As F is ranked higher, more M are invalidated (i.e., lose their force), so that
more PNS are allowed in the system. As F goes downward, more M become active, so that more PNS are removed from the system. The minimal F
demotion hypothesis correctly captures the progression whereby less
marked structures were never eliminated until the relatively more marked
ones had been eliminated.
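
The effect of minimal demotion on the 24 conceivable orders can be illustrated with a short sketch. It assumes the fixed order *Ng >> *mb >> *nd >> *N and identifies systems A through D with the four positions of MAX(NAS) below *Ng; the names `ranking_with_faith_at`, `demote_minimally`, `SYSTEMS` and `is_minimal_demotion_path` are hypothetical and serve only this illustration.

```python
from itertools import permutations

MARKEDNESS = ["*Ng", "*mb", "*nd", "*N"]  # assumed fixed order, most dominant first
FAITH = "MAX(NAS)"


def ranking_with_faith_at(position):
    """Insert MAX(NAS) at the given position in the fixed markedness order."""
    ranking = list(MARKEDNESS)
    ranking.insert(position, FAITH)
    return ranking


def inventory(ranking):
    """Segments whose ban is ranked below MAX(NAS) and which thus survive."""
    return tuple(c.lstrip("*") for c in ranking[ranking.index(FAITH) + 1:])


def demote_minimally(ranking):
    """Move MAX(NAS) below the next lower constraint; None if it is at the bottom."""
    i = ranking.index(FAITH)
    if i == len(ranking) - 1:
        return None
    demoted = list(ranking)
    demoted[i], demoted[i + 1] = demoted[i + 1], demoted[i]
    return demoted


# Systems A-D: MAX(NAS) directly below *Ng, *mb, *nd and *N respectively.
SYSTEMS = {name: ranking_with_faith_at(pos) for name, pos in zip("ABCD", range(1, 5))}


def is_minimal_demotion_path(order):
    """True if each system follows from the previous one by one minimal demotion."""
    return all(demote_minimally(SYSTEMS[a]) == SYSTEMS[b]
               for a, b in zip(order, order[1:]))


all_orders = ["".join(order) for order in permutations("ABCD")]
paths = [order for order in all_orders if is_minimal_demotion_path(order)]

if __name__ == "__main__":
    print(len(all_orders), paths)
    for name in "ABCD":
        print(name, " >> ".join(SYSTEMS[name]), inventory(SYSTEMS[name]))
```

Only one of the 24 orders survives the filter under these assumptions, namely the one in which each stage differs from the previous one by a single step of demotion.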
Let me discuss some lexical items, to see how PNS would be realized in
each system A–D. The items /kabe/ 'wall', /hada/ 'skin' and /toge/ 'thorn' are
synchronically attested as below.
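(18) PNS in intervocalic context
               /kabe/ 'wall'   /hada/ 'skin'   /toge/ 'thorn'
     System A  ka[mb]e         ha[nd]a         to[N]e
     System B  ka[b]e          ha[nd]a         to[N]e
     System C  ka[b]e          ha[d]a          to[N]e
     System D  ka[b]e          ha[d]a          to[g]e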



System A has all of {mb, nd, N}, system B has {nd, N}, system C has {N} only,
and system D does not have any of them.
Notice that in systems A–C, a place asymmetry between velar and
non-velar is observed: only [Ng] turns into the simple nasal [N], while [mb] and
[nd] turn directly into the voiced obstruents [b] and [d] respectively. Thus two
strategies to avoid PNS can be posited:
(19) Timing difference between PNS, simple nasal and simple stop
[Figure: velic aperture and oral constriction timing for (i) Simple Nasal and (ii) Simple Stop; the timing bars are not recoverable from the source.]

The first strategy is to extend the velic aperture of PNS to create a simple
nasal, as in (19i), and the other is to delete the entire velic aperture to create
a simple stop, as in (19ii).
Both strategies would satisfy *PNS, but each of them would violate one
of the following constraints.
(20) a. DEP(NAS): Velic aperture of the output has an identical correspondent in the input.
(No extension of velic aperture)
b. MAX(NAS): Velic aperture of the input has an identical correspondent in the output.
(No deletion of velic aperture)
The strategy in (19i) would satisfy MAX(NAS), but violate DEP(NAS) since
the velic aperture is extended in the output (e.g., mb → m). On the other hand,
the strategy in (19ii) would satisfy DEP(NAS), but violate MAX(NAS) since
the velic aperture is deleted in the output (e.g., mb → b).
As it stands, there is no predictive force to capture the idiosyncrasy of
the velar. I therefore propose the following constraint.
(21) MAX(NAS)/VEL: Velic aperture with the oral constriction at velar of
the input has an identical correspondent in the output.
(No deletion of N)
MAX(NAS)/VEL demands that the velum remain lowered when the oral constriction is formed in the same area (i.e., velar). Thus it is violated if the velum is raised while the constriction is formed at the velar place (e.g., Ng → g). This kind of constraint may be phonetically grounded (see note 9), and is motivated by a theoretical schema of specific-to-general constraint hierarchy. MAX(NAS)/VEL is a special version of MAX(NAS) proposed in (20b). A violation of MAX(NAS)/VEL always entails a violation of MAX(NAS). This kind of partitioning of faithfulness constraints is not new: positional faithfulness (J. Beckman 1997), HEADMAX-BA >> MAX-BA (Kager 1999: sec. 6.4), and faithfulness specific to lexical strata (Itô & Mester 1995, 1999, 2001, 2003) all result from splitting a special constraint off from general faithfulness.
The hierarchy of the proposed constraints is determined from the schema in (22a) (Itô & Mester 2003: 165–168), and the constraints interact in two ways, as in (22b–c).
(22) a. Special F >> M >> General F
b. MAX(NAS)/VEL >> DEP(NAS) >> MAX(NAS): systems A–C
c. DEP(NAS) >> MAX(NAS)/VEL >> MAX(NAS): system D
According to the schema (22a), some markedness constraint should intervene between a special F and a general F. DEP(NAS) is commonly regarded as F rather than M in OT, but for the present purpose, I only speculate that this constraint could work as M in the sense that it demands that the oral release not be suppressed by the extended timing of the velic aperture. Importantly, the ranking in (22b) is invariant among systems A–C, where, thanks to the special effect, only Ng turns into N (rather than g). In the other ranking, (22c), no relevant constraint intervenes between the two kinds of MAX, so the special effect is muted in system D (thus the velar comes to pattern with the labial and the coronal).
This is confirmed below.
(23) a. Relevant constraint (i.e., DEP(NAS)) intervenes: N is optimal.
Systems A, B, C
[Tableau for 'toge'; constraint columns include *Ng; cell contents not recoverable]
b. No relevant constraint (i.e., DEP(NAS)) intervenes: g is optimal.
[Tableau for 'toge'; cell contents not recoverable]

In systems A–C in (23a), where DEP(NAS) intervenes between MAX(NAS)/VEL and MAX(NAS), N is selected as the optimal output. In contrast, in system D in (23b), DEP(NAS) does not intervene, so the special effect is neutralized and g is selected. In both (23a) and (23b), the ranking MAX(NAS)/VEL >> MAX(NAS) remains fixed, and only MAX(NAS)/VEL can be demoted. The MAX demotion analysis thus holds here as well. (The input contains voiced and voiceless versions of PNS, which means that either of them would work for the present purpose. This point will be touched on in the next section.)
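The effect of the two rankings in (22b) and (22c) can be checked mechanically. The following is a minimal sketch in Python and is not part of the original analysis: the candidate set, the violation marks and the function name `optimal` are assumptions made for illustration, and *Ng is assumed to be undominated so that the faithful candidate never wins. Strict domination is modelled by comparing violation profiles lexicographically.

```python
# Toy OT evaluation for an input velar PNS (hypothetical encoding).
# Violation marks follow the prose: N extends the velic aperture (DEP(NAS)),
# g deletes it (MAX(NAS)/VEL and MAX(NAS)), and faithful Ng violates *Ng.
CANDIDATES = {
    "Ng": {"*Ng": 1},
    "N": {"DEP(NAS)": 1},
    "g": {"MAX(NAS)/VEL": 1, "MAX(NAS)": 1},
}

# Assumption: *Ng is placed at the top of both rankings.
RANKING_22B = ["*Ng", "MAX(NAS)/VEL", "DEP(NAS)", "MAX(NAS)"]  # systems A-C
RANKING_22C = ["*Ng", "DEP(NAS)", "MAX(NAS)/VEL", "MAX(NAS)"]  # system D


def optimal(ranking, candidates=CANDIDATES):
    """Return the candidate whose violation profile is lexicographically smallest."""
    def profile(cand):
        # One violation count per constraint, ordered from highest to lowest ranked.
        return tuple(candidates[cand].get(constraint, 0) for constraint in ranking)
    return min(candidates, key=profile)


print(optimal(RANKING_22B))  # N
print(optimal(RANKING_22C))  # g
```

Under these assumptions, the only difference between the two grammars is the relative order of DEP(NAS) and MAX(NAS)/VEL, which suffices to switch the winner from N to g.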
The overall rankings of constraints in systems A–D are constructed by combining the ranking in (17) and the ranking in (23). The unified rankings are given in (24). MAX(NAS)/VEL and MAX(NAS) are highlighted so that it is clear that in all systems (i) the ranking between them is fixed, (ii) the rerankable constraint is only one of them, and (iii) the MAX demotion is minimal.
(24) a. Old Japanese / Tohoku: {mb, nd, N} (System A)
[Tableau for 'kabe', 'hada' and 'toge'; cell contents not recoverable]
b. Middle Japanese / Kochi: {nd, N}; demotion of MAX(NAS) (System B)
[Tableau for 'kabe', 'hada' and 'toge'; cell contents not recoverable]
c. Modern Japanese / Tokyo: {N}; demotion of MAX(NAS) (System C)
[Tableau for 'kabe', 'hada' and 'toge'; cell contents not recoverable]
d. Present-day Japanese / younger generation in part of Tokyo: { }; demotion of MAX(NAS)/VEL (System D)
[Tableau for 'kabe', 'hada' and 'toge'; cell contents not recoverable]

System A in (24a), such as Old Japanese or Tohoku dialects, ranks MAX(NAS) below *Ng and above *mb, thus {mb, nd, N} is correctly realized. System B in (24b), such as Middle Japanese or the Kochi dialect, demotes MAX(NAS) below *mb, thus {nd, N} is correctly realized. System C in (24c), such as Modern Japanese or the Tokyo dialect, demotes MAX(NAS) further below *nd, thus {N} is correctly realized. System D in (24d), such as the dialect of the younger generation in part of Tokyo, ranks MAX(NAS)/VEL below DEP(NAS), thus none of the PNS or [N] is allowed in this system.
All systems are captured with the reranking of MAX only. It was also
ascertained that its minimal demotion matches the chronological order in
which each system turns up.
Once the M hierarchy is determined as above, the predictions below
would follow.
(25) a. Synchronic prediction about the loss of PNS:
(i) N may appear as frequently as or more frequently than nd.
(ii) nd may appear as frequently as or more frequently than mb.
b. Diachronic prediction about the loss of PNS:
(i) mb may be lost before nd, but nd will never be lost before mb.
(ii) nd may be lost before N, but N will never be lost before nd.
Observations (6)–(8) would be sufficient to claim that the predictions in (25) hold for Japanese PNS.
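The predictions in (25) amount to an implicational check on inventories. As a rough illustration (the function name and the encoding of inventories as Python sets are assumptions, not part of the original analysis), the sketch below tests whether an inventory is harmonically complete with respect to the scale *mb >> *nd >> *N: a more marked PNS may be present only if every less marked one is present too.

```python
# Scale ordered from most marked to least marked, following *mb >> *nd >> *N.
SCALE = ["mb", "nd", "N"]

# Systems A-D as described in the text (hypothetical set encoding).
SYSTEMS = {
    "A": {"mb", "nd", "N"},
    "B": {"nd", "N"},
    "C": {"N"},
    "D": set(),
}


def is_harmonically_complete(inventory, scale=SCALE):
    """True if the presence of a segment implies all less marked segments."""
    seen_present = False
    for segment in scale:
        present = segment in inventory
        # Once a more marked segment is present, no less marked one may be missing.
        if seen_present and not present:
            return False
        seen_present = seen_present or present
    return True


for name, inventory in SYSTEMS.items():
    print(name, is_harmonically_complete(inventory))

# A gapped inventory such as {mb, N} skips nd and is rejected.
print(is_harmonically_complete({"mb", "N"}))
```

On this encoding, systems A to D all pass the check, while an inventory that keeps mb but lacks nd does not.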


3. Prenasal loss induces voice contrast?

3.1. Prenasal/voice controversy

This section discusses how the loss of prenasalization is related to the history of voicing in Japanese. It is widely accepted that the feature voice has been distinctive throughout the history of Japanese (for example, see Itô & Mester 2003: 211–212), but there is another view that only prenasal was distinctive in early Japanese (Wenck 1959; Hayata 1977a,b; T. Takayama 1993; Hamano 2000). On the former view, prenasalization in OJ would be considered allophonic. Supporters of this view may agree with the idea that prenasalization could be a phonetic mechanism for facilitating the expression of voicing on a stop.19 This functional reasoning may be plausible in terms of phonetics, but its theoretical consequences raise questions for phonology. Why do current prenasalization dialects still have minimal pairs such as hada 'flag' vs. handa 'skin', where prenasal cannot be regarded as allophonic? Why do current non-prenasalization dialects show minimal pairs such as hata 'flag' vs. hada 'skin', where only voice should be distinctive? And how are these dialects related to each other?
If we admit the view that the prenasal was distinctive rather than allophonic in OJ and was lost gradually, these questions could be answered.
The prenasalization dialects are the remnants of the past system where prenasal was distinctive, while the non-prenasalization dialects emerge from it
through the diachronic chain shift involving prenasal, voiced and voiceless
stops. The two sets of dialects are both attested in light of the geographical
and chronological continuums as shown previously.
I do not claim that the OT analysis in the previous section lends strong support to the prenasal-distinctive view. But at least it calls into question the view that the prenasal in OJ is merely allophonic, suggesting the possibility that the contrastive feature of stop consonants may have shifted from prenasal to voice in a systematic way. To put it another way, the loss of prenasalization may have gradually induced the voice contrast from MJ onward, a contrast which was not observed in early OJ.
The scenario I intend to show here is that the voice contrast at one PoA did not appear until the prenasal at the same PoA was lost. This is summarized below.

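(26) [Table of stop series by period; row labels and some cells not recoverable]
Voiceless stops: p, t, k (in all four rows)
Prenasalized stops: b, nd, N; d, N
Voiced stops: b, d; b, d, g
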
The only thing that is certain here is that PJ has the voice contrast without prenasal; the segmental distribution in the other eras is hypothetical. The question is how PJ attained such a system. Suppose that when a prenasalized stop is lost, it is replaced by the plain voiced stop; say, when nd is lost, it is replaced by d. Then the original d may become t in order to avoid merging with the new d. Given that such a diachronic chain shift proceeds through labial, coronal and velar in this order, we could further assume that the voice contrast did not emerge at all PoA simultaneously. In other words, in early OJ the contrastive feature pertaining to stops was prenasal, but the contrastive feature may have shifted from prenasal to voice gradually from MJ through PJ.

This line of thought agrees with other assumptions shown below. Hayata (1977) holds the view that the OJ intervocalic stops were phonetically voiced, so that the contrast among voiced stops was one of prenasal rather than voice. He argues that this gives a unified account of two consonantal changes which were previously considered to be unrelated: what was considered to be the shift of the voiceless labial fricative to the labial glide (i.e., ha-line shift) may actually be a shift from the voiced labial fricative, and what was considered to be the deletion of voiceless velar stops (i.e., i-onbin or u-onbin) may also be a shift from the voiced velar fricative to the palatal glide with its subsequent deletion. M. Takayama (1992a) agrees with this, arguing that postnasal voicing ceased in MJ, when voice started to play a distinctive role; voice in OJ must not have been contrastive, as in other languages that have postnasal voicing. Moreover, Hamano (2000) argues that the prenasal became redundant on labials historically earlier than on nonlabials, on the ground that the distribution of intervocalic stops in the sound-symbolic stratum shows an asymmetrical pattern: nonlabial stops are predominantly voiceless but labial stops are predominantly voiced.20
Also, the UCLA Phonological Segment Inventory Database (UPSID; Maddieson 1984) suggests that the markedness scale may vary according to the manner of articulation.21 A typological survey of the segmental distribution of voiced stops motivates the PoA markedness hierarchy below (see also Itô & Mester 1999).
(27) *g >> *d >> *b
Interestingly, the ranking in (27) is the mirror image of the PNS ranking *mb >> *nd >> *N shown in the previous sections; labial is most marked in the PNS scale, while it is least marked in the voiced stop scale. Nonetheless, they are two sides of the same coin: the ranking of PNS states that mb is most likely to be missing from the segment inventory, and the ranking of the voiced stops states that b is most likely to be present. These implications strongly suggest that mb is the first PNS to be lost, subsequently shifting to b. In short, the two rankings support the assumption that the appearance of a voice contrast proceeded through labial, coronal and dorsal in this order. Some physiological and aerodynamic reasons could also support the PoA effect on voicing. Hayes and Steriade (2004) accept hierarchy (27), on the grounds that it is more difficult to sustain vocal fold vibration as the size of the cavity behind the oral constriction becomes smaller. (This account is not totally new. See also note 9.)

3.2. Gapped Inventory

The discussion so far suggests that sound changes may not affect every segment in the same feature class all at once. This leads us to the view that the synchronic segmental inventory may be gapped during the course of sound changes. This may run against the hypothesis that segmental modifications such as prenasalization, aspiration and labialization occur simultaneously on a natural class of segments, rather than on one individual segment (Hinskens & van de Weijer 2003). What I mean to show here, however, is that even if diachronic changes caused some stages to have gapped inventories, the harmonically incomplete systems in (13b) are rarely attested: 46 out of 47 regional dialects in Japanese fit into the harmonically complete systems in (13a), with the sole exception of the Tokushima dialect (cf. Appendix 3: No. 37).22 Gapping that respects harmonic completeness may be the unmarked case, but at the same time, we should bear in mind that other types of gapped inventory do exist, which call for a different explanation (for example, de Lacy 2002, Mielke 2004).
3.3. Complexity in old systems
PNS may be marked compared to simple (voiced) stops in terms of segmental complexity. It would then be reasonable to suppose that their loss is a natural development. An interesting question, however, is whether it is fair to assume that such complexity is found only in the old system. One answer would be that every synchronic system has some marked trait, which may differ from system to system. Compare the overall underlying consonant inventories for OJ and PJ:
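(28) a. OJ: prenasal is contrastive 23
[Consonant inventory table; columns: Labial, Alveolar, Palatal, Velar; rows include sonorant liquids; cell contents not recoverable]
b. PJ: voice is contrastive 24
[Consonant inventory table; columns: Labial, Alveolar, Palatal, Velar, Glottal; rows include sonorant liquids; cell contents not recoverable]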
Suppose that in the OJ inventory, all obstruents are systematically voiceless, while all sonorants are systematically voiced. The fact that this pattern is observed in many languages (e.g. Pulleyblank 1997: 77–85) is phonetically grounded, in that it is more difficult to maintain voicing when the oral constriction is greater. Archangeli & Pulleyblank (1994) and Pulleyblank (1997, 2003) characterize this state of unmarkedness in terms of feature cooccurrence restrictions. The constraint governing the feature cooccurrence relevant here is represented as OBSVOI in OT terms, which interacts with FAITH(VOI).
(29) a. OBSVOI: Obstruent should be voiceless.
b. FAITH(VOI): Voice should be identical in the input and the output.
(No change of voice)
These constraints can interact in the following two ways (Pulleyblank 1997: 79–80).
(30) a. OBSVOI >> FAITH(VOI)
(i.e., Voiced obstruents are excluded from the inventory)
b. FAITH(VOI) >> OBSVOI
(i.e., Voiced obstruents are attested in the inventory)
The ranking in (30a) would force all the surface obstruents to be voiceless,
while the ranking in (30b) would force voice in the input and output to be
identical. This is summarized as below.

(31) a. Languages with a voice contrast (=30b)
[Tableau not recoverable]
b. Languages with no voice contrast (=30a)
[Tableau not recoverable]
(T = p, t, k; D = b, d, g; A = vowel)
Recall that as far as word-initial position is concerned, OJ does not allow
voiced obstruents, while PJ does. If OJ can be categorized into type (30a)
and PJ as type (30b), then the asymmetry of such a voice contrast between
OJ and PJ should be already captured in (31).
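The two rankings in (30) can be illustrated with a small sketch. It is written in Python and is only a toy under stated assumptions: inputs are strings over T, D and A as in (31), candidates differ only in the voicing of obstruents, and the names `violations` and `surface` are hypothetical.

```python
from itertools import product

RANKING_30A = ["OBSVOI", "FAITH(VOI)"]  # no voice contrast
RANKING_30B = ["FAITH(VOI)", "OBSVOI"]  # voice contrast


def violations(inp, out):
    """Count violations of the two constraints in (29) for an input/output pair."""
    return {
        # OBSVOI: every voiced obstruent (D) in the output is a violation.
        "OBSVOI": out.count("D"),
        # FAITH(VOI): every segment whose voicing changed is a violation.
        "FAITH(VOI)": sum(1 for i, o in zip(inp, out) if i != o),
    }


def surface(inp, ranking):
    """Pick the optimal output among all voicing variants of the input."""
    # Each obstruent slot may surface as T or D; other segments are kept as they are.
    slots = [("T", "D") if seg in "TD" else (seg,) for seg in inp]
    candidates = ["".join(choice) for choice in product(*slots)]
    return min(
        candidates,
        key=lambda out: tuple(violations(inp, out)[c] for c in ranking),
    )


for word in ("TA", "DA"):
    print(word, surface(word, RANKING_30A), surface(word, RANKING_30B))
```

With OBSVOI on top, underlying TA and DA both surface as TA, so the contrast is neutralized; with FAITH(VOI) on top, each input surfaces unchanged.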
The hierarchies in (30) should be compared to those given below, which
could be developed from the hierarchy in (17).
(32) a. Early J: MAX(NAS) >> *PNS
(i.e., Prenasal obstruents are attested in the inventory)
b. ModJ: *PNS >> MAX(NAS)
(i.e., Prenasal obstruents are excluded from the inventory)
Since in early J it is more important to retain nasality than to avoid PNS, PNS can surface in the output. But in PJ it is more important to avoid PNS than to retain nasality, so underlying PNS cannot surface in the output.
It is easy to see that OJ (or early J) and PJ (or ModJ) show different kinds of markedness. The former is marked in segmental complexity, but unmarked in inventory. The latter, in contrast, is marked in inventory, but unmarked in segmental complexity. As is clear from the trade-off in markedness shown in (30) and (32), ModJ seems to have reduced segmental complexity at the cost of the unmarked status of obstruents.

3.4. Voicing in Old Japanese
The view that OJ has an all-voiceless series of obstruents, with only prenasal distinctive, would not only answer the question about segmental complexity in the way shown above, but also capture how PNS as well as plain voiceless stops are voiced in the output. The important idea underlying this stance is that it is voicing rather than prenasalization which is allophonic in OJ, in that the distribution of voicing is predictable and complementary: a voiced stop occurs after a nasal or a vowel, and a voiceless stop occurs elsewhere. Such allophonic variation follows from the following schema (Pulleyblank 1997: 85).

(33) SPECIAL CONDITION ON FT >> *FT >> FAITH[FT]
([Ft = Feature])
SPECIAL CONDITION ON FT is a context-specific M on a certain feature, *FT is a context-free M on a feature, and FAITH[FT] is an F on a feature. (On SONVOI, see Itô, Mester & Padgett 1995.)
(34) POSTSONVOICE: Obstruents after a vowel or a nasal should be voiced.
This belongs to a family of positional markedness constraints, which force a certain feature to surface in a specific context. POSTSONVOICE allows voice to surface on an obstruent when it follows a vowel or a nasal. (For a different view of the voice contrast in OJ, see Itô & Mester 2003: 211–212.)
The schema in (33) is then instantiated as POSTSONVOICE >> OBSVOI >> FAITH(VOI). This constraint hierarchy can be interpreted as follows: the feature voice does not occur in the inventory because of OBSVOI >> FAITH(VOI), but allophonic voicing occurs after a nasal as well as after a vowel because POSTSONVOICE overrides the general prohibition. Here is the summary.
(35) [Tableau not recoverable]
Whether the input is voiceless or voiced, the obstruents in this environment are consistently voiced in the output. The outcome attained here is consistent with the one attained by the tableau (24a). (The constraints in this tableau may then be ranked above those in (24a).) It should also be noted that POSTSONVOICE should work also for the sequence of a moraic nasal and a voiceless stop (thus ANTA → ANDA), as well as for the sequence of a vowel and a voiceless stop (thus ATA → ADA).
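The hierarchy POSTSONVOICE >> OBSVOI >> FAITH(VOI) can be sketched computationally. The Python fragment below is a minimal illustration under stated assumptions, not the author's own formalization: forms are strings over A (vowel), N (nasal), T (voiceless stop) and D (voiced stop), candidates differ only in obstruent voicing, and the function names are hypothetical.

```python
from itertools import product

RANKING = ["POSTSONVOICE", "OBSVOI", "FAITH(VOI)"]


def violations(inp, out):
    """Violation counts for one input/output pair."""
    return {
        # POSTSONVOICE: a voiceless stop right after a vowel or a nasal.
        "POSTSONVOICE": sum(
            1 for prev, seg in zip(out, out[1:]) if seg == "T" and prev in "AN"
        ),
        # OBSVOI: any voiced obstruent in the output.
        "OBSVOI": out.count("D"),
        # FAITH(VOI): any obstruent whose voicing differs from the input.
        "FAITH(VOI)": sum(1 for i, o in zip(inp, out) if i != o),
    }


def surface(inp, ranking=RANKING):
    """Return the optimal output among all voicing variants of the input."""
    slots = [("T", "D") if seg in "TD" else (seg,) for seg in inp]
    candidates = ["".join(choice) for choice in product(*slots)]
    return min(
        candidates,
        key=lambda out: tuple(violations(inp, out)[c] for c in ranking),
    )


for word in ("ATA", "ADA", "ANTA", "TA", "DA"):
    print(word, "->", surface(word))
```

In this toy grammar, stops are voiced after a vowel or a nasal regardless of the input (ATA and ADA both yield ADA, ANTA yields ANDA), while word-initial stops surface as voiceless (TA and DA both yield TA), which is the complementary distribution described above.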
It is known that postnasal voicing was productive in OJ. This might call for some explanation, because it would suggest that the feature voice, despite its noncontrastive status, plays an active role in postnasal voicing, contrary to the standard prediction. Pulleyblank (2003), however, shows that both "overt" features and "covert" features can certainly trigger and target phonological phenomena, and that they are the basis of phonological constraints. This hypothesis makes it more plausible that noncontrastive voice in OJ could be targeted by postnasal voicing.
What happened later on is, to borrow from M. Takayama (2002): "When the feature [pre]nasal disappeared from the distinctive feature set in central dialects, postnasal voicing necessarily came to cease. Given that the distinctive features for obstruents began to shift in Middle Japanese, some phonological changes would be offered a more reasonable and principled explanation" (translated by N.Y-T). Thus, the voice contrast may have historically emerged through the loss of prenasalized stops.

4. Conclusion
We have discussed the connection between the loss of PNS and the history
of voice of obstruents. Based on the observation that the loss of PNS proceeded along the harmonic scale of PoA, I gave it a principled account with
the minimal demotion of MAX(NAS). It was suggested that the demotion
may indicate the historical emergence of the voice contrast.
The parallelism between synchronic variation and diachronic change is
attested in the form of an implicational relation. Among 47 regional dialects surveyed here, 46 systems involving PNS fit into the factorial typology that emerged from the fixed markedness hierarchy with rerankable
MAX(NAS); the only exception is the Tokushima dialect (cf. Appendix 3;
No. 37). Historical surveys demonstrate that the possible grammatical
change proceeds from implicans to implicatum, which is mirrored in the
demotion of MAX(NAS).

The empirical and theoretical observations suggest that as the system
gradually loses PNS from the segment inventory, voice contrasts come to
play a more important role. The analysis presented here is in accordance
with the assumption that the prenasal became redundant on labials earlier
than nonlabials (Hamano 2000), and voice started to play a distinctive role
in Middle Japanese (M. Takayama 2002).
I hope the study here could serve to shed light on the historical shift of
the status of voice in Japanese.

Acknowledgements

Part of this article was first read at the workshop "Voicing in Japanese" at Linguistics and Phonetics (LP), held at Meikai University on September 3, 2002. I am deeply indebted to Kensuke Nanjo, Tetsuo Nishihara, and Jeroen van de Weijer, who organized this project and edited this book.
Other presentations of this research include the ones at the meeting of
the Tokyo Circle of Phonologists (TCP) at Seikei University on May 25,
2003 and the informal research meetings as well as a presentation session
in LING 507 at the University of British Columbia from September to December, 2003.
I express my profound gratitude to Sonya Bird, Atsushi Fujimori, Shosuke Haraguchi, Ayako Hashimoto, Takeru Honma, Junko Itô, Itsue Kawagoe, Masahiko Komatsu, Haruo Kubozono, Masao Okazaki, Ruangjaroon, Keiichiro Suzuki, Timothy Vance, Ian Wilson and my classmates, who provided me with insightful comments and discussions.
Special thanks go to Linda Lombardi, Kan Sasaki, Michiaki Takayama, and Tomoaki Takayama, who kindly sent me various important materials related to the issues I'm interested in. I am also grateful to Hiroyuki Maeda, who wrote a critique of an earlier version of this paper (Maeda 2004).
Last but not least, I would like to express my deepest thanks to Gunnar Ólafur Hansson, Douglas Pulleyblank, Shin-ichi Tanaka, Jeroen van de Weijer, and anonymous reviewers, who read earlier versions of this paper and made valuable comments and suggestions.
The research I have conducted since September 2003 is supported by
SSHRC Standard Research Grant [#410-2002-0041], which was awarded
to Douglas Pulleyblank.
I'd like to dedicate this paper to the memory of my father and my father-in-law.

Notes

1. According to the dialectal division of Tôjô (1954), Tohoku dialects consist of Aomori, Iwate, Akita, Miyagi, Yamagata, Fukushima and North of Niigata.
2. For an OT analysis of the chain shift, see Yamane-Tanaka (2003).
3. /p/ didn't participate in voicing. /p/ turned into [ɸ] or [h] word-initially, and turned into [w] or was deleted word-medially (F. Inoue 2000: 421).
4. As an anonymous reviewer pointed out, some readers may wonder if PNS in this dialect contrast with NC clusters intervocalically. It seems cross-linguistically rare for NC clusters and PNS to contrast (Maddieson & Ladefoged 1993). Also, given that Tohoku dialects are syllabeme dialects (Shibata 1962: 140–141), where light and heavy syllables do not show a weight contrast, it may be hard to believe there is a distinction. However, there are several reasons to believe so. First, this dialect has minimal pairs showing this contrast: for example, /saᵐba/ 'mackerel' vs. /samba/ 'midwife', /haⁿda/ 'skin' vs. /handa/ 'solder', and /kaN N o/ 'basket' vs. /kaN N No/ 'nursing'. Second, the moraic nasal has a longer duration than prenasals. So far I have not obtained any phonetic measurement data showing the durational contrast between the prenasal (e.g., /m/ in /..ᵐb../) and the moraic nasal at the same place of articulation (e.g., /m/ in /..mb../), but according to Oohashi (2002: 210–215), the duration of the nasal murmur was 130 ms. for /N/ (in /teNki/ 'weather'), which contrasted with 36 ms. for /m/ (in /oᵐbi/ 'kimono belt') and 100 ms. for /n/ (in /haⁿda/ 'skin'). Third, the duration of the moraic nasal is longer compared to the second element of long vowels or geminates (Oohashi 2002: 317–349).
5. Generally, PNS only appear in Yamato items, not in Sino-Japanese (SJ), Mimetics and Foreign items. There are also phonetic environments which prohibit
PNS: (a) V1. _ V2 : where V1 = long, (b) C1V1. _ V2 : where C1 = [-voice], V1 =
[+high], V2 = [-high], (c) V1. _ V2. C3V3 : where V2 = [+high], C3 = [-voice], V3
= [-high], (d) after nasal. Nonetheless, some prenasalized items in SJ as well
as Yamato are found in these environments.

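[Table of prenasalized items by environment: (a) VV _, (b) C-voi V+hi . _ V-hi, (c) _ V+hi . C-voi V-hi, (d) /N/ _. Recoverable entries include [ko bodaisi] 'Saint Kobo' (SJ) and [u de] 'writing brush' (Yamato), and the glosses 'activity, hall' (SJ), 'tool' (SJ), 'equinoctial week' (SJ), 'right hand' (Yamato) and 'apple' (SJ); the remaining transcriptions are not recoverable.]
(Based on F. Inoue 2000: 361)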
Interestingly, [N] can appear in all environments (a-d), [nd] in limited environments (a, b), and [mb] in only one environment (a). This observation also
seems to match the assumption that [mb] is least likely to survive.
6. In this paper the vowel symbols are represented by the simplified five vowels [a, i, u, e, o] rather than in narrow transcription. In the Aomori dialect, /u/ is [_] (unrounded and centralized), which is common to most dialects in Eastern Japan. Tohoku dialects in general have no contrast between /i/ and /e/, which are merged as [e] (raised).
7. /..di/ and /..du/ are left out of the column of examples, because at present they are observed exclusively in Foreign items (e.g., [torendi] 'trendy', [andutowa] 'un, deux, trois' (Fr)). Regarding /di/ and /du/ in Yamato items, there is something worth noting. It is assumed that historically /di/ vs. /zi/ were pronounced as [di] vs. [Zi], and /du/ vs. /zu/ as [du] vs. [zu], respectively. Such a distinction is still kept only in some areas of Kochi, well known as 'yotsugana dialects' [dialects with four different pronunciations for four kana letters] (e.g., [uZi] for /fuzi/ 'name of area or person', [undi] for /fudi/ 'wisteria', [kuzu] for /kuzu/ 'arrowroot' vs. [kundu] for /kudu/ 'trash'). In contrast, the northern Tohoku areas neutralize these four forms into [dz] ([] = centralized [i]), and are characterized as 'hitotsugana dialects' [dialects with only one pronunciation for four kana letters] or 'dzii dzii dialects'. These dialects also neutralize /ti/ and /tu/ into [ts], which turns up as [dz] intervocalically. Thus, /tizi/ 'governor' and /tizu/ 'map' are both realized as [tSindz], and /titi/ 'milk' and /tuti/ 'soil' are both [tsdz] (see R. Sato 2002).
8. As for the structurally complex segments, see van de Weijer (1996). As for the markedness ranking of features, see sec. 2.1.
9. The other way of adjustment is to turn it into a simple voiced stop (i.e., [g]), as has happened to [mb] and [nd] in the younger generation. But [Ng] alone did not take this option. This might be ascribed to Boyle's Law: voicing is difficult to maintain when the supraglottal cavity is small (Ohala 1983; Vance 1987). McCarthy & Prince (1995: 353) express this effect in terms of a constraint POSTVCLS, 'posterior stops (i.e., velars) be voiceless'. In their system treating the [g] ~ [N] alternation in Tokyo Japanese, *[N] >> POSTVCLS >> IDENT-IO(NAS) turns [k] to [g] only where [N] can't. Itô & Mester (1997) postulate *g, which can be interpreted as a trigger-constraint, producing the impetus for nasalization of [g] into [N] to occur (Kager 1999: 241). Such an avoidance of [g] may be aerodynamically or physiologically grounded, as the oral constriction at the velar could lower the velum so easily that the airflow may be let out through the nasal cavity. It may be worth exploring in terms of Grounding Theory (Archangeli & Pulleyblank 1994).
10. From the viewpoint of long-term sound changes, this change can be taken as the first stage of the consonant shift [Ng] > [N] > [g] > [F]. As for the labial series, [mb] > [b] > [B] is assumed. The change in the coronal series is divided into two: [nd] > [d] before nonhigh vowels, and [nd] > [ndz] > [dz] > [z] before high vowels (see T. Takayama 1993). It should be noted that velars had [N] before reaching [g], while labials and coronals did not have [m] or [n], respectively, at any stage of the consonant shift.
11. [Ng] and [N] may not stand in phonemic distinction, so the latter is parenthesized. However, positing the two systems A1 and A2 seems to cover the observations so far. Thanks to Gunnar Ólafur Hansson for this suggestion.
12. This may be similar to the wave theory in the European tradition of dialectology. However, Yanagita's theory lays more emphasis on the fact that the spread of newer forms takes place in a circular pattern, just like ripples, with its center located in the cultural center (Shibatani 1990: 201).
13. This discussion is based on the following chronological division (cf. Miller 1967: ch. 1).

Period            Dates        Rough correspondence of eras
Old J                          Asuka, Nara
Late Old J        9c.–12c.
Middle J          13c.–16c.    Kamakura, Muromachi, Sengoku
Early Modern J    17c.–1868
Modern J                       Meiji, Taisho, Showa, Heisei

14. We focus on the prenasalization of voiced stops, and therefore do not treat
[ndz] here. However, [ndz] likewise underwent loss of PNS, and subsequently
merged with [z] around the 16th through 17th century. For details, see T. Takayama (1993).
15. For another OT analysis treating dialectal differences of consonant voicing, see
Nishihara (2002).
16. For references, see de Lacy (2002: 193–194). In fact, not only the ranking between *Lab and *Dor, but also the whole ranking has been a matter of debate. See Hume & Tserdanelis (2002). For a restriction on M, see de Lacy (2004).
17. The relation between implicans and implicata seems to show the basic pattern
of phonological asymmetries. In the pattern of consonant harmony in acquisition (Pater & Werle 2001), implicans and implicatum are both instantiated in
the early stage, but only implicata are observed in the later stage (e.g. coronals
are targets of harmony as frequently or more frequently than non-coronals.).
18. Anttila and Cho (1998) divide the systems into invariant and variable systems.
Variable systems are expressed as combinations of two invariant systems, such
as A+B, B+C, and C+D. Thus, 7 systems will be logically possible. Since we
assume that diachronic change originates in synchronic variation, we must allow for those variable systems.
19. Thanks to Douglas Pulleyblank for raising this issue.
20. See Nasu (1999) for more data with an emphasis on the markedness of [p].

21. As for the scale of voiceless stops, *p >> *k >> *t could be posited based on the typological investigation of segment inventories (Itô & Mester 1997, 2003). Hayes & Steriade (2004: 26) state that [p] is the most difficult obstruent to keep voiceless (particularly in voicing-prone environments, such as intervocalic position).
22. I cross-referenced the linguistic map of M. Takayama (2002) with the original
map in Hirayama et al. (1992). Hokkaido was not included in the original map.
But Hokkaido is categorized into system C (cf. Kamei et al. 1997: 280).
I marked three kinds of checks according to the area size.
a. : observed in all area
b. : observed in roughly more than half of the area

c. : attested in roughly less than half of the area

23. This is modified from the inventory of F. Inoue (2000: 431–432). In the original version the prenasal stops were all voiced, with a note that it would be safe to say that they are all voiceless. See also Hamano (2002: 209).
24. This is from Kamei et al. (1997: 216)

Appendix 1: Regional division (i.e., Todôfuken) in Japan
[Map]

Appendix 2: Geographical distribution of intervocalic prenasalized stops in Japan
[Map; legend: [1] [N, g], [2] [Nd], [3] [mb]]

Appendix 3: System information in each prefecture















16. Nagano
17. Toyama
18. Ishikawa
19. Fukui
20. Gifu
21. Yamanashi
22. Shizuoka
23. Aichi
24. Shiga
25. Mie
26. Kyoto
27. Nara
28. Osaka
29. Wakayama
30. Hyogo
31. Tottori
32. Shimane
33. Okayama
34. Hiroshima
35. Yamaguchi
36. Kagawa
























The correlation between accentuation and Rendaku in Japanese surnames: a morphological account

Hideki Zamma

1. Introduction
Although Rendaku (or Sequential Voicing) has been extensively studied in the literature, its relation to accentuation has not. Sugito (1965) is a unique study, which, after a limited investigation of personal names which end with the morpheme ta ('rice field'), claims that words which undergo Rendaku tend to be accentless. This correlation is intuitively supported, as some researchers follow her on various occasions (cf. H. Sato (1989), Kubozono (1998), Kubozono (this volume), Tanaka (this volume), etc.).
This paper examines the extent to which Sugito's generalization applies in Japanese, investigating other personal names more thoroughly. It will become clear that the correlation in question is observed to some extent, but is not overwhelming. Moreover, it will be shown that obedience or nonobedience to it is lexically determined by the rightmost head morpheme of the name (e.g. ta in Yoko-ta), and further, that each morpheme shows quite diverse behavior in accentuation and Rendaku.
The paper is organized as follows: in the next section we first review the
generally held view of the relationship in question, giving a summary of
Sugito's (1965) investigation of names with ta. In Section 3, we investigate
other Japanese names to see to what extent the relevant observation applies
in general. It soon becomes evident that the pattern differs significantly
depending on the last head morpheme in the name; in Section 4 these various
morpheme-specific patterns are illustrated. Section 5 concludes the paper.


2. Sugito (1965): voicing and accentuation in names with ta
The generalization first given in Sugito (1965) and often intuitively supported can be summarized as follows:

(1) a. Accented names tend not to undergo Rendaku.
b. Accentless names tend to undergo Rendaku.

Sugito investigated names which end with the morpheme ta 'rice field', one
of the most productive morphemes for Japanese surnames, and examined
whether ta is subject to this generalization, referred to as Sugito's Law in this
paper. As shown in the examples below, names which conform to this generalization are abundant.

(2) a. non-Rendaku pattern: mostly accented
Fuji-ta, Mori-ta, Shiba-ta, Kubo-ta, Yoko-ta, Tomi-ta, Aki-ta
b. Rendaku pattern: mostly accentless
Yoshi-da, Yama-da, Ike-da, Mae-da, Oka-da, Matsu-da, Wa-da

As exemplified in (3), however, this correlation is not absolute. There are
quite a few exceptions.

(3) a. non-Rendaku but accentless
Oo-ta, Mura-ta, Naka-ta, Hira-ta, Iwa-ta, Miya-ta, Naga-ta
b. Rendaku but accented
Hara-da, Nishi-da, Kuro-da, Hama-da, Tsuno-da, Kane-da

The names in (3a) do not undergo Rendaku although they are accentless.
On the other hand, those in (3b) get accented even though they are subject
to Rendaku.
After investigating 362 names with this morpheme, Sugito found that
the generalization applies to more than half of those ending with ta. Below
are the results in which Sugito counted the number of names with respect to
accentedness and Rendaku sensitivity [slight modifications are mine].1








According to (4), 189 names (that is, 52.2% of all the ta-names and 71.0%
of names which do not have alternating patterns, i.e., all the names excluding
those listed in 'both' cells) conform to the generalization in (1), as highlighted above. It is possible to conclude from this table that Sugito's Law
applies moderately, though not strictly, to names ending in ta.
Sugito also pointed out that both voicing and accentuation are influenced
by the onset segment of the last mora of the preceding morpheme, which I
will call the 'base'. If the segment is voiced (including sonorants but not
nasals), the name tends to be accented and exempt from Rendaku (as in
Fuji-ta). If the segment is voiceless (including nasals) or the mora has no
onset, names tend to be accentless and undergo Rendaku (as in Yoshi-da).2
The table below illustrates this point, where [-v] stands for 'voiceless' and
[+v] for 'voiced' in the above-mentioned classifications regarding sonorants
and nasals. Note also that the counting of names is slightly different from
(4), mainly because Sugito excluded names with alternative voicing.






a: 19 of these are names whose last onset of the base is /k/.

b: All contain /k/ as the onset in question.
c: 43 of these are names with nasals as the last onset of the base.
The cells highlighted in gray represent areas predicted by Sugito's analysis:
accented without Rendaku when the segment in question is voiced, and accentless with Rendaku when the segment is voiceless. This is also observed in
names without alternating accent patterns, highlighted in black. These suggest that even when the accentuation is unpredictable, the application or
non-application of Rendaku can be predicted from the last onset of the base:
when it is voiceless, the name undergoes Rendaku; when voiced, it does not.
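The onset-based tendency just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: bases are given in Hepburn-style romanization and end in a CV or V mora, the function names are hypothetical, and the optional treatment of /k/ as voiced is offered as a switch rather than as a claim about any particular name.

```python
# Sketch of the onset-based tendency reported for names ending in ta.
# Segment classes follow the text: nasals and a missing onset pattern with
# voiceless segments; liquids and glides pattern with voiced obstruents.
# The romanization handling is a simplifying assumption of this sketch.

VOWELS = set("aiueo")
VOICED_ONSETS = {"b", "d", "g", "z", "j", "r", "w", "y"}


def last_onset(base: str) -> str:
    """Return the onset of the final mora of a romanized base ('' if none)."""
    stem = base.lower()
    if stem and stem[-1] in VOWELS:
        stem = stem[:-1]              # drop the nucleus of the final mora
    onset = ""
    while stem and stem[-1] not in VOWELS:
        onset = stem[-1] + onset      # collect the consonant letters before it
        stem = stem[:-1]
    return onset


def predict_ta(base: str, k_as_voiced: bool = False) -> tuple[str, bool]:
    """Return (predicted surface name, predicted accentedness)."""
    onset = last_onset(base)
    voiced = onset[:1] in VOICED_ONSETS or (k_as_voiced and onset == "k")
    rendaku = not voiced              # voiced base-final onset blocks Rendaku
    accented = voiced                 # and correlates with an accented name
    suffix = "da" if rendaku else "ta"
    return f"{base.capitalize()}-{suffix}", accented


for base in ("fuji", "mori", "yoshi", "yama", "mae"):
    print(predict_ta(base))
print(predict_ta("yoko", k_as_voiced=True))
```

The sketch returns tendencies, not facts about individual names: exceptions such as those in (3) are by design not captured.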


Exceptions to this generalization seem rather abundant, but if we assume that /k/
can be exceptionally regarded as voiced, as does Kubozono (this volume),
19 out of 23 accented non-Rendaku names with a voiceless onset can be
correctly predicted, increasing the number of accented non-Rendaku names
from 55 to 74. All 12 of the non-Rendaku names with alternative accentuation also contain /k/ as the onset. This exceptional treatment of /k/ should
not be applied to Rendaku names, however, which already have a high
enough number (i.e. 87 accentless names and 56 names with both accentuations) under a calculation which regards /k/ as voiceless.
As Kubozono (this volume) also suggests, the sensitivity to Rendaku
which depends on the voicing of the segment in the base can be regarded as
another case of the OCP, or Lyman's Law. The OCP itself does not force
names with a voiceless onset to undergo Rendaku, although it does prohibit
those with a voiced segment from undergoing it. This might be another
reason why names with exceptional Rendaku are rather large in number
when the onset is voiceless.
Exceptionally accented Rendaku names with a voiceless onset are rather
abundant (58), but again, the OCP itself does not prohibit them from undergoing Rendaku. Note that such names are scarce when the onset is voiced
(only 2). They do violate Sugito's Law, suggesting that some other mechanism forces them to override it. Interestingly, 43 of these exceptional names
contain a nasal as the base-final onset. A possible explanation is that something similar to Post Nasal Voicing is at work here, by which a coda nasal
voices the following onset. This speculation of course needs further investigation, so I will just suggest the possibility and leave it open to question.
In sum, according to Sugito's research, the tendency in (1) is preserved
to some extent in names with ta. It is further possible to predict Rendaku
sensitivity by the last onset in the base, although it is necessary to treat
nasals, sonorants and /k/ in special ways. Since both are somewhat predictable, we can assume that ta is not lexically specified with any particular
information regarding accentedness and Rendaku sensitivity, and that they
are both determined by mechanisms such as the OCP and Sugito's Law. As
the voicing of a segment is a lexical property of the base, it is also possible
to assume that Rendaku sensitivity is first determined by the segment included in the name, and that accentuation is then determined by the voicing
of the head morpheme. This difference in determination order might lead to
the predominance in (4) of the number of names with alternative accentuation (10 + 56 = 66) over those with alternative Rendaku sensitivity (8 + 0 = 8).
Although Sugito did not include names with a monomoraic base in her investigation, this type exhibits a distinct pattern in accentuation and Rendaku
sensitivity. Below are examples of names whose base consists of one mora,
and they are all accentless:

(6) I-da, U-da, E-da, O-da, Ki-da, Su-da, Ta-da, Tsu-da, To-da, No-da,
Hi-da, Ya-da, Yu-da, Yo-da, Wa-da; Se-ta, Ha-ta, Mi-ta

Moreover, except for the last three, names in (6) almost all undergo Rendaku.
This might result from the fact that voiced obstruents are scarce in the last
onset of the base in this type.
Before moving on to the next section, let us summarize the characteristics of ta in the following table:

(7) characteristics of ta:

Peculiarity of monomoraic base: Yes: -A, +R

The first column in (7) is the lexical specification of a morpheme, which is
absent for ta: Rendaku sensitivity can be predicted from the last onset of
the base, and accentedness from its voicing. The second column shows that
ta observes Sugito's Law. The third column is for susceptibility to the
OCP. The rightmost column is for peculiar behavior of monomoraic bases,
with the property assigned to the name. 'A' is for accentedness and 'R' for
Rendaku, with plus and minus indicating whether the entire name has a
positive or negative value for the property in question.
As we will see in the following sections, morphemes show distinct characteristics as to accentedness and Rendaku sensitivity. Keeping those of ta
in mind, we will observe cases in which other morphemes constitute the
head of the name, and consider how similar and different they are to ta.
First, we examine to what extent the observation in (1) applies to Japanese
surnames in general.

3. Investigation: overall tendency

For the purpose just mentioned, an investigation was made of Japanese surnames which contain various morphemes. The names examined are
taken from the 1,000 most popular names in Murayama (2000), excluding
(i) names with ta (which are extensively investigated by Sugito); and (ii)
names which cannot undergo Rendaku by definition (that is, those in which
the rightmost head morpheme begins with a vowel, a voiced obstruent or a sonorant). Morphemes which have the same pronunciation
and meaning are counted as one (e.g. shima 'island', which can be written with more than one character).
The remaining 347 names are investigated in this paper.
Five native speakers of standard Japanese, all in their mid-thirties, were
asked to read the 347 names, which were shuffled into a random order. When
more than one speaker read a name in one way and the rest in another, the
name was regarded as having alternative pronunciations. When only one
speaker read it in a particular way, the name was regarded as having one
specific pronunciation, as in the case when all of the speakers read it in the
same way. Below is the result of this endeavor:





(8)
              accented       accentless     both          total
non-Rendaku   105 (30.3 %)    59 (17.0 %)    5 (1.4 %)    169 (48.7 %)
Rendaku        89 (24.5 %)    63 (18.2 %)    1 (0.3 %)    153 (44.1 %)
both           15 (4.3 %)      6 (1.7 %)     4 (1.2 %)     25 (7.2 %)
total         209 (60.2 %)   128 (36.9 %)   10 (2.9 %)    347

The areas predicted by the generalization in (1) are shaded. The percentage
shows the rate of appearance among all the names in question (i.e. 347
names). The high percentage of accented non-Rendaku names is in accordance with Sugito's Law, in that it is higher than those of both accented
Rendaku names and accentless non-Rendaku names. That is, if a name is
accented, it is most likely to be exempt from Rendaku (i.e. 30.3% to
24.5%). If it does not undergo Rendaku, it is most likely to be accented (i.e.
30.3% to 17.0%). The percentage of accentless Rendaku names, on the
other hand, does not seem to be consistent with Sugito's Law. The percentage is only slightly higher in the Rendaku group (i.e. 18.2% to 17.0%), and
even lower in the accentless group (i.e. 18.2% to 24.5%).
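As a cross-check on the percentages discussed above, the shares can be recomputed from the reported counts. The following is a minimal sketch; the dictionary layout and function name are illustrative only. Recomputed this way, the accented Rendaku cell (89 of 347) comes out at 25.6% rather than the printed 24.5%, a value that also agrees with the 60.2% total given for accented names; the comparisons drawn in the text are unaffected.

```python
# Counts of names by Rendaku behaviour (rows) and accent pattern (columns),
# as reported in the table above. "both" marks alternating pronunciations.
COUNTS = {
    "non-Rendaku": {"accented": 105, "accentless": 59, "both": 5},
    "Rendaku": {"accented": 89, "accentless": 63, "both": 1},
    "both": {"accented": 15, "accentless": 6, "both": 4},
}


def overall_shares(counts):
    """Return the grand total and each cell's percentage of that total."""
    total = sum(sum(row.values()) for row in counts.values())
    shares = {
        (rendaku, accent): round(100 * n / total, 1)
        for rendaku, row in counts.items()
        for accent, n in row.items()
    }
    return total, shares


total, shares = overall_shares(COUNTS)
print("total names:", total)
for (rendaku, accent), pct in shares.items():
    print(f"{rendaku:12s} {accent:11s} {pct:5.1f} %")
```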
The tendency described above is even clearer in another calculation of
the appearance rate. The percentages in (9a, b) are calculated within
one particular group, not among the entire set of names. For example,
the 105 accented non-Rendaku names in (9a) comprise 62.1% of all the non-Rendaku names (i.e. 105 to 169). On the other hand, in (9b) they comprise
50.2% of all the accented names (i.e. 105 to 209). In both (9a) and (9b),
accented non-Rendaku names comprise more than 50%, which suggests that
to some extent they conform to Sugito's Law.

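(9) a. Percentages within each Rendaku group
              accented       accentless     both
non-Rendaku   105 (62.1 %)    59 (34.9 %)    5 (3.0 %)
Rendaku        89 (58.2 %)    63 (41.2 %)    1 (0.7 %)

b. Percentages within each accent group
              accented       accentless
non-Rendaku   105 (50.2 %)    59 (46.1 %)
Rendaku        89 (42.6 %)    63 (49.2 %)
both           15 (7.2 %)      6 (4.7 %)
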
Accentless Rendaku names, on the other hand, do not exhibit such prevalence in any group. Among Rendaku names (9a), they comprise 41.2%,
which is less than accented Rendaku names (58.2%). They are prevalent
among the accentless names (9b), but exceed non-Rendaku accentless names by only 4 names.
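The group-internal percentages in (9a) and (9b) can be reproduced from the same counts by changing the denominator. The sketch below is illustrative only (the `share` helper and its `within` argument are hypothetical names): it divides a cell either by the total of its Rendaku group or by the total of its accent group.

```python
# (Rendaku behaviour, accent pattern) -> number of names, as reported above.
COUNTS = {
    ("non-Rendaku", "accented"): 105,
    ("non-Rendaku", "accentless"): 59,
    ("non-Rendaku", "both"): 5,
    ("Rendaku", "accented"): 89,
    ("Rendaku", "accentless"): 63,
    ("Rendaku", "both"): 1,
    ("both", "accented"): 15,
    ("both", "accentless"): 6,
    ("both", "both"): 4,
}


def share(cell, within):
    """Percentage of `cell` among the cells sharing its Rendaku value
    (within="rendaku", as in 9a) or its accent value (within="accent", 9b)."""
    index = 0 if within == "rendaku" else 1
    group_total = sum(n for key, n in COUNTS.items() if key[index] == cell[index])
    return round(100 * COUNTS[cell] / group_total, 1)


print(share(("non-Rendaku", "accented"), "rendaku"))   # share among non-Rendaku names
print(share(("non-Rendaku", "accented"), "accent"))    # share among accented names
print(share(("Rendaku", "accentless"), "rendaku"))     # share among Rendaku names
print(share(("Rendaku", "accentless"), "accent"))      # share among accentless names
```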
Though this is a very simple comparison, from these data we can conclude that certain characteristics found in ta (in particular, the generalization in (1)) do not apply to Japanese personal names in general. As will become obvious in the following sections, accentedness and Rendaku
sensitivity are actually morpheme-dependent properties. Moreover, the
influence of the voicing of the base on Rendaku, which is operative in ta,
also seems to be determined by each morpheme, as we will shortly see. In
the next section, therefore, we discuss how such properties are specified for
each major morpheme which appears in Japanese surnames.

4. Morpheme-specific tendencies
As previously mentioned, accentedness and Rendaku sensitivity are properties specified for each morpheme independently. Morphemes can be categorized into four types according to the degree to which the properties are specified: that is, (i) those in which both accentedness and Rendaku sensitivity
are specified; (ii) those in which one of these properties is specified; (iii)
those in which neither is specified; and (iv) those which have a peculiar pattern. In what follows, we will consider various morphemes according to
this categorization, so that we can examine to what extent Sugito's Law is
upheld in them and how different they are in accentuation and Rendaku.


The morphemes examined here are those which have more than five entries
in the list of 347 investigated names.

4.1. Names in which both accentedness and Rendaku sensitivity are specified
Names that most clearly exemplify this category are those with hara
'field'. As shown in (10), these are most likely to be accented without Rendaku, and should be specified as such.
(10) hara: accented, non-Rendaku
Shino-hara, Kuri-hara, Taka-hara, Ue-hara, Ta-hara, Naka-hara,
Kawa-hara, Yoshi-hara, Tsuka-hara, Take-hara, Kasa-hara,
Oo-hara, Kita-hara, Nishi-hara
The table below shows more precisely the specific behavior of this morpheme. The areas predicted by Sugito's Law are shaded in gray.




(11)
              accented       accentless     both          total
non-Rendaku   23 (63.9 %)    3 (8.3 %)      3 (8.3 %)     29 (80.6 %)
Rendaku        4 (11.1 %)    0 (0 %)        0 (0 %)        4 (11.1 %)
both           2 (5.6 %)     0 (0 %)        1 (2.8 %)      3 (8.3 %)
total         29 (80.6 %)    3 (8.3 %)      4 (11.1 %)    36


The percentage of words which obey the lexical specification (i.e. 63.9%)
might seem rather small, but it is clear that the other major pattern predicted by Sugito's Law (i.e. accentless with Rendaku) is never produced,
not even as an exception. If this morpheme simply followed
the generalization (i.e. if it were not specified as accented without Rendaku),
the alternative pattern should contain a fairly large number of examples. Note
also that the total percentages of the accented and non-Rendaku groups are
both 80.6%. From these observations, we can conclude that this morpheme
is most typically accented without Rendaku, because it is specified as such.
Moreover, the monomoraicity of the preceding morpheme seems to play
a role in accentuation, as in the case of ta. In (11), exceptions with a monomoraic base can be found in the following four categories: (i) two in accentless non-Rendaku names (Mihara and Ihara); (ii) two in non-Rendaku
names which have both accented and accentless patterns (Kihara and
Nohara); (iii) one in an accented Rendaku name (Ebara); and (iv) one
in an accented name which may or may not undergo Rendaku
(Ko[h/b]ara).4 Thus, it is possible to conclude that hara is specified not
only as accented without Rendaku, but also as accentless or undergoing
Rendaku for names with a monomoraic base.5
On the other hand, names with hara are not influenced by the voicing of
the preceding morpheme in terms of Rendaku. Names with a voiceless
segment in the last onset of the base are equally exempt from Rendaku, as
is obvious in (10): compare, for example, Takahara and Yoshihara with
Kurihara and Kawahara. This, however, is natural because the OCP is relevant
only when Rendaku might occur, which does not normally happen in hara
names due to their specification.
Other examples which show similar behavior to hara are:
(12) a. shita 'under' (8/8): Yama-shita, Matsu-shita, Miya-shita,
b. tani 'valley' (7/8): Mizu-tani, Naka-tani, Shin-tani, Ko-tani,
c. se 'shallows' (7/7): Hiro-se, Taka-se, Mura-se, Iwa-se, Kawa-se, Naru-se
d. saka 'slope' (6/7): Ko-saka, Ho-saka, Haya-saka, Aka-saka,

As the rate in parentheses shows, almost all the names with these morphemes are accented and do not undergo Rendaku, which conforms to the
pattern predicted by Sugito's Law. Monomoraicity does not seem to play a
role in tani and saka, since names with a monomoraic base behave in the
same way as others.
On the other hand, there are many names which are specified as having
a pattern that does not obey Sugito's Law; that is, those which are specified
either as accented and undergoing Rendaku, or as accentless and not undergoing Rendaku. Names with kuchi 'mouth' as head are the clearest examples of the former pattern.
(13) kuchi: accented, Rendaku
Yama-guchi, Tani-guchi, No-guchi, Kawa-guchi, Hi-guchi,
Ta-guchi, Seki-guchi, E-guchi, I-guchi, De-guchi, Mizo-guchi,
Hama-guchi, Hori-guchi, Hara-guchi


Twenty-two names were found with this morpheme, and interestingly enough, there
were no exceptions to the pattern in question.6 Furthermore, neither the
length (e.g. Noguchi, Higuchi, etc.) nor the voicing of the last onset of the
base (e.g. Deguchi, Mizoguchi, etc.) has an influence on the pattern.
Thus, we can conclude that kuchi is specified as accented with Rendaku,
and as not relevant to the OCP.
Behavior similar to kuchi can be observed in names with hayashi 'forest'.
Six names were found in this investigation, and five of them are accented
with Rendaku:
(14) hayashi: accented, Rendaku
Waka-bayashi, Hira-bayashi, Naka-bayashi, Kuri-bayashi,
The only exception is Kobayashi, which is accentless. The shortness of the
preceding morpheme might be the reason for this exception.
The morpheme kura 'storehouse' shows a pattern opposite to that of kuchi and
hayashi; that is, names with this morpheme are accentless and exempt from
Rendaku, although they are similar in not obeying Sugito's Law.
(15) kura: accentless, non-Rendaku
Asa-kura, Ita-kura, Taka-kura, Ishi-kura, Oo-kura
Among the seven names that were found, Ogura, which undergoes Rendaku,
and Kumakura, which is accented, are the only exceptions. The former
might be due to the monomoraicity of the base, and the latter might be influenced by the voicing of the base-final onset (as in the case of exceptional
names with ta). Obviously, these are just possible explanations, as these
two names are the only examples in the relevant environments.
In (16), we summarize the properties of each morpheme discussed in
this section in the format employed for ta. Those which need more investigation are enclosed in parentheses. Cells which do not have decisive data
are left blank. The OCP is irrelevant to names with the specification [-R].

(16)
Morpheme   Lexical specification   Peculiarity of monomoraic base
hara       +A, -R                  Yes: -A or +R
shita      +A, -R
tani       +A, -R
se         +A, -R
saka       +A, -R
kuchi      +A, +R
hayashi    +A, +R                  (Yes: -A)
kura       -A, -R                  (Yes: +R)
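The lexical specifications discussed in this section can be sketched as a small lookup table. The following Python fragment is an illustration only: the names `SPEC`, `apply_rendaku`, `build_name` and `obeys_sugito` are hypothetical, the voicing map is restricted to the alternations attested in the examples of this chapter, and the peculiarities of monomoraic bases are deliberately left out.

```python
# Lexical specifications of the head morphemes in Section 4.1,
# stored as (accented, rendaku). Monomoraic-base effects are not modelled.
SPEC = {
    "hara": (True, False),
    "shita": (True, False),
    "tani": (True, False),
    "se": (True, False),
    "saka": (True, False),
    "kuchi": (True, True),
    "hayashi": (True, True),
    "kura": (False, False),
}

# Morpheme-initial consonants and their voiced counterparts in Hepburn-style
# romanization; two-letter spellings are tried before one-letter ones.
VOICING = [("sh", "j"), ("ts", "z"), ("k", "g"), ("s", "z"), ("t", "d"), ("h", "b")]


def apply_rendaku(morpheme: str) -> str:
    """Voice the initial consonant of a romanized morpheme, if it has one."""
    for plain, voiced in VOICING:
        if morpheme.startswith(plain):
            return voiced + morpheme[len(plain):]
    return morpheme


def build_name(base: str, head: str) -> tuple[str, bool]:
    """Return (surface name, accentedness) according to the head's specification."""
    accented, rendaku = SPEC[head]
    surface = apply_rendaku(head) if rendaku else head
    return f"{base.capitalize()}-{surface}", accented


def obeys_sugito(head: str) -> bool:
    """A specification obeys Sugito's Law if accent and Rendaku have opposite values."""
    accented, rendaku = SPEC[head]
    return accented != rendaku


print(build_name("yama", "kuchi"))
print(build_name("shino", "hara"))
print([head for head in SPEC if not obeys_sugito(head)])
```

Keeping the specification as data rather than as rules reflects the point of this section: the values are properties of individual morphemes, and three of the eight listed here are specified in a way that does not conform to the generalization in (1).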

4.2. Names which are specified for one of the properties

Several morphemes are specified for either accentedness or Rendaku sensitivity, but not both. In these morphemes, it is often the case that the property
which is not lexically specified is determined by other factors. Take sawa
'swamp' as an example:
(17) a. O-zawa, Naka-zawa, Taki-zawa, Yoshi-zawa, Fuka-zawa, No-zawa
b. Fuji-sawa, Kuro-sawa, Yanagi-sawa, Naga-sawa, Furu-sawa, Hira-sawa
It is clear from (17) that this morpheme is specified as accentless, as the
names do not have accent either when they undergo Rendaku (17a) or not
(17b). Rendaku sensitivity is determined by the last onset of the base, as
with ta: when it is voiceless (including nasals and a null onset), names with
this morpheme undergo Rendaku; when it is voiced (including liquids and
glides) they do not. In (18) we show the precise number of names for each
category. Note that /k/ is counted as a normal voiceless segment here (e.g.
Nakazawa), different from the exceptional cases of ta.







a: All contain a base-final /m/: Ume[s/z]awa, Tomi[s/z]awa, and Kuma[s/z]awa.


What is peculiar about this morpheme is that the accentuation is not influenced by voicing, again unlike ta. This is because sawa is lexically specified as accentless, and this specification is respected at all times. As a result,
Sugito's Law is only observed in cases where Rendaku applies, i.e. in the
accentless Rendaku cell. Moreover, /m/ behaves differently from other voiceless
segments (note that nasals are counted as voiceless in Sugito's calculation),
as it is included in all the names with alternative Rendaku patterns.7 It may
be that /m/ can sometimes be regarded as voiced in names with sawa.
Such is also the case for shima 'island', which is also lexically specified
as accentless. Again, Rendaku sensitivity is determined by the last onset of
the base.8
(19) a. Ko-jima, Ta-jima, Ii-jima, Naka-jima, Kita-jima, Nishi-jima,
b. Kawa-shima, Naga-shima, Tera-shima, Mizu-shima, Toyo-shima,





As in the case of sawa, /k/ is counted as voiceless in (20). The pattern predicted by Sugito's Law
is again only observed in the accentless Rendaku cell.
Names with tsuka 'mound' also show a similar pattern to sawa and
shima in that the name with this morpheme becomes accentless.9
(21) Oo-tsuka, Hira-tsuka, To-tsuka; Ii-zuka, Ishi-zuka, Te-zuka
The difference between this and the previous two is that voicing is not determined by the last onset of the preceding morpheme; cf. Totsuka vs. Tezuka. As long as there are no Rendaku names with voiced base-final onsets,
it can be said that this morpheme also respects the OCP. Only seven examples with this morpheme were found, however, and thus this tendency remains to be examined more thoroughly.



Saki 'cape' is also similar to sawa, shima and tsuka in the sense that accentedness is fixed lexically; still, the value of the specification is the opposite,
that is, it is specified as accented. Rendaku sensitivity is again determined by
the last onset of the base:
(22) a. Miya-zaki, Oka-zaki, Matsu-zaki, No-zaki, Shino-zaki,
Ishi-zaki, Shima-zaki
b. Iwa-saki, Fuji-saki, Naga-saki, (Kawa-saki)
As shown in (22b), a name does not undergo Rendaku when the last onset
of the base is voiced, including exceptionally accentless Kawasaki. This
fact is observable from the following table:






a: All contain a base either (i) with [m] (e.g. Yama-[s/z]aki and Hama-[s/z]aki)
or (ii) of monomoraic shape (e.g. Ta-[s/z]aki and E-[s/z]aki)

As in the case of sawa, /m/ behaves differently from other voiceless segments, as do monomoraic bases.
Another morpheme that is specified as accented is hata 'farm':
(24) Oo-hata, Taka-hata; Ta-bata, O-bata; (Kawa-bata)
The OCP effect on Rendaku seems to be absent here: note that Takahata
does not undergo Rendaku even though the last onset is voiceless [k]. It may
be that names with a monomoraic base undergo Rendaku, as in Tabata and
Obata, although the data are limited to only these two. Kawabata is exceptional not only in that it is accentless but also in that it undergoes Rendaku
even though the base-final onset is voiced, violating the OCP.
The morpheme ki 'tree' shows a unique pattern. A name with this morpheme shows one of the following three patterns: (i) accentless without
Rendaku as in (25a); (ii) accentless with Rendaku as in (25b); and (iii) accented without Rendaku as in (25c).


(25) a. Suzu-ki, Sasa-ki, Ao-ki, Oo-ki, Masa-ki, Fuji-ki
b. Taka-gi, Ya-gi, Mote-gi, Kashiwa-gi, Aka-gi
c. Ara-ki, Kuro-ki, Shira-ki, Mura-ki, Tama-ki, Mi-ki
Interestingly, the fourth possible category of accented with Rendaku does
not have any members. Moreover, the OCP does not seem to play a role in
deciding Rendaku sensitivity, as shown below:






Both accented and accentless names are found in non-Rendaku and Rendaku
cells. Possible explanations would be either: (i) accentedness is fixed as
accentless and the names in (25c) are exceptions; or (ii) Rendaku sensitivity
is fixed as negative and the names in (25b) are exceptions. If we look at the
data more closely, it becomes clear that the former is a better analysis. Note
that the names which belong to (25c) have a base with [r] or [m] as the last
onset. Only when the base has one of these particular segments does the
name become accented despite its [-A] specification.10
It is worthwhile to note that the exceptional names in (25c) obey Sugito's
Law: when the name is exceptionally accented, it is always exempt from
Rendaku. Thus, we can conclude that the generalization is partly preserved
in names with this morpheme.
We summarize this section with the list in (27). As in the previous sections, each morpheme is supplied with information concerning its lexical
specification, Rendaku/accent correlation, the OCP, and peculiarities regarding monomoraic bases. In addition, the idiosyncratic behavior of some
morphemes is given in the rightmost column. Some morphemes either undergo or fail to undergo Rendaku when the base has [m] as the last onset,
which means that [m] must be regarded as both voiced and voiceless.

(27) [Table only partly recoverable. Surviving heading fragments: "cation", "Accent correlation", "Peculiarity of monomoraic base". Surviving entries as printed: "[m]: [v]" (twice), "(Yes: R)", "(Yes: +R)", "(in subgroups)", "[m] and [r]: +A".]

Note that specified accentedness and OCP-driven Rendaku often produce
patterns which do not follow the generalization in (1). For example, when
sawa, specified as accentless, takes a base whose last onset is voiced, the
result is an unpredicted accentless non-Rendaku name, such as Fujisawa.
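The interaction just described, a lexically fixed accent value combined with Rendaku that depends on the base-final onset, can be sketched as follows. This is a minimal illustration with hypothetical names; the caller supplies the voicing of the base-final onset under the classification used in this chapter (nasals and a missing onset count as voiceless), and the special behavior of /m/ and of monomoraic bases is not modelled.

```python
# Morphemes of Section 4.2 whose accent value is fixed lexically,
# with the Rendaku forms attested in the examples.
ACCENT_SPEC = {"sawa": False, "shima": False, "saki": True}
RENDAKU_FORM = {"sawa": "zawa", "shima": "jima", "saki": "zaki"}


def derive(base: str, head: str, base_final_onset_voiced: bool):
    """Return (surface name, accented, follows generalization (1))."""
    rendaku = not base_final_onset_voiced      # the OCP blocks Rendaku after a voiced onset
    accented = ACCENT_SPEC[head]               # the accent value is fixed by the head
    surface = RENDAKU_FORM[head] if rendaku else head
    follows_generalization = accented != rendaku
    return f"{base.capitalize()}-{surface}", accented, follows_generalization


print(derive("fuji", "sawa", True))    # accentless, non-Rendaku
print(derive("naka", "sawa", False))   # accentless, Rendaku
print(derive("iwa", "saki", True))     # accented, non-Rendaku
print(derive("oka", "saki", False))    # accented, Rendaku
```

For each head, only one of the two Rendaku outcomes is compatible with the generalization in (1), which is why that generalization surfaces in only part of the names with these morphemes.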

4.3. Names in which neither is specified

Several morphemes are specified for neither accentedness nor Rendaku
sensitivity. In other words, names with such a morpheme are randomly
accented and subject to Rendaku. Hashi 'bridge' is one example:
(28) Taka-hashi, Mitsu-hashi, Moto-hashi; Ita-bashi; Oo-hashi; Ishi-bashi
In (28), the first three names are accented without Rendaku, the fourth one
is accented with Rendaku, the fifth one is accentless without Rendaku, and
the last one is accentless with Rendaku. It is impossible to predict which
name will have which pattern, as they all have a two-mora base with a
voiceless final onset. Although it is not entirely clear from the seven examples found, it seems that this morpheme does not follow Sugito's Law.
A more complex case comes from kawa 'river'. First, let us observe the
case where the base consists of two morae:
(29) a. Furu-kawa, Ichi-kawa, Yoshi-kawa, Nishi-kawa, Mae-kawa,
Hoso-kawa, Mori-kawa, Kuro-kawa, Yama-kawa, Tachi-kawa
b. Hase-gawa, Kita-gawa, Tani-gawa, Taki-gawa, Sasa-gawa,
Asa-gawa, Yana-gawa, Ima-gawa, Shina-gawa








Examples with a monomoraic base are excluded in (29) and (30). It is obvious from (29) and (30) that this morpheme follows the generalization in (1)
when the base is bimoraic: that is, the name is exempt from Rendaku when
accented and subject to it when accentless. This pattern is also observed in
three names categorized as having both kinds of accentuation with/without
Rendaku, as they are either accented without Rendaku or accentless with
Rendaku (e.g. Shimo-kawa/Shimo-gawa).
This distribution can be accounted for if we assume that kawa is not
assigned any specific value for accentedness or Rendaku sensitivity. In this
case, a name randomly takes a value for accentedness (but not for Rendaku,
for reasons we will see shortly), and if it is accented, obedience to Sugito's
Law prohibits it from undergoing Rendaku. Conversely, if it is accentless,
the name undergoes Rendaku for the same reason.
Also interesting in (30) is the fact that accentless Rendaku names are
restricted to names with bases having final voiceless onsets, but accented
non-Rendaku ones do not have such a restriction. This is because the OCP
only prohibits two consecutive voiced consonants, but not a voiceless-voiceless sequence. Suppose a name whose base has a final voiced onset (say, paba)
has a [-A] value. Sugito's Law would predict an illegal voiced-voiced sequence: *Pabagawa. On the other hand, obedience to the OCP produces a
pattern which violates Sugito's Law when it preserves the [-A] value:
*Pabakawa. Satisfying both the [-A] value and the OCP is thus impossible
for a name with a final voiced onset, which leads to the near nonexistence
of [+v] names in the accentless Rendaku cell of (30).
On the contrary, a base with a final voiceless onset does not violate the
OCP if it does not undergo Rendaku: a sequence of two voiceless segments
is not ill-formed in itself with respect to the OCP. Thus, if a name with a
base-final voiceless onset is given a [+A] value, Sugito's Law prohibits it
from undergoing Rendaku, resulting in many [-v] as well as [+v] names
in the accented non-Rendaku cell of (30).
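The reasoning above can be made explicit by enumerating the Rendaku values that satisfy both constraints for each combination of base-final voicing and accent value. The sketch below states the two constraints over abstract properties only; the function names are hypothetical, and voicing is understood under the classification used in this chapter.

```python
# Two constraints discussed in the text, stated over abstract properties.
# base_final_voiced follows the chapter's classification, in which nasals
# and a missing onset count as voiceless.

def satisfies_ocp(base_final_voiced: bool, rendaku: bool) -> bool:
    """The OCP bans Rendaku only when the base-final onset is voiced."""
    return not (base_final_voiced and rendaku)


def satisfies_sugito(accented: bool, rendaku: bool) -> bool:
    """Sugito's Law pairs accent with non-Rendaku, and accentlessness with Rendaku."""
    return accented != rendaku


def licit_rendaku_values(base_final_voiced: bool, accented: bool) -> list[bool]:
    """Rendaku values compatible with both constraints for a given accent value."""
    return [
        rendaku
        for rendaku in (False, True)
        if satisfies_ocp(base_final_voiced, rendaku)
        and satisfies_sugito(accented, rendaku)
    ]


for voiced in (True, False):
    for accented in (True, False):
        label = f"[{'+' if voiced else '-'}v] base, [{'+' if accented else '-'}A]"
        print(label, "->", licit_rendaku_values(voiced, accented))
```

Only the combination of a voiced base-final onset with a [-A] value leaves no licit outcome, which corresponds to the hypothetical *Pabagawa/*Pabakawa case discussed in the text.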



When the base is monomoraic, the morpheme shows a distinct pattern: the
name is accented and subject to Rendaku:
(31) Se-gawa, Ta-gawa, E-gawa, Ka-gawa, I-gawa, Sa-gawa
It is clear from these examples that kawa receives special treatment in
names with this kind of base, so that they must be assigned [+A, +R].
Interestingly, the pattern varies slightly when the name literally refers to
a river, not a person, even though the same morpheme is used as a head.
River names are always subject to Rendaku, and moreover, they are typically accented, as shown in the examples of longer shape (32b). Four-mora
names are always accentless as in (32a), due to a restriction against accented
four-mora nouns (cf. Kubozono 1996; Zamma 2001, 2003).
(32) kawa in river names:
a. Kamo-gawa, Yodo-gawa, Shuku-gawa, Kako-gawa, Ibi-gawa
b. Katsura-gawa, Takase-gawa, Nagara-gawa, Temuzu-gawa
This difference suggests that the specification is determined not only by the
morpheme itself, but also by the type of noun it produces.
We summarize this section with the table in (33).




Peculiarity of
mono-moraic base


Yes: +A, +R

Although the number of attested morphemes is small in this limited study,
it seems to be the case that some morphemes are not assigned any accent/Rendaku values.

4.4. Names with peculiar patterns

Some morphemes show quite idiosyncratic behavior with regard to
accentuation and Rendaku. One is the morpheme too 'wisteria'. Below is
an exhaustive list of names with this morpheme found in this study (15 in
total):

(34) a. after nasal:     +A, +R   En-doo, An-doo, Shin-doo, (Kon-doo)
     b. after [u]:       +A, +R   Ku-doo, Su-doo, Shu-[t/d]oo, (Mu-too)
     c. after [(a)i]:    -A, -R   Sai-too, Nai-too, (I-too)
     d.                  +A, -R   Sa-too, Ka-too, E-too, (Go-too)

Those which are exceptional in each category are enclosed in parentheses.

The category in (34c) may be described as being after moraic [i] rather than
syllabic [ai], as similar behavior to Saitoo and Naitoo is observed in Itoo,
though this is the only example.
The voicing in (34a) might have resulted from Post Nasal Voicing, by
which the voicing of the base-final nasal might spread to the following
morpheme-initial /t/. The other specifications cannot be attributed to other,
clear-cut properties, and seem quite arbitrary.
Another fact worthy of comment is that this morpheme typically takes a
monosyllabic base. This is quite remarkable for such a non-productive morpheme as too. Although it is not unusual for productive morphemes (such
as ta and kawa) to have a monosyllabic base, bisyllabic (or more) bases are
much more common even for such morphemes. Too, on the other hand, rarely takes bisyllabic bases: only two examples come to mind (i.e. Kawatoo
and Sugitoo). This preference for a monosyllabic base is also a property
specified for too.
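The conditioning in (34) can be restated as a small lookup on the final segment of the base. The following is only a sketch under stated assumptions: the function names are hypothetical, the values of the third and fourth rows are taken to be −A, −R and +A, −R respectively, and the fourth row is treated as the default case. The parenthesized exceptions in (34) are deliberately not modeled.

```python
# Illustrative only: a lookup keyed on the final segment of the romanized base.
# The function names and the fallback row are assumptions, not the author's notation.


def too_specification(base):
    """Return (accent, rendaku) values for a surname headed by too."""
    final = base.lower()[-1:]
    if final == "n":      # (34a) after a nasal
        return ("+A", "+R")
    if final == "u":      # (34b) after [u]
        return ("+A", "+R")
    if final == "i":      # (34c) after moraic [i]
        return ("-A", "-R")
    return ("+A", "-R")   # fourth row of (34), treated here as the default


def predicted_name(base):
    """Spell the head as doo when Rendaku is specified, otherwise as too."""
    _, rendaku = too_specification(base)
    head = "doo" if rendaku == "+R" else "too"
    return f"{base}-{head}"


if __name__ == "__main__":
    for base in ["En", "Ku", "Sai", "Sa"]:
        print(base, too_specification(base), predicted_name(base))
```

Because the lookup is purely segmental, it predicts a voiced head for Mu, whereas the attested form is Mu-too; this is exactly the kind of item that (34) marks as exceptional.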
5. Concluding remarks
Japanese surnames provide interesting data as to accentuation and Rendaku,
much as other word classes do. Their behavior differs depending on the
head morpheme of the name, i.e. the rightmost element. As shown in Section 4, this variation can be quite diverse. Sometimes such behavior is observed only in a single morpheme, as each morpheme determines its own
phonological behavior according to its lexical specification.
Sugito's Law represents one kind of morpheme-specific constraint. It is
satisfied by many morphemes, which produces the overall tendency to obey the law that we saw in Section 3, but not by all. As is obvious from
the lists in (16), (27) and (33), quite a few morphemes show a pattern
which violates Sugito's Law.
One naturally wonders how such variation among morphemes can be
theoretically treated. Clearly it is far beyond the scope of theories based on
a simple dichotomy or on groupings of some kind, such as classhood (cf.
Siegel (1974), Kiparsky (1982), Benua (1998), etc.) or lexical strata (cf. Itô

The correlation between accentuation and Rendaku in Japanese surnames


and Mester 1995b, etc.). Consequently it should be handled within a theory
which allows considerable variation (e.g. Inkelas (1998), Orgun (1998),
Anttila (2002), etc.). Still we must await future studies to see what such
analyses would be like.

A very preliminary version of this paper was presented at the annual meeting
of PAIK (Phonological Association in Kansai) held at Kobe College on
July 13, 2002. I would like to thank PAIK participants (especially Shigeto
Kawahara, Haruo Kubozono, Kazutaka Kurisu, Michinao Matsui and Akio
Nasu) and Jeroen van de Weijer for their valuable comments and discussion. I am also grateful to Mark Campana, who suggested stylistic improvements of this paper.

1. The original list is more complicated because Sugito also made a comparison
between the dialects spoken in Tokyo and Osaka.
2. Kubozono (this volume) also discusses this issue.
3. These four names in fact follow Sugito's Law, as they undergo Rendaku when
they are accentless, and do not when accented. As none of the factors accentedness and Rendaku sensitivity are fixed for them, they are included in this
4. Tahara is the only name with a monomoraic base which satisfies the general
specification of hara: i.e. accented without Rendaku.
5. In addition, one of the accented Rendaku names has a base which ends with a
moraic nasal (Kambara). It might be possible to attribute this Rendaku to the
nasal segment that is, to Post Nasal Voicing, in which a moraic nasal acts as
the trigger.
6. Only Mizu[g/k]uchi has an alternative pronunciation in which Rendaku does
not apply.
7. The only name with /m/ which does not belong to this category is Misawa,
which has a monomoraic base.
8. The pattern with shima is quite different from that found in river names, where
non-Rendaku names get accented (cf. Tanaka, this volume). In this case, moreover, the voicing of the last onset of the base is not the trigger of Rendaku: e.g.
Sakura-jima and Itsuku-shima.


9. The only exception is Naka-tsuka.
10. The only exceptions are Namiki, which is accentless with [m], and Ueki, which
is accented without [m].

A survey of Rendaku in loanwords

Tomoaki Takayama

The aim of this article is to present a survey of rendaku (sequential voicing)
in loanwords. There are some differences in phonological behavior between
native words and loanwords. It is often said that rendaku is one of those
differences, since rendaku hardly occurs in loanwords. However, we find a
number of exceptional occurrences. In this article, we take up some problems concerning such examples. Investigation into those exceptions sheds
light on some aspects of lexical stratification in Japanese. The question of
lexical stratification is one of the central issues in recent research in generative phonology, and some of the principal studies (Itô and Mester (1999a,
2003); Fukazawa, Kitahara, and Ota (2002); among others) have paid attention to phenomena such as rendaku in the Japanese lexicon. This article,
however, intends to survey the relationship between lexical stratification
and rendaku from a different viewpoint. If we try to answer the question of
what is the loanword stratum, or what is the relationship between lexical
stratification and phonological phenomena, we need to look further into the
background behind the lexical stratification. Especially we have to recognize the significance of stylistic and sociolinguistic aspects. Paying serious
attention to these aspects helps us to understand what the occurrence of a
phonological phenomenon depends on. In our opinion, such a consideration
is useful even for theoretical research.
The borrowed vocabulary of the Japanese language consists of two main
groups: the group of Sino-Japanese (SJ) words,1 and the group of foreign
words that are largely borrowed from European languages. These two
groups differ from each other with respect to rendaku occurrences. In the
following sections, we look at this difference by examining loanword rendaku examples, and discuss some issues in loanwords in Japanese. Without
intending to reach a conclusive argument, this article emphasizes the importance of stylistic or sociolinguistic aspects when dealing with phonological phenomena.


In the following discussion, we will use 'word group' or 'vocabulary' instead of 'lexical stratum' in order to avoid confusion, because
the latter is often used in a restricted sense in recent works of generative phonology.

1. Rendaku exceptions in foreign loanwords

If we put aside SJ loanwords (see section 2), rendaku does not, as a rule,
occur in foreign loanwords that have been borrowed mainly from European
languages. Nevertheless, there seem to be some exceptions, as illustrated in
examples (1) and (2), which were borrowed from Portuguese.

(1) karuta 'Japanese style card game'
    iroha garuta2 'iroha3 + karuta' (iroha card game)
    haikai garuta 'haiku + karuta' (haiku card game)

(2) kappa 'rain jacket, rain wear, raincoat'
    ama gappa 'rain + kappa' (rain wear)
    binīru gappa 'plastic, binīru from vinyl + kappa'
    (rain wear made of plastic)

Interestingly, Japanese native speakers have a kind of intuition about
whether some word is foreign or not. Of course, this intuitive judgment
does not always agree with the etymological facts. There are a number of
foreign loanwords that are thought of as non-foreign by the majority of
native speakers, except by people who have some special knowledge of
etymology. The words karuta and kappa in (1) and (2) above are members of this type.
There seem to be two reasons why some foreign loanwords easily merge
into the native word group (or into the SJ word group). First, some foreign
loanwords have the same phonotactic arrangement as native (or SJ) words.
For this reason, they apparently do not look like foreign words, and thus
have a natural tendency to merge into the non-foreign vocabulary. Second,
they have already lost any connection that would associate them with a
foreign culture. Let us illustrate this point with a few examples.
Ikura 'salmon roe' originated from the Russian ikra. The great majority
of Japanese people think that ikura is a purebred native word. The word
form itself provides no phonotactic clues that would lead native speakers to

know it came from a foreign language. Moreover, ikura has no cultural
association with Russia. Japanese people regard it as a typical Japanese
seafood. In contrast, for example, pirosiki refers to the Russian style pie
piroshki, which many bakeries sell in Japan. Since its form has an initial /p/
that we rarely find in the common native words, people easily recognize it
as a foreign word. In addition, this food has some cultural ties with Russia.
Another example is okura 'gumbo', which entered the Japanese lexicon
via the English okra. This vegetable has spread across the nation during the
last couple of decades and is now found all over Japan. If sellers had
wanted consumers to notice that this is a foreign vegetable, they could have
adopted a form that looks like a foreign word such as
kur. Instead, they
adopted the native-like okura. This choice has successfully won a great
number of consumers who believe that okura is a domestic vegetable. On
the other hand, if a loanword connotes a foreign cultural background, as in
example (3), native speakers do not believe that it is a native word even if
its form apparently looks like a native word. Thus, we have to take into
consideration both word forms and connotation.

(3) sonata4 'sonata, a form of classical music'

Let us return to rendaku in words like karuta and kappa. What was said about
ikura equally applies to karuta and kappa. Although both words were originally borrowed from Portuguese in the 16th century, they are not foreign in
terms of a native speaker's intuition. First, they look like non-foreign words
in terms of their forms.5 Second, neither of them has any association with
something foreign. As for karuta, it refers to a Japanese style card game,
and Japanese people believe that playing karuta belongs to their own tradition. We can also regard kappa as a non-foreign word. In fact, it has a foreign
counterpart rein kōto, which comes from the English word raincoat. Although both kappa and rein kōto are everyday expressions in present Japanese,
they are subtly different from each other in connotation. People prefer rein
kōto to kappa in some contexts, because the latter suggests cheaper or less
fashionable quality.
Another example, illustrating the history of rendaku in loanwords, is (4)
karuka, which refers to the stick for loading a bullet into the barrel of a
matchlock gun from the muzzle.

(4) karuka 'stick for matchlock gun'
    kae garuka 'spare or alternative + karuka'
    (alternative stick for matchlock gun)6


This word is considered a loanword from the Portuguese calcador, which
means a tool to press something. It seems that karuka went through truncation in the process of its nativization. Its rendaku form garuka is attested in
Zōhyō Monogatari7 published in the middle of the 19th century during the
Edo period, and it is natural to assume that its rendaku form dates back to an
even earlier time. Karuka was conventionally spelt with two specific Chinese
logographs in the usage for native words (kun-reading), which is attested
in an older manuscript of Zōhyō Monogatari written in the 17th century.
This spelling convention suggests that karuka had already merged into the
native word group. Probably it did not take much time to generate the rendaku form after the truncation.
The above examples show that there is a correlation between the merging of foreign loanwords into the non-foreign vocabulary and their rendaku
forms. In addition to these examples, the last example we will examine in
this section allows us to discuss a somewhat complicated semantic aspect
of nativization. Ketto in example (5), which comes from the second half of
blanket, referred to a kind of blanket or a kind of blanket-like cloth that
gained nationwide currency in the late 19th century.

(5) ketto 'a kind of cloth'
    aka getto 'red + ketto' (red coloured ketto)

An investigation of its written form in Chinese logographs shows that ketto
reminded native speakers of the native word ke, which means 'wool'. It is
probable that this folk-etymological interpretation of ketto provided a moment of deviation from the foreign vocabulary, after which this deviation in
turn caused the rendaku aka getto. However, we also have to take into account that aka getto had a kind of pejorative connotation as well. This word
also referred to some people in urban areas who were regarded as unsophisticated because they originally came from the countryside. It is pointed out
that this metonymic meaning originated from the fact that people from the
countryside often came to urban areas dressed in aka getto 'red ketto'
instead of a cloak or an overcoat. It is important to notice that this compound
with an exceptional rendaku in the foreign element refers to an unsophisticated object. We may speculate that such a nativized form was suitable for
stigmatizing someone or something as lacking sophistication; by contrast,
authentic western objects match foreign words that are less nativized. Although further investigation is needed, aka getto suggests that a connotative
effect could be brought into a loanword by rendaku, one of the nativizing

processes. This problem is worthy of further investigation in order to clarify the relationship between phonological phenomena and semantic aspects
in loanwords.
To sum up this section, rendaku in foreign loanwords takes place only in
the words that merged into the non-foreign word group (including both
native words and SJ words). Therefore, we conclude that the occurrence of
rendaku essentially depends on the difference between the foreign word
group and the non-foreign word group. In addition, there are still other
problematic rendaku cases, of which some examples will be mentioned in
(19) of section 4.

2. Rendaku in Sino-Japanese words

Rendaku takes place not only in the above-mentioned foreign loanwords (in
the etymological sense) but also in SJ loanwords. However, cases in SJ are
more complicated than in foreign loanwords.
We easily find a number of cases similar to the examples discussed in
section 1, as shown in (6).

(6) kiku 'chrysanthemum'
    sira giku 'white + kiku'
    no giku 'wild + kiku'

Although kiku originates from the Classical Chinese kuk, this word had already joined the native word group in the Heian period (ca. 9th–12th century).
One striking piece of evidence is that in those days kiku was commonly and quite
often used in Japanese poetry (Yamato uta or waka), where the usable expressions were confined to the native word group except for extraordinarily
licensed cases. This indicates that the rendaku in compounds with kiku in (6)
reflects its membership in the native word group. We therefore can explain
rendaku in kiku in the same manner as in section 1, where it was concluded
that the foreign loanwords that undergo rendaku are limited to words which
have merged into the non-foreign word group. In fact, the form of kiku
looks like a native form in terms of phonotactics. This property must have
been one of the factors that led kiku to join the native word group.
However, not all cases of rendaku in SJ can be treated in the same manner
as kiku. We also find a special type of rendaku words that does not follow the
pattern we looked at in section 1. Some examples are given in (7) below.


(7) a. hōkō 'service, labor'
       nenki bōkō 'years of employment + hōkō' (labor or apprenticeship)
       detti bōkō 'apprentice + hōkō'
    b. hyakusyō 'peasant, farmer'
       mizunomi byakusyō 'water drinking + hyakusyō' (lower class peasant)
    c. kēko 'practice, exercises'
       hatu gēko 'first, starting + kēko' (practice at the beginning of a year)
       kan gēko 'the coldest season + kēko' (practice in the coldest season)
    d. kenka 'quarrel, blows'
       kyōdai genka 'brother or sister + kenka'
       (quarrel or fight between siblings)
       kuti genka 'mouth + kenka' (quarrel, dispute)
    e. kesyō 'makeup'
       atu gesyō 'thick + kesyō' (heavy makeup)
       usu gesyō 'thin + kesyō' (light makeup)
    f. suiryō 'conjecture, surmise, guess'
       ate zuiryō 'to shoot at random + suiryō'
    g. syasin 'photograph'
       kao zyasin 'face + syasin' (photograph of face)
       ao zyasin 'blue + syasin' (blueprint)
    h. teppō 'gun'
       mizu deppō 'water + teppō' (water pistol, squirt gun)
       kara deppō 'blank + teppō' (a blank shot of gun)
    i. tōrō 'lantern'
       isi dōrō 'stone + tōrō' (lantern made of stone)
       mawari dōrō 'rotative + tōrō' (rotative lantern)
    j. tōryū 'stay'
       naga dōryū8 'long + stay' (long stay)
As pointed out in section 1, the application of rendaku essentially depends
on whether a word is foreign or not (not in the etymological sense).9 As far
as we look at cases such as kiku in (6), the same situation seems to apply to
SJ. However, the examples in (7) are not amenable to the same treatment. These examples have the following phonotactic properties that seldom
appear in native words but quite frequently occur in SJ words.


(i) palatalized elements; e.g. in hya, sya, syō, ryō, ryū; especially, those
after non-coronal consonants; e.g. in hya, ryō, ryū; moreover, multiple
palatalized elements normally occur in a word; e.g. in hyakusyō.
(ii) multiple long vowels in a word; e.g. in hōkō, tōrō, tōryū.
(iii) the sequence NT (voiceless obstruents after nasal); e.g. in kenka.
In the examples in (7) we do not detect any tendency towards native word
forms, while we do find such a tendency in (6) kiku and in the examples
examined in section 1. This indicates that, unlike foreign words, SJ words
undergo rendaku even if they have no similarity to native forms.10
In order to understand the background behind these examples, we need
to look at the complex status of SJ in the Japanese lexicon. Of course, it is
often said that one of the differences between SJ words and native words
relates to the degree of formality or the stylistic diversity. For instance, a
native word such as sakana ya in (8) is preferred in informal or colloquial
contexts, whereas the SJ word sengyo ten in (8) is preferred in formal contexts. In general, a great number of SJ words constitute nomenclatures of
many scholarly fields, official expressions in current topics like politics or
economics, and dignified expressions unique to various ceremonies.

(8) a. sakana ya (fish retailer shop)
    b. sengyo ten (fish retailer shop)

However, it is important to also pay attention to the fact that the great majority of SJ words are not uniquely used in formal contexts. Even among SJ
words, there are differences in formality or style and we find a great number
of SJ words that are more compatible with informal contexts. The examples
(9), (10), and (11) illustrate this point.

(9) a. isya (doctor, physician, surgeon, practitioner)
    b. isi (doctor, physician, surgeon, practitioner)
(10) a. teppō (gun)
     b. zyū (gun)
(11) a. hyakusyō (farmer, peasant)11
     b. nōmin (farmer, peasant)
Although both isya and isi in (9) are SJ, isya is a more informal expression
preferred in daily colloquial contexts; isi is a stiff expression mostly used in


formal contexts such as official statements or documents. A similar difference is observed between teppō and zyū in (10), or between hyakusyō
and nōmin in (11). Thus, contextual pluralism exists even
inside the SJ vocabulary itself. We tentatively call the informal or colloquial side of the SJ vocabulary 'vulgarized Sino-Japanese'12 (the reason why
we avoid applying the term 'group' to the vulgarized SJ will be mentioned below).
Apart from words such as kiku that merged into the native word group,
possible targets of rendaku in the SJ vocabulary are to be sought in the vulgarized SJ words. For example, teppō in (10) and hyakusyō in (11) undergo
rendaku, as already illustrated in (7) by mizu deppō and mizunomi
byakusyō, respectively. All SJ words that undergo rendaku are informal or
colloquial expressions, and we can recognize them as vulgarized SJ words.
(Some words with rendaku, like mizunomi byakusyō in (7), are seldom heard
in present Japanese, but they were used in informal or colloquial contexts.)
The relationship between SJ words and rendaku is summed up in (12).
(12) Native          Vulgarized SJ          Formal SJ
     [---- Possible targets of rendaku ----]

Since the difference between Vulgarized SJ and Formal SJ depends on the
degree of formality or the stylistic diversity, we cannot exclusively sort all
SJ members into two separate groups, and we have to recognize a gray
area between these two sides. However, the important thing is that this
point does not contradict the fact that all the SJ words that undergo rendaku
are or were used in informal or colloquial contexts, i.e., they must be vulgarized SJ words.
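The division in (12), together with the conclusion of section 1, can be stated as a simple gate on the voicing alternation. The sketch below is only an illustration: the group labels, the function names, and the assignment of each word to a perceived group are our assumptions, and the voicing map covers only the alternations attested in examples (1) to (7). Since membership in a group makes a word merely a possible target (see note 9), the sketch overgenerates by voicing every eligible item.

```python
# Hypothetical labels for perceived word groups; per (12), only the first two
# are possible targets of rendaku. Foreign words and formal SJ words are not.
RENDAKU_GROUPS = {"native", "vulgarized_sj"}

# Initial voicing alternations attested in the examples: karuta/garuta,
# syasin/zyasin, teppō/deppō, hyakusyō/byakusyō.
VOICED = {"k": "g", "s": "z", "t": "d", "h": "b"}


def rendaku(second, perceived_group):
    """Voice the initial obstruent of the second element if its group allows it."""
    if not second or perceived_group not in RENDAKU_GROUPS:
        return second
    return VOICED.get(second[0], second[0]) + second[1:]


def compound(first, second, perceived_group):
    """Join two elements, applying rendaku to the second where permitted."""
    return f"{first} {rendaku(second, perceived_group)}"


if __name__ == "__main__":
    print(compound("iroha", "karuta", "native"))        # merged into non-foreign
    print(compound("kao", "syasin", "vulgarized_sj"))   # colloquial SJ
    print(compound("kami", "koppu", "foreign"))         # still perceived as foreign
```

Note that karuta is labeled "native" here on the basis of speakers' intuition rather than etymology, which is the point made in section 1.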
Finally, two kinds of voicing phenomena need to be mentioned. One is
the voicing at the morpheme level, as shown in (13). This type of voicing
occurs regardless of the vulgarization in SJ words, but it is limited to sporadic cases. Another is the post-nasal voicing in SJ morphemes, as shown
in (14), which is attributed to the voicing of obstruents after a nasal element
in the native phonology.13
(13) san 'mountain'
     ka zan 'fire + mountain' (volcano)
     to zan 'climbing + mountain' (mountain climbing)

(14) sya 'person'
     en zya 'to play, to perform + person' (actor or actress, performer)
     Cf. saku sya 'to write, to make + person' (writer, maker)
It appears that the former phenomenon is historically related to the latter.
Further research on these phenomena from the diachronic viewpoint is needed.

3. A difference between foreign and SJ words

Nowadays, a great number of foreign words are indispensable in various
situations of daily life and frequently used in colloquial contexts. Nevertheless, the foreign word group shows no symptoms of rendaku derivation, as
illustrated in (15), even if many foreign words have become common expressions in present Japanese.
(15) a. *kami goppu 'paper + koppu' < cup, Dutch (paper cup)
     b. *isyō gēsu 'clothing + kēsu' < case, English (clothing storage box)
     c. *kyūsui danku 'water supply + tanku' < tank, English
        (container for supplying water)
In the native word group, only a small number of minimal pairs are distinguished by the contrast between voiced and voiceless obstruents in initial
position, because the great majority of native words have no initial voiced
obstruents. It has been pointed out that this distribution correlates with the
occurrences of rendaku, because even if an initial voiceless obstruent is
voiced by rendaku, there can hardly be any conflict with another lexical
item in the native word group. On the other hand, a large number of such
minimal pairs are found among foreign words, as shown in (16).
(16) a. kurasu (class) : gurasu (glass)
     b. kurēpu (crepe) : gurēpu (grape)
Although the avoidance of homonymic clash is one of the factors that block
rendaku in foreign words, this factor alone cannot sufficiently explain the
blocking of rendaku in the foreign word group. For one thing, there is not
always an initial voiced counterpart to each initial voiceless foreign word.
For example, koppu and kēsu in (15) have no counterparts such as goppu,


gēsu. If rendaku were blocked only by this functional factor, it could take
place in some of the foreign words that have no minimal pairs, even if they
were confined to a small number of words. Nevertheless, there are no essential exceptions, as we have seen in section 1. Second, the SJ word group
is the same as the foreign word group with respect to minimal pairs, because
initial voiced obstruents are very common. However, we do find occurrences
of rendaku as shown in section 2, even though the rendaku in SJ words is
limited in comparison to rendaku in native words.
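The functional factor discussed above, avoidance of homonymic clash, amounts to checking whether the voiced variant of a word already exists as a separate lexical item. The following sketch makes that check explicit. The toy lexicon contains only forms cited in (15) and (16), and the function names are hypothetical.

```python
# Initial voicing alternations, as in the rendaku examples of sections 1 and 2.
VOICED = {"k": "g", "s": "z", "t": "d", "h": "b"}


def voiced_variant(word):
    """Return the form the word would take if its initial obstruent were voiced."""
    if not word:
        return word
    return VOICED.get(word[0], word[0]) + word[1:]


def would_clash(word, lexicon):
    """True if voicing the initial obstruent yields another existing word."""
    candidate = voiced_variant(word)
    return candidate != word and candidate in lexicon


if __name__ == "__main__":
    # Toy lexicon built from forms cited in (15) and (16).
    foreign_lexicon = {"kurasu", "gurasu", "koppu"}
    for word in sorted(foreign_lexicon):
        print(word, would_clash(word, foreign_lexicon))
```

On this lexicon the check flags kurasu (because gurasu exists) but not koppu. A purely functional account would therefore predict rendaku in koppu, which is not what is observed; this is the gap in the functional explanation noted in the text.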
In order to answer the question as to why the SJ word group and the
foreign word group differ from each other with respect to rendaku, we need
to separately deal with each word group before trying to address this main
question, and we also need to elucidate somewhat obscure aspects of the
Japanese lexicon. Since this article does not intend to fully discuss all these
issues, only a few brief remarks about the problems involved are made in
the final section below.

4. Problems of loanwords in Japanese

The final section focuses on some problems concerning the foreign word
group of which the significant growth began in the late 19th century, because it is relatively easy to look at a state of affairs from the present viewpoint.
In our times, Japanese people generally tend to respect the original form
of words that are borrowed from foreign languages.14 However, in the past,
some words were transformed in the process of their popularization, as is
reported and discussed by Grootaers (1976) and Sanada (1981, 1991) who
investigated the geographical distributions of word forms. Among the dialect forms they investigated, we find a kind of nativization that is rarely
observed in recently introduced foreign words.
(17) Variations of syaberu 'shovel':
     a. syabori (bori is the rendaku form of hori, the gerund of the verb
        horu 'dig'. For details, see Grootaers 1976 and Sanada 1981, 1991)
     b. syabiro (biro is the rendaku form of hiro, the stem of the adjective
        hiroi 'wide'. For details, see Grootaers 1976)

(18) Variations of stēsyon 'station' (for details, see Sanada 1981, 1991):
     a. stensyo (syo is an SJ morpheme, meaning 'site' or 'place')
     b. tensyoba (ba is a native word, meaning 'site' or 'place'. Cf. the
        non-foreign word tēsyaba 'station', composed of tēsya 'a stop'
        and ba 'site' or 'place')
A reanalysis inspired by folk-etymology or a blending with native or SJ
elements played an important role in the formation of these variants. This
displays a tendency for merging newcomers into the familiar existing vocabulary rather than separately constructing a new word group. The word
ketto in (5) can be added as another exemplification of this trend, since native
speakers associated ke- with the native word ke, as mentioned in section 1.
This nativization trend was stronger in the past, though it has not gained the
mainstream status. Furthermore, we cannot overlook Nakagawa's (1966)
suggestion that some compounds with foreign words occasionally undergo
rendaku, as shown in (19), even though these forms are unstable in comparison to non-rendaku forms.15
(19) a. indo garē 'Indo- + karē < curry' (Indian curry)
     b. yama gyanpu 'mountain + kyanpu < camp' (camping in mountains)
Although it is necessary to further investigate into the foreign words that
older generations were using in colloquial or dialectal contexts, the examples
in (19), at least, demonstrate that even foreign words are subject to rendaku
in the same way as vulgarized SJ words. Of course, this trend has not been
observed. But it is noticeable that, at least in the past, some foreign words
were apt to undergo rendaku when they got used in colloquial contexts.
However, the leading class of native speakers, especially educated people,
respected the forms that are seemingly faithful to their foreign origin, and
they probably thought that these faithful forms should occupy standard
positions. It is possible that this inclination became predominant over the
tendency toward further nativization, such as rendaku and transformation
by means of reanalysis or blending. This trend has accelerated especially
after World War II and more speakers have become sensitive to the existence of foreign languages behind foreign words. At the same time, people
have become sensitive to the difference as to whether an initial consonant
of a foreign word is a voiced or voiceless obstruent (daku-on or sei-on).
Finally, a few words about SJ words need to be added. A main point of
concern is the relationship between rendaku and the complexity in the SJ


vocabulary, and we focused particularly on vulgarized SJ words. The bulk
of the SJ vocabulary is not the result of daily close contact with spoken
Chinese, but rather the result of learning written Chinese or Chinese
logographs (so-called Chinese characters), which were the sole medium of
written communication in East Asia. Therefore, people using the SJ vocabulary were limited to members of the upper class in the early stages. In
the history of the Japanese language, some parts of the SJ vocabulary
gradually entered the vocabulary used in colloquial contexts. It is necessary
to further investigate this diachronic process.
These are only a few brief remarks on the relationship between
phonological phenomena and lexical stratification. Further research from
the sociolinguistic viewpoint is indispensable for clarifying the problems of
both SJ words and foreign words.
We do not claim that the state of affairs in Japanese loanwords is idiosyncratic. Rather, we believe that a similar situation may be observed in
many languages, and that it is quite important to investigate the correlation
between phonological phenomena and the background of borrowing in
many languages.

This article is a revised version of Takayama (1999). I am grateful to
Jeroen van de Weijer, Kensuke Nanjo, and Tetsuo Nishihara for providing
an opportunity to contribute to this volume. I am indebted to Jeroen van de
Weijer and anonymous reviewers for many helpful comments. I would like
to thank Paul Hoornaert for useful suggestions for improving the English
expressions. Special thanks go to Yoiko Aoyama for valuable information
on rendaku words. Of course, I take ultimate responsibility for all the inadequacies and errors that remain. This work was partially supported by the
Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science, No. 15520285, 2003–2004.

1. The Sino-Japanese vocabulary, which occupies an important part in the Japanese lexicon, originally derives from the Chinese language mainly before the
10th century through the learning of the Chinese logographic system (so-called
Chinese characters).
2. There are various kinds of karuta and each kind is named using a compound
with karuta such as iroha garuta. However, in the written forms of these compounds, we encounter non-rendaku forms such as {iroha karuta}, {haikai karuta}
({} indicates the transliteration of kana phonograms). I think that this is due to the
fact that those written forms do not always directly reflect their sound forms,
and/or to the fact that they are often pronounced without rendaku voicing.
3. Iroha is a series of kana syllabaries like an alphabet, which is ordered by i, ro,
ha, ni, etc.
4. There is another sonata in the native vocabulary, which is an antiquated expression referring to the second (singular) person. Nowadays, this word is used
only in historical plays or dramas.
5. Vance (1987: 141) argues that kappa is not a virtual native word but is rather
virtual Sino-Japanese. However, in some cases, it is hard for native speakers to
determine by intuition whether some word is native or SJ. Kappa is one of
those cases. The boundary between these two groups is not always clear inside
the non-foreign vocabulary. The boundary between foreign and non-foreign is
clearer, although there are also vague cases.
6. In Zōhyō Monogatari (see note 7), we find a compound like futo+karuka
(thick stick). But we cannot determine whether its form was karuka or garuka.
Although the kana phonogram system has a diacritic mark dakuten to indicate
voiced obstruents, it was not a compulsory element in the Edo period. Without
this mark, we have no decisive clue to the rendaku form.
7. Zōhyō Monogatari (Tales of common soldiers) is a collection of stories told
by experienced common soldiers. Probably its first manuscript appeared in the
17th century.
8. Some speakers use a form without rendaku, naga tōryū.
9. When saying that rendaku can apply to a word group, this does not mean that
rendaku occurs to all targets belonging to that group.
10. There is a phonotactic constraint in the native vocabulary: more than one voiced
obstruent per morpheme is prohibited. This constraint explains why rendaku
is blocked in words that already contain a voiced obstruent (see
Kubozono, this volume, Itô & Mester 1986, Yamaguchi 1988, and Haraguchi
2002). As illustrated in (7), no SJ word that undergoes rendaku contains a
voiced obstruent itself. Although an SJ word generally comprises two morphemes, it behaves like one simplex word with regard to this point.
11. Since hyakusy often has a pejorative connotation, a form with honorific elements o-hyakusy-san is preferred.


12. It is often said that the honorific prefix o has a tendency to be added to native
words, such as o-tukue ('desk'), o-sara ('dish'), o-hana ('flower'), whereas the
other honorific prefix go has a tendency to be added to Sino-Japanese words,
such as go-tōtyaku ('arrival'), go-syōkai ('introducing'). The prefix o is also often
added to vulgarized SJ words, such as o-keiko, o-kesyō, o-syasin. At the same
time, we find many examples of go in vulgarized SJ words, such as go-h, gosuiry.
When accounting for all these examples, we need to pay attention to
the fact that the meaning of o is not exactly equivalent to that of go.
13. In Optimality Theory analyses of Japanese phonotactics, this is expressed by
the constraint *NT. It is said that the contrast between voiced and voiceless
obstruents is historically derived from the contrast between prenasalized consonants and plain ones. At the earlier historical stages, the sequence ND can be
analyzed as the gemination of prenasalized consonants (Rice's article in this
volume also discusses this point).
14. Truncation is applied to a great number of long foreign words, as shown in
paso con, derived from pāsonaru konpyūtā ('personal computer'). This might appear incompatible with the above-mentioned inclination, but this is not the
case, because truncation involves word length in the Japanese language, which is not a subject of our discussion.
15. According to my wife's recollection, her grandmother, who was born in 1905,
used some foreign words with rendaku.

Recognizing Japanese numeral-classifier combinations

Keiichiro Suzuki

1. Introduction
The main goal of this paper is to present the results of a study conducted to
improve the performance of Large-Vocabulary Continuous Speech Recognition (LVCSR) by modeling context-dependent pronunciation variation
(i.e. morphophonemic alternation) and context-independent pronunciation
variation (i.e. free variation). In particular, I report the results of performance tests run on numeral-classifier combinations in Japanese (e.g. ni-hon
'two stick-type objects', san-bon 'three stick-type objects'), showing how
the accuracy of our Japanese LVCSR engine was improved through modeling the context-dependent pronunciation variation and context-independent
pronunciation variation. On the one hand, these numeral-classifier combinations are a typical subject of phonological/morphological study, displaying
linguistically significant, regular morphophonemic voicing alternation
patterns. On the other hand, the same set of data shows linguistically insignificant free variation involving voicing. I demonstrate that these two types
of pronunciation variation are indeed captured by the same process of statistical adjustment in our LVCSR engine.
The secondary goal of this paper is to introduce a glimpse of research in
the area of Automatic Speech Recognition (ASR) to a phonological audience
and to contribute to the knowledge transfer between the two disciplines.
While much attention has been paid to the inter-disciplinary study between
phonology and cognitive science, not much discussion has been generated
between phonology and speech engineering. The practice of computational
phonology (Bird 1995) does exist; however, it studies implementations of
theoretical phonology, which is not the same as the research aimed at improving ASR systems.
Note that it is not the goal of this paper to offer some particular linguistic
insight. Rather, this paper presents an alternative look at the Japanese numeral-classifier combinations, a typical subject for a phonological analysis,
from the viewpoint of a commercial LVCSR.


This paper is organized as follows. In the remainder of this section, I provide
a brief overview of our LVCSR. In Section 2, I describe the problem that
we aimed to resolve. In Section 3, I discuss the solution we adopted to resolve the
problem. Section 4 gives the test results that verify that our solution was successful. Finally, Section 5 concludes the paper.

1.1. Brief overview of LVCSR

LVCSR takes the incoming acoustic stream as its input and outputs the
text string that best matches the input. Thus, the goal of ASR systems is to
find the string that is most probable given the input acoustic
stream: what is the most likely sentence W out of all sentences in the language L given some acoustic input X? This can be expressed as:

(1) a. Acoustic input = sequence of observations: $X = x_1, x_2, \ldots, x_t$
    b. Output sentence = sequence of words: $W = w_1, w_2, \ldots, w_n$
    c. The goal of ASR: $\hat{W} = \arg\max_{W \in L} P(W \mid X)$

The term argmax in (1c) selects the argument that maximizes the formula following it (the probability
of the output word sequence given the sequence of acoustic input). The formula in (1c) straightforwardly expresses the goal of ASR: to
find the string of words W that maximizes P(W | X). Rather than trying to
solve the formula as is, we apply Bayes' rule to (1c) to get the mathematically
equivalent formula (2):

(2) $\hat{W} = \arg\max_{W \in L} \dfrac{P(X \mid W)\,P(W)}{P(X)}$

It is quite difficult to compute the posterior probability P(W | X) directly.
Bayes' rule provides a way of calculating the posterior probability,
P(W | X), from P(W), P(X), and P(X | W). P(X | W) is the likelihood of the acoustic input sequence given the word sequence, and P(W) is the prior probability of the word sequence.
In (2), the denominator, P(X), is a non-factor, since the probability of
the given acoustic input does not change for each potential output sentence.
Removing the denominator, we maximize P(X | W)*P(W), and this is
THE golden formula for any ASR system.

(3) $\hat{W} = \arg\max_{W \in L} P(X \mid W)\,P(W)$
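As a rough illustration of (3), the sketch below scores a handful of candidate word sequences by the product of an acoustic score and a language-model score and picks the best one. The candidate list and all probability values are invented for illustration, and the function name decode is hypothetical; a real decoder searches a vastly larger space. Log probabilities are used, as is common practice, to avoid numerical underflow.

```python
import math

def decode(candidates, acoustic_prob, lm_prob):
    """Return the candidate W that maximizes P(X | W) * P(W), in log space."""
    def score(w):
        return math.log(acoustic_prob[w]) + math.log(lm_prob[w])
    return max(candidates, key=score)

# Hypothetical values for a single utterance X (not taken from any real system).
candidates = ["san bon", "san hon", "san pon"]
acoustic_prob = {"san bon": 0.30, "san hon": 0.35, "san pon": 0.20}  # P(X | W)
lm_prob = {"san bon": 0.60, "san hon": 0.05, "san pon": 0.01}        # P(W)

best = decode(candidates, acoustic_prob, lm_prob)
print(best)
```

In this toy setting the acoustically best candidate (san hon) loses to san bon once the language-model score is taken into account, which is the division of labor between AM and LM that the formula expresses.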



The first part, P(X | W), is calculated by what we call the Acoustic Model
(AM), and the second part, P(W), by what we call the Language Model (LM).
AM is a collection of probabilistic sound sequences for a given word. In
our LVCSR, AM is based on the Hidden Markov Model (HMM), which
gives transitions of observation sequences when no deterministic information about observed input is given (thus the name 'Hidden'). Many state-of-the-art ASR systems employ some form of HMM. Our AM treats each
state in the HMM as the basic subphonetic unit called a senone (Hwang and
Huang 1993). Senones are the units composing a triphone (a context-dependent phone), which consists of the left context, the phone, and the right context
(e.g. /to/ consists of two triphones, <sil>-t+o and t-o+<sil>, where <sil> is
silence). The parameters (probability values) for our AM can be automatically estimated by going through hundreds of hours of acoustic data. Once
the AM is trained, the spectral features extracted from the acoustic input are
computed and matched against the probable phone sequence for the candidate word.
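The triphone notation can be made concrete with a small sketch. The function below expands a phone sequence into left-phone+right labels, padding both ends with silence; the function name to_triphones and the list-of-strings representation are assumptions made for illustration only and say nothing about how our engine stores its units.

```python
def to_triphones(phones, boundary="<sil>"):
    """Expand a phone sequence into 'left-phone+right' triphone labels."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

triphones = to_triphones(["t", "o"])
print(triphones)
```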
LM is the model for determining the probability of a word sequence w1,
w2, …, wn, namely P(w1, w2, …, wn). This probability is broken down into
its component probabilities by the Chain Rule:

(4) $P(w_1, w_2, \ldots, w_i) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_i \mid w_1, w_2, \ldots, w_{i-1}) = \prod_{k=1}^{i} P(w_k \mid w_1, \ldots, w_{k-1})$


Since it may be difficult to compute the probability P(wi | w1,
w2, …, wi-1) even for moderate values of i, we typically assume that the
probability of a word depends only on a fixed number of previous words (the previous N−1 words).
This leads to an n-gram language model.


(5) $P(w_1^n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-N+1}, w_{i-N+2}, \ldots, w_{i-1})$


If the probability of the word depends on the previous two words (N=3), we
have a trigram (6c). Similarly, it is called a unigram when N=1 (6a), a
bigram when N=2 (6b). The trigram language model is widely used in most
commercial LVCSR systems today.


(6) a. Unigram: $P(w_1^n) \approx \prod_{i=1}^{n} P(w_i)$
    b. Bigram: $P(w_1^n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$
    c. Trigram: $P(w_1^n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})$


Applying this to Japanese, for example, consider the phrase gakkooniiku
'going to school'. Since there is no white space to delimit words in Japanese
texts, we take what we consider a morpheme as the base unit, such as gakkoo
'school', ni 'to', i '(to) go', ku [present tense]. Using bigrams, the probability for this sentence is calculated as:

(7) P(gakkoo, ni, i, ku) = P(gakkoo | <s>) * P(ni | gakkoo) * P(i | ni) *
    P(ku | i) * P(</s> | ku)
    where <s> and </s> are the placeholders for the sentence-initial and
    sentence-final positions
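The bigram calculation above can be sketched with relative-frequency estimates over a toy corpus. The four segmented sentences below are invented for illustration (only gakkoo ni i ku comes from the text), the function names are hypothetical, and no discounting or smoothing is applied, so any unseen bigram yields a probability of zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over morpheme-segmented sentences."""
    history, bigram = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigram[(prev, cur)] += 1
    return history, bigram

def sentence_prob(sent, history, bigram):
    """Multiply the relative-frequency estimates P(cur | prev) along the sentence."""
    tokens = ["<s>"] + sent + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if history[prev] == 0:
            return 0.0  # the history itself was never observed
        p *= bigram[(prev, cur)] / history[prev]
    return p

# Toy corpus, invented for illustration.
corpus = [
    ["gakkoo", "ni", "i", "ku"],
    ["gakkoo", "ni", "i", "ku"],
    ["uchi", "ni", "i", "ku"],
    ["gakkoo", "o", "mi", "ru"],
]
history, bigram = train_bigram(corpus)
p_seen = sentence_prob(["gakkoo", "ni", "i", "ku"], history, bigram)
p_unseen = sentence_prob(["uchi", "o", "mi", "ru"], history, bigram)
print(p_seen, p_unseen)
```

The second sentence consists entirely of attested morphemes, yet it receives zero probability because the bigram (uchi, o) never occurs in the toy corpus.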

The current practice is that we use large text corpora to calculate the n-gram probabilities. It follows that the larger the size of the text corpora, the
better the n-gram coverage. Even with large corpora, there will always be
many word sequences, especially for trigrams, that get zero probability.
There are various discounting and smoothing techniques to circumvent this
data sparseness problem, but I will not discuss them here (see Huang, Acero, and Hon 2001 for more details).
In order for the recognizer to know the phonetic content of the word
sequences stored in LM, a module called the lexicon acts as the database
storing the pronunciation(s) of each word counted in LM. Many LVCSR
systems are equipped with a lexicon containing over 100,000 words for a
given language. The recognizer is only capable of recognizing words that
are listed in the lexicon. Thus, it is safer to have a large lexicon to avoid
out-of-vocabulary errors.
The above is a quick introduction to ASR and specifically to LVCSR.
Besides AM, LM, and the lexicon, there are two more important pieces to
the system: Front End and Decoder. I will not cover these topics, since these
are not relevant to the main discussion of this paper (for more in-depth introductions to ASR, see Jurafsky and Martin 2000, Huang, Acero, and Hon
2001, Shikano, et al. 2001).




2. The Problem

2.1. Japanese numeral-classifier combinations

Classifiers in Japanese attach to numerals to express the
type of object being counted, e.g. ni-mai 'two thin paper-like objects',
san-mai 'three thin paper-like objects'. Some numeral-classifier
combinations are simple, in the sense that the pronunciation for the particular
combination is just a concatenation of the pronunciations of individual
parts. For example, we get ni-mai 'two thin paper-like objects' by
adding the pronunciation of the classifier part mai to the numeral part ni.
However, the majority of numeral-classifier combinations are complex, in
the sense that the pronunciation of the whole is not simply the combination
of the two parts.
There are three notable characteristics of these complex numeral-classifier combinations. First, some classifiers have multiple pronunciations,
and the pronunciation of the particular classifier is determined by the preceding numeral. For example, the same classifier /hon/ 'stick-shape
object' is pronounced as pon in ip-pon 'one stick-shape object', hon
in ni-hon 'two stick-shape objects', and bon in san-bon 'three
stick-shape objects'. Second, some numerals themselves have multiple pronunciations, and the particular numeral's pronunciation is dependent on the
following classifier. For example, the same numeral /ichi/ 'one' is pronounced as ichi in ichi-ban 'the first', ip in ip-pon 'one stick-shape object', and hito in hito-tsuki 'one month'. Finally, some numeral-classifier combinations have multiple pronunciations that are in free
variation. For example, for the term 'one stock', it is equally plausible
for a native Japanese speaker to pronounce it as ichi-kabu, ik-kabu, or hito-kabu. Similarly, 'eight cups (of)' can be pronounced as hachi-hai or hap-pai. The existence of such variations was validated through consultation with native Japanese colleagues at Microsoft. The three characteristics
of the complex numeral-classifier combinations are summarized below.

(8) The characteristics of the complex numeral-classifier combinations

a. For some classifiers, their pronunciation varies depending on what
the preceding numeral is.
b. For some numerals, their pronunciation varies depending on what
the following classifier is.
c. Some numeral-classifier combinations are in free variation.


The first two characteristics (8a,b) are typical context-dependent morphophonemic alternation cases, both of which can be a subject of phonological/morphological
study in theoretical linguistics. On the other hand, the third
characteristic (8c) may not be something that would attract theoretical linguists' interest, being completely context-independent. By contrast,
commercial ASR systems are required to deal with both the context-dependent and the context-independent characteristics of the complex numeral-classifier combinations, because the users of the Japanese LVCSR
are likely to produce any of the free variants.

2.2. The description of the problem

Our Japanese LVCSR engine was having trouble dealing with numeral-classifier combinations. The accuracy of our Japanese engine was noticeably
lower when the dictated sentence included certain numeral-classifier combination(s), and so I conducted a large-scale study to tackle the problem.
The essence of the problem was the following. When we train our LM,
billions of raw text data from various corpora are processed to 1) produce a
list of words appearing in the data to be in the lexicon, and 2) produce n-gram counts by calculating the frequencies of each word in the lexicon. We
had a problem at each of these two points in the LM creation process. For 1),
no checking was done on the lexicon to make sure that all pronunciation
variants were listed in the lexicon. Thus, even though our lexicon contained
the item /hon/ with the pronunciation hon, it might have lacked pronunciation variants such as pon and bon. For 2), the pronunciation variation
might be incorrectly modeled by treating all numeral-classifier combinations
as simple. If the combination was a context-dependent one, the dependencies in (8a,b) had to be resolved to get the correct pronunciation of the combination. This step was simply ignored. Moreover, even if we resolved the
dependencies, the correct pronunciation might have a free variant. No
mechanism existed to assign appropriate probabilities to these free variants.
The three problem areas that we identified at two points during the LM
training process are summarized in (9) below.

(9) The problem areas

a. Lexicon creation stage: no checking was done on the lexicon to
make sure that all pronunciation variants are listed in the lexicon.
b. N-gram count file creation stage: no attention was paid to the pronunciation variation for numeral-classifier combinations.
c. N-gram count file creation stage: no mechanism existed for assigning appropriate probabilities to free variants.

In the next section, I discuss how we dealt with each of the above problem areas.

3. The Fix

3.1. Lexicon check

In order to make sure that all pronunciation variations are covered in the
lexicon (9a), I went through the lexicon and checked to see if all pronunciation variants for a given numeral or a given classifier were listed in the
lexicon. This process was necessary, since the pronunciation for a numeral-classifier combination is not always regular.
The actual method I used was the following. First, I identified 65 representative classifiers. These representative classifiers were selected from the
base list manually created to cover the most frequent classifiers. Then I produced a table that lists 1) all pronunciation variations of these classifiers and
2) all pronunciation variations of the numerals 0 through 10. Exhausting
possible pronunciations for each numeral and classifier was a manual process as well. Finally, I went through the lexicon and added the entries if any
of the pronunciation variants in the two tables were not listed in the original
lexicon. This final process was automated, as we had a tool to query the
lexicon to confirm whether the pronunciation in question exists.
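The final, automated step can be sketched as follows. The dictionary-of-sets lexicon and the function add_missing_variants are hypothetical stand-ins for our internal lexicon format and query tool, which are not described here; the variant forms are those cited in Section 2.1.

```python
def add_missing_variants(lexicon, required):
    """Add any required pronunciation variant that the lexicon lacks.

    Returns the list of (entry, pronunciation) pairs that were added.
    """
    added = []
    for entry, variants in required.items():
        known = lexicon.setdefault(entry, set())
        for pron in variants:
            if pron not in known:
                known.add(pron)
                added.append((entry, pron))
    return added

# Toy lexicon: the classifier /hon/ is listed with only one pronunciation.
lexicon = {"hon": {"hon"}, "ichi": {"ichi", "ip"}}
# Manually compiled table of variants for one classifier and one numeral.
required = {"hon": ["hon", "pon", "bon"], "ichi": ["ichi", "ip", "hito"]}

added = add_missing_variants(lexicon, required)
print(added)
```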
Note that there is an alternative solution in which we add individual
numeral-classifier combinations as lexical entries in addition to the individual numerals and classifiers. Adding the numeral-classifier combinations
would provide direct mappings between the particular numeral-classifier
combination and its pronunciation variation. However, such a brute-force
solution was avoided since we did not wish to distort the probability distribution by introducing all sorts of numeral-classifier combinations to the
lexicon. The size of the n-grams would unnecessarily increase if we added individual numeral-classifier combinations like ip-pon 'one stick-shape object', ni-hon 'two stick-shape objects', san-bon
'three stick-shape objects', hyap-pon 'one hundred stick-shape
objects', sen-bon 'one thousand stick-shape objects', to the lexicon. Moreover, it would be difficult to identify the appropriate upper limit
of the numerals for the combination. Thus, we decided to leave the choice
of the correct pronunciation variant for a given combination to the n-grams.
Another possibility was to use a formula with an intermediate level rather than
the golden formula in (3) (Cremelie and Martens 1995, 1997, 1999; Strik
and Cucchiarini 1999; Fukada et al. 1998, 1999):

(10) Intermediate level
     $\hat{W} = \arg\max_{W \in L} P(X \mid V)\,P(V \mid W)\,P(W)$
In this formula, there is an intermediate level V of pronunciation variants on which the
probability of X is conditioned. P(V | W) expresses the probability of the variants given the
words, and P(W) represents the probabilities of sequences of words. The
task then is to collect pronunciation variants for the given word sequence.
However, we did not take this option, since in this model the context-dependence of pronunciation variants is not modeled directly in the LM. As
we saw in (8a,b), except for those in free variation (8c), the majority of
complex numeral-classifier combinations are context-dependent, and thus
it is better to model the variation directly in the n-grams. For other possible approaches, Strik and Cucchiarini (1999) provide an excellent overview of the
approaches to pronunciation variation modeling.
Exhaustively listing the pronunciation variants for each classifier in the
lexicon (9a) was a prerequisite to the next step: the adjustment of probabilities
for numeral-classifier combinations.

3.2. Explicit n-gram extrapolation

Adding the pronunciation variants to the lexicon does not provide context
information about when the variant pronunciation would occur. It is necessary to calculate the n-grams with the variants, so that the specific numeral-classifier combination yields a particular pronunciation of the whole (9b).
In order to directly model the numeral-classifier combinations in LM,
we used explicit n-gram extrapolation. That is, we manually increased the
n-gram counts to cover data unseen in the corpus. Since some numeral-classifier combinations did not appear in our corpus, the n-grams would
be under-trained. The sparseness of data is a common problem during the
training of LMs, and individual combinations of numerals and classifiers
would have a probability that is too low to factor in LMs. Thus, we first divided the 65 classifiers into three tiers based on their frequency of occurrence
in our corpora: Tier 1 classifiers included /hon/ 'stick-shape object', /en/
'yen', etc., Tier 2 classifiers included /hyoo/ 'number of votes',
/shoo/ 'number of wins', etc., and Tier 3 classifiers included /seki/
'number of ships', /kumi/ 'group of', etc. Then, during the training, we
explicitly increased the count for the numeral-classifier sequences that had
a relatively lower frequency count within the tier. This resulted in an equal
distribution for the numeral-classifier combinations within the same tier. For
example, if there are 1,000,000 occurrences of the numeral-yen sequence
(i.e. 'n yen' where n is 0–10), all the Tier 1 classifiers will be counted
as occurring 1,000,000 times.
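A minimal sketch of the within-tier adjustment follows. It assumes that every classifier is raised to the highest count observed in its tier, which is one reading of the yen example above; all counts other than the 1,000,000 figure are invented, and equalize_within_tiers is a hypothetical name.

```python
def equalize_within_tiers(counts, tiers):
    """Raise every classifier's count to the highest count observed in its tier."""
    adjusted = dict(counts)
    for members in tiers.values():
        top = max(counts[c] for c in members)
        for c in members:
            adjusted[c] = top
    return adjusted

# Hypothetical counts of numeral+classifier sequences, keyed by classifier.
counts = {
    "en": 1_000_000, "hon": 240_000,  # Tier 1
    "hyoo": 30_000, "shoo": 12_000,   # Tier 2
    "seki": 900, "kumi": 400,         # Tier 3
}
tiers = {1: ["hon", "en"], 2: ["hyoo", "shoo"], 3: ["seki", "kumi"]}

adjusted = equalize_within_tiers(counts, tiers)
print(adjusted)
```

Only the relative frequencies inside a tier are flattened; the differences between tiers are preserved.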
In addition to the probability adjustment for the numeral-classifier combinations within a tier, we also smoothed the probability distribution for the
different numerals (0–10) for a given classifier. Most, if not all, classifiers
have significantly larger counts for their combination with /ichi/ 'one'
compared to the other numerals. This would cause the pronunciation variant
for the one-classifier combination (e.g. pai in ip-pai 'one cup of') to be so strong
that it would inappropriately win out for the other numeral-classifier combinations (e.g. *ni-pai 'two cups of' instead of the correct ni-hai).
Thus, we took the count for the one-classifier combination to be the base
count for the rest of the numbers (0–10 except 1). For example, we explicitly
added ni-hai 'two cups of', san-bai 'three cups of', ... jyup-pai
'ten cups of' for each occurrence of ip-pai 'one cup of' in the corpus.
Some classifiers do not take zero as a numeral (*zero-choome
'zero street address (?)') or take it with extremely low probability (?zero-hai 'zero cups (?)'), so zero was discounted from the count of numeral-classifier combinations for these particular classifiers.
By incorporating the explicit n-gram extrapolation, it was possible to directly
model the pronunciation variants for both numerals and classifiers in the n-grams, thereby resolving the problem in (9b).
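The numeral-level adjustment can be sketched in the same spirit. The function below gives every numeral-classifier form the count observed for the 'one' form and simply leaves zero out when the classifier does not take it, which is an assumed reading of "discounted". The count of 5000 is invented, only a subset of the numerals is listed, and the names are hypothetical.

```python
def spread_over_numerals(one_count, forms, takes_zero=True):
    """Assign the count of the 'one' form to every numeral-classifier form.

    forms maps a numeral to its (numeral, classifier) pronunciation pair.
    Zero is omitted when takes_zero is False.
    """
    return {
        pron: one_count
        for numeral, pron in forms.items()
        if takes_zero or numeral != 0
    }

# Pronunciations of /hai/ 'cup(s) of' as cited in the text.
hai_forms = {
    0: ("zero", "hai"),
    1: ("ip", "pai"),
    2: ("ni", "hai"),
    3: ("san", "bai"),
    10: ("jyup", "pai"),
}
hai_counts = spread_over_numerals(5000, hai_forms, takes_zero=False)
print(hai_counts)
```

With equal counts across numerals, the pai variant no longer dominates the statistics for the classifier, so the context ni can select hai and san can select bai.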

3.3. Explicit n-gram extrapolation for free variation

One final issue to resolve was (9c), the modeling of free variants. Here again,
we used explicit n-gram extrapolation. The assumption here is that the relative probability of the free variants is unpredictable. Based on this assumption,
we assigned an equal frequency count to all the free variants of a particular
numeral for a given classifier. For example, we gave an equal distribution to
each of the free variants, ichi-kabu, ik-kabu, and hito-kabu, for 'one
stock'. This makes it possible to model the free variation directly in the n-grams, making these variants of a numeral equally probable for a given classifier. After the n-gram counts were manually adjusted, we applied smoothing
and adjusted the backoff weighting to minimize the side effects (see Huang,
Acero, and Hon 2001 for more details on smoothing and backoff).
Having rebuilt our LM using the explicit n-gram extrapolation, we tested
the performance of our Japanese LVCSR to measure the improvement. In
the next section, I discuss the test procedure and the results.
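The treatment of free variants in this section can be illustrated with a short sketch. Which count the variants are equalized to is not specified above; the code assumes, purely for illustration, that the total count of the variant set is split evenly. The raw counts are invented and the function name is hypothetical.

```python
def equalize_free_variants(counts, variants):
    """Split the total count of a set of free variants evenly among them."""
    total = sum(counts.get(v, 0) for v in variants)
    share = total / len(variants)
    adjusted = dict(counts)
    for v in variants:
        adjusted[v] = share
    return adjusted

# Invented raw counts for the three variants of 'one stock'.
raw = {"ichi-kabu": 70, "ik-kabu": 20, "hito-kabu": 0}
flat = equalize_free_variants(raw, ["ichi-kabu", "ik-kabu", "hito-kabu"])
print(flat)
```

A variant that never occurred in the corpus (hito-kabu in this toy example) ends up as probable as the attested ones, which is the intended effect of the extrapolation.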


4. The Test

4.1. Test Procedure

The test was conducted with data consisting of sentences with 65 representative classifiers and was run against our Japanese LVCSR engine. The 65
representative classifiers were divided into two groups: 12 h-initial classifiers
and 53 non h-initial ones. This was because h-initial classifiers show the
most variability and the performance on the h-initial classifiers was identified
as particularly critical to the overall performance. For example, the h-initial
set contained sentences like 'I had three cups of coffee this morning', where
the otherwise h-initial classifier hai appears as bai in san-bai 'three cups
of'. The test sets contained a total of 65 base sentences, and the numbers
from 1–10 were substituted to produce a total of 650 (65*10) sentences: 120
for the h-initial test set (12*10) and 530 for the non h-initial test set (53*10).
Then, 16 kHz recordings of all 650 sentences were collected from 6 speakers
(3 male and 3 female). Two wave files for each speaker were created: one
for the h-initial set, the other for the non h-initial set. These wave files were
fed to our automated accuracy test tool, which reports the recognition accuracy results for the given version of the engine. Both the h-initial and the
non h-initial sets were tested in SI (Speaker Independent) mode.



4.2. Results
We recorded the results of the accuracy tests for Versions 1 through 4, where
Version 4 was the newest incarnation of our speech recognition engine. With
each version, we incrementally added fixes to improve the
accuracy of numeral-classifier combinations. The test result consists of the
following two numbers per system: Word Accuracy Rate (WAR) and Numeral-Classifier combination Accuracy Rate (NCAR).

(11) Accuracy Results Numbers
a. WAR (Word Accuracy Rate)
   100 − WER
b. WER (Word Error Rate)
   100*(#Insertion Errors+#Deletion Errors+#Substitution Errors)/(#Words)
c. NCAR (Numeral-Classifier combination Accuracy Rate)
   100*(#correct numeral-classifier combinations)/
   (#numeral-classifier combinations)
WAR is the rate obtained by subtracting Word Error Rate (WER) from 100
(%). WER is based on how much the output string returned by the recognizer differs from the correct string for a given test set. WER is calculated
as 100*(#Insertion Errors+#Deletion Errors+#Substitution Errors)/(#Words).
For a given test set, WAR is the indicator of how likely the recognizer is to return
the correct recognition results. For example, for the correct string 'I had
three cups of coffee this morning' consisting of 8 words, if the hypothetical
output of the recognizer was 'I hid three cups of cold feet morning' ('hid'
is substituted for 'had', 'cold' is inserted, 'feet' is substituted for 'coffee',
'this' is deleted), then the WER is 100*(1+1+2)/8 = 50%, and the WAR is
50% (100 − 50).
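The arithmetic in this example can be reproduced directly from the definitions in (11a,b). The sketch below takes the error counts as given in the example rather than computing an alignment between the two strings; the function names are hypothetical.

```python
def word_error_rate(insertions, deletions, substitutions, n_words):
    """WER (%) from error counts and the number of words in the correct string."""
    return 100.0 * (insertions + deletions + substitutions) / n_words

def word_accuracy_rate(wer):
    """WAR (%) is 100 minus WER."""
    return 100.0 - wer

# Error counts from the example: 1 insertion, 1 deletion, 2 substitutions, 8 words.
wer = word_error_rate(insertions=1, deletions=1, substitutions=2, n_words=8)
war = word_accuracy_rate(wer)
print(wer, war)
```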
NCAR represents the accuracy specific to the numeral-classifier combinations in a given test set. For each numeral-classifier combination, I gave
the value 1 if the output contained the correct numeral-classifier combination; otherwise, I gave 0. Note that insertion errors were not
counted in the calculation of NCAR. As long as the output contained the
correct string, it was counted as correct. Thus, the formula for NCAR is
100*(#correct numeral-classifier combinations)/(#numeral-classifier combinations). For example, taking the previous hypothetical output, which contains
one instance of the numeral-classifier combination 'three cups of', the result
string 'I hid three cups of cold feet morning' would yield an NCAR of
100*1/1 = 100%. The test sets contain a total of 65 (12 for h-initial, 53
for non h-initial), so the maximum number of correct numeral-classifier
combinations is 65 for the two sets.
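The NCAR scoring rule can be sketched as a substring check, assuming that each output string is paired with the one numeral-classifier string it is expected to contain. The pairing by position, the function name ncar, and the second test sentence are assumptions made for this illustration.

```python
def ncar(outputs, expected):
    """Percentage of outputs that contain their expected numeral-classifier string."""
    hits = sum(1 for out, target in zip(outputs, expected) if target in out)
    return 100.0 * hits / len(expected)

# The hypothetical recognizer output from the WER example scores 1 out of 1.
single = ncar(["I hid three cups of cold feet morning"], ["three cups of"])

# An invented second output in which the combination is misrecognized.
pair = ncar(
    ["I hid three cups of cold feet morning", "I had tree caps of coffee"],
    ["three cups of", "three cups of"],
)
print(single, pair)
```

As in the text, errors elsewhere in the sentence do not affect the score as long as the combination itself is present in the output.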
The table of the overall test results is shown below in (12). The numbers
for the accuracy rate for each test set (h-initial and non h-initial) against
each system (Versions 1–4) are included (rounded for readability). The averages of the h-initial and non h-initial numbers are given in the bottom two rows. The Version
1 engine did not have any fix for the numeral-classifier problem. The Version 2
and 3 engines incorporated partial fixes for the problem by extending the
coverage of numeral-classifier combinations. The Version 4 engine implemented
the LM that had the full coverage of the targeted 65 classifiers and their
pronunciation variants along with the pronunciation variants for the numerals. The baseline data here was obtained by running identical test sets with
a commercial third-party Japanese LVCSR engine.
(12) Overall test results
[Table columns: Version 1, Version 2, Version 3, Version 4, each split into h-initial and non h-initial test sets; cell values not recoverable]

The graph in (13) below shows the improvement more clearly. As is obvious
from the graph, our Version 1 engine was performing very poorly, getting
lower accuracy rates for both WAR and NCAR than the baseline. As we
incorporated the fix progressively version by version, gradually completing
the adjustment of the frequency counts of numeral-classifier combinations,
it is evident that the performance of our engine for both the WAR and the
NCAR improved dramatically. By Version 3, our engine outperformed the
baseline engine for both WAR and NCAR. At Version 4, as our implementation of the probability adjustment for the numeral-classifier combinations
was completed, we obtained the best results. Note that the improvement of
NCAR did not hinder the WAR but helped the WAR improvements.



(13) Accuracy rate progression against the baseline

4.3. Summary
The test results reveal the successful improvement in the performance of the
Japanese LVCSR engine regarding the pronunciation variability of numeral-classifier combinations by making probability adjustments using the explicit
n-gram extrapolation. The three problem areas identified earlier in (9) were resolved by 1) exhaustive listing of pronunciation variants for numerals as well
as for classifiers in the lexicon, and by 2) manually adjusting the counts of
numeral-classifier combinations to model in the n-grams. Not only did the
explicit n-gram extrapolation resolve the context-dependent pronunciation
variation, but it resolved the issue with free variation as well.
One of the things we did not cover in this study was the testing of zero-classifier instances. As I mentioned earlier, not all classifiers may be used
with the numeral zero. Future research will need to test whether these exceptional cases are handled appropriately. Another remaining issue is that of
numerals that are larger than 10. Increasing the frequency counts of numeral-classifier combinations for the numerals 0–10 may have adverse effects on
instances where the numeral is larger than 10. If so, we will need to make
the appropriate modifications to handle larger numerals. Further testing is
required before the engine ends up in commercial products. The goals
of my future research are to expand the test cases as well as to seek other
ways of improving the performance of our Japanese LVCSR engine.


5. Conclusion
In this paper, I have presented the results of a study designed to improve
the performance of our Japanese LVCSR engine regarding numeral-classifier combinations. I have demonstrated that the context-dependent
pronunciation variation (8a,b) and the context-independent pronunciation
variation (8c) are handled by the same mechanism, explicit n-gram extrapolation. These two types of variation are not different species from the
perspective of an LVCSR. This is very different from a linguistic point of
view, where morphophonemic alternations are considered linguistically
significant, while free variation is considered insignificant.
I have also introduced, at least minimally, the domain of ASR to a
primarily linguistic audience. Research on modeling pronunciation variation for ASR systems has increased lately, and some of the new ideas have
actually been inspired by theoretical phonology (see Strik and Cucchiarini
1999). I believe that there are opportunities for phonologists to make significant contributions to the field of ASR research.
Just as ASR research can be informed by linguistic analyses, I also believe that phonologists can benefit from the study of ASR systems. Recently some stochastic models of phonology have been proposed (Anttila
1995; Frisch 1996; Boersma 1997; Coleman and Pierrehumbert 1997; etc.).
The fully stochastic nature of the techniques used in current ASR systems may be a worthy area for phonologists to explore.

Corpus-based analysis of vowel devoicing in spontaneous Japanese: an interim report
Kikuo Maekawa and Hideaki Kikuchi

1. Introduction
Introductory textbooks of phonetics or pronunciation dictionaries of Japanese
often state that close vowels (/i/ and /u/) are devoiced when they are both
preceded and followed by voiceless consonants. This description turns out
quickly to be incorrect when we look at real data. For one thing, close vowels
are not always devoiced, even in the above-mentioned environment, and in
addition, close vowels followed by voiced consonants can be devoiced to
some extent when they are preceded by voiceless consonants. Moreover,
non-close vowels like /a/ are also devoiced occasionally.
These facts, which we will examine more closely in this paper, indicate
that vowel devoicing is a probabilistic event: an event whose occurrence cannot be predicted with 100% accuracy. Vowel devoicing, accordingly, should
be analyzed from a statistical perspective. In this perspective, phoneticians,
including the first author of this paper, have in the past conducted statistical
analyses of vowel devoicing in order to find out which factors determine
the probability of vowel devoicing in a given phonological context.
The reported results, however, have not always coincided. For example,
there is disagreement regarding the influence of the manner of articulation
of the following consonant. Han (1962) claimed that close vowels followed
by an affricate or fricative were more likely to be devoiced than those followed by a plosive, but Takeda and Kuwabara (1987) obtained exactly the
opposite result. The latter study also reported that one of the devoicing
rules proposed in NHK (1985), namely that a low-pitched mora in pre-pausal
position is likely to be devoiced, was almost useless in interpreting the
devoicing patterns observed in a read-speech corpus.
There may be several possible reasons for such disagreements. First,
some descriptions of devoicing were based upon introspection. Generally
speaking, introspection alone is not an appropriate analysis method for a
probabilistic event like devoicing.

Second, the experimental data sets examined in at least some previous studies
were too small to yield stable conclusions. This problem is
likely to arise when the occurrence probability of an event is inherently
very low and/or when multiple factors and their complex interactions are involved.
Third, the data analyzed in different studies were not homogeneous with
respect to the data collection method. At least three different methods were
used in the previous studies: reading of isolated words, reading of words in
a carrier sentence, and reading of prose.
It is important to note, at this point, that no previous study examined
devoicing in spontaneous speech. Observation of spontaneous speech is
necessary because vowel devoicing may be influenced by the differences in
speaking style, as is the case with many other linguistic variations.
Theoretically, it is not impossible to conceive an experiment designed to
solve all three problems mentioned above, but from a practical point of
view, it is virtually impossible to conduct such an experiment. The cost of the
experiment would be too high to be supported if the aim of the experiment
is nothing but the analysis of devoicing.
Recent development of speech corpora, however, has opened up a new
vista for the study of vowel devoicing and other phonetic variations. Since
the size and coverage of speech corpora are growing rapidly, we can use
them for the study of phonetic variation. In fact, Takeda and Kuwabara
(1987) and Yoshida and Sagisaka (1990) have analyzed the ATR speech
database developed for speech synthesis and recognition, and have shown
that the use of large-scale corpora provides a solution to the first two
problems mentioned above.
The problem of speaking style, however, has so far remained unsolved
since most existing corpora contain only read speech. This last problem
might be solved by a large corpus of spontaneous speech. In the rest of this
paper, we will examine the distribution of devoiced vowels in a corpus of
spontaneous Japanese.


2. The data

2.1. The Corpus of Spontaneous Japanese (CSJ)

The data we analyzed is an excerpt from the Corpus of Spontaneous Japanese (henceforth CSJ), which we have been developing since 1999, aiming
for public release in the spring of 2004. CSJ is a large-scale speech database
designed mainly for the study of speech recognition, phonetics, and linguistics (see Maekawa, Koiso, Furui and Isahara 2000 for the blueprint
of the CSJ).
The whole body of the CSJ contains about 7.5 million words spoken by
native speakers of so-called Standard, or Common, Japanese. This corresponds roughly to about 660 hours of speech. The main body of the corpus
is monologue taken from two sources: academic presentation speech (APS)
and simulated public speaking (SPS).
The APS consists of live recordings of academic presentations given at meetings
of nine different academic societies covering the humanities, natural sciences, and engineering. The SPS, on the other hand, consists of public
speeches on everyday topics, performed by recruited lay subjects in front of
small audiences. The sex and age of the SPS speakers are roughly balanced.
The speech data was recorded using a head-worn directional microphone
and a DAT with the sampling frequency of 48 kHz and 16-bit precision.
The speech data was then down-sampled to 16 kHz and stored on a computer.
All recorded speech was transcribed and morphologically analyzed in
terms of word boundary and part-of-speech information. In addition to this
tagging of the entire corpus, we have extensively annotated a subset of the corpus
for a number of linguistic features; we call this subset the "Core".
The Core contains about 500,000 words or about 45 hours of speech, all
of which have been (sub-)phonemically segmented and labeled for intonation.1 The tag set used in the segmental labeling of the Core is shown in
Table 1. The tag set is a mixture of phonemic and sub-phonemic labels.
This inconsistency was a deliberate choice of ours to enrich the value of the
Core as a resource for the study of phonetic variation. When this segment
label information is coupled with the X-JToBI intonation labels that we
developed for the CSJ (Maekawa, Kikuchi, Igarashi and Venditti 2002), the
Core can be an excellent resource for the phonetic study of spontaneous speech.
The segment labeling of the Core was performed in three steps. First,
the initial labels were generated from the transcription text and aligned
automatically to the speech signal using a Hidden Markov Model based
speech recognition toolkit (Young et al. 1999). The error of the automatic
alignment in phoneme boundary location, computed over all phonemes, currently has a mean of 3.84 ms and a standard deviation of 21 ms (Kikuchi and Maekawa 2002).

Table 1. Label set used for the segmental labeling of the CSJ

Vowels:
a, i, u, e, o (voiced)
A, I, U, E, O (devoiced)

Plain consonants:
k, g, G[F], @[], s, z, t, c[ts], d, n, h, F, b, p, m, r [R], w, y
Phonetically palatalized consonants:
kj, gj, Gj, @j, sj[S], zj[Z], cj[tS], nj[], hj[]
Phonologically palatalized consonants (youon):
ky, gy, Gy, @y, sy, zy, cy, ny, hy, by, py, my, ry

Moraic phonemes:
Long vowel: H
Geminate (sokuon): Q
Moraic nasal (hatsuon): N

Then, human labelers checked the appropriateness of the generated labels
and their location on the time axis. Finally, trained phoneticians checked
inter-labeler inconsistencies before fixing the final labels.
During the course of manual corrections, the voicing of vowel segments
was judged to be either voiced or voiceless. Information from the wide-band
spectrogram, speech waveform, extracted speech fundamental frequency, and
peak value of the autocorrelation function, in addition to audio playback,
was available for these judgments, but the most important criteria were
the audio playback and the presence versus absence of the speech fundamental
frequency. In our speech-analysis environment, fundamental frequency was
judged to be present if the probability of voicing of an analysis frame was
higher than 0.5, and this probability was determined according to a two-dimensional normal distribution of speech intensity and periodicity.

2.2. The current data set

Because compilation of the CSJ is currently underway (as of February 2003),
we are not able to use the whole body of the Core. The data set used for the
analyses reported below consists of about 23 hours of segment-labeled
speech containing 427,973 vowel segments.
This data set contains 29 female and 56 male speakers, whose mean ages
(± standard deviation) were 32.2 ± 5.5 and 32.3 ± 6.6 years, respectively. Sixty-five subjects were born in Tokyo, and all others were born in
three prefectures surrounding Tokyo, namely, Saitama, Kanagawa, and
Chiba. From a dialectological point of view, all subjects spoke so-called
Standard Japanese. As for the type of monologue, 41 APS and 44 SPS
monologues are present in our data set. Six APS and 23 SPS monologues
are by female speakers and 35 APS and 21 SPS monologues are by male
speakers. Most of these monologues lasted from 10 to 15 minutes.
During the course of transcription work, the speech signal was divided
into chunks delimited by pauses longer than 200 ms. We will call such a chunk
an "utterance", but an utterance in this sense may or may not correspond to
a syntactically meaningful construction.
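The chunking rule just described can be sketched in a few lines. This is only an illustration, assuming that segments are available as time-ordered (label, start, end) tuples in seconds; the function name `split_into_utterances` and the tuple layout are hypothetical and are not part of the CSJ tools.

```python
def split_into_utterances(segments, max_pause=0.2):
    """Group time-ordered (label, start, end) segments into utterances.

    A new utterance starts whenever the silent gap between the end of one
    segment and the start of the next is longer than max_pause seconds
    (200 ms in the text; the data layout here is an assumption).
    """
    utterances = []
    current = []
    prev_end = None
    for label, start, end in segments:
        if prev_end is not None and start - prev_end > max_pause:
            utterances.append(current)
            current = []
        current.append((label, start, end))
        prev_end = end
    if current:
        utterances.append(current)
    return utterances


# Invented toy input: one gap of 280 ms (a boundary) and one of 150 ms (not a boundary).
toy = [("k", 0.00, 0.05), ("a", 0.05, 0.12),
       ("s", 0.40, 0.48), ("i", 0.48, 0.55), ("t", 0.70, 0.75)]
print([[label for label, _, _ in utt] for utt in split_into_utterances(toy)])
```

With this toy input the first two segments form one utterance and the remaining three form another, since only the first gap exceeds the threshold.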
Lastly, the following notation is adopted in the rest of this paper. Symbols C and V stand for consonants and (short) vowels. Co and Cv
stand respectively for voiceless and voiced consonants. Vc and Vnc stand
respectively for close and non-close vowels. The combination of these
symbols placed within forward slashes represents the phonological environment; for example, /CoVcCo/ stands for the phonological environment in
which close vowels are both preceded and followed by voiceless consonants, while /CoVcCv/ stands for the environment in which close vowels
are preceded by a voiceless consonant and followed by a voiced consonant.
When it is necessary to make a distinction between the preceding and following consonant, integers 1 and 2 are used as an index: C1 and C2
stand for preceding and following consonant, respectively.
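The notation can be made concrete with a small classifier that maps a C1-V-C2 triple to its environment label. This is a sketch under stated assumptions: the consonant sets below are a phonemic simplification of the plain consonants in Table 1 (palatalized consonants, moraic phonemes, and long vowels are left out), and the names `VOICELESS`, `VOICED`, and `environment` are hypothetical.

```python
# Assumed phonemic classes, simplified from the plain consonants of Table 1.
VOICELESS = {"k", "s", "t", "c", "h", "p"}                 # Co
VOICED = {"g", "z", "d", "n", "b", "m", "r", "w", "y"}     # Cv
CLOSE = {"i", "u"}                                         # Vc
NON_CLOSE = {"a", "e", "o"}                                # Vnc


def environment(c1, vowel, c2):
    """Return a label such as '/CoVcCo/' for a C1-V-C2 triple."""
    def consonant_class(c):
        if c in VOICELESS:
            return "Co"
        if c in VOICED:
            return "Cv"
        raise ValueError(f"unknown consonant: {c!r}")

    def vowel_class(v):
        if v in CLOSE:
            return "Vc"
        if v in NON_CLOSE:
            return "Vnc"
        raise ValueError(f"unknown short vowel: {v!r}")

    return "/" + consonant_class(c1) + vowel_class(vowel) + consonant_class(c2) + "/"


print(environment("s", "i", "k"))  # close vowel between voiceless consonants
print(environment("k", "u", "d"))  # close vowel before a voiced consonant
```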

3. Overview of vowel voicing

We start our analysis by giving an overview of the vowel voicing in the
current data set. Table 2 tabulates the number of vowel samples and the
average devoicing rate represented as a percentage. Devoicing rates of long
vowels (/aH/, /eH/, /iH/, /oH/, and /uH/) remained consistently the lowest.
Among short vowels, close vowels showed a distinctly higher devoicing
rate than non-close vowels, as expected.

Table 2. Number of samples and averaged devoicing rate of all vowel segments







Table 3 shows the distribution of devoicing rate as a function of the voicing
of C1 and C2 in the /C1VC2/ environment (tabulated over 300,018
vowels). In addition to the expected fact that the devoicing rate is by far the
highest in the /CoVcCo/ environment, this table reveals interesting findings
about the nature of vowel devoicing.
First, the devoicing rate of close vowels in the typical /CoVcCo/ environment was not 100%. Second, close vowels were also devoiced with
modest probability in the /CoVcCv/ environment (17.37% and 20.91% for
/i/ and /u/, respectively). Third, non-close vowels also could be devoiced in
the /CoVncCo/ environment (2.10%, 3.31%, and 3.45% for /a/, /e/, and /o/,
respectively). Moreover, there was no environment in which devoicing was
completely blocked. Vowels could be devoiced even in the /CvVncCv/
environment (i.e., non-close vowels preceded and followed by voiced consonants), which is regarded as the most atypical environment for vowel
devoicing. Similar findings were reported earlier in Venditti and van Santen
Whether the devoicing occurring in environments other than
/CoVcCo/ is phonetically the same as the devoicing in /CoVcCo/ is an interesting research question. In the next section, we will examine devoicing
in three different environments, i.e., /CoVcCo/, /CoVcCv/, and /CoVncCo/.

Table 3. Devoicing in the /C1VC2/ environment as a function of the voicing of
C1 and C2











4. Analysis of vowel devoicing

4.1. The /CoVcCo/ environment

We will first analyze devoicing in the /CoVcCo/ environment. As we saw
already in Table 3, the devoicing rates in this typical environment were
less than 90%. So, the essential task here is to identify the conditions that
decrease the probability of vowel devoicing in this context.
Tables 4 and 5 summarize the voicing status of /i/ and /u/ according to
the phonemic classification of C1 and C2. These tables, as well as all the
following tables, need some introduction. First, because C1 and C2 were
phonemically classified, allophones shown in Table 1 were merged into
phonemes. Also, we presuppose a voiceless (dental) affricate phoneme /c/,
adopting the phonemic analysis of Hattori (1950).

Second, the combinations of C1 and C2 for which the total number of samples
was less than 10 were omitted from the tables. Third, all phonemically
palatalized consonants were omitted altogether, because in most of the C1-C2 combinations involving the palatalized consonants, the number of samples was less than 10.
Table 4. Cross-tabulation of the voicing of /i/ in the /CoVcCo/ environment by C1
and C2












Table 5. Cross-tabulation of the voicing of /u/ in the /CoVcCo/ environment by
C1 and C2


























4.1.1. Interaction of consonant manners
Tables 4 and 5 show the importance of the manner of articulation of C1 and
C2 as factors in vowel devoicing, as suggested by many previous studies
(see the Introduction and Discussion for references). Tables 6 and 7 are summaries of Tables 4 and 5 from this point of view.
Table 6. Devoicing rate [%] of /i/ in the /CoVcCo/ environment classified by the
manner of C1 and C2























Table 7. Devoicing rate [%] of /u/ in the /CoVcCo/ environment classified by the
manner of C1 and C2























These tables show several interesting tendencies. First, the devoicing rate
was the highest when fricative C1 was followed by stop C2 in both tables,
and the second highest devoicing rate was observed when fricative C1 was
followed by affricate C2 in both tables. In contrast, the devoicing rate was
the lowest when affricate C1 was followed by fricative C2, and the second
lowest rate was observed when fricative C1 was followed by fricative C2 in
both tables. Also, it is worth noting that, in terms of the marginal distribution, the highest devoicing rate was observed when C2 was a stop, and the
lowest devoicing rate was observed when C2 was a fricative.
These facts show clearly that there is an interaction between the manners
of articulation of C1 and C2. A two-way ANOVA between the manners of
C1 and C2 applied to data pooled over /i/ and /u/ showed that the main effects
of C1 and C2 and their interaction were all significant (for C1, DF = 2,
F = 44.38, P < 0.0001; for C2, DF = 2, F = 1959.43, P < 0.0001; for C1*C2,
DF = 4, F = 263.24, P < 0.0001). Phonetic interpretation of the manner
interaction will be discussed in Section 5.1 below.
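The cell-wise devoicing rates that feed a manner-by-manner summary of this kind can be tabulated from token-level data roughly as follows. This is a minimal sketch: the input format (one tuple per vowel token) and the function name `devoicing_rates` are assumptions, and the toy tokens are invented for illustration only, not taken from the corpus.

```python
from collections import defaultdict


def devoicing_rates(tokens):
    """Tabulate devoicing rate [%] per (C1 manner, C2 manner) cell.

    tokens: iterable of (c1_manner, c2_manner, devoiced), devoiced being a bool.
    Returns a dict mapping (c1_manner, c2_manner) to the percentage devoiced.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [devoiced, total]
    for c1_manner, c2_manner, devoiced in tokens:
        cell = counts[(c1_manner, c2_manner)]
        cell[0] += int(devoiced)
        cell[1] += 1
    return {cell: 100.0 * d / n for cell, (d, n) in counts.items()}


# Invented toy tokens (not corpus data).
toy_tokens = (
    [("fricative", "stop", True)] * 3 + [("fricative", "stop", False)]
    + [("affricate", "fricative", True)] + [("affricate", "fricative", False)] * 3
)
for cell, rate in sorted(devoicing_rates(toy_tokens).items()):
    print(cell, rate)
```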
In the calculation of Tables 6 and 7, samples in which C2 was a geminate
/Q/ were omitted, because the manner of /Q/ per se is not specified from a
phonological point of view, and because a following geminate seemed to constitute a special environment for devoicing, as shown below.
Table 8 compares devoicing rates of close vowels (pooled over /i/ and
/u/) in cases where C2 was and was not a geminate. This table shows that
the devoicing rate was lower when C2 was a geminate, regardless of the
manner of C1 (DF = 758, t = 24.84, P < 0.0001, unequal variance). Further
analysis revealed that the devoicing rate was the highest for the combination
of fricative C1 and a stop geminate (namely a geminate followed by a
stop), and was the lowest for the combination of fricative C1 and a fricative
geminate (namely a geminate followed by a fricative). These results show the same
tendency as observed in Tables 6 and 7.
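For readers unfamiliar with the unequal-variance (Welch) t-test mentioned above, the statistic and its approximate degrees of freedom can be computed as in the following sketch. The text does not state which sampling units entered the test, so the two input samples here are placeholders, and `welch_t` is a hypothetical helper name.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    df is the Welch-Satterthwaite approximation; each sample needs at
    least two observations.
    """
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Placeholder numbers, chosen only to exercise the function.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 4))
```

Because the degrees of freedom are estimated from the two sample variances, they are in general not an integer, unlike in the pooled-variance test.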
Table 8. Effect of the following geminate on devoicing rate: pooled data of /i/ and /u/
(Columns: Voiced, Devoiced, and % Devoiced, given separately for C2 non-/Q/ and C2 /Q/.)






















4.1.2. Consecutive devoicing

Because the initial and final consonants in the /CoVcCo/ environment are
both voiceless (and due to the common CV syllable structure of Japanese),
it happens that two or more consecutive vowels can belong to this environment (e.g., (CoVc)CoVcCo(VcCo)). Devoicing of such consecutive vowels is called
"consecutive", or "sequential", devoicing. Experimental studies have shown that
more than two consecutive close vowels can be devoiced in this environment (Maekawa 1990a, b).

At the same time, however, it is widely believed that there is a tendency to
avoid consecutive devoicing (see Sakuma 1929 and Maekawa 1989, among
many others). If this tendency does exist in spontaneous speech, it may help
us to understand why the devoicing rate in the canonical /CoVcCo/ environment was not 100% in our data.
Although the environment of consecutive devoicing can be formed both
word-internally and across a word boundary, we examine only the word-internal environment in order to exclude the potential influence of a word
boundary (cf. Kondo 1997).
The current data set contains 318 samples where consecutive devoicing
could happen word-internally. Table 9 shows the distribution of voicing
status with respect to the first two vowels in the environment of consecutive
devoicing. For example, if /niNsiki/ ('recognition') is followed by the verb-forming suffix (i.e., sahen verb) /suru/, the last two vowels of /niNsiki/ are
in the consecutive devoicing environment.
According to this table, 84 samples out of the total of 318 showed consecutive devoicing (26.4%), while in all other samples in this environment
consecutive devoicing was avoided. The table also shows that the most
frequent pattern of vowel voicing in this environment was a devoiced first
vowel followed by a voiced second vowel.
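The tally behind these figures is a four-way count of voicing patterns over the first two vowels. The sketch below shows one way to compute it; the function names are hypothetical. Only the total of 318 samples and the 84 cases of consecutive devoicing come from the text; the split of the remaining 234 samples over the other three patterns is invented purely so that the example runs.

```python
from collections import Counter


def pattern_counts(pairs):
    """Count voicing patterns of (first_devoiced, second_devoiced) boolean pairs."""
    names = {True: "devoiced", False: "voiced"}
    return Counter((names[first], names[second]) for first, second in pairs)


def consecutive_rate(counts):
    """Percentage of samples in which both vowels are devoiced."""
    total = sum(counts.values())
    return 100.0 * counts[("devoiced", "devoiced")] / total


# 84 and the total of 318 are from the text; 150, 60, and 24 are invented placeholders.
pairs = ([(True, True)] * 84 + [(True, False)] * 150
         + [(False, True)] * 60 + [(False, False)] * 24)
counts = pattern_counts(pairs)
print(sum(counts.values()), round(consecutive_rate(counts), 1))
```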
Table 9. Voicing of the first two vowels in the environment of consecutive devoicing



Figure 1 compares devoicing rates of the first and second vowels in the
consecutive devoicing environment. Its abscissa represents the combination
of the manner of C1 and C2, and is sorted in the descending order of the
observed devoicing rate of the first vowel. The letters A, F, and S stand
respectively for affricate, fricative, and stop, and are combined in the order
C1/C2. This figure shows that the two devoicing rates were, by and
large, inversely related, reflecting a "one or the other" relationship
between the two vowels.2 The graph also shows that when a fricative was
combined with an affricate or stop, it was always the vowel associated with
(i.e., in the same mora as) the fricative that showed the higher devoicing
rate, and, when both consonants were fricatives, it was the second vowel
that showed a high devoicing rate.

Figure 1. Devoicing rate of two vowels in the environment of consecutive devoicing

4.2. The /CoVcCv/ environment

From this point on, we will examine vowel devoicing in atypical environments. This section deals with the /CoVcCv/ environment. Tables 10 and
11 show the devoicing rate of /i/ and /u/ as a function of the manner of C1
and C2. A two-way ANOVA between the manners of C1 and C2 applied to
data pooled over /i/ and /u/ showed that the main effects of C1 and C2 and their
interaction were all significant (for C1, DF = 2, F = 440.24, P < 0.0001; for
C2, DF = 4, F = 344.15, P < 0.0001; for C1*C2, DF = 8, F = 155.35, P < 0.0001).
Table 10. Devoicing rate [%] of /i/ in the /CoVcCv/ environment classified by the
manner of C1 and C2


 Approximant Fricative Liquid Nasal Stop

9.4  12.5
C1 Fricative
38.4  10.2  28.3
5.9  7.8



Table 11. Devoicing rate [%] of /u/ in the /CoVcCv/ environment classified by the
manner of C1 and C2


 Approximant Fricative Liquid








12.9  19.8







22.1  36.8













As far as C1 is concerned, the effect of consonant manner was similar to
that observed in the /CoVcCo/ environment in that fricatives and stops
showed the highest and lowest devoicing rate, respectively. As for C2, the
effect of consonant manner was drastically different from that observed in
the /CoVcCo/ samples. The manner of articulation that showed the highest
devoicing rate here was nasal. This is congruent with the results of Maekawa
(1989 and 1990a).
Also, approximants, i.e., /w/ and /y/, enhanced devoicing more than
stops did. The highest devoicing rate of all was observed for vowel /u/
preceded by a fricative and followed by an approximant. A closer look at
the data, however, revealed that this enhancing effect of an approximant
was the result of a high devoicing rate in only a few lexical items, namely,
/desu/ (polite form of copula /da/) and /masu/ (an auxiliary verb of politeness). In the /CoVcCv/ samples, /desu/ was followed by the sentence-ending
particle /yo/ 138 times and devoiced 107 times (a devoicing rate of
77.54%). Also, /masu/ was followed by the particles /yo/ or /wa/ 28 times and
devoiced 14 times (50% devoicing). If we remove these two lexical items
from the data set, the resulting devoicing rate is only 18%, which is lower
than the 46.5% reported in the C1 fricative/C2 approximant cell of Table 11.
Figure 2 shows the relation between word-frequency and devoicing rate
of words in the /CoVcCv/ environment. Note that individual symbols in the
figure represent the averaged devoicing rate of a given word. Note also that
both axes are plotted on a logarithmic scale, and words whose frequency
was lower than 10 or whose devoicing rate was 0 were excluded from the figure.
The data points for /desu/ and /masu/ in this figure are likely to be outliers
from the overall trend of a slight negative correlation (N = 293, r = -0.146).3
The effect of the following approximant should be regarded, at least partly,
as a consequence of the idiosyncratic behavior of high-frequency function words.
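A word-level correlation of this kind can be computed as sketched below. The exclusion criteria (frequency below 10, devoicing rate of 0) follow the description of Figure 2, but the text does not say whether the coefficient was computed on raw or log-transformed values; the log-log version here is an assumption, as are the function names and the toy data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_log_correlation(words, min_freq=10):
    """words: iterable of (frequency, devoicing_rate_percent) per word.

    Words with frequency below min_freq or a devoicing rate of 0 are dropped
    (a rate of 0 cannot be log-transformed). Returns (n_kept, r).
    """
    kept = [(f, r) for f, r in words if f >= min_freq and r > 0]
    xs = [math.log10(f) for f, _ in kept]
    ys = [math.log10(r) for _, r in kept]
    return len(kept), pearson_r(xs, ys)


# Invented toy data: the last two entries are excluded by the criteria above.
toy_words = [(10, 100.0), (100, 10.0), (1000, 1.0), (5, 50.0), (200, 0.0)]
n, r = log_log_correlation(toy_words)
print(n, round(r, 3))
```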

Figure 2. Word-frequency and devoicing rate in the /CoVcCv/ environment

4.3. The /CoVncCo/ environment

The last environment we will examine is /CoVncCo/, namely, non-close
vowels both preceded and followed by voiceless consonants. Tables 12–14
show the devoicing rate of three non-close vowels as a function of the
manner of C1 and C2.
It is difficult to extract any phonetically meaningful generalizations
from these tables. Fricative C1 and stop C2 seem to enhance devoicing
more than other manners, but the difference was not salient. Indeed, a three-way ANOVA of vowels (/a/, /e/, /o/), C1 manner, and C2 manner revealed
that none of the main effects were significant (for vowels, DF = 2, F = 2.57,
P > 0.0766; for C1 manner, DF = 2, F = 1.82, P > 0.1616; for C2 manner, DF = 3,
F = 0.64, P > 0.5890). The C1-C2 manner interaction was not significant
either (DF = 6, F = 0.98, P > 0.4354).

Table 12. Devoicing rate [%] of /a/ in the /CoVncCo/ environment by the manner
of C1 and C2







(7) 0.0 


(47) 1.3 

(79) 0.7

C1 Fricative

3.1  (389) 5.0  (714) 1.5  (1206) 1.3  (316) 2.7


0.8  (880) 1.1  (2537) 2.8  (5162) 1.1  (1134) 2.0





Numbers in parentheses show the number of samples for each combination.

Table 13. Devoicing rate [%] of /e/ in the /CoVncCo/ environment by the manner
of C1 and C2


























4.4 (1,083)

3.4 (2,925)








Numbers in parentheses show the number of samples for each combination.

Table 14. Devoicing rate [%] of /o/ in the /CoVncCo/ environment by the manner
of C1 and C2




















4.3 (1,205)




2.1 (2,070)

3.8 (7,208)







Numbers in parentheses show the number of samples for each combination.

In Tables 12–14, the devoicing rate stayed nearly the same regardless of the
combination of consonant manners, and it is this very fact that characterizes
the devoicing of non-close vowels. Devoicing in the /CoVncCo/ environment is special in that the manners of adjacent consonants do not play a
crucial role in the prediction of devoicing rates. But this does not mean that
devoicing of non-close vowels was completely free from phonological conditioning. There is at least one phonological factor that influences the devoicing rate of /CoVncCo/ vowels: consecutive identical morae, or, the
repetition of the same mora.
Sakuma (1929) noted that in words like /kokoro/ ('mind') and /haha/
('mother'), the vowel in the first mora could be devoiced. Table 15 summarizes the devoicing rate of the first vowels of 1260 samples that contain
consecutive identical morae in the /CoVncCo/ environment. Devoicing
rates of /a/ and /o/ shown in the table were higher than the overall devoicing rate shown in Tables 12 and 14.
Table 15. Devoicing of the first vowel of two identical morae in the /CoVncCo/ environment







In addition to this phonological conditioning, extra-linguistic factors played
an important role in the devoicing of /CoVncCo/ samples. First, Figure 3
shows the effect of speaking rate on the devoicing of non-close vowels.
The speaking rate of a given speaker's utterance was taken to be the number of morae per second, averaged over the entire utterance. A histogram of
speaking rates was plotted for each speaker, and was divided into 4 intervals for purposes of the current analysis. In the figure, "speaking rate 1"
means that the average speaking rate of the utterance containing the vowel in
question is within the lowest 25% of the speaker's histogram, and "speaking
rate 4" means the top 25%. With the exception of /o/, the devoicing rate of
non-close vowels increased monotonically as a function of the speaking rate.

Figure 3. Effect of speaking rate on devoicing rate in the /CoVncCo/ environment
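The speaker-relative speaking-rate classes used in this analysis can be approximated as in the sketch below. The text specifies only that each speaker's distribution of utterance rates was divided into four intervals, with class 1 as the lowest 25% and class 4 as the top 25%; the rank-based quartile assignment and the handling of ties here are assumptions, and both function names are hypothetical.

```python
def speaking_rate(n_morae, duration_s):
    """Average speaking rate of one utterance in morae per second."""
    return n_morae / duration_s


def rate_classes(rates):
    """Assign each utterance of ONE speaker to a speaking-rate class 1..4.

    Utterances are ranked by rate, and each quarter of the ranking gets one
    class: 1 = slowest quarter, 4 = fastest quarter. Ties are broken by the
    original order (an arbitrary choice made for this sketch).
    """
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    classes = [0] * len(rates)
    for rank, index in enumerate(order):
        classes[index] = 1 + (4 * rank) // len(rates)
    return classes


# Invented rates (morae per second) for eight utterances of one speaker.
toy_rates = [5.0, 9.0, 6.0, 8.0, 7.0, 10.0, 4.0, 11.0]
print(rate_classes(toy_rates))
```

Classifying relative to each speaker's own distribution, rather than on an absolute scale, keeps habitually fast and habitually slow speakers from dominating the extreme classes.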

Lastly, Table 16 shows the effect of laughter on non-close vowel devoicing.
In the transcription of CSJ, a tag was given if the speaker was speaking while
laughing. Although the difference was not statistically significant (DF = 27,
t = 0.86, P < 0.3967, unequal variance), the devoicing rate of non-close
vowels in utterances containing the laughter-tag was consistently higher
than in utterances without the tag.
Table 16. Devoicing rate in the /CoVncCo/ environment as a function of laughter























5. Discussion

5.1. Interpretation of manner interaction

The results of our analysis concerning the manners of C1 and C2 are congruent
with most past studies. For example, Takeda and Kuwabara (1987) reported
that the devoicing rate of vowels in general was higher when C1 was a
fricative, and the devoicing rate of the vowel in the /si/ mora was highest
when the mora was followed by a stop. Similarly, Yoshida and Sagisaka
(1990) reported that the devoicing rate of close vowels preceded by voiceless consonants became the highest when they were followed by stops.
However, these studies examined the effects of C1 and C2 independently,
and did not pay attention to their interaction.
Recently, N. Yoshida (2002) and Fujimoto (2003) examined the interaction of adjacent consonants and arrived at conclusions similar to ours. However, their experiments examined only a subset of all possible manner combinations. Yoshida's experiment examined /k/ and /s/ only, and Fujimoto's
examined /k, t, s/ and /h/.
Our results reveal the validity of the manner interaction in a much wider
phonetic context, and in a more naturalistic setting, namely, in spontaneous
speech. This is probably the most valuable finding of the current study.
In our analysis of the /CoVcCo/ environment, we found that the interaction between the manners of C1 and C2 was statistically significant. The
fact that the combinations of fricative-fricative and affricate-fricative resulted in a low devoicing rate is interpreted naturally if we think about the
ease of mora boundary perception. In a CV mora whose consonant is a
fricative or affricate, the devoiced vowel is phonetically realized as the extension of the frication noise. So, devoicing of vowels in the above-mentioned phonetic context (that is, Co[fric/affric]-Vc-Co[fric]) results in
the succession of frication noise, of which the first and last halves belong to
different morae. Devoicing of this sort is likely to be avoided because it is
difficult to perceive the mora boundary within this extended frication.
Similar perceptual difficulty is also likely to arise when a devoiced
vowel is preceded by a stop and followed by a fricative. In this combination, the mora boundary occurs between the aspiration noise of the stop and
the frication noise of the fricative. Perception of a mora boundary in this
context, however, is not as difficult as in the combination of a fricative/
affricate followed by a fricative, because the presence of a stop can easily
be perceived by the presence of its burst, and the aspiration noise of a stop
is phonetically different from frication noise with respect to its quality and
On the other hand, in the manner combinations having a stop as C2, it is
relatively easy to perceive a mora boundary, because the boundary is
formed by an acoustically salient feature, i.e., the burst of the stop. This
salience is also preserved when C2 is an affricate, since the first half of an
affricate is phonetically nothing but a stop.
Lastly, the negative effect on devoicing of a following geminate can
also be interpreted from a perceptual point of view. Devoicing of a vowel
before a geminate requires, on the part of the listener, perception of two
mora boundaries embedded within a stretch of voiceless sounds. For example, if the first vowel of /hiQsori/ ('quietly') is devoiced, the listener is required to perceive the first mora boundary at the point where the palatal fricative (the conditional variant of /h/ before /i/) changes its color into an
alveolar fricative, and the second mora boundary somewhere within the
long stretch of the alveolar fricative. It is not surprising that the language
has a tendency to avoid such a difficult perceptual combination.

5.2. Consecutive devoicing

The second valuable finding of the current study is the quantitative confirmation of the tendency to avoid consecutive devoicing and the role played
by the combination of consecutive consonants. In Section 4.1.2, we noted
that it was vowels associated with (i.e., in the same mora as) fricatives that
showed higher devoicing rates. It is interesting, in this respect, to see that
the observed devoicing rates of the first vowel in a consecutive devoicing
environment were, by and large, close to those observed in the /CoVcCo/
environment, as summarized in Table 17. This similarity suggests that consecutive devoicing is basically a simple process. No special forward-looking processing is needed to determine the devoicing rate of the first
vowel. The devoicing rate of the second vowel, on the other hand, involves
backward reference to the voicing status of the preceding (i.e., the first) vowel.
At this point, it is important to note that the combination S/S is an
exception in both Figure 1 and Table 17. The devoicing rate for this combination in the consecutive devoicing environment is low, yet the rate in the
canonical /CoVcCo/ environment is high. Currently, we are unable to explain this exception, but it is noteworthy that the number of samples used in
the analyses of consecutive devoicing is small for many of the manner
combinations (see Figure 1). An increase in data will make it possible to
decide if this case is really an exception.
Lastly, the finding that consecutive devoicing does play an important
role in the devoicing of close vowels requires revision of a past analysis presented by the first author. Maekawa (1989 and 1990a) reported that the
devoicing rate of close vowels could be higher when the following mora
contained a non-close vowel. Although we do not present the data here, this
tendency was clearly observed in the current data set. However, the tendency should be interpreted, at least partly, as a by-product of the avoidance of consecutive devoicing. That is, when a close vowel has a non-close
vowel in the following mora, this automatically means that the vowel in
question (i.e., the first close vowel) is not in the environment of consecutive
devoicing, hence the devoicing rate of that vowel is expected to be higher
than elsewhere.
Table 17. Comparison of the devoicing rate of the first vowel in a consecutive
devoicing environment with that of the vowel in the /CoVcCo/ environment, pooled over /i/ and /u/






5.3. Atypical environments

The third contribution of this study is the observation of devoicing in atypical environments, namely, in /CoVcCv/ and /CoVncCo/ environments. Our
analyses suggest that the devoicing of /CoVcCv/ close vowels was similar
to that of /CoVcCo/ close vowels in that both were deeply conditioned by
the manner of articulation of adjacent consonants. Although the influence
of C2 was quite different depending on the voicing of C2, it seems that
these environments constitute one large class of vowel devoicing. Devoicing of non-close vowels, on the other hand, was a radically different phenomenon from close vowel devoicing in that the manners of adjacent consonants had almost no influence on devoicing rate.
With respect to the influence of extra-linguistic factors that we presented in the analysis of non-close vowels, it is worth noting that both
speaking rate and laughter showed exactly the same influence upon the
devoicing of close vowels. The devoicing rate of close vowels increased
monotonically as a function of speaking rate without exception, and vowels
uttered with laughter showed a higher devoicing rate than those uttered
without laughter.
The effect of speaking rate on the devoicing rate has been repeatedly
confirmed in previous studies such as Maekawa (1990a) and Kondo (1997),
and has been confirmed here for spontaneous speech data.
Recent studies of linguistic variation recorded in CSJ have revealed that
the presence of laughter is an excellent indicator of the speaker's relaxation, which results in a casual speaking style. Perhaps vowels are more likely to
be devoiced in a casual speaking style than in a more formal speaking style
in which speakers pay more attention to their speech. This view is consistent with the finding of Imaizumi, Hayashi and Deguchi (1995) that close
vowel devoicing was less prominent when school teachers spoke to hearing-impaired pupils than when they spoke to normal-hearing pupils.
In the current data, as a matter of fact, the average devoicing rates in
SPS (simulated public speaking) samples were significantly higher than
those in APS (academic presentation speech) samples, as shown in Table 18.
According to a two-way ANOVA with phonetic environment and speech
type as factors, both main effects were significant and the interaction was not significant (Environment: DF = 2, F = 536000.9, P < 0.0001; Speech type: DF = 1,
F = 39.32, P < 0.0001; Environment*Speech type: DF = 2, F = 2.95, P < 0.0524).
Table 18. Difference of devoicing rate due to speech type













6. Concluding remarks
The use of a spontaneous speech corpus has proved effective in the
analysis of vowel devoicing. The data set presented here is one of the most
reliable resources for the study of vowel voicing, both in quality and in
quantity. Full coverage of the many C1C2 manner combinations would
have been impossible if the amount of data had been substantially smaller than
the current data set. Needless to say, however, the current data set is still
not large enough for a complete analysis of statistically complex phenomena such as the consecutive devoicing discussed in Section 4.1.2. More reliable
conclusions will be possible once we have access to the entire CSJ-Core,
whose data size is more than twice that of the current data.
Most of the analyses in this paper are linguistic analyses in the
sense that phonological environments were used as the factors conditioning
vowel devoicing. Yet, as suggested in the analysis of non-close vowel devoicing, it is obvious that extra-linguistic factors also play a certain role.
Extensive analyses of extra-linguistic factors and the integration of linguistic
and extra-linguistic factors are important steps towards a full understanding
of the vowel devoicing phenomenon. Lastly, intonation labeling of the CSJ-Core will make it possible to examine the effect of prosodic conditioning factors
such as pitch accent. All of these analyses should be the focus of future

The authors are grateful to all speakers in the Corpus of Spontaneous Japanese. Our
gratitude also goes to Professor Hisao Kuwabara of Teikyo Science University, who sent us his paper upon our request, and Dr. Jennifer Venditti,
whose comments on an earlier version of this paper helped us greatly.

1. The Core is also labeled for other research information such as clause boundary, discourse segmentation and dependency structure, but this information is
not relevant to the current paper. Visit the following URL for more information about CSJ: http://www2.kokken.go.jp/~csj/public/index.html
2. It seems that S/S is an exception to the general tendency of inverse proportion. See Section 5.2 for discussion.
3. The sample located between /desu/ and /masu/ in Figure 1 is /si/, a suffix
that turns a noun or adjectival into a verb (i.e. a sahen verb).

Syllable structure and its acoustic effects on vowels in devoicing environments
Mariko Kondo

1. Introduction
Vowel devoicing is a common phonological process in many languages and
typically involves high vowels and schwa. High vowels and schwa are inherently short (Bell 1978; Dauer 1980) and the process usually occurs when
the vowels are either adjacent to, or surrounded by, voiceless consonants,
during which the glottis is fully open. It is thought that vowel devoicing is a
consequence of articulatory undershoot of glottal movements. It has also been suggested
that vowel devoicing processes are the result of glottal gestural overlap between voiceless consonants and short vowels. The movements of glottal
muscles for the short high vowels /i/ and /u/ blend with those of the adjacent voiceless sounds or a pause (Jun 1993; Jun and Beckman 1994). In
many languages, the process is also considered to be part of the vowel neutralization and reduction processes in which vowels are first reduced in
duration and centralized in quality, typically in the unaccented position, and
then eventually devoiced and/or deleted in fast or casual speech (Hyman
1975; Wheeler 1979; Dauer 1980; Kohler 1990). 
The Japanese high vowels /i/ and /u/ also become voiceless when surrounded by voiceless consonants, or when preceded by a voiceless consonant and followed by a pause: i.e. /C̥VC̥/ or /C̥V#/ (where the Vs are
[+high]). However, in Japanese the vowel devoicing processes do not
involve apparent centralization of vowels. There is no obvious durational
reduction of vowels in the unaccented positions in Japanese, nor does
vowel quality depend on accentuation. However, the vowel devoicing process is very common in many Japanese dialects, especially in eastern dialects
including Standard Japanese. The process occurs even in slow or formal
speech (Kondo 1997). This suggests that Japanese high vowel devoicing is
not merely an optional process in fast or casual speech, but is also a phonologically controlled process.


Vowel devoicing means a lack of vocal fold vibration during the production
of the high vowels /i/ and /u/ between voiceless sounds. This seems to be a
natural process because it is not very economical for vocal folds to vibrate
during vowel production when the glottis is open for the preceding and
following voiceless sounds. Studies have suggested that Japanese vowel
devoicing can be affected by various phonetic and phonological factors,
such as the type and combination of preceding and following consonants
(Kuwabara and Takeda 1988; Yoshida and Sagisaka 1990), the presence of
an accent on the vowel (Takeda and Kuwabara 1987), position in a word or
utterance (Maekawa 1989; Takeda and Kuwabara 1987) and following
word boundary (Sakurai 1985). However, Kondo (1997) found that Japanese vowel devoicing was an almost obligatory process even when the vowel
was accented and followed by an internal word boundary, so long as there
were no devoiceable vowels in adjacent syllables (the single devoicing environment). Examples are ashita /asita/ 'tomorrow', kikai /kikai/ 'machine'
and kusa /kusa/ 'grass' (in these and later examples, vowels that
can be devoiced are shown in italics and underlined). Devoicing in two
consecutive syllables can also occur to a certain extent in spontaneous
speech (Maekawa and Kikuchi, this volume). However, when vowels in adjacent syllables are all devoiceable (the consecutive devoicing environment),
as in kashitsuchishi /kasitutisi/ 'accidental death' and fukushikikokyuu
/fukusikikokjuu/ 'abdominal breathing', some vowels remain voiced. All
studies agree that in the consecutive devoicing environment only some devoiceable vowels undergo the devoicing process. Therefore, the devoicing
factors suggested above do not always result in devoiced vowels.
When speaking tempo is altered, the effects of most of the suggested
devoicing factors are minimal in the single environment. If vowel devoicing
is merely a consequence of articulatory undershoot or glottal gestural overlap between short high vowels and voiceless consonants, then the devoicing
should occur more when speech rate increases. In Japanese, devoicing of
high vowels seems to be almost compulsory at all tempi, as long as there
are no devoiceable vowels in neighboring syllables. Speaking tempo has
very little effect on devoicing rates. On the other hand, devoicing rates in
consecutive devoicing environments vary according to the tempo. Devoicing
rates of high vowels in prose texts are not necessarily high at a comfortable
speaking tempo for all types of preceding consonants. But when consecutive
devoicing environment data are excluded, the devoicing rates at all tempi
significantly rise, and vowel devoicing seems to be almost compulsory
(Kondo 1997).

Devoicing rates indicate that vowels do not always become voiceless in
phonetically ideal environments. The most important factor affecting vowel
devoicing is whether there is a devoiceable vowel in adjacent syllables: i.e.
single or consecutive environments. Almost all high vowels in single devoicing environments are devoiced, whereas in consecutive environments
high vowels sometimes remain voiced. It has also been found that voiced
vowels in devoicing environments are often acoustically different from the
same vowels in non-devoicing environments. Acoustic analyses of high
vowels in devoicing environments revealed that vowel devoicing is not
always a clear-cut distinction of either voiced or voiceless; there are also
many partially voiced/devoiced vowels. In fact, phonetic realizations of
vowels in devoicing environments vary from fully voiced to completely
voiceless. Despite being phonetically in the same condition, high vowels
show different acoustic characteristics. This means that vowel devoicing
does not simply indicate the presence or absence of vocal fold vibration,
but involves fundamental acoustic changes of the vowels in the processes,
and the degree of change is dependent on its phonological environment.
Devoicing occurs mainly in the single devoicing environment. It sometimes
occurs in the presence of two consecutive syllables, but never in three at a
normal speaking tempo. Devoicing conditions in single and consecutive
devoicing environments are phonetically identical. This implies that phonetic conditions are not the only conditions that affect devoicing. When
phonetic and phonological conditions are in favor of vowel devoicing and
the high vowels become voiceless, it is important to determine whether the
acoustic changes of the vowels are simply a change of phonation or the
processes involve other acoustic changes.
In this paper, the acoustic changes of vowels in devoicing environments
will be examined with respect to vowel duration and intensity. Since devoicing rates differ significantly in single and consecutive environments,
vowel quality will be examined under both environments. Based on the
results of the acoustic study, the devoicing processes will then be analyzed
in terms of syllable structure. The aim is to determine how Japanese vowel
devoicing changes syllable structures, why some vowels in devoicing environments do not become voiceless and also why devoicing does not occur
in consecutive syllables, especially when there are more than three consecutive syllables. On the basis of the results, we then discuss whether
Japanese vowel devoicing processes are part of the vowel weakening processes that produce devoicing in other languages.



2. Acoustic characteristics of vowels in the devoicing environment

2.1. Durations of Devoiced Morae

It is well known that Japanese speech rhythm is based on the mora. There is
a tendency towards equalizing durations of morae (Campbell and Sagisaka
1991; Y. Sato 1993; Han 1994, etc.), and the duration of whole words or
phrases is proportional to the number of morae in those words or phrases
(Port et al. 1987). However, M. Beckman (1982) found that the durations of
morae with devoiced vowels were significantly shorter than those of their voiced
counterparts. If vowel devoicing simply means a change of vowel phonation from voiced to voiceless, then voiceless vowels should retain their
duration. However, if devoicing is part of the vowel weakening process,
durational reduction may occur. Therefore, the durations of devoiced morae
were compared with the durations of their voiced counterparts in the same
phonetic environment to examine whether devoiced vowels retain their
duration.
In the experiment, six subjects pronounced 41 test words (containing 74
devoicing sites) three times each in random order. Their individual pronunciation of devoiceable vowels in the same words was not always consistent.
For example, both /i/ vowels in /hootiki/ fire alarm are in the devoicing
environment. The same speaker may devoice the first /i/ in one utterance
and voice it in another utterance. When voicing of the same vowel varied,
the duration of the mora with a voiceless vowel was compared with the
duration of its counterpart with a voiced vowel. In order to minimize
the effects of the various factors that control segmental duration, such as (a)
type of phoneme, (b) neighboring phonemes, (c) mora position in a breath
group, and (d) speaking rate, durational comparisons were made only of the
same mora in the same word uttered by the same speaker in utterance-internal positions.
Data were collected from 738 recorded words (41 test words × 6 subjects
× 3 pronunciations) containing 1332 devoiceable vowels (74 devoicing
sites × 6 subjects × 3 pronunciations). There were 45 devoicing sites that
had voicing variations. All words with voicing variation (45 sites × 3 times
= 135 high vowels) were segmented using Waves+ speech analysis on a
SUN workstation.
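The token counts in this paragraph follow directly from the experimental design. As a minimal check, the following Python sketch recomputes them; the function and key names are illustrative choices made here, not terms from the study.

```python
def token_counts(test_words=41, sites=74, subjects=6,
                 repetitions=3, variable_sites=45):
    """Recompute the token counts from the design figures given in the text.

    All parameter names are hypothetical labels for the quantities
    reported in the paragraph above.
    """
    return {
        # 41 test words x 6 subjects x 3 pronunciations
        "recorded_words": test_words * subjects * repetitions,
        # 74 devoicing sites x 6 subjects x 3 pronunciations
        "devoiceable_vowels": sites * subjects * repetitions,
        # 45 sites with voicing variation x 3 pronunciations
        "segmented_high_vowels": variable_sites * repetitions,
    }


counts = token_counts()
for name, value in counts.items():
    print(f"{name}: {value}")
```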
The results showed that morae with voiceless vowels were significantly
shorter in duration than those with voiced vowels [t(44)=8.49, p<.001]
(Figure 1). The average ratio of the duration of devoiced morae to that of their /CV/ counterparts
was 83.93% (SD 12.97). However, the durations of devoiced morae were
significantly longer than the consonant portions of the corresponding /CV/
morae [t(44)=13.62, p<.001].
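The reported 44 degrees of freedom are consistent with a paired comparison over the 45 devoicing sites that showed voicing variation, although the text does not name the test variant. Under that assumption, the following Python sketch shows how a paired t statistic and a mean duration ratio could be computed. The function names and the sample durations are hypothetical; none of the numbers are data from the study.

```python
import math
import statistics


def paired_t(voiced, devoiced):
    """Return (t, degrees of freedom) for paired duration measurements.

    voiced[i] and devoiced[i] are assumed to be durations of the same
    mora (same word, same speaker) with a voiced and a voiceless vowel.
    """
    if len(voiced) != len(devoiced) or len(voiced) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [v - d for v, d in zip(voiced, devoiced)]
    n = len(diffs)
    # Standard error of the mean difference (sample SD over sqrt(n)).
    # If all differences are identical, the SD is zero and t is undefined.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def mean_ratio_percent(voiced, devoiced):
    """Average duration of devoiced morae as a percentage of voiced ones."""
    return statistics.mean(100.0 * d / v for v, d in zip(voiced, devoiced))


# Hypothetical durations in ms, for illustration only.
voiced_ms = [100.0, 110.0, 120.0]
devoiced_ms = [90.0, 95.0, 100.0]
t_value, df = paired_t(voiced_ms, devoiced_ms)
print(f"t({df}) = {t_value:.2f}")
```

With 45 matched pairs, the same function would return 44 degrees of freedom, matching the form of the statistics reported above.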

Figure 1. Average durational difference between devoiced morae and consonants and vowels in CV morae of all types of consonants

Figure 2. Average closure duration and the duration after release of stops and stop
part of affricates in CV morae and devoiced morae

The closure durations of stops and the stop part of affricates in devoiced
morae were compared with the closure durations of prevocalic stops and
the stop part of prevocalic affricates in /CV/ morae. The average closure
duration of stops and affricates in devoiced morae was not significantly
different from that in /CV/ morae as shown in Figure 2. However, the average duration of stops and affricates in devoiced morae excluding closure
duration (i.e. after release of stop closure) was significantly shorter than
that of /CV/ morae [t(31)=7.12, p<.005]. This means that vowel devoicing
reduces the duration of the devoiced vowel but does not affect the duration of
the preceding consonant. It is technically impossible to measure the duration
of devoiced vowels after voiceless fricatives, since there is no way to tell the
boundary between the voiceless vowel and the preceding voiceless fricative.
Therefore, the durations of morae with devoiced fricatives were excluded
from the data. The results indicated that when a vowel was devoiced, the
vowel became shorter than its fully voiced counterpart, and as a result, the
whole duration of the mora was reduced.1

2.2. Intensities of Vowels in the Devoicing Environments

The previous section demonstrated that there was durational reduction
when vowels were devoiced. Devoicing rates showed that vowels in single
devoicing sites were almost always devoiced, whereas only some vowels in
consecutive devoicing sites became voiceless. If devoicing in single sites is
a natural process, voiced high vowels in single sites are unnatural and
therefore they may be acoustically different from the same vowels in non-devoicing environments. On the other hand, if only some high vowels undergo the devoicing process in consecutive sites, then it must be natural for
voiced vowels in consecutive devoicing sites to retain the same acoustic
qualities as in non-devoicing environments. Moreover, if devoicing is part
of the vowel weakening process, voiced vowels in devoicing environments
may show weakening of their intensities as well as durational reduction.
An experiment was conducted to measure intensities of voiced vowels
in devoicing environments at three speaking tempi and to compare them
with the intensities of voiced vowels in non-devoicing environments in the
same words. Three subjects pronounced the 6 test words listed in (1a) and (1b),
which contain single and consecutive devoicing sites, at slow, comfortable and fast
tempi (devoiceable vowels are underlined and in italics).

(1) a. /ta,i.sjo.ku. te.a.te/ 'retirement allowance'
       /ka.mo.tu. se,N.pa.ku/ [kamotssempak] 'cargo boats'
       /ta.ka.sa.ki. si.mi,N/ [takasakiimii)] 'the Takasaki citizens'
    b. /hu.ku.sjo.ku. ke,N.sa/ [kokkensa] 'dress inspection'
       /sjo.ku.hi.se.tu.ja.ku/ [okCisetsjak] 'a cut in food expenses'
       [haitskid)] 'exhaust limit'
(Here dots /./ denote syllable boundaries, and commas /,/ denote mora boundaries.)


The total number of devoiceable vowels in single devoicing sites in the test
words was 6 (as some of the test words contain more than 1 single site),
and the number in consecutive devoicing sites was also 6, yielding 324
devoiceable vowels ([6 vowels + 6 vowels] x 3 rates x 3 repetitions x 3
speakers = 324 devoiceable vowels).
The average intensities of voiced vowels in devoicing and non-devoicing environments, excluding word-initial and word-final morae, were
calculated for the three tempi and individual subjects, and were compared
using a t-test. Since speaking tempo had an effect only in the single devoicing sites and not in the consecutive sites, the vowel intensities were
compared by their devoicing environments. When the devoiceable vowels remained voiced in the single devoicing condition, their
intensities were significantly lower than those of non-devoiceable vowels at
all speaking tempi for all subjects (Table 1). One of the subjects (A) devoiced all devoiceable vowels at the normal tempo and only once voiced
the underlined devoiceable vowel /u/ in /hukusjokukeNsa/ at the slow rate.
Therefore, no comparison was made of subject A's data for these two tempi.
This result was expected because when a vowel was voiced in a single devoicing environment, it was often partially voiced, i.e. the duration of the
vowel tended to be shorter and its intensity was lower, which sometimes
made it difficult to judge whether a vowel was actually voiced or voiceless.
Table 1. T-test results of intensity differences between voiced vowels in single
devoicing and non-devoicing environments (one-tailed)
[Table body not recoverable. Surviving column headers: average intensity of devoiceable vowels (dB); average intensity of non-devoiceable vowels (dB). Surviving significance levels, in extraction order: p < 0.005, p < 0.025, p < 0.05, p < 0.001, p < 0.025, p < 0.05, p < 0.001.]


However, in the consecutive devoicing environments, when devoiceable
vowels were voiced their intensities were not necessarily lower (Table 2).
The intensity of voiced devoiceable vowels and the intensity of non-devoiceable vowels were significantly different only for Subject A at all
tempi. There was also a significant difference at the normal tempo for Subject B, but not at fast or slow tempi, and not at any tempo for Subject C.
Table 2. T-test results of intensity differences between voiced vowels in consecutive devoicing and non-devoicing environments (one-tailed)
[Table body not recoverable. Surviving column headers: average intensity of devoiceable vowels (dB); average intensity of non-devoiceable vowels (dB). Surviving significance levels, in extraction order: p < 0.025, p < 0.005, p < 0.005, p < 0.001.]
The average intensity ratios of voiced devoiceable vowels of three speakers
at three tempi in single and consecutive environments are presented in Figure
3. Subject A's intensity data at normal and slow tempi in the single environment were excluded from the analysis as they could not be compared statistically (see Table 1).
Intensity of sound is very sensitive and is influenced by various factors,
such as neighboring sounds, pitch, stress and accentuation. Under the same
conditions, the same vowel has higher intensity in higher pitch than in
lower pitch, and also higher intensity in a stressed position than in an unstressed position. Moreover, different vowels have their own intrinsic intensities even when spoken with equal effort. Vowels made with a wider vocal
tract have a higher intensity level than close vowels. For the same degree of
opening, vowels with closer F1 and F2 have a higher intensity than vowels
with F1 and F2 far apart; i.e. back vowels are a little more intense than
front vowels. In Japanese, the F1 and F2 of the vowels [i], [e] and [ɯ] are
relatively far apart while [a] and [o] have relatively close F1 and F2. In
other words, the intensities of [i], [e] and [ɯ] are generally less than those
of [a] and [o]. In this experiment, all devoiceable vowels were either [i] or
[ɯ] with an inherently weak intensity. This may have lowered the average
intensity ratios of devoiceable vowels against non-devoiceable vowels that
are inherently greater in intensity.

Figure 3. The average intensity ratios of three speakers between voiced devoiceable vowels and non-devoiceable vowels in single and consecutive devoicing environments at three tempi

It was extremely difficult to find ideal test words for comparing intensities in both devoicing and non-devoicing environments, and therefore the
type of vowel tested was not always identical. Although there were differences between the intensities of voiced vowels in the devoicing and non-devoicing environments, this might simply have been due to the different
types of vowels in the two environments. Under equal conditions, high
vowels have intrinsically lower intensities than non-high vowels, and all
vowels in the devoicing environment are high vowels. However, the following patterns were noted: (a) more intensity weakening at all tempi in the
single devoicing environment than in the consecutive environment, (b)
greatest intensity weakening at the fast tempo and least intensity weakening
at the slow tempo in the single devoicing environment, and (c) no
tempo effect on intensity in the consecutive devoicing environment.


Vowels in the devoicing environments are not only shorter but also have
less intensity than non-devoiceable vowels. In other words, vowels in the
devoicing environments are first reduced in duration and intensity, and then
further devoiced. In extreme cases, the vowels become deleted.
3. Syllable constraints on vowel devoicing
High vowels are almost always devoiced in the single devoicing environment, whereas devoicing is not a compulsory process in the consecutive
devoicing environment. Also, the results presented in the previous section
indicate that voiced vowels in single devoicing environments are acoustically short and weak whereas in consecutive devoicing environments they
have full duration and full voicing, despite being in phonetically identical
conditions. In other words, the devoicing process must be controlled by
more than just phonetic factors. Moreover, there are no definite voicing-devoicing patterns of preceding and following consonant types, although
there is a tendency for preceding fricatives to trigger devoicing more than
stops or affricates, in both single and consecutive devoicing environments
(Maekawa and Kikuchi, this volume). Physiological studies found that
movements of laryngeal muscles during the production of a voiceless consonant and a following high vowel blend better when the preceding consonant
is a fricative than a stop or affricate (Yoshioka 1981). Devoicing rates of
vowels preceded by fricatives are high in single devoicing sites, but they
are not necessarily high in consecutive devoicing environments. This
means that devoicing is controlled by the presence of devoiceable vowels in
adjacent syllables in addition to phonetic factors.
The fundamental difference between the two environments is that only
in the single devoicing environment is it possible for a preceding consonant
to be resyllabified to an adjacent syllable after a vowel becomes voiceless,
i.e. a change of syllable structure. When a vowel becomes voiceless, it is
acoustically manifested either as the continuation of a preceding fricative or
as a preceding stop released into a fricative, thus creating sequences of
voiceless consonants. Japanese syllables are predominantly light open syllables /(C)V/. Consonant clusters do not occur within a syllable except for
the rare occurrence of /NC/ sequences in syllable coda position, e.g.
/hoNtte/ 'Books are...' and /abadi:Nkko/ 'Aberdonian'. They can also occur
in a /Cj-/ sequence in syllable onset position if we consider the glide /j/ as a
consonant, as in /tja/ 'tea' and /gjoo/ 'line'. When the voiceless vowel loses
its sonority, the preceding consonant in the devoiced syllable cannot constitute a syllable on its own. The syllable structure of a word is altered as a
result of vowel devoicing. In addition to the mora, the syllable is important
in Japanese as an accent bearing unit, and constrains various phonological
processes, such as the formation of loan words (Itô 1990; Shinohara 1997).
Through vowel devoicing, devoiced morae lose their syllabicity and moraic status, and the syllable structure of a word changes because consonant sequences are created.
The phonological structure of the word akikan /a.ki.kaN/ [akikaN] (unaccented) 'empty can' with devoiced [i] is represented as (2a)2. When the
high vowel /i/ becomes voiceless, the devoiced mora /kç/ cannot be attached
to the syllable node because the syllable has lost its core element. Therefore, the second syllable cannot sustain its status as a syllable (2b). Then
the remaining /kç/ becomes non-moraic because devoiced morae are not
long enough to be considered as a mora (see Section 2.1). The quality of
the devoiced vowel /i/ is reflected in the palatalization of the preceding
consonant as /kç/, but duration is one of the important characteristics of the
mora in Japanese. Hence, the non-moraic /kç/ is syllabified to the coda of
the preceding syllable /a/ (2c). The process creates a bimoraic heavy syllable /VC/ (which is permitted in Japanese), and reduces the number of syllables and morae of the word.
(σ denotes a syllable, μ denotes a mora.)
(2a) akikan /akikaN/ [akikaa)]3
[Syllable-structure diagrams (2a)-(2c) not recoverable.]


When devoicing occurs at the beginning of a word, the consonant in the
devoiced syllable is syllabified to the onset of the following syllable because this is the only possible place it can move to, e.g. kita /ki.ta/ [kita]
(unaccented) 'north' and hikari /hikari/ [çikaɾi] 'light'. As shown in (3a-c),
when the vowel /i/ becomes voiceless in the word kita /ki.ta/ (3a), the sequence [ki] loses its syllabic status and becomes /kç/ (3b). Then the /kç/
becomes non-moraic because devoiced morae are not long enough to be
considered as fully moraic. Hence, the non-moraic /kç/ is syllabified to the
onset of the following syllable /ta/ (3c). Syllable onset consonant clusters
are not very common in Japanese, but occur in /Cj-/ sequences.
(3a) kita /kita/ [kita]
(3b) [kçta]
(3c) [kçta]
[Syllable-structure diagrams (3a)-(3c) not recoverable.]
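The two resyllabification patterns illustrated in (2) and (3) can be sketched procedurally. The following Python fragment is only an illustration under simplifying assumptions: the `Syllable` class, the function names, and the flat onset/nucleus/coda strings are hypothetical conveniences, morae and palatalization of the stranded consonant are not modeled, and accent is ignored.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Syllable:
    onset: str        # consonant(s) before the vowel, e.g. "k"
    nucleus: str      # the vowel, e.g. "i"
    coda: str = ""    # e.g. the moraic nasal "N"


def resyllabify(word: List[Syllable], i: int) -> List[Syllable]:
    """Devoice the vowel of word[i] and reattach its stranded consonant.

    Word-initially the consonant joins the onset of the following
    syllable, as in (3); elsewhere it joins the coda of the preceding
    syllable, as in (2).
    """
    target = word[i]
    # Copy the other syllables so the input list is left unchanged.
    rest = [replace(s) for j, s in enumerate(word) if j != i]
    if i == 0:
        if not rest:
            raise ValueError("no following syllable to attach to")
        rest[0].onset = target.onset + rest[0].onset
    else:
        rest[i - 1].coda += target.onset
    return rest


def render(word: List[Syllable]) -> str:
    """Join syllables with dots, e.g. 'a.ki.kaN'."""
    return ".".join(s.onset + s.nucleus + s.coda for s in word)


akikan = [Syllable("", "a"), Syllable("k", "i"), Syllable("k", "a", "N")]
kita = [Syllable("k", "i"), Syllable("t", "a")]
print(render(akikan), "->", render(resyllabify(akikan, 1)))
print(render(kita), "->", render(resyllabify(kita, 0)))
```

In both cases the output has one syllable fewer than the input, which corresponds to the reduction in syllable count described for (2c) and (3c).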

Devoicing occurs even when a devoiceable vowel is in an accented syllable.
Devoicing of an accented vowel can be blocked or avoided by shifting the
accent to another syllable (McCawley 1977; Sakurai 1985; Vance 1987).
However, vowel devoicing in accented syllables can also occur in normal
speech (N. Hattori 1989). As mentioned earlier, in Japanese, syllables carry
the lexical accent. When the vowel in an accented syllable is devoiced, and
its preceding consonant cannot sustain its syllabic status, it can no longer
carry an accent. Even if the preceding consonant remains moraic, the mora
is not an accent bearing unit. For instance, the underlined /i/ in the word
shokikan /sjo.ki.kaN/ 'cabinet secretary' is devoiceable. When it is devoiced and loses its syllabicity, it cannot simply be syllabified to the preceding syllable /sjo/ as in (4a). This is because it was the syllable /ki/ that carried the accent in the word and the accent bearing syllable has now been
The preceding /k/ has to be syllabified to the following syllable /kaN/.
This analysis seems appropriate since the accented syllable then matches
the acoustic manifestation of the accent. The acoustic cue for the lexical accent is the fall of the fundamental frequency (F0) from an accented
vowel to the following syllable. The perceptual cue for accent on a devoiced
vowel is the unusually high starting F0 of the following vowel, which then
falls very sharply (Sugito and Hirose 1988). The first /k/ is desyllabified,
becomes non-moraic as in (4b), and then is resyllabified to the onset of the
following syllable /kaN/, creating the superheavy syllable /kçkaN/ (/CCVC/)
as in (4c). The acoustic cue of the lexical accent is manifested in that syllable.
(4a) */sjokikaN/
(4b) Demoraification of /kç/
(4c) Resyllabification
[Syllable-structure diagrams (4a)-(4c) not recoverable.]


When there is only one devoiced vowel in a non-word-initial syllable, the
preceding consonant in the same mora can be syllabified to its preceding
syllable. In the case of word-initial position or in an accented syllable, it
can be syllabified to the following syllable. Therefore, vowel devoicing in
the single devoicing environment is always possible4. However, in the consecutive devoicing environment, not all devoiceable vowels can become
voiceless. For example, both underlined italic /u/ vowels are devoiceable in
the word dookutsu /dookutu/ (unaccented) 'cave', and common pronunciations for the word are [do:kɯ̥tsɯ] with the first /u/ devoiced and [do:kɯtsɯ]
with both /u/ vowels voiced. In an earlier study the pronunciation [do:kɯ̥tsɯ]
occurred in 16 out of 18 samples (Kondo 1997). The first pronunciation is
possible because /k/ in /ku/ can be syllabified to the following syllable as
shown in (5a). The process in (5b) may be plausible. However, the first syllable
would then be superheavy /CVVC/, which is less favored in Japanese. The morphological
structure must also be considered because, strictly speaking, the word dookutsu is a compound word (doo + kutsu), and very few native speakers
would analyze the word as dooku + tsu. Therefore, it would be more sensible to analyze the process as (5a). The vowel /u/ of /ku/ becomes voiceless,
then /k/ is desyllabified, becomes non-moraic, and is finally resyllabified to the
following syllable /tu/.
On the other hand, in normal speech it is not possible to pronounce
*[do:kɯ̥tsɯ̥] with two consecutively devoiced vowels. As shown in (5c),
/k/ in /ku/ can be syllabified to the preceding syllable, creating /CVVC/, but
/t/ in /tu/ cannot, because it would create the sequence */CVVCC/. This is
not considered to be an acceptable superheavy syllable, as the second-last
consonant is not a moraic nasal. Therefore, the pronunciation [do:kɯtsɯ]
with both vowels voiced is more favorable than the syllable final obstruent
(5a) /doo#kutu/ [do:kɯ̥tsɯ]
(5b) */dooku#tu/ [do:kɯ̥tsɯ]
(5c) *[do:kɯ̥tsɯ̥]
(6a) [kɯ̥tsɯʃi̥ta]
(6b) *[kɯ̥tsɯ̥ʃi̥ta]
[Diagrams (5a) to (6b): the tree structures are not recoverable from the extracted text.]

In the word kutsushita /kutusita/ 'sock(s)', where the first three vowels
are devoiceable, the consonants preceding the devoiceable vowels are the stop [k],
the affricate [ts] and the fricative [ʃ]. The pronunciation [kɯ̥tsɯʃi̥ta], with
the first and third vowels devoiced and the second vowel voiced, is most
common. This process can also be explained in relation to the syllable structure. As shown in (6a) and (6b), when the first vowel /u/ in /ku/ becomes
voiceless, the preceding /kx/ is desyllabified and becomes non-moraic, and
then is syllabified to the onset of the following syllable. The third vowel /i/
also becomes voiceless, and the preceding /s/ [ʃ] is also syllabified to the
onset of the following syllable. The process creates a sequence of less common syllables, /CCV/+/CCV/, but this is still better than devoicing in three
consecutive morae. Triple devoicing is not acceptable, as shown in (6b). The
first two consonants /kx/ and /ts/ cannot be syllabified to the following syllable, because this would create a quadruple-consonant cluster */CCCCV/.
Moreover, in the word /kutusita/ 'sock(s)', the syllable /tu/ carries the lexical accent. It is most logical to leave the vowel of /tu/ voiced rather than
creating an awkward heavy syllable */tsʃta/ [tsʃta] or */ktsʃta/ [kxtsʃta]. Alternatively, it is also acceptable to pronounce this word with all vowels voiced,
[kɯtsɯʃita], but never with all three vowels devoiced.
The devoicing processes can be explained as (1) change of phonation,
(2) durational reduction, (3) loss of syllabicity, (4) demoraification, and (5)
resyllabification. For analyses of other syllable structures, such as
devoicing before geminate consonants and the acceptability of
bimoraic syllable onsets, refer to Kondo (2001).
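
The blocking of consecutive devoicing discussed in this section can be illustrated with a small sketch. It is a deliberate simplification and not the analysis itself: it assumes that a mora is devoiceable when a short high vowel follows a voiceless onset and precedes a voiceless onset or a pause, and it simply rules out any pattern in which two adjacent morae are devoiced. The function names, the mora encoding, and the consonant inventory are hypothetical, and lexical accent and morphological structure are ignored.

```python
from itertools import combinations

# Simplified voiceless onset inventory (an assumption for this sketch).
VOICELESS = {"k", "s", "t", "h", "p"}
HIGH_VOWELS = {"i", "u"}


def devoiceable(morae):
    """Return indices of morae whose vowel sits in a devoicing environment.

    Each mora is an (onset, vowel) pair. The environment is a high vowel
    after a voiceless onset and before a voiceless onset or a pause.
    """
    sites = []
    for i, (onset, vowel) in enumerate(morae):
        if onset not in VOICELESS or vowel not in HIGH_VOWELS:
            continue
        # None stands for a following pause (end of the word).
        next_onset = morae[i + 1][0] if i + 1 < len(morae) else None
        if next_onset is None or next_onset in VOICELESS:
            sites.append(i)
    return sites


def allowed_patterns(morae):
    """Enumerate devoicing patterns that never devoice two adjacent morae."""
    sites = devoiceable(morae)
    patterns = []
    for size in range(len(sites) + 1):
        for combo in combinations(sites, size):
            if all(b - a > 1 for a, b in zip(combo, combo[1:])):
                patterns.append(combo)
    return patterns


# /kutusita/ 'sock(s)' as a list of (onset, vowel) morae.
kutusita = [("k", "u"), ("t", "u"), ("s", "i"), ("t", "a")]
print(allowed_patterns(kutusita))
```

For /kutusita/ the sketch admits the fully voiced form and the form with the first and third vowels devoiced, and it excludes triple devoicing. It also admits patterns that the text does not discuss or disfavors, such as devoicing only the accented /tu/, so it overgenerates relative to the syllable-based account.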

4. Conclusions
Vowel devoicing is fundamentally a phonetic process that economizes glottal
movements of a short high vowel and its surrounding voiceless consonants.
However, Japanese vowel devoicing processes are also affected by various
phonological factors, especially the syllable structure. The experimental
results showed that high vowels in the single devoicing sites were almost
always devoiced but not all devoiceable vowels became voiceless in the
consecutive devoicing sites. Vowels in typical devoicing sites became voiceless only when the consonants preceding them could
be syllabified to an adjacent syllable. Moreover, voiced high vowels in
typical devoicing environments were often not fully voiced and were reduced in duration. These voiced devoiceable vowels were not only shorter
but also had less intensity. This means that it is more natural for high vowels
to undergo the devoicing process between voiceless sounds. Therefore when
they did remain voiced they were acoustically shorter and weaker than when
they occurred in the non-devoicing environment.
The results also suggest that vowel devoicing is part of a vowel weakening process, the final state of which is a completely voiceless vowel or,
in an extreme case, deletion of the vowel. Vowel weakening in Japanese affects
vowel intensity and duration, but the quality of the vowels remains relatively
unchanged regardless of the intensity level of the vowel. Two different
mechanisms, namely phonetic and phonological processes, appear to control Japanese vowel devoicing. The vowel devoicing environment must
satisfy certain phonetic conditions: namely, short vowels surrounded by
voiceless consonants, or by a voiceless consonant and a pause. Even when a
phonetic environment favors devoicing, devoicing may be blocked if constrained by syllable structure. When syllable structure does not block vowel
devoicing, other factors, such as the type of adjacent consonants, the presence of
lexical accent, and word boundaries, become effective.

The author would like to thank an anonymous reviewer for useful comments
and suggestions.

1. The presence of one devoiced vowel in a word did not affect the duration of the
whole word. Despite the shorter duration of devoiced morae, whole-word durations
did not differ significantly from those of words with the same number
of morae without devoiced vowels. However, when there is more than one
devoiced vowel in a word, the whole-word duration becomes significantly
shorter. See Kondo (2003) for details.
2. The mora tier usually represents an alternative rather than an addition to the
CV tier, and onset consonants are attached directly to the syllable node as they
are nonmoraic (Hayes 1989; Kenstowicz 1994). However, I use a separate CV
tier in order to present clearly the formation of moraic consonants and the resulting
syllable structures.
3. Pseudo-phonemic transcriptions are used to describe devoiced vowels and
resulting moraic consonants for convenience. The examples are the devoiced
vowels /i̥/ and /u̥/, the consonant allophones /ʃ/ instead of /s/ and /sju/ (in
/si/ and /sju/), /C/ and /ɸ/ instead of /h/ (in /hi/ and /hu/), /tʃ/ instead of /t/ (in
/ti/) and /ts/ instead of /t/ (in /tu/). /kC/ was also used to indicate palatalization
of /k/ and its release into a palatal fricative [C] in /ki/, and [kx] for /k/ in /ku/ to
indicate backness of the /k/ and its release into a velar fricative [x].
4. For the arguments concerning vowel devoicing and the loss of syllabicity, see
Kondo (2001).

The effect of speech rate on devoiced accented vowels in Osaka Japanese
Miyoko Sugito



This paper discusses the results of acoustic and physiological experiments on
the effects of speech rate on devoiced accented vowels in Osaka Japanese.
The close vowels /u/ and /i/ are often devoiced between voiceless consonants
in many dialects of Japanese. Word accent on moras with devoiced
vowels is generally shifted to the following mora in the Tokyo dialect and
other dialects, as shown in accent dictionaries (NHK 1998; Kindaichi
and Akinaga 1997). However, in the Osaka, Kyoto and other dialects of the
Kansai district, the close vowels /u/ and /i/ preceding open vowels such as
/a e o/, as in /kusa/ 'grass', /sita/ 'tongue' or /sika/ 'deer', are often both
devoiced and accented in natural or fast speech. Using the results of acoustic
and physiological experiments, this paper explores how devoiced accented
vowels are produced, and also how accent change occurs in those words
when they are produced at different speech rates.


2. Previous studies on word accent in Osaka Japanese

2.1. Word accent in two-mora words in Osaka Japanese

Most dialects of Japanese have a moraic word pitch accent. Two-mora
words in the Osaka dialect have four kinds of accent: HH, HL, L-HL, LH,
where H and L represent high and low pitch. In the accent type L-HL, -HL refers to
a descending pitch from high to low within a single mora. Wada (1947)
classified these pitch accent patterns into two types: high-starting and low-starting. These differences are correlated with physiological differences
(Sugito and Hirose 1978), as reported in section 5.1 of this article.



2.2. Words with devoiced accented vowels

The devoiced, accented vowel has been a central topic in discussions on
whether the Japanese language has pitch accent or not. S. Hattori (1960)
and Kawakami (1969) reported that devoiced vowels might be heard as
accented because of the greater intensity in that mora.
Polivanov (1928) was the first to report a devoiced accented word on the
basis of fieldwork carried out in 1914 and 1915, viz. /kita/ 'north' in the
Mie dialect in Nagasaki. He explained that only /a/ had a falling F0 contour,
while the /kit/ portion had no vocal cord vibration. S. Hattori (1928) reported that words with devoiced accented initial vowels in the Kansai dialect, such as /sita/ 'tongue' and /sika/ 'deer', supposedly had final vowels
with falling tones, where 'initial' means 'in the initial mora or syllable' and
'final' means 'in the final mora or syllable'. Sakuma (1931) extracted F0
contours of words such as these and compared them with words with the
accent type L-HL in the Kansai dialect. However, he failed to find falling
F0 contours on the final vowels, and concluded that the accent type in these
words was not HL or L-HL but LH in all cases, insisting, incidentally,
that experiments on Japanese accent were useless.
Sugito (1969) extracted fundamental frequencies of 556 words of 1 to 6
moras produced by both native Tokyo and Osaka dialect speakers. The results showed that the falling F0 contours of the following vowels influenced
perception of the preceding vowels as accented (Sugito 1969). Vowels following devoiced accented vowels had more sharply falling F0 contours than
those following accented voiced vowels (Sugito 1969/1970). The abrupt
falling F0 contour on the second vowel plays an important role in listeners'
perception of an accent on the devoiced first vowel (Sugito 1969, 1982).
Maekawa (1990), Matsui (1993) and M. Kitahara (1998) ran follow-up
acoustic and perceptual experiments which showed similar results.
In the production and perception of devoiced accented vowels, speech
rate may also play an important role. This paper reports on experiments that
examine the relationship between speech rate and the production of devoiced accented vowels.

3. Experimental procedures

3.1. Acoustic experiments

Two-mora words were uttered in two frame sentences in randomized order
seven times each, at slow, natural, fast and very fast rates, respectively. The
words were /kusa/ 'grass', /kuse/ 'habit', /sita/ 'tongue', and /sika/ 'deer' with a
close-open vowel sequence, /kasa/ 'bulk' with an open-open vowel sequence,
and /kusi/ 'comb' with a close-close vowel sequence. (The terms 'close' and
'open' vowels are used instead of 'high' and 'low' to avoid confusion with the
H or L notation for word accent and also with high or low fundamental
frequencies.) All of these words have the accent type HL. The
word /huta/ (close-open) 'lid' with accent HH was also included.
The two sentence frames were (A) Kore-wa ___. ('This is ___.') with HHH
accent, and (B) Tsugi-wa ___. ('The next is ___.') with HLL accent.
The speakers were one male (MN), born in 1930, and two females (KK and
TA), born in 1963 and 1979, respectively. All three speakers were born and
raised in the Osaka Prefecture.
The accent of each word in a sentence frame was presented auditorily
five times in randomized order. The listeners were three female Osaka dialect
speakers who had studied Japanese accent. An accent type was assigned
when there was 90% agreement among the three listeners.
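
The agreement criterion can be sketched as follows. The source does not say how the 90% figure was computed, so this sketch assumes that all judgments for one token (three listeners, five presentations each) are pooled and that the majority label is accepted only if its share reaches the threshold. The function name and the example judgments are hypothetical.

```python
from collections import Counter


def assign_accent(judgments, threshold=0.9):
    """Return the majority accent label if its share of the pooled
    judgments reaches the threshold; otherwise return None."""
    if not judgments:
        return None
    label, count = Counter(judgments).most_common(1)[0]
    return label if count / len(judgments) >= threshold else None


# Hypothetical pooled judgments: 3 listeners x 5 presentations = 15 labels.
clear_case = ["HL"] * 14 + ["HH"]
split_case = ["HL"] * 13 + ["HH"] * 2
print(assign_accent(clear_case), assign_accent(split_case))
```

Under this pooling assumption a single dissenting judgment out of fifteen still yields an assignment, while two dissenting judgments do not.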
Acoustic analysis was done using the SUGI SpeechAnalyzer (Sugito
2000). Presence or absence of voicing in the first mora of tokens of /kusa/
was decided on the basis of speech waves and spectrograms. To determine
speech rate, durations of the words and the second vowel /a/ were measured.
In addition, F0 contours were extracted.
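
As an illustration of how the duration measurements can be summarized per speaker and rate, the sketch below computes a mean and standard deviation over a set of tokens. The values are invented for the example, milliseconds are an assumed unit, and the use of the sample standard deviation is also an assumption, since the source does not say which formula was used.

```python
from statistics import mean, stdev


def summarize(durations):
    """Return (mean, SD) rounded to one decimal place.

    The SD is the sample standard deviation; it is undefined for a
    single token, in which case None is returned in its place.
    """
    m = round(mean(durations), 1)
    sd = round(stdev(durations), 1) if len(durations) > 1 else None
    return m, sd


# Hypothetical word durations for seven tokens at one speech rate.
tokens = [300.0, 310.0, 290.0, 305.0, 295.0, 300.0, 300.0]
print(summarize(tokens))
```

A cell with a mean but no standard deviation would correspond, under this reading, to a condition with only one token.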

3.2. Physiological experiments

The material for the physiological experiments used in this paper was made
available at the University of Tokyo Research Institute of Logopedics and
Phoniatrics. The subject was an Osaka dialect female speaker YI, born in
1950. Electromyographic recordings were made for twelve randomizations
of the words /imi/ (four accent types) and /kusa/, /kusi/, etc. (accent HL),
using hooked wire electrodes inserted into the cricothyroid (CT) and sternohyoid (SH) muscles (Sugito and Hirose 1978, 1988).


4. Results of the acoustic experiments

4.1. Effect of speech rate on voicing and perception of pitch accent

The results of acoustic analysis of the words /kusa/, /kuse/, /sita/, and /sika/
(close-open vowel sequences) produced at natural or fast rates were similar,
their initial vowels often being devoiced and accented, although in slow
speech all of the words were produced with the first vowels voiced. In this
section, the results of /kusa/ uttered by speakers MN, KK, and TA are examined. Table 1 shows the results for /kusa/ produced in the sentence frame
(A) kore-wa (HHH) kusa (HL) by three speakers at three different rates:
natural, fast and very fast. The table shows the voicing status of the initial
vowel /u/, the perceived type of accent, the number of times that accent was
perceived as such out of seven utterances, the averaged durations and standard deviations of the words, and the averaged durations and standard deviations of the second vowel /a/. The following is a description of the results for each of the three speakers.
Table 1. Analyzed results of utterances /kusa/ in carrier sentence (A), spoken by
(1) MN, (2) KK, and (3) TA, at natural, fast, and very fast rates.
Columns, as described in the text: speaker and speech rate; 1st vowel voiced or not; type of accent; number of tokens; word duration, mean (SD); duration of /a/, mean (SD).
[The row labels and the voicing, accent, and count cells are not recoverable from the extracted text. The surviving mean (SD) cells follow in their extracted order.]
305.0 (34.3); 275.0 (24.2); 65.9 (11.1); 79.0 (6.6); 266.0 (36.8); 202.0 (); 165.5 (19.7); 65.0 (10.7); 405.6 (42.2); 118.7 (22.0); 334.7 (15.3); 104.0 (14.2); 275.5 (36.8); 224.0 (); 98.0 (18.2); 47.0 (); 338.0 (40.1); 142.3 (35.8); 334.0 (44.2); 280.9 (32.7); 187.0 (14.6); 160.8 (35.4); 128.4 (29.2); 65.0 (12.6); 175.0 (7.1)


(1) MN results: At a natural rate of speech, MN uttered the first mora of
/kusa/ voiced (+v) with accent HL. At a fast rate, five tokens of the initial vowel /u/ were accented and voiced, while two were devoiced (-v).
However, when he was urged to speak faster, six of the /u/ vowels were
devoiced, and all of the words were heard as High-High accented (HH*
in the table): the final
/a/ vowels were all heard by the Osaka speakers to be as high as the devoiced accented first vowels /u/.
(2) KK results: KK produced /u/ as devoiced and accented in natural,
fast, and very fast speech. One exception occurred in very fast speech,
in which /u/ was voiceless and the accent was perceived as HH.
(3) TA results: TA produced the /u/ as devoiced and accented 4 out of 7
times in natural speech, and all of the time in fast speech and very fast
speech. Accent shift occurred twice when spoken at a very fast rate.
Notice that the durations of /a/ of all the tokens that were perceived as
accent HH are very short.
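Table 2. Analyzed results of utterances /kusa/ in carrier sentence (B), spoken
by (1) MN, (2) KK, and (3) TA, at natural, fast, and very fast rates.
Columns, as described in the text for Table 1: speaker and speech rate; 1st vowel voiced or not; type of accent; number of tokens; word duration, mean (SD); duration of /a/, mean (SD).
[The row labels and the voicing, accent, and count cells are not recoverable from the extracted text. The surviving mean (SD) cells follow in their extracted order.]
287.1 (23.3); 277.7 (13.9); 272.8 (22.5); 64.7 (10.3); 75.3 (16.2); 74.0 (4.6); 163.9 (19.9); 388.1 (21.1); 60.3 (10.0); 131.9 (14.9); 331.2 (11.7); 102.2 (12.0); 282.6 (15.2); 261.0 (15.6); 341.8 (23.0); 270.0 (33.7); 170.0 (24.9); 122.4 (22.8); 169.6 (38.1); 60.0 (21.8)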

Table 2 shows the data for sentence frame (B) Tsugi-wa (HLL) kusa
(HL). Looking at the table, we see that MN and TA devoiced and accented
/u/ in /kusa/ more often in sentence frame (B) than in sentence frame (A).
KK devoiced and accented all words. For TA, both voiced and devoiced
vowels were found in both sentence frames; however, accent changes occurred more often in (B) than in (A).
The results of the acoustic analysis may be summarized as follows: (1)
Individual differences were observed. Speaker MN tended to produce the
first mora voiced and accented. However, for the younger speakers, KK
devoiced all the first mora vowels, while TA usually, but not always, devoiced and accented them. (2) Speech rate affected vowel voicing and accentedness. In fast speech, devoiced, accented vowels were observed more
often than in natural speech. At a very fast speech rate, not only was the
first mora vowel devoiced, but also the word accent tended to change to
HH. (3) The sentence frame affected the accent patterns. Accent change
was more often observed in frame (B) (accent HLL) than in (A) (accent
HHH). A reason may be that when they spoke at a fast rate, it was more
difficult for speakers to make the necessary laryngeal adjustments to raise
the pitch for the accent HL immediately after the falling tones of Tsugi-wa

4.2. Speech rate and accent

Examples of the effect of speech rate on accent are illustrated in Figure 1,
which shows, for speaker TA, speech waves in the top panels
and F0 contours in the bottom panels of /kusa/ with accent HL in the sentence frames (A) and (B). The examples of sentence frame (A) Kore-wa
kusa (HHH HL) are on the left, and those of (B) Tsugi-wa kusa (HLL HL) on
the right. The vertical broken lines show the end of the sentence frame. The
tokens shown in (1) to (4) were produced at a natural rate, while (5) and (6) were
produced at a very fast rate. The F0 contours of (1) and (2) show that the
first vowels of /kusa/ are voiced, as indicated by the white arrows. The
same vowels in (3) and (4) are devoiced; however, the first moras of the
tokens are perceived to be accented because of the abrupt falling F0 contours on the second vowels (indicated by the black arrows). Although the F0
contours at the end of the carrier sentences in (5) and (6) are rising, suggesting that the following mora has high pitch, the tokens of /kusa/ were
perceived as having an HH accent by three Osaka dialect speakers. The second
vowels of /kusa/ in (5) and (6) are very short and their F0 contours have
nearly level tones.

Figure 1. Speech waves and F0 contours of /kusa/ (HL) 'grass' in sentence frames
(A) Kore-wa kusa (HHH HL) 'This is grass' (1)(3)(5) and (B) Tsugi-wa kusa (HLL HL) 'The next is grass' (2)(4)(6).
(1)(2): the first vowels voiced, accented. (3)(4): with devoiced accented
vowels (natural speech). (5)(6): with accent perceived as HH (very fast
speech). Broken lines: the beginning time points of the words kusa
(speaker: TA).

An acoustic comparison of /kusa/ (HL) and /huta/ with accent HH also provides evidence to support the shift of /kusa/ from HL to HH in very fast
speech.

Figure 2. Speech waves and F0 contours of /huta/ (HH) 'lid' in sentence frames
(A) Kore-wa huta (HHH HH) 'This is a lid' (1)(3)(5) and (B) Tsugi-wa huta (HLL HH) 'The next is a lid' (2)(4)(6).
(1)(2): the first vowels voiced. (3)(4): the first vowels devoiced (natural
speech). (5)(6): the first vowels devoiced (very fast speech).
Broken lines: the beginning time points of the words huta
(speaker: TA).

Figure 2 shows the F0 contours of the words /huta/ with HH accent in the
sentence frames (A) and (B). The tokens in (1) to (4) were spoken at a natural
rate, and those in (5) and (6) at a very fast rate, by speaker TA. The F0 contours
of (1) and (2) show that the first mora vowels are voiced, as indicated by
white arrows. The first mora vowels of (3) and (4) are devoiced. The F0
contours of the second mora vowels are almost level. The second vowels of
/huta/ in (5) and (6) of Figure 2 are similar to those of /kusa/ in (5) and (6) in
Figure 1. All of them had level F0 contours and short durations, and were perceived as having HH accent.
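
The contrast between an abruptly falling and a nearly level F0 contour on the second vowel can be made concrete with a least-squares slope. The sketch below is only an illustration: the threshold of -200 Hz per second, the function names, and the sample values are all hypothetical, and the source does not report a numerical criterion for either contour shape.

```python
def f0_slope(times_s, f0_hz):
    """Least-squares slope of F0 over time, in Hz per second."""
    n = len(times_s)
    if n < 2 or n != len(f0_hz):
        raise ValueError("need two or more paired samples")
    t_mean = sum(times_s) / n
    f_mean = sum(f0_hz) / n
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(times_s, f0_hz))
    den = sum((t - t_mean) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("time points must not all be equal")
    return num / den


def contour_shape(times_s, f0_hz, fall_threshold=-200.0):
    """Label a contour 'falling' if its slope is at or below the
    (hypothetical) threshold, and 'level' otherwise."""
    return "falling" if f0_slope(times_s, f0_hz) <= fall_threshold else "level"


# Hypothetical second-vowel contours: a longer vowel with a steep fall,
# and a very short vowel with almost no F0 movement.
print(contour_shape([0.00, 0.05, 0.10], [260.0, 230.0, 200.0]))
print(contour_shape([0.00, 0.02, 0.04], [250.0, 249.0, 250.0]))
```

A very short vowel contributes few samples over a short interval, which is one reason a fall may not be measurable on it even if the slope criterion is kept fixed.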


5. Results of the physiological experiments

This section uses the results of physiological experiments to investigate
how Osaka accent patterns and devoiced accented vowels are produced in
words like /kusa/, with a close-open vowel sequence, compared with words
like /kusi/, with a close-close vowel sequence. We will also discuss the
question of why accent change occurs in /kusa/ when it is spoken at a very fast
rate.
5.1. Production of Osaka accent

Figure 3 shows the averaged EMG (electromyographic) activities of the CT
(cricothyroid) muscle (thick line) and the SH (sternohyoid) muscle (thin line) in 12
repetitions of /imi/ spoken with four different accent patterns, HH, HL, L-HL, and LH, at a natural speech rate by a female Osaka dialect speaker, YI.
The contours are superimposed on the same horizontal axis. Fundamental
frequency contours were also averaged. Activity of the CT, the muscle
shown to be involved in F0-raising, begins before the starting time point (the
vertical thick line) of words with the High-starting accents (HH and HL). In contrast, SH activity is seen to occur more than 200 msec before L-HL or
LH (the Low-starting accents). SH activity has also been observed corresponding to the pitch fall on the second mora of the words with accent
types HL and L-HL. The activity of SH is physiologically explained in that
it is related not only to jaw opening and tongue back lowering, but also
to voice lowering (Honda et al. 1999). Notice also that when H precedes
L, as in the accent types HL or L-HL, the activity of CT is greater than in
the accent types HH or LH. In particular, the EMG data show that the activity of CT is
greater in HL, where H precedes L, than in HH, where the fundamental frequency is high throughout.
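
Averaging EMG over repetitions presupposes that the repetitions are aligned on a common reference point before the sample-by-sample mean is taken. The sketch below shows such an ensemble average under stated assumptions: each repetition is a plain list of already rectified and smoothed samples, the line-up point is given as a sample index per repetition, and the function name and data are hypothetical. It is not a description of the processing actually used for Figure 3.

```python
def ensemble_average(trials, lineup_indices, before, after):
    """Average trials sample by sample after aligning each on its
    line-up index; the window runs from `before` samples ahead of the
    line-up point to `after` samples from it (exclusive)."""
    segments = []
    for samples, idx in zip(trials, lineup_indices):
        start, stop = idx - before, idx + after
        if start < 0 or stop > len(samples):
            raise ValueError("window exceeds trial length")
        segments.append(samples[start:stop])
    if not segments:
        raise ValueError("no trials given")
    return [sum(column) / len(segments) for column in zip(*segments)]


# Two hypothetical repetitions whose line-up points fall at different indices.
trial_a = [0.0, 1.0, 2.0, 3.0, 4.0]
trial_b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(ensemble_average([trial_a, trial_b], [2, 3], before=1, after=2))
```

Aligning on the line-up point rather than on the start of each recording keeps activity that is time-locked to the word, such as CT activity before a High-starting accent, from being smeared by differences in onset latency.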


Figure 3. Averaged F0 contours, EMG (electromyographic) patterns of CT and
SH for twelve utterances of /imi/ with four accent types HH, HL, L-HL,
and LH, and speech envelopes (subject: YI).

5.2. F0 contours of /kusi/ and /kusa/ with accent HL

When /kusi/ (HL) and /kusa/ (HL) are spoken at a very slow rate, the F0
contours of the two words are not very different from each other. However,
when they are spoken at a natural or fast rate, /kusi/ (with a close-close
vowel sequence) and /kusa/ (with a close-open vowel sequence) have different F0 contours. Figure 4 shows twelve superimposed F0 contours of
/kusi/ and /kusa/, respectively. Dotted lines were interpolated through the
period of /s/ from the end of V1 to the start of V2. Speaker YI spoke at a
natural speech rate during the physiological experiment. Here, the F0 contours of (1) /kusi/ and (2) /kusa/ are quite different from each other. In
/kusi/ (1), the F0 contours begin to fall in the vicinity of the end of the first
vowel, while in /kusa/ (2) they begin to fall at the beginning of the second
vowel, as indicated by the black arrows. The initial vowel in /kusi/ is
voiced, while that in /kusa/ is devoiced (except in one token whose second
vowel starts a little lower compared with the other falling contours).
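
The dotted lines through the voiceless /s/ can be thought of as an interpolation between the last voiced frame of V1 and the first voiced frame of V2. The sketch below assumes linear interpolation over a frame-based F0 track in which unvoiced frames are marked with None; the method, the function name, and the values are assumptions, since the source does not state how the interpolation was carried out.

```python
def interpolate_gap(f0):
    """Linearly fill runs of None that are bounded by voiced frames on
    both sides; leading and trailing unvoiced runs are left as None."""
    filled = list(f0)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            j = i
            while j < n and filled[j] is None:
                j += 1
            if i > 0 and j < n:
                left, right = filled[i - 1], filled[j]
                span = j - (i - 1)
                for k in range(i, j):
                    filled[k] = left + (right - left) * (k - (i - 1)) / span
            i = j
        else:
            i += 1
    return filled


# Hypothetical track: voiced V1 frame, two unvoiced /s/ frames, voiced V2 frame.
print(interpolate_gap([200.0, None, None, 170.0]))
```

When the first vowel is itself devoiced there is no voiced frame on the left of the gap, so nothing is filled in and the contour effectively starts at the onset of the second vowel.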

Figure 4. Superimposed F0 contours of (1) /kusi/ (HL) and (2) /kusa/ (HL), twelve
utterances each. Dotted lines: interpolated through the period of /s/ from
the end of V1 to the start of V2. Arrows: the starting time points of falling F0 contours (speaker: YI).

5.3. Production of different F0 contours in /kusi/ and /kusa/

Figure 5 shows averaged F0 contours, cricothyroid (CT) muscle activity,
sternohyoid (SH) muscle activity, and the speech amplitude of (1) /kusi/
(close-close vowel sequences) and (2) /kusa/ (close-open vowel sequences)
with devoiced, accented first vowels. The contours represent the average of
12 and 11 repetitions, respectively. The vertical thick lines mark the onset
point of the second vowel.


Figure 5. Averaged F0 contours, EMG (electromyographic) patterns of CT and SH, and
speech amplitudes of (1) /kusi/ for twelve utterances, and (2) /kusa/ for
eleven utterances with devoiced accented vowels. Arrows: the starting
time points of F0 fall (subject: YI).

(1) /kusi/ with accent HL: The F0 contour of the first vowel of /kusi/ is high,
while the second vowel starts with a relatively low frequency. CT activity
begins prior to the onset of the first vowel, which presumably accounts for
the high F0 of the first vowel. During the first vowel, only the CT is active
and SH activity is almost absent. SH activity begins prior to the onset of the
second vowel. The end of CT activity and the beginning of SH activity
occur at the same time at the end of the first vowel, as indicated by the broken vertical line. Activity of SH is associated with the low F0 of the second
vowel.
(2) /kusa/ with accent HL: This figure shows the F0 contour and the CT and
SH patterns of the word with a devoiced accented vowel. The F0 contour of
the vowel following the devoiced mora starts high and then drops sharply.
It is notable that the CT peak (indicated by the white arrow) is observed
at the time it would occur if the first vowel were voiced, the same time
point as observed in /kusi/. This suggests that the command for raising F0
was issued for the first vowel, even though it was devoiced. An additional
peak of CT activity (where the small black arrow points) is also observed in
/kusa/. Notice that the second CT peak begins before the high
starting F0 of the second vowel /a/. In /kusi/, the onset of the second vowel
/i/ has a low F0, and correspondingly, there is no second CT activity associated with the second vowel /i/. In /kusa/, the onset of SH activity co-occurs with the second
CT activity. The SH activity is associated
with the F0 fall on the second vowel. All eleven utterances of the word /kusa/
showed a similar pattern.

5.4. Discussion of the results of the physiological experiments

Differences were found in the F0 contours of /kusa/ and /kusi/ with the
same accent HL. The second vowel /a/ of /kusa/ showed a much steeper F0
fall than was observed in the second vowel /i/ in /kusi/. The second activity
of CT and following greater SH activity continued during /a/ of /kusa/. A
possible explanation for this may be that contraction of SH is involved in
hyoid-larynx lowering, jaw opening, and tongue backing instrumental for
F0 lowering (Honda et al. 1999). As for /kusa/ with the first vowel voiced in
Figure 4 (2), F0 does not begin to fall at the end of the first vowel, but at the
beginning of the second vowel. F0 fall on the following second vowel
causes the first vowel to be perceived as accented. The result is different
from what was discussed before, in which stress on the first vowel caused
the first mora to be heard as accented. Moreover, timing relationships between prosodic and segmental control are different according to the differences in vowel height (close-open vs. close-close) in these words (Sugito
2003). Speech rates also affect these differences.
Activity of CT was found for the accented first vowel even though it
was devoiced. This observation suggests that the first vowel in /kusa/ is
accented, even though there is no voicing during the vowel. The second set
of activities of CT, and the following activities of SH may be related to the
high starting F0 and abrupt falling contours of the second vowel /a/ of
/kusa/. However, when the words were pronounced very fast, the HL accent
of /kusa/ was changed to accent HH, as shown in Figure 1 (5) and (6).
There may not have been enough time for the muscle commands of SH to
bring about an abrupt falling F0 contour in the second vowel. Natural or
rather fast speech rate is necessary for pronunciation of words with devoiced accented vowels. It may be the case that in very fast speech there is
no SH activity associated with these second vowels. The vowels that follow
devoiced accented vowels need to have adequate length in order to allow
for a falling F0 contour to occur.

6. Summary
This paper examined the acoustic and physiological characteristics of voicing
and accent changes in Osaka dialect words at different speech rates. Individual differences were observed. When speakers spoke at a relatively fast
rate, devoiced, accented vowels were produced more frequently. Moreover,
at a very fast rate, the HL accent was often changed to a HH accent. Laryngeal activities for the devoiced, accented vowels in /kusa/ were compared
with those for the voiced accented vowels in /kusi/. CT activity in devoiced
accented /kusa/ was found to occur at the time it would have occurred if the
vowels were voiced. This observation strongly suggests that the devoiced
vowels were not only perceived as accented, but were also produced as
accented.
With regard to the laryngeal activity for vowels in the second mora of the
words, the second peak of CT activity was associated with a high starting
F0, and the following SH activity with a fall in F0. These joint activities
may be involved in the resulting steep falling F0 contour following the devoiced accented vowels. The accent change found in very fast speech might
be due to the short duration of the second vowels. That is, we might conjecture that since the vowels were short, there was no time for the SH to
become active, and consequently, no F0 fall occurred on these short vowels
spoken in very fast speech. We hope that additional physiological experiments with natural, fast, and very fast speech, using MRI, will provide
further insight into how this accent change occurs.

The author would like to express her gratitude to Professors Donna Erickson,
Raymond Weitzman, and Jeroen van de Weijer who kindly provided comments on this paper.

Where voicing and accent meet: their function, interaction, and opacity problems in phonological prominence
Shin-ichi Tanaka

1. Introduction
This study is devoted to rethinking the function and interaction of voicing
and accent from the perspective of prominence and to tackling phonologically significant issues in their interaction. Our ultimate goal is to shed new light
on their interaction within a general theory of prominence that involves the
harmonic scale of accent, tone, sonority, and voicing, and to solve certain
problems observed in the accentual phenomena on devoiced vowels of
Japanese. Specifically, we are concerned with the issue of what happens
when a vowel that should bear accent is in exactly the position that should
be devoiced. This situation causes various problems because accent and
devoicing are incompatible in principle but turn out to be sometimes compatible in the phonological grammar of Japanese.
Let us review the historical background and motivation of our study.
There have been many phonetic studies on vowel devoicing and its relation
to accent in Japanese, and some researchers in this field are contributing
their recent findings to the vowel voice part of the present book (Sugito,
this volume). Compared to the abundance of phonetic literature on this topic,
little attention has been paid to a phonological account of what happens
when an accented vowel is devoiced. Yet we can find some theoretical
work in the metrical framework, such as Yamada (1990), Haraguchi
(1991), Tanaka (1992), and Yokotani (1997), which agree, on the basis of
the descriptive literature (NHK, ed. 1998; Akinaga, ed. 2001), that accent
can either remain on a devoiced vowel or shift to an adjacent vowel. However, the optionality and directionality of accent shift are so complicated
that derivational analyses such as those above are problematic in their empirical coverage and do not explain the cases of accent shift beyond metrical
constituents, as Yokotani (1997) and Tanaka (2002a) point out. Derivational accounts also pose the fundamental question as to how accent shift
from a devoiced vowel is represented phonologically in the first place.
Their usual assumption is that a devoiced vowel loses its capacity to act as
an accent-bearer and that the loss of the capacity is expressed phonologically by deleting the syllable node of the devoiced vowel, which triggers
accent shift to an adjacent landing site.
In section 3.1, it will turn out that an accented devoiced vowel still has a
syllable node, while an accent-losing devoiced vowel does not, even though
they are equally devoiced. Even if it is true that a devoiced vowel loses its
original syllable node in the case of accent shift, it is still unclear what happens to the floating devoiced vowel with its voiceless onset that has lost a
syllable node: floating segments are erased by convention, but the devoiced
vowel may not be deleted, because vowel devoicing is distinct from vowel
deletion (e.g., sentakki  sentáku̥ki / sentákki 'washing machine' and
suizokkan  suizóku̥kan / suizókkan 'aquarium'). That is, the syllable
node should be deleted to cause accent shift but it should be preserved to
make the distinction between vowel devoicing and vowel deletion, which
makes the situation fall into a dilemma. All of these problems follow from
the fact that previous phonological accounts do not uncover the essential
nature of accent and voicing or make clear the exact phonological mechanism of the optionality of devoiced accent and accent shift.
A key to solving such problems lies in reconsidering accent and voicing
from a much wider perspective before focusing on the specific issues of
devoiced accent and accent shift. This is because, as we will show, accent
and voicing equally count as phonological prominence, and a general theory
of prominence that involves tone, length, and sonority as well as accent and
voicing allows us to understand their fundamental nature. Such a theory
also serves to account for why accent quite often interacts with tone, length,
sonority, and voicing. There have been systematic studies on the relation
between accent and one of these four even in the recent OT literature; for
example, accent and tone in de Lacy (1999), accent and sonority in
Kenstowicz (1994b) and de Lacy (2001), and accent and length (syllable
quantity) in Prince & Smolensky (1993) and other work on quantity sensitivity. However, these studies do not aim at elucidating the issues in the
general context of prominence. Hayes (1995), by contrast, is the first to attempt to construct an integrated theory of syllable prominence in which
accent placement is sensitive to tone, length, and sonority. Unfortunately,
however, his theory does not make clear the mutual relations between any
two of the concepts of tone, length, sonority, and voicing, but only pays
attention to the sensitivity of accent to the four elements of prominence.
Furthermore, it lacks restrictiveness in that the relation of accent to length
can be represented either in metrical grids or in prominence grids. Thus, a
comprehensive and coherent theory is necessary that can account for the
whole system of the relations and interactions among the elements of prominence.
To develop such a theory and tackle the specific problems with accent
and voicing, we will take the following steps. First, in section 2.1, we will
show the general schema of our theory and demonstrate how it works with
the harmonic scale of phonological prominence, where the overall interrelations among the elements of prominence will be made clear. Section 2.2
will focus on our particular interest, i.e., the interaction between accent and
voicing, and we will realize the exact nature of their interaction: the harmonically-complete relation (implicational markedness) between them. We
will observe, however, that the harmonic completeness in this implicational
relation does not hold when we take the accentual phenomena of devoiced
vowels in Japanese into consideration, which will be discussed in section
3.1. Then, we will show that they even raise an opacity problem in phonology, which is so serious that we cannot find any solution in the derivational
framework. Instead, as we will argue in section 3.2, the notion of sympathy
in Optimality Theory can give a very simple account of the phenomena,
and the problems of harmonic incompleteness and opacity can be resolved
within that framework in a fairly convincing way.


2. A general theory of prominence

2.1. Completeness in the harmonic scale of prominence

As Hayes (1995: 271) notes, "[h]eavy syllables, or syllables with high tone,
or syllables with low vowels, and so on, tend to sound louder than other
syllables." We can also add voicing to this category, because vowels in the
syllable nucleus are usually voiced, and syllables with voiced vowels do sound louder than syllables with
voiceless vowels, as we see by comparing normal speech with whisper.
Here, what sounds louder in production is perceptually more salient at the
same time, so phonological elements such as tone, length, sonority, and
voicing can be said to serve as prominence just like accent. Unlike these,
there is another type of prominence that may be called positional prominence, that is, syllable onsets, root-initial syllables, word-initial syllables, etc.
(J. Beckman 1998, de Lacy 2001, among others), but our particular interest
here is in phonetically-driven prominence or inherent prominence which
corresponds to a specific articulatory means to improve perceptual salience:
for example, voicing corresponds to vibration of the vocal cords, sonority
to aperture, tone to pitch, etc. Positional prominence does not involve its
own specific articulatory resource.
To make the discussion clearer, let us consider the phonetic factors or
resources of the elements of prominence by using the chart in (1):

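(1) Articulatory resources of phonological prominence

    Element         Articulatory resources
    voicing         vibration                                          (least prominent)
    sonority        vibration, aperture
    tone, length    vibration, aperture, pitch, duration
    accent          vibration, aperture, pitch, duration, intensity    (most prominent)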

As is indicated in (1), voicing is the least prominent element, sonority is the
next, tone and length are equally more prominent than the two, and the
most prominent is accent. This is because more phonetic factors (or more
articulatory effort) accumulate in a more prominent element. For example,
voicing involves only vibration of the vocal cords, and sonority concerns
both vibration and aperture. Furthermore, tone includes high pitch with
vibration and aperture, and length refers to the duration of both. Finally,
accent (including stress) involves intensity as well as the four articulatory
resources, and all of these factors contribute to loudness (here, I use the
term accent to mean both stress accent and pitch accent by definition).
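
The accumulation of articulatory resources described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the dictionary name `RESOURCES`, the helper names, and the decision to list tone and length separately (tone with pitch, length with duration) are hypothetical choices made for the sketch, not part of the analysis itself.

```python
# Articulatory resources per prominence element, following the prose
# description of chart (1). Splitting tone and length into separate
# entries is an assumption made for this sketch.
RESOURCES = {
    "voicing": {"vibration"},
    "sonority": {"vibration", "aperture"},
    "tone": {"vibration", "aperture", "pitch"},
    "length": {"vibration", "aperture", "duration"},
    "accent": {"vibration", "aperture", "pitch", "duration", "intensity"},
}


def prominence_rank(element):
    """More accumulated resources means a more prominent element."""
    return len(RESOURCES[element])


def implies(a, b):
    """True if element a draws on every resource that element b draws on."""
    return RESOURCES[b] <= RESOURCES[a]


# Elements ordered from least to most prominent (ties keep listing order).
scale = sorted(RESOURCES, key=prominence_rank)
```

On this encoding, `implies("accent", "voicing")` holds while `implies("voicing", "accent")` does not, which mirrors the one-way implication discussed in the text.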
The idea of prominence has several advantages and elucidates the fundamental characteristics of the elements of prominence. First, it correctly captures the implicational relation among the prominence elements: sonorant
segments are voiced segments but not vice versa (e.g., voiced obstruents);
high-toned segments are always sonorants, i.e., vowels or sonorant consonants, but it is not always the case that sonorants bear high tone; and accented vowels have higher pitch (i.e., high tone) and longer duration, but
high-toned vowels or long vowels do not always have accent. Second, this
implicational relation also captures typological differences in prosody: the
intensity of accent in a word does not matter in true tone languages such as
Chinese, but pitch is important in both pitch-accent languages like Japanese
and stress-accent languages like English (the difference between the two languages lies in whether pitch gets phonologized as "tone" or remains somewhat phonetic as "tune" in the realization of pitch contour). Third, the chart in
(1) accounts for what Hayes (1995: 7) calls the parasitic nature of stress,
which refers to the fact that stress parasitically invokes phonetic resources
that serve other phonological ends. This point is clear from (1), because in
accent or stress, all the articulatory resources are put together to realize
loudness in speech production.
The phonetic characterization of prominence elements in (1) can phonologically be represented as the Harmonic Scale of Prominence in (2),
where A ⊃ B indicates that B is a proper subset of A, and A > B means A is
more prominent than B:

(2) Harmonic Scale of Prominence

a. voicing ⊃ sonority ⊃ tone ⊃ accent
b. accent > tone > sonority > voicing

(2a) shows a harmonically-complete system of prominence where an element always implies the existence of any element(s) to the left. Especially,
accent presupposes the existence of all the elements in the scale, so accent
often interacts with tone, sonority, and voicing in realizing prominence, as
will be discussed below. Note here that length is not incorporated into this
phonological system. This is because an accented vowel is indeed phonetically longer than an unaccented one but is not necessarily a long vowel in
phonology, although a long vowel tends to attract accent in quantity-sensitive languages. The aspect of quantity sensitivity is captured by the constraint of WEIGHT-TO-STRESS or more strictly speaking, PEAK-PROMINENCE
(Prince & Smolensky 1993) and syllable quantity is a different concept
from syllable prominence (Hayes 1995: 270–273). So, in what follows, we
will just consider syllable prominence in line with the scale in (2) and exclude syllable quantity (i.e., length) from our discussion.1
Now let us look at the interplay of accent with the other prominence
elements. As shown in (2), accent implies tone, sonority, and voicing, and
there is a good possibility that accent has an effect on, or is influenced by,
these elements in pitch-accent and stress-accent languages, where accent
works together with the other prominence elements in order to enhance syllable prominence or the culminativity of a word. It is a kind of conspiracy
effect of prominence elements. Such cases are classified into accent-conditioned prominence and prominence-conditioned accent (Tanaka 2005): in
the former case, the behavior of tone, sonority, and voicing is sensitive to
accent position, whereas in the latter, accent placement is sensitive to the
other elements. Since accent is the most dominant element in the scale of
prominence, it can generally be said that we can find the former case more
easily than the latter, due to the dominant nature of accent. This dominance
relation is related to the prosodic hierarchy: voicing is a segmental property, sonority and tone are the properties in the domain of syllable, and accent is characterized by the upper prosodic category, foot.2
For example, in various pitch-accent languages, including Japanese,
tone patterns are determined by connecting the accented mora to the high
tone, as in (3a), but there are only a few languages where accent placement
relies on the tone patterns, as in (3b):

(3) Interaction of accent and tone

a. Accent-conditioned tone (Japanese)
   kokóro 'heart', kámakiri 'mantis', habúrasi 'toothbrush',
   otootó 'little brother', niwakaáme 'sudden rainfall'
   [tone tiers and association lines are not recoverable from the source]
b. Tone-conditioned accent
   (Lithuanian, from Halle & Vergnaud 1987)
   viras 'man', Vislas 'Vistula', vinas 'wine', viksmas 'course'
   [tone tiers, association lines, and diacritics are not recoverable from the source]


The tone patterns in Japanese are derived by linking accent to H, and then
the preceding and following moras are assigned H and L, respectively, with
the proviso that the unaccented initial mora is always L (Haraguchi 1991).
On the other hand, a long vowel in Lithuanian may either have acute (HL)
or circumflex (LH) tone, and accent falls on the first mora linked to H
(Hayes 1995). In both languages, accent and tone cooperate to highlight
prominence in a word. This conspiracy effect is also seen in (4), where accent and sonority agree in prominent position:

(4) Interaction of accent and sonority

a. Accent-conditioned sonority (English)
   Japán [æ] / Japanése [ə]      cónduct [ɑ] / condúctive [ə]
   Ítaly [ə] / Itálian [æ]       áccident [ə] / accidéntal [e]
b. Sonority-conditioned accent (Winnebago, from Susman 1943)
   hiira 'more', aahia 'a deerskin', hagoria 'sometimes',
   gipisge 'enjoyable', herinaga 'he was and', haguhi 'go to get'
   [accent marks and special characters in the Winnebago forms are not recoverable from the source]

(4a) shows that the loss of accent causes vowel reduction and, conversely, the
acquisition of accent turns schwas into full vowels; that is, sonority is based on accent
placement.3 (4b) is the opposite case, where accent placement is based on
vowel sonority. Winnebago, a Siouan language, has the sonority hierarchy
of a > o > u > e > i; when accent is assigned on a diphthong, it falls on the
more sonorous vowel (Susman 1943). Although Hayes (1995: 15) reports a
few other cases of sonority-conditioned accent, they are relatively restricted
in number.
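
The Winnebago pattern just described can be illustrated with a small sketch. The function and variable names are hypothetical, the numeric values merely encode the order a > o > u > e > i, and the treatment of two identical vowels (accent on the first) is an assumption that the source does not address.

```python
# Sonority hierarchy of Winnebago vowels as stated in the text:
# a > o > u > e > i. The numbers only encode that order.
SONORITY = {"a": 5, "o": 4, "u": 3, "e": 2, "i": 1}


def accented_vowel_index(diphthong):
    """Return 0 or 1: the position of the more sonorous vowel.

    `diphthong` is a two-character string of plain vowels. When both
    vowels are equally sonorous, position 0 is returned (an assumption).
    """
    first, second = diphthong
    return 0 if SONORITY[first] >= SONORITY[second] else 1
```

For instance, `accented_vowel_index("ia")` picks the second vowel, since a outranks i on the hierarchy.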
The fundamental characteristics of such interactions as in (3) and (4) are
also seen in the case of accent and voicing. In the next section, we will present our main concern, accent and voicing, on the basis of the harmonic
scale in (2).
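
Looking back at the accent-conditioned tone pattern of Japanese in (3a), the assignment procedure summarized above (the accented mora is linked to H, preceding moras receive H, following moras receive L, and an unaccented initial mora is always L) can be sketched as follows. The function name and the zero-based mora index are our own conventions, and unaccented words are deliberately left out of the sketch.

```python
def tone_pattern(n_moras, accent):
    """Derive a tone string for an accented word.

    n_moras: number of moras in the word.
    accent:  zero-based index of the accented mora.
    """
    if not 0 <= accent < n_moras:
        raise ValueError("accent must index a mora of the word")
    # H up to and including the accented mora, L afterwards.
    tones = ["H" if i <= accent else "L" for i in range(n_moras)]
    # Proviso: an unaccented initial mora is always L.
    if accent != 0:
        tones[0] = "L"
    return "".join(tones)
```

With three moras and accent on the second mora, as in the word for 'heart' in (3a), the sketch yields LHL; with four moras and initial accent, as in the word for 'mantis', it yields HLLL.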

2.2. Interaction of accent and voicing

Let us briefly review the findings of the previous section. A crucial point is
that the prominence scale in (2) is a harmonically-complete system in which
implicational relations always hold among the elements of prominence in
phonetic, phonological, or typological respects. As for accent and voicing,
it is predicted from the prominence scale that i) accent implies voicing but
not vice versa, ii) in pitch-accent or stress-accent languages, there may be a
conspiracy effect that accent and voicing cooperate to enhance syllable
prominence, and iii) accent-conditioned voicing is more dominant than
voicing-conditioned accent. The prediction in i) is quite natural, since there
is no doubt that accent always falls on a vowel within a word and a vowel
is a voiced and sonorant segment, while it is not necessarily the case that
any vowel bears accent in a word. As for the predictions in ii) and iii), accent-related voicing phenomena are given in (5) and (6), which are accent-conditioned voicing and voicing-conditioned accent, respectively. Examples
are taken from English and Japanese to show that both stress-accent and
pitch-accent languages exhibit the phenomena in accordance with ii).

(5) Accent-conditioned voicing
a. /ks/-Voicing (English)
   éxecute [ks] / exécutive [gz]     exhíbit [gz] / exhibítion [ks]
b. /s/-Voicing (English)
   tránsit [s] / transítion [z]      prósody [s] / prosódic [z]
c. Blocking of /l/-Devoicing (English, from Hayes 1995)
   Íceland [l̥] / Icelándic [l]       áthlete [l̥] / athlétic [l]
d. Blocking of high vowel devoicing (Japanese)
   hísu 'hysteria'   písu 'piss'   híhi 'baboon'   súsu 'soot'   syúsi 'seed'
   kutú 'shoes'      kusí 'comb'   kikú 'daisy'    tutí 'soil'   hukú 'clothes'

(6) Voicing-conditioned accent (Sino-Japanese, from Tanaka 2002a)4

   bú-ka 'subordinate'     *hú-ka / hu̥-ká 'failure'
   zí-ken 'affair'         *sí-ken / si̥-kén 'exam'
   kí-zoku 'noble'         *kí-soku / ki̥-sóku 'rule'
   zí-ki 'period'          *sí-ki / si̥-kí 'four seasons'

Voicing in (5a, b) applies when the syllable in question acquires accent,
just as in Verner's Law (Halle 2003 and references therein), so this is
clearly a case of the conspiracy of accent and voicing. In (5c), /l/ is devoiced when it is located between /s, θ/ and a vowel, but devoicing is
blocked when the following vowel is accented, which means that the accented vowel should be prominent together with the preceding voiced consonants, just as in (5a, b).5 Blocking of devoicing in the presence of accent is
also seen in Japanese. As in kisetu 'season', kisoku 'rule', and sisetu 'facilities', the high vowels /i, u/, which have the lowest sonority among vowels,
usually undergo devoicing when flanked by voiceless consonants or preceded by a voiceless consonant in word-final position. However, the words
in (5d), whose vowels are both devoiceable in principle, show that accent
bans its application (devoicing does not occur on consecutive syllables, as
we will argue in section 3.2). By contrast, the examples in (6) are cases
where accent position is controlled by the application of devoicing: Sino-Japanese words should have accent immediately before the final morpheme, as in
bú-ka 'subordinate', zí-ken 'affair', etc., but for the words in the right-hand
column, devoicing must apply and accent automatically shifts to the adjacent syllable. Note that the blocking effect of devoicing is also seen on the
landing site of the final example, si̥-kí 'four seasons'.6 This is because adjacent syllables are usually not devoiceable, as stated above.
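
The devoicing environment and its two blocking effects (accent, and adjacency of devoiced syllables) can be made concrete with a rough sketch. Everything about the encoding is an assumption made for illustration: words are lists of (onset, vowel) pairs for simple CV moras only, the consonant inventory is a small hypothetical subset in the romanization used here, and the left-to-right resolution of adjacent devoiceable syllables is our own simplification.

```python
# Hypothetical inventory of voiceless onsets in the romanization used here.
VOICELESS = {"k", "s", "t", "h", "p", "ky", "sy", "ty", "hy", "py"}
HIGH_VOWELS = {"i", "u"}


def devoiceable(moras, i):
    """A high vowel after a voiceless onset, followed by a voiceless
    onset or standing in word-final position."""
    onset, vowel = moras[i]
    if vowel not in HIGH_VOWELS or onset not in VOICELESS:
        return False
    if i == len(moras) - 1:
        return True
    return moras[i + 1][0] in VOICELESS


def devoiced_positions(moras, accent=None):
    """Indices of devoiced vowels; accent and adjacency block devoicing."""
    result = []
    for i in range(len(moras)):
        if i == accent:
            continue  # accent blocks devoicing
        if result and result[-1] == i - 1:
            continue  # no devoicing on consecutive syllables
        if devoiceable(moras, i):
            result.append(i)
    return result
```

For the word for 'shoes' in (5d), encoded as `[("k", "u"), ("t", "u")]` with accent on the second mora, only the first vowel is returned; the accented vowel is skipped even though it sits in a devoicing environment.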
What (5) and (6) have in common is that accent and voicing conspire to
maximize syllable prominence. Especially, the presence/absence of accent
and voicing must target the same vowel, because accent implies voicing in
the harmonically-complete system of prominence.
Finally, the following are interesting cases with the interaction between
accent and voicing, where compound accent and Rendaku voicing are in
complementary distribution and the presence of one of them is necessary
and sufficient for word prominence (Zamma, this volume, also discusses
this point):

(7) Names of islands with sima 'island' (from Tanaka 2003a, 2005)7

a. Rendaku without accent
   sakura-zima   miyako-zima   isigaki-zima   iriomote-zima   iou-zima
b. Accent without Rendaku
   itukú-sima   syoudó-sima   awazí-sima   tanegá-sima   okinó-sima

(8) Personal names with saburou 'the third son' (from Haraguchi 2002)8

a. Rendaku without accent
   nin-zaburou   ken-zaburou   dai-zaburou
b. Accent without Rendaku
   yo-sáburou   ki-sáburou   tama-sáburou   tomi-sáburou

This complementary distribution of accent and voicing can be given a plausible account in our prominence theory. The domain of prominence in these
cases is the whole word, not the syllable as in the previous cases, and one
word prominence is necessary and sufficient as the basic nature of culminativity of a word. It may be the case that either of them functions as the
prominence that marks the boundary of a compound.

3. The optionality of devoiced accent and accent shift

3.1. Incompleteness, opacity, and problems in derivational theory

We have seen in the previous section that accent shift in (6) applies so as to
avoid the devoiced vowels, because accent must agree with voicing according to the implicational relation in the prominence scale. In that sense, accent shift is a kind of repair strategy for upholding harmonic completeness
in prominence.
Actually, however, it is a well-known fact that accent can remain on a
devoiced vowel as well as shift to the adjacent vowels (NHK, ed. 1998;
Akinaga, ed. 2001). Moreover, a careful investigation of various data
makes us realize that the movement and directionality of accent is very
complicated, as is clear from the examples in (9):

(9) Optionality and directionality of accent shift

a. Rightward shift across feet
   (kí̥)-(sya) / (ki̥)-(syá) 'reporter'      (kí̥)-(so) / (ki̥)-(só) 'basics'
   (sí̥)-(ken) / (si̥)-(kén) 'exam'          (kí̥)-(kai) / (ki̥)-(kái) 'machine'
   (kí̥)-(soku) / (ki̥)-(sóku) 'rule'        (bou)(sí̥)-(kake) / (bou)(si̥)-(káke) 'hat rack'
b. Leftward shift across feet
   (nana)-(hú̥)(sigi) / (naná)-(hu̥)(sigi) 'seven wonders'
   (nana)-(hí̥)(kari) / (naná)-(hi̥)(kari) 'seven lights' (influence of parents)
c. Leftward shift within a foot
   (bi)(zyutú̥)-(kan) / (bi)(zyútu̥)-(kan) 'art museum'
   (dai)(gakú̥)-(sei) / (dai)(gáku̥)-(sei) 'university student'
d. Rightward shift within a foot
   (sita)-(kú̥ti)(biru) / (sita)-(ku̥tí)(biru) 'lower lip'
   (syoku)(mu)-(sí̥tu)(mon) / (syoku)(mu)-(si̥tú)(mon) 'police check up'
e. Adjacent devoiceable syllables
   (sí̥)-(ki) / (si̥)-(kí) 'four seasons'    (kí̥)-(ti) / (ki̥)-(tí) 'base'
   (hú̥)-(ki) / (hu̥)-(kí) 'no return'       (syú̥)-(ki) / (syu̥)-(kí) 'alcoholic smell'

Here, feet are already assigned to each word for expository purposes, following the analysis in Tanaka (2001, 2002b): accent is placed by constructing
bimoraic feet from right to left without crossing morpheme boundaries, and
it basically falls on the penultimate foot of each word.
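
The footing procedure just described can be sketched as follows. This is a minimal illustration rather than the analysis of Tanaka (2001, 2002b) itself: the input format (each morpheme as a list of syllables paired with mora counts), the function names, and the assumption that a foot never splits a syllable are ours, and the sketch identifies only the penultimate foot, not the accented mora within it.

```python
def build_feet(morphemes):
    """Build feet of at most two moras from right to left inside each
    morpheme, never crossing a morpheme boundary.

    morphemes: list of morphemes, each a list of (syllable, mora_count).
    Returns the feet as strings in left-to-right order.
    """
    feet = []
    for morpheme in morphemes:
        morpheme_feet = []
        current, weight = [], 0
        for syllable, moras in reversed(morpheme):
            # Close the current foot if adding this syllable would
            # exceed two moras (syllables are kept intact by assumption).
            if current and weight + moras > 2:
                morpheme_feet.append("".join(reversed(current)))
                current, weight = [], 0
            current.append(syllable)
            weight += moras
        if current:
            morpheme_feet.append("".join(reversed(current)))
        feet.extend(reversed(morpheme_feet))
    return feet


def penultimate_foot(feet):
    """The foot on which accent basically falls, if there is one."""
    return feet[-2] if len(feet) >= 2 else None
```

Given the two parts of 'art museum' in (9c) as `[[("bi", 1), ("zyu", 1), ("tu", 1)], [("kan", 2)]]`, the sketch returns the feet bi, zyutu, kan, with zyutu as the penultimate foot, in line with the bracketing shown in (9).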

What is crucial is the very fact that the non-shifted variants allow accent to
fall on devoiced vowels, a harmonically-incomplete behavior of accent and
voicing. For that matter, vowel devoicing itself is a very strange phenomenon in the first place, since sonorants, including vowels, are preferably
voiced in phonology, which is another aspect of harmonic completeness in
prominence (see also note 1).
There might be a possibility that devoicing is a phonetic rule outside the
grammar, but we do not adopt this idea because, as we will see below, devoicing and its correlation to accent can be phonologized in a constraint-based grammar. Phonetically, pitch cannot be implemented without the
vibration of the vocal cords, and yet the fact that accent stands in the devoiced environment clearly shows that phonetics and phonology are different. In fact, accent should be an abstract entity. This incompleteness suggests
that the Harmonic Scale of Prominence in (2) and its related constraints
may be outranked by the system of compound accent. In other words, compound accent may be respected at the cost of harmonic completeness, or
otherwise accent shift applies by giving priority to harmonic prominence
over compound accent. We will put forward such an account in the next section.
In addition to harmonic incompleteness, the devoiced accent also poses
another problem with phonological analysis, viz., opacity. More exactly,
harmonic incompleteness may stem from the opacity concerned. As illustrated in (10a), in derivational terms, accent-shifted forms are obtained in
the feeding order of devoicing and accent shift, with compound accent preceding them. The resulting forms are transparent:
(10) Non-surface-true opacity

a. Feeding order (transparent outputs)
   Compound Accent → Devoicing → Accent Shift
   (ki̥)(sóku), (bi)(zyútu̥)(kan)
b. Counter-feeding order (opaque outputs)
   Compound Accent → Accent Shift → Devoicing
   (kí̥)(soku), (bi)(zyutú̥)(kan)

On the other hand, non-accent-shifted forms are derived in the counterfeeding order of accent shift and devoicing. In this case, accent shift does
not apply because there is no trigger or environment there. However, devoicing applies after that, so the resulting forms exhibit non-surface-true
opacity, or underapplication of accent shift. We have seen in (9) that both
transparent and opaque outputs are acceptable in the Japanese accentual
grammar. So not only is some theory necessary to obtain opaque outputs, but
also the optionality of accent shift must be accounted for within that theory.
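
The difference between the two rule orders can be traced step by step in a short sketch. The encoding is hypothetical: compound accent is taken as already assigned in the input, and both the devoicing targets and the landing site of accent shift are supplied by hand rather than computed, since the sketch is only meant to show how ordering alone yields a transparent or an opaque output.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    moras: list
    accent: int  # index of the accented mora (compound accent applied)
    devoiced: set = field(default_factory=set)


def devoicing(form, targets):
    """Mark the given vowel positions as devoiced."""
    return Form(form.moras, form.accent, form.devoiced | set(targets))


def accent_shift(form, landing):
    """Move accent away from a devoiced vowel; otherwise do nothing."""
    if form.accent in form.devoiced:
        return Form(form.moras, landing, set(form.devoiced))
    return form


def derive(form, order, targets, landing):
    """Apply the named rules in the given order."""
    for rule in order:
        if rule == "devoicing":
            form = devoicing(form, targets)
        elif rule == "accent_shift":
            form = accent_shift(form, landing)
    return form
```

Starting from `Form(["ki", "so", "ku"], accent=0)` with the first vowel as devoicing target and the second mora as landing site, the order devoicing before accent shift moves the accent, whereas the reverse order leaves the accent on a vowel that is devoiced only afterwards, which is the underapplication discussed above.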
Unfortunately, derivational theory does not serve to do the job. The
opaque forms (kí̥)(soku) and (bi)(zyutú̥)(kan) in (10b) would be difficult to
obtain, precisely because a condition that bans accent on devoiced vowels,
which is a trigger of accent shift in (10a), should not be violated in derivational theory. Recall that in derivational theory, constraints are universal
and inviolable in any case. More crucial cases involve accent shift across
constituent boundaries. Consider the following examples, where the vowel
in question loses its syllable as a consequence of devoicing, because accent-bearing elements are syllables in Japanese: 9
(11) Accent shift under derivational theory
a. Leftward shift
                                 ( *)
   (*) (. *) <(*)>               (*) (*) <(*)>
   bi  zyutu  kan        →       bi  zyutu  kan
b. Rightward shift
   (*) <(. *)>                   (*) <(. *)>
   ki   soku             →       ki   soku
c. Cancellation of extrametricality
   ( *)
   (*)(. *)                      (. *)
   ki soku               →       *ki soku
d. Leftward shift
   (. *) (*) <(. *)>             (. *) (*) <(. *)>
   nana  hi   kari       →       nana  hi   kari
   [grid alignment and the accent and devoicing marks on the forms are only partly recoverable from the source]

(11a) shows how leftward shift occurs after devoicing. We can obtain the
correct form by deleting the syllable on the devoiced vowel. But a problem
occurs with the rightward shift in (11b): the head of the foot cannot go out of
the domain after syllable deletion, and even worse, the final foot is invisible,
so accent shift is not predicted. Even if extrametricality were canceled before the application of devoicing and accent shift, as in (11c), the landing
site of accent shift would be wrong and an ungrammatical form would be
derived. In the same way, leftward shift across constituent boundaries is not
accounted for, as (11d) shows.
Another fundamental question arises when we take into account the fact
that the accent-preserving vowel in -tú̥- in the left column of (11a) is still
dominated by a syllable node but its accent-losing counterpart in the right
column of (11a) is not, even though they are equally devoiced. This situation is very puzzling. Moreover, it is unclear what happens to the floating
devoiced vowel that has lost its syllable node: floating segments are erased
by convention, but the devoiced vowel may not be deleted, because vowel
devoicing is distinct from vowel deletion as in sentakki  sentáku̥ki /
sentákki 'washing machine' and suizokkan  suizóku̥kan / suizókkan
'aquarium' (cf. Kondo, this volume). In short, the syllable node should be
deleted to cause accent shift but it should be preserved to make the distinction between vowel devoicing and vowel deletion, which leads to a paradoxical situation.

3.2. Sympathy in Optimality Theory

Now let us give a constraint-based account of devoiced accent and optional
accent shift in OT. Following the line in Tanaka (2002a), we assume the
following constraints that are ranked in the given order:
(12) Constraints for devoicing and accentuation 10
a. OCP (devoice)
   Syllables with devoiced vowels are not adjacent.
b. *C̥VC̥ and *C̥V#
   A high vowel is devoiced between voiceless consonants or when
   preceded by a voiceless consonant in word-final position.
c. IDENT (voice)
   Voicing values remain identical between input and output.
d. *V̥́
   Accent is not permitted on devoiced vowels.
e. NONFINALITY
   The accented mora or syllable or foot must not be final in PrWd.
f. ALIGN-R (PrWd, σ́)
   The right edge of any PrWd is aligned with the right edge of an
   accented syllable.11
(12d) is a specific constraint that reflects the Harmonic Scale of Prominence
we discussed before (we will not enter into strictly formal analysis for the
whole scale; see note 1 for this topic). Tanaka (2002a) demonstrates that
such constraints as in (12) are sufficient to correctly capture the complicated
directionality of accent shift in (9). Actually, however, another mechanism
is necessary to account for the optionality in (9) and obtain the opaque
forms with devoiced accent as well as the transparent ones with accent
shift. (13) is such a mechanism where we follow McCarthy (1999, 2003a)
in assuming the notion of sympathy in grammar:12
(13) Sympathy-related constraints
a. Selector: IDENT (voice)
b. Sympathy Constraint: MAX-O (accent)
There is only one faithfulness constraint in my proposed ranking of (12),
namely, IDENT (voice); thus, this is the selector, as in (13a). Moreover, since
we have been concerned with accent shift as a consequence of devoicing,
the sympathy constraint is MAX-O (accent), as in (13b), which monitors
the similarity in accent between the sympathetic candidate and other candidates. This constraint assigns one violation mark to accent shift but two
violation marks to accent deletion, since the latter is more serious than the former.
Given the constraints in (12) and (13), the non-surface-true opacity can
be accounted for very straightforwardly, as shown in (14):


(14) Sympathy and free ranking
[Tableaux not recoverable from the source. Constraint columns: OCP, *C̥VC̥, *C̥V#, MAX-O, *V̥́. Candidates flagged with a parenthesized marker in the source: (ki)+(sóku̥), (bi)(zyútu̥)+(kan), (si)+(kí), (sita)+(ku̥tí)(biru), (naná)+(hu̥)(sigi).]

Here, all the non-shifted forms with devoicing are correctly evaluated as optimal. More crucially, note that if we rerank *V̥́ over MAX-O
(accent) in (14), as shown by the dotted lines, then the accent-shifted (i.e.,
transparent) form of each example is uniformly selected as optimal. So the
two constraints must have free ranking (Anttila 1997, 2002; Anttila and
Cho 1998). This is the reason why both transparency and non-surface-true
opacity are observed in the accentual system of Japanese. However, in the
younger generation, devoiced but non-shifted forms are getting more and
more common (Akinaga 1998: 220), which means that devoicing tends to
apply obligatorily without any accent shift. This clearly shows the recent
establishment of the ranking in (14).
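
The effect of free ranking can be illustrated with a deliberately simplified evaluation. The sketch below is not a reconstruction of the tableaux in (14): the constraint labels are ASCII stand-ins, only three of the constraints are included, and the violation counts for the three candidates of the word for 'rule' are assumptions read off the constraint definitions (one mark on MAX-O (accent) for a shifted accent, one mark on the devoicing constraint for an undevoiced vowel, one mark on the ban on devoiced accent).

```python
def optimal(candidates, ranking):
    """Return the candidate whose violation profile is best under a
    strict ranking (comparison proceeds constraint by constraint)."""
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)


# Illustrative violation profiles (assumptions of this sketch).
CANDIDATES = {
    "voiced, accent in place": {"DEVOICE": 1},
    "devoiced, accent in place": {"*ACCENTED-DEVOICED": 1},
    "devoiced, accent shifted": {"MAX-O(accent)": 1},
}

# The two total rankings made available by free ranking.
OPAQUE_RANKING = ["DEVOICE", "MAX-O(accent)", "*ACCENTED-DEVOICED"]
TRANSPARENT_RANKING = ["DEVOICE", "*ACCENTED-DEVOICED", "MAX-O(accent)"]
```

Under `OPAQUE_RANKING` the devoiced form with accent in place wins, and under `TRANSPARENT_RANKING` the accent-shifted form wins, so the two attested variants correspond to the two orders of a single freely ranked pair.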

4. Conclusion
In this article, we have developed a theory of prominence by showing evidence for the Harmonic Scale of Prominence from phonetics, phonology,
typology, and prosody. Specifically, we have argued for an implicational
relation among voicing, sonority, tone, and accent by considering their
various phonological interactions for enhancing syllable prominence or
culminativity. Then, we have seen that the well-known phenomenon of
devoiced accent poses such problems as i) incompleteness for prominence
theory, ii) optionality and directionality of accent shift for derivational theory, and iii) non-surface-true opacity for general phonological theory. The
apparent harmonic incompleteness in prominence has much to do with the
opacity of accent and devoicing. Thus, we have proposed a solution in OT
and have demonstrated that the notions of sympathy and reranking serve to
solve all of these problems.

This paper is part of my talk delivered at the Phonology Forum 2001 of the
Phonological Society of Japan, which was held on August 28 at Chiba University. I would like to thank the audience for their helpful comments. I am
also very grateful to Stuart Davis, Jeroen van de Weijer, and an anonymous
reviewer for helping me improve the content and style of this paper. Special
thanks go to Jeroen van de Weijer, Tetsuo Nishihara, and Kensuke Nanjo,
who gave me the opportunity to elaborate my idea in this project. Any remaining inadequacies or misconceptions are my responsibility alone, of
course. This study is supported by the Grant-in-Aid for Scientific Research
(Basic Sciences (C)(2), grant number 15520306) of the Japan Society for
the Promotion of Science.


1. We assume here with Hayes (1995) that syllable weight consists of syllable
quantity and syllable prominence. But his theory differs from ours in that
length can be reflected on both moraic structure (i.e., syllable quantity) and
prominence structure. Instead, we assume that length is only a matter of syllable quantity. Incidentally, another interesting approach to prominence is seen
in Anttila (1997), who proposes a constraint-hierarchy system that incorporates
accent, length, and sonority by using Prince & Smolensky's (1993) Harmonic
Alignment, although it does not incorporate voicing or tone and does not capture the implicational relations among prominence elements. If we incorporate
voicing and tone as well in this approach and assume such binary prominence
scales as V́ > V (accent), V > V (tone), VV > V (length), V > C (sonority),
and V > V̥ (voicing), Harmonic Alignment can derive constraint hierarchies
like *{V, V́} >> *{V́, V} (accent & tone), *{VV, V́} >> *{V́V, V} (accent &
length; WEIGHT-TO-STRESS or more strictly, PEAK-PROMINENCE), *{V, Ć} >>
*{V́, C} (accent & sonority), *{V, V̥́} >> *{V́, V̥} (accent & voicing), *{C,
V̥} >> *{C̥, V} (sonority & voicing), etc. See Tanaka (2003b) for the details of
such a theory, where the overall scale in (2) and these specific binary scales are
developed in a uniform fashion.
2. Conversely, length-conditioned accent (i.e., quantity-sensitivity) seems to be
more dominant cross-linguistically than accent-conditioned length (e.g. vowel
lengthening on an accented vowel, vowel shortening with accent loss, etc.).
This is another reason we argue for the division of labor between syllable
quantity and syllable prominence and exclude length from our discussion of
syllable prominence. For the interaction of accent and syllable length, see
McGarrity (2003).
3. We assume here that schwa is less sonorous than non-high vowels.
4. In what follows, morpheme boundaries are represented with .
5. Of course, it is not the case that consonants preceding accent must always
be voiced as they are in (5a, b). As is well known, stops in English may
be voiceless and aspirated when they precede accent. Generally, less sonorous
consonants are preferred to their voiced counterparts in onset position, because
of the Dispersion Principle (Clements 1990, 1992). In Pirahã, CV and CVV are
more prominent and attract accent more strongly than GV and GVV, respectively, which is a very rare case (C is a voiceless consonant and G is a voiced
consonant). This is also related to the principle, since a steep rise from onset to
nucleus may sound more prominent than a gentle rise.
6. hsu in (5d) and sik in (6) are fairly contrastive in that although they are originally accented on the first syllable, only the latter undergoes devoicing and
accent movement to the right. The former example may pose a problem for
the analysis that we will present in section 3.2, but it is true that his may optionally be acceptable as well. We leave this question for further study,
since such examples as in (5d) are limited in occurrence and the patterns in (6)
are productive. The difference between them is whether or not they have a
word boundary in their domain.
7. The examples we are concerned with here are those in which the modifier of sima
is more than two moras long. Exceptions such as nakano-sima and nakadoori-sima,
with neither accent nor Rendaku, and hatizyóu-zima and kakará-zima, with both,
are quite rare. As for (7b), syoudó-sima, awazí-sima, and tanegá-sima may lead
us to believe that the voiced obstruents immediately before the boundary cause
the blocking of Rendaku (due to Lyman's Law in a wider domain); however, there are island names like isigaki-zima, ogi-zima, and megi-zima, which
contradict such a hypothesis.
8. The distinction in (8) can be phonologized in such a way that the specifier in
(8a) is CVC or CVV while the one in (8b) is CV or CVCV. It is surprising that
the violation of Lyman's Law is acceptable in the head of the names -zabu in
9. We assume here, following Poser (1990) and Tanaka (1992), that compound
accent is derived by final-foot extrametricality.
10. (12a) may be a locally self-conjoined constraint banning a devoiced vowel, but
we adopt an OCP-based version for expository purposes.
11. This constraint is virtually equivalent to Prince & Smolensky's EDGEMOST
(pk; R; Word), which states that a peak of prominence (i.e. accent) lies at the
right edge of PrWd. In addition to (12e, f), there are other constraints for compound accentuation; see Tanaka (2001, 2002b) for the exact hierarchy and the
ranking relation between (12e) and (12f).
12. McCarthy (2003b) compares various approaches to opacity (comparative
markedness, local conjunction, stratal OT, sympathy, and targeted constraints),
only to conclude that each of them has its own advantages and disadvantages.
Empirically, sympathy seems to be the best candidate to account for the data
concerned, so we adopt it here.
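
The derivation described in note 1 can be made concrete with a small sketch. The Python below is only an illustration, assuming the standard formulation of Harmonic Alignment for two binary scales: the stronger member of one scale is most harmonic when combined with the stronger member of the other, and the constraint hierarchy penalises the less harmonic combinations first. The function names `harmonic_alignment` and `bundled_hierarchy` and the feature labels are hypothetical, and the bundling of same-ranked constraints into one set merely mirrors the *{...} >> *{...} notation of the note; it is not taken from Tanaka (2003b).

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[str, str]  # (x, y) stands for the constraint *x/y


def harmonic_alignment(first: Pair, second: Pair) -> Tuple[List[Pair], List[Pair]]:
    """Align two binary prominence scales, each given as (stronger, weaker).

    Returns two constraint hierarchies, each listed from the
    highest-ranked to the lowest-ranked constraint.
    """
    strong, weak = first
    high, low = second
    # Harmony scales: strong/high is better than strong/low,
    # and weak/low is better than weak/high.
    # The constraint hierarchies ban the less harmonic pairing first.
    c_strong = [(strong, low), (strong, high)]
    c_weak = [(weak, high), (weak, low)]
    return c_strong, c_weak


def bundled_hierarchy(first: Pair, second: Pair) -> Tuple[FrozenSet[Pair], FrozenSet[Pair]]:
    """Bundle the two top-ranked and the two bottom-ranked constraints into sets."""
    c_strong, c_weak = harmonic_alignment(first, second)
    dominant = frozenset({c_strong[0], c_weak[0]})
    dominated = frozenset({c_strong[1], c_weak[1]})
    return dominant, dominated


# Hypothetical labels for the accent and voicing scales of note 1.
accent = ("accented", "unaccented")
voicing = ("voiced", "devoiced")
dominant, dominated = bundled_hierarchy(accent, voicing)
print("*", sorted(dominant), ">>", "*", sorted(dominated))
```

For accent and voicing, the sketch ranks the set containing accented devoiced vowels and unaccented voiced vowels above the set containing accented voiced vowels and unaccented devoiced vowels, which is the shape of the accent & voicing hierarchy given in note 1.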


Abramson, Arthur S. and Leigh Lisker

Discriminability along the voicing continuum: Cross-language tests.
Proceedings of the 6th International Congress of Phonetic Sciences,
569–573. Prague: Academia, Czechoslovak Academy of Sciences.
Akinaga, Kazue
The accent of Standard Japanese. In A Dictionary of Pronunciation
and Accent in Japanese, NHK Institute of Broadcasting Culture (ed.),
174–221. Tokyo: NHK Publication.
Akinaga, Kazue (ed.)
A New Concise Dictionary of Japanese Accent. Tokyo: Sanseido.
Anderson, John M. and Charles Jones
Three theses concerning phonological representations. Journal of
Linguistics 10: 1–26.
Anderson, John M. and Colin J. Ewen
Principles of Dependency Phonology. Cambridge: Cambridge University Press.
Nihongo no Rekishi 2: Moji to no Meguriai [History of Japanese 2:
Contact with Characters] Tokyo: Heibon-sha.
Archangeli, Diana B. and Douglas G. Pulleyblank
Grounded Phonology. Cambridge and London: MIT Press.
Anttila, Arto T.
Deriving variation from grammar. In Variation, Change, and Phonological Theory, Frans Hinskens, Roeland van Hout and W. Leo Wetzels
(eds.), 35–68. Amsterdam: John Benjamins.
Morphologically conditioned phonological alternations. Natural Language and Linguistic Theory 20: 1–42.
Anttila, Arto T. and Young-mee Yu Cho
Variation and change in Optimality Theory. Lingua 104: 31–56.
Avery, J. Peter
The Representation of Voicing Contrasts. Doctoral dissertation, University of Toronto.
Avery, J. Peter and William J. Idsardi
Laryngeal dimensions in Japanese phonology. Talk presented at the
Montreal-Ottawa-Toronto Phonology Workshop. Montreal, February

280 Bibliography
Backley, Phillip
Tier geometry: An explanatory model of vowel structure. Doctoral
dissertation, University College London.
Backley, Phillip and Toyomi Takahashi
Element activation. In Structure and Interpretation: Studies in
Phonology (PASE Studies & Monographs 4), Eugeniusz Cyran (ed.),
13–40. Lublin: Wydawnictwo Folium.
Bao, Zhiming
On the nature of tone. Doctoral dissertation, MIT.
Beckman, Jill N.
Positional faithfulness, positional neutralization, and Shona vowel
harmony. Phonology 14 (1): 1–46.
Positional faithfulness. Doctoral Dissertation, University of Massachusetts, Amherst.
Beckman, Mary E.
Segmental duration and the Mora in Japanese. Phonetica 39: 113
Bell, Alan E.
Syllabic consonants. In Universals of Human Language, Volume 2:
Phonology, Joseph H. Greenberg (ed.), 153–201. Stanford, California:
Stanford University Press.
Benua, Laura H.
Transderivational identity: Phonological relations between words.
Doctoral dissertation, University of Massachusetts, Amherst.
Bird, Steven G.
Computational phonology: A constraint-based approach. Cambridge:
Cambridge University Press.
Bloch, Bernard
Studies in colloquial Japanese I: Inflection. Journal of the American
Oriental Society 66: 97–109 (References are to the version in Miller
1970: 1–24).
Blumstein, Sheila E., William E. Cooper, Harold Goodglass, Sheila Statlender and
Jonathan Gottlieb
Production deficit in aphasia: a voice onset time analysis. Brain and
Language 9: 153–170.
Boersma, Paul P. G.
How we learn variation, optionality, and probability. Proceedings of
the Institute of Phonetic Sciences, Amsterdam 21: 43–58. (Available
on the Rutgers Optimality Archive, ROA-221.)
Cabrera-Abreu, Mercedes
A phonological model for intonation without low tone. Bloomington,
Indiana: Indiana University Linguistics Club Publication.



Calabrese, Andrea
A constraint-based theory of phonological markedness and simplification procedures. Linguistic Inquiry 26: 373–463.
Campbell, Nick and Yoshinori Sagisaka
Moraic and syllable-level effects on speech timing. Journal of Electronic Information Communication Engineering SP 90-107: 35–40.
Charette, Monik and Asli Göksel
Licensing constraints and vowel harmony in Turkic languages. In
Structure and Interpretation: Studies in Phonology (PASE Studies &
Monographs 4), Eugeniusz Cyran (ed.), 65–88. Lublin: Wydawnictwo Folium.
Cho, Young-mee Yu
Language change as constraint reranking. Historical Linguistics 1995.
Amsterdam: John Benjamins.
Chomsky, A. Noam and Morris Halle
The Sound Pattern of English. New York: Harper and Row.
Clements, George N.
Tone and syntax in Ewe. In Elements of Tone, Stress and Intonation,
Donna J. Napoli (ed.), 21–99. Washington, D.C.: Georgetown University Press.
The role of the sonority cycle in core syllabification. In Papers in
Laboratory Phonology I: Between the Grammar and Physics of
Speech, John C. Kingston and Mary E. Beckman (eds.), 283–333.
Cambridge: Cambridge University Press.
The sonority cycle and syllable organization. In Phonologica 1988,
Wolfgang U. Dressler, Hans C. Luschützky, Oscar E. Pfeiffer and
John R. Rennison (eds.), 63–76. Cambridge: Cambridge University Press.
Representational economy in constraint-based phonology. In Distinctive Feature Theory, T. Alan Hall (ed.), 71–146. Berlin/New
York: Mouton de Gruyter.
Clements, George N. and Susan R. Hertz
Nonlinear phonology and acoustic interpretation. In Actes du XIIème
Congrès International des Sciences Phonétiques, Aix-en-Provence,
19–24 août 1991 [Proceedings of the XIIth International Congress of
Phonetic Sciences, Aix-en-Provence, August 19–24, 1991], Vol. 1,
364–373. Aix-en-Provence: Université de Provence, Service des
Cohn, Abigail C.
Nasalisation in English: Phonology or phonetics. Phonology 10: 43

Coleman, John S. and Janet B. Pierrehumbert
Stochastic phonological grammars and acceptability. In Computational Phonology: Third Meeting of the ACL Special Interest Group
in Computational Phonology, 49–56. Association for Computational
Linguistics, Somerset.
Cremelie, Nick and Jean-Pierre Martens
On the use of pronunciation rules for improved word recognition. In
Proceedings Eurospeech '95: 1747–1750.
Automatic rule-based generation of word pronunciation networks. In
Proceedings Eurospeech '97: 2459–2462.
In search of better pronunciation models for speech recognition.
Speech Communication 29: 225–246.
Dauer, Rebecca M.
The reduction of unstressed high vowels in modern Greek. Journal
of the International Phonetic Association 10: 17–27.
de Lacy, Paul V.
Tone and prominence. Ms., University of Massachusetts, Amherst
(Available on the Rutgers Optimality Archive, ROA-333).
Prosodic markedness in prominent positions. Ms., University of
Massachusetts, Amherst (Available on the Rutgers Optimality Archive, ROA-432).
The Formal Expression of Markedness. Ph.D. dissertation, University of Massachusetts, Amherst.
Markedness conflation in Optimality Theory. Phonology 21: 154
Endō, Kunimoto
Kaiki to ruisui: Ma-gyou no dakuon-gana to sono haikei [Back formation and analogy: The kana of the [b]-column used for the [m]-column, and its background]. Gifudaigaku Kyōiku Gakubu Kenkyū
Hōkoku: Jinbun 21: 103–112.
Kokugo Hyoogen to Onin Genshoo [Expression in Japanese and
Phonological Phenomena]. Tokyo: Shinten-sha.
Flemming, Edward S.
Auditory representations in phonology. Doctoral dissertation, University of California, Los Angeles.
Frisch, Stefan A.
Similarity and Frequency in Phonology. Doctoral dissertation, Northwestern University, Evanston, Illinois.
Fujimoto, Masako and Shigeru Kiritani
Comparison of vowel devoicing for speakers of Tokyo and Kinki
dialects. Journal of the Phonetic Society of Japan 7: 58–69.



Fukada, Toshiaki, Takayoshi Yoshimura and Yoshinori Sagisaka

Automatic generation of multiple pronunciations based on neural networks and language statistics. In Proceedings of the ESCA Workshop
on Modeling Pronunciation Variation for Automatic Speech Recognition, Helmer Strik, Judith M. Kessens and Mirjam Wester (eds.),
41–46. Rolduc, Kerkrade.
Automatic generation of multiple pronunciations based on neural
networks. Speech Communication 27: 63–73.
Fukai, Ichiro (ed.)
Zouhyou Monogatari Kenkyuu to Sou Sakuin [A Study of Zōhyō
Monogatari: with facsimile and word index]. Tokyo: Musashino Shoin.
Fukazawa, Haruka
Theoretical implications of OCP effects on features in Optimality
Theory. Doctoral dissertation, University of Maryland, College Park.
Fukazawa, Haruka and Mafuyu Kitahara
Domain-relative faithfulness and the OCP: Rendaku revisited. In
Issues in Japanese Phonology and Morphology, Jeroen M. van de
Weijer and Tetsuo Nishihara (eds.), 85–109. Berlin/New York:
Mouton de Gruyter.
Fukazawa, Haruka, Mafuyu Kitahara and Mitsuhiko Ota
Lexical stratification and ranking invariance in constraint-based
grammars. In Papers from the 34th Regional Meeting of the Chicago
Linguistic Society, Part Two: The Panels, M. Catherine Gruber, Derrick Higgins, Kenneth S. Olson and Tamra Wysocki (eds.), 47–62.
Chicago: Chicago Linguistic Society.
Constraint-based modelling of split phonological systems. In Onin
Kenkyuu [Phonological Studies] 5, the Phonological Society of Japan (ed.), 115–120. Tokyo: Kaitakusha.
Fukuda, Suzy E. and Shinji Fukuda
The operation of rendaku in the Japanese specifically language-impaired: A preliminary investigation. Folia Phoniatrica et Logopaedica 51: 36–54.
Fukui, Seiji
Tonal features of numeral sequences in Kinki Dialect [in Japanese].
Studies in Phonetics and Speech Communication IV: 41–67. Kinki
Society of Phonetics.
Fukushima, Kunimichi
Nihonkigo gokai [Commentary on the words in Nihonkigo [Ribenjiyu]]. Kokugogaku 36: 44–53.
Gandour, Jackson T. and Rochana Dardarananda
Voice onset time in aphasia: Thai I. perception. Brain and Language
17: 24–33.

Voice onset time in aphasia: Thai II. production. Brain and Language 23: 177–205.
Grootaers, Willem A.
Nihon no Gengo Chiri Gaku no Tameni [For the Sake of Dialect
Geography in Japan]. 48–77. Tokyo: Heibonsha.
Halle, Morris
Verner's Law. In A New Century of Phonology and Phonological
Theory: A Festschrift for Professor Shosuke Haraguchi on the
Occasion of His Sixtieth Birthday, Takeru Honma, Masao Okazaki,
Toshiyuki Tabata and Shin-ichi Tanaka (eds.). Tokyo: Kaitakusha.
Halle, Morris and Kenneth N. Stevens
A note on laryngeal features. MIT Quarterly Progress Report of the
Research Laboratory of Electronics 101: 198–213.
Knowledge of language and the sounds of speech. In Music, Language, Speech and Brain: Proceedings of an International Symposium at the Wenner-Gren Center, Stockholm, 5–8 September 1990,
Johan Sundberg, Lennart Nord and Rolf Carlson (eds.), 1–19.
Houndmills: MacMillan Press.
Halle, Morris and Jean-Roger Vergnaud
An Essay on Stress. Cambridge, Massachusetts: MIT Press.
Hamada, Atsushi
Hatsuon to dakuon to no sookansei no mondai [Issues in relativity
between moraic nasals and voiced obstruents]. Kokugo-Kokubun
21 (3): 18–32.
Kouji gonen Chousen-ban Iroha Onmon taion kou [Thoughts on
correspondence of Hangul [to Japanese kana] seen in Iroha (Iropa)
of the fifth year of Kouji [1492] printed in Korea]. Kokugo-Kokubun
21 (10): 22–32.
Haneru-on [Moraic nasals]. In Kokugogaku Jiten, Kokugo Gakkai
(ed.), 750–751. Tokyo: Tokyo-do.
Rendaku to renjou [Rendaku and sandhi]. Kokugo-Kokubun 29 (10):
Sei daku [Sei-daku: Clear-muddy]. Kokugo-Kokubun 40 (11): 40–51.
Hamano, Shoko
Voicing of obstruents in Old Japanese: Evidence from the sound-symbolic stratum. Journal of East Asian Linguistics 9: 207–25.
Han, Mieko S.
Japanese phonology. Tokyo: Kenkyusha.
Unvoicing of vowels in Japanese. Study of Sounds 10: 81–100.
Acoustic manifestations of mora timing in Japanese. Journal of the
Acoustical Society of America 96: 73–82.



Haraguchi, Shosuke
The Tone Pattern of Japanese: An Autosegmental Theory of Tonology.
Tokyo: Kaitakusha.
A Theory of Stress and Accent. Dordrecht: Foris.
A theory of voicing. In A Comprehensive Study on the Phonological
Structure of Languages and Phonological Theory, Shosuke Haraguchi (ed.), 1–22. Technical Report of Basic Sciences (A)(1), Grant-in-Aid for Scientific Research by the Japan Society for the Promotion
of Science.
Harris, John K. M.
Segmental complexity and phonological government. Phonology 7:
English Sound Structure. Oxford: Blackwell.
Licensing Inheritance: An integrated theory of neutralisation. Phonology 14: 315–370.
Phonological universals and phonological disorder. In Linguistic
Levels in Aphasia: Proceedings of the RuG-SAN-VKL Conference on
Aphasiology, Evy Visch-Brink and Roelien Bastiaanse (eds.), 91–117.
San Diego, CA: Singular Publishing Group.
Harris, John K. M. and Geoffrey A. Lindsey
The elements of phonological representation. In Frontiers of Phonology: Atoms, Structures, Derivations, Jacques Durand and Francis
X. Katamba (eds.), 34–79. Harlow, Essex: Longman.
Vowel patterns in mind and sound. In Phonological knowledge:
Conceptual and empirical issues, Noel Burton-Roberts, Philip Carr
and Gerry J. Docherty (eds.), 185–205. Oxford: Oxford University Press.
Hasegawa, Kiyoshi, Katsuaki Horiuchi, Tsutomu Momozawa and Saburo Yamamura
Obunsha's Comprehensive Japanese-English Dictionary. Tokyo:
Hashimoto, Shinkichi
Kokugo kanazukai kenkyūshijō no ichihakken. Teikoku Bungaku 23
[References are to the version in Hashimoto (1949: 123–163)].
Kokugo ni okeru biboin [Nasalized vowels in Japanese]. Kokugo
Onin no Kenkyuu. Tokyo: Iwanami.
Moji Oyobi Kanadzukai no Kenkyuu. Tokyo: Iwanami.
Hattori, Noriko
Mechanisms of word accent change: Innovations in Standard Japanese. Doctoral dissertation, University College, London.
Hattori, Shiro
On two-syllable words uttered in Kameyama-cho area in Mie Prefecture. Bulletin of The Phonetic Society of Japan 11.

Phoneme, phone, and compound phone. Gengo Kenkyu 16: 92–109
(Revised version appeared in Gengogaku no Hoohoo [Methods in
Linguistics]. Tokyo: Iwanami, 1960).
Gengogaku no Hoohoo [Methods in Linguistics]. Tokyo: Iwanami.
Hayata, Teruhiro
Nihongo no onin to rizumu [Sounds and rhythm of Japanese]. Dentou to Gendai 45: 41–49.
Seisei akusento ron [Generative accentuation]. In Ōno, Susumu and
Takeshi Shibata (eds.), Nihongo 5: Onin [Japanese 5: Sounds], 323
Hayes, Bruce P.
Compensatory lengthening in moraic phonology. Linguistic Inquiry
20: 253–306.
Metrical Stress Theory: Principles and Case Studies. Chicago: The
University of Chicago Press.
Hayes, Bruce P. and Donca Steriade
Introduction: the phonetic basis of phonological markedness. In
Phonetically-Based Phonology, Bruce Hayes, Robert Kirchner and
Donca Steriade (eds.), 1–33. Cambridge: Cambridge University Press.
Hepburn, James Curtis
A Japanese and English Dictionary with an English and Japanese
Index. Shanghai: American Presbyterian Mission Press [Reprinted in
1983. Tokyo: Charles E. Tuttle].
Hibiya, Junko
Variationist Sociolinguistics. The Handbook of Japanese Linguistics,
Natsuko Tsujimura (ed.), 101–120. Cambridge, Mass.: Blackwell.
Hinskens, Frans and Jeroen M. van de Weijer
Patterns of segmental modification in consonant inventories: A crosslinguistic study. Linguistics 41 (6): 1041–1084.
Hirayama, Teruo, Ichiro Oshima, Makio Ono, Makoto Kuno, Mariko Kuno and
Takao Sugimura
Gendai Nihongo Hōgen Daijiten [Dictionary of Japanese Dialects].
Tokyo: Meiji Shoin.
Hirose, H. et al.
Analysis and formulation of the prosodic features of Standard Mandarin Chinese. The Journal of the Acoustical Society of Japan 50 (3):
Honda, Kiyoshi, Hiroyuki Hirai, Shinobu Masaki and Yasuhiro Shimada
Role of Vertical Larynx Movement and Cervical Lordosis in F0
Control. Language and Speech 42: 401–411.



Huang, Xuedong, Alex Acero and Hsiao-Wuen Hon

Spoken Language Processing: A Guide to Theory, Algorithm, and
System Development. Upper Saddle River, NJ: Prentice Hall PTR.
Hulst, Harry G. van der
Atoms of segmental structure: Components, gestures and dependency.
Phonology 6: 253–284.
Radical CV Phonology: The categorial gesture. In Frontiers of phonology: Atoms, structures, derivations, Jacques Durand and Francis
X. Katamba (eds.), 80–116. Harlow, Essex: Longman.
Hume, Elizabeth V. and Georgios Tserdanelis
Labial unmarkedness in Sri Lankan Portuguese creole. Phonology 9:
Hyman, Larry M.
Phonology: Theory and Analysis. New York: Holt, Rinehart and Winston.
Hwang, Mei-Yuh and Xuedong Huang
Shared-Distribution Hidden Markov Models for Speech Recognition.
IEEE Trans. on Speech and Audio Processing 1 (4): 414–420.
Ide, Itaru
Manyougana [Man'yō-gana]. In Kanji-kōza 4: Kanji to kana [Kanji
and kana], Kiyoji Sato (ed.), 225–255. Tokyo: Meiji Shoin.
Imaizumi, Satoshi, Akiko Hayashi and Toshisada Deguchi
Listener adaptive characteristics of vowel devoicing in Japanese Dialogue. Journal of the Acoustical Society of America 98(2): 768–778.
Inkelas, Sharon
The theoretical status of morphologically conditioned phonology:
a case study of dominance effects. In Yearbook of Morphology 1997,
Geert E. Booij and Jaap van Marle (eds.), 121–155. Dordrecht: Kluwer.
Inoue, Fumio
Tohoku hogen no hensen: Shonai hogen rekishi gengogaku teki
koken [History of Tohoku dialects: Contribution of historical linguistics of Shōnai dialect]. Tokyo: Akiyama.
Inoue, Michiyasu
Manyoushuu zakkou [Thoughts on Man'yōshū]. Tokyo: Meiji-shoin.
Itô, Junko
Prosodic minimality in Japanese. In CLS 26-II: Papers from the
Parasession on the Syllable in Phonetics and Phonology, K. Deaton,
Manuela Noske and M. Ziolkowski (eds.), 213–239.
Itô, Junko, Yoshihisa Kitagawa and Ralf-Armin Mester
Prosodic faithfulness and correspondence: Evidence from a Japanese
argot. Journal of East Asian Linguistics 5: 217–94.

Itô, Junko and Ralf-Armin Mester
The phonology of voicing in Japanese. Linguistic Inquiry 17: 49–73.
Licensed segments and safe paths. Canadian Journal of Linguistics
38: 197–213.
Japanese phonology. In Handbook of Phonological Theory, John A.
Goldsmith (ed.), 817–838. Cambridge: Blackwell.
The core-periphery structure of the lexicon and constraints on
reranking. In University of Massachusetts Occasional Papers in Linguistics 18: Papers in Optimality Theory, Jill N. Beckman, Suzanne
C. Urbanczyk and Laura Walsh Dickey (eds.), 181–210. Amherst: GLSA.
Stem and word in Sino-Japanese. In Phonological Structure and
Language Processing: Cross-linguistic Studies, Takeshi Otake and
Anne Cutler (eds.), 13–44. Berlin/New York: Mouton de Gruyter.
Correspondence and compositionality: The ga-gyō variation in Japanese phonology. In Derivations and Constraints in Phonology, I. M.
Roca (ed.), 419–462. New York: Oxford University Press.
Markedness and word structure: OCP effects in Japanese. Ms. University of California, Santa Cruz (Available on the Rutgers Optimality Archive, ROA-255).
The phonological lexicon. In The Handbook of Japanese Linguistics,
N. Tsujimura (ed.), 62–100. Malden, Mass. and Oxford, U.K: Blackwell Publishers.
The lexicon in Optimality Theory. Handout presented at University
of Tsukuba, Special Research Project for the Typological Investigation of Languages and Cultures of the East and West.
Weak parallelism and modularity: Evidence from Japanese. In Report
of the Special Research Project for the Typological Investigation of
Languages and Cultures of the East and West III, Part I, Shosuke
Haraguchi (ed.), 89–105. Ibaraki: University of Tsukuba.
Covert generalizations in Optimality Theory: the role of stratal faithfulness constraints. In Proceedings of 2001 International Conference
on Phonology and Morphology, 3–33, Yongin, Korea.
Japanese Morphophonemics: Markedness and Word Structure.
Cambridge, Mass.: MIT Press.
Itô, Junko, Ralf-Armin Mester and Jaye E. Padgett
Licensing and underspecification in Optimality Theory. Linguistic
Inquiry 26: 571–614.
Lexical classes in Japanese: A reply to Rice. Phonology at Santa
Cruz 6: 39–46.
Itoh, Motonobu, Itaru F. Tatsumi and Sumiko Sasanuma
Voice onset time perception in Japanese aphasic patients. Brain and
Language 28: 71–85.



Iwabuchi, Etsutarō
Youkyoku no utai-kata ni okeru nisshou tsu ni tsuite [On the entering tone [=coda] -t in the singing of yōkyoku]. Kokugo to Kokubungaku 11: 5, 7 and 9. (98–117, 91–101 and 85–95, respectively)
Iwanami Shoten Henshūbu (ed.)
Gyakubiki Kōjien. Tokyo: Iwanami.
Jakobson, Roman
Child language, aphasia and phonological universals. The Hague:
Jakobson, Roman, C. Gunnar M. Fant and Morris Halle
Preliminaries to speech analysis. Cambridge, Mass.: MIT Press.
Jessen, Michael and Catherine O. Ringen
On the status of [voice] in German. In WCCFL 20: 304–317.
Jōdaigo Jiten Henshū Iinkai (ed.)
Jidaibetsu kokugo daijiten: jōdaihen. Tokyo: Sanseidō.
Jun, Sun-Ah
The phonetics and phonology of Korean prosody. Unpublished Ph.D.
dissertation. The Ohio State University, Columbus, Ohio.
Jun, Sun-Ah and Mary E. Beckman
A gestural-overlap analysis of vowel devoicing in Japanese and
Korean. Paper presented at the 1993 Annual Meeting of the LSA,
Los Angeles, 7–10 January, 1993.
Jurafsky, Daniel and James H. Martin
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
Upper Saddle River, NJ: Prentice Hall.
Kager, René W. J.
Optimality Theory. Cambridge: Cambridge University Press.
Kamei, Takashi
Kana wa naze dakuon senyou no jitai o motanakatta ka-o megutte
kataru [Discussing why kana did not have letters only for daku-on].
Hitotsubashi Daigaku Kenkyuu Nenpou: Jinbun Kagaku Kenkyuu
12: 1–92.
Dakuon [Voiced sounds]. Heibonsha Hyakka Jiten 9: 227–228.
Tokyo: Heibonsha.
Kamei, Takashi, Rokuro Kono and Eiichi Chino
Gengogaku daijiten selection: Nihon rettō no gengo [Linguistics encyclopedia selection: Language in the Japanese archipelago]. Tokyo:
Kasuga, Kazuo
Kojiki ni okeru seidaku kakiwake ni tsuite [On the sei-daku distinctions in Kojiki]. Kokugo-Kokubun 11(4): 39–78.

Kawakami, Shin
Musee haku no tsuyosa to akusento kaku [The intensity of devoiced
moras and the accent nucleus]. Kokugakuindai Kokugo Kenkyu 27.
Kawasaki, Takako
Sonority and voicing: a structural analysis. Ms., McGill University,
Montreal, Quebec.
Kaye, Jonathan D., Jean Lowenstamm and Jean-Roger Vergnaud
The internal structure of phonological representations: A theory of
charm and government. Phonology Yearbook 2: 305–328.
Constituent structure and government in phonology. Phonology 7:
Kazama, Rikizō (ed.)
Tsuzuriji gyakujun hairetsu gokōsei ni yoru Daigenkai bunrui goi.
Tokyo: Fuzanbō.
Kenstowicz, Michael J.
Phonology in Generative Grammar. Oxford: Blackwell.
Sonority-driven stress. Ms., Massachusetts Institute of Technology,
Cambridge (Available on the Rutgers Optimality Archive, ROA-33).
Kess, Joseph E. and Tadao Miyamoto
The Japanese Mental Lexicon. Psycholinguistic Studies of Kana and
Kanji Processing. Amsterdam and Philadelphia: John Benjamins.
Kikuchi, Hideaki and Kikuo Maekawa
Accuracy of automatic phoneme labeling on spontaneous speech.
Proceedings of the 2002 Spring meeting of the Acoustical Society of
Japan: 97–98.
Kikuda, Norio
Yōgen no rendaku no ichiyōin [xxx]. Kaishaku 17 (5): 24–29.
Kindaichi, Haruhiko and Kazue Akinaga
Shinmeikai Accent Dictionary of the Japanese Language. Tokyo:
Kindaichi, Haruhiko, Ooki Hayashi and Takesi Sibata (eds.)
An Encyclopaedia of the Japanese Language. Tokyo: Taishukan.
Kindaichi, Kyosuke
Kokugo no hensen [Transition of Japanese Language]. Tokyo: NHK.
Kiparsky, R. Paul V.
Lexical Phonology and Morphology. In Linguistics in the Morning
Calm, the Linguistic Society of Korea (ed.), 3–91. Seoul: Hanshin.
Some consequences of Lexical Phonology. Phonology Yearbook 2:
Kitahara, Mafuyu
The interaction of pitch accent and vowel devoicing in Tokyo Japanese. In Japanese-Korean Linguistics 8, D. Silvia (ed.), 303–15.
Stanford, CA: CSLI.



Kitahara, Yasuo (ed.)

Nihongo gyakubiki jiten [Japanese Reverse Dictionary]. Tokyo:
Kiyose, Gisaburō N.
Heianchō hagyō-shiin p-onron [The study of ha-gyō consonant p-sound in the Heian period]. Onsei no Kenkyū 21: 73–87.
Kohler, Klaus J.
Phonetic explanation in phonology: the feature fortis/lenis. Phonetica
41: 150–174.
Segmental reduction in connected speech in German: phonological
facts and phonetic explanations. In Speech Production and Speech
Modelling, William J. Hardcastle and Alain Marchal (eds.), 69–92.
Dordrecht: Kluwer.
Komatsu, Hideo
Nihon seishoushi ronkou [Study of the history of Japanese accent].
Tokyo: Kazama Shobō.
Kondo, Mariko
Temporal adjustment of devoiced morae in Japanese. Proceedings of
the 13th International Congress of Phonetic Sciences 3: 238–241.
Mechanisms of vowel devoicing in Japanese. Doctoral dissertation,
University of Edinburgh.
Vowel devoicing and syllable structure in Japanese. Japanese/Korean
Linguistics 9.
Speech rhythm and consonant sequence production in Japanese. The
Proceedings of the 6th International Seminar on Speech Production,
Kubozono, Haruo
The Organization of Japanese Prosody. Doctoral dissertation, Edinburgh University (Published by Kurosio Publishers, 1993).
The Organization of Japanese Prosody. Tokyo: Kurosio Publishers.
Gokeisei to Onin Koozoo [Word Formation and Phonological
Structure]. Tokyo: Kurosio Publishers.
Syllable and accent in Japanese. The Bulletin of the Phonetic Society
of Japan 211: 71–82.
Kintaroo-to Momotaroo-no akusento-koozoo [The structure of accent
in Kintaroo and Momotaroo]. Kobe Gengogaku Ronsoo 1: 35–49.
Prosodic structure of loanwords in Japanese: Syllable structure, accent and morphology. Journal of the Phonetic Society of Japan 6 (1):
The syllable as a unit of prosodic organization in Japanese. In The
Syllable in Optimality Theory, Caroline Féry and Ruben van de Vijver
(eds.), 129–56. Cambridge: Cambridge University Press.

Kula, Nancy Chongo and Lutz Marten
Aspects of nasality in Bemba. SOAS Working Papers in Linguistics
and Phonetics 8: 191–208.
Kuno, Susumu
The Structure of the Japanese Language. Cambridge, Mass: MIT Press.
Kuroda, Shige-Yuki
Contrast in Japanese. A contribution to feature geometry. Paper presented at the Second International Conference on Contrast in Phonology. University of Toronto, Toronto, Ontario, Canada. May 3,
Kuwabara, Hisao and Kazuya Takeda
Analysis and prediction of vowel-devocalization in isolated Japanese
words. ATR Technical Report TR-I-0033. Kyoto: ATR Interpreting
Telephony Research Laboratories.
Labrune, Laurence
Variation intra et inter-langue: Morpho-phonologie du rendaku en
japonais et du sai-sios en coréen. In Phonologie: théorie et variation, Cahiers de grammaire 24: 117–152.
Lange, Roland A.
The Phonology of Eighth-Century Japanese. Tokyo: Sophia University.
Liberman, Mark Y. and Alan S. Prince
On stress and linguistic rhythm. Linguistic Inquiry 8: 249–336.
Lisker, Leigh and Arthur S. Abramson
A cross-language study of voicing in initial stops: acoustical measurements. Word 20: 384–422.
The voicing dimension: some experiments in comparative phonetics.
Proceedings of the Sixth International Congress of Phonetic Sciences,
Prague 1967, 563–567. Prague: Academia, Czechoslovak Academy
of Sciences.
Lombardi, Linda
Laryngeal features and privativity. The Linguistic Review 12: 35–59.
Why place and voice are different: Constraint-specific alternations in
Optimality Theory. In Segmental Phonology in Optimality Theory, L.
Lombardi (ed.), 13–45. Cambridge: Cambridge University Press.
Lyman, Benjamin S.
Change from surd to sonant in Japanese compounds. Oriental Club
of Philadelphia.
Mabuchi, Kazuo
Kokugo on-in ron [Japanese Phonology]. Tokyo: Kasama Shoin.
Maddieson, Ian
Patterns of Sounds. Cambridge: Cambridge University Press.



Maddieson, Ian and Peter N. Ladefoged

Phonetics of partially nasal consonants. In Phonetics and Phonology
Vol. 5, Nasals, Nasalization, and the Velum, M. K. Huffman and R.
A. Krakow (eds.), 251–301. San Diego: Academic Press.
Maeda, Hiroyuki
Seiyaku joretsu to nyuuryokukei no henka [Constraint ranking and
shift of inputs]. In Nihongoshi no rironteki jisshouteki kiban no
saikouchiku [Reconstructing the theoretical and empirical bases of
the history of Japanese], Grant-in-Aid Scientific Research, Ministry
of Education, Culture, Sports, Science and Technology.
Maekawa, Kikuo
Boin-no museika [Devoicing of vowels]. In Nihon-go no Onsei-Onin (1), M. Sugito (ed.), 135–153, Meiji Shoin.
Effects of speaking rate on the voicing variation in Japanese. Technical Report of the Institute of Electronics, Information and Communication Engineers (SP89-148): 47–53.
Production and perception of the accent in the consecutively devoiced
syllables in Tokyo Japanese. Proceedings of International Conference on Spoken Language Processing (ICSLP) 2: 517–520. Kobe.
Hanashikotobani okeru chouboinno tanko. Kokugogakkai 2002
nendo syunkitaikai youshisyuu: 43–50.
Study of language variation using Corpus of Spontaneous Japanese.
Journal of Phonetic Society of Japan, 6 (3): 48–59.
Maekawa, Kikuo, Hanae Koiso, Sadaoki Furui and Hitoshi Isahara
Spontaneous speech corpus of Japanese. Proceedings of the Second
International Conference of Language Resources and Evaluation
(LREC) 2: 947–952.
Maekawa, Kikuo, Hideaki Kikuchi, Yosuke Igarashi and Jennifer J. Venditti
X-JToBI: An extended J_ToBI for spontaneous speech. Proceedings
of the 7th International Conference on Spoken Language Processing
(ICSLP2002) 3: 1545–1548. Denver.
Marten, Lutz
Swahili vowel harmony. SOAS Working Papers in Linguistics and
Phonetics 6: 61–75.
Martin, Samuel E.
Morphophonemics of Standard Colloquial Japanese. Supplement to
Language (Language Dissertation No. 47). Baltimore: Linguistic
Society of America.
A Reference Grammar of Japanese. New Haven: Yale University
The Japanese Language through Time. New Haven: Yale University

Maruyama, Rinpei
Joudaigo Jiten. (Dictionary of Jōdai [710–794] vocabulary). Tokyo:
Meiji Shoin.
Mathias, Gerald B.
On the modification of certain Proto-Korean-Japanese reconstructions. Papers in Japanese Linguistics 2: 31–47.
Matsui, F. Michinao
Museihaku joo no akusento kaku no chikaku ni tsuite [Perceptual
study of the accent on devoiced accented mora]. Paper presented at
the 28th Kinki Onsei Gengo Kenkyuukai, Osaka, Japan.
Matsumoto, Takashi
Ma-gyou on ba-gyou on koutai genshou no keikou [Tendency of
alternations between the [b]-column sounds and the [m]-column
sounds]. Kokugogaku Kenkyuu 5: 52–65.
Matsumura, Akira (ed.)
Daijirin [Daijirin Japanese Dictionary]. Tokyo: Sanseidō.
Matthews, Peter H.
Morphology: An Introduction to the Theory of Word Structure.
Cambridge: Cambridge University Press.
McCarthy, John J.
Sympathy and phonological opacity. Phonology 16: 331–399.
Sympathy, cumulativity, and the Duke-of-York Gambit. In The
Optional Syllable, Caroline Féry and Ruben van de Vijver (eds.).
Cambridge: Cambridge University Press.
Comparative Markedness. Ms., University of Massachusetts, Amherst.
[Available on Rutgers Optimality Archive, ROA-489.]
McCarthy, John J. and Alan S. Prince
Faithfulness and Reduplicative Identity. University of Massachusetts
Occasional Papers in Linguistics 18: Papers in Optimality Theory,
Jill N. Beckman, Suzanne C. Urbanczyk and Laura Walsh Dickey
(eds.), 249–384. Amherst: GLSA.
McCawley, James D.
The Phonological Component of a Grammar of Japanese. The
Hague: Mouton.
Accent in Japanese. In Studies in Stress and Accent, Southern California Occasional Papers in Linguistics 4, Larry M. Hyman (ed.),
261–302. Los Angeles: University of Southern California Department of Linguistics.
McGarrity, Laura W.
Constraints on patterns of primary and secondary stress. Doctoral
Dissertation, Indiana University.
Mielke, Jeffrey
The emergence of distinctive features. Ph.D. dissertation, Ohio State



Miller, Roy A.
The Japanese Language. Chicago: University of Chicago Press.
Nihongo: In Defence of Japanese. London: Athlone.
Miller, Roy A. (ed.)
Bernard Bloch on Japanese. New Haven: Yale University Press.
Miyake, Marc H.
Old Japanese: A Phonetic Reconstruction. London: Routledge Curzon.
Mori, Hiromichi
Kodai no onin to Nihonshoki no seiritsu [Sounds of Old Japanese
and completion of Nihonshoki]. Tokyo: Taishukan.
Murayama, Tadashige
Nihon-no Myooji Besuto 10,000 [Top 10,000 Surnames in Japanese].
Tokyo: Shin-Jinbutsu-Ooraisha.
Nakagawa, Yoshio
Rendaku, Rensei (Kashou) no Keifu [Compounds with Rendaku and
Compounds without Rendaku]. Kokugo Kokubun 35 (6): 302–314.
Kyoto: Kyoto University.
Nakata, Norio (ed.)
Kooza kokugo-shi 2: Onin-shi Moji-shi [History of sounds and
characters]. Tokyo: Taishūkan.
Nakata, Norio and Hiroshi Tsukishima
Dakuten [Daku-ten]. In Kokugogaku daijiten [Dictionary of National
Language Study], Kokugo Gakkai (ed.), 586–587. Tokyo: Tokyodo.
Napoli, Donna J. and Marina A. Nespor
The syntax of raddoppiamento sintattico. Unpublished Ms.
Nasu, Akio
Onomatope-ni okeru yuuseika-to [p]-no yuuhyoosei [Voicing in
onomatopoeia and the markedness of [p]]. Journal of the Phonetic
Society of Japan 3: 52–66.
Heiretugo akusento no yure to keisan [Accent of Japanese dvandva
and its variations]. Paper presented at the 26th Annual Meeting of
Kansai Linguistic Society.
Nasukawa, Kuniya
Melodic structure and no constraint-ranking in Japanese verbal inflexion. Paper presented at the Autumn Meeting of the Linguistic
Association of Great Britain. University of Essex.
An integrated approach to nasality and voicing. In Structure and
Interpretation: Studies in Phonology (PASE Studies & Monographs
4), Eugeniusz Cyran (ed.), 205–225. Lublin: Wydawnictwo Folium.
Prenasalisation and melodic complexity. UCL Working Papers in
Linguistics 11: 207–224.

A Unified Approach to Nasality and Voicing. Berlin /New York:
Mouton de Gruyter.
Melodic complexity in infant language development. In Developmental Paths in Phonological Acquisition, Marina Tzakosta, Claartje
Levelt and Jeroen van de Weijer (eds.). Leiden Papers in Linguistics
2 (1): 53–70.
Nihon Daijiten Kankōkai (ed.)
1972–76 Nihon kokugo daijiten [Grand Japanese Dictionary]. Tokyo: Shōgakukan.
Nihon Hoso Kyokai [NHK]
NHK Nihongo Hatsuon Akusento Jiten [NHK Pronunciation and
Accent Dictionary of Japanese]. 1st ed., Nihon Hoso Shuppan
Kyokai, Tokyo.
NHK Nihongo Hatsuon Akusento Jiten [NHK Pronunciation and
Accent Dictionary of Japanese]. 2nd ed., Nihon Hoso Shuppan
Kyokai, Tokyo.
Nishihara, Tetsuo
Tohoku hōgen ni okeru Shiin no Yūseika [On consonant voicing in
the Tohoku dialect]. Miyagi Kyōiku Daigaku Gaikokugo Kenkyū
Ronshū 2: 19–24. Miyagi University of Education, Sendai.
Nishimiya, Kazutami
Joudai-go no seidaku: shakkun moji o chūshin to shite [Sei-daku in
Jōdai Japanese [710–794]: focusing on the characters for kun readings]. Man'yō 36: 1–19.
Ogura, Sinpei
Lyman-si no rendaku-ron [Lyman's theory of sequential voicing].
Kokugakuin zassi [Journal of the National Research Institute] 16 (7):
Ohala, John J.
The origin of sound patterns in vocal tract constraints. The Production
of Speech, P. F. MacNeilage (ed.), 189–216. New York: Springer.
Ohno, Kazutoshi
The lexical nature of Rendaku in Japanese. In Japanese/Korean
Linguistics, Vol. 9, Mineharu Nakayama and Charles J. Quinn, Jr.
(eds.), 151–164. Stanford: CSLI publications and Stanford
Linguistics Association.
Rules or lexicon: sticking to rules or giving them up. Presentation at
the Second Conference on Formal Linguistics. June 22–23. Hunan
University, Changsha, Hunan, China.
Analogy: guessable rules Towards a better understanding of the
rendaku phenomenon. In Proceedings of LP2002, Shosuke Haraguchi, Bohumil Palek and Osamu Fujimura (eds.).



Okumura, Mitsuo
Rendaku. In Kokugogaku jiten, Kokugo Gakkai (ed.), 916961.
Tokyo: Tōkyōdō.
Ono, Masahiro
Kindai no moji [Characters in Kindai (in and after 1338)]. In Gaisetu
Nihongo no rekishi [Survey of the history of Japanese], Takeyoshi
Satō (ed.), 42–83. Tokyo: Asakura.
Ōno, Susumu
1947–48 Nihonshoki no jion-gana ni okeru seidaku hyouki ni tuite. [On the
sei-daku representations by kana of the on reading in Nihonshoki].
Kokugo to Kokubungaku [Japanese Language and Literature]
24 (11): 49–59 and 25 (1): 43–50.
Joudai Kana-dzukai no Kenkyuu. [Study of Kana Usage in Joodai
(710–794)]. Tokyo: Iwanami.
Nihongo no sekai 1: Nihongo no seiritsu [World of Japanese 1: Formation of Japanese]. Tokyo: Chūōkōronsha.
Oohashi, Junichi
Tohoku hogen onsei no kenkyu [Study of the Sounds of Tohoku dialects]. Tokyo: Oufuu.
Orgun, Cemil Orhan
Cyclic and noncyclic phonological effects in a declarative grammar.
In Yearbook of Morphology 1997, Geert E. Booij and Jaap van Marle
(eds.), 179–218. Dordrecht: Kluwer.
Otsu, Yukio
Some aspects of rendaku in Japanese and related problems. In Theoretical issues in Japanese linguistics: MIT Working Papers in Linguistics 2, Yukio Otsu and Ann Farmer (eds.), 207–227.
Ōtsubo, Heiji
Katakana, hiragana [Katakana and hiragana]. In Nihongo 8: Moji
[Japanese 8: Characters], Susumu Ōno and Takeshi Shibata (eds.),
249–299. Tokyo: Iwanami.
Parker, Charles K.
A dictionary of Japanese compound verbs. Tokyo: Maruzen.
Pater, Joseph V.
Austronesian nasal substitution and other NC effects. In The prosody-morphology interface, René W. J. Kager, Harry G. van der Hulst
and Wim Zonneveld (eds.), 310–343. Cambridge: Cambridge University Press.
Pater, Joseph V. and Adam Werle
Typology and variation in child consonant harmony. Proceedings of
HILP 5: 119–139.
Pierrehumbert, Janet B. and Mary E. Beckman
Japanese Tone Structure. Cambridge, Mass: MIT Press.

Ploch, Stefan
Nasals on my mind: the phonetic and the cognitive approach to the
phonology of nasality. Doctoral dissertation, School of Oriental and
African Studies, University of London.
Polivanov, Yevgeny D.
Two kinds of musical accent of the Mie dialect in Nagasaki Prefecture. Studies on the Japanese Language (translated by S. Murayama
1976): 61.
Port, Robert F., Jonathan M. Dalby and Michael L. O'Dell
Evidence for mora timing in Japanese. Journal of the Acoustical Society
of America 81 (5): 1574–1585.
Poser, William J.
Evidence for foot structure in Japanese. Language 66: 78–105.
Japanese periphrastic verbs and noun incorporation. Ms., University
of Pennsylvania.
Prince, Alan S.
Foundations of Optimality Theory; Current directions in Optimality
Theory. In Handouts of lecture at the Phonology Forum 1998, Kobe
University, September 1998. Phonological Studies 2, the Phonological Society of Japan (ed.). Tokyo: Kaitakusha.
Prince, Alan S. and Paul Smolensky
Optimality Theory: Constraint interaction in generative grammar.
Ms., Rutgers University and University of Colorado. [Blackwell, Oxford, 2004.]
Pulleyblank, Douglas G.
Optimality Theory and features. Optimality Theory: An Overview,
D. Archangeli and T. Langendoen (eds.), 59–101. Massachusetts,
USA and Oxford, UK: Blackwell.
Covert feature effects. WCCFL 22 Proceedings, Gina Garding and
Mimu Tsujimura (eds.), 398–422. Somerville, Mass.: Cascadilla Press.
Reinhart, Tanya M.
The syntactic domain of anaphora. Doctoral dissertation, MIT.
Rice, Keren D.
A reexamination of the feature [sonorant]: The status of sonorant
obstruents. Language 69: 308–344.
Japanese NC clusters and the redundancy of postnasal voicing. Linguistic Inquiry 28: 541–551.
Featural markedness in phonology: Variation. The Second Glot International State-of-the-Article Book. Lisa Cheng and Rint Sybesma
(eds.), 389–429. Berlin /New York: Mouton de Gruyter.
Rice, Keren D. and J. Peter Avery
On the relationship between laterality and coronality. In Phonetics and
Phonology 2. The Special Status of Coronals: Internal and External
Evidence, Carol Paradis and Jean-François Prunet (eds.), 101–124.
San Diego: Academic Press.
Sakuma, Kanae
Nihon Onseigaku [Japanese Phonetics]. Tokyo: Kyobunsha.
Word accent of the Kyoto dialect. Study of Sounds. Phonetic Society
of Japan.
Sakurai, Shigeharu
Kyōtsūgo no hatsuon de chūi subeki kotogara. In Nihongo hatsuon
akusento jiten, Nihon Hōsō Kyōkai (ed.), 31–43. Tokyo: Nihon Hōsō
Shuppan Kyōkai.
Kyootsuu-go no hatsuon de chuui subeki kotogara [Notes on the
pronunciation of Standard Japanese]. In Japanese Pronunciation and
Accent Dictionary, Appendix to NHK (ed.), 128–143. Tokyo: NHK
Sanada, Shinji
Chiiki to no kakawari: Koutsuu to tsuushin no gairaigo [Foreign
words of transportation and communication]. In Eibei Gairaigo no
Sekai [The world of loanwords from the English language], Yoshifumi
Hida (ed.). Tokyo: Nanundou.
Hyoujungo wa Ikani Seiritsu sitaka [How did Standard Japanese get
established?]. 176–198. Tokyo: Soutakusya.
Sato, Hirokazu
Hukugoogo ni okeru akusento kisoku to rendaku kisoku [Accent and
rendaku rules in compounds]. In Nihongo no Onsei Onin [The Phonetics and Phonology of Japanese], Miyoko Sugito (ed.), 233–265.
Tokyo: Meiji Shoin.
Decomposition into syllable complexes and the accenting of Japanese
borrowed words. Journal of the Phonetic Society of Japan 7 (1): 67–78.
Sato, Ryoichi
Gendai nihongo no hatsuon bumpu [Pronunciation distribution of
Present-day Japanese]. Gendai nihongo kōza [Lectures on Present-day Japanese] Vol. 3: Hatsuon [Pronunciation], 20–39. Tokyo:
Meiji Shoin.
Satō, Takeyoshi (ed.)
Gaisetu Nihongo no rekishi [Survey of the History of Japanese].
Tokyo: Asakura.
Sato, Yumiko
The durations of syllable-final nasals and the mora hypothesis in
Japanese. Phonetica 50: 44–67.
Schane, Sanford A.
Fundamentals of particle phonology. Phonology Yearbook 1: 129–155.
Diphthongisation in particle phonology. In Handbook of Phonological Theory, John A. Goldsmith (ed.), 586–605. Oxford: Blackwell.

Shibata, Takeshi
Onin [Phonology]. Hogengaku gaisetsu [Phonology, General survey
of dialectology]. Tokyo: Musashino Shoin.
Shibatani, Masayoshi
The Languages of Japan. Cambridge: Cambridge University Press.
Shikano, Kiyohiro, Katsuteru Ito, Tatsuya Kawahara, Kazuya Takeda and Mikio
Yamamoto (eds.)
Speech Recognition Systems. Tokyo: Ohmsha.
Shimizu, Katsumasa
Voicing features in the perception and production of stop consonants
by Japanese speakers. Studia Phonologica 11: 25–34.
Shinohara, Shigeko
The roles of the syllable and the mora in Japanese adaptations of
French words. Cahiers de Linguistique Asie Orientale 25 (1): 87–112.
Siegel, Dorothy C.
Topics in English phonology. Doctoral dissertation, MIT.
Smolensky, Paul
Harmony, markedness, and phonological activity. Ms., Johns Hopkins
University (Available on the Rutgers Optimality Archive, ROA-37).
On the structure of the constraint component Con of UG. handout for
talk at UCLA.
Constraint interaction in generative grammar II: Local conjunction.
Paper presented at the Hopkins Optimality Theory Workshop /University of Maryland Mayfest, May 8–12, 1997.
Steriade, Donca
Underspecification and markedness. In The Handbook of Phonological Theory, John A. Goldsmith (ed.), 114–174. Oxford: Blackwell.
Strik, Helmer and Catia Cucchiarini
Modeling pronunciation variation for ASR: A survey of the literature.
Speech Communication 29: 225–246.
Sugito, Miyoko
Shibata-san to Imada-san: Tango-no chookakuteki benbetsu ni tsuiteno ichi koosatsu [Mr. Shiba-ta and Mr. Ima-da: A study in the auditory differentiation of words], Gengo Seikatsu 165 [S40-6], 64–72
(Reproduced in Miyoko Sugito 1998, Nihongo Onsei no Kenkyu
[Studies on Japanese Sounds]. Izumi-Shoin, Vol. 6: 315.)
Akusento no aru museika boin [A study on accented voiceless vowels].
The Bulletin of the Phonetic Society of Japan 132: 13.
1969/70 Measurements of tone movement of vowels and hearing validity in
relation to accent in Japanese. Studia Phonologica 5: 119. University of Kyoto.



Nihongo Akusento no Kenkyuu [Studies on Japanese Accent]. Tokyo:

SUGI SpeechAnalyzer. Yokohama: Fujitsu Animo.
Timing relationships between prosodic and segmental control in
Osaka Japanese word accent. Phonetica 60: 116.
Sugito, Miyoko and Hajime Hirose
An electromyographic study of the Kinki accent. Annual Bulletin of
the Research Institute of Logopedics and Phoniatrics 12: 35–51.
Production and perception of accented devoiced vowels in Japanese,
Annual Bulletin of the Research Institute of Logopedics and Phoniatrics 22: 19–37. University of Tokyo.
Susman, Amelia L.
The accentual system of Winnebago. Doctoral dissertation, Columbia
University, New York.
Suzuki, Takao
Nihongo to gaikokugo [The Japanese language and foreign languages]. Tokyo: Iwanami.
Takagi, Ichinosuke, Tomohide Gomi and Susumu Ōno (annotators)
Manyooshuu 3 [Man'yōshū 3]. Tokyo: Iwanami.
Takayama, Michiaki
Rendaku to renjoudaku [Rendaku and sandhi voicing]. Kuntengo to
kuntenshirou 88: 115–124.
Sei daku shoukou [Brief thoughts on sei-daku]. In Nihongo ronkyuu 2:
Koten Nihongo to Jisho [Discussion on Japanese 2: Classic Japanese and dictionaries], Tajima, Ikudo and Kazuya Niwa (eds.), 17–56.
Osaka: Izumi shoin.
Sokuon no atono dakuon [Voiced geminates]. Shimadai kokubun 21
[Bulletin of Japanese Literature of Shimane University 21], the Japanese Literature Society of Shimane University (ed.), 48–55.
Takayama, Michiaki
Lecture notes (revised from Takayama 2000, Nihongo oninshi no
hōhō [Methodology of Japanese historical phonology], Nihongogaku 19 11). Ms., Kyushu University.
Takayama, Tomoaki
Hasatsu-on to masatsu-on no gōryū to daku-shiin no henka: iwayuru
Yotsugana gōryū no rekishiteki ichizuke [Merger of affricates and
fricatives and sound change of voiced obstruents: historical view of
the so-called Yotsugana merger]. Kokugo Kokubun 62 (4): 18–30.
Shakuyougo no Rendaku /Kouonka ni tsuite (1) [On Rendaku in
Loanwords]. Report of the Special Research Project for the Typological Investigation of Language and Cultures of the East and West
1999, Part II, 375–385. Tsukuba: Tsukuba University.

Takeda, Kazuya and Hisao Kuwabara
Boin museika no youin bunseki to yosoku syuhou no kentou [Analysis and prediction of devocalizing phenomena]. Proceedings of the
1987 Autumn Meeting of the Acoustical Society of Japan 1: 105–106.
Tamamura, Fumio
Gokei [Word form] In Nihongo no goi imi [Words and meaning in
Japanese], Fumio Tamamura (ed.), 23–51. Tokyo: Meiji shoin.
Tanaka, Makirō
Kodai no buntai, bunshou [Style and writing in Kodai (before 1338)].
In Gaisetu Nihongo no Rekishi [Survey of the History of Japanese],
Takeyoshi Satō (ed.), 190–206. Tokyo: Asakura.
Tanaka, Shin-ichi
Accentuation and prosodic constituenthood in Japanese. Tokyo Linguistic Forum 5: 195–216.
The emergence of the unaccented: Possible patterns and variations
in Japanese compound accentuation. In Issues in Japanese Phonology and Morphology, Jeroen M. van de Weijer and Tetsuo Nishihara
(eds.), 159–192. Berlin /New York: Mouton de Gruyter.
An OT-based integrated model of accent and accent shift phenomena
in Japanese. Phonological Studies 5, the Phonological Society of
Japan (ed.), 99–104. Tokyo: Kaitakusha.
Three reasons for favoring constraint reranking over multiple faithfulness. In A Comprehensive Study on the Phonological Structure of
Languages and Phonological Theory, Shosuke Haraguchi (ed.), 121–130.
Technical Report of Basic Sciences (A)(1), Grant-in-Aid for
Scientific Research by the Japan Society for the Promotion of Science.
Review of Eric Robert Rosen, 2001, Phonological Processes Interacting with the Lexicon: Variable and Non-Regular Effects in Japanese Phonology. GLOT International.
Japanese grammar in the general theory of prominence: Its conceptual basis, diachronic change, and acquisition. In A New Century of
Phonology and Phonological Theory: A Festschrift for Professor
Shosuke Haraguchi on the Occasion of His Sixtieth Birthday, Takeru
Honma, Masao Okazaki, Toshiyuki Tabata and Shin-ichi Tanaka
(eds.). Tokyo: Kaitakusha.
Accent and Rhythm: From the Basics of Phonology to Optimality
Theory. Tokyo: Kenkyusha.
Tanaka, Shinichi and Haruo Kubozono
Nihongo no Hatsuon Kyooshitsu [Introduction to Japanese Pronunciation]. Tokyo: Kurosio Publishers.



Tateishi, Koichi
Onin jisho kurasu seeyaku no bunpu ni tsuite [On the distribution of
constraints for phonological sub-lexica]. Paper presented at the 26th
Annual Meeting of the Kansai Linguistic Society, Ryukoku University, Kyoto.
Lexical stratification theories and (un)markedness. paper presented
at LP 2002, Meikai University, September 3, 2002.
Phonological patterns and lexical strata. In Proceedings of CIL 17,
E. Hajicova, A. Kotesovcova and J. Mirovsky (eds.). Prague: Matfyzpress, MFF UK.
Tōjō, Misao (ed.)
Nihon Hoogengaku [Japanese Dialectology]. Tokyo: Yoshikawa
Tsujimura, Natsuko
An Introduction to Japanese Linguistics. Oxford: Blackwell.
Tsukishima, Hiroshi
Kodai no moji [Characters in Kodai] (approximately 8c–11c, in this
book). In Kooza kokugo-shi 2: Onin-shi Moji-shi [History of sounds
and characters], Norio Nakata (ed.), 311–444. Tokyo: Taishūkan.
Tsuru, Hisashi
Manyoushuu ni okeru shakkun-gana no seidaku hyouki [Sei-daku
notations of kun reading kana in Man'yōshū]. Man'yō 36: 20–32.
Manyougana [Man'yō-gana]. In Nihongo 8: Moji [Characters], Susumu Ōno and Takeshi Shibata (eds.). Tokyo: Iwanami.
Unger, J. Marshall
Studies in Early Japanese Morphophonemics. Bloomington: Indiana
University Linguistics Club [Doctoral dissertation, Yale University,
Uwano, Zendo, Aizawa Masao, Kato Kazuo and Sawaki Motoei
Onin sōran [Survey of Phonology]. Nihon hōgen dai jiten [Encyclopedia of Japanese dialects], Munakata Tokugawa (ed.), 177. Tokyo:
Vance, Timothy J.
The psychological status of a constraint on Japanese consonant alternation. Linguistics 18: 245–267.
On the origin of voicing alternation in Japanese consonants. Journal
of the American Oriental Society 102: 333–341.
An Introduction to Japanese Phonology. Albany: State University of
New York Press.
Lexical phonology and Japanese vowel devoicing. In The Joy of
Grammar, Brentari et al. (eds.). Amsterdam: John Benjamins.


Sequential voicing in Sino-Japanese. Journal of the Association of
Teachers of Japanese 30: 22–43.
Semantic bifurcation in Japanese compound verbs. Japanese/Korean
Linguistics 10, Noriko M. Akatsuka and Susan Strauss (eds.), 365–377.
Stanford: CSLI.
Varden, J. Kevin
On high vowel devoicing in Standard Modern Japanese: Implications
for current phonological theory. Doctoral dissertation, University of
Venditti, Jennifer J. and Jan P.H. van Santen
Modeling vowel duration for Japanese text-to-speech synthesis. Proceedings of the 5th International Conference on Spoken Language
Processing (ICSLP98), Sydney: 2043–2046.
Wada, Minoru
A view patterns and marks of Japanese accent (in Japanese). Language; Autumn Issue 2: 29–44.
Wenck, Günther
Japanische Phonetik, Volume 4. Wiesbaden: Otto Harrassowitz.
Weijer, Jeroen M. van de
Segmental Structure and Complex Segments. Tübingen: Niemeyer.
Wheeler, Max W.
Phonology of Catalan. Oxford: Basil Blackwell.
Yamada, Eiji
Stress assignment in Tokyo Japanese: Stress shift and stress in suffixation. Fukuoka University Review of Literature and Humanities
22: 97–154.
Yamaguchi, Yoshinori
Kodaigo no Fukugougo ni kansuru Ichi Kousatsu [On Compounds in
Ancient Japanese]. Nihongogaku 7 (5): 412. Tokyo: Meiji Shoin.
Yamane, Noriko
Chain shifts in intervocalic obstruents in Japanese. In A New Century
of Phonology and Phonological Theory: A Festschrift for Professor
Shosuke Haraguchi on the Occasion of His Sixtieth Birthday, Takeru
Honma, Masao Okazaki, Toshiyuki Tabata and Shin-ichi Tanaka
(eds.), 121–139. Tokyo: Kaitakusha.
Yamane, Noriko and Shin-ichi Tanaka
Gravitation and reranking algorithm: Toward a theory of diachronic
change in grammar. Onin Kenkyuu [Phonological Studies] 5, the
Phonological Society of Japan (ed.), 135–140. Tokyo: Kaitakusha.
Yanagita, Kunio
Kagyū-kō. Tokyo: Tōkō Shoin.



Yip, Moira J. W.
The tonal phonology of Chinese. Doctoral dissertation, MIT.
Yokotani, Teruo
Accent shift beyond the foot boundary: Evidence from Tokyo Japanese compound nouns. Journal of the Phonetic Society of Japan 1(1):
Yoshida, Natsuya
The effect of phonetic environment on vowel devoicing in Japanese.
Kokugogaku [Japanese Linguistics] 53 (3): 34–47.
Yoshida, Natsuya and Yoshinori Sagisaka
Boin museika no youin bunseki [Factor analysis of vowel devoicing].
Technical Report of ATR Interpreting Telephony Research Laboratories (TR-I-0159).
Yoshida, Shohei
Some aspects of governing relations in Japanese phonology. Doctoral dissertation, School of Oriental and African Studies, University
of London.
Phonological Government in Japanese. Canberra: The Australian
National University.
Yoshida, Yuko Z.
On pitch accent phenomena in Standard Japanese. Doctoral dissertation, School of Oriental and African Studies, University of London.
[Published in 1999 by Holland Academic Graphics. The Hague.]
Yoshioka, Hirohide
Laryngeal adjustments in the production of the fricative consonants
and devoiced vowels in Japanese. Phonetica 38: 236–251.
Young, Steve J., Joop Jansen, Julian J. Odell, Dave Ollason and Phil C. Woodland
The HTK Handbook. Entropic Research Laboratories.
Zamma, Hideki
Affixation and phonological phenomena: From Lexical Phonology to
Lexical Specification Theory. Onin Kenkyuu [Phonological Studies]
2, the Phonological Society of Japan (ed.), 69–76. Tokyo: Kaitakusha.
Accentuation of person names in Japanese and its theoretical implications. Tsukuba English Studies 20: 118.
Suffixes and Stress/Accent Assignment in English and Japanese:
More Than a Simple Dichotomy. On-line proceedings of Linguistics
and Phonetics 2002 (LP2002), Meikai University, Tokyo. [http://

Index of authors

Abramson, Arthur S. and Leigh Lisker,

Akinaga, Kazue, 261, 279
Anderson, John M. and Charles Jones,
Anderson, John M. and Colin J. Ewen,
Anonymous, 54
Archangeli, Diana B. and Douglas G.
Pulleyblank, 144, 150n9
Anttila, Arto T., 204, 275, 277n1
Anttila, Arto T. and Cho Young-mee
Yu, 123, 151n18, 275
Avery, J. Peter, 25
Avery, J. Peter and William J. Idsardi,
25, 28, 36f, 45n7

Charette, Monik and Asli Göksel, 86n3

Cho, Young-mee Yu, 123
Choi, Kyung-Ae, 67n42
Chomsky, A. Noam and Morris Halle,
14, 74, 77
Clements, George N., 14, 25, 28, 277n5
Clements, George N. and Susan R.
Hertz, 77
Cohn, Abigail C., 82
Coleman, John S. and Janet B.
Pierrehumbert, 204
Cremelie, Nick and Jean-Pierre
Martens, 198

Backley, Phillip, 87n5, 87n6

Backley, Phillip and Takahashi Toyomi,
Bao, Zhiming, 77, 79
Beckman, Jill N., 137, 263
Beckman, Mary E., 232
Bell, Alan E., 229
Benua, Laura H., 174
Bird, Steven G., 191
Bloch, Bernard, 91, 102n11, 102n13,
Blumstein, Sheila E., William E. Cooper,
Harold Goodglass, Sheila Statlender
and Jonathan Gottlieb, 83
Boersma, Paul P. G., 204

Endō, Kunimoto, 57f, 60, 66n36, 67n45

Flemming, Edward S., 77
Frisch, Stefan A., 204
Fujimoto, Masako, 223
Fukada, Toshiaki, Takayoshi Yoshimura
and Yoshinori Sagisaka, 198
Fukazawa, Haruka, 120n1
Fukazawa, Haruka and Mafuyu
Kitahara, 3, 27, 105ff, 113, 115, 120,
Fukazawa, Haruka, Mafuyu Kitahara,
and Mitsuhiko Ota, 105f, 112f, 177
Fukuda, Suzy E. and Shinji Fukuda,
5ff, 22
Fukui, Seiji, 21
Fukushima, Kunimichi, 67n44

Cabrera-Abreu, Mercedes, 85
Calabrese, Andrea, 25, 28
Campbell, Nick and Sagisaka
Yoshinori, 232

Dauer, Rebecca M., 229

de Lacy, Paul V., 143, 151n16, 262, 263

Gandour, Jackson T. and Rochana
Dardarananda, 83
Grootaers, Willem A., 186


Halle, Morris, 268
Halle, Morris and Kenneth N. Stevens,
74, 77, 79
Halle, Morris and Jean-Roger
Vergnaud, 266
Hamada, Atsushi, 37, 51, 55, 57, 63n2,
65n28, 66n34, 66n40, 67n42, 68n46,
68n50, 90, 101n6
Hamano, Shoko, 123, 140, 142, 148,
Han, Mieko S., 205, 232
Hansson, Gunnar Ólafur, 151n11
Haraguchi, Shosuke, 2, 102n16, 121n3,
189n10, 261, 266, 269
Harris, John K. M., 71, 74, 77ff, 83, 85,
Harris, John K. M. and Geoffrey A.
Lindsey, 71, 74, 76, 77, 82, 83,
Hasegawa, Kiyoshi, Katsuaki Horiuchi,
Tsutomu Momozawa, and Saburo
Yamamura, 93, 98
Hashimoto, Shinkichi, 101n7, 123, 129
Hattori, Noriko, 211, 240
Hattori, Shiro, 248
Hayata, Teruhiro, 69n56, 140, 142
Hayes, Bruce P., 245n2, 262, 263, 265,
266, 267, 268, 277n1
Hayes, Bruce P. and Donca Steriade,
Hepburn, James Curtis, 101n8
Hibiya, Junko, 123
Hinskens, Frans and Jeroen M. van de
Weijer, 143
Hirayama, Manami, 32, 33
Hirayama, Teruo, Ichiro Oshima,
Makio Ono, Makoto Kuno, Mariko
Kuno and Takao Sugimura, 123,
Hirose, H. et al., 14
Honda, Kiyoshi, Hiroyuki Hirai,
Shinobu Masaki and Yasuhiro
Shimada, 14, 259

Huang, Xuedong, Alex Acero and
Hsiao-Wuen Hon, 194, 200
Hulst, Harry G. van der, 74, 86n3
Hume, Elizabeth V. and Georgios
Tserdanelis, 151n16
Hyman, Larry M., 229
Hwang, Mei-Yuh and Xuedong Huang,
Ide, Itaru, 66n31
Iida, Takesato, 65n27
Imaizumi, Satoshi, Akiko Hayashi, and
Toshisada Deguchi, 226
Inkelas, Sharon, 175
Inoue, Fumio, 123, 149n3, 149n5,
Inoue, Michiyasu, 66n31
Ishizuka, Tatsumaro, 9, 64n10
Itô, Junko, 20, 21, 239
Itô, Junko and Ralf-Armin Mester, 5, 9,
17f, 24n8, 25, 26, 27f, 29ff, 35ff, 39,
40, 41, 42, 44n1, 44n2, 45n3, 45n5,
45n8, 74f, 81, 105ff, 111f, 120n2,
121n5, 123, 124, 131ff, 137, 140, 142,
150n9, 152n21, 174, 177, 189n10
Itô, Junko, Yoshihisa Kitagawa and
Ralf-Armin Mester, 22
Itô, Junko, Ralf-Armin Mester, and
Jaye E. Padgett, 22, 25, 27ff, 35f, 41,
45n6, 74f, 76, 146
Itoh, Motonobu, Itaru F. Tatsumi, and
Sumiko Sasanuma, 83
Iwabuchi, Etsutarō, 67n40
Iwanami Shoten Henshūbu, 94
Jakobson, Roman, 82
Jakobson, Roman, C. Gunnar M. Fant
and Morris Halle, 74
Jessen, Michael and Catherine O.
Ringen, 74
Jōdaigo Jiten Henshū Iinkai, 101n7
Jun, Sun-Ah, 229
Jun, Sun-Ah and Mary E. Beckman, 229

Jurafsky, Daniel and James H. Martin,
Kager, René W. J., 137, 150n9
Kamei, Takashi, 53, 56
Kamei, Takashi, Rokuro Kono and
Eiichi Chino, 123, 152n22, 152n24
Kasuga, Kazuo, 49, 64n12
Kawai, Mieko, 103n23
Kawakami, Shin, 248
Kawasaki, Takako, 36
Kaye, Jonathan D., Jean Lowenstamm
and Jean-Roger Vergnaud, 71, 74
Kazama, Rikizō, 93
Kenstowicz, Michael J., 245n2, 262
Kess, Joseph E. and Tadao Miyamoto,
Kikuchi, Hideaki and Kikuo Maekawa,
Kikuda, Norio, 98
Kindaichi, Haruhiko, Ooki Hayashi and
Takesi Sibata, 9
Kindaichi, Kyosuke, 123, 129
Kiparsky, R. Paul V., 41, 174
Kitahara, Mafuyu, 248
Kitahara, Yasuo, 94
Kiyose, Gisaburō N., 100n1
Kohler, Klaus J., 74, 229
Komatsu, Hideo, 53, 59, 65n23, 66n32,
66n35, 68n45
Kondo, Mariko, 4, 216, 226, 229ff, 230,
241, 244, 245n1, 245n2, 273
Kubozono, Haruo, 1, 2, 5ff, 11, 13, 14,
22, 44n2, 121n3, 157, 160, 173,
175n2, 189
Kula, Nancy Chongo and Lutz Marten,
Kuno, Susumu, 102n11
Kuroda, Shige-Yuki, 5, 25, 36, 45n8
Kuwabara, Hisao and Kazuya Takeda,
Labrune, Laurence, 25, 26, 28, 30,


Lange, Roland A., 101n7

Liberman, Mark Y. and Alan S. Prince,
Lisker, Leigh and Arthur S. Abramson,
Lombardi, Linda, 74, 76
Lyman, Benjamin S., 44n2
Mabuchi, Kazuo, 68n48
Maddieson, Ian, 142
Maddieson, Ian and Peter N. Ladefoged,
125, 149n4
Maeda, Hiroyuki, 148
Maekawa, Kikuo, 4, 215f, 218, 225, 226,
230, 248
Maekawa, Kikuo and Hideaki Kikuchi,
4, 205ff, 230, 238
Maekawa, Kikuo, Hanae Koiso, Sadaoki
Furui and Hitoshi Isahara, 207
Maekawa, Kikuo, Hideaki Kikuchi,
Yosuke Igarashi and Jennifer J.
Venditti, 207
Marten, Lutz, 86n3
Martin, Samuel E., 25, 27, 29, 31, 32,
38f, 40, 43, 44n1, 44n2, 45n4, 45n5,
89, 92, 93, 101n7, 103n21
Maruyama, Rinpei, 53, 65n25
Mathias, Gerald B., 101n7
Matsui, F. Michinao, 248
Matsumoto, Takashi, 58
Matsumura, Akira, 94, 98
Matthews, Peter H., 102n15
McCarthy, John J., 274, 278n12
McCarthy, John J. and Alan S. Prince,
113, 123, 150n9
McCawley, James D., 27, 38f, 40, 44n1,
45n5, 102n16, 240
McGarrity, Laura W., 277n2
Mielke, Jeffrey, 143
Miller, Roy A., 101n4, 101n7, 151n13
Miyake, Marc H., 101n3
Mori, Hiromichi, 66n31
Murayama, Tadashige, 161


Nakagawa, Yoshio, 187
Nakata, Norio and Hiroshi Tsukishima,
51, 52
Napoli, Donna J. and Marina A. Nespor,
Nasu, Akio, 19, 151n20
Nasukawa, Kuniya, 71ff, 75, 77, 82,
83ff, 87n5, 87n6, 87n7
Nihon Daijiten Kankōkai, 101n4
Nihon Hoso Kyokai [NHK], 205, 247,
Nishihara, Tetsuo, 3, 151n15
Nishimiya, Kazutami, 49, 64n12
Ogura, Sinpei, 44n2
Ohala, John J., 150n9
Ohno, Kazutoshi, 2, 3, 5, 15, 28, 31, 32,
33, 34, 37, 44n2, 47ff, 100n1, 101n8,
Okumura, Mitsuo, 92, 97
Ono, Masahiro, 53
Ōno, Susumu, 49, 64n12
Oohashi, Junichi, 123, 125, 149n4
Orgun, Cemil Orhan, 175
Otsu, Yukio, 5, 11f, 44n2, 45n3
Ōtsubo, Heiji, 50, 51
Parker, Charles K., 32, 33
Pater, Joseph V., 34, 45n6
Pater, Joseph V. and Adam Werle,
Pierrehumbert, Janet B. and Mary E.
Beckman, 103n6
Piggott, Glyne L., 25
Ploch, Stefan, 77, 86n4, 87n8
Polivanov, Yevgeny D., 248
Port, Robert F., Jonathan M. Dalby and
Michael L. O'Dell, 232
Poser, William J., 31, 32, 42, 278n9
Prince, Alan S., 131
Prince, Alan S. and Paul Smolensky,
121n8, 123, 125, 131, 262, 265,
277n1, 278n11

Pulleyblank, Douglas G., 144ff, 151n19

Reinhart, Tanya M., 23n4
Rice, Keren D., 3, 25ff, 26, 28, 29ff,
36ff, 40, 41, 75, 121n3, 131,
Rice, Keren D. and J. Peter Avery, 25
Rodriguez, João, 60, 66n40, 67n43,
67n44, 68n47
Sakuma, Kanae, 216, 221, 248
Sakurai, Shigeharu, 103n17, 230, 240
Sanada, Shinji, 186f
Sato, Hirokazu, 12, 22, 157
Sato, Ryoichi, 150n7
Sato, Yumiko, 232
Schane, Sanford A., 74
Shibata, Takeshi, 149n4
Shibatani, Masayoshi, 101n7, 151n12
Shikano, Kiyohiro, Katsuteru Ito,
Tatsuya Kawahara, Kazuya Takeda
and Mikio Yamamoto, 194
Shimizu, Katsumasa, 73, 80
Shinohara, Shigeko, 239
Siegel, Dorothy C., 174
Smolensky, Paul, 121n6
Steriade, Donca, 34
Strik, Helmer and Catia Cucchiarini,
198, 204
Sugito, Miyoko, 4, 8ff, 23n2, 23n3,
157ff, 247ff, 248, 249, 259, 261
Sugito, Miyoko and Hajime Hirose,
241, 247, 249
Susman, Amelia L., 267
Suzuki, Keiichiro, 2, 191ff
Suzuki, Takao, 121n7
Takagi, Ichinosuke, Tomohide Gomi,
and Susumu Ōno, 54f, 59
Takayama, Michiaki, 60f, 68n50, 69n56,
123, 142, 147, 148, 152n22
Takayama, Tomoaki, 3, 63, 110, 120n2,
121n3, 123, 140, 151n10, 151n14

Takeda, Kazuya and Hisao Kuwabara,
205, 206, 223, 230
Tamamura, Fumio, 2
Tanaka, Makirō, 66n32
Tanaka, Shin-ichi, 4, 157, 175n8, 261ff,
268, 269, 270, 273, 274, 277n1,
278n9, 278n11
Tanaka, Shinichi and Haruo
Kubozono, 23n5
Tateishi, Koichi, 106f, 109ff, 119,
120n2, 121n4, 121n7, 121n8
Tōjō, Misao, 149n1
Tsujimura, Natsuko, 81
Tsukishima, Hiroshi, 64n10
Tsuru, Hisashi, 49, 51, 64n8, 64n11,
64n12, 65n19
Unger, J. Marshall, 90, 123, 129
Uwano, Zendo, Masao Aizawa, Kazuo
Kato, and Motoei Sawaki, 123
Vance, Timothy J., 2, 25, 26, 27, 29ff,
35, 40, 42f, 44n1, 44n2, 45n5, 45n9,
45n10, 59, 69n52, 81, 89ff, 90, 91, 93,
94f, 100n2, 101n6, 101n8, 102n12,
103n17, 103n22, 123, 129, 150n9,
189n5, 240
Varden, J. Kevin, 45n10


Venditti, Jennifer and Jan P. H. van
Santen, 210
Wada, Minoru, 247
Wenck, Günther, 140
Weijer, Jeroen M. van de, 150n8
Wheeler, Max W., 229
Yamada, Eiji, 261
Yamaguchi, Yoshinori, 189n10
Yamane-Tanaka, Noriko, 3, 69n57,
123ff, 135, 149n2
Yamane, Noriko and Shin-ichi Tanaka,
Yanagita, Kunio, 123, 128, 129, 151n12
Yip, Moira J. W., 77, 79
Yokotani, Teruo, 261
Yoshida, Natsuya, 223
Yoshida, Natsuya and Yoshinori
Sagisaka, 206, 223, 230
Yoshida, Shohei, 75, 86n1
Yoshida, Yuko Z., 85
Yoshioka, Hirohide, 238
Young, Steve J., Joop Jansen, Julian J.
Odell, Dave Ollason and Phil C.
Woodland, 207
Zamma, Hideki, 4, 157ff, 173

Index of languages

Bantu languages, 84, 85
Burmese, 73
Campa (Arawak), 80
Catalan, 79
Chinese, 3, 14, 42, 48ff, 180, 188,
  189n1, 264
  Classical, 180
  Japanized, 64n6
  Mandarin, 63n4
Dutch, 185
English, 14, 39ff, 44n1, 59, 72f, 78, 79,
  80, 83, 84, 92, 93, 109, 114, 121n7,
  179, 185, 264, 267f, 277n5
Ewe, 14
Finnish, 73, 78, 84
French, 72f
German, northern, 79
Germanic languages, 72
Greek, 39
Gujarati, 78
Hindi, 78
Indonesian languages, 85
Italian, 14
Japanese
  Aomori, 124
  common, 207
  eastern, 69n54, 229
  Kansai, 247f
  Kanto, 128
  Kinki, 21, 128
  Kochi, 139
  Kyoto, 21, 128, 247
  Kyushu, 61
  literary, 42
  Modern, 5, 9, 22, 42, 59, 60, 89,
    129ff, 179, 184
  Middle, 129ff, 148
  Nara, 90, 128
  Old, 9, 90ff, 123ff, 129ff
  Osaka, 21, 175, 247ff
  pre-old, 90, 102n10
  Shikoku, 128
  Standard, 207, 209, 229
  Tohoku, 61, 69n57, 84, 123ff
  Tokushima, 147
  Tokyo, 21, 23n2, 24n10, 69n54, 84,
    102, 124, 139f, 150n9, 175, 209, 248
  western, 69n54
Korean, 57, 67n42, 67n43
Latin, 39
Lithuanian, 266
Pirahã, 277n5
Polish, 72, 79
Portuguese, 57, 110, 178, 180
Quichea, 80, 84
Quileute, 84
Reef Island-Santa Cruz languages, 84
Romance languages, 72
Russian, 72, 178f
Serbo-Croatian, 79
Siouan languages, 267
Slavic languages, 72
Spanish, 72f, 78, 80, 84
Swedish, 72f
Thai, 73, 78, 80, 83, 84
Winnebago, 267
Zoque, 80, 84
Zuya-go, 22

Index of subjects

accent and sonority, 262ff

accent and tone, 262ff
accentedness, 157ff, 252
see also voicing and accent
acquisition, 71, 80, 82ff, 86, 119f,
alternations, 45f
aphasia, 71, 80, 82ff, 86
chronological continuum, 128ff
compounds, 29ff
coordinate compounds, 19, 81, 94ff
corpus, 194, 196, 198, 199, 202, 205ff
devoiced accented vowels, 247ff, 262,
270ff, 278n10
see also vowel devoicing
durational reduction, 229ff, 251ff
electromyography, 249, 255, 256, 258
Element Theory, 74ff
faithfulness, 106ff, 130, 137, 274
relativized faithfulness, 106, 115
falling, 77, 240f, 248ff
rising, 77, 258; see also pitch
foreign words (loanwords, borrowings),
3, 21, 38, 39, 40ff, 45n8, 82, 100n1,
105ff, 149n5, 150n7, 177ff
functional load, 59, 121n7

geminates, 17, 37, 149n4, 208, 215, 220,
224, 244
geographical continuum, 126ff
harmonic scales, 130ff, 261ff
implicational relationships, 75, 78, 84f,
86, 123ff, 263ff, 267, 270, 276, 277n1
inflected words, 91ff
intensity, 208, 231, 234ff, 244, 248, 264
laryngeal-source contrasts, 71ff
lexicon, 2f, 4, 26, 35, 37, 38ff, 105ff,
132f, 177, 179, 183, 186, 189n1,
194, 196, 197f, 203
core-periphery structure, 105ff
stratification, 2, 4, 26, 37ff, 48, 177,
see also Sino-Japanese, foreign
words (loanwords, borrowings)
Lyman's Law, 2, 7ff, 22, 25ff, 81, 84,
86n2, 93, 108, 110, 121n11, 160,
278n7, 278n8
manyougana, 48ff
markedness, 9, 10, 15, 78, 105ff, 130,
131, 137, 142ff, 263, 278n12
minimal pairs, 126f, 140, 149n4, 185, 186
mora, 4, 8, 9, 11, 13, 15ff, 39, 95, 96,
103n22, 159ff, 205, 208, 217, 221,
223, 224f, 232ff, 239ff, 247ff, 266,
270, 274, 277n1, 278n7
moraic nasal, 66n35, 81, 95, 100n2,
101n6, 125, 126, 147, 149n4, 175n5,
208, 242
nasalization (of voiced obstruents),
63ff; see also prenasalization
nativization, 108, 180, 186, 187
*NT constraint, 35ff, 107ff, 190n13
numeral-classifier combinations, 23n5,
OCP, 7ff, 27, 113, 115, 160f, 165ff,
172f, 273, 275, 278n10
opacity, 261ff

optionality and directionality (of accent
shift), 261ff
physiological experiment, 238, 247ff
pitch, 4, 13, 59, 85, 102n16, 205, 227,
236, 247ff, 264ff, 271
prenasalization, 3, 37, 81, 84, 85, 90,
123ff, 190n13
prosodic asymmetry, 13
rendaku, 1ff, 5ff, 25ff, 80f, 89ff, 108,
110, 113, 117, 121n3, 121n11,
157ff, 177ff, 269, 278n7
branching constraint on ~, 11ff, 22
mora constraint on ~, 15ff, 22
sei-daku distinction, 47ff
sound values of ~, 54, 60, 61, 66n40
Sino-Japanese, 3, 16ff, 26, 33ff, 105,
107, 108, 149n5, 177ff, 268
vulgarized ~, 184, 187, 188, 190n12
speech rate, 221f, 226, 230, 232, 247ff
speech style, 4, 206f, 226
speech recognition, 191ff, 207
Large Vocabulary Continuous
Speech Recognition (LVCSR), 191ff
Automatic Speech Recognition,
specific language impairment, 5f

spontaneous speech, 205ff, 230

Sugito's Law, 8ff, 23n2, 23n3, 158ff
syllable structure, 4, 121n8, 215, 229ff
sympathy (in OT), 263ff
syncope, vowel ~, 90
UNIFORMITY constraint, 115ff
universals, 75, 78, 86
voicing, postnasal, 3, 25ff, 76, 80f, 84,
110, 113, 121n7, 142, 147, 160, 174,
175n5, 180, 184
voice contrasts, 47ff, 76, 109f, 123,
140ff, 147
voicing and accent, 4, 33, 52, 59,
69n53, 157ff, 247ff, 261ff
voicing and nasality, 47, 57ff,
66n40, 67n43, 67n44, 69n57, 77,
82, 83ff, 86, 86n4, 87n6, 123ff
vowel devoicing, 1, 4, 71, 76, 79, 81, 82,
205ff, 229ff, 261, 262, 268, 271, 273
atypical environments, 217, 225ff
consecutive devoicing, 4, 215ff,
224ff, 230f, 234ff, 241ff, 244, 268
manner interaction, 215, 219, 223ff
vowel weakening, 231f, 234, 237, 244
word frequency, 218f
writing system, 39, 42f, 45n9, 47ff
