Double consonants in English: Graphemic, morphological, prosodic and etymological determinants

Kristian Berg

Carl-von-Ossietzky Universität Oldenburg


1. Overview

Consonant doubling in English shows regularities on many levels. Some regularities are graphotactic1, some are

morphological, some are prosodic, and some seem to be related to etymology. At first glance, the main

regularity seems to be phonographic2: There is a close correlation between double consonant letters and short

(or ‘lax’) vowel phonemes3. The main problem that arises in the description is that this regularity holds only in

one direction: Almost all double consonants indicate that the preceding vowel has a short correspondence (e.g.

<hammer>4, <getting>), but the opposite implication is not valid: Far from all short vowels are marked with

double consonants (cf. e.g. <limit>, <divinity>). For the writer, this poses a major problem: Exactly when are

short vowels marked by double consonants and when not? The reader is affected as well, albeit indirectly. Since

some short vowels are not marked by consonant doubling (e.g. <lemon>), the resulting words are isomorphic to

words with long vowels (e.g. <demon>). Exactly when does a single vowel letter followed by a single

consonant letter correspond to a long vowel, and when does it correspond to a short vowel? These questions are

essential for learners of the English writing system.

1 Graphotactic regularities are the written analog to phonotactic regularities in spoken language. They pertain to the written
side of words alone and capture the combinatorial principles of letters and graphemes. For example, the fact that <x> is,
unlike most other letters, never doubled is an instance of a graphotactic regularity. It cannot be explained by reference to any
other linguistic level.

2 Phonographic regularities describe the relation between units of writing and units of sound. The standard case is that of
phoneme-grapheme correspondences, i.e. the relation between phonemes and graphemes. The grapheme <m>, for example, regularly
corresponds to the phoneme /m/.

3 In this paper, I will use British English as a point of reference. Following Jones et al. (18th ed. 2011), the vowels [ɪ], [æ], [ʊ], [ɒ],
[ʌ] and [e] will be referred to as short. The debate around short vs. long (e.g. Jones et al. 18th ed. 2011), tense vs. lax (e.g. Halle &
Mohanan 1985), free vs. checked (e.g. Kurath 1964) vowels, how to define these categories, and which category is the most
suitable phonologically or phonographically, will be ignored here. As Cummings (1988: 53f.) demonstrates, at least the tense
vs. lax contrast is rooted in spelling. This highlights the crucial importance of keeping the representational levels apart
analytically: If lax vowels are defined phonographically, i.e. as those phonemes that correspond to written vowels in
graphemically closed syllables (as e.g. <bit>, <dinner>, <pudding>), and we use this category to describe phonographic
correspondences (e.g. as in ‘vowel letters before double consonants correspond to lax vowel phonemes’), we end up with a
perfectly tautological statement.

4 Written and spoken words and segments are distinguished as follows: Written words and parts of words are presented in
angled brackets (e.g. <word>); spoken words and parts of words are presented in square brackets (e.g. [wəːd]). If the medial
realization is irrelevant, the word or part of word is italicized (e.g. word).

This paper sets out to answer them, both from the reading and from the writing perspective, using

correspondences between graphemic and phonological forms that are established on the basis of the large lexical

database CELEX (Baayen et al. 1995). The first insight into the matter is that morphologically complex words

and morphologically simple words behave differently. Double consonants in morphologically complex words

can be motivated with reference to graphemic word-formation rules: Under certain well-defined conditions, the

stem-final consonant of a base is doubled when a vowel-initial suffix operates on that base (<bet> – <betting>).

As will be argued below, these conditions may be framed in purely graphemic terms without reference to

phonology. Morphologically simple words, on the other hand, are different in that the occurrence (or non-

occurrence) of double consonants hinges on the word ending. In this paper, the term word ending denotes

recurring word-final entities without meaning, but with distributional properties (see section 4, Fn. 10 for a more

thorough definition). For example, words which end with <-it> are very likely to occur with single intervocalic

consonant letters (<limit>, <profit>, <spirit>), while words which end with <-ow> are likely to occur after

double consonants (e.g. <sorrow>, <willow>). Perhaps surprisingly, both kinds of word endings sometimes

have the same phonological correspondence: Words that end with <-ic> almost never double the preceding

consonant letter (e.g. <panic>), while words that end with <-ick> almost always do (e.g. <derrick>). Graphotactic

constraints operate across both morphologically simple and complex words. For example, not all single

consonant letters can be doubled. These three areas will be covered in the following, starting with graphotactics

(section 2), moving on to morphologically complex words (section 3) and finally to morphologically simple

words (section 4). The last part (section 5) is a summary and discussion of the main findings.

2. Graphotactic constraints

The first observations come from a purely graphemic perspective. Generalizing over the set of all graphemic

words in the lexical database CELEX (Baayen et al. 1995), we get (1) and (2):

(1) Only the letters <b, d, f, g, k, l, m, n, p, r, s, t, v, z> are doubled.

Of these, double <k>, <v> and <z> are comparatively rare (for <v> and <z> cf. Venezky 1999: 6).5

(2) Double consonants occur after a single vowel letter and before one or more vowel letters. There are

three general exceptions to this statement:

(2a) Exception 1: Consonant doubling can occur preceding or following a consonant letter in compounds

(e.g. <headdress>, <granddaughter>) or words with Latinate prefixes (e.g. <apply>, <suppress>).

5 Additionally, <ck>, <dg>, and <tch> have a distribution very similar to that of double consonants; Venezky
(1999: 14 et passim) calls them ‘pseudogeminates’. Synchronically, <ck> can be argued to be the doubled
variant of <c>, cf. e.g. <picnic> - <picnicked> - *<picnicced>.

Double consonants in this position could be utilized to determine the morpheme boundaries: The fact that the

probability of <ndd> occurring morpheme-internally is low is a viable cue for a morpheme boundary. The

number of Latinate prefix words with double consonant followed by a consonant is quite low (<acclaim>,

<accredit>, <accrete>, <accrue>, <address>, <attractive>, <attribute>, <attrition>, <applaud>, <apply>,

<appraise>, <appreciate>, <apprehend>, <apprentice>, <appropriate>, <approximate>, <approve>,

<ecclesiastic>, <oppress>, <supply>, <suppress> – this list is almost complete).

(2b) Exception 2: Consonant doubling regularly occurs before -le

The word ending <-le> is related to <-el>; as suffixes (e.g. sparkle, as opposed to non-functional word-endings

as in bottle), they are allomorphs6. <-le> is the only formative that shows this “inverted” behavior (Carney 1994:

124f., 277). Other suffixes that correspond to syllabic consonants are spelled differently; the spellings *<bakre>

or *<smittne> are impossible in English.

(2c) Exception 3: Consonant doubling occurs word-finally

Here we have to distinguish two different cases:

(3) a. <ebb>, <odd>, <inn>, <egg>

b. <block>, <trick>, <stiff>, <gaff>, <mass>, <fuss>, <bell>, <hill>; <staff>, <class>, <call>

The words in (3a) can be accounted for by a constraint demanding that lexical words in English have at least

three letters (cf. Jespersen [1909] 1928: §4.96). This is not specific to final double consonants; it also applies to

words like <bee>, <toe>, <rye> etc. Thus the double consonants in (3a) are independently motivated.

The words in (3b) show the regularly occurring word-final double consonants, <ck>, <ff>, <ss>, and <ll> (cf.

also Cummings 1985: 76f.).

There are two respects in which this group (resp. a sub-group) differs from the other consonants.

1.: In the context where <ck>, <ff>, <ss> and <ll> occur - directly after a single vowel letter - none of the

single consonant letters <c> (or <k>), <f>, <s> or <l> can occur (4a). This sets the group apart from the other

consonant letters (4b); there are only a few exceptions to this (most notably the suffix <-ic>/*<-ick>):

(4) a. *<bac>/*<bak>, *<rif>, *<kis>, *<hil>

b. <rib>, <cod>, <peg>, <jam>, <pen>, <pop>, <car>, <hit>

6 The condition for their distribution seems to be the stem-final phoneme: <-el> appears “after v, th, ch, n, as in hovel,
brothel, hatchel, kernel” (OED).

Likewise, Brame (1983b) points to the fact that the letter names for <f>, <s>, and <l> are <eff>, <ess>, and

<ell> respectively, as opposed to <em>/*<emm> and <en>/*<enn> for example. There is thus no opposition in

this context. This does not explain word-final consonant doubling for <c>/<k>, <f>, <s>, and <l> – but it makes

it look a little less haphazard.

2.: Moreover, the vowel letters preceding word final double consonants in (3b) do not always correspond to short

vowel phonemes; this holds for <staff>, <class>, <call> and similar words (cf. Carney 1994: 124f.)7. This again

sets the group <f>, <s>, <l> apart from other consonant letters.

These are the graphotactic constraints on consonant doubling in English: Not all consonant letters occur as

geminates. Double consonants occur after a single vowel letter and before one or more vowel letters. There are

some well-defined exceptions to these constraints. They hold across both morphologically simple and

morphologically complex words.
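The constraints in (2) and their exceptions can be illustrated with a small sketch. It is not part of the original analysis: the function name `classify_doubles` and the context labels are hypothetical, <y> is assumed not to count as a vowel letter, and only uninflected word forms are handled.

```python
import re

VOWEL_LETTERS = set("aeiou")  # assumption: <y> is treated as a consonant letter


def classify_doubles(word):
    """Return (double consonant, context label) for every doubled consonant letter."""
    results = []
    # Any consonant letter immediately followed by itself.
    for m in re.finditer(r"([b-df-hj-np-tv-z])\1", word):
        before, after = word[:m.start()], word[m.end():]
        # Exactly one vowel letter directly before the double consonant.
        single_vowel_before = (
            len(before) >= 1
            and before[-1] in VOWEL_LETTERS
            and (len(before) == 1 or before[-2] not in VOWEL_LETTERS)
        )
        if after == "":
            context = "word-final (2c)"
        elif after == "le":
            context = "before <-le> (2b)"
        elif single_vowel_before and after[0] in VOWEL_LETTERS:
            context = "regular (2)"
        else:
            context = "other (e.g. 2a)"
        results.append((m.group(0), context))
    return results


for w in ["hammer", "bottle", "hill", "apply", "headdress"]:
    print(w, classify_doubles(w))
```

On this toy input, <hammer> falls under the general statement (2), <bottle> and <hill> under exceptions (2b) and (2c), and the doubles in <apply> and <headdress> (the <dd>) are left for exception (2a).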

3. The core: morphologically complex words

Consonant doubling regularly occurs at morpheme boundaries. The following requirements hold (cf. e.g.

Cummings 1985: 161ff.):

(5) Consonant doubling occurs before a suffix with an initial vowel letter if a) there is a single base-final

consonant letter from the set in (1) before the boundary and b) this consonant letter is preceded by a

single vowel letter.

This regularity is purely graphemic. It captures almost all double consonants in inflectional products (like

<swimming>, <pinned>, <hotter>) and most of the derivational ones (like <baggage>, <potter>, <fattish>). As

noted above, the statement refers to the graphemic structure of the base. This explains spellings like the

following, which would be hard to motivate from a phonographic perspective:

(6) a. <come> – <coming>

b. <look> – <looking>

c. <stem> – <stemmed>

7 This only holds for British English; in American English, the vowels in staff and class are short.

In (6a), final <e> is dropped because the suffix begins with a vowel (see below). This leads to a potentially

misleading correspondence for <o> (cf. <homing>, <zoning>), but consonant doubling does not occur because

condition a) of (5) is not met (in the base word, the <m> is not graphotactically final). Likewise, in (6b) the

consonant is not doubled because there is more than one preceding vowel letter (<oo>). In (6c), although the

<e> in the suffix is silent, it still triggers doubling of the preceding consonant letter because -ed is a suffix with

an initial vowel letter.
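Because statement (5) refers only to letters, it can be written down as a short procedure. The following is a minimal sketch under stated assumptions, not the author's implementation: the function name `attach` is hypothetical, <y> is assumed to count as a consonant letter, the <e>-deletion branch is a simplification, and the sketch is meant for monosyllabic bases only (polysyllabic bases need the further conditions discussed in section 3.1).

```python
DOUBLABLE = set("bdfgklmnprstvz")  # the letter set in (1)
VOWEL_LETTERS = set("aeiou")       # assumption: <y> counts as a consonant letter


def attach(base, suffix):
    """Attach a suffix to a monosyllabic base using graphemic information only."""
    if suffix[0] not in VOWEL_LETTERS:
        return base + suffix  # consonant-initial suffix: no change to the base
    last = base[-1]
    # Condition b) of (5): exactly one vowel letter before the final consonant.
    single_vowel_before = (
        len(base) >= 2
        and base[-2] in VOWEL_LETTERS
        and (len(base) == 2 or base[-3] not in VOWEL_LETTERS)
    )
    if last in DOUBLABLE and single_vowel_before:
        return base + last + suffix  # doubling as in <bet> + <ing>
    if last == "e" and len(base) >= 2 and base[-2] not in VOWEL_LETTERS:
        return base[:-1] + suffix    # <e>-deletion as in <come> + <ing>
    return base + suffix


for base, suffix in [("bet", "ing"), ("come", "ing"), ("look", "ing"), ("stem", "ed")]:
    print(base, "+", suffix, "->", attach(base, suffix))
```

The sketch never consults pronunciation: <stemmed> receives its double consonant although the <e> of the suffix is silent, and <coming> stays single although the vowel is short.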

What is the reason for this reference to base forms, which leads to deviations from (more or less) regular

phonographic correspondences? The systematic reason for consonant doubling can be found in <e>-deletion

before suffixes with initial vowel letters like <-ing>, <-er>, <-ance> etc. For example, Carney (1994: 129) and

Palmer et al. (2002: 1577) observe that while in monomorphemic words the differences in the corresponding

vowel qualities are expressed by the presence or absence of final <e> (e.g. <hope> – <hop>), in morphologically

complex forms (with vowel-letter-initial suffixes), the same difference is expressed by the presence or absence

of a double consonant (e.g. <hoping> – <hopping>). Carney (1994: 129) adds that a constant marking of long or

diphthongic correspondences by silent <e> (even in morphologically complex words) would lead to spellings

with vowel letter clusters <ei>, <ee> like *<takeing> and *<takeer>. Carney supposes that this spelling is not an

option because it would interfere with the correspondences these clusters have in monomorphemic words (cf.

<heir>, <peel>).

While that is certainly true, it is possible to go further. The second syllable in *<shakeer>, for example, could

potentially be interpreted as stressed and long (cf. <career>, <veneer>, <engineer>). The framework of Evertz &

Primus (2013) offers a suitable explanation for this: graphemic syllables with complex vowel letters are ‘heavy’,

and they regularly correspond to stressed phonological syllables.

The need to mark short vowels in morphologically complex forms thus follows from the impossibility of retaining

final <e> before vowel-letter-initial suffixes. If the argument sketched above is correct, we would expect final

<e> and consonant doubling in essentially the same positions. For final <e>, there are two major constraints: It

does not occur after complex vowel letters (7a), and it rarely occurs after consonant letter clusters (7b) (cf.

Venezky 1999: 48):

(7) a. <bake>, *<baike>; <note>, *<noute>

b. <face>, *<fasse>; <hole>, *<holde>

As noted above in (5), consonant doubling at morpheme boundaries follows exactly these two constraints: there

is no doubling after a complex vowel letter (8a) or after another consonant letter (8b).

(8) a. <looked>, *<loocked>; <headed>, *<headded>

b. <casting>, *<castting>; <hanging>, *<hangging>

Consonant doubling is thus subject to the same constraints as final <e> is. This is further evidence for the close

relation of both markers. Moreover, when words with final <e> and consonant-initial suffixes are combined, the

<e> is not dropped (cf. <state> – <stately>, *<statly>), and accordingly stem-final consonants are not doubled

either (cf. <bad> – <badly>, *<baddly>). Consonant doubling thus seems like a backup solution: Whenever

final <e> is not available in a morphological process, consonant doubling helps out.

Summing up, double consonants are motivated by the need to mark vowel quality in morphologically complex

words with vowel-initial suffixes. This need in turn arises because final <e> cannot be employed for reasons of

graphemic syllable weight. As a consequence, morphologically complex words are formed very regularly on the

graphemic form of the respective base. This may lead to deviating correspondences in the morphologically

complex form. In the following, I will examine both inflectional (3.1) and derivational (3.2) cases in more detail.

3.1. Inflection

If an inflectional suffix (except -s) operates on a monosyllabic base, statement (5) is exceptionless. This leads to

forms like the following:

(9) begged, banned8, fatter, starred, thinnest

8 Note that in most cases involving -ed, the suffix is phonologically a single consonant (/t/ or /d/), and the inflected form has
the same number of syllables as the base. So phonographically, <band> would be a good spelling for /bænd/. Yet the suffix
is graphemically vowel-initial, and thus the requirement in (5) holds, which leads to consonant doubling. This is further
evidence for the graphemic nature of consonant doubling. It also serves to give the suffix a distinct (and almost unique)
graphemic form (cf. Berg et al. 2014).

The reference to the graphemic form of the base also serves to explain why some spellings are excluded for

irregular past tense forms like kept and slept: Phonographically, *<kepped> and *<slepped> should be possible

(cf. e.g. <prepped>), but the fact that there is no base *<kep> or *<slep> prevents these spellings.

For polysyllabic bases, there are numerous exceptions, however:

(10) visiting/*visitting, authored/*authorred

There are two ways to deal with these exceptions. The data in (10) can be straightforwardly explained by

hypothesis (11), which I will call the prosodic hypothesis and which is an additional requirement to (5) above:

(11) Consonant doubling occurs if the base-final syllable receives phonological stress.

The bases in (10) and many others are trochees with unstressed ultimates, and accordingly the base-final

consonants should not be doubled. On the other hand, for iambic bases – i.e. bisyllabic bases with an unstressed

syllable followed by a stressed syllable – (11) correctly predicts consonant doubling:

(12) <inferred>, <beginning>, <remitted>, <occurring>

Cases like <panicking> and <picnicking> are prima facie counterexamples – the consonant is doubled even

though the base-final syllable is unstressed. But as e.g. Venezky (1999: 83) and Carney (1994: 223) have pointed

out, this is a very special case: Without consonant doubling, the stem-final <c> in *<picnicing> would be

wrongly assumed to correspond to [s]. Thus, <ck> can be motivated here by the pursuit of phonographic


However, compounds (and pseudo-compounds) are not covered by the prosodic hypothesis:

(13) <sandbagging>/*<sandbaging>, <babysitting>/*<babysiting>, <airdropping>/*<airdroping>

The words in (13) should not have a double consonant according to the prosodic hypothesis because the base-

final syllable is unstressed. One could advocate that in these cases the stem-final syllables bear secondary stress

(cf. Cummings 1988: 165). But it is far from clear that the second syllables in sandbag, babysit and airdrop are

intonationally categorically different from those in visit and author. The difference might just lie in the full vs.

reduced vowel quality (Bolinger 1986: 351 N.3; 1989: 215f.; Fudge 1984: 31)9. The intonational difference

between e.g. outfit vs. trumpet seems to be a matter of degree and personal preference.

9 As Bolinger (1986: 351 N.3) puts it: “Normally, syllables after the stress behave intonationally the same regardless of
whether they are full or reduced. The pitch contour is the same in both máypole and máple […]”.

An alternative way to capture the data in (13) is (14), which I will call the morphological hypothesis:

(14) Consonant doubling occurs if the base-final graphemic syllable is a monosyllabic root.

This easily captures cases like sandbag, babysit, airdrop. Cases like <visiting>/*<visitting> and

<authored>/*<authorred> are also covered by the morphological hypothesis (no base-final monosyllabic root →

no consonant doubling), as well as cases like <inferred>, <beginning>, <remitted>, and <occurring> (base-final

monosyllabic root → consonant doubling). But the morphological hypothesis also explains cases with a varying

degree of transparency from words like kidnap, bootleg, handicap to completely opaque forms like humbug,

zigzag, hobnob (which all occur dominantly with doubled consonants), and variation in cases like

<worshipping>/<worshiping>, <combatting>/<combating>, <formatting>/<formating>. Variation hinges on the

morphological analysis: If a writer analyzes -ship in worship as a root, the consonant is doubled; if worship is

monomorphemic for her, it remains single. If a writer analyzes com- and for- in combat and format as

prefixes and thus -bat and -mat as roots, the consonants are doubled; otherwise, they remain single consonants.
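The difference between the two hypotheses can be made concrete on the bases discussed in (10), (12) and (13). The sketch below is illustrative only: the class `Base` and the function names are hypothetical, and the annotations are hand-coded from the discussion above, with `final_stress` meaning primary stress on the base-final syllable (the secondary-stress analysis is deliberately left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    form: str
    final_stress: bool  # base-final syllable carries primary stress
    final_root: bool    # base ends in a monosyllabic root
    doubled: bool       # the attested inflected spelling doubles the final consonant


# Hand annotation following the text: (10) visit, author; (12) infer, begin;
# (13) sandbag, babysit, airdrop.
BASES = [
    Base("visit", False, False, False),
    Base("author", False, False, False),
    Base("infer", True, True, True),
    Base("begin", True, True, True),
    Base("sandbag", False, True, True),
    Base("babysit", False, True, True),
    Base("airdrop", False, True, True),
]


def prosodic(base):
    """Hypothesis (11): doubling iff the base-final syllable is stressed."""
    return base.final_stress


def morphological(base):
    """Hypothesis (14): doubling iff the base ends in a monosyllabic root."""
    return base.final_root


def mispredicted(hypothesis):
    """List the bases whose attested spelling the hypothesis gets wrong."""
    return [b.form for b in BASES if hypothesis(b) != b.doubled]


print("prosodic:", mispredicted(prosodic))
print("morphological:", mispredicted(morphological))
```

On this small sample, the prosodic hypothesis misses exactly the compounds in (13), while the morphological hypothesis covers all seven bases.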

The prosodic hypothesis has problems with these data – at least in its prosodic formulation with reference to

secondary stress. A possible reformulation could be in terms of vowel quality: after all, this is a respect in which

compounds like sandbag, bootleg and kidnap differ from polysyllabic roots like summer, orbit or blanket.

Ultimately, however, this difference could also be argued to be morphological. Only vowels in affixes and in

non-initial syllables of polysyllabic roots are reduced, so the full vowel in the final syllable in sandbag, babysit,

and airdrop indicates some sort of lexical content. Both phonology (in the form of vowel quality) and graphemics

(in the form of consonant doubling/non-doubling) thus operate on similar kinds of morphological information.

But why are consonants doubled only in monosyllabic roots? This may have to do with the distribution of <e>-

deletion and the functional need to mark vowel quality. There are 121 pairs of phonologically monosyllabic

words which only differ in the presence or absence of final <e> (e.g. <bid> – <bide>, <mop> – <mope>, <stag>

– <stage>). Consonant doubling is necessary in these cases to prevent systematic homography of

morphologically complex forms (e.g. <biding>, <moping>, <staged>). For phonologically bisyllabic words,

there are only 12 such pairs, and almost all involve semantically closely related words (e.g. <artist> – <artiste>,

<ballad> – <ballade>, <human> – <humane>; but also <unit> – <unite>). So the functional motivation to

discern word pairs is far greater for monosyllabic roots.
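The pair counts above come from CELEX; the procedure behind such a count can be sketched on a toy word list. The function names `final_e_pairs` and `naive_ing` and the word list are hypothetical, and the sketch works on spellings only, so it does not check the syllable counts mentioned in the text.

```python
def final_e_pairs(words):
    """Find pairs of spellings that differ only in the presence of final <e>."""
    vocab = set(words)
    return sorted((w, w + "e") for w in vocab if w + "e" in vocab)


def naive_ing(word):
    """Add <-ing> with <e>-deletion but WITHOUT consonant doubling."""
    stem = word[:-1] if word.endswith("e") else word
    return stem + "ing"


toy_lexicon = ["bid", "bide", "mop", "mope", "stag", "stage",
               "unit", "unite", "limit", "hammer"]

for short, long_ in final_e_pairs(toy_lexicon):
    # Without doubling, both members of a pair receive the same <-ing> form.
    print(short, long_, naive_ing(short), naive_ing(short) == naive_ing(long_))
```

Every pair found this way collapses into a single spelling (<biding>, <moping>) once doubling is switched off, which is the homography that consonant doubling prevents.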

Generally, dictionaries of American English tend to indicate post-primary secondary stress in compounds (e.g. Webster’s
Third), while dictionaries of British English do not (e.g. OED).
One final (minor) sub-regularity: In British English, <l> is doubled even in cases where it is unstressed/not a

monosyllabic root (15a); in American English, it is not (15b) (cf. e.g. Carney 1994: 251).

(15) a. <cancelling>, <marvelled>, <rivalled>

b. <canceling>, <marveled>, <rivaled>

In British English, one consonant thus behaves oddly; American English is more coherent in this respect (but cf.

Brame 1983a,b).

3.2. Derivation

Products of derivational word formation rules also comply with the constraint (5) and the morphological

hypothesis. This holds for the doubling (16a) or non-doubling (16b) of consonant letters in the following examples:


(16) a. <biddable>, <cribbage>, <rebuttal>, <admittance>, <flippant>, <summation>, <gladden>

b. <personage>, <visitant>, <angelic>, <feverish>, <canonize>

In the cases in (16a) and similar cases, the stem ends with a monosyllabic root, and the final consonant letter is

doubled. The cases in (16b), on the other hand, involve polysyllabic roots and show single stem-final consonant letters.


There are only a handful of exceptions in CELEX. A number of words occur with unexpected <l>-doubling

(e.g. <crystallize>, <bimetallism>, <panellist>, <marvellous>). As noted above for inflected words with <ll>,

this seems to be a case of diatopic variation, with British English favouring the <ll>-forms and American English

favouring the <l>-forms. The other exceptions involve unexpected single consonants (17a) or unexpected

doubling (17b):

(17) a. <parity>, <gasify>, <scarify>

b. <clarinettist>, <carburettor>

According to both the prosodic and the morphological hypothesis, we would expect consonant doubling in (17a)

if the words are indeed formed on the bases par, gas, and scar. For gasify, the form <gassify> was historically a

variant as the OED notes, attested from the 18th century onwards; it would be interesting to see whether this

spelling still occurs (there are no instances of <gassify> in CoCA, though). Scarify nicely captures the whole

point of consonant doubling: It can be formed on two bases, scar and scare, and both meanings are attested, both

with their own unique phonological form – but without consonant doubling, both senses and pronunciations


The two words with unexpected double consonants in (17b) are British spellings, as a comparison between

CoCA and BNC shows: The <tt>-forms are the dominant ones in the BNC, while the <t>-forms are the dominant

ones (in fact the only attested) in the CoCA.

The number of words with double consonants varies greatly between suffixes. For example, there are 212 words

with consonant doubling followed by <er> in CELEX (e.g. <beginner>, <cropper>, <nagger>), but only four

words with consonant doubling followed by <ance> (<admittance>, <quittance>, <remittance>, <riddance>). At

first glance, this appears to be a feature of suffixes – -er occurs with double consonants, -ance only marginally

does so. But the reason for this may very well be morphological: If the suffix operates on monosyllabic bases,

the amount of consonant doubling is higher; if it operates only on polysyllabic bases, the amount is smaller.

Finally, independent of whether the prosodic or the morphological hypothesis is preferred, there seems to be a

constraint relating all double consonants in derived words to stressed syllables before the corresponding

consonant phonemes. This usually holds in cases like (18a), but not in those in (18b, cf. also Cummings 1988:


(18) a. <biddable>, <baggage>, <acquittance>

b. *<inferrence>, *<referrence>, *<preferrable>

The forms in (18b) should be preferred on the grounds of both the prosodic and the morphological hypothesis:

the stem has final stress, and the stem contains a final monosyllabic root (fer). Yet it is unclear how systematic

these spellings are. They are certainly exceptions within the set of -ence and -able formations. The reason for the

spellings in (18b) may be thought to be prosodic: After all, the double consonants wrongly indicate word stress

on the second phonological syllable. But this explanation does not hold for a number of other words: In e.g.

<muggee>, <floggee>, <plannee>, <allottee>, <chattee> and <submittee>, the ult is stressed, not the penult (as

the double consonants indicate). For the time being, I will treat the spellings in (18b) as idiosyncratic.

4. The periphery: Monomorphemic words

In monomorphemic words, consonant doubling is much less regular. The first class of words are of Latin origin.

They are not strictly monomorphemic, but they are not unequivocally morphologically complex either; they are

composed of a Latin prefix and a Latin stem, e.g. those in (19):

(19) collect, command, correct, connect, illegal, immersion, innate, comment

In these cases, the vowel letter before the double consonant corresponds to a short vowel phoneme, but it is

mostly not stressed (there are exceptions like comment, however). To capture this distribution, one can list

prefixes that often correlate with consonant doubling. The following list is from Rollings (2004: 83):

(20) ad-, con-, dis-, in-, ob-, sub-

Additionally, assimilations in the original Latin words have to be accounted for as well (ad + facere => affect, in

+ mobilis => immobile). The biggest problem for writers, however, is that at least some etymological knowledge

seems to be required to predict these double consonants (cf. e.g. Carney 1994: 119ff., Rollings 2004: 83f.).

Without it, the spellings <atone> (at + one) and <attain> (ad + tangere) are purely idiosyncratic. There may be

certain distributional cues (cf. Carney 1994: 120, Rollings 2004: 84), but on the whole, some knowledge about

word origin seems to be required.

Apart from words with Latin prefixes, the most important observation is that consonant doubling is highly

correlated with the graphemic shape of the word ending10. A review of the pertinent literature leads to the

inventory in (21a) (examples in 21b) for those word endings that follow a single consonant and the inventory

in (21c) (examples in 21d) for those that occur with double consonants (Carney 1994: 116f.; Rollings 2004: 81f.):

(21) a. <-ic>, <-id>, <-it>, <-ish>, <-ace>, <-ous>, <-al>, verbal <-age>, <-ule>

b. <panic>, <fetid>, <limit>, <palace>, <populous>, <metal>, <damage>, <module>

c. <-et>, <-ow>, <-y>, nominal <-age>

d. <bonnet>, <shallow>, <happy>, <cottage>

10 The term word ending is used in the following to describe recurring letter strings. In bisyllabic words, the word ending is
the part of the word starting with the vowel (letter or phoneme) of the second syllable (e.g. <it> in <limit> or <er> in
<hammer>). It is the reverse unit to Taft’s (1979) BOSS: If you subtract the BOSS from a word, the remainder is the word
ending.

This correlation of certain word endings with either single or double consonants is often discussed in terms of

the words’ etymology: For Rollings (2004: 81f.), this behavior is an indicator of whether a word belongs to the

‘native’ or the ‘Latin’ part of the lexicon. The basic insight is that consonant doubling is rarer in words of Latin

origin and more frequent in ‘native’ words, but that it is hard to model this behavior synchronically (cf. also

Carney 1994: 116).

To test the effect of word endings on consonant doubling, three corpus analyses were carried out.

1. The first analysis is a purely graphemic investigation: Which word endings occur with double

consonants (e.g. <er> as in <summer>), which do not (e.g. <it> as in <visit>), and which occur with

vowel letter clusters (<ish> as in <nourish>)?

2. The second analysis takes the speller’s perspective and asks: How are short vowels marked, and how

much is this spelling determined by the word ending?

3. The third analysis takes the reader’s perspective and asks whether words with single intervocalic

consonant letters (e.g. <paper>, <limit>) correspond to words with a short or a long vowel phoneme (or

a diphthong) in the first syllable.

These three analyses will be described in the following. They are all based on bisyllabic words: As Carney

(1994: 123) states, apart from the Latin prefixes mentioned above, monomorphemic words of three or more syllables

hardly contain double consonants (cf. e.g. <elephant>, <bigamy>, <strategy>).

4.1. Graphemic analysis

To determine the relation between word endings, single/double consonants and single vowels/vowel letter

clusters, we use CELEX. More specifically, we use the set of all words in CELEX that meet the following requirements:


• the word is not annotated as morphologically complex in CELEX (remaining morphologically complex formations on free bases are manually filtered)

• the word is graphemically bisyllabic (<limit>), or trisyllabic with single final <e> (<palace>)

• the word contains a single or double consonant letter after the first vowel letter or vowel letter cluster

This leads to a sub-corpus of 2,324 words. On a purely graphemic basis, we determined for each word ending

how many words contain a single vowel letter followed by a single consonant letter (<VC>, as in limit); or a

single vowel letter followed by a double consonant (<VCC>, as in hammer); or a cluster of vowel letters

followed by a single consonant letter (<VVC>, as in eager). The pattern <VVCC>, though logically possible, is

very rare. It occurs only three times in the corpus (<caisson>, <bouffant>, <pierrot>). Apparently, the

constraint found for morphologically complex words extends to morphologically simple words: No consonant

doubling after vowel letter clusters.
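The purely graphemic classification described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run on CELEX: the names `PATTERN`, `classify` and `tally` are hypothetical, <y> is assumed to count as a vowel letter except word-initially, and the input is assumed to be already restricted to monomorphemic, graphemically bisyllabic words.

```python
import re
from collections import Counter, defaultdict

# Onset, first vowel letter(s), intervocalic consonant letter(s), word ending.
# Assumption: <y> is a vowel letter except in word-initial position.
PATTERN = re.compile(r"(y?[^aeiouy]*)([aeiouy]+)([^aeiouy]+)([aeiouy].*)")


def classify(word):
    """Return (pattern, word ending), or None if the word is outside the sample."""
    m = PATTERN.fullmatch(word)
    if m is None:
        return None
    _, vowels, consonants, ending = m.groups()
    doubled = len(consonants) == 2 and consonants[0] == consonants[1]
    if len(consonants) != 1 and not doubled:
        return None  # consonant clusters and digraphs are not considered
    pattern = ("V" if len(vowels) == 1 else "VV") + ("CC" if doubled else "C")
    return pattern, ending


def tally(words):
    """Count the patterns per word ending."""
    table = defaultdict(Counter)
    for w in words:
        result = classify(w)
        if result is not None:
            pattern, ending = result
            table[ending][pattern] += 1
    return table


for w in ["limit", "hammer", "eager", "caisson", "happy"]:
    print(w, classify(w))
```

The word ending returned here is the string that starts with the vowel letter of the second syllable, so <limit> and <visit> are counted together under <-it>.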

The following table summarizes the results. It indicates whether a pattern is systematically attested for a given

word ending, and (if more than one pattern is attested) which pattern is dominant. ‘Systematically attested’ in

this context means that at least 10% of the words with a given ending fall into the respective category (<VC>,

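<VCC>, <VVC>). This is indicated by the symbol ‘✓’. Accordingly, for -er in table 1 below this means that all

three patterns occur with more than 10% of the -er-words. Information about the respective dominant pattern is

included in the next line: ‘80%’ means a dominant pattern occurs in more than 80% of the cases; ‘60%’ means it

occurs in more than 60%, and ‘40%’ means the dominant pattern occurs in more than 40% of the cases. For the

ending <-er>, for example, VCC is dominant with more than 60% of all words with <-er> falling in this

category. Only word endings which occur 15 times or more are listed in the following table.

group   systematically attested (✓)        dominant pattern    word endings

1       ✓ <VC>, ✓ <VCC>, ✓ <VVC>           <VCC>, >60%         -er, -y

2       ✓ <VC>, ✓ <VCC>, ✓ <VVC>           [not recoverable]   -on

3       ✓ <VC>, ✓ <VCC>, ✓ <VVC>           [not recoverable]   -ish, -o, -ure, -or, -an

4       ✓ ✓ (two of the three patterns)    [not recoverable]   -ing, -et

5       ✓ ✓ (two of the three patterns)    [not recoverable]   -in

6       ✓ ✓ (two of the three patterns)    [not recoverable]   -ot, -ar, -ey, -en, -age, -ard

7       ✓ ✓ (two of the three patterns)    [not recoverable]   -ect, -a, -al, -is

8       ✓ <VC>, ✓ one further pattern      <VC>, >80%          -ic, -i, -us, -um, -ile, -ate, -ent, -it

9       ✓ <VC>                             <VC>, >80%          -our, -id

10      ✓ <VCC>                            <VCC>, >80%         -ow, -ock

[Column positions of the check marks in groups 4 to 8 and the dominance line of groups 2 to 7 are not recoverable from this version of the table; the remaining entries follow the number of check marks per row and the running text.]

Table 1: Graphemic patterns (<VC>, <VCC>, <VVC>) associated with different word endings. ✓:

systematically attested (> 10% of words with this ending occur with this pattern). >80%: Dominant pattern >

80%; >60%: dominant pattern > 60%; >40%: dominant pattern > 40%.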

Twelve word endings are correlated strongly (i.e. >80%) with one pattern: groups 8 and 9 (<-ic>, <-i>, <-us>,

<-um>, <-ile>, <-ate>, <-ent>, <-our>, <-it>, <-id>) occur predominantly with <VC> patterns, and group 10

(<-ow>, <-ock>) occurs predominantly with <VCC> patterns. In other words, if a speller knows the word

ending is one of these twelve, the dominant pattern already follows on a graphemic basis. For the great

majority of word endings, however, there is graphemic variation – they occur with both single and double


4.2. The marking of short vowels in spelling

The second analysis takes phonology into account and investigates how short vowels are marked in spelling. Do

words with a single intervocalic consonant phoneme and a short vowel in the first syllable correspond to words

with consonant doubling (e.g. <summer>) or with single intervocalic consonants (e.g. <metal>)? The data base

to answer this question is the set of all words in CELEX that meet the following requirements:

• the word is not annotated as morphologically complex in CELEX (remaining morphologically complex formations on free bases are manually filtered)

• the word is phonologically bisyllabic; the first syllable is stressed

• the word contains a single intervocalic consonant phoneme which corresponds to a consonant letter that can be doubled11

This leads to a set of 1,583 words. Cross-classifying vowel quality (short vs. long/diphthong) over single vs.

double consonants (<C>/<CC>), we get:

11 This excludes the consonant phonemes /ð, θ, ŋ, ʃ, ʒ, v, z/, which all correspond to complex graphemes that have no doubled
equivalent (*<thth>, *<shsh>). Moreover, the following non-doubled complex graphemes are also taken into account: <ck>
(as a doubled variant of <c> or <k>), <dg> (as a doubled variant of <g> if it corresponds to /d͡ʒ/, cf. e.g. Venezky 1999: 14),
and <tch> (as a doubled variant of <ch> if it corresponds to /t͡ʃ/, cf. e.g. Venezky 1999: 14). /z/ can correspond to <z>, which
can be doubled (e.g. <buzz>); however, <zz> is rather marginal and limited to recent borrowings (cf. Venezky 1999: 45).
The doubled variant for /v/ (<vv>) is marginal as well.

                    <C>    <CC>    total
short               323     708    1,031
long + diphthong    549       3      552
total               872     711    1,583

Table 2: The relation between consonant doubling and phonological vowel quality. Data base: trochaic CELEX

entries with one intervocalic consonant phoneme.

The majority of words with short vowel phonemes have a doubled consonant in their corresponding graphemic

form (708 of 1,031, or 69%), as table 2 shows. Words with long or diphthong vowel phonemes almost never

occur with doubled graphemic consonants (3 of 552, <1%). The three words which do occur are the ones

mentioned in the last section, <bouffant>, <caisson>, and <pierrot>, which are all of French origin. So words

with short vowels are often spelled with doubled consonants; long vowels or diphthongs almost never are.

Having a short vowel is thus a necessary condition for consonant doubling.
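
The asymmetry between the two directions can be made explicit with a short calculation over the counts in table 2. The following Python sketch is only an illustration; the variable and function names are hypothetical, and the counts are copied from the table.

```python
# Counts from table 2 (trochaic CELEX entries, one intervocalic consonant).
table2 = {
    "short": {"C": 323, "CC": 708},
    "long_or_diphthong": {"C": 549, "CC": 3},
}


def share(part, whole):
    """Percentage of `part` in `whole`."""
    return 100 * part / whole


short_total = sum(table2["short"].values())
cc_total = table2["short"]["CC"] + table2["long_or_diphthong"]["CC"]

# Spelling direction: how often is a short vowel marked by <CC>?
short_marked = share(table2["short"]["CC"], short_total)

# Reading direction: how often does <CC> go with a short vowel?
cc_means_short = share(table2["short"]["CC"], cc_total)

print(f"short vowels spelled with <CC>: {short_marked:.1f}%")
print(f"<CC> spellings with a short vowel: {cc_means_short:.1f}%")
```

The second figure is close to 100%, while the first is only about two thirds: double consonants reliably signal a short vowel, but short vowels are not reliably marked by double consonants.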

Focusing on the 1,031 words with short vowels, we ask what determines the distribution in table 2. As noted by

Carney (1994), Rollings (2004) and others, the ratio of doubled consonants varies depending on the word ending.

The following table presents single vs. double consonants for word endings which occur at least ten times. 20

spellings with vowel letter clusters were excluded, e.g. treadle, zealot, meadow, flourish; in these cases,

consonant doubling cannot be expected for graphotactic reasons (see above).

word ending   <VC>   <VCC>   total   %<VC>   examples
ock              0      14      14      0%   hammock, haddock
le               2     127     129      2%   shuttle, pickle
er               3      93      96      3%   trigger, hammer
ow               2      34      36      6%   sorrow, mellow
y               12     109     121     10%   carry, city
ey               2      13      15     13%   valley, money
et               9      54      63     14%   socket, planet
a                3      16      19     16%   comma, para
ar               3       8      11     27%   collar, vicar
o                6      14      20     30%   ghetto, demo
in               6      14      20     30%   tiffin, robin
ot               4       6      10     40%   maggot, spigot
age             10       9      19     53%   image, scrimmage
on              13      11      25     54%   melon, gallon
ish             11       2      13     85%   finish, parish
ic              20       3      23     87%   critic, panic
it              14       2      16     88%   spirit, edit
id              16       2      18     89%   rapid, solid

Table 3: Marking of short vowels with single/double consonants according to word ending. All word endings

that occur 10 times or more in the sub-corpus.

Taking 20%/80% as arbitrary thresholds, the word endings in table 3 fall into three groups: those with mostly

doubled consonants (22.a), those with mostly single consonants (22.b) and those in between the two groups (22.c):


(22) a. >80% <VCC>: <-ock>, <-le>, <-er>, <-ow>, <-y>, <-ey>, <-et>, <-a>

b. >80% <VC>: <-id>, <-it>, <-ic>, <-ish>

c. 20-80% <VC>: <-ar>, <-in>, <-ot>, <-age>, <-on>, <-o>
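
The grouping in (22) follows mechanically from the counts in table 3 once the two thresholds are fixed. The Python sketch below reproduces it; the function name group_endings and the group labels are hypothetical, and the percentages are recomputed from the <VC> and <VCC> counts rather than taken from the rounded column of the table.

```python
# (<VC>, <VCC>) counts per word ending, copied from table 3.
table3 = {
    "ock": (0, 14), "le": (2, 127), "er": (3, 93), "ow": (2, 34),
    "y": (12, 109), "ey": (2, 13), "et": (9, 54), "a": (3, 16),
    "ar": (3, 8), "o": (6, 14), "in": (6, 14), "ot": (4, 6),
    "age": (10, 9), "on": (13, 11), "ish": (11, 2), "ic": (20, 3),
    "it": (14, 2), "id": (16, 2),
}


def group_endings(counts, low=20.0, high=80.0):
    """Sort word endings into three groups by their share of <VC> spellings.

    The thresholds are the arbitrary 20%/80% cut-offs used in the text.
    """
    groups = {"mostly_double": [], "mostly_single": [], "in_between": []}
    for ending, (vc, vcc) in counts.items():
        pct_vc = 100 * vc / (vc + vcc)
        if pct_vc < low:
            groups["mostly_double"].append(ending)
        elif pct_vc > high:
            groups["mostly_single"].append(ending)
        else:
            groups["in_between"].append(ending)
    return groups


groups = group_endings(table3)
for name, endings in groups.items():
    print(name, endings)
```

Shifting the thresholds moves endings near the boundaries (e.g. <-a> or <-ish>) into the intermediate group, which is why the cut-offs are best regarded as a descriptive convenience.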

From the purely graphemic overview given above (table 1 above) it follows that the distribution of <-id>, <-it>,

and <-ic> on the one hand and of <-ock> and <-ow> on the other hand is hardly surprising: In all those cases,

there are almost no graphemic alternatives, e.g. almost no graphemic words ending with a double consonant followed by <ic>, or

graphemic words ending with a single consonant followed by <ow>. The other word endings in (22) extend

the list determined purely graphemically, and also the one from the pertinent literature (21 above). Moreover,

what is striking about the word endings in (22) is that all endings that prefer single consonants involve the vowel

letter <i>. The letter <e>, on the other hand, is found in many of the word endings that prefer double

consonants. This is true for the whole corpus as well: Overall, 200 of 232 words with <e> following a single or

double consonant (e.g. <summer>, <bonnet>, <hockey>, <wicked>) have a double consonant (86%). This

resonates well with Evertz & Primus (2013) who attribute a special theoretical status to this structure (which they

call the ‘canonical trochee’ – a bisyllabic graphemic word with <e> in the second syllable).

It follows that there are indeed patterns (in the sense of recurring word endings) which correlate with the

presence or absence of double consonants. Note that the graphemic form of the word ending is the determinant,

not the phonological form. For the phonological word ending [ɨk] for example, there are at least three different

spellings, <ic>, <ick>, and <ock>. While <ic> predominantly occurs with preceding single consonants, <ick>

and <ock> occur exclusively with double consonants. The following table lists this and similar cases:

phonological   graphemic
word ending    word ending   <CC>   <C>   %<CC>   example

ɨk                              7    19     27%
               <ic>             3    20     14%   panic
               <ick>            4     0    100%   derrick
               <ock>           13     0    100%   haddock

               <el>             6     0    100%   barrel
               <al>             0     3      0%   moral
               <yl>             0     2      0%   beryl
               <il>             0     2      0%   peril

               <et>            51     6     89%   cricket
               <ate>            0     6      0%   palate
               <it>             2    14     13%   edit
               <ot>             6     2     75%   maggot

               <er>            93     3     91%   hammer
               <ar>             8     3     73%   collar
               <or>             6     3     67%   horror
               <our>            1     6     14%   honour

               <ow>            33     2     94%   sorrow
               <o>             14     6     70%   motto
               <ot>             0     2      0%   depot
               <eau>            0     3      0%   plateau

               <us>             3     0    100%   cirrus
               <ous>            3     0    100%   callous
               <ace>            2     4     33%   menace
               <is>             4     1     80%   tennis
               <ise>            0     3      0%   promise
               <ice>            3     4     43%   malice

Table 4: Homophonous word endings which differ graphemically, and which show a different amount of

consonant doubling depending on the graphemic form of the word ending.

For the speller, this is an unfortunate situation: To deduce whether or not a consonant is doubled, she must know

which of many possible written forms a phonological word ending has. In this respect, the written forms are

doubly coded. This correlation can be termed graphemic harmony: One choice of graphemic options determines

another choice.

At least partly, graphemic harmony correlates with the words’ etymology: Words of French origin, for example,

tend to have single consonants (e.g. <ic>, <ate>, <our>, <ot> for /əʊ/, <eau>, <ace>, <ise>, <ice>); words of

Germanic origin tend to have double consonants (e.g. <ick>, <ock>, <er>, <ow>). One notable exception is

<et>; the respective words are mostly of French origin, but occur mostly with doubled consonants.

4.3. The reading of single intervocalic consonant letters

The third analysis takes the reader’s perspective. To understand the patterning of consonant doubling and word

endings (table 3 above), it is important to understand the ‘functional load’ for each word ending. For example,

as shown above, words which end with <id> are only rarely spelled with a preceding double consonant. But if

<id>-words never contained long/diphthong vowel phonemes, the marking of vowel quality would be negligible.

If, on the other hand, a significant fraction of <id> words contained long/diphthong vowel phonemes, the

graphemic forms would be a lot more idiosyncratic – the reader would just have to know how to pronounce this

particular <id>-word, as opposed to a rule for the set of all <id>-words.

The data base to tackle this question is the set of all words in CELEX that meet the following requirements:

• the word is not annotated as morphologically complex in CELEX (remaining morphologically complex formations on free bases are manually filtered)

• the word is graphemically bisyllabic (<limit>), or trisyllabic with single final <e> (<palace>)

• the word contains one single intervocalic consonant letter between the first and the second syllable (<limit>/*<happy>); this consonant letter could in principle be doubled12.

• the first syllable contains a single vowel letter (<limit>/*<tailor>)13

This leads to a corpus of 1,114 words. Classifying for short (‘/V/’) or long/diphthong (‘/VV/’) vowel phonemes

(according to CELEX’s classification system), we get the following distribution:

            /V/   /VV/   total
trochaic    318    440     758
iambic      297     59     356
total       615    499   1,114

Table 5: Cross-classification of vowel quality (‘/V/’: short; ‘/VV/’: long/diphthong) over foot structure

(iamb/trochee). Data base: all graphemically bisyllabic words (and trisyllabic words with single final <e>) in

CELEX with a single vowel letter in the first syllable followed by a single consonant letter.

As table 5 shows, phonological foot structure co-varies with vowel quality: If <VC>-words correspond to

trochaic phonological forms, the vowel phoneme is a long vowel or diphthong 58% of the time, e.g. <tiger>. If

they correspond to iambic words, the vowel phoneme is short (and often reduced) 83% of the time, e.g.


The correspondence of a given graphemic word to an iambic or trochaic phonological word depends on many

factors, e.g. the presence or absence of prefixes like <de-> or <re-> (e.g. <demand>, <report>) and the word

category (e.g. pro’test (V) vs. ‘protest (N)). In the following, we will only investigate the 758 trochaic words

from table 5.

For these words, we get the following distribution according to their word ending (only word endings with ten or

more occurrences):

12 This excludes the consonant letters <h, j, q, v, w, x, y, z>. In words with these letters, there is no potential opposition (as in
e.g. <dinner>/<diner>). Thus, intervocalic <v> (for example) cannot code vowel quality, and both a short and a
long/diphthong reading are possible for structurally similar words (cf. <never>/<fever>).
13 As noted above, vowel letter clusters in the first syllable usually correspond to long or diphthong vowel phonemes; the

word ending has little effect on it.

word ending   /V/   /VV/   total   %/V/   examples
us              0     27      27     0%   opus, bonus
a               3     38      41     7%   drama, schema
um              1     11      12     8%   velum, datum
er              3     25      28    11%   paper, cater
ent             2     12      14    14%   silent, latent
ey              2     12      14    14%   crikey, phoney
o               6     26      32    19%   zero, lino
or              3     11      14    21%   manor, minor
ar              3      8      11    27%   radar, vicar
al              3      8      11    27%   oral, coral
y              13     27      40    33%   lady, many
i               5     10      15    33%   yogi, mini
ile             5      7      12    42%   senile, fragile
our             6      6      12    50%   vapour, glamour
on             13      9      22    59%   melon, demon
ate             6      4      10    60%   senate, climate
id             16      5      21    76%   solid, stupid
it             16      3      19    84%   limit, vomit
et              9      1      10    90%   planet, comet
ic             21      2      23    91%   comic, logic
age            11      1      12    92%   manage, damage
ish            11      1      12    92%   vanish, fetish

Table 6: Reading of words with single intervocalic consonant letters as containing short or long/diphthong

vowel phonemes, according to word ending. All word endings that occur 10 times or more in the sub-corpus.

An in-depth analysis of the word lists that serve as the basis of table 6 may lead to interesting – and potentially

clearer – results. For example, almost all words with <-ic>, <-id>, and <-it> that correspond to long vowels

involve <u> (e.g. <humid>, <cupid>, <music>, <tunic>, <unit>), and no <u> before these suffixes corresponds

to a short vowel. In this sense, <u> is special (cf. e.g. Cummings 1988). It is conceivable that similar sub-

patterns exist that can explain some – though far from all – variation in table 6.

As in the last section, we can classify these endings: Those in (23a) are clearly associated with a

long/diphthong reading of the respective vowel, those in (23b) are clearly associated with a short reading; those

in (23c) are in between.

(23) a. >80% /VV/: <-us>, <-a>, <-er>, <-um>, <-o>, <-ent>, <-ey>

b. >80% /V/: <-it>, <-et>, <-ic>, <-ish>, <-age>

c. 20-80% /V/: <-ar>, <-al>, <-y>, <-i>, <-ile>, <-our>, <-on>, <-ate>, <-id>, <-or>
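
From the reader's perspective, the groups in (23) can be read as a simple lookup heuristic for words with a single vowel letter followed by a single intervocalic consonant letter. The Python sketch below illustrates this idea; it is not a model proposed in the pertinent literature. The function name predict_reading, the label strings and the longest-match strategy are assumptions made for illustration, and the heuristic is only meant for trochaic words of the kind listed in table 6.

```python
# Word endings grouped as in (23); the lookup procedure itself is an
# illustrative assumption.
READING_BY_ENDING = {}
for ending in ["us", "a", "er", "um", "o", "ent", "ey"]:      # (23a)
    READING_BY_ENDING[ending] = "long/diphthong"
for ending in ["it", "et", "ic", "ish", "age"]:               # (23b)
    READING_BY_ENDING[ending] = "short"
for ending in ["ar", "al", "y", "i", "ile", "our", "on",
               "ate", "id", "or"]:                            # (23c)
    READING_BY_ENDING[ending] = "variable"


def predict_reading(word):
    """Return the reading associated with the word's ending, or None.

    Longer endings are tried first, so that <-ey> takes precedence
    over <-y> and <-ent> over any shorter candidate.
    """
    for ending in sorted(READING_BY_ENDING, key=len, reverse=True):
        if word.endswith(ending):
            return READING_BY_ENDING[ending]
    return None


for w in ["paper", "limit", "melon", "money"]:
    print(w, predict_reading(w))
```

For endings in the "variable" group the heuristic deliberately makes no prediction: these are the cases in which the reader has to know the individual word.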

4.4. Synopsis

Many of the word endings in (23) are in the same group as in (22); there seems to be a connection. Functionally,

this makes sense: If a short vowel is often encoded with consonant doubling (e.g. <summer>, group 22a above),

then a single consonant letter can correspond to a long/diphthong vowel phoneme (e.g. <paper>, group 23a). If,

on the other hand, a short vowel phoneme is often encoded with a single consonant letter (e.g. <limit>, group

22b), then the same structure should not be used to encode long/diphthong vowels.

Figure 1 shows this connection. For each word ending from (22) and (23) above, the amount of consonant

doubling (horizontal axis) is plotted against the amount of a long/diphthong reading in words with a single

intervocalic consonant letter (vertical axis). So for example, -age in the middle of the bottom of figure 1

indicates with its position that it occurs with consonant doubling 47% of the time (e.g. <village>, <message>),

while at the same time a short vowel reading is dominant (92%) in <…VCage> words (e.g. <manage>,


[Figure 1: scatter plot of word endings. Horizontal axis: percentage of <CC> spellings (0% to 100%); vertical axis: percentage of /VV/ readings (%/VV/).]


Figure 1: Percentage of <CC> spellings (as a ratio of combined <CC> and <C> spellings) for each word

ending combined with the percentage of /VV/ (long vowel or diphthong) reading for single vowel letters in words

with that word ending.
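
The two coordinates of a word ending in figure 1 can be recomputed from tables 3 and 6. The Python sketch below does this for the endings that are listed in both tables; it is a hedged reconstruction, since the figure also contains endings whose counts are not given in table 3 or table 6. The dictionary and function names are hypothetical.

```python
# (<VC>, <VCC>) counts for short-vowel words, copied from table 3.
spelling = {
    "er": (3, 93), "y": (12, 109), "ey": (2, 13), "et": (9, 54),
    "a": (3, 16), "ar": (3, 8), "o": (6, 14), "age": (10, 9),
    "on": (13, 11), "ish": (11, 2), "ic": (20, 3), "it": (14, 2),
    "id": (16, 2),
}

# (/V/, /VV/) counts for words with a single intervocalic consonant
# letter, copied from table 6.
reading = {
    "er": (3, 25), "y": (13, 27), "ey": (2, 12), "et": (9, 1),
    "a": (3, 38), "ar": (3, 8), "o": (6, 26), "age": (11, 1),
    "on": (13, 9), "ish": (11, 1), "ic": (21, 2), "it": (16, 3),
    "id": (16, 5),
}


def coordinates(ending):
    """Return (% <CC> spellings, % /VV/ readings) for a word ending."""
    vc, vcc = spelling[ending]
    short, long_ = reading[ending]
    pct_cc = 100 * vcc / (vc + vcc)
    pct_vv = 100 * long_ / (short + long_)
    return pct_cc, pct_vv


for ending in sorted(spelling):
    x, y = coordinates(ending)
    print(f"-{ending}: {x:.0f}% <CC>, {y:.0f}% /VV/")
```

For <-age>, for example, this yields roughly 47% <CC> spellings and 8% /VV/ readings, i.e. the 92% short readings mentioned above.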

These results have to be taken with a grain of salt because the actual numbers are sometimes rather small. For

example, <um> occurs only three times with a short vowel in the corpus; one of the words is spelled with a

single consonant (<alum>), two are spelled with a doubled consonant (<vellum>, <possum>). This leads to a

67% ratio of <CC> spellings, but it should be clear that this figure is of a different quality than e.g. the 19 instances

of <a>.

With that in mind, we can identify four groups of word endings in figure 1:

Group 1: no consonant doubling, no long/diphthong reading (<-id>, <-ic>, <-ish>, <-it>; lower left corner of

figure 1). This relation is functional, as sketched above: If words that end with <ic> are never read as having a

long/diphthong vowel phoneme, the shortness of the vowel in turn does not have to be indicated.

Group 2: mostly consonant doubling, mostly long/diphthong reading (<-us>, <-a>, <-er>, <-le>, <-ey>; with

limitations also <-um>, <-ent>, <-ar>, <-or>, <-o>, <-y>; upper right corner of figure 1). This relation is also

(mostly) functional: If vowels preceding single consonant letters are likely to be read as long/diphthong vowel

phonemes (e.g. <paper>), then the shortness of the vowel phoneme should in turn be indicated (e.g. <summer>).

However, there are some idiosyncratic cases (e.g. <scholar> vs. <molar>).

Group 3: some amount of consonant doubling, some amount of long/diphthong reading (<-ate>, <-our>, <-al>,

<-i>, <-ile>, <-on>, <-ot>; in between groups 1 and 2, towards the center of figure 1). These word endings do

not systematically encode shortness, even though a considerable number of words with a single intervocalic

consonant letter get a long/diphthong reading; shortness is “under-coded”, so to speak. In effect, this group

shows a greater amount of idiosyncratic words. This leads to problems for the reader (consider the long/short

pairs <demon>/<lemon>; <facile>/<docile>; <robot>/<spigot>), but also for the speller (consider the

single/double consonant pairs <canon>/<cannon>; <spigot>/<maggot>).

Group 4: some amount of consonant doubling, no long/diphthong reading (<-in>, <-et>, <-ow>, <-ock>; with

limitations <-age>; bottom right of figure 1). These word endings systematically encode shortness by consonant

doubling, and they do so even without a functional need: Vowels before single intervocalic consonant letters

hardly ever correspond to long/diphthong vowels. In a way, this group “over-codes” shortness.14

Note that we do not find word endings in the upper left corner. The respective spellings would be highly


So what is the condition for consonant doubling in monomorphemic words? Obviously, word endings have a

strong effect (although only the most frequent ones are accounted for in figure 1): Some endings are correlated

with consonant doubling, some with single consonants, and some are in between. If we take figure 1 as a basis,

we can at least formulate a sufficient condition for doubling: Consonant doubling occurs with word endings that

are also associated with a long/diphthong reading. This condition is not necessary: Group 4 also contains double

consonants, even though there is no functional pressure.

5. Conclusion

Consonant doubling is regular in morphologically complex words. It can be motivated with reference to e-

deletion before vocalic suffixes, and for morphologically complex words it can possibly be described in

graphemic and morphological terms alone, without reference to phonology. Of course, the resulting spellings are

14 With the exception of group 4, the relation between the two dimensions could also be interpreted as being linear.
However, to my mind it makes more sense to think of the distribution in terms of clusters and outliers/inbetweeners.

also phonographically plausible, and there are regular correspondences on a suprasegmental level (cf. e.g.

Rollings 2004, Evertz & Primus 2013). Phonological terms are, however, not necessary to capture the graphemic

behavior of morphologically complex words.

Consonant doubling is far less regular in morphologically simple words, where a short vowel phoneme is a

necessary condition. Doubling in these words varies with the respective word’s ending. Some word endings

trigger consonant doubling (e.g. <-er>, <-a>, <-y>, <-ow>, <-ock>), some do not (e.g. <-it>, <-id>, <-ic>). This

is an effect of the graphemic form of the word ending, not one of the phonological form. As a matter of fact, the

same phonological ending (e.g. [ɨk]) can be spelled in different ways, and the presence of consonant doubling

hinges on the choice of this spelling (cf. e.g. <comic>/<gimmick>). This phenomenon was dubbed graphemic

harmony. Word endings are thus recurring entities with distributional properties – they correlate with consonant

doubling or non-doubling. That makes them very similar to suffixes. But unlike suffixes, they have no

morphosyntactic or semantic function. It is an interesting question whether they are psychologically “real”. Do

proficient readers strip them off the word just like they do with suffixes (cf. e.g. Rastle, Davis & New, 2004)?

There is a functional relation between consonant doubling and the number of words with single intervocalic

consonants that correspond to words with long/diphthong vowel phonemes: Doubling is only necessary if the

alternative spelling would be prone to misreading. The systematic occurrence of words like <paper> makes the

spelling <summer> (not *<sumer>) necessary.


