Selfish Sounds and Linguistic Evolution

This book takes an exciting new perspective on language change, by
explaining it in terms of Darwin’s evolutionary theory. Looking at a
number of developments in the history of sounds and words, Nikolaus
Ritt shows how the constituents of language can be regarded as mental
patterns, or ‘memes’, which copy themselves from one brain to another
when communication and language acquisition take place. Memes are
both stable in that they transmit faithfully from brain to brain, and active
in that their success at replicating depends upon their own properties.
Ritt uses this controversial approach to challenge established models of
linguistic competence, in which speakers acquire, use and shape lan-
guage. In Darwinian terms, language evolution is something that hap-
pens to, rather than through, speakers, and the interests of linguistic
constituents matter more than those of their human ‘hosts’. This book
will stimulate debate among evolutionary biologists, cognitive scientists
and linguists alike.

Nikolaus Ritt is Professor and Head of the English Department at
Vienna University. He has published in many linguistics journals, and is
co-editor (with Christiane Dalton-Puffer) of Words: Structure, Meaning,
Function (2000), and author of Quantity Adjustment: Vowel Lengthening
and Shortening in Early Middle English (Cambridge University Press,
Selfish Sounds and Linguistic Evolution
A Darwinian Approach to Language Change

Nikolaus Ritt
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press

The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
Information on this title:

© Nikolaus Ritt 2004

This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.

First published in print format 2004

ISBN-13 978-0-511-19449-8 eBook (EBL)
ISBN-10 0-511-19449-8 eBook (EBL)
ISBN-13 978-0-521-82671-6 hardback
ISBN-10 0-521-82671-3 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

List of figures
Preface

1 Introduction
2 The historical perspective
3 Approaching ‘language change’
4 The Darwinian approach
5 Generalising Darwinism
6 Towards an evolutionary theory of language
7 What does all this imply for the study of language change?
8 How to live with feet, if one happens to be a morph-meme
9 The prosodic evolution of English word forms or The Great Trochaic Conspiracy
10 Conclusion

References
Index


List of figures

3.1 Six manifestations of ‘language’
3.2 Schematic representation of the processes involved in linguistic change
3.3 Variants of greene in Modern English dialects (map Ph94 from Orton–Sanderson–Widdowson 1978)
3.4 Andy Warhol’s portrait of Marilyn Monroe, or rather a ‘copy’ of it
4.1 A population of three replicator types A, B and C
4.2 The evolutionarily stable distribution of As, Bs and Cs
5.1 Operation of a Complex Adaptive System (after Gell-Mann 1992: 11)
5.2 Operation of biological species, viewed as Complex Adaptive Systems
5.3 Language acquisition, viewed as a Complex Adaptive System
5.4 Language evolution and change, viewed as a Complex Adaptive System
6.1 How to identify constituents in structured networks
6.2 How to identify copies of network constituents
6.3 Possible variants of constituent types
6.4 More variants of constituent types
6.5 The phone-meme /z/
6.6 The morph-meme {bʊl}
6.7 A meme for distinguishing between onsets and nuclei
6.8 A meme ‘for’ the phonotactics of onset clusters
6.9 A meme cluster for syllable structure
6.10 How {strip} activates the [σ OOO[R NC]R]σ-meme
6.11 How {bit} fails to trigger C1 when occurring before {of}
6.12 A meme for foot structure
6.13 The rhythm of memory
6.14 (Part of) a ‘rule-meme’ for pre-consonantal devoicing
6.15 The internal selection of brain-states
8.1 Another look at a meme for feet
8.2 How analogue transmission selects for binary oppositions
8.3 The emergence of binary oppositions
8.4 The implementation of Open Syllable Lengthening
8.5 The implementation of lengthening among CVC monosyllables

Preface

This book was intended to become a study of English historical phonology
and morphology based on a generalised Darwinian model of linguistic
evolution. Its basic idea, going back in my case to a summer reading of
Richard Dawkins’ Selfish Gene, is that languages represent teams or pop-
ulations of replicating mental patterns, which use their human hosts, that
is, speakers, for the purpose of their essentially selfish replication. As my
attempts began to take some shape, I presented ideas for a few chap-
ters at various conferences and had some of them published in journals
and conference volumes. Although they were usually well received,
nobody seemed to understand why I needed what came across as
‘biological metaphors’.
As far as I was concerned, however, the concepts I employed were not
metaphors at all, and the accounts I gave only made sense, I thought,
within the particular perspective I had begun to take. The failure of my
colleagues to understand that I was not just using exotic language to
lend more hype to otherwise perfectly conventional stories caused me
considerable worry, and I therefore tried to be more explicit about my
approach, its theoretical foundations and its advantages over the more
established view that languages represent mental tools which speakers
use and modify according to their needs. As I went on with this task,
it began to seem more and more likely that this book would become a
plea for a theoretical perspective rather than an exercise in description,
as I had originally intended. And this is indeed what it seems to have become.
Of the case studies which I originally prepared, only two have made it
into the final version. One of them attempts to explain why words such
as ModE man have retained their short vowel ever since Old English
times, although they ‘ought to’ have lengthened if my 1994 thesis on
Middle English adjustments of vowel quantity was correct. The other
makes the case for a ‘Trochaic Conspiracy’ in the historical development
of Old and Middle English word forms, in which they all attempted
to optimise their metrical structure. The rest of this book is a long
attempt to argue why the stories I tell in the more empirical sections make
sense.
This book questions many established assumptions about languages,
speakers, and what it is that linguists are describing. Making excursions
into evolutionary biology, the theory of complex adaptive systems, cog-
nitive science and, indeed, memetics, it defends an approach to the study
of language and its history which will strike many linguists as somewhat
unusual. I shall argue that it is not only possible to speak, metaphori-
cally, of languages as if they were entities with a life of their own, but
that they indeed are. Although they are not made of genes, their con-
stituents do qualify as replicators and are capable of evolution. Like
the evolution of genes, however, the evolution of language constituents
proceeds by the mechanism of quasi-random mutation, and subsequent
automatic selection. Thus, the most appropriate framework for under-
standing the properties of languages and their historical development is
generalised Darwinism. Since I am convinced that this does not only
apply to the descent of [mæn], and the particular historical conspiracy
in which English word forms evolved to become better trochees, you are
invited to let yourself be convinced.
While I am personally responsible for all shortcomings of this book, it
is fair to say that I owe much of what might make it a worthwhile read
to others. The names of many are included in the references, of course,
but there are a number to whom I owe special gratitude. Although they
cannot be aware of it, Richard Dawkins, Daniel Dennett and Douglas
Hofstadter have opened my mind to the perspectives and ways of arguing
on which most of this book is based. So has Roger Lass – only he knows, I
hope. Had it not been for his encouragement and critical support, I do not
think I would ever have attempted to write this book. Trying to emulate
his style of thinking has been an immensely rewarding experience.
People who know Roger Lass’ views on language change will per-
haps find it somewhat ironical that I am almost as indebted to Wolfgang
Dressler, in whose Circle of Natural Linguists I have always felt at home.
In particular, I would not want to have missed any of the inspiring dis-
cussions with Katarzyna Dziubalska. Which brings me to the friends and
colleagues in my own department, in particular our working paper dis-
cussion group, and above all Herbert Schendl, Barbara Seidlhofer
and Henry Widdowson. I thank them for their loyalty, and the inter-
est, the feedback, and the time they have given me. They are also great
colleagues, generally.
I thank April and Ron McMahon for their encouraging and helpful
comments on the beta version of this book, and Kate Brett and Helen
Barton from Cambridge University Press, without whose support, insight
and professionalism this book would not be what it is.
To Laura, who would deserve to know in great detail exactly how much
I am indebted to her, I can only say that I am still at a loss for words. All
I can do is dedicate this book to her.
1 Introduction

1.1 The benefits of language

It might be premature to decide whether our species has been an evo-
lutionary success or not, but the number of its members has clearly
increased exponentially during the last 100,000 years or so. Most prob-
ably, one of the main reasons why humans have been so extraordinarily
successful in reproducing before dying is that they have language.
Language helps humans to establish within their minds representations
or models of the worlds in which they live, and enables them to carry out
experiments on those models. Since these experiments take place in the
virtual realities of their minds, humans do not have to suffer their actual,
potentially harmful consequences. Indeed, the particular ease with which
language allows them to direct and control their own thinking seems to
distinguish them from most other animal species, which seem to be much
more strongly constrained – be it by external stimuli or by instincts – in
what they ‘think about’.
Language also allows them to share knowledge. Each individual can
thus learn about the experience of others and avoid repeating their mis-
takes. The possibility of sharing information through language is not only
good for individual humans, however. It is socially significant as well, since
it makes all human beings, at least potentially, useful to one another.
This might be an important factor behind the unique social instincts
that characterise the human species. Unless humans had good reasons
to expect of each of their co-speciates that they might come to learn
something useful, they might not generally treat each other with a co-
operativeness and apparent selflessness that is otherwise rare in the animal kingdom.
Apart from making information communicable and tradable, language
also facilitates co-operation in a more general sense. As a means of ‘gently’
manipulating the behaviour of others through commanding, requesting,
negotiating or – more indirectly – through altering their perception of reality,
language provides a flexible medium for groups to co-ordinate their
actions and to pursue goals which, although beneficial to all members,
would exceed the capacity of any single individual.
Finally, language may play yet another role in establishing and main-
taining coherence in human groups. Being acoustically transmissible with
relatively little physical effort, it makes it easier for group members to
identify each other quickly, reliably and from relatively safe distances. If
an individual recognises that another one speaks like itself, it will identify
it as a member of its group and treat it accordingly. Although this function
of language does have its sinister side (just consider how humans have
tended to behave towards co-speciates who do not speak like them), it
may have been the decisive factor which allowed early humans to live
together in groups comprising as many as 150 individuals. This greatly
exceeds the group size typical among other primate species, and as has
recently been suggested (Dunbar 1996), it may have even constituted the
crucial selective pressure which got the human language faculty off the
ground in the first place.
It is obvious, then, that language is a good thing to have, both for us
as individuals and for our species as a whole.

1.2 . . . its shortcomings

Although language is definitely very useful, there will be hardly
anybody who has not become aware – at one time or another – of its
limitations and, indeed, its dark side. To begin with, we all know how
easy it is to be misunderstood or to misunderstand, and we have all experienced
the agony of groping, in vain, for the proper words to express
specific thoughts. People who are better than others at using language
often acquire high social prestige or draw material benefit from their tal-
ents. But even among professional writers, speakers or even poets it has
always been a commonplace that les mots justes are extremely hard to find
and that some things seem beyond the reach of language altogether.
Another of its not so helpful properties is that language has a way of
diversifying into different languages, dialects, styles, registers and even
individual ‘ways of speaking’. This diversity has always tended to be
exploited by human selfishness and to nurture feelings of xenophobia.
We despise or envy each other for the ways we speak, we form coalitions
against each other on linguistic grounds, and we have come to make
enemies of those who speak differently.
Finally, the very power of language as a device for influencing others
can of course be exploited not only for good and altruistic, but also for
selfish and downright evil purposes.

Most human societies are aware of the limitations of and the dangers
inherent in human language and many have tried to come up with ways
of reducing language-related risks: children are sent to school, are trained
in the most profitable use of their mother tongues, and are taught to
understand and see through the rhetorical tricks of demagogues. Also,
a considerable and growing number of people all over the world are
taught foreign languages, so that the gaps between linguistically differ-
ent communities are more easily bridged. Finally, institutions of research
and higher education that dedicate themselves to the study of human
language have spread all over the globe during the last one hundred years.

1.3 . . . and ways of studying it

Clearly, the central role which language plays in human existence repre-
sents an almost self-evident justification for all efforts directed at studying
and understanding it better. Yet, language, omnipresent in human lives
though it may be, is rather elusive as an object of rational enquiry and
difficult to pin down for analysis. To see why this is so, let us take a crude
first look. In everyday experience language typically comes across as a
kind of ‘tool’. Common sense regards it as ‘a system of knowledge that
is put to use in speaking and understanding’ (Chomsky 1988: 15) and
that seems to serve people as a ‘means’ of communication (both with
others and with oneself). How, then, might this ‘tool’ be studied and understood?

1.3.1 Observation and inference in language modelling

If one thinks of language as a tool, even if only metaphorically, it is rea-
sonable to ask oneself in what ways tools in general are examined and
investigated when one wants to understand how they work. Of course,
tools in the normal sense of the word are artefacts designed and con-
structed by humans, and if one knows the actual designer of a particular
tool, one can ask him how it works, or can ask, at least, for blueprints or
building instructions. In the case of natural human languages, however,
this option is clearly not available because, for all that is known, they are
not artefacts in the normal sense. Languages have not been designed by
anybody in particular at all.
When one does not know the designer of a tool, one can still try to
understand its design and function through reverse engineering. One
dismantles the tool, looks at the nature and arrangement of its parts, and
tries to work out how they interact to produce its specific effects. If one
succeeds, one can then reconstruct the plan and the intentions behind
the tool. Unfortunately, however, this approach faces serious problems
when applied to language, since many of its aspects are simply impossible
to dismantle in such a way that their constituents could be easily isolated
and observed. There are several reasons for this. To begin with, it is not at all
obvious what exactly to take apart if one wants to lay open ‘the internal
mechanics of language’. Language seems to manifest itself in a variety of
different domains, such as in texts, in behaviour, in individual speakers’
competence, or in social conventions. Where exactly, and in what manner,
does ‘it’, that is, the tool that we are interested in, exist then? Which, if any,
of its manifestations should be considered primary? As we shall see below,
the issue is rather complex and forces one to make subtle, yet principled
decisions.1 Secondly – albeit closely related to the ontological problem –
there exist good reasons to suspect that at least much of language is part
of the human mind. The mind, of course, is still something of a blank
spot on the scientific map, and relatively little is known about it. To
make things worse, all that is known about it suggests that the properties
of minds depend most crucially on the workings of human brains, and –
for both practical and ethical reasons – we are in no position to take those
apart for the purposes of academic enquiry.2
Now, if one cannot dismantle a tool and look at its parts, the only way
in which one can try and develop an idea about its internal mechanics is
through inference. One observes the behaviour of the tool under variable
and controlled conditions and then tries to imagine what kind of con-
struction could achieve the observed effects. The hypothetical blueprint
which one thus constructs might also be called a ‘theory’ or ‘model’ of
the tool. The problem with such indirectly derived models is that one
can never be sure how similar they actually are to the ‘real’ machine
of which they are supposed to be models. One will never really know
if the model and the original look alike inside, even though both might
‘behave’ almost identically. For practical purposes, this may not make a
big difference. Having a good ‘model’ might even put one in a position
to design new tools that are just as efficient as, or even better than, the
original in performing certain tasks. Still, one will never ultimately know
whether one’s model faithfully represents the internal make-up of the
original, and there will always remain the possibility that circumstances
might arise – not encountered before – in which one’s model will behave
differently from the original after all. Should this happen, one will have
to revise and adapt one’s model accordingly.3 In short, modelling a tool
through inferring its internal mechanics from its observable effects tends
to strike one as somewhat unsatisfactory, yet if the tool under considera-
tion is language, it is the only choice one has.

1 It might be necessary to stress already at this point, however, that ‘language’, if it is viewed
as a tool, cannot at the same time be identified with ‘texts’. Text, i.e. the output produced
with the language tool, can of course be ‘taken apart’ and analysed rather easily (at least in
certain ways), but the same clearly does not hold for the ‘tool’ itself, which includes the
mental machinery involved in both producing and understanding textual output.
2 As a matter of fact, the last decades have seen the development of techniques by which
activity within human brains can actually be measured and recorded without damaging
the brains themselves. The best known ones are ‘Positron Emission Tomography’ (PET),
‘Magnetic Resonance Imaging’ (MRI) and the ‘Superconducting Quantum Interference
Device’ (SQUID) (Rose 1992: 131–4). It is fair to say, however, that the measurements
they permit are still fairly rough and do not yield sufficiently fine-grained pictures for most
linguistic purposes, so that the claim to which this footnote refers is still largely valid.
3 This argument is well known in the philosophy of science, of course, as the Popperian
insight that theories can never be ultimately ‘verified’ and can at best be regarded as not
yet falsified (Popper 1968).

1.3.2 Modelling by inference: data problems

Apart from being somewhat unsatisfactory, however, the intention of
deriving a model of language by the observation of its effects also forces
one to make a number of preliminary decisions and theoretical assump-
tions, which – at least in the case of language – is rather difficult. For
instance, even if one can model language only by inference rather than
by taking it apart and actually looking at it, it is necessary to take a stance
on the problem of what it ‘actually’ is; that is to say its ontology. One
cannot just model away, as it were, without first having a reasonable idea
of what it is that one is constructing a model of. Putting it in slightly
different terms, the question arises as to what in the observable world
does actually constitute evidence of language and how ‘language in and
by itself’ should be conceptually disentangled from and then related to
that evidence. Already, in the context of this rather basic problem, it turns
out that the everyday meaning of the word language is highly ambiguous
and likely to create considerable confusion in focused academic enquiry.
In fact, ‘language’ in the everyday sense seems to be multi-faceted and
to assume many different shapes as soon as one begins to question prelim-
inary common sense notions. Of course, it manifests itself most obviously
as ‘text’, that is, complex patterns of speech or writing. In this form, lan-
guage is comparatively easy to observe. Texts are part of the material world
‘out there’ and can be described in a detached and intersubjectively ver-
ifiable manner. It is easy to agree, for example, that the word language
consists of eight letters, that the one in position two is identical with the
one in position six, and so on. However, its textual manifestation cannot
be all there is to language. There is clearly more to it than just the
textual shapes in which it comes. For example, the word language does
not only have a shape, but also expresses some meaning and it is only
this that makes it language. Otherwise it would just be a pattern of black
shapes on a white background. In order for textual patterns to ‘have meaning’,
however, their form is not sufficient. Instead, speakers need to be
involved: either those who produce texts to ‘express’ meaning, or those
who interpret them to ‘recover’ it. It is important to note that the kind
of meaning that gets associated with particular textual patterns depends
at least as much on what speakers do with them as on the structures of
the texts themselves. The American philosopher Daniel Dennett (1990,
and has contrived a nice
little text which will get two radically different meanings when ‘pro-
cessed’ by either English or French speakers, and which illustrates the
often underestimated role which speakers play in endowing texts with meaning:
(1)   –  
In the first case, it can be interpreted to ‘mean’ or to ‘express’ something
like You have (a) great leg(s). Why don’t you touch ours/mine?, in the second
something like Great heritage! Sixteen bears! Of course, this example is
made up for the purpose and not a very natural text, but it does drive the
point home quite impressively. Language must be more than texts.
From a different perspective ‘language’ could, for instance, be regarded
as a form of human behaviour that involves mental and physiological
processes somehow linking ‘meanings’ to ‘texts’. An established term for
language in this dynamic, procedural sense is ‘discourse’.4 Observing
and describing it is more challenging in many ways than analysing static
texts, but both the physiological aspects of the processes involved (such
as articulation or auditory perception) and the behavioural context
of discourse (including many of the effects it has on people, for example)
are still relatively amenable to detached, empirical observation.
Yet, even communicative behaviour cannot be all there is to language.
After all, there is a sense of ‘language’ in which speakers ‘have it’ even
while they do not actively use it. It appears to exist in speakers’ minds
as a cognitive potential for producing or interpreting an infinite number
of possible utterances. Often referred to as linguistic ‘competence’ (e.g.
Chomsky 1965: 4), language in this sense represents a system of knowl-
edge which speakers draw upon when they engage in linguistic behaviour
and produce or interpret texts. This cognitive or mental implementation
of language is more difficult to investigate than either discourse or its textual
products, of course. It cannot be directly observed except through
introspection, and introspection is by definition subjective, which makes
it highly problematic as a method in empirical science.

4 Defined in Beaugrande 1997 as ‘the level of the total communicative event, including
discoursal moves, gestures, facial expressions, emotional displays, and so on, in contexts
of situation’ (44).

That nobody can introspect any mind other than their own
is particularly unfortunate, because the fact that language works for interpersonal
communication implies that it must necessarily transcend the
level of individual speakers in some way or other. Thus, another mani-
festation of language is social. Any language is always shared by a com-
munity of speakers. At the same time, no two speakers of ‘a’ language
speak exactly alike, which suggests that their linguistic competences will
differ as well, and this means that, in the super-individual or social sense,
a ‘language’ will be ‘complete’ only within its speech community as a
whole and not fully represented in any single mind at all.5
There are still more senses in which the word ‘language’ can be used.
One of them is biological. The human species is alone in ‘having’ language.
At the same time, although there are large differences between
the languages that humans speak, all humans do speak one, if they are
healthy. Thus, the capacity for linguistic behaviour, that is, the acquisition
and use of a human language, is a species-specific human universal, and
must therefore have a biological and ultimately genetic basis. In the sense
which refers to that capacity, ‘language’ is often also called an ‘instinct’, or
an ‘organ’ (e.g. Pinker 1994), and can be studied in neuro-physiological
and genetic terms.
Finally, there is a completely abstract, or even metaphysical sense in
which the word ‘language’ can be employed. For instance, a ‘language’
can be said to ‘exist’ without being used or known by living speakers
or communities at all. This is true of so-called ‘dead’ languages, which
may occasionally be ‘revived’. Thus, classical Hebrew was extinct as a
spoken language for many centuries before it came to be ‘resurrected’ as
the official language of the modern state of Israel. It thus seems to have
‘existed’ somewhere outside the domain of spatio-temporal boundedness
altogether, ‘kept alive’ in a world of abstract knowledge (the well-known
philosopher of science Karl Popper might have referred to it as World
Three; see, for instance, Popper 1968a).

5 ‘For language [langue] is not complete in any speaker; it exists perfectly only within a
collectivity’ (Saussure 1959: 21).

1.3.3 Modelling by inference 2: modelling what, how and why?

As we have seen, it is already difficult to decide where even to look for
language in order to study it, and it will certainly be wise to try and
disentangle the many phenomena referred to as ‘language’ from each
other – both if one wants to analyse or model any of them in detail,
and if one wants to understand the relations among them. Unless one
has at least a crude preliminary understanding of what one should focus
on, ‘language’ in its comprehensive and confusing everyday sense may
easily impress one as a ‘hopeless amalgam’ (Chomsky 1992: 102). It
seems to involve so many ‘complex and obscure sociopolitical, histori-
cal, cultural and normative-teleological elements’ (Chomsky 1992: ibid.),
that in its stunning complexity it might strike one as impossible to study
altogether, its investigation ‘verge[ing . . .] on the “study of everything” ’
(Chomsky 1992: ibid.). In short, principled distinctions need to be made
and clear research strategies established. Otherwise no two scholars can
even be sure whether they are studying the same thing when they say
they are studying language, nor will they be able to agree what the phe-
nomena they are observing and possibly describing should be taken as
evidence of.
Defining strategies of investigation before one has a good understand-
ing of one’s subject is a delicate matter, of course, and there are no general
and reliable guidelines for doing so. More often than not one has to rely
on trial and error. This is as true of everyday life as of academic research,
and given the many different manifestations in which language comes, it
is not altogether surprising that language scholars should have developed
a variety of sometimes quite different strategies in order to tackle the phe-
nomenon. This is not merely because there simply are a large number of
possible approaches to language, of course, but also because there exist
a large number of reasons for studying language, each of them suggesting
a different order of research priorities. If one is interested in, say, ‘the
German language’ because one wants to teach ‘it’ to native speakers of
English, the detailed manner in which human mind/brains manage to
parse speech chains and attribute syntactic structure to them is arguably
of little immediate interest. It will suffice to know, for instance, that in
German direct objects can occur before verbs, while in English they nor-
mally cannot. Not only can the essentials of this difference be usually
taught to learners without worrying about how human minds manage to
identify ‘direct objects’ in the first place, but dwelling on that problem
might even impede efficient instruction. The situation will be completely
different, on the other hand, if one is looking for an explanation of syn-
tactic speech disorders in native speakers of German. When one faces
that problem, the mental or even the neurological status of syntactic cat-
egories will be of the utmost importance. In short, the question ‘What is
language?’ seems to justify different answers depending on who wants to
know and why. That the academic community of language scholars at the
beginning of the twenty-first century is rather heterogeneous is therefore
indeed no surprise.
This is not the place for scientific historiography nor for a detailed
survey of the scientific approaches to language that presently exist. It
is important to point out, however, that within the academic commu-
nity of today, language is not only approached for a variety of different
purposes, and from a variety of different perspectives, but is conceptu-
alised by different ‘theoretical camps’ in ways that are sometimes mutu-
ally exclusive and often incompatible. There are still many fundamental
aspects of language about which there is no full agreement among linguists.
Yet, although language in all its facets is still far from being fully
understood, many scholars have tended to elevate to the rank of ‘theories
of language’ their often rather preliminary assumptions simply because
they have apparently allowed them to come to terms with those particular
aspects of language they happened to be interested in. Few admit to the
incompleteness and the provisional character of their conceptual frame-
works. Instead linguists of various persuasions tend to be quite ‘defensive’
about their individual approaches, and consequently fail to keep them
open and flexible enough for integrating insights gained from different
perspectives. Therefore, instead of contributing to what might eventually
grow into a general theory of language worthy of the name, various lin-
guistic schools work in parallel, while failing to trade insights in a mutually
profitable way.
This book does not address a specific sub-community of linguists, nor
does it expect its readers to share a set of specific assumptions about lan-
guage. Given the heterogeneity of the linguistic community, I am aware
that this is somewhat risky. First, issues will necessarily be raised which
some informed readers may regard as settled, solved or at least handled
better within their preferred frameworks. Secondly, some of the phe-
nomena I shall refer to in order to develop my argument have been dealt
with in much greater depth by other linguists and my own treatment of
them may strike some as naive and superficial in comparison. Finally, the
very explicitness and transparency which is required if one wishes to be
understood by more colleagues than just one’s closest research associates
will make one a comparatively easy target of both friendly and unfriendly
criticism.
I am willing to take that risk, however. From my own experience, I
have learnt that there are dangers to specialisation as well. When one
chooses a particular approach, adopts a particular theoretical framework,
internalises the appropriate terminologies and formalisms, and attempts
to advance and refine the theory by holding it against a specific set of data
one knows very well, one may certainly ‘get somewhere’, but often one
gets so attached to one’s perspective that one becomes all but incapable of
questioning its basis. Certainly, this may be socially safe. If one adopts a
theory that is shared by a substantial number of colleagues, one can count
on their goodwill even if only for joining their ranks. But should one’s
chosen approach be inherently flawed, one is unlikely to discover it that
way. Therefore, I have decided quite deliberately to approach my subject
matter as naively as possible. At the risk of reinventing one wheel or another,
I shall try to describe the motivation of the present study, the particular
problems it addresses, the perspective it takes, and the assumptions it
makes in considerable detail in the following sections. I will be pleased if
I persuade some of my readers to follow me back to basics. Since I have
no intention of ‘impressing’ them, or ‘persuading’ them of the ‘ultimate
correctness’ of my argumentation, I shall try to make it as easy as possible
for them to take issue with the points I make. My hope is that they will
do so, detect all the flaws I am certain to have overlooked, and make the
best of it.
2 The historical perspective

2.1 Evidence of language change

This study deals with the historical changes that languages are known
to undergo. It thus focuses on an aspect of language which does not
seem to constitute one of its most spectacular properties. On the one
hand, the historical changeability of languages does not form part of our
everyday experience, and on the other, it might strike one as a rather trivial
phenomenon, particularly if one shares the common view that everything
there is in this world will naturally be subject to change. Yet, it is possible
that languages change in language specific ways, that is to say in ways in
which they can only change because they are languages. If this is so, the
study of linguistic change may help to reveal clearly aspects of the nature
of language, and it may pay to investigate language as an historically
changing entity.
Of course, it deserves to be stressed right at the outset that the very
claim that languages actually ‘change’ is not really self-evident. ‘Language
change’ is no ‘fact’, but a theoretical construct because processes of
‘language change’ are not observable as such. That language change is
nevertheless treated as if it were a ‘fact’ reflects, basically, a complex but
plausible interpretation of observable variation among linguistic texts
from different historical periods. Such variation becomes evident, for
example, in the following versions of Luke 15, 11–13:
(2) a. [11] He cwæð. soðlice sum man hæfde twegen suna. [12] þa
cwæð se yldra to his fæder; Fæder. syle me minne dæl minre
æhte þe me to ge-byreþ. þa dælde he him his æhte, [13] Ða
æfter feawa dagum ealle his þing gegaderude se gingra sunu.
& ferde wræclice on feorlen rice. & for-spilde þar his æhta
lybbende on his gælsan; (West Saxon I (Gospels), c. 990)
b. [11] He cwæð soðlice. Sum man hæfde twege sunes. [12] þa
cwæð se ylder to his fader. Fader syle me minne dæl minre
ehte. þe me to ge-byreð. Ða dælde he him his ehte. [13] Ða
æfter feawa dagen ealle his þing ge-gaderede se gingre sune.
& ferde wræclice on feor landen. & for-spilde þær his ehte
libbende on his gælsan. (West Saxon II (Gospels), c. 1175)
c. [11] And he seide, A man hadde twei sones; [12] and the
yonger of hem seide to the fadir, Fadir, yyue me the porcioun
of catel, that fallith to me. And he departide to hem the catel.
[13] And not aftir many daies, whanne alle thingis weren ged-
erid togider, the yonger sone wente forth in pilgrymage in
to a fer cuntre; and there he wastide hise goodis in lyuynge
lecherously. (John Wycliffe (Late), c. 1395)
d. [11] And he sayd: A certayne man had two sonnes, [12] and
the yonger of them sayde vnto the father: father, geue me the
porcion of the goodes, yt to me belongeth. And he deuided
vnto them his substance. [13] And not longe after, whan the
yonger sonne had gathered all that he had together, he toke
his iorney into a far countreye, and there he wasted his goodes
with ryotous liuing. (Great Bible, 1540)
e. [11] And hee said, A certaine man had two sonnes: [12] And
the yonger of them said to his father, Father, giue me the por-
tion of goods that falleth to me. And he diuided vnto them
his liuing. [13] And not many dayes after, the yonger sonne
gathered al together, and tooke his iourney into a farre coun-
trey, and there wasted his substance with riotous liuing. (King
James, 1611)
f. [11] And He said, a certain man had two sons: and the younger
of them said to his father, [12] Father, give me the portion that
falleth to my share. And he divided his substance between
them. [13] And not many days after the younger son gath-
ered all together and went abroad into a distant country, and
there squandered away his substance by living luxuriously.
(John Worsley (NT), 1770)
g. [11] Again he said: ‘There was once a man who had two sons;
and the younger said to his father, “Father, give me my share
of the property.” [13] So he divided his estate between them.
A few days later the younger son turned the whole of his share
into cash and left home for a distant country, where he squan-
dered it in reckless living.’ (New English Bible, 1970)

While being similar enough to be comparable, the seven texts differ
in many respects and on various levels of formal description. For
instance, the word for ‘father’ is spelt fæder (a), fader (b), fadir (c) and father
(d–g) respectively. These graphic differences – noteworthy for what they
are – can plausibly be assumed to represent (loosely) corresponding differences
in the pronunciations and phonological representations of the
word: fæder may have stood for something like [fæ:dər], fader and fadir
for [fa:dər], and father for [fɑ:ðər], or ultimately – as we know from
contemporary evidence – for [fɑ:ðə].
On the morphological level, it seems that certain inflectional endings
found in the earliest versions of the text – such as the -a in suna (gen. pl.
of OE sunu ‘son’), the -um in dagum (dat. pl. of OE dæg ‘day’) or the -ende
of lybbende (pres. part. of OE libban, lybban ‘to live’) – do not appear in the
later texts anymore, the inflectional categories they must have expressed
either being altogether absent in more modern stages of English (as in the
case of the dative plural -um) or being expressed by different means (Ø
or occasionally -s for the genitive plural -a; -ing for the present participle
ending -ende).
With respect to words and their meanings, it appears that certain words
have been replaced by others, as in the case of OE æhte, which later shows
up as catel (c), good(e)s (d and e), share (f) and property (g) respectively,
and others seem to have changed their meaning, such as syle (a), which
in the Old English text seems to have expressed a concept more general
than that of its Modern English counterpart to sell, and comparable to
that of ModE to give.
As far as the structure of sentences is concerned, it is worth observing
that in texts (a) and (b) the first clause of sentence [13] is characterised
by an arrangement of constituents which does not occur in any of the
later texts. The analyses in (3a) and (3b) make this explicit:

(3) a. Temporal Adjunct     Object            Verb          Subject
       [æfter feawa dagum]  [ealle his þing]  [gegaderude]  [se gingra sunu]

    b. Temporal Adjunct        Subject             Verb        Object  Verbal Particle
       [not many dayes after]  [the yonger sonne]  [gathered]  [al]    [together]

Old English, it seems, admitted Object – Verb – Subject (or: OVS)
order in main clauses, while Modern English seems to demand SVO
order.

Finally, the texts also differ in a number of other interesting ways. For
instance, text (2g) has the son turn ‘the whole of his share into cash’
before wasting it all, while in all earlier versions nothing specific is said
about the form in which the son decided to take his share of the family
property with him. Interestingly, also, in texts (2a) and (b), it seems to be
the elder of the two sons who asks the father to divide his property before
his younger brother then decides to leave home with his share and waste
it by living luxuriously, whereas in all the other versions it is the younger
son himself who asks his father to do so, a difference which changes the
moral of the whole story considerably.1

1 The story continues like this. When the son has spent his property and ends up having
to beg for food, he decides to go home to his father and ask his forgiveness. He does so
and his father not only forgives him but actually celebrates his homecoming in a big way.
This irritates the brother who has stayed home dutifully all the while, and he complains
that his own obedience ought to deserve a celebration much better than the homecoming
of his brother after years spent in luxury and sin. The father, however, replies that he
regards the repentance of a sinner as a happier occasion than the simple righteousness of
the righteous. Arguably, the father’s behaviour towards his older and more dutiful son is
a bit unfair and might be easier to understand if the latter had somehow been responsible
for the misfortune of his younger brother.

The textual differences mentioned in the last paragraph seem primarily
to reflect differences among the cultures within which they were produced,
not among the languages in which they were written. Thus, it
has become normal in twenty-first century advanced capitalist societies
to convert one’s property into money or an even more virtual equivalent
if one wants to spend or, indeed, waste it conveniently, and this may
account for the innovative phrase turned the whole of his share into cash
in (2g). Similarly, the other difference might reflect different concepts of
divine justice held by the respective author-interpreters. To the authors
of texts (2a) and (b) it might have been inconceivable that God should
reward a sinner unless that sinner can in a way be exculpated – in this
case through an elder brother whose impatience to come into his property
forces his younger sibling to face a temptation before he is mature
enough to overcome it.
All other differences, however, that is graphic, phonological, inflectional,
or syntactic ones, are typically thought to reflect ‘changes in the
English language’ itself. This assumption is basically plausible, and rests
on the following type of reasoning. First, the differences among the texts
are taken to reflect parallel differences among the ‘languages’ in which the
texts were written – that is, differences on the level of individual or socially
distributed linguistic competence. Second, the texts (and by implication
the ‘languages’ they ‘are in’) are not merely different, but display more
similarities and correspondences among each other than one could expect
to find among two different-language versions of the same Bible passage
picked at random.2 Now, the only instances of (greater-than-average)
similarity3 among individual competences that can be readily explained
on the basis of everyday experience are those which hold among compe-
tences of speakers who belong to the same speech community, or who are
in communicative contact with one another. In particular, it is well known
that the linguistic competences which children acquire normally turn out
to be highly similar to the competences of the speakers they communicate
with during their maturation. Thus, one can think of communication as
a process through which properties of languages get transferred, most
markedly so in language acquisition. The particular language of any indi-
vidual can accordingly be said to derive most of its specific properties
from the languages of speakers with whom he/she comes to commu-
nicate, particularly during his/her early childhood. It may therefore be
hypothesised that all languages which are extraordinarily similar3 to one
another will be linked through uninterrupted chains of communicative
exchanges. Such communicative links establish channels through which
languages may apparently spread across space and time while essentially
maintaining their properties. Therefore, the languages of the text samples
in (2) can in a sense be regarded as temporally successive varieties of the
‘same language’. This appears to be the only plausible explanation of the
similarities that hold among them. If similarities among languages point
to successful property transmission, then the differences among the indi-
vidual languages or varieties must be due to infidelities in the transmission
of such properties. And for instances of ‘flawed’ property transmission,
‘change’ is a term which naturally suggests itself. This, then, is the argu-
mentative basis of the view that the language in which the texts in (2)
were composed ‘has changed over time’ and that the textual specimens
represent ‘different stages’ in that ‘historical development’. Of course,
this piece of reasoning is compatible with the common experience that
younger generations usually tend to speak slightly differently from older
ones. It is easy to imagine that a sufficiently large number of such small
differences accumulating over time can amount to such striking changes
as those to which the examples in (2) testify.

2 Compare, for example, the Modern English version from (2g):
[11] Again he said: ‘There was once a man who had two sons; and the younger said to his
father, “Father, give me my share of the property.” [13] So he divided his estate between
them. A few days later the younger son turned the whole of his share into cash and left
home for a distant country, where he squandered it in reckless living.’ (New English Bible,
1970)
to its Modern Finnish counterpart:
[11] Jeesus jatkoi: Eräällä miehellä oli kaksi poikaa. [12] Nuorempi heistä sanoi isälleen:
‘Isä, anna minulle osuuteni omaisuudestasi.’ Isä jakoi omaisuutensa poikien kesken.
[13] Jo muutaman päivän päästä nuorempi kokosi kaikki varansa ja lähti kauas vieraille
maille. Siellä hän tuhlasi koko omaisuutensa viettäen holtitonta elämää. (Finnish Bible,
3 Clearly, we are speaking here of similarities among individual languages that go beyond
universal properties possibly shared by all human languages.

Throughout the history of modern man all known languages seem to
have undergone modifications of the types just observed, and if there is
anything that deserves to be called a universal feature of natural human
language, then it is its inability to resist change. Since its changeability is
an undeniable fact, however, one can certainly not simply ignore it when
one intends to understand the nature of language, particularly since it is
possible that languages might not ‘just change’, but change in specific,
interesting ways. Should this be the case, then any model of language
which cannot explain these ways must necessarily be incomplete or inad-
equate, or most likely both.
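
The reasoning developed in this section can be illustrated by a small simulation. The following Python sketch is not part of the argument itself and rests on deliberately crude assumptions: a ‘language’ is reduced to a string of letters standing for arbitrary properties, every act of transmission copies each property faithfully except with a small probability (the hypothetical parameter error_rate), and the names transmit, similarity and chain are invented purely for the purpose of illustration.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # stand-ins for arbitrary linguistic properties


def transmit(language, error_rate, rng):
    """Copy a 'language' (a string of properties) with occasional infidelities."""
    copied = []
    for feature in language:
        if rng.random() < error_rate:
            # an infidelity: the learner ends up with some randomly chosen property
            copied.append(rng.choice(ALPHABET))
        else:
            copied.append(feature)
    return "".join(copied)


def similarity(a, b):
    """Share of positions at which two equally long 'languages' agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def chain(ancestor, generations, error_rate, seed=0):
    """Successive 'stages' of one uninterrupted chain of transmission."""
    rng = random.Random(seed)  # seeded, so that a run can be repeated exactly
    stages = [ancestor]
    for _ in range(generations):
        stages.append(transmit(stages[-1], error_rate, rng))
    return stages
```

If one calls chain with a small error rate and compares neighbouring stages by means of similarity, they will typically turn out to be almost identical, whereas the first and the last stage of a long chain may differ considerably. This is all the sketch is meant to show: that faithful transmission accounts for similarity, and that accumulated infidelities can account for ‘change’.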

2.2 Language as a changing object

Linguists often try to establish first how languages are structured, or
how they come to fulfil their particular functions for their speakers.
Only once they have established their basic models do they go on to ask
whether their theories can also help to account for the ways in which
languages change. This book will start from the opposite end. It will first
ask why and how languages come to change over time, and attempt to
model languages as historically changing entities. Only afterwards will it
ask whether the resulting theory can explain their structures and their
functionality. Taking up the metaphor introduced in the introduction,
this book will not ask why and how the tools that languages represent
change historically. Instead it will simply try to explain how and why
languages change, in the hope that the resulting theory might also indicate
why they are structured as they are, and why they appear to be such useful
tools.
Although it focuses on changes that have occurred in the history of
English, the main purpose of this book is not to facilitate our under-
standing of historical English texts or to help us to access and profit from
the knowledge of our forbears. Nor is it primarily intended to describe
the historical development of the English language specifically. Instead,
it intends to show what contribution linguistic historians can make to
the task of constructing a comprehensive theory of language, its onto-
logical status, its relation to human brains, minds and communities and
the mechanics that determine both its internal workings and its interac-
tions with the rest of the world. In short, the present book will attempt
to address the following central question: what must languages be like if
they change the way they do?
In order not to sell an explanatory framework that comes to terms
with a particular aspect of language as a complete theory of language,
this study will offer its conclusions in a deliberately tentative manner.
It is hoped, at the same time, that the view of language it advocates
may inspire reflection on, or even help to understand, aspects of language
other than just the ways in which it seems to change in time. I
strongly believe that the ‘language which changes’ must in some sense
hang together with the ‘language which is used in communication’, the
‘language which is part of our biological endowment’, the ‘language which
gets acquired by children’, the ‘language which gets taught in the foreign
language classroom’, or ‘the language that you are reading just now’.
Therefore, the study of language as an historically changing object may
shed light on other ways in which its properties become manifest, such
as the ways language is acquired and put to use by speakers in normal
This book will attempt to make the following main point. If one wants
to understand languages as things with histories, the common view of lan-
guages as self-contained, essentially static and passive systems of knowl-
edge, ‘employed’ by speakers as a ‘means of communication’, is not
helpful. Instead, the historicity of languages is much easier to account
for, if they are regarded as open, dynamical systems which are capable
of adaptive self-organisation and similar, in this respect, to autonomous
life forms. Since the properties of life forms can be most fully understood
from an evolutionary perspective, and since evolution depends on actively
replicating patterns, a view of languages will be proposed and put to the
test, which sees them as systems of mental – or ultimately neural – repli-
cators whose existence depends on (and can thus be explained by) their
ability to reproduce before disintegrating. In that sense, languages will be
seen as analogous to the genetic systems that inhabit and evolve in the
biosphere of our planet, as well as to all systems that work upon similar
principles. It will be argued that the replication on which the elements
of human languages depend for their survival represents their primary
raison d’être and that, as they pursue the aim of replicating, they ‘use’ the
speakers that ‘host’ them almost as much as they appear to be used by
the latter. In many respects they will be beyond their hosts’ conscious reach
and control.
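
What it means for the existence of replicators to depend on their ability to reproduce before disintegrating can be made concrete in a minimal sketch. The Python code below is an illustration only and not the model developed in this book: it assumes just two competing variants with the invented names variant_a and variant_b, assigns each a hypothetical and fixed average number of copies (copy_rates), and simply renormalises their shares after every round of replication. The function names replicate and history are likewise assumptions made for the sake of the example.

```python
def replicate(shares, copy_rates):
    """One round of differential replication.

    shares:     variant -> share of the replicator population (summing to 1)
    copy_rates: variant -> hypothetical average number of copies an instance
                makes before it disintegrates
    """
    weighted = {variant: shares[variant] * copy_rates[variant] for variant in shares}
    total = sum(weighted.values())
    # renormalise, so that the new shares again sum to 1
    return {variant: weight / total for variant, weight in weighted.items()}


def history(shares, copy_rates, rounds):
    """Shares of each variant after 0, 1, ..., rounds rounds of replication."""
    stages = [dict(shares)]
    for _ in range(rounds):
        stages.append(replicate(stages[-1], copy_rates))
    return stages
```

Starting from equal shares, a variant that is assumed to copy itself twice as often as its competitor holds two thirds of the population after a single round, and nearly all of it after ten. Nothing in the sketch refers to the intentions of speakers; the outcome follows from the assumed properties of the variants alone.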
This perspective is of course strongly inspired by biological evolution-
ary theory as well as by recent attempts to develop more general theories of
‘complex adaptive systems’. As I hope to show, such an approach, which
might be called ‘generalised Darwinism’, seems to explain the patterns of
change observable in linguistic evolution at least as well as, and in some cases
clearly better than, competing approaches to language change. At the same
time, it seems to be more easily compatible with evidence on language
that has been gathered in studies of its biological and socio-psychological
foundations, its structure, its use and its textual manifestations.
The approach I shall be taking is necessarily interdisciplinary. It will
draw on, and try to integrate, findings from many different fields of study,
such as traditional philology, modern linguistics, neurology, biology, phi-
losophy, dynamical systems theory, sociology, anthropology and cognitive
science. Once again, I am perfectly aware that interdisciplinary endeav-
ours have their dangers, since it is difficult to be an expert in more than
a small number of academic fields at best. But, as already pointed out
above, it seems to me that the possible costs of simplifying or misrepresenting
knowledge from one academic discipline or another are clearly
outweighed by the potential benefit of getting a more complete and
integrated view of a phenomenon such as language, which notoriously
permeates very different empirical domains.
3 Approaching ‘language change’

3.1 Preliminaries
The word ‘language’ and the compound ‘language change’ are familiar
from everyday use. There they carry meanings and associations which
work well enough for everyday purposes. In a way, these meanings and
associations might be viewed as mostly implicit and rather crude theories,
or working hypotheses. All humans who are confronted by language in
their daily experience have them. Investigating the nature and mechanics
of language and language change, however, is definitely not a typical
everyday purpose. Approaching a language as something that changes and
has a history is clearly different from approaching language as something
that one learns, knows, uses or understands. Since the shape of any theory
will reflect the purposes for which it is constructed, it cannot be taken
for granted that the normal, everyday way in which we think of languages
should prove useful when we intend to understand their historicity. After
all, it is obvious that common sense often conceptualises phenomena
in a way which can be utterly inadequate for special, and in particular
scientific purposes. Take such concepts as sunrise and sunset, to give a
trivial but telling example. While they are perfectly adequate for referring
to the phenomena as most of us experience them, they are downright
deceptive for the purpose of describing or even understanding the celestial
mechanics behind the events. This means that although we may have no
other choice but to work with the established notions that constitute our
‘common sense’ when approaching the phenomenon of language and its
change, we must accept that our investigations might eventually cause
us to revise our understanding of the concepts, or even to give them up
altogether, just as – to cite another well-known example – the notion that
there was one particular quality or substance distinguishing life from
dead matter had to be discarded together with the concept of élan vital.1

The following sections will attempt to show that our everyday understanding
of language is highly ambivalent and open to many different
interpretations. These will be made explicit, and it will be discussed
how the different phenomena which can be referred to as language hang
together and what roles they can play in an attempt to understand language.
This exercise should help us to avoid common sense notions which
might tacitly bias our understanding of linguistic change and thus prevent
us from conceptualising in the manner which is most appropriate to our

3.2 Establishing basic assumptions

Nobody denies that language manifests itself (A) as ‘texts’ and (B) as
‘behaviour’. ‘Texts’ are here understood in the sense of physical (that is,
acoustic or graphic) patterns which exist ‘out there in the real world’.2
‘Linguistic behaviour’ is understood to involve neuronal and other physiological
processes which may either result in texts (in language production),
or may be triggered by texts (in reception).3
Furthermore, it can be taken for granted that (C) some cognitive or
mental system must exist that one could reasonably call ‘linguistic competence’.
This system will be implemented in the form of brain-states, and
thus represent a neuro-physiological phenomenon. ‘Competence’ both
informs – and thus constrains – communicative behaviour and is in turn
(co-)determined by the latter: what one can say and understand depends
on what one knows, and what one knows depends on what one has picked
up and tried out in communication.4

1 A term which was originally coined by the French philosopher Henri Bergson (1859–1941)
in 1907 in his book L’évolution créatrice. He proposed that what distinguished life
from mere matter was an essence with which the former was imbued and which was so
ephemeral that science had not yet discovered it. He turned out to be wrong. See also
Russell (1961: 756–66).
2 There are understandings of the word ‘text’ which differ markedly from the definition
above and which are employed by text linguists or literary scholars. I am of course perfectly
aware of this. For the purposes of the present argumentation, and throughout most of this
book, however, I shall use the term in the straightforwardly materialist sense employed
3 In spite of their ultimately neuro-biological character, the processes involved in linguistic
behaviour can also be described in terms of such higher-level concepts as those used
in psychology, for example. The question of how they are best described need not be
addressed at this point, however. For a debate of neuronally reductionist explanations
of cognitive and behavioural processes see for example Hubel (1988), Edelman (1989),
Fodor (1989), Zeki (1993), P. S. Churchland (1986), P. M. Churchland (1995), Crick
(1995), Snyder (1986), Chalmers (1996), McGinn (1999), Gold/Stoljar (1999), Chater
(1999), or Jamieson (1999).
4 Just how autonomous ‘linguistic competence’ might be from other parts of human neuro-
biological systems or to what degree it should be regarded as separate from them at all is
a non-trivial question and no rash decisions ought to be taken on it. Furthermore, and
just like in the case of linguistic behaviour, the neuro-physiological implementation of
Approaching ‘language change’ 21

If ‘linguistic competences’ are implemented as brain-states, and if lin-

guistic behaviour involves neuronal and other physiological processes, it
is obvious that both will be biologically co-determined, or constrained.
Without committing ourselves to any view on its specific properties, we
may refer to the biological basis of linguistic competence and behaviour
as (D) the ‘human language capacity’.
Next, (E) language is also instantiated super-individually, or socially.
It is difficult to decide at this stage what exactly the relationship between
the social instantiation of a language and its instantiations in terms of
individual competences, instances of behaviour or texts might be. So the
question will be left open for the time being, bearing in mind that at
a later stage we shall have to form an opinion as to whether the super-
individual instantiations of a language should be regarded as the mere sum
of lower level instantiations, whether they assume higher-level properties
that result from the interaction of lower level instantiations in a complex
and thus intractable manner, or whether an even more radical distinction
should be made.
Finally, the possibility (F) needs to be taken into account that lan-
guages might in some sense also represent abstract constructs, or theo-
ries, which seem to be ontologically different from, though not necessarily
completely independent of the manifestations of language mentioned so
far. As already pointed out above, there is a sense in which a language
may be said to exist even when there exist no competent speakers of it,
when no one speaks it and when no texts ‘in it’ are being produced. All
historically ‘extinct’ languages might be said to belong in this category.5
In order to elucidate the relations among them and in order to pro-
vide a firm basis for the ensuing discussion, the five manifestations of
language just mentioned are represented in the form of a schematic chart
(figure 3.1).

3.3 What ‘language change’ must represent

What is it then that changes when ‘language’, that is to say ‘a particular
language’, changes? The question sounds deceptively simple, and this

linguistic competence should not be interpreted to rule out the possibility that its prop-
erties might be more suitably described in terms of higher-level concepts. Their essentially
material nature will be regarded as beyond reasonable doubt, however.
5 Of course, the question whether languages of which no textual evidence whatsoever nor
any second hand witnesses exist should also be assumed to ‘exist’ in this abstract sense is
more of philosophical than of practical relevance. If it is answered positively, however, I
see no principled way of denying ‘existence’ of the same kind to all future languages, or
even to all possible languages, which creates the uncomfortable situation that one would
have to speak of the ‘reality’ of the ‘unrealised’. For my taste this is bordering too closely
on the paradox to be rationally discussed.


[Figure 3.1 Six manifestations of ‘language’. Surviving diagram labels: ‘individual linguistic competence’, ‘biological basis of competence and behaviour’, ‘consists of / emerges from’.]

may be one of the reasons why it is rarely discussed in depth in the literature
on the subject.6 Yet it takes different answers and invites different research

6 There are notable exceptions, of course. One of the earlier ones would be Hermann Paul’s
Prinzipien der Sprachgeschichte (1880), or Weinreich, Herzog and Labov’s seminal 1968
paper ‘Empirical foundations for a theory of language change’. That the latter explicitly
addresses and revises Paul’s views indicates that the discussion was not very intense in
between, however. See also McMahon (1994: 7f).
strategies depending on how language is defined and thought about. As
this section will show, some definitions make it difficult, if not impossible,
to approach the phenomenon of language change altogether.
Recall what was said above about the different domains in which lan-
guage may manifest itself. It can be regarded as (a) text, (b) a type of
human behaviour, (c) the linguistic competence of individual speakers
(which has been informally called ‘tool’ above), (d) the biological basis
of linguistic competence and behaviour, (e) a system of knowledge shared
by speakers in a specific community, and (f) an abstract knowledge system
which is ontologically independent of its realisation in actual speakers’
minds. The question which of the possible manifestations of language are
involved in language change, and in what way, is anything but trivial and
clearly important.

3.3.1 Language as text

Look at texts – that is, the finished material products of linguistic dis-
course – first. The good thing about texts is that they are easily accessible
to empirical investigation. They are out there in the physical world, so to
speak, and one can investigate their structural properties from a detached
and quasi-objective point of view. Clearly, however, the very fact that texts
are passive physical objects implies that by themselves they are not capa-
ble of ‘changing’ in the sense which is relevant here. The only types of
changes that texts can really undergo are the gradual physical and chemi-
cal processes of decomposition. Once produced, the blots of ink (or toner,
or whatever) that are letters, begin, albeit relatively slowly, to decompose
in regular and fairly – though not exactly – predictable ways. In a similar
way, although much more quickly, patterned sound waves, the material
‘products’ of spoken language, remain manifest in the medium that car-
ries them only for a while and then simply fade away. These types of
‘change’, however, are utterly uninteresting for our purposes, because
they can by no means explain similarities or differences such as the ones
between the two following corresponding lines:
(4) Sum man hæfde twege sunes. (2b)
A man hadde twei sones. (2c)
Clearly, for the sentence from (2b) to ‘change into’ its counterpart from
(2c), the active involvement of speakers is required, who need to inter-
pret the former and then somehow ‘re-create’ it in the form of the latter.
Changes of this kind thus crucially involve processes of text interpre-
tation and production rather than just the physical properties of texts
themselves. So, while language change may be reflected in differences
between texts, the textual manifestations of language are not sufficient
for describing and explaining language change. All one can possibly do
on the textual level alone is to identify counterparts and then chart corre-
spondences between them – that is, similarities and differences.7 A worth-
while and necessary endeavour, to be sure, but clearly only the starting
point for a serious investigation of the processes that actually constitute
linguistic change.

3.3.2 Language as behaviour

If we think of language as a type of behaviour, we face the problem that
many of the actual behavioural processes involved in text production
and/or perception are difficult to observe and to describe, because they
include not only physiological but also cognitive and other mental pro-
cesses. However, even if linguistic behaviour were as easily accessible to
empirical investigation as actual texts, we would not be much better off.
Taking the sentences in (4) again, and assuming that we could reconstruct
in all the necessary detail the behavioural processes by which they were
actually produced at different times (and probably in different places), all
we would be able to observe is apparent parallels and differences among
them. We might notice, for example, that uttering twei seems to be the
behavioural counterpart of uttering twege, and we might proceed to look
in greater detail for similarities and differences between the correspond-
ing instances of linguistic behaviour. Since the descriptions of the events
involved in the production even of very simple and short texts are likely
to be very complex, incorporating information about cognitive, psychological
and physiological processes as well as information about the
context in which the acts are performed, comparisons of utterance events
will be similarly complex, and possibly more interesting than the neces-
sarily superficial comparisons of mere texts. Still, we cannot assume any
immediate and straightforwardly causal links between the behaviour of
one speaker and that of another – particularly and most obviously not
when the two speakers live at different historical periods. Instances of
linguistic behaviour are individual events, which start and end, and just
like texts they clearly do not ‘change into one another’. Instead, linguis-
tic behaviour normally causes other linguistic behaviour only when it is

7 That texts do not actively undergo language change has always been recognised by lan-
guage historians. Already Hermann Paul, for example, stressed that
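Das wirklich Gesprochene hat gar keine Entwicklung. Es ist eine irreführende
Ausdrucksweise, wenn man sagt, dass ein Wort aus einem in einer früheren Zeit
gesprochenen Worte entstanden sei. Als physiologisch-physikalisches Produkt
geht das Wort spurlos unter, nachdem die dabei in Bewegung gesetzten Körper
wieder zur Ruhe gekommen sind. (1920: 28)
[‘What is actually spoken has no development at all. It is misleading to say
that a word has arisen from a word spoken at an earlier time. As a
physiological-physical product the word vanishes without trace once the
bodies set in motion in producing it have come to rest again.’]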
interpreted and reacted to by speakers. Just like textual differences then,
differences among instances of linguistic behaviour may reflect language
change, but are not sufficient for describing or explaining it either.

3.3.3 Language as competence

What then, if we think of language as competence, that is, a system
of knowledge implemented in the brains of adult speakers? As pointed
out above, this presents us with enormous empirical difficulties, because
human brains can at present be observed only so crudely that the results are
practically useless for investigating linguistic competence. Imagine,
though, that we were indeed in a position to know (if not through observa-
tion, then through reconstruction or modelling) the systems of linguistic
knowledge on which the speakers producing the texts in (2) drew when
they produced them. Once again, it does not seem that we would be much
better off than with either behaviour or texts. Presumably, the models of
the two competences to be compared would be very complex indeed,
representing the neural machinery for producing all sentences the two
speakers would potentially have been able to utter. But would there be
any causal link between them for all their stunning complexity? It does
not seem so.

3.3.4 Language as a biological capacity

Of course, if one considers linguistic competences to be biologically con-
strained, it follows that one should be able to predict certain properties of
one from properties of the other, namely those which are in fact biolog-
ically determined by the human genome. The idea that such properties
do exist is basically accepted by all contemporary linguists. It is believed
that no ‘general learning device’ could acquire competence in a language
from being exposed to actual speech in it. It is argued that successful
language acquisition requires not a ‘clean slate’ but a fairly specialised
‘mental organ’ which has the principles on which human languages work
already pre-installed, as it were, so that it need not learn them. Since the
design of such a language organ must necessarily also constrain the kinds
of language which humans can learn, the room for actual variation among
languages must be limited. In its strongest form, the idea is even carried
so far as to liken human language to a circuit board on which most of
the principles of its organisation are hardwired, while only a few switches
(‘parameters’) remain to be ‘set’ during actual language acquisition.8
8 See Chomsky (1993), and, for applications of this idea in historical linguistics, Lightfoot
(1991 and 1996).
From this perspective, then, all linguistic competences in the brains
of human beings must be very similar to one another, at least in those
parts which are in fact genetically determined. If we knew which of them
actually are, we could then ‘predict’ that the same properties will be found
in the brains of all other human speakers as well. The causal link between
the competences of any two speakers would then be via their common
biological ancestors, that is, via the germ line.
However, even though the linguistic competences of any two human
beings might be relatable to one another via the ancestry of the genes that
code for them, this is not the relation in which we are interested here.
Although it cannot be ruled out in principle that the genetic basis of the
human language capacity may have altered as our species evolved, the
changes which were responsible for the differences between successive
stages of ‘modern’ languages such as Old or Present Day English cannot
have been brought about by biological evolution. The kind of language
change which we are interested in here can clearly affect only those aspects
of human competences which are left variable by biology.
Thus, we still face the problem which we encountered when we tried
to understand language change in terms of differences and similarities
between texts or instances of linguistic behaviour. Competences neither
transform into one another, nor can one have an immediate effect on
the other. They are linked only very indirectly, and for one competence to
affect another, speakers will once more have to be involved in producing
texts and interpreting them.

3.3.5 The competence–behaviour–text cycle

Before turning to language as a system of social conventions or even as
an abstract system of knowledge inhabiting a metaphysical ‘World 3’ in
Popper’s (e.g. 1968a) sense, it might be useful to take stock of what has
been said so far and to reflect on some of its possible implications. We
have observed that language change is impossible to describe or explain
on the textual, the behavioural or the cognitive levels alone. No causal link
can be established between two temporally distant texts, speech acts or
individual competences without taking factors into account which clearly
belong to the other two domains. It seems, however, that causal links
could be established, if one considered the ways in which the three mani-
festations of language interact. Thus, texts may give rise to texts, if they
are interpreted and reacted to linguistically. Interpretation and produc-
tion represent types of linguistic behaviour, which in turn presuppose lin-
guistic competence. Thus, texts can cause texts via linguistic behaviour
and competence. Linguistic behaviour can similarly cause linguistic
behaviour only if its textual output gets perceived, interpreted and reacted
to. Once more the link between two manifestations on one level, this time
the behavioural one, is established via the other two. Very much the same
is true, finally, if we consider language on the cognitive level. One com-
petence can influence another only via behaviour and texts, namely when
the texts which it enables speakers to produce are received and inter-
preted by others in a way which changes their own competences. This
happens most obviously during language acquisition, of course. There,
textual output is processed by brains which have not yet acquired their
stable, adult structures but are still in the process of development and
organisation, and this process is strongly influenced by the texts to which
they are exposed. In less far-reaching ways, however, adult and more
stable competences can also be assumed to change when they are exposed
to new texts. In short, competences are causally linked to one another
just as instances of linguistic behaviour or texts are: indirectly and via the
other two domains in which language manifests itself.
Summing up what we have seen so far, it becomes clear that any serious
attempt to study language change will necessarily have to consider all
of the three levels on which language manifests itself, if it purports to
describe the actual processes linking two language stages instead of merely
charting correspondences between them.
This raises another question, however, namely whether in an investi-
gation of language change any of the three levels deserves to be given
conceptual priority over the others, or whether they should all be con-
sidered of equal status. Should we think of language change as compe-
tence change mediated by behaviour and its textual output, as change of
behaviour mediated by texts and competences, or should we regard it as
text change mediated by behaviour and competence? What has been said
so far would admit of all these possibilities. To see what the problem is,
look at diagram (5) below. As it shows, competences (C), instances of
behaviour (B) and texts (T) seem to be arranged on a causal chain like
beads of different colours in a simple recursive pattern. Competence (C)
informs production (B). Production yields text (T). Text gets perceived
and interpreted, which means that it triggers and informs (B). Percep-
tion and interpretation alter competence (C). Competence (C) informs
production (B). Production yields . . . and so on: it seems to be
C1–B1–T1–B1–C2–B2–T2–B2–C3–B3–T3 . . . ad infinitum, and it looks as if we
could theoretically choose any bead Xn on the chain as a starting point for
an account of language change, look for its counterpart Xn+1 and regard
a single episode as ending there, as it were.
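
For readers who find it helpful, the recursive bead pattern just described can be spelled out as a small Python sketch. The names causal_chain and episode are illustrative inventions rather than part of the account itself, and the numbering of the B beads simply follows the sequence as given in the text.

```python
from itertools import islice


def causal_chain():
    """Yield the bead labels C1, B1, T1, B1, C2, B2, T2, B2, C3, ... without end."""
    n = 1
    while True:
        yield f"C{n}"  # competence informs production
        yield f"B{n}"  # production yields text
        yield f"T{n}"  # text triggers and informs interpretation
        yield f"B{n}"  # interpretation alters the next competence
        n += 1


def episode(kind, n):
    """Return the stretch of chain from the first bead Xn to the first bead Xn+1."""
    # Four beads per cycle, so 4 * (n + 2) beads always reach the bead Xn+1.
    beads = list(islice(causal_chain(), 4 * (n + 2)))
    first = beads.index(f"{kind}{n}")
    last = beads.index(f"{kind}{n + 1}")
    return beads[first:last + 1]


chain_start = list(islice(causal_chain(), 11))
competence_episode = episode("C", 1)
text_episode = episode("T", 1)
behaviour_episode = episode("B", 1)
```

On this sketch an episode can be cut out of the chain starting from a competence, a text or an instance of behaviour alike, which is exactly the theoretical freedom of choice at issue here.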

(5) a. C1 changes into C2:
       C1 --informs--> B --produces--> T --informs--> B --alters--> C2

    b. T1 changes into T2:
       T1 --informs--> B --alters--> C --informs--> B --informs--> T2

    c. B1 changes into B2:
       B1 --produces--> T --informs--> B --alters--> C --informs--> B2

In practice, this is not the case, however. The reason is related to the
fact that we shall never have direct empirical access to all the individual
factors which underlie differences between any two stages of C-, B- or
T-language, irrespective of how we look at it. Instead we shall always
have to make a large number of assumptions. If, for example, we regard
language change as something which happens to C-language and is in
principle ‘brought about’ through the mediation of linguistic behaviour
and textual output, we shall never be able to collect the necessary evidence
of all the actual instances of language behaviour nor of all the written and
spoken texts which contributed to bringing the differences between C1
and C2 about. All we can do is hypothesise about the kinds of linguistic
behaviour and the kinds of texts that must have been involved. Of course,
our hypotheses might achieve a respectable degree of plausibility, since
we can sample actual texts or instances of linguistic behaviour
and check whether they are at all compatible with our assumptions. Yet,
they will still remain hypotheses for all that.
If we ‘start’ on either of the other two levels, things would not seem to
be different at first. When we compare two temporally distant texts, for
instance, we will hardly ever know through what actual chain of events
they are linked and will also have to make assumptions about likely event
types instead. Yet, there seems to be one crucial difference between com-
petence on the one hand, and the other two manifestations of language
on the other. While the latter are typically strongly co-determined by
extra-linguistic factors such as the situation, the physical condition, the
mood, the social setting or the particular communicative intentions of
individual speakers, the former is not – at least not to a similar degree.
This follows from a simple observation: whatever the contingencies of
your upbringing, whatever texts you actually come across as a child
and whatever people say to you, you will wind up being competent in
English, as long as that is the language which is around for you to acquire.
On the other hand, no matter what your competence is like, if, when and
where you say anything, and what you say, depends not only on your
knowing English but also on a large number of other factors, which may fall
outside the linguistic domain proper.
In other words, differences between texts or instances of linguistic
behaviour may always have a variety of reasons which fall outside the
chain of causal links represented in (5). Furthermore, many of these
extra-linguistic factors are likely to be historically contingent, and this
makes stories about how texts or instances of linguistic behaviour are
causally linked difficult to tell – not exclusively, but predominantly from
a linguistic point of view. In a very crucial respect, then, things are not
quite the same on the level of competences. Because if it is true that
children can acquire the ‘same’ language from different sorts of textual
input as long as it is ‘in’ that particular language, this implies that in the
acquisition of linguistic competence many of the contingencies that make
texts difficult to compare with one another will be filtered out.
Although it may not at all be clear what informational input comes to
be treated as linguistically relevant in language acquisition and how the
developing minds of children manage to distinguish it from accidental
information, the conclusion that some distinction of that kind must be
made is unavoidable. Language acquisition can therefore be regarded as
a kind of sieve through which only linguistically relevant information can
pass, and we can revise the causal chain of events sketched in (5) at least
in principle as follows:

(6) T1 --informs--> Bi --alters--> C --informs--> Bp --informs--> T2

    Contingent factors impinging on each step:
    X1 ... X5 on Bi;  Y1 ... Y5 on C;  Z1 ... Z5 on Bp;  A1 ... A5 on T2

Clearly, the interpretation (Bi) of textual input will not only be
informed by that input (T1) but by a variety of contingent contextual
factors (X1–Xn) as well. The properties of an utterance act (Bp) will not
only reflect properties of a speaker’s competence, but also a basically open
set of other situation specific factors (Z1–Zn). Also, the properties of a
text (T2) will only be partly determined by the properties of its underlying
utterance act (Bp) and reflect additional contextual influences (A1–An).
As a person’s linguistic competence (C) develops and changes when
exposed to interpreted textual input, however, most contingent, contextual
information (Y1–Yn) appears to be factored out in the process, so that
only ‘linguistically relevant’ information is incorporated into a person’s
schematic ‘knowledge’ of his/her language.
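
The ‘sieve’ idea can be made a little more tangible with a deliberately crude Python sketch. It must be stressed that the filtering criterion used here, namely recurrence across input texts, is purely an assumption made for the sake of illustration, since nothing said so far determines how linguistically relevant information is actually told apart from accidental information. The function name acquire, the threshold min_share and the feature labels are likewise hypothetical.

```python
from collections import Counter


def acquire(texts, min_share=0.5):
    """Keep only those features that recur in at least min_share of the input texts.

    Assumption for illustration only: recurrence stands in for whatever
    mechanism really separates linguistic from contingent information.
    """
    counts = Counter(feature for text in texts for feature in set(text))
    return {f for f, c in counts.items() if c / len(texts) >= min_share}


# Two children meet quite different texts 'in' the same language.
child_a_input = [
    {"SVO order", "plural -s", "speaker has a cold"},
    {"SVO order", "plural -s", "noisy kitchen"},
    {"SVO order", "plural -s", "bedtime story"},
]
child_b_input = [
    {"SVO order", "plural -s", "shouted across a field"},
    {"SVO order", "plural -s", "whispered"},
    {"SVO order", "plural -s", "market-day haggling"},
]

competence_a = acquire(child_a_input)
competence_b = acquire(child_b_input)
```

Although the two sets of input share no contingent feature at all, the two resulting competences come out identical, which is the point of the sieve metaphor.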
Why does this suggest, then, that language change should
be thought of as change of linguistic competence rather than of texts
or instances of behaviour? The reason is basically this: if the linguistic
behaviour which speakers display, as well as the shapes which the texts
they produce take, are strongly determined by historical contingen-
cies, this will necessarily increase variation among instances of linguistic
behaviour or texts considerably. By the same rationale, there will be more
and closer structural correspondences among related competences than
among related instances of linguistic behaviour or among related texts.
Consequently, the similarities and differences between related compe-
tences will reflect relatively well how closely they are actually related,
while the presence or absence of similarities between instances of linguis-
tic behaviour or texts may have many reasons that have nothing to do
with their relatedness to each other.
Thus, the examples in (2), which served as a first illustration of lan-
guage change, are actually highly exceptional in being both functional
and, to a high degree, structural counterparts of each other. If
the overall textual output of any speech community over any period were
considered, however, instances of texts which are similarly comparable
and which might count as functional and structural counterparts of one
another would definitely turn out to be rather rare.9 Furthermore, even
though texts produced at one point in time may well be causally related to
texts at another point more or less directly, texts hardly ever prompt the
creation of proper counterparts of themselves. Translations, such as those in
example (1), will be one of the few instances where this might indeed be
the case. Even in that example, however, the individual texts may not have
influenced one another directly, but may each have been translated from
Latin, Greek or Hebrew originals. In short, their great comparability on
so many levels of formal description may be a happy coincidence from
the point of view of language historians, but is probably not representa-
tive of the normal situation. If one seriously attempted to describe lan-
guage change on the textual level, one would often find oneself having to

9 There is a way in which this statement is only partly true. As computer-aided studies of
text corpora have shown, there exist collocations and phrases in every language which
tend to recur quite frequently in individual utterances (see, for example, Sinclair (1992)
and Svartvik (1992)). These, albeit fairly small, stretches of text do represent formally
comparable counterparts of each other. As stretches of texts get longer, however, the
number of structural disparities among them rises drastically.
compare the (almost) incomparable, such as Beowulf and the programme
of the Labour Party, for example.
Clearly, the same must necessarily be true for linguistic behaviour as
well. If one wanted to compare instances of linguistic behaviour to one
another one would again face the problem of identifying counterparts.
If we consider competences, however, the situation is different. When
a speaker acquires competence in a language, she normally acquires
a system of knowledge that enables her to produce, to recognise and
to interpret all grammatical sentences in the language in which she is
competent – and not only those she has been confronted with during
the acquisition process. In this sense, competences represent virtual sys-
tems (see Hartmann 1963: 87), and it is possible to argue that they
contain – in the sense of being able to generate, recognise and inter-
pret – all the grammatical textual output that could be produced in a
language. It therefore follows that on the level of competence languages
must be much more comparable to one another than on the textual or the
behavioural levels. Of course, if the correlation between structural sim-
ilarity and historical relatedness is stronger in the case of competences
than in the case of linguistic behaviour or texts, it makes much more sense
to study language change as competence change brought about through
the mediation of behavioural events and their textual outputs, than as
either of the other two theoretically possible types of change indicated in
graph (5).
Drawing a preliminary conclusion from what has been said so far, then,
language change shall be regarded as a set of events which bring about
differences between temporally successive competences, and which cru-
cially involve events on behavioural and textual levels. As has already been
indicated, this is compatible with the normal intuition, of course, which
makes one think of the examples in (2) as evidence, rather than as instances
of language change. One takes them to show that ‘the English language’
has changed over time, and not only that English versions of Luke :
11–13 have changed.

3.3.6 Beyond the individual: language and the community or language ‘as such’?

Let us consider next whether the picture of language change which has
emerged so far is reconcilable with the notion that language represents a
system of social conventions which is not completely represented in the
mind of any single speaker, but only within whole speech communities, or
with the notion that language might be an abstract system of knowledge.
Both notions reflect the common sense view that no natural language is
the property of any single person, but rather something super-personal
to which individual members of a speech community have more or less
limited access.
To give a very simple example, there will be English words which you
know and I don’t, and possibly vice-versa. All of them deserve to be called
words ‘of English’, however. The problem does not only concern the
lexical level, of course. Some speakers of British Standard English have
‘intrusive r’s’, and pronounce law and order as /lɔːrəndɔːdə/, while others
don’t. Still others may employ intrusive rs only occasionally. But both
phonologies with and without such processes are clearly ‘English’. On
the morpho-syntactic level, there are speakers who will use subjunctives
in phrases such as
(7) It is essential that this mission not fail,10
while others will regularly use indicatives and say
(8) It is essential that this mission does not fail.
Even if individual speakers might ‘lack’ a subjunctive in these construc-
tions, however, ‘English as such’ seems to have it. It seems therefore that
in some sense ‘languages’ are not identical with any individual compe-
tence in which they are ‘implemented’. This suggests that a separate level
might have to be posited to which language, in this super-individual or
collective sense, can be attributed.
So far, so good. It is a different question altogether, of course, how
the difference between individual speakers’ competences and the whole
body of linguistic knowledge to which a speech community has access
should be conceptualised. As far as I can see, there are basically two
ways of approaching the issue. On the one hand, one might think of
the socially distributed body of linguistic knowledge simply as the set of
all competences present within a certain community of speakers. Any
competence would then be simply part of a larger ‘pool’ of related and
typically similar competences. Note that this does not necessarily imply
that such a ‘pool’ would be the mere sum of the competences it is made
up of. It is equally well conceivable that the competences which make up
a ‘pool’ and thus amount to the ‘language’ of a speech community should
be organised into (fuzzy) subsets. Competences within a subset will be
more similar to one another than to members of other subsets within the
whole ‘competence pool’ that constitutes a language. That such subsets
must exist is of course well established,11 and one commonly distinguishes
10 The example is from Quirk et al. 1985: 156.
11 See Weinreich/Herzog/ Labov (1968) for some of the implications of this fact for historical
between various ‘dialects’ or ‘sociolects’ of languages. These varieties
typically correlate with regional, social or other subsets of speakers within
a community. Also, a part–whole conception of the relationship between
individual competences and the ‘language of a speech community’ would
not preclude the possibility of differentiating between core properties of a
language, containing only properties shared by a majority of competences
within a community, and peripheral features, that is, such properties
as can be found only within the competences of smaller – and possibly
regionally, socially or similarly defined – subgroups of speakers within a
community.
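
A minimal Python sketch may help to fix the part–whole conception. It treats each competence as a plain set of properties and the ‘language’ as the pool of such sets. The function name core_and_periphery, the speaker labels and the reading of ‘majority’ as ‘more than half’ are assumptions introduced for illustration only; the properties are borrowed from the examples given earlier in this section.

```python
from collections import Counter


def core_and_periphery(pool):
    """Split the properties found in a competence pool into core and periphery.

    Core: properties shared by more than half of the competences (assumed
    reading of 'a majority'). Periphery: every other attested property.
    """
    counts = Counter(prop for competence in pool.values() for prop in competence)
    majority = len(pool) / 2
    core = {prop for prop, c in counts.items() if c > majority}
    periphery = set(counts) - core
    return core, periphery


# A hypothetical pool of four competences.
pool = {
    "speaker_1": {"plural -s", "intrusive r", "subjunctive after 'essential that'"},
    "speaker_2": {"plural -s", "intrusive r"},
    "speaker_3": {"plural -s", "subjunctive after 'essential that'"},
    "speaker_4": {"plural -s"},
}

core, periphery = core_and_periphery(pool)
```

Note that on this conception ‘the language’ is nothing over and above the pool: core and periphery are computed from the individual competences and have no existence apart from them.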
A strategy which is radically different from such a basically quanti-
tative approach is to regard that body of linguistic knowledge which is
represented only imperfectly within individual speakers’ competences as
a kind of ideal type (Chomsky’s I-language).12 While the first solution dif-
ferentiates between individual competences and competence pools only
in quantitative terms, the second approach implies a qualitative, onto-
logical difference between an ideal language type and any number of
actual competences. Being realised within the minds or brains of actual
speakers, the latter could be regarded as psychologically and thus mate-
rially ‘real’, while the concept of an ideal language type represents an
abstraction. Common sense has no problems with abstractions of this
type, of course, and we readily accept the idea that ‘English as such’
exists without worrying where exactly it should in fact exist. For every-
day purposes it is OK to assume that ‘it’ is the English we know, which
we share with other speakers or about which we can consult dictionar-
ies or grammar books. Metaphysically speaking, however, the status of
‘ideal types’ has always represented a deep philosophical problem, and
has given rise to such notions as Plato’s world of ideas or Karl Popper’s
World 3.
If one is interested in language change, both approaches confront
one with problems. Taking the materialist, quantitative view, a historical
‘language stage’ such as Old English needs to be viewed as compris-
ing the total number of individual competences to be found in the Old
English speaking population during a certain period. This must clearly
be a large and heterogeneous set with a possibly complex internal
structure, whose reconstruction appears to be altogether impossible in practice.
The textual evidence of Old English is limited to the small number of
written documents which have survived from the period, and even the

12 Thus using Chomsky’s term (for which see Chomsky (1986: ch. 2) or Maher/Groves
(1996: 15–19)) slightly differently from the way he would probably say he uses it, but in
accordance with some critical interpretations of his way of using it.
hypothetical reconstruction of a single competence from a body of such
texts represents a difficult task, and may easily border on the speculative
in practice. For living languages such as Present Day English, the situa-
tion is not much better although, or rather because, the body of available
textual data is immensely large. It defies analytic efforts because of its
sheer size, even though digitised data-bases such as the British National
Corpus13 or the Bank of English14 may make some questions considerably
easier to address.
The problems involved in grasping and describing a ‘language’ in a
super-individual sense are relevant for all sorts of linguistic enquiry, of
course, not only historical ones. In practice, the impossibility of surveying
a complete speech community or even a representative sample on the
level of the linguistic competences that make it up has forced particularly
those linguists who are interested in competence to resort to conceptual
simplifications or idealisations. Thus, when a body of texts is analysed
with the purpose of reconstructing, or modelling, the properties of a
linguistic competence by means of which it could plausibly have been
generated, these properties are often regarded not merely as possible
properties of the competence of an individual speaker but taken, or hoped
to be, somehow ‘representative’ of most, if not all, the competences within
a given speech community.
The assumption that any single competence should be representative
of others in the community is naturally problematic, particularly if the
high variability that can be observed within all speech communities is
taken into account. Therefore, it has attracted a lot of criticism,15 some
of it polemically discrediting the whole idea of attempting to study lan-
guage on the level of linguistic competence altogether. In reaction, the
strategy of idealisation and abstraction has been defended as a ‘prerequi-
site to any serious inquiry into the complex and chaotic world’ (Chomsky
1995: 19), and the question how a competence as modelled by a linguist
relates to the highly varied set of actual competences in real speech com-
munities came to be brushed aside. It was argued that there was nothing
wrong with modelling the competence of an ‘ideal speaker-hearer living
in an admittedly unrealistic homogeneous speech community’ (ibid.).
That abstraction was supposed to represent merely a conceptual aid for
linguists, who should focus on constructing a possibly complete model of
one kind of competence first, before addressing the question of linguistic
variability later.

13 2000).
14 info.html (January 2000).
15 See, for example, de Beaugrande (1991: 147–87), or Weinreich/Labov/Herzog (1968).

Of course, the concept of an ideal speaker’s competence being representative
of a completely homogeneous speech community is so similar
to that of an ‘ideal language which real individual speakers cannot know
perfectly because of natural limitations of the human mind’ that the two
can easily be conflated in practice. Furthermore, it connects very read-
ily to assumptions which we all tend to make, albeit tacitly, in everyday
communication. There we seem to use language in the belief that ‘it’ is
shared at least by the people we talk to, and do not usually consider the
possibility that we might all know somewhat different languages. When
we do observe that others speak differently from ourselves we tend to
attribute this to idiosyncrasies in the way they use the language, and even
if we take it to mean that they may indeed speak ‘another language’,
this does not shake our belief that, in principle, there is such a thing as
‘our’ language, and that it is shared by the speakers in ‘our’ community.
Since such common sense attitudes are difficult to suppress, linguists
concerned with modelling an idealised competence of, say ‘Present Day
English’ may easily forget that the hypothetical speaker whose compe-
tence they are modelling does not exist in reality and cannot be claimed
to represent any speech community either.
Now, there may certainly be areas of enquiry where it is legitimate
and even useful to posit the concept of an ideal language type. If one is
interested in language change, however, this strategy ultimately creates
more problems than it solves. After all, an abstract world of ideas is by
definition non-physical, nor is it temporally or spatially bounded. What
exists in it can therefore not ‘change’ in the normal sense of the word.
If we thought of ‘Old English’ as an abstract system, for example, we
would be forced to accept that in a sense it is still ‘there’, and the same
would be true of Middle English, Early Modern English and, indeed, all
stages and varieties of all imaginable human languages. Any historical,
or causal relation between ‘them’ could concern only the ways in which
they happen to be realised in the physical world. Consequently, the only
meaningful way even to ask the question how they might causally interact
is in terms of their physical realisations. As it was already put very aptly
by Hermann Paul,
[. . .] zwischen Abstraktionen gibt es überhaupt keinen Kausalnexus, sondern
nur zwischen realen Objekten und Tatsachen. [A causal connection can only
hold between real objects and facts, never between abstractions.]16 (1880: 24)

16 Actually, Paul’s statement is not completely adequate in its extreme form. Abstrac-
tions can interact causally, though only via their psychological realisations in human
brains. Furthermore, they may be logically connected, if they are parts of abstract formal

Thus, any description of the processes by which some abstract or idealised
Old English might have ‘become’ abstract Modern English would
have to include the physical realisations of the two languages in terms
of actual competences, as well as the behavioural and textual manifes-
tations by which Old English competences and Modern English ones
are causally linked. In fact, all the elements and processes required to
describe and/or explain a specific linguistic change would be physically
real ones, and no reference at all would need to be made to ideal languages
except at the beginning and the end of the story. This clearly makes them
This means that the strategy of assuming a part–whole relationship
between individual competences and language in its super-individual
sense is in fact the only one which makes any sense for historical pur-
poses. Certainly, the problem that whole competence pools are difficult
to grasp remains, as does the problem that it is difficult to tell for any
model of an individual competence how representative it might be of the
larger set it is part of. Yet, acknowledging this problem explicitly is safer
than brushing it under the carpet under the pretext of carrying out neces-
sary idealisations.18 Furthermore, the problem is not as insurmountable
as is sometimes suggested. First of all, any single linguistic competence
can be assumed to be representative of its speech community at least to
some extent, otherwise, one would think, mutual intelligibility (a defin-
ing property of speech communities) would not be possible. Secondly,
socio-linguists and dialectologists19 have been quite successful in mod-
elling linguistic variation at least with respect to selected properties of
competences. Their findings show that the heterogeneity which characterises
speech communities is not random or chaotic, but ordered in a way
which allows meaningful generalisations to be made. The more sophisticated
our theories of linguistic variation become, the easier it will be
to formulate relatively educated guesses about what the properties of a
single or few competences might imply for larger sections of a speech
17 Having said that I-language cannot play any role in a causal account of language change,
this statement needs to be slightly revised. There is a way in which the ‘idea’ of an abstract
and perfect language can influence the properties of physically real instantiations after all.
The very belief in such a language may alter the linguistic behaviour of individual speakers
and thus appear to prove its own reality. An entertaining but instructive treatment of
self-fulfilling prophecies of this kind is Eco’s Foucault’s Pendulum (1989). In this novel,
the protagonists spread rumours about a secret society of their own invention until, in
the end, they are murdered by people who take the rumours seriously, decide they ought
to be members of the invented society, find one another, and thus bring it to life.
18 See also Paul (1880: 24).
19 See, for example, Cheshire (1982), Labov (1972), J. Milroy (1992), L. Milroy (1980
and 1987), or Trudgill (1974).

community. Therefore, the best available strategy for historical purposes is
to regard languages such as ‘Old English’ or ‘Present Day English’ not
as ideal types, but indeed as structured sets, ‘pools’, or ‘populations’ of
individual competences. While we may be unable to describe such pop-
ulations in any completeness, we will at least know in principle what it
is that we are studying and may even be in a position to make plausible
assumptions about it.

3.3.7 Summary
Let us see what the arguments made so far imply for the interpretation of
statements such as ‘Old English’ has ‘changed into’ or ‘become’ ‘Present
Day English’.
What is usually called ‘Old English’ represents a heterogeneous (yet
most probably inherently ordered) pool of competences ‘in’ Old English.
These competences will each have been different from one another, but
will have shared a sufficient number of properties to make communication
among ‘Old English’ speakers possible. ‘Present Day English’
represents another pool of competences, once again heterogeneous in an
orderly way. Importantly, the mix of competence properties that charac-
terises the pool constituting ‘Present Day English’ differs considerably
from the mix of properties that characterises the ‘Old English’ compe-
tence pool. Some properties that can be found in one pool are absent in
the other, and of those which are present in both some will be distributed
differently. We assume that these differences are due to ‘language change’.
This implies that some causal link can be established between the ‘Old
English’ competence pool on the one hand and the ‘Present Day English’
pool on the other. Most probably, such a link is established via behavioural
and textual manifestations of language, as well as by competences of
‘intermediate’ stages of English. It is supposed that temporally later com-
petences assume their characteristic properties by interpreting the textual
output produced on the basis of earlier competences. Linguistic change
happens because later competences do not always appear to assume quite
the same properties as earlier competences. Thereby, the mix of prop-
erties that characterises the competence pool of a speech community is
continually altered – albeit only slightly – as one new competence after the
other assumes first its adult, and ultimately its final state. At the same time
earlier competences are continually removed from the pool as the speak-
ers who host them die. Over time, these processes may amount to such
differences as those which distinguish ‘Present Day English’ from ‘Old
English’. This, then, is what we refer to when we say that ‘Old English’
has become, or changed into ‘Present Day English’. The processes are
schematically represented in figure 3.2, which summarises how the different
levels on which language manifests itself interact to bring linguistic
change about.

Figure 3.2 Schematic representation of the processes involved in linguistic change.
[Diagram not reproduced: between Stage 1 (time T1, competences C1 ... Cn) and
Stage 2 (time T2, competences Ca ... Cx), rows of competences alternate with rows
of linguistic behaviour (B1 ... Bn) and texts (T1 ... Tn). Key: C1 = old competence;
C1’ = altered competence; Cn = obsolescent competence; Ca = new competence;
Bn = instance of linguistic behaviour; Tn = text.]

A few things need to be stressed. First, and in spite of its apparent
complexity, the schema still represents only a very general framework
for the discussion of language change. It merely indicates which manifestations
of language must be involved in language change, and how
they will interact to bring it about: from a set of competences utterances
and texts emerge; a subset of the latter informs the acquisition of new
competences even as some of the older competences are removed from
the pool, due to the death of speakers; the process repeats itself until
eventually a complete exchange of competences has taken place in the
pool. Clearly, this schema is far too crude to say anything about partic-
ular instances of linguistic change or about any specific properties of the
processes behind them. Nor does it make any claims about the internal
structure of competences, and the boxes which represent them are not
intended to suggest that they are homogeneous. So far we have referred
only, very abstractly and generally, to unspecified competence ‘properties’
and have asserted no more than that language change seems to
alter the mix of such properties in historically successive ‘sets’, ‘pools’,
or ‘populations’ of competences. Other important issues have not
been touched upon at all. For instance, the very basic question of
whether the processes by which the mix of competence properties is
altered are regular and amenable to systematic description has not yet
been addressed. In short, we have only laid out a general groundwork
for a closer investigation of actual instances of linguistic changes and for
reflecting on the question how they might best be described and possibly
explained. Finally, I am quite aware that none of the points so far made
are in themselves very original. In fact, most of them represent received
linguistic lore, and my intention was simply to bring to mind a few issues
which, possibly because everybody takes them for granted, tend to be
overlooked at times. In short, the point of the exercise has been to clarify
the basic status of the phenomena with which one has to deal when one
investigates linguistic change, and it is hoped that the effort will pay off
as we go along.
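
The cycle summarised in this section (competences produce texts, texts inform the acquisition of new competences, and older competences leave the pool when their hosts die) can be illustrated with a toy simulation. The following Python sketch is only an illustration under loudly stated assumptions and makes no claim about any actual change: the function name, the two property labels "A" and "B", the pool size, the turnover rate and the copy-error rate are all hypothetical, and each competence is reduced to a single property, which the discussion above explicitly does not assume.

```python
import random


def simulate_pool(n_speakers=200, steps=50, turnover=0.05,
                  copy_error=0.02, seed=1):
    """Return the share of property 'B' in the pool after each step.

    Hypothetical toy model: every competence carries exactly one
    property, 'A' or 'B'. All parameter values are assumptions.
    """
    rng = random.Random(seed)
    # Stage 1: a heterogeneous pool with a small 'B' minority.
    pool = ["A"] * (n_speakers - 10) + ["B"] * 10
    history = []
    for _ in range(steps):
        # Every competence produces one text expressing its property.
        texts = list(pool)
        n_new = max(1, int(turnover * n_speakers))
        for _ in range(n_new):
            # A speaker dies: the competence they host leaves the pool.
            pool.pop(rng.randrange(len(pool)))
            # A new competence is acquired from one sampled text ...
            acquired = rng.choice(texts)
            # ... but does not always assume quite the same property.
            if rng.random() < copy_error:
                acquired = "B" if acquired == "A" else "A"
            pool.append(acquired)
        history.append(pool.count("B") / len(pool))
    return history


if __name__ == "__main__":
    shares = simulate_pool()
    print(f"share of 'B' after {len(shares)} steps: {shares[-1]:.2f}")
```

Because the sketch contains no factor favouring either property, any shift in the mix it produces is pure drift. It shows only how the mix of properties in a pool can be altered through turnover and imperfect acquisition, not why a particular property should spread.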

3.4 Reconstructing a particular ‘phonological change’

After having established basic concepts, let us look next at some evidence
of a particular change and see what we can safely say about it. Take the
texts in (2) again, and consider the word forms me and he, which can
all be found in extracts (a) to (g). We know that in Modern Standard
English (the southern British variety), their so-called strong forms are
pronounced /hiː/ and /miː/. At the same time, the established interpretation
of their spellings in Old and Middle English suggests that before
the sixteenth century counterparts of the two words were pronounced
with long /eː/s. Generalising over the two words, one can say that a correspondence
seems to hold between present day /iː/s and earlier /eː/s.
Interestingly, the correspondence holds in very much the same way when
one considers other words which are today pronounced with /iː/s and
which had counterparts in earlier stages of English, such as the verbs
meet (OE mētan) and see (OE sēon), or the adjective green (OE grēne). In
fact, it is so general that it justifies a correspondence rule such as OE/ME
/eː/ ←→ ModE /iː/.
Now, that the histories of all languages abound with regular correspon-
dences of this kind came to be established as fact as early as the eighteenth
century and gave rise to the research programme known as historical com-
parative linguistics, advanced most successfully during the nineteenth
century by a group of scholars commonly referred to as ‘Neogrammarians’.
It has allowed the writing of law-based sound histories and inspired
important conclusions about linguistic phonology. That the /eː/s in words
as different as he, metan, or greene were apparently all affected by the same
type of change may be taken as evidence, for example, that all these /eː/s
must indeed be instantiations of one and the same sound type: a point
which is not as self-evident as it might appear.
But all this need not concern us at the moment. Let us return to
the particular ‘change’ we are presently considering and which we suppose
to have been behind the correspondence between /eː/s and /iː/s
in words like he, me, see, meet and green, and their historical counter-
parts. In the community of historical English linguists, the ‘change’ is
well known. It is one in a set of changes which are normally referred to
as the ‘Great Vowel Shift’.20 These changes are likely to have occurred
at the end of the Middle or the beginning of the Modern English
period, although there is considerable disagreement about its exact
Now, for our purposes it is crucial to be extremely careful as we
approach evidence of a change in order to reflect upon the processes that
may have actually brought it about. For instance, we ought to remind ourselves
that the homogeneous systems which words such as ‘Old English’,
‘Middle English’ or ‘Modern English’ suggest are unlikely to be historically
real. Also, we must try to distinguish sharply between the textual evidence
that the correspondence between /eː/ and /iː/ represents, and the entities
and processes we interpret it to be evidence of. In short, we must prevent
our common sense from performing generalisations and abstractions
which might turn out to be inadequate for historical linguistic purposes,
and resist the natural impulse to assume that there was a language called

20 See for instance Luick 1914/21: 554–91.


Middle English at a particular time, that this language had an /eː/, and
that this /eː/ ‘became’ /iː/ as Middle English ‘became’ modern.
What, then, should we allow the regular correspondence between /eː/s
in Old English words, and /iː/s in Modern English words to tell us? First
of all, we have to be aware that the words we are looking at do not rep-
resent first-hand data, but interpretations of such data. The first-hand
data on which they are based are, first, contemporary pronunciations
of the words he, me, see and meet, and their graphic representations,
and, second, historical spellings of pre-sixteenth-century counterparts
of those words, passed down to us in the form of written documents.
These days, such historical documents are made accessible to the larger
academic community in the form of printed or digitised editions.
Sometimes, selections of texts from various periods are collected
in computer corpora. Most famous among historical corpora of
English is the Helsinki Corpus.21 In it, we will find text passages like the
following:

(9) Trust veryly in God and leve hym and serve hym, and he wyl not
deseve w.
(    225: Heading)
Ther sall a childe borne be,
Goddis sone of heuen is hee
And man ay mast of myght.
(    124: Heading)
Alle þis worlde is wrothe with mee,
Þis wote I wele.
(    72: Heading)
Notwithstanding, by mine advice, if ye have this letter or the messenger
come to you, come to the Kinge wards or ye meete with him, and when
ye come ye must be suer of a great excuse.
(    201: Heading)
For him wes loþ men to mete;
Him were leuere meten one hen,
þen half anoundred wimmen.
(/2    25: Heading)

21 This widely known corpus was compiled at the University of Helsinki, under the direc-
tion of Matti Rissanen. It is part of a corpus collection available from the Interna-
tional Computer Archive of Modern and Medieval English. For further information see

Tak þe grene bowes of an asche & bryne þam & kepe þe jeuse þat
commes owte at þe endis
(/4    6: Heading)
And the erthe broute forth greene erbe [. . .]
(3    ,1: Heading)
ther shaltow seen anoon thilke verray blisfulnesse that I have behyght
(3    430.1: Heading)
first doe it, and be sen meke, that other may lere for to ouercome pride,
(2/4    25: Heading)
The stories which can be derived from data like these all require a
fair amount of conjecture. Thus, although it is very plausible, even the
belief that the sounds which the historical e(e)-spellings represented
were phonetically similar to [eː] depends on an interpretation of witnesses.
And the same is true, to a possibly even greater extent, with regard
to the mental properties assumedly underlying and expressed by e(e)-
spellings. There, however, the issue is even more complicated because
there are rather fundamental disagreements within the linguistic commu-
nity about how phonological competence should be conceived of at all.
It would therefore be clearly rash, though tempting, to take it for granted
that the fact that words like he, me, see, green or meet were spelt with e
or ee means that they were mentally represented by ‘the phoneme /eː/’.
Yet, certain assumptions clearly have to be made for all the problems that
the reconstruction of historical pronunciations and phonologies poses,22
otherwise it is impossible even to start thinking about linguistic evolu-
tion. Thus, the following preliminary description strikes me as relatively
plausible and safe.
First, before the sixteenth century there seem indeed to have existed
counterparts of the ModE words he, me, see, green, meet, and so on; second,
their pronunciations are very likely to have involved [eː]-like sounds;
and third, phonological competences will have existed which are likely
to have had a property in common, namely that of which the graphic
e(e)s and the assumed phonetic [eː]s can count as ‘expressions’. Let
us call this shared property, for the purposes of the present discussion,
{EE}.
From what is known about linguistic diversity in present day speech
communities, we may safely conclude that, at any point of time before the
sixteenth century, the set of linguistically competent minds to be found

22 See Lass (1997) for an excellent discussion.

in the general area of England will have been about as heterogeneous as
comparable sets of minds are today, and definitely much more so than the
limited number of surviving textual witnesses seems to suggest. We cannot
take it for granted, therefore, that all of those minds will have contained
mental representations of the words he, me, see, green and meet, nor can
we safely assume that, even if they did, these mental representations will
necessarily have involved property {EE}. In fact, all that we can safely
assume is that a proportion of minds within the larger set will have had
those properties.
As far as the contemporary situation is concerned, we are in a similar
position, of course, with the crucial difference that we have a very good
idea of the actual diversity that characterises the vast set of minds that
can roughly be thought of as being linguistically competent in a way that
would make them pass as ‘speakers of English’ in the eyes of themselves,
other people and linguists. Thus, the Linguistic Atlas of England (Orton–
Sanderson–Widdowson (1978: Ph94)) lists [iː], [eː], [iə ], [ε i] and [əi]
as recorded pronunciations of the vowel in green, and this is certainly
not the complete spectrum, because the Atlas is based on a survey of
rural dialects only, fails to distinguish between social or situational reg-
isters and does not contain any information about ‘Englishes’ that are
spoken and ‘known’ outside England itself. Also, of course, the Atlas’
focus is on phonetics rather than on phonology, and therefore charts only
behavioural rather than mental diversity.
At all events, the picture which the Atlas conveys, as well as the inferences
that can safely be drawn from it about the situation in pre-sixteenth-century
England, confirm that statements like ‘Middle English /eː/ was
replaced by, changed into, became, or shows up as ModE /iː/’ represent
gross simplifications, and therefore do not offer themselves as safe starting
points for further enquiries. In particular, the fact that green is still
pronounced as [eː] in some regions of England even suggests that in some
cases – dare we call them ‘dialects’, or ‘varieties’? – no ‘change’ seems to
have taken place at all.
Still, certain things can nevertheless be asserted. First, [iː] is the vowel
that green gets in utterances that are widely recognised as Standard
English. And, second, map Ph94 of the Atlas shows that, at least in geographical
terms, [iː] enjoys by far the widest distribution of all variants
of the green vowel in Modern England.
As far as the pre-sixteenth-century situation is concerned, a brief look
into the relevant entries in the Oxford English Dictionary confirms that
it was indeed characterised by the diversity one is led to expect anyway.
Thus, the OED lists the following attested spellings for the words we are
interested in:

Figure 3.3 Variants of greene in Modern English dialects (map Ph94 from Orton–Sanderson–Widdowson 1978).

he 1- he (6 7 h’), 2–3 hi, 2 heo, 3–4 ȝe, ghe, 3 hæ, 3–4 ha, 4 ho, 3 e,
3–9 (dial.) a, 4–5 hye, 6 hie, 4–7 hee

me 1- me, mec, 3–4 mi, 4–7 mee

see (infinitive forms): 1 séon, síon, Merc. séan, sían, Northumb. séa,
2 syen, 2–3 sien, 2–5 seo(n, 3 sean, 3–4 sei(e, 3–5 sen, 3–6 se,
4 suen, seeyen, sey, sy, si, Kent. zy, si, 4–5 seye, 4–6 sene, 4–6
seen, 4 (north.) 6–7 (Sc.) sie, 5 seene, seyn, 5–6 seyne, 3- see

meet 1 métan, Northumb. moeta, 3 meten, 3–4 miete(n, 3–6 mete, 4–7
Sc. meit(e, 5–7 meete, (4 meyt, met, 5 mett, 6 might) 4- meet

green 1 gróeni, gréne, 2–7 grene, 4–6 grenn(e, greyn(e, 4–7 greene, gren,
5- green, 6 greane, grein(e, gryne

23 The numbers before the cited forms represent the centuries of attestation, with ‘1’ indicating
the eleventh century, ‘2’ the twelfth, ‘3’ the thirteenth, and so on. Additionally,
‘1’ is exceptional in that it refers also to the centuries preceding the eleventh, from
which manuscripts in English have survived. A hyphen following a number identifies the
spelling that is still used in Modern Standard English.

Of course, caution is warranted before one commits oneself to quantitative
statements. The OED lists spelling types, and does not say anything
about currency of tokens. Also, it lists spelling variants, not pronuncia-
tions. The actual percentage of people who could read and write was cer-
tainly considerably smaller before the sixteenth century than it is today. It
is therefore possible that the texts they produced when they wrote may be
fairly unrepresentative of the spoken texts that were produced by them-
selves and other people, and in all sorts of different contexts. Further-
more, the medium of writing may be of an inherently more authoritative
and thus normative character than the medium of speaking, so that writ-
ten texts may in fact be likely to display less diversity than speech and
minds do.
Yet, even though writing may invite standardisation and thus hide
diversity in speech and competence, the textual attestations from pre-
sixteenth-century England do seem to admit certain tentative conclu-
sions. Thus, in the case of green, no spelling variant containing an i or
y graph is attested before the sixteenth century. For meet, me and he, the
number of spelling(type)s without i/y attested before the sixteenth cen-
tury is in all cases greater than the number of those with. In the fifteenth
century, the one which may have immediately preceded the developments
we are interested in, only he is attested with a y spelling. If the generally
accepted view that pre-modern, non-normalised spelling tended to
reflect speech and/or phonology more faithfully than modern standardised
spelling systems do is correct, it would be absurd to assume that [iː]
variants of words like he, me, see, meet or green were as widely distributed
as they are today. i spellings are simply much rarer than e spellings.
Even though a certain number of [iː] forms may well have existed, they
are very likely to have represented a small minority.
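
The type counts behind this comparison can be checked mechanically. The following Python sketch is a hedged illustration only: the data are transcribed by hand from the OED lists for me and meet quoted above, the function name and data layout are hypothetical, and two reading decisions are assumptions, namely that a century number applies to every form that follows it up to the next number, and that the parenthesised forms of meet count like any others. As stressed above, the counts concern spelling types, not tokens.

```python
# Each entry: (spelling, first century of attestation in OED notation,
# where 1 = eleventh century or earlier and 6 = sixteenth century).
# Transcribed by hand from the lists quoted above (an assumption-laden reading).
SPELLINGS = {
    "me": [("me", 1), ("mec", 1), ("mi", 3), ("mee", 4)],
    "meet": [
        ("métan", 1), ("moeta", 1), ("meten", 3), ("miete(n", 3),
        ("mete", 3), ("meit(e", 4), ("meete", 5), ("meyt", 4),
        ("met", 4), ("mett", 5), ("might", 6), ("meet", 4),
    ],
}


def count_types(forms, before_century=6):
    """Count spelling types first attested before `before_century`.

    Returns (types containing an i or y graph, types containing neither).
    """
    early = [spelling for spelling, first in forms if first < before_century]
    with_iy = [s for s in early if "i" in s or "y" in s]
    return len(with_iy), len(early) - len(with_iy)


if __name__ == "__main__":
    for word, forms in SPELLINGS.items():
        with_iy, without = count_types(forms)
        print(f"{word}: {with_iy} type(s) with i/y, {without} without")
    # me: 1 type(s) with i/y, 3 without
    # meet: 3 type(s) with i/y, 8 without
```

On this reading, spelling types without i or y outnumber those with i or y for both words before the sixteenth century, in line with the statement above.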
Thus, we can safely say without prejudging the matter that from the
sixteenth century onwards both the number of utterances in which words
like he, me, see, meet or green were pronounced with [iː]-like sounds and
the number of competence(state)s in which they were represented accordingly
(through a property which we shall call {II} and define, in analogy to
the property {EE}, as a competence property that gets expressed as [iː])
seem to have risen at the cost of [eː]s and {EE}s. Since we argued above
that linguistic history is best approached as something that happened
on the level of competences, a good starting point for further enquiries
would be to observe that with respect to the distribution of {EE} and {II}
in representations of words such as he, me, see, meet or green the make-
up of the competence pool in the area of England has changed between
the fifteenth century and now. Instead of talking about a straightforward
qualitative sound change – i.e. ‘/eː/ becoming /iː/’ –, we are talking about
a quantitative change in a population of competence properties, realised
within a population of minds that undergoes a continuous process of
member renewal (essentially through the death and birth of speakers,
but also through their migration).
Of course, one thing needs to be pointed out. While it is plausible
to assume that the population of competences in pre-sixteenth-century
England was characterised by diversity, and while it is perfectly legitimate
to adduce the attested variety of spellings as evidence for this assumption,
it would be rash and unjustified to conclude that the properties in that
pool were ‘the same’ as the ones to be found in contemporary England,
and that all that has changed is their distribution. In particular, it would be
rash to infer from the existence of early i-spellings of words like he, me,
see, meet or green that competences existed before the sixteenth century,
which not only caused the words in question to be pronounced with
[iː]-like vowels, and to be spelt with is, but which must therefore have had the
same properties as the contemporary competences that ‘produce’ com-
parable pronunciations. If phonological competences are multi-levelled,
as most contemporary linguists would seem to agree, and if the various
levels of representations are linked by rules or similar derivative devices,
then there may be more than one competence property which could cause
words like he, me, see, meet or green to be pronounced with [iː]s. It cannot
be assumed that those which did so in Early English were the same as
those that do so today. All we can assert for now is that the population
of competences was as heterogeneous in pre-sixteenth-century England
as it is today, and that it may never have consisted exclusively of competences
in which words like he, me, see, meet or green were represented with
/eː/s.
At some point in time, of course, one or more competences must have
emerged in England, in which the relevant words were indeed represented
in a way that can count as a predecessor of, and that is similar to, the way
in which they seem to be represented in the majority of contemporary
competences. How and why such an event may have taken place, and
whether that question is at all important, will need to be discussed. Suffice it
for now to observe that after it had occurred, the rest of the ‘change’
must have been essentially a quantitative matter, that is, a change in
the distribution of competence properties within a population of minds.
It seems to me that this quantitative aspect is far more significant for
understanding linguistic evolution than the question how new properties
may arise within a population of competences in the first place. After all, a
competence property which never gets implemented in a sufficiently large
number of individual minds, will be as good as unnoticeable for speakers
and linguistic historians alike, and might as well not exist at all for that
matter. Therefore, the central question seems to be how a change in the
relative frequencies of properties within a population of competences may
come about, and what may cause it.
This question, it seems to me, breaks down into two more fundamental
problems, namely (a) how do competences assume the properties they have,
and (b) how do they maintain them? Speaking very generally, problem
(a) can be given the following answer: competences get their properties
partly from their biological basis – which is genetically determined and
which imposes universal constraints on the properties that competences
may assume,24 and partly from the dynamical and complex interaction
between minds and environmental factors, which results in ‘language
learning’ or ‘language acquisition’. Question (b) can be answered along
similar lines. In order to be maintained, the properties of competences
depend both on their genetically provided biological substrate (such as
a working long term memory in a living body) and on interaction with
their external environment (linguistic competence that is not put to any
use whatever, for example, is unlikely to remain stable).
Now, among the factors in the environment of a linguistic compe-
tence that are likely to affect its properties, the experience of meaningful
discourse will certainly be more prominent than most others. In order
to experience meaningful discourse, however, a person depends on the
presence of people with whom they can interact via behaviour and text
production. In order for such interaction to be possible, of course, those
people need to be linguistically competent as well, and the discourse
they produce, understand and react positively to will clearly reflect, or
express, the properties of their own competences. Thus, the properties
which individual competences can assume and maintain will to a con-
siderable extent be determined – albeit indirectly – by the properties of
the competences in their environment, that is, in the population in which
they develop.

24 In generative theory: the ‘initial state’ of human competences, or ‘universal grammar’.

Of course, this is exactly the causal chain which was proposed
in section 3.3.5 above, and which is implicit in the view that
language(stage)s derive historically from earlier ones, and have a bearing on
the evolution of later ones. In short, competences are likely – and indeed
known – to assume properties that are similar, or – on a sufficiently coarse
graining – ‘identical’, to the properties of the competences their ‘owners’
interact with. One can therefore also say that (a) the competence proper-
ties to be found within a population get copied or replicated when they are
assumed by other (typically) new competences, and that (b) it is through
such replication that their stability within a population is brought about.
Thus, the question how a change in the distributions of {EE} and {II}
in representations of he, me, see, meet, green and so on was brought about
can be answered, very generally, by saying that {II}s were copied and
maintained both more effectively than, and at the cost of, {EE}s. Gener-
alising from the particular case, it can then be said that the question how
languages evolve and change boils down to the question how competence
properties emerge, how they replicate, and how, by doing so, they acquire
historical stability.
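The quantitative side of this claim can be illustrated with a toy calculation. What follows is only a sketch under assumptions that are not part of the argument above: the function name `replicate`, the two 'fidelity' parameters and all numerical values are hypothetical, and each competence is treated as carrying either {II} or {EE} but never both. A fidelity stands for the chance that a property is copied and maintained in one round of transmission. The sketch merely shows that a small and constant difference in that chance would suffice to shift the distribution within a population.

```python
def replicate(share_ii, fidelity_ii, fidelity_ee, generations):
    """Return the share of {II} competences after each round of transmission.

    share_ii     -- assumed initial proportion of competences with {II} (0..1)
    fidelity_ii  -- hypothetical chance that {II} is copied and maintained
    fidelity_ee  -- hypothetical chance that {EE} is copied and maintained
    """
    history = [share_ii]
    for _ in range(generations):
        ii = share_ii * fidelity_ii            # {II} copies that survive this round
        ee = (1.0 - share_ii) * fidelity_ee    # {EE} copies that survive this round
        total = ii + ee
        if total == 0:                         # nothing was copied at all
            break
        share_ii = ii / total                  # renormalise to a proportion
        history.append(share_ii)
    return history


# Invented values: {II} starts out rare but is copied slightly more reliably.
history = replicate(share_ii=0.01, fidelity_ii=0.9, fidelity_ee=0.8, generations=100)
print(round(history[0], 3), round(history[50], 3), round(history[-1], 3))
```

With equal fidelities the distribution stays as it is, so in this sketch it is only the difference in replicative success that produces 'change'.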

3.4.1 Language evolution as property replication

What the perspective just outlined amounts to is that languages (in the
sense of linguistic competences, or populations of such) might be con-
ceived of as types of replicating systems whose properties, or constituents,
at any point in time can be explained – at least in principle – by saying that
they are there because they have replicated before disappearing. Now, at
first sight this view does not impress one as very illuminating. After all, it
has always been known that languages are ‘passed on’, in a sense, when
they are learnt or acquired, so calling them ‘replicating systems’ may be
a fancy way of putting this, but whether it has any value beyond that is
unclear.

What makes replicating systems special

So, we have said that linguistic evolution (that is, both historical stabil-
ity and change) results from the replication of competence properties.
Where then does this take us? Does the fact that languages (consist of
things that) happen to get copied help us to understand why they are as
they are? At first sight, this might not seem to be the case. Competence
properties are replicated, or copied, by humans, and humans seem to be
capable of copying all sorts of things. For instance, apart from copying
texts25 orally, in writing, or by other means, people have been known

25 Note: texts, not competences!

to ‘copy’ pictures, music, knowledge and artefacts of all kinds, chemical
substances, and even organisms. Yet, acknowledging the mere fact that
people apparently (re-)produce – in one way or other – things like apples,
pigs, sugar, PVC, sushi, chairs, tables, cars, cell-phones, the Pythagorean
theorem that a² + b² = c² in right-angled triangles, waltzes, Beethoven’s
Ninth, Andy Warhol’s portraits of Marilyn Monroe, the urban legend of
the ‘Vanishing Hitchhiker’, or the New Testament, does not seem to tell
us how or why they are reproduced, nor does it appear to explain why
they are as they are. The only conclusion that seems to suggest itself is
that they might be of some value to people, which seems to follow from,
but does not explain, their properties.
True, saying that things are replicated does not yet answer the question
how or why that is performed. But it may well provide clues to the question
why they are as they are. For example, there is a fundamental difference
between the ways in which replicated things and non-replicating things
can be said to ‘exist’. Take, as clear cases, Beethoven’s Ninth and Mount
Everest. Mount Everest exists only in a single copy and will cease to exist
if that copy should vanish. Beethoven’s Ninth, on the other hand, exists
in multiple copies and in different ontological domains: on paper, on
disk, in sound, and in memory. Whether the autograph of Beethoven’s
original composition is still there or not, is not relevant for the exis-
tence of Beethoven’s Ninth as long as any copy or instantiation of it is
around. Therefore, Beethoven’s Ninth will only cease to exist when each
single instantiation, or copy, of it is gone. Thus, when we refer to a non-
replicating item, we usually refer to a single entity, while when we refer
to a replicating item we usually refer to a set of entities which share a
common master copy, or ancestor. In other words we are actually refer-
ring to one or more ‘lineages’ of objects.
Now, as time goes by, the properties of Mount Everest will undergo
changes through erosion and similar physical influences, and it is possible
to explain and predict these changes in essentially physical terms. The
changes which Beethoven’s Ninth can and will undergo are of a com-
pletely different kind, however. As versions of it are passed down through
history they may be altered in a complex variety of ways. ‘It’ may come
to be played by different kinds of orchestras or on different instruments.
Reflecting the tastes of their times, musical directors will ‘interpret’ and
‘render’ the piece in very different ways. Even more radically, composers
may produce copies which bear just enough resemblance to the original to
justify calling them ‘Beethoven’s Ninth’. Eventually, the things that go under
the name ‘Beethoven’s Ninth’ will come to represent an inhomogeneous
pool of rather diverse versions, and each of them will display properties
that reflect the conditions under, and the means by and the purposes for
which it has been (re-)produced. Also, not all versions will be equally
long-lived. Some will disappear while others will become more frequent,
so that it is conceivable (and probably even inevitable) that over time also
people’s understanding of what Beethoven’s Ninth is actually like will
change. Contrary to the changes that Mount Everest can undergo, the
course which the evolution of Beethoven’s Ninth will run is not a matter
of simple decay. Instead, it will reflect a complex variety of circumstantial
factors and be very difficult to predict.
Crucially, the differences between replicating and non-replicating items
have consequences when it comes to describing and explaining their prop-
erties. The case of Mount Everest is rather straightforward. If we want to
understand it, we will subject the mountain to an empirical investigation,
measure it, analyse its composition, and so on. We can then understand
it as the result of the tectonic forces that built it, and the erosive
forces that have given it its present shape. In the case of Beethoven’s
Ninth, however, it is not even obvious what it actually is that we should
describe: should we look at Beethoven’s autograph? Should we look at
contemporary interpretations? Should we look at all (or at least many) of
them and describe what they have in common? Obviously, we shall have
to make some decisions, and whatever they may be, it is clear
that we shall never be studying ‘Beethoven’s Ninth as such’ but always
particular instantiations of it.
If we study individual instantiations or copies, however, many of their
properties will be due to the fact that they are copies and thus to the
mechanisms and the circumstances of the processes by which they came
to be copied. Denying this would be indefensibly essentialist: only if
Beethoven’s Ninth ‘is his Ninth is his Ninth’, will differences between
individual instantiations be irrelevant and one copy ‘as good as another’.
But Beethoven’s Ninth is not simply ‘his Ninth’. There are only individ-
ual instantiations out there, and that we happen to regard them as tokens
of a single type is just our normal common sense way of dealing with
them. As soon as one acknowledges that only individuals exist, of course,
both the properties they share and the differences between them
do matter, and many of them are likely to reflect factors relating to their
replication.
Let me illustrate these claims through an example, and look at
the following reproduction of Andy Warhol’s Marilyn Monroe26 (see
figure 3.4 on page 51). Consider the peculiarities of your individual copy
first. For example, the fact that it is in shades of grey rather than in
colours follows from the design of the laser printer with which the copy
was produced. The amount of detail in the copy reflects the resolution

26 on 13 April 2000.

Figure 3.4 Andy Warhol’s portrait of Marilyn Monroe, or rather a ‘copy’ of it.

of my printer and to some degree also the limits of the JPG-format, in
which the digital ‘master’ of the copy you see is encoded. Its size results
from my decision that it would do to illustrate the principle. Thus, many
of the properties which distinguish this particular copy of Andy Warhol’s
Marilyn Monroe from others reflect circumstances of and constraints on
its reproduction.
So much for your individual copy. But also the properties which it will
share with other tokens of the same type (or: copies of the same master)
are significant. First, each of them is as it is because it has ‘inherited’
many of its properties from the copy or copies that are its immediate
ancestor(s). Thus, your Marilyn has inherited most of its properties from a
copy which once sat on the hard disk of my computer. That, in turn,
had inherited its properties from a copy on the computer which hosted and had the IP-address <> on
13 April 2000. Ultimately, all members of the set of pictures that we
categorise as ‘Andy Warhol’s Marilyn Monroe’ go back to the historical
‘master copy’ or ‘copies’ produced by Andy Warhol himself. They all
owe their shared properties to that master copy and are, in that sense,
historically ‘caused’ by it.
Also the fact that ‘your’ Marilyn is there at all is not trivial. Its place
might just as well have been taken by a different picture, been filled with
text or left blank altogether. Yet, there is a pattern of black pixels on
the white background of the preceding page, which reflects properties of
pixel arrangements to be found in many other places. While this might
strike one as an ‘historical accident’, it might also indicate that there is
something about Andy Warhol’s Marilyn Monroe (both the original and
most of its copies) which has made it extremely successful as a replicator.
If this is so, it must – at least to some extent – be that fact which has
‘caused’ the reproduction above. It would then clearly be plausible to say
that your Marilyn is there because the properties that constitute it form,
in their particular combination, a highly successful team of replicators. Thus, both
the existence and many of the properties of replicated objects can
be derived from the fact that they have been reproduced and from the
circumstances of their replication.
Similar arguments are valid for everything that people make copies of.
The quality of any particular rendering of Beethoven’s Ninth will reflect
both Beethoven’s original composition and its quality as a (team of)
replicator(s). Also, it will reflect the mechanics by which it is reproduced,
such as the instruments on which it is performed, the virtuosity of the
performers, or their ability to read musical notation. Likewise, the design
of an individual chair will reflect the way in which it was manufactured,
the purpose for which it was made, the creativity and skill of the artisan,
his or the prospective buyers’ aesthetic preferences, as well as their knowledge
of and the experiences they have had with prior instances of chairs. As
far as apples are concerned, finally, the qualities of individuals will, on
the one hand, reflect the skills of farmers, the tastes of consumers, the
availability of fertilisers and pesticides and so on, and on the other hand,
the properties of those apples from whose seeds the ‘apple copies’ were
grown.
Obviously, the number of examples could be increased ad infinitum.
They all show that objects which are ‘reproduced’ by humans do owe
many of their qualities to the ways in which and the reasons why peo-
ple reproduce them. The properties of each individual copy appear to
be co-determined by the qualities of the ‘models’ after which they were
made on the one hand, as well as by inherent constraints and external
factors affecting the processes of their re-production. Thus, the relation-
ship between the properties of replicated items and the circumstances of
their reproduction is non-random and therefore informative. For exam-
ple, by looking at the above copy of Andy Warhol’s Marilyn Monroe, you
may be able to tell that it was copied to paper from a computer via a
laser printer. Conversely, you may be able to predict what the copy of a
picture will look like if you scan it, store it in JPG-format and then print
it out. This is because only some properties of any original are preserved
in a specific reproduction. Others will be lost, such as, in the case of Marilyn
Monroe, the colours and everything between the individual ‘pixels’ of which
computerised images are made up. On the other hand, if I were to copy
the digitised version that is stored on my hard disk, the resulting copy
would be, for all practical purposes, identical to its master. No further
information would be lost. This is important, because it highlights a cru-
cial difference between Andy Warhol’s original and copies in JPG-format,
namely that, when computers are the most commonly available means of
picture reproduction, the number of copies that look more like the one
on my screen will soon surpass the number of those that look more like
Warhol’s original. If from this day onward only digital copying were avail-
able, we could even predict that sooner or later only copies that look like
the one on my screen will be around. Since nothing lasts forever, all non-
digital ‘Marilyn Monroes’ will eventually decompose or be destroyed,
and all information that has not made it into the JPG-files will be lost.
From the quality of the pictures that they excavate future archaeologists
will then be able to reconstruct that there must have been a period on
our planet when all copying was done by computers, and they may even
infer the properties of those computers from the pictures they find. In
short, when one confronts objects or systems that exist through being
reproduced one cannot neglect that fact if one wants to understand why
they are as they are.
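The contrast between a lossy first reproduction and loss-free later ones can be sketched in a few lines. This is an illustration only, and everything in it is assumed for the purpose: `lossy_copy` stands in very crudely for scanning and printing in shades of grey (it is not how the JPG-format actually works), `digital_copy` stands in for copying a file, and the four 'pixels' are invented values.

```python
def lossy_copy(pixels, levels=4):
    """Hypothetical 'scan and print': keep brightness only, in a few grey levels."""
    step = 256 // levels
    copy = []
    for r, g, b in pixels:
        grey = (r + g + b) // 3             # colour information is discarded
        copy.append((grey // step) * step)  # fine gradations are discarded
    return copy


def digital_copy(values):
    """Hypothetical file copy: every stored value is reproduced exactly."""
    return list(values)


# Invented 'original': a red, a green, a yellowish and a near-black pixel.
original = [(250, 10, 10), (10, 250, 10), (200, 180, 40), (12, 14, 16)]

first_generation = lossy_copy(original)

# A long lineage of digital copies descending from the first digital copy.
later_generation = first_generation
for _ in range(1000):
    later_generation = digital_copy(later_generation)

print(first_generation)
print(later_generation == first_generation)
```

In the sketch the red and the green pixel become indistinguishable in the first generation, and no number of subsequent digital copies restores the difference. Whatever did not make it into the first digital copy is absent from all of its descendants.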
Clearly, languages are systems of this kind.

The study of replicating systems and the linguistic community

Interestingly, and although it appears straightforward enough, the view
that languages owe many of their properties to the ways in which they
are replicated has not enjoyed great popularity within the linguistic com-
munity during the last century. Instead, the fact that they have histories
has typically been regarded as independent of their states at any point
in time, and generally thought to have little to do with their nature. The
person who is often held responsible for this influential view is Ferdinand
de Saussure (1974), who coined the dichotomy of ‘synchronic’ vs.
‘diachronic’ linguistics.
Basically, the Saussurean position is that languages owe their properties
to a tacit agreement among speakers within specific communities. On its
basis they choose an essentially arbitrary subset of the potentially open
set of properties that their common language, as a system of signs, might
theoretically assume. This view clearly backgrounds the undeniable fact
that the conventions which any speech community at any point tₙ in
time seems to agree upon are rarely very different from the conventions
assumed at a historically prior point tₙ₋ₓ, as long as x is small enough.
Thus, the Saussurean view suggests that there is no causal link between
the properties of historically successive language stages. Evidently, this
implication is not tenable.
Of course the radical view attributed to Saussure may never have
been taken absolutely seriously anyway, because so much actual evidence
speaks against it. However, the issue does contain an aspect that cannot
be lightly dismissed by pointing to empirical evidence. Thus, it could be
argued that speakers, while they typically learn – and thus ‘reproduce in
their minds’ – the conventional systems of the communities they grow
up in, must nevertheless be regarded as autonomous and free-willed, and
their behaviour as essentially undetermined and hence unpredictable.
To the extent that speakers are ‘free’ to subscribe to social conventions
or not, and to the extent that those conventions themselves result from
a negotiation (albeit a subconscious one) among ‘free’ individuals, the
properties of linguistic conventions, and consequently those of their indi-
vidual implementations as well, must be considered as equally ‘undeter-
mined’, ‘unpredictable’, ‘arbitrary’ and therefore impossible to explain
either from the properties of their historical precursors or from the pro-
cesses that brought their replication about. Therefore, if one takes the
position that languages are indeed mental tools, or systems of knowledge,
which are controlled by the speakers who ‘have’ them, and if one considers
speakers to be essentially free-willed and undetermined, the conclusion
that the properties of any language at any given time are indeed ‘arbitrary’
in a Saussurean sense is inevitable. For all quantitative evidence to the
contrary, the fact that language stages (both in a social and an individ-
ual sense) appear to resemble their precursors must then be regarded as
accidental. Consequently, the idea that languages should be as they are
because they replicate would be wrong. All that could be asserted would
be that speakers of all languages observed so far have behaved in ways that
merely happened to replicate the properties of linguistic systems. Thus,
the question of whether viewing languages as replicating systems will have
any explanatory value at all depends on the philosophical stance one takes
on the determinedness of human behaviour, or at least on the question of
whether people are determined to replicate language in specifiable ways.
These are controversial issues, which merit a more elaborate and detailed
discussion and will therefore concern us again at a later stage.27
27 Contrary to the radical interpretation of the Saussurean approach, the Chomskyan posi-
tion that languages are the way they are because many of their properties are genetically
provided by the design of the human language faculty does not rule out the possibility
that the others owe their existence to the fact that they are replicated. In a way, the histor-
ical perspective developed here may be said to complement the Generative Programme.
It focuses on the evolutionary processes that determine how individual languages exploit
the design space that is laid out and constrained by genetically determined ‘universals’.
Just as the generative programme factors historical variability out in order to get a better
view of the biological basis of language, the historical approach views that basis as an
environmental constant against which linguistic evolution, i.e. the replication of compe-
tence properties, unfolds.

Now, although the view of languages as replicating systems was not
embraced by mainstream linguistics in the twentieth century, it is definitely
not new either. In fact, the idea is older than the discipline of
modern linguistics, and is inherent to the common notion that languages
can have ‘ancestors’ from which they ‘inherit’ their properties, or ‘daugh-
ter languages’ to which they pass them on. This notion was expressed, in
western scholarship, at least as early as the sixteenth century by scholars
such as Theodor Bibliander (1548) or Conrad Gessner (1555).28 William
Jones, often considered the founding father of comparative historical
linguistics, likewise observed in 1786 that the similarities between Indo-European
languages were probably due to ‘common ancestry’. So, albeit metaphor-
ically, he also seems to have thought of languages as ‘replicating’. Yet, it
was only later, that is, during the nineteenth century, that the study of
language replication was addressed with technically appropriate concepts.
Thus, Neogrammarian linguists such as August Schleicher explicitly for-
malised family relationships between languages in terms of ‘family-trees’
(‘Stammbäume [. . .], wie diess Darwin [. . .] für die Arten von Pflanzen
und Thieren versucht hat’ (1863: 14f.)) and Hermann Paul proposed to
regard languages, or rather ‘idiolects’, technically as ‘psychological organisms’
consisting of ‘concept groups’.29 He argued that the ‘idiolect’, that
is the particular competence, of a person received its properties through
the influences exerted on it by the ‘idiolects’ of her communicative partners.
These could therefore be regarded as its progenitors in a technical sense.30
However, early attempts to study languages as historically replicating
systems never acquired the status of a generally accepted research
paradigm and failed, rather ironically, to have much of an impact even
on the historical section of the linguistic community. This might suggest
that the approach must be flawed in some crucial way, which we have so
far overlooked. After all, if an approach keeps popping up within a scien-
tific community for more than two centuries and never really gets off the
ground, there is likely to be a reason for it. So why is it that the perspective
we are proposing here as the most suitable basis for historical linguistic
investigations has failed to be developed since it was first enthusiastically
adopted by nineteenth-century philologists? It would be too simple, it
seems to me, to attribute it all to Saussure and his programmatic claim

28 See Lass 1997: 108.

29 ‘Organismus von Vorstellungsgruppen’ (Paul 1880: 27)
30 ‘[Es . . .] gestaltet sich die Sprache jedes Individuums [. . .] nach den Einwirkungen
der Sprachen seiner Verkehrsgenossen, die wir von unserem Gesichtspunkte aus als die
Erzeugerinnen seiner eigenen betrachten können.’ (ibid.: 38)

that the state of any language system at any time reflected essentially arbi-
trary social conventions, and could be understood better if one neglected
that it also had a history. While Saussurean structuralism, with its focus
on language states rather than language histories, certainly diverted lin-
guists’ attention from historical questions altogether, I do not think it
would have managed to bring a research programme more or less to a
halt, had there not been other factors involved. Instead, the main rea-
son why the linguistic community rejected the notion that the properties
of languages might be understood through the ways in which they are
replicated was that an approach which hinged on the concept of prop-
erty replication turned out to be immensely successful in another scien-
tific discipline, and eventually transformed it radically. That discipline
was biology, of course, and the approach was Darwinian Evolutionary
Theory. Arguably, the progress which biology has made since Charles
Darwin first described species of organisms as systems whose constituents
and properties owed their existence to the fact that they had been repli-
cated more successfully than competing variants has come to change our
view of the world and our role in it more radically than any other scien-
tific discovery before or after. Yet, it has not convinced linguists that they
should adopt a similar approach. Instead its success in biology may have
caused linguists to discard the whole idea more or less completely.
That this should have happened is due, I think, basically to two fac-
tors. First, as both Schleicher’s and Paul’s proposals go to show, the
linguists who first approached languages as replicating systems were not
cautious enough. Instead of slowly and subtly adapting the general idea
of property replication to the realm of language, they adopted concepts
and terms developed by biologists much too straightforwardly. Those
concepts were made to deal with biological phenomena, however, not
linguistic ones, so much unnecessary confusion was created through
rash analogical transfer. August Schleicher, for example, compared lan-
guages to biological ‘organisms’ and proposed that they went through
the developmental stages of infancy, youth, adulthood, senility and even-
tually death. Otto Jespersen (1922), on the other hand, compared them
to ‘species’, and assumed they were optimised and made continuously
‘fitter’ through their evolution.
Of course, it is obvious from our contemporary perspective that both
views are untenable for a variety of reasons (see the discussion in
McMahon 1994: 314–24). However, it is not their inadequacy that mat-
ters here but the reasons behind their inadequacy. To a large extent,
it seems to me, they are to be found in the scholarly impatience with
which conceptual frameworks for the study of language replication and
evolution were developed. Thus, both Schleicher and Jespersen seem to
have imposed biological concepts on the study of language in an enthusiastic,
but at the same time rash and somewhat procrustean manner.
That the suggestions they came up with were impressionistic and did not
really work out is not surprising, and the linguistic community reacted
quite appropriately by dismissing them. Thus, while evolutionary biol-
ogy progressed impressively through investigating life-forms as replicating
systems, the view soon came to prevail within the linguistic community
that, while it might be entertaining, and possibly pedagogically useful at
times, to look at languages in terms of biological concepts, such ways of
talking were only metaphors which should not be taken too seriously.
However, when it was decided that languages were clearly not like
biologically replicating systems in all respects, the idea that they might be
conceived of as replicating systems at all ceased to be pursued as well.
And this is clearly a non-sequitur. After all, there may be other systems
than biological ones that replicate and that owe their properties to that
fact, and languages might very well be systems of that type. Recall that we
have derived that view on purely linguistic grounds and without reference
to specifically biological concepts. So the linguistic community may have
thrown the baby out with the bath-water when it decided that none of
the ideas which proved so productive in biology should have a place in
the study of language.
Recognition of this has been growing rapidly in recent years. The
period during which ‘evolution’ was ‘a “dirty word” in [. . .] linguistic
theory’ (McMahon 1994: 314) seems about to be over, and the idea that
languages deserve to be regarded as replicating entities or systems has
been revived in the more recent past. This revival is primarily due to two
facts. First, biological evolutionary theory has come to be popularised far
beyond the boundaries of the discipline by brilliant and rhetorically gifted
scientists and authors such as Richard Dawkins (1982, 1986, 1989, 1995,
1996), Stephen J. Gould (e.g. 1983, 1989, 1996a and 1996b), Matt Ridley
(1994, 1996, 2000), and others. Second, its argumentative core came to
be taken up in such diverse fields as cognitive science, medicine,
mathematics, sociology and economics,31 as well as by philosophers (e.g.
Dennett 1995) and historians of science (e.g. Hull 1988b). Suspicion
grew that the approach which explained the evolution of biological life
might be equally applicable in other areas of enquiry. It began to be recog-
nised that biological life might not be as special as had been assumed,

31 In 1984, the Santa Fe Institute was founded specifically to
encourage interdisciplinary study of complex and adaptive systems from many different
domains. The theoretical foundations of the research programme are strongly inspired
by Darwinism. For an introduction to the history and the agenda of the institute see
Waldrop (1993).

that the particular replicating systems on which it was based might be
instances of a more general system type, and that variants of this type
could be found in many other domains as well. So the idea that language
might be one of them has once again come to appear plausible.
This is illustrated by Roger Lass’ recent volume on Historical linguistics
and language change. In a chapter on relatedness, ancestry and compari-
son, Lass claims that ‘whatever is unique about language, it is still on one
level of analysis (the most fruitful one, as it happens) a replicating infor-
mation system’ (113). He describes language as an ‘information system
not embodied in a permanent physical medium [which . . .] must repli-
cate itself to survive and have a history. The history of [such] a system
is the story of its attempts at replication’ (Lass 1997:111). His subse-
quent discussion of relatedness among languages employs concepts and
terms developed in ‘cladistics’, a particular approach to biosystematics,
and its lucidity shows that they fit the subject matter as snugly as if they
had been developed for the very purpose. Since linguistics is not the only
discipline concerned with replicating systems, Lass argues, ‘we ought to
use the same terminology as far as possible, so as to suggest that the
ontological bases may be the same or similar’ (113).
Lass is not alone in establishing the connection. The idea that languages
do not change by historical accident, but technically ‘evolve’ because they
are inherently replicating systems has come to be voiced with increasing
frequency in recent years, both within and outside the linguistic
community proper. Among the linguists who have recently published
on the topic are Bernard Bichakjian (1988, 1996, 1999), William
Croft (2000), Jim Hurford (e.g. 1999), Rudi Keller (1994), Martin
Haspelmath (1999, forthcoming), Lass (1990, 1997), April McMahon
(2000), Salikoko Mufwene (1996, 1999, 2001), and Ritt (1995, 1996,
1997a and 2000).32 Scholars from related fields such as cognitive science,
philosophy or biology who have made contributions to the topic include
Richard Dawkins (1982, 1986, and 1989), Terence Deacon (1992 and
1997), Daniel Dennett (1990, 1993, and 1995) and Robin Dunbar (1996).
Also, in 1984 a research institution was established at Santa Fe, which is
specifically dedicated to the question of whether such diverse phenomena
as biological evolution, social organisation, economy, ecological systems,
immune systems, cognitive development, and, notably, human languages
might belong to a general class of dynamically evolving or, in Santa Fe
terminology: ‘complex adaptive systems’. A workshop on language was
held in 1989, and produced a volume of papers (Hawkins/Gell-Mann
(eds.) 1992). In 2002, Harvard saw the fourth ‘Evolution of Language

32 See also Ritt (1995, 1996, 1997a and 2000).

Conference’, which addressed not only the question of how the human language
faculty might have (biologically) evolved, but also the question of
whether the histories of human languages should be conceived of in evolutionary
terms. And in 1996, to give a final example, the refereed online Journal of
Memetics33 was established on the internet, which deals with ‘evolution-
ary models of information transmission’. Clearly, human language falls
within its scope.
Of course, revived interest in an approach does not yet say anything
about its qualities, and the fact that other scholars are adopting perspec-
tives similar to the one advocated here does not automatically lend it any
plausibility.34 The question is rather why they are doing so and what they
expect to get out of it.

Summary and outlook

As we have seen above, languages can indeed be conceived of as replicat-
ing systems. The view follows naturally from empirical observation and a
few widely shared and almost self-evident assumptions such as that lan-
guages are (at least for historical purposes) best regarded as individual
competence states and depend for their existence on being passed on.
Therefore the historical persistence and the variability of competence
properties within populations must be determined to a large degree by
the processes through which they are replicated. Accordingly, we have
described what is usually called a change of ME long /eː/ to ModE
long /iː/ as a development in which competences with a property {II} –
expressed as [iː] – first emerged and eventually spread at the cost of
competences with the property {EE} – expressed as [eː]. Also, we have
argued that replicating systems warrant an approach that treats their
properties as following from the fact that they are replicated, and thus
from the circumstances of their replication. Therefore, in order to under-
stand the replacement of {EE}s by {II}s in the population of ‘English’
competences, we need to find out how competence properties are replicated
at all and why the latter replicated more successfully than the former.

33 Journal of Memetics – Evolutionary Models of Information Transmission (JOM-EMIT). ‘Memetics’, about which more will be said below,
can loosely be defined as the Darwinian study of cultural evolution. The term is derived
from the word ‘meme’, coined by Richard Dawkins to denote a cultural replicator, i.e.
a ‘unit of cultural transmission, or a unit of imitation’ (Dawkins 1982). As we shall see,
the elements that make up linguistic competences are likely to qualify as units of cultural
transmission, or ‘memes’.
34 I am well aware of this, of course, and the reason why I gave the little survey of recent
research was mainly to avoid looking more original than I really am.
We have also seen that representatives of different disciplines are beginning
to think that replicating systems from various ontological domains
may share essential properties, and that it might advance our under-
standing of many aspects of our world considerably, if a general theory
of replicating systems could be developed. Among other things, such a
theory could be expected to define the circumstances under which com-
petition among properties or constituents of replicating systems arises
and for what reasons properties may turn out to replicate more success-
fully than others. As far as language and language change are concerned,
it could provide both a conceptual basis and more specific directions for
attempts to understand its mechanics. At the same time, the large num-
ber of empirical data that have been collected, described and classified
during the last one-and-a-half centuries of historical linguistic research
represent a perfect testing ground for hypotheses about the behaviour of
replicating systems, so that linguists are not only likely to profit from,
but may also contribute to the advancement of a general theory of such
systems.
The rest of this volume will therefore attempt to sketch how a frame-
work for approaching languages and their histories in such terms might
look, and what explanatory value it has. Very generally speaking, the issue
poses, it seems to me, three different, albeit related, problems. First, there
is the question of what a theory of replicating systems actually is and what
explanatory strategies it can offer. So far we have merely observed – very
impressionistically – that at least some properties of replicating systems
seem to be causally related to the fact and the circumstances of their
replication. Obviously this observation is far too general and vague to
allow the assessment of its explanatory potential. Second, it needs to be
clarified how exactly a theory of replicating systems should be applied to
the domain of language. What we have established so far is merely that
there is some sense in which languages do seem to qualify as replicating
systems. Whether our perspective will still look promising once we have
a more systematic and technical understanding of such systems remains
to be seen. Third, there is the question of whether the study of languages
as replicating systems, should it turn out to be theoretically justifiable,
is at all practicable. After all, it might turn out that the information we
would need in order to operationalise the approach is too elusive to make
the enterprise meaningful, or we might find that the elements involved in
bringing language replication about are too many, and interact in ways
too complex to be tractable.
The next chapters of this book will consider each of the three prob-
lems in turn. First, it will be shown what makes theories that try to explain
the nature of replicating systems in evolutionary terms so attractive. For
this purpose, the essentials of Darwinian evolutionary theory will be
described. Then, it will be discussed which elements of Darwinian evo-
lutionary theory can potentially be given a more general interpretation,
so that they can be applied to other areas without losing their meanings.
Once the elements of such a generalised theory of system replication have
been identified, they will be adapted to the study of language and a sketch
of an evolutionary framework for linguistics will be drawn. Finally, that
framework will be applied in a case study of phonological and morpho-
logical developments in the history of English, in order to see both if this
can be done at all and if it produces any interesting insights.
4 The Darwinian approach

Another curious aspect of the theory of evolution is that everybody
thinks he understands it!
(Jacques Monod)1

4.1 A linguist’s view of evolutionary biology2

4.1.1 Why are life-forms as they are?

Among the many issues which biology deals with, the one which attracts
the greatest interest beyond the boundaries of the biological community
itself is the general question why living organisms are as they are. It has
always intrigued people for various rather obvious reasons. First, we are
living organisms ourselves, and why we are here is a rather obvious ques-
tion to ask. Secondly, all species seem to be extremely well designed for the
lives they are born into, and capable of amazing things: spiders spin webs
out of a material that is so elastic and shock absorbent that attempts have
been made to synthesise it for the production of bulletproof vests. Plants
have the capacity to convert sunlight and water into storable energy: a
feat which centuries of human research have not managed to achieve. Fish
have perfectly streamlined shapes, which allow them to move efficiently
and effectively in water, their natural habitat. And humans have central
nervous systems so sophisticated that they admit of rational thought, lan-
guage and culture. So complex and functional are the designs of nature
that for a long time it was inconceivable that they could exist at all unless
1 Quoted in Dawkins 1989: 18.
2 Since it was first developed by Charles Darwin in the nineteenth century, Evolutionary
Theory has obviously undergone certain historical developments itself. For evident rea-
sons I cannot deal with the history of the theory in the context of this book. Thus, it might
be necessary to make clear to what exactly I am referring when I use the term. So, for
the purposes of the present discussion ‘Evolutionary Theory’ will not refer to the original
Darwinian framework, but to its modern version, generally known as the Neo-Darwinian
Synthesis, which recognises the ‘gene’ rather than the individual organism as the unit on
which evolution works.

they had been designed and created by a supernatural artisan. Hardly any
culture lacks a creation myth which includes such a deity, and until way
into the nineteenth century the complexity and functionality of almost all
things natural was adduced as a serious and seemingly irrefutable argu-
ment for the existence of a divine creator.3 Now, evolutionary biology
provides a theory which explains the same complexity and functionality
without having to make reference to supernatural agents which represent
huge explananda themselves. According to that theory (first put forward
in Darwin 1859), the amazing properties of living organisms result mostly
from two types of natural, albeit inherently complex, processes.

4.1.2 Phenotypes and genotypes

First, every organism, or ‘phenotype’, is the expression of a ‘genome’,
or ‘genotype’. A genome is a complex DNA molecule containing certain
sections, called ‘genes’, which trigger and direct the synthesis of proteins.
Proteins are the building blocks of cells, and thus of bodies.4 Each organ-
ism hosts – in the nuclei of each of its cells – a copy of the genome whose
expression it is. The actual development of an organism is an immensely
complicated process, which starts from a single cell which first divides
in two. The process repeats itself with the two daughter cells, then again
with their four daughter cells, and their eight daughter cells, and so on.
The genome not only directs the building of cells, however, but also gov-
erns their ‘specialisation’ into the building blocks of very diverse types of
organic tissue until the complex systems which represent viable, mature
organisms are complete. Thus, a genome can be understood as a recipe,
on whose instructions the self-construction of organisms unfolds in com-
plex interaction with environmental factors. This is one part of the story:
the story of embryonic development. The properties of an organism thus
result (to a relevant degree) from its genotype. The laws which actually
derive the former from the latter are the laws of embryology.
Needless to say, the embryological unfolding of an organism makes for
a many-stranded and very complicated story, and the roles which individ-
ual genes play in the construction of individual properties of phenotypes
are far from being well understood. Even though the exact relationship
between genotypes, phenotypes and environmental factors is difficult to
work out, however, one thing is certain: of the properties which an organ-
ism has there will definitely be many with a genetic basis.
3 See, for example, the discussion of Paley (1828) in Dawkins (1986: 7–37).
4 The DNA strands that ‘code for’ different species all synthesise a large number of different
proteins (a human body contains around 100,000, cf. Kauffman 1995: 94), out of which
fewer, but still highly specialised cell types are built (roughly 256 in a human, cf. ibid.
4.1.3 Genotypes and gene replication

Now, while it is helpful to know that phenotypes derive from genotypes,
it immediately raises further questions: why are genotypes there at all,
why are they as they are, and why do they ‘build’ organisms with their
particular properties? Why are those organisms so well adapted to surviv-
ing in very difficult environments? Why are they so complexly designed
and so functional? And why are there so many different types, or species,
of them?
This brings us to the second part of the story. Genotypic properties
do not only underlie phenotypic properties and thus belong to a different
level, but differ crucially in another respect. When organisms reproduce,
only their genotypic properties, but not the properties which they have
acquired for other reasons, are passed on to the next generation. From
this observation evolutionary theory has managed to develop an elegant,
coherent and plausible line of argumentation to explain why genotypes
have the specific properties that show in phenotypes in the first place.5
The theory is essentially historical, and the insight at its core so self-
evident that it strikes many as a tautology: the particular genotypes which
inhabit our biosphere are there because they have managed to produce
copies of themselves before disintegrating – and coding for the pheno-
typic properties that express them has allowed them to do that. Thus,
ultimately the complex and intricately designed organisms which inhabit
the biosphere are there and as they are because genes replicate.
That genes do replicate results, to a large extent, from their own chemi-
cal make-up. Therefore, it is justified to think of genes as active replicators
even though their copying depends crucially on the presence of environ-
mental factors. Genomes, or populations of such, can accordingly be
regarded as replicator systems.
Most gene replication takes place when cells divide. The genomes in
their nuclei divide with them, so that each cell in an organism winds up
with a copy of the complete recipe. Some genes, however, do not end up
in normal body cells but instead in sex cells,6 which can leave their native
organisms and build new ones. Thus, a select number of copies of genes
manage to survive the bodies that they have made. They replicate ‘down
the germ line’ and form ‘lineages’ which are much more long-lived than
any individual gene or body.

5 The sequence in which I’m presenting the two sub-theories actually reverses the order
in which they were discovered. Historically, the explanation why the inheritable elements
that underlie the properties of organisms had to be as they were was discovered before
one knew what genes exactly were, and how they worked in building cells and organisms.
6 This is true for most animal and plant species.
Now, as they replicate through history, genetic lineages may accumulate
copying errors, so that if a population of genotypes is sampled at
two different points in time, the compositions of the two samples will
probably be different. Likewise, every genotype may contain both genes
that are faithful replicas of their ancestors and genes which represent
innovations, or copying errors. Thus, each existing genotype can be
explained, historically, as deriving from previous genotypes via copying
processes that were partly faithful and partly inaccurate.
As such, this self-evident observation may not seem very illuminating.
But consider that in order to have a stable existence, genes require addi-
tional qualities apart from being faithful copies: they need to be minimally
long-lived and themselves able to replicate.7 If they are not, their types will
disappear. It is the achievement of evolutionary theorists to have derived
from this very general, very simple principle an explanatory framework
which can explain how the amazing variety of complex organisms that
have come to inhabit our planet has evolved from possibly very simple
beginnings. The theory works, roughly, like this.
First, although gene replication is highly faithful, some mistakes
nevertheless occur.8 Such mistakes give rise to new gene variants, or
‘mutations’. Secondly, because there can only ever be limited resources
for sustaining genes and for producing new copies, no gene type can exist
forever, nor can the total number of genes increase ad infinitum. From
this follows that at any time only a subset of the gene types that have ever
come into being can exist, namely those which have been sufficiently
stable and/or have managed to replicate before disintegrating. Thirdly,
among the many gene variants that co-exist at any time some may be
inherently (that is, through their own structural properties) more stable,
or better at replicating, than others. Therefore, the number of copies of
the former will increase while that of the latter will dwindle. Thus, genes
which are more stable and better at replicating, or – to use an established
term – ‘fitter’ than others will outlive, out-replicate and eventually oust
the latter. Consequently, the composition of the ‘gene pool’ which our biosphere
represents must have been forever changing since it first came into being,
and will keep changing until there are no genes any more. This is,
in a nutshell, how the Darwinian theory of evolution through random
mutation and natural selection works.
How then does this framework explain the actual properties of geno-
types and how does it explain why they should build organisms? Since
7 Dawkins (1989: 12–20).
8 What makes them occur, and whether some errors are more likely to occur than others are
highly complex questions. For the purposes of our argument, however, it is sufficient to
accept that some errors are bound to occur for the simple reason that no copying process
can be so perfect that they would not.
this is not a biological essay, it would be beside the point to tell the
story in all its domain-specific detail. Instead, and since my account will
necessarily be schematic, I shall attempt to get it across in terms of a
thought experiment.

4.1.4 The (Neo-)Darwinian theory of gene-based evolution

Consider, first, that when one wants to explain the properties of geno-
types, one normally does not intend to understand every fine-grained
detail of their make-up – such an endeavour is generally out of the ques-
tion. Instead, one is typically interested in a select number of character-
istics. For the purposes of the present discussion let me single out the
following. Let us then say that, first, we want to know why genomes con-
sist of genes that build, or code for, organisms. Second, we want to know
why the organisms which genes code for appear to be so well adapted to
their environments. And third, it would be interesting to know why the
biosphere hosts such a staggering variety of different species (of genomes),
and why these interact with each other in such interesting ways as to form
food-chains, symbioses or other co-operative relationships.

The mechanics of gene replication

Genes are DNA molecules which replicate. That they do so follows from
their chemical make-up, and the replication proceeds automatically, that
is, without their or anybody’s ‘intention’. Under conditions favourable
to the chemical reactions involved, a DNA molecule can happen to split
into two single chains, as schematically illustrated in (11b) below. Then,
given a supply of A, T, C, and G building blocks in their environ-
ment, each of the two chains attracts the blocks which it needs to build
its complement, binds them to the fitting loci (11c), and so manages
to reassemble itself to completeness again. When the process is com-
plete, there will be two identical DNA strands where first there was only
one (11d).
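(11) [Figure: schematic DNA replication in four stages: (a) a complete double strand of paired bases such as C–G; (b) the strand split into two single chains; (c) each chain binding the complementary building blocks; (d) two identical double strands]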
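
The copying mechanism just described can be sketched in a few lines of Python. This is merely an illustration under simplifying assumptions: a half-strand is modelled as a plain string of the letters A, T, C and G, the pairing rule (A with T, C with G) is the standard one, and the names `complement` and `replicate` as well as the example sequence are labels chosen here for convenience.

```python
# Illustrative sketch only: a double strand is modelled as a pair of
# complementary half-strands written as strings over A, T, C and G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(half_strand):
    """Return the half-strand that the given one attracts, base by base."""
    return "".join(PAIRING[base] for base in half_strand)

def replicate(double_strand):
    """Split a double strand (11b), let each half rebuild its partner (11c),
    and return the two resulting double strands (11d)."""
    upper, lower = double_strand
    return [(upper, complement(upper)), (complement(lower), lower)]

original = ("AAGGCCTT", complement("AAGGCCTT"))
copies = replicate(original)
print(copies)
```

Under these assumptions both members of `copies` are identical to `original`, which is all that ‘replication’ means in the argument that follows.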
From this most of the rest follows. In order to see how, imagine a
bowl of free-floating A, C, G and T building blocks into which you
throw a single strand of replicating DNA. From what has been said so
far, it seems to follow that this string will replicate until all the available
building blocks are used up. Eventually, you should end up with a bowl
full of identical copies of the original DNA strand, with a small quantity
of residual building blocks still floating around.

Replication under constraints

Constraints on replicator life-spans

This will only be true, however, if fully formed strings are perfectly stable,
and they cannot be. Just as everything in life is limited, nothing is eter-
nal. If strings can disintegrate, however, the outcome of our thought
experiment will be different from the one described above. Rather than
containing only strings and no building blocks, the bowl will eventually
reach a condition in which it contains a mix of both strings and building
blocks. That condition will not be a fixed state. Instead it will be charac-
terised by the constant disintegration and re-assembly of strings. The only
stable thing will be the ratio of strings against building blocks. If strings
replicate quickly and remain stable for a long period, there will be many
strings and few building blocks. If strings replicate slowly and disintegrate
soon, there will be many building blocks and few strings. Also, in such
a scenario, each string represents a potential source of building blocks.

Limits on copying fidelity and the emergence of variation

Now, let us make the scenario still slightly more realistic. In the same way
as nothing is infinite, nothing is perfect either. Clearly, this must apply
to DNA replication as well. Therefore, in a certain number of copying
processes going on in our bowl of DNA, mistakes will occur. Imagine,
for example, that after splitting, a particular half-strand of DNA, say
AAGGCCTT, breaks apart between the two Cs and that the resulting
AAGGC and CTT fragments rejoin in the wrong order to produce the
half-string CTTAAGGC and eventually the string GAATTCCG (from
now on: C–C).
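
The copying mistake in this example can be retraced step by step. The following Python sketch is only an illustration: the function name `miscopy` and its parameter `break_at` are hypothetical, and a half-strand is again modelled as a plain string of letters.

```python
# Illustrative sketch of the copying mistake described in the text.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(half_strand):
    """Return the complementary half-strand (A pairs with T, C with G)."""
    return "".join(PAIRING[base] for base in half_strand)

def miscopy(half_strand, break_at):
    """Break the half-strand after `break_at` bases and rejoin the two
    fragments in the wrong order."""
    head, tail = half_strand[:break_at], half_strand[break_at:]
    return tail + head

original_half = "AAGGCCTT"
mutant_half = miscopy(original_half, break_at=5)  # AAGGC | CTT
print(mutant_half)                # CTTAAGGC
print(complement(mutant_half))    # GAATTCCG
print(complement(original_half))  # TTCCGGAA
```

The labels used in the text appear to follow the first and last base of each half-strand: AAGGCCTT gives A–T, CTTAAGGC gives C–C.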

First of all, such an event will clearly introduce diversity into the DNA
bowl, but what will happen afterwards? On the assumption that C–C
replicates just as quickly as TTCCGGAA (from now on: A–T) and is
just as stable, our bowl will in the end be filled with a stable mix of
A–Ts, C–Cs and building blocks, the proportion of A–Ts against
C–Cs depending on when in the ‘history’ of the bowl the ‘copying
mistake’ took place and on how many free floating building blocks were
available at that time.
Differential replication

It is not certain, however, that C–Cs should be equally good at replicating
as A–Ts, nor need they be equally stable. Since replication and
stability result from the chemical properties of DNA strings, and since
A–Ts are chemically different from C–Cs, the chance that the two
types should replicate equally well and be equally stable is small. So let
us consider possible scenarios.
(1) One possibility is that C–C turns out to be incapable of replication
altogether. Since it cannot be immortal either, it will eventually disappear
from the bowl. (2) Then, C–C might be less stable and replicate less
quickly than A–T. If that is the case, it will not remain in the bowl for
long either. It may of course catch a few building blocks and spawn a
few copies of itself, but in the long run, all of them will be outrun by
A–Ts in the race for building material and, once again, all C–Cs will
eventually disappear from the bowl. (3) Next, there is the possibility that
C–C is more stable and replicates better than A–T. In that case, it
will eventually ‘out-replicate’ A–T and the bowl will eventually be filled
only with C–Cs and building blocks. (4) Fourth, there is the possibility
that C–C replicates better but is less stable than A–T, or the other
way round. While these scenarios might be slightly more complicated to
work out than the former three, one thing is clear. If the combined repli-
cation speed and stability of C–Cs enable them to keep their numbers
increasing in spite of the competition of A–Ts,9 then A–T is doomed,
and C–C will take over the bowl.

First résumé

We are now in a position to make further generalisations about the popu-
lation of strings in the bowl. First, it will usually only contain such strings
as manage to replicate faithfully and more quickly than they disintegrate.
Second, new string types will be created through copying mistakes occur-
ring during replication. Third, strings whose combined stability and apt-
ness at replication exceed those of their competitors will – after a certain
time – oust the latter from the bowl.
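
The bowl scenarios considered so far can also be played through numerically. The Python sketch below is a deliberately crude toy model and not a claim about real chemistry: all rates are invented, counts are allowed to be fractional, copying is assumed to slow down in proportion to the share of building blocks that are still free, and the name `run_bowl` is hypothetical. A single C–C mutant is placed in a bowl that already holds a hundred A–Ts.

```python
def run_bowl(types, total_blocks=8000.0, length=8, steps=5000):
    """Simulate a bowl with a fixed stock of building blocks.

    `types` maps a string-type name to (initial count, replication rate,
    disintegration rate). Returns the final count of each type.
    """
    counts = {name: float(start) for name, (start, _, _) in types.items()}
    free = total_blocks - length * sum(counts.values())
    for _ in range(steps):
        supply = free / total_blocks  # share of blocks currently unbound
        for name, (_, rate, decay) in types.items():
            born = rate * counts[name] * supply  # copying needs free blocks
            died = decay * counts[name]          # disintegration frees blocks
            counts[name] += born - died
            free += length * (died - born)
    return counts

A_T = (100, 0.5, 0.05)  # the established type (invented rates)
scenarios = {
    "1: cannot replicate": (1, 0.0, 0.05),
    "2: slower and less stable": (1, 0.4, 0.06),
    "3: faster and more stable": (1, 0.6, 0.04),
    "4a: faster, slightly less stable": (1, 0.9, 0.08),
    "4b: faster, much less stable": (1, 0.9, 0.10),
}
results = {}
for label, c_c in scenarios.items():
    final = run_bowl({"A-T": A_T, "C-C": c_c})
    results[label] = max(final, key=final.get)
    print(label, "->", results[label], "prevails")
```

In this toy model the outcome depends on the ratio of disintegration rate to replication rate: the type that can still increase at a block supply where its rival merely breaks even ends up with the bowl to itself. That is one possible way of making ‘combined stability and aptness at replication’ precise, and other formalisations are conceivable.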
Thus, we can predict that only such new strings will remain in the bowl
which are at least as stable and good at replicating as the rest of the pop-
ulation at the point of their emergence, and that the overall population
of strings will continually get more stable and better at replicating. Con-
versely, we can explain every change in the population of strings in the
bowl by saying that it results from (a) copying mistakes (‘mutations’),

9 Either because they are so long-lived that they can afford to take their time hunting for
building blocks, or because they are so quick at replicating that, in spite of the
A–T competition, more new copies are made than disintegrate.
and (b) the fact that strings which are more stable and replicate better
than the rest necessarily oust the others, so that only those are ‘selected’
to remain in the bowl.
This is the essence of the Darwinian theory of evolution: when ‘random
mutation’ (that is, copying mistakes) and ‘natural selection’ (that is, what
comes about when some replicating molecules are better at replicating
or remaining stable than others) occur in a population of replicators,
the overall ‘fitness’ of that population will increase automatically and
without anybody steering the process or consciously trying to design fitter
variants.

Consequences of constrained replication: adaptation and ‘phenotypic’ (side-)effects

But we still have not shown how this explanatory framework can explain
the properties of such complex genotypes as underlie living organisms,
why they are adapted and why there is such a variety of them. So far,
we have been talking exclusively about hypothetical and ‘naked’ DNA
strings in an imaginary bowl of DNA soup. Yet, the theory is practically
complete. The rest follows almost automatically.
Consider the following way in which a type of DNA strand might hap-
pen to acquire increased stability. Let us say, a particular sequence of
nucleotide bases on a DNA strand has the accidental effect of attracting
molecules which are different from DNA building blocks. Let us say
that, as they attach to the bases that have attracted them, these molecules
form a shield against destabilising forces in the environment of the DNA
string on which they come to sit, because these forces could literally not
get at the building blocks ‘hidden’ behind them. Unless the building of
such shields impairs the ability of a DNA string to replicate well, copies
of strings which ‘build themselves’ such ‘shields’ will flourish at the cost
of copies of strings which don’t.
Thereby, incidentally, the first ‘phenotypes’, or ‘bodies’, will have
evolved, and we can predict that any new variant of a DNA replicator
which is to remain stable within the bowl will have to ‘come up with’
something better than a mere shield, such as either a bigger one, or pos-
sibly a simple device for ‘taking shields apart’, or both. Thus, an ‘arms
race’ among replicators will start, in which only those will survive which
can match the sophistication of their competitors. It is indeed the usual
view among biologists that the bodies, or ‘phenotypes’ of the organisms
which inhabit the biosphere of our planet are just such ‘by-products’ of
evolutionary arms races.10
10 This is why it is strictly speaking not sufficient to think of the DNA sequences that are
around in the world as we know it as strings of ‘mere replicators’ any more (although
that may well be their primary function).
This, then, is also the key to understanding the adaptedness of phenotypes:
it is not really they that are adapted at all, but rather the genes
whose ‘survival machines’11 they are. The replication of genes is heavily
constrained by environmental factors, including the competition of other
genes, and only such types of genes as manage to copy before disinte-
grating under those constraints will remain in the pool. Thus, it will be
‘adaptive’ for lineages of genes to evolve ‘body construction’. That those
bodies should turn out to be of a certain robustness themselves and able
to perform all sorts of astonishing things in order to stay alive is not what
matters. What matters is that genes require bodies to be that way so that
they, that is, the genes, can replicate before decaying.
Although we shall refine our account about body emergence below
(see page 74), this is basically how Darwinian evolutionary theory
explains why present day DNA strands build themselves the complex
organisms that inhabit our biosphere. The emergence of bodies in their
stunning complexity can thus be derived from the principles that gov-
ern the evolution of replicators, namely that they will evolve by random
mutation and the automatic selection of more stable and better replica-
tors. I find it difficult to think of any explanatory framework that man-
ages to relate such a staggering number of highly complex phenomena to
such a small number of simple concepts and principles governing their
interaction.12

Stable diversity

The model we have sketched so far still seems to make a counterfactual
prediction. It suggests that the replicator bowl will typically be filled with
copies of one replicator type only, namely that which at each stage in
the evolutionary self-reorganisation of the population happens to be the
‘fittest’. Yet, our biosphere does not host just a single type of organism but
a staggering variety of such types. Each of them seems sufficiently stable in
the presence of all the others. In fact, the organisms that inhabit our planet
seem to have divided it into a large number of ecological niches, each of
them hosting organisms that appear to be designed – often in staggeringly
complex ways – for surviving and replicating in just the particular niche
it happens to occupy. How can this be then? All we need to do in order to
explain it, is to take a few self evident factors into account which we have
11 Dawkins (1989: passim).
12 It is necessary to stress that the life-forms on our planet do – in all likelihood – not really
go back to a population of naked DNA molecules swimming about in ‘primeval soup’.
This image is only a pedagogical simplification. It is commonly acknowledged that DNA
molecules have always depended on the presence of rather complex chemical machinery
for their replication, so that ‘bodies’ (of sorts) must already have been around at the
time when the evolutionary processes that have come to produce life as we know it began (see
Margulis 1981 and Dennett 1995: 149–86).
The Darwinian approach 71

so far neglected. Up to now we have based our thought experiment on the

unrealistic assumption that the conditions for DNA replications were the
same all through the bowl. But this is clearly an impossible scenario. For
example, there will be a difference in temperature between higher regions
of the bowl and lower ones. Equally, the soup cannot be perfectly trans-
parent, so that lower regions will be darker than higher ones. In short,
within the bowl itself there will be different regions, and these will impose
different environmental conditions on DNA replication. Thus, different
replicator types will turn out to be more stable or better at replicating in
different regions. Some will survive and replicate better at higher temper-
atures, others at lower ones. Some will be fitter in lighter regions, others
in darker ones, and so on. This predicts that the bowl will eventually
come to be populated by a variety of different replicator types. Also, it
predicts that replicators will appear to be well adapted to the respective
environments in which they happen to be evolutionarily stable. They will
appear as if designed for surviving there. Thus, the theory of evolution
through the mutation and automatic selection of replicators is capable of
explaining both the diversity in the biosphere and the fact that organisms
seem to be so amazingly well adapted to the environments they occupy.
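
The argument of this paragraph can be made concrete in a minimal sketch. It assumes a bowl with just two regions and two replicator types, and it assumes that a type's share in a region changes in proportion to its relative replication success there. The region names, the type names and the fitness values are all invented for illustration and are not taken from the thought experiment itself.

```python
# Hypothetical illustration: regions, types and fitness values are invented.

def next_frequency(p, w_first, w_second):
    """Share of the first type after one round of replication and selection."""
    mean_fitness = p * w_first + (1.0 - p) * w_second
    return p * w_first / mean_fitness


def evolve(p, w_first, w_second, generations):
    """Apply selection repeatedly and return the final share of the first type."""
    for _ in range(generations):
        p = next_frequency(p, w_first, w_second)
    return p


# Relative replication success of each type in each region (assumed values).
FITNESS = {
    "warm_surface": {"heat_lover": 1.2, "cold_lover": 1.0},
    "cool_depths": {"heat_lover": 1.0, "cold_lover": 1.2},
}


def heat_lover_share(region, generations=100, start=0.5):
    """Share of the heat-adapted type in one region after selection."""
    w = FITNESS[region]
    return evolve(start, w["heat_lover"], w["cold_lover"], generations)


if __name__ == "__main__":
    for region in FITNESS:
        share = heat_lover_share(region)
        print(f"{region}: heat_lover {share:.4f}, cold_lover {1 - share:.4f}")
```

Under these assumptions each region ends up dominated by the type that replicates best there, so the bowl as a whole keeps both types. If the two regions are given identical fitness values instead, the same type wins everywhere, which is the counterfactual prediction of the uniform bowl.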
By now the three properties we singled out for explanation have basi-
cally been accounted for. We have seen why genes code for organ-
isms, why these organisms appear to be so functionally designed and
so well adapted to their environments, and why life has diversified into
so many varieties. There is, however, one aspect which warrants further
discussion.

Specifying the theory: replicator alliances and higher-level

So far in our thought experiment, we have talked about the replicators
which compete with one another for resources as if they were whole strings
of DNA, or whole genomes. But this is not true. No genome is a single
monolithic replicator. Instead genomes are systems of smaller entities,
namely genes, the true replicators, which appear to ‘co-operate’ when
making bodies. This is puzzling because we have said that replicators will
always compete with one another for resources. So why should they team
up and replicate ‘together’? Where does this ‘spirit’ of co-operation come
from? Is our theory lacking some crucial component? As we shall see it is
not. The fact that genomes are complex ‘teams’ of replicators can also be
derived logically from the few Darwinian principles that govern replicator
evolution.
To see how this is so, imagine the following scenario in our hypothetical
replicator bowl: at some point there exist three different replicator types.
Let us call them A, B and C. Let us say that As are capable of
synthesising a device which has the effect of decomposing Bs into raw
material for replication. Apart from that, Bs are better at replicating
than As because they chemically attract building blocks. Cs, finally,
repel As, so that the latter cannot ‘get at them’, but happen to repel
many building block types as well. Thus, while being relatively stable,
Cs are slow at replicating. To make the scenario easier to imagine, a
graphic representation is given in figure 4.1.

Figure 4.1 A population of three replicator types A, B and C.
Legend: symbols for As, Bs, Cs and building blocks (star shaped and diamond shaped).
Comment: ‘Feeding’ on Bs, As replicate best in their neighbourhood.
Bs attract building blocks, so the concentration of such blocks is highest
near Bs. Cs repel As and star shaped building blocks, so their
neighbourhood is populated mainly by diamond shaped building blocks
and Bs, although a few star shaped building blocks may be found
there as well, when the attracting force exerted on them by a B that
has drifted into the vicinity of a C outweighs the repelling force exerted
by the latter.

We can now predict that if there were only As and Bs around, the
former would eventually out-replicate the latter and oust them from the
bowl. Equally, we can predict that if there were only Cs and As around,
As would oust Cs because even though they cannot come close enough
to decompose them, they do not repel any building blocks, will replicate
more quickly than Cs and eventually outnumber and oust them. A sim-
ilar relation will hold between Bs and Cs. Bs do not threaten Cs
directly, but their capacity of attracting building blocks will eventually
deprive Cs of raw material for replication. Thus, Cs could not suc-
cessfully compete against either As or Bs alone.

Figure 4.2 The evolutionarily stable distribution of As, Bs and Cs.
Legend: symbols for As, Bs, CB alliances and building blocks.
Comment: The CB alliance is stable in the presence of As. Cs ‘protect’
Bs from attacks by As, and Bs ‘help’ Cs by attracting building
blocks. As soon as the number of As sinks below a critical point, the
CB alliance will break, Bs will out-replicate Cs, and, ‘unprotected’,
will provide easy victims for the As that are still around, so that the
latter will multiply again, forcing Bs back into alliances with Cs.
Thus, the proportion of As, Bs and CBs will keep fluctuating
around a stable equilibrium.

But if the three of them happen to be around at the same time, this
changes the matter completely. In the vicinity of Cs, Bs will thrive,
because Cs keep As at a distance, so that they cannot get at Bs
anymore. At the same time, Cs will profit from the ability of Bs to
attract building-block material, but again only as long as Bs and Cs
are close enough to each other. One possible outcome of such a scenario
would then be a stable equilibrium in which all of the three replicator
types co-exist. Bs will only replicate successfully as long as there are
Cs around, which repel ‘B-eating’ As. Cs, on the other hand, can
only replicate successfully if there are Bs around, which attract enough
building blocks so that Cs are not out-replicated by As. And finally,
As too will manage to replicate successfully in this scenario, because
as soon as Bs start to out-replicate Cs, there will be a sufficiently
large number of ‘unprotected’ Bs for As to decompose. We may thus
predict the stable replicator population in the region to be composed of
CB complexes, free floating Bs and free floating As, as illustrated in
figure 4.2.
The following points are of particular interest. Firstly, it becomes obvious
that the complex interactions among rivalling replicators may cause
some to ‘enter into alliances’ with others even though no individual repli-
cator owes its evolutionary stability to being ‘altruistic’. Alliances come
about simply because replicators may manage to replicate well only in
‘co-operation’ with others. Therefore, no additions have to be made to
the basic model of evolution through mutation and automatic selection
in order to account for alliances among replicators.
By this rationale, the Darwinian theory of evolution through muta-
tion and automatic selection can explain the emergence of complex
co-operative relationships among replicators without having to enrich
its basic conceptual machinery by including the notion of co-operation
as a primitive. It can be argued that many higher-level entities, such as
organisms, species and symbioses, may represent ‘side effects’ of such
gene coalitions.

Derived higher-level categories 1: ‘genomes’ and ‘organisms’

Let us re-consider the bodies of organisms first. We have already said
above (page 70) that the properties of bodies express genes whose evo-
lutionary stability depends on coding for those properties, but we didn’t
explain why genes should enter into alliances with one another in order
to do so. We didn’t explain, in other words, why bodies are coded for
by genomes, rather than by large and complex genes. Wouldn’t it seem
to be more natural for a replicator to make a body all for itself? The
hypothetical scenario developed above shows that this is not necessar-
ily so. Replicators can be forced into alliances if their own existence is
at stake. From their point-of-view, the alliances which we observe from
the outside are not even recognisable as such. For each single replicator,
the others, with which it appears to co-operate, simply represent specific
environments to which apparently ‘co-operative’ replicators are better
adapted than potential non-co-operative mutations. Just as populations
and lineages of replicators which are adapted to warmth will appear to
‘seek’ warm environments (of course they don’t literally, but just hap-
pen to survive and reproduce only there), so lineages of replicators which
are adapted to each other will appear to ‘seek each other out’. Now, if
you imagine a pair of mutually adapted replicators, it is easy to see that
they will be out-replicated by variants which do not depend on chance
for running into each other but which can rely on each other to be there
by default. This is true of replicators which always replicate together of
course, and this is basically what genomes are. Thus, the emergence of
gene alliances which are both complex and historically stable, and which
have even more complex but also stable phenotypic units as their common
expressions can also be derived from the Darwinian principles determining
the evolutionary stability of replicator types.

Derived higher-level categories 2: ‘species’

Take species next. From what we have said so far, their existence and some
of their properties are still a puzzle. If we accept the account of genome
and body emergence just given, we should expect genomes to repro-
duce so faithfully that although they may well have come about through
gene alliances they should still count as ‘large replicators’ in their own right.
After all, a stable gene alliance should be expected to remain stable, that
is, also across body generations. Thus we should expect our biosphere to
be inhabited by populations of genomes and phenotypes whose members
are genetically more or less identical to each other. But this is not what
we see. Instead, there is considerable variation among the members of
any species and/or population – also on the genetic level. In other words,
the replicator alliances which genomes represent are not as stable as we
would seem to have predicted. If genomes are not stable, however, this
means that from the genetic point-of-view species shouldn’t really exist
at all. The genetic difference between, say, you and me, might be smaller
than that between you and your pet dog (if you have one), but the dif-
ference between the differences itself will be a mere matter of degree.
Every successful mutation, and thus every difference between individual
genomes would, practically by definition, constitute a new and separate
species. Now, historically, this seems to make some sense. After all, you
and your pet dog do share a common ancestor which lived a long time ago,
and for some time after that also your separate ancestors will indeed have
belonged to the same species. But in another sense, it makes no sense at
all. When we refer to different species, we assume that there are boundaries
between them that cannot simply be described in terms of (degrees of)
genetic difference. Quite apart from the fact that you and your pet dog
couldn’t have offspring together, the genetic differences between mem-
bers of different species are typically much greater than those between
different members of the same species. Thus, species do undeniably have
a certain integrity and seem to represent units whose existence cannot be
derived from the replicator level. It therefore appears that we have reached
the explanatory limits of the (Neo-)Darwinian approach to biology. As I
shall try to show, however, this is not really the case.
For most species, the process which is responsible for most of the
genetic variation among their individual members is sexual reproduction.
In members of sexually reproducing species one usually finds a double
genome set, arranged on corresponding pairs of chromosomes. Thus,
each position in the gene team that a genome represents is filled twice,
and the genes ‘sharing’ a position may differ. Sometimes only one of the
two candidates (the so-called ‘dominant’ gene) gets expressed, while the
expression of the other (called ‘recessive’) is suppressed. At other times,
the phenotype expresses both genes, which yields a mixed phenotypic
property. But all this concerns primarily development. What matters for
evolution is that half of the genes in the total genome of an organism are
inherited from the organism’s mother, and the other half of them from
its father. These halves are provided in the form of paternal and maternal
sex cells (‘gametes’), which have to get together when a new organism is
to be formed. Crucially, when parental genomes divide in halves to form
gametes, each gamete gets an idiosyncratic combination of genes from
the two teams in the parental genome. Since the number of genes in a
genome is high, and the number of possible combinations even higher,
no two organisms are likely to have identical genomes (apart, of course,
from identical twins).
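
The size of the numbers involved can be sketched with a small calculation. It rests on two simplifying assumptions that the text does not make: every position in the double genome carries two different genes, and each position passes one of the two on independently of all others. The function names and the three-position toy genome are hypothetical.

```python
from itertools import product


def gamete_count(n_positions):
    """Distinct gametes if each position independently passes on one of its two genes."""
    return 2 ** n_positions


def enumerate_gametes(genome_pairs):
    """List every gamete for a small genome given as (maternal, paternal) pairs."""
    return [tuple(choice) for choice in product(*genome_pairs)]


# A toy genome with three positions, each filled twice (invented labels).
toy_genome = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
gametes = enumerate_gametes(toy_genome)

if __name__ == "__main__":
    print(len(gametes), "gametes from a three-position genome")
    print(gamete_count(23), "gametes from 23 independently assorting positions")
```

Even if only 23 units assorted independently, as with whole human chromosome pairs when crossing over is ignored, a single parent could already produce 2^23 = 8,388,608 different gametes. Real genomes recombine at a much finer grain, so the sketch understates the variability.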
Now, since normally only gametes of co-speciates can combine to form
a new organism, it is possible to define a species as comprising only those
individuals whose genomes are combinable in this way. Although not
unproblematic either (see the discussion in Eldredge 1995: 106–20), this
definition appears to be less fuzzy than one based on the number of shared
genes, or (worse) phenotypic similarities, for instance. Thus, species seem
to represent units which cannot be derived from the replicator level.
However, this may be only superficially true, and the strange combina-
tion of internal variability and relative integrity that characterises species
may after all make sense from the point of view of the replicators involved.
Could there be a reason why an individual replicator might be able to
replicate better or to exist longer, if the composition of the teams with
which it lives and replicates does fluctuate? Imagine a genome which has
reproduced faithfully and a-sexually from its parent. Every gene on that
genome can be sure, metaphorically speaking, that the team in which it
exists represents a viable combination, otherwise it would not have been
replicated and would not exist. At the same time, teams of different com-
positions might be just as viable, or even more so, and the gene would
replicate better, if it was in them. From an individual gene’s point-of-
view, it would therefore make sense to try out new combinations. After
all, it has no loyalty to any of its present team mates. On the other hand,
new combinations may also be unviable, and since there are more ways
of being dead than of being alive, changing teams appears too much of a
risk to be worth undertaking after all. Thus, sexual reproduction should
not really occur.
What, however, if a gene was able to tell which potential new partners
are likely to be safe and promising? Clearly, in this case, the risks of
random recombination would be greatly diminished, and the chance of
winding up in a better team would outweigh the costs of leaving a good
one. But genes have no foresight, so how can they know which potential
new team mates might be worth associating with? Well, it seems to me
that there is a plausible way that requires no foresight at all. To see it,
take a specific gene that sits on a specific genome. It ‘knows’ (through
being there) that its own genome makes a successful team. But this is
not only true of the genome on which the specific gene sits, but also
of all others in its environment. They must equally all be viable teams,
otherwise they wouldn’t be there either. Now, of these other teams some
will be similar to the genome of the specific gene we are looking at and
others rather unlike it. Clearly, the chances that our gene will fit into one
of them (in the sense of creating a successful team) will be the greater the
more similar it is to the gene’s present host genome. Thus, a mutant gene
which can get copies of itself into genomes that are roughly like its current
team is likely to be successful, because its copies are likely to survive
and replicate well in their new teams. More than that, it is even likely
to be more successful than a competing gene which will forever remain
faithful to the same team. This is because different genomes/teams will be
adapted to slightly different environments, so that a gene which spreads its
copies to different genomes will itself be able to survive and replicate in a
greater region of the biosphere than a gene which remains bound to just a
single, never-changing genome. Therefore, gene swapping among similar
genomes does make sense from the point of view of the genes involved,
and requires no foresight on their part to be evolutionarily successful.
Equally, what we have said suggests that there will be inherent limits
to the strategy of changing team mates. The more a genome differs
from the one on which a specific gene sits, the greater are the chances
that the gene will make that genome unviable by getting a copy of itself
into it, so that both that copy and its potential copies will be lost to the lineage.
In sum, the most successful gene will be one which gets copies of itself
into more types of genomes than just the one on which it sits, but at
the same time only into a limited variety of them, namely into such as
are sufficiently like its current host.13 Of course, sexual reproduction,
which can take place more or less only among co-speciates and thus
establishes species as a unit in the first place, provides the very mechanism
which genes would require for successful team-swapping. It can therefore
have emerged because genes ‘coding for it’ were more successful than
competitors which were incapable of limited genome re-shuffling. Thus,
both sexual reproduction and the existence of different species with
relatively clear boundaries between them (as opposed to a continuum
of genetically different individuals) can after all be understood from the
perspective of individual replicators, and explained through the evolution
of those replicators by random mutation and automatic selection.

13 Incidentally, the fact that in sexual reproduction genomes do not copy faithfully explains
why genes, and not genomes, are the primary players in evolution. The high degree of
variability of individual genomes makes it impossible to say that some type of genome is
more stable, or produces more copies than others. There simply are no genome ‘types’.
Each of them is slightly different from all the others, and even if it is very successful in
terms of the life-span of the organism it codes for, it will not usually find its way into the
next generation intact.

Derived higher-level categories 3: extended phenotypes, families,
social groups, symbioses and the general ‘fuzziness’ of higher-level categories

We have seen how units and categories such as bodies and species, which
appear fundamental and irreducible to us, can be understood as derived
units and categories, which have emerged and exist because of the bene-
fits (in terms of stability and successful replication) they confer on genes.
Genomes and organisms can be understood as temporary coalitions
of mutually parasitic genes. Species can be understood as higher-level
alliances of different genome types, which allow genes to try out differ-
ent coalitions while reducing the danger that they should wind up with
completely incompatible team mates.
Since neither bodies nor species are fundamental categories, however,
there is no reason why their boundaries should represent unsurpassable
limits to the expression of genes or to their ability to form win–win
partnerships.
And indeed they do not. As Richard Dawkins argues in The extended
phenotype, the phenotypic effects of genes do not stop at the boundary of
individual organisms. Thus, the raison d’être of web spinning, for exam-
ple, as well as of the spun webs themselves, is to ensure the evolutionary
stability of genes for web spinning (or webs). The same holds true for
many other types of behaviour and their effects. For instance, birdsong,
nest-building (both the act and the actual nests), and the feeding of off-
spring take place ultimately not for the benefit of birds and their young,
but because the genes for those endeavours have managed to establish
themselves securely in bird genomes. The last example also shows how
genes can establish social groups such as – most prominently but not
exclusively – families. The co-operative relations among different individ-
uals which characterise such groups can again be accounted for through
evolution by the mutation and selection of genes. Take child-care. What
looks, on the surface, like parents making sacrifices on behalf of their off-
spring is likely to be the expression of genes ‘for parental care’ which are
expressed in the offspring’s crying for food and in the parents’ sensitivity
to it. On the genetic level, no gene sacrifices anything: the ‘cry-for-food
gene’, and the ‘feed-your-children-if-they-cry gene’ exist in the genomes
of parents and offspring alike and benefit equally from parental care by
being replicated before disintegrating.
The reach of genes does not only go beyond the boundaries of their
host organisms, however. It may reach into the bodies made by other
genes as well, establishing parasitic and/or symbiotic relationships. Thus,
the (physical and/or behavioural) properties of what appear to be single
integral organisms may express genes that do not even belong to one
and the same species. For example, the decomposition of food in human
intestines takes place not simply because it helps human organisms to
survive, and not even only because it expresses human genes, whose exis-
tence depends on it, but also, to some extent, because it expresses the
genes of intestinal bacteria, which live in human guts. From the genetic
point of view, human digestion is thus the concerted effort of more than
a single replicator team.
What follows from cases like this is that, from the genetic point of
view, organisms do not appear as the clear-cut units expected by a com-
mon sense view. Instead, it seems, a functioning organism may often be
regarded as an aspect of a whole set of expressions of potentially more
than one team of genetic replicators. Thus, evolutionary theory not only
explains the emergence of bodies, species, families, social groups and
many inter-species relationships, it also explains why those secondary,
higher-level categories cannot be expected to have clear boundaries. This
is because the only fundamental unit that matters in evolution is the
replicator.

4.1.5 Summary and some further discussion

As I hope has become obvious in the preceding discussion, Darwinian
Evolutionary Theory is very powerful in that it relates many different and
inherently complex phenomena to a more fundamental system, which
consists of elements of just a single kind (that is, DNA-based replicators),
and which is governed by a surprisingly small set of interaction rules. Here
are the essentials once again.
If one wants to understand life on earth, the most productive way of
addressing the issue is on the level of ‘genotypes’. Genotypes are teams,
or systems, of genes, that is, DNA-based replicators. These depend for
their existence on replicating before disintegrating. Replication can never
be perfect and the resources required for it will always be limited, so that
there will always be variation among the replicator types in the biosphere.
Also, different replicator types will hardly ever replicate equally well and
be equally stable under identical environmental conditions. Naturally,
when replicators ‘compete’, the competitions will always be won by those
which replicate best and are most stable under given conditions. Since
conditions are variable, replicator types will diversify, and the diversifying
‘lineages’ that emerge will ‘adapt’ themselves to the specific environmen-
tal conditions in which they thrive. Often, the stability and success of
individual replicators will depend on phenotypic effects they have, such as
‘shields’ that increase their stability, ‘tools’ that help them find resources
for replication, or (ultimately) such complex survival machines as bodies,
organs and nervous systems. Also, in that process complex interactions
among replicators will emerge, including stable alliances of considerable
internal complexity, such as the genomes of the life-forms that we know.
Thus, the properties of the organisms that exist today can be understood
through acknowledging – simply – that the alliances of replicators which
happen to have them as their phenotypic expressions have turned out to
be evolutionarily stable, that is, have managed to replicate before disinte-
grating under the specific environmental conditions in which they exist.
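
The interaction rules just summarised can be sketched as a small deterministic model. It assumes three replicator types with fixed relative replication rates, a small fixed chance that a copy turns into one of the other types, and a population held at constant size so that only proportions matter. The rates, the mutation probability and all function names are invented for this illustration.

```python
def next_generation(freqs, fitness, mutation_rate):
    """One round of replication, copying error and competition for limited resources."""
    n = len(freqs)
    # Replication: each type produces copies in proportion to its replication rate.
    offspring = [f * w for f, w in zip(freqs, fitness)]
    # Imperfect copying: a small share of each type's copies becomes another type.
    mutated = [0.0] * n
    for i, amount in enumerate(offspring):
        for j in range(n):
            share = 1.0 - mutation_rate * (n - 1) if i == j else mutation_rate
            mutated[j] += amount * share
    # Limited resources: the population stays the same size, so rescale to proportions.
    total = sum(mutated)
    return [m / total for m in mutated]


def simulate(fitness, mutation_rate=0.01, generations=200):
    """Start with the first type only and return the final proportions."""
    freqs = [1.0] + [0.0] * (len(fitness) - 1)
    for _ in range(generations):
        freqs = next_generation(freqs, fitness, mutation_rate)
    return freqs


# Assumed relative replication rates of three replicator types.
final = simulate([1.0, 1.5, 2.0])

if __name__ == "__main__":
    print([round(f, 3) for f in final])
```

In this sketch the population starts out as copies of the poorest replicator, yet the best replicator comes to dominate without any foresight being built in. The other types never vanish entirely, because copying remains imperfect.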
Thus, a large part of the problem of why both present day and all
historical life-forms are/were as they are/were can be reduced to the more
fundamental question of what it is that confers evolutionary stability to
replicators. Evolutionary theory has not only produced the insight that
this is the most productive question to ask, but has developed the basic
conceptual machinery for tackling it as well.

The essentially reductionist character of Evolutionary Theory

Some implications of this are quite surprising, such as that the Darwinian
perspective somehow deconstructs the concept of the organism. Many
of an organism’s properties can be understood best, it tells us, if one
does not regard them as properties of an organism at all, but instead as
products of the genes that underlie the organism. Thus, bones, muscles,
skin, organs, indeed the whole sensori-motor system, as well as, rather
disturbingly, minds, are ultimately there not because they help organisms
to survive, but because they are necessary for the survival and replication
of the genes that code for them. But this is not all of it. As we have seen, the
same can be said for behaviour and the results it produces, for alliances
of many different sorts between organisms, and even for the partnerships
formed for the purposes of sexual reproduction.
Now, many people find it rather disturbing to see organisms deprived
of their integrity in this way. Interestingly, the negative attitude to
this perspective might itself be explainable as an expression of genes on
the emotional level. After all, genes which take the trouble of coding for
human bodies would be ill-advised if they coded for minds, at the same
time, that didn’t take their bodies seriously. But this is a different story;
what matters here is that the perspective which Darwinian Evolutionary
Theory has developed allows us to understand a number of highly
complex and diverse phenomena on a more fundamental level, whose
elements are much simpler, less diverse and governed by more general
laws. Thus, if described in terms which are specific to the phenomena,
spiders’ webs, for instance, display regularities and patterns which are
very different, both in kind and in detail, to the regularities and patterns
displayed by the digestive systems of bovines, for example, to the patterns
that can be detected in the courting behaviour of peacocks or walruses,
or to the regularities that govern the central nervous systems of humans.
Yet, all of them are derivable, at least in principle, from the properties
of genes, which in all cases can be described – chemically – in terms of
sequences of nucleic acids on DNA strands, and the laws which obtain
on that level are clearly more general and encompassing. In this sense,
the Darwinian programme is radically reductionist. In a way, it reduces
biological phenomena to chemistry.14

Emergent top-down constraints

Importantly, the reductionist character of gene-based evolutionary the-
ory does not preclude the possibility that higher-level units, such as organ-
isms or species, whose existence may in principle be derived from the
interaction among underlying, and simpler, replicators, should in turn
influence the fates of their lower level constituents. Since we have said
that organisms and species are secondary, hence derivable, units, it might
seem that their effects on the fate of genes should be reducible to effects
which genes have on other genes. However, in practice this is not the
case because the ways in which genes interact both with one another
and their non-genetic environments to ‘produce’ organisms and ‘species’
are far too complex to be computationally tractable.15 (See also below,
page 82.) Thus, we shall have to describe many aspects of higher-level
units in their own terms and explain them in terms of level-specific regu-
larities. Consequently, some of the apparent influences they exert ‘back’
on their constituents, will have to be described as ‘top-down’ effects.

14 Evolutionary theory represents not only an epistemological bridge between two sciences,
however. The link between chemistry and biology it establishes seems to be describing
an actual historical development as well. After all, it tells the story about how life may
have emerged from chemistry under the special circumstances on our planet, and it does
so without having to make reference to metaphysical forces such as divine creators or
mysterious substances such as élan vital.
15 The analysis which would produce the raw data for such a computation would involve
immensely fine-grained measurements, and the resulting number of factors that would
then have to be fed into the calculation would be so great and the interaction-laws so
complex, that we would face one of those proverbial situations in which all the computers
on earth would take longer than the actual age of the universe in order to finish the job.

Thus, while the boundaries of bodies might be ‘fuzzy’ from the genes’
point-of-view, they are still boundaries, and genes are more likely to
respect them than not.16 In so far as the evolutionary fate of genes depends
on whether their bodies manage to reproduce before disintegrating, it is
justified to say that the relationship between bodies and genes works both
ways, even though much more causal traffic may flow from the bottom
up than from the top down.
Similarly, although sexual reproduction and species may have emerged
for the benefit of genes, their existence has in turn come to impose con-
straints on the evolution of genes, so that sometimes the species-level may
have to be invoked for describing and explaining it. The following argu-
ment (based on Eldredge 1995: 106–39 and following Paterson 1985)
may serve to illustrate the point. Organisms normally seek their mates
for reproduction among co-speciates. Their ability to recognise them is
governed by what may be called their mate recognition system. Now,
imagine a mutant organism, say a dog, which sports a third eye on the
back of his head and which therefore has 360-degree vision. Naively, we
might expect this dog to live longer and to produce more offspring than
his co-speciates so that in the long run only three-eyed dogs will be found
in the population. However, because of its third eye the dog might no
longer be recognised as a dog by the females in the population. They may
avoid mating with him, and seek other partners. Thus, for all the advan-
tages of having 360-degree vision, the dog will not reproduce and the
gene for the third eye will die out with him. Therefore, the fact that genes
have organised themselves into the separate pools which we call species,
has introduced a constraint on their evolution which clearly deserves to
be called top-down, even though it does not warrant the conclusion that
evolution works for the benefit of species.

Explanatory limits of Evolutionary Theory

The role of environmental contingencies

Having said that evolutionary theory can explain biological phenomena in
terms of chemical ones, it needs to be added immediately that this is true
only in principle, and would require, in practice, an amount of additional
historical information which we shall never attain. Thus, the evolution-
ary changes which populations of DNA will undergo are to a large extent
16 This is why, for example, Richard Dawkins’ habit of referring to bodies as ‘vehicles’ or
‘survival machines’ of genes has come to be criticised by other biologists and philosophers
of science as somewhat too radical and ‘greedily reductionist’, although it represents, of
course, a rhetorically effective way of pointing out that organisms are derived units.
Instead, the more moderate term ‘interactors’ has been suggested (Hull 1988), as it
does greater justice to the fact that not all of an organism’s properties and actions are
under tight genetic control.
co-determined by the specific properties in the environments in which
they find themselves. Therefore, in order to explain (coherently and strin-
gently) why a particular gene came to achieve evolutionary stability at
a certain point in time, we would have to know all the environmental
conditions which held at all times from the emergence of the first DNA-
replicator to the first appearance of the specific gene. This is clearly impos-
sible. Even if we knew them, however, we might find that the processes
which have governed the interaction between lineages and populations of
genes and factors in their environment are far too complex to be tractable
by humans at all. This is also because the environment of genes is typi-
cally constituted by other genes, so that one cannot really tell the story of
any particular lineage of genes without taking the lineages of many other
genes into account as well. Thus, the study of a single genetic lineage
may amount, in a rather sobering way, almost to the study of everything,
so that truly reductionist explanations of biological phenomena, deriving
their properties from their chemical components, are practically out of
the question, even though in a sense chemistry might nevertheless fully
determine biology.
Thus, while the insights it has produced are impressive, they should
not be over-interpreted to imply that Darwinian Evolutionary Theory
has explained exactly why present day life forms are the way they are,
how and why they have actually evolved from earlier ones, or how life
did actually begin. Far from it. A vast number of problems still need to
be solved and many issues may in fact remain forever unknown to us.
But this does not really matter. Given the variety and the complexity of
observable phenomena, determining on what levels and in what terms the
question of how life-forms have come to be as they are is best addressed,
is a great achievement in its own right and has laid the foundations of a
highly productive and immensely exciting research programme.

Randomness and the impossibility of predictive laws

The issue is indeed so important that it warrants further elaboration. The
fact that Evolutionary Theory does not explain, causally and unambigu-
ously, why exactly every single organism is the way it is, or why evolution
has run its specific course, does not diminish its value. On the contrary,
it is an asset of Evolutionary Theory that it allows us to identify those
aspects which will forever lie beyond our explanatory powers and to dis-
tinguish them from those which we may hope to address with some chance
of success. Thus, as shown in the preceding paragraphs, Evolutionary
Theory has put us in a position to understand why a reductionist and
causal account of life on earth is practically unfeasible while establish-
ing at the same time that biological phenomena have indeed emerged, in
complex ways, from chemical ones. Also, the theory itself shows why some
aspects in the historical unfolding of life will have been truly accidental,
or at least as close to accidental as makes no difference. Since innovations
come about through ‘copying mistakes’, evolution proceeds blindly, that
is, by trial and error, merely preserving ‘adaptive’ mutations once they
have occurred and discarding those that turn out not to be viable. Thus,
evolution has no foresight, and cannot be expected to ‘know’ which muta-
tion would be optimally adaptive under which circumstances. Therefore,
it might just as well not provide it. Whether or not it actually does, appears
at least to some extent to be a matter of chance, precluding the possibility
of a non-probabilistic, predictive theory by definition.

The complexities of development

Also, Evolutionary Theory helps one to appreciate how complex the
unfolding of phenotypic properties from their underlying genotypes actu-
ally is and how hard it is to determine which property of a phenotype
expresses exactly which ‘gene’ or combination of genes. Again, this is an
asset rather than a shortcoming of Evolutionary Theory. Acknowledging
that only genetic properties are inherited makes it easier to realise that
the effects of genes will always depend on environmental influences on
a developing organism as well. What is important is that the distinction
is made. If genes can be regarded only as necessary, but not as sufficient
conditions of phenotype properties, this may well be so.17 The insight that
the ways in which genetic and environmental factors interact to ‘produce’
phenotypes are highly complex is something we owe to Evolutionary
Theory, and should not be held against it.

Optimality in Evolutionary Theory

Evolution is popularly understood as being driven by the ‘survival of the
fittest’, and the superlative ending in ‘fittest’ suggests that the organisms
and genomes around at any time might indeed be optimal in the strongest
sense of the term. From this, it is concluded that evolution necessarily
proceeds towards ever greater perfection, and this view is in turn sometimes
interpreted to support the flattering notion that humans, having
entered the scene fairly recently, should be regarded as nature’s highest
achievement so far. However, this view is wrong. Having no foresight,
evolution does not work towards any goal at all. The steps it has taken
have been guided by trial and error, and the courses it has run have
often been directed by chance. Genes and organisms which have come

17 See Lewontin (1982).

to acquire evolutionary stability have not done so because they are opti-
mal, but simply because they are viable, and fitter than the competitors
which accident has pitched them against. This implies that local trends
that conspire towards the optimisation of particular functional properties,
such as camouflage, speed, or sight, are indeed likely to occur, but
generally speaking, evolutionary developments have never conspired to
produce what might be considered ‘best’ in absolute terms. Instead, they
have selected what has turned out to be good enough for survival and
discarded what is not. In order to make this clear, evolutionary biologists
prefer to speak of ‘satisficing’ rather than optimal properties.
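The difference between ‘good enough’ and ‘best’ can be illustrated with a toy search procedure. The following Python sketch is not drawn from the biological literature: the function name hill_climb and the numeric ‘fitness landscape’ are purely illustrative assumptions. The procedure accepts a neighbouring variant only if it is fitter than the current one, and therefore comes to rest on the nearest local peak even though a higher one exists elsewhere.

```python
# Illustrative sketch only: 'hill_climb' and the landscape values are
# hypothetical and stand in for blind, step-by-step selection.

def hill_climb(fitness, start):
    """Move to a neighbouring position only while that neighbour is fitter."""
    position = start
    while True:
        neighbours = [i for i in (position - 1, position + 1)
                      if 0 <= i < len(fitness)]
        best = max(neighbours, key=lambda i: fitness[i])
        if fitness[best] <= fitness[position]:
            # No fitter neighbour: the search stops, optimal or not.
            return position
        position = best


# A landscape with a modest peak (index 3) and a higher one (index 8).
landscape = [1, 2, 3, 4, 3, 2, 5, 7, 9, 8]
end = hill_climb(landscape, start=0)
print(end, landscape[end], max(landscape))
```

Starting from the left, the search settles on the modest peak and never reaches the higher one, because every path to it leads through less fit intermediates.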
Furthermore, evolution is by its very definition constrained to work
on the basis of what is already there. Its agents are copies, and copies
are always modelled after something that has been there before them.
Thus, evolution never designs genomes from scratch, but creates novelty
by adding to or altering given designs. As it has been aptly put by Konrad
Lorenz (1973: 25), the resulting systems, impressive though they are,
must also always to some extent be bricolage, like houses which are repeat-
edly remodelled over time, and which are full of items that may have been
of better use at earlier times, or which have simply been retained because
they have not proved to be downright harmful.18

Evolutionary Theory as a theory of change

We have said in the introduction to this section (see page 69 above) that
Evolutionary Theory is essentially historical. It casts the biosphere as a
system that is constantly changing and explains phenomena in terms of
the processes through which they have been brought about. One reason
why it is so successful in this is that it has managed to make explicit the
mechanisms by which change is effected. Let us look at Darwinism as a
theory of change then.
Let us start, once again, by stating the explanandum, that is, the kind
of change that we observe and which we expect the theory to explain.
Usually, one conceptualises change as transformation, and as common
phrases such as
(12) The necks of giraffes grew longer.
Humans evolved bigger brains.
The fore-limbs of birds developed into wings.
The leopard got its spots.
show, we extend this way of thinking to evolutionary change as well. In
reality, however, evolutionary changes do not transform organisms at all.

18 Reference from Lass (1997: 315).

The only transformations that organisms do go through happen during
their development, and are not inheritable. So, the only meaningful
interpretation of statements like the ones in (12) is to regard them as
abstractions over populations or lineages. If one had sampled a population
of giraffes (or rather their ancestors) at various points over a long
time span, one might indeed have found that the average neck length of
its members increased. What ‘transformed’ would have been the population
of giraffes, or, more exactly, the distribution of properties within
the population, not an ideal type. At point t1, a neck length around x will
have been most frequent, while at point t2 most individuals will have dis-
played a neck length around x + n. What we expect Evolutionary Theory
to explain, then, is why changes in the distribution of properties within a
population of organisms come about.
The first thing which the theory tells us is that we have to think in
terms of genotypic (that is, inheritable) rather than phenotypic proper-
ties. Although genotypic properties are in practice difficult to pin down
(unless you are a molecular biologist), it is not impossible to deduce them
from the frequency of phenotypic properties within a population. Thus,
significant changes in the distribution of properties within a sufficiently
large population of organisms are likely to reflect underlying changes
in the distribution of genes within the gene-pool of that population. If
long-necked giraffes grow significantly more numerous over time, this
indicates that genes ‘coding for’ long necks are being selected for and
out-replicating competitors coding for shorter necks.
The rest is easy. Changes in the distribution of genes within a popula-
tion require variety among gene types (created through copying mistakes,
or mutations), and come about if novel types are better at replicating
and/or more long-lived than established ones under the environmental
conditions in which the population exists, that is, through differential
replication and the automatic selection that results from it.
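The mechanism just described can be made concrete in a few lines of Python. This is a minimal sketch under stated assumptions, not a model from the literature: the function name replicate, the two variant labels, the replication rates and the fixed pool size are all hypothetical. Each variant copies itself at its own rate, and the pool is then scaled back to a constant size, so that one variant can only spread at the expense of the other.

```python
# Illustrative sketch only: names, rates and pool size are hypothetical.

def replicate(counts, rates, capacity):
    """One generation: every variant copies itself at its own rate,
    then the pool is scaled back to a fixed total size."""
    grown = {variant: n * rates[variant] for variant, n in counts.items()}
    total = sum(grown.values())
    return {variant: capacity * n / total for variant, n in grown.items()}


# A rare 'long' variant (e.g. arisen through a copying mistake) replicates
# five per cent better than the established 'short' variant.
pool = {"short": 990.0, "long": 10.0}
rates = {"short": 1.00, "long": 1.05}

for _ in range(300):
    pool = replicate(pool, rates, capacity=1000.0)

share_long = pool["long"] / sum(pool.values())
print(round(share_long, 4))
```

Note that no individual variant is transformed at any point in the loop. Only the composition of the pool changes.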
Thus, on the level on which evolution is primarily played out, changes
(in the sense of transformations) do not actually occur at all. Genes as
such do not change. They are not transformed. They are either there
or not. What changes through their differential stability and replication
is the composition of gene populations and gene lineages. Because gene
replication is not completely faithful, no population or species is homo-
geneous. Instead, it will always be characterised by variability. New gene
variants enter populations more or less by chance and randomly. It is
presently impossible (or as difficult as makes no difference) to predict
what kind of copying mistake will occur when and under what conditions.
What can be taken for granted is that mutations occur basically all the
time.
Given that heterogeneity is automatically and always provided to a
population of genes, changes in the composition of such a population can
only come about if one gene variant replicates better than its established
competitors, so that its copies spread, typically at the cost of the latter.
After some time, the distribution of gene variants in the population will
reach a state that differs from the first.
The reasons why a particular gene variant comes to be selected for
fall basically into three categories. First, the variant might be new to the
population and simply happen to replicate well under the particular con-
ditions in which the population exists. In such a process, the population
will become better adapted to those particular environmental conditions.
Second, the variant might have existed in the pool for some time, but may
have represented a minority in a stable equilibrium with a more frequent
competitor. In such a case, it might come under selection if the environ-
mental conditions in which the population exists change. Such changes
might involve a change in climate, the appearance of a new predator
species, the disappearance of the favoured prey species, or some catas-
trophic event like a volcanic eruption or the collision of the earth with a
meteor. Thus, the driving forces behind evolutionary change can be both
internal and external. The third possibility is a bit more complicated. It
may occur when for either of the first two reasons the composition of a
gene population has changed. Although internal to the population, such
a change counts as ‘environmental’ to other sets of competing genes, and
can disturb the equilibrium among them in the very same way as a change
in the pool-external environment may do.
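The second of these scenarios, in which a long-established minority variant comes under selection only after the environment changes, can be sketched as follows. The sketch rests on simplifying assumptions that are not part of the argument above: the stable equilibrium is approximated by equal replication rates, the environmental change is modelled as a sudden shift in those rates, and all names (step, run, the phase schedule) are hypothetical.

```python
# Illustrative sketch only: function names, rates and phase lengths are
# hypothetical; equilibrium is approximated by equal replication rates.

def step(p, w_variant, w_competitor):
    """Frequency of the variant after one generation of differential replication."""
    mean = p * w_variant + (1 - p) * w_competitor
    return p * w_variant / mean


def run(p0, schedule):
    """schedule is a list of (generations, w_variant, w_competitor) phases,
    each phase standing for one set of environmental conditions."""
    history = [p0]
    for generations, w_variant, w_competitor in schedule:
        for _ in range(generations):
            history.append(step(history[-1], w_variant, w_competitor))
    return history


# Phase 1: constant environment, the minority variant holds steady at 10%.
# Phase 2: the environment changes and the variant now replicates 10% better.
history = run(0.1, [(50, 1.0, 1.0), (200, 1.1, 1.0)])
print(round(history[50], 3), round(history[-1], 3))
```

The third scenario could be modelled in the same way by letting the rates of one pair of competitors depend on the current frequencies within another pair.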
Whatever the actual causes of a particular redistribution of genes within
a population may be, it is clear that change is always played out between
competing gene variants in heterogeneous populations, and that essen-
tialism has no place in evolution at all. When it comes to understand-
ing particular changes within particular populations, however, one will
have to take two alternative possibilities into account, which will have
different implications for one’s research strategy. If a better replicator
emerges through a copying mistake and comes to out-replicate its estab-
lished competitors under constant environmental conditions, one will
have to focus on the properties of the new replicator and try to discover
why they rewarded it with more offspring than its competitors could pro-
duce. If one has found that answer one has, in fact, explained the change.
If a change is prompted by a change in environmental conditions, on
the other hand, one may still ask oneself which of the properties of the
successful variant was responsible for its greater success at replicating,
but this will only lead to a partial explanation of the change. The other
part will lie in the reasons for the environmental change that triggered
the genetic one. Sometimes, this environmental change may itself have
an evolutionary explanation, as when predators and prey enter into evo-
lutionary arms-races and evolutionary changes in the former trigger evo-
lutionary responses in the latter, or vice versa. Sometimes, however, the
causes of an environmental change may lie outside the field of biology
altogether, as in the case of the meteor which – allegedly – caused the
extinction of the dinosaurs and paved the way for the mammalian era.
Now, it is obvious that biologists should be more interested in such
changes where environmental conditions are – for practical purposes –
more or less constant and most of the relevant action takes place within
the gene-pool of either a single species or a select set of species within
a defined ecosystem. After all, environmental changes may have
non-biological causes. Therefore, catastrophic ‘external’ events like meteor
impacts or climatic changes are often regarded as mere historical accidents,
contingencies which do not themselves call for explanations, at
least not for biologically interesting ones. Also, it seems to be the gen-
eral opinion that environmental catastrophes have been relatively rare
and separated by long periods in which populations of replicator teams
(i.e. genomes) were left to play it out among themselves, so to speak,
and to become gradually better and better adapted to conditions which –
from their point of view – did not change much. Explanations of the
complex properties of biological species typically highlight the fact that
they have resulted from a long series of small evolutionary steps in which
successive generations of better adapted replicator teams ousted their less
fit predecessors, while the potential impacts of environmental conditions
are typically relegated to the notorious ceteris paribus condition. Thus,
Richard Dawkins’ provocative answer to the question ‘Why are people?’
is ‘because genes for making people have managed to replicate before
disintegrating’ (1989: chapter 1), and not ‘because a meteor hit the earth
some 66 million years ago’. Clearly, the latter answer would be as justified
as the first, the crucial difference being not its plausibility but the fact that
it does not exactly invite further research. It boils down, in a way, to say-
ing that we are essentially here because of a coincidence, which may be
true but fails to stimulate further enquiry. It is important to remember,
though, that Darwinian accounts of evolution which focus on competi-
tions among rivalling replicator teams within populations and/or sets of
such tend to give us only half of the story, even if it happens to be the
more interesting one.

5 Generalising Darwinism

To the extent that languages in time share certain properties with other
classes of systems simply by virtue of being historical, there is no need
to invoke any ‘special’ local properties in order to characterize their
(Roger Lass 1997: 390)

5.1 The temptations of metaphorical transfer

After having shown what a powerful framework Darwinian Evolutionary
Theory is, let us see whether its basic concepts and its argumentative core
can be generalised and/or transferred to the study of language.
Crucially, this is not the same as metaphorically importing biological
terms and concepts into linguistics. That can be, and has been, repeatedly done in the past (see section ). Not only can one speak of ‘language
families’ and ‘daughter languages’, or chart relationships among
languages in terms of ‘family trees’: as soon as one starts to think about
it, one will notice many further apparent similarities between the realm
of language and the realm of life. Thus, it is easy to come up with lists
like the following.
• Like ‘organisms’, languages seem to be complex and functional, so
that it seems as if they were ‘adapted’ to the purposes they serve their
speakers.
• Like species, languages can ‘die out’, and we speak of endangered languages
as we speak of endangered species (see Fill 1993).
• In the same way as the properties of organisms contain information
about the environments in which they live, languages seem to represent
those aspects of the world that speakers think and talk about.
• Just as organisms can be categorised into species and populations,
whose individual members are all different from one another, so every
speaker speaks an idiolect of her own, which will always differ slightly
from those spoken by all others in the community.
• Just as only co-speciates can have offspring with one another, only
speakers of ‘the same language’ can communicate.
• Just as living organisms can be described on the genotypic and on the
phenotypic levels, languages can be described on the level of speakers’
competences and on the level of texts. Texts seem to express competences
just as phenotypes express genotypes.
• By this rationale, verbal behaviour, or performance, would be the linguistic
counterpart to embryological development.
• Like development, performance is always strongly co-determined by
external, or environmental contingencies. It is as difficult to predict,
from the properties of a person’s linguistic competence, what that person
will say on any particular occasion, as it is to foresee what an adult
organism will be like, if one knows only the properties of its genome.
• As far as the reproduction of both organisms and languages is concerned,
neither of the two can bypass the phenotypic level. Just as
genomes need to make themselves organisms to reach the next generation,
so languages depend on being spoken if they want to make it
into the brains of new speakers.
Analogies like these are abundant (see for instance Stevick 1963, or
Lass forthcoming) but can only be carried to a certain point. Sooner
or later one will inevitably reach the limits of analogical transfer. Likely
mismatches are easy to think of:
• DNA replication seems to be a rather mechanical process, in which individual
sequences are copied not only faithfully but also very directly.
The replication of competences must be much more indirect and multi-faceted.
No immediate links between the competences of any two speakers
are conceivable, since brains have no direct access to one another.
• Similarly, straightforward competence lineages do not seem to exist
either. While in asexually reproducing species every organism has
exactly one parent, and in sexually reproducing species not more than
two, speakers normally acquire their languages from a large number
of individuals. The term ‘mother tongue’ is absolutely misleading in
this regard, and it is impossible, for all practical purposes, to determine
which individuals should count as a speaker’s linguistic ancestors.
• Furthermore, the genomes of organisms do not change after conception.
Basically, organisms keep their genomes during their whole lives.
A person’s language, by comparison, does not ever seem to stop changing
at all. Although our competences do assume a certain stability once
we reach the age of ten, roughly speaking, we can learn new words,
idioms, grammatical constructions and pronunciations even after that.
• Another fundamental difference between biological and linguistic evolution
is that the former is blind to the characteristics an organism
acquires during its lifetime, while languages by definition need to be
acquired in order to exist.
• Biological evolution is blind and works through essentially random and
purposeless mutations. Speakers, on the other hand, are conscious.
They appear to know what they need language for. Consequently, they
can use it – and may thereby change it – both creatively and purposefully.
Again, the list is far from exhaustive, and merely supposed to prevent us
from getting off on the wrong foot. As indicated, the question of whether
the basic concepts and the argumentative core of Darwinian Evolutionary
Theory can be generalised and applied to other domains than biological
life must not be confused with the fact that there exist certain vague
similarities between life and language. While those similarities are clearly
suggestive, they are of little explanatory value. They invite analogical
transfer, but that is about it.
What we need to determine first is if the elements and causal relation-
ships that seem to steer biological evolution are specific to biological life,
or if they are merely domain-specific instantiations of more general types
of phenomena and relations. The question is therefore whether evolution
of the Darwinian type can occur independently of the DNA substrate in
which it happens to be realised in the case of biological life, and under
what conditions we may expect to find it. Only then may we ask if such
conditions obtain in the domain of language as well.

5.2 ‘Complex Adaptive Systems’ and ‘Universal Darwinism’

As has already been suggested (see page 57) there seems to be a grow-
ing feeling within the wider scientific community that biological life may
be just one among many systems which are capable of evolving in a
Darwinian way, that is, by exploiting accidentally provided variation
among their constituents and by ‘selecting’, in response to environmental
pressures, those which make the systems as a whole better adapted and
more stable in the conditions in which they exist.1 Such systems, it is
believed, may include human cognition, vertebrate immune systems, scientific
communities, cultures, economies and others. As Gary Cziko,
a recent proponent of what has come to be called ‘Universal Darwinism’,
puts it:

1 For essays and studies on that topic see for instance Boyd/Richerson (1985), Cavalli-
Sforza/Feldman (1973, 1981), Cloak (1973, 1975), Dunbar/Knight/Power (eds.) (1999),
Gould (1982), Holland (1975), Hull (1988), Hurford/Kirby (1997), Lumsden/Wilson
(1981), Maynard-Smith (1989, 1996), McPeek et al. (eds.) (forthcoming), Plotkin (ed.)
biological evolution is just one of many instances of cumulative blind variation and
selection leading to the adaptation of one system to another. So although scientific
theories, cultural practices, and genes may exist in very different forms and employ
distinct modes of variation, selection, and replication, these and other superficial
differences have in themselves little bearing on the argument that both thought
and science make progress through a process of cumulative blind variation and
hindsighted selection. [. . .] Biological evolution, insofar as it leads to increases
in adapted complexity, is a selectionist process. But not all selectionist processes
have to mimic adaptive organic evolution in all of its biological details. (Cziko
2000: 287)

A less biologically biased term for systems with ‘Darwinian’ characteristics
is ‘Complex Adaptive Systems’. It is favoured and promoted by a
growing group of scientists from various disciplines associated with the
Santa Fe Institute, a research institution dedicated to the study of ‘com-
plexity, complex systems, and particularly complex adaptive systems’.
How then is a ‘complex adaptive system’ (henceforth CAS) defined?
From the point-of-view of its overall behaviour it is characterised by an
ability to ‘evolve’, to ‘learn’, or, as the name suggests, to ‘adapt to its
environment’. In response to specific environmental influences, it appears
capable of adapting its structure so as to become more stable under those
influences. As far as the internal organisation of a CAS is concerned, it
is assumed that the ‘learning behaviour’ which such a system displays
on the macro-level is not governed by a central agent (such as the ‘self’
in the case of human cognitive development and learning, or ‘God’ in
the case of life on earth) but emerges in complex ways from massively
parallel activities and the interactions of many simpler constituents, or
agents (neurones in learning, or genes in biological evolution). In some
cases, as in living organisms and their genomes, those constituents are
As the research programme of the Santa Fe Institute requires, its
research interests are necessarily and pronouncedly interdisciplinary and
cover a wide range of phenomena. The following list (from Gell-Mann
1992) conveys an idea of the diversity of subjects being investigated at
the institution:

1. prebiotic chemical evolution, including the chemical processes that gave rise
to terrestrial life around four billion years ago, others that must have given
rise to various life-like assemblages elsewhere in the universe, and related
chemical processes that can be studied in the laboratory, with the potential
for developing products of great utility;
2. biological evolution, leading through mutation and selection to the enormous
variety of life forms on earth and also to the existence and evolution of ecosystems;
3. the behavior of vertebrate immune systems, in which specialized cells undergo
mutation at a very rapid rate accompanied by selection processes that facilitate
attacks on invaders of the body;
4. individual learning and thinking in animals, including human beings;
5. human cultural evolution, in which information is transmitted between indi-
viduals and succeeding generations, so that the whole society evolves [. . .];
6. the global economy as a complex, evolving system [. . .]
7. the programming of computers to evolve, by mutation and selection, new strate-
gies that no human has designed, for example for playing games.
(Gell-Mann 1992: 8f., my italics, NR)2

As indicated above, the research programme is based on the idea that
complex adaptive systems, which evolve and ‘learn’, are not merely
vaguely similar in terms of macro-structure and overall behaviour but
may share characteristic qualities on the level of their micro-constituents
and the ways in which they interact. Therefore, both bottom-up and top-
down research strategies are pursued in parallel. On the one hand, it is
asked how the qualities which Complex Adaptive Systems have in com-
mon in terms of their macro-organisation and behaviour can be defined
in terms as general and abstract as possible, and on the other, one attempts
to find what elements are eligible to become constituents of such sys-
tems and what relations have to obtain among them so that they come to
organise into functionally diversified higher-level assemblies, and even-
tually come to display, in their totality, the expected macro-level ability
to ‘evolve’ and/or ‘learn’.
The research programme is still young, and several proposals have
been made by scientists associated with the Santa Fe Institute concern-
ing the macro-level characterisation of CASs in terms of their organisa-
tion, operation and ‘behaviour’ (e.g. Gell-Mann 1992, 1994, and 1995,
Holland 1995, Kauffman 1995). In the following, just one of them will
be introduced, namely the version proposed by Murray Gell-Mann. For
the purposes of the present discussion it can count as representative of
most others.

5.2.1 Macro-level properties of Complex Adaptive Systems

According to Gell-Mann, a CAS can be understood as a schema which
contains compressed information about its environment. It can thus be

2 Note the frequency with which terms like evolve, evolution, mutation, and selection occur in
all but one paragraph of Murray Gell-Mann’s overview. This indicates the paradigmatic
character which biological evolution has assumed within the interdisciplinary research
programme, but should not be taken to imply that all other domains are seen, metaphor-
ically, through biological spectacles.
interpreted as a ‘theory’, or ‘model’ of its environment. A CAS acquires
the status of a schema, theory or model through the way in which it
interacts with its environment. Such interaction takes place when a system
‘unfolds’ to yield what may be interpreted as the system’s ‘phenotypic
behaviour’ and/or the ‘predictions it makes’. Crucially, ‘behaviour’ in
this sense is not to be confused with what we normally understand by
the term, that is, the actions of individual organisms or groups. Instead,
it carries a more general and abstract meaning, similar to ‘consequences
or effects’. As it is understood here, ‘behaviour’ would include both the
embryological development of a body – that is, the ‘behaviour’ or ‘effects’
produced by an unfolding genome – and the production of a text – as
the ‘behaviour’ or ‘effects’ produced by the unfolding of a ‘linguistic
The effects which an unfolding system has on its environment
may sometimes feed back on the system itself. In particular, some
environmental responses may reinforce or stabilise the current state of
the system while others may destabilise it. Thus, such feedback will have
the overall effect of ‘favouring’ some system states while ‘disfavouring’
others. In Gell-Mann’s words, ‘The outcome of the unfolding leads to events
in the real world that affect the survival of the schema or of related schemata’
(1992: 11, Gell-Mann’s italics). The model can be graphically repre-
sented as in the figure opposite.
In such a configuration the apparent learning behaviour of the system
comes about because the effects of a schema’s unfolding under specified
conditions are fed back to it and ‘select’ among various competing system
states or rivalling schemata. At any time, the state of such a system will
reflect its past experiences. Also, its state can be read as incorporating
predictions about the feedback which its behavioural ‘unfolding’ is likely
to incur. As time progresses, and quite automatically, such a complex
adaptive system will increasingly assume states which have been (and are
therefore likely to be once again) stabilised by the environmental feedback
incurred by their unfolding, rather than states which have not. By this
mechanism, a complex adaptive system is likely to assume a state
which is maximally stable under those environmental conditions in which
it normally finds itself. It can then be said to ‘adapt’ to those aspects of
its environment to which it is sensitive.
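The feedback loop just described can be made concrete with a small simulation. The following Python sketch is only an illustration under assumptions that are not part of Gell-Mann's account: the state names, the probabilities with which the environment reinforces each state, the multiplicative update rule and the rate are all hypothetical choices made for the sake of the example.

```python
import random


def run_cas(reinforcement_prob, steps=3000, rate=0.1, seed=0):
    """Simulate rival system states competing under environmental feedback.

    reinforcement_prob maps each (hypothetical) state to the probability
    that the environment responds positively when that state unfolds.
    Returns the final relative stability of every state.
    """
    rng = random.Random(seed)
    # All rival states start out equally stable.
    stability = {state: 1.0 for state in reinforcement_prob}
    for _ in range(steps):
        states = list(stability)
        weights = [stability[s] for s in states]
        # The more stable a state is, the more likely it is to unfold.
        unfolded = rng.choices(states, weights=weights, k=1)[0]
        # The environment either reinforces or destabilises that state.
        if rng.random() < reinforcement_prob[unfolded]:
            stability[unfolded] *= 1 + rate
        else:
            stability[unfolded] *= 1 - rate
        # Renormalise so that the values remain relative stabilities.
        total = sum(stability.values())
        stability = {s: v / total for s, v in stability.items()}
    return stability


if __name__ == "__main__":
    result = run_cas({"state_1": 0.8, "state_2": 0.4, "state_3": 0.2})
    for state, value in sorted(result.items()):
        print(f"{state}: {value:.3f}")
```

Under these assumptions the state that is reinforced most often comes to dominate, so that the final distribution of stabilities can be read as a crude record of which unfoldings the environment has favoured.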
The states of a complex adaptive system can be interpreted as encoding
information about the environment, but only to the degree that there is a
causal relation between environmental properties and the states which a
system preferably assumes. Since no schema will be sensitive to all aspects
of the environment in which it unfolds, the information it encodes will
necessarily be selective. Therefore, it can be regarded as a ‘schematic
representation’, a ‘model’, or a ‘theory’ of its environment which, as
all theories necessarily do, categorises, classifies and abstracts from the
potentially available information.

Figure 5.1 Operation of a Complex Adaptive System (after Gell-Mann 1992: 11).

One of the important qualities of descriptions such as Gell-Mann’s is
that they are indeed very abstract and general, and can therefore easily
be applied to specific real-world systems from many different domains.
Gell-Mann himself gives the following examples.

[. . .] in biological evolution, lessons from regularities of past experience are
condensed in the genetic message that each organism carries in its DNA. The
DNA of a baby organism unfolds, in the presence of the environment, in the
sense that development from embryo to adult takes place, determined both by
the DNA and by the fresh information from the environment.
In the scientific enterprise, the compressed schema is a theory. In a given set
of circumstances (described, for example, by what are called, in physics or math-
ematics, ‘boundary conditions’), the theory unfolds, as the result of calculation,
to give predictions that can be compared with experiment.
[. . .]
In the case of a scientific theory, the feedback comes from the comparison of
the results of calculation with experiment. In biological evolution, the success in
producing offspring of an organism with particular DNA (or the relatives of that
organism) affects the survival of that or a similar pattern of DNA.
For individual learning and thinking, schemata can be ideas, including creative
ideas, or patterns of thought that are ways of interpreting the world. The results
of the behavior generated by those ideas or patterns in particular situations can
influence how those ideas or patterns fare in competition with others. (1992: 10f.)
Since our focus is on biological and linguistic systems, let us try to find
out in some more detail, albeit tentatively at this stage, how Gell-Mann’s
schematic diagram can be applied to life and to languages. All we have to
do, it appears, is to specify the elements that fill the individual slots.

5.2.2 Life and language seen as Complex Adaptive Systems

Species as Complex Adaptive Systems

For biological life, the model may work roughly as follows. Let us say that the
adaptive system is a species, that is, a pool of genomes, or replicator
teams.³ A genome can then be regarded as a ‘schema’ which ‘predicts’
that the phenotype for which it codes is likely to be viable and able to
reproduce. Under specific conditions a genome will unfold and produce
its characteristic ‘behaviour’, that is, the actual construction of an organ-
ism with specific physiological properties and behavioural propensities.
In this way, the predictions inherent to the genome are ‘tested’. If a test is
successful, that is, if the organism survives until it manages to reproduce
(and thus to replicate the constituents of the genome), the responsible
genome type will remain in the pool and the number of its ‘tokens’ or
physical instantiations may even increase. If a test is unsuccessful, the
number of its instantiations will decrease by the same rationale. If this
happens repeatedly, the particular genome type will disappear from the
pool while other, more adequate types will increase in terms of their
Having been selected through environmental feedback, the gene-pool
of any species, as well as any individual genome, can be interpreted as a
schematic model of the species’ habitat. For example, the genes which
code for a camel’s hump point to the aridity of deserts, the genes behind
the shape of fish point to the physical properties of water, the agility
of rabbits indicates predators, the ultra-violet patterns on the flowers
of plants indicate the visual system of insects, and so on. Of course,
apart from being difficult to read for outside observers, the genetically
encoded information is selective, biased and schematic in so far as only
those aspects in a species’ habitat are ‘recorded’ in its gene-pool which are
relevant to the species’ survival and reproductive success. No gene-pool
can contain a complete picture of the world within which it exists.

3 In order to keep things simple, let us forget the particularities of sexual reproduction and
imagine an a-sexually reproducing species.

Figure 5.2 Operation of biological species, viewed as Complex Adaptive Systems.

Note that, strictly speaking, there are two interpretations of the way in
which a genome’s environment feeds back on it. On the one hand, there
is feedback on the very instantiations of an unfolding genome that exist
in the cells of the actual body produced by it. This kind of feedback can
be measured, first and foremost, in terms of the life-span of such a body.
Simply speaking, the environment can kill a body (together with all copies
of its genome) either sooner or later. It cannot, however, alter the genome
itself. On the other hand, there is environmental feedback on the
overall set of genome instantiations that exist within a whole population,
or species. This can be measured not merely in terms of the time for which
they live, but also in terms of the number of instantiations that exist within
a population, or species as a whole, since the environment may ‘grant’ an
organism anything from zero to a large number of offspring.
Of course, the two kinds of feedback are related and their effects cannot
be neatly disentangled from each other, as the number of offspring which
an organism may have will often correlate with the time for which it is
alive, for example. But in extreme cases the difference does matter. A
genome might code for an extremely long-lived, but infertile organism,
for example. In that case the environmental feedback on all its somatic
instantiations would be positive, while the feedback on the total number
of instantiations within the population or species would clearly be negative.
On an evolutionary time scale, the genome type will be extremely short-lived,
no matter for how long the actual organism for which it codes may live.
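The difference between the two kinds of feedback can be illustrated numerically. The Python sketch below is a deliberately crude model and rests on assumptions of its own: the genome type names, the yearly death probability of 1/lifespan, the birth rates, the carrying capacity and the use of expected values instead of individual organisms are all hypothetical.

```python
def simulate_pool(pool, lifespan, births_per_year, years, capacity=1000.0):
    """Track expected numbers of genome instantiations in a population.

    pool maps each (hypothetical) genome type to its current number of
    organisms. Every year an organism dies with probability 1/lifespan and
    leaves births_per_year copies of its genome (asexual reproduction).
    If the population exceeds the capacity, all types shrink proportionally.
    """
    pool = dict(pool)  # work on a copy so the caller's data stay intact
    for _ in range(years):
        for genome in pool:
            survivors = pool[genome] * (1.0 - 1.0 / lifespan[genome])
            offspring = pool[genome] * births_per_year[genome]
            pool[genome] = survivors + offspring
        total = sum(pool.values())
        if total > capacity:
            pool = {g: n * capacity / total for g, n in pool.items()}
    return pool


if __name__ == "__main__":
    lifespan = {"long_lived_sterile": 100.0, "short_lived_fertile": 2.0}
    births = {"long_lived_sterile": 0.0, "short_lived_fertile": 1.0}
    start = {"long_lived_sterile": 500.0, "short_lived_fertile": 500.0}
    final = simulate_pool(start, lifespan, births, years=50)
    for genome, count in final.items():
        print(f"{genome}: {count:.6f}")
```

In this toy setting each long-lived organism fares well individually, yet its genome type all but vanishes from the pool, which is the contrast between intra-somatic and population-level feedback drawn above.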
Now, since the environmental feedback which intra-somatic genome
instantiations receive has no variety to select from (the genome copies in
body cells are, for all practical purposes, of the same type), no ‘evolution’
or ‘learning’ can take place on the level of an individual body’s genome.⁴
Instead, the systems that ‘evolve’ or ‘learn’ in the case of biological evo-
lution will be populations of genes, and it is the environmental feedback
on such populations rather than that on the genes of individuals which
drives this. What adapts to the environment is the distribution of genome
types within the gene-pool that defines biological species, and it is on this
level that biological evolution amounts to a learning process.

Languages as Complex Adaptive Systems

Let us now turn to language. There are two interpretations of Gell-
Mann’s model which both make equal sense and which appear to be
quite different from each other – at least at first. On the one hand, it
is possible to regard an individual speaker’s competence as a complex
adaptive system which evolves, or learns, as the speaker acquires, fine-
tunes or extends his/her linguistic abilities. On the other, it is possible
to regard a whole population of competences as such a system. In this
case, the ‘evolution’ or the ‘learning’ would take place as the language is
transmitted among successive generations of speakers. Let us look at the
two interpretations in turn.

4 This does not mean the development of an individual body does not involve ‘learning’,
‘adaptation’, or ‘evolution’ in other respects. But in such cases, the ‘schemata’ among
which environmental feedback can come to select will not be genetic. Instead, they will
involve populations of antibodies, as in the case of immune system evolution (see Clark
1995), or patterns of synaptic connections as in the development of the central nervous
system and cognition (see Edelman 1987, Plotkin 1994).

Language acquisition

Take individual language acquisition first. It is clearly plausible to regard
the state of a human linguistic competence as a schema, existing in human
minds and implemented, ultimately, in terms of neuronal configurations.
This is perfectly consistent with Chomsky’s definition of language as a
system of knowledge instantiated as a brain-state. Now, if language is
indeed a CAS in Gell-Mann’s sense, then language acquisition must pro-
ceed within a single speaker’s mind roughly like this:
When exposed to specific environmental data – either input from
sensory organs or other mental modules whose effects may amount to
motivating a speaker to say something, or actual textual input – the
schema which a specific competence state represents unfolds, and pro-
duces either textual output or an ‘interpretation’ including, and/or subse-
quently resulting in, new behaviour. Next, the consequences of such pro-
ductive or interpretative behaviour feed back to the speaker’s mind/brain
some kind of ‘evaluation’, which will also affect that part of the mind/brain
in which the speaker’s competence ‘resides’. If the feedback is positive,
it will reinforce the original competence state; if it is negative, it will
destabilise it.
Every single competence state which happens to be the basis of an
unfolding will be rivalled, mind-internally, by a number of other, prob-
ably rather similar, states. The relative stability, or strength, of any par-
ticular state will correspond to the probability with which it may unfold
in behaviour. In such an interpretation, the ‘initial state’ of a linguistic
competence has probably to be conceived of as a set of rivalling schemata
with similar degrees of (relatively low) stability. Which of them actually
gets expressed will be more or less a matter of chance, resulting in what
may appear from the outside as relatively unsophisticated and purpose-
less behaviour. From the very beginning, however, the rivalling schemata
will come under ‘selection pressure’ from the feedback incurred by indi-
vidual unfoldings (both ‘productive’ and ‘interpretative’). In response
to such pressures, the population of rivalling schemata will assume a
more complex organisation, until the system reaches a comparably sta-
ble state. In such a ‘mature’ state a sophisticatedly organised set of
schemata is stabilised, whose ‘expressions’, ‘unfoldings’, or ‘predictions’
correspond, more often than not, to meaningful linguistic utterances or
The environment to which rivalling competence states are sensitive and
which selects among them will include reactions of other speakers in the
widest sense, the articulatory hardware of a speaker and the costs, in terms
of energy, its activation incurs, information fed to cognition through the
sensori-motor system, with which the interpretation of linguistic input
may, or may not, ‘match’, and so on. Evidently, minds will have ways
of assessing the benefits gained by individual speech acts as weighted
against the costs incurred by them, for instance in terms of processing
and/or articulatory energy.

Figure 5.3 Language acquisition, viewed as a Complex Adaptive System.

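How rival competence states might be selected by weighing benefits against costs can be sketched as follows. The Python code is an illustration only: the three state names, their benefit and cost values, the exponential update rule and the rate are hypothetical assumptions and make no claim about how minds actually carry out such an assessment.

```python
import math


def acquire(variants, exchanges=200, rate=0.1):
    """Let rival competence states compete on net communicative payoff.

    variants maps each (hypothetical) state to a (benefit, cost) pair.
    All states start with the same low stability; after every exchange a
    state's stability grows or shrinks with its benefit minus its cost.
    Returns the probability with which each state would unfold at the end.
    """
    stability = {name: 1.0 / len(variants) for name in variants}
    for _ in range(exchanges):
        for name, (benefit, cost) in variants.items():
            stability[name] *= math.exp(rate * (benefit - cost))
        # Renormalise so that the values can be read as probabilities.
        total = sum(stability.values())
        stability = {name: value / total for name, value in stability.items()}
    return stability


if __name__ == "__main__":
    rivals = {
        "careful": (1.0, 0.5),   # always understood, but costly to produce
        "reduced": (0.95, 0.3),  # nearly always understood, cheaper
        "mumbled": (0.3, 0.1),   # cheapest, but rarely understood
    }
    for name, prob in acquire(rivals).items():
        print(f"{name}: {prob:.3f}")
```

With these made-up values the ‘reduced’ state ends up as the most stable one, because it offers the best balance of benefit and cost rather than the highest benefit or the lowest cost taken alone.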
Language acquisition can thus be regarded as an evolutionary pro-
cess in which a specific part of a person’s mind/brain (Chomsky’s UG,
Pinker’s ‘language organ’) ‘adapts’ to aspects of its host’s body, to the
conceptual content of other parts of that host’s mind/brain, as well as
to the external environment of its host, in particular to the communica-
tive behaviour of the social group within which its host is embedded.
Consequently, a person’s linguistic competence evolves into a schematic
representation of these aspects. Figure 5.3 above schematises that

Language change

However, language acquisition is not the only aspect in regard to which
language can be interpreted as a CAS. As mentioned, it is not only the
development of an individual speaker’s linguistic competence which dis-
plays learning behaviour, but a similar thing seems to be true of the lan-
guages of complete speech communities. They also change over time and
seem to adapt themselves to the changing needs of their speakers. Yet, the
idea that languages in the super-individual sense might indeed be Com-
plex Adaptive Systems in a technical sense raises a couple of problems.
First, the statement that language change is functional, purposeful or
adaptive is impressionistic. When one looks at actual historical changes
that languages have gone through it becomes doubtful whether it is really
adequate, and the historical linguistic community has seen heated debates
over the issue.⁵ Clearly, an ‘adult competence’ is usually more functional
than the competence of a new-born. But is Modern English more func-
tional than Old English was? Do languages improve over time? Put this
way, the question is probably not even sufficiently well defined to war-
rant a meaningful answer. If one interprets it naively, there seems to be
some evidence to the contrary. For example, take the history of Germanic
short /a/ as reflected in Modern English man, what or that. In the lineage
leading to Modern English, the sound was first (i.e. in pre-Old English or
Anglo-Frisian times) ‘brightened’ to /æ/ in some contexts and darkened
to /ɔ/ in others. At the beginning of the Middle English period, however,
it again shows up as /a/ in both contexts. Finally, the pre-Old English
development seems to have repeated itself, so that the Modern English
counterparts of ME /a/ show up, once more, as either /æ/ or /ɒ/. The
following diagram charts the development.
(13) Germanic /a/
     > pre-Old English /æ/, /ɔ/ (mon)
     > Middle English /a/
     > Modern English /æ/ (that, man), /ɒ/ (what)

In such a development it is indeed hard to see any of the progress or
improvement that we normally associate with the terms ‘learning’ or
‘adaptation’, and cases such as this have repeatedly been adduced to
argue that language change is essentially arbitrary and unexplainable.
This holds in a similar fashion for other changes as well, such as the syntactic
change from Subject–Object–Verb order to Subject–Verb–Object
order, which the English language seems to have gone through. Again,
it is hard to understand why either ordering principle should be ‘better’
than the other one.

5 See, for example, Lass (1980, 1987b, 1997), Dressler (1985), Samuels (1987a, 1987b).

There are of course many changes which obviously represent, or can
be construed as, ‘improvements’ in some way or other. For example, the
unstressed, typically final and originally often morphologically functional
syllables of English words have been systematically reduced and
sometimes deleted over time. (Compare for instance OE twegen, dagum,
minre, ealle, lybbende, gegaderude from (2a) on page 11 with their Modern
English descendants two, days, my, all, living, gathered.) This develop-
ment clearly appears to make things easier for speakers, as it saves them
articulatory energy. Yet again, it is much more difficult to argue that this
‘improvement’ has made the language as a whole better, or more efficient.
The common view among linguists seems to be that with regard to their
global ‘functionality’ all languages are more or less equal (see for example
Dressler 1985: 265 and the references there). Thus, it appears
questionable whether language change can indeed be regarded
as ‘learning’ in the normal sense of the word. It might be only change,
after all.
While the question of whether language change represents ‘learning’
or is in any sense adaptive appears difficult to answer when one looks at
phonological, morphological or syntactic properties,⁶ the lexicon at
least provides clear evidence that this may indeed be the
case. After all, changes and additions to the vocabularies of languages
typically reflect prior and/or concomitant changes in the culture or the
environment of the involved speech communities. It is a commonplace
that no discovery is made, no technological innovation introduced, and
indeed no concept developed in any human community, without the lan-
guage of that community coming up with ways of communicating about
it. For instance, even a brief look at any computer magazine will turn
up items like router, e-learning, RAM, token ring, LAN and others, which
even fairly recent dictionaries do not yet contain. This shows that at least
in some sense languages are indeed able to keep track of their environment
by adapting to it, and thus justifies conceiving of languages in a
super-individual sense as complex adaptive systems as well.

6 The issue has been hotly contended within the linguistic community. Interesting con-
tributions are, for example, Lass (1980, 1987a, 1987b, 1997), Dressler (1985: 265–8),
Samuels (1987a and 1987b).

How then does Gell-Mann’s model apply to language change? As in
language acquisition, the place of the central schema will have to be
taken by any competence state existing within a speech community at
a given time. Also, the ways in which such competence states unfold
and get expressed will again be acts of communicative behaviour, texts
and interpretations. Things become more complicated, however, when
it comes to deciding where exactly on the level of a complete speech
community environmental feedback is supposed to be directed and by
what mechanics it should be assumed to exert its influences. This raises
a second and rather serious issue.
Recall that in the case of biological evolution it was relatively easy to
establish, at least in principle, how environmental responses to a genome’s
unfolding feed back on its intra-somatic instantiations on the one hand
and to its instantiations within the genome pool of a whole species on
the other. In the first case, the relevant factor is longevity, in the second
reproductive success, and the two types of feedback can be distinguished
fairly easily. Also, it is unproblematic to view an individual genome as
representative of other genomes in the population, so the offspring of
individual genomes of the same type can easily be added and the feedback
incurred by any single one of them in terms of reproductive success can
be straightforwardly translated into feedback incurred by larger sets of
genome types.
In the case of language, on the other hand, the relation between individ-
ual competence states and the population of states that make up a speech
community may be more complicated. Of course, we have argued above
that competence properties are replicated through communication and
language acquisition. Also, we have called languages replicating systems
and implied that they might therefore be evolving systems as well. But
now our first intuitions are put to a more rigorous test. Thus, the question
of how the feedback incurred by the unfolding of individual competence
states may affect the population of states within a whole speech commu-
nity represents a challenge which is more formidable than we might have
In order to see more clearly what the problem is, recall the basic account
we have given above (see section 3.4, pages 39ff.) of what is known as
the change of Middle English long /eː/ to Modern English long /iː/. We
described it as a development in which competences with a property {II} –
expressed as [iː] – first emerged and eventually spread at the cost of
competences that had the property {EE} – expressed as [eː] – instead of
{II}. In Gell-Mann’s model of complex adaptive systems, the properties
{EE} and {II} would figure as constituents of schemata represented by
competence states of individual speakers. In Middle English times, the
competence schemata within individual speakers’ minds would – during
language acquisition – develop towards states in which the property {EE}
was stably implemented. In Modern English times this has changed, so
that nowadays speakers’ competences develop towards states with the
property {II} instead. All this is fair enough. But what exactly are the
properties {II} and {EE}? As aspects of brain-states they must certainly
be neural configurations of some sort, but the way we have treated them
so far gives us no clue at all about the structural properties they might
have on the level on which they, as the materially implemented patterns
which they must necessarily represent, are best described.
So far, we have identified competence properties by their functions, or,
as we should say in the context of the present discussion, by the results
of their unfolding, that is, their behavioural, phenotypic expressions. We
have deliberately avoided the question of what the configurations that
eventually get expressed in utterances actually look like and what they
are supposed to be at all. For our present task this is not good enough,
however, because in order to determine the consequences (in terms of
feedback) which the unfolding of an individual competence state may
have on a larger set of competence states within a speech community, we
need to know whether or not schemata that produce similar expressions
will also be structurally similar.
The reason why this is crucial becomes obvious if one considers the
theoretical consequences of either of the two possibilities. Assume, first,
that the competences of all speakers who pronounce he, me, see, meet,
green, and the like with [iː]s are indeed identical in structural terms with
regard to the responsible property. If this is the case, we can make a
few straightforward and helpful claims. We may say, for example, that
every individual competence which evolves, during language acquisition,
towards a state in which this property is stably represented will thereby
increase the overall number of competence states in which the same prop-
erty is stably implemented as well. More importantly, we may also say that
when it produces input to language acquisition, the unfolding of com-
petences with specific properties will cause other competences to evolve
towards states in which they are characterised by the same properties.
Then, the consequences of competence unfolding will feed back not only
on the unfolding competence itself but also on other competences within
the population. If this is so, a population of competence states can be
regarded as a complex adaptive system in Gell-Mann’s sense.
If, on the other hand, this is not so, then it is, to put it mildly, much more
complicated to determine how the properties which a single competence
evolves during its maturation relate to the population of competences in a
speech community. If one assumes a large degree of liberty regarding the
possible ways in which competences may structure themselves in order to
come up with one and the same type of ‘expression’, then every compe-
tence may evolve a structure which is different from all the others while
achieving the same things. If described on the level of competence prop-
erties, all speech communities will at all times represent highly heteroge-
neous and unordered sets. Sampling them at different times will not yield
any information about trends, or changes in a specifiable direction. All
one would ever get is meaningless difference. This clearly precludes the
possibility of regarding speech communities as competence pools capable
of evolution, adaptation or learning.
To see how disastrous this would be for all attempts to understand lan-
guage change in evolutionary terms, attempt to project the scenario back
to the domain of biological life. There, it would correspond to a situation
in which a species was, say, characterised by the ability to fly. Instead
of inheriting a specific set of genes from its parents, however, each indi-
vidual member of a species would inherit a random mix of genes which
‘evolves’ during the ontogenetic development of an organism towards a
genetic configuration enabling its host organism to fly in a way that is sim-
ilar to the one practised by its co-speciates. If there are many pathways
towards acceptable flight (compare birds, bats and insects, for example),
the mature genomes of individuals within such a population or species
will bear little resemblance to each other. No species could evolve in
a Darwinian sense anymore, and historical changes could no longer be
meaningfully described on the level of species at all. All learning and
evolution would take place within individuals and the very existence of
species would beg a ‘social’ rather than a genetic explanation.
Of course, this scenario is absurd, but its very absurdity shows how
strongly the idea that languages, viewed as competence populations,
should be capable of adaptation and evolution depends on the assump-
tion that competence states which produce similar expressions should
also be structurally similar to one another. In other words, we simply
have to assume that this is indeed the case if we want to view languages
(as opposed to individual competences) as CASs in Gell-Mann’s sense at
all. There is no alternative.
Also, the assumption that competence states which generate similar
expressions should be structurally similar is stronger than it might appear
at first. After all, one might take a structural similarity among compe-
tences within a speech community for granted because it is common lin-
guistic practice to devise ‘grammars’ in the sense of competence models
and to regard them as representative of most, if not all speakers of a spe-
cific variety. So does this not mean that their competences must indeed
be similar to one another? Well, strictly speaking it does not, because the
grammatical models devised by linguists are typically functional models.
They are intended to emulate and to describe what real speakers’ competences
achieve, but they hardly ever claim to represent the material
structure and organisation of the mental modules in which real speakers’
competences are realised. Thus, just like actual competences, linguistic
grammar models may ‘predict’, generate or unfold into grammatical
utterances, but it cannot be taken for granted that each of their constituents
will necessarily have a counterpart in speakers’ minds. In order
to determine whether the language of a speech community can be viewed
as a CAS, however, we need linguistic competence models that are not
only functionally equivalent to specified states of actual speakers’ minds
but also structurally isomorphic to them, or simply speaking, empirically
interpretable. Of course, such psychologically realistic competence models
are necessary for any historical account which sees successive stages
of a language as causally related to each other. The attempt to apply the
theory of complex adaptive systems to languages does not create the problem,
but merely highlights it, which is actually a good thing. Therefore,
and for all its difficulty, it will have to be faced and discussed in greater
detail.

Figure 5.4 Language evolution and change, viewed as a Complex Adaptive System. The figure marks separate pathways for feedback from the competence of speaker A and from the competence of speaker B to the competence population.

Let us return to our present problem, however, on the tentative assump-
tion that competences which unfold in similar ways are themselves struc-
turally similar as well. In what way, then, will the environmental feedback
incurred by the unfolding of an individual competence state affect a pop-
ulation of such competence states? Of course, any stable competence
state will add to the number of similar states in the population and thus
strengthen the type. As a schema gets stabilised in the mind of a child
it will come to resemble those competence states to whose expressions
it was exposed during acquisition. To the degree to which adult competence
states keep their flexibility, similar transpersonal interactions can
be assumed to take place between adult competences as well. Thus, any
act of communicative behaviour will feed back not only on the competence
which it expresses but also on competences in the minds of other speakers.
A schematic representation of the interaction between two competence
states is given in figure 5.4, which should be interpreted like this: more
often than not, communication involves more than a single speaker (for
the sake of simplicity figure 5.4 shows just two). The minds of speakers
host sets of ‘rivalling’ competence states among which the environmen-
tal feedback on their unfoldings selects. Now, assume that in a specific
situation the competence state of a speaker A unfolds to produce an utter-
ance. As an ‘environmental’ result of this unfolding, articulatory energy
will be consumed, and information about this will feed back to speaker
A’s mind. At the same time, the produced utterance will trigger a con-
comitant unfolding of speaker B’s current competence state, resulting in
‘interpretative’ performance. Once again, this interpretation will incur
costs in terms of processing energy, and information about this will feed
back to speaker B’s mind, where it will be available for assessment. Finally,
the utterance and its interpretation will have further effects on the subse-
quent behaviour and cognitive states of both involved speakers, affecting
their actions, social relations, the ways in which they assess the situation,
and so on. These consequences will also feed back to the involved speak-
ers’ minds/brains and the competence states implemented in them. In
each speaker’s mind, they will be ‘evaluated’ separately, weighted against
the costs incurred by articulation and interpretation. Consequently, they
will either stabilise, or destabilise the current competence states. If both
participants’ minds/brains evaluate the overall results of the communica-
tive exchange that are fed back to them positively, both competence states
will be reinforced. If either of them does not, the state of his/her compe-
tence will be destabilised and rivalling states be strengthened. Thereby,
individual competences exert ‘selection pressures’ on each other.
If such exchanges take place repeatedly, the parallel ‘acquisitional’ (and
as we have said above also ‘evolutionary’) processes that take place within
each of the speakers’ minds will select competence states that incur pos-
itive7 feedback when they unfold in communication. Presumably, the
competences in each of the involved speakers’ minds/brains will evolve
towards stable states in which they resemble each other fairly well.
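The feedback loop just described can be made concrete in a small simulation. The following Python sketch is an illustration only and not a model proposed in this book. It assumes that a speaker's mind can be summarised as a list of strengths over the same rival competence states, that the benefit of unfolding a state equals the share of the partner's competence which matches it, and that articulatory and processing costs are fixed numbers. The names `update`, `converse` and `overlap`, the learning rate and all numerical values are hypothetical.

```python
# Illustrative sketch only: two speakers, each hosting rival competence
# states whose strengths sum to 1. All numbers are invented.

def update(own, partner, costs, rate=0.5):
    """Return new strengths for `own` after one exchange with `partner`.

    Assumption: the benefit of unfolding state i is the chance that the
    partner's competence matches it (partner[i]); articulation and
    processing costs are subtracted from that benefit.
    """
    feedback = [match - cost for match, cost in zip(partner, costs)]
    average = sum(w * f for w, f in zip(own, feedback))
    # States with above-average feedback are stabilised, the rest weakened.
    raw = [w * (1 + rate * (f - average)) for w, f in zip(own, feedback)]
    total = sum(raw)
    return [w / total for w in raw]


def converse(speaker_a, speaker_b, costs, rounds=200):
    """Let both competences feed back on each other repeatedly."""
    for _ in range(rounds):
        # Both updates use the strengths from before the exchange.
        speaker_a, speaker_b = (
            update(speaker_a, speaker_b, costs),
            update(speaker_b, speaker_a, costs),
        )
    return speaker_a, speaker_b


def overlap(p, q):
    """Similarity of two competences: 1.0 means identical strengths."""
    return sum(min(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    a_start, b_start = [0.6, 0.4], [0.3, 0.7]  # strengths of two rival states
    costs = [0.10, 0.05]                       # hypothetical cost per state
    a_end, b_end = converse(a_start, b_start, costs)
    print("similarity before:", overlap(a_start, b_start))
    print("similarity after: ", overlap(a_end, b_end))
```

The update is deterministic (it works with expected feedback rather than sampled utterances), which keeps the sketch short. With the invented starting values above, both speakers end up dominated by the second state, so that their competences resemble each other more closely after the exchanges than before.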
Of course, the scenario just given is simplified in many respects and
raises a couple of questions that cannot be dealt with exhaustively at this
point. Some of them nevertheless deserve to be mentioned.
First, it is obvious that the amount of influence which two competences
exert on each other when they interact via their unfolding will hardly ever
be distributed symmetrically. Obviously, for example, children’s compe-
tences are more likely to adapt to those of adults than vice versa.
Second, no speaker communicates with only one other speaker, but
with many. To the extent that there are differences among the compe-
tences of different speakers, every single one of them will be exposed
to conflicting selective pressures, and will evolve towards a state which
represents an acceptable compromise among them.
Third, what makes for an ‘acceptable compromise’ is a subtle question.
It is inconceivable, for instance, that a speaker’s competence state will

7 Of course, ‘positive’ must be understood in relative terms. What matters for the selection
of a particular competence state is that the feedback it gets is more positive than that of
rivalling states.

represent the simple ‘average’ of the competences to whose unfoldings it is
exposed. This would, in the long run, produce completely homogeneous
populations, which clearly do not exist. Instead, the evolving competences
of individual speakers are likely to respond to the influences of other
competences in variable ways, and to ‘distinguish’ those to which it pays to
adapt from those to which it does not. Such distinctions will reflect diverse
parameters both of a social and of a cognitive-functional nature. Thus,
competences will be more likely to adapt to selection pressures exerted by
competences whose ‘hosts’ are perceived as prestigious. They may also
be more likely to adapt to competences of group members as opposed
to outsiders. On the functional side, competence states whose unfoldings
incur lower articulatory and/or processing costs while producing effective
communication and/or beneficial cognitive states will not only be selected
for in speaker-internal competence evolution but will also ‘spread’ more
easily than others.
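The difference between adapting to a simple ‘average’ and adapting selectively can be illustrated with a few lines of Python. The sketch below is a toy illustration under loudly stated assumptions: prestige is reduced to a single number, group membership to a label, and the influence of one speaker on another to the product of prestige and an in-group bonus. None of these quantities, nor the names `Speaker`, `influence`, `plain_average` and `weighted_input`, come from the text; they are hypothetical.

```python
# Illustrative sketch only: how unequal influence might be weighted.
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    group: str        # hypothetical community label
    prestige: float   # hypothetical; higher means more influential
    usage: list       # strengths of rival competence states, summing to 1


def influence(listener, model, in_group_bonus=2.0):
    """Weight of `model` for `listener` (assumed: prestige times a bonus)."""
    bonus = in_group_bonus if model.group == listener.group else 1.0
    return model.prestige * bonus


def plain_average(models):
    """What a listener would adapt to if every voice counted equally."""
    n_states = len(models[0].usage)
    return [sum(m.usage[i] for m in models) / len(models)
            for i in range(n_states)]


def weighted_input(listener, models):
    """Selection pressure on the listener, weighted by influence."""
    weights = [influence(listener, m) for m in models]
    total = sum(weights)
    n_states = len(models[0].usage)
    return [sum(w * m.usage[i] for w, m in zip(weights, models)) / total
            for i in range(n_states)]


if __name__ == "__main__":
    child = Speaker("child", "north", 1.0, [0.5, 0.5])
    models = [
        Speaker("elder", "north", 3.0, [1.0, 0.0]),
        Speaker("visitor 1", "south", 1.0, [0.0, 1.0]),
        Speaker("visitor 2", "south", 1.0, [0.0, 1.0]),
        Speaker("visitor 3", "south", 1.0, [0.0, 1.0]),
    ]
    print("plain average: ", plain_average(models))
    print("weighted input:", weighted_input(child, models))
```

In this invented example the three visitors outnumber the elder, so a simple average favours their variant. Once prestige and group membership are taken into account, the pressure on the child's competence points the other way.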
Clearly, these issues are merely the tip of an iceberg of questions con-
cerning the problem of what will adapt to what and why in communi-
cation, language acquisition and change. We shall discuss some of them
below (see section 6.6). At this point, it suffices to develop an awareness
of the kind of problems which the view of languages as complex adaptive
systems raises.
Pending the resolution of problems like the ones just listed, we have
seen that – on the assumption that similar types of linguistic behaviour
indicate structurally similar competence states – it is indeed possible to
view languages, that is, pools of competence states, as complex adaptive
systems and the changes they go through as ‘evolutionary’. This is because
the unfolding of individual competence states feeds back not only on
the set of rivalling states that co-exist mind-internally but also on com-
parable state sets instantiated within the minds of other individuals.
Processes of competence unfolding exert ‘selection pressures’ on other
competences, in which they ‘reward’ similarity and ‘punish’ difference.
Thus, just as the results of the unfolding of an individual genome are fed
back to the species as a whole, where they affect the overall distribution of
genome types, so do the results of communicative acts feed back on the
speech community as a whole, where they affect the overall distribution
of competence states. That way, competence states whose behavioural
effects incur positive feedback will spread within a community and become
more numerous, while states which don’t will decrease in number.

Summary
We have seen that languages can be construed as specific subtypes of a
more general class of systems, which are adaptive and thus capable of
evolution and learning. From this, we may draw a couple of preliminary
conclusions.
First and most importantly, the apparent similarities between lan-
guages and biological life-forms are no coincidence, nor are they an
artefact of a biologically inspired, metaphorical way of looking at lan-
guages. Instead, they reflect deeper and more general design principles,
which may characterise the organisation of many systems within the uni-
verse. That languages and species represent different subtypes of a more
general class of systems explains why there should be no one-to-one cor-
respondence between biological evolution on the one hand, and the ways
in which languages evolve on the other. In fact, differences are likely
because biological and linguistic information are implemented in mate-
rially very different substrates, that is, complex DNA molecules on the
one hand, and complex neuronal configurations on the other. This basic
ontological difference will certainly affect the ways in which the two sys-
tems will interact with their environments, as well as the ways in which
they transmit and maintain their structures as the material substrates in
which they are implemented get replaced over time.
In both cases, the evolution, or learning behaviour of the respective
systems, is not steered by central agents or designers. Instead it emerges
from the interactions of many possibly rather simple entities. Thus, the
complex variety of biological species has been created through the interac-
tions of replicating DNA patterns, each of which exists for no other reason
than the fact that the properties and effects it has cause it to be replicated
before it disintegrates. In the case of language, neither the acquisition
nor the historical evolution of competences is ‘steered’ or ‘directed’ by
conscious speakers.
If, as Melanie Mitchell puts it,

the brain is a massively parallel system of billions of tiny processors (neurons)
with limited communication with one another and no central controller running
the show [. . . , and if i]nstead the collective actions of the neurons (and their
attendant synapses, neurotransmitters, etc.) lead to the states we call ‘thinking’,
‘consciousness’, and so on in ways we do not yet understand very well, (Mitchell

then also linguistic knowledge must emerge from the interactions of neu-
ronal configurations. Like biological replicators, then, these specific con-
figurations exist because their properties and effects stabilise them, and
cause them to replicate before they perish.
Of course, the conclusion that competences might not merely be pas-
sive objects which speakers acquire and use as tools for their communica-
tive and cognitive purposes strikes one as rather counterintuitive at first.
In a way, it casts people, or rather their brains and bodies, as environmental
conditions with which linguistic competence schemata interact
to effect their own survival even as their temporary hosts perish. This
clearly does not correspond well to how it feels to know, and use, a lan-
guage. Nor does it correspond well with how it feels to be somebody in
the first place. But this should not necessarily bother us. After all, we are
not trying to explain the conscious experience we have of ourselves but
the way in which languages seem to change over time. We should not let
common sense notions and biases concerning the former prevent us from
developing the perspectives which are most suitable for dealing with the
latter.
It is clearly possible that the notion we all have of ourselves as
autonomous, free-willed agents who are ‘in control’ of our bodies,
thoughts and actions might be like the idea of ‘languages as such’, which
we can ‘learn’, ‘have’ and ‘use’ and which have essential inter-subjective
properties. That is to say, it might be like many another concept we have
because it helps ‘us’ to face the challenges with which ‘we’ are usually con-
fronted. ‘There is a language that everybody in my community knows, so
it is safe for me to use it’, ‘An apple is an apple is an apple, and if it is one,
then I can eat it’, ‘Dogs can bite, so I had better be careful around them’,
‘I have a body, which can do things for me, so I’ll look after it.’ That the
ways in which these concepts carve up reality and the ways in which they
‘model’ it are on average useful to the organisms that host them, does
not imply that they are true in an empirical sense. It may be useful to
believe in ‘guardian angels’ or ‘genii’, who are out there to protect and
support us, for example. Such beliefs may indeed alleviate anxiety, make
our behaviour more rational, and thus turn into self-fulfilling prophecies.
In the field of medicine, ‘placebo’ effects of a similar sort are well known,
of course.
What this suggests is that our minds may indeed be blindly evolving
systems in which concepts and theories establish themselves because the
feedback incurred by their behavioural effects is evaluated as positive,
not because they are ‘true’. Thus, the concepts and theories which most
people have are likely to have been selected by the environmental chal-
lenges most people usually face. The explanation of linguistic change is
unlikely to be among them. Few minds are ever exposed to it. On those
which are, however, the problem might exert selection pressures of an
unusual kind, and thus favour perspectives, concepts and theories which,
under different, more ‘normal’ circumstances would never have stood a
chance of becoming stable. Conversely, it may come to destabilise notions
which otherwise have proved highly adaptive and which form part of the
inventory of useful biases which we call ‘common sense’. The concept
of speakers as ‘learners’, ‘owners’ and ‘users’ of language may be one of
them.
However, for enquiries into linguistic change the role of speakers has
always been somewhat problematic. Of course, they have to be involved
in it somehow, because language wouldn’t even exist without them. But
how exactly should the relation between speakers and ‘their’ languages be
imagined? If one regards them as essentially free-willed, autonomous and
unpredictable, and if one regards language as a set of conventions upon
which speakers, albeit unconsciously, agree, one is logically forced to fol-
low Saussure and draw the conclusion that there can be no truly causal
relationship between successive language stages (see also page 59). Free-
willed speakers are, after all, free to subscribe to linguistic conventions or
not, and may renegotiate them at will. As the historical evolution of lan-
guages shows, however, speakers seem to exploit this essential freedom to
a much smaller extent than one might expect. Also, coherent stories have
been told about linguistic change, and regularities formulated, without
making reference to speakers at all. For instance, the striking regularity
of sound changes does not go together very well with the idea that lin-
guistic conventions should be under the arbitrary control of free-willed
and unpredictable agents.
What one may alternatively infer from the view that languages are in
principle under the control of their speakers, is that when they change
them, they will have good reasons. Then, both the regularity and the direc-
tion of linguistic changes would reflect the rationality of speakers. This
view underlies many functional approaches to language change. When-
ever a particular change occurs, it is supposed to reflect an attempt by
speakers to make their languages more efficient or effective, or both. For
example, the raising that affected English vowels during the Great Vowel
Shift can be supposed to have occurred in order to facilitate perception
(see Donegan 1979). Conversely, the shortenings, lenitions and deletions
which have kept affecting English phonemes in unstressed syllables for
the last thousand years at least, may have occurred because they saved
speakers articulatory energy. Similar explanations have been proposed
for many phenomena from other linguistic domains.
There are several problems with explanations like these, which have
been amply discussed in the literature. One is that they imply that lan-
guages should always be getting ‘better’, which implies that they should
all eventually converge on ‘the optimal’ solution, which they do not seem
to do. However, it is fairly easy to counter this objection. First, improve-
ments may be only local, and incur ‘deteriorations’ in other respects, so
that the overall functionality of languages remains constant. Thus, the
improved perceptibility of [iː] as against [eː], which may have motivated
speakers of English to ‘implement’ the Great Vowel Shift, had to be
paid for by the increased articulatory energy which the articulation of
[iː] instead of [eː] required. By a similar rationale, the reduction and/or
deletion of final syllables may have saved articulatory energy, but often
obscured morphological distinctions, thus making the system of case and
number marking more difficult to process. Of course, if nothing is ever
gained by changing one’s language, then the question is why ‘rational’
speakers should take the trouble at all.
One possible answer would be to say that speakers do not realise the
futility of their endeavours. Instead, when they think they see a good way
of making their language more efficient or more effective, they jump at
the chance, to find out that they really haven’t gained anything when
it’s too late. This idea, though superficially plausible, immediately raises
another question, of course, namely why particular aspects of a language
should catch speakers’ eyes and impress them as worthy of improving?
If it is a matter of mere chance, this brings us back to the Saussurean
view that language change is inherently contingent and thus can neither
be explained in itself nor be adduced to explain the properties of lan-
guages as they are at specific times. If it is not to be a matter of mere
chance, but results instead from the properties of particular language
states themselves, then languages might be like Rubik’s cube, and their
speakers amateurish players, who notice states in which, say, a particular
plane is all but finished, apply a few twists and turns to the cube in
order to finish the plane, only to realise that they have made things worse
on the five others. In this case, however, it is difficult to maintain the
view that speakers really are ‘in control’ of their languages. To the degree
that their responses to specific language properties are predictable from
these properties themselves, the agency in linguistic change lies with those
properties rather than with the speakers. Such a view would cast speak-
ers as ‘victims’ of language change rather than agents, and is no different
in this respect from the view we are here advocating, namely that languages
are complex adaptive systems which ‘live’ in speakers and ‘use’ them for their
propagation and evolution.
Another way of countering the objection that functionalism wrongly
predicts languages will converge on an ‘optimal’ solution is to say that
languages have social functions apart from communicative ones, and
that a central social function is the distinction of outsiders from insid-
ers. This provides a ‘reason’ for keeping and making languages different
from each other. It may motivate speakers in different (sub-)communities
to keep their language communicatively inefficient in idiosyncratic ways,
merely to distinguish themselves from neighbouring communities. Such a
view is consistent with the observation that linguistic differences typically
coincide with boundaries between regionally or socially distinct groups of
speakers as well as with the fact that changes seem to spread along routes
which are defined by social networks (see e.g. Milroy 1980). However, it
has the same implications as the view that speakers change their languages
because they falsely believe they can improve them. First, it begs the ques-
tion by what features they should decide to distinguish themselves from
their neighbours. Partly, this will be constrained by what these neighbours
happen to be doing of course, but this transfers the problem merely to
the next level. Also, the question still remains: which of the possibilities
to ‘improve’ the language that is not taken by their neighbours are they
supposed to take? If their decision is purely arbitrary, then this puts an
end to all attempts at understanding language change; if it is not, but
instead motivated by properties of the language they happen to speak,
then speakers lose their autonomy to exactly that extent.
What this means, however, is that there can be no functionalist account
of linguistic evolution in which speakers are viewed as autonomous and
free-willed agents of change and languages merely as passive objects.
There is no way of avoiding this conclusion, and the fact that it becomes
so obvious if one views languages as complex adaptive systems may be
one of the greatest advantages of that perspective.8

8 Interestingly, the view that speakers might not be irreducible agents of linguistic change is
implicit in many functionalist accounts of language change anyway. To give an example,
phonological lenitions are hardly ever explicitly attributed to a decision on
the part of speakers to make pronunciation easier for themselves. Instead explanations are
given in terms of depersonalised concepts such as ‘inertia’, which is ultimately a physical
principle over which speakers can definitely have no control. The same holds true, of
course, for the relative ease with which different types of sounds are perceived. Again, the
perceptibility of acoustic signals is clearly not under speakers’ control, but represents a
genetically determined part of their physiological make-up. Similarly, even less obviously
mechanic aspects of language are typically couched in a-personal, often semiotic terms.
Thus, the iconicity or transparency of signs (such as complex word forms, that represent
complex conceptual configurations) is normally measured in inter-subjective terms. In
sum, one gets the impression that when regularities are expressed and explanations pro-
posed, no active role is attributed to speakers at all. The only occasion where ‘speakers’
are really adduced is when it comes to explaining, on a meta-theoretical level, why the
laws which are proposed to account for language properties, both synchronically and
diachronically are as weak as they typically are, that is, statistical rather than covering.
Human actions, it is then typically argued, will forever remain somewhat mysterious and
will never be fully understood.
One consequence of this attitude is that laws which yield wrong predictions are not
really to be regarded as problematic. Clearly, this may easily tempt one to be satisfied
with rather half-baked accounts. Indeed, hardly a conference goes by, for example, with-
out papers being given that contradict each other in differing degrees of blatancy. While
the obvious conclusion that both cannot be right is definitely drawn by many members
in the respective audiences, it tends to be quickly superseded by the melancholy and
normally unexpressed acknowledgement that probably both will be wrong, but as long
as we are having fun speculating about it, it does not really matter, because who will ever
really understand speakers? In sum, the assumed ultimate unaccountability of human
behaviour tends to lend itself as a good excuse to practise linguistics as the art of thinking
up elegant and aesthetically pleasing just-so stories.

In any event, if one gets accustomed to the idea that the properties
of languages might be described and to some extent understood without
deriving them from their supposed ‘owners’, that is, human selves, one
will see the parallels between biological and linguistic evolution much
more easily. Both in species and in languages, ‘evolution’ is effected
through environmental feedback on rivalling schemata, which ‘stabilises’
and/or ‘selects’ some at the cost of others. While the results of this selec-
tion process appear to be ‘goal-oriented’, ‘adaptive’, or ‘functional’, there
is nothing intentional, or teleological about them. Being patterns and thus
not identical with their material realisations, both biological and linguistic
schemata may remain in existence even as their actual material substrates
get replaced over time. This is what makes them ‘replicating’ systems. If
you define a Darwinian system as one which evolves by producing vari-
ety in a pool of replicating patterns among which some will prove more
stable and better at replicating than others under specific environmental
conditions, then languages are clearly just as ‘Darwinian’ as biological
species.

5.2.3 Inside Complex Adaptive Systems: universal Darwinism and cultural replicators
In the preceding section we have seen that a ‘Darwinian’ perspective on
language may be gained even from a general, domain unspecific theory
of complex adaptive systems. We have outlined how competence proper-
ties may become stable within brains, and how they may be transmitted
among them. We now need to ask more detailed questions. One of the
most important ones will clearly be what the ‘schemata’ supposed to rep-
resent human competences may look like and what their constituents
may possibly be. Therefore, we shall reverse our approach to language as
a complex adaptive system. After having gained a rough idea of its global
organisation and operation as viewed from the top down, we shall adopt
a bottom-up perspective and discuss what language might be made of. In
that discussion, we shall – now less self-consciously than before – try to
profit from arguments which have been developed within the biological
community.

The concept of cultural replicators, and Dawkins’ ‘memes’

The central role in the evolution of species is played by ‘genes’, that
is, self-replicating DNA patterns. It is gene-based evolutionary theory
which best explains qualities of life on earth such as the complexity of
organisms, their adaptedness to their habitats and to each other, as well
as the existence of species. What is special about genes is not that they
are made of DNA, however, but that they are replicators. Recall that we
have explained the essentials of Darwinian evolutionary theory without
the chemical details of DNA replication. As a matter of fact, some of
the thought experiments we conducted involved fictitious replicators that
bore no resemblance to DNA molecules at all. This very circumstance
might have made us suspect that Darwinian evolution was not dependent
on the existence of DNA.
This conclusion has also been drawn by Richard Dawkins, one of the
most prominent members of the contemporary biological community. In
his best-selling introduction to gene-based evolutionary theory he pro-
posed that the principles which made Darwinian evolution work would
do so wherever true ‘replicators’ emerged. ‘What, after all, is so special
about genes?’, he asks.
The answer is that they are replicators. The laws of physics are supposed to be
true all over the accessible universe. Are there any principles of biology that are
likely to have similar universal validity? When astronauts voyage to distant planets
and look for life, they can expect to find creatures too strange and unearthly for us
to imagine. But is there anything that must be true of all life, wherever it is found,
and whatever the basis of its chemistry? [. . .] Obviously I do not know but, if I
had to bet, I would put my money on one fundamental principle. This is the law
that all life evolves by the differential survival of replicating entities. [. . .] The
gene, the DNA molecule, happens to be the replicating entity that prevails on our
own planet. There may be others. If there are, provided certain other conditions
are met, they will almost inevitably tend to become the basis for an evolutionary
process. (1989: 191f.)
We have seen that competences evolve by the differential survival of
their constituent properties. If Dawkins is correct, then these should rep-
resent ‘replicators’. Dawkins himself suggests that they may. His argu-
ment continues like this.
Do we have to go to distant worlds to find other kinds of replicator and other,
consequent, kinds of evolution? I think that a new kind of replicator has recently
emerged on this very planet. It is staring us in the face. It is still in its infancy, still
drifting clumsily about in its primeval soup, but already it is achieving evolutionary
change at a rate that leaves the old gene panting far behind.
The new soup is the soup of human culture. We need a name for the new
replicator, a noun that conveys the idea of a unit of cultural transmission, or
a unit of imitation. ‘Mimeme’ comes from a suitable Greek root, but I want a
monosyllable that sounds a bit like ‘gene’. I hope my classicist friends will forgive
me if I abbreviate mimeme to meme. [. . .] It should be pronounced to rhyme
with the word ‘cream’.
Examples of memes are tunes, ideas, catch-phrases, clothes fashions, ways of
making pots or building arches. Just as genes propagate themselves in the gene
pool by leaping from body to body via sperms or eggs, so memes propagate
themselves in the meme pool by leaping from brain to brain via a process which,
in the broad sense, can be called imitation. (1989: 192)
Evidently, at least at first sight, linguistic competence properties as
we have defined them appear to qualify as memes in Dawkins’ sense.
Of course such a conclusion should not be jumped at, and merits a more
detailed discussion. This will be pursued below. For the moment let us
consider some of the baffling consequences which would follow, should
Dawkins’ proposal turn out to be correct.
First, the notion of memes clearly disturbs the normal, comfortable
view that what we know is ‘ours’ and under our own control. Dawkins’
suggestion appears as even more disturbing when one considers the par-
ticular context in which it was made. The selfish gene argues a perspective
on biological organisms which reduces them to passive ‘vehicles’ or ‘sur-
vival machines’, made solely to replicate the genes whose expressions
they are. While this perspective might be a bit unusual compared with
the everyday way in which we look at animals and plants, Dawkins’ argu-
ments are so stringent and compelling that one cannot really fail to see
their plausibility – at least in the case of plants, ants, sea slugs, fruit flies
and possibly even ‘higher’ animals. They do not, however, apply to us
humans, one is relieved to see, because, after all we and our behaviour
are not as ‘fully determined by our genes’ as other forms of life seem to
be. We have consciousness, thoughts, rationality and language. We can
think and share our thoughts with one another. We can consciously reflect
upon and thus steer our own actions, and we have ‘culture’. But where
does Dawkins take us next? To the very conclusion that even our ideas,
thoughts, words, and all cultural achievements of the human race may
not strictly speaking be ‘ours’ at all. Instead, our minds might be ‘theirs’
in the same way as our bodies ‘belong’ to our genes. So the only thing
that makes us special is that we do not serve one type of replicator but
two, and the seeming unpredictability and freedom that we see in human
actions may result merely from the fact that the competition which our
two masters wage for control over us is a very complex game.
It is hardly surprising that responses to Dawkins’ proposal were mixed
and seldom moderate. Within the humanities it was for some time more
or less ignored. The feeling seems to have been that since Dawkins was a
biologist, his arguments were unlikely to have any real relevance outside
the biological community. Consequently, their wider implications were
not taken seriously. As far as the linguistic community is concerned,
there was the additional problem, of course, that biologically inspired
attempts to deal with the historical development of languages had previ-
ously had a rather bad start and been discarded long ago for the reasons
discussed above in section On the other hand, scholars with
transdisciplinary interests took up Dawkins’ idea with great enthusiasm.
Most prominent among them were Douglas R. Hofstadter (e.g. 1995),
the cognitive scientist best known as the author of Gödel, Escher, Bach and
Daniel C. Dennett, the Tufts philosopher and author of Consciousness
explained, Darwin’s dangerous idea and Kinds of Minds. Particularly the
latter has repeatedly urged that Dawkins’ idea be taken as seriously as he
thinks it deserves, and has published a number of articles on the subject
(e.g. 1990b, 1999b).9 A first book-length attempt to see how far ‘memet-
ics’ can be taken was made by Susan Blackmore (The meme machine, 1999;
but see shorter contributions such as Blackmore 1997 or 1998), and, as
already indicated, the Journal of Memetics was established on the internet
in 1996. But even as the idea has come to gain a certain respectability
in academic circles, its status is made precarious. Because of its inherent
sensationalist aspects it has attracted a large number of enthusiasts who
are attempting to expand the idea into a theory with a rashness that makes
Schleicher’s and Jespersen’s attempts to ‘Darwinise’ historical linguistics
appear, in retrospect, as epitomes of scholarly caution (Brodie 1996, or
Lynch 1996). This, in turn, has naturally strengthened the scepticism of
the academic establishment (see, for example, Aunger 2001;
Boyd/Richerson 2000; Costall 1991; Hull 1982, 1999; Lass 1996; Plotkin
2000; Rose 1998; Schendl 1996).
To some extent, this may have to do with a certain ambivalence on
Dawkins’ part. On the one hand, he has repeatedly attempted to elaborate
his ideas. Thus, in 1982, he takes the topic up again and says
I have previously supported the case for a completely non-genetic kind of replica-
tor, which flourishes only in the environment provided by complex, communicat-
ing brains. I called it the ‘meme’ (Dawkins 1976a10 ). Unfortunately [. . .] I was
insufficiently clear about the distinction between the meme itself, as replicator,
on the one hand, and its ‘phenotypic effects’ or ‘meme products’ on the other.
A meme should be regarded as a unit of information residing in a brain (Cloak’s
‘i-culture’). It has a definite structure, realized in whatever physical medium the
brain uses for storing information. If the brain stores information as a pattern of
synaptic connections, a meme should in principle be visible under a microscope
as a definite pattern of synaptic structure. If the brain stores information in a
‘distributed’ form [. . .], the meme would not be localizable on a microscopic
slide, but still I would want to regard it as physically residing in the brain. This
is to distinguish it from its phenotypic effects, which are its consequences in the
outside world (Cloak’s ‘m-culture’). (1982: 109)
Yet, even when Dawkins tries to make the notion more precise and to clar-
ify its presumed ontology, the examples he continues to give of ‘memes’ –
or strictly speaking of their effects – make it dubious that they could really
be faithful replicators. Thus, also in The extended phenotype, they include
‘words, music, visual images, styles of clothes, facial or hand gestures,
skills such as opening milk bottles in tits, or panning wheat in Japanese
macaques’ (109). Even more vaguely, ‘Marxist, or Nazi memes’ are men-
tioned, and in other places Dawkins has repeatedly made the case that
‘God’, ‘Christianity’ or ‘The Virgin Mary’ might be ‘memes’. The obvi-
ous problem with such examples is that everybody’s conceptions of ‘God’,
‘Christianity’ or even simpler concepts such as ‘tables’ or ‘chairs’ differ
from one another in obvious ways, so it is hard to imagine that their
neuronal representations should be sufficiently similar to count as copies
of each other. If they are not, they cannot be replicators either, and the
whole approach loses its basis.

9 See also Dawkins’ own paper in Dahlbom (1993).
10 The first edition of Dawkins (1989).

Historical linguistics as memetics

We have seen that the best way of describing language change is in terms
of changes in the distribution of what we have called ‘competence prop-
erties’. These changes are brought about through their differential repli-
cation. We have also seen that both language acquisition and – more
importantly in this context – language history can be construed as evolu-
tionary processes: the ways in which languages change do display ‘learn-
ing’ and ‘adaptation’, and the directions of changes are not determined
by designers with a purpose but by the concerted actions of distributed
agents and the environmental feedback these actions incur. The appar-
ently purposeful ways of language change are their emergent higher-level
effects. Adopting a term coined by Adam Smith for economic purposes,
Rudi Keller has called them ‘Invisible Hand Effects’ (Keller 1990), to
express that they do not reflect human intentions.11
Now, exactly such a state of affairs would be predicted if competence
properties qualify as replicators in Dawkins’ sense. Therefore, it would
be more stubborn than cautious to deny the possibility that languages
evolve the way they do because their constituents are replicators.12
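
The claim that language change amounts to shifts in the distribution of competence properties, brought about by their differential replication, can be made concrete with a toy calculation. The following sketch is merely illustrative and not part of the argument itself: the variant labels `old` and `new`, the replication weights in `bias` and the number of rounds are all assumptions chosen for the example. It shows how a small advantage in replication, compounded over many rounds of transmission, shifts the make-up of a property pool without any agent intending the outcome.

```python
def replicate(freqs, bias, rounds):
    """Deterministic replicator dynamics over rival variants.

    freqs:  dict mapping each variant to its share of the property pool
    bias:   dict mapping each variant to its relative replication success
            (hypothetical weights, assumed constant across rounds)
    rounds: number of rounds of transmission to simulate
    """
    for _ in range(rounds):
        # Each variant is copied in proportion to its share times its bias.
        weighted = {v: share * bias[v] for v, share in freqs.items()}
        total = sum(weighted.values())
        # Renormalise so that the shares again sum to one.
        freqs = {v: w / total for v, w in weighted.items()}
    return freqs


# A rare variant with a slight replication advantage (assumed values).
pool = {"old": 0.9, "new": 0.1}
bias = {"old": 1.0, "new": 1.1}
after = replicate(pool, bias, rounds=50)
print(after)
```

Under these assumed values the initially rare variant ends up dominating the pool, which is the kind of ‘Invisible Hand’ outcome described above: a directed-looking change that follows from nothing but differences in replication success.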

11 As Matt Ridley pointed out, of course, the ‘Invisible Hand is exactly the same concept
as Natural Selection’ (2000: 28). The main difference is that the bottom-up approach
taken by Darwin makes it more easily possible to investigate the mechanisms by which
‘the Invisible Hand’ does its tricks, and thus to explain why particular ‘selections’ are
being made.
12 It is somewhat curious that neither Dawkins, nor any of the scholars who have taken his
idea seriously, has ever attempted to test its plausibility by applying it to the rich and well
documented data provided by linguistic change. After all, there are few other domains
of cultural change which have been studied in comparable detail and in comparably
systematic ways. Possibly, however, it is the very long tradition of historical language
studies as an academic discipline, and the progress they have made without employing
an evolutionary paradigm, which made outsiders like Dawkins, Dennett, Hofstadter,
Blackmore and many others cautious. After all, trying to contribute to a discipline with
whose theoretical assumptions and methodologies one is not very familiar is always
risky. Linguists, on the other hand, might have felt that even though many problems
concerning the explanation of linguistic change were not satisfactorily resolved, they
were not so crucial as to necessitate the adoption of a paradigm which to many linguists
was as unfamiliar as the peculiarities of language change were to a biologist.

The route towards understanding language change will then lead via
the following questions. First, we must elaborate the notion of linguistic
replicators and ask whether the competence properties that can be
inferred from linguistic behaviour and textual output qualify as such.
Secondly, we must try and determine in as much detail as possible how
competence properties actually do get replicated, and on what factors
their stability depends. Thirdly, we shall have to discuss to what environ-
mental feedback the stability and replication of competence properties
are likely to be sensitive. If we manage to give plausible – if only prelim-
inary – answers to each of the three questions, we shall have outlined a
theory of linguistic evolution or an evolutionary theory of language.

5.3 Résumé and outlook

Let us sum up, once again, what we have said so far, and what it implies.
We have seen that language change can be described as the differ-
ential replication of competence properties within ‘property pools’ or
‘populations’ constituted by speech communities and implemented in
the minds/brains of the speakers in such communities. On this view lan-
guages are systems of patterns which maintain their ‘identities’ over long
periods of time through being copied between successive generations of
states of minds/brains.
After discussing the properties of complex adaptive systems, we have
concluded that language changes are likely to be brought about through
environmental feedback on rivalling systems of competence properties,
incurred via their behavioural and textual unfolding, and that language
change can indeed be construed as ‘evolutionary’, ‘adaptive’ or reflecting
‘learning’ on the part of the system.
We have seen that ‘evolutionary changes’ of the kind that languages
seem to undergo can be predicted in replicator systems. Such changes will
reflect those properties of the systems’ constituents (that is, of the
replicators that make them up) that affect the relative success of their
replication under given environmental conditions.
These observations suggest a novel perspective on language and lan-
guage change as well as a potentially productive research programme.
Any property of any language at any time can be explained as existing
because it has managed to place a stable copy of itself into the compe-
tence (that is, the mind/brain) which has it. This, in turn, implies that the
most adequate way of approaching the study of language and language
change is by asking (a) what the replicating units that constitute compe-
tences actually are, (b) by what mechanics they replicate, and (c) what
(environmental) factors influence their success at replicating.
Approaching language from the point of view of linguistic replicators
allows one to relate, conceptually and causally, a large variety of issues
that otherwise appear to belong to different empirical domains and have,
in the past, tended to be studied separately and consequently been dif-
ficult to re-integrate. These issues include the neurobiology of language
(now: the material implementation of linguistic replicators), the struc-
ture of linguistic competences (now: the structure of replicator systems),
linguistic performance or language use (now: the behavioural unfolding
of replicator systems), universal grammar (now: genetically determined –
and therefore ‘environmental’ – constraints on linguistic replicators and
systems of such), the bodily hardware for articulation and perception
(physiological – that is, once again ‘environmental’ – constraints on the
expression and thus the replication of competence properties), speakers’
cognitive and communicative needs (now: environmental constraints on
the stability of competence properties within individual minds), the social
structure and organisation of speech communities (now: environmental
constraints on the pathways for property replication) and so on.
Thus, it is no longer necessary to divorce the study of language from
matters not strictly speaking linguistic which are nevertheless clearly
related to language in order to establish linguistics as a coherent aca-
demic discipline. Rather, linguistics can be understood as studying how
languages acquire, maintain and change their properties in interaction
with those aspects of the world to which those properties, viewed as repli-
cators, are sensitive. Distinctions such as those between competence and
performance, or between language and thought (while strategically plau-
sible for some purposes), will no longer imply that one must focus on one
of the aspects to the exclusion of the other. Both the make-up of speakers’
minds and the energy required and the benefits incurred by linguistic
performance can be seen to exert selective pressures on competence
properties, so that the latter cannot be explained without reference to the
former. Similarly, if one ‘takes the perspective’ of competence properties
as replicators whose success under varying conditions is to be determined,
factors from all these levels can be systematically related to the study of
language without amounting to a ‘hopeless study of everything’, as Noam
Chomsky appears to fear.

6 Towards an evolutionary theory of language

6.1 Can there be linguistic replicators at all?

6.1.1 Criteria for identifying replicators

The chances of finding an integrated evolutionary model of language and
language change depend on the question of whether linguistic replicators
can really be assumed to exist. Unless they do, we need not bother with
the rest. Are there really any competence properties that qualify as lin-
guistic replicators then? In order to decide, we first need clear criteria.1
Dawkins proposes a definition that is analogous to his definition of a
‘gene’, ‘which comes from G. C. Williams. A gene is defined as any por-
tion of chromosomal material [i.e. DNA] that potentially lasts for enough
generations to serve as a unit of natural selection. [. . . It] is a replica-
tor with high copying-fidelity. Copying fidelity is another way of saying
longevity-in-the-form-of-copies’ (Dawkins 1989: 28f).2 Accordingly,
Dawkins sees a meme as ‘an entity that is capable of being transmitted
from one brain to another’ (1989: 196), and Dennett basically follows
him by defining memes as ‘the smallest units that replicate themselves
with reliability and fecundity’.

1 Importantly, this question does not depend on finding a way of physically describing
linguistic replicators. What matters is that there may exist, in principle, mental linguistic
replicators capable of getting an evolutionary process going. Recall at this point also
that Darwinian evolutionary theory was established without the actual chemical units of
selection being known. Darwin himself did not grasp the importance of Mendel’s exper-
iments for his theory, and genes were discovered only in the twentieth century. And even
today defining the gene is not easy and in fact the term is used quite differently by breed-
ers, geneticists and molecular biologists because they are interested in different things. At
the molecular level, genes consist of sequences of nucleotides along a molecule of DNA.
Names are given to different lengths of DNA, such as a codon, which is a sequence of
three nucleotides, or a cistron, which is a sufficiently long sequence of nucleotides to pro-
vide instructions for building one protein – with a start symbol and a stop symbol. Neither
of these is necessarily passed on intact in sexual reproduction and neither corresponds
to what we think of as the gene ‘for’ something. [. . .] Yet it is these effects that natural
selection gets to work on. So what is the unit of the gene? [. . .] One useful suggestion is
that a gene is hereditary information that lasts long enough to be subject to the relevant
selection pressures. [. . .] This intrinsic uncertainty about just what to count as a gene
has not impeded progress in genetics and biology. It has not made people say, ‘We cannot
decide what the unit of the gene is so let’s abandon genetics, biology and evolution.’ These
sciences all work by using whatever unit they find most helpful for what they are doing at
the time. (Blackmore 1999: 54)

These definitions yield the following criteria for ‘replicatorship’.3
• First, a replicator needs to be an ‘entity’. This means it must be iden-
tifiable, and persist for a minimal period of time with its characteristic
properties intact, that is, it must have a minimal stability, or longevity.
• Secondly, a replicator must be capable of being transmitted faithfully.
That is to say, its copies must display a minimal fidelity. Otherwise one
would not be justified in calling an entity ‘replicating’ at all. So, the second
criterion must be copying fidelity. Of course, while copying fidelity must
be high for an entity to qualify as a replicator, it must be less than perfect
for evolution to become possible. Otherwise no variants with differen-
tial success at replicating will be produced, and neither environmental
selection nor adaptation can occur.
• Thirdly, a replicator must have a minimal amount of fecundity. That
is to say, it must produce at least one copy of itself while it is stable.
Otherwise, copies of it will cease to exist, and the type will disappear.
• Fourthly, and as the agentive suffix in replicator suggests, it must actively
contribute to bringing its replication about. Since we are talking here
about such entities as bits of DNA or mental/neuronal patterns, it might
seem at first that we can attribute activity to them only in a metaphorical
sense. Both in the case of genes, and in the case of memes, or linguis-
tic competence properties, is the actual replication not ‘carried out’ by
organisms rather than the replicators themselves? In a sense the issue
is a matter of perspective. If an organism carries a gene for, say, homo-
sexuality, then the failure of the organism to mate with members of
the opposite sex will impede the chances of the responsible gene being
replicated. But to the degree that the organism’s behaviour is genetically
co-determined, it can just as well be argued that the gene is responsible
for its own failure to be replicated. The term ‘activity’ will therefore
be used in a more general sense, in which it does not require an agent
to be animate. Let us say, then, that in order to qualify as ‘active’, it
is sufficient for replicating patterns to have some effect on their own
chance of being replicated or copied well4 (see Deutsch 1997: 172).
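
The four criteria just listed can be restated as a simple checklist. The sketch below is only a schematic paraphrase, offered under explicit assumptions: the class `Candidate`, its numerical fields and the threshold `min_fidelity` are hypothetical devices introduced for illustration, and nothing in the argument depends on the particular values chosen. The two example candidates are a gene-like pattern and the spot of dirt on a photocopier discussed in note 4.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    lifetime: float             # how long one token persists intact (arbitrary units)
    copies_per_unit: float      # copies produced per unit of time
    fidelity: float             # probability that a copy matches its original (0..1)
    affects_own_copying: bool   # do its own properties influence whether it is copied?


def is_replicator(c, min_fidelity=0.9):
    """Check the four criteria; min_fidelity is an assumed, illustrative threshold."""
    stable = c.lifetime > 0                          # (1) longevity
    faithful = c.fidelity >= min_fidelity            # (2) copying fidelity
    fecund = c.lifetime * c.copies_per_unit >= 1     # (3) at least one copy while stable
    active = c.affects_own_copying                   # (4) activity
    return stable and faithful and fecund and active


def can_evolve(c, min_fidelity=0.9):
    """Evolution additionally requires that copying be less than perfect."""
    return is_replicator(c, min_fidelity) and c.fidelity < 1.0


gene_like = Candidate("gene-like pattern", lifetime=10.0, copies_per_unit=0.5,
                      fidelity=0.999, affects_own_copying=True)
dirt_spot = Candidate("dirt spot on a copier", lifetime=100.0, copies_per_unit=20.0,
                      fidelity=0.99, affects_own_copying=False)

for candidate in (gene_like, dirt_spot):
    print(candidate.name, is_replicator(candidate))
```

The dirt spot passes the first three tests and fails only the fourth, which is why activity has to be listed as a separate criterion.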
2 See also there for a justification of that definition.
3 See Dawkins 1989 (17 and passim).
4 That is to say, the fact that copies of it come into existence must depend in some mean-
ingful way on properties of the replicator itself. Thus, a spot of dirt which has formed on
the glass pane of a copying machine is not a replicator. It may be annoyingly long-lived
(difficult to remove), and will copy fecundly (whenever the machine is operated) and as
faithfully as the resolution of the copier admits. Yet, nothing in the particular pattern of
the spot bears any causal relationship to the accidental fact that copies of it happen to be
produced. Therefore, although it is replicated, it is not active. Genes, on the other hand,
do play an active role in their own reproduction. That is to say, although their repro-
duction also depends on machinery, on chemicals in living cells, that machinery would
not reproduce anything but genes and it would not reproduce genes either, were it not
for their special properties. Thus, the fact that genes do have the properties they have is
essential for their replication. Therefore, even though their replication is something that
happens to them, it also happens through them, so that they truly deserve to be called
active replicators.

Thus, a competence constituent qualifies as a replicator if it has sufficient
longevity, copying fidelity and fecundity, and actively determines
the chances of its own replication.
Note that this definition involves the notion of a ‘critical amount’ in
relation to the first three qualities. This seems somewhat unfortunate,
because how is one to determine when a particular entity should count
as actually having the ‘critical amount’? This apparent fuzziness is not
really problematic, however. Whether or not an entity qualifies can easily
be determined if one simply asks whether a new copy of it is produced
before the original itself disintegrates.
At the same time, the scalability of the three factors in the definition
has a positive side effect because it allows us to adduce them not only
for the identification of replicators, but also for measuring the relative
evolutionary success of replicators. That is to say, they define parameters
for distinguishing better, or ‘fitter’, replicators from ones that are less good
at replicating. Take copying fidelity first. Obviously, an entity is the better
as a replicator the more closely its copies resemble it. In fact, if the
copy of an entity differs from it, and is also capable of replication, it will
compete with its original and threaten its lineage. Next, look at fecun-
dity. Clearly, a replicator will be the better the more copies of itself it can
produce per unit of time. Finally, consider longevity. As with fecundity
and copying fidelity, it is clear that, other things being equal, a replicator
will be the better the more of it it has. The longer an entity exists, the
more time it will have to produce copies, even if it is comparatively slow at
doing so.
Thus, the optimal replicator would ‘live’ eternally (have maximal
longevity), produce an endless series of copies at extremely high speed
(have maximal fecundity), and make no copying mistakes at all (have
maximal copying fidelity). In practice, of course, actual replicators can-
not maximise their quality with regard to each of the three parameters.
Instead, stable compromises seem to have evolved. While the genotypes of
some species (such as those of elephants or humans) ‘build themselves’
long-lived but costly bodies, and manage to produce only a relatively
small number of offspring, others (such as those of bacteria or insects)
make small and short-lived bodies but reproduce quickly and in large
numbers.
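
The idea that longevity, fecundity and copying fidelity jointly measure a replicator’s success, and that different compromises among them may do equally well, can be illustrated with a rough back-of-the-envelope calculation. The sketch below rests on a simplifying assumption that is not made in the text: that the expected number of faithful copies produced over one ‘lifetime’ is simply the product of the three parameters. The function name and all figures are hypothetical.

```python
import math


def expected_faithful_copies(longevity, fecundity, fidelity):
    """Expected number of faithful copies made over one lifetime.

    longevity: time the replicator persists (arbitrary units)
    fecundity: copies produced per unit of time
    fidelity:  probability that any one copy is faithful (0..1)

    Assumes, for illustration only, that the three factors are independent
    and simply multiply.
    """
    return longevity * fecundity * fidelity


# Two assumed strategies: long-lived and slow versus short-lived and fast.
long_lived = expected_faithful_copies(longevity=60.0, fecundity=0.1, fidelity=0.99)
short_lived = expected_faithful_copies(longevity=0.05, fecundity=120.0, fidelity=0.99)

print(long_lived, short_lived, math.isclose(long_lived, short_lived))
```

On these assumed figures the two strategies come out level, while raising any one parameter with the others held constant raises the total. This is all the sketch is meant to show: the three parameters can be traded off against one another.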

6.1.2 Narrowing the search

Where is one to search for linguistic replicators? Dawkins’ criteria are
helpful, but where should one look for entities on which to apply them?
Dawkins’ suggestion that ‘a meme should be regarded as a unit of infor-
mation residing in a brain’ implies that we ought to focus on brains and the
way in which linguistic information is implemented there. As has already
been said, however, this represents a daunting task, because there exists as
yet no way of observing and measuring brain activity at the high temporal
and spatial resolution which would be necessary for identifying individual
units of information in the brain. The task is even more daunting for linguists,
who have normally not been trained to approach language from the
neurological end at all, and whose work has been based on the implicit
assumption that the brain–mind interface, or the relation between mental
information and its material substrate, is altogether too complex and
mysterious even to address. This is why
the concepts, categories and rules developed in linguistics are normally
derived from texts, their structural properties and/or their functions.
Established linguistic categories tend to describe the phenotypic
expressions of competence constituents rather than those constituents
themselves.
This means that we shall have to start our search for linguistic replica-
tors from their expressions. This is exactly what we did in our informal
account of ‘/iː/s’ replacing ‘/eː/s’ in ‘words’ like green, see, or meet. We
replaced the terms ‘/iː/’ and ‘/eː/’ by the notions of ‘competence properties
for /iː/ and /eː/’ respectively, indicating both that textual /iː/s and /eː/s are
not to be confused with the mental entities they express and that on the
mental level there might be more than one way in which competences
could generate /ɡreːn/, /seː/, /meːt/; /ɡriːn/, /siː/, /miːt/, etc.
Of course an abundance of ‘linguistic categories’ have been proposed
for competence modelling. In most established approaches languages are
conceived of as rule systems for creating and interpreting linguistic code.
Descriptive categories have been required to denote: (a) constituents
of the code, such as ‘texts’, ‘phrases’, ‘sentences’, ‘intonation groups’,
‘feet’, ‘words’, ‘morphemes’, ‘syllables’, ‘rhymes’, ‘onsets’, ‘phonemes’,
‘articulatory gestures’, ‘acoustic features’ and so on; (b) constituents of
‘meaning’ that the code transports, such as ‘schemata’, ‘concepts’,
‘semantic features’ etc.; and (c) rules that relate these constituents in
various ways.5
Like meaning, code is seen to consist of hierarchically ordered con-
stituents, so that larger bits are constituted of smaller bits, and the ‘items’
or ‘building blocks’ of which code is made can be both complex and
simple. In a morphosyntactic hierarchy, for example, sentences consist
of phrases, phrases consist of words, words consist of morphemes, mor-
phemes consist of phonemes, and so on. The building of such hierarchies
may involve rules like S → NP VP, NP → (Det) N, VP → V NP etc.
Also, syntactic embedding allows complex constituents to assume the
roles of smaller bits in bigger structures, so that syntactic hierarchies dis-
play a certain degree of recursiveness, as in I hit the dog that chased the cat,
thought to be structured somewhat like this:
(14) [S [NP [N I]N ]NP [VP [V hit]V [NP [NP [Det the]Det [N dog]N ]NP
[S [NP [N which]N ]NP [VP [V chased]V [NP [Det the]Det [N cat]N ]NP ]VP ]S
]NP ]VP ]S .
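
The way in which a handful of rewrite rules license a recursive structure like (14) can be checked mechanically. The following sketch encodes the bracketing as nested tuples and tests each node against a small rule table. It is an illustration under stated assumptions rather than a claim about any particular grammatical theory: the table `RULES` contains the three rules quoted above plus an additional rule NP → NP S, which is assumed here in order to accommodate the embedded clause, and all function names are hypothetical.

```python
# Rewrite rules: each category maps to the sequences of daughters it may have.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("Det", "N"), ("NP", "S")],  # NP -> NP S is an assumed rule
    "VP": [("V", "NP")],
}

# A tree is (label, daughter, daughter, ...); a leaf is (label, "word").
tree = ("S",
        ("NP", ("N", "I")),
        ("VP", ("V", "hit"),
               ("NP",
                ("NP", ("Det", "the"), ("N", "dog")),
                ("S",
                 ("NP", ("N", "which")),
                 ("VP", ("V", "chased"),
                        ("NP", ("Det", "the"), ("N", "cat")))))))


def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def licensed(node):
    """True if every non-leaf node expands according to RULES."""
    if is_leaf(node):
        return True
    label, daughters = node[0], node[1:]
    expansion = tuple(d[0] for d in daughters)
    return expansion in RULES.get(label, []) and all(licensed(d) for d in daughters)


def words(node):
    """Read the terminal string off the tree, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for d in node[1:] for w in words(d)]


def count(node, label):
    """Count the nodes carrying a given category label."""
    own = int(node[0] == label)
    if is_leaf(node):
        return own
    return own + sum(count(d, label) for d in node[1:])


print(licensed(tree), " ".join(words(tree)), count(tree, "S"))
```

That the category S occurs inside another S is what is meant by recursiveness here: the same small rule inventory is reused at every level of embedding.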

There are not only syntactic hierarchies, of course, but also others, such
as prosodic ones, in which utterances are seen to consist of intonation
phrases, feet, syllables, onsets, rhymes and segments. Because linguistic
code is hierarchically organised, linguistic theories assume that speakers’
minds must incorporate ‘rules’ not only for relating bits of code to bits
of meaning, but also for relating smaller bits to larger bits. Apart from
such ‘structure building’ rules, the need for assuming ‘structure changing’
rules has arisen as well, relating more superficial representations of code
to assumed ‘deeper’ ones, which may sometimes be different.
In short, linguistic theory provides us with a host of concepts for
analysing linguistic code and relating it to meaning. The above exam-
ples include merely a random selection, which does not even intend to be
representative of the vast descriptive repertoire that centuries of linguistic
practice have amassed. Its purpose is merely to indicate the vastness of
the task we are facing. For each linguistic category – be it simple, com-
plex, a rule, a bit of code or a bit of meaning – there exists the possibility
that it might represent a competence property that replicates.
Clearly, any attempt to consider even only a representative number of
them in detail and discuss how likely it is that they might actually be
linguistic replicators would merit a volume of its own. Therefore, we must
deal with the issue by way of example. That is to say, we have to
choose a small set of categories and see how far we get with them. Since
our general goal is to find out if an evolutionary, generalised Darwinian
perspective on language can help us to understand the ways in which
languages appear to change, we do not require a complete model of linguistic
competence in ‘replicator’ terms. It will be enough to identify a few
competence constituents that qualify as linguistic replicators, and to check
how much of their historical fate can be derived from the fact that they
are. We only need to find out, then, whether there are any among the
competence constituents proposed in established linguistic theories which
indeed represent ‘memes’ in Dawkins’ sense, that is, units of linguistic
information in human brains which are (a) sufficiently stable, copy with
(b) sufficient fidelity and (c) in sufficient numbers, and qualify as
(d) ‘active’ in the sense that their own properties affect their chance of
being replicated.

5 Importantly, a distinction has been drawn between the material representation of code
in writing, speech (and, recently, signing) and the representations of the code within minds.
Linguistics has typically focused on the latter.

In the following I shall focus on criteria (a) to (c). These can be dis-
cussed by looking at assumed constituents of linguistic competence, their
stability in the mind/brains of individual speakers, and their historical sta-
bility in speech communities. Criterion (d), on the other hand, raises the
question of how exactly competence constituents are transmitted. This
deserves a more detailed discussion, which we shall postpone until we
have a clearer picture of how units of linguistic knowledge match the first
three criteria.

6.1.3 Arguing from size

The question of whether units of linguistic knowledge, or competence
constituents are sufficiently stably represented in human minds, and copy
faithfully and fecundly enough to qualify as Dawkinsian replicators, is not,
I think, unaddressable. There is an argument by which one may deduce,
at least in a rough manner, how great the chances are that established
constituents of linguistic models should qualify as potential replicators.
As will be shown, they are actually likely to be quite high. The argument
goes like this.
As we have said, constituents of linguistic theories have been estab-
lished for the task of modelling the properties of observed linguistic
behaviour and its textual products in terms of a set of concepts plus rules
relating them. Obviously, the idea has always been that the derived mod-
els should be simpler than the phenomena for which they are supposed to
account.6 The standard method of linguistic modelling attempts to derive
observable discourse from a limited inventory of symbols plus a system of
combinatorial rules that is as small as possible. In other words, linguists
have attempted to establish both what such a symbol inventory might be and
which rules a competence model would need for combining the symbols into
the larger, and ever new, pieces of discourse that we observe. Therefore,
a typical strategy has always been to analyse discourse into smaller com-
ponents, and to devise rules for their (re-)combination that are empirically
adequate and as powerful as possible. The approach will be familiar to
everybody from grammatical parsing and from spelling. As a result, the
competence models that linguistics has produced have come to contain
primitives which are on the small side: ‘phonemes’, ‘syllables’, ‘morphemes’,
‘words’, and so on. A competence which can deal with a sentence like
John is easy to please will normally not have to retrieve a copy of that
whole sentence from memory, but rather copies of the words John, easy,
to, please and be. It will have to know, additionally, which of them is a noun,
an adjective, a verb or a particle, and so on. Then it can use components
such as these to assemble different sentences such as John pleases Mary,
It pleases John that Mary is here, and infinitely many others.

6 Clearly, a model that is not simpler than the phenomenon it is supposed to be a model
of would not pass the test of Occam’s razor.

The details differ from theory to theory, but what matters is that the
components of which linguistic competence models consist are typically
small. For our purposes this is good news, because the smaller a com-
ponent is, the more easily it will qualify as a replicator. This is because
the criterion of copying fidelity will disqualify as potential replicators
all configurations that do not seem to behave as indivisible units in the
assumed replication process. Therefore, competence properties which
have idiosyncratic distributions and which may vary independently of
others will not be parts of bigger replicators.
In order to see why this must be so, consider a few potential candi-
dates. Start with the biggest possible unit, that is, with the possibility that
whole competences might copy intact. If this were the case it would mean
that when a child acquires a language from her parents, she acquires a
complete copy of a competence ‘for’ English. Only then would ‘English’
deserve to be called a replicator. The criteria of stability and fecundity
would not rule this out: any ‘language’ is stably represented in the minds
of all speakers once they have acquired it. Also, there is ample evidence
that a sufficient number of speakers keep acquiring English to make up
for speakers who die. Again, this is trivially true of any language as long
as it ‘is spoken’. However, ‘English’ certainly does not pass the copying
fidelity test. As we have already mentioned, no two speakers speak exactly
alike and not all inter-individual differences can reasonably be attributed
merely to performance factors.7

7 While it might be argued that some of these differences are irrelevant because they might
have causes that are not strictly speaking linguistic, this argument cannot be used within
the approach we have taken here. We have defined competence properties as properties
‘for’ particular types of behaviour, and when we observe differences in such behaviour
we shall have to accept them.
Thus, whole ‘competences’ or ‘languages’ must be ruled out as replicators.
To qualify as replicators linguistic constituents must be smaller
than complete competence systems. We could now proceed by elimina-
tion in the following way. We could observe two speakers, model their
competences from their effects in discourse, and separate the properties
which the two models share from those which they do not. We could then
say that the complex of shared properties might constitute a single faith-
fully replicating entity, because copies of it seem to exist in two speakers’
brains. But this would only be the beginning. We would next have to
include more speakers into the sample, throw out competence properties
that are not shared and keep those that are common to all. Predictably,
the complex of shared competence properties is likely to get smaller and
smaller as the number of speakers in our sample increases. Ultimately
we may be left with a rather small set of possibly disconnected competence
properties which all speakers in the sample still seem to share.
Just as interesting as the set of competence properties that remain in
the pool as our elimination proceeds, are those which are eliminated in
each round. Of them we can be sure that if they copy at all, they copy
independently of the rest. If we find competence properties to be dis-
tributed differently from others, while being shared by a sufficiently large
number of speakers, we are likely to have identified replicating units of
indivisible particulateness. In short, every competence property with an
idiosyncratic distribution in the competence pool qualifies as an indepen-
dent replicator.
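
The elimination procedure just described is in effect a comparison of sets, and can be sketched as such. The example below is purely schematic: the speaker labels, the property labels `p1` to `p4`, the function names and the threshold `min_holders` (standing in for ‘a sufficiently large number of speakers’) are all assumptions made for the sake of illustration, and modelling a competence as a flat set of properties is of course a drastic simplification.

```python
from collections import defaultdict

# Hypothetical competence models: each speaker is reduced to a set of properties.
speakers = {
    "A": {"p1", "p2", "p3", "p4"},
    "B": {"p1", "p2", "p3"},
    "C": {"p1", "p2", "p4"},
    "D": {"p1", "p2"},
}


def shared_core(sample):
    """Properties common to every speaker in the sample."""
    sets = list(sample.values())
    core = set(sets[0])
    for props in sets[1:]:
        core &= props
    return core


def by_distribution(sample):
    """Group properties by the exact set of speakers who have them."""
    groups = defaultdict(set)
    for prop in set().union(*sample.values()):
        holders = frozenset(name for name, props in sample.items() if prop in props)
        groups[holders].add(prop)
    return dict(groups)


def independent_candidates(sample, min_holders=2):
    """Properties whose distribution is shared with no other property,
    yet which are held by at least min_holders speakers."""
    return {
        next(iter(props))
        for holders, props in by_distribution(sample).items()
        if len(props) == 1 and len(holders) >= min_holders
    }


print(shared_core(speakers))
print(independent_candidates(speakers))
```

In this toy sample `p1` and `p2` always occur together, so on distributional grounds alone they cannot be told apart from a single larger unit, whereas `p3` and `p4` each have a distribution of their own and therefore count as candidates for independent replication.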
Thus, isolating competence components which appear to copy faith-
fully is an analytic procedure and similar in this respect to the method
employed in linguistic competence modelling in general. In the latter,
one looks for an inventory of stored units, as small as possible, that can be
used for assembling the large and hierarchical structures of observed linguistic
code; in the former one looks for an inventory of small units with sta-
ble identities and high cross-individual similarity. There is bound to be
considerable overlap, and therefore a good chance that the competence
constituents posited in classical linguistic modelling will not be wildly
different from linguistic units that happen to replicate faithfully.
Of course, there are also bound to be differences. These are to
be expected because normal linguistic theory models competences by
analysing their expressions, that is, typically stretches of text. The com-
petence properties deduced that way will typically be functional units
and/or ‘building blocks’, that is, constituents with particulate and clearly
identifiable expressions. An example of a building block deduced that
way would be the suffix ity in serenity. It shows up also in charity, stupid-
ity, and so on. Likewise, the sequence seren(e) in the same word shows up
also in serene, or serenade. Yet, the unit serenity is unlikely to be acquired
by any speaker unless she is exposed to it in its completeness. Thus,
while being analysable as the combined expression of two competence
constituents that can, in principle, be expressed independently of each
other, serenity may have to count as a single unit from the point of view of
replication. Units may tend to copy intact while being, at the same time,
derivable from smaller constituents. This seems to be clearly true of many
derived lexical items – particularly if they are lexicalised, non-productive
and partially opaque formations (such as serenity, impression, genetic), but
also if their forms and meanings are perfectly predictable as in reader
(so. who reads), writer (so. who writes), doable (of sth. that can be done).
As far as units that are bigger than words are concerned, this might be
similarly true of proverbs, idiomatic expressions and collocations. These
are often just as decomposable and derivable as obvious constructions,
but contrary to the latter they appear to replicate in toto. An example
would be Boys will be boys (a potential replicator) vs. (The) boys will (soon)
be men.
Conversely, some complex configurations may not be economically
derivable by rule and may thus have to be posited as ‘primitives’ in
‘derivation-based’ competence models, while not necessarily copying
intact. This might be true, for example, of words when interpreted as
complex semiotic configurations relating formal and semantic properties.
Of course, there are many lexical items (such as man, table, dog, book and
so on) which convey the impression that there is some fixed meaning that
they regularly carry. Yet, even for them this is by no means always clear.
There is hardly a word which cannot carry a wide range of (sometimes
only loosely related) meanings, or which cannot be used ‘metaphorically’.
Therefore, the question of what represents a word’s lexical ‘core mean-
ing’ is in practice never as straightforward as one might think. Further-
more, there are cases (such as sympathetic, which means ‘likeable’ to some
people and ‘agreeing’ or ‘supporting’ to others; or frugal, which some-
times denotes ‘rich’ meals, sometimes ‘cheap or small’ meals) which show
quite clearly that form and meaning sometimes appear to copy separately.
Still, most established competence models need to assume simple lexi-
cal entries to be stable and integral information complexes, because the
form–meaning pairings they represent are impossible to derive by general
rules. This does not only apply to the semantic properties of words but
also to their syntactic ones. A case in point would be like. For the major-
ity of speakers, it represents a preposition (as in He is like his father) or a
conjunction (as in He speaks like his father does), while others appear to
treat it like an adverb (as in He’s like sexy in a way). Thus, from the point
of view of replication, the morphotactic configuration like needs to be
treated as only loosely bound to information about its syntactic status,
while in established competence modelling, which has no way of ‘gen-
erating’ the relevant associations, the form has to be regarded as stored
together with its syntactic properties as a single lexical entry.
However, such differences do not alter the fact that an established the-
oretical unit of linguistic description is the more likely to qualify as a lin-
guistic replicator the smaller it is. Since this seems to be true of ‘memes’ or
cultural replicators in general, the argument from size could clearly allevi-
ate a problem which has riddled most attempts to turn Richard Dawkins’
suggestion that cultural evolution involves the differential replication of
memes into an applicable theory. Most of the examples that have so far
been treated as potential memes, such as the concepts of Evolutionary
Theory, God, salvation, the Virgin Mary or hellfire (see Dawkins 1989:
197), or complex texts such as Romeo and Juliet, of which West Side Story
may be construed as a copy (see Dennett 1995: 342–69), would prob-
ably not really qualify as replicators on Dawkins’ own criteria. Dawkins
himself is aware of the problem and acknowledges that

[. . .] I have talked about memes as though it was obvious what a single unit-meme
consisted of. But of course it is far from obvious. I have said that a tune is a meme,
but what about a symphony: how many memes is that? Is each movement one
meme, each recognisable phrase of melody, each bar, each chord, or what. (1989:

He even suggests the very method we have employed here for identify-
ing actual memes, which is analogously applied in evolutionary genetics,
where a gene may be defined

[. . .] not in a rigid all-or-none way, but as a unit of convenience, a length of chromosome
with just sufficient copying fidelity to serve as a viable unit of natural
selection. If a single phrase of Beethoven’s ninth symphony is sufficiently distinc-
tive and memorable to be abstracted from the context of the whole symphony,
and used as a call sign of a maddeningly intrusive European broadcasting station,
then to that extent it deserves to be called one meme. (1989: 195)
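The link between size and 'just sufficient copying fidelity' can be given a rough quantitative reading. The Python sketch below rests on assumptions that are not part of Dawkins' text: that a unit consists of a countable number of elements, that each element is copied with the same fidelity, and that copying errors are independent. It also deliberately ignores the possibility that some complex units copy as wholes. The figures are hypothetical.

```python
def intact_copy_probability(per_element_fidelity, n_elements):
    """Chance that a unit of n elements is copied without a single error,
    assuming (hypothetically) that elements are copied independently."""
    return per_element_fidelity ** n_elements

# Hypothetical figures: 0.99 fidelity per element, unit sizes chosen for illustration.
FIDELITY = 0.99
units = {"single phoneme": 1, "short morpheme": 4, "proverb": 15, "whole text": 5000}

for name, size in units.items():
    print(f"{name:15s} {intact_copy_probability(FIDELITY, size):.3g}")
```

On these assumptions the probability of an intact copy falls steadily as units grow, which is one way of seeing why small units make better candidates for replicatorship than symphonies or plays.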

While having made the suggestion himself, however, neither Dawkins
nor any of the scholars who have tried to elaborate his theory follow it to
its conclusion. This is mainly, I believe, because the relatively small units
which are likely to qualify as plausible cultural replicators do not lend
themselves as readily to telling stories with which one can capture large
and mixed audiences or readerships as more complex configurations like
religious beliefs or stories do. Linguists, of course, are used to approaching
their data exactly in terms of those small and abstract units that the rest
of us have learned to fear in grammar classes.

6.1.4 A few potential replicators

In the next sections we shall try to single out a few linguistic constituents
which qualify as linguistic replicators. Our aim is to see whether the fact
that they do helps us to understand their properties and their historical
fates. For that purpose, a set of examples will do. We shall derive
them from the domains of phonology and morphology; no attempt
will be made here to develop anything like a complete model of language
as a replicator system. However, it will be briefly discussed what prob-
lems the identification of potential replicators generally raises, that is,
not only in phonology and morphology, but also in syntax and semantics.
That is to say, we shall examine a variety of basic types of competence
constituents, namely (a) constituents for representing, recognising
and producing linguistic code (for example, phonemes, morphemes, and
sequences of either), (b) ‘deeper’ and more abstract concepts transported
by and required for dealing with linguistic code (both ‘syntactic’, such
as ‘sentence’, ‘noun’, ‘noun phrase’, and so on, and ‘semantic’, that is,
‘concepts’ or similar constituents of ‘meaning’) and (c) rules, or processes
that relate different constituents.

Phonemes and distinctive features: replicators vs. building blocks

Let us see, then, if there are any units at all which might be good candidates
for linguistic replicatorship. If the size-based argument is correct, we
shall be most likely to find ‘language memes’ among small competence
constituents. But what exactly is a small constituent? As we have said,
competences are mental states, and it is unclear how the size of mental
entities is to be measured. Therefore, we are reduced to speculating about
the possible relation between the size of textual constituents and the total
size/number of mental elements involved in their recognition, produc-
tion, representation and processing. Although that is clearly difficult, it is
nevertheless plausible to assume that larger and/or more complex bits of
text involve more competence constituents than smaller ones. Thus, we
shall suppose that fewer mental entities will be involved in dealing with
a speech sound than in dealing with a meaningful sequence of speech
sounds such as morphemes or words, for instance. After all, it is likely
that the mental processing of, say, a word, should involve the processing
of all the sounds it consists of plus something else, such as the overall
Gestalt of the word, the meaning it conveys and so on. Admittedly, it is
far from certain whether this is correct, but no serious alternative has
been proposed. Most established linguistic theories do not even address
the issue. Rather, they typically assume categories derived from the
analysis of texts to be ‘mentally represented’ in a relatively straightforward way.

Thus, on the assumption just outlined we shall tentatively think of
competence constituents as small if they are ‘for’ small bits of text. By
that rationale, ‘phonemes’, that is, competence properties for the recogni-
tion, categorisation and production of speech sounds or ‘phones’, clearly
look like promising candidates for linguistic replicatorship. This intuition
finds support when one checks how well phonemes meet Dawkins’ cri-
teria (see page 123). Their copying fidelity is definitely very high. If one
interprets the letters in words like OE æfter, feawa, dagum, ealle, his, thing,
and gegaderude (all from example (2) above) as graphic expressions of
‘phonemes’ and considers the ModE pronunciations of after, few, dayes,
all, his, and things (which also express phonemes, of course), it appears
that many of them have indeed managed to remain represented in English
competences for more than a thousand years, notably /f/, /t/, /d/, /l/, /h/, //
and / /. At the same time, phonemes count among the most long-lived
constituents of individual linguistic competences: once acquired, they
normally remain stable within human minds for the whole lives of their
hosts. They are indeed so difficult to ‘forget’ that they often resist even
conscious efforts by speakers to escape their influence, as in second lan-
guage acquisition, for example. Thus, they also pass the longevity test.
That their fecundity is also sufficient is self-evident. Although phonemes
normally find it comparatively difficult to enter mature competences,8 they
seem to copy well into brains that are in the early stages of language acquisition.
However, phonemes themselves are internally complex. They repre-
sent configurations of articulatory and auditory-acoustic features. These
might themselves qualify as replicators. //, for instance, will cause air
to be pressed outwards through the windpipe, while tightening the vocal
cords so as to produce resonance. It will direct the tongue tip to be raised
and fronted to a position just below the hard palate and will see to it that
speakers’ lips are not rounded. At the same time, // will incorporate infor-
mation about auditory impressions based on acoustic signal properties,
such as acuteness and diffuseness.
It might now seem safer to assume that these smaller constituents of
phonemes are the actual replicators, that is, that our competences might
be made up of units like [tongue-fronting], [tongue raising], [creating-
an-egressive-airstream], [recognise resonance], [recognise compactness]
and so on. One might argue that many of these properties seem to replicate so
well that practically all speakers have them. However, this is exactly the

8 In this respect, they might be beaten by lexical replicators, if there should in fact be any.

point which disqualifies them as replicators. The fact that all speakers
seem to have them might suggest that their existence does not depend on
replication at all. Instead, the capacity of raising one’s tongue, the capac-
ity to create an egressive airstream, or the capacity to distinguish among
various types of acoustic signals are likely to be part of every human’s
genetic endowment. If these properties do indeed replicate, then it will
probably be genetically, not culturally or ‘memetically’. Thus, they are
unlikely to qualify as linguistic replicators proper. Instead, they seem
to constitute for language what nucleotides represent for biological
life, that is to say the low level building blocks from which larger units
emerge. It is only on such larger units, however, that natural selection
can come to act. Thus, phonemes are both small enough to copy intact
and complex enough to allow for variants to arise. Therefore, they qualify
better as replicators and possible units of selection than phonological fea-
tures do. We shall therefore include them in our tentative list of linguistic
memes.

Phoneme clusters, syllables, morphemes and the question of meaning

Consider a few ‘bigger’ constituents next. What about clusters of
phonemes, syllables or morphemes, for instance? The problem with them
is whether they should be regarded as replicators, or rather as configura-
tions of replicators, or ‘memeplexes’, as Susan Blackmore has proposed to
call them. Although the difference between the two may not be clear cut,
the distinction is nevertheless important because replicator alliances are
by definition not stable enough, historically, to be units of selection.
Let us consider each of the constituents in turn. Take sound sequences
first. If phonemes are replicators, one might think that sequences of them
cannot be actual replicators, but must be regarded as replicator-teams at
best. Yet, this is not necessarily the case. As long as mental configura-
tions ‘for’ phoneme sequences are stable and fecund enough, and copy
with sufficient fidelity, they are also eligible as replicators – irrespective
of whether their ‘constituents’ may also replicate independently or not.
Now, of many phoneme clusters and syllables this appears indeed to be the case.
Consider copying fidelity first. If one considers text (a) in example (2)
again and compares it to ModE texts, one will find that sequences like
/kw æ/, /su /, /h æ/, /twe /, /dr /, /his /, /lə /, /minə /, /jə /, /rə /, /tə /, /h im /, etc.
appear to be represented in both Old English and Modern English com-
petences, and thus seem to have copied intact and faithfully for roughly
a thousand years – just like some of the phonemes expressed in the text.
Since they appear to have copied so well, it would be inconsistent to deny
these clusters replicator status, if they meet the other criteria.
Next, we need to consider the longevity of phoneme sequences. As we
have said, linguistic replicators exist in human minds or brains and must
therefore be distinguished from their textual expressions. When we talk
about phonemes or clusters of such in the sense of potential linguistic
replicators or memes, we are referring to configurations, or constituents,
in the minds of adult speakers which are ‘for’ producing and dealing with
sounds and sound sequences phonemically. It is therefore in minds that
replicators need to be long-lived, that is, to maintain their integrity over
time. Linguistic replicators must be stored in memory.
Now, when we discussed phonemes we could take their longevity as
memorised mental constituents more or less for granted, because they
are hard to get rid of even if one tries. What about phoneme sequences
then? At first, the answer might appear to be obvious: of course, we mem-
orise them. After all, linguistic competence involves a so-called ‘men-
tal lexicon’, which is essentially an inventory of memorised ‘phoneme
sequences’. In fact, the forms of words and morphemes probably repre-
sent the epitome of ‘stored knowledge’. However, a mental dictionary is
clearly not a mere inventory of phonemic assemblies. Instead it is sup-
posed to have, for each item contained in it, a representation of its ‘mean-
ing’, plus information about how the item is to be used syntactically, what
its socio-stylistic value is, and so on. No phoneme sequences seem to be
stably stored in human minds unless they are associated with such bits
of meaning and information. It seems, therefore, that in order to acquire
mental longevity, phoneme sequences need to carry meaning, that is to
say they must represent ‘morphemes’.
It would therefore appear that the next bigger units that qualify as
linguistic replicators should be morphemes. But morphemes have, by
definition, also a semantic side, and ‘meaning’, as we have already indi-
cated, is highly problematic. Assuming that morphemes are replicating
units implies not only that the mental constituents for their semantic side
are as long-lived and copy as faithfully as those for their forms, but also
that the binding between a morpheme’s meaning and its form will be
as strong as those among its phonemic constituents. It implies that our
mental dictionaries are indeed like real dictionaries. In the latter (lexical)
morpheme forms are always listed together with ‘their’ meanings. Once a
dictionary is written, the form–meaning associations are fixed. Do mental
dictionaries work like that? This is a very strong assumption and requires
some discussion.
Once again, the only easily accessible evidence linguists have is human
behaviour, text production and reception, as well as texts themselves.
While text production provides fairly good evidence that the mental con-
stituents ‘for’ the phonemic sequences which represent the formal Gestalts
of morphemes reproduce faithfully enough and are stable and coherent
units, the mental status of the so-called conceptual, or semantic side
of morphemes, words and, in fact, all ‘meaning bearing’ constituents,
is more difficult to deduce. Of course, many approaches to linguistic
meaning assume that morphemes, and lexical morphemes in particular,
normally do have at least ‘core meanings’ that can be established with
some confidence. This assumption is superficially supported by common
sense and lexicographic practice. A common notion is that these ‘mean-
ings’ are ‘ideas’, or ‘prototypical concepts’, that is, constituents which
stand in the mind for the ‘things’ which words refer to, and which involve
information from non-linguistic perception and experience. It is then
often suggested that in a speaker’s mind, a word like, say, elk ‘means’ the
concept ‘elk’ and that this concept represents at the same time all expe-
rience (however indirect) that a speaker has with elks (see for instance
Pinker 1997: 87). Since knowing what an elk is, and what the word elk
means involves several pieces of information, ‘meanings’ or ‘concepts’
are often assumed to be complex bundles of smaller units and have been
described by linguists in terms of ‘semantic features’ or similar representations
of their internal ‘semantic’ structures. Thus, it is possible to
describe the meaning of, say, bull as [+concrete] [+animate] [+bovine]
[+adult] [+male], or to say that the verb kill means [] []
[] []. (McCawley 1968, see Kastovsky 1982: 268).
There are, however, notoriously difficult problems, like the distinction
between denotation, the essential ‘core meaning’ of an item, and conno-
tation, that is, whatever other concepts may be associated with it. Should
the meaning of kill carry a feature [−ethical], for example, or that of bull
a feature [+aggressive]?
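The difficulty can be made concrete by treating each speaker's 'meaning' of a word as a set of feature labels and asking how much of it is shared. The following Python sketch does this for bull. The two speakers, the extra features attributed to them and the use of the Jaccard index as a measure of overlap are assumptions made purely for illustration, not claims about what is actually stored in any mind.

```python
def jaccard(a, b):
    """Share of features common to both sets, relative to all features in either."""
    return len(a & b) / len(a | b)

# Feature bundle for 'bull' as given in the text.
CORE = {"+concrete", "+animate", "+bovine", "+adult", "+male"}

# Two hypothetical speakers who add different connotations or world-knowledge.
speaker_1 = CORE | {"+aggressive"}
speaker_2 = CORE | {"has horns"}

shared = speaker_1 & speaker_2   # what both bundles have in common
only_1 = speaker_1 - speaker_2   # what only the first speaker associates
only_2 = speaker_2 - speaker_1   # what only the second speaker associates

print(sorted(shared))
print(sorted(only_1), sorted(only_2))
print(round(jaccard(speaker_1, speaker_2), 3))
```

Whether the non-shared features count as part of the word's meaning or as independent knowledge is exactly the question that such a representation leaves open.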
Related to the problem of connotations is the question how much
‘encyclopaedic’ knowledge possibly associated with a concept should
be regarded as actually belonging to the meaning a word conveys, or
how to draw the line between the two. May the ‘meaning’ carried by
the word bull be fundamentally different from ‘concepts’ invoked by the
experience of a bull after all? Structuralist linguists have tended to dis-
tinguish sharply between linguistic meaning and world-knowledge, but
more recent approaches tend to assume that there might be considerable
overlap between the two.9 But the issue is not resolved. Are [has horns]
or [will attack red capes waved by toreros] part of the meaning of bull
or do they represent independent ‘knowledge’? Does bull mean some-
thing different to a zoologist than to a linguist? Does it mean something
different to people who have been attacked by one than to those who

9 See, for example, Lakoff (1987), Langacker (1987), or Mettinger (1999).

haven’t? Does somebody ‘know’ the proper meaning of the word unless
she is aware that a bull belongs to the family bovis?
The last questions point to yet another issue that has been riddling sys-
tematic attempts to pin down and describe meaning. Meaning is by defini-
tion subjective. Words don’t just ‘mean something’, but they always mean
‘it’ to ‘somebody’. While the formal side of people’s linguistic behaviour
is comparatively easy to describe in intersubjectively verifiable terms, the
‘meanings’ in their minds show up only indirectly in their behaviour. This
makes it extremely difficult to decide if a word really means the same to
any two speakers. This is particularly unfortunate, since it is one of the
central questions that need to be answered if one wants to know whether
meanings can be assumed to replicate faithfully and intact; answering it would
clearly require a third-person perspective on what a morpheme means to
different individual speakers.
The issue is still further complicated by the complex interplay between
allegedly stored word meanings and contextual or pragmatic factors in
actual discourse. The meanings which words convey in the context of
use appear to be very diverse. How much of its alleged lexical meaning
‘to cause somebody to die’ does kill ‘carry’ in texts like the following, for example?
(15) My sister would kill me (=be very angry with me) if she heard
me say that.
Lack of romance can kill a marriage.
The doctor gave her some tablets to kill the pain.
The two of them killed a bottle in one evening.
We were killing ourselves laughing.10
Does a word still carry all of its ‘proper’ meaning when it is used fig-
uratively or metaphorically? How often does a word have to be used
‘metaphorically’ for the metaphorical meaning to become its proper one?
Apart from well-known difficulties in the description of meaning, how-
ever, there is also a more fundamental problem, namely that semantic
descriptions like the ones just given themselves represent only (meta-)linguistic
labels for assumed competence constituents. They suggest that
meanings are stored in minds in a way that is similar to language itself.
Some linguists, like Steven Pinker for instance, refer to that assumed
language as ‘mentalese’ and call it ‘the language of thought’. Clearly,
however, assuming that meaning exists in human minds in another form
of language is just a way of making it easier to label assumed concepts, and
conceals how unresolved the issue really is. Semantic labels like ‘[elk]’, or

10 Examples from the Cambridge International Dictionary of English.

constituents like [+animate], [] etc. are certainly not to be understood
as ‘English translations’ of ‘words in mentalese’.11 The meaning of
a word does not look like the word, and the labels with which linguists
describe meaning are labels, no more.
For our task, labels that ‘stand for’ meaning without describing it are
not at all helpful of course. If we want to determine whether word mean-
ings replicate faithfully we are forced to think of them as entities which
exist in the minds of individual speakers in specifiable forms. That we can
give such realistic readings to descriptions of meaning in terms of labels
or features does not seem likely, and would have very strange implica-
tions. Thus, if interpreted ‘realistically’, the analyses of bull and kill given
above appear to suggest that the mental constituents activated by the mor-
phological forms bull and kill are complex and comprise the constituents
activated by the morphological forms concrete, animate, bovine, adult and
male, in the first case and cause, to stop, and being alive in the second.
A highly speculative reading, of course, which implies that ‘causation’ is
less complex as a concept than ‘killing’.
All these points illustrate (albeit admittedly in a very sketchy manner)
what deep and difficult problems ‘meaning’ tends to raise. Although the
issue has been approached not only in linguistics but also in philosophy,
psychology, neurobiology and, more recently, cognitive science,
it is still so poorly understood that committing oneself with regard
to the potential replicator status of linguistically transported meaning
would clearly be premature.
However, there are a few things that may be asserted with some confi-
dence. Thus, whatever ‘meaning’ may be, there can be little doubt that
by virtue of being morphemes, phoneme sequences convey more of it
and more predictably than meaningless sound sequences. It is even likely
that there will be some overlap among the ‘semantic’ constituents which
a particular morpheme relates to in different occurrences and with differ-
ent speakers. At least to a degree, the cognitive and behavioural response
of speakers who are exposed to the expression of a morpheme is pre-
dictable. Similarly, a speaker who uses a morpheme seems to do this on
the assumption that it is likely to have a certain intended effect, and more
often than not this assumption appears to be confirmed. This must mean
that in speakers’ minds, the memorised forms of morphemes seem to be
associated with some consistency to particular mental constituents, even
though the latter are hard to specify, and their cognitive and behavioural
effects difficult to describe.
11 If this were the case, we wouldn’t have any problem at all of course. We could then
simply say that morphemes are phoneme sequences like elk, kill or boar, which are tightly
bound to the concepts ‘elk’, ‘kill’ and ‘boar’, respectively. Meanings will then replicate
as faithfully as their forms. Unfortunately, this is completely absurd.

Furthermore, it can be taken for granted that some concepts, concept
bundles or similar mental constituents will be stable enough within
human minds and transmitted sufficiently faithfully among them to qual-
ify as replicators. To a certain extent, this can be established from a third-
person perspective and without having access to the contents of human
minds at all. Take, for instance, the concept of a bull. Obviously, the
behaviour of different people towards bulls is to a large extent similar and
predictable: most people will approach bulls cautiously if at all, many will
eat their meat or at least not be surprised if they see others do it, they will
not try to milk bulls, they won’t expect them to fly, and if they are English
speakers, they will refer to them as bulls. If we think of people’s concepts
of bulls as whatever it is in their minds that underlies their behaviour
towards them, we may be confident that much of it will be represented
in the minds of many. Also, we may be confident that much of it can be
transmitted among minds through communication. Unless this were the
case, cultural transmission of knowledge could not happen, and it clearly
does. But what these shared and transmittable concepts are is difficult
to establish in individual cases. Equally, the way in which they are tied
to the linguistic forms that help them replicate is, for the time being,
too uncertain to investigate the mechanisms by which morpheme mean-
ing might be replicated and the factors that might therefore govern their
While we may be unable to determine whether the ‘meanings’ of mor-
phemes are represented sufficiently stably in human minds and copied
faithfully enough among them for having replicator status, we may still
conclude that having mental associations to conceptual constituents
seems to be what lends sufficient longevity to phoneme sequences and
may be necessary for them to be memorised. Since this establishes the
formal constituents of morphemes as linguistic replicators, we can afford
to leave the question whether and in what way their meanings might also
be replicators unanswered. Instead, we shall be content with asserting
the following. In order to acquire replicator status, phoneme sequences
need to be associated in human competences with a (probably fuzzy) set
of constituents representing concepts. Whether or not these constituents
are also replicators is an open question. Similarly, the question of whether
mental constituents for morphological forms constitute replicating units
together with the semantic constituents they are associated with cannot
be answered. We shall therefore tentatively regard the relations between
morphological forms and semantic concepts as replicator alliances of
undetermined stability.12

12 In fact, it might be one of the characteristics of language, as opposed to other
communication systems, that there exist comparatively tight bindings, associations or semiotic
relations between the constituents of ‘meaningful’ forms, while the semiotic or associative
binding between the formal and the semantic sides of signs may be relatively loose. In
Deacon’s (1997) terminology, linguistic signs should be conceived of as ‘symbols’ rather
than ‘indexes’. Although his definition of the two terms is somewhat non-standard from
the linguistic point of view, it makes good sense in the context of the present discussion.
While indexical relations involve a stable relation between a signans type and a type of
signatum (such as Baum ←→ ‘tree’), in symbols signantia are only loosely related to a
group of signata, while at the same time being (syntagmatically) related to a set of other
potential signantia (Baum might be related to a specific set of other words: Ast – wachsen –
Rinde – Blätter – Zweige – Verästelung – Wurzeln – schneiden – Holz and so on. Furthermore,
as a ‘noun’ it might be associated to all ‘verbs’, to ‘determiners’, ‘modifying adjectives’
and so on). Which of the possible signata to which a symbolic signans is associated gets
activated in individual semioses depends, among other things, on the presence of other signata.
If this is true, it would make it easy to understand why structuralist semantics has had
immense difficulties trying to establish word meanings. At the same time, such a view
would account for the apparent context-independence and the ‘proactivity’ of human
language and, indeed, human thought, that is, the fact that we can talk and think about
things that are not present when we speak and think.

Attributing replicator status to phoneme sequences that are morphemes –
or as we should say more adequately: to competence constituents
‘for’ (recognising, representing, processing and producing) morphemes –
does not imply anything about which particular sequences are
actually possible or likely. It merely states that there may be replicating
competence constituents for phoneme sequences, if they are stably associated
with ‘meaning’ or ‘concepts’. As is well known, of course, only
a subset of all the combinations which are theoretically possible among
phonemes really occur as words of natural languages. They seem to be
constrained by both universal and language specific phonotactic ‘preferences’.
For example, all phoneme sequences attested in words are structured
to form syllables in which, typically, consonants cluster around
vocalic nuclei according to certain principles (making CV combinations
such as pa, ta, da, ma, etc. more ‘likely’ than, say, CC-combinations such
as ft, tv, or dt).
We have now established with some confidence that both phonemes as
well as mental constituents for phonemic sequences that are morphemes
do qualify as linguistic replicators. This, it seems, might already provide a
reasonable basis for asking how much about their historical evolution can
be derived from this fact and to examine the insights that an evolutionary
approach to languages and their histories yields. Before doing so, however,
let us check whether our tentative inventory of linguistic replicators might
not be further enriched.

Supra-segmental phonological constituents

It is well established in contemporary phonological theory that the shape
in which spoken texts come, that is, the speech chain, displays patterns
which speakers recognise and process and which are independent, to a
certain extent, of the phonemic categorisations they perform. For exam-
ple, the ways in which phonemes typically combine seem to be governed
by both universal and language-specific regularities involving the rela-
tions between different classes of speech sounds. Thus, the number of
consonantal sounds that can occur in immediate succession seems to
be universally limited (and even restricted to one in some languages,
such as many Polynesian languages; see Dziubalska 1995: 63), and
the same is true for vowels. Speech normally displays patterns in which
more and less sonorous sounds alternate in regular ways. These regu-
larities may possibly be accounted for in terms of selection pressures on
phoneme sequences. For the moment, however, let us accept them as
empirical facts. Since it seems that speakers are sensitive to them and
behave accordingly in text production, their phonological competences
must incorporate constituents for dealing with them. Current phono-
logical theory typically conceives of these constituents as defining roles
which the realisations of phonemes and phoneme combinations can play
in speech,13 and we need to consider the possibility that constituents ‘for’
these roles may represent replicators. Established phonological theories
recognise the following: ‘onsets’, that is, pre-vocalic consonants (or clus-
ters of such), ‘nuclei’, that is, vowels or highly sonorant sounds such as
nasals or liquids, and ‘codas’, that is, post-vocalic consonants (or clus-
ters). Nuclei and codas are assumed to form higher-level constituents
called ‘rhymes’, and onsets and rhymes are usually assumed to organise
into still higher-level constituents called syllables. Thus, English speakers
are assumed to process the morph /kæt/ not simply as a sequence of
/k/, /æ/, and /t/, but as a syllable /kæt/ with the onset /k/ and the rhyme
/æt/, whose nucleus is /æ/ and whose coda is /t/. This yields a hierarchical
structure as in

(16)              Syllable
                 /        \
             Onset        Rhyme
               |         /     \
               |    Nucleus    Coda
               |       |         |
    Phonemes  /k/     /æ/       /t/

13 The literature on the syllable is vast and growing. For present purposes, being up to
date fortunately does not matter, but see, for example, Anderson (1986), Clements/Keyser
(1983), Dziubalska (1995, 1996), Giegerich (1985), Goldsmith (1976), Harris (1994),
Hooper (1972), Kahn (1976), Lass (1976), Murray/Vennemann (1982), Hogg/McCully
(1987), Vennemann (1988), Vincent (1986).

Now, as far as their replicatorship is concerned, the assumed constituents
‘for’ recognising, representing and processing onsets, rhymes, codas and
syllables do appear to qualify. If thought of as constituents ‘for’ different
syllable types, such as [ [O][R [N]C]R ] as in /kæt/, [ [OOO][R [N]C]R ]
as in /strɪp/, [ [O][R [NN]CC]R ] as in /haʊnd/, or [ [O][R [NN]]R ] as
in /biː/, they seem to represent stable elements of phonotactic competence.
They are transmitted faithfully in language acquisition, and, like
phoneme-inventories, they can impede second language acquisition. At
the same time, the set of potential syllable types is big enough to allow
for variation, selection and evolutionary change.
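The bracket notation for syllable types used above can be made concrete in a small sketch. It assumes, purely for illustration, that a syllable is already segmented into onset, nucleus and coda slots, and that a long vowel or diphthong occupies two nucleus slots; the class name `Syllable`, the method `template` and the example words are hypothetical and carry no claim about how competences actually encode such types.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Syllable:
    """A syllable split into onset, nucleus and coda slots (hypothetical encoding)."""
    onset: Tuple[str, ...]
    nucleus: Tuple[str, ...]  # one entry per slot: a long vowel or diphthong fills two
    coda: Tuple[str, ...]

    def template(self) -> str:
        """Return the syllable type in the bracket notation used in the text."""
        o = "O" * len(self.onset)
        n = "N" * len(self.nucleus)
        c = "C" * len(self.coda)
        return f"[ [{o}][R [{n}]{c}]R ]"


# Illustrative segmentations only; the transcriptions are assumptions.
examples = {
    "cat": Syllable(("k",), ("æ",), ("t",)),
    "strip": Syllable(("s", "t", "r"), ("ɪ",), ("p",)),
    "hound": Syllable(("h",), ("a", "ʊ"), ("n", "d")),
}

for word, syllable in examples.items():
    print(word, syllable.template())
```

Two morphs count as instances of the same syllable type in this sketch exactly when their templates are identical, which is one simple way of making the notion of a shared, replicable type explicit.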
Apart from representing likely replicators in their own right, syllable
types also seem to enter relations with mental replicators for the formal
Gestalt of memorised morphs. Thus, when the formal Gestalt of a morpheme
(such as /kæt/, to use the above example) replicates, the information
about the suprasegmental roles played by the phonemes /k/, /æ/ and
/t/ seems to replicate with it, and it normally does so very faithfully. Also,
once they have acquired the suprasegmental structure of a morph, speak-
ers clearly do not seem to forget it, which suggests that it is as much a
part of the mental constituent ‘for’ that morph as the phonemes involved
in it. We can therefore assume that constituents for suprasegmental roles
such as syllable, onset, rhyme, nucleus and coda form integral parts of
replicators ‘for’ morpheme shapes.
Let us consider a few further suprasegmental categories. After all,
texts are not only segmented into (or produced to express sequences
of) phonemes, but can also be structured melodically and rhythmically.
Here, we shall consider rhythm. Rhythm involves the temporal relation
among recurrent units. The units which represent the basis of rhythm can
vary between languages. In some, they appear to be ‘syllables’, while in
others they are higher-level constituents, often called feet. Feet are units
in which acoustically more prominent ‘syllables’ combine with relatively
less prominent ones. In English speech, such feet typically consist of a
strong syllable that is followed, optionally, by a number of weak ones (see
Abercrombie 1964).14 Importantly, such feet may not be simple epiphe-
nomena, that is, result automatically from the fact that, lexically or other-
wise, only some syllables are selected to be expressed as prominent peaks
thereby creating dips. Instead, they seem both to be perceived as distinct
units of timing, and to influence speech production in that sense. Speakers
seem to perceive the temporal distances between the prominence peaks
of neighbouring feet as being roughly equally long. Similarly, they tend to
avoid the production of feet with excessive amounts of phonemic material
in the dips, and if it cannot be helped they tend to articulate them with
increased articulatory speed, which results in the phonetic shortening of
segments within such feet, and diminishes durational differences among
feet of varying phonemic length.

14 Like the syllable, the foot is a hotly debated subject in contemporary phonological theory,
so defining it in the rather straightforward manner of Abercrombie (1964) might seem
inappropriately simplistic to some. But I still find Abercrombie’s approach very much
viable. See also Dziubalska (1995), whose approach is compatible with the one taken
here, for further arguments.

There is some disagreement within the phonetic-phonological community
about whether the foot isochrony supposed to characterise languages
such as English has a verifiable, measurable basis, or whether it repre-
sents a pattern that speakers construct when they perceive speech (see
Couper-Kuhlen 1993, Laver 1994, and again Dziubalska 1995). For our
purposes, however, this question is not crucial. What we need to ask is
whether mental constituents for feet, that is, for sequences of prominence
peaks and dips as units of timing, qualify as potential units of linguistic
replication, and what their relation to the replicators which we have iden-
tified so far might be.
Now, the inclination of English speakers to pronounce segments within
phonemically long feet more quickly means that they must be able to
recognise feet, that is, they must recognise sequences of one prominent
syllable plus all syllables between that and the next prominence peak as
units that contain a lift and a dip. This, in turn, implies that the compe-
tence of such speakers must contain constituents for recognising, repre-
senting and dealing with feet and varieties of them.
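What recognising a foot amounts to, on this definition, can be sketched in a few lines. The sketch assumes that each syllable arrives already marked as a prominence peak or not (the marking itself is taken as given, not derived), and it sets weak syllables before the first peak aside as an upbeat, which is a simplifying assumption of ours. The names `group_into_feet` and `foot_pattern`, as well as the stress pattern of the example utterance, are hypothetical.

```python
from typing import List, Tuple

# (syllable, True if it carries a prominence peak); the marking is assumed as input
MarkedSyllable = Tuple[str, bool]


def group_into_feet(syllables: List[MarkedSyllable]) -> Tuple[List[str], List[List[str]]]:
    """Split a syllable string into feet: each peak plus the weak syllables after it."""
    upbeat: List[str] = []       # weak syllables before the first peak (assumption)
    feet: List[List[str]] = []
    for text, is_peak in syllables:
        if is_peak:
            feet.append([text])      # a new foot starts at every prominence peak
        elif feet:
            feet[-1].append(text)    # weak syllables join the foot of the preceding peak
        else:
            upbeat.append(text)
    return upbeat, feet


def foot_pattern(foot: List[str]) -> str:
    """Describe a foot as one strong syllable followed by its weak ones."""
    return "S" + "w" * (len(foot) - 1)


# Assumed stress pattern, chosen only for illustration.
utterance = [("This", True), ("is", False), ("the", False),
             ("house", True), ("that", False), ("Jack", True), ("built", True)]

upbeat, feet = group_into_feet(utterance)
for foot in feet:
    print(foot_pattern(foot), " ".join(foot))
```

Note that the same word may end up in feet of different shapes depending on what follows it, since the number of weak syllables in a foot is only settled by the position of the next peak.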
If this is so, then these constituents are indeed likely to qualify as replicators.
Since the behaviour of speakers with regard to the units of rhythmic
organisation does not seem to change much during their lives, constituents
for feet must be about as long-lived as phonemes, and since
there seems to be little variation among speakers in this respect either, they
also appear to copy with high fidelity. It therefore seems safe to regard
(constituents for) feet as likely linguistic replicators.
Their relationship to the other replicators that we have so far iden-
tified, in particular to lexically stored phoneme sequences (morphs), is
a different matter, however. Basically, there are two possibilities. On the
one hand, morphs might be stably stored together with information about
their rhythmic structure. In lexical memory, the sequences /fɑːðə/ ‘father’
or /bɔɪ/ ‘boy’ might be associated with the patterns /Sw/ and /S/, for
example. In order to process a morph as a foot, however, it is not suf-
ficient to know which of its syllables can represent a prominence peak.
One also needs to know how many weak syllables follow before the next
peak, and this information does not seem to be storable in a meaningful
way, because it is largely context-dependent and will vary considerably
between individual instances of use. Furthermore, it cannot be taken for
granted that a syllable which can figure as a prominence peak will always
do so. In sum, the rhythmic roles which individual morphs can come
to play in different utterances appear to be rather variable. This means
that their associations to constituents for such rhythmical roles might
not be stable enough to qualify as replicators in the Dawkinsian sense.
Therefore, the possibility needs to be taken into account that the rela-
tions between foot types and morphs might represent replicator alliances
of varying degrees of stability and do not constitute integrally replicating
patterns.

Morpheme clusters, collocations, phrases, idioms, sentences, texts

Consider next whether larger constituents, that is, mental configurations ‘for’
combinations of morphemes in complex words and larger syntagmas, might also
qualify as replicating units. For many of them, this is indeed likely, in
particular for compounds and derived items. Thus, items such as baker,
serenity, blackbird, appear to be memorised in the same way as simple
morphemes such as bull or kill.
As in the case of simple morphemes, their longevity in memory is
likely to depend on whether or not, as units, they are associated in speak-
ers’ minds with a specific set of conceptual configurations. Thus, com-
plex items can of course be assembled online in language use. Then the
meanings they convey are predictable, to a large extent, from the mean-
ings of their smaller constituents. However, unless such combinations
come to be associated with particular sets of ‘concepts’ that bind them
to one another, it is unclear if they are memorised as coherent units,
in the sense that specific competence constituents ‘for’ them will establish
themselves. In the case of lexicalised compounds (such as blackboard,
which means more than just a ‘black board’) or derivatives (such as freezer,
which denotes very specific machines for freezing food), however, this is
likely.
While complex lexical items with specific meanings may represent the
best examples of replicators bigger than morphemes, it does not appear
that they are the only ones. What applies to them seems to apply in a very
similar way to units that are bigger still. Idiomatic phrases and
proverbs, for instance, also convey meanings that cannot be easily deduced
from the meanings of their constituents. Someone who has to face the
music does not have to sit in a concert hall, and birds of a feather that
flock together are normally not birds at all. As in the case of complex
words, the specific ‘meanings’ of such phrasal units seem to be what
binds them together, causes them to be memorised, and thus qualifies
them as replicators.
Furthermore, just as morphemes can only become replicators in specific
languages if their shapes obey universal as well as language-specific
phonotactic constraints, so larger units need to conform to constraints
which can be described in terms of rules: word-formation rules in the
case of lexical items, and syntactic rules in the case of larger replicators.
On the whole, one gets the impression that there may be no clear-cut
difference between truly coherent replicators on the one hand, and more
temporary replicator alliances, or memeplexes, on the other. Instead,
the transition may be smooth. If the size-based argument that we devel-
oped above is correct, one may expect that constituents are less likely
to qualify as replicating units, the bigger they get. However, for many
constituents which are ‘smaller’ than whole languages but more complex
than phonemes, their copying fidelity will be difficult to establish, and
may itself be subject to historical change. Thus, one way for a ‘replicator’
to become extinct might be to ‘dissolve’ into smaller constituents, and
one way for a new ‘replicator’ to emerge might be through the clustering
of smaller components into an historically stable larger entity.

Categories and rules

The replicators that we have discussed so far have had one thing in com-
mon. Apart from being mentally long-lived constituents that appear to
copy faithfully and fecundly, they also have expressions in discourse and
texts which make them easy to identify. Thus, the expressions of the mor-
pheme bull, the idiom face the music or the proverb birds of a feather flock
together will always sound similar. When we come across their expres-
sions in texts, we can assume that they indicate competences in which
they exist. Clearly, however, linguistic competences must also incorpo-
rate constituents of a more abstract kind, such as syntactic categories, for
instance, and ‘rules’ of various types. Their expressions will of course be
very diverse. In the following, it will be briefly discussed what kinds of
problems they raise for an evolutionary approach to language and how it
can be decided which of them qualify as replicators or memes.

Syntactic categories and some theoretical implications

Consider syntactic categories such as sentence, noun, verb, verb phrase,
and so on. The reason why linguistic theories assume that competences
incorporate them is that this helps to describe and explain the behaviour
of speakers. Even though speakers may not be able consciously to parse
and categorise stretches of text, or name the identified categories, the
ways they react to them as well as the kinds of texts they produce suggest
that their minds perform such categorisation subconsciously. Grammat-
ical theories that attempt to model speakers’ competences are all about
triangulating such categories by means of textual analysis, grammaticality
judgements, and so on. Usually, categories for which speaker behaviour
and texts provide indirect evidence are assumed to represent mental con-
stituents which speakers need in order to deal successfully with their
native languages. Thus, and although this might sound counterintuitive
at first, syntactic categories actually represent bits of meaning, or infor-
mation. In this sense, they are not altogether dissimilar from concepts
which people’s minds might host ‘for’ other aspects of external reality,
such as bulls or the act of killing somebody. Just as the concept of a
‘bull’ is the mental configuration that helps people to deal with a world
that contains bulls, one could say, so the concept of a ‘noun’ is what helps
people deal with a world that contains nouns. The only difference really
is that people’s minds do not only have to deal with nouns (and other
linguistic categories) but appear to be their sources as well.
Now, how are we to assess the potential replicator status of such category
concepts? Since they represent concepts, we seem to be in a similar
fix as when we tried to determine whether word-meanings were replicators.
However, the situation is not quite as difficult. This is because the
behaviour of people with respect to bits of text that represent syntactic
constituents produces a huge body of evidence, most notably other texts.
And that evidence can be recorded and studied very comfortably. Thus,
the behavioural evidence we have of assumed mental constituents ‘for’
syntactic categories is both quantitatively and qualitatively different from
the evidence we have of people’s ‘concepts’ for dealing with other aspects
of the world, such as bulls. This is why the evolution of language repre-
sents a good testing ground for theories of cultural evolution in general.
The linguistic behaviour of speakers with regard to categories like, say,
‘nouns’ or ‘verbs’ has been studied extensively and found to be surpris-
ingly uniform across speech communities. Therefore, we can be much
more confident that the concepts which they have of ‘nouns’ and ‘verbs’
are both stably represented in their minds and sufficiently similar across
different individuals (and therefore potential replicators), than we can be
in the case of concepts like those for ‘bulls’ or ‘killing’.
Of course, syntactic categories will also raise intricate questions when
it comes to assessing their replicator status. For example, syntactic cat-
egories like ‘noun’ or ‘verb’ seem small enough to qualify. But, if they
are universally present, the possibility once again arises that they might
be genetically provided rather than being spread through linguistic repli-
cation. Therefore, we might have to look for suitable units of linguis-
tic selection among more complex configurations of syntactic primitives,
such as noun phrases, verb phrases and so on. These provide the possibility
of variation (between, say, OV and VO in the case of verb phrases) and
might thus provide units for selection to choose among. Quite generally,
like all linguistic constituents, syntactic ones will also qualify as replica-
tors only if identical copies of them exist in a sufficiently large number of
competences while at the same time not being universal.
Since the empirical part in the later sections of this book will focus on
the evolution of phonological and morphological replicators, syntactic
questions will not be pursued much further here. Yet, one issue needs
to be addressed. As observed at the beginning of this section, mental
constituents for syntactic categories appear to differ from the morpho-
logical and phonological replicators which we identified earlier in that
their expressions represent a highly heterogeneous set of textual bits.
This difference, and in particular the fact that the textual expressions of
phonemes, morphemes and larger morphemic sequences tend to look
relatively similar to one another, may cause a rather serious misunderstanding.
Assume ‘N’ to be a mental syntactic category, a potential syntactic
replicator which gets expressed in discourse by elements such as man,
table, staircase and so on – that is, by any item that we would recognise
as a ‘noun’. All speakers competent in English must have a copy of that
particular constituent in their minds, even if they do not know that lin-
guists refer to it as ‘noun’, or ‘N’. ‘Having’ that constituent is what it
takes to use words that are nouns in the appropriate way, or to process
them accordingly when exposed to them. If a speaker does so, his/her
mind can be diagnosed to possess it. Assuming that such constituents
exist, what are they likely to look like? Being mental entities they must
clearly be realised, ontologically, in human brains. There, however, the
potentially replicating constituent that linguists refer to as ‘noun’ or ‘N’
will itself bear no such label of course. It will be the neural configuration
that is involved in the production and the cognitive processing of nouns,
and will ‘look like’ brain tissue. This means that the competence property
‘for’ nouns will bear no similarity whatsoever to either labels like ‘N’ or
‘noun’ or, indeed, to any of the particular nouns it ‘is for’. Of course,
this may strike you as self-evident. To assume that a neuronal constituent
involved in the processing of nouns should resemble stretches of text like
‘table’, ‘man’, or even ‘noun’ is as absurd as believing that genes for blue
eyes should be blue.
However, what strikes anybody as absurd in the field of syntax is not
so easy to recognise as wrong in phonology and even morphology. In
fact, even experts sometimes fail to do so. While individual instances of
nouns look very different from each other, and from category labels like
‘N’, or ‘noun’, this is not the case with speech sounds or morphs. For
example, the discourse realisations of a phoneme such as // are usually
relatively similar to one another, and so are the individual occurrences of
morphemes, such as {man}, {table} or {pentathlon}. Also, in both cases
the tokens are similar to the labels they have received in standard linguis-
tic descriptions. Indicating ‘phonemes’ or ‘morphemes’, however, these
labels are not only used to refer to sets of textual tokens but at the same
time also to the assumed ‘competence properties’ ‘for’ the processing
(both active and receptive) of phones and morphs. Thus, the competence
property which ‘handles’ sounds that ‘function’ as // is usually also called
‘//’, and the competence property which is involved in the recognition
and production of ‘man’, is usually labelled {man}. Now, unless one
keeps reminding oneself that, when referring to competence properties,
labels such as // or {man} are in fact just labels which bear no similarity
to the entities they denote, one may very easily be tempted to imagine
that competence properties ‘for’ and ‘labelled’ // or {man} do in fact
‘look like’ idealised versions of textual constituents, that is, the phones
and morphs in whose processing they are involved. Conversely, one may
be tempted to imagine that the real-world expressions of phonemes or
morphemes in discourse are structurally similar to the mental entities
which ‘represent’ them in human competences.
Syntactic constituents do not lend themselves as easily to such misin-
terpretations. This is because the linguistic labels that denote syntactic
categories and/or the competence constituents that actually perform syn-
tactic categorisation do not normally ‘look like’ the textual sequences to
which they are associated. That bull nearly killed me, John hates his father,
Bill kicked his dog, My dog loves cats, Yesterday was great, This book reads
easily, and so on all realise the assumed mental pattern ‘NP VP’, with-
out, however, ‘looking’ very much ‘like it’. They ‘look’ even less ‘like’
the symbol ‘S’, which they also can be assumed to express. Thus, the
way in which []s or [i]s seem to express //, appears to be categorically
different from the ways in which John, Bill, My dog, The book or even
Yesterday ‘express’ NP. In reality, of course, this difference between syn-
tactic competence constituents on the one hand, and phonological and
morphological ones on the other is only an artefact of linguistic descrip-
tion. As we have argued, the idea that the mental constituents which
we think of as representations and which we call ‘phonemes’ or ‘mor-
phemes’ should be similar to the ‘phones’ or ‘morphs’ that are produced
and observed in discourse is mistaken. In the sense of a competence con-
stituent, a ‘phoneme’ cannot be an idealised phone that lives in human
minds. Instead, it is much more likely to represent a combination of two
things. On the one hand it is a mental configuration which causes a set
of phones (or speech sounds) to have similar effects on human minds.
Sounds that are allophones of a specific phoneme will affect minds in the
same way, or, as one might also put it, they will function in the same way
when interacting with minds. The phoneme itself will then be that config-
uration within a human mind which ensures that this indeed happens. On
the other hand, a ‘phoneme’ will be a mental configuration which under-
lies the production of sounds with the same effect or function. What we
mean by saying that a competence has a particular ‘phoneme’ is that it
will make its speaker perform a specifiable set of gestures under particular
conditions and react predictably and uniformly when exposed to one of
a specific class of acoustic impressions.
Conceiving of ‘phonemes’ in such a way is compatible with the notion
that they will – ultimately – ‘be’ neural configurations, a view to which
there really can be no serious alternative. Being neural configurations, of
course, phonemes can bear no similarity whatsoever to either the articulatory
gestures performed when they are activated or to the particular
acoustic patterns that result from this.15 Thus, the fact that we give
‘phonemes’ names similar to those of sounds and articulatory gestures is a
matter of convenience with a very confusing side to it.
If one takes this into account, one will see that syntactic categories
are not different from categories such as phonemes or morphemes at
all. When we say that a competence includes the category ‘NP’, this
means that a speaker with that competence will react to a class of tex-
tual sequences (which may be as different as John, The house, Yesterday,
Whatever you say, The man I said had called earlier) in ways that have some-
thing in common (such as by knowing that they can either follow a verb
phrase or be followed by one). Similarly, the conditions under which a
speaker may produce any of such sequences of text will have certain fea-
tures in common as well. Thus, the label ‘NP’ refers to a particular way in
which speakers’ minds categorise their textual experience and modes of
behaviour, or to that configuration within a speaker’s mind by which such
categorisation is effected. In a speaker’s mind, in other words, the entity
referred to as ‘NP’ is that configuration which is responsible for the con-
sistency in his/her reactions and behaviour. Therefore, it is wrong to think
that the relation between a mental category ‘NP’ and the stretch of text
The man who lives next door should be different in principle from the rela-
tion between the mental category // and a bit of spoken text transcribed
as []. In both cases, specific textual patterns with many contingent quali-
ties trigger mental responses which could be equally triggered by different
textual patterns (such as [i] or [-i] in the case of //, or John, the house, my
cat etc. in the case of ‘NP’). If there is a difference then it will be that
The man who lives next door, when received, will trigger a complex set of
mental responses among which the identification of the stretch of text
as a ‘NP’ will be only one, while the mental response for which a sound
such as [] can be responsible may rarely go beyond the triggering of //.

15 The assumption that a phoneme might be a slightly abstracted or idealised ‘image’ of a
speech sound, passed on to some central processing unit in human brains is based on a
Cartesian view of consciousness as a kind of theatre in which pre-analysed perceptions
are displayed to a central observer, and is therefore fundamentally misconceived.

Even more generally, we might say that there is no real difference
between any of the competence constituents that we have so far dis-
cussed. Basically, ‘phonemes’, ‘morphemes’, ‘syntactic categories’ as
well as semantic ‘concepts’ must all equally be understood as mental
constituents for recognising, representing and behaving appropriately
towards aspects of the environment in which humans live.16
Let us turn next to the final type of competence constituent whose
potential replicator status is to be discussed in this section, namely rules.

Rules, phonological and otherwise

If interpreted synchronically, rules stand for assumed mental processes.
There is hardly a level of linguistic description for which modern lin-
guistic theories have not assumed rules. Since the replicators we have
so far identified are phonemes and phoneme sequences that make up
morphemes, the question of rules will also be approached in the
(mor-)phonological domain.

16 Now, one of the reasons why I have brought syntax up is that I shall not have to say much
more about it in the rest of this book. Mostly because of my own professional background
and expertise, the more detailed case studies I shall present will focus on phonological
and morphological phenomena. Since it is fairly obvious, however, that the perspective
on language and language change that is beginning to emerge must be relevant to all
recognised levels of linguistic description, the complete exclusion of some might be
experienced as a serious omission. The reason I included the above two paragraphs,
then, is that I have felt obliged to point out that there are syntactic aspects which require
further consideration, to suggest (though admittedly in a very sketchy manner) how
they might relate to the phonological and morphological cases which I shall discuss in
some detail, and to encourage further enquiries into the matter. Thus, although it might
disrupt the flow of the narrative a bit, I will continue to include digressions of a similar
kind in the next sections.

A phonological example of a mental process would be synchronic
assimilation rules, such as the rule devoicing /z/ to [s] in the phrase It
was Tom /ɪtwə[s]tɒm/. Let me briefly explain why mental phonological
rules are supposed to exist. First it is observed that certain morphemes
are pronounced in different ways. This means that in each case a variety of
different pronunciations are both recognised as and ‘intended’ to express
one and the same morpheme. Thus, the Modern English morpheme
{was} is pronounced in a variety of different ways, ranging from very
reduced [s] or [z] pronunciations over [əz] or [wz] to ‘fuller’ versions
such as [wəz], [wəs], [wɒs], or [wɒz]. Now, if one thinks of linguistic
competence as an economic production system, it makes sense to assume
that speakers don’t memorise all the variant shapes that may express a
morpheme but acquire rules for deriving some from others. Ideally (in
terms of computational efficiency), each morpheme would have just a
single ‘underlying’ form, which is stored in (lexical) memory and from
which all variants are derived by rule. At the same time, of course, it is
desirable that the assumed rules should themselves have the widest possible
coverage, that is, explain as many morphophonemic alternations as possible.
In the case of {was} one would therefore want to assume (among
others) a rule that relates [z] and [s]. In order to find the appropriate
version of such a rule, one will take into account that /s/-variants tend
to occur primarily when {was} is followed by a word that begins with a
voiceless consonant as in
(17) It was Tom.       /ɪtwə[s]tɒm/        vs.
     It was Ivan.      /ɪtwə[z]aɪvən/

One will also consider that the same type of [z]-[s] alternation occurs in
other morphemes as well, as in
(18) It is Tom.        /ɪtɪ[s]tɒm/         vs.
     It is Ivan.       /ɪtɪ[z]aɪvən/
     It pleases Tom.   /ɪtpliːzɪ[s]tɒm/    vs.
     It pleases Ivan.  /ɪtpliːzɪ[z]aɪvən/
and so on.
All these things considered, one is likely to come up with some version
of an assimilation rule like
(19) C → [−voice] / __ [−voice]
Now, unlike the constituents we have discussed so far, rules like
(19) seem to represent mental processes rather than static entities. This
may raise doubts about their stability, because while entities have spatio-
temporal integrity, processes normally don’t. So rules might not meet
the longevity criterion and fail to qualify, therefore, as linguistic repli-
cators. However, we are not asking whether mental processes are stable
but whether mental configurations are, which serve ‘for’ producing and
dealing with sounds whose occurrence in texts can be accounted for in
terms of processes. That is different. Such a mental configuration need
not itself be, and probably isn’t, dynamic at all, and that it should be long-
lived is, in principle, just as plausible as it is in the case of constituents
‘for’ other bits of text. Thinking of computer programmes is helpful here.
A programme may contain a rule in its code. As part of the code, the rule
is of course temporally stable, while only the computational events which
occur when it is actually invoked and changes the state of the central
processing unit are transient. In this respect, mental configurations with
process-like effects are the exact analogue of transformational rules in
classical computer programmes – and may be similarly ‘long-lived’. Once
speakers have acquired a configuration with process-like effects, they nor-
mally find it as difficult to unlearn, suppress or forget it as phonemes, and
this has similarly unwelcome effects in second language acquisition, of
course.17
The question of whether such mental configurations copy faithfully
enough to qualify as replicators is somewhat more complex, however. In
particular, it raises the problem of what actually belongs to one. Take the
case of devoicing again. The configuration required to deal with it would
have to incorporate not only knowledge of the involved voiced and voice-
less phonemes, and the relation between the two, but also knowledge of all
factors that condition their relation. Of these constituents, the conditions
are clearly most problematic, particularly in the case of so-called ‘optional
processes’ whose ‘activation’ may depend on a variety of different factors,
some of which may not even be strictly speaking linguistic at all. Apart
from the actual morphonological contexts that may ‘trigger a process’ (in
our case the neighbourhood of a voiceless consonant), they can include
factors such as speech tempo, register, the age and social status of speak-
ers and hearers, their moods and their physical states, the social context
of speech situations and so on. Studies of phonological variation typically
show that there seem to be considerable differences among the ways in
which individual speakers ‘apply’ optional processes. If there is a mental
basis for this interpersonal variation, it seems to follow that knowledge of
the conditions in which a process becomes relevant does not copy too well.
Obligatory processes, on the other hand, do not normally involve such
uncertainties. Thus, knowledge of them seems to be transmitted faith-
fully enough and their constituents will necessarily replicate together and
form integral units. As long as these constituents are themselves replica-
tors – or building blocks of replicators such as phonological features and
syntactic primitives – the process configurations in which they figure will
therefore also be. Since we have already identified phonemes as replica-
tors, we can therefore be confident that configurations ‘for’ phonological
processes will be linguistic replicators as well.
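
The distinction drawn in this section between a rule as a stored configuration and the transient events of its application can be illustrated with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: the toy inventory `DEVOICE`, the function name `apply_rule_19` and the treatment of a transcription as a string of single-character segments are invented here and do not form part of the argument. The function definition stands for the stable configuration; each call stands for one transient event in which the rule ‘applies’.

```python
# Hypothetical toy inventory: voiced obstruents and their voiceless counterparts.
# These pairs are an illustrative assumption, not an analysis of English.
DEVOICE = {"z": "s", "d": "t", "b": "p", "g": "k", "v": "f"}
VOICELESS = set(DEVOICE.values())


def apply_rule_19(segments: str) -> str:
    """Apply C -> [-voice] / __ [-voice] once, in a single pass.

    Each character is treated as one segment. A voiced obstruent is
    replaced by its voiceless counterpart if the segment that follows
    it in the input is voiceless.
    """
    out = []
    for i, seg in enumerate(segments):
        following = segments[i + 1] if i + 1 < len(segments) else None
        if seg in DEVOICE and following in VOICELESS:
            out.append(DEVOICE[seg])
        else:
            out.append(seg)
    return "".join(out)


# The stored rule persists; each call below is a separate, transient event.
print(apply_rule_19("ɪtwəztɒm"))    # 'It was Tom'  -> ɪtwəstɒm
print(apply_rule_19("ɪtwəzaɪvən"))  # 'It was Ivan' -> ɪtwəzaɪvən (unchanged)
```

Nothing in the stored function resembles either a [z] or an [s]; it merely disposes the system to produce one or the other under specifiable conditions, which is the sense in which a configuration ‘for’ a process need not itself be process-like.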

6.1.5 Résumé I: a set of likely language memes

Applying Dawkins’ criteria of copying fidelity and longevity to a selection
of competence constituents that have been deduced by linguistic theory,

17 Take German final devoicing as a classical example which speakers normally find difficult
to suppress when acquiring languages such as English.

we have seen that some of them are indeed likely to qualify as linguistic
replicators. Among them are competence constituents for (a) phonemes,
(b) phoneme configurations (with characteristic suprasegmental structures)
that are associated with ‘meaning’, (c) foot types, as well as (d) phonological
rules. For all of them, we could assert that they are highly long-lived
in human minds which have acquired them and normally copy faithfully.
As far as syntactic categories and configurations are concerned, we
said that they are, in principle, also likely candidates, although we did
not commit ourselves on any particular cases. Similarly, with regard to
the semantic side of language, we argued that in some way conceptual
configurations are also likely to be stably represented in and faithfully
transmitted among human minds, but we did not commit ourselves on
specifics in this regard either.
As far as more complex associations of constituents from different
domains are concerned, we discussed two cases which we preferred
to regard as potential replicator alliances rather than as proper replica-
tors. The first case concerned morphemes in the sense of formal units
that carry meaning. There, we argued that conceptual replicators, what-
ever they may eventually turn out to be, are unlikely to be the same as
the constituents which are generally regarded as the ‘meanings’ which
morphemes can convey. We suggested that form–meaning pairings as
assumed in lexicography and traditional structuralist morphology may
have no status as units in a replicator-based approach to language.
Instead, we suspected that the mental associations between morphotactic
forms and conceptual configurations may be too loose to qualify
morphemes – viewed as units in which meanings are bound to forms – as
proper replicators, so that we preferred to regard them as looser replicator
alliances, or ‘memeplexes’. Something similar applies to the second
case we discussed, namely the relation between lexically stored phoneme
sequences, or morphs, and rhythmic units such as foot types. As in the
case of morph-meaning pairings, we argued that the associations between
them might better be regarded as temporary alliances which do not seem
to be stable enough to qualify as proper replicators, or ‘memes’.

6.1.6 Résumé II: mental replicators, how to keep them apart from their
extra-mental expressions, and why this is important
An issue which came up repeatedly, particularly with regard to syntactic
categories and rules, was the relationship between mental linguistic
replicators, or replicating competence constituents, on the one hand, and
their expressions in discourse and text on the other. For the approach
which is being developed here, it is important to keep the two neatly
apart. An evolutionary theory of language (like any theory that follows
the Darwinian paradigm) depends crucially on identifying the ontological
domain in which replicators exist, because even though their effects in
other domains may play decisive roles in their replication, only replicators
themselves can be subject to selection. Therefore, the issue deserves to
be highlighted and given more careful consideration.
Linguistic replicators, their effects, and the names we give to either are
occasionally difficult to distinguish, particularly, but not exclusively, when
the names we give to assumed competence constituents are identical or
similar to their expressions, as is often the case in phonology and morphology.
Thus, it is confusing that a competence constituent which categorises,
represents and processes speech sounds as expressions of // should be
called //. Also, a rule like [+voice] → [−voice] / __ [−voice] looks confusingly
like the description of a process, even though, when supposed
to describe a person’s competence, it can only refer to a mental con-
figuration for dealing with sounds whose occurrence can be accounted
for in terms of a rule. Of course, the practice of naming competence
constituents after their effects or the ways in which they can be mod-
elled may have mnemonic advantages but may be easily misinterpreted.
Calling the phoneme // ‘//’ falsely suggests both that (mental) //-ness
is an intrinsic (essential) property of sounds, and that mental //s have
properties similar to those of the sounds they are for. They do not. Similarly,
calling a competence property X → Y / Z falsely suggests that minds
host Xs, Ys and a machinery for transforming one into the other. They
do not.
The reason why linguists call competence properties by names which
may also refer to their textual expressions is probably that linguistic the-
ory has no way of observing competence properties directly, but needs
to deduce them by observing textual products and speaker behaviour –
including their own cognitive behaviour – in relation to texts. From this
approach it follows naturally that linguists should come to think of com-
petence properties in terms of the behaviour and the textual constituents
they are ‘for’. Accordingly, they describe mental constituents in terms of
(a) how they show up in discourse and text, (b) how speakers respond
to specific textual constituents when they process them and/or (c) constituents
of abstract models of text production. When describing competence
constituents in terms of the behaviour and the texts they are ‘for’, or in
terms of abstract models, however, linguists sometimes forget that they
are describing assumedly real mental entities.
If one approaches language as something that can have a history,
however, modelling linguistic competence must mean modelling real
minds, even when another purpose of such modelling is to account for
non-mental phenomena. Thus, when a linguist says that a competence

includes the phoneme /ɑː/, this ought to mean that this competence is
configured in a manner that will make its ‘owner’ or ‘host’ behave in a
specific way if exposed to one of a possibly rather mixed set of different
actual speech sounds (ranging, in variable contexts, possibly from [ə ] or
[ə ] over [ɒ ], [ɑ ], [a], [a ], to [a ], [æ ], [ε]) – a way which will be categorically
different from the ways in which s/he behaves when exposed to
sounds outside the set. (Consider likely reactions to Take the car /kɑː/ vs.
Take the key /kiː/.) In this sense, the notion of a ‘phoneme /ɑː/’ refers to
whatever in a speaker’s mind realises that configuration. It refers neither
to articulatory gestures, nor to sounds. Equally, it must not be misread
as meaning that speakers’ minds actually host some idealised version of
[ɑː], and it should be clear that while ‘having this phoneme’ may make
a speaker perform one of a set of articulatory gestures under specifiable
conditions, such as when the need to communicate about bras, cars, tar,
and so on, arises, it does not mean that an idealised version of this sound
is getting ‘realised’, or ‘expressed’, in the process. The competence property
/ɑː/ is a mental constituent ‘for’ certain types of behaviour. It is in
no way ‘like’ that behaviour, nor ‘like’ its textual products.
The same is true of other categories of linguistic descriptions. If a lan-
guage is said to include a phonological rule like assimilatory consonant
de-voicing (C → [−voice] / __ [−voice]), this means that speakers will produce
consonants without voicing them before unvoiced segments, even
though they will produce voiced ones in other contexts. Also, when they
hear unvoiced consonants in such environments they will distinguish, in
their behavioural reactions, between such consonants that they associate
with alternative voiced pronunciations and such that they don’t. To the
degree that such behavioural consistency is likely to have a mental basis,
it can be assumed that there will exist a common mental configuration
in speakers’ minds which underlies each behavioural event that looks
as if a ‘de-voicing process’ might be involved in it. Again, that mental
configuration is unlikely to look very much like the established linguistic
descriptions of such a process.
Or take another example. If a language is said to include, say, past
tense marking by means of a suffix {ed}, then this is meant to imply
that speakers will react in specifiable ways when they hear that suffix and
will produce it, likewise under specifiable conditions. It implies mental
constituents ‘for’ {past tense} and ‘for’ {ed}, as well as a mental configuration
that relates the two. None of them will look, sound or feel like
‘the past’ or like the morpheme {ed}.
Confusing replicators and their expressions can have far-reaching
effects. In genetics, it might tempt one to think that there could be a
one-to-one relationship between phenotype properties and genes. This
misunderstanding underlies the widespread metaphor in which an
organism’s genome is likened to a blueprint for that organism, which
it isn’t. Instead, it is much more like a ‘recipe’, whose structure is not
isomorphic to the dish it is for (see Dawkins 1989). In linguistics, and
probably in all disciplines that deal with cognition, it tempts one to think
that minds represent reality by incorporating schematic pictures of it, that
phonemes or morphemes are mental and idealised versions of sounds
and sound sequences, and so on. If one conceives of competence con-
stituents in this fundamentally mistaken way, however, one will find it
extremely difficult to avoid further fallacies. For instance, if competence
constituents are thought to correspond straightforwardly to constituents
of linguistic behaviour and/or texts, one might conclude that all one
needs to do in order to discover differences among competences is to look
for differences among texts, and before long one will end up thinking of
language change as if it were a substitution (or even a transformation) of
some types of textual constituent by (into) other ones.
In this respect, the point of view that we have been developing here
has clear advantages. It casts language change as being brought about
through the differential replication of mental entities rather than their
‘phenotypic’ effects. It thus prevents us from restricting our view to the
latter while neglecting the former. Since we need to ask how well com-
petence constituents replicate, not how similar historically related texts
are to each other, we are forced to establish if competence constituents
with similar behavioural and textual effects can indeed be assumed to
be similar in the minds of different speakers. This represents a crucial
difference from linguistic theories which are concerned with accounting for
textual and behavioural data in terms of competence models. These do
not necessarily have to commit themselves on the question of whether the
competence models they draw may count as models of actual speakers’
minds/brains. As long as they explain behaviour and texts, they are fine.
If a single model can account for the behaviour of more than a single
speaker, this may even be better; whether it also describes how the minds
of these speakers are structured is not relevant. For the approach pursued
here, however, it is. The whole idea that competences might evolve along
Darwinian lines hinges on the question of whether patterns of mental organisation
can be transmitted faithfully among different minds. If this is not the
case, then the whole approach is doomed from the start.
Of course, a ‘memetic’ code has not been discovered, which means
that we are still forced to hypothesise about the evolution of repli-
cating competence constituents on the basis of their behavioural and
textual expressions. In this respect we are in a similar position to the
one evolutionary biologists were in before the implications of Mendel’s
(1866) experiments were digested, and long before genes themselves were
actually discovered. They had to assume units of inheritance, without
knowing what those units might be. All they saw were similarities between
phenotypes. Likewise, the linguistic assumption that competences which
seem to have similar effects will also have similar structures is hypothet-
ical. Since our approach depends on this hypothesis, we need to discuss
how plausible it really is.

6.2 What memes might look like: on the material
implementation of linguistic replicators

6.2.1 The problem and why it is important

Considering evidence from speaker behaviour and its textual products,
we have triangulated upon a set of competence constituents that may
indeed represent linguistic replicators, or language memes. We have sup-
posed that these replicators are materially real mental entities, and that
minds which host copies of the same replicator type are isomorphic in that
respect. Although this is clearly a strong assumption, our whole approach
hinges on it. If linguistic evolution is to proceed through the replication of
competence constituents, any two of them which are to count as copies
of each other must be structurally similar on the level on which they
are materially realised. Having similar extra-mental effects will not be
good enough as long as the possibility exists that similar effects can be
brought about by minds with completely different material structures. We
therefore need to think of competence properties in terms of materially
implemented structures.
As such this idea is not alien to the linguistic community. For all that is
known and said about competences, they do count as brain-states. As we
have repeatedly pointed out, however, brain-states are not easily amenable
to empirical observation for the time being. The ways in which information
is represented in them are typically inferred from behavioural effects.
However, if a competence property is to count as a replicator it needs to
be mentally real and materially implemented in terms of identifiable pat-
terns. Abstract constructs of linguistic theory and/or description won’t
do even if they might allow us to model the functions of competence
as implemented in human brains. Unless elements of a linguistic theory
refer to material instantiations in the minds of speakers18 it is impossible
in principle to identify individual instances of them. It would therefore be
absurd to think of them as replicators, and statements to the effect that

18 Except in those of the linguists employing them, of course.


one person’s brain hosts a copy of a linguistic constituent that also exists
in other brains would be utterly meaningless.
Clearly, this situation is somewhat uncomfortable, and it would be nice
if there were some easy way out of it. It would be great, for instance, if
there were evidence that linguistic replicators existed on levels that are
more easily accessible to empirical observation. Although all we have
observed so far speaks against this possibility, we should perhaps give it
some further thought. One linguist who has done so is William Croft
(2000). Inspired, as we are, by the explanatory strategies that a
generalised theory of evolution seems to offer, he has proposed an evolutionary
model of language change which is based on the idea that ‘utterances’
should be linguistic replicators. Croft’s proposal seems
attractive at first, because it appears fairly easy to determine, by simply
comparing the physical properties of utterance stretches, whether two of
them are similar enough to count as copies of one another. However, it is
flawed in a fundamental way. This becomes clear as soon as one considers
how Croft defines an utterance. According to him, it represents
an actually occurring piece of language, completely specified at all levels of struc-
ture, including its full contextual meaning on the particular occasion of use (i.e.
speaker’s meaning). (2000: 244)

The problem is that no ‘piece of language’ that ‘actually occurs’ occurs
completely specified at all levels of structure, if one thinks of it as an
external, material, and easily observable manifestation of language. As
we have seen in section (1) above, the textual products of utterances
‘receive’ their structures only in interaction with speakers’ minds. ‘Their’
structures are not really ‘theirs’ at all, but mental constructs which they
trigger in highly complex ways. This holds for anything from phonological
segmentation to the assignment of syntactic constituent structure, and
it is even more obviously true with regard to meaning: recall that one
and the same physical pattern of sounds or graphics may convey utterly
different senses to different speakers, as in Dennett’s example Grand leg,
seize ours. In short: as soon as one considers ‘utterances’ to be structured
or meaningful in ways that emerge from their interpretation, one is no
longer talking about utterances in the sense of external manifestations
of language at all, but about a multifaceted, confusing amalgam of
physical, behavioural and cognitive phenomena. Thus, Croft’s proposal
that ‘utterances’ should be regarded as linguistic replicators does not have
the advantage it claims to have, and provides no safe empirical basis at
all for measuring similarities between individual copies.
If ‘utterances’ are to have the advantage of being describable in terms
that are independent of speakers’ minds, then that can only be if one
defines them as raw, uninterpreted acoustic or graphic patterns. If one
does, however, innumerable difficulties will arise. It is difficult to see how
one might keep [sʌn] ‘son’ apart from [sʌn] ‘sun’, for instance, or how
one might establish that two allophonic realisations of any morpheme
actually represent variants of a single type. The acoustic or graphic pat-
terns which are allegedly so accessible to detached observation do not by
themselves suggest a degree of graining for the establishment of similari-
ties between different instances. Without minds that ‘digitise’ the speech
flow into patterns of discrete constituents, no exact copies of anything
will be identifiable on the ‘utterance’ level, and this is clearly a very bad
starting point for a theory that approaches language in terms of replicators.
In short, there can be no way around the mind when one studies human
language, particularly not when one intends to view it as a system of
replicating constituents, and we shall have to face the difficulties involved
in that, however uncomfortable it might be.
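
The point that raw patterns do not by themselves supply a grain for establishing similarity can be made concrete with a minimal sketch. Everything specific in it is an assumption introduced for illustration only: the two-dimensional ‘tokens’ (loosely thought of as first and second formant values in Hz), the prototype values in `PROTOTYPES`, and the nearest-prototype procedure in `categorise` are all invented and make no claim about how real minds categorise speech. The sketch merely shows that two physically different tokens count as instances of one type only relative to some categorising scheme.

```python
import math

# Hypothetical category prototypes as (F1, F2) pairs in Hz.
# The numbers are invented for illustration, not measured data.
PROTOTYPES = {
    "ɑː": (700.0, 1100.0),
    "iː": (280.0, 2300.0),
}


def categorise(token):
    """Return the label of the prototype nearest to a raw (F1, F2) token."""
    return min(PROTOTYPES, key=lambda label: math.dist(token, PROTOTYPES[label]))


def same_type(a, b):
    """Two tokens are 'copies' only relative to the categorising scheme."""
    return categorise(a) == categorise(b)


token_1 = (690.0, 1150.0)  # one hypothetical token
token_2 = (730.0, 1050.0)  # a physically different token
token_3 = (300.0, 2250.0)  # a token near the other prototype

print(token_1 == token_2)           # False: no raw identity
print(same_type(token_1, token_2))  # True: same category under this scheme
print(same_type(token_1, token_3))  # False: different categories
```

With a different set of prototypes the same three tokens could be grouped differently, which is why the categorising scheme, rather than the signal, has to carry the burden of defining what counts as a copy.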
So what is the state of our knowledge about how information might be
implemented in human minds? The basic problem is that in spite of some
insight provided by new techniques of observing and measuring human
brain activity, the available data are still too few and too coarse-grained
for even attempting to construct a realistic model of the neurological pro-
cesses underlying knowledge and behaviour. Furthermore, members of
the linguistic community have typically not been trained to tackle their
subject from the neurological side. This is also because until recently the
relationship between the brain as a physiological organ and the ways in
which it stores and processes information has been regarded as beyond the
grasp of science altogether, and disciplines which dealt with what can in
the widest sense be regarded as the information that human brains handle
ignored the physiological aspect of the issue more or less completely. To
some extent, this attitude has been supported, philosophically, by dualist
attitudes towards the mind–body problem, which consider the mind to
be autonomous of the brain, and assume that one will never understand
the former by studying the latter. However, in spite of heroic attempts to
make it respectable or to re-interpret it by relating ‘mental’ processes
to the obscure realm of sub-physiological quantum phenomena (e.g.
Penrose 1991, Popper and Eccles 1993), dualism can for all practical
purposes count as refuted (see, for example, Dennett 1993, P. S. Churchland
1986 or Kim 1998). Today, there is hardly a scientist who holds the
view that mind states are not fully determined by brain-states – although,
it has to be added, this does not imply that the relations between the
information and its material substrate should be straightforward or nec-
essarily tractable. However, the question of how linguistic competence is
implemented in human brains will sooner or later have to be addressed
by any linguistic theory that professes to regard linguistic competence as
a brain-state. That our particular approach requires us to do so now is
therefore an asset rather than a drawback, even though it makes us aware
of a rather huge gap in our knowledge.

6.2.2 Outlining a tentative working model

Simplifying substantially, a brain is made up of a large number of nerve
cells or neurones.19 Roughly fourteen billion of them, namely those
involved most directly in cognition, reside in the cerebral cortex. Their
cell bodies are richly interconnected via string-like
processes called axons and dendrites. Nerve cells exchange electrochemical
energy with each other by sending it along their axons, from where
it is transmitted to the dendrites of other cells across links called synapses.
A nerve cell will normally store the energy it receives until it reaches its
‘action-potential’. Then it ‘fires’, transmitting energy to other cells to
which it is connected. Doing so, it will excite some cells, and inhibit oth-
ers. Thus, the network which brain cells form is like an intricate web
of light-bulbs which can turn each other on or off20 depending on the
ways in which they are linked. While being richly interconnected,
the nerve cells in the cerebral cortex are also linked (albeit
sometimes indirectly, and both through direct nervous connections and
via other chemical routes) to almost all the rest of the body, including the
sense organs and the motor system, with which they can also exchange
energy. In that respect the network they constitute represents an open
system.
This, basically, is the stuff which minds are made of. As far as present
knowledge goes, however, the states which hold and the processes which
take place on the level of neurones and the connections among them do
not translate straightforwardly into symbolic representations of higher-
level cognitive constituents or rules operating on them, as they are famil-
iar from most established linguistic theories and as introspection typically
suggests that we ‘have’. Specifically, it is difficult or even impossible to
identify links between individual neurones on the one hand and, on the other,
the mental constituents that we take to be the symbolic representations of
linguistic and other concepts.21 This can be taken to imply many different

19 For a thorough introduction to neuronal organisation, see Arbib/Érdi/Szentagothai
(1998); a very readable book on the topic is Rose (1993).
20 At least prevent each other from being turned on.
21 That an individual neurone cannot matter much for the cognitive capacities of a brain
becomes obvious from the fact that in all adult organisms, nerve cells keep dying at a
relatively high rate without impairing cognition in any noticeable way.

things. It certainly does not mean that the latter have no neuronal basis. It
is much more plausible to assume, instead, that higher-level mental con-
stituents will turn out to correspond to larger and internally structured
assemblies of – possibly locally distributed – neurones. That neurones
tend to form assemblies seems to be well established. This happens when
a set of them happens to fire simultaneously, or in close temporal succession.
Then, they will form higher-level constituents which always fire
in unison, almost as if they were one big cell rather than many indepen-
dent ones.22 It was first suggested by Donald Hebb that such assemblies
might represent more probable candidates than individual neurones for
the ‘much-needed bridge between the structures found in high-level cog-
nition and the nervous system’ (Anderson 1995: 285). This idea is still
taken seriously among cognitive scientists, although it has so far resisted
experimental testing.
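
How co-activation might bind units into an assembly that then fires in unison can be sketched in a few lines. The fragment below is a deliberately crude illustration in the spirit of Hebb’s proposal, not a model of real neurones: the four binary units, the learning rate, the firing threshold and the function names `train_hebbian` and `spread` are all assumptions made for the example. Connections between units that are active together are strengthened; afterwards, activating one member of the trained group is enough to activate the others.

```python
def train_hebbian(patterns, n_units, rate=0.2):
    """Strengthen the connection between every pair of co-active units."""
    weights = [[0.0] * n_units for _ in range(n_units)]
    for pattern in patterns:
        for i in range(n_units):
            for j in range(n_units):
                if i != j and pattern[i] and pattern[j]:
                    weights[i][j] += rate
    return weights


def spread(weights, active, threshold=0.5):
    """One step of activation: a unit fires if it is already active or if
    its summed input from active units reaches the threshold."""
    n = len(weights)
    result = []
    for j in range(n):
        incoming = sum(weights[i][j] for i in range(n) if active[i])
        result.append(1 if active[j] or incoming >= threshold else 0)
    return result


# Units 0, 1 and 2 repeatedly fire together; unit 3 never joins them.
episodes = [[1, 1, 1, 0]] * 5
weights = train_hebbian(episodes, n_units=4)

print(spread(weights, [1, 0, 0, 0]))  # [1, 1, 1, 0]: the assembly fires as a whole
print(spread(weights, [0, 0, 0, 1]))  # [0, 0, 0, 1]: the untrained unit stays alone
```

The assembly here is nothing over and above a pattern of connection strengths, which is one way of picturing how a higher-level constituent could be materially implemented without corresponding to any single cell.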
Whatever the exact format in which brains store and represent infor-
mation may be, however, the key to this capacity will probably be that
they form dynamic networks, and that these networks can adjust their
own structures in response to environmental feedback. Two pieces of
evidence speak for this. First, the networks represented by neurones are
indeed capable of adjusting the qualities of inter-neuronal connections,
thus channelling the internal flow of electro-chemical energy into specific
pathways. Second, attempts to simulate learning behaviour in comput-
erised models of networks with adaptable connection strengths among
individual nodes have been highly successful. All this clearly supports
the hypothesis that cognition, learning and memory are a matter of asso-
ciation.23 The particular ways in which the internal organisation of a
brain comes to be adjusted are likely to reflect environmental feedback
on the behavioural effects of variant states. It is possible that such feedback
is transmitted to the mental network in the form of neuro-transmitters
under the control of the limbic system, a part of the brain which is likely
to play a role in emotions (Plotkin 1994). Simplifying a lot, a mind
that assumes functional states seems to be ‘rewarded’ by ‘feeling good’,
and one that assumes a dysfunctional organisation punished by ‘feeling
bad’. Configurations which incur positive feedback are thereby stabilised,

22 In fact, the study of complex systems suggests that the emergence of higher-level struc-
tures through auto-catalytic self-organisation is almost to be expected if such systems
contain a sufficiently large number of richly interconnected constituents (see, for exam-
ple, Kaufmann 1995: 54–69).
23 The notion that memory should be associative has a very long tradition. One of the first
modern psychologists to suggest that the relevant associations might actually involve
excitatory connections among ‘point[s] in the brain-cortex’ was William James (1892:
226), and the most influential advocate of this idea in the twentieth century was Donald
O. Hebb (1949), whose views underlie most contemporary connectionist approaches to
modelling learning and cognition.
others destabilised. Thus, environmental feedback seems to be able to
select among rival patterns of mental organisation in a way which makes
mental development an adaptive, quasi-Darwinian process.24
How are we to think, then, of the internal structures that a brain devel-
ops as it acquires linguistic competence, and what are the chances that
two brains which direct their speakers’ linguistic behaviour in similar ways
and produce texts with similar structures should themselves be isomor-
phic? In order to deal with that issue let us first elaborate the idea that
information is implemented in brains in the form of neural constituents
and the connections among them.
Metaphorically speaking, a brain resembles a complex labyrinth in
which individual, many-doored rooms (nerve cells) are linked by a net-
work of innumerable corridors (dendrites, synapses, axons). Messengers
(bits of electrochemical energy) are continually racing back and forth
between them. When they enter a room, they will assemble there and
wait until the room gets too crowded (that is, the nerve cell reaches criti-
cal potential). Then, all of a sudden, they will jump to their feet and flee
the room as if in a panic (the cell fires). They will spill into the corri-
dors that lead out of the chamber and follow them until they reach other
rooms. There, the spectacle will repeat itself. But this does not seem to be
all of it. Additionally, the size of the doors and corridors is adjustable, and

24 It is of course self-evident that the adaptive potential of mental self-organisation will
itself have been selected for on the genetic level. After all, it has the overall effect that
organisms will typically interact with their environments in a way that is reasonably bene-
ficial to themselves. They will exploit the adaptability of their brains for categorising and
structuring their environments, and for acquiring and internalising appropriate forms
of behaviour. Each species will be specially adapted to learn about those environmental
aspects which are most relevant to its survival. In the case of humans, linguistic discourse
seems to represent such a vital aspect and it is predictable that humans should be genet-
ically endowed with brains that are very good at learning to deal with it.
Incidentally, the view of linguistic competences as associative networks is not alto-
gether new. Not surprisingly, it was explicitly held by Hermann Paul, who, as we have
pointed out, also thought of language constituents as ‘psychological organisms’ (1880:
27). The ideas (‘Vorstellungen’) that make up those organisms, he proposed,
are introduced into consciousness in groups. The ideas of successive sounds, and of
successively executed movements of the speech organs, associate into a series. The
sound series and the movement series associate with one another. Both are associated
with the ideas for which they serve as symbols, not merely the ideas of word meanings
but also the ideas of syntactic relations. [. . .] Likewise, the different uses in which
one has come to know a word or a phrase associate with one another. So do the
different cases of the same noun, and the different tenses, moods and persons of the
same verb [. . .] and furthermore all words of the same function, e.g. all nouns, all
adjectives, all verbs [. . .] (1880: 26f., translated from the German)
It is amazing, in retrospect, to what extent Hermann Paul’s vision anticipates current
views of linguistic competence organisation in terms of neuronal networks.
responds to pressure, so to speak. Pathways that are taken frequently will
expand as messenger crowds press into them, and turn into wide hallways,
practically merging the chambers they connect into whole apartments,
into whose different rooms messengers will automatically distribute as
they enter one of them (neuronal cell assemblies that always fire in uni-
son). On the other hand, corridors which are avoided by messengers will
reduce in size until they may turn, eventually, into narrow tunnels too
small to be entered by any messenger at all. Now, since wider corridors
will attract more messengers than smaller ones,25 it is likely that the wide
will get wider, while the narrower will get narrower still, so that in the
course of time, differences in corridor width will increase rather than decrease.
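The self-reinforcing dynamics of the corridor metaphor can be made concrete in a few lines of Python. What follows is a minimal sketch, not a model of real neurons: it assumes, as in note 25, that every messenger heads for the widest corridor, and the function name `widen_corridors` as well as the `gain` and `decay` values are hypothetical choices made purely for illustration.

```python
def widen_corridors(widths, steps, gain=0.05, decay=0.02):
    """Return corridor widths after `steps` messengers have passed through.

    Assumed toy rule: each messenger takes the currently widest corridor,
    which then grows by `gain`; every unused corridor shrinks by `decay`,
    but never below zero. `gain` and `decay` are illustrative values only.
    """
    widths = list(widths)  # work on a copy, leave the caller's list intact
    for _ in range(steps):
        widest = max(range(len(widths)), key=lambda i: widths[i])
        for i in range(len(widths)):
            if i == widest:
                widths[i] += gain                         # used corridor expands
            else:
                widths[i] = max(0.0, widths[i] - decay)   # unused corridor narrows
    return widths


start = [1.0, 1.1, 0.9]                 # three corridors, one slightly wider
end = widen_corridors(start, steps=100)
spread_before = max(start) - min(start)
spread_after = max(end) - min(end)
print(end, spread_before, spread_after)
```

Under these assumptions a small initial advantage is enough: the corridor that starts out widest keeps growing, the others close completely, and the spread in widths ends up far larger than it was at the outset.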
Next, imagine zooming away from the labyrinth until the rooms
become dots and the corridors lines. If you take a bird’s eye view of
this network and its development, you may see something like the follow-
ing. What begins as a relatively unstructured network of dots and lines,
soon develops into a landscape where more pronounced patterns become
visible: individual dots merge to form blots, some connection lines grow
fatter, and others wither away. The first visible patterns that emerge in
this way may not remain stable and quickly transform into different ones,
but eventually there will be regions of the landscape that come to rest
and settle into more stable states. One of those regions may be that in
which linguistic competence resides. How will such a region look? On
the one hand, there will be clearly discernible clusters of blots with good
internal connections among their constituents, and on the other hand,
there will be roads and highways that connect those ‘settlements’ to oth-
ers. If one watches that landscape for a certain while, one will observe
that settlements will sometimes be active and bustle with life (imagine
smoking chimneys, traffic jams, and streetlights going on), and some-
times they will be at rest. Furthermore, the active periods of a number of
settlements with good traffic connections will seem to be synchronised,
making them appear like administrative or economic units, or districts.
In this metaphor, such administrative units may represent the configu-
rations in which information, and linguistic competence in our specific
case, is represented.
The question of whether competence constituents, or ‘language
memes’, in different brains are similar to each other now presents itself
like a problem in geographical topology. We are comparing two land-
scapes and need to determine, first, which districts and settlements in
25 Imagine that the messengers are not very intelligent, so that everybody is always heading
for the widest gate, even while narrower corridors, which would comfortably accommo-
date a small party of runners, remain completely unused.
the two ‘brainscapes’ correspond to each other, and, second, whether
they are like each other.
Take the first problem first. One way in which we might identify coun-
terparts could be on the basis of their locations within their respective
worlds. If a specific complex of dots occupies a specific location in one
brain, we could say that whatever constellation occupies the correspond-
ing position in the other brain will be its ‘counterpart’. As far as human
brains are concerned, however, it is unlikely that this will work. Although
specific types of information may reside in similar global areas in different
brains, no particular place seems to be reserved a priori to specific bits of
knowledge. This is evident when one considers the rather straightforward
evidence of people who have suffered brain damage. It is known that they
are often able to re-acquire knowledge which was destroyed through a
brain lesion, at least to some extent. Since damaged brain areas are inca-
pable of recovery, however, this shows that brains must be able to estab-
lish constituents ‘for’ specific bits of knowledge or behaviour in various
different locations, and this means that we shall not find corresponding
mental constituents in different brains by taking down their longitudes
and latitudes. Networks of mental constituents may be isomorphic, but
differ in the ways in which they are laid out over actual brain matter.
There is another way of identifying counterparts, however. It is more
promising, because it ought to work, in principle, regardless of the par-
ticular way in which information is neuronally implemented as long as
it is a matter of constituents being associated with one another. It does
not involve locations but relations between nodes. Let us say that in one
brain we have identified a constellation of six dots (let us call it ‘A’). A
has one dot which sits in the centre and is connected to the five others in
the way that a hub connects to points on a surrounding wheel via spokes.
These surrounding dots are in turn connected to each other in a ring-
like manner, and two of them have outward connections to further dot
constellations, one consisting of four dots, all of which are connected to
one another (‘B’), and one consisting of three dots that form a triangle
(‘C’). The pattern is represented in figure 6.1, below. Thus, each of the
constellations A, B and C can be both identified and described independently
of its location, and indeed independently also of the particular
ways in which its constituents (represented as dots) are materially implemented.
Let us go through a few test cases. A network which would count as
an exact copy of figure 6.1 would be the one in figure 6.2, for exam-
ple, although it looks different in terms of locations. Can you identify
the counterparts of A, B and C in it? (Hint: it will be easier if you start
with B and C.) If you have, repeat the exercise with figure 6.3. It is just
Figure 6.1 How to identify constituents in structured networks.

as easy. The two examples show that it is possible to establish isomorphies
between informational networks even when nodes and lines are not
located at corresponding places.
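The idea of identifying counterparts through relations rather than locations can be sketched in Python. The sketch below builds the A, B and C pattern described for figure 6.1 over arbitrary node names and then searches for a relation-preserving mapping between two such networks. It is a small backtracking search written for illustration only; the function names and the choice of which ring nodes carry the outward links are assumptions, since the text does not fix them.

```python
from itertools import combinations


def build_network(names):
    """Build the A-B-C pattern over 13 distinct node names.

    names[0] is the hub and names[1:6] the ring of constellation A;
    names[6:10] form B (fully connected); names[10:13] form C (a triangle).
    Which ring nodes carry the outward links is an assumption of this sketch.
    """
    hub, ring, b, c = names[0], names[1:6], names[6:10], names[10:13]
    edges = set()
    for i, r in enumerate(ring):
        edges.add(frozenset((hub, r)))                 # spokes
        edges.add(frozenset((r, ring[(i + 1) % 5])))   # ring
    for group in (b, c):
        for u, v in combinations(group, 2):
            edges.add(frozenset((u, v)))
    edges.add(frozenset((ring[0], b[0])))              # A linked to B
    edges.add(frozenset((ring[2], c[0])))              # A linked to C
    return edges


def adjacency(edges):
    """Map every node to the set of its neighbours."""
    adj = {}
    for edge in edges:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def find_isomorphism(edges1, edges2):
    """Return a node mapping from network 1 onto network 2, or None."""
    a1, a2 = adjacency(edges1), adjacency(edges2)
    if len(a1) != len(a2) or len(edges1) != len(edges2):
        return None
    order = sorted(a1, key=lambda n: -len(a1[n]))      # best-connected nodes first

    def extend(mapping, used):
        if len(mapping) == len(order):
            return dict(mapping)
        u = order[len(mapping)]
        for v in a2:
            if v in used or len(a1[u]) != len(a2[v]):
                continue
            # links and non-links to all nodes mapped so far must be preserved
            if all((w in a1[u]) == (mapping[w] in a2[v]) for w in mapping):
                mapping[u] = v
                used.add(v)
                result = extend(mapping, used)
                if result is not None:
                    return result
                del mapping[u]
                used.discard(v)
        return None

    return extend({}, set())


original = build_network(list(range(13)))
scrambled = ["n%d" % i for i in (7, 3, 11, 0, 9, 5, 12, 1, 8, 4, 10, 2, 6)]
relabelled = build_network(scrambled)                  # same pattern, other 'places'
mapping = find_isomorphism(original, relabelled)

variant = set(original)
variant.discard(frozenset((7, 8)))                     # B loses one of its links
print(mapping is not None, find_isomorphism(original, variant))
```

The node names play the role of locations here: the relabelled network counts as a copy although none of its names match, whereas the variant, in which constellation B lacks one link, no longer does.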
We can now address the second problem, which is almost trivial now
that we have solved the first. Having identified corresponding network
configurations, there are several ways in which we can describe differences
among them. First, we could compare the internal structures of corre-
sponding constellations. Thus, assembly B in figure 6.3 differs from the
‘original’ in figure 6.1.26 Second, we might observe differences between
corresponding links rather than assemblies. In figure 6.2, for example, the
link between A and C is longer than its counterpart in figure 6.1, while
in figure 6.3 it is shorter. Also, we might observe that the counterpart of
one assembly is linked to more or fewer other assemblies than the original.
In figure 6.4, for example, assembly C is not only linked to A, but
also to an additional assembly D, which is not even present in the networks
of figures 6.1 to 6.3.
That assemblies can differ from one another both internally and with
regard to their external links seems to undermine our strategy of identifying
counterparts. Consider the following problems. If C is defined as ‘the

26 The quadrangle lacks one of its diagonals.

Figure 6.2 How to identify copies of network constituents.


Figure 6.3 Possible variants of constituent types.

Figure 6.4 More variants of constituent types.

assembly linked to A’, is C in figure 6.4 still a proper counterpart of the Cs
in figures 6.1 to 6.3? After all, it is not merely linked to an A but also
to a D. Also, if assemblies may differ in their internal structures as well,
who says that A in figure 6.2 is really the counterpart of A in figure 6.1?
Could not figure 6.2’s B be the real counterpart of figure 6.1’s A? It may
have a different internal structure, but we have said that this is possible,
is it not? It may not be linked to two assemblies but only to one, but that
is also possible as we have said. The assembly it is linked to in figure 6.2
has an additional link to another assembly, which it lacks in figure 6.1,
but also this is possible. With so many possibilities of variation, how can
we still be sure of anything?
Clearly, the objection is justified, but there is a way out. All one needs
is a small set of fixed anchor points from which to start one’s comparison.
The ‘location’ of these anchor points would have to be established first,
and in ‘absolute terms’. That this should be possible even in the complex
case of brain structure is not at all improbable, of course. After all, some
mental structures must be linked to nerve cells that lead to constituents
in the sensori-motor system, such as, in the case of language, the tongue,
the larynx, the ears, and so on. These constituents can be established
relatively easily, of course, and if they are regarded as parts of the network
we are interested in, they can function as anchor points of just the kind
we require as safe starting points in a search for isomorphies between
higher-level brain structures along the lines just suggested.
We may therefore draw the following tentative conclusions. First, it
is possible in principle to compare networks, to identify corresponding
constituents, and to determine how similar they are to one another. Sec-
ondly, these arguments will necessarily hold regardless of the specific
material level on which network structures are implemented in human
brains, as long as brains represent, store and process information in terms
of network structures at all. It will make no difference if, in practice, that
should be on the neuronal level, a level of neuronal assemblies, or a level
of other and internally more complex constituents.
Now, if the way in which brains store information can be described in
terms of relational networks, then this implies a level on which ‘function’
(that is, the representation of knowledge, or behavioural options) meets
‘form’ (that is, the internal and external structures of node clusters). On
this level, constituents ‘for’ similar types of knowledge and/or constituents
of behaviour will be formally similar, that is, occupy similar relative posi-
tions and have similar structural characteristics. Network constituents
‘for’ similar knowledge and similar behavioural possibilities will thus also
be structured similarly. If they are sufficiently long-lived, and transmit-
ted at a sufficiently high rate and a sufficiently high degree of fecun-
dity, they would be perfect as candidates for the role of mental/cultural
replicators or ‘memes’ and might represent the material basis which
an evolutionary theory of culture in general and language in particular requires.
At the same time, if memes are understood as internally structured
node assemblies embedded within larger networks, it becomes possible
to see in what sense two different instances of them could count as com-
peting ‘variants’ of one another. If the identity of memes is established
both through their internal make-up and through their position within
larger network structures, it is obvious that a single ‘position’ within a
network may be ‘occupied’ by more than one type of assembly configuration,
or meme. Thus, the B-constituent in figure 6.3 is structurally
different from the B-constituents in figures 6.2 and 6.1, and the
way in which both B-‘types’ relate to A and C makes them ‘variants’ of
each other. Since the frequency of one B-type in a population of net-
works will correlate negatively with the frequency of the other, they can
be conceived of as ‘rivals’ or as competitors for a specific slot within the
network. Thus, they would qualify as memetic counterparts to biological
‘alleles’, that is, different genes in competition for a single position on a chromosome.

6.2.3 Summary
We have now sketched an assumption of what the material basis of
‘memes’, that is, mental replicators, linguistic or otherwise, might ‘look
like’. While its point has not been to develop a definite position on the
question of how brains encode and process information,27 it has shown that
there can, in principle, be a level of brain organisation on which units of
information are implemented in ways that give them formal structures –
even though it might be unclear at present what level that will turn out
to be. The existence of such a level then suggests the following definition
of a ‘meme’, which we shall tentatively adopt during the rest of this book:
(20) A ‘meme’ represents an assembly of nodes in a network of neurally
implemented constituents, which (a) has a definite internal
structure, (b) occupies a definable position within a larger network
configuration, and (c) qualifies as a replicator in Dawkins’ sense.
On this definition it is furthermore possible to regard meme variants
which can occupy the same position within types of network configura-
tions as being in competition for that position. This provides a suitable
basis for studying memetic evolution in terms of variation being brought
about by imperfect meme replication and the subsequent selection of
competing variants, as the Darwinian paradigm requires.
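How two meme variants that compete for the same network position might fare over time can be illustrated with a deterministic toy calculation. The sketch below is only an assumption-laden illustration: the fitness values, the copy-error rate and the function names are hypothetical and are not derived from any data. It shows selection and imperfect replication acting on the frequency of one variant, with the rival's frequency as its complement.

```python
def next_frequency(p, fitness_a, fitness_b, copy_error=0.0):
    """Frequency of variant A after one round of selection and copying.

    p          -- current share of networks whose slot holds variant A
    fitness_*  -- assumed relative replication success of each variant
    copy_error -- assumed chance that a copy turns out as the rival variant
    """
    mean_fitness = p * fitness_a + (1.0 - p) * fitness_b
    selected = p * fitness_a / mean_fitness            # selection step
    # imperfect replication: some copies of each variant become the other
    return selected * (1.0 - copy_error) + (1.0 - selected) * copy_error


def trajectory(p0, generations, fitness_a, fitness_b, copy_error):
    """Follow the frequency of variant A across successive generations."""
    history = [p0]
    for _ in range(generations):
        history.append(
            next_frequency(history[-1], fitness_a, fitness_b, copy_error)
        )
    return history


freq_a = trajectory(0.1, 200, fitness_a=1.1, fitness_b=1.0, copy_error=0.01)
freq_b = [1.0 - p for p in freq_a]     # the rival occupies the remaining slots
print(round(freq_a[0], 3), round(freq_a[-1], 3), round(freq_b[-1], 3))
```

Because both variants compete for one position, any gain for one is a loss for the other. With these assumed values the slightly fitter variant spreads from a tenth of the population to a large majority, while copy errors keep the rival from disappearing altogether.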

6.3 Sketching a few language memes

Let us now return to the likely language memes that we identified above
on behavioural and textual grounds, and discuss how they might be implemented
on the level of organisation we have posited. We shall look at
potential ‘memes’ for phonemes, morphs and rules. The purpose of this
exercise is mainly to make the vague notion of memes as neuronal struc-
tures more plastic. Since, as Steven Pinker recently put it, ‘no one knows
[. . . what we would] see if we could crank up the microscope and peer
into the microcircuitry of the language areas’ (1994: 317), the following
sketches should not be mistaken for realistic representations of actual
neuro-biological set-ups. They are simply intended to demonstrate that
the notion of memes as neuronal constituents is ‘compatible in principle
with the billiard-ball causality of the physical universe, not just mysticism
dressed up in a biological metaphor’ (ibid.). At the same time the follow-
ing sketches may serve to illustrate what kind of models can be derived

27 It does not ‘advocate’ any of the specific connectionist models of (aspects of) linguistic
competence that have been proposed in the wake of Rumelhart/McClelland (1986), but
is definitely highly sympathetic to the research programme. For a good overview of
connectionist achievements see Spitzer (1996), and for a defence of the approach see
the debate in McDonald/McDonald (eds. 1995). But see also Pinker (1999) or Dressler
(1999) for justified criticism of ‘greedy’ connectionism.
from the notion that pieces of linguistic knowledge are implemented in
terms of link patterns within associative networks. They represent tentative,
preliminary and fragmentary suggestions, no more. What they do
show, however, is that models with interesting properties can be derived
fairly easily. Maybe the exercise will inspire attempts to carry the approach
further. Should that turn out to be so, I would be more than happy, of course.

6.3.1 Phone-memes
As ‘memes’ for categorising, representing, producing and interpreting
speech sounds, phone-memes will have to incorporate constituents that
(a) respond to and generalise over auditory impressions and that (b)
direct the performance of articulatory gestures. They must be linked,
externally, to constituents for recognising, representing and processing
phonemic sequences that represent the Gestalts of morphemes. Thus, the
‘phone-meme’ for English /z/, as in was, zoo, busy, goes, etc. might be
conceived of in a way similar to figure 6.5:28
Thus, the phone-meme /z/ amounts to an association among a complex
of specific articulatory gestures, a specific auditory impression, and
a set of morph-memes in whose identification /z/ makes a
difference. Of course, the representation above is fragmentary and simplified
in many respects – not only with regard to neuronal structures,
but also with regard to the specific articulatory and auditory features it
should contain. About the neuronal aspect I know far too little to com-
mit myself, and as far as particular phonetic features are concerned, they
could clearly be discussed at great length, but their exact nature would
not contribute a great deal to the present argument. What matters here
is that a phone-meme can plausibly be conceived of as a configuration in
which mental constituents ‘for’ auditory impressions are linked to mental
constituents ‘for’ performing articulatory gestures.
A few things deserve to be clarified. First, the representation must not
be conceived of as a diagrammatic blueprint of actual neural constel-
lations. As already said, it is merely supposed to demonstrate that the
functions which phonemes play can in principle be materially imple-
mented in terms of network-like structures. The particular structure
below, however, is unlikely to bear any resemblance to actual structures
as may eventually be identified in actual human brains. For example, it

28 Note that the labels in the diagram are there only for the sake of convenience. They
indicate the functions of constituents, but have themselves no neuronal status in speakers’ brains.
[Figure 6.5: a node labelled /z/ linked to three groups of constituents. Articulatory Gestures: Tongue: [corona towards alveoles]; Vocal folds: [+voice]; Airstream: [+egressive]; ... Auditory Impression: [+voicing]; [+strident]; [+consonantal]; ... Morph-memes: {zoo}, {-es}, {...}, {...}]

Figure 6.5 The phone-meme /z/.

is unlikely that connections between constituents of configurations ‘for’
phone-memes, or indeed any piece of knowledge, will depend on ‘central
nodes’ such as the node labelled /z/ above. The node labelled /z/ is
intended merely to indicate – to readers of this book – the connectedness
of the auditory and articulatory constituents involved in implementing
knowledge of /z/. Also, it should be pointed out that whatever nodes or
links are involved in implementing phonemic or other knowledge, they
will necessarily be non-symbolic. In the case of /z/, for example, none of
the involved nodes ‘symbolises’, ‘represents’, or ‘has in any way’ a quality
amounting to /z/-ness. Not even, and particularly not, the node labelled
/z/.29 That we have included a central node in our representation at all,

29 In terms of the theory of neural networks, it may correspond to a node on one of the so-
called intermediate layers, whose constituents are involved in forming abstractions of the
very type that ‘phonemes’ clearly represent (cf. Spitzer 1996: 129–32). It is thus merely
a part, albeit a crucial one, of a constellation that produces cognitive and other behaviour
‘for’ /z/. Thinking of it in terms of a ‘mental /z/’-node, however, would nevertheless be
misleading, because mental /z/-ness is not locally encoded in the node but distributed
over a larger section of the network.
and that we have labelled it /z/, is meant to make things easier for readers.
Actual ‘/z/-ness’, or ‘knowledge of /z/’, is a function that emerges
from overall link patterns and from the ways in which constituents interact.
Next, it is important to distinguish between the structure of a phone-
meme, which can be thought to implement ‘knowledge of a phoneme’
and its behaviour in ‘expression’ or ‘activation’, that is, when a phoneme
is either perceived or produced. For instance, not all of the structural con-
stituents of a phone-meme will be ‘active’ in every instance of language
use involving it. If that were the case, one would automatically produce
a phoneme whenever one hears or thinks of it, and nobody does that, of
course. Therefore, selected subsets of either articulatory or auditory con-
stituents will remain inactive even though the phone-meme is involved
in activity in one way or the other. When one whispers, for example, the
activation of articulatory voicing is systematically suppressed, and there
are many other conceivable cases by which the principle could be illus-
trated. In short, ‘knowledge’ of a phoneme can certainly be thought of as
the configuration of nodes and links which constitute a ‘phone-meme’,
but such a unit of knowledge is not identical with a unit of expression or
activity. Rather, it defines a web of pathways which make some patterns
of activation more probable than others. Thus, phone-memes represent
(comparatively) stable competence constituents, that is, elements of ‘knowledge’,
but they will certainly not always operate as coherent units in their
expressions in actual discourse.30 And, in principle, this must be true of
all other memes as well, of course.
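The difference between the stable structure of a phone-meme and the subsets of it that become active on a given occasion can be sketched as a small Python data structure. This is an illustration only: the class name `PhoneMeme`, the feature strings (adapted from figure 6.5) and the three modes of use are assumptions of the sketch, and the `label` field exists purely for the reader's convenience, just like the node labelled /z/ in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneMeme:
    label: str               # reader-facing label only, not a constituent
    articulatory: frozenset  # constituents 'for' articulatory gestures
    auditory: frozenset      # constituents 'for' auditory impressions
    morph_links: frozenset   # external links to morph-memes

    def active_constituents(self, mode):
        """Return the constituents assumed to be active in one kind of use."""
        if mode == "hear":
            return set(self.auditory)           # nothing is articulated
        if mode == "speak":
            return set(self.articulatory)
        if mode == "whisper":
            # articulatory voicing is suppressed, everything else stays active
            return set(self.articulatory) - {"vocal folds: +voice"}
        raise ValueError(f"unknown mode: {mode}")


z = PhoneMeme(
    label="/z/",
    articulatory=frozenset({
        "tongue: corona towards alveoles",
        "vocal folds: +voice",
        "airstream: +egressive",
    }),
    auditory=frozenset({"+voicing", "+strident", "+consonantal"}),
    morph_links=frozenset({"{zoo}", "{-es}"}),
)

for mode in ("hear", "speak", "whisper"):
    print(mode, sorted(z.active_constituents(mode)))
```

The stored configuration is the same in all three cases; only the pattern of activation differs, which is the sense in which a unit of knowledge is not identical with a unit of expression.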
Before turning to morph-memes, consider the following interesting
aspect of the way in which we have represented phone-memes. Struc-
turally, they are characterised through the ways in which their constituents
are linked as well as through their ‘external’ links to ‘morph-memes’.
Now, if ‘phone-memes’ are indeed selected for – ‘rewarded’ and ‘sta-
bilised’ – during language acquisition, then there are two ways of looking
at this process. On the material, or structural level, one might think of
such ‘selection’ as a process in which a particular configuration of neu-
ral assemblies gets stabilised through being rewarded for firing together,
because this strengthens internal links. That they come to fire together at
all, however, reflects their common connection to neural assemblies ‘for’

30 This implies, quite generally, that the elements and processes involved in language use
do not necessarily have to be identical with the elements and relations that define linguistic
competence, and makes it improbable that views which model discourse as a process
in which competence constituents are put together to yield utterances will ever be
morphemic constituents, from which they derive the required energy.
In a way which is not even very metaphorical, one might then say that
phone-memes are ‘fed’, and ‘kept alive’ through their connections to
morph-memes. This corresponds well to the other way in which the
selection of phone-memes during language acquisition may be conceived
of, namely the more traditional, functional one. From that perspective
one would say that phone-memes are acquired because of the roles they
play in distinguishing, and recognising morph-memes. This translatabil-
ity seems to corroborate the assumption voiced above that association
networks may make it possible to relate neuronal form and cognitive
function. Should neuro-physiology find evidence for such configurations
on some level of description, then I would not be surprised if they turned
out to be the Rosetta stone for translating neuronal form into mental function.

6.3.2 Morph-memes
Next, turn to memetic configurations for recognising, storing and pro-
cessing the formal Gestalts of morphemes. As we observed above, such
‘morph-memes’ will have to incorporate, first, stable and good links to
phone-memes as well as to memes for the suprasegmental roles played by
the individual phone-memes. Additionally, there must be morph-meme
specific links that handle the sequential, or temporal, organisation of the
involved memes for sounds and suprasegmental roles. Finally, a morph-
meme will be linked, ‘externally’ and by lines of diverse qualities to a
variety of different ‘syntactic’ and ‘conceptual’ constituents. A mental
constituent for a simple lexical morpheme like bull might thus be drawn
as shown in figure 6.6.
Reflecting what was argued above, the diagram in figure 6.6 shows
morph-memes as configurations in which phone-memes and constituents
‘for’ supra-segmental patterns and roles are stably connected. It is hoped
that the graph speaks well enough for itself, so few words will be spent
on its interpretation. What might require an explanation, however, are
the arrow symbols, because they did not figure in the representation of
‘phone-memes’. The connections they indicate are supposed to imply
sequential rather than concomitant firings. Nodes for syllabic constituents
are thus assumed to be linked in such a way that the activation of a
node ‘for’ onsets will cause the subsequent activation of a node ‘for’
nuclei, or at least make it more likely. More will be said about such
constellations below. In most other respects the diagram can be read just
like the representation of /z/ in figure 6.5.
[Figure 6.6: a central morph-meme node {bυl} linked to phone-memes (/υ/, /l/, ...), to nodes for suprasegmentals (‘Onset’, ‘Nucleus’, ‘monosyllable’), to syntag-memes (NOUN) and to concepts ([EDIBLE]).

Legend: one node symbol stands for memes which are themselves complex but whose internal structure is not represented in the graph; the other stands for functionally simple nodes, or nodes whose internal structure is of no concern to the present discussion.]

Figure 6.6 The morph-meme {bυl}.

As in the case of ‘phone-memes’, it is important to distinguish competence
constituents, that is, stable mental structures that implement
knowledge of a morpheme, from the configurations that get activated
when a morpheme ‘occurs’ in actual language use. The latter will involve
elements of the former, but the two ‘units’ must not be regarded as iden-
tical. Similarly, the presence of a central node in figure 6.6 is a mere
conceptual aid supposed to highlight the connectedness of the involved
constituents, regardless of whether in real brains such nodes may exist or not.
Like those in figure 6.5, the formal constellations in figure 6.6 invite an
interpretation in functional terms: just as phone-memes seem to depend,
for their stabilisation in acquisition, on the links they have to ‘morph-
memes’, so ‘morph-memes’ are stabilised through the links they have to
the conceptual configurations they express. This is beautifully compat-
ible with the functionalist position that both types of constituents exist
because of their (semiotic) functions.
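[Figure 6.7: a node ‘for’ onsets, linked to nodes for consonantal sounds (/t/, /b/, /r/, /p/, /n/), with an arrow leading to a node ‘for’ nuclei, linked to nodes for vocalic sounds (/a/, /υ/, /e/, ...).]

Figure 6.7 A meme for distinguishing between onsets and nuclei.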

6.3.3 Memes for supra-segmentals

Syllabic relations
Like other memes, memes ‘for’ supra-segmental roles and constellations
can also be modelled as configurations of nodes. It is important to show
this because they are more abstract than ‘phonemes’ or ‘morphemes’,
and the textual tokens that express them are far more diverse in form
than those which express any of the former. In established linguistic ter-
minology, concepts such as ‘onset’, ‘rhyme’, ‘nucleus’ or ‘coda’ refer to
functions of, or to relations among sounds. On the textual level, we refer
to a consonant or consonant cluster as an onset because of its relation
to a following vowel rather than because of any of its intrinsic qualities.
Now, in a network model of phonological competence, this relation might
be implemented by a link between nodes ‘for’ consonantal sounds (or,
alternatively, articulatory gestures and auditory features involved in
consonantal sounds) and nodes ‘for’ vocalic sounds, so that activation of
the former makes the subsequent activation of the latter more likely, as
shown in figure 6.7.
The dot on the left of the arrow represents the function ‘onset’, while
the dot on the right is a node ‘for’ nuclei. As in figure 6.5 and figure 6.6
above, there is nothing in the dots themselves which characterises them
as ‘onsets’ or ‘nuclei’, and they are not themselves symbolic. What makes
them what they are is the specific set of relations that holds among
them, as well as their respective external links. A node is an onset node,
for instance, because it is linked to a set of nodes ‘for’ specific sounds
[Figure 6.8: nodes for sounds (/s/, /t/, /b/, /p/, /k/, /r/, ...; /a/, /υ/, ...) linked through onset-position nodes (O, O1b, O1c, O2a, ...).]

Figure 6.8 A meme ‘for’ the phonotactics of onset clusters.

(that is, those that can ‘play the role’), and because it feeds energy
(again through a special type of link) to a node which connects to a
different set of sound-nodes (namely those that can play nucleus roles).
The same kind of relational definition is possible, by analogy, for nuclei
and codas, of course, and by a similar rationale, to other constituents,
such as configurations ‘for’ consonant or vowel clusters, rhymes or
Thinking of supra-segmental constituents in terms of specific link pat-
terns by which nodes are connected to one another and influence the
probability of one another’s activation, makes it possible to capture com-
plex phonotactic relations. Although it would clearly go beyond the scope
of this little exposition to design a network for the phonotactics of any
particular language, an exemplary demonstration of the principle seems
nevertheless to be in order. Imagine, then, a simple hypothetical language
that admits only the following onset types: (a) any single consonant, (b)
/s/ followed by any voiceless stop, or liquid (for example, sk, st, sp, sl, sr),
(c) stops followed by liquids (tr, kr, pr, br and tl, kl, pl, bl), and (d) clusters
of /s/ and voiceless stops followed by liquids (str, skl, spl, spr, and so on).
A network that would define the three onset positions and account for
possible relations might then look similar to figure 6.8.31
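The link pattern just described can be made concrete in a small sketch. It is an illustration only and rests on assumptions the text does not make: the consonant inventory (/s/, four stops, two liquids), the names LINKS, is_possible_onset and all_onsets, and the decision to model each ‘sequential link’ as a plain set of permitted successors. No node stands ‘for’ a cluster type; binary and ternary onsets simply emerge as paths through the links.

```python
from itertools import product

# Assumed inventory for the hypothetical language (the text does not list it in full).
VOICELESS_STOPS = {"p", "t", "k"}
STOPS = VOICELESS_STOPS | {"b"}
LIQUIDS = {"l", "r"}
CONSONANTS = STOPS | LIQUIDS | {"s"}

# Sequential links between consonant nodes: LINKS[x] holds every consonant
# that may directly follow x inside an onset.
LINKS = {c: set() for c in CONSONANTS}
LINKS["s"] |= VOICELESS_STOPS | LIQUIDS   # /s/ may precede a voiceless stop or a liquid
for stop in STOPS:
    LINKS[stop] |= LIQUIDS                # any stop may precede a liquid


def is_possible_onset(cluster):
    """True if every adjacent pair of consonants in the cluster is joined by a link."""
    if not cluster or any(c not in CONSONANTS for c in cluster):
        return False
    return all(b in LINKS[a] for a, b in zip(cluster, cluster[1:]))


def all_onsets(max_len=3):
    """Enumerate every onset the link pattern licenses, shortest first."""
    return [
        "".join(combo)
        for n in range(1, max_len + 1)
        for combo in product(sorted(CONSONANTS), repeat=n)
        if is_possible_onset(combo)
    ]
```

Calling all_onsets() lists the single consonants, the s-clusters, the stop-plus-liquid clusters and the ternary clusters such as str or spl, while is_possible_onset("sbr") is rejected because no link leads from /s/ to the voiced stop.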
This network defines possible consonantal roles within complex onsets
in a distributed and relational manner. For instance, the fact that all

31 Which – for the sake of surveyability – does not include all the nodes and links necessary
even only for this simplified language. But the principles ought to become obvious.
consonants can figure in mono-consonantal onsets in CV syllables is
handled by the network through a sequential link which connects each of
them to each vowel, symbolised in figure 6.8 through the node labelled
‘O’ and the arrow leading away from it. That /s/ can figure in four types of
onsets (that is, s, s[stop], s[liquid] and s[stop][liquid]) is handled through
links between /s/ and four different nodes with different ‘consequences’
(O, O1a, O1b, O1c). That /s/ is like all other consonants in that it can be
followed by a liquid is reflected in the fact that in the network both /s/ and all stops link
to a node which triggers liquids. The network’s structure clearly embodies
many further categorisations. Only a few have been mentioned to make
the point. Feel free to search for others.
Note, though, that we have once again used nodes and labelled them
for the sake of convenience only. The point is once again that knowl-
edge is distributed over the network and that no node is symbolic. Thus,
although the network provides for their emergence, there are no dedi-
cated nodes ‘for’ binary or ternary onsets. The connections in the net-
work are adequate ‘for producing and expecting’ all possible cluster types,
however. In this sense the link pattern in (figure 6.8) can be interpreted
as a meme, or meme complex, ‘for’ different types of [O(O(O))[N]
As already pointed out, it would clearly go beyond the scope of this
volume even to sketch a model network that could account for the far
more complex phonotactic patterns of a language such as English. In
principle, however, it will involve nodes and link patterns that define,
through their relations, syllabic roles such as Onset, Nucleus and Coda.
It is conceivable that they could also accommodate phonotactic prefer-
ences, if links work in such a way that the activation of some nodes makes
the activation of others more or less likely. Thus, (an) onset node(s) may
have stronger links to (a) nucleus node(s), for example, than to nodes
for further onset positions, making CV syllables more ‘likely’, or ‘pre-
ferred’ than CCV syllables. In such a way, the range of different syllable
structures which a language permits could be defined, and the greater
probability of some over others predicted at the same time. Other facts
such as that, in English for example, syllables which are at the same time

32 Note how the network also seems to define natural classes of consonants, which shows
how membership in a class such as ‘liquids’ or ‘stops’ correlates with the suprasegmental
roles that individual phonemes can play.
Incidentally, this way of thinking about mental configurations ‘for’ syllable types is
highly compatible with linguistic approaches that do not regard ‘syllables’ as phonological
primitives, most notably with Dziubalska’s (1995 and 2002) beat-and-binding model
of phonotactic organisation. She argues that the structures which we are used to calling
syllables emerge from mutual attractions (bindings) among vocalic phonemes (‘beats’,
in Dziubalska’s terms) and consonantal phonemes (non-beats).
Figure 6.9 A meme cluster for syllable structure.

monosyllabic feet seem to require at least two positions in the rhyme
(ruling out words such as */pʊ/ or */skrɪ/) could be handled in terms of
additional links to the nodes for a second nucleus position and a coda
position from a node for monosyllabic feet, so that at least one of the
two will be activated if the node for monosyllabic feet is. The diagram
above demonstrates the complexity which is likely to characterise such a
In this diagram, most node labelling has been left out. The principles
outlined so far should help to interpret it. It illustrates how links between
a constellation ‘for’ monosyllabic feet and nodes ‘for’ rhymes or branch-
ing nuclei will make the activation of the latter more likely. Thus, the
configuration (figure 6.9) embodies knowledge of various syllable types
and the roles they can play in feet. Which of the syllables inherent in it
actually does get expressed in an utterance will of course not only depend
on the relative inherent likelihood of different syllable types to be acti-
vated in diverse rhythmic constellations, but also on other information, for
instance, information transmitted from memorised morphs. Thus, {strɪp}
will activate the meme for OOONC-syllables even though otherwise the
links among the three onset positions may not be as strong as the links
between onset nodes and nucleus nodes (see figure 6.10 below; the roles
played by the nodes are once again labelled for the sake of readability,
-icons represent firing nodes).
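How inherent link strengths and lexical input might combine can be sketched in a few lines. The numbers, the dictionary ONSET_LINKS and the function next_node are invented for illustration; the only claim taken from the text is that an onset node is linked more strongly to a nucleus node than to a further onset position, and that input from a memorised morph can outweigh this default.

```python
# Hypothetical link strengths leading away from the first onset node.
# The values are illustrative assumptions, not measurements.
ONSET_LINKS = {"nucleus": 0.8, "second_onset": 0.2}


def next_node(links, lexical_input=None):
    """Return the node that receives the most energy overall.

    links:         inherent strengths of the links leaving the current node
    lexical_input: extra energy sent to particular nodes by a memorised morph
    """
    lexical_input = lexical_input or {}
    totals = {node: w + lexical_input.get(node, 0.0) for node, w in links.items()}
    return max(totals, key=totals.get)
```

Without lexical input, next_node(ONSET_LINKS) selects the nucleus, which corresponds to the preference for CV syllables. If a morph such as {strɪp} feeds enough energy to the second onset position, for instance next_node(ONSET_LINKS, {"second_onset": 1.0}), the weaker inherent link is overridden.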
The idea that phonotactic knowledge is embodied in network-like
structures makes it also conceivable that particular constituents receive
Figure 6.10 How {strip} activates the [σ OOO[R NC]R]σ-meme.

Figure 6.11 How {bit} fails to trigger C1 when occurring before {of}.

conflicting signals from different directions and implies ways in which
these might be resolved. For instance, the impact of strength relations
among the internal links of a configuration for different syllable types
may override the impact of lexically stored associations. In a phrase such
as a bit of ice, for instance, the lexically final /t/ of {bit} and the /v/ of {of}
might not necessarily set off the C1 node, to which they are linked, at all.
Instead, the syllabication bi § t o § f ice, in turn conditioning an aspirated
[tʰ], may be effected by the inherently greater strength of the ON meme
over the ONC meme.
This indicates how syllabic structure can be more than just an epiphe-
nomenon of sequential arrangements of morphs or phonemes. The links
that define syllabic roles form structures with emergent properties of
their own. This possibility has many theoretical implications, but what
matters most for the present discussion is that configurations which
embody knowledge of prosodic relationships such as the ones expressed
in syllable structures ought, in principle, to be capable of replication in
their own right, and therefore represent integral ‘memes’ by themselves.
This is an important point because one usually tends to think of syllabic,
or rhythmic structures as properties which morph sequences have when
they are expressed in utterances, and it is therefore difficult to imagine
that knowledge of syllable or foot types should involve independent pieces
of information at all. (Can you think of a particular syllable without think-
ing of particular sound sequences at the same time?) One might therefore
find it difficult to imagine how ‘memes’ for syllabic or rhythmic structure
could represent independent units of replication. The diagrams in this
section help to make this more easily conceivable, it seems to me.33

Feet
Let us turn to feet next, English feet that is to say. In order to deal with
them successfully, a competence must be able, it would seem, to (a) distin-
guish between prominent syllables and weak syllables, (b) expect them to
alternate, (c) recognise sequences of one strong plus a variable number of
weak syllables as units of timing, and (d) calculate the probable duration
of feet on the assumption that the time span between prominence peaks
tends to be constant. If we think of competence as being implemented as a
network, these tasks might call for structures of the following types: (a)
requires nodes ‘for’ recognising and producing different degrees of promi-
nence, each of them linked first to gestures for increasing and decreas-
ing articulatory effort respectively, secondly to perceived increases and
decreases of auditory intensity, and thirdly to nodes or assemblies for
those phonemic configurations (‘syllables’) with which the two degrees
of relative prominence get to be co-expressed; (b) requires the two nodes
to be linked in such a way that activation of one makes activation of the
other more likely; (c) and (d) require links between the nodes for strength

33 Of course, the model we have sketched is beautifully compatible with most contem-
porary theories of phonology, which treat syllabic structure as something that morph-
sequences ‘get assigned’ by syllabication rules rather than as something they ‘have’. At the
same time, it helps one to understand the apparent paradox that un-syllabified morph-
structures are impossible to even introspect. Since our model sees morph-memes as
being necessarily associated to memes for syllabic constituency, it is impossible to acti-
vate morph-memes without activating some syllabic structure at the same time.
Figure 6.12 A meme for foot structure.

and a possibly complex constellation for ‘timing’. In articulation, this unit
will attribute articulation times to all gestures to be performed between
one activation of the strength node and the next, and in perception it will
calculate the amount of segmental material to be expected before the next
prominence peak. A meme for feet might thus have a structure similar to
that in figure 6.12.
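The arithmetic which such a timing unit would have to perform can be illustrated with a toy calculation. It assumes, purely for the sake of the example, a fixed interval between prominence peaks (500 ms by default) and a strong syllable that claims twice the share of a weak one; neither figure comes from the text, and the function name syllable_durations is hypothetical.

```python
def syllable_durations(n_weak, foot_interval_ms=500.0, strong_weight=2.0):
    """Split one inter-stress interval among a strong syllable and n_weak weak ones.

    The interval is held constant, so every additional weak syllable
    compresses all syllables in the foot.
    """
    weights = [strong_weight] + [1.0] * n_weak
    total = sum(weights)
    return [foot_interval_ms * w / total for w in weights]
```

With an interval of 480 ms, a monosyllabic foot yields [480.0], whereas a foot of one strong and two weak syllables yields [240.0, 120.0, 120.0]. The total stays the same while the individual syllables shrink, which is the behaviour that task (d) presupposes.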
Given a mental foot constellation of such or a similar format, the par-
ticular activation patterns that occur in actual discourse will then depend
on a variety of other factors. In a language such as English, where the
position of stress within morphs can often be derived from the structures
of their memorised shapes, it will to a large extent be determined by such
information. For instance, polysyllabic morphs will usually have one of
their syllables linked more strongly to the prominence node than others,
as figure 6.13 shows.
A link pattern of this kind would endow the first syllable of memory
with a relatively high chance of figuring as a foothead in actual utter-
ances – higher at least than the chances that the second or third sylla-
bles will do so. Memes for English open class polysyllables, which have
‘fixed stress’, are likely to involve links of such a kind. At the same time,
both the frequency of ‘stress shifts’ as in thirtéen vs thírteen húndred, as
well as the fact that the prosodic roles of many monosyllabic and par-
ticularly closed-class items such as pronouns, conjunctions, auxiliaries
and so on are highly context dependent, suggest that the links between
memes for morph-shapes and memetic constituents for feet might not
always be very strong. Rather, their impacts seem to be frequently over-
ridden by the internal link patterns that characterise memes for English
Legend: σ1, σ2 and σ3 represent subsequent firings of what may be one
and the same neuronal node. The three events have been graphically
separated in the chapter for purposes of perceptibility.
Figure 6.13 The rhythm of memory.

feet. What this suggests is that memes for foot types may be more inde-
pendent of memes for morphs than memes for syllabic roles appear
to be.

6.3.4 Rule-memes
The discussion of possible ‘resyllabifications’ and ‘rhythmic restructur-
ings’ in the last two subsections has highlighted an issue which we have
already touched upon earlier. It is so generally important that it calls
for a principled discussion. It involves the question of how a network
state might be able to handle the dynamic processes of speech produc-
tion and reception, and the structural transformations that they seem to
bring with them. As observed above, many theories
of linguistic competence represent constituents for processes in terms of
‘rules’. The case we mentioned as a typical example was the phonologi-
cal assimilation of /z/ to /s/ before voiceless consonants as in It was [s t]
Tom. This process is often handled in terms of transformation rules like
C → [−voice] / __ [−voice], which are believed to be activated in produc-
tion. Interestingly, the way the results of such processes are ‘handled’ in
perception is rarely formalised. Often it is implied that speakers will be
able to handle them in some way, simply by knowing the production rule.
The details are left vague, however.34 While rules of the established for-
mat may work well in serially organised computer programs, it
is highly unlikely that actual minds/brains should implement information
‘for’ handling apparently ‘rule-governed’ relations such as those between
[s] and [z] in the above example in this manner – at least if brains are
really organised in terms of a network structure, as the most plausible
guess seems to be at the present time.
How can knowledge of a dynamic process be implemented in a static
network then? The problem is not as difficult as one might first imagine.
Take the example of /z/-devoicing in It was Tom again, and recall, first,
what kind of behaviour the network is supposed to generate. In produc-
tion, we expect it to put out [wəs] as the expression of the morph-meme
{was}, and in reception we expect it to activate concepts normally associ-
ated with the morph-meme {was} when it gets the phonetic input [wəs].35
In accordance with the way in which we represented the morph-meme
{bull}, we must assume that the morph-meme {was} incorporates a link
to the phone-meme /z/, which is supposed to activate [+voicing] in pro-
duction. Activating [+voicing] automatically rules out the activation of
[−voicing] and vice versa, of course.36 How, then, can [−voicing] be acti-
vated? Clearly, it must receive its input from some other source, and the
likely candidate is the /t/ of Tom. In order for the network structure to
inhibit the prior activation of [+voicing] in It was Tom, little more seems
to be required than that the link between /t/ and [−voicing] be so much
better than the link between /z/ and [+voicing] that the energy emitted
by /t/ will reach [−voicing] as soon as, or possibly sooner than, the energy
emitted by /z/ reaches [+voicing], even though /t/ itself might have been
activated after /z/. A graphic representation of such a network is given in
figure 6.14.
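The race between the two links can be rendered as a toy calculation. Everything numerical in it is an assumption made for illustration: link ‘quality’ is treated as a transmission speed, so that energy needs 1/quality time units to travel along a link, and the two voicing constituents are treated as mutually inhibitory, so that whichever receives energy first fires and blocks the other. The function name winning_feature and its parameters are hypothetical.

```python
def winning_feature(z_onset, t_onset, q_z_voiced, q_t_voiceless):
    """Decide which voicing constituent fires for the final consonant of {was}.

    z_onset, t_onset: times at which the phone-memes /z/ and /t/ are activated
    q_z_voiced:       quality of the link from /z/ to [+voicing]
    q_t_voiceless:    quality of the link from /t/ to [-voicing]
    """
    arrival_voiced = z_onset + 1.0 / q_z_voiced
    arrival_voiceless = t_onset + 1.0 / q_t_voiceless
    # Energy from /t/ wins if it arrives as soon as, or sooner than, energy from /z/.
    return "-voicing" if arrival_voiceless <= arrival_voiced else "+voicing"
```

If /z/ is activated at time 0 and /t/ at time 1, a poor /z/ link (quality 0.5) and a good /t/ link (quality 2.0) give arrival times of 2.0 and 1.5, so [−voicing] fires although /t/ was activated later. Lowering the quality of the /t/ link to 0.5 delays its energy to 3.0, and [+voicing] fires instead. No constituent in the sketch corresponds to a devoicing ‘rule’; the outcome follows from link qualities and timing alone.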

34 This may be because the problem is trickier than one might think at first sight. In our
specific case, for example, a receptive rule that voiced voiceless consonants before voice-
less ones will not work, because it would falsely interpret Jane wants to kiss Tom as
/dʒeɪn wɑnts tə kɪz tɒm/.
As we have said, this is probably because, in the wake of generativism, linguistic com-
petence has typically come to be thought of as a production system, a piece of mental
software for translating ideas, concepts, propositions or whatever into speech. Inspired
by the format of classical computer programs, the competence models constructed on
this conceptual basis naturally came to contain constituents such as variables and bat-
teries of ordered transformational rules, of which C → [−voice] / __ [−voice] is of course
a prototypical example.
35 For this we have, albeit partly indirect, behavioural evidence. There is no substantial
evidence whatsoever, on the other hand, for the assumption that anywhere in a human
mind a mental version of /z / gets replaced by a mental version of /s / (and vice versa).
Instead, this assumption rests entirely on an analogical transfer of rule-formats from
logical modelling, or computer programming.
36 This implies a mutually inhibitory connection between the two constituents.
Figure 6.14 (Part of) a ‘rule-meme’ for pre-consonantal devoicing.

Since [+voicing] and [−voicing] contradict one another, a conflict will
arise whenever both receive energy that might otherwise set them off.
How the conflict is resolved may then depend either on the amount of
energy that each of the two constituents actually receives or on timing,
or on both. In a language where contextual devoicing is obligatory, the
connection between the memes for voiceless consonants, which trigger
it, and [−voicing] will be reliably better than the one between [+voicing]
and phone-memes for voiced consonants, while in languages with no
such assimilations, the link will not be sufficiently good. In languages
where ‘the process is optional’, the link between the triggering conso-
nant memes and [−voicing] may be of fairly good quality, but whether
enough energy will be transmitted along it quickly enough for trigger-
ing [−voicing] already during the preceding activation phase of memes
for voiced consonants may in turn depend on additional factors (excita-
tory or inhibitory), for instance on ‘memes’ for assessing the formality
of a communicative situation, or on mental constituents (not necessarily
memes) for representing the physical condition of speakers, and so on.
As far as perception is concerned, what we know is that listeners react
to the phonetic input [wəs] in It was Tom in a way which suggests that
in their minds the same concepts are activated that are also activated
when they receive [wəz]. In our model, all this implies is activation of
the {was} node. No necessity for a rule that translates the received [s]
‘back’ into a /z/ arises. Questions like whether listeners first actually per-
ceive /wəs/ and then reinterpret it as /wəz/ or whether they imagine
they perceive /wəz/ in the first place do therefore not even need to be
Observe how difficult it would be in figure 6.14 to pin down a config-
uration that might correspond to the classical C → [−voice] / __ [−voice]
rule. Even in the simplified (and probably incorrect) version we have
37 Note that it does not have to be resolved in actual communication either. There, we
normally pay little heed to the actual sounds we hear. What matters is what concepts
they activate in our minds. It is only when we consciously reflect on the process of
speech interpretation that the question whether we heard an /s/ or a /z/ in It was Tom becomes
relevant at all, and it is a question which our minds may not be designed to answer
very well. Indeed, the network model suggests that we might get both impressions when
we attempt to ‘eavesdrop’ on the working of our minds in introspection. On the one
hand, the constituents for recognising the acoustic properties of [s], that is, stridency,
voicelessness, continuance, and consonantality are linked to the phone-meme /s/, while
on the other the morph-meme {was} links to the phone-meme /z/. It is therefore possible
that on conscious reflection speakers may equally think they heard an [s] and that they
heard a /z/. For the purposes of normal communication, their brains do not need to
have made the decision at all. All that matters is that they recognise {was} and process
it accordingly. Thus, in leaving the issue unresolved, our model may be truer to life than
phonological theories, which need to commit themselves on more issues than the brains
they proclaim to be modelling.
sketched, it becomes obvious that there need not be a distinct ‘rule’ unit
on the level in which ‘knowledge’ of such a process ‘sits’, or ‘is repre-
sented’. There is no mental version of the rule, as in computer code. No
constituent of the configuration, which is ‘for’ the process, belongs to it
exclusively. All of them figure in other constituents and processes as well.
Yet, the ‘rule-meme’ does have an identity of its own, which is constituted
by the ways in which the various nodes that make it up are connected to
each other.

6.3.5 Summary
Since there is still much to be learnt about the processes by which brains
generate cognitive content, it is more than likely that the meme graphs
which I have drafted will turn out to be utterly unrealistic. Yet, they have
served to substantiate, I hope, the following points. First, it is not incon-
ceivable that the cognitive, or mental functions which brains serve might
have a material basis which can be described in structural terms. Second,
a systematic correlation or isomorphic relationship between patterns of
mental functions and patterns of neuronal structures ‘for’ functions can
plausibly be assumed. It would seem to follow from this that memes as we
have defined them, that is, as replicating neuronal structures with spe-
cific functions, expressions or effects, may indeed be as materially real
as the patterns of nucleotides that constitute genetic replicators are,
albeit possibly more difficult to observe and describe. This would then
corroborate the assumption that languages acquire, keep and therefore
have their properties because they evolve in a Darwinian manner. These
implications are exciting enough to justify, at least to a degree, the spec-
ulative way in which I have tried to give graphic shape to entities whose
exact material status is, let me stress that again, beyond our knowledge.

6.4 From replication to evolution

6.4.1 Variation and selection

In the preceding sections we have established that linguistic replicators are
conceivable. We have come to think of them as special types of ‘memes’,
that is, neuronal structures for dealing with the world in cognition and
behaviour, which can place faithful copies of themselves in other brains.
As Richard Dawkins has argued, replicators represent the sine-qua-non
of Darwinian evolution. If one understands what they are, what it takes
for them to remain stable, and how they replicate, one owns the key to
understanding many of the evolutionary changes that pools, populations,
and lineages of them will undergo. However, a number of important
questions still need to be addressed. In particular, it will be necessary
to develop an idea of the actual processes by which linguistic replica-
tors replicate. So far, we have only established that historically successive
competence constituents may be similar enough to count as copies of one
another. This is not good enough, because for evolution in the technical
Darwinian sense to occur, conditions have to be met which involve the
mechanics of the replication process.
Let us recall what these further conditions are. As Roger Lass puts
it, ‘variation, i.e. imperfect replication must be possible; and there must
be a selection process (what particular kind is unimportant) that biasses
survival in favour of some particular variant(s)’ (Lass 1997: 112). Now,
this is a very general way of putting it, which makes neither of the two
issues seem very problematic. That competence constituents don’t always
replicate faithfully appears obvious enough (recall how difficult it was for
us to establish that they replicate faithfully at all!), and all the examples of
language change that we have discussed so far show rather unambiguously
that not all variants always reproduce equally well. Thus, rather trivially,
languages do indeed seem to evolve historically.
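The two conditions, imperfect replication and biased survival, are enough to make one variant oust another, as a minimal numerical sketch shows. The model is a deliberately crude assumption: a single lineage with two variants, a fixed rate at which copies of A come out as B, and fixed relative replication success for each variant. The function name lineage and all default values are hypothetical.

```python
def lineage(generations, fitness_a=1.0, fitness_b=1.2, miscopy_rate=0.01, share_b=0.0):
    """Return the share of variant B in the pool after each round of replication."""
    history = [share_b]
    for _ in range(generations):
        # Variation: a small fraction of A copies are imperfect and yield B.
        share_b = share_b + (1.0 - share_b) * miscopy_rate
        # Selection: each variant is copied in proportion to its relative success.
        weighted_b = share_b * fitness_b
        weighted_a = (1.0 - share_b) * fitness_a
        share_b = weighted_b / (weighted_a + weighted_b)
        history.append(share_b)
    return history
```

Starting from a pool that contains only A, lineage(100) shows B emerging through miscopying, coexisting with A for a number of generations and finally accounting for practically the whole pool. Which kind of selection produces the bias is left open in the sketch, just as it is in Lass’s formulation.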
There are purposes for which Roger Lass’ general assertion that lan-
guages acquire histories through the emergence of variant constituents
and the subsequent selection of some at the cost of others provides a
sufficiently solid basis. For instance, it is quite sufficient for establishing
lineages of constituents and describing changes that occur in them. If a
constituent A gets copied imperfectly to yield a constituent B, co-occurs
and competes with it over a certain time, and is eventually ousted by it, the
lineage in question will see As replaced by Bs and can be described as
(21) [Diagram: a lineage charted over times t1 to t7, in which As are gradually replaced by Bs]
Nor do the details of selection matter when it comes to reconstructing
genealogies, or to charting family relationships that may come about when
different subsets of replicator populations get exposed to different selec-
tion pressures, and lineages split in the process. The tree in (22), for exam-
ple, can be established without knowing the reasons why or the means
by which competing variants came to be selected for and/or against. It
is enough to acknowledge the plain fact that selection seems to have
(22) [Tree diagram: a lineage starting from A, with branches labelled Selection for ‘A’, Selection for ‘B’ and Selection for ‘C’]

6.4.2 Selection, agency, and time

But there are reasons for digging deeper. It is evident, first of all, that
the specific kind of selection process to which a population of replica-
tors is submitted determines what exactly gets selected. It therefore has
an obvious explanatory function, and the more we know about it, the
more we understand about the history of a replicator population. Fur-
thermore, the question arises where exactly in the selection process the
initiative lies. Simplifying a lot, there are two extreme scenarios which are
On the one hand, external selectional pressures on a population of
replicators may be more or less constant albeit regionally variable. In that
case, the initiative will be with the replicators and, depending on the types
of variants they can produce, it will diversify so that local sub-populations
come to match local environmental conditions more or less optimally. A
replicator pool will then evolve into a schematically compressed represen-
tation or model of those environmental aspects to which the replication
of its constituents is sensitive. Taking place against a stable environmen-
tal background, its evolution is then driven by a competition in which
better adapted variants oust their less fit competitors. This is, basically,
the way in which neo-Darwinian selection is assumed to work most of the time.
On the other hand, the environment in which a replicator population
evolves may be extremely changeable, with selectional pressures changing
directions at a rapid pace. In that case, the initiative will seem to be with
the environment. Although the distribution of replicator variants within
the pool may attempt to keep up with environmental changes and thereby
achieve a certain match between replicator qualities and external selection
pressures, this match will not be very good, and not very telling in any
case. In the most extreme scenario, a rapidly changing environment may
be deadly and put an end to all replication altogether, because it may
select against existing variants before new and better adapted ones have
a chance to emerge. Even in slightly less extreme cases, however, what
happens among replicators will not make for an interesting story. In order
to explain the history of replicator populations it will be sufficient to state,
in a lapidary manner, that they respond to environmental changes. Then
one will have to focus on the latter.
Of course, the two alternatives just described are not mutually exclu-
sive. Obviously, situations are conceivable in which evolutionary changes
in a replicator population are partly driven internally, that is, through
the cumulative selection of replicators that provide an ever-increasing
fit between the pool and its environment, and partly externally, that is,
through environmental changes that redefine what constitutes such a bet-
ter fit. Furthermore, even the distinction between internal and external
factors will not always be easy to draw. After all, the different replicators
within a pool represent environments for each other. Even though it may
not be a matter of black-and-white but of darker and lighter shades of
grey, however, the distinction between externally and internally driven
evolution is important. The gist is that when one wants to understand,
rather than just chart or describe, the history, or evolution, of a popu-
lation of replicators, it does matter how the selection processes they are
subjected to come about and particularly, how stable they are. There-
fore, the mere assertion that constituents of linguistic competence seem
to qualify as replicators which copy imperfectly and are subject to selec-
tion, will not get us very far. Instead, we need to understand how their
replication is brought about and what factors may influence it.
Recall the situation that obtains in biological evolution as Neo-
Darwinian theory sees it. Actual evolutionary changes have clearly been
driven both externally and internally. What Darwinian theory focuses
on, however, is the logic of those processes in which replicators them-
selves can be considered active. Naturally, the course of biological evo-
lution on our planet has repeatedly been redirected rather radically by
catastrophic external events (such as the well-publicised meteor impact
which presumably wiped out all dinosaurs except birds and paved the
way for mammals). Yet, for much of the time selection pressures and the
resulting directions of evolution seem to have reflected the following state
of affairs: new replicator variants emerged and turned out to be better
than others under environmental conditions which, from the point of
view of the evolving populations were more or less constant. Under such
conditions, evolutionary change has tended to be cumulative and adap-
tive, and produced more and more sophisticated and satisfactory matches
between replicator populations and their environments. Therefore, in the
words of Richard Dawkins, Darwinian biological evolution can be com-
pared to a slow climb up the peaks of a (metaphorical) ‘fitness landscape’
(one of Dawkins’ books is called Climbing Mount Improbable). Normally,
replicating and mutating genes do the climbing, while the mountains
are simply there to be climbed. ‘Earthquakes’ that drastically re-shape
landscapes seem to be few and far between.
The question is whether cultural, and particularly linguistic evolution
can also be conceived of in this manner. Are there mountains that exist
long enough for language memes to climb slowly towards their tops, or is
the evolutionary landscape in which they need to persist so changeable
that it reshapes before they even have a chance to decide which way is
up? If the latter is true, lineages of replicators may still change historically
in response to quickly changing criteria of selection, but their own role
in the process will be negligible. History will simply select those which
happen to find themselves in elevated regions after each geological re-
modelling. Linguistic evolution will thus trace changes in the criteria
of selection, but be a straightforward, and only mildly interesting epi-
phenomenon of environmental developments at best. Languages must
be more than just systems of variants that replicate and get selected.
They need to be systems of active replicator lineages that mutate quickly
enough for selection to direct their evolution into specifiable directions.
Only if they are, does it make sense to say that their histories are driven
by Neo-Darwinian principles.
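The difference between the two landscapes can be illustrated with a one-dimensional toy model. All of it is assumption: the ‘population’ is reduced to a single position, the environment to a single optimum per time step, and adaptation to a bounded step towards the current optimum. The function name track_optimum is hypothetical.

```python
def track_optimum(optima, step=1.0, start=0.0):
    """Move a population towards a sequence of optima, one bounded step at a time.

    Returns the remaining mismatch between population and optimum after each step.
    """
    position = start
    mismatches = []
    for optimum in optima:
        gap = optimum - position
        position += max(-step, min(step, gap))   # climb, but no faster than `step`
        mismatches.append(abs(optimum - position))
    return mismatches


# A mountain that stays put versus a landscape reshaped at every step.
stable = track_optimum([10.0] * 20)
changing = track_optimum([10.0 if i % 2 == 0 else -10.0 for i in range(20)])
```

Against the stable optimum the mismatch shrinks step by step and reaches zero, so the history of the population is a story of cumulative adaptation. Against the oscillating optimum the mismatch never falls below nine units, and what the population does merely echoes the environment. The question raised in the text is which of the two regimes linguistic replicators actually inhabit.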

6.4.3 Human whim and a structuralist scare

To see why this question is crucial, consider the common sense notion
that what people do, say and learn is essentially up to them. Clearly, this
also covers the transmission of language, knowledge, culture, or ‘memes’
in general. They all depend on human agency. If common sense is correct,
however, this speaks strongly against the idea of dealing with linguistic
evolution along Darwinian lines. After all, the notion that agency rests
completely with people casts constituents of culture – whether they are
artefacts or cognitive structures – as completely passive with regard to
their replication. Common sense suggests that cognitive content relates
to humans and their brains like printed text relates to humans and their
copying machines. It implies that, basically, all kinds of text and all kinds
of knowledge can be copied equally well. Which particular subset of them
gets copied more often than the rest depends – for all practical purposes –
on decisions made by humans, that is, ultimately on the personal prefer-
ences of the persons who actually operate the copying machines or brains.
And these, it might seem, will change quickly and unpredictably and thus
make it impossible for differences between replicators with regard to their
inherent quality as replicators to become relevant.
Think, for example, of a best-selling novel, such as Jurassic Park by
Michael Crichton. Consider it as a replicator. Obviously, it may be around
in the print-shops of the world in several, competing variants. These might
differ in lay-out, type-set, and, possibly, also in small textual details. Now,
it might be conceivable that some of these variants copy ‘inherently bet-
ter’ than others for a variety of possible reasons (relating to their legibility,
the amount of paper they require and so on). However, when external
demand changes too rapidly, these differences may not have time enough
to make themselves felt in quantitative terms. Thus, before print-shops
have discovered in which lay-out their customers prefer to read Jurassic
Park, or which version they can print more efficiently and cheaply, the
book as such may have gone out of fashion. Instead, printers may turn
to producing copies of new bestsellers (think of The Lost World, Michael
Crichton’s sequel) before the optimal version of Jurassic Park has
come to reign supreme on the bookshelves of the planet.
Generalising from this, one might come to conclude that human selec-
tion – whether of texts to be copied or languages to be transmitted – will
never be as ‘natural’, ‘automatic’, or systematically related to inherent
qualities of competing replicators/copies as ‘natural selection’. Rather it
will always be ‘artificial’, and directed by ‘catastrophic’, environmental
events. Thus, its criteria may change too rapidly for Darwin-like
evolution to orient itself – just as if, in biological evolution, our
planet were to ‘change its tastes’ every ten thousand years or so, and
radically reshuffle the conditions under which life has to evolve.
What will make things even worse – at least if one wants to understand
and explain cultural change – is that the ‘breeders’ of culture will not
only be external from the point of view of the cultural units whose evo-
lution they direct. On top of that, they will always be human too, and we
are accustomed to regarding the ways of humans as ultimately indeter-
minable. If they are, however, the courses which cultural evolution runs
will likewise be unexplainable. Instead, they will merely – and uninterest-
ingly – trace changes of fashion, or randomly altering human conventions.
Therefore, linguistic evolution will never be explainable either.
An apparently stringent argument similar to this may have motivated
Saussure to assert that language systems could not be explained histori-
cally, and some version of it may have been adopted more implicitly by
subsequent schools of ‘synchronic’ linguistics. Somewhat paradoxically,
it has gained popularity even among historical linguists, most notably
with Roger Lass (see Lass 1980). If they are correct, we cannot expect too
much of an evolutionary approach to language. The fact that compe-
tence constituents appear to be transmitted, and often quite faithfully so,
may warrant the charting of historical lineages and the establishment of
family relationships, but it will have no explanatory power with regard to
the changes we observe.
In particular, the selectional pressures which are conceivable from that
perspective make it unlikely that the idea of changes as gradual and cumu-
lative adaptations of linguistic replicator lineages to their environments
should lead anywhere interesting. Of course, there may be universal and
genetically provided constraints on possible competence designs, on their
expression in articulation, and on their acquisition. To the extent that they
can be considered as hardware-based, they will certainly have to count as
constant from the point of view of replicating competence constituents –
just as copying machines may be universally better or more efficient at
reproducing simple black letters on white background than at copying
intricately designed type-sets involving shades of grey and/or colour dis-
tinctions. But if they are rivalled by constraints which reflect social con-
ventions and fashions, their effects will in most cases be superseded and
masked by the latter. And since the latter constraint-types will themselves
change too rapidly for defining constant selection pressures on the repli-
cation of linguistic competence constituents, linguistic evolution will be
unable to find any direction. Therefore, the properties which any lan-
guage has at any particular point in time will NOT reflect constraints on
their replication in an interesting way. Of course, they will reflect uni-
versal, genetic or hardware-induced constraints simply by not violating
them – without, however, mirroring possible finer distinctions in relative
optimality. The constraints which they will ‘mirror’ more clearly, how-
ever, will be socially based and include many historical accidents. In short,
they will themselves represent huge explananda, and many of them will
fall outside the scope of linguistic investigation altogether.
Fortunately, the scenario just sketched is extremely implausible. It
rests on a number of highly questionable assumptions. In particular, it
distinguishes between human ‘selves’ on the one hand, and the mate-
rially implemented ‘contents’ of human brains on the other. It tacitly
attributes the development of tastes and preferences to the former, thus
casting brain content as a set of entities towards which human selves
may have attitudes. It is implied that human selves may prefer some
types of brain content over others, and decide which of them to acquire,
express, or put to use. It is basically the same view that casts languages as
tools for communication, and that makes us think of thoughts, emotions,
ideas as things which are ‘had’ by people. Although well established in
common sense and everyday language, the notion of human selves, egos,
Is, souls or minds as distinct from human bodies (and in particular
brains) is highly problematic and does not hold up in critical
examination.
First and obviously, it raises the question in what medium, if not bodies
and brains, selves or souls are supposed to exist. This question cannot be
answered in any scientifically satisfactory way. It implies that there exists
a yet unknown ontological domain, for which there is no other motivation
than the very assumption that selves or souls are separate from brains and
bodies. It therefore epitomises an ad hoc notion which is grounded in the
very assertion that it is supposed to support.
If, on the other hand, the entities which we think of as human selves
or souls are realised in brains and bodies, then they are on a par with
linguistic competence constituents, or language memes. Like the latter,
they must be neuronal configurations themselves. Being neuronal
configurations, of course, their internal structures and their places
within neuronal networks can in principle be determined in two ways. They
can either be genetically determined (programmed to become hard-wired),
or they can be environmentally conditioned, that is, result from
processes of neuronal self-organisation directed by environmental
feedback on their effects. If such environmental feedback comes from
neuronal configurations in other brains and causes neuronal
self-organisation to produce copies of them, then the configurations
which are thereby created can themselves count as replicators. They also
represent culturally transmittable units of information, or memes, albeit
of course not linguistic ones.
What does this imply, then, for the role which ‘selves’ can play as envi-
ronments which select linguistic competence constituents? To the extent
that mental configurations ‘for’ human selves are genetically determined,
or hard-wired, the constraints on memetic transmission and evolution
that they represent will clearly be constant enough to allow the cumulative
selection of better adapted variants. The ‘fitness landscape’ they define
will clearly make it possible for memetic evolution to find directions.
Under such conditions, of course, the Saussurean notion that linguistic
change must be chaotic and unexplainable is not tenable. To the extent
that constituents of selves are transmittable, on the other hand, they may
themselves qualify as Dawkinsian memes – in very much the same way as
constituents of linguistic competence do. They will themselves have to be
approached accordingly. That is to say, changes in populations of memes
for ‘selves’, their ‘identities’, ‘tastes’ and other properties will have to be
explained in terms of selectional pressures against which fitter variants
will replicate better than less well adapted ones. Whatever those pres-
sures may be, whimsical human selves with randomly altering tastes and
preferences can no longer be among them.
This means that the Saussurean notion can be rejected on all counts.
The view that human ‘selves’ will exert unexplainable and rapidly
changing selection pressures on the evolution of mental replicators,
including language memes, is completely unwarranted. Whatever human
tastes, their preferences, and the fashions they negotiate should eventually
turn out to be, they can only contain either components that are temporally
persistent enough to allow for memetic evolution to adapt to them,
or memes that co-evolve with all others. In the latter case, the relation-
ships among memes for selves, social group identity, and individual and
collective preferences will be like all relationships among replicators that
share a habitat. They will be characterised by competition, co-operation
or indifference. The common currency in which their interactions are
negotiated will be the success with which the involved replicators manage
to replicate in one another’s neighbourhood, and the most adequate
theory for studying them will be a generalised version of Neo-Darwinism
such as the one we are attempting to develop.
The idea of free-willed human selves that are completely unconstrained
when furnishing their minds with knowledge and putting it to use is an
anthropocentric illusion, reflecting, most probably, our ‘all-too-human’
desire to over-estimate ourselves. Although it may be firmly rooted in
common sense and natural language use, it can be confidently dismissed.
The idea that humans should be able to change their linguistic
preferences more or less at will is simply wrong, and can therefore not
count as an argument against studying linguistic evolution in
Neo-Darwinian terms.
The discussion has also shown why the details of meme replication
and the factors that select among variants are so important. First, they
are crucial for deciding if linguistic evolution can reasonably be told from
the point-of-view of linguistic constituents, or whether it should be told,
more reasonably, from the point of view of the agents that select among
them, that is, humans. More importantly, however, it also depends on
them whether the historical developments of languages can tell us anything
about their nature at all. Unless language change depends systematically
on the properties of language, it cannot tell us anything about language
either. Synchrony would then indeed deserve to be separated from
diachrony, and a theory of universal grammar would not need to heed the
‘external’ evidence of historical changeability. If linguistic constituents
are active replicators, on the other hand, and if their evolutionary fate
depends – to a relevant degree – on their own properties, we can not only
try to understand linguistic evolution from the point of view of language
itself, but will also be able to define criteria for deciding which
external, non-linguistic factors a theory of language should incorporate. A theory of
grammar would then neither have to focus exclusively on language ‘in
and by itself ’, nor would it necessarily become a hopeless amalgam, or
a ‘theory of everything’, as Chomsky feared. Instead, it would be more
like a model of ‘the world according to language’, or rather ‘according to’
constituents of linguistic competence, or language memes.

6.4.4 Can replicators have a point-of-view of their own?

The last statement sounds wildly metaphorical, and in a sense, of course
it is. Competence constituents cannot literally see anything. But if seeing
is interpreted more generally as being sensitive and responding to exter-
nal stimuli, all sorts of entities can be said to have points-of-view. Thus,
if the success with which a replicator replicates reflects a specific factor
in its environment, then that success can be interpreted as the replica-
tor’s response to that factor. It then makes sense to say the replicator is
‘sensitive to’ it, or ‘sees’ it. It is in this sense, then, that replicators can
have points-of-view.
Consider, for example, how giraffe genes can be said to ‘see’ that leaves
often grow in high places. The genes for increased neck length spread in
the giraffe gene-pool because longer necks increased the life-span of
their host organisms, afforded them more time for reproducing, and
consequently increased the genes’ numbers at the cost of competing gene
variants. Coding for longer necks is a property of the genes in question,
and providing food in high places is a property of their environment. The
success of the ‘long-neck gene’ emerges from the way in which the two
properties interact.
Although the interaction itself takes place on the level of giraffe organisms,
it is driven, fundamentally and to a relevant degree, by the gene itself,
whose ‘expressions’, ‘vehicles’ or ‘replication machines’ organisms are. The
selection of long-neck genes is therefore, ultimately, a thing between the
gene (lineage) itself and the presence of high-growing foliage in its envi-
ronment. The specific environmental aspect is selectively relevant to the
gene, and the gene is selectively sensitive to the former (see Hull 1988a,
1988b: 407–14). It ‘sees’ it, and responds by replicating well. Of course,
its sensitivity to that particular environmental factor is a consequence
of the fact that the long-neck gene is part of a team which has come to
depend on making giraffe bodies, which depend on leaves as food. Were it
not for the role that giraffe bodies play in the replication of giraffe genes,
however, their fates would be irrelevant in evolutionary terms.38 Thus,
the selection processes which bias the success of some replicators over
others can indeed be understood best from the ‘points-of-view’ of the
replicators themselves.
38 A hypothetical gene variant with effects from which giraffe bodies profited at the expense
of their reproductive success (say a gene which confers extreme longevity on a body while
making it infertile) could never be selected for, and would soon disappear from the pool.
Since the only way of responding to its environment which a replicator
has is the number of copies or offspring it spawns, it will not be able to ‘see’
or react to any external factor that does not persist longer than the life-
span of at least two of its own generations. The longer an environmental
factor persists, the more clearly the replicator (lineage) will see it and
the more focused its response will become. In order to decide whether
linguistic evolution proceeds along the same lines and can be understood
from the point-of-view of competence constituents, we need to find out
which of the factors that are involved in their replication last long enough
for lineages of linguistic replicators to get a chance of responding to, or
seeing them, in terms of reproductive success. This issue will be discussed
in the following sections.
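The dependence of a lineage’s ‘sight’ on the persistence of an
environmental factor can be illustrated with a toy calculation. The
following Python sketch is an illustration only and is not taken from the
literature discussed here. It assumes just two competing variants, a
constant selective advantage s, and an environment that reverses its
preference every `period` generations. All function names and parameter
values are hypothetical.

```python
def track_variant(period, generations=200, s=0.1):
    """Return the share of variant A in each generation.

    Assumption: the environment favours A and B alternately and
    reverses its preference every `period` generations.
    """
    share_a = 0.5  # both variants start out equally common
    history = []
    for generation in range(generations):
        favours_a = (generation // period) % 2 == 0
        fitness_a, fitness_b = (1 + s, 1.0) if favours_a else (1.0, 1 + s)
        # Replicator update: each share grows in proportion to its fitness.
        total = share_a * fitness_a + (1 - share_a) * fitness_b
        share_a = share_a * fitness_a / total
        history.append(share_a)
    return history


def strongest_response(history):
    """Largest share that either variant ever reaches in the population."""
    return max(max(history), 1 - min(history))


if __name__ == "__main__":
    for period in (1, 5, 100):
        print(period, round(strongest_response(track_variant(period)), 3))
```

Under these assumptions, an environment that reverses in every generation
leaves both variants close to an even share, so the lineage hardly
‘responds’ at all, whereas a long-lasting environment lets the favoured
variant come close to ousting its competitor.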

6.5 The hows and whys of meme replication

6.5.1 Introduction
Unlike genes, memes – linguistic or otherwise – are not replicated by
a straightforward template copying process. Brains and their constituents
do not interact directly. Instead, memes are copied through a process
that is often referred to as imitation (see, for example, Blackmore 2000).
Although the agents which ‘do’ the imitating are normally supposed to
be ‘humans’, we have seen that this way of talking raises more questions
than it answers. The idea of human selves as autonomous overseers of
memetic replication is unfounded and likely to obscure the issue hope-
lessly. Instead, one needs to think of ‘humans’ holistically (and materi-
alistically) as the totalities of their minds and bodies. How then do the
brainy organisms that are humans manage to transmit acquired neuronal
structures among individuals? How can they ‘imitate’ what they do not
‘see’? And how, in particular, should they manage to produce sufficiently
faithful copies?

6.5.2 How can one copy what one cannot see? Revisiting the ontological problem
It might be best to start by asking what humans (in the sense of organ-
isms with brains) do have immediate access to. To this question there are
reasonable answers. What humans perceive are contextualised instances
of human behaviour and its external consequences, including the imme-
diate results or products of that behaviour, as well as some of its further
consequences. In the case of language, they perceive how other people
speak, the textual products of such speech acts, as well as what is achieved
through them. In the case of other purposeful activities (getting dressed,
eating with knives and forks, playing tennis, and so on), or in the produc-
tion of artefacts of all sorts (think of tools, clothes, buildings, shoe-laces,
origami, or pieces-of-music), humans also perceive how other people per-
form certain actions and what results these have.
For an evolutionary approach to language this situation represents a
huge problem, because the relevant level for studying the phenomenon
is the level of linguistic competence,39 which is not itself open to
perception. Thus, we face the paradox that the units of linguistic
evolution do not seem to be accessible to the machinery that is supposed
to bring their replication about.
39 Recall that several good arguments speak for this. First, and most
obviously, the competences of two speakers of a language are likely to be
much more like each other than their linguistic behaviours, or the texts
they produce. This must be so because one may produce infinitely many
different types of texts from one competence. Therefore, competences must
necessarily copy more faithfully than texts and are much more likely
candidates for linguistic replicatorship. Of course, we have also observed
that whole competences do not copy faithfully either, and are therefore
also unlikely to represent integral units of replication. So it might be
considered inappropriate to adduce the one-to-many relationship between a
competence and the texts it might spawn as an argument against the
possibility of textual as against cognitive replicators. But a similar
case can be made by comparing smaller competence constituents and their
behavioural and textual expressions. It is a truism, for example, that the
actual articulatory gestures and resulting sounds (i.e. the allophones)
which may express any given phoneme may vary greatly between speakers,
situations and so on. Yet, however different individual allophones are, it
is plausible, under specific conditions, to regard them all as expressions
of one and the same cognitive unit, and to assume that they will all
trigger the same cognitive response. If this is true, however, phonemes,
or phone-memes, must count as both more stable and more faithfully
replicable than the phones, or articulatory acts that express them.
Therefore, competence constituents are indeed more likely replicators than
behavioural or textual constituents. Finally, there are competence
constituents that do not materially show up in texts at all, such as
syntactic categories or even meanings. Yet, knowledge of them is
definitely transmittable, and an evolutionary theory of language which
could not talk about them would be impoverished indeed. In sum, our
arguments for regarding the cognitive level as the one on which cultural,
and therefore linguistic, evolution is played out, have been sound. We
would lose more than we would gain, if we let the fact that cognitive
structures are not directly accessible to the agents of their transmission
make us change our minds and decide that textual, rather than cognitive
entities are the replicators in linguistic evolution.
Not surprisingly, the problem has dogged most attempts that have
been made so far to develop evolutionary theories of cultural histories.
Since most accounts of memetic evolution that have been published so
far have remained somewhat superficial, however, they rarely address
(in formal and technical terms) the question of how memes actually
achieve sufficiently faithful replication. Instead, they typically focus on
more general ontological questions such as whether memes are cognitive
units, instances of behaviour, or their material products, that is, artefacts.
Discussions of the issue have tended to be somewhat undisciplined, and
the arguments proposed for either position have been of relatively low
quality. For example, proponents of the view that memes should be
considered as cognitive units and artefacts as their expressions often seem
to be inspired by
the biological scenario, where there is a clear distinction between underly-
ing genotypic replicators, hidden to the eye, and their phenotypic expres-
sions, which are external and obvious. However, there is no principled
reason why cultural evolution should mimic biological evolution in this
respect, which makes the analogical argument rather weak. On the other
hand, many memeticists who regard actual artefacts as cultural replica-
tors often tend to do so because artefacts are easier to study and talk about
than their assumed cognitive bases. Of course, the mere fact that it is dif-
ficult to study ideas, concepts, or cognitive entities (as opposed to their
behavioural and material expressions) does not exclude the possibility
that they should in fact be the replicating units on which cultural evolu-
tion hinges. Overall, one gets the impression that many of the attempts
to develop a memetic model of cultural evolution have so far displayed
a certain impatience with technical details. Blackmore (2000), whose book
contains a useful outline of some of the proposals which have been made so
far, is no exception herself, although she is refreshingly open about her
intention to take the approach as far as possible, without worrying too
much about technicalities. Finding no clear position on the ontology of
memes, she decides to

keep things as simple as possible [. . . and to] use the term ‘meme’ indiscriminately
to refer to memetic information in any of its many forms; including ideas, the
behaviours these brain structures produce, and their versions in books, recipes,
maps and written music. As long as the information can be copied by a process
we may broadly call ‘imitation’, then it counts as a meme. (66)

While this may be an option when one intends to introduce and popularise
the idea of replicator-based cultural evolution as well as its
philosophical implications, it is of course out of the question for our specific
purposes to keep things as informal as this. After all, there are a number
of theories about linguistic change which are technically highly sophisti-
cated and with which the present model needs to compete. Also, all that
is known about language shows that the cognitive level is clearly more
fundamental than the others, so that the level problem as such does not
really pose itself anymore. And this is exactly what makes the question of
how cognitive units can be replicated so urgent, of course.

6.5.3 Dawkins’ proposal: memetic information is digital

The issue has also been recognised by Richard Dawkins, and although he
has not himself proposed a definite solution, he has certainly realised its
importance. He seems convinced that cultural replicators are cognitive
units rather than their material effects and products. In his foreword to
Blackmore 1999, for example, he argues that even though only the prod-
ucts of cognitive units are accessible to people, they cannot be faithfully
replicated unless the cognitive units, which can be understood as instruc-
tions ‘for’ their visible effects, are copied first. He starts by observing
that often people appear to replicate artefacts, and gives the example of
‘Chinese Junks’, little origami artworks created by folding paper in certain
specific ways. Dawkins himself learnt the skill from his father and passed
it on to many of his friends at school. Soon there were many paper junks
about, each a good copy of the other. Why, Dawkins then asks, were the
artefacts copied faithfully? He then proposes that what was really copied
faithfully were not the junks themselves, but the instructions for making
them. He gives his reasons in the form of a thought experiment.

Suppose we assemble a line of children. A picture of, say, a Chinese junk is shown
to the first child, who is asked to draw it. The drawing, but not the original picture,
is then shown to the second child, who is asked to make her own drawing of it.
The second child’s drawing is shown to the third child, who draws it again, and
so the series proceeds until the twentieth child, whose drawing is revealed to
everyone and compared with the first. Without even doing the experiment, we
know what the result will be. The twentieth drawing will be so unlike the first, as
to be unrecognisable. Presumably, if we lay the drawings out in order, we shall
note some resemblance between each one and its immediate predecessor and
successor, but the mutation rate will be so high as to destroy all semblance after
a few generations. A trend will be visible as we walk from one end of the series
of drawings to the other, and the direction of the trend will be degeneration.
Evolutionary geneticists have long understood that natural selection cannot work
unless the mutation rate is low. [. . .] How then can the meme, with its apparently
dismal lack of fidelity, serve as a quasi-gene in any quasi-Darwinian process?
[. . .] Suppose we set up our Chinese Whispers Chinese Junk game again,
but this time with a crucial difference. Instead of asking the first child to copy a
drawing of a junk, we teach her, by demonstration, to make an origami model of
a junk. When she has mastered the skill and made her own junk, the first child
is asked to turn round to the second child and teach him how to make one. So
the skill passes down the line to the twentieth child. What will be the result of
this experiment? [. . .] I have not done it, but I will make the following confident
prediction, assuming that we run the experiment many times on different groups
of twenty children. In several of the experiments, a child somewhere in the line
will forget a crucial step in the skill taught him by the previous child, and the line
of phenotypes will suffer an abrupt micromutation which will presumably then
be copied to the end of the line, or until another discrete mistake is made. The
end result of such mutated lines will not bear any resemblance to a Chinese junk
at all. But in a good number of experiments the skill will correctly pass all along
the line, and the twentieth junk will be no worse and no better, on average, than
the first junk. If we then lay the twenty junks out in order, some will be more
perfect than others, but imperfections will not be copied down the line. [. . .]
The twenty junks will not exhibit a progressive deterioration in the way that the
twenty drawings of our first experiment would.
Why? What is the crucial difference between the two kinds of experiment? It
is this: inheritance in the drawing experiment is Lamarckian ([. . .] ‘copying-
the-product’). In the origami experiment it is Weismannian ([. . .] ‘copying-the-
instructions’). In the drawing experiment, the phenotype in every generation is
also the genotype – it is what is passed on to the next generation. In the origami
experiment, what passes to the next generation is not the paper phenotype but
a set of instructions for making it. Imperfections in the execution of the instruc-
tion result in imperfect junks (phenotypes) but they are not passed on to future
generations: they are non-memetic.
[. . .] The instructions are self-normalising. The code is error-correcting
(Dawkins 1999: x–xii)
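
The contrast between the two kinds of inheritance can be made concrete
with a small simulation. The following Python sketch is a hypothetical
illustration and not Dawkins’ own procedure. It assumes that the quality
of a drawing or junk can be reduced to a single number (0.0 standing for
the original), that each child’s error is normally distributed, and, for
simplicity, that the instructions themselves are always passed on intact.
The occasional discrete mistakes in the instructions which Dawkins allows
for are deliberately left out.

```python
import random


def drawing_chain(length, noise, rng):
    """Copy the product: each child copies the previous child's drawing,
    so every deviation is inherited and new errors are added on top."""
    product = 0.0  # 0.0 stands for the original picture
    for _ in range(length):
        product += rng.gauss(0.0, noise)
    return abs(product)  # distance of the last drawing from the original


def origami_chain(length, noise, rng):
    """Copy the instructions: assumed here to be passed on unchanged, while
    each child's execution error affects only that child's own junk."""
    instructions = 0.0  # 0.0 stands for the ideal junk
    product = instructions
    for _ in range(length):
        product = instructions + rng.gauss(0.0, noise)
    return abs(product)  # distance of the last junk from the ideal


def mean_final_error(chain, runs=2000, length=20, noise=1.0, seed=1):
    """Average distance of the last product from the original over many lines."""
    rng = random.Random(seed)
    return sum(chain(length, noise, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    print("drawings:", round(mean_final_error(drawing_chain), 2))
    print("origami: ", round(mean_final_error(origami_chain), 2))
```

Under these assumptions the last drawing lies, on average, several times
further from the original than the last junk does, because execution
errors accumulate only where the product itself is what gets copied.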

Richard Dawkins’ argument is certainly stringent. It contains two
aspects which are relevant for us.
First, it strongly supports our assumption that linguistic evolution
takes place on the competence rather than the textual level. Were not
the instructions for English being transmitted down the line of succes-
sive generations, successive generations of texts would deteriorate like
the generations of children’s drawings since every little way in which
linguistic performance comes to deviate from the norm defined in com-
petence would get passed on. English would indeed get worse and even-
tually degenerate into non-language. That something like this can indeed
happen is illustrated by cases in which medieval manuscripts came to
be copied by scribes who did not know the language of the texts they
were copying. The results were invariably ‘corrupted texts’. This then is
Dawkins’ first lesson. It supports the conclusion we arrived at on inde-
pendent grounds.
The second aspect of Dawkins’ thought experiment does not stand
out as clearly as the first. In fact, it is a tacit implication rather than
an explicit proposal. It is implicit in the suggestion that ‘the instruc-
tions are self-normalising’, and that the code in which they are written
is ‘error-correcting’. These rather mystifying expressions seem to suggest
that ‘cognitive instructions’ have a special way of ensuring the faithfulness
of their replication.
Unfortunately, Dawkins himself does not indicate how error-correction
or self-normalisation in the replication of memetic code are supposed to
work. But he has taken the issue up again in a talk delivered at the Austrian
Academy of Sciences in May 2000. There he suggested that memes may
copy as faithfully as they apparently do, because they consist of discrete
units arranged in specific ways. Therefore, they can be copied digitally
rather than analogically. In the case of the Chinese junk, he conceived of
the instructions in terms of a list of about twenty to thirty separate steps
like the following:

1. Take a square sheet of paper and fold all four corners exactly into the middle.
2. Take the reduced square so formed, and fold one side into the middle.
3. Fold the opposite side into the middle, symmetrically.
4. In the same way, take the rectangle so formed, and fold its two ends into the
middle [. . .] and so on (Dawkins 1999: xi)

Of course, the steps which Dawkins proposes are themselves pretty
complex, but they could clearly be broken down into smaller, more atom-
like units (such as (1) take a sheet of paper, (2) make it square, (3) take its
corners, (4) identify the middle of the sheet, (5) fold corners there, and so on).
The meme for Chinese junks could then be interpreted as an internally
complex cognitive structure, in which integral and separate memetic com-
ponents are associated, or arranged in a specific manner. This arrange-
ment might then indeed be reproduced ‘digitally’ – just as genetic
replication reproduces the arrangement of discrete DNA building blocks.
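
The difference between analogue and digital copying can be made concrete in a small simulation. The following Python sketch is an illustration only and not part of Dawkins' argument: the category values, the size of the copying error and the length of the chain are all assumptions chosen for the purpose. Each copier observes the previous copy with a small random error; the analogue copier passes the error on, whereas the digital copier first assigns what it observes to the nearest discrete category.

```python
import random

CATEGORIES = [0.0, 1.0, 2.0, 3.0]   # assumed discrete 'units' a copier can recognise
NOISE = 0.05                        # assumed size of the error made in one act of copying


def copy_analog(value, rng):
    """Reproduce a scalar value directly; any error becomes part of the copy."""
    return value + rng.gauss(0.0, NOISE)


def copy_digital(value, rng):
    """Make the same kind of noisy observation, then snap it to the nearest category."""
    observed = value + rng.gauss(0.0, NOISE)
    return min(CATEGORIES, key=lambda c: abs(c - observed))


def transmit(original, generations, copy_fn, rng):
    """Pass a sequence along a chain of copiers, each copying the previous copy."""
    current = list(original)
    for _ in range(generations):
        current = [copy_fn(v, rng) for v in current]
    return current


original = [0.0, 2.0, 1.0, 3.0, 1.0]
rng = random.Random(0)
analog = transmit(original, 200, copy_analog, rng)
digital = transmit(original, 200, copy_digital, rng)

# Largest deviation from the original after 200 generations of copying.
analog_drift = max(abs(a - o) for a, o in zip(analog, original))
digital_drift = max(abs(d - o) for d, o in zip(digital, original))
print(f"analogue drift: {analog_drift:.3f}")
print(f"digital drift:  {digital_drift:.3f}")
```

Under these assumptions the error of a single copying act is far smaller than the distance between categories, so the digital chain keeps returning to the original values while the analogue values wander. If the error were made comparable to the distance between categories, the digital chain would begin to miscopy as well.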
Now, Richard Dawkins’ proposal might strike one as unconvincing and
speculative. Thus, as we already said, the ‘instructions’ that he proposes
to represent the discrete building blocks of an assumed meme for ‘Chinese
junks’ appear themselves to be complex, even if one breaks them down
into smaller bits. The boundaries between them strike one as rather arbi-
trary. If one thinks of them as little texts, they do of course not represent
cognitive units at all, but are themselves artefacts. However, the prob-
lems in Dawkins’ proposal derive, or so it seems to me, more from the
specific example he has chosen than from the argument itself. Although
origami models of ‘Chinese junks’ are definitely smaller and less complex
entities than some of the other memes he has proposed in the past (such
as Hellfire, the Virgin Mary, or Christianity), they are still more complex
than one might be inclined to think. In particular, the amount of infor-
mation a human brain must command in order to recognise, represent,
and understand them, or to inform and control their production, is likely
to be large. It is therefore clearly too simple to think of them as a mere
twenty to thirty ‘commands’. But I don’t think this was what Dawkins
intended to suggest. Instead, his example should be interpreted as an
instructive metaphor, illustrating that an entity with superficially scalar
properties (like size or weight), which can only be transmitted analogi-
cally, might still express a set of instructions that consist of discrete units,
that can therefore be transmitted digitally and thus achieve high copying
fidelity. That the instructions for making a Chinese junk are probably not
the best examples of likely cognitive building blocks is irrelevant to the
argument itself.
The question of whether anything like discrete building blocks of cognition
or behavioural instructions exists and what they look like still waits
for a definite answer. Recall the tentative character of our own proposal
in section 6.3 above. One might therefore regard Dawkins’ proposal
as speculative and lacking a secure basis. However, daring as the notion
might look when one thinks of it in material, or particularly, neuronal
terms, it actually merely echoes the time-honoured and well established
view that our mind-bodies deal with ‘the world out there’ by classifying
and categorising it. Both on the low levels of sensory perception and on
the higher levels of cognitive modelling, our representations of external
reality depend on and emerge from distinctions that we impose on, or
abstract from it. Perceptually, such distinctions may be dictated by the
limitations of our perceptual hardware. There are sounds we hear and
sounds we don’t hear, wavelengths of light we see and wavelengths we
don’t see, and so on. Even more importantly, the already pre-selected
informational input from ‘out there’ that does reach our mind-bodies
is not perceived as a hopeless mix of seamlessly merging sensations
either, but in terms of distinct entities with distinct properties. Thus, we
distinguish – mentally – between individual objects, between self and other,
movement and stillness, up and down, in and out, desirable and undesirable,
different colours, alive or dead, and so on. Crucially, the distinctions we
construct, derive, or impose on our environment may be more clear cut
than the real data they are imposed on warrant. This is particularly obvi-
ous in cases where we impose categorical distinctions on reality even
though we suspect that the categories we create thereby may be rather
fuzzy in ‘reality’, as when we distinguish between compatriots and foreign-
ers, fruit and vegetable, good and evil, and so on. But the same is true of all
categorisations that we perform, even when we regard them as ‘obvious’
and unproblematic – as when we perceive, say, a person as either English
or not, an animal as either a dog or not, a sentence as either grammatical
or not, and so on. Examples are legion. In all cases – be they based on
perceptual limitations, or biases created by higher cognitive constituents –
the distinctions we impose on reality may be more or less plausible, more
or less functional, and more or less easy to make in individual instances,
but this does not matter for our present argument. What does matter
is that we do make distinctions and that we base our own behavioural
choices on them. If those distinctions have a material basis in our mental
hardware – which I take it they must – then this implies that the con-
stituents of this hardware (again, no matter what their internal structure
might be) must also be in some sense independent of each other, or dis-
crete. And if this is true, it follows that they must, in principle, be able to
replicate digitally. Seen in this light, then, Dawkins’ proposal that memes
might be discrete components of mental instructions that can be copied
digitally, is anything but radical. It merely faces the difficulties of identi-
fying individual instances of them, and of discovering the ways in which
they are materially implemented. However, these problems are essentially
Of course, to linguists the idea that the mind processes environmen-
tal information by segmenting it into discrete units is almost old hat.
After all, it has long been handbook lore that linguistic processing
involves the cognitive segmentation of acoustic signals which are ‘actu-
ally’ continuous and characterised by considerable overlap among their
‘constituents’. Speech does not come as a sequence of separate sounds,
morphemes, words, or sentences, but as a smooth flow of speech sound.
Yet, we clearly perceive it as if it were made up of discrete units. If discrete-
ness isn’t in the signal, it has to be created by the processor, the brain.
Therefore, phonemes, morphemes and other competence constituents
are assumed to be cognitive units. Contrary to their ‘etic’ counterparts,
or expressions, their activation in discourse is typically assumed to be a
matter of yes or no. A bit of speech may express a phoneme more or less
clearly: it may contain anything from the most prototypical representative
of the phoneme to nothing at all, or even a sound which may at other times
stand for a different phoneme. When the sound is processed mentally,
all those shades of grey are lost. In any individual instance, a phoneme is
either recognised or not. There is really nothing in between.40 Thus, lan-
guage processing represents excellent evidence for the assumption that
minds interpret, process and store external information by digitising it.
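
The way in which a continuous signal is turned into an all-or-nothing percept can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that a single acoustic cue (voice onset time in milliseconds) separates /b/ from /p/, and that a listener assigns each token to the nearest stored prototype. The prototype values and the tokens are invented, and actual phoneme recognition draws on many cues at once.

```python
# Hypothetical prototype values (voice onset time in ms); not measured data.
PROTOTYPES = {"/b/": 10.0, "/p/": 60.0}


def perceive(vot_ms, prototypes=PROTOTYPES):
    """Return the phoneme whose prototype lies closest to the incoming token."""
    return min(prototypes, key=lambda phoneme: abs(prototypes[phoneme] - vot_ms))


# A graded series of tokens: the signal varies continuously ...
tokens = [5.0, 18.0, 30.0, 41.0, 55.0, 72.0]
# ... but every percept is one category or the other, never something in between.
percepts = [perceive(t) for t in tokens]

for token, percept in zip(tokens, percepts):
    print(f"{token:5.1f} ms -> {percept}")
```

However close a token lies to the boundary, the output is one of the two labels. The shades of grey in the signal are not represented in the result.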
It is hardly surprising that the linguistic competence constituents which
we identified as potential replicators in sections 6.1.4 and 6.3, are quite
compatible with Dawkins’ proposal that minds store and process infor-
mation in discrete packages.41 He clearly could have made his own case
much more convincingly, had he chosen a simple word rather than a
Chinese junk to illustrate that mental instructions for artefacts replicate
more faithfully than the artefacts that express them.
40 Of course, one and the same sound may be processed differently by different speakers,
or by one and the same speaker at different times, and there may even be a correlation
between the sound’s prototypical occurrence as a representative of a particular phoneme
and the number of times it actually triggers the recognition of the phoneme, but this
clearly does not affect the argument developed here.
41 Let me stress again that this insight is not new, but has been well established in the
linguistic community ever since the heyday of Saussurean structuralism. Nowadays,
practically every language student knows that while the articulatory or acoustic properties
of individual instances of speech sounds or phones often make it difficult to attribute them
to different types, they are consistently categorised as realisations of discrete phonemes
when processed by listeners, and very much the same is true of sound sequences when
they are interpreted as specific morphemes.
So, Dawkins’ proposal that culture may be capable of evolution because
its constituents copy digitally and therefore faithfully is both epistemo-
logically plausible, and strongly supported by linguistic theory. But there
are problems with it, nevertheless. In particular, it is doubtful if even digi-
tally encoded ‘instructions’ can be copied as faithfully as Dawkins thinks,
unless the humans that carry out the replication also have some idea of the
product which the instructions are for. In order to see the problem, imagine
another experiment, in which children are asked to tell each other the
instructions for folding Chinese junks without, however, ever seeing a
finished product and without ever actually carrying them out, not even
in their heads.
The prediction for the outcome of this experiment can again be made
quite confidently. Not knowing what these instructions are for, each child
may easily drop one line or the other, or replace a word that it does
not know too well with a similar sounding and more familiar one. The
end product, while still being a set of English sentences, is unlikely to
bear much resemblance to the original input. As in the case of repeated
drawings, errors will cumulate, and the signal will degenerate. Digital or not,
pure instruction copying will thus work like Chinese Whispers. It can
therefore not be the only explanation for the high copying fidelity that
memes appear to achieve. At least in memetic replication, instructions
will be as unlikely as their products to copy faithfully by themselves. It
is only when the imitators know what the instructions they learn are for
that they will acquire them successfully.
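
The predicted outcome of this second experiment can be mimicked in a short Python sketch. It is an illustration under invented assumptions: the probabilities of forgetting a line or swapping a word, and the list of 'similar sounding and more familiar' words, are hypothetical. What matters is that every unit is discrete, and yet nothing in the chain ever restores a line or a word once it has been lost.

```python
import random

ORIGINAL = [
    "take a square sheet of paper",
    "fold all four corners into the middle",
    "fold one side into the middle",
    "fold the opposite side into the middle",
]

# Hypothetical substitutions of a similar sounding, more familiar word.
FAMILIAR = {"fold": "hold", "sheet": "sheep", "corners": "corner", "middle": "muddle"}


def retell(lines, rng, p_drop=0.1, p_swap=0.1):
    """One child passes the instructions on without knowing what they are for."""
    retold = []
    for line in lines:
        if rng.random() < p_drop:
            continue  # a whole line is forgotten
        words = [
            FAMILIAR[w] if w in FAMILIAR and rng.random() < p_swap else w
            for w in line.split()
        ]
        retold.append(" ".join(words))
    return retold


def chain(lines, children, rng):
    """Pass the instructions along a sequence of children."""
    for _ in range(children):
        lines = retell(lines, rng)
    return lines


final = chain(ORIGINAL, 30, random.Random(0))
print(f"{len(final)} of {len(ORIGINAL)} lines survive:")
for line in final:
    print("  " + line)
```

Because no child can check a retold version against a finished junk, errors in this sketch can only accumulate. The copying is digital throughout, but it is not thereby faithful.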

6.5.4 The attractions of ‘purpose’

The capacity for ‘normalisation’ or ‘error-correction’ which memetic
code seems to display may therefore have as much to do with the fact
that it encodes instructions for specific products as with the fact that it may
be encoded in terms of discrete units. Of course, being digitally encoded
helps, but by itself it cannot guarantee the faithful replication of cogni-
tive instructions either. Instead, instructions will only replicate well if it
is clear what they are for, if they serve some evident purpose.
Recall that instructions must always be instructions ‘for’ something by
definition. Thus, genomes are instructions ‘for’ specific organisms with
specific behavioural options, linguistic competences are instructions ‘for’
texts in specific languages, and the instructions for folding Chinese junks
are ‘for’ these junks. Now, consider the Chinese-junk experiment again:
when a child learns how to make one, she may not necessarily receive
verbal instruction at all. She may, instead, learn the skill merely from
watching another child making a junk. In that process, she will be able
to identify some of the steps which the production involves. This will
be because her mind already hosts instructions or memes for those. For
example, she will already know what it means to fold paper. She will have
a concept of ‘corners’, a concept of the ‘centre’, and so on. It is likely that
some of these (possibly themselves memetic) components may indeed be
thought of as discrete units. They only need to be re-associated in a par-
ticular manner to become copies of ‘instructions’, and to establish a more
complex, larger meme, or memeplex. In this sense learning how to fold a
Chinese junk model is exactly analogous to learning a new morphologi-
cal Gestalt by memorising a particular association of pre-defined phone-
mic units or phone-memes, or to learning a phone-meme by associating
pre-defined auditory impressions with pre-defined articulatory gestures.
Now, when the child attempts to fold a junk model herself, working from
memory, her brain will re-organise and arrange the micro-instructions (or
memes) which it has already incorporated into a larger structure. Then,
she/her brain may ‘test’ the resulting macro-instruction (or memeplex)
by getting it expressed. If this achieves the desired result, she may have
another go, thereby strengthening the memory of the memeplex which
she has by now tentatively internalised. After a couple of attempts, she
will be satisfied, and the instructions, the memeplex, or macro-meme,
for making a Chinese junk will be stably represented in her brain. Only now
will she be ready to turn to the next child in the sequence and pass them
on to him.
If, on the other hand, her first attempt at making a junk fails, she
will feel that she has arranged the necessary micro-memes in a wrong
way, or employed an incomplete or wrong set of them. Their current
arrangement will therefore not be memorised. Instead, the child will ask
her teacher to demonstrate the process again, perhaps more slowly, or she
will ask for explicit instruction on a step she is unsure about. Thereby,
she will be able to dismiss certain micro-instructions, adduce others,
or possibly even acquire new ones (although this may be harder than
working with known ones). Eventually, she will come to replace the first,
inadequate arrangement of micro-instructions that has formed in her
brain with one that is better at the job. By repeating the process as many
times as necessary, she will be able to restructure her brain until it actually
hosts an adequate copy of the master, or parent-memeplex, sitting in the
brain of her instructor.
Figure 6.15 The internal selection of brain-states. [Diagram labels: brain, states, environment; assumes, generate effects on, feed back on, evaluation (+/-), increases/decreases chance to reassume.]

Thus, a copy of a complex neuronal structure gets formed in a brain
through a combination of processes which may also reflect Darwinian
principles. From a variety of structures first assumed, or generated, those
which are evaluated as more adequate when their expressions are tested
come to be retained and strengthened. Depending on the degree of their
experienced adequacy, further variant structures may be generated, which
will once again be submitted to testing through expression and feedback
evaluation. Eventually, one (or a small set of) structure(s) will acquire
actual stability.42 It will be the one that is experienced as best suited to
solve the problem at hand, or, as one might say, the one which is best
adapted to the specific purpose. In cases of successful meme copying it
will also be a good replica of its ‘master copy’ in another brain. In short, by
being purpose-driven, meme-replication seems to exploit the Darwinian
mechanics which may underlie learning and cognitive development in
general, and which may be graphically represented as above.
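
The cycle of generating, testing and regenerating described above can be written down as a simple loop. The Python sketch below is a deliberately crude illustration and makes several assumptions that are not part of the argument: the learner already commands a small stock of micro-instructions (two of them, 'turn over' and 'pull open', are invented), expressing an arrangement simply yields a product whose parts can be inspected, and feedback is the number of parts in which the learner's product resembles the model she has seen. The learner never reads her teacher's instructions; she only compares products.

```python
import random

# Micro-instructions the learner is assumed to command already.
MICRO_STEPS = [
    "fold corners to middle",
    "fold side to middle",
    "fold ends to middle",
    "turn over",    # hypothetical step
    "pull open",    # hypothetical step
]


def express(arrangement):
    """Stand-in for carrying out instructions: the product shows a trace of each step."""
    return tuple(arrangement)


def feedback(product, model_product):
    """Count the parts in which a product resembles the model the learner has seen."""
    return sum(p == m for p, m in zip(product, model_product))


def learn(model_product, rng, max_trials=10_000):
    """Generate an arrangement, test its expression, keep it only if feedback improves."""
    current = [rng.choice(MICRO_STEPS) for _ in model_product]
    best = feedback(express(current), model_product)
    trials = 0
    while best < len(model_product) and trials < max_trials:
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.choice(MICRO_STEPS)
        score = feedback(express(candidate), model_product)
        if score > best:          # retained and strengthened
            current, best = candidate, score
        trials += 1               # otherwise the variant is simply discarded
    return current, trials


teacher = [
    "fold corners to middle",
    "fold side to middle",
    "fold side to middle",
    "fold ends to middle",
    "pull open",
]
learned, trials = learn(express(teacher), random.Random(1))
print(f"stable arrangement reached after {trials} trials")
print(learned == teacher)
```

Nothing in the loop checks the learner's arrangement against the teacher's arrangement. The copy becomes faithful only because variants whose expressions fare worse are not retained, which is the sense in which replication is normalised by purpose rather than by the code itself.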
If this scenario is correct, it suggests that error correction, or ‘nor-
malisation’, in memetic replication works so well not merely because it
involves discrete units (which may be very likely), but also because it is
goal-driven, and may approach its target gradually, and in repeated trials.
The copying of cognitive instructions appears therefore to be not strictly
speaking self-normalising at all. It rather appears to be normalised by the
purposes the instructions are supposed to serve.43 Thus, while possibly

42 As Henry Plotkin pointed out, following R. C. Lewontin (1970) and D. Campbell (1960),
the brain seems to operate like ‘a “Darwin machine”. That is [. . . its] transformation
in time that occurs through the workings of the processes of learning and intelligence is
the result of evolutionary processes operating within the brain’ (1994: 83). Plotkin’s own
way of schematising the general ‘principles [. . .] that describe the evolutionary process’
(ibid.) is in terms of what he calls a ‘generate-test-regenerate’ (84) heuristic, but indeed
other models such as the ‘Complex Adaptive System’ schema advocated by Gell-Mann
(1992) (see also page 95, above) are equally adequate, and sometimes slightly more
43 This is also why imperfections in the behavioural expressions of cognitive instructions
won’t normally be transmitted. If, during a demonstration, a child drops the model junk
which it is making, the child that is being instructed won’t interpret this as a step in the
production, and will not copy it either. If I cough while showing my son a dog, he will not
think the animal is called do[kchkch]og. Performance errors of this kind are thus not normally transmitted.
also being ‘digital’, meme-replication appears to differ from gene-copying
on two counts. First, gene-replication is a one-time process. Once a DNA-
string has copied, it cannot take itself apart any more and reassemble itself
in a different way. Meme-replication, on the other hand, seems to allow
error correction on the product. Second, gene-replication is a blind,
algorithmic process. Neither the master nor the copy knows what it is for.
In meme-replication, on the other hand, the purpose of a meme must
somehow be known to the brain which effects the copying, otherwise
there would be no criteria by which it could select among variant structures.
On closer reflection, however, it turns out that these differences are
really only apparent. Note that we have tacitly shifted our perspective
when switching between considering meme-replication and considering
gene-replication. Without being aware of it, we have approached meme-
replication from the point of view of the hosts, namely humans or their
brains. It is only from the point of view of a person (or a brain) who
is about to learn something that it makes sense to call first attempts at
meme-copying ‘preliminary’ and to consider only the configurations that
eventually get stabilised as ‘definite’ copies. The memes themselves do
not know about their positions in such a sequence. In gene-replication, on
the other hand, we have consistently taken the gene’s-eye point-of-view,
and have therefore not distinguished between preliminary copies and def-
inite ones. If one brings organisms into play, however, it is clearly possible
to make such a distinction. If one does, not every copy of a gene actually
qualifies as the counterpart of a ‘definite’, or mentally stabilised copy of a
meme. Only genes in mature, viable organisms do. Only they are stable in
the same sense as successfully memorised memes are. There exist many
genes in nature, however, that do not live in mature, viable organisms.
These include, particularly, genes in gametes, or sex-cells, or, on a wider
definition, genes in organisms in the early stages of their embryologi-
cal development. In a sense, all of them can be regarded as preliminary
attempts at gene copying, and can count as the analogues of ‘prelimi-
nary meme copies’. Just like ‘preliminary versions’ of memes, such ‘pre-
liminary’ versions of genes are tested for their functionality before they
become ‘stabilised’. If the expressions of a gene produce an organism that
is viable, and thereby also functions as a vehicle in which the gene is safe,
the gene will be retained. If, on the other hand, the expressions of a gene
abort, and do not lead to a viable, mature organism, the gene itself will be
discarded, and an alternative version of it will be tried out. If one looks at
it this way, then, gene-replication too may take more than a single step,
and gene-replication too appears to be goal-driven. The steps which a
gene takes to replicate come in the form of the different varieties which
are ‘proposed’ as occupants of a specific slot on the genome of a species.
And the goal against which a gene is tested is the contribution it makes
to the expression of a viable organism.
Let us now consider meme-copying again, this time strictly keeping
the point-of-view of the replicators. In each instance of gene replication,
we have said, there is always only one attempt possible. Now, if we want
to be consistent, we cannot deny that the same is true in the memetic
realm. Every attempt in which a brain re-organises itself in response to
exposure to the expressions of a meme, has to count as an instance of
meme-replication. Not only successful attempts. There is nothing about
what we have referred to as ‘preliminary versions’ that inherently distin-
guishes them from ‘definite’ ones. They are merely copies of their master,
just as a mutant gene is a copy of its master. And just as a harmful genetic
mutation will be discarded when its expression does not result in a viable
organism, so will a preliminary meme-copy when its expression does not
pass the test of functionality. By this rationale, then, there is really no cru-
cial difference between genetic replication and memetic replication worth
speaking of. They work by the same principles. First, both of them seem
to involve the replication of discrete components. Secondly, both work by
producing a number of variants (attempts), of which only a select few are
kept, while the others are discarded. Finally, both are ‘goal-driven’, but
only in the non-teleological sense that copies whose expressions turn out
to be dysfunctional after they have been tested disintegrate, while copies
whose expressions turn out to be functional are retained, or stabilised.
As we have seen, the analogy can only be appreciated if we do not tac-
itly toggle between the perspectives of organisms and the perspectives of
replicators when comparing the two domains.

6.5.5 The teleology argument and how to get around it

Keeping strictly to the perspective of the replicator not only helps one to
see the analogy between genetic and memetic replication more clearly,
but also helps to solve an apparent problem with the proposal that meme-
copying is goal-driven. At first sight, the idea appears to rely on prescience
of the targets. In order to know when to stop revising her internalised
instructions for making Chinese junks, a child has to ‘know’ what counts as
a good specimen. And she has to know this from the start, otherwise, she
would not know when she has failed to reach her goal. By that rationale, a
child who acquires English, or indeed any language, would have to know
‘deep-down’ what ‘English’ is supposed to be like. Otherwise she wouldn’t
know when to stop learning either. This is the Platonic notion of learning
as ‘re-discovery’, of course. Humans can learn what a circle is, and learn
to draw imperfect but acceptable approximations, because their minds
or souls are pre-endowed with a vague memory of ‘ideal circles’, which
they glimpsed before they entered human bodies and floated through
the world of ideas. However, from the strictly materialist perspective that
we have so far taken, it makes no sense to assume that our minds should
be prescient of potential memes. Miracles simply don’t occur. In order
to save our model of meme-copying, we therefore need to replace the
teleological notion of ‘goal-orientedness’ by something more plausible.
Consider biological replication once again. There, the information that
a gene is viable is fed back to it from the environment via its phenotypic
expression. The gene itself does not have to ‘know’ at all whether it is a
faithful copy of its parent, and in the case of viable mutations it actually
won’t be. It expresses blindly. But the environmental feedback it receives
will either allow the gene to remain stable and keep replicating, or
disintegrate it. The same is true of memes. The instructions for a
Chinese junk do not themselves ‘know’ that they are good copies of their
parent memeplex either.44 All they come to ‘know’ is if their expression
incurs positive environmental feedback. If it does, they will be retained,
if it does not, they won’t. Therefore, in order for a particular brain-state
to get stabilised, no one needs to know that a ‘goal’ has been reached
at all. Instead, the ‘functionality’ of a brain state will be assessed after it
is created and without reference to a pre-defined target. Both the envi-
ronmental factors which trigger the feedback a brain-state receives, and
the mechanisms through which it receives the feedback can be assumed
to antedate the emergence of the brain-state thus tested. Therefore, its
selection does not involve the slightest trace of teleology.

6.5.6 How and why neuronal structures (including memes) receive environmental feedback

Of course, this scenario raises further questions. First and foremost, there
is the obvious, general, and impossibly huge question of what exactly
it might be in their environment that neuronal structures, and specifi-
cally memes, are sensitive to. Put like this, the question does not appear
addressable, because the set of potential candidates is open. However,
one thing can safely be said. The environment which is closest to neu-
ronal structures will be made up of other neuronal structures, as well as
the substances and electro-chemical processes that are present in human
brains. It is only with them that neuronal structures can interact directly.
44 Note that the individual hosting the memeplex does not know either, since the neuronal
structures which her own brain hosts are as inaccessible to her as those hosted by the
brain of her teacher.
Like the rest of a body, of course, a brain represents the expression
of a genome, and owes its design to the processes of Darwinian evolu-
tion, acting in the interest of genes. As indicated above, the expressions
of a genome can be regarded as predictions, or hypotheses about its envi-
ronment, in the sense that coding for them is predicted to increase the
genome’s chances of reproducing. Now, the expressions of a genome do
not only comprise the body it codes for. Instead, some of them trans-
late into cognitive and behavioural guidelines for its body. Of course, in
the case of organisms with brains, much of their cognition and behaviour
appears to be immediately and actively determined by instructions emerg-
ing in their brains rather than being generated directly by genomes. Still,
brains are made and supposed to work for the benefit of genes. Therefore,
the types of behaviour they inform and the cognitive processes they imple-
ment, should not, on the whole, counteract the interest of their genomes.
If they did, it would be fatal to have a brain. This means that even if genes
may grant a certain amount of independence to their brains, they must
still be able to constrain the configurations which brains can assume.
They must have a way in which they can translate their own, crude but
evolutionarily viable, understanding of the world into constraints on the
states which their brains will come to assume. From the point of view
of neuronal structures, however, this means that the constraints imposed
upon them by their genomes must be among the environmental factors
to which they are most sensitive. If this is so, we may approach the ques-
tion about the environmental feedback which is likely to ‘select’ neuronal
configurations in human brains in the following manner. First, we need
to identify some of the hypotheses which human genomes may plausi-
bly have come to incorporate about their environment, and ask how they
might translate into guidelines or preferences for behaviour. Secondly,
we need to reflect upon how they may convey that information to human
brains in terms of constraints on their organisation. Of course, nothing
even close to a complete account of the factors that may be involved
in the selection of brain-states will emerge in that process. Yet, we may
develop at least a general idea of the types of factors and the kinds of
mechanisms by which specific neuronal configurations may come to be
stabilised during cognitive development.
Because it is slightly more tangible and because there exist rather well
established hypotheses about it, I shall deal with the second problem first
and present current views on the ways in which brain-states are assumed
to receive feedback on their expressions. Only afterwards shall I turn to
the first problem. Of course, since our focus here is on a specific subset of
neuronal configurations, namely those that replicate, I will concentrate
on such genetic hypotheses, or biases, that may have come to play a
particular role for the emergence of such replicating brain-structures.

6.5.7 Emotions and instincts

As far as the mechanisms by which genomes communicate their hypothe-
ses about the world to brains are concerned, Daniel Dennett, Henry
Plotkin and others have suggested that the factors which are most likely
to have an immediate impact on the stability of an acquired mental
structure are the emotional responses it triggers. Emotional responses have
an instinctive basis and are likely to represent the medium in which
genomes communicate with brain-states. ‘Emotions are post-cards from
our genes telling us, in a direct and non-symbolic manner, about life
and death’, as Plotkin puts it (1993: 208). In slightly different words,
Daniel Dennett argues that our brains, though highly malleable, still ‘are
genetically endowed with a biased quality space: some things feel good
and some things don’t. We tend to live by the rule: if it feels good, keep it’
(Dennett 1999). Simplifying considerably, what matters for the stability
of a neural structure is whether its expressions make its host feel well. And
the conditions under which its host will feel well, reflect genetic biases
expressed as instincts.
Examples to illustrate and perhaps corroborate this view are easy to
come by. In some cases, for instance, the expressions of a current men-
tal configuration may be experienced as ‘pleasing’, because they serve a
clear purpose, as when a child learns how to tie her shoe-laces. Arguably,
mastering such mechanical skills will increase the fitness of the organism
that has mastered them. That they are rewarded by feeling good is no
surprise. Alternatively, being able to produce certain types of behaviour
may tell a person that he is as good at something as somebody else, as
when one learns a skill like playing tennis, or chess. Being as skilful as its
conspecifics may be important for an organism, because they represent
its most immediate competitors. Finally, the expressions of brain-states
may incur positive feedback, more indirectly, from other people either
because they hope to profit from the skill one has mastered (as when
one has learnt how to set up Microsoft Windows on a new computer), or
because having mastered it makes one more like them (as when one learns
to hold a knife and fork like the adults around one). Again, an instinct
for trying to make oneself indispensable to other members in one’s group
seems to make genetic sense, and the same is true of an instinct for trying
to win their favour by conforming. In all cases, the feedback will reach
the responsible neuronal configurations in terms of positive emotions and
thereby stabilise them, possibly through the release of neurotransmitters
or biochemical processes with the effect of strengthening configurations
of neuronal connections.45 Conversely, there will be kinds of feedback
a person may receive on the expressions of a mental structure that will
clearly motivate her to destabilise it and let her brain adopt a different
configuration. Thus, the expressions of a mental state may be costly in
terms of time and energy – as when a detailed picture of a Chinese junk
has to be drawn. Also, the results may be straightforwardly harmful. They
may not be liked by other people, as when somebody keeps whistling out
of tune, or keeps formatting their hard disks when asked to help them with
a virus. Some types of behaviour may be punished with hostile behaviour
simply because they make one appear ‘different’ to others, as when one
doesn’t conform to a dress code, or holds the knife in the wrong hand.
Again, what all cases have in common is that the neuronal configurations
behind the behavioural expressions will receive feedback about their con-
sequences in terms of unpleasant emotional states.
Thus, neuronal configurations, including replicated ones, that is,
memes, are likely to be selected by the emotional responses their expres-
sions generate. These, in turn, are likely to be determined by instincts,
which must in turn have been selected because genes, by coding for them,
have managed to replicate before disintegrating.

6.5.8 Instincts for imitating

This brings us to the other issue then, namely the question of what specific
types of constraints on brain-states it may have paid for human genomes
to encode, or: what types of things genomes may want their brains to
learn. Among the examples given above, only some involve imitation, the
actual replication of mental structures. Others, like the skill of opening
a nut, may but need not. This shows that the mechanisms which come
to replicate neuronal configurations are the same, in principle, as the
mechanisms that underlie learning, or cognitive development in general.
Since we are interested here primarily in brain structures which, when
assumed, do come to replicate structures that had existed in other brains
before, we need to discuss whether there might be a specific instinctive
basis for imitative learning. In other words, we need to identify possible
reasons why our genomes might profit from making us feel well when we
succeed in imitating others and/or find ourselves being imitated by them.
Before reflecting on the kinds of instincts that may plausibly have
evolved both to enable and to constrain the replication of memes through

45 See, for example, Spitzer (1996).

defining specific emotional preferences, it is first necessary to clarify a
point which might otherwise cause considerable confusion. On page 191
above we dealt with the idea that humans are essentially free to do and
learn (and thus by implication to replicate) what they like, and make
unconstrained use of that freedom. We said that this idea is incompatible
with the approach taken here. If it were true, we said, it would doom
attempts to explain cultural change in evolutionary terms from the start.
It would force one to concede that the environment to which cultural
replicators could be sensitive may contain a factor which changes too
quickly and randomly for cultural evolution to gain any direction. Yet,
now we find ourselves arguing that memes may get stabilised in human
brains because they make their hosts ‘feel well’. Are we not thereby rein-
troducing the very factor that we tried to eliminate above? Does it not
depend strongly on a person’s personal preferences and tastes whether
she feels well or not?
It is important to distinguish between the connotations of words like
‘emotions’ and ‘feeling’ in everyday speech, and the slightly more tech-
nical ways in which the terms are employed by Dennett or Plotkin. In
everyday speech, what a person feels like is considered the essence of
that person’s subjectivity. No two persons, one tends to think, can ‘feel’
exactly alike about anything, and even where rather simple phenomena
like colours are concerned, philosophers have demonstrated again and
again how nearly impossible it is to describe in inter-subjective terms how
exactly a particular colour ‘feels’ to a particular person. That is not the
meaning of ‘feeling’ which is relevant here. Instead, Dennett and Plotkin
appeal to the fact that our mind-bodies, having evolved through natural
selection, must generate emotional states according to principles that do
not reduce the fitness of human genomes. Pain is a good case in point.
Although it is a markedly unpleasant feeling, which all of us try our best to
avoid, our survival clearly depends on our ability to feel it. Only genomes
which make organisms experience harmful events as painful rather than
desirable can have become evolutionarily stable. On the other end of the
spectrum, it must clearly pay for genomes to make their host organisms
feel good about eating or sexual reproduction (Diamond 1998). Genomes
that do not are unlikely to remain in the gene-pool for long. They cannot
afford to leave their host organisms to decide if they should feel good
or bad about eating and sexual reproduction. They must programme
them so that they do. What these rather clear cases show is that there
is a sense in which organisms simply cannot be ‘free’ and unpredictable
with regard to how they feel about certain types of situation. Instead, they
must be genetically determined to evaluate certain types of experiences
(brain-states) more positively than others.
At the same time, genetic constraints on the emotional evaluation of
brain-states cannot be very tight or specific. If they were, the whole idea
of generating a central nervous system would be superfluous from the
point of view of the genome. After all, the whole point of making brains
is to delegate control over situation-specific behaviour to an agency which
is more flexible and more quickly adaptable to specific situations than the
genome itself. If a genome did not leave a certain amount of freedom to its
brain, the whole effort of coding for one would have been a waste. Instead
of controlling an organism’s behaviour via a complex emotional control
system which tightly governs brain-states for its behaviour, it would be
much more economical for a genome to programme directly for an organism’s
situation-specific behaviour in the first place. If it could, that is to
say. But it cannot, because many of the specific qualities of the situations
an organism will encounter during its lifetime are too changeable for
genetic evolution to ‘see’ them. And this is why a genome can define only
rough strategic preferences, derived statistically from the experiences of
many generations of genomes, for certain types of behavioural options
over others. What is communicated to brains via emotional responses are
these rough strategic guidelines, while tactical ‘decisions’ must be left to
the brains themselves.
When Dennett and Plotkin suggest that emotions select among mental
configurations, brain-states, or memetic structures, they refer to instinc-
tive emotional responses to general and possibly fuzzy categories of expe-
rience. To the degree to which emotions are biologically functional in
this sense, the conditions under which they are generated must be com-
mon to all members of a species alike. However indirectly and complexly
they are in fact controlled by genetic factors, there can clearly be little
about them that is idiosyncratic and practically nothing that is ‘essentially’
undetermined in a philosophical, or metaphysical sense. In short, there
can be nothing absolutely subjective or absolutely unpredictable about
emotional states or feelings. The principles that generate them cannot
vary randomly from person to person, or from instant to instant. The
type of emotional feedback which certain types of brain-state will incur is
likely to be at least statistically predictable across different individuals and
situations. The matter is not really a thing between brain-states and their
host organisms at all. It is much rather a thing between brain-states and
genomes, with organisms acting as messengers. Of course, if the pressures
which emotional states exert on populations of cognitive replicators have
an instinctive basis, they can clearly be assumed to be constant enough
for brain-states to become selectively sensitive to them.
Let us return to our original question then. Is it plausible to think
that there might be specific instincts behind the fact that humans so
often wind up with brain-states that are copies of states realised in other
people’s brains? What we have argued so far might explain how brains
learn and why they preferably learn beneficial rather than harmful things,
but it does not yet seem to explain why their ‘quality space’ should be
biased so as to make them copy from each other. To understand this, we
need to find reasons why brains should be made so as to feel particularly
good when they emulate one another.

6.5.9 Why imitation pays

Is it plausible to assume that the human species has evolved instincts
which allow for memetic replication and evolution?46 Clearly, this ques-
tion is huge, and likely to invite bold speculation and circular explana-
tions. On the one hand, it is of course obvious that humans do imitate
each other a lot, and since humans are the product of natural selection,
it is at least highly likely that their inclination to imitate also has been
selected for on the genetic level. Determining, on the other hand, how
human genomes should profit from coding for ‘imitation instincts’, is a
completely different matter. All we can reasonably hope to achieve here
is to point out a few of the advantages that imitation confers on organ-
isms capable of it and suppose that they might possibly have played a
role in the selection of relevant instincts. That this is abductive reasoning
need not disturb us. Our aim here is not to explain, convincingly, how
instincts for imitation have evolved, but merely to show that reasons why
such instincts may have evolved are not hard to come by.
As far as instincts for the active imitation of others are concerned, two
types of pressure can plausibly be assumed to have been at work. Firstly,
an instinct for ‘keeping-up-with-the-Joneses’, and secondly an instinct
to conform. Consider the former first. As nobody will fail to concede,
brain-states often seem to feel well if their behavioural expressions are
experienced to be at least as efficient and effective in serving specified
purposes as the behaviour of others. Putting it more simply, we all seem to
feel well when we feel that we can keep up. That an instinct for rewarding

46 Note that this question is not the same as asking what instincts may have evolved ‘for’
memetic replication and evolution. Specifically, it does not pre-suppose that the dif-
ferential replication and evolution of memes should necessarily be beneficial to the
genomes that have evolved to make it possible, nor that the ‘phenotypic property of
making memetic evolvability possible’ must have been selected for on the genetic level.
Instead, we are asking whether there may have been instincts, or emotional preferences
for specific brain-states and types of behaviour, that are evolutionarily plausible, that is,
beneficial to the replication of genomes coding for them, and that may have had the effect
of turning humans and their brains into machines for replicating neuronal structures so
faithfully that they started to evolve along Darwinian lines themselves.

such brain-states should have evolved is highly plausible from the genetic
point-of-view. After all, our co-speciates are also our closest competitors,
and when they are up to something it may clearly pay to be as good
as they are – in whatever it happens to be. Therefore, it will generally
pay to keep an eye on the behaviour of one’s fellows. What it pays to be
particularly watchful of is anything with the characteristics of purpose or
design. After all, any skill or artefact might be turned against us, or might
give our neighbours an edge when it comes to competing for food, or
defending oneself against a predator. Thus, whenever we spot behaviour
that looks structured and systematic, it will pay to try and discover what it
might be for. If indeed it turns out to be useful, it will then pay to become
as good at it as the fellow from whom we have picked it up. An instinct
for keeping-up-with-the-Joneses makes obvious evolutionary sense.
Consider a possible instinct to conform next. Clearly, it does not
seem to be as easy to confirm through introspection as the instinct for
keeping-up-with-the-Joneses. Often, we experience the need to conform
as unpleasant and in conflict with our personal intentions and interests.
Yet, the fact that we do feel under pressure to conform at all, even if it
might conflict with other preferences, seems to suggest that it is emo-
tionally relevant to us. In some sense, we do seem to ‘like’ brain-states
whose behavioural expressions will persuade our co-speciates that we are
like them. Such a preference may have evolved in response to an instinct
whose evolutionary plausibility is more or less uncontested. It represents
the core of socio-biological explanations of altruistic behaviour (from
self-sacrifice among social insects to parental care) (Ridley 1996). It is
an instinct for behaving more co-operatively towards kin than towards
non-kin. Since genetic similarity tends to lead to phenotypic similarity,
organisms (of all species) will tend to behave more altruistically towards
co-speciates that look and behave like them than towards such that do
not. Consequently, adapting one’s behaviour so that it looks like that of
others is highly likely to pay.47

47 The need to conform behaviourally to the co-speciates one is socialised among may also
be important to humans for a further reason. Ever since their emergence as a species
they have lived in exceptionally large groups characterised by a high degree of social
organisation, the division of labour and the trading of favours. The evolutionary success
of such a life-style depends crucially on being able to distinguish members of one’s
own group from outsiders, so as to avoid dishing out a favour to an individual who will
disappear before paying it back. Therefore, the development of behavioural idiosyncrasies
learnable only by individuals who are around to see them, and thus common only to
group members, may plausibly have become an evolutionarily stable strategy. In fact,
Robin Dunbar (1996) has argued that providing such a type of freely variable behaviour
may have been the most important factor behind the evolutionary emergence of the
human language capacity.

So much for imitating others. Consider, next, a few reasons why it
may pay to let oneself be imitated or even to motivate others to imi-
tate one. It might appear that there should not be many. From what
was just said, one might actually even conclude that organisms, while
trying to imitate others, should in fact be equally motivated to prevent
others from imitating them. After all, any skill one develops represents
an advantage in the competition against one’s co-speciates. One should
profit from keeping it to oneself. Similarly, we might expect that genomes
should develop an instinctive preference for generating behaviour that
cannot be imitated easily, so as to remain able to distinguish true kin from
However, these arguments cease to be valid as soon as a species has
evolved co-operative behaviour, reciprocal altruism and the division of
labour, which humans arguably have. In a group where labour is routinely
divided, and its benefits shared, being the only one capable of a specific
skill makes one the perfect target of exploitation. One will remain the only
person being asked to perform it.48 Imagine what it would have meant to
be the only one in a group of our ancestors with the acknowledged skills
for taking on a mammoth, for example, and you have at least one good
reason for actively wanting to be imitated.
Something similar holds for imitating in order to conform. Again, it
is not only the imitator who profits from it, but the individual who gets
imitated as well. If A looks like kin to B, then B will also look like kin to
A, and both individuals may come to treat each other in the predictably
benevolent manner. In fact, if imitating is costly, the individual who gets
imitated may even profit more than the active imitator. Thus, an instinct
for helping the individuals around one to imitate one’s own behaviour
may even be a more efficient way of exploiting kinship-based altruism
than actively imitating them.
When one takes the division of labour and the co-operativity that char-
acterises human societies into account, one will immediately see further,
and possibly even stronger reasons for imitating and getting oneself imi-
tated. Humans often engage in team work, by which they achieve tasks
that would be beyond the capacities of any individual. Usually all partici-
pants profit. Clearly, of course, team work will be more successful when
all members understand what the others are doing and thinking, and can

48 All academic staff members who happen to be extraordinarily skilled at dealing with
computer problems will know what I am referring to. The pleasure of being more skilled
than the others soon gives way to the sobering insight that one has become the depart-
ment’s computer handyman. As soon as one realises that all the others get to use their
computers while one spends most of one’s own time fixing them, one finds oneself taking
every chance one gets for imparting some of one’s skills to others.

adapt their own actions accordingly. For co-ordinating one’s actions, it is
clearly helpful if everybody knows what the objective is, and what steps
must be taken to reach it.
Finally, there are some types of behaviour that work only if others are
competent in them as well. Language, of course, is the prime example. It
works as a tool for enhancing one’s cognitive capacities, for manipulating
others in one’s own interest, and for acquiring and exchanging potentially
useful information. Having it is clearly beneficial, and will increase the
fitness of its owner. However, language differs from most other tools in
that it is not really an advantage to have exclusive, or even privileged
access to it. Instead, it will deliver most of its benefits only if it is shared.
Thus, all of the reasons for imitating and actively spreading a particular
skill apply to language together. Sharing it will not only earn one the
benefits associated with conformism, it will not only incur the benefits
that come with matching the skills of one’s co-speciates, but it is the only
way for the tool to serve a purpose in the first place.
In sum, there are many plausible reasons why humans should want,
instinctively, both to imitate the behaviour of others and to be imitated
by them. Being different and trying to remain so hardly ever pays. The
potential advantages of being more skilful than others are counterbal-
anced by the social pressure to apply them in the interest of the commu-
nity. The potential advantage of feeling genetically unrelated to others
and therefore finding it easier to deny them favours will obviously be
counterbalanced by the fact that others won’t feel genetically related to
oneself either and behave accordingly as well. And the potential advan-
tages of being unable (and therefore not required) to partake in commu-
nal efforts will be counterbalanced by the fact that one will not be able
to reap the benefits they incur. Thus, there are many good reasons why
genes for (instincts for) imitating and for getting imitated should have
been selected for in human genomes. Which combination of them may
in reality have been decisive is beyond my knowledge, but that does not
really matter. What is important is that there are indeed good reasons
why we should have become instinctively inclined to imitate one another.
This implies that in principle any brain-state which gets behaviourally
expressed and is witnessed by others may motivate them to try and assume
brain-states with similar behavioural effects. All of them are potential replicators.49

49 It might be objected that the notion of a strong human imitation instinct makes the coun-
terfactual prediction that humans should attempt to imitate practically every behaviour
they witness, which they don’t. Instead, they are rather selective in what they acquire
through imitation. Therefore, the alleged instincts for imitation cannot be very strong,
and the whole notion that human minds are genetically determined to act as meme repli-
cation machinery must therefore be wrong.
However, this objection is not really an objection at all. Saying that their instincts
for imitation turn every human brain state with an expression into a potential replicator,
does of course not imply that all of them will manage to replicate. Both brain space
and the energy required for the physical re-organisation of minds must be limited, so
not everything that can be learnt, acquired or imitated will. Were this not the case, Dar-
winian evolution could clearly not occur at all. Selection would not occur, and differences
among replicators could make no difference to the success of their replication. As we
have seen, however, imitation is costly, brain space is naturally limited and therefore no
brain can afford to copy every behaviour it gets informed of. The constraints on what it
will actually come to copy must represent the core of an evolutionary theory of culture,
of course. In section 6.6 we discuss a few. Thus, the objection that we do
not actually imitate as much as we might if our instincts for imitation were overpowering
is in fact an argument for a Darwinian approach to cultural evolution.

6.5.10 Summary

We can summarise our theory of how and why some neuronal configura-
tions manage to replicate, and qualify as Dawkinsian ‘memes’. Basically,
the process works like this.
Let a potential ‘meme’ be a neuronal structure with behavioural expres-
sions (these expressions may or may not be artefacts in the narrower
sense). When expressed, its behavioural and/or material products may be
observed by another person. Being human, the mind of the hypothetical
observer is genetically programmed to pay special attention, and be per-
ceptually sensitive, to the behaviour of other humans and its results. It is
programmed to feel good when it identifies the properties and purpose of
observed behaviour, and when it feels able to reproduce it, and it is pro-
grammed to feel bad when it perceives the behaviour as being purposeful
yet does not understand it, and/or finds it to be beyond its own capaci-
ties. Thus, when a human comes to perceive the expressions of a meme,
her mind will automatically check if it also incorporates a structure that
could be ‘for’ the observed behaviour or products. This process will not
necessarily be conscious, of course, nor does it need to be regarded as
distinct from perception at all. In fact, higher-level perception itself must
always involve the neuronal activation of structures which are, in a sense,
‘for’ the observed behaviour or artefact.
Saying that some instance of observed behaviour is not only perceived
but ‘recognised’ in the observer’s mind is the same as saying that an
existing complex of neural configurations in that mind is activated. This
activation will incur positive emotional feedback – indicating to the mind
that all is under control and well, so to speak. Associations among the acti-
vated configurations will be strengthened and stabilised. In such a case,
the observer’s mind already hosts a ‘meme’ for the observed behaviour.
No actual ‘meme-copying’ takes place, although the stability of that copy
may be increased.
On the other hand, the observed behaviour may not be ‘recognised’.
Then the neural structures activated during perception will not incur pos-
itive emotional feedback. They may result in a state experienced as dis-
comfort, worry, or anxiety. This will set the observer’s mind into motion.
It will re-organise itself and produce a variety of different configurations,
thereby searching for plausible hypotheses about the observed behaviour.
In such a ‘search’, a mind may be guided by recognising certain proper-
ties of the observed behaviour as expressions of instructional components
which it already hosts. It may attempt to build its hypotheses by trying
out different arrangements of prefabricated components. As the mind
assumes new hypothetical structures ‘for’ the observed behaviour, it will
test these new configurations by expressing them – virtually or actually.
In each test run, it will evaluate the results by weighing their costs and
benefits, including their capacity for ‘dealing with’, ‘being for’ and possibly
reproducing the observed behaviour. Every time, the results of such
evaluations are fed back to the mind in terms of positive or negative emo-
tions. When the expressions of a structure are evaluated as satisfying,
the mind will stabilise it. When not, it may keep testing further candi-
dates for some time until it either chances upon a structure whose virtual
(or actual) expression does incur sufficiently positive emotional feedback,
or until the costs of even only virtual testing rise to a sum that forbids fur-
ther trials. In the former case, it will have acquired a neural configuration
‘for’ the observed behaviour.
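
The search just described can be pictured as a simple generate-and-test loop. The following Python sketch is merely illustrative and rests on assumptions that are not part of the account above: the names `feedback` and `acquire`, the numeric feedback score, the satisfaction threshold and the testing budget are all hypothetical, and the prefabricated components are represented by plain ASCII stand-ins.

```python
from itertools import permutations


def feedback(candidate, observed):
    """Hypothetical 'emotional feedback': the share of positions at which the
    candidate's expression matches the observed behaviour (0.0 to 1.0)."""
    length = max(len(candidate), len(observed))
    if length == 0:
        return 1.0
    hits = sum(1 for c, o in zip(candidate, observed) if c == o)
    return hits / length


def acquire(observed, components, threshold=1.0, budget=1000, cost_per_test=1):
    """Try arrangements of prefabricated components until one is rewarded
    strongly enough, or until the (assumed) testing budget is used up."""
    spent = 0
    for candidate in permutations(components, len(observed)):
        if spent + cost_per_test > budget:
            return None, spent       # the costs forbid further trials
        spent += cost_per_test
        if feedback(candidate, observed) >= threshold:
            return candidate, spent  # the structure is stabilised
    return None, spent


# Usage: the observer hosts the right components, but in no useful order.
observed_behaviour = ("d", "o", "k", "i", "n", "z")
hosted_components = ["z", "n", "i", "k", "o", "d"]
structure, cost = acquire(observed_behaviour, hosted_components)
```

Under these assumptions the loop ends in one of the two ways named above: either a candidate earns sufficiently positive feedback and is kept, or the budget runs out and nothing is acquired.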
If minds store information in terms of discrete components, that is,
digitally, then there is, as Richard Dawkins suggests, a good chance that
the newly acquired configuration will not merely be ‘for’ the observed
behaviour, but will at the same time replicate a configuration in the
mind of the person who had originally produced the behaviour. A lin-
guistic example will make clear why this must indeed be the case. Say the
observed behaviour is an articulation of the name Dawkins, resulting in
an acoustic pattern with very, very specific properties. If the observer’s
mind dealt with this perceptual input by trying to incorporate an image
of the ‘whole thing’, the acoustic pattern in all its details, whatever it
may come up with will hardly be a good copy of whatever it was in
the producer’s mind that had given rise to the pattern of sounds. If,
on the other hand, the observer’s mind constructs its hypothesis about
Dawkins by using a combination of /d/, /ɔː/, /k/, /ɪ/, /n/, /z/ and the
prosodic pattern [strong–weak], then there is really only a single satis-
factory solution to the problem, and this solution will clearly be a per-
fect counterpart of the configuration which, in the producer’s mind, was
involved in the production of [dɔːkɪnz]. The Dawkins-meme has thus
managed to copy faithfully. Thus, memes copy much more indirectly than
genes. They do so by expressing as behaviour or artefacts. Thereby they
change their environments in a way that rewards new minds for producing
copies of them. Such an indirect process leads to faithful copies because
(a) memes involve discrete components, (b) minds may try out many pre-
liminary versions of memes until they home in on the one that fits best,
and (c) minds stabilise memes when they serve specific purposes, which
can often only be achieved through a particular arrangement of memetic components.
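
Why discreteness should favour faithful copying can be illustrated with a toy transmission chain. The sketch below is an illustration under stated assumptions, not a model proposed in the text: the numeric `INVENTORY`, the noise level and all function names are hypothetical. Each 'generation' expresses its stored configuration with some noise, and the next mind either stores the signal in all its detail or maps it onto the nearest category it already hosts.

```python
import random

# Hypothetical inventory of discrete categories, spaced 1.0 apart.
INVENTORY = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]


def express(values, rng, noise=0.2):
    """Produce a noisy expression of a stored configuration."""
    return [v + rng.uniform(-noise, noise) for v in values]


def copy_analog(signal):
    """Store the perceived signal as a whole, in all its details."""
    return list(signal)


def copy_digital(signal):
    """Map each perceived value onto the nearest inventory category."""
    return [min(INVENTORY, key=lambda c: abs(c - s)) for s in signal]


def transmit(original, copier, generations=50, seed=0):
    """Pass a configuration through a chain of express-and-copy steps."""
    rng = random.Random(seed)
    current = list(original)
    for _ in range(generations):
        current = copier(express(current, rng))
    return current


original = [0.0, 4.0, 2.0, 1.0, 3.0, 5.0]
after_analog = transmit(original, copy_analog)
after_digital = transmit(original, copy_digital)
```

As long as the assumed noise stays below half the distance between neighbouring categories, the digital chain returns the original configuration at every step, whereas the analog chain accumulates the noise of each generation.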
This, then, may be the general way in which memetic replication is
brought about. It involves many factors which may have come into being,
originally, for reasons that had nothing to do with meme replication. It
crucially depends on the presence of brains that are capable of learning
through adaptive self(-re)-organisation. It depends on the fact that brains
store and process information in terms of units that are to a certain extent
discrete. It depends on a specific social set-up, in which organisms (and
their genomes) profit from interacting with each other in sophisticated
ways, including the division of labour and co-operation. Also, it depends
on a social set-up in which the benefits of group-membership have to
be earned through conformist behaviour, and in which non-conformist
behaviour is stigmatised. In such a set-up individuals are under a strong
pressure – probably instinctively implemented – to watch one another
closely, to understand one another’s behaviour, and, often, to imitate it.
Meme-replication exploits these instincts for imitation. Thus, memes do
not replicate ‘on their own’, but are strongly dependent on external help.
But this does not make it less plausible to regard them as active replica-
tors, because once the relevant factors are in place, meme-replication will
proceed automatically and without anybody intending to carry it out. It
can therefore be predicted that, just like in the genetic realm, those memes
will spread best that are best at exploiting the available mechanisms for
their own replication. Then they are best adapted to their environments,
or ‘fittest’.

6.6 Selective pressures in memetic evolution

We can now return to the question which we said was decisive for a
Darwinian approach to linguistic evolution.50 The question was whether
the selective factors in the environment of a meme are temporally con-
stant enough to allow for memetic evolution to be directed, in the sense

50 See the end of section 6.4.4.

that lineages of memes can be assumed to actively adapt to those factors.
Only if that was the case, we argued, would it make sense to adopt
the meme’s perspective and to explain changes in meme populations in
terms of competitions among rivalling variants, taking place against an
invariable environmental background.
From what we have said about the processes through which memetic
replication is effected, there emerge the following classes of factors to
which it is likely to be sensitive.

6.6.1 Genetic pressures

First, meme-replication must clearly be constrained by properties of the
human gene-pool, since a number of factors that are relevant in meme-
replication are obviously and directly under genetic control. In order to
replicate, a meme needs to be expressed behaviourally and its expres-
sion must be perceived. The former involves the motor system and the
latter relies on human perception. Both represent expressions of human
genomes, of course.
It is obvious that, other things being equal, memes which are easy to
express will be more likely to be expressed and therefore more likely to
replicate than memes which are difficult to express.51 And the same holds
for perception. The more easily the expression of a meme can be perceived
by another human, the more likely it is that the meme will replicate. To
the degree that their hosts’ motor and perceptual systems are genetically
conditioned, memes that adapt to them adapt to the underlying genes at
the same time.
The argument can be extended to constraints on neuronal organisation
as well – at least to the degree that these are ‘hard-wired’ or genetically
provided. And to some extent they must be, since no mind is a tabula
rasa. This may mean that some of the building blocks from which memes
get built are genetically prefabricated, so to speak. Clearly, memes which
incorporate prefabricated components will be more likely to emerge in
minds than memes which rely on components that need to be acquired
first.52 Thus, meme-replication is constrained by, and likely to adapt
to, the genetically provided repertoire of mental components or building
blocks.

51 Note that this is almost tautologically true: a meme that is impossible to express can –
by definition – not be replicated either. All memes that are expressed, and that can
therefore be replicated, however, are demonstrably ‘more easy’ to express than memes
which cannot be expressed at all.
52 This argument is of course a version of the widespread view that human minds are
genetically biased to learn certain things more easily than others, which in linguistics
underlies the assumption of a genetically provided ‘language instinct’, ‘language organ’,
‘language acquisition device’, or ‘universal grammar’. Apart from a special aptitude for
language acquisition, other talents have been proposed as well, including a ‘zoology
organ’, for example (Pinker 1994 and 1997).

Most certainly, many other genetic constraints on meme-replication
will exist. What matters is that all of them will clearly be much longer-
lived than the memes for whose replication they are relevant. This means
that, from the point of view of the latter, they can indeed be regarded as
a constant environmental background against which rivalling meme vari-
ants may compete, and towards which lineages of memes may actively
adapt. In other words, if changes represent adaptations of memes to bod-
ily (and therefore genetic) preferences, it will make sense to take the
meme’s-eye perspective, to regard memes as active, and to consider their
environment as more or less static.
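The logic of this section can be illustrated with a minimal numerical sketch. It is not part of the original argument: the two rival variants, their replication rates (1.05 and 1.0), the starting share and the function name `replicate` are assumptions chosen purely for illustration. The sketch iterates a standard discrete replicator equation and suggests that, against a constant background, even a slight advantage in ease of expression may let a rare variant gradually displace its rival.

```python
def replicate(share_a, rate_a, rate_b, rounds):
    """Iterate a discrete replicator equation for two rival variants.

    share_a : initial share of variant A in the pool (0..1)
    rate_a  : hypothetical replication rate of A (e.g. easier to express)
    rate_b  : hypothetical replication rate of rival B
    Returns the list of A's share after each round, starting share included.
    """
    history = [share_a]
    for _ in range(rounds):
        # Average replication rate in the pool, weighted by current shares.
        mean_rate = share_a * rate_a + (1 - share_a) * rate_b
        # A's new share grows in proportion to its relative rate.
        share_a = share_a * rate_a / mean_rate
        history.append(share_a)
    return history


if __name__ == "__main__":
    # Assumed values: A starts rare and replicates 5% better than B.
    trajectory = replicate(share_a=0.01, rate_a=1.05, rate_b=1.0, rounds=200)
    print(f"share of variant A after 200 rounds: {trajectory[-1]:.3f}")
```

Because the rates are held fixed throughout, the sketch corresponds to the case in which the environment is static from the meme's-eye perspective; it makes no claim about actual rates of linguistic replication.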

6.6.2 Memetic pressures

Second, the replication of memes must clearly be sensitive to other memes
in the pool. As we have defined them, memes are neuronal assemblies
that derive their very identity from the ways in which they are linked to
other assemblies, some of which are likely to be memes themselves. It is
through those links they receive the electro-chemical energy that activates
them and keeps them stable. Thus, a meme which manages to link up to
one or more other memes which replicate well for independent reasons,
may itself become a good replicator. Therefore, memes can be strongly
expected to adapt to one another.
A further reason why this must be so is that memes may actually be
composed of other memes. Thus, as we have seen in section 6.3.2, morph-
memes are in a sense composed of phone-memes, or involve at least
stable associations to them. Therefore, the success with which morph-
memes replicate will depend on the availability of phone-memes in the
meme pool. This becomes obvious in the fate of loan words which include
phonemes that are alien to the borrowing language. If English loans into
German include /dʒ/ as in /dʒentlmən/, for example, they typically fail
to replicate faithfully and are soon ousted by variants with /tʃ/, such
as /tʃentlmen/. Similarly, to take a non-linguistic example, Dawkins’
Chinese-Junk meme consists of memes ‘for’ paper, folding, squares, cor-
ners, centre and so on. Clearly, the ease with which the Junk meme
replicates will depend on how many of its components are already sta-
bly represented within human minds. The reason for this dependence of
complex memes on their memetic components is of course the same as the
reason memetic replication also depends on the availability of genetically
prefabricated building blocks. Memes for which components are abun-
dantly present in the meme pool will replicate more easily than memes for
which components are scarce – no matter if the components are genetic
or memetic in origin.
Note however that when both components and the complex config-
urations they form are memetic, the dependence will be mutual. Thus,
the expression of a complex meme may not only replicate that com-
plex meme itself, but simultaneously also all its memetic components.
In that sense, the relationship between morph-memes and phone-
memes is also symmetrical, or symbiotic. While the replication of morph-
memes depends on the availability of phonemic building blocks, phone-
memes also depend on morph-memes for their own replication. This
is because the expression of phone-memes as such is typically not per-
ceived as particularly purposeful and worthy of imitation. It is when
phone-memes are expressed as components of morph-memes, which also
express concepts, that their expression will incur positive feedback. Thus,
phone-memes will compete for associations to morph-memes, and lin-
eages of them will adapt to the slots they occupy. Similarly, also such
building-block memes as the constituents for folding paper, squares,
corners, and so on, will depend on the fate of more complex memes
within which they figure. The better the meme for Chinese Junks repli-
cates, the greater will be the pressure on component memes to associate
themselves with the complex, and lineages of them will adapt to figuring
within it.
So much for part–whole relationships among memes. However, the
fitness of any single meme will also depend on the looser associations
it enters with other memes, that is, on its neighbours in the network
of neuronal assemblies that a mind may represent. We have said that
memes are neuronal constituents, or cell assemblies, which depend, for
their very existence, on being activated through the firing of other assem-
blies. Thus, morph-memes, memes for morphotactic Gestalts, depend not
only on their phonemic building blocks, but also on their associations to
concepts, whose ‘meanings’ they may ‘carry’ in discourse. Therefore, the
replication of a morph-meme will depend crucially on the success of the
concepts to which it is associated. The recent spread of morph-memes
like /kəmpjutə/, /flɒp/, /sɜːvə/ etc., clearly correlates with the spread
of concepts related to information technology. One can argue that the
morph-memes have spread through associating to concept-memes which
were successful for independent reasons.
Again, examples could easily be multiplied, and we shall look at
some interesting cases of meme–meme adaptation below (see chapter 8).
What matters here, however, is merely whether selective pressures among
memes will be stable enough to allow for adaptation and directed
evolution at all. – The answer must be yes for a simple reason. As a
matter of principle, no population of memes can change more quickly
than itself. Of course, some memes may appear and disappear from a
population more quickly than others. In this case, those that are more
stable will appear as relatively constant environmental factors to less sta-
ble ones, and if the latter are selectively sensitive to the former, they can
and will adapt to them. The point is that meme–meme co-evolution, too,
can be viewed as a straightforwardly Darwinian process in which less well
adapted memes are ousted by better adapted competitors.

6.6.3 Social pressures

Third, the replication of memes within any specific population will also
depend, more or less directly, on a variety of factors that are neither
genetic, nor memetic, but social. As we have said, memetic replication
exploits instincts for imitation. Imitation always involves both an imita-
tor and somebody who gets imitated, and this creates the problem of
determining not only who will imitate what but also from whom.
The issue is anything but trivial. Imagine such a simple case as two
people with different ways of consuming their food. Say one of them uses
knife and fork, and the other chop sticks. Several scenarios are conceiv-
able. First, each of them might come to imitate the other, so that in the
end both of them master both skills equally well. It is also conceivable,
however, that one might come to give up his old habit in favour of the new
one. It might be uneconomic to have to remember two different ways of
doing the same thing. But which of the two ways will win? If there was a
good reason why one of the two ways should be more efficient than the
other, one might predict that it will eventually win out, but there might
be other factors involved in determining the outcome. One of the two
persons might be older than the other, and less willing to part with her
habits. One of them might be a better learner and acquire the other’s habit
first. One of them might be stronger and/or more powerful, so that the
other might profit more from gaining his benevolence by accommodating
to him.
If we widen our perspective and look at larger groups than just two
individuals, power may also mean social status and influence. Again a
‘powerless’ individual, who occupies a low position in the social hier-
archy, will profit relatively more from conforming to (and gaining the
benevolence of) a more powerful individual with higher social status than
vice versa. Therefore, memes will replicate more easily from the brains
of powerful individuals to the brains of powerless ones.
In larger groups of potential imitators and models, of course, simple
quantitative factors will also play a role. A behaviour which is widely
spread within a community is more likely to get imitated by individuals
than a behaviour which is rare. This will be true even if the rare type
might be a more efficient variant of the common type. Even if it could be
proved that it is altogether ‘better’ to eat with knife and fork rather than
with chop sticks, the skill will find it difficult to spread in a community
where chop sticks represent the behavioural norm. If one learns to use
chop sticks one will conform to more people than by learning to use knife
and fork. The benefits incurred by conforming will outweigh the possible
advantages of the otherwise better skill.
This effect must clearly be particularly strong in the case of languages.
Unlike in the knife-and-fork-vs-chop-sticks case, the usefulness of a
language depends on the number of people who share it. No individual
would profit from acquiring even a highly efficient language if she were its
only speaker. Not only would she suffer social stigmatisation for refusing
to conform, but her language would also be communicatively useless.
This implies that even comparatively inefficient language properties will
keep being imitated as long as they are shared by a sufficiently large
number of speakers.
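The conformity effect described here can be made concrete in a small sketch. Everything in it is an illustrative assumption rather than part of the original argument: the payoff of a variant is taken to be its intrinsic efficiency multiplied by the share of speakers who already use it, and the efficiency values (1.5 for the new variant, 1.0 for the established one) and the function names are hypothetical. Under these assumptions the more efficient variant only spreads once its share exceeds 1.0 / (1.5 + 1.0) = 0.4; below that threshold the established variant keeps being imitated.

```python
def conformity_step(share_new, eff_new, eff_old):
    """One round of imitation in which usefulness depends on shared use.

    The payoff of each variant is assumed to be its intrinsic efficiency
    times the share of the community already using it.
    """
    payoff_new = eff_new * share_new
    payoff_old = eff_old * (1 - share_new)
    # Average payoff in the community, weighted by current shares.
    mean_payoff = share_new * payoff_new + (1 - share_new) * payoff_old
    return share_new * payoff_new / mean_payoff


def simulate(share_new, eff_new, eff_old, rounds):
    """Return the share of the new variant after the given number of rounds."""
    for _ in range(rounds):
        share_new = conformity_step(share_new, eff_new, eff_old)
    return share_new


if __name__ == "__main__":
    # Assumed values: the new variant is 50% more efficient than the old one.
    for start in (0.1, 0.5):
        final = simulate(start, eff_new=1.5, eff_old=1.0, rounds=50)
        print(f"initial share {start:.1f} -> final share {final:.3f}")
```

The sketch only illustrates the qualitative point that a majority variant can persist despite being less efficient; the multiplicative payoff is one of many possible ways of modelling it.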
Of course much more could be said about socially based constraints
on meme-replication. What we need to ask here, however, is once again
if they are likely to be constant enough from the perspective of memes to
allow for their directed evolution. A few things are obvious. To the degree
that social circumstances like the ones just mentioned are changeable, the
selectional pressures they exert on memetic populations will change as
well. For example, external events such as migration, or the restructuring
of social hierarchies, may change the quantitative composition of a meme-
pool, the interpersonal pathways along which memes may be transmitted,
as well as the group internal distribution of power. Occasionally, one
would think, such events might turn out to be just as catastrophic for
cultural ‘ecosystems’ as meteor impacts, climatic changes, or invasions
by new species, appear to be for biological ones. In such cases, changes
in memetic populations would clearly not be explainable in exclusively
Darwinian terms, that is, through the mutation, differential replication
and selection of the best adapted meme variants. Instead, their reasons
will have to be sought in the external catastrophes as well. – However,
from all experience, social ‘catastrophes’ of the types just mentioned do
not seem to occur at a frequency and to proceed at speeds which exceed
the rate at which memes themselves get replicated. Also, they never seem
to affect all aspects of a community at once. Thus, the selective effects
they may have on meme-populations are unlikely to preclude adaptive
memetic evolution. Instead, it seems more plausible to assume that, from the
point-of-view of memes, social changes happen slowly enough, or leave
a sufficiently large number of memetically relevant factors unaffected, so
that meme-populations find a sufficiently large number of socially based
factors in their environments that they can perceive as stable and to which
they can adapt.

6.6.4 Other pressures

There will certainly be further selectional pressures on memetic evolu-
tion. Memes will be selectively sensitive – albeit indirectly – to some of
the material properties of the external environment of humans in general.
Some of them will be more crucial for the survival of human organisms
than others, and memes will be preferably acquired if they help their hosts
deal successfully with them. Of course, neither humans, their minds, nor
their genomes possess the capacity of determining whether acquiring a
meme will be good or bad for them in any absolute sense. Therefore, they
cannot ‘know’ in any absolute sense either, which aspects of the external
world it pays to learn about, or to acquire memes for. Instead, and as we
have already indicated, minds will select memes according to instinctive
constraints that have evolved biologically. These instincts focus human
minds on such aspects of reality which human genomes have learnt from
experience to pay attention to. Apart from the general, and possibly loose
guidelines that genetic biases define for cognitive development, of course,
the meme-population that exists within any individual mind at any partic-
ular time will itself define constraints and preferences on the acquisition of
further memes according to principles that have just been outlined. Thus,
both genetically and memetically defined biases will act as filters which
information about external reality needs to pass if it is to be established,
memetically, within human minds. All that being as it may, however, one
thing can be taken for granted. The elements in the external environ-
ment of humans about which they can acquire information will definitely
include many that are temporally constant enough to allow for memetic
evolution to produce adaptations to them.

6.7 Summary
Most of the factors that may constrain the replication and the stability
of memes can be assumed to be constant enough – from their point of
view – to allow sufficient time for adaptive evolution to take place. None
of the factors that we have identified will appear to memes as a rapidly
changing series of catastrophes terminating and selecting for lineages in a
chaotic fashion. In order to see this, we have ‘deconstructed’ the concept
of human selves as the ultimate ‘owners’, ‘designers’, ‘acquirers’ and
‘transmitters’ of memes. We have shown how the factors which amount to
what human minds do may be determined in ways that are not controlled
by central mental agencies. Thus, positing such agencies has turned out to
be more or less redundant. That they might exert inexplicable, irreducible
and essentially random influences on meme selection, has consequently
become highly improbable.
All this means that most changes in meme-populations should indeed
be explainable in classical Darwinian terms, as processes in which vari-
ants of memetic replicators compete against one another against a back-
ground of selection pressures that are, from their point of view, constant
and admit a gradual replacement of less fit variants by fitter ones. There-
fore, memes do satisfy the fourth criterion which Dawkins established
for identifying replicators, namely that they should be active. Since their
chances at replicating will be directly affected by their own properties or
by differences among rivalling variants, they clearly are.
We have thus laid a sufficiently solid groundwork for applying a tech-
nically Darwinian, or evolutionist perspective in the study of linguistic
change. In sections 6.1 to 6.3 we showed that (at least some of) the con-
stituents of linguistic competences display properties that qualify them as
replicators. For phonemes, morphemes, syllabic structures, foot types, as
well as for regular phonological processes, there very probably exist mental
configurations that are discrete, have specifiable structures, and occupy
specifiable positions within larger cognitive systems, or networks. Such
configurations, which we called ‘memes’, following a suggestion made
by Richard Dawkins, appear to be stably implemented within individ-
ual minds, and appear to copy fecundly and faithfully. In section 6.4 we
discussed what further conditions have to be met, if the existence of repli-
cating entities is to give rise to Darwinian evolution. We said that repli-
cation, though faithful, must not be perfect, so that competing variants
can arise, and observed that language clearly meets this condition. Next
we discussed, in section 6.5, the potential mechanics of meme replication
in order to get a better idea of what ‘environmental’ factors might con-
strain it. We identified three major constraint types (genetic, memetic,
and social), and showed – in section 6.6 – that a sufficient number of
them is likely to be constant enough, from the point of view of memes, to
allow for the directed Darwinian evolution of meme lineages through the
selection of variants that replicate best under given conditions. There-
fore we concluded that memes, including replicating constituents of
linguistic competence, are indeed likely to evolve, technically rather than
just metaphorically, in a Darwinian manner. The histories of languages
can thus be thought of as emerging, to a relevant extent, from processes
in which variants of linguistic replicators compete against each other for
‘mind-space’ within speech communities, and in which those that are
better at that eventually marginalise or oust the others.

7 What does all this imply for the study of language change?

We have now established that it is both possible and plausible to approach
the study of languages and their developments in technically (rather than
metaphorically) Darwinian terms. Since the argument has involved issues
which appear to be only remotely related to our central object of study,
it might be helpful to sum up where we have arrived before re-focusing
on language.
The starting point of our discussion was that languages change, and
that some of the ways in which they change can be described systemat-
ically and look amenable to rational explanation. The assumption was
that the ways in which languages change may tell us something about
their nature, and that a theory of language which could account for them
would be preferable to one which could not. We started to search for a per-
spective on language that could adequately deal with its historicity, and
a discussion of the ontology of language, viewed as a spatio-temporally
bound and ultimately material ‘world one’ phenomenon was launched.
It was established that languages were, at least for historical purposes,
best regarded as cognitive, mental and ultimately neuronal configura-
tions in the mind-brains of human individuals. We then discussed the
processes by which such configurations are acquired and transmitted
and the reasons why they often result in change. It turned out that the
types which the processes represented were not special to the domain
of language. This motivated excursions into evolutionary biology, cog-
nitive science and the theory of complex adaptive systems. We found
that languages – like other constituents of human cultures, biological
life-forms, immune systems, ecosystems and other phenomena – can
be regarded as replicating, evolving systems. This implied that, like
genes in biological life, the mental patterns which instantiate language
must depend for their existence on being reproduced before disinte-
grating. It was concluded that the properties and constituents of lan-
guages ought to a considerable extent to be derivable from this basic
fact. Next, it was discussed if the assumption that there exist linguistic
replicators was at all plausible, and in what way they might be materi-
ally implemented. The view of human minds as networks of neuronal
cell assemblies turned out to provide a basis for modelling, albeit only
tentatively, the possible material shapes of replicating neuronal con-
stituents for language. It was next shown that they were likely to be long-
lived and fecund enough, and to reproduce with sufficient fidelity to be
capable of historical evolution and adaptation. As in other systems
of their kind, the mechanisms by which the historical evolution of
languages was brought about appeared to involve the creation of
variation through quasi-random ‘copying errors’ or ‘mutations’ and the
subsequent automatic selection of better replicating variants over those that
proposed linguistic replicators ‘language memes’. Finally, a model of the
mechanics by which language memes might be copied was developed and
a typology of constraints on their replication deduced. Thus, the basis
for a generalised Darwinian, or evolutionary, approach to language was laid.
The approach raises a number of highly complex issues and forces one
to take perspectives which differ considerably from the common sense
attitudes towards language which most humans, including many linguists,
naturally share. Thus, rather than casting them as tools which humans
may build in their minds and which they can use for communication and
cognition, the Darwinian approach suggests that languages are popula-
tions of mental patterns, or ‘memes’, which form within human minds,
direct human behaviour, and thereby bring their own replication about.
Likewise, it suggests that texts and utterances are not merely formal codes
created by humans, which transport meaning from one mind to the next,
but the external expressions of memes, on which the latter depend for
their existence and reproduction. Most disturbingly perhaps, the relation-
ship between languages and humans as conceived of in common sense
appears to be turned on its head. Languages may not primarily be tools
which humans use for communication and cognition and whose proper-
ties can be derived from their purpose. Instead, languages appear to have a
mode of existence in which humans figure not as their owners, designers,
users, or controllers at all, but rather as their ‘hosts’, ‘survival machines’,
or even as elements in their environment, by which their existence and
replication are constrained but not fully determined. In particular, those
aspects of humans to which language (and other) memes are likely to
be sensitive, appear to be properties of human genomes, bodies, brains
and the composition of the memetic population hosted by them, rather
than the properties of ‘conscious human selves’. Thus, there is a sense in
which languages are insensitive to the existence of their users as conscious selves.

Many of these issues are mind-boggling and fascinating. They clearly
call for further philosophical and epistemological discussion. As far as
the study of language change is concerned, however, the evolutionary
approach that we have outlined has clear advantages, which easily make
up for its ‘strangeness’. First, it is inherently non-essentialist and
therefore more suited to dealing with the obvious variability of languages in
space and time than most other current approaches, which typically,
though often tacitly, are based on idealisations that reflect, ultimately,
the inherently a-historical concept of languages ‘as such’. Taking the
perspective of replicators, which organise into relatively stable teams
for the purpose of their replication, the evolutionary view of language
is radically item-based (see Hudson (1996)), and does not even have
to address many issues which have so far failed to receive satisfactory
answers. In particular, this concerns such questions as what ‘a single
language’, ‘a single variety’ or ‘a single competence’ really are. Since
linguistic evolution does not operate on languages, but on linguistic repli-
cators, defining ‘languages’ is not necessary for understanding their his-
torical development. By the same rationale, an evolutionary, replicator-
based approach to language can easily bridge the notorious conceptual
gap between languages as cognitive systems ‘owned’ by single individ-
uals, and languages as social institutions. On the replicator level, both
individual speakers’ competences and speech communities can be
conceived of as populations, or pools, the former simply being subsets of the
latter.
Secondly, the evolutionary approach is radically materialistic and
thereby ontologically consistent throughout. The explanations by which
it derives the properties of actual languages do not involve a single entity
that cannot be described, at least in principle, on the physical level and
in intersubjectively verifiable terms. It establishes causal chains between
linguistic competences, communicative behaviour and its textual prod-
ucts without having to refer to vague and ill-defined concepts such as
speakers’ ‘selves’, their ‘knowledge’, their ‘intentions’, and so on. No
change of perspective is required, for instance, when discussing the prop-
erties of a ‘language system’ on the one hand, and ‘language use’, on the
other. The former can be described structurally as the properties of a
neuronal network implemented within a speaker’s mind, and the latter
can be understood as the activation and expression of a subset of its
constituent nodes under specific environmental conditions. Relating lin-
guistic phenomena from all domains by consistently taking the point-of-
view of linguistic replicators, an evolutionary approach to language can
in principle integrate them seamlessly, and without gaps or ‘miracles’
(Cziko 1995).

7.1 Linguistic signs, languages and language components as replicator alliances
Apart from challenging common sense, the change of perspective which
I have been advocating turns many concepts of linguistic theory, which
have traditionally been regarded as basic and taken for granted, into phe-
nomena that are themselves worthy of explanation. This is because the
only thing that may count as established and fundamental from a repli-
cator based perspective on language is that replicators exist. Everything
else is supposed to follow from that, and the question is clearly how. If
all we can be certain of are mental replicators, then what are the status,
the origin and the raison d’être of higher-level entities which are not
themselves replicators? Unlike scholars for whom anything from a speech
sound to a religion may count as a ‘meme’, we have been highly
restrictive in our definition of mental replicators. We now need to ask what to
do about such units as ‘the linguistic sign’, ‘whole languages’, as well
as many constituents of intermediate size, such as grammars, lexicons,
phonologies, morphological paradigms and systems, and so on.
Consider the ‘linguistic sign’ first. As we have argued, it is questionable
whether the links between memes ‘for’ morphotactic units and concep-
tual configurations are stable enough to qualify combinations of them as
replicators. More likely, the associations between memes of the two types
may turn out to be relatively loose. ‘Forms’ and ‘meanings’ seem often to
express (and are therefore also likely to replicate) independently of one
another, so that ‘signs’ are more likely to be secondary replicator alliances
than basic units of linguistic theory.1

Consider larger mental constituents next. Are there any at all? Does
it still make sense, for example, to assume that there is something like
linguistic as opposed to non-linguistic knowledge? If our minds are pop-
ulations of mental replicators, then what exactly is the difference between
linguistic competence and other aspects of cognition? Are they all hope-
lessly mixed in a single bowl of mental replicator soup? This strikes one
as highly unlikely, yet the approach we have been developing suggests
that this might be a plausible way of looking at it. And what about lin-
guistic competences themselves? Are they not inherently structured? Is
there not a difference between lexical knowledge and grammatical com-
petence? There must be, but how is one to account for this from the
meme’s point-of-view? Do established classifications of linguistic com-
petence into phonology, morphology, syntax, semantics and pragmatics
still make any sense at all? Or is cognitive content randomly distributed
within a complex network without any higher-level topology? As weird as
questions like these may sound, they are definitely justified, and we need
to address them.2

1 While this assumption would be consistent with the well-established observation that the
meanings which linguistic forms may convey in actual utterances are more diverse than
the meanings they are supposed ‘to carry’ and which one finds listed in monolingual dic-
tionaries, it obviously casts doubt on the sense of such structural approaches to semantics
in which linguistic items are supposed to simply ‘have’ meanings.
In relation to this, note that speakers have no status in a replicator based approach
to language, and that this is difficult to reconcile with all semiotic approaches to lan-
guage that define a sign as ‘something which stands to somebody for something in
some respect or capacity’ (Dressler 1985: 281). If linguistic replicators are neuronal
configurations, the associations among them must clearly be mechanical in nature, and
the level on which they exist is obviously inaccessible to any human observer. Since an act
of interpretation is itself merely the activation of a neuronally provided association, one
could even say neuronal associations provide for their own interpretation simply through
existing. Thus, while it may make sense to say that the graphic shape man may stand to a
person for the sound shape [mæn] and vice versa because in that person’s mind a neuronal
configuration ‘for’ man is associated with a neuronal configuration ‘for’ the sound shape
[mæn], it would be clearly absurd to assume that the neuronal configurations themselves
stand for each other to that person as well. Such an interpretation would only work if
human brains hosted homunculi who could observe the relations between individual neu-
ronal constituents. Of course, they do not, and assuming such ‘inner selves’ inevitably
leads to infinite regress – for also the interpretations performed by mind-internal homun-
culi must have a material basis and who, then, is to interpret the mechanical relations that
obtain there? So, unless one is willing to believe in miracles, many semiotic relations that
linguists have assumed to hold among competence constituents become highly question-
able. Who, for example, is supposed to be the interpreter of a semiotic relation between
allophones (acoustic patterns, sensual impressions) and the phonemes they ‘stand for’, if
phonemes are neuronal configurations? (Dressler 1985: 282)
2 Note, first, that it is not a bad thing when a new perspective questions established concepts.
Science is arguably more about that than about providing unquestionable truths (cf. Casti
1989). Thus, an approach which makes apparently established wisdom questionable is
in principle a good thing, even when it appears as a step back, and raises questions
which have presumably long been answered. This, it seems to me, is particularly true in
language sciences which have had a tendency to elaborate interesting intuitions into fully
fledged theoretical paradigms, complete with sophisticated formalisms and all, without
spending sufficient time and effort on questioning basic assumptions. This is equally true
of traditional attempts to describe the grammars of all types of languages in terms of a
Latin based paradigm, of Neogrammarian attempts to chart the histories of individual
languages in terms of covering sound laws, of dependency grammar’s attempts to describe
sentence structure in terms of the physical model of atoms and molecules, of Generative
attempts to write grammars as logical production systems, as well as of functionalist
attempts to explain the properties of languages from the services they do for their users.
Because of an undue impatience, it seems to me, most approaches to language have
tended to paint themselves into corners from which they found it difficult to get out.
Therefore, the fact that the replicator based evolutionary approach to language which
we have been advocating here presently creates more questions than answers and might
eventually necessitate the re-invention of one wheel or the other should, I am convinced,
be regarded as an asset rather than as a drawback.
Implications for language change 235

While it may be a good thing in principle that a replicator based
approach questions established linguistic concepts, it must of course be
equally evident that this study cannot answer all the questions that it
raises. Since we have developed the approach in order to address, specif-
ically, the issue of language evolution and change, it is on this that we
shall focus in the remaining parts of this book. Other problems, interest-
ing though they may be, will have to be put aside for the time being, unless
they are immediately relevant to the historical issues we shall discuss.

7.2 Group dynamics in replicator teams: how individual languages acquire specific characteristics
Languages exist because their constituents have replicated before disin-
tegrating. They change when one of their constituent memes is ousted
by a rivalling variant that happens to replicate better under the specific
circumstances in which the change takes place. Being able to replicate
well is equivalent to being well adapted to environmental pressures on
memetic replication. Therefore, linguistic constituents/properties, as well
as changes of such, will reflect the environmental conditions that con-
strain their replication. Above, we established three basic types of environ-
mental constraints, or selective pressures on linguistic replicators. Before
looking in detail at specific linguistic changes that seem to have occurred
in the history of English, it might be interesting to see what roles the
constraint types we have identified may play in them. Such an exercise
in deduction will make it easier to explain particular phenomena without
getting lost in a sea of potential causes.
Pressures which represent physiological constraints on viable human
languages are genetic in origin. Since human body plans differ only
slightly among all members of the species, they may count, for all practi-
cal purposes, as universal. In particular, slight genetic differences among
individuals, such as those which show up as differences in skin, hair or
eye colour, do not seem to have the slightest effect on people’s ability
to acquire any language they are exposed to during the critical language
acquisition period. This corroborates the assumption that, as far as our
capacity to acquire language is concerned, we are all genetically equal.
Since the human genome is unlikely to have changed much during the
last 100,000 years, linguistic replicators must clearly have had ample time
to adapt to the physiological constraints it specifies. This means that no
linguistic change which has occurred during the period from which we
have historical records of language is likely to represent a straightforward
adaptation to the physiology of human mind-bodies. Even the most obvi-
ously ‘body-friendly’ changes, such as the ‘shortening’ or the ‘deletion’
of phonological ‘segments’ (which trivially make things easier for speakers)
must therefore reflect responses to additional pressures or changes
of such.
Secondly, there are selection pressures on language memes which we
called ‘social’ in origin. They emerge from such facts as that language
memes will, other things being equal, replicate more easily when they
are hosted by powerful and prestigious individuals than when they are
hosted by powerless individuals or groups. Similarly, linguistic memes
will be acquired, that is, selected mind internally, more easily if they
manage to connect to human instincts for establishing group member-
ship and for distinguishing oneself from other, ‘rival’, or ‘hostile’ groups.
This means that the internal organisation of larger human communi-
ties into subgroups, as well as the relations among larger communities
will also specify selection pressures on linguistic replicators which are
social in origin. It is to be expected that linguistic memes will replicate
more easily within groups than across. Of course, power-relations within
human societies are themselves historically variable and difficult to pre-
dict. Therefore, the direction of socially based pressures on the selection
of language memes is likely to be fairly changeable as well. When viewed
from a certain historical distance, the changes they will produce are very
likely to look like random, purposeless drifts.
The third major class of selection pressures on language memes
which we established are memetic. They emerge from the fact that for
their expression, their acquisition and therefore also for their replication
memes depend crucially on other memes. Recall that complex memes
(such as morph-memes) depend on their (phonemic) components, and
that smaller memetic constituents (such as phone-memes) depend on
the larger configurations within which they figure. Quite generally, the
stability of memes depends on the energy they receive from other con-
stituents in ‘their’ networks. Therefore, also loose and indirect associ-
ations among memes may translate into selectional pressures. Further-
more, behavioural acts and their material products will usually represent
the joint expressions of many memes. It is highly unlikely that they should
always express and replicate all memes in such sets equally well. There-
fore, individual memes will be under pressure to adapt to the expres-
sive needs of those they usually team up with. For all these reasons,
memes are at least as likely to adapt to one another as to aspects of
their more remote physiological or social environments. Being adapted
to one another, however, implies being dependent on one another at the
same time. A replicator which thrives in a particular environment will
usually suffer when that particular environment changes. Thus, the
selectional pressures which memes exert on one another will force them
into coalitions of the kind we discussed in section above. Just
as their counterparts in the biological domain, memetic replicators will

exploit the opportunities offered by their environments, and an important aspect
of the environment of a replicator is other replicators and their [. . .] manifestations.
Those replicators are successful whose [. . . success at replicating is]
conditional on the presence of other replicators which happen to be common.
These other replicators are also successful, otherwise they would not be com-
mon. The world therefore tends to become populated by mutually compatible
sets of replicators, replicators that get on well together. (Dawkins 1982: 264)

This proposition, made by Richard Dawkins to explain why genes form
coalitions to express as organisms instead of replicating individually, has
important consequences both for the explanation of cultural and specifi-
cally linguistic changes, and for the explanation of higher-level linguistic
constituents such as linguistic signs, mental lexicons, phonologies, mor-
phological and syntactic systems, as well as languages.
The implications of meme–meme co-adaptation for historical linguis-
tics will occupy us in greater detail below. Consider first how they explain
the emergence of higher-level linguistic constituents. Whether they be
‘linguistic signs’ or ‘complete languages’, a replicator based approach to
language must regard them as derived units rather than as primitives
of linguistic theory. It predicts their boundaries to be fuzzy – a predic-
tion which empirical observation and the difficulties involved in defining
‘languages’, ‘varieties’, ‘grammars’ and so on seem to bear out well. But
if these higher-level constituents are not fundamental, how do they come
about? Here, replicator theory provides a plausible answer. Higher level
entities such as ‘languages’, ‘phonologies’, ‘morphological systems’ and
so on simply represent replicator-teams. They emerge because they are
advantageous to the replication of their fundamental constituents. Thus,
phonologies may owe their existence as relatively coherent systems to
the fact that their constituents, that is, phone-memes, memes for cluster
types, memes for syllabic configurations, memes for foot types, and so on
replicate more successfully in co-operation with each other than each of
them would on its own. They represent replicator-teams whose members
benefit from keeping together. Just as the high interdependence of the
genes that code for them lends organisms a relatively great coherence, so
will the interdependence among ‘phonological memes’ bestow a certain
unity to ‘phonological systems’ and set them apart from ‘meme-teams’
on whose properties phonological memes do not depend as strongly as
they do on one another. By the same rationale, the apparent coherence of
other components of linguistic competences, and indeed the coherence
of languages may be accounted for. Of course, the larger a replicator team
becomes, the smaller will be the dependence of its constituents on one
another, and the fuzzier the boundaries between such a team and other
memes, or meme teams in its environment.
This is how a replicator-based approach might account for the obvious
existence of higher-level linguistic components and the imperfect but still
undeniable coherence of what we normally refer to as specific languages.
Certainly, there are many details to be worked out, and many questions
arise which would merit more careful consideration than we
can expend here. Since this book is primarily about the way in which
languages change historically, let us consider, instead, what it implies
for language changes that memes will exert selectional pressures on one
another and will thereby come to form co-adapted alliances of mutually
dependent replicators.

7.3 How languages determine their own histories

Memetic pressures on meme selection differ crucially both from genetic
pressures, and from social pressures. Genetic, physiological pressures can
be regarded as ubiquitous from the point-of-view of memes, and social
pressures will be relatively changeable and lead to apparently random
shifts in meme populations. Memetic pressures, however, will certainly
contain at least some which are (a) non-universal and (b) long-lived
enough to allow memes to adapt to them. Therefore, they are the most
likely causes of those long-term developments in the evolution of indi-
vidual languages and language families which amount to their global,
language specific and typological characteristics. If, for whatever histor-
ical reason, a specific (set of) replicator(s) gets established well enough
within a population of language memes, and turns out to be evolutionarily
more stable than others, it will exert a pronounced long-term selectional
pressure on the latter, which can be expected to outlast more short-lived
social pressures, while – contrary to universal, gene-based pressures – it
will at the same time be specific to the particular pool in which it has come
to be established. If a replicator or its effects are recognised as a constant
factor in the environment of others, it will obviously pay for the latter to
adapt to the predictable presence of that replicator. Thereby, they will not
only increase their own evolutionary stability but may further increase the
stability of the replicator to which they have adapted, because it will find
an ever increasing number of replicators in its environment with which it
replicates well together. This will in turn increase the pressure on further
replicators to adapt to and profit from the stable alliances that have been
forming. Through the repeated feedback loops which processes of this
kind are bound to create, small differences in initial conditions between
linguistic populations that happen to get separated at a particular point
in time will be multiplied and lead to the striking differences that can be
observed to distinguish historically related languages.3
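
The amplifying effect of such feedback loops can be illustrated with a deliberately crude numerical sketch. Nothing in it is taken from the historical record: the function names, the parameter k and the starting shares are hypothetical, and the only assumption built in is that a variant replicates the better the more common it (and thus its compatible company) already is in a pool.

```python
def next_share(p, k):
    """One round of replication with positive frequency dependence.

    p is the share of variant 1 in a pool of two competing variants.
    Each variant's replication rate rises with its own share
    (1 + k * share); this is an assumed stand-in for 'finding many
    compatible replicators in one's environment'.
    """
    w1 = 1 + k * p          # replication rate of variant 1
    w2 = 1 + k * (1 - p)    # replication rate of variant 2
    return p * w1 / (p * w1 + (1 - p) * w2)


def run(p0, k=0.5, generations=100):
    """Iterate the update from an initial share p0."""
    p = p0
    for _ in range(generations):
        p = next_share(p, k)
    return p


# Two pools that differ only minimally when they get separated.
pool_a = run(0.51)
pool_b = run(0.49)
print(f"pool A: {pool_a:.3f}, pool B: {pool_b:.3f}")
```

Under these assumptions an initial difference of two percentage points is enough to send the two pools to opposite outcomes. The sketch says nothing about which pressures actually operated; it merely shows that self-reinforcing co-adaptation is sufficient to multiply small initial differences.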
This means that if one wants to understand the specific characteris-
tics which an individual language (or a ‘component’ of a language) has
come to acquire over time, the most promising area in which to look for
an explanation is that of meme–meme co-adaptations. Therefore, and
in spite of the fact that languages are embedded in, and their replica-
tion constrained by, human mind-bodies and societies, the most impor-
tant factors behind the specific properties of any single one of them can
confidently be sought on the level of language itself. Thus, one of the
surprising conclusions of our discussion, which has argued that there is
nothing unique about languages in that they are just a particular subtype
of replicating and evolving systems as can be found in many ontological
domains, is that it still makes much sense to study ‘language by and for
itself ’.
The next sections will pursue some of the implications of this argument.
Focusing on phonological developments in the history of English, they
will show that the notion of meme co-adaptation inspires coherent histor-
ical accounts and admits unexpected generalisations. First, a Darwinian
account of vowel quantity changes that have long puzzled the commu-
nity of historical English linguists will be given. It will be shown that the
way in which the distribution of long and short vowel phonemes changed
during the Old and Middle English periods can be explained as adapta-
tions of memes for word forms to a strong selectional pressure exerted
on them by memes for the rhythmical organisation of English utterances,
preferably expressed as trochees. After that, the section on The Great
Trochaic Conspiracy will discuss a number of other, and at first sight
completely unrelated, sound changes. It will be argued that they also seem
to reflect the same pressure that was behind the vowel quantity changes.
The purpose of the following case studies is thus twofold. First, they will
show the evolutionary approach which this book has been arguing for at
work and demonstrate what accounts it can produce, what explanations
it can offer, and what generalisations it allows to be made. Thereby, and
secondly, they may possibly throw some light on the question of why
English has come to have some of the properties it has.

3 Types of system-internal co-adaptation are well known in linguistics, albeit not under
this name. Instead their results are referred to as ‘typological adequacy’ or as ‘system
adequacy’ (Dressler 1985). In languages that display an SVO syntax, for example, preposi-
tions are found to be more ‘natural’ or more ‘preferred’ than postpositions. Or agglutinat-
ing languages allow for richer and more productive morphologies than inflecting-fusional
languages (see, for instance, Dressler/Ladányi 1998 and 2000.)
8 How to live with feet, if one happens
to be a morph-meme

8.1 Early Middle English vowel lengthenings and shortenings, and what makes them problematic

8.1.1 Introduction
It is generally acknowledged among historical linguists that – roughly
between 900 and 1300 – phonological changes must have taken place
which altered the distribution of English long and short monophthongs
rather drastically.1 The examples in (23) and (24) illustrate the issue.
Take, first, the words in (23a). All of them have diphthongs as their
stressed vowels, which are generally acknowledged to derive historically
from simple short vowels (23c), with long monophthongs as historically
intermediate forms (23b).

(23) Lengthenings
(a) ModE make, acorn, beaver, cloak; child, hound; whale,
bead, coal
(b) LME māken, ākorn, bēver; cīld, hūnd; hwāl, bēd, cōl
(c) LOE/EME makien, akern, befor; cild, hund; hwæl, bed, col

Conversely, the words in (24a) have short vowels which are assumed to
derive from long monophthongs (24b).

(24) Shortenings
(a) ModE kept, dust, fist; errand, southern
(b) OE kēpte, dūst, fȳst; ǣrende, sūþerne

1 There are two reasons why I have picked these particular changes to illustrate how the
evolutionary, replicator-based approach developed in the preceding sections can be put to
work. The first is simply that I happen to be more familiar with them than with most other
changes in the history of English phonology, and the second is that my inability to make
full sense of them within any of the established descriptive and explanatory paradigms
which I have tried out has been decisive in making me seriously interested in evolutionary

How morph-memes live with feet 241

The English vowel quantity changes presumably behind the correspondences
in (23) and (24) have represented a considerable challenge to
historical English linguists. Neither early Neogrammarian nor more contemporary
attempts at describing or explaining them have been fully
successful.
An evolutionary view of language suggests the following interpretation
of the data in (23) and (24). In Old English times, there existed a popula-
tion of linguistic meme teams (‘languages’). In those there lived (or again,
more neutrally: existed) memes for the morphological Gestalts of items
such as mak(-ien), akern, befor; cild, hund; hwæl, bed, col, kēpte, dūst, fȳst;
ǣrende, and sūþerne. Like all morph-memes, they were complex neuronal
configurations which consisted of smaller constituents, notably phone-memes,
that is, memes for specific speech sounds. Among those, there
were phone-memes for vowels of either of two distinctive quantities, for
‘short vowels’, or for ‘long vowels’.
In later times, meme teams came to exist which were descendants of
teams in the Old English population. Their constituents had the proper-
ties they had because they represented the last (at their time) in lineages of
copies created from their Old English ancestors. When one looks at some
of the morph-memes in this new population of linguistic meme teams, one
notices that not all represent faithful copies of their Old English progen-
itors. Instead, many of them are ‘mutations’, which must have emerged
as ‘copying mistakes’ somewhere in the lineage linking them to their ear-
lier counterparts. In some of these mutations, the positions originally
occupied by memes for vowels of one of the two possible quantities were
now occupied by memes for vowels of the respectively other quantities.
Examples of such mutations were the Middle English morph-memes for
māk(-en), ākorn, bēver; cīld, hūnd; hwāl, bēd, cōl, kept, dust, fist, errand, and
suðern, which had out-replicated and more or less ousted more faithful
descendants of OE mak(-ien), akern, befor; cild, hund; hwæl, bed, col, kēpte,
dūst, fȳst; ǣrende, and sūþerne.
A different way of describing the situation would be to take the point
of view of the vowel memes and to say that in associations with cer-
tain morph-memes, some vowel memes were out-replicated and ousted
by variants for vowels of different quantities. On this level, it would
not be appropriate to speak of ‘mutations’ ousting established variants,
since memes for both long and short vowels were equally ‘established’.
Instead, it would be more adequate to speak of two memes competing
for membership in higher-level memetic configurations.
If one takes a Darwinian, replicator-based view of linguistic evolution,
questions of the following types arise. If one takes the morph-meme
perspective, one will ask what factors in the environments of
replicating morph-memes selected for the new morph-meme variants
(with different vowel memes) and against their established competitors.
If one prefers the phone-memes’ point-of-view, one will ask what the ele-
ments in the environments of vowel memes may have been that selected
for long vowel memes or short vowel memes respectively. For now, I shall
take the perspective of the vowel memes. Looking for potential pressures
on their replication, a good starting point is to describe the environments
from which long vowel memes ousted short ones and vice versa, and look
for correlations between their structures on the one hand, and the rates at
which the vowel memes in competition managed to oust one another. This
is also what established, non-evolutionary approaches to sound change
have done, of course. We shall therefore look at their findings and see,
at the same time, what problems their theoretical frameworks came to face.

8.1.2 Non-evolutionary accounts and their shortcomings

Neogrammarian and handbook accounts
Early attempts to express the relation between pre- and post-quantity-
change forms employed Neogrammarian type correspondence rules and
inspired the following analyses. First, it was observed that many of the
short vowels which were replaced by long ones occurred in words where
they were followed by single consonants and exactly one additional syl-
lable (makien, akern, befor). This tied in nicely with two further observa-
tions. In words where long vowels were followed by two consonants (kēpte,
dūst, fȳst) and in words where they were followed by more than one syllable
(ǣrende, sūþerne), the long vowels were often replaced by short ones.
Since the Neogrammarian notion was that sound changes were brought
about by – or at least should be described in terms of – categorical sound
laws that alter speech sounds in certain specifiable phonological contexts,
it was attempted to cast these observations in terms of such laws as well.
The resulting rules, which have survived in handbooks more or less until
today were

(25) a. Open Syllable Lengthening (V → [+long] / __ $σ#)2
     b. Pre-cluster Shortening (V → [−long] / __ CC)3 and
     c. Trisyllabic Shortening (V → [−long] / __ σσ#)4

2 Vowels were lengthened in open penultimate syllables.

3 Vowels were shortened before clusters of (at least) two consonants.
4 Vowels were shortened in antepenultimate syllables.

With regard to the lengthenings in words such as OE cild > ModE child
or OE hund > ModE hound – which appear to contradict (25b) – it was
noticed that they occurred only before sonorant+voiced stop clusters
whose constituents were articulated at roughly the same place in the
mouth. These clusters were called ‘homorganic’, defined as exceptions
to (25b) and became the basis of a fourth sound law, namely

(26) Homorganic Lengthening (V → [+long] / __ {nd, mb, ŋg, ld, lz, rz,
rð, . . .})5

The lengthenings in words like OE hwæl ‘whale’, OE bed ‘bead’ or
OE col ‘coal’, finally, were not regarded as having been ‘brought about’
by a ‘sound law’ at all, but instead through the analogical transfer of
long vowels that had been created by regular Open Syllable Lengthening
in inflected forms such as hwalas, bedes or coles. This view appeared to
be supported by such singular–plural alternations as the one that can be
observed in ModE staff–staves, and – because CVC lengthenings appeared
to be rare (but see below section 8.5.3) – it has been accepted more or
less until today. Thus, all lengthenings and shortenings appeared to be
accounted for as the theory required, that is, in terms of categorical rules
describing sound laws, or in terms of sporadic replacements that were
morphologically rather than phonologically motivated.
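
To see what the handbook account amounts to when it is applied mechanically, the four laws in (25) and (26) can be written down as a small decision procedure and run over the example words in (23) and (24). The following is merely a sketch under simplifying assumptions: the class Form, the function predict_long and the schematic encoding of each word (vowel length, consonants following the stressed vowel, number of following syllables) are hypothetical conveniences introduced here, and the encoding follows the classification given in the text rather than any independent syllabification.

```python
from dataclasses import dataclass

# Clusters listed in (26); the open-ended '. . .' of the rule is not modelled.
HOMORGANIC = {"nd", "mb", "ŋg", "ld", "lz", "rz", "rð"}


@dataclass(frozen=True)
class Form:
    spelling: str
    long_vowel: bool          # quantity of the stressed vowel in OE
    consonants: tuple         # consonants directly after the stressed vowel
    following_syllables: int  # syllables after the stressed one
    later_long: bool          # quantity attested later, as in (23)/(24)


def predict_long(form):
    """Quantity predicted by the sound laws in (25) and (26)."""
    cluster = "".join(form.consonants)
    if not form.long_vowel:
        if cluster in HOMORGANIC:
            return True       # (26) Homorganic Lengthening
        if len(form.consonants) == 1 and form.following_syllables == 1:
            return True       # (25a) Open Syllable Lengthening
        return False          # no law applies: vowel stays short
    if form.following_syllables >= 2:
        return False          # (25c) Trisyllabic Shortening
    if len(form.consonants) >= 2 and cluster not in HOMORGANIC:
        return False          # (25b) Pre-cluster Shortening
    return True               # no law applies: vowel stays long


FORMS = [
    Form("makien", False, ("k",), 1, True),
    Form("akern", False, ("k",), 1, True),
    Form("befor", False, ("f",), 1, True),
    Form("cild", False, ("l", "d"), 0, True),
    Form("hund", False, ("n", "d"), 0, True),
    Form("hwæl", False, ("l",), 0, True),
    Form("bed", False, ("d",), 0, True),
    Form("col", False, ("l",), 0, True),
    Form("kēpte", True, ("p", "t"), 1, False),
    Form("dūst", True, ("s", "t"), 0, False),
    Form("fȳst", True, ("s", "t"), 0, False),
    Form("ǣrende", True, ("r",), 2, False),
    Form("sūþerne", True, ("þ",), 2, False),
]

# Words whose attested later quantity the laws fail to predict.
mismatches = [f.spelling for f in FORMS if predict_long(f) != f.later_long]
print(mismatches)
```

On this encoding the laws leave exactly the monosyllables hwæl, bed and col unaccounted for, which is why the traditional account had to appeal to analogical transfer from inflected forms in their case.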
Although it is the most widely known and accepted account of the
relevant changes, however, the traditional description has both empirical
and theoretical problems. One is that the four established sound laws
apparently face a disconcertingly high number of exceptions (for example,
we say /kræk/ ‘crack’ where ‘by law’ we ‘ought’ to say /kreɪk/, /wɪnd/
‘wind, n.’ where we ‘ought’ to say /waɪnd/, and /i:stən/ ‘eastern’ where we
‘ought’ to say /estən/). For Open Syllable Lengthening, Donka Minkova
(1982) showed that if the rule is used to relate Modern English word
forms to their Old English predecessors, the number of exceptions it
faces practically equals the correspondences which it predicts correctly.
This is particularly awkward since also lengthenings in apparently closed
syllables, as in col ‘coal’, or bed ‘bead’ have subsequently turned out to
be no less frequent,6 so that it is downright puzzling why the former
should be accounted for in terms of a lengthening rule and the latter not.
This is the empirical problem and it is serious. After all, the main value
of Neogrammarian type sound laws is descriptive. They chart regular
and categorical correspondences between sounds of related languages

5 Vowels were lengthened before the clusters nd, mb, ŋg, ld, lz, rz, rð, etc.
6 See Ritt (1997).
or language stages in terms of categorical rules. Since they are neutral
with regard to the mechanics that bring those correspondences about,
their explanatory value is zero in any case. If they chart correspondences
which do not exist in reality, the whole exercise more or less loses
its point. Since no unambiguous evidence has so far been discovered of
stages or historical varieties of English which the sound laws in (25) and
(26) would adequately relate, English changes of vowel quantity qualify
perfectly as a ‘Neogrammarian Nightmare’.
The other problem, which is related to the first, is theoretical and
involves some of the strategies which have come to be applied in order to
explain, or actually to dismiss, both the discomforting exceptions to the
sound laws and the apparently odd lengthenings in coal or bead, which
they fail to account for in the first place.
One of these strategies is the assumption that ‘messy’ correspondences
are due to ‘dialect mixtures’. That is to say, it is assumed that a (set of)
variety(-ies) existed which were indeed affected by perfectly regular and
categorical sound changes but that this(/these) variety(-ies) subsequently
mixed with others, which had not undergone the changes. Obviously,
the results of such dialect mixtures can be ‘safely’ supposed to be unpre-
dictable. This strategy is highly problematic, however, for two reasons.
First, it is so powerful that no pattern of correspondences is conceiv-
able which might not be accounted for by it. Therefore, it is explanato-
rily empty. Second, its inherent assumption that homogeneous languages
or varieties exist at all, is empirically unfounded. Languages are always
mixed populations. The notion of ‘homogeneous languages’ is closely
linked to essentialist concepts of idealised ‘language as such’, which are –
as we have argued at length – inherently a-historical.
The second strategy is to interpret both exceptions as well as unpre-
dicted lengthenings as ‘sporadic’ cases of analogical transfer from mor-
phologically related forms which obeyed the postulated laws. For exam-
ple, ModE whale is supposed to have inherited its long vowel from forms
such as hwalas, nom./acc. pl., where Open Syllable Lengthening predicts
it. Although that strategy may appear plausible enough, it is also problematic.
If whale got its long vowel from hwalas, then the question arises
why god did not get a long vowel from godas. If it cannot be answered,
this shows that the effects of analogical transfer are as unpredictable as
those of dialect mixture, and this makes the concept once again so powerful
that few cases can be found for which it would not serve as an
explanation.
Thus, the possibilities of dialect mixture and analogical levelling render
the sound laws themselves more or less useless. If their effects may be
undone in any item to which they would apply, their predictive value
becomes obviously zero. It is no longer possible, for example, to take
an Old English word, and use sound laws to make an educated guess
about the quantity its vowel is likely to have in its post-quantity-change
counterpart. All one can say is that it may be either long or short, and
this is about as much as one might have guessed without the sound laws
in the first place.
Of course, the fact that the sound laws in (25) and (26) fail to describe
adequately what actually happened to Old English vowels with regard
to their phonological quantities is no tragedy as such. The search for
regular correspondences among the sounds of related language stages is
certainly a worthwhile occupation, but that some sound correspondences
should not be describable in terms of categorical rules is not a problem
per se. It may simply be a fact. If it is, then it only comes to appear prob-
lematic, if a theory of change admits only categorical correspondence
rules as descriptive devices, dismisses correspondences which are statis-
tical or sporadic, and thereby tacitly implies that ‘real’ sound changes
cannot be brought about except by processes that are amenable to such
description. There are no a priori reasons to assume that, however. That
some – possibly even many – sound correspondences are indeed sur-
prisingly regular is certainly interesting and worthy of explanation, but
this does not warrant the inductive generalisation that all sound changes
must be so. Nor does it justify the decision to dismiss sound changes
which refuse to be described in terms of covering laws as uninteresting
or as ‘mere’ historical accidents. Thus, the distribution of lengthenings
and shortenings in Modern English reflexes of Old English words represents
a problem for (and evidence against) dogmatic Neogrammarianism
as well as all theories which over-interpret the fact that phonological
change can sometimes be described in terms of categorical rules,
and which assume that it is always brought about by actual categorical
By that rationale alone, the non-categorical correspondences which
Early Middle English lengthenings and shortenings seem to have produced
speak in favour of the evolutionary approach which we have been
developing. It does not imply at all that environments which may select
for or against a particular replicator variant should be describable in cat-
egorical terms. It merely implies that similar environments should exert
7 This is particularly true of Generative approaches to sound change which, typically,
describe sound changes in terms of categorical phonological rules assumed to be added to,
subtracted from, or reordered within mental phonological competences (see McMahon
(1994) for an overview). It may apply less obviously to more recent versions such as Opti-
mality Theory (Prince and Smolensky 1993). It would not apply at all, if the possibility
of the framework to distinguish between more and less harmonic outputs were exploited
more fully.
similar selection pressures. If a selection pressure should be exerted by
some environmental factor X, however, and environment A has more of
that particular factor than environment B, it is perfectly conceivable that
meme variants favoured by X should become more frequent in environ-
ment A than in environment B. However, the conclusion that they will
become the only variants to survive in environment A, while being absent
from environment B is not warranted. Thus, that categorical correspon-
dence rules like those in (25) and (26) fail to relate memes of successive
populations adequately is not a problem for an evolutionary model of
language change.
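
The point that graded pressures yield graded rather than categorical outcomes can be made concrete with a toy calculation. It is a sketch under stated assumptions only: the selection coefficient standing for the strength of factor X, the copying-error rate mu and all numerical values are hypothetical and chosen purely for illustration.

```python
def next_frequency(p, s, mu):
    """One generation of replication for the variant favoured by factor X.

    p  : current frequency of the favoured variant
    s  : assumed selective advantage, proportional to how much of X
         the environment contains
    mu : assumed rate of copying errors turning one variant into the other
    """
    # Selection: the favoured variant is copied (1 + s) times as often.
    selected = p * (1 + s) / (1 + p * s)
    # Symmetric copying errors keep reintroducing the other variant.
    return selected * (1 - mu) + (1 - selected) * mu


def simulate(strength_of_x, generations=200, p0=0.05, mu=0.02):
    """Frequency of the favoured variant after a number of generations."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, strength_of_x, mu)
    return p


freq_a = simulate(0.10)  # environment A: more of factor X
freq_b = simulate(0.02)  # environment B: less of factor X
print(f"environment A: {freq_a:.2f}, environment B: {freq_b:.2f}")
```

With these assumed values the favoured variant ends up more frequent in environment A than in environment B, yet it neither takes over A completely nor disappears from B. A statistical correspondence of this kind is what the evolutionary model leads one to expect, and what categorical rules cannot express.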
It seems that after the existence of regular sound correspondences was
discovered, and their value for establishing genetic relationships among
languages recognised, they evolved into a kind of preoccupation. It may
have appeared to many linguists that the processes which brought them
about must be as regular as the correspondences themselves. Enquiry
into the possible nature of such processes may then have backgrounded
the fact that not all sound correspondences among related languages were
as regular as some seemed to be. What intrigued scholars were the regu-
lar ones. Messy sound correspondences were considered to have resulted
from linguistically uninteresting historical accidents such as migration
(resulting in dialect mixture), or from the preference of speakers for regular
morphological paradigms, which made them tamper with the effects of
sound laws occasionally. What one seemed to believe in and attempted
to reconstruct was a kind of ideal and splendidly systematic history of
sounds, supposed to underlie the observed heterogeneity of actual lan-
guages. The status of sound laws of the format exemplified in (25) and
(26) was subsequently corroborated even further by the generative idea
that linguistic competences were best modelled as logical and serial pro-
duction systems and by the surprising discovery that the production rules
which phonology models seemed to require were often formally identical
to Neogrammarian sound laws. Thus, sound change came to be thought
of in terms of alterations to phonological rule systems, and consequently
the idea that a decent sound change was regular came to be more or less
Therefore, the dissatisfaction with English changes of vowel quantity
and their handbook descriptions has always been high. In their case, real
history seemed to have messed with the assumed underlying processes to
a degree which distorted the postulated effects of these processes almost
beyond recognition. And since, for the reasons just outlined, this does
represent a problem, the issue has been subjected to numerous revi-
sions (such as recently Phillips (1992), Libermann (1992), Bermudez-
Otero (1998), Minkova/Stockwell (1992), or Dresher (1998), and see
Ritt (1994) for references to earlier treatments). Various schools of linguistic
thought have left their marks on its (re-)interpretations. Two of
them will be singled out for more detailed discussion.

Minkova’s bird’s-eye view

In 1982, Donka Minkova took issue with Open Syllable Lengthening,
particularly with the empirical inadequacy of the established law. She
suggested that it should be replaced by a more restricted version, which
had short vowels lengthened not in all open penultimate syllables, but
only in such that were succeeded by a word final, and optionally deletable
schwa. Her revised version read
(27) V → [+long] / __ §ə #,8
It did appear to work as a categorical correspondence rule, and correctly
related more than 90 per cent of its Old English inputs to their attested
Modern English successors. It also suggested a plausible causality, namely
that the lengthenings may have been compensatory for the loss of final
schwas. On the downside, the rule left a small, but substantial number of
vowel lengthenings completely undescribed, namely those in words which
did not fit its structural description (such as acorn < akern, or beaver < befor).
Minkova’s proposal is interesting for a number of reasons. First, the
idea that the lengthenings were compensatory suggested that, for what-
ever reasons, the affected words ‘wanted’ to maintain their duration.9
Second, Minkova’s resolute decision not to accept a sound law unless it
was more or less covering (applied more or less to all its inputs) reflects
how firmly the preoccupation with covering laws and categorical rules,
and the refusal to put up with messiness had become established in lin-
guistic thought. In Minkova’s case this is particularly interesting because
the empirical evidence from which she derived her revised version of Open
Syllable Lengthening was rather unorthodox and unaffected by theoret-
ical biases of such kind. Minkova observed that historical texts from the
period when the lengthenings were assumed to have occurred represented
vowel quantity too inconsistently to admit any conclusions. Therefore,
she decided to hold Old English word forms against their Modern English
descendants. This means that she correlated language states separated
by almost a millennium, which is ample time for history to occur. Since
her revised lengthening rule must necessarily be interpreted against the
context of the data which it describes, it is clear that it cannot be either

8 Vowels were lengthened in open penultimate syllables, if the final syllable was schwa.
9 Speaking more technically, their overall metrical weight.
of the following two things. First, it cannot represent a coherent process
which altered a relatively uniform set of phonologies in a particular
speech community. Secondly, it cannot represent a mind-internal phonological
process either. Since it compares Old English and Modern English word
forms, her rule describes neither a Neogrammarian sound change nor a
rule in a computational model of mental phonology. So what, then, does
it describe?
Minkova does not commit herself on the issue, but from the evolution-
ary, replicator-based perspective which we have been developing, an obvi-
ous interpretation offers itself. The formula in (27) relates two historically
successive populations of vowel memes. They happen to be separated by
1,000 years but one may nevertheless be regarded as the descendant of the
other. As the earlier population evolved into its descendant, memes for
short vowels were eventually ousted completely by memes for long vowels
in specific environmental configurations, namely when they were associ-
ated to morph-memes that expressed as words whose stressed syllables
were open and whose last syllables were sometimes schwa and sometimes
zero. Read this way, Minkova’s version of Open Syllable Lengthening sim-
ply describes a historical correlation adequately. It neither describes the
processes by which, nor the reasons why it was brought about. This is a
perfectly sensible descriptive strategy and as theory-neutral as humanly possible.
Note, however, that the descriptive reading of Minkova’s lengthening
rule – which really is the only type that makes sense, given her choice of
data – has further implications. When one charts correlations between
successive language states (whether one thinks of them as populations of
language memes or not), there is no reason at all why one should not chart
statistical correlations as well. Therefore, even the handbook account of
Open Syllable Lengthening would be a perfectly adequate description,
as long as one adds, explicitly, that it applies to only half of its inputs. It
would then simply express that there was a 50 per cent chance for the
modern English descendant of an Old English word with a short stressed
vowel in an open penultimate syllable to have a long vowel instead of a
short one. This statement would be just as correct as the observation that
the Modern English descendants of Old English words with short vowels
in open penultimate syllables whose final syllable was schwa all had long vowels.
Comparing these two observations one could then conclude that the
environment {__ §ə #} seems to have selected more strongly against short
vowels than other types of {__ §#} environments, and search for possible
reasons for this difference. Obviously, this is the kind of description on
which an evolutionary explanation of the lengthenings would have to be based.

Yet, Donka Minkova’s approach was not evolutionary, and she did not
come to describe the lengthenings in terms of a statistical correspondence.
Instead, she suggested that her version of Open Syllable Lengthening
should be considered as fully replacing the established Neogrammarian
law. After all, she had managed to find a correlation that could be expressed
in terms of a categorical rule, that did look like a covering sound law, and
that did lend itself to be interpreted as a mental rule which speakers
may at some point have added to their internal phonologies. Thus –
and apart from having to leave lengthenings in akern, befor and similar
cases unaccounted for – Minkova unwillingly perpetuated the established
bias that ‘proper’ sound changes were processes which affected a set of
inputs categorically, as well as the more theory-specific notion that these
processes could be thought of as mental rules whose inputs were the
pre-change sounds and whose outputs were the post-change ones.

Generalised Quantity Adjustment: a rule in search of an interpretation

To Minkova’s credit, my own treatment of Early Middle English
changes of vowel quantity (Ritt 1994), which did describe them in terms
of statistical rather than categorical correspondence rules, likewise provided
no completely satisfactory explanation as to why this move was at all
justified. Like Minkova’s, it was primarily empirical. I basically took her
method to its logical conclusion without being fully aware of the
theoretical implications of this move.10 I drew up a list of Old English inputs
to all traditionally recognised quantity changes, compared them to their
Modern English descendants, and looked for patterns in the distribution
of lengthenings and shortenings. Yet, in spite (or possibly because) of
the relative naivety of my approach, a description of the changes came up
which displayed patterns which, while being inexpressible in terms of cat-
egorical rules, were at the same time clearly non-random. They literally
begged for interpretation.
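
The tabulating procedure just described can be sketched in a few lines. The records below are invented toy data, and the function name and grouping labels are likewise assumptions made for illustration; they are not the corpus of Ritt (1994).

```python
from collections import defaultdict

# Hypothetical toy records: (vowel height of the Old English input,
# whether its Modern English descendant has a long vowel).
records = [
    ("low", True), ("low", True), ("low", False),
    ("mid", True), ("mid", False), ("mid", False),
    ("high", False), ("high", False), ("high", True), ("high", False),
]


def long_share_by_group(rows):
    """Return, for each group, the proportion of items with a long reflex."""
    counts = defaultdict(lambda: [0, 0])  # group -> [long reflexes, all items]
    for group, is_long in rows:
        counts[group][0] += int(is_long)
        counts[group][1] += 1
    return {group: long_n / total for group, (long_n, total) in counts.items()}


shares = long_share_by_group(records)
for group in ("low", "mid", "high"):
    print(group, round(shares[group], 2))
```

A table of this kind states a statistical correspondence and nothing more. It does not say by which processes the proportions came about.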
The emerging patterns suggested that there were a relatively small num-
ber of factors with which the distribution of long and short vowels in the

10 In fact, my decision to do so was motivated to some degree by the fact that the personal
computer which I had acquired to help me with the mechanical aspects of writing came
complete with a spreadsheet and a charting programme. Since I wanted to discover what
the programmes, which I had not even acquired intentionally, could do, I fed them my
linguistic data, and found, somewhat to my surprise that the patterns which this method
revealed were rather striking.
Modern English descendants of Old English inputs to the established
quantity change rules correlated in surprisingly straightforward manners.
Some of them seemed to have favoured lengthening, and others shortening.
Vowel length was favoured (a) in low vowels, (b) in back vowels, (c) in
light syllables, (d) in light feet, (e) in stressed syllables, and (f) before weak
(or highly sonorous) consonants. Conversely, shortness was favoured in
(a) high vowels, (b) front vowels, (c) heavy syllables, (d) heavy feet, (e)
weakly stressed syllables, and (f) before strong consonants (consonants
low in sonority). This suggested that for each Old English vowel, the
chance of its Modern English descendant being either short or long,
could be expressed in terms of a single, albeit necessarily probabilistic
law. I suggested the following rule, which I called Quantity Adjustment
(Ritt 1994: 95).

(28)
$$
p([\mu\mu] \to [\mu]) \approx \frac{1}{p([\mu] \to [\mu\mu])} \approx k \, \frac{x(w_\sigma) + y(w_{w_1 - w_n}) + z(h)}{t(s_\sigma) + u(\mathit{son}_c) + v(b)}
$$

The probability of vowel shortening was proportional to

a. its height
b. syllable weight
c. the overall weight of the weak syllables in the foot
and inversely proportional to
a. the (degree of) stress on it
b. its backness
c. coda sonority
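
How these proportionalities combine can be shown with a small sketch. It makes two assumptions that go beyond the text: the coefficients k, x, y, z, t, u and v are treated as simple linear weights, and all factors are scored on an arbitrary scale from 1 to 3. The function and argument names are hypothetical.

```python
def shortening_odds(height, syllable_weight, weak_weight,
                    stress, coda_sonority, backness,
                    k=1.0, x=1.0, y=1.0, z=1.0, t=1.0, u=1.0, v=1.0):
    """Relative propensity of a vowel to be short rather than long.

    Factors that favour shortening go into the numerator, factors that
    favour lengthening into the denominator. Weights default to 1.0 and
    are placeholders, not fitted values.
    """
    numerator = x * syllable_weight + y * weak_weight + z * height
    denominator = t * stress + u * coda_sonority + v * backness
    return k * numerator / denominator


# A high vowel in a heavy, weakly stressed syllable before a strong consonant.
odds_high_heavy = shortening_odds(height=3, syllable_weight=3, weak_weight=2,
                                  stress=1, coda_sonority=1, backness=1)
# A low back vowel in a light, fully stressed syllable before a sonorous consonant.
odds_low_light = shortening_odds(height=1, syllable_weight=1, weak_weight=1,
                                 stress=3, coda_sonority=3, backness=2)
print(odds_high_heavy, odds_low_light)
```

Values above 1 indicate a configuration in which short reflexes are expected to be more frequent, values below 1 one in which long reflexes are. The numbers are relative propensities, not probabilities in the strict sense.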

This rule describes how Old English vowels in alleged inputs to quan-
tity changes relate to their Modern English descendants – as far as their
quantity was concerned. It is empirically adequate (but see below). For
example, it appears simply correct to say that ceteris paribus11 one finds
more lengthened vowels among low vowels than among mid vowels, and
more among mid vowels than among high ones, etc. As the following
chart, based on inputs to Open Syllable Lengthening, illustrates (Ritt
1994: 39), the correlation is so obvious that it would be downright stub-
born to deny that the height of a vowel made a difference for its chance
to be lengthened.

11 That is to say, if one factored certain competing factors (for which see Ritt 1994)
out. The principles on which to do this are inherently problematic. On the issue,
see, for example, Lass (1980), Bermudez-Otero/McCully (1997), or Prince/Smolensky
[Figure: percentage of long reflexes among items with stable last syllables, by vowel height (high, mid, low)]

Although it adequately describes the correspondences between OE inputs
to alleged Neogrammarian lengthening and shortening laws and their
ModE descendants, Ritt (1994) does face a number of problems.
The first is theoretical. As indicated, I had no clear idea at the time
of the processes by which the described relation between Old English
vowels and their Modern English descendants might have been brought
about. Thus, I proposed somewhat rashly that Quantity Adjustment (i.e.
the rule in (28)) represented the description of an ‘actual sound change’.
I even went ‘as far as to say that – for the purpose of diachrony – all length-
enings and shortenings may indeed be regarded as one single change’
(Ritt 1994: 96, my present self’s italics (2000)). Thereby, I committed a
similar fallacy to the one Neogrammarian linguists had committed when
they concluded that regular and categorical correspondences must be
brought about by regular and categorical processes. Only in my version
of the fallacy, I seem to have thought that statistical correlations must
be brought about by statistical processes. Of course, the idea that
there should necessarily be a similarity between historical processes and
descriptions of their effects is unfounded.
I must add, in my own defence, that I had provided a rather explicit and
cautious definition of what I considered a sound change to be, and made it
clear that I did not really think it was either a single coherent process or the
straightforward effect of a rule change within a single speaker’s mental
phonology. In an attempt to deconstruct established interpretations of
Open Syllable Lengthening I argued that

the concept of sound change – or the concepts behind such expressions as ‘(were)
lengthened’ – refers to the following phenomenon: at one time one group of peo-
ple pronounce certain words of their language in one particular way, while other
people at a different time use similar words to convey similar meanings but pro-
nounce them in a different way. A change can be said to have occurred whenever
in a language a certain role is played by one articulatory target at one time, and
by a different target at another time. [. . . T]his view can be broken up to yield the
following more specific interpretations of ‘sound change’. In the first, it stands for
the mere fact that the latter target can be regarded as the functional equivalent and
thus the temporal successor of the former. In the second interpretation, which is
much stronger, ‘sound change’ stands for all the factors that caused the functional
correspondence between the two elements. [. . .] In both readings, [the phrase
‘sound change’ . . .] is a cover term for a large set of interrelated events. (Ritt
1994: 8)

Thus, I used the term ‘sound change’ to stand, basically, for whatever
processes may have brought about a diachronic correspondence between different
sounds. In retrospect, however, this may have been a hedging strategy
more than anything else. Its inherent vagueness consequently tempted,
or practically invited, readers to supply their own personal interpretations
of processes behind diachronic sound correspondences. And with some
interpretations the format of the rule I proposed was predictably incom-
patible. For instance, Bermudez-Otero objected to its statistical nature
on grounds of the principle (quoted from Prince and Smolensky 1993:
197–8) that
Linguistic theory cannot be built on ‘laws’ of this sort, because they are too
slippery, because they contend obscurely with partly contradictory counter-‘laws’,
because the consequences of violating them cannot be assessed with any degree
of precision [. . .]

As his own treatment of the changes shows, his objection was based
on a misunderstanding. In good generative tradition albeit with a new
optimality-theoretical formalism, Bermudez-Otero attempted to account
for them in terms of an assumed mental production system which takes
pre-change forms as its input and puts out post-change forms. Thinking
in such terms, he seems to have interpreted my formula as a description
of mental processes as well. It was not intended to model such processes
at all, however, nor was it meant to formulate any type of ‘law’ whose
violation could have consequences in any meaningful sense. Instead, it
really just described quantitative relations between successive populations
of (as I would say now, but didn’t say then: ‘memes for’) word forms.12
The second major problem about my account was empirical. As indi-
cated, formula (28) was derived from a corpus of OE words that qualified
as inputs to one of the four Neogrammarian laws for vowel lengthening
and shortening. It described the correlation to their Modern English

12 Of course, when one fails to make it clear enough what one thinks that language changes
really are, one has only oneself to blame for provoking misinterpretations of that kind.
descendants correctly. However, the way it was formulated (and the way
in which I understood it), clearly implied that it should apply quite gen-
erally to all items in the English lexicon, not just to those which had been
singled out by the Neogrammarians. Unfortunately, it seemed not to.
I had completely overlooked the fact that my formula predicted words
like OE mon and OE god to show up as ModE /meɪn/ and /ɡəʊd/ rather
than as /mæn/ and /ɡɒd/.13 Although it subsequently turned out that the
number of words of the mon type which do have long-vowel descendants
is actually greater than had hitherto been assumed, the figures were still
not compatible with the predictions inherent to (28).
Of course, that words of the mon-god type seemed to have been mira-
culously immune to the lengthening force which the parameters in (28)
‘ought’ to have exerted on their vowels is reason for worry. In my case, it
made me worry so much that I eventually came to reconsider my whole
approach to language and language change. In a way, this very book
has been triggered by the fact that rule (28) made wrong predictions. It
motivated me to re-think my first and rather naive interpretation of the
formula, and made me think again about the actual mechanics by which
the parameters in it could have affected the quantities of vowels. It was
only when I had trained myself to think of languages as systems of repli-
cators undergoing Darwinian evolution, however, that I realised what my
Quantity Adjustment rule could be expected to predict, and what it could
not. As we shall see, there is a plausible reason why mon, God and so many
other [CVC] monosyllables appeared to be miraculously immune to the
pressures which the factors in (28) ought to have exerted on them. The
pressures were not simply ‘lengthening pressures’. They did not exert ‘a
pressure on vowels to lengthen’. As the Darwinian perspective we have
developed suggests, they must have been selective pressures on the repli-
cation of morph-memes. As we shall see, they were ‘rhythmic’ in origin,
and selected for morphs that replicated well in predominantly trochaic
expressions. Vowel lengthening was just one of the possible ways in which
morph-meme lineages could adapt to them. [CVC] monosyllables had
an additional option which, if they took it, allowed them to maintain their
short vowels.

8.1.3 Outlines of an evolutionary account

Under the perspective we have been developing, the factors in formula
(28) represent environmental conditions which exerted selectional pres-
sures on the replication of memes for short and long vowels. These memes

13 My thanks go to Ricardo Bermudez-Otero who was courageous enough to point this out
to me at the International Conference of Historical English Linguistics in 1996.
competed for ‘slots’ in morph-memes of various structural types. Some
environmental conditions seem to have favoured the replication of long
vowels, others that of short ones. So much is obvious from the correla-
tions which (28) describes. However, the correlations themselves are not
explanatory. In order to explain why the factors involved in (28) amounted
to selectional pressures on the replication of long and short vowel memes,
the mechanics involved in the interaction need to be made explicit.
In order to keep the argumentation coherent and simple, I shall focus
on just two of the ‘environmental’ parameters which affected the evolutionary
stability of long and short vowel phone-memes, namely $t(s_\sigma)$ (the
degree of stress on the syllable), and $y(w_{w_1 - w_n})$ (the overall weight of the
weak syllables in the morph). Both relate to the structure and the weight
of the feet in which memes for English words came to be expressed. As
I will show, their impact derives from a memetic configuration for foot
structure and timing. Middle English morph-memes were co-expressed
with that configuration and came under heavy pressure to adapt to it.
The argument goes, roughly, like this.
Morph-memes coded for the phonemic and syllabic structures of
English words. Like all memes they could only replicate if they were
expressed. However, they could only express together with memes cod-
ing for their foot structures. In English, this worked in such a way that
nodes for foot-heads usually expressed together with the first syllables of
word-form memes. English had fixed initial stress.
In English, and possibly universally, foot memes seem to replicate best
when they express as regularly alternating sequences of more prominent
and less prominent sound sequences. Therefore, foot memes express
most commonly as trochees or as types which are not too dissimilar to
trochees in terms of structure and duration. This means that memes
‘for’ the phonemic and syllabic Gestalts of Middle English words (i.e.
morph-memes) had to choose among a fairly limited set of foot types
with which to team up in expression. They consequently came under
strong selectional pressure to adapt to those foot types, because memes
that co-express well with a successful meme will themselves be more suc-
cessful than variants which do not. Lexical morph-memes could pursue
two strategies to adapt to memes for trochees. Either they could them-
selves evolve structures that co-expressed well with trochee-like feet, or
they could form temporary alliances with other word memes for that
purpose. When they pursued the former strategy, the selectional pres-
sure from trochee memes would be passed on to the phone-memes that
made up the morph-memes, causing long variants to oust short ones
and vice versa under specific circumstances. Basically, it seems, long
vowels replicated better in memes for such word forms that expressed in
segmentally short (or ‘light’) feet, and the other way round. When lexical
morph-memes could pursue the second strategy, on the other hand, the
pressure on their own components to adapt was considerably reduced,
and changes of vowel quantity consequently less frequent.
The next sections will show in detail how the explanation works. As
we shall see, it has none of the problems inherent in traditional accounts
(and most of their revised versions). Furthermore, it allows one to inte-
grate the lengthenings in monosyllabic words such as whale or stave,
which have so far been regarded as uncomfortable irregularities or simply ignored.

8.2 Utterance rhythm and Middle English vowel quantity: a case of intra-linguistic meme–meme co-adaptation

8.2.1 The case of weakly stressed items such as ME have

Introduction: Open Syllable Lengthening and have

For a start, reconsider the correspondence between original short and
later long vowels in open syllables. The prototypical description was given
by the Viennese philologist Karl Luick around the beginning of the twen-
tieth century. He claimed, essentially, that Open Syllable Lengthening
(OSL) lengthened short vowels, if they were non-high, if they carried pri-
mary stress, and if they occurred in open penultimate syllables. The exam-
ples he gave were LOE make [1st sg.] > ME māke ‘make’, LOE befor >
ME bēvor ‘beaver’, LOE hope [1st sg.] > ME hōpe ‘hope’ (cf. Luick
1914/21: 397–409). Thus, already Karl Luick included the three fac-
tors that have since figured prominently in most accounts of the change.
Although they may indeed have selected for long vowels, however, they
will not be further discussed at this point. Instead, we shall approach the
change from a slightly unusual direction.
Karl Luick was obviously aware that his ‘correspondence rule’ had
exceptions. In a paragraph supposed to round off his general description
of the lengthening he observed that
[d]iese Dehnung ist an Starkton gebunden: bei gemindertem Ton wurde die
Kürze bewahrt. Daher have neben hāve ‘haben’ (vgl. ne. have, behave) [. . .].
[this lengthening depends on full utterance stress.14 When utterance stress was
reduced, shortness was preserved; therefore have beside hāve (cf. ModE have,
behave).] (ibid.: 399)

14 The translation of Starkton, which would gloss ‘strongstress’, as ‘full utterance stress’ is
of course interpretative. It is clear from the larger context, however, that Luick could
not have meant lexical stress, as his description of OSL is part of a chapter which deals
exclusively with the history of lexically stressed vowels (‘Sonanten in Tonsilben’) anyway.
While – or maybe even because – this observation and the facts behind it
are relatively easy to interpret, some of their rather interesting implications
have tended to go unnoticed.

The classical interpretation

Look, first, at the established interpretation: mainly because it was an
auxiliary verb, but possibly also because of its relatively vague lexical
content as a full verb, have may relatively often have occurred in utterances
where it was ‘unstressed’ and thus not the head of a foot. The following
examples serve to illustrate the point.
(30) a. (Ne) | shál ich | néuer haue | réste ne | ró
(Til) | ích haue | tóld hou | þóu shalt | dó.
(/2    14:Heading)
b. Whát haue | ích so | méche mis- | gílt,
Þát þow | séxt & | þólen | wílt,
(2    83:Heading)

In utterances where a word occupies a relatively unstressed position,
its phonetic realisation normally reflects this in predictable ways, because
it receives less articulatory energy than it would if it were metrically more
prominent. In the case of unstressed haves, this may have led to the
omission of the final schwa, for example, or to the dropping of the initial
[h]. Also, and this is what is most relevant here, the actual duration of
the vowel /a/ will have tended to be relatively short.
Clearly, this tendency must have conflicted with the supposed lengthening
effects of /a/’s structural position in an open penultimate syllable. As
is normally believed, that environment brought OSL about by causing the
phonetic realisations of affected vowels to become longer. Thus, the phonetic
realisations of /a/ in have would have been subject to diametrically
opposed pressures: the phonological structure of the word would have
favoured long [a:]s, while its frequent occurrence in unstressed positions
would have favoured short [a]s. While the structurally induced preference
for phonetic lengthening may thus still have occasionally resulted
in long pronunciations of the /a/, particularly in utterances where the
have was more fully stressed, the preference for shortening in unstressed
words seems to have prevented it from having its usual diachronic effect,
namely the eventual replacement of short /a/ by long /a:/ in the lexical
representations of have. Now, why was this the case?
Assume that the /a/ in ME haven was realised relatively more often as
a short [a] than the /a/ in fully lexical words such as maken ‘make’, grasen
‘graze’ or aken ‘ache’, for example, because in the latter, the structurally
induced pressures which favoured lengthening faced no opposition and
increased the [a:]/[a] ratio in their phonetic realisations. When at some
later stage, new generations of speakers came to acquire Middle English
from such realisations, they were facing the problem – as such completely
normal in all language acquisition – that they heard the words pronounced
with vowels of varying durations. They would have to decide which forms
they ought to store in long-term memory as their prototypical represen-
tations and which they ought to treat as contextually ‘distorted’ variants.
Classical accounts of OSL suggest two possible ways in which the problem
came to be resolved. Both rest on the plausible assumption that learn-
ers were able to generalise from the discourse they were exposed to, and
derived the hypothesis that non-high vowels in open penultimate syllables
are normally to be pronounced long. This may then have induced them
either to acquire a phonological process automatically ‘lengthening’ such
vowels, or simply to acquire only long vowels in lexical representations of
the relevant structure.
The third theoretical possibility, namely that speakers might have
decided for each word individually without incorporating any generalisa-
tion into their competences, is not really compatible with the traditional
view that OSL was basically a regular process. Given the ‘exceptional’ case
of have and its failure to undergo OSL, of course, the processes or phono-
tactic constraints that speakers of Middle English acquired with regard to
vowel length in open disyllables cannot have been completely watertight,
categorical rules. The fact that have was relatively frequently pronounced
with a short vowel must have induced them to acquire and store lexical
representations of the word that had short /a/s – in spite of possible
intuitions they might have had about the distribution of short vs. long
vowels otherwise. So have became an ‘exception’ to the diachronic correspondence
rule (or ‘sound law’, as it has traditionally been called) V →
[+long] / __ §. Its frequent occurrence in unstressed positions is thus a
plausible explanation of its exceptional development.

A replicator-based account

Consider this from an ‘evolutionary’ perspective. Think of short /a/ and
long /a:/ as linguistic replicators competing for association to morphs.
Before OSL occurred the phone-meme /a/ was tightly associated with
memes coding for the Gestalts of morphological forms like {have},
{make}, {grase}, and so on. It would be activated and expressed through
activation of the latter, and it depended on expression for its replication.
For certain reasons, the memes coding for {have}, {make}, {grase}, and
so on affected the expressions of /a/ (i.e. the articulatory gestures and
their acoustic products) in such a way that their duration increased, just
as described above. Thereby they became similar to expressions of /a:/
as, for instance, in cam ‘came’. In minds whose organisation was affected
by exposure to lengthened expressions of /a/, they caused the emergence
of memes for {have}, {make} and {graze} which did not involve links
to the meme for /a/ anymore. Instead they linked to the meme for /a:/.
Since the expressions of these mutated morph-memes would be similar to
lengthened expressions of their progenitors with links to /a/, they would
incur enough positive feedback to stabilise the memes. Once the mutated
morph-memes had first emerged in the population, a competition among
/a/ and /a:/ for association to the morph-memes for the words in question
started. Of course neither of the two vowel memes would have been
aware of this competition. The term merely indicates that some of the
have-make-graze type morph-memes in the population now had links to
/a/s and thereby helped to replicate them, while others involved links
to /a:/s and helped to replicate those. Thus, the distribution of link-types
within the population of replicator teams ‘for’ Middle English came to
change over time. At some stages, of course, the competition may have
been undecided, and the distribution of morph-meme variants may have
remained stable, with /a/s prevailing in some brains and /a:/s in others,
or /a/ and /a:/ working out ways of sharing the association according to
certain (possibly socio-stylistic) regularities.15 Eventually, however, /a:/s
seem to have ousted /a/s from slots associated to the morph-memes for
words like make, graze – and indeed all others in which Open Syllable
Lengthening was implemented. In the case of have, on the other hand,
/a/s were able to maintain their position against /a:/s. How did they manage
to do so? If we accept Karl Luick’s view, which is essentially sound,
this was because of have’s frequent occurrence ‘in unstressed positions’.
In ‘unstressed positions’, /a:/ memes would rarely express as [a:]s, because
‘such positions’ favoured short vowel sounds.
What exactly is ‘an unstressed position’, however? And what does it
mean for a morph-meme, if its expressions will frequently occur in one?
To what degree is the average relative prominence of its expressions an
inherent property of the morph-meme, and to what extent is it environ-
mentally conditioned?16 As I shall argue in the following, at least some
of all the parameters conditioning the prominence of a morph-meme’s
expressions are external to the morph-meme itself. Some of them may
express a configuration which, although it will co-operate with morph-
memes in expression, replicates independently of them, namely a meme-
plex for alternating and isochronic metrical ‘feet’.

15 The latter would themselves have been neuronally encoded. Traditional socio-historical
linguistics of the Labovian type might describe them in terms of variable rules.
16 That is, by factors that are not inherent to the morph-meme itself.
Figure 8.1 Another look at a meme for feet. (Diagram not reproduced; node labels: S, w, [increase effort], [decrease effort], [more prominence], [less prominence], [timing unit].)

Reconsidering the meme(-plex) for feet: the [Sw]-component

‘Unstressed positions’ express an arguably coherent memetic configura-
tion, namely a meme, or memeplex ‘for’ rhythmic structures, namely feet.
In section above, we suggested that such a mental configuration
might be structured, roughly, as in figure 8.1.
At the core of this configuration, nodes for strong positions (labelled,
for convenience, ‘S’) and nodes for weak positions (‘w’) are linked in
such a way that the activation of one makes the activation of the other
more likely. Possibly, the link from S to w is inherently stronger than that
from w to S. This configuration expresses, favourably, as ‘trochaic’ utter-
ance stretches. In these, prominent and less prominent sound sequences
alternate. That whatever codes for trochees must be evolutionarily highly
stable has frequently been pointed out in phonological studies. For exam-
ple, Dogil (1980) assumes a universal ‘Trochaic Projection Constraint’
(92), for which he adduces much internal and external evidence, con-
veniently summed up in Dziubalska (1995: 58).17 There is, incidentally,
also a nice evolutionary thought experiment which suggests that memes
for distinguishing between exactly two degrees of rhythmic prominence
should always copy more easily than memes for distinguishing either more
prominence degrees or none.
Imagine a population of linguistic replicator teams that includes no
meme for distinguishing different degrees of rhythmic prominence at all.
This can clearly not mean that all syllables will be expressed with equal
articulatory effort and equal acoustic prominence in actual utterances.
17 Dziubalska also points out that the preference for a binary foot structure may be ‘neu-
rologically based (cf. the binary choice between presence and absence of neural firing)’
(1995: 58), which is of course beautifully compatible with the position taken here.
It merely means that such differences will play no role for the linguistic
meme teams through whose expressions they come about. These teams
include no configurations that specifically react to different degrees of syl-
labic prominence, or code for any of them. Imagine, then, that a mutant
meme emerges in that population which does react, specifically, to syl-
lables whose expressions are more prominent than those in its imme-
diate neighbourhood, and which expresses through, say, increasing the
prominence of every other syllable that gets uttered. The question now
is how well this meme will fare in interaction with the rest of the pop-
ulation. The answer, it seems to me, is rather obvious: other linguistic
meme teams will react positively to its expression, because they basically
‘accept’ prominence peaks wherever they occur. The mutant meme for
alternating prominence patterns, on the other hand, will itself react more
positively to utterances that meet its expectations, and cause the memetic
configurations that gave rise to them to receive positive environmental
feedback. Thus, it will cause copies of itself to become more frequent
in the population. Therefore, populations of linguistic replicator teams
which include no meme for different degrees of syllabic prominence are
inherently unstable and likely to be invaded by teams which do include
such a meme.
Now, given that languages are highly likely to include memes for dif-
ferent degrees of prominence, the next question is between how many
degrees these memetic configurations are likely to distinguish and for
what kind of patterns they are likely to code. Again, it is possible to
deduce that memes for distinguishing just two degrees of prominence
and for alternating lifts and dips (henceforth [Sw]-memes) should repli-
cate better than memes for more subtle distinctions and more complex
patterns. The reason is that linguistic memes, although they may them-
selves be encoded digitally (that is, in neuronal cell assemblies which may
either fire or not), depend for their replication on being expressed in articulatory
behaviour and sound. Neither of the two involves discrete units,
however, and both can therefore convey information only through analogue
signals, which are highly susceptible to distortion. If there are memes for
many different degrees of relative prominence, these memes will there-
fore come to ‘share’ the range of actual acoustic signals that express and
replicate them, as indicated in the following graph.

Figure 8.2 How analogue transmission selects for binary oppositions. (Diagram not reproduced; it shows memes A to I along a prominence scale running from − to +, with rows for their ideal and real expressions.)

Figure 8.3 The emergence of binary oppositions. (Diagram not reproduced; panels: Stage 1, Stage 2, Stage 3.)

Now, if two assemblies ‘share a range of prominence levels’, this will
mean that any utterance stretch whose prominence falls within that range,
may trigger the activation of either of the two assemblies. Since both the
stability of an assembly and its propensity to fire are likely to depend
on the frequency with which it gets activated by a particular signal, a
situation in which a set of signals is ‘shared’ as a vehicle for expression
and replication by two assemblies is unlikely to remain stable. Any of two
assemblies sharing a range of prominence values that happens for some
reason to be activated more often than the other by a signal falling within
the shared range, will thereby acquire a greater stability and a greater
propensity to fire again. The next time a signal in the shared prominence
range occurs, that assembly will be inherently more likely to be activated
by it than the one with which it originally shared the range. This will
make it stronger still, and increase its propensity to fire even more. Its
neighbour’s claim on the shared territory, on the other hand, is going to
become proportionally weaker. The feedback between activation and the
chance of being activated again will see to it that the strong get stronger,
and the weak weaker, as in figure 8.3. Eventually, one of two assemblies
that share a range of prominence values for expression and replication
is going to be activated so rarely that it will disintegrate and disappear.
Since this process will repeat itself as long as two assemblies depend on
‘sharing’ a range of expressions, it can be predicted that memetic configurations
for fewer prominence distinctions will generally be more stable
than configurations for more. Thus, any population of linguistic replica-
tor teams that is inhabited by memes for many prominence distinctions is
likely to be invaded by memes for fewer. Now, while this argument would
predict that memes for prominence distinctions should generally be
unable to replicate faithfully, we have seen above that populations with no
such memes are not stable either, although for different reasons. Thus,
populations of replicator teams are predicted to be most stable, if they
contain memes for the minimal number of different degrees of promi-
nence possible, that is, two. Which does indeed seem to be the case.
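
The feedback loop between activation and the chance of being activated again can be made concrete with a toy model. The Python sketch below is only an illustration under stated assumptions and is not part of the original argument: two assemblies share a range of signals, each starts with a hypothetical 'strength', and an assembly's share of activations is assumed to grow faster than linearly with its strength (the exponent `gain` is an invented parameter).

```python
def compete(strength_a, strength_b, steps=50, gain=2.0):
    """Iterate the 'strong get stronger' feedback for two assemblies.

    All quantities are hypothetical: strengths are relative shares of
    the activations triggered by signals in the shared prominence range.
    """
    for _ in range(steps):
        # Assumption: propensity to fire rises more than linearly with strength.
        weight_a = strength_a ** gain
        weight_b = strength_b ** gain
        total = weight_a + weight_b
        # The share of activations won in this round becomes the new strength.
        strength_a, strength_b = weight_a / total, weight_b / total
    return strength_a, strength_b


if __name__ == "__main__":
    print(compete(0.55, 0.45))  # a small initial advantage is amplified
    print(compete(0.50, 0.50))  # a perfect tie is never broken in this model
```

With `gain` set to 1 the shares simply stay where they started, so the winner-take-all outcome in this sketch rests entirely on the assumed superlinear feedback.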
Of course, there are additional reasons why memes for a binary dis-
tinction between different prominence levels should be more stable than
others. Thus, the two assemblies in a [Sw]-meme do not have to code for,
and express as, prominence levels within specified ranges at all. Instead,
they can code simply for prominence differences between neighbouring
utterance stretches. Thus, an S-assembly in a binary system may be
caused to fire by the expression of any syllable which is more prominent
than that of a neighbouring one, and in order to activate a w-assembly,
the expression of a syllable only needs to be less prominent than a neighbouring
one. Since the amount of the actual rise in prominence does not
matter, the analogue character of acoustic signals and its susceptibility
to distortion ceases to be problematic. An acoustic signal supposed to
express an S or a w-assembly does not have to hit a specified target area
at all. Instead it has to miss a target on one of its two sides. At the same
time, this fact makes it easy to understand why the actual utterance pat-
terns coded for by an [Sw]-meme will tend to be alternations of strong
and weak positions, rather than say long sequences of positions that get
continuously stronger or weaker: the weaker a particular signal is, the
easier it will be to miss it on the strong side, and vice versa. Therefore,
memetic configurations in which the S node and the w node make each
other’s activation more likely will have a higher chance of being accurately
expressed and replicated, than potential rivals in which such a relation is
not encoded.
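
The contrast between coding for prominence differences and coding for absolute prominence ranges can be illustrated with a small Python sketch. It is a simplification under stated assumptions: prominence is reduced to an invented number between 0 and 1, the 'distortion' is a hypothetical compression of the signal, and the nine fixed bands of `label_absolute` are merely one way of modelling a system with many prominence degrees.

```python
def label_relative(prominences):
    """Label each syllable 'S' or 'w' by comparison with a neighbour.

    A syllable is compared with the one before it; the first syllable
    is compared with the one after it. Needs at least two syllables.
    """
    labels = []
    for i, value in enumerate(prominences):
        neighbour = prominences[i - 1] if i > 0 else prominences[1]
        labels.append("S" if value > neighbour else "w")
    return labels


def label_absolute(prominences, bands=9):
    """Assign each syllable to one of several fixed prominence bands."""
    return [min(int(value * bands), bands - 1) for value in prominences]


# Hypothetical prominence values in [0, 1] and a hypothetical distortion.
clean = [0.9, 0.3, 0.7, 0.2]
distorted = [0.5 * value + 0.1 for value in clean]

if __name__ == "__main__":
    print(label_relative(clean), label_relative(distorted))
    print(label_absolute(clean), label_absolute(distorted))
```

In this sketch the relative labels survive the distortion unchanged, whereas the band assignments do not, which is the asymmetry the argument relies on.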
Thus, the [Sw]-meme, which represents the core of the memetic con-
figuration for the foot types which have likely characterised English utter-
ances for a long time, is of inherently great evolutionary stability. As
already indicated, phonological literature abounds with arguments which,
albeit different in type, are often to the same effect (see Dziubalska 1995:
58–60). The gist is that the exceptional ‘fitness’ of memes for binary foot
structures can more or less be taken for granted. Let us return, then, to the
case of ME have, and its failure to undergo Open Syllable Lengthening.

Foot–morph relations

First note that while actual linguistic utterances always express memetic
configurations for rhythmic structure, they never express only, or
even primarily, those. Instead utterances always also express memes for
morphs, syntactic configurations, conceptual structures and so on. Every
utterance expresses a temporary alliance of linguistic replicators. Also,
replicator alliances which work well in expression may become stable
enough to qualify themselves as replicating units. The best examples are
memes for morphs, which represent associations of memes for speech
sounds and syllabic relations that are so stable and copy so faithfully as
integral units that this lends them replicator status. This implies that the
distinction between actual replicating units and looser associations is not
necessarily categorical, but may be fuzzy.
Recall next the relation between memetic configurations ‘for’ rhythmic
structure and memes for morphs. We observed that while morph-memes
do co-operate with the foot-memeplex in regular ways (see also below),
their associations seem to be less tight than, say, the relations among
sound-memes which code for a morph. While alliances among the latter
are so stable that they qualify as replicating units, this is not true of
the ways in which morph-memes associate with memes for the rhythmic
properties of English utterances.
Of course, English has for a long time been characterised by so-called
‘fixed word stress’. As in most other Germanic languages, it was typi-
cally the first, or left-most syllable of English lexical morph-memes which
came to be expressed prominently. As Roger Lass (1994: 88) puts it, ‘the
first syllable of the word was generally stressed [. . . but] initial syllables
were generally not stressed if they were prefixes’. There are good argu-
ments18 why morph-memes should ‘profit’ from a mental configuration
that gets them expressed with salient initial syllables. Being word initial
(or rather: lexical-morph initial), phonetic salience would come to indi-
cate the beginning of morph-meme expressions, make them easier to be
recognised and thus facilitate their replication.19 But this does not neces-
sarily imply that each morph-meme should have its initial syllable firmly
linked to a node for prominence.20 Instead, the observed expression pat-
terns would also result, if assemblies for, say, major lexical categories such
as A[djective], V[erb], and N[oun],21 were associated with the assembly
for prominence. Then, each firing of an assembly for A, V or N would
activate the prominence assembly. The effect would be that in utterances

18 See, for example, Dziubalska (1995), or McCarthy/Prince (forthcoming).

19 The replication of the morph-memes of course. Their expressions do not replicate.
20 Recall that hardly any contemporary phonological framework regards stress patterns as a
property of lexical representations. Instead stress contours have been ‘assigned by rules’
at least since Chomsky/Halle (1968).
21 These would express as specific syntactic patterns and would obviously also be linked to
morph-memes in their respective categories.
the first syllable of the morph-meme which triggers A, V or N would be
expressed with increased articulatory effort. Likewise, each perception of
prominence would forward energy to the assemblies for A, V or N, thus
increasing their propensity to fire. The details of network patterns which
could implement this are not important here; what matters is that in order
to express with salient first syllables morph-memes do not require specific
and tight associations to constituents for utterance rhythm, in our case
the meme, or memeplex for English feet.
Whatever the mental set-up that expresses as ‘fixed word stress’ might
look like, of course, one thing is certain: it will not fully determine, but
merely constrain the ways in which the expressions of polysyllabic morphs
can be integrated into rhythmically structured sound sequences. It will
leave many other issues connected to their rhythmic roles to be resolved
in individual expressions of longer utterance stretches. For instance, fixed
word stress says little about the relative degrees of prominence with which
the unstressed syllables in polysyllabic items come to be expressed. It says
even less about the rhythmic roles that are played by the expressions of
monosyllabic morphs. Particularly in the case of the latter, their relative
prominence will depend strongly on how the foot-meme itself influences
its expression in longer utterance stretches. Thus, English morph-memes,
particularly monosyllables, do not themselves code (or only very indi-
rectly) for the rhythmic structures within which they get to be expressed.
As one might say in more established terminology, lexical entries do not
‘have’ rhythmic structure at all. Instead, the majority of the rhythmic
properties of (morph-memes expressed in) linguistic utterances are best
understood as expressing a memetic configuration for rhythmic struc-
ture which is independent of the morph-memes with which it expresses
together. For example, the links between nodes for strong and weak posi-
tions, and the fact that the activation of one makes the activation of the
other more likely, often result in alternating rhythmic patterns in actual
discourse. As cases such as the rhythmic reversals in phrases like

(31) ˈfifteen ˈyears vs I’m fifˈteen, or
     ˈNew York ˈCity vs New ˈYork

show, the preference for alternating rhythm (coded for by the foot
meme) may overrule associations between the constituents of polysyl-
labic morph-memes and constituents for prosodic strength. As this sug-
gests, the rhythmic roles which the expressions of a morph-meme play
will always be co-determined by factors that are, for all practical purposes,
independent of the morph-memes themselves. From the point-of-view of
a morph-meme, they will represent parts of its ‘environment’.

Conclusion: have remained short because it was better adapted to the [Sw]-meme for alternating rhythm
The factors which determine the rhythmic properties of the utterances
in which a morph-meme gets expressed may to a large extent be inde-
pendent of the morph-meme itself. However, this does not imply that
the morph-meme will not be selectively sensitive to them. Instead, if for
whatever reasons a morph-meme happens to be expressed more often in
rhythmic configurations of type A rather than B, the morph-meme lineage
(and/or population) will come to ‘know’ about this statistical fact. From
its point-of-view, it will represent a constant, and therefore predictable,
property of the environment within which it has to express and replicate.
Therefore, the meme will be able to incorporate information about it by
adapting its own structural properties accordingly.
This, then, is what seems to have happened in the mixed population
of morph-memes for ME have. Because of their grammatical status as
auxiliaries and their reduced conceptual content, they would often have
been expressed in the neighbourhood of lexical morph-memes and/or
morph-memes that expressed more conceptual content. Since the lat-
ter would have activated the strength assembly in the mental configu-
ration for rhythmic structure and would therefore be usually expressed
as foot-heads, the expressions of {hav} would typically come to express
prosodically weak rather than strong positions. Therefore, reduced articulatory
energy and less time were attributed to their expression. Both the
{/hav(ə)/} and the {/ha:v(ə)/} variant of the meme for have would more
often express as [hav(ə)] than as [ha:v(ə)]. Since the short speech sound
[a] expresses and replicates memes for short /a/ better than memes for
long /a:/, the role which expressions of the {have}-meme typically played
in the rhythmic structures of Middle English utterances gave an advantage
to the /hav/ variant and selected against the /ha:v/ variant. Thus, /a/-memes
remained stably associated to Middle English morph-memes for
have. They were better adapted than /a:/-memes to the typical role which
expressions of have played within the rhythmical organisation of English
utterances. Since that role was attributed to them by the [Sw]-meme, it
is perfectly plausible to say that the evolutionary stability of /hav(ə)/ and
the failure of /ha:v(ə)/ reflect that /hav(ə)/ was better adapted to the
[Sw]-meme. This represents a clear case of meme-to-meme adaptation.

8.2.2 Generalising the case of have: the adaptive value of ‘regular’ open
syllable lengthenings
If one accepts the explanation just given for the evolutionary stability of
short ME /havə/, one will have to use the very same argument to explain
actual lengthenings in open syllables as well. If short /a/s managed to win
the competition for a place in morph-memes for have against long /a:/s
because they were better adapted to the role that have was made to play
in the rhythm of Middle English utterances, the same must be true for
words in which short vowels were ousted by competing long vowels. Of
course, those words played different rhythmic roles. Contrary to have,
they were normal lexical morphs rather than auxiliaries. They were more
often expressed in stressed than in unstressed positions. The articulation
of their segments would be carried out with increased effort, their expres-
sions would come to be more prominent and, by implication, longer.
Since phonetically long vowel sounds replicate memes for long vowels
better than memes for short vowels, it is obvious that short vowel memes
could not have been as stable in behave, make, graze and so on as they were
in have. Just as the short /a/ in {/hav(ə)/} was better adapted to have’s
usual occurrence in metrical dips, so the long vowels in {/beha:v(ə)/},
{/ma:k(ə)/}, {/gra:z(ə)/}, and so on were better adapted to their typical
occurrence in metrical lifts. Therefore, it makes sense to say that they
managed to oust {/mak(ə)/}, {/graz(ə)/} and all the other short-vowel
morph-memes from the population of English replicator teams because
they were better able to express and replicate in the rhythmic roles which
the [Sw]-meme attributed to them. They were better adapted to it. Thus,
all lengthenings in open disyllables represent adaptations to the memetic
configuration for ME rhythmical organisation.
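
The selectional logic behind both outcomes (short have, long make) can be sketched as a simple replicator model in Python. This is an illustrative simplification, not a reconstruction of the historical process: the proportion of expressions in metrical lifts (`p_lift`) and the replication values `match` and `mismatch` are invented parameters, and closed-syllable or other competing pressures are ignored.

```python
def long_variant_share(p_lift, generations=30, start=0.5, match=1.0, mismatch=0.5):
    """Track the share of the long-vowel variant of a morph-meme.

    p_lift   -- hypothetical proportion of expressions in metrical lifts
    match    -- assumed replication success when vowel length suits the position
    mismatch -- assumed replication success when it does not
    """
    share = start
    for _ in range(generations):
        # Long vowels are assumed to suit lifts, short vowels to suit dips.
        fitness_long = p_lift * match + (1 - p_lift) * mismatch
        fitness_short = p_lift * mismatch + (1 - p_lift) * match
        mean_fitness = share * fitness_long + (1 - share) * fitness_short
        share = share * fitness_long / mean_fitness
    return share


if __name__ == "__main__":
    print(long_variant_share(0.8))  # a 'make'-like item, mostly in lifts
    print(long_variant_share(0.2))  # a 'have'-like item, mostly in dips
```

Under these assumptions the long variant takes over when the item is mostly expressed in lifts and dies out when it is mostly expressed in dips; when `p_lift` is exactly 0.5 neither variant gains ground.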
At this point it is crucial to remember the following. Saying that long
vowels will be selected for in syllables which often express as metrical lifts,
is not equivalent to saying that vowels in stressed syllables will become
long. Selectional pressures do not translate into categorical rules, and no
specific replicator lineage will ever be subject to just a single selectional
pressure. In the case of the lengthenings we are discussing here, the pres-
sure on vowel-memes to be long if they usually express in stressed syllables
was certainly often outweighed by others. Thus, in many English morph-
memes that were usually expressed stressed, short vowels managed to
defend their positions – or even to oust long variants. For instance, short-
vowel memes were highly favoured in morph-memes where they were
followed by consonant clusters such as /nt/ as in {/plant(ə)/}, /pt/ as in
{/kept(ə)/}, or /st/ as in {/dust(ə)/}. This reflects that the ‘closed syllable
environment’ selected for short rather than long vowels. Crucially, this
does not mean that these vowels were not under pressure to lengthen if
stressed. It just means that the impact of the closed syllable environment
was stronger.
In the next section, we shall pursue this line of reasoning further
and look at the role which ‘foot length’, another rhythm-related
parameter, seems to have played in Early Middle English changes of vowel
quantity.

8.3 Vowel quantity and foot length

8.3.1 Introduction
If one looks at the Modern English distribution of lengthened non-high22
vowels in di- and trisyllabic words with open first syllables (as in Ritt
1994), one will not fail to observe that, roughly speaking, the chance of
short-vowel variants to outlive and out-replicate their long-vowel com-
petitors was proportional to the overall metrical weight of the unstressed
syllables which followed the vowels in their expressions. Figure 8.4, based
on the analysis in Ritt 1994, illustrates this.
(1) Words like make or crake ‘crack’ – in which the stressed vowel was
followed by a syllable that contained at best an optional schwa23 in
its rhyme – display lengthening in more than 90 per cent of all cases,
making make prototypical and crack highly exceptional.
(2) Words like befor ‘beaver’, bodie ‘body’, or weþer ‘weather’ – whose
second syllable has remained stable and has come to contain a vowel
or syllabic liquid – lengthened much less frequently: only a quarter
of potential inputs show up long in ModE.
(3) Among words whose second syllables have ended up with a VC
rhyme, such as in capon, bonnet or bottom, lengthening has affected
slightly less than 10 per cent of possible victims.
(4) Of those words whose second syllable is even heavier than that, as
in patient or warrant (both VCC), only 5 per cent have survived in a
lengthened variant.
(5) No lengthened variants have survived at all, finally, of words with
two or more syllables following the stressed vowel. In fact, in some
trisyllabic items, long vowel variants were even replaced by short
vowel competitors (as in southern, errand or holiday).
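
The five observations above can be collected in a small Python structure to check that the reported lengthening rates fall steadily as the weight of the following weak syllables rises. The figures are the approximate ones given in the list; 93 and 9 stand in for ‘more than 90 per cent’ and ‘slightly less than 10 per cent’ and should be treated as assumptions, as should the short category labels.

```python
# Approximate shares of lengthened variants (per cent), ordered from the
# lightest to the heaviest material following the stressed vowel.
lengthening_rate = [
    ("optional schwa (make)", 93),            # 'more than 90 per cent'
    ("vowel or syllabic liquid (body)", 25),  # 'a quarter'
    ("VC rhyme (bottom)", 9),                 # 'slightly less than 10 per cent'
    ("VCC rhyme (patient)", 5),
    ("two or more syllables", 0),
]


def is_monotonically_decreasing(rates):
    """True if every category lengthened less often than the one before it."""
    values = [rate for _, rate in rates]
    return all(earlier > later for earlier, later in zip(values, values[1:]))


if __name__ == "__main__":
    print(is_monotonically_decreasing(lengthening_rate))
```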
As we shall see, that long-vowel variants replicated better in such
morph-memes where they occurred before single light syllables (and
worse in such morph-memes where they occurred before heavy (or multiple)
ones) offers itself as a very plausible explanation. Once again, the
rhythmic organisation of Middle English utterances is likely to have played
the decisive role.

22 The fact that high vowels usually resisted lengthening is probably due to the ‘inherent’
shortness of their expressions.
23 That is to say a schwa-meme which was not necessarily expressed by a schwa sound.
Figure 8.4 The implementation of Open Syllable Lengthening. (Bar chart not reproduced; vertical axis: probability of lengthening or shortening, in per cent; horizontal axis: weight of weak syllables, from MORE to LESS; bar labels legible in the source: 93, 25, 9.)


8.3.2 Another look at the meme-plex for feet: the timing unit
Stress timing languages such as English are characterised by at least a
tendency towards foot isochrony. There is a reason why a meme for such
a tendency should replicate well. Consider that the [Sw]-meme(-plex)
will tend to express, more often than otherwise, in alternating patterns
of more and less prominent syllables. Thereby it will make [Sw]-patterns
statistically more common in actual utterances than patterns like ([SS],
[Sww], [Swww], or even [Swwwww]). This implies that, again statisti-
cally speaking, prominence peaks will tend to occur at temporally regular
intervals. A meme ‘for’ expecting such regular intervals will therefore
have a good chance of establishing itself within the replicator population.
If it does, it will in turn ‘reward’ any meme which causes its expectations
to be met by timing articulation accordingly. Together, the memes will
reinforce one another symbiotically, so to speak, and may combine into a
stable memetic unit for ‘foot-timing’.24 A unit ‘for’ foot timing will express

24 Closely associated to the ‘meme(-plex)’ for English feet.

by speeding up pronunciation when a large number of segments need to
be articulated between prominence peaks, and by slowing it down, when
their number is small. Its overall effect will be that the actual durations of
feet within an utterance are assimilated to one another. This then causes
speech to be (at least perceived as) rhythmical.
Thus, feet which are comparably short in terms of phonological seg-
ments are pronounced relatively slowly, while feet that contain a large
number of phonological segments are pronounced relatively quickly.
Every student of English as a Foreign Language will be familiar with
examples like the following.

(32) |Hé is a |góod |fríend of |míne.

|Hé is a |góod old |fríend of |míne.
|t0 |t1 |t2 |t3

Of these it is normally said that the duration t(n+1) − t(n) will tend to be
roughly constant. In particular, | góod old | is supposed to last not much
longer than | góod | – at least not as long as one might guess from the
relatively greater number of segments that it contains. Therefore, good in
the foot | góod old | must be pronounced relatively more quickly than good
in the foot | góod |. This implies, of course, that in | góod old | comparably
less time will be available for the articulation of the individual segments
than in | góod |. Since vowels are by their very nature more flexible with
regard to their duration than stops, it is in their articulation that speed
differences will typically manifest themselves most strongly. Thus, the
[ʊ]s in feet like | góod old | will tend to be phonetically shorter than the
[ʊ]s in feet like | góod |.
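
The arithmetic behind this can be sketched in a few lines of Python. The sketch idealises heavily: it assumes strict foot isochrony (the text only claims a tendency), an arbitrary foot duration of 0.4 seconds, an assumed segmentation of good old, and an even spread of time across segments, whereas vowels are in fact expected to absorb most of the difference.

```python
def seconds_per_segment(foot, foot_duration=0.4):
    """Average time per segment if every foot lasts foot_duration seconds.

    foot is a list of segment symbols; 0.4 s is an arbitrary illustrative value.
    """
    return foot_duration / len(foot)


good = ["g", "ʊ", "d"]
good_old = ["g", "ʊ", "d", "əʊ", "l", "d"]  # assumed segmentation of 'good old'

if __name__ == "__main__":
    print(seconds_per_segment(good))
    print(seconds_per_segment(good_old))
```

Whatever values are chosen, the longer foot leaves less time per segment, which is all the argument requires.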

8.3.3 What morph-memes may learn about timing

With this observation in mind, consider the following: in languages which
tend to have fixed initial stress such as Early Middle English, it is rea-
sonable to assume, at least for di- and trisyllabic items, a rough statistical
correlation between the length of words and the length of the feet in
which they actually occur when uttered. Length is here understood as
length in terms of phonological segments. Such a correlation is likely to
hold because, although the amount of unstressed material that can follow
a word before the next foothead within actual utterances will of course be
variable as observed already by Vennemann (1986), differences are likely
to sum over for most words if a large enough number of utterances are
taken into consideration. The following graph attempts to capture this.

(33) [Graph not reproduced; it relates Word Length to Average Foot Length.]

It is therefore reasonable to assume that morph-memes for long words will
have typically occurred in segmentally longer feet than morph-memes for
short words. In the same way as lineages and/or populations of morph-
memes may come to ‘learn’ about the relative probabilities with which
their members will be expressed as lifts or dips, so they must also be able
to ‘learn’ about the relative average duration of the feet within which their
expressions will typically come to figure – and they are likely to react in
the manner which Darwinian evolutionary theory predicts, that is to say
by adapting their own structural properties accordingly.
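
The statistical point can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a word with initial stress opens a foot and that the amount of unstressed material following it before the next foot-head varies independently of the word itself; the numbers in `trailing_material` are invented.

```python
from statistics import mean


def average_foot_length(word_length, trailing_material):
    """Mean foot length (in segments) of a stress-initial word over many utterances."""
    return mean(word_length + extra for extra in trailing_material)


# Hypothetical numbers of unstressed segments that follow the word
# before the next foot-head in eight different utterances.
trailing_material = [0, 2, 5, 1, 3, 0, 4, 2]

if __name__ == "__main__":
    print(average_foot_length(3, trailing_material))  # a short word
    print(average_foot_length(6, trailing_material))  # a long word
```

Because the variable trailing material contributes the same average to both, the difference between the two averages equals the difference in word length.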

8.3.4 Vowel lengthening and shortening as adaptive responses to pressures

exerted by the timing unit in the memeplex for feet
If one combines the two observations just outlined, they explain elegantly
and plausibly how diachronic vowel lengthenings seem to have been
implemented among Early Middle English di- and trisyllabic items. Seg-
mentally short morph-memes, such as make, possibly often pronounced
without a final schwa, would have been expressed in feet which were
typically shorter (in terms of segments) than the feet in which longer
morph-memes, such as bevor, weþer, or warrant, came to be expressed.
Assuming that foot duration tended to be constant already in Middle
English utterances, the phonetic realisations expressing vowel memes in
words of the make type will have lasted relatively long. Phonetic variants
such as [ma.k] or [ma:k] will therefore have been more frequent than
the variant [mak]. Conversely, for words of the warrant type, variants like
[wɑ.rənt] or even [wɑ:rənt] will have been relatively less frequent than
variants like [wɑrənt].
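The timing argument can be made concrete with a toy calculation. The sketch below is only an illustration under assumptions that are not part of the account itself: a constant foot duration of 400 ms, articulation time shared equally among all segments in a foot, and rough segment counts for make and warrant. The function name is hypothetical.

```python
def mean_segment_duration(foot_duration_ms: float, n_segments: int) -> float:
    """Average time available per segment in a foot of fixed duration."""
    if n_segments <= 0:
        raise ValueError("a foot needs at least one segment")
    return foot_duration_ms / n_segments


FOOT_MS = 400.0  # assumed constant foot duration (illustrative value only)

# Segment counts are rough illustrations, not measurements.
feet = {"make-type foot (3 segments)": 3, "warrant-type foot (6 segments)": 6}

for label, n in feet.items():
    print(f"{label}: {mean_segment_duration(FOOT_MS, n):.1f} ms per segment")
```

Under these assumptions each segment in the shorter foot receives twice as much time, which is the sense in which vowel sounds in words of the make type will have 'lasted relatively long'.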
From here we can basically follow the line of argumentation with which
we are already familiar. Long vowel sounds serve better to replicate
memes for long vowels than to replicate memes for short vowels, and
vice versa. The more frequently a morph-meme came to be expressed
with a long vowel sound, the greater were the chances that a long vowel
meme would come to be stably associated with it. Therefore, the minds
of Middle English speakers will have been more likely to select and
stabilise morph-memes for {/ma:k/} than morph-memes for {/mak/}.25
Conversely, they will have preferred to acquire {/warənt/} rather than
its competing variant {/wa:rənt/}. In either case, they adopted the
morph-meme variants which replicated best through their typical phonetic
realisations. Therefore, {/ma:k/} and {/warənt/} managed to oust –
or to defend their position against – {/mak/} and {/wa:rənt/}. Or, if
one prefers to take the perspective of the vowel memes, /a:/s managed
to oust /a/s from their positions in memes for make, while /a/s managed
to fend off the attempted invasions of /a:/s into morph-memes for
warrant.
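The selection argument of this paragraph can be sketched as a simple replicator update. This is a minimal illustration, not a model proposed in the text: it assumes that a variant's replication success equals the share of phonetic tokens that match its vowel quantity, and all numerical values, as well as the function names, are hypothetical.

```python
def next_share(long_share: float, p_long_token: float) -> float:
    """One round of replication (discrete replicator update).

    long_share   -- current share of the long-vowel variant among competitors
    p_long_token -- assumed probability that the word is pronounced with a
                    long vowel sound
    """
    fit_long = p_long_token          # long memes copy well via long sounds
    fit_short = 1.0 - p_long_token   # short memes copy well via short sounds
    total = long_share * fit_long + (1.0 - long_share) * fit_short
    if total == 0.0:                 # degenerate case: nothing replicates
        return long_share
    return long_share * fit_long / total


def simulate(long_share: float, p_long_token: float, rounds: int) -> float:
    """Apply the update repeatedly and return the final long-variant share."""
    for _ in range(rounds):
        long_share = next_share(long_share, p_long_token)
    return long_share


# Illustrative values only: a make-type item (mostly long realisations)
# and a warrant-type item (mostly short realisations).
print(round(simulate(0.1, 0.6, 30), 3))
print(round(simulate(0.9, 0.4, 30), 3))
```

Even a modest bias in the phonetic realisations is enough, in this toy setting, for a rare long variant to oust its short competitor in the first case and for a common long variant to be driven out in the second.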
What applies to morph-memes of the make and warrant types, also
applies – a fortiori – to cases of Trisyllabic Shortening. In trisyllabic
items, with stress on their first syllables, short-vowel memes not only
defended their positions against long competitors, but even managed to
out-replicate the latter. They did so in morph-memes like sūþerne > southern,
for example. The success of short-vowel memes in such items clearly
was an adaptive response to the combined pressures of fixed left stress
and the foot-meme. It reflects the same pressures as the replacement of
/a/ through /a:/ in make or the stability of short /a/ in warrant. If long
vowels did not make it in the latter, why should they have survived in
word forms which were segmentally even longer? The factors by which
‘trisyllabic shortenings’ were brought about are indeed exactly the same
as those that prevented long vowels from establishing themselves in repre-
sentations of warrant and that helped them to oust their short competitors
in representations of make.
The reason why {/ma:k(ə)/} managed to oust {/mak(ə)/} from the population
of memes for make, why /warənt/ failed to be ousted by variants
like /wa:rənt/, and why /su:ðərn(ə)/ came to be ousted by /sυðərn(ə)/
is therefore essentially the same as the reason why /ha:v(ə)/ failed to
oust /hav(ə)/ from the population of memes for have. In each case those
variants managed to survive which were better adapted to the roles
their phonetic realisations typically came to play within the rhythmical
organisation of Middle English utterances. While in the case of have,
however, the stability of the short vowel was adaptive to the fact that
the [Sw]-meme tended to attribute the role of metrical dips to expressions
of the morph-meme {/hav(ə)/}, the vowel quantities in morph-memes
for full lexical words were additionally adaptive to the average
duration of the feet in which their expressions occurred, and this in
turn represented an effect of the timing-unit within the memeplex for
feet.

25 The same obviously holds true for all items of the make-type.

8.3.5 Summary
Essentially, all vowel lengthenings and shortenings among Middle English
di- and trisyllabic words reflected selection pressures that were exerted
on the phonological structure of Middle English morph-memes by two
rhythmic principles: fixed left stress and foot isochrony. These princi-
ples represented the combined impact of a memeplex for Middle English
feet. The pressures which that memeplex exerted on the population of
morph-memes generally selected against long vowels in auxiliaries like
have, which they frequently caused to be expressed as metrical dips. They
seem to have favoured long vowels in fully lexical morph-memes (which
tended to be expressed as metrical lifts) when these morph-memes were
also expressed in relatively short feet. And they selected for short vowels in
such morph-memes which tended to be expressed in relatively long feet.
Since morph-meme length and foot length correlated, this amounted to
long vowels being favoured in short morph-memes and short vowels in
long morph-memes. Thus, Early Middle English changes of vowel quan-
tity in disyllabic (and as we have already seen: also trisyllabic) words
represent cases of co-adaptation between memes in the meme teams that
Middle English competences represented. Phonological and morphologi-
cal replicators from the lexical section adapted to a memetic configuration
that governed utterance rhythm.

8.4 Competing selectional pressures and the statistical nature of EME quantity adjustments

It is important to bear in mind that rhythm-related pressures did not rep-
resent the only factors behind Middle English changes of vowel quantity.
Nor should one attempt to derive covering, categorical laws from them,
by which the fate of each individual lexical item could be predicted. It is
in the very nature of selection pressures that, while they may favour the
replication of certain variants of patterns over that of others, their effects
may be counteracted by other pressures. A situation where a single set
of selection pressures gets so strong that it completely removes certain
types of patterns from the population can only be one of a variety of
possible scenarios. In other cases, specific selection pressures may just
cause statistical changes in the distribution of patterns within a pool, or
system of replicators. This seems to have been true of the pressure which
utterance rhythm exerted on the vowel quantity in Middle English lexical
items. It explains why these quantity changes have become the Neogrammarian
nightmare that they are. As practically all studies both from the more remote and
the more recent past have shown, the pressures exerted by the memeplex
for English feet were not the only factors which had an impact on the
evolutionary stability of long and short vowel memes. Instead, a consid-
erable variety of other, and sometimes conflicting, factors were at play,
and each of them seems to have favoured the replication of short or long
vowels in quite independent ways. These factors included the sonority
of the consonants or clusters which immediately followed the vowels, the
weight – or possibly the phonetic duration – of these consonant clusters,
as well as the height (and therefore the inherent duration) of the vowels themselves.
The result of their interaction is essentially what my 1994 Quantity
Adjustment Rule captures.26 As its empirical adequacy (for all items
except [CVC]-monosyllables) shows, the pressures exerted by the fac-
tors just listed amount to a rather plausible and unified description/
explanation not only of Open Syllable Lengthening and its exceptions
but, in fact, of all Early Middle English quantity changes that have tradi-
tionally been described in terms of four separate sound laws.27
What is important, however, is that the undeniable variety of factors
involved should not be interpreted as reducing the independent rele-
vance of any single one of them. If the lineage of a morph-meme failed
to respond to a pressure, this does not mean that it was not subjected to
it. It simply means that it did not pay for the morph-meme lineage to
adapt to it, because other pressures ensured that established variants
replicated better than potential competitors.28 Thus, contrary to what the
established sound laws suggest – by attributing distinct sets of changes
26 The timing unit in the English foot-meme may also have been decisive in the competition
between long and short vowels in words like kēpte and dūst, showing up as ModE kept
and dust and traditionally attributed to a law called Pre-Cluster Shortening. There, the
heavy consonantal codas seem to have consumed too much articulation time for long
vowels to survive before them. Although vowel height may have played an additional
role in the competition, the principle was again very much the same as that behind the
‘lengthening’ of make or the ‘shortening’ of southern. In the lengthenings traditionally
called ‘homorganic’ as in cīld > child, finally, the picture is again only seemingly complicated
by the fact that their heavy codas would appear to favour short rather than long
vowels. In fact, their homorganic nature allows them to be pronounced almost exactly
as quickly as single consonants, so that the success of long /i:/ over short /i/ in lexical
representations of child is as little (or, if you will, as much) of a surprise as the success of
/a:/ over /a/ in representations of make.
27 Since Ritt (1994) contains a sufficiently detailed account of the different factors which
seem to have been relevant in bringing the well studied changes in early Middle English
vowel quantity about, no more shall be said about them here.
28 Significantly, recent approaches to modelling speaker competence in the generative tradi-
tion have developed a formalism by which ‘outputs’, that is, theorems in the production
system which grammar, or competence is supposed to represent, are not derived by
means of categorical rules, but through rivalling and violable constraints, which ‘select’
the optimal one from a set of competing candidates. Constraints are supposed to be
universal and individual languages are assumed to differ because they ‘rank’ constraints
differently. The approach is called ‘Optimality Theory’, and while it appears similar to
(such as the lengthenings in words of the make type) to distinct single causes
(such as ‘the open-penultimate-syllable environment’), various factors
favouring either short or long-vowel phonemes exerted their selectional
pressures in parallel. That the relative success of long over short vow-
els seems to have also depended on parameters such as their height or
the sonority of their context, does therefore not mean that it did not
equally depend on rhythmically induced pressures, or vice versa. It just
explains why the impacts of each of the pressures seem to have remained
statistical: individual pressures sometimes added up to one another, and
sometimes cancelled one another out. Contrary to some linguists,29 but
in accordance with established practice in many social and biological dis-
ciplines, I find nothing objectionable in this notion, although the issue
naturally merits a separate and more detailed discussion and would seem
to call, in particular, for the application of more sophisticated mathemat-
ical tools than linguists can typically handle. Since all attempts to account
for Early Middle English changes of vowel quantity in terms of categor-
ical rules have proved empirically more or less inadequate, and in the
absence of convincing counter-arguments, I prefer to conceive of these
changes as statistical phenomena, by which individual lexical items were
not affected in a categorical, regular manner, but merely with certain,
specifiable probabilities.30
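How independent pressures might 'add up' or 'cancel one another out' while yielding probabilities rather than categorical outcomes can be sketched as follows. The logistic link, the pressure names and all numerical values are assumptions made purely for illustration; the actual specification of such probabilities is left open in the text.

```python
import math
from typing import Mapping


def lengthening_probability(pressures: Mapping[str, float]) -> float:
    """Combine signed pressures into a probability.

    Positive values favour the long-vowel variant, negative values the short
    one. The pressures are simply summed and passed through a logistic
    function (an assumed, not an attested, way of combining them).
    """
    net = sum(pressures.values())
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical items: pressures reinforcing one another ...
reinforcing = {"rhythm": 2.0, "vowel height": 1.0}
# ... and pressures cancelling one another out.
cancelling = {"rhythm": 1.5, "coda weight": -1.5}

print(round(lengthening_probability(reinforcing), 3))
print(round(lengthening_probability(cancelling), 3))
```

In such a scheme no single factor determines the fate of an item, yet each retains its independent relevance: removing any one pressure shifts the resulting probability.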

8.5 The surprising stability of short vowels in CVC monosyllables or The descent of [mæn] and [gɒd]

8.5.1 Introduction
We have argued that lengthenings in words of the make type were adaptive
responses to pressures emerging from the fact that segmentally short
words were – for rhythmical reasons – pronounced relatively long and
thus favoured the replication of long-vowel phonemes over that of short
ones. This has an interesting, and rather obvious, implication, namely that

the approach advocated here, its theoretical basis is still fundamentally different. In par-
ticular, ‘optimality theoretic’ constraints are supposed to be part of Universal Grammar,
and the mechanics by which they select among rivalling outputs therefore internal to
the grammar, that is, the mind as well. Selectional pressures on linguistic replicators
and teams of such, on the other hand, derive to a considerable extent from expression,
that is, the behavioural and textual products of linguistic competence. The fact that
generative theories distinguish categorically between the study of grammar (Chomsky’s
I-language) and discourse (one of Chomsky’s E-languages), plus their decision to focus
more or less exclusively on the former, force them to postulate mind-internal counter-
parts to mind external phenomena with mental effects if they want to incorporate them
into their competence models.
29 E.g. Lass (1980), Prince/Smolensky (1993), or Bermudez-Otero/McCully (1997).
30 Even though I have to accept that their specification remains very much an open problem.
short vowels should not have been evolutionarily stable in monosyllabic
CVC-items such as god or man.31
Instead, it would seem that the Modern English descendants of god and
man should be /gəυd/ and /meɪn/, rather than /gɒd/ and /mæn/. Miracles
do not occur, so no morph-meme can have been miraculously immune
to rhythmically based pressures. Also the replication of man and god
depended on their expression in rhythmically organised utterances. As
we saw above, the effects of the meme for English feet on the expres-
sions of Middle English morph-memes made it difficult for short morph-
memes32 to replicate well with short vowel memes. Short morph-memes
were likely to be expressed in short feet. This lengthened the typical dura-
tion of the sounds expressing their vowels. Lengthened vowel sounds were
better at replicating long-vowel memes than short-vowel memes. There-
fore, the former would out-replicate and oust the latter in short morph-
memes. Of course, monosyllables are as short as can be. The number of
segments in the rhymes of their weak syllables is zero. They are certainly
shorter, it would seem, than morphs of the make type, and in those we
have seen that short vowels had no chance against their long competitors.
Therefore, saying that short morph-memes were under strong rhythmical
pressure to mutate into long vowel variants predicts that a large number
of vowel lengthenings should have occurred among morph-memes like
man and god. But we do say /mæn/ and /gɒd/ in Modern English. This
seems to suggest that our explanation of the lengthenings and shorten-
ings in morph-memes of the types make, beaver, warrant and southern
might be wrong after all. The evolutionary stability of short vowels in
morph-memes for [CVC]-monosyllables seems to falsify our hypothesis
that the English memeplex for feet selected against short vowels and for
their long competitors in morph-memes which were segmentally short.
This is a serious problem, and merits a detailed discussion.

8.5.2 Handbook lore

The common view that [CVC] monosyllables hardly ever lengthened
goes back to the days of Karl Luick (see, for instance Luick 1898 and
31 Since this is also what a rule like (44a), extrapolated from Quantity Adjustment (rule
(28) on page 241 above), suggests, it would seem either that the evolutionary account of
the relation represents a straightforward interpretation of the Quantity Adjustment rule or
that, conversely, Quantity Adjustment merely formalises the evolutionary, replicator-based
account.

(44a) $$p(V \rightarrow [+\mathrm{long}]) \approx \frac{k}{\mathrm{weight}(\sigma_w)}$$

Read: the probability of vowel lengthening is inversely proportional to the weight of the weak
syllables ($\sigma_w$) following it within the same word.
32 I.e. short in terms of the number of segments in the rhymes of their weak syllables.
1914/21) and has never really been questioned since (except in Ritt
1997b). However, it is inadequate. [CVC] monosyllables did lengthen
quite frequently.33 Probably Luick failed to realise their number because
he was simply convinced that there could not be many of them, so he did
not look too hard. After all, it would have been absurd for Luick to search
for evidence of a process that lengthened vowels in closed syllables, when
he defended a sound law in which a necessary condition for vowels to be
lengthened was their occurrence specifically in open ones. Luick’s ‘belief’
in Open Syllable Lengthening may have been additionally strengthened
by the apparent frequency of ‘open syllable lengthenings’ in other Ger-
manic languages.34 These made the law appear plausible also for English,
while they would have made a law for Lengthening in Closed Syllables
highly exceptional. Thus, it came to be the established view that short
vowels in monosyllables with a [CVC] structure simply could not have
been affected by regular quantity changes during the Late Old English
and Early Middle English periods. That the (assumedly) few cases of
[CVC] items where vowel quantity obviously did change came then to
be considered as mere analogical extensions of changes which ‘really’
happened in disyllabic forms, is simply a logical consequence. Thus, the
Oxford English Dictionary still asserts of Modern English whale (< OE
hwæl) that ‘The present form whale represents oblique forms (hwalas)
etc’ (OED: s.v. whale).
However, vowels were not only lengthened more frequently in [CVC]
items than Luick and his successors had believed, but they also lengthened
much less frequently where they should have, if Open Syllable Length-
ening had been a proper Neogrammarian sound change. In particular,
we have seen in figure 8.4 above that words of the types beaver/weather,
capon/bottom or patient/warrant reflect the assumed ‘law’ only in a minor-
ity of items rather than in all (or at least most) of them. Consequently, the
assumption that whale and similar words got their long vowels through
analogical transfer from inflected forms such as /hwa:ləs/ is doubly suspicious.
Quite apart from the unfalsifiability of explanations which make
unconstrained use of analogy, the very forms that Luick proposed as the
bases of transfer are rather improbable.

8.5.3 The actual figures

As already indicated, the established way of accounting for lengthened
vowels in items such as whale might only just be defensible if they really

33 Although not as frequently as (44a) would seem to predict.

34 Such as German, Dutch, Swedish and others.
represented merely a few isolated cases. However, they were as frequent as
lengthenings in Open Syllables of disyllabic items.35 Holthausen’s (1974)
Old English dictionary contains 69 Old English [CVC] monosyllables
which have survived into Modern English and whose present form allows
us to reconstruct their behaviour with respect to Early Middle English
quantity changes. Of these, 36 seem to have been replaced by long vowel
variants during the period in question, namely
(34) bær ‘bare’, bed ‘bead’, blæd ‘blade’, broc ‘broke, misery’, col ‘coal’,
cran ‘crane, heron’, dæl ‘dale’, fær ‘fare, journey’, flot ‘float, action
of floating (OED)’, grot ‘groats’, ham ‘hame’ or ‘the collar of a
draught horse’, hol ‘hole’, hop ‘hope’, hwæl ‘whale’, læt ‘late’,
mot ‘mote (of dust)’, sceat ‘sheet’, slæd ‘slade’, sol ‘sole, pool’,
spær ‘spare, chalk’, sped ‘spade: the gummy or wax-like matter
secreted at the corner of the eye (OED)’, stoc ‘x-stoke (in place
names)’, tæl ‘tale’, tot ‘tote, vault’, ðel ‘theal, plank’, wær ‘aware’,
wer ‘weir’, spak ‘spake’, brak ‘broke’, gaf ‘gave’, bad ‘bade’, blæc
‘Blake (proper name)’, græf ‘grave’, stæf ‘stave’, stær ‘stare’ and
scead ‘shade’
Only 33 have survived with their original short vowels, namely
(35) bæc ‘back’, bæð ‘bath’, bet ‘better’, blæc ‘black’, bræs ‘brass’, cot
‘cot’, dol ‘dull’, fæt ‘vat’, god ‘god’, græs ‘grass’, hlot ‘lot’, loc
‘lock’, los ‘loss’, pæð ‘path’, plot ‘plot’, sæd ‘sad’, sæp ‘sap’, scead
‘shed’, scot ‘shot’, set ‘set’, slæc ‘slack’, slop ‘slop’, smæl ‘small’,
soc ‘(dial.) sock’, spær ‘spar’, stæf ‘staff’, swan ‘swan’, trod ‘trod’,
tro ‘trough’, ðroc ‘throck, share-beam’, wæsc ‘wash’, sat ‘sat’ and
grot ‘grot’
Thus, slightly more than 50 per cent (!) of Old English [CVC] items
came to be replaced by variants with long stressed vowels. This practically
equals the lengthening rate among open disyllables (which was about 55
per cent), so that there is no reason why one should regard the latter as
law driven and the former as exceptional.
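The arithmetic behind these figures is easily checked. The snippet below merely recomputes the rate from the counts given for lists (34) and (35) and sets it beside the figure of about 55 per cent cited for open disyllables; the variable names are ad hoc.

```python
lengthened = 36   # items in list (34): replaced by long-vowel variants
short_kept = 33   # items in list (35): survived with short vowels

total = lengthened + short_kept
rate = lengthened / total

open_disyllable_rate = 0.55  # "about 55 per cent", as cited in the text

print(f"{lengthened} of {total} [CVC] items lengthened: {rate:.1%}")
print(f"difference from open disyllables: {open_disyllable_rate - rate:.1%}")
```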
Apparently, Old English [CVC] monosyllables were after all not quite
as immune to the selectional pressures which the memeplex for English
feet seems to have exerted on morph-memes as the established handbook
accounts suggest. Instead, there seem to be many among them whose
Modern English descendants do display long instead of short vowels.
This appears to bear out the prediction inherent to our account of other
quantity adjustments at least to some extent: the pressures created by

35 See also Ritt (1997).


[Figure 8.5 The implementation of lengthening among CVC monosyllables. Bar chart: vertical axis, probability of lengthening (up to 100); horizontal axis, word types ordered by weight of weak syllables, from MORE to LESS.]

fixed left word stress and the foot-meme did make short vowel phonemes
in segmentally short words evolutionarily unstable and did favour their
long competitors.

8.5.4 Idiosyncrasies in the distribution of lengthened variants among Old English [CVC] items

However, even though lengthenings have obviously occurred among Old
English [CVC] items, they were not nearly as frequent as the hypothesis
that the short feet, in which short morph-memes tend, statistically speak-
ing, to express, favoured long vowels would seem to suggest. Recall that
among [CVC(ə)] morph-memes, such as make or hope, long vowel variants
managed to oust short-vowel competitors with a probability of more
than 90 per cent. Since [CVC] items are clearly shorter than [CVCə]
items – or at least not longer if the instability of the final schwa is taken
into account – they ought to have displayed lengthening at least as often.
However, they do not seem to have, as figure 8.5 shows.

8.5.5 An attempted explanation

The figures in the chart (figure 8.5) are undeniably problematic. If short-
vowel memes were the more likely to be ousted by long competitors the
shorter the morph-memes were in which they occurred, then, clearly,
short-vowel memes ought to have been evolutionarily less stable in words
like God or whale than in words like make. Does this mean that the proba-
bilities of vowel lengthening and shortening in polysyllabic items did not
depend on the number of rhyme segments in their weak syllables after all?
As I shall argue, they did. But not straightforwardly. We committed
an error if we thought they should. One should not generalise from an
observed correlation without taking its explanation into account. Con-
sider how we derived the prediction which the [CVC]-lengthenings fail to
bear out. First, we observed that in all cases except [CVC]-monosyllables
the stability of short vowels was proportional to the rhyme-weight of the
weak syllables in the morph-memes with which they are associated. We
also provided an explanation of this correlation. It was that memes for foot
structure, with which memes for morphs co-expressed, had the effect of
shortening vowel sounds in long feet and lengthening them in short ones.
We observed that the segmental length of the feet in which memes for
polysyllabic morphs are expressed will correlate with their own segmental
length. So the causal relation between the rhyme segments in the weak
syllables of a morph-meme, and the meme for the vowel in its stressed
syllable is very indirect. It does not warrant a straightforward law like

(36) $$p(V_m \rightarrow [+\mathrm{long}]) \approx \frac{k}{\mathrm{weight}(\sigma_w)_m}$$

Read: the probability ($p$) that a vowel ($V$) associated to a morph-meme
($m$) will be ousted by its long competitor is inversely proportional
to the weight of the weak syllables ($\sigma_w$) following it within the
morph-meme.
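What law (36) predicts if taken at face value can be sketched numerically. The weights assigned to the word types, the value of k, and the cap at 1.0 are all assumptions introduced for illustration only. Note that the law is strictly undefined for a weight of zero, which is exactly the monosyllabic case; the sketch treats it as the limiting case of maximal lengthening.

```python
def p_lengthening(weak_weight: float, k: float) -> float:
    """Law (36) read literally: p is inversely proportional to weak weight.

    The cap at 1.0 and the treatment of zero weight as 'maximal' are added
    assumptions, needed to keep the result a probability.
    """
    if weak_weight <= 0:
        return 1.0
    return min(1.0, k / weak_weight)


K = 0.9  # hypothetical constant

# Hypothetical weights of the weak syllables for four word types.
word_types = {"god (CVC)": 0, "make": 1, "beaver": 2, "warrant": 3}

for label, weight in word_types.items():
    print(f"{label}: {p_lengthening(weight, K):.2f}")
```

On this reading the law predicts that [CVC] monosyllables should lengthen at least as often as items of the make type, which is the prediction that the attested figures fail to bear out.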

Although such a law may be descriptively adequate, the causal relationship
which it implies is too direct. Memes for rhyme segments in weak
syllables cannot influence memes for vowels in footheads directly. Instead,
the impact which ($\sigma_w$) can have on memes for vowels in stressed syllables
depends on the concomitant influence of the rhythm-memeplex, and
the channel through which they exert their combined influence is their
expression in actual utterances, which they will share with memes for
We argued above (see pp. 182 and particularly 262f.) that morph-
memes and memes for foot structure are likely to replicate indepen-
dently, and that the mental associations between them are likely to be
loose and indirect. Morph-memes as we defined them do not ‘have’ def-
inite suprasegmental structures. They only get them when expressed.
Therefore, in the case of make, in which the variants {/mak(ə)/} and
{/ma:k(ə)/} were in competition, the memeplex for feet caused their
expressions to be more often [ma:k(ə)] than [mak(ə)]. It may have
done so by activating the [S]-node in the [Sw]-meme whenever a
lexical-morph-meme was expressed.36 Since the memes for English
feet replicated better through expressing as [ma:k(ə)] rather than as
[mak(ə)], they had to express as the former.37 In the cases of warrant and
southern it was the other way round. When the foot-memeplex expressed
together with them, it would replicate better when expressing as [wɑrənt]
and [sυðərn(ə)] rather than as [wɑ:rənt] and [su:ðərn(ə)]. Thereby,
and quite indirectly, the faithful replication of {/wɑrənt/} was made easier
than that of {/wɑ:rənt/}, and the replication of {/sυðərn(ə)/} happened
more easily than that of {/su:ðərn(ə)/}. One gets the same picture,
if one reverses one’s perspective: morph-memes were associated to
memes for foot structure only indirectly. If they were linked to a meme
for a major lexical category (A, N, or V) they may have been connected
to the [S]-constituent in the foot-memeplex via such a link. Otherwise,
all they would come to ‘know’ about the memeplex for feet is what they
‘learned’ from the effects it had on the rhythmic structures of the actual
sound sequences through which they came to be expressed. In the cases of
make, warrant and southern, the ways in which the memeplex for feet co-
expressed together with them established a statistical correlation between
the number of segments in the rhymes of their weak syllables and the
metrical weight of the feet in which they came to be expressed. It was
the latter which affected the competition between long and short-vowel
memes, however, not the former.
From this it follows that the relation between memes for vowels, and
memes for rhyme segments in weak syllables cannot have been directly
causal either. Of course they would form stably replicating units together
when they were parts of the same morphs. But the effect which segments
in the weak rhymes of polysyllabic morph-memes had on the replication
of competing vowel variants in preceding syllables must have been indi-
rect. It was also mediated through the rhythmic structures of the sound
sequences in which they came to be expressed. As we have seen, these
structures were determined, above all, by the memes for foot structure.
Since the number of rhyme segments in the weak syllables of a morph-
meme did not directly affect the vowel in its stressed syllable, ‘law’ (36)
represents a shortcut, which it may not always be legal to take. For poly-
syllables, it happens to work because the duration of the feet in which
they get expressed correlates well with the number of rhyme-segments
in their weak syllables. Therefore, it will work for [CVC]-monosyllables
only if the same applies to them as well. That is to say, the duration of

36 And when nothing else inhibited the activation of that node. The activation of the
[S]-node may have been inhibited if it had just fired, for example.
37 If they had not done so, they would have been replaced by memes that did.
the feet in which they are expressed must reflect the absence of weak syl-
lables by being shorter, on average, than the duration of the feet in which
polysyllables are typically expressed. As will be shown, this is probably
not the case.
In order to understand the exceptional status of monosyllables, con-
sider first the following two sentences. They illustrate once more how
the correlation between foot duration and the number of segments in the
weak rhymes of a morpheme comes about.
(37) a. Every syllable has a structure.
b. Every poem has a structure.
They are identical except that (37a) has the trisyllabic syllable where (37b)
has the disyllabic poem. This difference will be reflected in rhythmically
structured utterances of the two sentences, as (38) shows.

(38) a. |Every |syllable |has a |structure.

b. |Every |poem |has a |structure.

|t0 |t1 |t2 |t3

As one would have predicted, the expression of poem winds up in a shorter
foot and is likely to receive relatively more articulation time than the
expression of syllable. Thus, the pair behaves nicely with regard to the
word-length–foot-length correlation discussed above. But now look at an
example with a monosyllable in the place of syllable and poem.
(39) Every foot has a structure.
While it is of course conceivable to scan it rhythmically like the utterances
in (38), i.e. as in (40),

(40) |Every |foot |has a |structure.

|t0 |t1 |t2 |t3

this scansion is likely to occur preferably in slow speech, while in normal
speech and fast speech, a scansion like (41), where the expression of has
a is integrated into the foot headed by the expression of foot, is to be
expected more frequently.

(41) |Every |foot has a |structure.

|t0 |t1 |t2 |t3
Conversely, parallel scansions for the sentences in (37) – given in (42) –

(42) a. |Every |syllable has a |structure.

b. |Every |poem has a |structure.

|t0 |t1 |t2 |t3

would be natural only for rapid speech. Thus, there seems to be a marked
difference between the ways in which polysyllables and monosyllables are
integrated into the rhythmical structures of English utterances. It reflects
the fact that the [S] and [w] nodes in the foot memeplex increase the likeli-
hood of each other’s activation and thus express, preferably, as trochees.
Since lexical monosyllables are in turn likely to activate the [S]-node,
the syllables whose expression comes to follow theirs will frequently co-
express with the [w]-node. The expressions of the final syllables of poly-
syllabic morph-memes, on the other hand, will usually co-express with the
[w] node and thereby increase the probability that the syllable expressed
immediately after them will co-express with the [S]-node. As one could
say in established terms, stressed monosyllables tend to ‘demote’ follow-
ing syllables where stressed polysyllables tend to ‘lift’ them.
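The 'demote' versus 'lift' tendency can be sketched as a small scansion routine. It is a deliberately crude illustration resting on assumptions of its own: words are hand-tagged for syllable count and lexical status, every full lexical item opens a new foot, and speech rate, prefixes such as de- and all other conflicting factors are ignored. The names `scan` and `show` are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[str, int, bool]  # (form, syllable count, is full lexical item)


def scan(words: List[Word]) -> List[List[str]]:
    """Group words into feet according to the simplified tendency."""
    feet: List[List[str]] = []
    # True while the current foot consists only of a lexical polysyllable,
    # whose own weak syllables already fill the [w] position.
    bare_polysyllable = False
    for form, syllables, lexical in words:
        if lexical or not feet or bare_polysyllable:
            # Lexical items head a foot; a function word that follows a
            # polysyllabic head is 'lifted' and heads a new foot.
            feet.append([form])
            bare_polysyllable = lexical and syllables > 1
        else:
            # A function word after a monosyllabic head (or after an
            # already lifted function word) is 'demoted' into that foot.
            feet[-1].append(form)
    return feet


def show(feet: List[List[str]]) -> str:
    return " ".join("|" + " ".join(foot) for foot in feet)


tail: List[Word] = [("has", 1, False), ("a", 1, False), ("structure", 2, True)]
print(show(scan([("Every", 2, True), ("syllable", 3, True)] + tail)))
print(show(scan([("Every", 2, True), ("foot", 1, True)] + tail)))
```

For the sentences in (37) and (39) the routine groups has a into a foot of its own after syllable, but into the foot headed by foot, corresponding to the scansions in (38a) and (41).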
Of course, there are many conceivable utterance configurations where
this tendency will be overridden by conflicting factors. Thus, the first syl-
lables in the expressions of full lexical items will tend to ‘resist’ demotion
while ‘unstressed’ prefixes will tend to ‘resist’ lifting, as (43a) and (43b) show.

(43) a. |Every |syllable |starts with an |onset.

|Every |poem |starts with a |word.
|Every |foot |starts with a |head.
|t0 |t1 |t2 |t3

b. |Every |syllable de- |serves an |onset.

|Every |poem de- |serves a |title.
|Every |foot de- |serves a |head.
|t0 |t1 |t2 |t3

However, it only requires the expression of a full lexical item to be followed
by the expression of semantically empty words or grammatical
morphemes such as auxiliaries, pronouns, determiners, or functors for
conditions to arise in which the observed difference between monosyllables
and polysyllables will become relevant. If one samples a large enough
number of utterances, the average duration of the feet in which mono-
syllables occur will therefore not be as short as one would predict, if one
took only their own shortness into account and applied law (36).
What does all this imply, then, for the evolution of Old English mono-
syllables during the Early Middle English period, when the selectional
pressures which the memeplex for feet exerted on memes for morphemic
Gestalts caused many of them to evolve better adapted structures and to
associate with vowel-memes for different quantities? Clearly, utterances
like the ones just discussed, in which monosyllabic morph-memes caused
the memeplex for feet to ‘demote’ the expressions of items that succeeded
theirs, while polysyllabic ones motivated it to ‘lift’ them, must have existed
in Old and Middle English utterances just as often as in Modern English
ones. I take this to be so self-evident that three examples will be sufficient
to illustrate the fact. They are from the Old English version of the Old
Testament (Crawford 1969) and taken out of the Helsinki Corpus. They
contain the words God (monosyllabic) and Drihten ‘The Lord’ (disyl-
labic), and suggest what was likely to have happened to their rhythmic
structures, if one of the morph-memes was replaced by the other.
[…] |God cwæð to |Moyse38 |
[…] |Drihten |cwæð to |Moyse

|t0 |t1 |t3

(Numbers XIV, 11)

|God wearð ða |yrre |Israhela |bearnum,

|Drihten |wearð ða |yrre |Israhela |bearnum,

|t0 |t1 |t3 |t4 |t5

(Numbers XXV, 3)

|Drihten |him be- |bead, & |cwæð to ðam |folce:

|God him be- |bead, & |cwæð to ðam |folce:

|t0 |t1 |t3 |t4 |t5

(Joshua IV, 4)

38 First lines represent the original, second lines are modified by the author.

These examples illustrate that what is true of Modern English must
have held for Old and Middle English as well: the average feet in which
monosyllables came to be expressed were not as short, relatively speaking,
as their own shortness in terms of segments would seem to imply, if law
(36) were a good law. But it is not. Its inherent prediction that
[CVC]-monosyllables should have displayed lengthening more often than, or
at least as often as, words of the make type crucially depends on the
assumption that
word length correlated with average relative foot length in all items of the
Early Middle English lexicon. And this assumption is unjustified. There-
fore, the fact that short vowels were more stable in [CVC]-monosyllables
than in polysyllables only shows that (36) does not work for them. It falsi-
fies that law, but it does not falsify the theory that vowel quantity changes
were adaptive responses to pressures exerted by the memeplex for feet on
the replication of morph-memes with which it co-expressed.
In order to explain the stability of short {/god/}, we can pursue the
argumentation with which we are by now familiar: the timing unit in the
memeplex for feet would express feet like |God cwæð to|, |God wearð ða|
and |God him be| in such ways that the morph-meme for God would more
likely have been expressed as [god] than as [goːd] or [goˑd]. Therefore,
these feet would replicate {/god/} better than its potential rival {/goːd/}.
Thereby they would help the former to maintain its position in the pool
of English morph-memes against attempted invasions by {/goːd/} – but
probably only just, of course, since the morph-meme for God would also
have occurred in feet like

(45) |God |heofonan & |eorðan

|t0 |t1 |t3

(Genesis , 1)

|God |bletsode ða |Noe & his |suna

|t0 |t1 |t3 |t4
(Genesis , 1)

|God |wunað be- |twux eow

|t0 |t1 |t3
(Joshua , 10)

In feet like these, the meme for foot timing would express God more
probably as [goːd] than as [god]. Thereby it would replicate {/goːd/}
better than {/god/}. In other words, the competition must have been
a pretty close shave, and this is exactly what the fact that more than
50 per cent of Old English [CVC]-morph-memes actually do have
Modern English descendants with long vowels (e.g. bed ‘bead’, blæd
‘blade’, col ‘coal’, etc.) bears witness to.
That some [CVC]-monosyllables did evolve into long-vowel variants
indicates that they were as much under pressure to adapt to the ways in
which the memeplex for feet affected their expression as all other morph-
memes in the English lexicon. And this is exactly what our hypothesis
suggests of course. Whenever one of them came to be expressed in an
utterance where it figured in a comparably short foot, the phonetic dura-
tion of the sound which expressed its vowel would be extended, and
replicate a long-vowel meme more easily than a short one. Thus, a con-
siderable number of short-vowel phone-memes lost their slots in morph-
memes for [CVC] items, and long competitors took their places. At the
same time, the fact that [CVC] monosyllables ‘lengthened’ less frequently
than [CVCə]-disyllables reflects that the memeplex for feet tended to
demote the expressions of syllables that followed them. The syllables
thus demoted became parts of the same feet as the [CVC]-monosyllables
that headed them, which made those feet comparably long in terms of
(rhyme) segments. Then the meme for foot timing would see to it that
[CVC]-monosyllables were expressed relatively quickly. The expressions
of the vowels in them would be short in duration and replicate short vowel
memes better than long ones.
Thus, our theory of the mechanism by which Early Middle English
changes of vowel quantity were brought about seems to be corrobo-
rated, rather than falsified by the history of [CVC]-monosyllables. As
expected, they were not immune to the pressures which the memeplex
for feet exerted on long- and short-vowel phone-memes and their asso-
ciations to memes for English morphs. If one considers that the rhyth-
mical demotion of following syllables cannot have been as likely in the
expression of polysyllabic items as it must have been in those of monosyllables,
the evolutionary account that has been suggested here ‘predicts’
the very distribution of ‘vowel lengthenings and shortenings’ across the
lexicon of Middle English that we actually do observe, namely the one in
figure 8.5 above. It is thus empirically more adequate than most tradi-
tional ones.

8.6 Summary
As we have seen, the evolutionary approach to language which has been
developed in this book not only provides a better explanation of those
EME changes of vowel quantity which established accounts dealt with,
rather unsuccessfully, in terms of categorical rules of various types. It also
explains the distribution of vowel length in the Modern English descendants
of Old English [CVC] monosyllables such as God or whale, and
thereby solves a problem which has so far been either brushed under
the carpet, or dealt with in terms of explanatorily empty concepts such
as ‘dialect mixture’, or ‘analogical levelling’. Thus, the explanation we
have derived has the following assets. First, it tells a coherent and uni-
fied story of all the quantity changes which English vowels underwent
at the end of the Old English, and the beginning of the Middle English
period. Second, it explains their statistical nature without employing ad
hoc strategies such as unconstrained and random analogical levelling.
Third, it does so without the procrustean oversimplifications that were
inherent to my 1994 account of the changes in terms of one single process
of quantity adjustment of dubious empirical status. Fourth, it is highly
explicit about the mechanics by which the changes were brought about
and manages, at the same time, to describe the interplay of competence
constituents with factors governing performance and language use in a
principled and systematic way. Fifth, it is radically non-essentialist and
contains no elements which could not be given an empirical interpreta-
tion in terms of material, intersubjectively verifiable world-one referents –
at least in principle, and this is what matters.
The way in which an evolutionary and replicator-based approach allows
one to reconstruct and explain Early Middle English changes of vowel
quantity demonstrates what a powerful framework it provides for the
study of languages and their historical evolution. It illustrates how the
Darwinian perspective on language, for which this book has attempted to
provide theoretical arguments, can account for language change. That it
can explain historical phenomena which have stubbornly resisted expla-
nation when approached within non-evolutionary frameworks suggests
that it may not only throw new light on issues that can also be explained
otherwise, but that it may deepen our understanding of the processes that
drive linguistic history substantially.
To illustrate the kind of generalisations which a Darwinian approach
to language may inspire, I would like to consider a few of the possible
and rather far-reaching implications of the story that has just been told
of English changes of vowel quantity.
As we have seen, they can be understood as adaptations. The selectional
pressures which selected for them were exerted upon English morph-
memes through the ways in which memes for English feet affected their
expressions. We have also argued that no morph-meme in the English
lexicon could be immune to the impact of these selectional pressures.
The selectional pressures that we defined resulted from the fact that
memes for English foot structure and timing were exceptionally stable
and successful replicators. This means that their expressions had to repli-
cate them well. Since foot-memes were resistant to evolutionary change,
all morph-memes which co-expressed with them were under pressure
to adapt to them. Of course all morph-memes have to co-express with
memes for foot structure and timing. In a language such as English, in
which a particular set of them is safely established, they must have prac-
tically amounted to an environmental constant. It is unlikely that their
impact should have been restricted to causing vowel quantity adjustments
in a number of morph-memes.
Of course, the high evolutionary stability of English memes for foot
structure and timing does not predict that all morph-memes should have
been under pressure to actually change. Many would already have been
sufficiently well adapted as they were – such as morph-memes of types
like have, warrant, or man. In some cases other pressures outweighed
the pressure from foot-memes. For instance, vowel height seems to have
selected against length and for shortness. This explains the stability of
short // and /υ / in words like OE sunu ‘son’ or scipu ‘ship’, or the short-
enings in words like OE fȳst ‘fist’ or dūst ‘dust’. In their case, foot-meme
based pressures can still be assumed to have selected for long vowels,
but the impact of vowel height outweighed their evolutionary effects.
Similarly, long vowels seem to have been selected against when they were
followed by voiceless obstruents. Thus, crack (< ME craken), drop (< ME
drope) and fret (ME < freten) have managed to survive with short vow-
els, even though foot-memes would have selected against them. In sum,
the environmental impact which memes for English feet had on English
morph-memes will have led to actual changes in morph-meme lineages
only when additional conditions were met.
Generalising from this, one may say that (a) factors which exert selec-
tional pressures on the memes in a population are unlikely to be involved
in only a single ‘change’, and (b) individual changes are unlikely to reflect
the impact of a single environmental factor only.
This suggests a new way in which the histories of languages might be
told. Traditionally, they have been understood as chronological sequences
of individual changes. Accounts typically focus on the constituents that
undergo a change. Sometimes, changes are seen as causally related to
each other, sometimes it is acknowledged that local changes appear to
‘conspire’ to have common, global effects, but relations among changes
are usually only investigated after the changes themselves are established
as unified events. The approach we have developed here opens another
possibility. Since we have argued that many linguistic changes will repre-
sent meme-to-meme adaptations, we may attempt to tell their story not
from the point-of-view of the meme-lineages that actually change, but
from the point-of-view of the memes which exert adaptive pressures on
them.

The memes for English feet are perfect candidates for such an attempt.
Since, for the reasons given in sections and 8.3.2 above, they have
been highly stable in the history of English, other memes on the English
replicator team will have been under a strong and long-lasting pressure to
adapt to them. Turning English into a stress-timing language in which the
initial syllables of major class morph-memes are likely to be expressed as
foot-heads, English foot-memes are likely to have caused many morph-
meme lineages to evolve variants that replicate well in their environment.
There ought to be many changes in the evolution of English which rep-
resent adaptations to memes for foot structure and timing. If there are,
this means that English might have many of its properties because its
constituents have responded adaptively to the memes behind English
utterance rhythm. This is an exciting perspective, because it suggests a
way of substantiating the widely shared intuition that English owes many
of its characteristics, such as the high number of monosyllables, the near-
absence of inflection, and ultimately even its syntactic ordering principles,
to the fact that Germanic word stress came to be fixed on the first sylla-
bles of root morphemes. As the next chapter will attempt to show, this
might be more than a well-invented just-so-story, and a Darwinian view
of linguistic evolution might help us to understand why.
9 The prosodic evolution of English word
forms or The Great Trochaic Conspiracy

9.1 Introduction
It is more or less a commonplace among historical linguists that many
of the characteristics of Present Day English have somehow followed
from the fact that in Germanic, the progenitor of English, word stress
came to be fixed on the first, leftmost, syllable of the root. One of the
consequences of this fixing is supposed to have been that word final, that
is, the rightmost, syllables first came to be phonetically backgrounded and
reduced, and then historically lost. This development has in turn been
adduced to explain not only the large number of monosyllables in the core
vocabulary of Present Day English, but is additionally supposed to have
furthered the loss of inflectional endings and thus the typological change
of English from an inflecting towards an isolating language. Therefore, it
can be considered indirectly responsible for the fixing of SVO word order
as well, because the latter appears to have been necessitated by the very
loss of morphological case marking, without which syntactic roles such
as subject and object could not be unambiguously indicated anymore.
Although the decisive role which the fixing of Germanic word stress
seems to have played in the evolution of English on almost all levels is
acknowledged by most linguists, the question of how exactly it
has exerted its influence has not really been addressed. There are many
possible reasons for that, of course, but one of the most important ones
is the methodological difficulty involved in causally relating a single specific
property that a language presumably acquired at an early historical
stage to the long-term typological development of one of its daughters,
unfolding itself over a period of almost one and a half millennia. The issue
appears almost too big to address, and therefore the role of fixed root ini-
tial stress in the long-term evolution of English has never really made
it beyond a pedagogically convenient myth. However, the evolutionary
framework which we have been developing in this volume may provide a
theoretical basis which is solid enough to justify a new look at the issue
and to investigate the mechanics behind typological conspiracies.

As indicated at the end of the last section, many of the developments
listed in the previous paragraph might be understood as adaptive
responses to pressures exerted by a set of memes (or memeplex) for feet
and the ways in which they were integrated in the population of memes
for English. Of course, it would exceed the limits of this volume to try to
reconstruct the mechanics behind all adaptations which this circumstance
may have caused in more or less direct ways.1 A few examples will have
to do for illustrating the principle. We shall therefore focus on the effects
which English foot-memes may have had on the phonological structures
of memes for English lexical morphs. This has the advantage that we are
already familiar with the mechanics behind the relevant interactions. As
we saw in the last sections, morph-memes are caused to adapt to their
statistically most probable rhythmic roles, and these reflect the impact
of foot-memes. All we need to do here, then, is to apply the theory to
other changes than those which lengthened or shortened vowels during
the Early Middle English period.
The account we have given of Early Middle English changes of vowel
quantity implies a prediction, which can, in principle, be tested. When
morph-memes for major class lexical items expressed, it usually happened
in such a way that their first syllables were expressed together with the
[S]-node of the memeplex for feet. It follows from this that there must
have been a fairly good chance for the expressions of morph-memes to
coincide with the expressions of foot-memes (but see the note on page 294
below). In other words, the expressions of morph-memes and the expres-
sions of foot-memes were often co-aligned. Since the preferred utter-
ance type as which the memeplex for feet expressed was the trochee
(a prominent syllable followed by a weak one) morph-memes would profit
from having phonological structures which replicated well when they were
expressed as trochees. Thus, the pressures exerted by the memeplex for
feet on morph-memes should, by and large, have selected for variants
which expressed well as trochees, or at least as feet which were
similar to trochees in terms of prominence structure and duration.
This suggests, first, that of the open set of potential sound changes
which might have occurred in the history of English, a significantly greater
number of those ought to have ‘made it’, whose ‘outputs’ were more
trochee-like than their ‘inputs’. This is because, whatever other pressures
may have selected for or against competing variants of morph-memes,
the memeplex for feet will always and independently have selected for
those which replicated better when expressed as trochees. After all, its
own stability was so great that other memes were more likely to adapt
to it than vice-versa. To the extent that the expression of morph-memes
coincided with the expression of a trochee, a tendency to become more
trochee-like ought to be empirically detectable in the evolution of their

1 An interesting paper on the effects which rhythm may exert on other aspects of a language
is Stampe/Donegan 1983.

Secondly, it suggests that compensatory changes should be expected
whenever independent factors introduced morph-meme variants into the
English meme pool which turned out to be badly adapted to foot-meme
pressures. The reason for this is what we have already observed: the stable
memeplex for English was not the only factor which affected the evolution
of morph-meme lineages. It caused or prevented actual changes always
in combination with, or against other, and sometimes quite independent
pressures. Recall, for instance, that vowel height seems to select strongly
against vowel length: this selectional pressure works in ways which are
clearly unrelated to the mechanics by which foot-memes select for or
against long vowels. Now, it is clearly conceivable that some independent
factor may select against morph-meme variants that are well adapted to
the foot memeplex, and have them ousted by competitors which are not as
good in that respect. What our view of meme–meme adaptation predicts
for such cases is that new variants will come under strong pressure from
foot-memes. Of course, the change which foot-independent factors had
caused is unlikely to be simply reversed by them. If that were an option,
it would not have occurred in the first place. What can be expected,
however, is that foot-memes will strongly select for variants which are
both compatible with the independent pressure, and better adapted to
foot-based pressures at the same time. In other words, we can expect
compensatory changes whenever a change produces morph-memes that
are worse trochees than their ancestors.
So, we expect memes that expressed as trochee-like word forms to have
become increasingly frequent in the history of English, and we expect
compensatory changes to have occurred whenever meme variants were
introduced that were suboptimal in that respect. One way of testing this
prediction, although admittedly a crude one, is to examine a represen-
tative number of well documented sound changes. To corroborate our
view, their outputs need to be better trochees than their inputs. If their
outputs are worse trochees, then we expect them to be accompanied
or closely followed by changes which restored some of their trochaic
qualities. Only if they are not will they speak against our hypothesis.
The following sections report the results of such a test. The predic-
tions we have formulated seem indeed to be borne out. If one takes a
bird’s-eye look, the evolution of English seems indeed to have selected
for memes that expressed and replicated well as trochees, while it seems
to have punished memes that did not by selecting for better adapted

9.2 The Great Trochaic Conspiracy

What does it mean for a foot to be ‘like’ a trochee? Basically, the similarity
of a foot to a trochee is determined by two parameters. Primarily, and
obviously, it depends on the number of weak syllables it contains: a foot
is the more similar to a trochee, the closer its number of weak syllables is
to one. Secondarily, the similarity of a foot to a trochee will also be deter-
mined by its overall metrical weight, that is to say the number of segments
in the rhymes of its syllables. Why is that? Of course, trochees, that is,
sequences of one strong and one weak syllable, will vary with regard to
their weight. They can theoretically consist of syllables of diverse struc-
tures. However, some syllabic configurations seem to be more stable and
thus more common than others (see for example Vennemann (1988),
or Dziubalska (1995)). Universally speaking, the best syllables are CV,
that is, sequences of a single short consonant and a vowel. However, the
phonotactic preference for CV syllables is counterbalanced by a tendency
of syllables which express as footheads to have complex rhymes.
Taking these two factors into account, one can expect the evolutionarily
most stable (and thus the most common) trochees to have structures
such as CVCV, CVCCV or CVVCV. Therefore, if a mora equals
a segment in the rhyme of a syllable, the weight of ‘good’ trochees will
vary between two and three moræ,2 if onset maximal (OM) syllabication
([CV][CV], [CVV][CV] or [CVC][CV]) is assumed. On the assumption
of general maximal (GM) syllabication, in which the intermediate C in a
CVCV configuration counts as ambisyllabic ([CV[C]V], [CVV[C]V] or
[CVC[C]V]) and weighs ½ mora, the weight of a typical trochee would
be 2½ or 3½ moræ. Thus, a monosyllable with a [CVV[C] structure
is more similar to a trochee than a monosyllable with a [CV[C] structure,
because – just like a typical trochee – the former weighs two moræ
(OM) or 2½ moræ (GM), while the latter weighs only one mora (OM)
or 1½ moræ (GM). By the same token, a trisyllabic item with the structure
[CV[C]V[C]V] (four moræ, GM) is more similar to a trochee than a
trisyllabic item with the structure [CVV[C]V[C]V] or [CVC[C]V[C]V]
(five moræ, GM).
2 In accordance with established terminology, weight will be calculated in moræ. However,
in an admittedly simplistic but nevertheless practical manner, one mora will simply be
defined as one segment in the rhyme of a syllable. Since one’s calculation of syllable
weight will depend on one’s theory of syllabication, and since I have no axe to grind in
this matter, I shall calculate weight alterations both on the basis of an onset maximal
syllabication (Wo), and on the basis of a general maximal one (Wg). Also, I shall count
ambisyllabic rhyme elements as weighing ½ mora, problematic though this may be.
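The weight calculation described above can be made concrete with a short sketch. It is an illustration only: the function name `mora_weight`, the tokenising pattern and the decision to read weights directly off bracketed CV skeletons are assumptions made for this example and are not part of the original account. Every rhyme segment counts as one mora, every ambisyllabic consonant (written [C]) as ½ mora, and onsets as well as extrametrical final consonants (an unclosed [C under onset maximal syllabication) count as nothing.

```python
import re

# Tokens of the bracket notation used in the text:
# "[C]" is an ambisyllabic consonant, "[" opens a syllable,
# "]" closes one, and "C" / "V" are single segments.
_TOKEN = re.compile(r"\[C\]|\[|\]|C|V")


def mora_weight(skeleton: str) -> float:
    """Return the weight in morae of a bracketed CV skeleton.

    Assumption (hypothetical helper): one mora per rhyme segment,
    half a mora per ambisyllabic consonant, nothing for onsets.
    """
    weight = 0.0
    in_rhyme = False  # True once the vowel of the current syllable is seen
    for token in _TOKEN.findall(skeleton):
        if token == "[C]":
            weight += 0.5      # shared between two syllables
            in_rhyme = False   # what follows belongs to the next onset
        elif token == "[":
            in_rhyme = False   # a new syllable starts with its onset
        elif token == "V":
            weight += 1.0
            in_rhyme = True
        elif token == "C" and in_rhyme:
            weight += 1.0      # coda consonant inside the rhyme
    return weight


# Onset maximal (OM) and general maximal (GM) readings of a CVCCV trochee
print(mora_weight("[CVC][CV]"))   # 3.0
print(mora_weight("[CVC[C]V]"))   # 3.5
# A [CV[C] monosyllable under GM
print(mora_weight("[CV[C]"))      # 1.5
```

The sketch takes the syllabication as given rather than computing it, because deciding where syllable boundaries fall depends on the theory of syllabication one adopts.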

Equipped with this admittedly crude method for determining how
trochee-like a foot is, let us look at some evidence. The question is whether the
outputs of sound changes affecting the quantity of English word forms
are indeed more similar to trochees than their inputs. If they are, then this
confirms the prediction which we derived from our explanation of Early
Middle English changes of vowel quantity: English morph-memes should
have been under a long-lasting pressure to adapt to the impacts of English
foot-memes. If the outputs of sound changes are less trochee-like than
their inputs, then we expect their effects to be undone by compensatory
changes which restored some of their trochee-like properties and thus re-
adapted them to English foot-memes. To check whether these predictions
hold, I have classified a representative sample of changes affecting the
quantity of English word forms with regard to the following parameters:
(a) σs: How a change affected the stressed or strong syllable:
here ‘S’ will mean strengthening, ‘W’ weakening, and
‘/’ that there was no change in that respect.
(b) σw: How a change affected (one of) the unstressed or weak
syllables: again, ‘S’ will mean strengthening, ‘W’ weakening,
‘D’ deletion, and ‘/’ that there was no change in
that respect.
(c) #σ: How a change altered the number of syllables in the
word form: a change from bisyllabic to monosyllabic
structure will be represented as ‘2 > 1’, and all others
accordingly.
(d) Wo and Wg: How a change altered the overall metrical weight of the
word form under Onset Maximal (Wo) and General
Maximal syllabication (Wg) respectively: a change from
two to three moræ will be represented as ‘2 > 3’, and
all others accordingly.
(e) →Σ: Whether the outputs of a change were more similar to
trochees in terms of the number of weak syllables in the
foot: ‘+’ will mean ‘yes’, ‘−’ will mean outputs were
in fact less similar to trochees than inputs, and ‘/’ will
mean that they were not different from inputs in that
respect.
(f) →W: Whether the outputs of a change were more similar to
trochees in terms of their overall metrical weight: again,
‘+’ will mean ‘yes’, ‘−’ will mean outputs were in fact
less similar to trochees than inputs, and ‘/’ will mean
that they were not different from inputs in that respect.
The parameters which sum up all the others, and which are ultimately
relevant to the question we are here addressing are (e) and (f). Readers
who prefer not to get lost in detail may want to focus on them when
reading the tables in the following sections.3
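How the two summary parameters might be derived from the others can be sketched as follows. This is one possible operationalisation, not a procedure stated in the text: the function names, the ‘good trochee’ weight ranges of 2 to 3 moræ (onset maximal) and 2½ to 3½ moræ (general maximal), and the rule for combining the two weight measures are all assumptions made for the example. The ASCII character '-' stands in for the ‘−’ used in the tables.

```python
def distance_to_range(value: float, low: float, high: float) -> float:
    """Distance of a value from the interval [low, high]; 0 if inside."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def compare(distance_in: float, distance_out: float) -> str:
    """'+' if the output is closer to the target, '-' if further, '/' if equal."""
    if distance_out < distance_in:
        return "+"
    if distance_out > distance_in:
        return "-"
    return "/"


def foot_similarity(syllables_in: int, syllables_out: int) -> str:
    """Parameter (e): a trochee has exactly one weak syllable."""
    weak_in = syllables_in - 1
    weak_out = syllables_out - 1
    return compare(abs(weak_in - 1), abs(weak_out - 1))


def weight_similarity(wo_in: float, wo_out: float,
                      wg_in: float, wg_out: float) -> str:
    """Parameter (f), combining both syllabications.

    Assumed rule: '+' if at least one measure improves and none
    worsens, '-' in the mirror-image case, '/' otherwise.
    """
    om = compare(distance_to_range(wo_in, 2.0, 3.0),
                 distance_to_range(wo_out, 2.0, 3.0))
    gm = compare(distance_to_range(wg_in, 2.5, 3.5),
                 distance_to_range(wg_out, 2.5, 3.5))
    marks = {om, gm}
    if "+" in marks and "-" not in marks:
        return "+"
    if "-" in marks and "+" not in marks:
        return "-"
    return "/"


# A trisyllabic form of 3 (OM) or 4 (GM) morae losing its final vowel
print(foot_similarity(3, 2), weight_similarity(3, 2, 4, 3))      # + +
# A disyllabic form of 3 (OM) or 3.5 (GM) morae losing its final vowel
print(foot_similarity(2, 1), weight_similarity(3, 2, 3.5, 2.5))  # - /
```

Treating a weight as unchanged in similarity whenever both input and output fall inside the assumed range is a simplification; a finer measure would need a stated optimum within that range.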

9.2.1 Germanic high vowel deletion

This change deleted word final high vowels, if they occurred either after
a strong syllable or after two syllables. (46) lists some typical examples
(cf. Lass 1994: 98ff.).

(46) wurm+i > OE wyrm (as opposed to win+e)

flo:ð +u > OE flood (as opposed to sun+u)

wered+u > OE wered

lirn+unγ+u > OE leornung
The change left stressed syllables as they were. It deleted unstressed syl-
lables in trisyllabic word forms, as well as in disyllabic word forms with
heavy first syllables. Trisyllabic word forms thus became disyllabic, and
some disyllabic word forms became monosyllabic. The overall metrical
weight of the word forms was reduced by one mora.
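The conditioning of the deletion can be written out as a small rule. The sketch below is a simplification under stated assumptions: word forms are supplied already syllabified (onset maximally) as lists of strings, a colon marks vowel length, ‘heavy’ is taken to mean a rhyme of at least two segments, and the names `rhyme_weight` and `high_vowel_deletion` are hypothetical. Only the deletion itself is modelled, not any accompanying change of vowel quality, and the light-stem inputs `wi.ni` and `su.nu` are assumed pre-deletion shapes for the contrast cases mentioned in (46).

```python
VOWELS = set("aeiouyæ")
HIGH_VOWELS = {"i", "u"}


def rhyme_weight(syllable: str) -> int:
    """Number of rhyme segments: everything from the first vowel on.

    A length mark ':' is counted as one segment of its own.
    """
    for index, char in enumerate(syllable):
        if char in VOWELS:
            return len(syllable) - index
    return 0


def high_vowel_deletion(syllables: list[str]) -> str:
    """Delete a word-final short high vowel after a heavy syllable
    or after two syllables; otherwise return the form unchanged."""
    *stem, last = syllables
    ends_in_short_high_vowel = last[-1] in HIGH_VOWELS
    after_heavy_syllable = len(stem) == 1 and rhyme_weight(stem[0]) >= 2
    after_two_syllables = len(stem) >= 2
    if ends_in_short_high_vowel and (after_heavy_syllable or after_two_syllables):
        # The stranded onset consonant simply stays at the end of the word.
        return "".join(stem) + last[:-1]
    return "".join(syllables)


print(high_vowel_deletion(["wur", "mi"]))         # wurm
print(high_vowel_deletion(["wi", "ni"]))          # wini (light stem, no deletion)
print(high_vowel_deletion(["we", "re", "du"]))    # wered
```

Taking syllabified input keeps the rule itself visible; an automatic syllabifier would have to encode language-specific onset restrictions that the sketch deliberately leaves out.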
As already mentioned above, the exact calculation of metrical weight
depends on the assumed theory of syllabication. On the assumption of
maximal onsets, rhymes (and therefore moræ) are minimised. The word
form wurmi must be syllabified [wur][mi]. The first syllable counts two
moræ because the rhyme is [ur], the second syllable counts one mora
because the rhyme is [i]. The overall weight of the word form is then
three moræ. Deletion of the final vowel would result in a bimoric word
form, because wyrm is syllabified [wyr][m, the final consonant being
extrametrical, as the theory requires. On a general maximal assumption, on the
other hand, wurmi is syllabified [wur[m]i]. Consequently, it weighs 3½
moræ, and the deletion of the final vowel would yield [wyr[m], weighing
2½ moræ.
3 In my choice of sound changes I have tried to be comprehensive. In fact, the only ones
that I have deliberately left out of the discussion are Old English Breakings (as in sterfan >
steorfan ‘to die’), because the quantities both of the vocalic nuclei they affected and
of those which they produced are highly disputable. Furthermore, I only deal with the
quantity of affected word forms, though in the widest sense. All aspects concerning the
quality of the affected segments or their environment are irrelevant to the discussion.
The resulting pictures of the changes are thus necessarily simplified and incomplete.
Finally, it needs to be pointed out that the morpho-syntactic status of English lexical
morph-memes has changed since Old English times from that of stems to base-forms.
This means that in Old English lexical morph-memes had to co-operate with morphs for
inflectional endings for purposes of expression, while their Middle and Modern English
descendants do not. One needs to take this into account, of course, when discussing the
feedback morph-memes are likely to receive from their expressions. Thus, when I
discuss early sound changes, the tables will occasionally include stable alliances of lexical
plus inflectional morph-memes, rather than unambiguously non-compositional memes.

The behaviour of this sound change with regard to the parameters
introduced above is summarised in table (47):

TYPE          σs  σw  #σ    Wo    Wg      →Σ  →W
*wurm+i       /   D   2>1   3>2   3½>2½   −   /
*flo:ð+u      /   D   2>1   3>2   3½>2½   −   /
*wered+u      /   D   3>2   3>2   4>3     +   +
*lirn+unγ+u   /   D   3>2   5>4   6>5     +   +

Where the change affected morph-memes for trisyllabic word forms, its
outputs were more similar to trochees – both in terms of syllable structure
as well as in terms of weight. For disyllabic forms this is not true. At least
in terms of weight, however, the outputs were still within the possible
weight range of trochees. Thus, as far as duration is concerned, the change
did increase the number of English word forms that were equivalent to
optimal feet. At least on the whole, it appears to be compatible with
the assumption that memes for word forms evolved to become better at
replicating as trochees.

9.2.2 Medial syncope4

The change deleted high vowels in the middle syllables of trisyllabic
word forms with heavy first syllables. A typical example is *xæuriðæ >
hīerde as opposed to *næridæ > nerede (cf. Lass 1994: 100f.). The
change left the stressed syllable as it was, deleted one unstressed syl-
lable, thereby reduced the syllables in the word form from three to
two, and reduced the overall weight of the word form by half a mora
under general maximal syllabication. Under onset maximal syllabication,
it did not alter the weight of the word form. Table (48) summarises the

TYPE          σs  σw  #σ    Wo   Wg     →Σ  →W
*xæur+i+ðæ    /   D   3>2   /    5>4½   +   +

The change obviously increased the number of trochaic word forms.
It is therefore in line with our prediction.

4 cf. Lass (1994: 98–102), from where also the examples are taken.

9.2.3 Old English shortening before consonant clusters

The change shortened long vowels in stressed syllables before clusters of
three consonants. Examples are gōdspell > godspell or brǣmblas > bræmblas.
As table (49) shows, neither were unstressed syllables affected by this
change, nor was the total number of syllables in the affected word forms.
However, the overall weight of the word forms was reduced by one mora,
so that they became more typical, in this respect, of trochees. Once again
our prediction is confirmed.
TYPE        σs  σw  #σ  Wo    Wg    →Σ  →W
Gōdspel     W   /   /   4>3   5>4   /   +
brǣmblas    W   /   /   4>3   5>4   /   +

9.2.4 Old English trisyllabic shortening

The change shortened long vowels in antepenultimate heavy stressed syl-
lables. Thus, it reduced the overall weight of the affected word forms by
one mora. An example would be sāmcucu > samcucu. The characteristics
of the change are summed up in table (50). Once more, the outputs of the
change were more typical trochees with regard to weight than its inputs
had been.