
The Broken Plural Morphological System in Arabic:
A Challenge to Natural Language Processing Models

Mr DENDANE Zoubir
Université de Tlemcen
Abstract:
This paper aims to show that the Arabic broken plural noun, as
labelled by traditional Arabic grammarians, is widely considered,
in both morphological and phonological circles, as the most
sophisticated system of nominal plurality. Its complex structure
consists of a great number of rules, owing to the overall
morphological patterning of the language and, in particular, to
its non-concatenative nature. The Arabic broken plural, sometimes
wrongly referred to as irregular, is therefore regarded as a
significant challenge to Natural Language Processing applications
and translation theory. Questions thus arise as to a) how to
devise approaches that identify the various plural types; and
b) how to develop algorithms that deal with nouns subjected to
internal modification of the singular form. In addition, it is
not easy to relate the numerous patterns of pluralisation to
meaning, particularly in cases where the input is the same
singular stem.

Revue Maghrébine des Langues 2010, N° 7, Oran (Algérie)

1. Introduction
The present paper examines one of the most complex morphological
structures that pervade the Arabic language system:
pluralisation, a system that excludes the dual noun, referred to
as almuθannaa in Arabic and classified as a separate category.
The Arabic plural system, which consists of a two-mode formation,
sound plural and broken plural as labelled by traditional Arab
grammarians, represents, with its structural configuration, an
immense challenge to both structural and generative
morphologists. While the morphological patterns of the sound
plural observe a straightforward regularity, the broken plural is
widely considered, both in morphological and phonological
circles, as the most sophisticated system of nominal plurality.
Its complex structure consists of a great number of patterns due
to the overall morphological patterning of the language, in
particular its non-concatenative nature, described in terms of
the interspersing of consonantal roots and vocalic melody
(McCarthy, 1981).
Wrongly referred to as irregular because of its large-scale
complex structure, the Arabic broken plural is regarded as a
significant challenge to Natural Language Processing applications
and translation theory, and thus questions arise as to
a) how to devise approaches to identify and categorize the
various plural types in a root-and-pattern based system;
b) how to develop algorithms that may allow automatic translation
of nouns subjected to internal modification of the singular form.
In the first section, we present an overall view of the Arabic
nominal plural system, starting with the sound plural, which is
generally formed by simple affixation, though some nouns are
subject to special rules. Then we put emphasis on the broken
plural with its numerous patterns and their apparent versatility,
although the system is substantially structured on the basis of
formal but complex regularities.
In the following section, we touch upon some idiosyncratic
features of the Arabic plural:
- the pluralisation issue from the semantic point of view; it is
indeed by no means easy to relate the numerous patterns to
meaning, particularly in cases where a singular stem yields two
or more plural forms, some being synonymous and some carrying
different meanings;
- other idiosyncrasies that characterize Arabic plurals: the
extra-pluralisation of a number of plural forms; the existence of
plural forms with no singular use; and that of singular nouns
with plural intent.
Throughout the paper, we refer to the encoding issues that NLP
researchers and computational linguists face in classifying
singular nouns and generating the corresponding plural forms, the
encoding of which could be of great help to the automatic
treatment of Arabic and to machine translation to and from
Arabic.
In the last section, we look at some ways broken
plural patterns apply to loan words from European
languages, particularly French, into Modern Standard
Arabic but mostly into Algerian Arabic dialects, in
contrast to those favouring the sound plural forms.
2. Pluralization in Arabic
Word formation and inflectional rules make up the backbone on
which natural language processing rests; thus the complexity or
simplicity of a given language is proportional to that of its
derivational and inflectional patterns. Morphological
descriptions of words usually fall under morpheme concatenation,
where each morpheme is made up of one or more segments, and where
words are made up of sequences of morphemes strung together in a
rigid linear order (McCarthy 1983:263) [1].
However, the Arabic language is known in phonological and
morphological circles as the most heavily inflected language
because, in addition to the regular concatenating operations
occurring at both ends of the word, it is characterized by
non-concatenative structures, based on a process termed
root-and-pattern morphology, the inter-digitation of consonantal
roots and vocalic patterns; this is particularly true of verb
patterns but also of the sophisticated noun pluralisation system
of this language. Some authors (e.g. Kiraz, 1996) have also
attempted to analyse Arabic plurals on the basis of prosodic
structure [2].

[1] In Dihoff (ed.) (1983), Current Trends in African Linguistics I.
[2] Prosodic structure, mostly represented in suprasegmental
features, is not considered in this paper.

Pluralization in Arabic incorporates several noun patterns, as
well as related adjectives [3], referring to more than two
entities (two units are part of almuθannaa, the dual system, as
in kitaabaan/kitaabayn [4], two books). However, the matter is
far from being as simple as this. The complexity of the Arabic
plural system is reflected in the profusion of theoretical work
on the topic (cf. Beesley 1990, Kiraz 1996, McCarthy 1990, etc.)
and presents at the same time an immense challenge to
computational research and NLP applications.
Two modes characterize the morphological structure of the Arabic
plural: al dʒamʕ as sālim, termed sound plural, and dʒamʕ
attaksīr, known as the broken plural.

[3] Much more than in English or French, Arabic adjectives often
have the same patterns as the related nouns, e.g. /karīm/ means
both generous and the generous, and thus are viewed like nouns
and subjected to the same rules of pluralisation, number and
gender.
[4] kitābān: nominative; kitābayn in both accusative and genitive
cases.

2.1 The sound plural
Just as in English, French, Spanish and other western languages,
the sound plural in Arabic is formed by means of morpheme
suffixation: {-ūn}/{-īn} is added to the stem in the masculine
depending on case, i.e. {-ūn} for the nominative, as in
muʕallimūn, teachers, and {-īn} in the accusative and genitive
cases, giving muʕallimīn. In the feminine, the morpheme
{-āt(un)} or {-āt(in)}, depending on the case, is added to the
noun stem to get muʕallimāt [5], women teachers. Interestingly,
the sound masculine plural is only used with rational beings,
like χabbāzūn, (men) bakers, while the sound feminine plural is
used with all types of beings and objects, e.g. kātibāt for
female writers and sajjārāt for cars, etc.
This seems to be more complicated than the mere addition of the
plural morpheme {-s} in English, French and Spanish nouns; but
there is much more to say about the usage of the sound plural, in
particular when we consider a few masculine nouns with feminine
plural forms, e.g. /imtiħān/ > pl. /imtiħānāt/, exams, or
feminine words like /sana(tun)/, a year, with a masculine plural
/sinīn/, in addition to its feminine plural form /sanawāt/.

[5] As it occurs in pausal form.

This last instance, /sanawāt/, shows that, for phonotactic
reasons, there may be some change in the nature of the consonant
or vowel preceding the affixes {-āt} and {-ūn}/{-īn}, in
particular when the noun comes from the so-called weak verbs [6]
such as /qaḍā/, to judge, whose singular masculine noun is qāḍī
with the plural qāḍūn, judges, or when a feminine noun ends with
a glottal stop, as in /samāʔ/ > pl. /samāwāt/, skies.
But these irregularities in sound plural formation are far from
being as complex as the challenging patterns of the broken plural
mode. Indeed, the morphological processes that govern the Arabic
broken plural are different in nature, as they proceed by
internal stem changes, giving forms so intricate that only
appropriate stemming algorithms can process their patterning.
It is worth recalling, however, that native speakers of Arabic
naturally acquire a number of such forms in their dialects,
saying, for instance, [ʔajjām] for days, on the pattern ʔafʕāl,
and never *[jawmūn].

[6] Weak verbs in Arabic have a long vowel or two as part of the
tri-consonantal root and are subject to changes in different
paradigms, e.g. qāl / jaqūl / qawl, said, says, a saying.
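The sound plural suffixes and the stem changes described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not an implementation drawn from the works cited: the function name `sound_plural`, the two lookup tables and the transliteration (macron for long vowels, ʕ and ʔ for the pharyngeal and the glottal stop) are choices made for the example, and the stem-change table lists only forms quoted in the text.

```python
# Illustrative sketch only: all names and tables are assumptions
# made for this example.

# Pausal-form suffixes of the sound plural, keyed by (gender, case).
SOUND_SUFFIXES = {
    ("masc", "nom"): "ūn",
    ("masc", "acc"): "īn",
    ("masc", "gen"): "īn",
    ("fem", "nom"): "āt",
    ("fem", "acc"): "āt",
    ("fem", "gen"): "āt",
}

# Stems whose final segment changes before the suffix (weak roots,
# final glottal stop). Only forms quoted in the text are listed.
STEM_CHANGES = {
    ("qāḍī", "masc"): "qāḍ",    # qāḍī  -> qāḍūn
    ("samāʔ", "fem"): "samāw",  # samāʔ -> samāwāt
}


def sound_plural(stem, gender, case="nom"):
    """Attach the sound plural suffix to a stem, after any stem change."""
    base = STEM_CHANGES.get((stem, gender), stem)
    return base + SOUND_SUFFIXES[(gender, case)]


print(sound_plural("muʕallim", "masc"))         # muʕallimūn
print(sound_plural("muʕallim", "masc", "gen"))  # muʕallimīn
print(sound_plural("samāʔ", "fem"))             # samāwāt
```

A table lookup is used rather than string rules because the text presents these stem changes as conditioned by the lexical class of the noun; a fuller treatment would derive them from the root type.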

2.2 The broken plural


While the sound plural is built on a quite regular basis
involving morpheme suffixation according to gender and case,
namely {-ūn(a)} and {-īn(a)} in the masculine, and {-āt(u)} vs.
{-āt(i)} for the feminine (e.g. /muʔminūn/ vs. /muʔmināt/,
believers masc. vs. fem. in the nominative case), the broken
plural is characterized by a great number of fixed templatic
constructions resulting from internal modification of the
singular stem: the infixation of vowels, or inter-digitation of a
vocalic melody with the consonantal skeleton, as in, for
instance, /bajt/ > pl. /bujūt/, homes, or /qamar/, moon >
/ʔaqmār/, on the patterns fuʕūl and ʔafʕāl, respectively.
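The inter-digitation of a consonantal root with a vocalic pattern, as just described, can be illustrated with a minimal Python sketch. The template notation is an assumption made for this example only: the digits 1, 2 and 3 stand for the first, second and third root consonants, and every other character is copied unchanged. The function name `interdigitate` is likewise hypothetical.

```python
# Illustrative sketch only: the digit-slot template notation and the
# function name are assumptions made for this example.

def interdigitate(root, template):
    """Fill the slots 1, 2, 3 of a template with the root consonants."""
    return "".join(
        root[int(ch) - 1] if ch in "123" else ch
        for ch in template
    )


bjt = ("b", "j", "t")
qmr = ("q", "m", "r")

print(interdigitate(bjt, "1a23"))    # bajt   (singular)
print(interdigitate(bjt, "1u2ū3"))   # bujūt  (pattern fuʕūl)
print(interdigitate(qmr, "1a2a3"))   # qamar  (singular)
print(interdigitate(qmr, "ʔa12ā3"))  # ʔaqmār (pattern ʔafʕāl)
```

The root is passed as a tuple rather than a string so that a consonant written with more than one character (for instance dʒ) can still occupy a single slot.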
The term broken plural might suggest the idea of irregularity and
exception, but it is in no way unusual in Arabic. Rather, a high
percentage of often-used nouns have broken plural forms. McCarthy
and Prince (1990b) write in this respect that "the sound plural
is in no way the regular or usual mode of pluralization.
Essentially all canonically-shaped lexical nouns of Arabic take
broken plurals, including many loans."

2.3 Broken plural categories


The broken plural is traditionally sub-divided into categories
representing three types of pluralization in terms of smaller or
greater numbers, and associated with specific patterns. The table
below displays the most often recognized types of plural, though
many are not commonly found, particularly in people's daily
speech.
Plural types | Paucity | Multiplicity | Ultimate pl.
Number | 3 to 10 | 11 and more | More than 1000
Number of patterns | 4 | 17 |
Patterns | ʔafʕul, ʔafʕāl, ʔafʕila(tun), fiʕla(tun) | ful, fuul, ful, fual, fial, fal, ful, fiala, etc. | fawil, fail, falil, fal, fal, etc.
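The number ranges given in the table can be encoded directly, for instance when a generation system has to choose a plural category from a numeral. The Python sketch below is an assumption-laden illustration: the function name `number_category` is hypothetical, and treating 1000 as the upper bound of the plural of multiplicity is an interpretation of the table, which gives "11 and more" and "more than 1000" without stating where one category ends.

```python
# Illustrative sketch only: the function name and the cut-off at 1000
# are assumptions read off the table, not a rule stated in the text.

def number_category(count):
    """Map a positive count onto the number categories discussed above."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    if count == 1:
        return "singular"
    if count == 2:
        return "dual"
    if count <= 10:
        return "plural of paucity"
    if count <= 1000:
        return "plural of multiplicity"
    return "ultimate plural"


for n in (2, 7, 40, 5000):
    print(n, number_category(n))
```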

It is interesting to bear in mind that among the broken plural
patterns found in the language, many escape morphological rules
and are classified and recorded in dictionaries as heard plurals,
i.e., traditionally heard among the people who were said to speak
Clear Arabic [7], al Lugha l Fuṣħā, in pre-Islamic and
post-Islamic times. One rule says that the plural of the two
patterns, or measures, faʕīl and fāʕil is fuʕalāʔ, as in karīm >
pl. kuramāʔ, and ʃāʕir > pl. ʃuʕarāʔ, generous and poets,
respectively. But Arabic speakers know that the plural forms of
ṣaɣīr, small, and kātib, writer, of the same patterns mentioned
above, are ṣiɣār and kuttāb, and cannot be of the measure
fuʕalāʔ. Such discrepancies and other irregularities add to the
complexity of the broken plural and to the challenge that
pluralisation in Arabic presents to NLP applications, information
retrieval and machine translation, but they have also been a
source of theoretical stimulation for a great number of
researchers. McCarthy and Prince (1990b), for instance, have
succeeded in enhancing theoretical issues by showing that "The
broken plural, then, makes a full, systematic use of the
categories and operations provided by the theory of prosodic
morphology, providing a particularly interesting test case and a
robust new source of evidence for the theory."

[7] We prefer Clear Arabic, as Classical Arabic does not render
the real meaning of al Lugha l Fuṣħā (fuṣħā = eloquent).
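The interplay between a general rule (faʕīl and fāʕil taking fuʕalāʔ) and the "heard" plurals that escape it suggests a rule-plus-exception design for plural generation. The Python sketch below illustrates that design under stated assumptions: the digit-slot template notation, the two tables and the function names are hypothetical, and the exception list contains only the two nouns quoted in the text.

```python
# Illustrative sketch only: notation, tables and names are assumptions
# made for this example.

def fill(root, template):
    """Fill the slots 1, 2, 3 of a template with the root consonants."""
    return "".join(
        root[int(ch) - 1] if ch in "123" else ch
        for ch in template
    )


# Default rule: singular template -> plural template.
DEFAULT_PLURAL = {
    "1a2ī3": "1u2a3āʔ",  # faʕīl -> fuʕalāʔ
    "1ā2i3": "1u2a3āʔ",  # fāʕil -> fuʕalāʔ
}

# "Heard" plurals that override the default rule (from the text).
HEARD_PLURALS = {
    "ṣaɣīr": "ṣiɣār",
    "kātib": "kuttāb",
}


def broken_plural(root, singular_template):
    """Look up a recorded plural first; otherwise apply the default rule."""
    singular = fill(root, singular_template)
    if singular in HEARD_PLURALS:
        return HEARD_PLURALS[singular]
    return fill(root, DEFAULT_PLURAL[singular_template])


print(broken_plural(("k", "r", "m"), "1a2ī3"))  # kuramāʔ
print(broken_plural(("ʃ", "ʕ", "r"), "1ā2i3"))  # ʃuʕarāʔ
print(broken_plural(("ṣ", "ɣ", "r"), "1a2ī3"))  # ṣiɣār
print(broken_plural(("k", "t", "b"), "1ā2i3"))  # kuttāb
```

Checking the exception list before applying the rule mirrors the way such plurals are recorded in dictionaries: the lexicon, not the pattern, has the final word.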

2.4 Other characteristics of the broken plural


In addition to the complexity of the morphological organization
of the Arabic broken plural system, there are a number of
features that have to be mentioned:
- Allomorphy: two or more plural forms used for the same singular
noun; e.g. ʕajn > pl. ʔaʕjun and ʕujūn, eyes, the second being
the only one used as a plural of ʕajn with the meaning water
spring. Similarly, the word ʔamr has two plural forms according
to meaning: ʔawāmir for commands and ʔumūr for issues. Soudi et
al. (2002) write: "For a given singular pattern, two different
plural forms may be equally frequent, and there may be no way to
predict which of the two a particular singular will take."
- Collective noun: a noun known in Arabic as ismul dʒamʕ, bearing
a singular form but used to convey a plural meaning, as in
/al ʔibil/ camels, /an naml/ ants or even /aṭṭifl/ children in a
Quranic verse [8] where the related verb takes the plural
inflection suffix {-ūn}. Collective nouns are mostly used to
denote a group of animals or plants, but also people, in words
like /qawm/, which itself has a plural form /ʔaqwām/, like
peoples in English.
- Pluralisation of the plural is another peculiarity of the
Arabic system; indeed, some plural nouns can take an augmented
plural form to represent a large number: e.g. /bujūt/ homes >>
/bujūtāt/, a great number of homes; /dʒimāl/ >> /dʒimālāt/,
camels.

[8] Qur'an, sūrat an-Nūr (24), verse 31.
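Because the choice between competing plural forms may depend on meaning, a generation or translation system needs a lexicon keyed by sense as well as by singular form. The following Python sketch shows one possible encoding of the allomorphy examples above; the dictionary layout, the sense labels and the function name `plural_candidates` are assumptions made for illustration.

```python
# Illustrative sketch only: the lexicon layout, sense labels and
# function name are assumptions made for this example.

PLURALS_BY_SENSE = {
    "ʕajn": {
        "eye": ["ʔaʕjun", "ʕujūn"],
        "water spring": ["ʕujūn"],
    },
    "ʔamr": {
        "command": ["ʔawāmir"],
        "issue": ["ʔumūr"],
    },
}


def plural_candidates(singular, sense=None):
    """Return the plural forms for a sense, or all forms if none is given."""
    senses = PLURALS_BY_SENSE[singular]
    if sense is not None:
        return list(senses[sense])
    candidates = []
    for forms in senses.values():
        for form in forms:
            if form not in candidates:
                candidates.append(form)
    return candidates


print(plural_candidates("ʕajn", "water spring"))  # ['ʕujūn']
print(plural_candidates("ʔamr"))                  # ['ʔawāmir', 'ʔumūr']
```

When the sense is unknown, the function returns every recorded form, which leaves disambiguation to a later stage such as a translation model with access to context.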
2.5 Borrowing pluralisation
The Arabic root-and-template morphology is so productive and
flexible that it allows the adaptation of nouns borrowed from
English and French to the various plural patterns, both sound and
broken, according to their syllable structure and prosodic
morphology (McCarthy 1981, 1983; McCarthy and Prince 1990b;
Kiraz, 1996). Thus, it appears that the loanword takes a plural
form on the basis of its morpho-phonological structure. One
instance used in MSA is kawālīs, from French coulisses, on the
pattern fawāʕīl, as if the word were of a tri-consonantal root
{k-l-s}. Another example is the word film, whose CvCC pattern
requires its pluralisation as ʔaflām, on the same pattern as
Arabic ħukm > ʔaħkām, judgments.
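The way a loanword such as film is assigned a plural from its consonant-vowel skeleton can be sketched in Python as follows. The sketch rests on several assumptions: each character is taken to be one segment, the vowel inventory is limited to the symbols listed, and the function names are hypothetical. It covers only the CvCC to ʔafʕāl mapping illustrated in the text; other CvCC nouns take other patterns (for example bajt > bujūt), so the choice remains lexical.

```python
# Illustrative sketch only: names, the vowel set and the one-character-
# per-segment assumption are choices made for this example.

VOWELS = set("aiuāīū")


def cv_skeleton(word):
    """Return the consonant-vowel skeleton of a transliterated word."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


def afaal_plural(word):
    """Map a CVCC singular onto the ʔafʕāl pattern (film -> ʔaflām)."""
    if cv_skeleton(word) != "CVCC":
        raise ValueError("expected a CVCC singular, got " + cv_skeleton(word))
    c1, _vowel, c2, c3 = word
    return f"ʔa{c1}{c2}ā{c3}"


print(cv_skeleton("film"))   # CVCC
print(afaal_plural("film"))  # ʔaflām
print(afaal_plural("ħukm"))  # ʔaħkām
```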

But there are many borrowings which may be found in both sound
and broken plurals, and others only in the sound feminine form.
The French word l'auto (the car) may take both forms, lotojāt and
lwata. Interestingly, with another French noun, chambre (room),
while the addition of the suffix {-āt} giving the feminine sound
plural ʃambrāt does not alter the stem, in its broken version
ʃnābər the consonant /m/ is realized [n], as if the insertion of
a vowel between /m/ and /b/ prevented assimilation, just like the
Arabic noun [dʒamb], side or flank, whose plural form is [dʒnāb]
in many dialects.
3. Conclusion
We have attempted, in this paper, to highlight the complexities
of pluralisation in Arabic, in particular the broken plural
system, which is basically formed by some internal modification
of the singular noun stem, as in kitāb > pl. kutub. Based on
root-and-pattern morphology, it may remind us of the very few
irregular English plural forms like feet or mice that are subject
to some internal change. But the Arabic broken plural is far more
complex, with its highly productive character, and is reflected
in a great number of patterns representing a real challenge to
NLP research and linguistic theory.
Although traditional Arabic grammarians already described the
system in great detail and did much to categorize the different
types of plural, further research is required to fully understand
the intricacies of the whole plural system, particularly in
computational linguistics, which in turn needs to be supported by
theoretical progress in the description of Arabic as both a
concatenative and a non-concatenative language.

_____________________
References
- Kiraz, G. A. (1996). Analysis of the Arabic Broken Plural and
Diminutive. In Proceedings of the 5th International Conference
and Exhibition on Multi-Lingual Computing.
- McCarthy, J. (1981). A Prosodic Theory of Nonconcatenative
Morphology. Linguistic Inquiry 12: 373-418.
- McCarthy, J. (1983). A Prosodic Account of Arabic Broken
Plurals. In L. Dihoff (ed.), Current Trends in African
Linguistics I. Dordrecht: Foris, pp. 263-289.
- McCarthy, J. and Prince, A. (1990a). Foot and Word in Prosodic
Morphology: The Arabic Broken Plural. Natural Language and
Linguistic Theory 8: 209-282.
- McCarthy, J. J. and Prince, A. (1993, 2001). Prosodic
Morphology: Constraint Interaction and Satisfaction. University
of Massachusetts, Amherst, and Rutgers University.
- Soudi, A., Cavalli-Sforza, V. and Jamari, A. (2002). Arabic
Noun System Generation. In Proceedings of the Arabic Processing
Conference, University of Manouba, Tunisia.
- Troyer, M. (2006). Broken plural formation in Moroccan Arabic.