
Morphological Analyzer of Minangkabau Derivational Affixes: A Theoretical Approach

Nur Rosita

ABSTRACT
Languages of the Malayo-Polynesian family played an important role in the development of the
Indonesian language, especially during its initial formation. One of them is Minangkabau, which
contributed much of the Indonesian vocabulary. Minangkabau is an Austronesian language of the
Indonesian type, spoken in West Sumatra by approximately six million speakers (Ethnologue:
2007), as well as in parts of North Sumatra, Aceh, Riau, Jambi, and Bengkulu, and further
afield in Negeri Sembilan, Malaysia, owing to the Minangkabau tradition of merantau
(outmigration). Despite its large number of speakers and the spread of Minangkabau people
throughout the Indonesian Archipelago, Minangkabau remains under-described compared with
other Indonesian-type languages such as Javanese and Sundanese. This paper seeks to improve
current understanding of Minangkabau and to inform readers interested in the local languages
of Indonesia by developing a theoretical basis for a morphological analyzer of Minangkabau
derivational affixes, with a finite-state automaton as the underlying model. The topic has
received little attention to date and is poorly described compared with Standard Indonesian.
The data were taken from source texts available online, supplemented by the writer's judgments
as a native speaker, and the analysis takes into consideration the findings of recent
typological and theoretical studies of Austronesian languages.

Key words: Austronesian language, Minangkabau language, Morphological Analyzer, Derivational
Affixes.

INTRODUCTION
1. Background of the Study
Much research has been carried out at the intersection of linguistics and computer
technology, a field now known as computational linguistics. This research concerns how
natural language is processed. One strand is the automatic analysis of the morphemes of a
particular language, known as morphological analysis.
Morphological analysis is the process of studying and analyzing the structure and
formation of words in a particular language. It is among the most important tasks of natural
language processing and can be considered the first step in any natural language processing
system, since it gives basic insight into a language by showing how the grammatical forms of
words are distinguished and generated. It also involves choosing a tag set to describe the
grammatical categories of word forms.
As technology has developed rapidly, much research on morphological analysis has been
carried out for many languages, such as the Indian languages, Arabic, English, Korean,
Chinese, and Japanese, in order to build efficient morphological analyzers that automatically
analyze input word forms and produce a specific output. Any Natural Language Processing (NLP)
application for any language starts with the development of a Morphological Analyzer or Word
Analyzer, which analyzes the inflected word and provides information such as the root word or
stem and the constituent morphemes from which the original word was constructed. In
Indonesia, several studies and projects have addressed this issue, such as MorphInd and
Indmorp, which analyze Standard Indonesian at the level of morphology. Yet comparable work on
Indonesia's local languages has not been carried out, even though many of the other languages
spoken throughout the country may be endangered and need to be analyzed.
One of these languages is Minangkabau. Many aspects of Minangkabau grammar have played
no small role in typological studies, since it is classified as an agglutinative language.
In linguistic typology, an agglutinative language is one in which verbs usually do not
conjugate and words are derived from other words by mechanically appending affixes to an
invariable stem. The treatment of word derivation is therefore one of the important problems
in the morphological analysis of such languages, Minangkabau among them. Minangkabau is a
morphologically complex language in which almost every word can take affixes, and building
morphological analyzers for highly inflected languages is difficult but crucial. It is
therefore necessary to develop Minangkabau corpora from which lexical information can be
extracted automatically at the morphophonemic and morphosyntactic levels.
In this paper, the writer presents a linguistic analysis for a very simple morphological
analyzer that can be implemented in practice by someone with programming expertise. The
paper proposes a simple theoretical framework for a Minangkabau morphological analyzer that
handles morphological analysis and lemmatization for a given surface word form, so that the
output is suitable for further language processing. It consists of both morphosyntactic and
morphophonemic rules for Minangkabau derivational surface words, with a detailed tag set
inspired by MorphInd, developed by Larasati et al.
2. Aim and Objective
The aims of this paper are to introduce the Minangkabau language with regard to its
derivational affixes and to develop a morphological analyzer for those affixes by providing
the theoretical framework for its application.
3. Materials and Methods
The primary material used in this paper is a set of Minangkabau source texts available
online whose sentences contain derivational affixes. The study proceeded step by step: the
writer first collected the data, then analyzed them systematically by formulating and
explaining the forms of Minangkabau derivational affixes, and finally constructed the
finite-state automaton for those forms as the base framework for the further development of
a morphological analyzer. The analysis draws on the writer's own knowledge and understanding
as a native speaker of Minangkabau and on observation of the language, supported by
references from books, journals, and articles.
4. Scope and Limitation
The scope of this study is the derivational affixes of the Minangkabau language. It is
limited to explaining a theoretical approach and framework for developing a morphological
analyzer of Minangkabau derivational affixes that can be applied in further research in
computational linguistics and natural language processing. The study gives a simple, basic
way to take any Minangkabau derivationally affixed word as input and elaborates the output
through finite-state morphological parsing.
5. Significance of the study
This study is expected to contribute to a deeper understanding of how derivational
affixes are formed in Minangkabau, one of the vernacular languages of Indonesia, and to
provide a theoretical framework from which a morphological analyzer can be built. It should
therefore be useful for further research in computational linguistics and natural language
processing aimed at developing a Minangkabau morphological analyzer, an issue that no
scholar has studied so far.
6. Definition of Key Terms
1. Minangkabau language: the native language of the Minangkabau people, who mainly live in
the area now known as West Sumatera, a province on the island of Sumatera.
2. Morphological Analyzer: a program that analyzes the morphology of an input word; it
detects the morphemes of any text.
3. Derivational Affixes: affixes by means of which one word is formed (derived) from
another, often of a different word class from the original.

REVIEW OF RELATED LITERATURE

1. Minangkabau Language
Besides the standard national language, Indonesian, many varieties are spoken locally as
mother tongues, while others function as languages of wider communication. These languages are
scattered throughout the Indonesian archipelago. Minangkabau is one of the languages of wider
communication, spoken throughout West Sumatra, the western part of Riau, South Aceh Regency,
and the northern parts of Bengkulu and Jambi, as well as in several cities throughout Indonesia
by Minangkabau migrants. The language is also a lingua franca along the western coastal region
of the province of North Sumatra, and is even used in parts of Aceh, where it is called Aneuk
Jamee. It is also spoken in some parts of Malaysia, especially Negeri Sembilan, and is now
spoken as a mother tongue by a growing number of people in urban areas of West Sumatra
Province. It was originally written in the Jawi script, a modified Arabic alphabet; the
Romanization of the language dates from the 19th century.
Minangkabau has a relatively simple morphological and syntactic structure. Verbs are
generally unmarked for tense and aspect, and nouns are generally unmarked for number and
definiteness. The basic word order in the clause is SVO, with the head preceding the modifier
in the phrase. The lexicon and phonology of the language are very similar to those of
Standard Indonesian.
Minangkabau has a number of dialects, found in various regions and at times even in
neighboring villages. They include Padang Panjang, Agam, Aneuk Jamee (Jamee), Batu
Sangkar-Pariangan, Kerinci-Minangkabau, Orang Mamak, Pajokumbuh, Pancuang Soal (Muko-Muko),
Penghulu, Sijunjung, Singkarak, Tanah, and Ulu. The dialect most commonly used among
Minangkabau people to communicate with Minangkabau speakers from different areas is the
Agam-Tanah Datar dialect (Baso Padang or Baso Urang Awak "our (people's) language").

2. Morphological Analyzer

The history of morphological analysis dates back to the ancient Indian linguist Pāṇini,
who formulated the 3,959 rules of Sanskrit morphology in the text Aṣṭādhyāyī by using a
constituency grammar. The Greco-Roman grammatical tradition also engaged in morphological
analysis. Studies in Arabic morphology, conducted by Marāḥ al-arwāḥ and Aḥmad b. ‘alī
Mas‘ūd, date back to at least 1200 CE. These studies became the scaffolding for later
researchers to build the machine-based tool known as a morphological analyzer. A
morphological analyzer is the computational implementation of the human ability to analyze a
language: it segments a word into morphemes and returns each stem together with the affixes
attached to it. It is a computer program that analyzes words belonging to natural languages
and produces their grammatical structure as output. The computer takes a word as input and
analyzes it using the given resources and algorithm.
The morphological analyzer is not a new invention in computational linguistics. Its
history began when Turing's model of algorithmic computation was introduced in 1936. Early
algorithms for morphological parsing used a bottom-up affix-stripping approach, as in
Packard's 1973 parser for Ancient Greek, and were followed by AMPLE (A Morphological Parser
for Linguistic Exploration) (Weber and Mann, 1981; Weber et al., 1986; Hankamer and Black,
1991). Karttunen built a program called KIMMO, based on Koskenniemi's model, in 1983.
Two-level or other finite-state models of morphology have been worked out for many languages,
such as Turkish (Oflazer, 1993) and Arabic (Beesley, 1996). More recent studies have covered
numerous languages, including the Indian languages, Japanese, Korean, French, German, and
many others.
Eryiğit and Adalı (2004) proposed analyzing Turkish words with an affix-stripping approach
that uses no lexicon. The rule-based and agglutinative structure of the language allows
Turkish to be modeled with finite state machines (FSMs). Theoretically, a corpus is
(C)apable (O)f (R)epresenting (P)otentially (U)nlimited (S)elections of texts. A study of an
inflectional morphological analyzer for Sanskrit proposes an analyzer that identifies and
analyzes inflected noun forms and verb forms in any given sandhi-free text. The study "An
Ambiguity-Controlled Morphological Analyzer for Modern Standard Arabic Modeling Finite State
Networks" notes that morphological ambiguity is a major concern for syntactic parsers, POS
taggers, and other NLP tools. For example, the greater the number of morphological analyses
given for a lexical entry, the longer a parser takes to analyze a sentence and the greater
the number of parses it produces. Xerox Arabic Finite State Morphology and the Buckwalter
Arabic Morphological Analyzer are two of the best-known, well-documented morphological
analyzers for Modern Standard Arabic (MSA). In a study of a rule-based morphological analyzer
for Classical Tamil text, the analyzer identifies the root and suffixes of a word and assigns
its grammatical categories.
Rule-based approaches are reported to generate more accurate results. A rule-based
morphological analysis relies on a set of rules and a dictionary that contains root words and
morphemes. The study "A Novel Approach for English to Dravidian Language Translation System"
developed a statistical machine translation system from English to South Dravidian languages
such as Malayalam and Kannada by incorporating syntactic and morphological information. A
bilingual corpus was used to extract data for translating from one language to another.
In Indonesia, several morphological analyzers have been introduced, for instance MorphInd
by Larasati et al. (2011). MorphInd is a robust finite-state morphology tool for Indonesian
that handles both morphological analysis and lemmatization for a given surface word form, so
that the output is suitable for further language processing. It consists of morphosyntactic
and morphophonemic rules for Indonesian derivational and inflectional surface words. It uses
a positional tagset with 3 different morphological tags and a special lemma tag that directly
follows the lemma. MorphInd has wider coverage of Indonesian derivational and inflectional
morphology than an existing Indonesian morphological analyzer, along with a more detailed
tagset in the analysis it outputs. The implementation uses finite-state technology, adopting
the two-level morphology approach implemented in Foma. It achieved 84.6% coverage on a
preliminary-stage Indonesian corpus, where it mostly failed to capture proper nouns and
foreign words, as initially expected. In light of these studies, the writer considers that an
efficient morphological analyzer for the derivational affixes of Minangkabau needs to be
developed, since no study of this issue has yet been done.

3. Morpheme-Derivational Morphology

A word is made up of meaningful units (morphemes). A morpheme is a minimal unit of
meaning. A morpheme can be realized as one phoneme, such as the plural /s/, or as more than
one phoneme, such as cat /kæt/. Some morphemes, called lexical morphemes, have meaning in and
of themselves; others, called grammatical morphemes, specify the relationship between one
lexical morpheme and another. A morpheme that can meaningfully occur alone is called a free
morpheme or a root, for example ambiak and bukak. A bound morpheme, however, must occur with
at least one other morpheme. For example, the morpheme -an in the word ambiakan cannot stand
alone; it needs a free form. Thus, Robin (1989:196) classifies bound morphemes as affixes and
free morphemes as roots.
Jurafsky and Martin (2007) state that derivational morphology is the combination of a
word stem with a grammatical morpheme, usually resulting in a word of a different class with
a meaning that is hard to predict exactly. Affixation is one phenomenon of derivational
morphology, in which an affix is added to the front or the end of a word. An affix is also
called a bound grammatical morpheme and can be subdivided into prefix and suffix, depending
on whether it is attached to the beginning of a lexical morpheme, as in depress, or to its
end. Verhaar (1999:107) states that affixation is the most important morphological process.
He classifies affixes into four types: (a) prefix, added at the beginning of a word or base;
(b) suffix, added at the end of a word or base; (c) infix, inserted into the word or base;
and (d) confix (also called simulfix, ambifix, or circumfix), added at both the beginning and
the end of a word.
Chaer (2003:177-195) says that three elements are involved in the affixation process: the
base form, the affix, and the resulting grammatical meaning. According to the position at
which the affix attaches to the word, he subdivides affixes into prefix, suffix, infix,
confix, interfix, and transfix, as well as ambifix and circumfix. He adds that several
processes are involved in morphology: affixation, reduplication, composition, internal
modification, conversion, suppletion, compounding, and the productivity of morphemes. In line
with this, Wikipedia (2009:1) divides affixes into eleven types: prefix, suffix/postfix,
infix, circumfix, interfix, duplifix, confix, transfix, simulfix, suprafix, and disfix.
Prefix and suffix are extremely common terms, while infix and circumfix are less so, because
not all languages, especially European languages, use them in morphological processes.

Regarding Minangkabau, Hoa Nio (1979) says that the language has two morphological
processes: affixation and reduplication. Morphologically, words in Minangkabau also fall into
two classes: mono-morphemic words and poly-morphemic words. According to Moussay (1998:66-68),
Minangkabau has 24 prefixes: ba-, bar-, di-, ka-, maN-, pa-, paN-, par-, sa-, ta-, tar-,
baka-, baku-, bapa-, bapar-, basi-, dipa-, dipar-, mampa-, mampar-, mampasi-, tapa-, tapar-,
and tasi-, for example in the words ba-salang, bar-anak, di-agiah, and ka-andak. There are
also 5 suffixes: -an, -i, -kan, -lah, and -nyo, for example buai-an, sakik-i, kalah-kan,
io-lah, and elok-nyo. In addition, Minangkabau has 50 confixes and 2 infixes, -am- and -um-;
for instance, pamuncak comes from puncak and turun tumurun comes from turun.
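The affix inventory above can be illustrated with a short program. The following is a minimal sketch in Python of lexicon-free affix stripping, not the analyzer proposed in this paper. The function name candidate_splits and the min_stem threshold are hypothetical, and the nasal prefixes maN- and paN- are left out because their surface shape depends on the root.

```python
# Prefixes and suffixes as quoted from Moussay (1998) in the text above.
# maN- and paN- are skipped here: their final nasal changes with the root,
# so they cannot be matched as literal strings.
PREFIXES = [
    "ba", "bar", "di", "ka", "pa", "par", "sa", "ta", "tar",
    "baka", "baku", "bapa", "bapar", "basi", "dipa", "dipar",
    "mampa", "mampar", "mampasi", "tapa", "tapar", "tasi",
]
SUFFIXES = ["an", "i", "kan", "lah", "nyo"]


def candidate_splits(word, min_stem=3):
    """Return every (prefix, stem, suffix) split that the affix lists allow.

    min_stem is an assumed lower bound on stem length (hypothetical value).
    An empty string stands for "no prefix" or "no suffix".
    """
    word = word.lower()
    splits = []
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if len(stem) >= min_stem:
                splits.append((prefix, stem, suffix))
    return splits


for w in ["basalang", "buaian", "kalahkan"]:
    print(w, candidate_splits(w))
```

For kalahkan the sketch returns the intended split kalah + -kan alongside spurious ones such as ka- + lah + -kan, which suggests why a root lexicon is normally needed to filter the candidates.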
In conclusion, a morpheme is a short segment of language that meets three criteria. First,
it is a word or part of a word that carries meaning. Second, it cannot be divided into smaller
meaningful parts without violating its meaning or leaving meaningless remainders. Finally, it
recurs in differing verbal environments with a relatively stable meaning. Words can be formed
in several ways, and word formation in one language may be the same as or different from that
in another: one language may use suffixes, infixes, and prefixes to form its words, while
another may use only suffixes and prefixes. Word formation is thus a way of forming words by
adding one or more bound morphemes to a free morpheme.

DATA PRESENTATION AND CONCLUSION


1. Data Presentation
This part presents the data and an example of morphological analysis for derivational
affixes in Minangkabau. It is limited to a sample of the data, since morphological analysis is
a complex process. The analysis uses a positional tagset with 3 different morphological tags
and a special lemma tag that directly follows the lemma. The complete tagset is given in
Table 1 and Table 2. Other derivational affixes can be analyzed in the same way by following
the examples below.
Ph. wakmangirimannyo (ph. I send him/her)

Awak<p>_PS1     maN+kirim<v>+an_VSA     inyo<p>_PS3
proclitic                               enclitic
I               send                    him/her

Labels: lemma position (Awak, kirim, inyo); morphological tag (PS1, VSA, PS3)

Figure 1. Output Structure


The surface word form is followed by 1 to 3 morphological tag(s). The lemma tag directly
follows the lemma, so that the lemma can be easily recognized for lemmatization purposes (see
the lemma position in the figure above). Extra chunks, such as clitics (proclitic and
enclitic) or particles, are analyzed as independent surface word forms but glued to the main
chunk by a plus sign (+).
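The output structure described above can be read back mechanically. The following Python sketch, offered only as an illustration, splits an analysis string into chunks and picks out the lemma, lemma tag, and morphological tag of each chunk. The function name parse_analysis and the regular expressions are assumptions made for this sketch, based solely on the format shown in Figure 1.

```python
import re

# A chunk is a morpheme sequence, an underscore, and a tag of 1 to 3
# characters, followed by "+" (next chunk) or the end of the string.
CHUNK_RE = re.compile(r"(.+?)_([A-Z][A-Z0-9-]{0,2})(?:\+|$)")
# The lemma is the morpheme that carries a one-letter lemma tag, e.g. kirim<v>.
LEMMA_RE = re.compile(r"^(.+)<([a-z])>$")


def parse_analysis(analysis):
    """Split an analysis string into a list of chunk dictionaries."""
    chunks = []
    for body, tag in CHUNK_RE.findall(analysis):
        morphemes = body.split("+")
        lemma = lemma_tag = None
        for morpheme in morphemes:
            hit = LEMMA_RE.match(morpheme)
            if hit:
                lemma, lemma_tag = hit.groups()
        chunks.append({
            "morphemes": morphemes,
            "lemma": lemma,
            "lemma_tag": lemma_tag,
            "tag": tag,
        })
    return chunks


for chunk in parse_analysis("Awak<p>_PS1+maN+kirim<v>+an_VSA+inyo<p>_PS3"):
    print(chunk)
```

Because every chunk ends in an underscore and a tag, the plus signs inside a chunk (between affixes and lemma) can be told apart from the plus signs that glue clitics to the main chunk.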
Table 1. Morphological Tagset
1st Position                 2nd Position             3rd Position
N  Noun                      P  Plural                   Feminine
                             S  Singular                 Masculine
                                                      D  Non-Specified
P  Personal Pronoun          P  Plural                1  First Person
                             S  Singular                 Second Person
                                                      3  Third Person
V  Verb                      P  Plural                A  Active Voice
                             S  Singular                 Passive Voice
C  Numeral                   C  Cardinal Numeral
                             O  Ordinal Numeral
                                Collective Numeral
   Adjective                 P  Plural                   Positive
                             S  Singular                 Superlative
   Coordinating Conjunction
   Subordinating Conjunction
   Foreign Word
   Preposition
   Modal
   Determiner
   Adverb
   Particle
   Negation
   Interjection
   Copula
   Question
   Unknown
   Punctuation

Table 2. Lemma Tagset
Lemma Tag
n  Noun
p  Personal Pronoun
v  Verb
c  Numeral
q  Adjective
h  Coordinating Conjunction
s  Subordinating Conjunction
   Foreign Word
   Preposition
m  Modal
b  Determiner
d  Adverb
t  Particle
g  Negation
i  Interjection
o  Copula
w  Question
x  Unknown
z  Punctuation

This section shows several output examples. Below is a phrase example with a proclitic and
an enclitic:
ph. Wakmangirimannyo   (ph. I deliver him/her)   yields   Awak<p>_PS1+maN+kirim<v>+kan_VSA+inyo<p>_PS

In some derivational cases, the lexical category of the lemma can differ from the lexical
category of the whole surface form, as shown in the examples below:
v. kirim        (v. deliver)    yields   kirim<v>_VSA
v. mangirim     (v. deliver)    yields   maN+kirim<v>_VSA
n. kiriman      (n. package)    yields   kirim<v>+an_NSD
n. pangiriman   (n. delivery)   yields   paN+kirim<v>+an_NSD
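The four derivational examples above can be reproduced with a small generate-and-test analyzer. The Python sketch below is an illustration under stated assumptions, not a compiled finite-state transducer and not MorphInd: the names LEXICON, PATTERNS, attach_nasal_prefix, and analyze are hypothetical, the lexicon holds only the root kirim, and the nasal rule covers only k-initial roots, the single case attested in the examples (maN + kirim gives mangirim).

```python
# Toy lexicon: root -> lemma tag. Only the root used in the examples above.
LEXICON = {"kirim": "v"}

# (prefix, suffix, morphological tag) patterns taken from the four examples.
PATTERNS = [
    (None, None, "VSA"),
    ("maN", None, "VSA"),
    (None, "an", "NSD"),
    ("paN", "an", "NSD"),
]


def attach_nasal_prefix(prefix, root):
    """Realize maN-/paN- on a root; returns None for cases not covered here.

    Only the k-initial rule is implemented: N surfaces as "ng" and the
    root-initial k is dropped (maN + kirim -> mangirim).
    """
    if root.startswith("k"):
        return prefix[:-1] + "ng" + root[1:]
    return None


def generate(root, prefix, suffix):
    """Build the surface form for one root and one affix pattern."""
    surface = root
    if prefix is not None:
        surface = attach_nasal_prefix(prefix, root)
        if surface is None:
            return None
    if suffix is not None:
        surface += suffix
    return surface


def analyze(word):
    """Return all analyses whose generated surface form equals the word."""
    analyses = []
    for root, lemma_tag in LEXICON.items():
        for prefix, suffix, tag in PATTERNS:
            if generate(root, prefix, suffix) != word.lower():
                continue
            parts = [prefix] if prefix else []
            parts.append(f"{root}<{lemma_tag}>")
            if suffix:
                parts.append(suffix)
            analyses.append("+".join(parts) + "_" + tag)
    return analyses


for w in ["kirim", "mangirim", "kiriman", "pangiriman"]:
    print(w, "yields", analyze(w))
```

Generating every candidate and comparing it with the input keeps the morphophonemic rule in one direction only; a fuller analyzer would need the remaining nasal assimilation rules and a larger lexicon, neither of which is given in the text.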

Below are plural surface word forms. There are also several special plural cases formed
with an infix, which are hard-coded in the dictionary:
n. gerigi       (n. teeth)      yields   gerigi<n>_NPD
n. gigi-gigi    (n. teeth)      yields   gigi<n>_NPD
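The plural treatment above can be sketched in a few lines of Python. This is only an illustration: the function name analyze_noun is hypothetical, the word is assumed to be already known as a noun, and the table of infixed plurals contains just the one form given in the text.

```python
# Infixed plurals listed directly, as the text says they are hard-coded in
# the dictionary. Only the form attested above is included.
INFIXED_PLURALS = {"gerigi"}


def analyze_noun(word):
    """Tag a noun as plural (NPD) or singular (NSD).

    Full reduplication of the form X-X is analyzed as the plural of X.
    The lemma tag <n> is an assumption: the caller must already know
    that the word is a noun.
    """
    word = word.lower()
    if word in INFIXED_PLURALS:
        return f"{word}<n>_NPD"
    left, sep, right = word.partition("-")
    if sep and left and left == right:
        return f"{left}<n>_NPD"
    return f"{word}<n>_NSD"


for w in ["gerigi", "gigi-gigi", "Kaco-kaco", "kaco"]:
    print(w, "yields", analyze_noun(w))
```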

Below are examples of numeral-noun agreement:
n. 2 kaco          (n. 2 mirrors; lit n. *2 mirror)       yields   2<c>_CC- kaco<n>_NSD
n. duo kaco        (n. two mirrors; lit n. *two mirror)   yields   duo<c>_CC- kaco<n>_NSD
n. Kaco-kaco       (n. mirrors)                           yields   kaco<n>_NPD
n. *2 kaco-kaco    (lit n. 2 mirrors)                     yields   2<c>_CC- kaco<n>_NPS

Below are examples of numeral alternation:
num. 2        (num. 2)        yields   2<c>_CC-
num. duo      (num. two)      yields   duo<c>_CC-
num. ke-2     (num. second)   yields   ke+2<c>_CO-
num. kaduo    (num. second)   yields   ka+duo<c>_CO-
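The numeral alternation above follows a simple pattern that can be sketched in Python. The function name analyze_numeral and the set CARDINAL_WORDS are hypothetical; the set holds only duo, the single numeral word attested in the text, so this is an illustration rather than a complete numeral analyzer.

```python
# Numeral words known to the sketch; only "duo" appears in the text above.
CARDINAL_WORDS = {"duo"}


def analyze_numeral(token):
    """Return the cardinal (CC-) or ordinal (CO-) analysis, or None."""
    t = token.lower()
    # Cardinals: digit strings or listed numeral words.
    if t.isdigit() or t in CARDINAL_WORDS:
        return f"{t}<c>_CC-"
    # Ordinals written with digits: ke-2 -> ke+2.
    if t.startswith("ke-") and t[3:].isdigit():
        return f"ke+{t[3:]}<c>_CO-"
    # Ordinals written out: ka- plus a numeral word, kaduo -> ka+duo.
    if t.startswith("ka") and t[2:] in CARDINAL_WORDS:
        return f"ka+{t[2:]}<c>_CO-"
    return None


for tok in ["2", "duo", "ke-2", "kaduo"]:
    print(tok, "yields", analyze_numeral(tok))
```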

2. Conclusion
This analysis produces robust morphological information in its output format, i.e.
morphemic segmentation, lemma morpheme position, lexical category, and morphological
features. The tagset it uses, with its broader categorization, is also suitable for further
language processing such as parsing. With a good selection of lexical entries, choosing the
most frequent and productive lemmas, a deeper and wider analysis becomes possible.
It must be acknowledged that the number of studies of Minangkabau, one of the local
languages of Indonesia, is still far below expectations, especially in the field of
computational linguistics. The writer hopes that those who care about the survival of the
Minangkabau language will work to develop it from every aspect of linguistic analysis.

BIBLIOGRAPHY
Isman, Jakub, et al. 1978. Kedudukan dan Fungsi Bahasa Minangkabau di SUMBAR. Jakarta:
Pusat Pembinaan dan Pengembangan Bahasa, Departemen Pendidikan dan Kebudayaan.
Jufrizal. 2007. Tipologi Grammatikal Bahasa Minangkabau: Tataran Morfosintaksis. Padang:
Universitas Negeri Padang Press.
Jurafsky, D. and James H. Martin. 2006. Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics, and Speech Recognition.
New Jersey: Prentice Hall.
Larasati, S. D. et al. 2011. Indonesian Morphology Tool (MorphInd): Towards an Indonesian
Corpus. SFCM 2011, Zurich, Switzerland. To appear in the Springer CCIS proceedings
of the Workshop on Systems and Frameworks for Computational Morphology.
Moussay, G. 1998. Tata Bahasa Minangkabau. Jakarta: Gramedia.
Nio, Be Kim Hoa, et al. 1979. Morfologi dan Sintaksis Bahasa Minangkabau. Jakarta: Pusat
Pembinaan dan Pengembangan Bahasa, Departemen Pendidikan dan Kebudayaan.
Song, J. J. 2014. Linguistic Typology: Morphology and Syntax. London: Taylor and Francis.
Viklund, Andreas. 2008. Didaktika Bahasa Minangkabau: Persoalan Ragam dan Konotasi
Bahasa. Retrieved from http://www.wordspress.com on 27th April 2016.
https://en.wikipedia.org/wiki/Minangkabau_language
http://www.ethnologue.com/18/language/min/
