
2009 Second International Symposium on Knowledge Acquisition and Modeling

WordNet-based Way to Identify Chinglish in Automated Essay Scoring Systems

Wen Zhuge 1, Jingyu Hua 2


1 Foreign Language College, Hangzhou Dianzi University, Hangzhou, China
2 College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
e-mail: chirs3@163.com, eehjy@163.com

Abstract: Recent years have witnessed the success of Automated Essay Scoring (AES) in Western countries. However, its further adoption in China is limited by the negative transfer that affects non-native learners. This paper therefore proposes a new way to identify Chinglish, which obstructs the development of AES in China. The WordNet-based method starts from the semantic relations between English verbs, calculates the semantic distances between subjects and objects, and then identifies Chinglish by means of a threshold. Experiments conducted at one university show that the proposed method performs well in identifying Chinglish in college students' English essays.

Keywords: automated essay scoring; identification of Chinglish; WordNet; semantic calculation

1. INTRODUCTION

Writing has long been considered an effective way to measure a language learner's proficiency. It accounts for a large proportion of classroom teaching as well as of proficiency tests. However, the traditional way of scoring an essay not only costs a great deal of labor and material resources, but is also strongly affected by the scorers' language ability and personal preferences, which reduces the credibility of the test [1]. Automated essay scoring (AES), based on corpus and artificial intelligence technology, can perform this task with high efficiency and impartiality [2]. With labor and resources saved, language testing is entering an era in need of AES systems.

Currently, a large number of AES systems already exist overseas, such as PEG, IEA, and IntelliMetric, which are based on shallow parsing, latent semantic analysis, and a combination of artificial intelligence and statistics, respectively. They already score essays with relatively high accuracy [3-7]. However, these systems are oriented toward native English speakers and thus neglect the problems caused by the negative transfer of non-native learners. As far as Chinese learners are concerned, Chinglish sentences, i.e. hybrids of the Chinese way of thinking and English syntax, should be given priority in research on AES technologies.

Because Chinglish sentences conform to English syntax, it is difficult for the common syntactic analyzers employed by AES systems to identify these special linguistic phenomena. Using the semantic network provided by WordNet, this paper starts from the semantic relations between English verbs, calculates the semantic distances between subjects and objects, and then identifies Chinglish by means of a threshold.

The paper is organized as follows: Section 2 clarifies the definition of Chinglish and the obstacle it poses to AES systems. Section 3 then presents a semantic structure suited to identifying Chinglish. Finally, Section 4 reports the experimental results and draws conclusions.

2. CHINGLISH IN ESSAYS

2.1 Definition of Chinglish

Transfer is a term used by psychologists in their accounts of the way in which present learning is affected by past learning [8]. Odlin has suggested that transfer is the influence resulting from the similarities and differences between the target language and any other language that has been previously (and perhaps imperfectly) acquired [9]. When learning a foreign language, an individual already knows his or her mother tongue, so the knowledge and skills of the native language are transferred to the foreign language unconsciously. Since the way one thinks determines the way one speaks, the differences between the native and the foreign language in linguistic structure, social culture and logical thinking predetermine the existence of negative transfer in second language learning.

The Han people put more emphasis on the whole when they think, which in turn results in an analytical language, Chinese, while the British and Americans, who give more weight to reason and analysis, create a synthetic language, English. This difference in ways of thinking leads directly to the appearance of Chinglish.

Generally speaking, Chinglish appears in two forms: vocabulary and theme [10]. The former is caused by the non-equivalence of the concepts evoked by certain words in Chinese and English, whose meanings are simply taken for granted within one culture. Take the sentence "We played very happily in the party last night" as an example. In Chinese, the concept of "play" covers almost every form of entertainment in daily life; its counterpart in English, however, varies: English speakers play computer games, have fun at a party, amuse themselves with books, or enjoy a movie very much. The latter results from confusing a subject with a theme. The subject of an English sentence is usually a person or a thing, represented by a noun, pronoun, or noun phrase, whereas the subject position of a Chinese sentence can be filled by almost any language element, as in "Our home has a dog."

2.2 Obstacles Caused in AES

As mentioned above, Chinglish sentences are syntactically correct, so AES systems based mainly on syntactic analysis will misjudge these sentences.

978-0-7695-3888-4/09 $26.00 © 2009 IEEE
DOI 10.1109/KAM.2009.322
Burstein and Chodorow [11] ran a comparison experiment with E-Rater, in which essays were collected from both native and non-native learners and scored by teachers and by E-Rater. Their results are shown in Table I.

TABLE I. COMPARISON OF ESSAY SCORES GIVEN BY TEACHERS AND E-RATER

                 Teachers                        E-Rater
Mother tongue    Average   Standard deviation    Average   Standard deviation
Arabic           3.83      0.973                 3.67      0.947
Chinese          4.09      0.884                 4.12      1
Spanish          3.96      0.986                 3.7       0.915
American         4.96      0.624                 4.93      0.814

Table I shows that only the essays from Chinese learners receive higher average marks from E-Rater than from teachers (4.12 versus 4.09). To account for this, Ge and Chen pointed out that Chinese learners' large vocabulary masks their misuse of sentence structure, thereby misleading the AES system and reducing its accuracy [12].

3. WORDNET-BASED WAY TO IDENTIFY CHINGLISH

3.1 Semantic Relations in WordNet

In December 2006, WordNet released its latest edition, WordNet 3.0, for Unix systems. The WordNet project was started in 1985 by a group of psycholexicologists and linguists at Princeton University. As an electronic dictionary based on psychological research, WordNet effectively combines traditional lexicographical knowledge with modern computer technology. It uses ordinary spelling, which is familiar to anyone with some knowledge of English, to represent word forms, and introduces synsets to stand for word senses.

The most ambitious feature of WordNet, however, is its attempt to organize lexical information in terms of word senses rather than word forms. In that respect, WordNet resembles a thesaurus more than a dictionary. Its intricate yet quite clear representation of hyponymy, which is realized by pointers, chains and lists in its database, creates a hierarchical semantic system, or an inheritance system for words. Thus, one can easily trace the hypernym, hyponym, co-hyponym and even holonym and meronym of a given word through the WordNet browser.

All the nouns in WordNet form a single hierarchical tree whose root is entity, because inheritance underlies their semantic relations. Adjectives and adverbs are organized into satellite synsets around antonym pairs, since bipolarity is their most distinctive feature. Altogether, there are 25 categories in WordNet, which can be further grouped into 11 classes: entity, abstraction, psychological feature, natural phenomenon, activity, event, group, location, possession, shape and state. Similarly, verbs, which are scattered across 15 semantic domains, weave a network of entailment, and this network covers almost all the semantic relations among verbs [13-14].

3.2 Semantic Description of the New Method

There is always a semantic database behind an AES system, and its scheme directly affects the credibility of the system. Adding collocation information, together with a description of the semantic features required of the subject and object of a given verb, on top of the platform provided by WordNet helps to identify Chinglish. For instance, to tag one semantic entry of the verb "play" (participate in games or sport), the semantic feature of its subject (on the tree under {living thing}) and of its object ({game}, {device}) should be included.

According to Case Grammar, arguments such as agent, patient, location, instrument and benefactive are all attached to verbs. For most verbs, the subject, object and benefactive are what researchers care about most. We therefore exclude the other semantic information from our scheme and describe the network between nouns and verbs as in Table II.

TABLE II. RELATION BETWEEN NOUNS AND VERBS

Vocabulary   Part of speech   Semantic category   Agent          Patient
house        n.               location            human          none
dog          n.               animal              living thing   food, living thing
father       n.               human               entity         entity
wash         v.               change              living thing   object
have         v.               possession          living thing   entity
walk         v.               change              living thing   location

Take "walk" as an example. It is marked as a change verb in our scheme, and its agent should be a human being or an animal, i.e. something capable of walking. Besides, "walk" is an intransitive verb in English, so it cannot be followed by a patient other than a location. Thus, "I walked into the dog" is incorrect while "I walked into the house" is acceptable. This scheme is quite effective in identifying typical Chinglish sentences.

Because of the limited space of this paper, the construction of such a semantic database is not described here. All the analysis below is performed with the database built from the semantic dictionary of English verbs based on WordNet and FrameNet [15].

3.3 Identification of Chinglish

With the help of the noun categories and the network of hypernyms and hyponyms provided by WordNet, the new method starts from the semantic relations between verbs. By calculating the semantic distances between subjects and objects, Chinglish can be identified by means of a threshold. We apply the method to the two example sentences from Section 2.1.

Our home has a dog.

The syntactic analyzer tells us that "has" is the verb in this sentence. Matching it with its stem "have" in the database gives the semantic restrictions on its agent and patient, namely living thing and entity respectively. Then, the method looks up the semantic categories of the agent
and patient in the current sentence, i.e. home and dog. WordNet returns the search result as:

    08559508 15 n 02 home 4 place 6 005 @ 08558963 n 0000 + 02537960 v 0201 + 00477661 a 0102 + 02005347 v 0101 ~ 08559766 n 0000 | where you live at a particular time; "deliver the package to my home"; "he doesn't have a home to go to"; "your place or mine?"

Here @-> means "is a hyponym of", so the synset {home, place} has a hypernym numbered 08558963, i.e. the synset {residence, abode}. As long as @-> does not point to the root {entity}, the search keeps going. Ultimately, we get:

    home, place @-> residence, abode
        @-> address
        @-> geographic point, geographical point
        @-> point
        @-> location
        @-> object, physical object
        @-> physical entity
        @-> entity

The subject is located under the node {physical entity}, whereas, according to the semantic description of the subject of "have" stored in the database, the subject should be on the tree under the node {living thing}. Both {physical entity} and {living thing} are on the second layer of a semantic tree. In an ontology, the higher two nodes sit in the tree, the more remote they are from each other. So the subject "home" and the verb "have" do not match semantically. That is to say, although this sentence is grammatically correct, it is semantically improper.

We played very happily in the party last night.

Because "play" is used as an intransitive verb here, we can use the place noun "party" to perform the calculation. After calculation, we get:

    party @-> affair, occasion, social occasion, function, social function
        @-> social event
        @-> event
        @-> psychological feature
        @-> abstraction
        @-> abstract entity
        @-> entity

Comparing the semantic category that "party" falls into, i.e. {abstract entity}, with the categories required of the object of "play", i.e. {game} and {device}, shows that these nodes also belong to different trees, and therefore Chinglish is found here. Using the semantic categories and semantic relations provided by WordNet is thus an efficient way to identify Chinglish in an essay. We can analyze the structure of a sentence with a syntactic analyzer, match the semantic description of the verb's collocation information against its subject or object, and calculate the semantic distance to make a judgment. The whole process is summarized in Figure 1.

4. CONCLUSION

This paper proposed a new way to identify Chinglish, based on WordNet and semantic information description. As the examples show, the method can effectively identify structurally correct Chinglish. We also ran experiments with college students in Zhejiang Province, China. In 300 randomly collected sample essays written by freshmen majoring in communication technologies, English teachers identified 147 Chinglish sentences, and our method successfully identified 93 of them (about 63%). The new method can be used as a supplement to existing AES systems. In the future, merging in the structure of FrameNet and matching its semantic concepts with HowNet could strengthen the method and help improve the accuracy of existing AES systems.

ACKNOWLEDGMENT

This paper is sponsored by the Zhejiang Provincial Science Program (2007C21026) and the Zhejiang Natural Science Fund (Y1090645).

REFERENCES

[1] J. F. Wen, "A reliable scoring system for norm-referenced tests of English writing," Journal of Guangzhou University (Social Science Edition), vol. 2, no. 11, Nov. 2003, pp. 84-87, doi: cnki:ISSN:1671-394X.0.2003-11-020.
[2] L. Streeter, J. Psotka, D. Laham, and D. MacCuish, "The credible grading machine: Automated essay scoring in the DoD," http://www.k2a2t.com/papers/CredGrad2ing2002.pdf, 2003/2006-03-20.
[3] S. L. Ge and X. X. Chen, "An Overview of Current Automated Essay Scoring Techniques," Media in Foreign Language Instruction, no. 117, Oct. 2007, pp. 25-29, doi: CNKI:SUN:WYDH.0.2007-05-006.
[4] M. Hearst, K. Kukich, L. Hirschman, E. Breck, M. Light, J. Burge, L. Ferro, T. K. Landauer, D. Laham, and P. W. Foltz, "The debate on automated essay grading," IEEE Intelligent Systems, vol. 15, no. 5, Sept./Oct. 2000, pp. 22-37.
[5] S. Elliot, "IntelliMetric: from here to validity," Automated essay scoring: a cross disciplinary perspective, Lawrence Erlbaum Associates, 2003, pp. 71-86.
[6] T. Landauer, D. Laham, and P. Foltz, "Automated Essay Scoring and Annotation of Essays with the Intelligent Essay Assessor," Automated essay scoring: a cross disciplinary perspective, Lawrence Erlbaum Associates, 2003, pp. 87-112.
[7] S. Valenti, F. Neri, and A. Cucchiarelli, "An overview of current research on automated essay grading," Journal of Information Technology Education, vol. 2, 2003, pp. 319-330.
[8] J. Miao and X. G. Ni, "A Theoretical Study of Language Transfer," Journal of Inner Mongolia College of Education, vol. 13, no. 4, Dec. 2000, pp. 12-17, doi: cnki:ISSN:1008-7451.0.2000-04-004.
[9] T. Odlin, Language Transfer: Cross-Linguistic Influence in Language Learning, Shanghai: Shanghai Foreign Language Education Press, Oct. 2001.
[10] Y. S. Zhao and N. N. Liu, "Error Analysis about College English Writing based on the Difference between Chinese Thinking and English Thinking," Journal of Weifang Educational College, vol. 21, no. 2, 2008, doi: CNKI:SUN:WFJY.0.2008-02-042.
[11] J. Burstein and M. Chodorow, "Automated essay scoring for nonnative English speakers," Joint Symposium of the Association of Computational Linguistics and the International Association of Language Learning Technologies, Workshop on Computer-Mediated Language Assessment and Evaluation of Natural Language Processing, College Park, Maryland, June 1999.
[12] S. L. Ge and X. X. Chen, "Automatic Scoring for Chinese EFL learners," Proceedings of the 3rd Students' Workshop on Computational Linguistics, 2006.
[13] Q. X. Chen, "WordNet: an on-line Thesaurus," Applied Linguistics, no. 2, 1998, pp. 93-99.
[14] C. Fellbaum, WordNet: an electronic lexical database, Cambridge, MA, USA: MIT Press, 1998.
[15] W. Zhuge, "The Construction of Semantic Dictionary of English Verbs Based on WordNet and FrameNet," unpublished.

Figure 1. The process of identifying Chinglish
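
The process named in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper does not define its semantic-distance formula or its threshold, so the sketch substitutes a plain subsumption test (does a required category appear among the noun's hypernyms?). The chains for "home" and "party" are copied from Section 3.3; the chains for "dog" and "house" are abbreviated assumptions based on the categories in Table II, and all identifiers (HYPERNYMS, RESTRICTIONS, steps_to_category, violations) are hypothetical.

```python
# Hypernym chains, nearest ancestor first. "home" and "party" follow the
# chains printed in Section 3.3 (first lemma of each synset); "dog" and
# "house" are abbreviated ASSUMPTIONS based on the categories in Table II.
HYPERNYMS = {
    "home": ["residence", "address", "geographic point", "point",
             "location", "object", "physical entity", "entity"],
    "party": ["affair", "social event", "event", "psychological feature",
              "abstraction", "abstract entity", "entity"],
    "dog": ["animal", "living thing", "physical entity", "entity"],
    "house": ["location", "physical entity", "entity"],
}

# Selectional restrictions per verb, after Table II and the "play" entry
# described in Section 3.2. Each role maps to its admissible categories.
RESTRICTIONS = {
    "have": {"agent": {"living thing"}, "patient": {"entity"}},
    "walk": {"agent": {"living thing"}, "patient": {"location"}},
    "play": {"agent": {"living thing"}, "patient": {"game", "device"}},
}


def steps_to_category(noun, allowed):
    """Count hypernym steps from noun up to the nearest allowed category.

    Returns 0 if the noun itself is an allowed category, and None if no
    allowed category is reached before the chain ends.
    """
    for depth, node in enumerate([noun] + HYPERNYMS[noun]):
        if node in allowed:
            return depth
    return None


def violations(verb, agent=None, patient=None):
    """Return the roles whose noun falls outside the verb's restriction."""
    broken = []
    for role, noun in (("agent", agent), ("patient", patient)):
        if noun is None:
            continue  # role not present in the sentence
        if steps_to_category(noun, RESTRICTIONS[verb][role]) is None:
            broken.append(role)
    return broken


if __name__ == "__main__":
    print(violations("have", agent="home", patient="dog"))  # Our home has a dog.
    print(violations("play", patient="party"))              # played ... in the party
    print(violations("walk", patient="house"))              # walked into the house
    print(violations("walk", patient="dog"))                # walked into the dog
```

Under these assumptions the first call reports the agent and the second the patient, in line with the paper's judgment on the two example sentences, while the last two calls reproduce the house/dog contrast from Section 3.2. A fuller version would replace the hand-written chains with lookups in WordNet and the subsumption test with whatever distance and threshold the semantic database defines.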
