The Past, Present and Future of Machine Translation

Michał Szewczyk, University of Warsaw
Abstract. The aim of this paper is to investigate the current condition and possible future development of Machine Translation (MT). An attempt is made to answer the question of whether, and to what extent, computers could replace humans in the process of translation. The paper begins with a concise outline of the history of machine translation. The following section analyses the main approaches to MT and the problems likely to be encountered in the process, based on a comparison of human and machine translations of selected passages. Having conducted the analysis, the author highlights the prospects of machine translation and considers its capacity to support, and possibly replace, human translators in their work.
I. Introduction
Machine Translation, also referred to by the abbreviation MT, is a field of computational linguistics concerned with the use of computers in the process of translating messages from one natural language into another. Over more than sixty years of its development, it has been the subject of scientific ventures and heated debates among linguists, computer scientists, engineers, psychologists and philosophers. Its beginnings trace back to the Cold War, when the competition between the United States and the USSR created the need for extensive translation of documents from English into Russian and vice versa. Since then, MT has been developed with the aim of facilitating and accelerating the work of human translators, and even of eliminating the human factor from the process of translation in the future. The author of this paper attempts to answer the question of whether such an objective is attainable and what may be expected of MT on the basis of its history and present status. Since contemporary civilisation is based on information and the need for cross-cultural, international cooperation between people is greater than ever, this issue seems particularly interesting and relevant.
II. Historical background

Modern Machine Translation began in the middle of the twentieth century, when the global political situation and the onset of the Cold War created the need to translate vast numbers of documents between English and Russian. The starting point of MT is often considered to be the so-called Weaver memorandum of 1949. The letter, written by Warren Weaver, then a vice president of the Rockefeller Foundation, was distributed among people potentially interested in the development of MT. Although it had a predominantly strategic meaning, it covered several important methodological and theoretical problems, such as the question of ambiguity, the logical foundations of language and the analysis of linguistic universals. The memorandum gave the impulse to begin research on MT at a number of American universities. As a result, the first scientific conference was held in 1952, and two years later a public demonstration of machine translation took place. The event, known as the Georgetown experiment, involved fully automated translation of more than sixty sentences on a variety of subjects from Russian into English. The experiment was successful and proved influential, since it encouraged the government to invest in computational linguistics and stimulated research in MT outside the United States, notably in the USSR. Research intensified in the 1950s and in the first half of the following decade. However, the initial enthusiasm waned over time, since exceeding the standards established in the Georgetown experiment proved unexpectedly problematic. The quality of fully automated translations declined as the vocabulary and the set of grammatical rules expanded. Furthermore, researchers encountered problems with word choice in cases of multiple meaning and with ambiguous semantic structures. Bar-Hillel, one of the most influential figures in the MT field at that time, recommended combining automated translation with human post-editing, and although this policy was implemented in many ongoing projects, MT faced increasing criticism.
Finally, in 1966 the Automatic Language Processing Advisory Committee (ALPAC) of the National Academy of Sciences published a report critical of the current state of MT. The report recommended limiting financial support for further research. As it pointed out, the cost of machine-supported translation was higher than that of purely human translation. The report caused American efforts in MT to be greatly reduced for the following fifteen years. Nevertheless, MT continued to be developed in other countries, which resulted in successful projects such as TAUM-METEO, which translated weather reports from English into French, and SYSTRAN. The field of machine translation gradually revived in the United States and all over the world, and in the 1990s it became one of the most vital domains of computational linguistics.
III. Strategies and approaches to Machine Translation
Since the beginnings of Machine Translation coincided with the dawn of the modern computer era, MT pioneers had to surmount numerous obstacles of a purely technical nature. The first objective was to devise an automated bilingual dictionary suitable for machines of very limited storage and computational capacity. One method of reducing dictionary size involved dividing words into stems and endings, so as not to include all the inflected forms of nouns and verbs. This triggered the first systematic morphological research for the purpose of translation, yet the analysis quickly proved overly complex for some languages, such as German or Russian. Thus, initially, automated dictionaries contained all the inflected word forms. This forced researchers to adopt simplified strategies of machine translation, most notably the word-for-word approach. It involved finding the equivalents of the Source Language (SL) words in the Target Language (TL) and substituting them without taking morphological analysis or word order into account. Obviously, this method was not expected to produce coherent or even comprehensible translations. However, it may be useful in translating lists of phrases, such as short catalogues or inventories. Although it is possible to devise procedures for basic structural analysis and rearrangement of words within dictionary-based systems, producing high-quality translation requires the investigation of phrase and clause relationships. Another serious problem is polysemy, as a word may have multiple equivalents in the TL, depending on context. Some words may function as different parts of speech with no formal distinction, e.g. the English word record. Also, some languages make more subtle distinctions in meaning than others, e.g. the English verb to know may be translated into French either as savoir or as connaître. Syntactic issues are especially stressed in the classic, rule-based approach to MT.
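The word-for-word strategy and the stem-and-ending dictionary described above can be illustrated with a minimal sketch. It is not a reconstruction of any historical system: the English-to-French dictionary entries, the list of endings and the function names are all assumptions made for the sake of the example.

```python
# Toy word-for-word translator (English -> French).
# The dictionary entries and the list of endings are illustrative assumptions.

STEM_DICTIONARY = {
    "the": "le",
    "cat": "chat",
    "dog": "chien",
    "see": "voir",
    "know": "savoir",  # could equally be "connaître"; a plain lookup cannot choose
}

ENDINGS = ("ing", "ed", "es", "s")  # hypothetical English endings, longest first


def split_stem(word):
    """Return (stem, ending), stripping the first ending that leaves a known stem."""
    if word in STEM_DICTIONARY:
        return word, ""
    for ending in ENDINGS:
        if word.endswith(ending):
            stem = word[: -len(ending)]
            if stem in STEM_DICTIONARY:
                return stem, ending
    return word, ""


def translate_word_for_word(sentence):
    """Substitute each SL word with its TL dictionary entry, keeping SL word order."""
    output = []
    for word in sentence.lower().split():
        stem, _ending = split_stem(word)
        # Unknown words are passed through unchanged, marked with an asterisk.
        output.append(STEM_DICTIONARY.get(stem, "*" + word))
    return " ".join(output)


print(translate_word_for_word("The cat sees the dogs"))
```

Storing only stems keeps the dictionary small, but the ending is simply discarded here, so the output is a string of uninflected dictionary forms in source-language order. Rule-based systems address exactly these gaps through syntactic analysis.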
In the rule-based method, the sentence structures of the SL and the TL are represented by two different sets of rules, and a third set contains the rules relating the two structures to each other. First, all the words are identified as particular parts of speech. The next step is to retrieve specific syntactic information concerning the verb and its possible phrasal contexts and to parse the sentence by assigning each word to the appropriate phrase. Finally, the words in the sentence are translated, mapped onto the syntactic structure relevant for the TL and inflected. Rule-based MT divides into two subtypes: transfer-based and interlingua MT. Transfer systems permit contextual substitution of SL lexical units with those of the TL, which is made possible by syntactic analysis. Interlingua systems aim at representing the meaning of a source text by means of an artificial and unambiguous formal language. The meaning is then rendered through the syntactic structures and lexical units of the Target Language. Since extracting the deep meaning from a natural language text is complex in both technological and empirical terms, few large-scale interlingua projects have been completed. However, most new transfer-based systems tend to be interlingual by nature and handle semantic problems with the use of dictionaries containing disambiguation information, rather than through purely syntactic analysis.
Disambiguation, one of the key problems in the field of Machine Translation, often requires extratextual knowledge of how the world functions. Knowledge-based MT systems are an attempt to implement such information in the form of conceptual trees or networks and to devise algorithms that select the appropriate candidates. Such systems are based on the assumption that the traditional syntactic methods do not solve a certain class of problems; thus syntactic issues are resolved by means of semantic description.
In statistical and example-based systems, translations are generated on the basis of large text corpora, which serve as sources for deriving the parameters of statistical models. A sentence in a text is translated according to the probability that a given string in the TL is a translation of this sentence. Such systems are cost-effective and do not require manual implementation of rules, while being largely independent of the language pair chosen. Generated translations tend to be more natural if the available corpus is sufficiently vast to contain a close equivalent of a given sentence. In example-based systems, words are translated as their inexact matches in the TL (e.g. synonymous or hyponymous expressions).
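The statistical idea outlined above, choosing the TL string with the highest probability of being a translation of the SL sentence, can be sketched in a deliberately simplified form. The miniature parallel corpus below is invented, the function names are hypothetical, and the probability is a plain relative frequency over whole sentences; real statistical systems estimate far richer models from much larger corpora.

```python
from collections import Counter, defaultdict

# Hypothetical miniature parallel corpus of (SL sentence, TL sentence) pairs.
PARALLEL_CORPUS = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
]


def train(corpus):
    """Count how often each TL string was observed as a translation of each SL string."""
    counts = defaultdict(Counter)
    for source, target in corpus:
        counts[source][target] += 1
    return counts


def translation_probabilities(counts, source):
    """Relative-frequency estimate of P(target | source)."""
    observed = counts.get(source)
    if not observed:
        return {}
    total = sum(observed.values())
    return {target: n / total for target, n in observed.items()}


def translate(counts, source):
    """Return the most probable TL string, or None if the source was never seen."""
    probabilities = translation_probabilities(counts, source)
    if not probabilities:
        return None
    return max(probabilities, key=probabilities.get)


model = train(PARALLEL_CORPUS)
print(translate(model, "good morning"))
print(translate(model, "good evening"))
```

No linguistic rule is written by hand, which reflects the independence from the language pair noted above. The sketch also shows the dependence on the corpus: a sentence that never occurs in it cannot be translated at all.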
IV. Machine versus Human Translation

According to Jiří Levý (Levý 1967), translation is a decision process, and the choice of lexical units (as well as higher-level units) is governed by a system of conscious and subconscious instructions. The instructions are both objective, dependent on the linguistic material (semantic, rhythmic, stylistic, etc.), and subjective, such as the structure of the translator's memory and their aesthetic norms. A unit may be chosen from a set of potential candidates (a paradigm) on the basis of such factors as the potential meanings of a word, different conceptions of a character's style, or the stylistic and philosophical preferences of the author. As may be observed, human translators have a wide spectrum of arbitrary and extratextual information at their disposal. As was shown in the previous section, providing such information to machines, as well as processing it, may be complex and challenging in many respects. In this section, the author compares human and machine translations of several fragments of a literary text (Slaughterhouse Five by Kurt Vonnegut) and of a fragment of a technical text. The comparison is followed by a short analysis of the problems that are likely to occur in the process of machine translation.
Original passage (1):
So I held up my right hand and I made her a promise. ''Mary,'' I said, ''I don't think this book is ever going to be finished. I must have written five thousand pages by now, and thrown them all away. If I ever do finish it, though, I give you my word of honor: there won't be a part for Frank Sinatra or John Wayne. ''I tell you what,'' I said, ''I'll call it The Children's Crusade.'' She was my friend after that. (Vonnegut 1969)
Human Translation:
Podniosłem więc prawą rękę i złożyłem jej obietnicę: Mary powiedziałem. Nie sądzę, aby kiedykolwiek udało mi się skończyć tę książkę. Napisałem już chyba z pięć tysięcy stron i wszystko wyrzuciłem. Jeżeli jednak kiedykolwiek ją skończę, to daję ci słowo honoru, że nie będzie w niej roli dla Franka Sinatry ani Johna Wayne'a. Wiesz, co ci powiem? Dam jej tytuł Krucjata dziecięca. Od tej chwili byliśmy przyjaciółmi. (Jęczmyk 1972)
Translation by Translatica:
Więc uniosłem swoją prawą rękę i złożyłem przyrzeczenie jej. '' Maria, '' powiedziałem '' nie myślę, że ta książka zamierza zostać skończonym kiedykolwiek. Musiałem napisać pięć tysiąc stron już, i wyrzucony ich wszystkich. Jeśli kiedykolwiek kończę to, jednak, daję ci swoje słowo honoru: tam nie być części dla Frank Sinatra albo Jana Wayne. '' Ja wiesz co? '' powiedziałem '' nazwę to Krucjatą Dzieci. '' była moją przyjaciółką po tym. (Translatica)
The machine translation of the fragment is intelligible, although imperfect in many respects. For instance, the noun book has been incorrectly identified as the agent in the second sentence, and the verb thrown has been treated as an adjective. The translation contains mistakes in word order, inflection and tense, although the message is not gravely disrupted. Thus, the simple text has been translated successfully and requires only minor post-editing to reach an acceptable level of quality.
Original passage (2):
''Close it up and keep it closed!'' Roland Weary warned Billy Pilgrim as they moved out. Weary looked like Tweedledum or Tweedledee, all bundled up for battle. He was short and thick. (Vonnegut 1969)
Human translation:
Morda na kłódkę! ostrzegł Roland Weary Billy'ego. Weary wyglądał jak Kubuś Puchatek wyruszający na wojnę. Był niski i gruby. (Jęczmyk 1972)
Translation by Translatica:
'' Zamykać to i trzymać to zamknięty! '' Roland Weary ostrzegł Billy Pilgrim ponieważ wyprowadzili się. Znużony wyglądał jak Tweedledum albo Tweedledee, wszystko zebrało w plik dla bitwy. Był krótki i tępy. (Translatica)
The second passage is a good example of how cultural awareness may affect translation. The human translator decided to remove the names of the fictional characters from the English nursery rhyme (Tweedledum and Tweedledee) and replace them with another character from children's fiction, possibly better known to a Polish reader. The machine translator left the names untranslated, which was acceptable, but it failed to establish the correct reference of words in two cases (the pronoun 'it' as referring to the character's mouth, and 'bundled up' as a phrasal verb referring to Roland Weary). Furthermore, the word 'Weary' was incorrectly recognised as an adjective the second time, due to its sentence-initial position: the capital letter was identified as an indicator of a new sentence rather than of a proper name. The problems with inflection (ostrzegł Billy Pilgrim, Znużony wyglądał) suggest that the machine had difficulty parsing the fragment and establishing the correct syntactic structure.
Original Passage (3):

I had two books with me, which I'd meant to read on the plane. One was Words for the Wind, by Theodore Roethke, and this is what I found in there: I wake to steep, and take my waking slow. I feet my late in what I cannot fear. I learn by going where I have to go. (Vonnegut 1969)
Human Translation:
Wiozłem ze sobą dwie książki, które miałem zamiar czytać w samolocie. Jedną z nich były Słowa na wiatr Teodora Roethke i oto co w niej znalazłem: Budzę się, aby śnić, i wkraczam w sen powoli. Tam, gdzie strach się nie czai, szukam przeznaczenia. Idąc, uczę się drogi, którą zmierzać muszę. (Jęczmyk 1972)
Translation by Tłumacz Komputerowy:
Miałem dwie książki ze mną, który zamierzałem przeczytać w samolocie. Jeden był Słowa dla Wiatru, przez Theodorea Roethkea i to jest co znalazłem w tym: Budzę się zmoczyć i wziąć mój budzący się powolny. Ja stopy mój późno w czym nie mogę bać się. Uczę się przez chodzenie, gdzie muszę pójść.
The third fragment exposes the inability of machine translators to deal with highly figurative poetic language. The two initial sentences, written in prose, manage to convey the intended message, despite containing errors similar to those already discussed. However, the poem has clearly been mistranslated by the program. The verb to steep has been treated as a separate infinitive rather than as part of a Verb Phrase, which suggests considerable problems with parsing the verse. The verb to take has been rendered by its one-to-one equivalent in Polish, despite the fact that the poet uses it in the sense of 'proceeding'. Similarly, the words feet and late, which are used outside their regular contexts and as different parts of speech, assume their literal meanings in the automated translation. Only the last verse of the poem, which is prose-like in nature, has been rendered correctly. In contrast, the human translator managed to retrieve the underlying meaning correctly while rendering the poetic style and rhythm.
The following sentence comes from the manual of a digital camera. Such sources normally use language that is simple in terms of structure and syntax and employ specialized, yet unambiguous, vocabulary. Presumably, such sentences should be translated successfully by most machine translators.
Original Passage:
Firing the flash too close to the subject's eyes could cause a momentary loss of vision.
Human Translation:
Zadziałanie lampy błyskowej za blisko oczu osoby fotografowanej może spowodować chwilową utratę wzroku.
Translation by Tłumacz Komputerowy:
Rozpalanie {strzelanie, wyrzucanie} błysku również blisko oczu tematu {przedmiotu} mogłoby spowodować chwilową utratę wzroku.
Translation by Google Translate:
Lampy błyskowej zbyt blisko oczu fotografowanej osoby może spowodować chwilową utratę wzroku.
This sentence employs several ambiguous words (e.g. firing, flash, subject). Each of the two machine translators adopts a different approach to disambiguation. The computer translation program (Tłumacz Komputerowy), which is a rule-based system, provides alternative translations that are supposed to help the user arrive at the intended meaning. Clearly, the application is not equipped with specialized vocabulary, yet it is still possible to understand the message. The translation provided by Google's online tool is surprisingly close to the one produced by a human. Since Google Translate is a statistical MT system, it processes millions of human-written texts in search of patterns. The above phrase is conventional and likely to occur in a large number of documents. Thus, a statistical MT system quickly recognises it as a pattern.
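The way the rule-based program presents alternatives in braces can be mimicked with a small sketch. This is only an illustration of the output convention, not the program's actual mechanism: the lexicon is hypothetical, the Polish candidates are the base forms of the words appearing in the output quoted above, and the function name is invented.

```python
# Hypothetical lexicon; candidates are base forms of the words in the quoted output.
LEXICON = {
    "firing": ["rozpalanie", "strzelanie", "wyrzucanie"],
    "subject": ["temat", "przedmiot"],
}


def render_with_alternatives(word):
    """Return the first candidate, followed by the remaining ones in braces."""
    candidates = LEXICON.get(word.lower())
    if not candidates:
        return word  # unknown words are left untranslated
    first, rest = candidates[0], candidates[1:]
    if rest:
        return first + " {" + ", ".join(rest) + "}"
    return first


print(render_with_alternatives("firing"))
print(render_with_alternatives("subject"))
```

Listing every candidate shifts the task of disambiguation to the reader, whereas a statistical system commits to the single rendering that is most frequent in its corpus.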
V. Can machines replace human translators in the future?
On the basis of the above analysis, one may observe that Machine Translation is at present a largely immature technology. Most MT systems focus predominantly on syntactic analysis and fail to consider extratextual factors or multi-sentence meaning in the process of translation. Many systems are overly dependent on pre-specified kinds of texts, and developing broad-domain knowledge sources is still too complex. On the other hand, statistical systems are completely dependent on the size and quality of the available corpora. For some language pairs, obtaining a reasonable corpus is difficult, so the quality of translations may be significantly lower. The analysis has shown that machines do not pose real competition to professional translators as far as literary and poetic texts are concerned. However, MT systems might be adequate for translating technical documentation, specialized publications in a restricted domain and materials not meant for publication. Over time, with the advancement of technologies such as speech recognition, MT might well become an important means of cross-cultural and international communication. Translators all over the world may appreciate the help of machine translators in producing rough versions of translations. With their ever-growing corpora, MT systems are increasingly effective tools for supporting the work of human translators.
VI. References
1. Nirenburg, S., and Y. Wilks. "Machine Translation." PDF. Machine Translation 24.2 (2010): 67-69. SpringerLink. 5 Sept. 2010. Web. 16 Jan. 2011.
2. Fiederer, R., and S. O'Brien. "Quality and Machine Translation: A Realistic Objective?" The Journal of Specialized Translation 11 (2009): 52-74. Web.
3. Levý, Jiří. "Przekład Jako Proces Podejmowania Decyzji." Współczesne Teorie Przekładu. Ed. Piotr Bukowski and Magda Heydel. Kraków: Znak, 2009. 72-85. Print.
4. Osborne, M. "MT History and Rule-based Systems." Lecture. 9 Jan. 2009. Web. 8 Jan. 2011. <http://www.inf.ed.ac.uk/teaching/courses/mt/lectures/history.pdf>.
5. "Statistical Machine Translation." Wikipedia, the Free Encyclopedia. Web. 16 Jan. 2011. <http://en.wikipedia.org/wiki/Statistical_machine_translation>.
6. Inside Google Translate. Google Translate. Web. 16 Jan. 2011. <http://translate.google.com/about/intl/en_ALL/>.
7. Wdołowski, P., and B. Grabski. Tłumacz i Słownik Języka Angielskiego. Vers. 4. 2007. Computer software.
8. Translator. Computer software. Tłumacz, Translator, Biuro Tłumaczeń Pwn.pl, Słownik Angielski, Niemiecki, Rosyjski, Polski. Web. 16 Jan. 2011. <http://www.translatica.pl/>.
9. Hutchins, W. John. "Chapter 3: Problems, Methods, and Strategies." Machine Translation: Past, Present, Future. Chichester [West Sussex]: Ellis Horwood, 1986. Print.
10. Vonnegut, Kurt. Rzeźnia Numer Pięć. Warszawa: Państwowy Instytut Wydawniczy, 1972. Print. Translated by Lech Jęczmyk.
11. Vonnegut, Kurt. Slaughterhouse Five. NY: Dell, 1991. Print.
12. Google Translate. Web. 16 Jan. 2011. <http://translate.google.com/>.
