Академический Документы
Профессиональный Документы
Культура Документы
This paper deserihes the treatment of lexical gaps, The reader is suggested to consult [131 or [14] ['or more
collocation information and idioms in the English to details.
Portuguese machine translation system P O R T U G A .
N-to-I words
have fun - divertir-se
get up early - madrugar
fall in love - apaixonar-se
take advantage - aproveitar
Figure I. General structure of P O R T U G A : A - television set - televisor
analysis, G - generation. Transfer: L - swimming pool - piscina
lexical, S - structural, T - tense, Sty - style.
N-to-M words
kick the bucket - hater as bolas
' l h e main characteristics of this English to Portuguese lose one's temper - pcrdcr a paci6ncia
transhm)r are:
the separation between possible translation (which Figure 2. Translath}n gaps: One-to-many, many-
may be multiple), and best or chosen translation (de- to-one and many-to-many words trans-
cided in the "style Iransfer" module). lation.
.... z,~ ¸
"a special language called L E X T R A , which makes it easier
Other approaches to state the type of tree transformations required by lexical
transfer. (...) L E X T R A takes as data an explicit de-
In this section I mention related work and alternative sol- scription of the admissible tree structures, and guarantees
utions that have been proposed and which l lind represen- that any tree it receives or creates is bMeed an admi~'ible
tative of the present day state-of-art. 'l'herelbre, primitive tree".
approaches such as, for instance, treatment of complex
expressions as simple strings will not be surveyed. Similar solutions can be found in the Japanese-to-I!nglish
system of[10], which handles both lexical gaps and changes
of part of speech:
Machine translation: "One can specify not only the Engh'sh main verbs but also
arbitrmT phrases governed by the verbs as constants", al-
It is acknowledged by outstanding machine translation re- lowing for variables and complex patterns in tile lexical
searchers that there are MT problems which are bilingual rules. "one can provide lexieal rules directly in GRADE,
in nature. R.egarding the problem of lexical transfer, and attach them to specific items. (...) One can specify
l'sujii[17] states specific tree transformations in GRADE".
"we cannot enumerate, by monolingual thinking, different and in the old ITS translation system[12]:
concepts denoted by the verb 'produce'. (...) Only when we
are asked to translate sentences into another language, can "The contents of these records" (dictionary entries) "#rclude
we try to find appropriate target language words. (...) The transfer language statements which performs the necessary
above discussion bnplies that certain 'understanding proe- transfers as well as other referential information."
e~es" are target language dependant, and cannot be fully
specified in a monolingual manner."
Parsing
Specitically the problem of translating a word for an ex-
pression, is one of the reasons presented by Schenk[15] to In parsing, idioms have to be considered. A recent paper
-se the concept of "Complex Basic Expressiorff in the de- on idiom processing[1] lists some of their relevant proper-
~;crlption of one language. (A CBE is a basic expression ties (my rephrasing):
liom a semantic point of view, i.e., it corresponds to a basic
mearfing, and a complex expression from a syntactical • usual existence of ambiguity between literal and
point of vie,,,,'.) Miomatic readings,
• frequent discontinuity of idioms,
"Expressions that are not idiomatic, but that consist of more ® applicability of regular application of syntactic rules
than one word can be handled by means of a complex basic (like adverb(s) or auxiliaJ T verb(s) insertion),
expression in order to retain the isomorphyJ • applicability of "transformations' to the idioms proper,
(like passivation or relativization).
This approach is related to the theoretical requirement of
Ihe Rosetta MT system to btfild isomorphic grammars for The difference between "non-literal reading" and
ihe several l:nlguages dealt with by tim system• "idiomatic" reading of an expression is also pointed out.
"Ibis implies that, in this ti'amework, it is the set of all lan- Metaphoric readings are proposed to be parsed by usu.:d
guages in presence that delines what is a basic meaning. rules. The advantages of submitting idioms to "regular"
15ke this, it is possible to dispense altogether with structural syntactic rules and even to 'transformations', whenever
Irarlsfer~ possible, are emphasized.
hx general, however, most M'F systems do not make the "no additional devices need be added to the syntax in ac-
analysis phase dependant on the target language(s), and counting for the peculiarities of fixed expressions". Since
therefore it is usual to see statements like the following[4]: not only idioms can be assigned internal syntactic structure,
but an internal semantic structure as well, as "all syntac-
"the lexical rule must mark words with the corresponding tically active idiomatic expressions have a metaphorical
parts of speech,", in the "cases where a source language basis".
entry must be rephrased in the target language as an ad-hoe
combination of words which does not form a lexieal
entity". However, radically different views can be found Ibr in-
stance in [5], where, in the lexicon-grammar approach, the
It is unclear, however, how much complexity of the result concern with the representation of compound words
can be handled, or how rnuch syntactical transfo,mation (adverbs, verbs, nouns) makes Gross establish a classilica-
the new expression carl suffer. lion according to their syntactical shape, ranging fiom se-
veral degrees of variation, li'om completely frozen ("at
Some hybrid approaches cart also be found in [9], this time night") to having parts completely free ("organize in one's
apparently putting the burden on the generation phase: honor").
"The second substep of German morphological generation This author suggests that finite automata be attached to a
is the application of German word list transformations. (...) given entry in Order to describe the compound wtriation.
these rules can also be used to handle non-compositional
translations. For example, "for example" can translate "The variations of form we have enumerated can be partly
compositionally into far Beispiel, and then a G P l t R A S E handled by attaching a finite automaton to a given entry,
rule can convert this to zum Beispiel." and this automaton will describe the main grammatical
changes allowed."
l lowever, the more comprehensive way to deal with this
problcnL wilhout changing analysis accordingly, is the one In between, the need to store several pieces of infbrmation
cxemplilied by Isabcllc[7]. It was developed concerning idioms is acknowledged by Stockllf], such as
2 331
undergoing passivization, weight in the whole idiom, re- Therefore, we treat all these three problems the same way,
m o v e r of the idiom interpretation, semantic value, etc. namely, considering then~ as instances of a contrastive lex-
ical transfer problem in the scope of machine translation.
This system stores idioms as "further information c o n c e r n -
ing words", divided in two cases, "canned phrases" and We must emphasize that we are only interested in
"flexible idioms", the latter being stored under the 'thread' expression-to-expression translations when tile literal ones
of tile idiom. are not acceptable. This stems from the lhct that there is a
considerable number of fixed expressions which do not re-
Based on the claim that "the flexibility o f an idiom depends quire any special processing, as can be seen in the following
on how recognizable its metaphorical origin is", one of the list, with examples taken from several l a n g u a g e s :
goals is to "integrate idioms in our lexical data merely as
further information concerning words (as in traditional
(E) parents and children
dictionaries)".
pals e filhos
(E) ladies and gentlemen
senhoras e senhores
Generation (F) monter la moutarde au nez de
subir a mostarda ao nariz de
(F) attendre un enfant
Finally, research in natural language generation has also
esperar uma crianqa um filho
contributed to clarify and furnish solutions to the problem,
(E) take into account
Clearly, generation is one of the issues in a machine trans- tomar em conta
lation system. However, work in generation per se usually (I) prendere il toro per le c o m a
presupposes the existence of an unambiguous 'concept' pegar o touro pelos cornos
representation, and so the problems begin with the correct (E) in good hands
em boas mY.os
stating of an idea in one particular language. In this
framework, it is clear that one key concept is that of
Figure 3. Literally translatable idioms: E-English,
"collocations", or how lexical items combine in a particular
F-French, l-Italian
language.
A remark of utmost importance can be found in [11], dur- On the other hand, it didn't appeal to us to have to store,
ing the description of the D I O G E N E S generation system: for each pair source-targe t entries, the lull structural trans-
formation implied, as in the most powerful approaches
"collocalional relations are defined on lexical units, not mentioned above (cf. [101 and 171). This a p p r o a c h gives
meaning representations". origin to very heavy dictionaries, with a lot of redundancy,
moreover, since there may be similar transformations re-
peated to many entries. On the other hand, not only the
dictionary becomes very difficult to unde,'stand and modify',
Summing up (requiring someone who knows the " p r o g r a m m i n g ~ lan-
guage used), but also it makes it tightly coupled to the
T h e literature survey above supports some of our assump- structural representation a n d o r particular linguistic
tions, namely that formalism and options used in the machine translation
system, in both analysis and generation.
• there are problems which are bilingual in nature, and
cannot therefore be properly dealt with in only one We chose thus a different method that
language;
• allows tbr maximal readability
• there is not a clear distinction between what should
be accounted for as an idiom, a metaphorical use of is independent of the linguistic (and p r o g r a m m i n g )
a word or a collocation. The boundaries between decisions of file whole h i t system (being only con-
cerned with lexical transfer)
collocational restrictions, metaphorical readings and
idioms are blurred and may even not be pertinent to * provides as much power as any unrestricted (tree or
the automatic treatment of language. graph) transformation language
to translate correctly, it is often necessary to use ex- The method proposed consists then of u ~ ' p ~
pressions instead of single words. Those expressions as tile result value in tim bilingual dictionary, there-
can m o r e o v e r give origin to complex structure ['ore keeping it independent of whatever structure it shouht
changes, possibly discontinuous. be assigned, and i n v ~ l a ~ e . . d ~ r s e r that
builds the structure required, on the Ily.
332 3
phase, that is, it s h o u l d obey the s a m e formalism a n d lin- ! '11 a l w a y s miss people i like .
guistic options in o r d e r fur t h e m to be compatible. . . . . > eu sentirei s e m p r e a falta de pessoas de
q u e m eu g o s t o .
I miss the m a n w h o was h e r e .
..... > eu sinto a falta do h o m e m que esteve aqui .
A detailed example he was missed , but w h o missed her ?
. . . . > foi sentlda a falta dele , m a s q u e m 6 que
F'or the sake o f clarity, a full e x a m p l e will be presented, sentiu a falta dela ?
r e g a r d i n g the word miss, in its m e a n i n g of to f e e l sorry or he was the o n e who m o s t missed his father .
unhappy at the absence or loss o f (someone or something) . . . . > ele foi o que sentiu mais a falta do seu
( l . o n g m a n ) . T h e Iqgure 4 shows an abridged fbrm of the pai .
entry for miss in the bilingual dictionary. T h e information they were the ones who were least missed .
[br c h o o s i n g a m o n g the several possible translations was . . . . > eles f o r a m os de q u e m se sentiu m e n o s a
omitted am1 will n o t be discussed here. T h e examples pre- falta .
I miss h a v i n g you in the n e i g h b o r h o o d .
sented will be in any case those that correctly trigger the
. . . . > sinto a falta de te ter na v i z i n b a n q a .
t r a n s l a t i o n sentir a f a l t a (literally, "feel tile lack"). I forgot missing you .
..... > esqueci-me de sentir a tua falta .
miss(Vl:.RB C I I T P O S S ( E V P sentir a falta)) 1 forgot to miss you .
miss(VEI,H] C I I T P O S S ( E V P ter saudades)) ..... > e s q u e c i q n e de sentir a tua falta .
perder0/FA~.B)
f a l t a r 0 / E RB) Figure 6. Several examples of "miss" t r a n s l a t e d as
m i s s ( V E R B ( E V P deixar escapar)) "sentir a falta": C o m p l e x ter,ses, passive,
m e n i n a ( N O UN) relative clauses, distinction between third
p e r s o n singular a n d others, a d v e r b posi-
Figure 4. l ) k t i o n a r y entry for "miss": E V P stores tion, etc.
the Portuguese string to be used as trans-
lation. As ter smMades is also a valid translation for miss in the
s a m e context as senti," a f a t t a , this choice belongs to style
T h e first thing that should be exemplified, is tbat, after the transfer. Follows the o u t p u t of the system in that case:
choice o f the mulfiword translation, the Portuguese g r a m -
m a r is invoked, building a equivalent g r a p h f i a g m e n t to the i miss that time.
t r a n s l a t i o n of 'Teel the lack". "|'his g r a p h fragment is then ...... > eu tenho saudades daquele tempo.
conveniently inserted in place o f the one fbr miss.
Figure 7. Another style alternative: "miss" trans-
lated by "ter saudades".
i miss you.
4 333
l lebrew, the indication of which syntactical alternative, thunder(NOUN (MWE thunder and lightning)
when different from the source one, could be needed very (EVP relfimpagos e trov6es))
frequently, therefore reducing the economy we are assert-
ing. We can only answer that while for English-to- Figure 10. Collocation differences: The same de-
Portuguese translation that structural marking is very vice used tbr many-to-many translation
rarely used, further testing with different language pairs can be used when, lbr instance, the order
must be done in order to assess or deny the universality of must be reversed.
this method. Namely languages whose translation would
require an extensive part-of speech change should be
kick(VERB (MWE kick the bucket)
tested.
(EVP bater as botas))
Follows a very simple example of the case discussed above:
Figure ! I. Example of a many-to-many words
translation: The MWE feature corre-
thank you. sponds both to the context requited in
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
order to choose that particular trans-
I)ECLI VERBI* "thank" lation, and to the piece of English tc~ re-
NPI PR.ON1 * "you" place.
PUN ,el . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
&wore portuguesa
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
It should also be mentioned in passing that whenever there No. of verbs translated by EVPs : 60.
is a generalized part-of-speech change on syntactical No. of nouns translated by EVPs : 47.
grounds, that is not done through lexical transfer, but in the No. of adjectives translated by EVPs : 14.
No. of adverbs translated by EVPs : 19.
structural transfer, as is the case of adieetival English
present participle clauses. No. of entries whose first translation is an EVP : 27.
No. of nodes whose correct translation is an EVP : 42.
311VE- to-A1 I V E translation Figure 12. Some relevant numbers
3:34 5
1 stood in the d o o r w a y . lation ", Proceedings of the 12th International Confer-
...... > estive de p6 na soleira da porta . ence on Computational Linguistics, Budapest, 22-27
1 dropped the camera while p a c k i n g . A u g u s t , 1988.
...... > deixei cair a mfiquina fotogrfilica enquanto [5] Gross, Maurice. 1986 " L e x i c o n - G r a m m a r : The Rep-
estava a fazer as m a l a s .
resentation of C o m p o u n d Words °, Proceedings of the
I missed the sunset tor, i g h t .
1 lth International Conference on Computational Lin-
. . . . = = > senti a [hlta do p6," do sol hoje fi noite . guistics, Bor, n 1986, pps 1-6.
T h e tihnstar kicked her agent .
...... > a estrela de cinema deu um pontap6 [6] 11eid, Ulrich and Syhille Raab. 1989 ~Collocafions in
ao see a g e n t e . Multilingual Generation", Proceedings of the Fourth
Vv'atch the dog ! European Conference of the European Chapter of the
..... > toma cuidado corn o cacborro ! Association for Computational Lingui.sties, 10-12
1 bicycled and did not do my h o m e w o r k April 1989, Manchester, UK.
..... > andei de hicicleta e n-Co fiz o meu
trabalho de casa . [7] Isabelle, Pierre. 1984 "Machine Translation at the
A then officer would not b o r r o w a uniform . ' [ ' A U M G r o u p ", Machine Translation Today: The
..... > u m olicial do tempo nim pediria emprestado State of the Art, Margaret King, ed., 1987.
um tmitbrme .
Did [ trouble you when ! yellowed your shirt .9 [8] Jensen, Karen. 1986 " P E G 1986: A BroiJd-coverage
= = . . . . > causei-te transtorno quando tingi de amarelo Computational Syntax of English", IB.\I Research
a tea camisa ? Report R C draft, Feb 1986, T.J.Watson Research
Center, Yorktown i leights, N Y 10598.
l,'igt, re 13. Several examples of 1-to-N translation.
[9] McCord, Michael C. 1989 "Design of I.MT: A
Prolog-Based Machine Translation System", Compu-
Even though no thorough broad-coverage translation tests tational Linguistics, Vol. 15, No. 1.
have been performed, we believe these results can assess
not only the feasibility but also the flexibility of the method [10] Nagao, M a k o t o and Jun-ichi "['sujii. 1986 "The
proposed. transfer phase of" the Mu Machine Translation Sys-
tem", in Proceedings of COLING'86, ACI., pps
9%103.
Considering them a bilingual problem, the transfer phase [12] Pdchardson, Stephen D. 1980 ~A l ligh-Level Trans[er
was assigned as the proper place tbr them to be treated. Language for the B Y U q S I Interactive Translalion
System", M.A. ]hesis, Brigham Young University.
Ihe method presenled has as advantage minimal storage
icquired aml the least COmlmtation (only on demand) of [13] Santos, Diana. 1988 "A fase de transfer6ncia de um
Ihe several strt, ctures involved. Also, it only makes use of sistema de traduqfio autom,5_tica do ingl6s para o
one single comprehending parser lor the target language, portugu6s', Tese de .Mestrado, Instituto Superior
instead of developing particular solutions to particular Tacnico, Universidade Tacnica de Lisboa.
problems.
[14] Santos, l)iana. 1988 "An N l l p r o t o t y p e f i o m I!nglish
l h e way the dictionary was conceived brings with it con- to Portuguese", Proceedings of the IBM Conference
siderable readability, making it independent of the linguistic on Natural Language Processing, October 24-26,
and p r o g r a m m i n g formalisms used in the other modules 1988, T h o r n w o o d , pps 122-133.
of the translation system. Its format can, moreover, make
il very easy to inherit inlbrmation from human-readable [15] Schenk, Andr6. 1986 "Idioms in the R o s e t t a . \ l a c h i n e
bilingual dictionaries. I tand-coding by an expert is not re- Translation System", Proceedings of the l lth Inter-
quired. national ConJerenee on Computational Linguistics,
Bonn 1986, pps 319-324.
335