
Jan 23rd, 2019

Vo Tri Thong.

NEURAL MACHINE TRANSLATION


(SEQ2SEQ) REPORT

1. INTRODUCTION
A. Neural Machine Translation (NMT)
This report discusses the architecture and implementation of the neural
machine translation system devised by (Luong, Brevdo, & Zhao, 2017). The
document also covers related concepts such as the thought vector, the attention
mechanism and beam search.

i. Neural Machine Translation


Because of the large number of human languages around the world, the
need for machine translation systems has existed for decades.
Until recent years, machine translation systems typically relied on
phrase-by-phrase translation approaches, which led to disfluent and
unsatisfying results. In contrast, the technique in this report scans the entire
source sentence before producing the translated sentence. This sequential
reading method is similar to the way humans translate documents. The main
components of the system are an encoder, a decoder and a thought vector.

ii. Encoder, Decoder and Thought vector


The NMT system first reads the entire sentence in the source language
through the encoder. The encoder then generates a thought vector that carries
the information of the source sentence in numeric form. Subsequently, the
decoder emits the output sentence by generating one word after another.

Figure 1: Encoder-decoder architecture
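
This flow can be written as a minimal sketch in tf.keras. It is an illustrative
toy model, not the tutorial's code: the vocabulary sizes, the single LSTM layer
on each side and the use of teacher forcing are assumptions made for brevity.

import tensorflow as tf

src_vocab, tgt_vocab, units = 8000, 8000, 512   # illustrative sizes

# Encoder: reads the whole source sentence; its final LSTM states act as the thought vector.
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(src_vocab, units)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(units, return_state=True)(enc_emb)
thought_vector = [state_h, state_c]

# Decoder: initialised with the thought vector, emits the target sentence one token
# at a time (teacher forcing during training: the previous reference token is the next input).
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(tgt_vocab, units)(dec_inputs)
dec_out, _, _ = tf.keras.layers.LSTM(units, return_sequences=True,
                                     return_state=True)(dec_emb, initial_state=thought_vector)
logits = tf.keras.layers.Dense(tgt_vocab)(dec_out)   # per-step scores over the target vocabulary

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
model.compile(optimizer="sgd",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))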


On top of the fundamental encoder-decoder architecture, advanced
techniques such as the attention mechanism and beam search are employed
to further improve the results.
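
Beam search, configured through beam_width in the experiments below, keeps the
best few partial translations at every decoding step instead of greedily
committing to a single word. A toy sketch of that loop follows; step_fn is a
hypothetical callable returning (token, log-probability) pairs for the next
position, and the details differ from the tutorial's actual decoder.

import heapq

def beam_search(step_fn, start_token, end_token, beam_width=10, max_len=50):
    # Each hypothesis is (cumulative log-probability, token sequence)
    beams = [(0.0, [start_token])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == end_token:                   # finished hypotheses are carried over unchanged
                candidates.append((logp, seq))
                continue
            for token, token_logp in step_fn(seq):     # expand with scores for the next token
                candidates.append((logp + token_logp, seq + [token]))
        # Keep only the beam_width best partial translations
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(seq[-1] == end_token for _, seq in beams):
            break
    return max(beams, key=lambda c: c[0])[1]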

iii. Attention mechanism


Compressing the meaning of the source sentence into a fixed-length
thought vector may result in information loss: because the thought vector has
a fixed size, some information may not be retained (Bahdanau, 2014).
The principle of the attention mechanism is therefore to maintain a direct
connection between each output word and the source sentence.

Figure 2: Effects of Attention Mechanism (Bahdanau, 2014)

This technique is especially effective for long sentences, as indicated in Figure 2.
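
At each decoding step, the attention mechanism scores every encoder state
against the current decoder state, turns the scores into weights with a
softmax, and uses the weighted sum of encoder states as extra context for the
next word. A simplified numpy sketch of unscaled dot-product (Luong-style)
attention is given below; shapes and names are illustrative assumptions, and
the 'scaled_luong' variant used in the experiments additionally scales these scores.

import numpy as np

def luong_attention(decoder_state, encoder_states):
    # decoder_state: vector of shape (units,); encoder_states: matrix of shape (src_len, units)
    scores = encoder_states @ decoder_state      # one alignment score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over source positions
    context = weights @ encoder_states           # weighted sum of encoder states
    return context, weights

# Toy usage: 5 source positions, 4 hidden units
ctx, attn = luong_attention(np.random.rand(4), np.random.rand(5, 4))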

B. BLEU Score
i. Introduction
The BLEU score is a method for the automatic evaluation of machine translation.
As human evaluations are expensive and time-consuming, an automatic set of
metrics is needed to evaluate translation results. BLEU is relatively quick to
apply, and it correlates highly with human evaluation. BLEU has proven to be
one of the most prominent methods for evaluating translation results because of
this correlation to human judgments (Papineni et al., 2002).

ii. Algorithm
The BLEU score is a number between 0 and 1. To score a translation,
one compares the n-grams of the candidate translation with the n-grams of the
reference translation and counts the number of matches. The process evaluates
several n-gram sizes, computes a weighted average of their precisions, and
applies a brevity penalty to candidates that are shorter than the reference.
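
As a rough illustration, the following sketch computes a sentence-level BLEU
against a single reference: clipped n-gram precisions, their geometric mean,
and a brevity penalty. It is a simplified toy version; real evaluations such as
those reported below use corpus-level BLEU with proper tokenisation and smoothing.

import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        overlap = sum((ngrams(cand, n) & ngrams(ref, n)).values())   # clipped n-gram matches
        total = max(sum(ngrams(cand, n).values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)  # uniform weights in log space
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(cand)))     # penalise short candidates
    return brevity_penalty * geo_mean

# Toy example with bigrams only; prints roughly 0.707
print(round(bleu("the cat sat on the mat", "the cat is on the mat", max_n=2), 3))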

2. EXPERIMENTS
A. Dataset
The models in this report are trained and tested on the IWSLT English-
Vietnamese dataset. The training set contains 133K sentence pairs provided by the
IWSLT Evaluation Campaign.
B. The ‘Vanilla’ NMT model:
i. Configurations:
The parameters of this model are taken from the standard HParams file iwslt15.json
with a slight adjustment: attention is disabled.
Key configurations:

"attention": "",
"attention_architecture": "standard",

"learning_rate": 1.0,
"num_units": 512,
"optimizer": "sgd",
"beam_width": 10
ii. Results:
The BLEU score is relatively low without the attention mechanism; the maximum
test BLEU is 8.9.
# Best bleu, step 9000 lr 0.0625 step-time 0.00s wps 0.00K ppl 0.00 gN 0.00 dev
ppl 21.77, dev bleu 10.0, test ppl 24.12, test bleu 8.9, Mon Jan 21 09:14:51 2019
Time to train: 30 minutes on a rig with an Nvidia 1080 Ti
C. NMT with Attention model
i. Model with SGD optimizer
1. Configurations:
The parameters of this model are taken from the standard HParams file iwslt15.json
without any adjustment.

"attention": "scaled_luong",
"attention_architecture": "standard",
"learning_rate": 1.0,
"num_units": 512,

"optimizer": "sgd",
"beam_width": 10

2. Results
The BLEU score is considerably higher with the attention mechanism; the maximum test BLEU is 23.1.
# Best bleu, step 12000 lr 0.125 step-time 0.14s wps 40.07K ppl 4.87 gN 5.96
dev ppl 9.88, dev bleu 20.3, test ppl 8.39, test bleu 23.1, Mon Jan 21 06:58:25
2019
Time to train: 30 minutes on a rig with an Nvidia 1080 Ti
ii. Model with the Adam optimizer and a learning rate of 0.001
1. Configurations:
"attention": "scaled_luong",
"attention_architecture": "standard",
"learning_rate": 0.001,
"num_units": 512,
"optimizer": "adam",
"beam_width": 10

2. Results
The Adam optimizer quickly achieved a high BLEU score by step 5000, but it did
not reach the same peak as SGD. One observation is that the Adam optimizer also
generated significantly more log files.
# Best bleu, step 5000 lr 0.000125 step-time 0.18s wps 30.47K ppl 2.79 gN 8.81
dev ppl 10.88, dev bleu 19.6, test ppl 9.69, test bleu 21.6, Mon Jan 21 10:09:06
2019
Time to train: 1 hour.
