Addressing the Rare Word Problem in Neural Machine Translation (1410.8206v4)

Published 30 Oct 2014 in cs.CL, cs.LG, and cs.NE

Abstract: Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OOV) word. In this paper, we propose and implement an effective technique to address this problem. We train an NMT system on data that is augmented by the output of a word alignment algorithm, allowing the NMT system to emit, for each OOV word in the target sentence, the position of its corresponding word in the source sentence. This information is later utilized in a post-processing step that translates every OOV word using a dictionary. Our experiments on the WMT14 English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU points over an equivalent NMT system that does not use this technique. With 37.5 BLEU points, our NMT system is the first to surpass the best result achieved on a WMT14 contest task.

The paper "Addressing the Rare Word Problem in Neural Machine Translation," by Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and Wojciech Zaremba, targets a specific weakness of Neural Machine Translation (NMT) systems: their handling of rare and out-of-vocabulary (OOV) words, which a conventional system with a small vocabulary collapses into a single unk symbol.

Summary of Contributions

The main contributions of this paper are as follows:

Augmented Training Data: The authors train NMT systems on data augmented with the output of a word alignment algorithm. This alignment information lets the NMT system emit, for each OOV word in the target sentence, the position of its corresponding word in the source sentence.
Post-Processing Step: A post-processing step uses these emitted positions to translate each OOV word with a dictionary. If the dictionary has no entry for the source word, the word is copied to the output unchanged (an identity translation).
Empirical Validation: The proposed method was empirically validated on the WMT'14 English-to-French translation task, showing substantial improvements of up to 2.8 BLEU points over systems that do not incorporate the alignment technique. Notably, the NMT system with the proposed technique achieved a BLEU score of 37.5, surpassing the previous best result on the WMT'14 contest task.
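
The post-processing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the pointer-token spelling (`unkpos_1`, `unkpos_-1`, `unkpos_null`), the function name `replace_unknowns`, and the convention that an offset d at target position j points to source position j - d are all assumptions made for the example.

```python
import re

# Hypothetical pointer-token format: "unkpos_<d>" carries a relative offset d;
# "unkpos_null" (no offset) marks an unknown word without a usable alignment.
_POINTER = re.compile(r"^unkpos_([+-]?\d+)$")


def replace_unknowns(source_tokens, output_tokens, dictionary):
    """Replace pointer tokens in the NMT output with dictionary or identity translations."""
    result = []
    for j, token in enumerate(output_tokens):
        match = _POINTER.match(token)
        if match is None:
            # Ordinary word or unaligned unknown: leave it untouched.
            result.append(token)
            continue
        d = int(match.group(1))
        i = j - d  # assumed convention: target position j points at source position j - d
        if not 0 <= i < len(source_tokens):
            # Pointer falls outside the source sentence: nothing to look up.
            result.append(token)
            continue
        source_word = source_tokens[i]
        # Dictionary translation if available, otherwise copy the source word.
        result.append(dictionary.get(source_word, source_word))
    return result


source = ["the", "ecotax", "portico", "in", "Pont-de-Buis"]
output = ["le", "unkpos_-1", "unkpos_1", "de", "unkpos_0"]
dictionary = {"portico": "portique", "ecotax": "écotaxe"}

print(replace_unknowns(source, output, dictionary))
# ['le', 'portique', 'écotaxe', 'de', 'Pont-de-Buis']
```

In this toy example "Pont-de-Buis" has no dictionary entry, so it is copied through unchanged, which is the identity fallback.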

Technical Approach

Alignment-Based Augmentation

The technique uses alignment information to track where each unknown word in the target sentence comes from in the source sentence. Three annotation strategies are introduced:

Copyable Model: Multiple tokens are used to represent various unknown words in both the source and target languages. OOV words are annotated with indices, enabling the system to identify the source of unknown target words.
Positional All Model (PosAll): This model inserts positional tokens to denote the relative positions of aligned source and target words, catering to the alignment of frequent words in addition to OOV terms.
Positional Unknown Model (PosUnk): Focuses solely on annotating unknown words with their relative source positions, thereby reducing sentence length and computational load while achieving better alignments.
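
To make the annotation schemes more concrete, the sketch below shows one possible reading of the Copyable and PosUnk strategies applied to a single sentence pair. The token spellings (`unk1`, `unk_null`, `unkpos_1`, `unkpos_null`), the function names, the alignment representation (a dict from target position to source position), and the distance window `max_distance=7` are illustrative assumptions, not details confirmed by the summary above.

```python
def annotate_copyable(source_tokens, target_tokens, source_vocab, target_vocab, alignment):
    """Copyable-style annotation: indexed unk tokens shared by both sides."""
    unk_ids = {}  # unknown source word -> index (repeats share one index)
    source_out = []
    for word in source_tokens:
        if word in source_vocab:
            source_out.append(word)
        else:
            index = unk_ids.setdefault(word, len(unk_ids) + 1)
            source_out.append(f"unk{index}")
    target_out = []
    for j, word in enumerate(target_tokens):
        if word in target_vocab:
            target_out.append(word)
            continue
        i = alignment.get(j)
        if i is not None and source_tokens[i] in unk_ids:
            # Unknown target word aligned to an unknown source word: reuse its index.
            target_out.append(f"unk{unk_ids[source_tokens[i]]}")
        else:
            target_out.append("unk_null")
    return source_out, target_out


def annotate_posunk(target_tokens, target_vocab, alignment, max_distance=7):
    """PosUnk-style annotation: only unknown target words carry a relative position."""
    annotated = []
    for j, word in enumerate(target_tokens):
        if word in target_vocab:
            annotated.append(word)
            continue
        i = alignment.get(j)
        if i is None or abs(j - i) > max_distance:
            # Unaligned, or aligned too far away to encode.
            annotated.append("unkpos_null")
        else:
            annotated.append(f"unkpos_{j - i}")
    return annotated


source = ["the", "ecotax", "portico", "in", "Pont-de-Buis"]
target = ["le", "portique", "écotaxe", "de", "Pont-de-Buis"]
alignment = {0: 0, 1: 2, 2: 1, 3: 3, 4: 4}  # target position -> source position

print(annotate_copyable(source, target, {"the", "in"}, {"le", "de"}, alignment))
# (['the', 'unk1', 'unk2', 'in', 'unk3'], ['le', 'unk2', 'unk1', 'de', 'unk3'])
print(annotate_posunk(target, {"le", "de"}, alignment))
# ['le', 'unkpos_-1', 'unkpos_1', 'de', 'unkpos_0']
```

Under these assumptions the PosUnk output leaves the source sentence and all in-vocabulary target words untouched, which is why it adds no length to the training sequences, whereas a PosAll-style scheme would add a positional token for every target word.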

Training Procedures

The authors trained multi-layer deep Long Short-Term Memory (LSTM) models with 1000 cells per layer and 1000-dimensional embeddings, reaching a training speed of 5.4K words per second on an 8-GPU machine. The models were trained on a parallel dataset of 12M English-French sentence pairs, with different vocabulary sizes (40K and 80K words).

Empirical Results

Table 1 in the paper provides a detailed comparison of BLEU scores across various NMT systems. Key findings include:

Single LSTM with PosUnk: Improved BLEU by 2.3 points over a 40K-vocabulary system.
Ensemble Models: Showed even larger gains, with an ensemble of 8 LSTMs + PosUnk reaching a BLEU score of 37.5, the first NMT result to surpass the best result achieved on a WMT'14 contest task.

Theoretical and Practical Implications

The ability to translate rare words correctly bears on the robustness and generalizability of NMT systems. In practice, the technique is especially useful for domain-specific texts and for languages with rich vocabularies and many rare terms. Because the method treats an existing NMT model as a black box and manipulates only its inputs and outputs, it can be applied without changing the model itself.

Future Directions in AI

Considering these promising results, future research may explore:

Further refining positional models to better handle non-monotonic alignments, particularly in language pairs with different syntactic structures.
Extending this methodology to other sequence-to-sequence tasks beyond translation, such as text summarization or speech recognition.
Combining the alignment-based methods with advances in large vocabulary handling to further alleviate the computational intensity of training NMT systems.

The paper successfully demonstrates a practical approach to address the rare word problem in NMT systems and sets a new benchmark in translation quality. The implications extend to broader AI applications, suggesting a fertile area for future exploration and advancement.

Authors

Minh-Thang Luong
Ilya Sutskever
Quoc V. Le
Oriol Vinyals
Wojciech Zaremba
