Improving Neural Machine Translation Models with Monolingual Data (1511.06709v4)

Published 20 Nov 2015 in cs.CL

Abstract: Neural Machine Translation (NMT) has obtained state-of-the-art performance for several language pairs, while only using parallel data for training. Target-side monolingual data plays an important role in boosting fluency for phrase-based statistical machine translation, and we investigate the use of monolingual data for NMT. In contrast to previous work, which combines NMT models with separately trained language models, we note that encoder-decoder NMT architectures already have the capacity to learn the same information as a language model, and we explore strategies to train with monolingual data without changing the neural network architecture. By pairing monolingual training data with an automatic back-translation, we can treat it as additional parallel training data, and we obtain substantial improvements on the WMT 15 task English<->German (+2.8-3.7 BLEU), and for the low-resourced IWSLT 14 task Turkish->English (+2.1-3.4 BLEU), obtaining new state-of-the-art results. We also show that fine-tuning on in-domain monolingual and parallel data gives substantial improvements for the IWSLT 15 task English->German.

Improving Neural Machine Translation Models with Monolingual Data

The paper "Improving Neural Machine Translation Models with Monolingual Data" presents an in-depth analysis and novel approaches for augmenting Neural Machine Translation (NMT) systems using monolingual data. This research distinguishes itself from prior work by leveraging monolingual data without altering the fundamental neural architecture. Traditional NMT systems have relied predominantly on parallel datasets, but the authors explore integrating monolingual data to enhance translation quality and fluency.

Primary Contributions

The paper introduces two main strategies:

  1. Dummy Source Sentences: Monolingual data is treated as parallel data with empty source sentences. This method essentially forces the network to make predictions based solely on target-side context, mimicking a language model without altering the neural network architecture.
  2. Synthetic Source Sentences: This method involves back-translating monolingual target sentences into the source language to generate synthetic parallel data. The NMT model is then trained on the combined synthetic and original parallel data, allowing it to exploit the additional monolingual resources (a minimal sketch of both strategies follows this list).
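
A minimal sketch of how both strategies can be realized as a data-preparation step, assuming a Python training pipeline. The backtranslate argument stands in for a separately trained target-to-source NMT system; it is a hypothetical callable introduced here for illustration, not code released with the paper.

```python
from typing import Callable, List, Optional, Tuple

def monolingual_to_pairs(
    mono_target: List[str],
    backtranslate: Optional[Callable[[str], str]] = None,
) -> List[Tuple[str, str]]:
    """Turn monolingual target-side sentences into (source, target) pairs.

    Without a back-translation model, fall back to the dummy-source strategy:
    pair each sentence with an empty source so the decoder must predict from
    target-side context alone, much like a language model.
    """
    pairs = []
    for tgt in mono_target:
        if backtranslate is None:
            src = ""                  # strategy 1: dummy (empty) source
        else:
            src = backtranslate(tgt)  # strategy 2: synthetic source
        pairs.append((src, tgt))
    return pairs

# The augmented corpus is simply the original parallel data plus the new pairs.
parallel = [("Das ist ein Haus .", "This is a house .")]
monolingual = ["The weather was unusually warm ."]
training_data = parallel + monolingual_to_pairs(monolingual)
```

In the paper's setup, the synthetic pairs are simply mixed with the true parallel data at training time and treated identically by the model.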

Experimental Results

The validity of these methods is tested rigorously across several datasets and language pairs. The paper reports substantial improvements in BLEU scores when monolingual data is incorporated:

  • English-German WMT 15: An increase of up to 3.7 BLEU points for English to German translation and 3.6 to 3.7 BLEU points for the reverse direction.
  • Turkish-English IWSLT 14: Improvement of up to 3.4 BLEU points.

Synthetic source sentences outperform dummy (empty) sources, suggesting that realistic source-side context provides a better training signal to the model.
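
BLEU differences of this size are measured at the corpus level over a shared test set. Below is a minimal sketch of such a comparison using the sacrebleu library; sacrebleu is an assumed convenience here, not necessarily the exact scoring setup used in the paper, and the sentences are made up.

```python
# pip install sacrebleu
import sacrebleu

# Toy hypotheses from a baseline system and from one trained with synthetic data.
baseline_out  = ["the house is small", "he reads a book"]
augmented_out = ["the house is small", "he is reading a book"]
references    = [["the house is small", "he is reading a book"]]  # one reference stream

baseline_bleu  = sacrebleu.corpus_bleu(baseline_out, references)
augmented_bleu = sacrebleu.corpus_bleu(augmented_out, references)
print(f"baseline:  {baseline_bleu.score:.1f} BLEU")
print(f"augmented: {augmented_bleu.score:.1f} BLEU")
print(f"delta:     {augmented_bleu.score - baseline_bleu.score:+.1f} BLEU")
```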

Theoretical and Practical Implications

These findings carry significant theoretical and practical implications:

  • Domain Adaptation: The proposed methods facilitate effective domain adaptation. By back-translating a small monolingual in-domain corpus, the NMT model adapts more readily to new domains (a sketch of the resulting training schedule follows this list).
  • Enhanced Fluency: Monolingual data improves the model's fluency in the target language by strengthening the decoder's language-modelling capabilities. This is particularly evident in the word-level fluency analysis presented in the paper.
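
A rough sketch of the two-phase schedule this implies, assuming a generic train_epochs routine (a hypothetical stub standing in for whatever NMT toolkit is used): first train on the large general-domain mix, then continue training on the small in-domain mix of parallel and back-translated monolingual data.

```python
from typing import List, Tuple

Corpus = List[Tuple[str, str]]  # (source, target) sentence pairs

def train_epochs(model: dict, data: Corpus, epochs: int) -> dict:
    """Hypothetical stand-in for an NMT toolkit's training call."""
    model["steps"] = model.get("steps", 0) + epochs * len(data)
    return model

# Phase 1: train on the general-domain parallel data (plus any synthetic pairs).
general_parallel: Corpus = [("This is a house .", "Das ist ein Haus .")]
model = train_epochs({}, general_parallel, epochs=10)

# Phase 2: continue training on the small in-domain mix
# (in-domain parallel data + back-translated in-domain monolingual data).
in_domain_parallel: Corpus = [("Thank you very much .", "Vielen Dank .")]
in_domain_synthetic: Corpus = [("( back-translated source )", "Ein satzbeispiel aus der Zieldomäne .")]
model = train_epochs(model, in_domain_parallel + in_domain_synthetic, epochs=3)
```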

Furthermore, the results indicate that expanding the training data with synthetic pairs delays overfitting and improves (i.e., lowers) cross-entropy on development sets, indicative of better generalization.
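
For reference, development-set cross-entropy here is just the average negative log-probability the model assigns to the reference tokens. A minimal sketch of that computation, with made-up per-token probabilities, is shown below.

```python
import math
from typing import List

def dev_cross_entropy(token_probs: List[float]) -> float:
    """Average negative log-probability (nats per token) over a dev set."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Made-up per-token probabilities assigned to the reference translations.
baseline_probs  = [0.20, 0.35, 0.10, 0.25]
augmented_probs = [0.30, 0.40, 0.15, 0.35]
print(f"baseline dev cross-entropy:  {dev_cross_entropy(baseline_probs):.2f}")
print(f"augmented dev cross-entropy: {dev_cross_entropy(augmented_probs):.2f}")  # lower is better
```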

Future Developments in AI

This paper paves the way for future developments in AI and NMT by demonstrating that substantial gains can be made without altering network architectures. The adaptability of these techniques suggests broad applicability across NMT frameworks and language pairs. Future research could explore optimizing the ratio of monolingual to parallel data and improving the quality of the back-translations themselves.

Conclusion

The methods outlined in this paper represent a pragmatic and effective approach to leveraging monolingual data in NMT systems. The authors achieve significant improvements in translation quality, demonstrate the practical benefits of domain adaptation, and reduce overfitting through innovative use of synthetic data. This research underscores the potential of monolingual data to enhance neural translation models, setting a new standard in the training of robust and fluent NMT systems.

Authors (3)
  1. Rico Sennrich (87 papers)
  2. Barry Haddow (59 papers)
  3. Alexandra Birch (67 papers)
Citations (2,627)