Edinburgh Neural Machine Translation Systems for WMT 16 (1606.02891v2)

Published 9 Jun 2016 in cs.CL

Abstract: We participated in the WMT 2016 shared news translation task by building neural translation systems for four language pairs, each trained in both directions: English↔Czech, English↔German, English↔Romanian and English↔Russian. Our systems are based on an attentional encoder-decoder, using BPE subword segmentation for open-vocabulary translation with a fixed vocabulary. We experimented with using automatic back-translations of the monolingual News corpus as additional training data, pervasive dropout, and target-bidirectional models. All reported methods give substantial improvements, and we see improvements of 4.3–11.2 BLEU over our baseline systems. In the human evaluation, our systems were the (tied) best constrained system for 7 out of 8 translation directions in which we participated.

Overview of Edinburgh Neural Machine Translation Systems for WMT 16

The paper "Edinburgh Neural Machine Translation Systems for WMT 16" presents the machine translation systems developed for the 2016 Workshop on Machine Translation (WMT16) shared news translation task. The authors, Sennrich, Haddow, and Birch from the School of Informatics at the University of Edinburgh, explore neural translation systems across four language pairs bidirectionally: English-Czech, English-German, English-Romanian, and English-Russian.

The systems are based on an attentional encoder-decoder framework with Byte-Pair Encoding (BPE) for open-vocabulary translation over a fixed vocabulary. Several techniques were applied to improve translation quality, including automatic back-translation of monolingual data, pervasive dropout, and target-bidirectional models. Together these yielded substantial improvements of 4.3–11.2 BLEU points over the baseline systems.

Methodological Approaches

  1. Baseline System: The baseline is an attentional encoder-decoder network trained with Adadelta and gradient clipping; translations are produced with beam search, and BLEU on a validation set is used for early stopping and model selection (a toy beam-search sketch follows this list).
  2. Byte-Pair Encoding (BPE): To enable open-vocabulary translation with a fixed symbol vocabulary, BPE segments rare words into sequences of more frequent subword units, trading longer sequences for a compact vocabulary (a minimal sketch of the merge-learning loop follows this list).
  3. Synthetic Training Data: Monolingual News data is automatically back-translated into the source language to create additional synthetic parallel training data, which improved both overall quality and in-domain fluency (see the pairing sketch after this list).
  4. Pervasive Dropout: To combat overfitting, most visible for English-Romanian, dropout is applied across all network layers, including recurrent ones, with masks reused at every time step; this yielded significant performance gains (a shared-mask sketch follows this list).
  5. Target-Bidirectional Translation: Separate right-to-left (r2l) models are trained and used to rerank the n-best output of the left-to-right (l2r) ensemble; combining models that condition on opposite target contexts counteracts each direction's bias (a reranking sketch follows this list).
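
For item 1, decoding with beam search can be sketched as follows. The scoring stub, length cap, and toy vocabulary are illustrative stand-ins for the trained decoder, not the paper's actual settings.

```python
import math

def beam_search(step_logprobs, beam_size=12, max_len=20, eos=0):
    """Toy beam search.

    step_logprobs: callable mapping a partial hypothesis (tuple of
        token ids) to {next_token_id: logprob}; it stands in for the
        trained decoder network.
    """
    beams = [((), 0.0)]  # (hypothesis, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = [(hyp + (tok,), score + lp)
                      for hyp, score in beams
                      for tok, lp in step_logprobs(hyp).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for hyp, score in candidates[:beam_size]:
            # Hypotheses ending in EOS leave the beam.
            (finished if hyp[-1] == eos else beams).append((hyp, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# Stub next-token distribution: favors token 1, then forces EOS (0).
def stub(hyp):
    if len(hyp) < 2:
        return {1: math.log(0.6), 2: math.log(0.3), 0: math.log(0.1)}
    return {0: 0.0}

print(beam_search(stub, beam_size=3))
```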
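
To make item 2 concrete, here is a minimal sketch of the BPE merge-learning loop, following the reference pseudocode the authors published in their subword-units paper; the toy vocabulary and merge count are illustrative only.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

# Toy vocabulary: words are space-separated symbol sequences ending in
# an end-of-word marker, as in the original formulation.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

num_merges = 10  # real systems learn tens of thousands of merge operations
for _ in range(num_merges):
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)
    vocab = merge_pair(best, vocab)
    print(best)  # each printed pair is one learned merge operation
```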
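
For item 3, a minimal sketch of how the synthetic data is assembled: each monolingual target sentence is paired with its automatic back-translation into the source language, and the resulting pairs are added to the real parallel data. The `translate_t2s` callable and the stub sentences are hypothetical stand-ins, not the paper's actual pipeline.

```python
def build_synthetic_parallel(mono_target, translate_t2s):
    """Pair each monolingual target sentence with its automatic
    back-translation, yielding (synthetic_source, real_target) pairs."""
    return [(translate_t2s(sent), sent) for sent in mono_target]

# Hypothetical usage with a stub target-to-source translator:
mono_news = ['Die Preise steigen weiter .', 'Das Wetter bleibt mild .']
synthetic = build_synthetic_parallel(mono_news, lambda s: '<bt> ' + s)

# Training then continues on real plus synthetic pairs; the mixing
# ratio varies by language pair and is not reproduced here.
# training_data = real_parallel + synthetic
```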
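
For item 4, a NumPy sketch of the shared-mask dropout scheme (following Gal, 2015) that underlies pervasive dropout: masks are sampled once per sequence and reused at every time step, including on the recurrent state. Dimensions, weights, and the 0.2 rate are illustrative placeholders rather than the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_dropout_mask(shape, p, rng):
    """Inverted-dropout mask: zeros units with probability p and
    rescales survivors so no adjustment is needed at test time."""
    keep = 1.0 - p
    return rng.binomial(1, keep, size=shape) / keep

batch, embed, hidden, T = 4, 6, 8, 5
p = 0.2  # placeholder rate

# One mask per sequence for embeddings and for the recurrent state.
e_mask = shared_dropout_mask((batch, embed), p, rng)
h_mask = shared_dropout_mask((batch, hidden), p, rng)

# Toy RNN weights standing in for trained encoder/decoder parameters.
W_x = rng.standard_normal((embed, hidden)) * 0.1
W_h = rng.standard_normal((hidden, hidden)) * 0.1

h = np.zeros((batch, hidden))
for t in range(T):
    x_t = rng.standard_normal((batch, embed))  # stand-in embeddings
    # The same masks apply at every step; naive dropout would
    # resample them per time step and disrupt the recurrent signal.
    h = np.tanh((x_t * e_mask) @ W_x + (h * h_mask) @ W_h)
```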
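
For item 5, a sketch of the reranking step: a right-to-left model scores each hypothesis read in reverse, and that score is combined with the left-to-right model's score before re-sorting the n-best list. `r2l_score`, the stub list, and the uniform weight are hypothetical; the paper's exact score combination may differ.

```python
def rerank_with_r2l(nbest, r2l_score, weight=1.0):
    """Rerank an n-best list from a left-to-right model.

    nbest: list of (tokens, l2r_logprob) pairs for one source sentence.
    r2l_score: maps a token sequence to its log-probability under a
               right-to-left model (assumed to exist).
    """
    rescored = [(tokens, l2r_lp + weight * r2l_score(tokens[::-1]))
                for tokens, l2r_lp in nbest]
    # Best combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical usage with a stub right-to-left scorer:
nbest = [(['das', 'ist', 'gut'], -1.2), (['es', 'ist', 'gut'], -1.5)]
print(rerank_with_r2l(nbest, r2l_score=lambda toks: -0.9)[0])
```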

Results and Analysis

The results are consistent across language pairs. For each pair, training with synthetic back-translated data substantially improved BLEU, and ensembling together with r2l reranking raised performance further. The English-German and English-Czech pairs showed especially marked gains from these methods.

  • English↔German: Achieved improvements of up to 5.7 BLEU through synthetic data; ensemble models enhanced results further, culminating in top shared task rankings.
  • English↔Czech: Continued training using increasing amounts of back-translated data incrementally boosted BLEU scores, affirming the value of synthetic data.
  • English↔Romanian: Pervasive dropout and synthetic data produced significant gains here, reflecting the difficulty of training on smaller and less consistent data.
  • English↔Russian: The Cyrillic/Latin alphabet mismatch was addressed with ISO-9 transliteration before learning BPE operations, yielding strong BLEU improvements (see the sketch after this list).
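
To illustrate the transliteration step in the English↔Russian bullet, here is a tiny sketch with a deliberately partial ISO-9 mapping (the real standard covers the full Cyrillic alphabet); the paper's exact use of transliteration for joint BPE may differ in detail.

```python
# Partial ISO-9 Cyrillic-to-Latin mapping, enough for the example.
ISO9 = {'д': 'd', 'о': 'o', 'б': 'b', 'р': 'r', 'а': 'a', 'м': 'm'}

def transliterate(text):
    """Map known Cyrillic characters to Latin, leaving others unchanged,
    so subword merges can be learned jointly across both alphabets."""
    return ''.join(ISO9.get(ch, ch) for ch in text)

print(transliterate('добро'))  # -> 'dobro'
```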

Implications and Future Directions

This research demonstrates the efficacy of combining attentional encoder-decoder models with data augmentation strategies such as synthetic data generation and regularization techniques such as dropout. The findings show that neural systems can match or surpass the previous state of the art in machine translation.

Future work could explore more diverse ensembling techniques and real-time adaptation to further improve translation quality. The methods described, particularly synthetic data utilization and pervasive dropout, may also transfer to other language processing tasks and machine learning models.

In conclusion, this work provides meaningful insights into neural machine translation and serves as a reference point for further research and application in computational linguistics.

Authors: Rico Sennrich, Barry Haddow, Alexandra Birch
Citations: 521