Overview of "Linguistic Input Features Improve Neural Machine Translation"
The paper "Linguistic Input Features Improve Neural Machine Translation" by Rico Sennrich and Barry Haddow addresses the integration of linguistic features into Neural Machine Translation (NMT) models. Despite the notable progress in NMT achieved without extensive linguistic information, this paper posits that incorporating linguistic features can further enhance translation quality.
Methodology and Innovations
The authors extend the embedding layer of the attentional encoder-decoder NMT architecture to accept arbitrary linguistic input features alongside the traditional word feature. They add lemmas, subword tags, morphological features, part-of-speech (POS) tags, and syntactic dependency labels to the encoder input, and evaluate English↔German and English→Romanian systems built on data from the WMT16 shared translation task.
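To make the input format concrete, here is a minimal Python sketch of how per-token annotations can be attached as pipe-delimited "factors". The helper name and the example tags are illustrative assumptions, not the authors' exact pipeline, though the Nematus toolkit they released supports a similar factored input format:

```python
# Hypothetical sketch: attach per-token linguistic features to a source
# sentence as parallel "factors". The tags below are made up for
# illustration, not output from a real tagger or parser.
def to_factored_input(tokens, lemmas, pos_tags, dep_labels):
    """Combine parallel annotation sequences into pipe-delimited factors,
    one factored token per position: word|lemma|POS|dep."""
    assert len(tokens) == len(lemmas) == len(pos_tags) == len(dep_labels)
    return " ".join(
        "|".join(fields)
        for fields in zip(tokens, lemmas, pos_tags, dep_labels)
    )

print(to_factored_input(
    ["We", "thought", "a", "win", "was", "possible"],
    ["we", "think", "a", "win", "be", "possible"],
    ["PRP", "VBD", "DT", "NN", "VBD", "JJ"],
    ["nsubj", "root", "det", "nsubj", "cop", "ccomp"],
))
# -> We|we|PRP|nsubj thought|think|VBD|root a|a|DT|det ...
```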
The extended architecture generalizes the encoder's input representation: each feature (the word itself, its lemma, subword tag, morphological features, POS tag, and dependency label) is embedded separately, and the embeddings are concatenated, keeping the total embedding size constant so the rest of the network is unchanged. Training follows the standard recurrent setup with gated recurrent units: a bidirectional encoder processes the input sequence, and an attention mechanism aligns source and target during translation.
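A minimal PyTorch sketch of this generalized embedding layer follows. The class name, vocabulary sizes, and the per-factor width split are illustrative assumptions rather than the paper's exact configuration (the authors' own implementation was Theano-based):

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Embed each input factor separately and concatenate the results,
    so the total embedding width stays fixed. A sketch of the paper's
    generalized input layer; vocabulary sizes and per-factor widths
    here are illustrative, not the paper's settings."""
    def __init__(self, vocab_sizes=(30000, 50, 40, 60),
                 factor_dims=(250, 100, 100, 50)):
        super().__init__()
        assert len(vocab_sizes) == len(factor_dims)
        self.embeddings = nn.ModuleList(
            [nn.Embedding(v, d) for v, d in zip(vocab_sizes, factor_dims)]
        )
        # Total width matches a word-only model (e.g. 500 here), so the
        # encoder above this layer needs no changes.
        self.total_dim = sum(factor_dims)

    def forward(self, factor_ids):
        # factor_ids: (batch, seq_len, n_factors) integer tensor
        parts = [emb(factor_ids[..., k])
                 for k, emb in enumerate(self.embeddings)]
        return torch.cat(parts, dim=-1)  # (batch, seq_len, total_dim)

# Usage: 4 factors (word, POS, dep label, subword tag), batch 2, length 5
ids = torch.randint(0, 40, (2, 5, 4))
emb = FactoredEmbedding()
print(emb(ids).shape)  # torch.Size([2, 5, 500])
```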
Experimental Results
The paper shows that linguistic features improve translation quality on all three reported metrics: perplexity, BLEU, and chrF3. For English↔German, gains of up to roughly 1.5 BLEU and 0.5 chrF3 were observed depending on translation direction, with lower perplexity indicating a better-fitting model. The size of the improvement varied with the features used, and combining all features yielded the largest gains. Importantly, the improvements were retained when the training corpus was augmented with synthetic parallel data.
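For reference, chrF3 (Popović, 2015) is a character n-gram F-score in which recall is weighted three times as heavily as precision:

$$
\mathrm{chrF}_{\beta} = (1 + \beta^2)\,\frac{\mathrm{chrP} \cdot \mathrm{chrR}}{\beta^2 \cdot \mathrm{chrP} + \mathrm{chrR}}, \qquad \beta = 3
$$

where chrP and chrR are character n-gram precision and recall averaged over n-gram lengths.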
In the lower-resource English→Romanian setting, linguistic features gave a clear boost in both BLEU and chrF3, suggesting that linguistic annotations are particularly valuable for under-resourced translation tasks, where purely data-driven learning from large corpora is limited.
Discussion and Implications
The findings show that despite NMT's strong representation-learning capabilities, explicit linguistic features still aid disambiguation and enrich the learned representations. The authors attribute the improvements to better generalization over inflected word forms and better syntactic disambiguation, demonstrating that linguistic information contributes meaningfully even in systems designed to need minimal handcrafted features.
Practically, the results suggest that integrating linguistic annotations into NMT systems can meaningfully improve translation output, particularly for morphologically rich languages or in scenarios with limited bilingual data. The paper also opens avenues for studying the roles of individual linguistic features in low-resource machine translation.
Future Directions
Moving forward, improvements in model architectures may reduce the need for explicit features, while novel features beyond traditional linguistic annotations could provide additional benefits. Applying syntactic parsing incrementally at decoding time is another direction that could improve both translation efficiency and accuracy.
In summary, this paper provides a comprehensive investigation into the advantages of incorporating linguistic input features in neural machine translation, demonstrating tangible improvements in translation quality and laying a foundation for future research in leveraging linguistic knowledge within deep learning frameworks.