A Survey of Domain Adaptation for Neural Machine Translation

Published 1 Jun 2018 in cs.CL, cs.AI, and cs.LG (arXiv:1806.00258v1)

Abstract: Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.

Citations (251)

Summary

  • The paper surveys domain adaptation approaches by distinguishing data-centric and model-centric strategies to enhance NMT performance.
  • It highlights methods like back-translation, fine tuning, and ensemble decoding to overcome challenges in low-resource and domain-specific contexts.
  • The survey underscores the need for integrating multi-domain strategies and adversarial techniques to advance neural machine translation architectures.

Domain Adaptation Techniques for Neural Machine Translation: A Comprehensive Survey

This paper presents a comprehensive survey of domain adaptation techniques designed for Neural Machine Translation (NMT), addressing the limitations of vanilla NMT systems on domain-specific corpora. The survey categorizes domain adaptation techniques into data-centric and model-centric approaches, providing a detailed examination of the development and application of these methods in the context of NMT. This categorization also clarifies how methods originally developed for Statistical Machine Translation (SMT) carry over to NMT.

Domain adaptation is critical due to the scarcity of large-scale parallel corpora for various language pairs beyond those involving English or several European languages. A major issue with NMT is its poor performance in low-resource and domain-specific scenarios. Thus, this paper stresses the necessity of leveraging out-of-domain parallel corpora alongside in-domain monolingual corpora to improve translation quality in such cases.

Data-Centric Approaches

  1. Monolingual Corpora Utilization: Strategies for incorporating monolingual data into NMT architectures or training paradigms are examined, such as using target-side monolingual data to strengthen the decoder or employing a multitask framework to enhance the encoder.
  2. Synthetic Parallel Corpora: Back-translation is a prominent method for generating synthetic parallel corpora from monolingual data, enhancing the training of low-resource NMT models by augmenting the dataset with pseudo-parallel sentence pairs.
  3. Out-of-domain Parallel Data: Creating multi-domain systems by concatenating data and appending domain-specific tags is explored. The study also covers data selection techniques drawn from SMT literature, emphasizing those that optimize corpus selection to improve in-domain translation efficiency.
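
The back-translation idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `reverse_translate` is a hypothetical stand-in for a real target-to-source NMT model, and the example sentences are invented.

```python
# Back-translation sketch: a reverse (target -> source) model translates
# in-domain monolingual TARGET sentences to produce synthetic SOURCE
# sides, yielding pseudo-parallel pairs for training.

def reverse_translate(target_sentence):
    # Placeholder: a real system would run beam search with a
    # target -> source NMT model trained on available parallel data.
    return "<synthetic source for: %s>" % target_sentence

def back_translate(monolingual_targets):
    """Pair each in-domain target sentence with a synthetic source."""
    return [(reverse_translate(t), t) for t in monolingual_targets]

in_domain_targets = ["Das Medikament wird oral eingenommen.",
                     "Die Dosis betraegt 5 mg pro Tag."]
pseudo_parallel = back_translate(in_domain_targets)
# Each pair is (synthetic source, authentic in-domain target); the
# synthetic pairs are then mixed with real parallel data for training.
```

The key property is that the target side, which the decoder learns to generate, is authentic in-domain text; only the source side is synthetic.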

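The multi-domain tagging approach can likewise be sketched in a few lines. The `<2domain>` token format here is an assumption borrowed from common artificial-token conventions, not necessarily the exact format used in the surveyed work.

```python
# Domain-tag sketch: corpora from several domains are concatenated, and
# each source sentence is prefixed with an artificial token identifying
# its domain, so one model can condition its output on the requested
# domain at test time.

def tag_source(sentence, domain):
    return "<2%s> %s" % (domain, sentence)

mixed_corpus = [("medical", "the drug is taken orally"),
                ("legal", "the contract is hereby terminated")]
tagged = [tag_source(s, d) for d, s in mixed_corpus]
# tagged[0] == "<2medical> the drug is taken orally"
```
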
Model-Centric Approaches

  1. Training Objective Centered Adaptations:
    • Instance/Cost Weighting: These methods adjust the NMT loss function using instance-specific weights informed by criteria such as language model cross-entropy scores.
    • Fine Tuning and Mixed Fine Tuning: An NMT model trained on out-of-domain data is further trained on in-domain data (fine tuning) or on a mix of in-domain and out-of-domain data (mixed fine tuning), which mitigates the trade-off between in-domain gains and out-of-domain degradation.
  2. Architecture Modifications:
    • Deep Fusion: This technique integrates an in-domain recurrent neural network language model (RNNLM) with the NMT decoder, facilitating transfer learning from the pre-trained language model during in-domain adaptation.
    • Domain Discriminators and Domain Control Mechanisms: Adjusting model architectures to incorporate domain predictors or embed domain-specific information in input features to steer the translation output appropriately.
  3. Decoding Adjustments:
    • Shallow Fusion and Ensemble Techniques: These combine model outputs at inference time, for example by interpolating an in-domain language model's predictions with the NMT decoder's, or by averaging the predictions of several models, to maintain robustness across domains.
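
The instance-weighting idea from the training-objective adaptations above can be sketched as a weighted cross-entropy loss. The weights and probabilities below are illustrative values, not ones from the paper; in practice the weight would come from, e.g., a comparison of in-domain and out-of-domain language model scores.

```python
# Instance-weighting sketch: each sentence pair's cross-entropy loss is
# scaled by a weight reflecting how in-domain it appears, so in-domain-
# like pairs contribute more to the gradient than out-of-domain ones.
import math

def sentence_loss(token_probs):
    """Per-sentence negative log-likelihood under the NMT model."""
    return -sum(math.log(p) for p in token_probs)

def weighted_batch_loss(batch):
    """batch: list of (weight, token_probs) pairs."""
    return sum(w * sentence_loss(probs) for w, probs in batch)

batch = [
    (1.0, [0.9, 0.8]),   # looks in-domain: full weight
    (0.2, [0.9, 0.8]),   # looks out-of-domain: down-weighted
]
total = weighted_batch_loss(batch)
```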

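Shallow fusion, mentioned among the decoding adjustments, can be illustrated with a single decoding step. This is a simplified sketch: the probability tables are invented, and a real decoder would score the full vocabulary inside beam search rather than pick a one-step argmax.

```python
# Shallow-fusion sketch: at each decoding step the NMT model's
# log-probability is combined with an in-domain language model's
# log-probability, weighted by a hyperparameter beta, and the
# highest-scoring candidate token is chosen.
import math

def shallow_fusion_step(nmt_probs, lm_probs, beta=0.3):
    """Pick the token maximizing log p_NMT(y) + beta * log p_LM(y)."""
    return max(nmt_probs,
               key=lambda y: math.log(nmt_probs[y]) + beta * math.log(lm_probs[y]))

nmt_probs = {"drug": 0.4, "medicine": 0.35, "stuff": 0.25}
lm_probs = {"drug": 0.2, "medicine": 0.7, "stuff": 0.1}
```

With beta = 0 the choice reduces to the plain NMT prediction; increasing beta lets the in-domain language model steer decoding toward in-domain wording.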
Implications and Future Directions

The paper posits that integrating domain adaptation techniques with the latest NMT architectures, such as convolutional and transformer models, is a promising future direction. The need for practical implementations that incorporate domain-specific dictionaries and knowledge bases is also highlighted as a crucial path for enhancing domain-specific translation tasks.

Furthermore, the exploration of multilingual and multi-domain strategies remains largely unexamined but offers significant potential for leveraging cross-linguistic data. Lastly, adversarial domain adaptation methods and the introduction of domain generation concepts can address the challenges posed by unseen target domains, offering more generalized translation solutions.

In conclusion, the survey elucidates the critical role of domain adaptation in advancing NMT's applicability to real-world scenarios, filling the gaps left by general-purpose machine translation systems.
