On Using Monolingual Corpora in Neural Machine Translation
The paper investigates integrating monolingual corpora into neural machine translation (NMT) systems to improve translation quality, especially for low-resource languages. Because high-quality parallel corpora are costly to acquire, the authors propose leveraging the abundant availability of monolingual data to enhance NMT performance.
Methodology
The researchers introduce two methods for integrating a language model (LM) trained on monolingual data into NMT systems: shallow fusion and deep fusion.
- Shallow Fusion combines the scores of the translation model and the LM at decoding time, using a tunable weight coefficient to balance their contributions.
- Deep Fusion concatenates the hidden states of the NMT decoder and the LM and fine-tunes the output layer, with a learned controller gate modulating the LM state, so that information from both sources is integrated dynamically.
Both methods aim to exploit the linguistic structure present in monolingual corpora to improve translation performance; a minimal sketch of each follows.
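To make the two mechanisms concrete, here is a minimal PyTorch sketch of the scoring arithmetic. The helper names, tensor shapes, and the example weight value are illustrative assumptions, not the paper's implementation; it shows only the core idea of combining log-probabilities (shallow fusion) and gating a concatenated hidden state (deep fusion).

```python
import torch

def shallow_fusion_scores(log_p_tm: torch.Tensor,
                          log_p_lm: torch.Tensor,
                          beta: float = 0.2) -> torch.Tensor:
    """Shallow fusion: combine per-token log-probabilities at decoding time,
    log p(y) = log p_TM(y) + beta * log p_LM(y). The weight beta is a
    hyperparameter tuned on a validation set (0.2 here is arbitrary)."""
    return log_p_tm + beta * log_p_lm

class DeepFusionOutput(torch.nn.Module):
    """Deep fusion: an output layer over the concatenated decoder and LM
    hidden states. A scalar controller gate, computed from the LM state,
    modulates how much LM information flows into each prediction."""
    def __init__(self, dec_dim: int, lm_dim: int, vocab_size: int):
        super().__init__()
        self.gate = torch.nn.Linear(lm_dim, 1)        # g = sigmoid(v . s_lm + b)
        self.proj = torch.nn.Linear(dec_dim + lm_dim, vocab_size)

    def forward(self, s_dec: torch.Tensor, s_lm: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(s_lm))            # (batch, 1)
        fused = torch.cat([s_dec, g * s_lm], dim=-1)  # gate scales the LM state
        return torch.log_softmax(self.proj(fused), dim=-1)

# Toy usage: batch of 2, vocabulary of 10.
if __name__ == "__main__":
    log_p_tm = torch.log_softmax(torch.randn(2, 10), dim=-1)
    log_p_lm = torch.log_softmax(torch.randn(2, 10), dim=-1)
    print(shallow_fusion_scores(log_p_tm, log_p_lm).shape)    # torch.Size([2, 10])

    layer = DeepFusionOutput(dec_dim=8, lm_dim=6, vocab_size=10)
    print(layer(torch.randn(2, 8), torch.randn(2, 6)).shape)  # torch.Size([2, 10])
```

Note the design difference: shallow fusion touches only the beam-search scores and requires no training, while deep fusion adds trainable parameters (the gate and the enlarged output projection) that are fine-tuned while the pretrained models supply the hidden states.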
Experimental Results
The experimental evaluation covers several language pairs: Turkish-English (Tr-En), Chinese-English (Zh-En), German-English (De-En), and Czech-English (Cs-En). The results show:
- An improvement of up to 1.96 BLEU points on Tr-En using deep fusion.
- In high-resource settings (Cs-En, De-En), deep fusion enhances performance by 0.39 and 0.47 BLEU points, respectively.
These enhancements indicate that the proposed approaches are not limited to low-resource scenarios, demonstrating their broad applicability.
Analysis
Performance improvements correlate with the domain similarity between the monolingual corpus and the target translation task. Domains with higher similarity, such as news text for De-En, benefit more from the supplemental LM, suggesting that further gains are possible through domain adaptation.
Implications and Future Work
The research underscores the utility of monolingual data when parallel corpora are scarce, offering a practical way to improve NMT across a range of settings. It also opens avenues for domain adaptation techniques and richer LM-integration strategies that could yield further gains in translation quality, informing ongoing work in language processing and cross-lingual communication.