Exploiting Cross-Sentence Context for Neural Machine Translation
The paper "Exploiting Cross-Sentence Context for Neural Machine Translation" presents an approach to improving Neural Machine Translation (NMT) by exploiting cross-sentence context. The authors, Longyue Wang et al., propose a framework to address challenges inherent in document-level translation, such as ambiguity and inconsistency, which arise when sentences are translated in isolation.
Concept and Implementation
Current NMT systems predominantly rely on the encoder-decoder framework, processing sentences individually without leveraging the broader document context. This limitation often results in translation errors due to the absence of cross-sentence contextual information. To overcome these challenges, Wang et al. introduce a cross-sentence context-aware model that incorporates historical context from prior source sentences within the same document. Specifically, the approach employs a hierarchy of Recurrent Neural Networks (RNNs) to effectively summarize and integrate cross-sentence context into the NMT process.
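The hierarchical summarization step can be sketched as follows. This is a minimal illustration, not the paper's exact architecture: the dimensions and random weights are toy values, and a simple tanh-RNN cell stands in for the GRU-based RNNs the authors use. A word-level RNN compresses each previous source sentence into a vector, and a sentence-level RNN then compresses those vectors into a single cross-sentence context vector.

```python
import math
import random

random.seed(0)

DIM = 4  # toy hidden size (assumption; real models use far larger states)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def rnn_step(W, U, x, h):
    # Simple tanh RNN cell standing in for the paper's GRU.
    return [math.tanh(z) for z in vadd(matvec(W, x), matvec(U, h))]

# Word-level RNN: summarize each previous source sentence into a vector.
W_word, U_word = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)
# Sentence-level RNN: summarize the sentence vectors into one context vector.
W_sent, U_sent = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)

def summarize_sentence(word_embeddings):
    h = [0.0] * DIM
    for x in word_embeddings:
        h = rnn_step(W_word, U_word, x, h)
    return h  # last hidden state serves as the sentence summary

def summarize_history(previous_sentences):
    d = [0.0] * DIM
    for sent in previous_sentences:
        s = summarize_sentence(sent)
        d = rnn_step(W_sent, U_sent, s, d)
    return d  # global cross-sentence context vector

# Two toy "previous sentences", each a list of random word embeddings.
history = [[[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
           for _ in range(2)]
context = summarize_history(history)
print(len(context))  # a DIM-dimensional context vector
```

The key design point is that summarization happens in two stages, so the final context vector scales with the number of previous sentences rather than the total number of words.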
The paper delineates two principal strategies for context integration:
- Initialization Strategy: This involves using the historical sentence representation to initialize the states of the encoder, decoder, or both. By initializing these states with the cross-sentence context, the model gains an informed starting point, which aids in generating more contextually accurate translations.
- Auxiliary Context Strategy: This method treats the cross-sentence context as an auxiliary source of information that enhances the decoder's state update mechanism. An advanced version, termed Gating Auxiliary Context, introduces a gating mechanism that dynamically regulates the amount of global context applied in generating each target word.
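The two strategies can be sketched side by side. Again this is a toy illustration under stated assumptions: the weights are fixed constants, a tanh-RNN update stands in for the paper's GRU decoder, and a single gate matrix `Wz` is shared for simplicity. The context vector either initializes the decoder state, or enters every state update scaled by a sigmoid gate computed from the current input and state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def vadd(*vecs):
    return [sum(xs) for xs in zip(*vecs)]

DIM = 3
# Toy weights (assumption; real models learn these by backpropagation).
W = [[0.1] * DIM for _ in range(DIM)]   # input -> state
U = [[0.2] * DIM for _ in range(DIM)]   # state -> state
C = [[0.3] * DIM for _ in range(DIM)]   # context -> state
Wz = [[0.5] * DIM for _ in range(DIM)]  # gate weights (shared here for brevity)

context = [0.4, -0.2, 0.1]  # cross-sentence summary from the hierarchical RNN

# 1) Initialization strategy: start the decoder from the context vector.
def init_decoder_state(ctx):
    return tanh_vec(matvec(C, ctx))

# 2) Gated auxiliary context: each update mixes in a gated dose of context.
def decoder_step(x, s, ctx):
    z = [sigmoid(v) for v in vadd(matvec(Wz, x), matvec(Wz, s))]  # gate in (0, 1)
    gated_ctx = [zi * ci for zi, ci in zip(z, matvec(C, ctx))]
    return tanh_vec(vadd(matvec(W, x), matvec(U, s), gated_ctx))

s0 = init_decoder_state(context)
s1 = decoder_step([0.5, 0.5, 0.5], s0, context)
print(s1)
```

Because the gate is recomputed at every step, the model can draw heavily on cross-sentence context for ambiguous words and largely ignore it elsewhere, which is the intuition behind the gated variant's advantage.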
Experimental Results and Implications
The experimental evaluation, conducted on a large-scale Chinese-English translation task, demonstrates the effectiveness of the proposed strategies. The integration of cross-sentence context through initialization and auxiliary context mechanisms individually results in significant performance improvements. Specifically, the combination of the best variants from both strategies achieved an increase of up to +2.1 BLEU points over the baseline NMT system.
This outcome highlights the potential for cross-sentence context integration to resolve ambiguity and improve consistency in translation outputs. The use of a gated auxiliary context appears particularly beneficial, as it provides a more nuanced control over context utilization, adapting dynamically to the demands of each target word.
Future Directions
Beyond the promising results, this research opens avenues for further exploration of how global context can benefit NMT. Subsequent work could integrate additional document-level features such as discourse relations, and extend the approach to full-length documents. Expanding this research to encompass other linguistic features might yield further gains in translation accuracy and fidelity.
In conclusion, this research marks a meaningful step towards enhancing NMT systems by adeptly incorporating cross-sentence context. It introduces methodologies that not only improve translation quality but also lay the groundwork for continued advancements in document-level translation capabilities. The code and methodologies developed in this work have been made publicly available, fostering further research and development in the domain of context-aware NMT.