Document-Level Neural Machine Translation with Hierarchical Attention Networks (1809.01576v2)

Published 5 Sep 2018 in cs.CL

Abstract: Neural Machine Translation (NMT) can be improved by including document-level contextual information. For this purpose, we propose a hierarchical attention model to capture the context in a structured and dynamic manner. The model is integrated in the original NMT architecture as another level of abstraction, conditioning on the NMT model's own previous hidden states. Experiments show that hierarchical attention significantly improves the BLEU score over a strong NMT baseline with the state-of-the-art in context-aware methods, and that both the encoder and decoder benefit from context in complementary ways.

Document-Level Neural Machine Translation with Hierarchical Attention Networks

The paper "Document-Level Neural Machine Translation with Hierarchical Attention Networks" introduces an innovative approach to enhance Neural Machine Translation (NMT) by incorporating document-level contextual information. The authors propose a Hierarchical Attention Network (HAN) to effectively model and utilize this contextual information, resulting in improved translation performance as measured by BLEU scores.

The paper begins by acknowledging the limitations of traditional sentence-level NMT, which ignores document context and can therefore produce output with reduced coherence and cohesion. Recent work has shown that adding contextual information improves translation quality, but existing methods either require substantial additional parameters (e.g., separate context encoders) or fail to exploit the representations the model has already learned.

The authors propose utilizing HAN, which operates on word-level and sentence-level abstractions, allowing the NMT model to dynamically and selectively focus on relevant parts of the context for each word prediction. This network is integrated into both the encoder and decoder stages of the NMT model. The HAN encoder helps disambiguate source-word representations, while the HAN decoder enhances target-side lexical cohesion and coherence. Notably, the integration reuses hidden representations from prior translations, optimizing across multiple sentences.
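To make the mechanism concrete, below is a minimal PyTorch sketch of a hierarchical context module in the spirit of the paper: a query derived from the current hidden state first attends over the cached word states of each previous sentence, then over the resulting per-sentence summaries, and a learned gate mixes the context into the current state. Class and tensor names (HANContext, cached, and so on) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a hierarchical (word-level, then sentence-level)
# context module with a context gate, assuming cached hidden states
# from the k previous sentences. Names are illustrative, not the
# authors' code.
import torch
import torch.nn as nn

class HANContext(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Word-level attention: summarizes each previous sentence.
        self.word_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Sentence-level attention: weighs the per-sentence summaries.
        self.sent_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        # Gate deciding how much document context to mix in.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, h_t: torch.Tensor, cached: torch.Tensor) -> torch.Tensor:
        # h_t:    [B, D]       current hidden state (query)
        # cached: [B, k, L, D] hidden states of k previous sentences, L words each
        B, k, L, D = cached.shape
        q = h_t.unsqueeze(1)                       # [B, 1, D]
        # Word level: one attention summary per previous sentence.
        q_w = q.repeat_interleave(k, dim=0)        # [B*k, 1, D]
        words = cached.reshape(B * k, L, D)        # [B*k, L, D]
        s, _ = self.word_attn(q_w, words, words)   # [B*k, 1, D]
        s = s.reshape(B, k, D)                     # [B, k, D]
        # Sentence level: attend over the sentence summaries.
        d, _ = self.sent_attn(q, s, s)             # [B, 1, D]
        d_t = self.ffn(d.squeeze(1))               # [B, D]
        # Gated sum: the model learns, per prediction, how much context to trust.
        lam = torch.sigmoid(self.gate(torch.cat([h_t, d_t], dim=-1)))
        return lam * h_t + (1.0 - lam) * d_t

if __name__ == "__main__":
    mod = HANContext(d_model=512, n_heads=8)
    h_t = torch.randn(2, 512)             # batch of 2 current hidden states
    cached = torch.randn(2, 3, 20, 512)   # 3 previous sentences, 20 words each
    print(mod(h_t, cached).shape)         # torch.Size([2, 512])
```

The gated sum reflects the paper's design choice of letting the model decide, for each word prediction, how much weight to give document context versus the sentence-level state.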

Several experiments across languages and domains substantiate the proposed model's effectiveness. The paper used Chinese-to-English and Spanish-to-English datasets, showing robust improvements from HAN over strong baselines, including a Transformer-based NMT system and a cache-augmented NMT method. Context-aware multi-head attention lets the network capture diverse discourse phenomena, which is essential for producing coherent document translations.
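On the evaluation side, corpus-level BLEU of the kind reported in the paper can be computed with the sacreBLEU library; the snippet below is a generic illustration with made-up sentences, not the authors' evaluation pipeline.

```python
# Illustrative only: corpus-level BLEU with sacreBLEU.
# The hypothesis/reference sentences here are made up.
import sacrebleu

hyps = ["The committee approved the proposal .",
        "It will take effect next year ."]
refs = [["The committee approved the proposal .",
         "It takes effect next year ."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hyps, refs)
print(f"BLEU = {bleu.score:.2f}")
```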

The results indicate that the HAN model delivers significant improvements across all tested scenarios, outperforming both sentence-level baselines and prior context-aware methods. Its context-awareness translates into better handling of inter-sentence connections and, in turn, higher overall translation quality. A targeted analysis further shows that the model improves noun and pronoun translation accuracy, lexical cohesion, and coherence relative to reference translations.

The implications of these findings are both practical and theoretical. Practically, this suggests that integrating hierarchical attention mechanisms within NMT systems can yield more contextually faithful and coherent translations, potentially benefiting real-world document translation applications. Theoretically, it provides insights into modeling context in translation systems and suggests future exploration into discourse-specific features.

The paper presents a significant advance in document-level NMT through the application of hierarchical attention mechanisms. Future research directions include further refinement of context modeling, integration with discourse annotations, and exploration of additional hierarchical architectures to improve translation quality on more complex documents. Such directions also promise to broaden NMT's applicability in multilingual and multicultural communication technologies.

Authors (4)
  1. Lesly Miculicich (15 papers)
  2. Dhananjay Ram (10 papers)
  3. Nikolaos Pappas (188 papers)
  4. James Henderson (52 papers)
Citations (264)