Selective Attention for Context-aware Neural Machine Translation (1903.08788v2)

Published 21 Mar 2019 in cs.CL

Abstract: Despite the progress made in sentence-level NMT, current systems still fall short at achieving fluent, good quality translation for a full document. Recent works in context-aware NMT consider only a few previous sentences as context and may not scale to entire documents. To this end, we propose a novel and scalable top-down approach to hierarchical attention for context-aware NMT which uses sparse attention to selectively focus on relevant sentences in the document context and then attends to key words in those sentences. We also propose single-level attention approaches based on sentence or word-level information in the context. The document-level context representation, produced from these attention modules, is integrated into the encoder or decoder of the Transformer model depending on whether we use monolingual or bilingual context. Our experiments and evaluation on English-German datasets in different document MT settings show that our selective attention approach not only significantly outperforms context-agnostic baselines but also surpasses context-aware baselines in most cases.

Selective Attention for Context-aware Neural Machine Translation

This paper addresses a significant challenge in Neural Machine Translation (NMT) by advancing context-aware methods to improve translation quality at the document level. The authors, Sameen Maruf, André F. T. Martins, and Gholamreza Haffari, propose a selective attention approach that incorporates broad document context without sacrificing computational scalability. The work is motivated by the limitations of existing NMT systems, which predominantly operate at the sentence level and therefore miss the discourse dependencies present in full documents.

The paper introduces a hierarchical attention mechanism that uses sparse attention to first select the relevant sentences in the document context and then attend to the key words within those sentences. This top-down formulation departs from prior context-aware methods, which are limited in both memory use and contextual range. By making the sentence-level attention sparse, the authors concentrate the model's capacity on the pertinent parts of the document, loosely mirroring how a human translator selectively consults context.
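As a rough illustration of the top-down idea (not the authors' exact parameterisation), the sketch below pairs sparsemax-based sentence selection with ordinary softmax attention over the words of the surviving sentences. The scaled dot-product scoring, the variable names, and the toy usage at the end are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of the
    scores onto the probability simplex; low-scoring entries get exactly 0."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum
    tau = (cumsum[support][-1] - 1) / k[support][-1]
    return np.maximum(z - tau, 0.0)

def hierarchical_context(query, sent_keys, word_keys, word_values):
    """Top-down sketch: select context sentences with sparsemax, then
    attend over the words of the selected sentences with softmax.

    query        (d,)              query vector for the current position
    sent_keys    (S, d)            one key vector per context sentence
    word_keys    list of (n_i, d)  word keys for each context sentence
    word_values  list of (n_i, d)  word values for each context sentence
    """
    d = len(query)
    sent_weights = sparsemax(sent_keys @ query / np.sqrt(d))

    context = np.zeros(d)
    for i, w in enumerate(sent_weights):
        if w == 0.0:                      # sentence pruned by sparsemax
            continue
        word_weights = softmax(word_keys[i] @ query / np.sqrt(d))
        context += w * (word_weights @ word_values[i])
    return context, sent_weights

# Toy usage with random context vectors: typically only a few of the
# five sentences receive non-zero weight.
rng = np.random.default_rng(0)
d, S = 8, 5
q = rng.normal(size=d)
s_keys = rng.normal(size=(S, d))
w_keys = [rng.normal(size=(int(rng.integers(4, 9)), d)) for _ in range(S)]
w_vals = [rng.normal(size=(len(k), d)) for k in w_keys]
ctx, weights = hierarchical_context(q, s_keys, w_keys, w_vals)
print(np.round(weights, 3))   # sparse sentence weights, several exact zeros
```

The resulting document-context vector would then be fed into the encoder or decoder of the Transformer, as the abstract describes; how exactly it is combined there is not shown in this sketch.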

The approach is evaluated against strong baselines on three English-German domains, TED Talks, News-Commentary, and Europarl, in both offline and online document MT settings. The selective attention model surpasses context-agnostic baselines, with BLEU gains of +1.34 on TED Talks, +2.06 on News-Commentary, and +1.18 on Europarl, and it also outperforms context-aware baselines in most cases.

Key contributions outlined in this paper include:

  • A novel, scalable top-down hierarchical attention mechanism for context-aware translation.
  • A comparative analysis of several selective attention variants against existing context-agnostic and context-aware baselines.
  • Demonstrated performance gains across different genres and document lengths.

The practical implication of these results is more coherent and contextually accurate document-level translation, a clear step beyond sentence-based methods. Because the sentence-level attention is sparse, the model also offers improved interpretability: one can inspect exactly which context sentences receive non-zero weight for a given translation decision.
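To make the interpretability point concrete, the toy comparison below (illustrative scores, not taken from the paper) contrasts softmax with sparsemax on the same sentence-relevance scores: softmax spreads probability over every context sentence, while sparsemax assigns exactly zero to the low-scoring ones, so the selected sentences can be read off directly.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    # Euclidean projection onto the simplex (same form as in the earlier sketch).
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum
    tau = (cumsum[support][-1] - 1) / k[support][-1]
    return np.maximum(z - tau, 0.0)

# Hypothetical relevance scores of five context sentences for one query.
scores = np.array([2.1, 0.3, -0.5, 1.8, -1.2])
print(np.round(softmax(scores), 3))    # every sentence gets some weight
print(np.round(sparsemax(scores), 3))  # irrelevant sentences get exactly 0
```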

Future work could refine selective attention models to handle longer documents and richer discourse phenomena, and could explore sparse attention beyond NMT in other NLP tasks where only a small part of a long context is relevant.

In summary, this work contributes to context-aware NMT by refining its attention mechanisms, producing models that translate documents more coherently through selective integration of document-level context.

Authors (3)
  1. Sameen Maruf (6 papers)
  2. André F. T. Martins (113 papers)
  3. Gholamreza Haffari (141 papers)
Citations (169)