Learning to Remember Translation History with a Continuous Cache
This paper proposes a novel approach to enhancing Neural Machine Translation (NMT) systems by integrating a continuous cache-like memory network. The cache stores recent hidden representations as translation history, allowing existing NMT models to dynamically exploit document-level context.
Overview of the Approach
Neural Machine Translation models typically translate sentences independently, failing to exploit cross-sentence context, which can lead to inconsistent and ambiguous translations. The proposed method addresses this limitation by augmenting NMT models with a lightweight cache that retains bilingual hidden states from previous translations. These states are stored as key-value pairs in an external memory: the keys are attention context vectors summarizing source-side context, and the values are the corresponding decoder states representing target-side context.
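To make the data layout concrete, here is a minimal sketch of such a bounded key-value cache in Python with NumPy. The class name CacheMemory, the capacity value, and the FIFO eviction policy are illustrative assumptions, not details confirmed by the paper.

```python
import collections
import numpy as np

class CacheMemory:
    """A bounded key-value store of recent translation history.

    Keys are source-side attention context vectors; values are the
    corresponding target-side decoder states. The capacity and FIFO
    eviction policy are assumptions for illustration.
    """

    def __init__(self, capacity=2000):
        # deque with maxlen silently discards the oldest pair once full
        self.entries = collections.deque(maxlen=capacity)

    def write(self, key, value):
        """Append one (attention context, decoder state) pair after a word is translated."""
        self.entries.append((np.asarray(key, dtype=float), np.asarray(value, dtype=float)))

    def keys(self):
        return np.stack([k for k, _ in self.entries])

    def values(self):
        return np.stack([v for _, v in self.entries])
```

In use, the decoder would call write(c_t, s_t) after emitting each target word, so the cache always holds the most recent translation history.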
During translation, the cache is queried through an efficient key-matching mechanism: the current attention context serves as a query and retrieves the values associated with similar past source contexts, which then assist in generating the next target word. Because similarity is measured with simple dot products, the matching step introduces no additional learned parameters.
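A minimal sketch of this retrieval step, building on the CacheMemory sketch above: the query is matched against all cached keys by dot product, and a softmax over the scores weights an average of the cached values. The softmax-weighted aggregation is an assumption about how matched entries are combined.

```python
import numpy as np

def query_cache(cache, query):
    """Retrieve a history representation by dot-product key matching.

    `cache` is assumed to expose keys() and values() as in the sketch
    above; the softmax aggregation is an illustrative choice.
    """
    keys, values = cache.keys(), cache.values()
    scores = keys @ np.asarray(query, dtype=float)  # one dot-product score per cached entry
    weights = np.exp(scores - scores.max())         # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                         # weighted sum of cached decoder states
```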
Experimental Insights
Experiments across multiple domains (News, Subtitle, TED) show that integrating the continuous cache consistently improves translation performance. Notably, the method enhances translation consistency by dynamically updating the output probability distribution with contextual history drawn from the cache. These gains come at minimal additional computational cost, making the approach considerably more efficient than previous document-level models that demanded substantial computational resources.
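As a rough illustration of how the retrieved history might influence the next-word distribution, the sketch below gates the cache output into the decoder state before the output layer. The scalar sigmoid gate, its linear parameterization, and the parameter names W_g and b_g are hypothetical; they represent a common gating formulation rather than the paper's exact combination.

```python
import numpy as np

def combine_with_history(decoder_state, cache_output, W_g, b_g):
    """Gate the retrieved cache output into the current decoder state.

    W_g (shape: 2 * state_dim) and b_g (scalar) are hypothetical gate
    parameters; the linear scalar gate is an illustrative assumption.
    """
    features = np.concatenate([decoder_state, cache_output])
    gate = 1.0 / (1.0 + np.exp(-(W_g @ features + b_g)))   # sigmoid gate in (0, 1)
    # Interpolate: more weight on cached history when the gate is high
    return gate * cache_output + (1.0 - gate) * decoder_state
```

The history-aware state returned here would then feed the output softmax, shifting the probability distribution toward words consistent with earlier translations.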
Implications and Developments
The research positions the continuous cache as a clear improvement over approaches that consider only isolated source sentences. By caching internal continuous representations rather than discrete words or lexical entries, the system mitigates error propagation and makes effective use of target-side history.
This advancement opens pathways for more robust models capable of handling document-level translation with greater context fidelity. Future work may explore further enhancements in long-range context utilization, potentially integrating discourse relations or novel architectural designs for even more effective memory usage.
Overall, the paper’s contributions mark a meaningful step towards more intelligent and adaptive NMT systems, setting the stage for future innovation in utilizing translation history to improve model performance across varied linguistic domains.