Incorporating Discrete Translation Lexicons into Neural Machine Translation (1606.02006v2)

Published 7 Jun 2016 in cs.CL

Abstract: Neural machine translation (NMT) often makes mistakes in translating low-frequency content words that are essential to understanding the meaning of the sentence. We propose a method to alleviate this problem by augmenting NMT systems with discrete translation lexicons that efficiently encode translations of these low-frequency words. We describe a method to calculate the lexicon probability of the next word in the translation candidate by using the attention vector of the NMT model to select which source word lexical probabilities the model should focus on. We test two methods to combine this probability with the standard NMT probability: (1) using it as a bias, and (2) linear interpolation. Experiments on two corpora show an improvement of 2.0-2.3 BLEU and 0.13-0.44 NIST score, and faster convergence time.

Authors (3)
  1. Philip Arthur (9 papers)
  2. Graham Neubig (342 papers)
  3. Satoshi Nakamura (94 papers)
Citations (208)

Summary

Incorporating Discrete Translation Lexicons into Neural Machine Translation: An Overview

The paper "Incorporating Discrete Translation Lexicons into Neural Machine Translation" addresses a known limitation of neural machine translation (NMT): the accurate translation of low-frequency content words. Unlike phrase-based systems, NMT represents words as continuous vectors, which often leads to errors on less common words and undermines the semantic fidelity of its output. The authors propose integrating discrete translation lexicons into NMT to improve translation precision for these problematic words.

Methodology

The proposed approach augments NMT systems with discrete translation lexicons to improve the translation of rare content words. The methodology comprises three key steps:

  1. Lexicon Probability Calculation: The lexicon probability of the next target word is computed using the attention vector of the NMT model; the attention weights select which source words' lexical probabilities the model should focus on (see the first sketch after this list).
  2. Combining Probabilities: Two methods of combining this probability with the standard NMT probability are explored:
    • Model Bias: The lexicon probability enters the NMT output distribution as a log-domain bias added to the pre-softmax scores.
    • Linear Interpolation: The lexicon probability is linearly interpolated with the NMT probability, with a learnable parameter controlling the mixture.
  3. Lexicon Construction: The lexicon probabilities themselves are built in three ways (see the second sketch after this list):
    • Automatically Learned Lexicons, estimated with traditional word alignment models such as the IBM models.
    • Manual Lexicons, derived from external resources such as handmade dictionaries.
    • A Hybrid Approach, which refines the lexical probabilities using both the automatic and manual resources.
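
Steps 1 and 2 can be made concrete with a short NumPy sketch. All names and values here are illustrative stand-ins rather than the paper's implementation: `p_lex` is a precomputed lexicon matrix, `a` is the attention vector at one decoding step, `scores` stands in for the decoder's pre-softmax output, and the interpolation coefficient would be learned in practice.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative sizes: J source words, V target vocabulary entries.
J, V = 6, 1000
rng = np.random.default_rng(0)

# p_lex[j, e] = p(target word e | source word f_j); each row is a distribution.
p_lex = softmax(rng.normal(size=(J, V)), axis=-1)

# a[j] = attention weight on source word j at the current decoding step.
a = softmax(rng.normal(size=J))

# Step 1: attention-weighted lexicon probability of the next target word,
#   p_lex(e_i = e | F) = sum_j a_j * p_lex(e | f_j)
p_lex_next = a @ p_lex                        # shape (V,)

# Pre-softmax scores from the NMT decoder at this step (stand-in values).
scores = rng.normal(size=V)
p_nmt = softmax(scores)

# Step 2a, model bias: add the log lexicon probability (with a small
# epsilon for numerical stability) to the scores before the softmax.
eps = 1e-6
p_bias = softmax(scores + np.log(p_lex_next + eps))

# Step 2b, linear interpolation: mix the two distributions with a
# coefficient that would be a learned parameter during training.
lam = 0.3
p_interp = lam * p_lex_next + (1.0 - lam) * p_nmt

assert np.isclose(p_bias.sum(), 1.0) and np.isclose(p_interp.sum(), 1.0)
```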

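For step 3, an automatic lexicon can be estimated by normalizing word-alignment counts over a parallel corpus, and a manual lexicon by spreading probability uniformly over each entry's dictionary translations. A minimal sketch follows; the toy data and the exact hybrid backoff are assumptions for illustration, not the paper's precise recipe (in practice the alignments would come from an IBM-model aligner):

```python
from collections import Counter, defaultdict

# Toy (source_word, target_word) pairs extracted from word alignments.
aligned_pairs = [
    ("neko", "cat"), ("neko", "cat"), ("neko", "kitten"),
    ("inu", "dog"), ("inu", "dog"), ("inu", "hound"),
]

counts = defaultdict(Counter)
for f, e in aligned_pairs:
    counts[f][e] += 1

# Automatic lexicon: p_auto(e | f) = c(e, f) / sum over e' of c(e', f).
p_auto = {}
for f, ctr in counts.items():
    total = sum(ctr.values())
    p_auto[f] = {e: c / total for e, c in ctr.items()}

# Manual lexicon: uniform probability over each word's dictionary entries.
manual_dict = {"neko": ["cat"], "usagi": ["rabbit", "hare"]}
p_man = {f: {e: 1.0 / len(es) for e in es} for f, es in manual_dict.items()}

# One plausible hybrid: prefer the manual entry when the dictionary covers
# the source word, otherwise back off to the automatic lexicon.
def p_hybrid(f):
    return p_man.get(f) or p_auto.get(f, {})

print(p_hybrid("neko"))  # {'cat': 1.0} (manual entry wins)
print(p_hybrid("inu"))   # automatic estimate for an uncovered word
```
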
Experimental Evaluation

Experiments were conducted on two English-Japanese corpora, KFTT and BTEC. Performance was measured with BLEU and NIST scores, with particular attention to the translation accuracy of low-frequency words. Key results include:

  • The bias-based method, with either automatic or hybrid lexicons, yielded notable gains of 2.0-2.3 BLEU and 0.13-0.44 NIST.
  • The proposed method converged faster than the baseline attentional NMT system.
  • The linear interpolation method, although more effective with automatic lexicons, did not consistently improve translation quality, indicating that the bias approach is the more reliable way to integrate lexicons.

Implications and Future Work

Integrating lexicons into NMT systems is significant for improving the translation of rare words, and thus for maintaining the semantic integrity of translations. The advance narrows the gap between NMT and traditional statistical machine translation (SMT), particularly for challenging language pairs and for domains with broader, more diverse vocabularies.

Future research can benefit from exploring adaptive mechanisms for interpolation coefficients based on contextual information, potentially increasing the flexibility and accuracy of the linear method. Additionally, investigating the integration of character-based models with lexicon enhancements could yield further improvements, given the complementary nature of hybrid translation models.

In conclusion, this paper presents a pragmatic approach to refining NMT systems through the incorporation of discrete lexicons, with promising results in both translation quality and convergence speed. The enhancement is especially valuable for maintaining translation accuracy on diverse content, bolstering the reliability and applicability of NMT systems across varying contexts and language pairs.