Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation (1802.01812v1)

Published 6 Feb 2018 in cs.CL, cs.AI, and cs.LG

Abstract: The attention-based sequence-to-sequence model has proved successful in Neural Machine Translation (NMT). However, attention that does not take the decoding history into account, which includes the past information in the decoder and the attention mechanism, often causes considerable repetition. To address this problem, we propose decoding-history-based Adaptive Control of Attention (ACA) for the NMT model. ACA learns to control the attention by keeping track of the decoding history and the current information with a memory vector, so that the model can take both the translated content and the current information into consideration. Experiments on Chinese-English and English-Vietnamese translation demonstrate that our model significantly outperforms strong baselines. The analysis shows that our model is capable of generating translations with less repetition and higher accuracy. The code will be available at https://github.com/lancopku
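
The abstract does not spell out the exact formulation, but a minimal sketch of the general idea — a memory vector that summarizes the decoding history and gates the attention context at each step — might look like the following. All module names, shapes, and the sigmoid-gating form here are assumptions for illustration, not the authors' actual equations.

```python
import torch
import torch.nn as nn

class AdaptiveAttentionControl(nn.Module):
    """Hypothetical sketch: a memory vector tracks decoding history and
    gates the attention context before it is used by the decoder."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # GRU cell that updates the history memory with the current decoder state.
        self.memory_update = nn.GRUCell(hidden_size, hidden_size)
        # Gate computed from the history memory and the current decoder state.
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, context, decoder_state, memory):
        # Fold the current decoder information into the history memory.
        memory = self.memory_update(decoder_state, memory)
        # Element-wise gate over the attention context, conditioned on
        # what has already been translated and the current state.
        g = torch.sigmoid(self.gate(torch.cat([memory, decoder_state], dim=-1)))
        controlled_context = g * context
        return controlled_context, memory
```

In such a setup, the gated context would replace the standard attention context at each decoding step, which is one plausible way a model could suppress re-attending to already-translated source content.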

Authors (4)
  1. Junyang Lin (99 papers)
  2. Shuming Ma (83 papers)
  3. Qi Su (58 papers)
  4. Xu Sun (194 papers)
Citations (16)