Multimodal Attention for Neural Machine Translation (1609.03976v1)

Published 13 Sep 2016 in cs.CL and cs.NE

Abstract: The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than fixed-length encoding in sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously attends over an image and its natural language description in order to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that using a dedicated attention for each modality yields improvements of up to 1.6 BLEU and METEOR points over a text-only NMT baseline.
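The central idea stated in the abstract is a dedicated attention mechanism per modality, whose contexts are then combined for the decoder. The following is a minimal illustrative sketch in PyTorch, not the authors' implementation; the module names, layer sizes, and the simple concatenation-based fusion of the two contexts are assumptions made for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAttention(nn.Module):
    """Additive (Bahdanau-style) attention over one modality's annotations."""
    def __init__(self, ann_dim, dec_dim, att_dim):
        super().__init__()
        self.ann_proj = nn.Linear(ann_dim, att_dim, bias=False)
        self.dec_proj = nn.Linear(dec_dim, att_dim, bias=False)
        self.score = nn.Linear(att_dim, 1, bias=False)

    def forward(self, annotations, dec_state):
        # annotations: (batch, n_positions, ann_dim); dec_state: (batch, dec_dim)
        energy = torch.tanh(self.ann_proj(annotations)
                            + self.dec_proj(dec_state).unsqueeze(1))
        alpha = F.softmax(self.score(energy).squeeze(-1), dim=-1)        # (batch, n_positions)
        context = torch.bmm(alpha.unsqueeze(1), annotations).squeeze(1)  # (batch, ann_dim)
        return context, alpha

class MultimodalAttention(nn.Module):
    """One attention per modality; the two contexts are fused into a single vector."""
    def __init__(self, txt_dim, img_dim, dec_dim, att_dim):
        super().__init__()
        self.txt_att = ModalityAttention(txt_dim, dec_dim, att_dim)
        self.img_att = ModalityAttention(img_dim, dec_dim, att_dim)
        self.fuse = nn.Linear(txt_dim + img_dim, dec_dim)

    def forward(self, txt_ann, img_ann, dec_state):
        c_txt, a_txt = self.txt_att(txt_ann, dec_state)
        c_img, a_img = self.img_att(img_ann, dec_state)
        # Concatenate and project the modality-specific contexts (an assumed fusion choice).
        return self.fuse(torch.cat([c_txt, c_img], dim=-1)), (a_txt, a_img)

# Toy usage: 12 source-word annotations and 196 image-region features (e.g. a 14x14 conv map).
txt = torch.randn(2, 12, 512)
img = torch.randn(2, 196, 1024)
state = torch.randn(2, 256)
ctx, (a_txt, a_img) = MultimodalAttention(512, 1024, 256, 128)(txt, img, state)
print(ctx.shape, a_txt.shape, a_img.shape)  # (2, 256), (2, 12), (2, 196)
```

At each decoding step the decoder state queries both modalities independently, so the model can learn when to rely on the source sentence and when to rely on image regions.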

Authors (3)
  1. Ozan Caglayan (20 papers)
  2. Loïc Barrault (34 papers)
  3. Fethi Bougares (18 papers)
Citations (74)
