
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation (2007.08742v1)

Published 17 Jul 2020 in cs.CL

Abstract: Multi-modal neural machine translation (NMT) aims to translate source sentences into a target language paired with images. However, dominant multi-modal NMT models do not fully exploit fine-grained semantic correspondences between semantic units of different modalities, which have potential to refine multi-modal representation learning. To deal with this issue, in this paper, we propose a novel graph-based multi-modal fusion encoder for NMT. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, these representations provide an attention-based context vector for the decoder. We evaluate our proposed encoder on the Multi30K datasets. Experimental results and in-depth analysis show the superiority of our multi-modal NMT model.
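The abstract describes stacking graph-based fusion layers that let word and visual-object nodes exchange information over a unified multi-modal graph. The sketch below is a minimal, illustrative rendering of that idea only, not the authors' implementation: the mean-neighbour aggregation, the residual-plus-tanh update, and all function names are assumptions made for clarity.

```python
import math

def fusion_layer(h, adj):
    """One simplified fusion step: each node mixes its own state with
    the mean of its neighbours' states (illustrative message passing,
    not the paper's exact attention-based update)."""
    out = []
    for i, hi in enumerate(h):
        neigh = [h[j] for j in range(len(h)) if adj[i][j]]
        if neigh:
            mean = [sum(vals) / len(neigh) for vals in zip(*neigh)]
        else:
            mean = [0.0] * len(hi)
        # residual connection plus a squashing nonlinearity
        out.append([math.tanh(a + b) for a, b in zip(hi, mean)])
    return out

def encode(word_vecs, obj_vecs, adj, num_layers=3):
    """Run several fusion layers over the joint node set of words and
    visual objects; the resulting node states would feed the decoder's
    attention in the full model."""
    h = word_vecs + obj_vecs  # unified multi-modal node list
    for _ in range(num_layers):
        h = fusion_layer(h, adj)
    return h
```

For example, with two word nodes, one object node, and a fully connected cross-modal graph, `encode([[0.5, 0.1], [0.2, 0.3]], [[0.4, 0.0]], [[0, 1, 1], [1, 0, 1], [1, 1, 0]])` returns three updated node vectors of the same dimensionality.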

Authors (7)
  1. Yongjing Yin (19 papers)
  2. Fandong Meng (174 papers)
  3. Jinsong Su (96 papers)
  4. Chulun Zhou (13 papers)
  5. Zhengyuan Yang (86 papers)
  6. Jie Zhou (687 papers)
  7. Jiebo Luo (355 papers)
Citations (128)