Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective (2210.08478v1)

Published 16 Oct 2022 in cs.CV and cs.CL

Abstract: Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image. Despite promising performance, MMT models still suffer from the problem of input degradation: models focus more on textual information, while visual information is generally overlooked. In this paper, we endeavor to improve MMT performance by increasing visual awareness from an information-theoretic perspective. In detail, we decompose the informative visual signals into two parts: source-specific information and target-specific information. We use mutual information to quantify them and propose two methods for objective optimization to better leverage visual signals. Experiments on two datasets demonstrate that our approach can effectively enhance the visual awareness of MMT models and achieve superior results against strong baselines.
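The central quantity in the abstract is mutual information between the visual signal and the source or target text. As a minimal illustration of that quantity (not the authors' neural estimator, which operates on learned features), the sketch below computes I(X; Y) from a discrete joint distribution with NumPy:

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats.

    `joint` is a 2-D array whose entries sum to 1, giving p(x, y).
    Illustrative only; the paper quantifies MI over neural representations.
    """
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = joint > 0                        # skip zero-probability cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask])))

# Perfectly correlated binary variables carry I = log 2 nats of shared information.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
mi = mutual_information(joint)  # log(2) ≈ 0.693 nats
```

In the paper's setting, maximizing such an MI term between image features and source- or target-side representations is what pushes the model to actually use the visual input rather than ignore it.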

Authors (5)
  1. Baijun Ji (5 papers)
  2. Tong Zhang (569 papers)
  3. Yicheng Zou (20 papers)
  4. Bojie Hu (8 papers)
  5. Si Shen (11 papers)
Citations (11)