
Image Pivoting for Learning Multilingual Multimodal Representations (1707.07601v1)

Published 24 Jul 2017 in cs.CL and cs.CV

Abstract: In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.
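As a rough illustration of the idea in the abstract, the sketch below shows a generic max-margin pairwise ranking loss that accepts either a symmetric similarity (cosine) or an asymmetric one, and sums it over two languages that share the image as a pivot. It is not the paper's exact formulation: the function names, the margin value, the choice of an order-violation penalty as the asymmetric similarity, and the use of plain Python lists as embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Symmetric similarity: cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def order_violation(u, v):
    """Asymmetric similarity (assumed form): negative squared norm of
    max(0, v - u). Swapping the arguments generally changes the value."""
    return -sum(max(0.0, b - a) ** 2 for a, b in zip(u, v))

def ranking_loss(images, captions, sim, margin=0.1):
    """Max-margin ranking loss; images[i] is paired with captions[i],
    and every other item in the batch serves as a contrastive example."""
    loss = 0.0
    n = len(images)
    for i in range(n):
        pos = sim(images[i], captions[i])
        for k in range(n):
            if k == i:
                continue
            # The matching pair should outscore (image i, wrong caption k) ...
            loss += max(0.0, margin - pos + sim(images[i], captions[k]))
            # ... and (wrong image k, caption i) by at least the margin.
            loss += max(0.0, margin - pos + sim(images[k], captions[i]))
    return loss

def pivot_loss(images, caps_en, caps_de, sim, margin=0.1):
    """Hypothetical image-pivot objective: each language is ranked against
    the shared images, so the captions themselves need not be parallel."""
    return (ranking_loss(images, caps_en, sim, margin)
            + ranking_loss(images, caps_de, sim, margin))
```

Passing `cosine` or `order_violation` as `sim` switches between the symmetric and asymmetric cases without changing the loss itself. In practice the embeddings would come from learned image and sentence encoders, which this sketch omits.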

Authors (4)
  1. Spandana Gella (26 papers)
  2. Rico Sennrich (87 papers)
  3. Frank Keller (45 papers)
  4. Mirella Lapata (135 papers)
Citations (77)