Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Learning Multilingual Word Embeddings Using Image-Text Data (1905.12260v1)

Published 29 May 2019 in cs.CL, cs.AI, cs.CV, and cs.LG

Abstract: There has been significant interest recently in learning multilingual word embeddings -- in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Karan Singhal (26 papers)
  2. Karthik Raman (26 papers)
  3. Balder ten Cate (32 papers)
Citations (10)

Summary

We haven't generated a summary for this paper yet.