
Learning to Scale Multilingual Representations for Vision-Language Tasks (2004.04312v2)

Published 9 Apr 2020 in cs.CV and cs.CL

Abstract: Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters without sacrificing downstream task performance. SMALR learns a fixed-size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for just a few. We use a masked cross-language modeling loss to align features with context from other languages. Additionally, we propose a cross-lingual consistency module that ensures predictions made for a query and its machine translation are comparable. The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% with less than 1/5th the training parameters compared to other word embedding methods.
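The core idea in the abstract is a hybrid vocabulary: most words index a shared, fixed-size language-agnostic embedding table, while a small per-language table is retained for the remaining words. The sketch below illustrates that split in PyTorch-style pseudocode; the class name `HybridMultilingualEmbedding`, the argument names, and the masking scheme are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class HybridMultilingualEmbedding(nn.Module):
    """Illustrative sketch: most tokens share one language-agnostic table,
    while a small language-specific table covers the few remaining words.
    Hypothetical interface; not the SMALR authors' code."""

    def __init__(self, shared_vocab_size: int, lang_vocab_sizes: dict, dim: int):
        super().__init__()
        # Shared table for the language-agnostic portion of the vocabulary.
        self.shared = nn.Embedding(shared_vocab_size, dim)
        # Small per-language tables for the language-specific words.
        self.lang_specific = nn.ModuleDict(
            {lang: nn.Embedding(size, dim) for lang, size in lang_vocab_sizes.items()}
        )

    def forward(self, shared_ids, lang_ids, lang, is_shared_mask):
        # shared_ids / lang_ids: parallel (batch, seq) index tensors; only the
        # positions selected by is_shared_mask (or its complement) are used.
        shared_vecs = self.shared(shared_ids)
        lang_vecs = self.lang_specific[lang](lang_ids)
        mask = is_shared_mask.unsqueeze(-1).to(shared_vecs.dtype)
        # Pick the shared vector where the word is language-agnostic,
        # otherwise fall back to the language-specific vector.
        return mask * shared_vecs + (1.0 - mask) * lang_vecs

# Example usage (toy sizes):
emb = HybridMultilingualEmbedding(
    shared_vocab_size=50_000, lang_vocab_sizes={"en": 2_000, "de": 2_000}, dim=300
)
shared_ids = torch.randint(0, 50_000, (2, 7))
lang_ids = torch.randint(0, 2_000, (2, 7))
is_shared = torch.rand(2, 7) > 0.1  # most words use the shared table
out = emb(shared_ids, lang_ids, "de", is_shared)  # (2, 7, 300)
```

Keeping only a few thousand language-specific entries per language is what lets the parameter count stay nearly flat as languages are added, which is the scaling behavior the abstract claims.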

Authors (5)
  1. Andrea Burns (11 papers)
  2. Donghyun Kim (129 papers)
  3. Derry Wijaya (31 papers)
  4. Kate Saenko (178 papers)
  5. Bryan A. Plummer (64 papers)
Citations (34)
