
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages (2306.16774v1)

Published 29 Jun 2023 in cs.CL

Abstract: Vision-Language Pre-training (VLP) has advanced the performance of many vision-language tasks, such as image-text retrieval, visual entailment, and visual reasoning. The pre-training mostly utilizes lexical databases and image queries in English. Previous work has demonstrated that pre-training in English does not transfer well to other languages in a zero-shot setting. However, multilingual pre-trained language models (MPLM) have excelled at a variety of single-modal language tasks. In this paper, we propose a simple yet efficient approach to adapt VLP to unseen languages using MPLM. We utilize a cross-lingual contextualized token embedding alignment approach to train text encoders for non-English languages. Our approach does not require image input and primarily uses machine translation, eliminating the need for target language data. Our evaluation across three distinct tasks (image-text retrieval, visual entailment, and natural language visual reasoning) demonstrates that this approach outperforms the state-of-the-art multilingual vision-language models without requiring large parallel corpora. Our code is available at https://github.com/Yasminekaroui/CliCoTea.
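
The alignment idea in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that a frozen English text encoder and a trainable multilingual encoder each yield one embedding per token, that a word aligner has already produced (English index, target index) pairs for a sentence and its machine translation, and that the objective is a mean squared error over aligned pairs. The function name `alignment_loss` and the exact loss form are illustrative assumptions, not taken from the paper or its repository.

```python
from typing import List, Tuple

Vector = List[float]


def alignment_loss(
    teacher: List[Vector],          # token embeddings from the English encoder (assumed frozen)
    student: List[Vector],          # token embeddings from the multilingual encoder
    pairs: List[Tuple[int, int]],   # (teacher_index, student_index) token alignments
) -> float:
    """Mean squared error between aligned token embeddings (illustrative only)."""
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        t, s = teacher[i], student[j]
        if len(t) != len(s):
            raise ValueError("aligned embeddings must share a dimension")
        # Squared distance averaged over embedding dimensions.
        total += sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)
    # Average over all aligned token pairs in the sentence.
    return total / len(pairs)


# Toy example: two English tokens, three target-language tokens.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
print(alignment_loss(teacher, student, [(0, 1), (1, 0)]))
print(alignment_loss(teacher, student, [(0, 0)]))
```

In this toy setup the loss is zero when each aligned target token already matches its English counterpart, and it grows as the embeddings diverge. Only text is involved, which matches the abstract's statement that no image input is required.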

Authors (4)
  1. Yasmine Karoui
  2. Negar Foroutan
  3. Karl Aberer
  4. Rémi Lebret