MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models (2306.01311v1)

Published 2 Jun 2023 in cs.CL

Abstract: Large-scale LLMs have shown the ability to adapt to a new task by conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train an LLM to perform in-context learning on NLP tasks (as in MetaICL); we then transfer this model to VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves in-context learning capability on VL tasks and can even significantly compensate for model size. On VQA, OK-VQA, and GQA, our method outperforms the baseline model while having 20 times fewer parameters.
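The transfer recipe in the abstract is simple enough to sketch: a language model that was meta-trained for in-context learning (as in MetaICL) is reused for VL tasks by attaching a visual encoder whose features are projected into the LM's embedding space, so few-shot demonstrations can be interleaved with images in one prompt. The PyTorch sketch below is illustrative only; names such as `MetaVLSketch`, `proj`, `visual_dim`, and the HF-style `inputs_embeds` argument are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MetaVLSketch(nn.Module):
    """Minimal sketch of the MetaVL idea (assumed structure, not the paper's code):
    a meta-trained LM consumes an interleaved multimodal prompt built from
    projected image features and text embeddings."""

    def __init__(self, meta_trained_lm, visual_encoder, visual_dim, lm_dim):
        super().__init__()
        self.lm = meta_trained_lm          # meta-trained on NLP tasks (MetaICL-style)
        self.visual_encoder = visual_encoder
        # Learned projection mapping visual features into the LM embedding space.
        self.proj = nn.Linear(visual_dim, lm_dim)

    def embed_image(self, image):
        feats = self.visual_encoder(image)       # (batch, n_patches, visual_dim)
        return self.proj(feats)                  # (batch, n_patches, lm_dim)

    def forward(self, demo_images, demo_text_embeds, query_image, query_text_embeds):
        # In-context format: [img_1, qa_1, img_2, qa_2, ..., query_img, query_q].
        parts = []
        for img, txt in zip(demo_images, demo_text_embeds):
            parts.append(self.embed_image(img))
            parts.append(txt)                    # (batch, seq_len, lm_dim)
        parts.append(self.embed_image(query_image))
        parts.append(query_text_embeds)
        inputs_embeds = torch.cat(parts, dim=1)  # one long multimodal prompt
        # Assumes an HF-style LM that accepts precomputed input embeddings;
        # the LM then predicts the answer tokens for the query in-context.
        return self.lm(inputs_embeds=inputs_embeds)
```

The key design point, under these assumptions, is that only the projection (and optionally the encoder) needs VL training; the in-context learning behavior comes for free from the meta-trained LM.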

Authors (5)
  1. Masoud Monajatipoor (9 papers)
  2. Liunian Harold Li (19 papers)
  3. Mozhdeh Rouhsedaghat (9 papers)
  4. Lin F. Yang (86 papers)
  5. Kai-Wei Chang (292 papers)
Citations (8)