
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning (2309.07915v3)

Published 14 Sep 2023 in cs.CL, cs.AI, and cs.CV

Abstract: Since the resurgence of deep learning, vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity. However, while LLMs can utilize extensive background knowledge and task information with in-context learning, most VLMs still struggle with understanding complex multi-modal prompts with multiple images, making VLMs less effective in downstream vision-language tasks. In this paper, we address the limitation above by 1) introducing vision-language Model with Multi-Modal In-Context Learning (MMICL), a new approach to allow the VLM to deal with multi-modal inputs efficiently; 2) proposing a novel context scheme to augment the in-context learning ability of the VLM; 3) constructing the Multi-modal In-Context Learning (MIC) dataset, designed to enhance the VLM's ability to understand complex multi-modal prompts. Our experiments confirm that MMICL achieves new state-of-the-art zero-shot performance on a wide range of general vision-language tasks, especially on complex benchmarks, including MME and MMBench. Our analysis demonstrates that MMICL effectively tackles the challenge of complex multi-modal prompt understanding and exhibits impressive ICL ability. Furthermore, we observe that MMICL successfully alleviates language bias in VLMs, a common issue that often leads to hallucination when VLMs face extensive textual context. Our code, dataset, dataset tool, and model are available at https://github.com/PKUnlp-icler/MIC
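The context scheme mentioned in the abstract interleaves multiple images with text, marking each image with an explicit reference so the model can tell which text spans refer to which image. The paper's exact template lives in the MIC repository; the sketch below is only an illustration of the general idea, with hypothetical placeholder tokens and function names not taken from the paper.

```python
# Illustrative sketch (not the paper's exact template): building an
# interleaved image-text prompt for multi-modal in-context learning.
# Each image gets an explicit placeholder token (<imageN>) so that
# in-context examples and the query can reference distinct images.

def build_icl_prompt(examples, query):
    """examples: list of (image_path, question, answer) demonstrations.
    query: (image_path, question) to be answered by the model."""
    parts = []
    for idx, (img, question, answer) in enumerate(examples):
        # Declare the image, then give the worked example referring to it.
        parts.append(f"image {idx}: <image{idx}> ({img})")
        parts.append(f"Question: {question} Answer: {answer}")
    qidx = len(examples)
    qimg, qquestion = query
    parts.append(f"image {qidx}: <image{qidx}> ({qimg})")
    parts.append(f"Question: {qquestion} Answer:")
    return "\n".join(parts)

prompt = build_icl_prompt(
    examples=[("cat.jpg", "What animal is shown?", "A cat.")],
    query=("dog.jpg", "What animal is shown?"),
)
print(prompt)
```

In a real pipeline the placeholder tokens would be replaced by the vision encoder's image embeddings at the corresponding positions; here they simply demonstrate the interleaved layout.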

Authors (10)
  1. Haozhe Zhao (19 papers)
  2. Zefan Cai (26 papers)
  3. Shuzheng Si (20 papers)
  4. Xiaojian Ma (52 papers)
  5. Kaikai An (15 papers)
  6. Liang Chen (360 papers)
  7. Zixuan Liu (38 papers)
  8. Sheng Wang (239 papers)
  9. Wenjuan Han (36 papers)
  10. Baobao Chang (80 papers)
Citations (115)