Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning? (2311.18021v2)

Published 29 Nov 2023 in cs.CV, cs.AI, and cs.LG

Abstract: LLMs with in-context learning (ICL) ability can quickly adapt to a specific context given a few demonstrations (demos). Recently, Multimodal LLMs (MLLMs) built upon LLMs have also shown multimodal ICL ability, i.e., responding to queries given a few multimodal demos, including images, queries, and answers. While ICL has been extensively studied for LLMs, research on ICL for MLLMs remains limited. One essential question is whether these MLLMs can truly conduct multimodal ICL, or if only the textual modality is necessary. We investigate this question by examining two primary factors that influence ICL: 1) Demo content, i.e., understanding the influence of demo content in different modalities. 2) Demo selection strategy, i.e., how to select better multimodal demos for improved performance. Experiments reveal that multimodal ICL is predominantly driven by the textual content, whereas the visual information in the demos has little influence. Interestingly, visual content is still necessary and useful for selecting demos to increase performance. Motivated by our analysis, we propose a simple yet effective approach, termed Mixed Modality In-Context Example Selection (MMICES), which considers both visual and language modalities when selecting demos. Extensive experiments are conducted to support our findings and verify the improvement brought by our method. Code is available at \url{https://chenxshuo.github.io/m-icl/}.
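To make the selection idea concrete, below is a minimal sketch of a mixed-modality demo selector in the spirit of MMICES. It assumes precomputed image and text embeddings (e.g., from a CLIP-style encoder), a visual pre-filtering stage followed by textual re-ranking, and hypothetical parameter names (`k_visual`, `n_demos`); none of these details are taken from the authors' released code.

```python
# Sketch of mixed-modality in-context example selection (MMICES-style).
# Assumptions: demos and the query come with precomputed image and text
# embeddings; candidates are first filtered by visual similarity, then
# re-ranked by textual similarity.

import numpy as np

def cosine_sim(query_vec: np.ndarray, pool: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of pool vectors."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-8)
    p = pool / (np.linalg.norm(pool, axis=1, keepdims=True) + 1e-8)
    return p @ q

def select_demos(query_img_emb, query_txt_emb, demo_img_embs, demo_txt_embs,
                 k_visual: int = 20, n_demos: int = 4):
    """Return indices of `n_demos` in-context examples from the demo pool.

    Stage 1: keep the `k_visual` demos whose image embeddings are closest
             to the query image (visual pre-filtering).
    Stage 2: re-rank those candidates by text similarity to the query and
             keep the top `n_demos` (language re-ranking).
    """
    visual_scores = cosine_sim(query_img_emb, demo_img_embs)
    candidates = np.argsort(-visual_scores)[:k_visual]

    text_scores = cosine_sim(query_txt_emb, demo_txt_embs[candidates])
    top = candidates[np.argsort(-text_scores)[:n_demos]]
    return top.tolist()

# Usage with random stand-in embeddings (placeholders for real CLIP features):
rng = np.random.default_rng(0)
pool_img = rng.normal(size=(100, 512))
pool_txt = rng.normal(size=(100, 512))
q_img, q_txt = rng.normal(size=512), rng.normal(size=512)
print(select_demos(q_img, q_txt, pool_img, pool_txt))
```

The two-stage order reflects the paper's finding: visual content matters for choosing demos, while textual similarity dominates the final ranking; the specific candidate count and similarity measure are illustrative choices.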

Authors (9)
  1. Shuo Chen (127 papers)
  2. Zhen Han (54 papers)
  3. Bailan He (12 papers)
  4. Mark Buckley (4 papers)
  5. Philip Torr (172 papers)
  6. Volker Tresp (158 papers)
  7. Jindong Gu (101 papers)
  8. Jianzhe Liu (12 papers)
  9. Yao Qin (41 papers)
Citations (8)