
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning (2407.00902v2)

Published 1 Jul 2024 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: Motivated by the in-context learning (ICL) capabilities of LLMs, multimodal LLMs augmented with a visual modality also exhibit similar ICL abilities when multiple image-text pairs are provided as demonstrations. However, relatively little work has investigated the principles behind how and why multimodal ICL works. We conduct a systematic and principled evaluation of multimodal ICL for models of different scales on a broad spectrum of new yet critical tasks. Through perturbations of different modality information, we show that modalities matter differently across tasks in multimodal ICL. Guided by task-specific modality impact, we recommend modality-driven demonstration strategies to boost ICL performance. We also find that models may follow inductive biases from multimodal ICL even if these are rarely seen in, or contradict semantic priors from, pretraining data. Our principled analysis provides a comprehensive way of understanding the role of demonstrations in multimodal in-context learning, and sheds light on effectively improving multimodal ICL on a wide range of tasks.
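The modality perturbations the abstract describes can be illustrated with a minimal sketch of multimodal ICL prompt construction. The demonstration format and perturbation names below are hypothetical placeholders, not the paper's exact protocol:

```python
import random

def build_icl_prompt(demos, query_text, perturbation=None, seed=0):
    """Assemble a multimodal ICL prompt from demonstration triples.

    demos: list of (image_ref, text, label) triples.
    perturbation: None, "mask_images" (ablate the visual modality),
    or "shuffle_labels" (break the input-label mapping). These are
    illustrative perturbations, not the paper's specific ones.
    """
    rng = random.Random(seed)
    demos = [list(d) for d in demos]
    if perturbation == "mask_images":
        # Remove visual information to measure its task-specific impact.
        for d in demos:
            d[0] = "<masked_image>"
    elif perturbation == "shuffle_labels":
        # Randomize labels to test reliance on the demonstrated mapping.
        labels = [d[2] for d in demos]
        rng.shuffle(labels)
        for d, lab in zip(demos, labels):
            d[2] = lab
    blocks = [f"Image: {img}\nText: {txt}\nLabel: {lab}"
              for img, txt, lab in demos]
    # The query follows the same format, with the label left blank.
    blocks.append(f"Image: <query_image>\nText: {query_text}\nLabel:")
    return "\n\n".join(blocks)
```

Comparing model accuracy on the unperturbed prompt against each perturbed variant gives a per-task estimate of how much each modality contributes.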

Authors (5)
  1. Nan Xu (83 papers)
  2. Fei Wang (573 papers)
  3. Sheng Zhang (212 papers)
  4. Hoifung Poon (61 papers)
  5. Muhao Chen (159 papers)
Citations (3)