Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning (2506.09473v1)

Published 11 Jun 2025 in cs.CV

Abstract: In-context learning (ICL), a predominant trend in instruction learning, aims to enhance the performance of LLMs by providing clear task guidance and examples, improving their capability in task understanding and execution. This paper investigates ICL on Large Vision-LLMs (LVLMs) and explores policies for multi-modal demonstration selection. Existing ICL research faces significant challenges: first, it relies on pre-defined demonstrations or heuristic selection strategies based on human intuition, which are usually inadequate for covering diverse task requirements and lead to sub-optimal solutions; second, selecting each demonstration individually fails to model the interactions between them, resulting in information redundancy. Unlike these prevailing efforts, we propose a new exploration-exploitation reinforcement learning framework that explores policies to fuse multi-modal information and adaptively select suitable demonstrations as an integrated whole. The framework allows LVLMs to optimize themselves by continually refining their demonstrations through self-exploration, enabling them to autonomously identify and generate the most effective selection policies for in-context learning. Experimental results verify the superior performance of our approach on four Visual Question-Answering (VQA) datasets, demonstrating its effectiveness in enhancing the generalization capability of few-shot LVLMs.
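
To make the exploration-exploitation idea concrete, below is a minimal sketch of epsilon-greedy selection over demonstration sets treated as a whole, rather than scoring demonstrations one by one. This is not the paper's actual algorithm: the reward function, candidate pool, set size, and helper names are hypothetical placeholders standing in for an LVLM-in-the-loop reward such as validation VQA accuracy.

```python
import random

def select_demonstrations(candidates, reward_fn, k=4, epsilon=0.2, steps=200, seed=0):
    """Epsilon-greedy search over sets of k demonstrations (illustrative sketch only)."""
    rng = random.Random(seed)
    best_set, best_reward = None, float("-inf")
    for _ in range(steps):
        if best_set is None or rng.random() < epsilon:
            # Explore: sample a fresh set of k demonstrations as an integrated whole.
            trial = rng.sample(candidates, k)
        else:
            # Exploit: perturb the best set found so far by swapping one
            # demonstration for an unused candidate.
            trial = list(best_set)
            trial[rng.randrange(k)] = rng.choice(
                [c for c in candidates if c not in trial]
            )
        # Hypothetical reward, e.g. few-shot VQA accuracy of the LVLM with this context.
        reward = reward_fn(trial)
        if reward > best_reward:
            best_set, best_reward = trial, reward
    return best_set, best_reward

if __name__ == "__main__":
    pool = [f"demo_{i}" for i in range(20)]
    # Toy reward favoring diverse demonstration sets, purely for illustration.
    toy_reward = lambda s: len({int(d.split("_")[1]) % 5 for d in s})
    print(select_demonstrations(pool, toy_reward))
```

Evaluating whole sets under a learned or measured reward is what distinguishes this style of selection from per-example similarity heuristics, since the reward can penalize redundant demonstrations that would each look good in isolation.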

Authors (6)
  1. Cheng Chen (262 papers)
  2. Yunpeng Zhai (8 papers)
  3. Yifan Zhao (66 papers)
  4. Jinyang Gao (35 papers)
  5. Bolin Ding (112 papers)
  6. Jia Li (380 papers)