
MIBench: Evaluating Multimodal Large Language Models over Multiple Images (2407.15272v2)

Published 21 Jul 2024 in cs.CV

Abstract: Built on the power of LLMs, numerous multimodal large language models (MLLMs) have recently achieved remarkable performance on various vision-language tasks. However, most existing MLLMs and benchmarks focus primarily on single-image input scenarios, leaving the performance of MLLMs on realistic multi-image inputs underexplored. Although a few benchmarks consider multiple images, their evaluation dimensions and samples are very limited. In this paper, we propose a new benchmark, MIBench, to comprehensively evaluate the fine-grained abilities of MLLMs in multi-image scenarios. Specifically, MIBench categorizes multi-image abilities into three scenarios: multi-image instruction (MII), multimodal knowledge-seeking (MKS) and multimodal in-context learning (MIC), and constructs 13 tasks with a total of 13K annotated samples. During data construction, for MII and MKS, we extract correct options from manual annotations and create challenging distractors to obtain multiple-choice questions. For MIC, to enable an in-depth evaluation, we set four sub-tasks and transform the original datasets into in-context learning formats. We evaluate several open-source and closed-source MLLMs on the proposed MIBench. The results reveal that although current models excel in single-image tasks, they exhibit significant shortcomings when faced with multi-image inputs, such as limited fine-grained perception, multi-image reasoning and in-context learning abilities. The annotated data of MIBench is available at https://huggingface.co/datasets/StarBottle/MIBench.
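The abstract describes samples built as multiple-choice questions over multiple images, with correct options drawn from annotations and distractors added. A minimal sketch of what such a sample and a simple accuracy scorer might look like is below; the field names and structure are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical MIBench-style multiple-choice sample (field names are
# assumptions for illustration, not the released dataset's schema).
sample = {
    "images": ["img_0.jpg", "img_1.jpg"],  # multi-image input
    "question": "Which option correctly compares the two images?",
    "options": {
        "A": "distractor 1",
        "B": "correct description",  # extracted from manual annotation
        "C": "distractor 2",
        "D": "distractor 3",
    },
    "answer": "B",
}

def accuracy(predictions, samples):
    """Fraction of samples where the predicted option letter matches."""
    correct = sum(p == s["answer"] for p, s in zip(predictions, samples))
    return correct / len(samples)
```

With this format, evaluation reduces to comparing a model's chosen option letter against the annotated answer, e.g. `accuracy(["B"], [sample])` yields `1.0`.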

Authors (11)
  1. Haowei Liu
  2. Xi Zhang
  3. Haiyang Xu
  4. Yaya Shi
  5. Chaoya Jiang
  6. Ming Yan
  7. Ji Zhang
  8. Fei Huang
  9. Chunfeng Yuan
  10. Bing Li
  11. Weiming Hu
Citations (4)