AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models (2406.09295v2)

Published 13 Jun 2024 in cs.CL and cs.CV

Abstract: Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, a comprehensive alignment benchmark specifically designed for emerging Chinese VLMs. This benchmark is meticulously curated from real-world scenarios and Chinese Internet sources, encompassing thirteen specific tasks across three categories, and includes both single-turn and multi-turn dialogue scenarios. Incorporating a prompt rewrite strategy, AlignMMBench comprises 1,054 images and 4,978 question-answer pairs. To facilitate the evaluation pipeline, we propose CritiqueVLM, a rule-calibrated evaluator that exceeds GPT-4's evaluation ability. Finally, we report the performance of representative VLMs on AlignMMBench, offering insights into the capabilities and limitations of different VLM architectures. All evaluation code and data are available at https://alignmmbench.github.io.

Authors (8)
  1. Yuhang Wu (41 papers)
  2. Wenmeng Yu (7 papers)
  3. Yean Cheng (8 papers)
  4. Yan Wang (733 papers)
  5. Xiaohan Zhang (78 papers)
  6. Jiazheng Xu (10 papers)
  7. Ming Ding (219 papers)
  8. Yuxiao Dong (119 papers)