
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks (2310.02569v2)

Published 4 Oct 2023 in cs.CV

Abstract: Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). Benefiting from strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluated. Most existing multi-modal benchmarks require task-oriented input-output formats, which makes it difficult to automatically assess the free-form text output of LVLMs. To effectively leverage the annotations available in existing benchmarks and reduce the manual effort required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats. Through systematic data collection and reformulation, we present the ReForm-Eval benchmark, offering substantial data for evaluating various capabilities of LVLMs. Based on ReForm-Eval, we conduct extensive experiments, thoroughly analyze the strengths and weaknesses of existing LVLMs, and identify the underlying factors. Our benchmark and evaluation framework will be open-sourced as a cornerstone for advancing the development of LVLMs.

Authors (12)
  1. Zejun Li (18 papers)
  2. Ye Wang (248 papers)
  3. Mengfei Du (5 papers)
  4. Qingwen Liu (75 papers)
  5. Binhao Wu (4 papers)
  6. Jiwen Zhang (16 papers)
  7. Chengxing Zhou (3 papers)
  8. Zhihao Fan (28 papers)
  9. Jie Fu (229 papers)
  10. Jingjing Chen (99 papers)
  11. Xuanjing Huang (287 papers)
  12. Zhongyu Wei (98 papers)
Citations (11)