CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness (2502.14914v3)

Published 19 Feb 2025 in cs.CV, cs.CL, and cs.LG

Abstract: Visual captioning benchmarks have become outdated with the emergence of modern multimodal LLMs (MLLMs), as the brief ground-truth sentences and traditional metrics fail to assess detailed captions effectively. While recent benchmarks attempt to address this by focusing on keyword extraction or object-centric evaluation, they remain limited to vague-view or object-view analyses and incomplete visual element coverage. In this paper, we introduce CAPability, a comprehensive multi-view benchmark for evaluating visual captioning across 12 dimensions spanning six critical views. We curate nearly 11K human-annotated images and videos with visual element annotations to evaluate the generated captions. CAPability stably assesses both the correctness and thoroughness of captions with precision and hit metrics. By converting annotations to QA pairs, we further introduce a heuristic metric, "know but cannot tell" ($K\bar{T}$), indicating a significant performance gap between QA and caption capabilities. Our work provides a holistic analysis of MLLMs' captioning abilities, as we identify their strengths and weaknesses across various dimensions, guiding future research to enhance specific aspects of their capabilities.

Authors (12)
  1. Zhihang Liu (9 papers)
  2. Chen-Wei Xie (14 papers)
  3. Bin Wen (34 papers)
  4. Feiwu Yu (3 papers)
  5. Jixuan Chen (9 papers)
  6. Pandeng Li (10 papers)
  7. Boqiang Zhang (11 papers)
  8. Nianzu Yang (7 papers)
  9. Yinglu Li (6 papers)
  10. Zuan Gao (4 papers)
  11. Yun Zheng (49 papers)
  12. Hongtao Xie (48 papers)