Resampling Benchmark for Efficient Comprehensive Evaluation of Large Vision-Language Models (2504.09979v1)
Abstract: We propose an efficient evaluation protocol for large vision-language models (VLMs). Because VLMs possess broad knowledge and reasoning capabilities, comprehensive assessment requires multiple benchmarks, which makes evaluation computationally expensive. To improve efficiency, we construct a subset that yields results comparable to full benchmark evaluations. Our benchmark classification experiments reveal that no single benchmark covers all of the challenges. We then introduce a subset construction method based on farthest point sampling (FPS). Our experiments show that FPS-based benchmarks maintain a strong correlation (> 0.96) with full evaluations while using only about 1% of the data. Additionally, applying FPS to an existing benchmark improves its correlation with the overall evaluation results, suggesting the method's potential to reduce unintended dataset biases.
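To make the subset construction concrete, below is a minimal sketch of greedy farthest point sampling over benchmark-item embeddings. The embedding source, feature dimension, and function name are assumptions for illustration; the abstract specifies only that FPS is used to select a representative subset, not how items are featurized.

```python
# Hypothetical sketch of farthest point sampling (FPS) for benchmark
# subset selection; embedding details are assumed, not from the paper.
import numpy as np

def farthest_point_sampling(features: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Greedily pick k indices, each maximizing the minimum distance
    to the points selected so far."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    selected = [int(rng.integers(n))]  # arbitrary starting point
    # min_dist[i] = distance from item i to its nearest selected item
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # item farthest from current subset
        selected.append(nxt)
        # update nearest-selected distances to account for the new point
        min_dist = np.minimum(
            min_dist, np.linalg.norm(features - features[nxt], axis=1)
        )
    return np.asarray(selected)

# Example: select ~1% of 10,000 items from placeholder 512-d embeddings
embeddings = np.random.rand(10_000, 512)
subset_idx = farthest_point_sampling(embeddings, k=100)
```

The greedy max-min objective is what makes FPS spread its selections across the feature space, which is consistent with the paper's motivation of covering diverse challenges with a small subset.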