
VLM-Eval: A General Evaluation on Video Large Language Models (2311.11865v1)

Published 20 Nov 2023 in cs.CV

Abstract: Despite the rapid development of video LLMs, a comprehensive evaluation is still absent. In this paper, we introduce a unified evaluation that encompasses multiple video tasks, including captioning, question answering, retrieval, and action recognition. In addition to conventional metrics, we show how GPT-based evaluation can match human-like performance in assessing response quality across multiple aspects. We propose a simple baseline, Video-LLaVA, which uses a single linear projection and outperforms existing video LLMs. Finally, we evaluate video LLMs beyond academic datasets; they show encouraging recognition and reasoning capabilities in driving scenarios with only hundreds of video-instruction pairs for fine-tuning. We hope our work can serve as a unified evaluation for video LLMs and help extend them to more practical scenarios. The evaluation code will be available soon.

Authors (7)
  1. Shuailin Li (9 papers)
  2. Yuang Zhang (18 papers)
  3. Yucheng Zhao (28 papers)
  4. Qiuyue Wang (8 papers)
  5. Fan Jia (33 papers)
  6. Yingfei Liu (20 papers)
  7. Tiancai Wang (48 papers)
Citations (1)