
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts (2505.13928v1)

Published 20 May 2025 in cs.CV and cs.IR

Abstract: Long videos contain a vast amount of information, making video-text retrieval an essential and challenging task in multimodal learning. However, existing benchmarks suffer from limited video duration, low-quality captions, and coarse annotation granularity, which hinder the evaluation of advanced video-text retrieval methods. To address these limitations, we introduce LoVR, a benchmark specifically designed for long video-text retrieval. LoVR contains 467 long videos and over 40,804 fine-grained clips with high-quality captions. To overcome the issue of poor machine-generated annotations, we propose an efficient caption generation framework that integrates VLM automatic generation, caption quality scoring, and dynamic refinement. This pipeline improves annotation accuracy while maintaining scalability. Furthermore, we introduce a semantic fusion method to generate coherent full-video captions without losing important contextual information. Our benchmark introduces longer videos, more detailed captions, and a larger-scale dataset, presenting new challenges for video understanding and retrieval. Extensive experiments on various advanced embedding models demonstrate that LoVR is a challenging benchmark, revealing the limitations of current approaches and providing valuable insights for future research. We release the code and dataset link at https://github.com/TechNomad-ds/LoVR-benchmark
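The abstract describes the caption pipeline only at a high level (VLM generation, quality scoring, dynamic refinement). As a rough illustration of how such a loop could be wired together, the sketch below uses hypothetical callables `generate`, `score`, and `refine`, along with an assumed quality `threshold` and `max_rounds` cap. None of these names or values come from the paper, and this is not the authors' implementation.

```python
from typing import Callable, Tuple


def caption_with_refinement(
    clip_id: str,
    generate: Callable[[str], str],      # hypothetical: VLM call producing a first caption
    score: Callable[[str, str], float],  # hypothetical: quality score for (clip, caption)
    refine: Callable[[str, str], str],   # hypothetical: rewrites a low-scoring caption
    threshold: float = 0.8,              # assumed cut-off, not a value from the paper
    max_rounds: int = 3,                 # assumed cap on refinement attempts
) -> Tuple[str, float]:
    """Generate a caption, then refine it while its score stays below threshold."""
    caption = generate(clip_id)
    quality = score(clip_id, caption)
    rounds = 0
    while quality < threshold and rounds < max_rounds:
        candidate = refine(clip_id, caption)
        candidate_quality = score(clip_id, candidate)
        # Keep the refined caption only if it actually scores higher.
        if candidate_quality > quality:
            caption, quality = candidate, candidate_quality
        rounds += 1
    return caption, quality
```

Capping the number of rounds keeps the cost per clip bounded, which matters at the scale of tens of thousands of clips. Keeping only candidates that score higher means refinement never makes a caption worse under the chosen scorer.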

Authors (9)
  1. Qifeng Cai (3 papers)
  2. Hao Liang (137 papers)
  3. Hejun Dong (3 papers)
  4. Meiyi Qiang (4 papers)
  5. Ruichuan An (14 papers)
  6. Zhaoyang Han (7 papers)
  7. Zhengzhou Zhu (4 papers)
  8. Bin Cui (165 papers)
  9. Wentao Zhang (261 papers)