
Re-thinking Temporal Search for Long-Form Video Understanding (2504.02259v2)

Published 3 Apr 2025 in cs.CV

Abstract: Efficiently understanding long-form videos remains a significant challenge in computer vision. In this work, we revisit temporal search paradigms for long-form video understanding and address a fundamental issue pertaining to all state-of-the-art (SOTA) long-context vision-language models (VLMs). Our contributions are twofold: First, we frame temporal search as a Long Video Haystack problem: finding a minimal set of relevant frames (e.g., one to five) from tens of thousands, based on specific queries. Upon this formulation, we introduce LV-Haystack, the first dataset of its kind, with 480 hours of videos and 15,092 human-annotated instances for training and evaluation, aimed at improving temporal search quality and efficiency. Results on LV-Haystack highlight a significant research gap in temporal search capabilities, with current SOTA search methods achieving only 2.1% temporal F1 score on the LongVideoBench subset. Next, inspired by visual search in images, we propose a lightweight temporal search framework, T*, which reframes costly temporal search as spatial search. T* leverages powerful visual localization techniques commonly used in images and introduces an adaptive zooming-in mechanism that operates across both temporal and spatial dimensions. Extensive experiments show that integrating T* with existing methods significantly improves SOTA long-form video understanding. Under an inference budget of 32 frames, T* improves GPT-4o's performance from 50.5% to 53.1% and LLaVA-OneVision-OV-72B's performance from 56.5% to 62.4% on the LongVideoBench XL subset. Our code, benchmark, and models are provided in the Supplementary material.
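The abstract reports results in terms of a temporal F1 score over retrieved keyframes. The exact metric definition is not given here; the sketch below assumes a simple set-based matching between predicted and human-annotated frame indices, which is one plausible reading of the metric (the paper may instead use a tolerance window around each annotated frame).

```python
def temporal_f1(predicted_frames, annotated_frames):
    """Set-based F1 between predicted keyframe indices and human annotations.

    Assumption: an exact-index match counts as a true positive; the paper's
    actual metric may allow a temporal tolerance window.
    """
    pred, gt = set(predicted_frames), set(annotated_frames)
    if not pred or not gt:
        return 0.0
    true_positives = len(pred & gt)
    precision = true_positives / len(pred)
    recall = true_positives / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, predicting frames {120, 450, 900} against annotations {450, 900, 1300} yields precision and recall of 2/3 each, so F1 = 2/3. The low 2.1% score cited above suggests current search methods almost never land on the handful of annotated frames among tens of thousands of candidates.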

Authors (12)
  1. Jinhui Ye (8 papers)
  2. Zihan Wang (181 papers)
  3. Haosen Sun (3 papers)
  4. Keshigeyan Chandrasegaran (13 papers)
  5. Zane Durante (12 papers)
  6. Cristobal Eyzaguirre (5 papers)
  7. Yonatan Bisk (91 papers)
  8. Juan Carlos Niebles (95 papers)
  9. Ehsan Adeli (97 papers)
  10. Li Fei-Fei (199 papers)
  11. Jiajun Wu (249 papers)
  12. Manling Li (47 papers)