Frame-Voyager: Learning to Query Frames for Video Large Language Models (2410.03226v2)

Published 4 Oct 2024 in cs.CV and cs.CL

Abstract: Video Large Language Models (Video-LLMs) have made remarkable progress in video understanding tasks. However, they are constrained by the maximum length of input tokens, making it impractical to input entire videos. Existing frame selection approaches, such as uniform frame sampling and text-frame retrieval, fail to account for variations in information density across a video or for the complex instructions in the tasks, leading to sub-optimal performance. In this paper, we propose Frame-Voyager, which learns to query informative frame combinations based on the textual queries given in the task. To train Frame-Voyager, we introduce a new data collection and labeling pipeline that ranks frame combinations using a pre-trained Video-LLM. Given a video of M frames, we traverse its T-frame combinations, feed them into a Video-LLM, and rank them based on the Video-LLM's prediction losses. Using this ranking as supervision, we train Frame-Voyager to query the frame combinations with lower losses. In experiments, we evaluate Frame-Voyager on four Video Question Answering benchmarks by plugging it into two different Video-LLMs. The experimental results demonstrate that Frame-Voyager achieves impressive results in all settings, highlighting its potential as a plug-and-play solution for Video-LLMs.
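
The labeling step described in the abstract (enumerate the T-frame combinations of an M-frame video, score each with a Video-LLM, rank by loss) can be illustrated with a minimal sketch. This is not the authors' implementation: `rank_frame_combinations` and `toy_loss` are hypothetical names, and `toy_loss` is a stand-in for the prediction loss that a real pre-trained Video-LLM would return for a given frame combination and query.

```python
from itertools import combinations
from typing import Callable, List, Tuple

FrameCombo = Tuple[int, ...]


def rank_frame_combinations(
    num_frames: int,
    t: int,
    loss_fn: Callable[[FrameCombo], float],
) -> List[Tuple[FrameCombo, float]]:
    """Score every t-frame combination and sort by ascending loss.

    loss_fn is an assumed callable standing in for a Video-LLM's
    prediction loss on the task query given the selected frames.
    """
    scored = [
        (combo, loss_fn(combo))
        for combo in combinations(range(num_frames), t)
    ]
    # Lower loss means the combination was more useful for answering.
    scored.sort(key=lambda item: item[1])
    return scored


def toy_loss(combo: FrameCombo) -> float:
    # Hypothetical stand-in: pretend frames 2 and 5 hold the answer,
    # and the loss counts how many of them the combination is missing.
    informative = {2, 5}
    return float(len(informative - set(combo)))


# M = 6 frames, T = 2 frames per combination.
ranking = rank_frame_combinations(num_frames=6, t=2, loss_fn=toy_loss)
best_combo, best_loss = ranking[0]
print(best_combo, best_loss)
```

In this sketch the ranked list plays the role of the supervision signal: combinations near the top are the ones a selector would be trained to prefer. Exhaustive enumeration is only practical for small M and T, since the number of combinations grows as M choose T.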

Authors (12)
  1. Sicheng Yu (13 papers)
  2. Chengkai Jin (1 paper)
  3. Huanyu Wang (26 papers)
  4. Zhenghao Chen (30 papers)
  5. Sheng Jin (69 papers)
  6. Zhongrong Zuo (1 paper)
  7. Zhenbang Sun (10 papers)
  8. Bingni Zhang (3 papers)
  9. Jiawei Wu (43 papers)
  10. Hao Zhang (948 papers)
  11. Qianru Sun (65 papers)
  12. Xiaolei Xu (5 papers)