Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video (2405.08890v2)

Published 14 May 2024 in cs.CV

Abstract: Current video summarization methods rely heavily on supervised computer vision techniques, which demand time-consuming and subjective manual annotations. To overcome these limitations, we investigated self-supervised video summarization. Inspired by the success of large language models (LLMs), we explored the feasibility of transforming the video summarization task into a natural language processing (NLP) task. By leveraging the strengths of LLMs in contextual understanding, we aim to enhance the effectiveness of self-supervised video summarization. Our method begins by generating captions for individual video frames, which are then synthesized into text summaries by LLMs. Subsequently, we measure the semantic distance between the captions and the text summary. Notably, we propose a novel loss function that optimizes our model according to the diversity of the video. Finally, the summarized video is generated by selecting the frames whose captions are similar to the text summary. Our method achieves state-of-the-art performance on the SumMe dataset in terms of rank correlation coefficients. In addition, our method has the novel feature of supporting personalized summarization.
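The final step in the abstract, keeping frames whose captions are semantically close to the LLM-written summary, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each caption and the summary have already been embedded as vectors by some sentence encoder (not specified here), uses cosine similarity as the semantic-similarity measure, and keeps a hypothetical `budget_ratio` fraction of frames. The helper names `cosine_similarity` and `select_frames` are hypothetical, and the paper's diversity-aware loss is not modeled.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_frames(
    caption_embeddings: List[Sequence[float]],
    summary_embedding: Sequence[float],
    budget_ratio: float = 0.15,
) -> List[int]:
    """Return indices of the frames whose captions are most similar to the summary.

    Hypothetical helper: the embeddings are assumed to come from an
    unspecified sentence encoder, and budget_ratio is an assumed fraction
    of frames to keep (at least one frame when any frames exist).
    """
    # Score every frame by how close its caption is to the text summary.
    scores = [cosine_similarity(e, summary_embedding) for e in caption_embeddings]
    k = max(1, int(round(budget_ratio * len(scores))))
    # Rank frames from most to least similar and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore temporal order so the summary plays back chronologically.
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy 2-D "embeddings" for four frame captions and one summary.
    captions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [0.9, 0.1]]
    summary = [1.0, 0.0]
    print(select_frames(captions, summary, budget_ratio=0.5))  # [0, 3]
```

Plain top-k selection over frames is a simplification chosen for brevity; benchmark pipelines on SumMe often convert frame scores into a shot-level summary under a length budget instead, and the paper may do so as well.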

Authors (4)
  1. Tomoya Sugihara (1 paper)
  2. Shuntaro Masuda (1 paper)
  3. Ling Xiao (45 papers)
  4. Toshihiko Yamasaki (74 papers)
Citations (1)
