
Weakly Supervised Dense Video Captioning (1704.01502v1)

Published 5 Apr 2017 in cs.CV

Abstract: This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences. The proposed method is trained without explicit annotation of fine-grained sentence to video region-sequence correspondence, but is only based on weak video-level sentence annotations. It differs from existing video captioning systems in three technical aspects. First, we propose lexical fully convolutional neural networks (Lexical-FCN) with weakly supervised multi-instance multi-label learning to weakly link video regions with lexical labels. Second, we introduce a novel submodular maximization scheme to generate multiple informative and diverse region-sequences based on the Lexical-FCN outputs. A winner-takes-all scheme is adopted to weakly associate sentences to region-sequences in the training phase. Third, a sequence-to-sequence learning based language model is trained with the weakly supervised information obtained through the association process. We show that the proposed method can not only produce informative and diverse dense captions, but also outperform state-of-the-art single video captioning methods by a large margin.
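
As a rough illustration of the submodular maximization step mentioned in the abstract, the sketch below runs a standard greedy selection over candidate region-sequences. It is not the paper's objective: the abstract does not specify one, so a toy label-coverage function (the number of distinct lexical labels covered) stands in for it. The names `coverage` and `greedy_select` and the sample labels are hypothetical.

```python
from typing import Dict, List, Set


def coverage(selected: List[str], labels: Dict[str, Set[str]]) -> int:
    """Toy submodular objective (an assumption, not the paper's):
    the number of distinct lexical labels covered by the selection."""
    covered: Set[str] = set()
    for name in selected:
        covered |= labels[name]
    return len(covered)


def greedy_select(labels: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedily pick up to k region-sequences by largest marginal gain."""
    selected: List[str] = []
    current = coverage(selected, labels)
    for _ in range(k):
        best_name, best_gain = None, 0
        # Sorted iteration plus a strict comparison makes tie-breaking deterministic.
        for name in sorted(labels):
            if name in selected:
                continue
            gain = coverage(selected + [name], labels) - current
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            # No remaining candidate adds a new label, so stop early.
            break
        selected.append(best_name)
        current += best_gain
    return selected


# Hypothetical candidates: region-sequence id -> lexical labels it fires on.
candidates = {
    "seq_a": {"man", "guitar", "stage"},
    "seq_b": {"man", "guitar"},
    "seq_c": {"crowd", "lights"},
    "seq_d": {"stage", "lights"},
}
picked = greedy_select(candidates, k=2)
```

Under this toy objective a sequence that repeats already-covered labels has zero marginal gain, so the selection favors sequences that are both informative (many labels) and diverse (new labels). For monotone submodular objectives, greedy selection of this kind carries the well-known (1 - 1/e) approximation guarantee.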

Authors (7)
  1. Zhiqiang Shen (172 papers)
  2. Jianguo Li (59 papers)
  3. Zhou Su (51 papers)
  4. Minjun Li (5 papers)
  5. Yurong Chen (43 papers)
  6. Yu-Gang Jiang (223 papers)
  7. Xiangyang Xue (169 papers)
Citations (130)
