Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Video sentence grounding with temporally global textual knowledge (2404.13611v2)

Published 21 Apr 2024 in cs.CV and cs.CL

Abstract: Temporal sentence grounding involves the retrieval of a video moment with a natural language query. Many existing works directly incorporate the given video and temporally localized query for temporal grounding, overlooking the inherent domain gap between different modalities. In this paper, we utilize pseudo-query features containing extensive temporally global textual knowledge sourced from the same video-query pair, to enhance the bridging of domain gaps and attain a heightened level of similarity between multi-modal features. Specifically, we propose a Pseudo-query Intermediary Network (PIN) to achieve an improved alignment of visual and comprehensive pseudo-query features within the feature space through contrastive learning. Subsequently, we utilize learnable prompts to encapsulate the knowledge of pseudo-queries, propagating them into the textual encoder and multi-modal fusion module, further enhancing the feature alignment between visual and language for better temporal grounding. Extensive experiments conducted on the Charades-STA and ActivityNet-Captions datasets demonstrate the effectiveness of our method.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Cai Chen (7 papers)
  2. Runzhong Zhang (3 papers)
  3. Jianjun Gao (15 papers)
  4. Kejun Wu (7 papers)
  5. Kim-Hui Yap (28 papers)
  6. Yi Wang (1038 papers)
X Twitter Logo Streamline Icon: https://streamlinehq.com