Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention (1908.07236v2)

Published 20 Aug 2019 in cs.CV

Abstract: This paper studies the problem of temporal moment localization in a long, untrimmed video using natural language as the query. Given an untrimmed video and a query sentence, the goal is to determine the start and end of the visual moment in the video that corresponds to the sentence. While previous work has tackled this task with a propose-and-rank approach, we introduce a more efficient, end-to-end trainable, proposal-free approach that relies on three key components: a dynamic filter to transfer language information to the visual domain, a new loss function that guides the model to attend to the most relevant parts of the video, and soft labels to model annotation uncertainty. We evaluate our method on two benchmark datasets, Charades-STA and ActivityNet-Captions. Experimental results show that our approach outperforms state-of-the-art methods on both datasets.
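To make the three ingredients in the abstract concrete, below is a minimal PyTorch sketch of how a language-conditioned dynamic filter, a guided-attention term, and Gaussian soft labels around the annotated start/end could fit together in a proposal-free head. This is not the authors' implementation: the layer sizes, kernel width, Gaussian spread, and the exact loss composition are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a proposal-free localization head:
# a query-conditioned dynamic filter over video features, per-frame attention,
# and soft (Gaussian) start/end labels. All hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProposalFreeLocalizer(nn.Module):
    def __init__(self, video_dim=500, word_dim=300, hidden=256, kernel=3):
        super().__init__()
        self.query_rnn = nn.GRU(word_dim, hidden, batch_first=True)
        # Predict per-query 1D convolution weights: the "dynamic filter".
        self.filter_gen = nn.Linear(hidden, hidden * kernel)
        self.video_proj = nn.Linear(video_dim, hidden)
        self.attn_head = nn.Linear(hidden, 1)   # per-frame attention logits
        self.start_head = nn.Linear(hidden, 1)  # per-frame start logits
        self.end_head = nn.Linear(hidden, 1)    # per-frame end logits
        self.hidden, self.kernel = hidden, kernel

    def forward(self, video, words):
        # video: (B, T, video_dim); words: (B, L, word_dim)
        _, q = self.query_rnn(words)                 # (1, B, hidden)
        q = q.squeeze(0)                             # (B, hidden)
        v = self.video_proj(video).transpose(1, 2)   # (B, hidden, T)

        # Language-conditioned depthwise conv, one filter set per sample.
        w = self.filter_gen(q).view(-1, self.hidden, 1, self.kernel)
        out = torch.stack([
            F.conv1d(v[i:i + 1], w[i], padding=self.kernel // 2,
                     groups=self.hidden)
            for i in range(v.size(0))
        ]).squeeze(1).transpose(1, 2)                # (B, T, hidden)

        attn = self.attn_head(out).squeeze(-1)       # (B, T)
        start = self.start_head(out).squeeze(-1)     # (B, T)
        end = self.end_head(out).squeeze(-1)         # (B, T)
        return attn, start, end


def soft_label(center, T, sigma=2.0):
    # Gaussian soft label around an annotated boundary index, modeling
    # annotation uncertainty; sigma is an assumed width.
    t = torch.arange(T, dtype=torch.float)
    g = torch.exp(-0.5 * ((t - center) / sigma) ** 2)
    return g / g.sum()


def localization_loss(attn, start, end, gt_start, gt_end):
    B, T = start.shape
    loss = 0.0
    for b in range(B):
        # KL between predicted boundary distributions and the soft labels.
        ps, pe = soft_label(gt_start[b], T), soft_label(gt_end[b], T)
        loss += F.kl_div(F.log_softmax(start[b], 0), ps, reduction="sum")
        loss += F.kl_div(F.log_softmax(end[b], 0), pe, reduction="sum")
        # Attention guidance: push attention mass inside the annotated moment.
        inside = torch.zeros(T)
        inside[int(gt_start[b]):int(gt_end[b]) + 1] = 1.0
        loss += F.binary_cross_entropy_with_logits(attn[b], inside)
    return loss / B
```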

Authors (5)
  1. Cristian Rodriguez-Opazo (15 papers)
  2. Edison Marrese-Taylor (29 papers)
  3. Fatemeh Sadat Saleh (10 papers)
  4. Hongdong Li (172 papers)
  5. Stephen Gould (104 papers)
Citations (138)