Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding (2008.06941v2)

Published 16 Aug 2020 in cs.CV and cs.MM

Abstract: Spatio-temporal video grounding aims to retrieve the spatio-temporal tube of a queried object according to the given sentence. Currently, most existing grounding methods are restricted to well-aligned segment-sentence pairs. In this paper, we explore spatio-temporal video grounding on unaligned data and multi-form sentences. This challenging task requires to capture critical object relations to identify the queried target. However, existing approaches cannot distinguish notable objects and remain in ineffective relation modeling between unnecessary objects. Thus, we propose a novel object-aware multi-branch relation network for object-aware relation discovery. Concretely, we first devise multiple branches to develop object-aware region modeling, where each branch focuses on a crucial object mentioned in the sentence. We then propose multi-branch relation reasoning to capture critical object relationships between the main branch and auxiliary branches. Moreover, we apply a diversity loss to make each branch only pay attention to its corresponding object and boost multi-branch learning. The extensive experiments show the effectiveness of our proposed method.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Zhu Zhang (39 papers)
  2. Zhou Zhao (219 papers)
  3. Zhijie Lin (30 papers)
  4. Baoxing Huai (28 papers)
  5. Nicholas Jing Yuan (22 papers)
Citations (28)

Summary

We haven't generated a summary for this paper yet.