
Modelling Spatio-Temporal Interactions For Compositional Action Recognition (2305.02673v2)

Published 4 May 2023 in cs.CV

Abstract: Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed. Humans can abstract away the action from the appearance of the objects, an ability referred to as the compositionality of actions. We focus on this compositional aspect of action recognition to impart human-like generalization abilities to video action-recognition models. First, we propose an interaction model that captures both fine-grained and long-range interactions between hands and objects. Frame-wise hand-object interactions capture fine-grained movements, while long-range interactions capture broader context and disambiguate actions across time. Second, to provide additional contextual cues that differentiate similar actions, we infuse the interaction tokens with global motion information from video tokens. The final global-motion-refined interaction tokens are used for compositional action recognition. We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset, where we obtain a new state-of-the-art result, outperforming recent object-centric methods by a significant margin.

Authors (3)
  1. Ramanathan Rajendiran
  2. Debaditya Roy
  3. Basura Fernando
Citations (1)