
What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions (2204.00746v2)

Published 2 Apr 2022 in cs.CV

Abstract: We propose a novel one-stage Transformer-based model, the Semantic and Spatial Refined Transformer (SSRT), for the Human-Object Interaction (HOI) detection task, which requires localizing humans and objects and predicting their interactions. Unlike previous Transformer-based HOI approaches, which mostly focus on improving the design of the decoder outputs for the final detection, SSRT introduces two new modules that help select the most relevant object-action pairs within an image and refine the queries' representation using rich semantic and spatial features. These enhancements lead to state-of-the-art results on the two most popular HOI benchmarks: V-COCO and HICO-DET.

Authors (6)
  1. A S M Iftekhar (11 papers)
  2. Hao Chen (1006 papers)
  3. Kaustav Kundu (9 papers)
  4. Xinyu Li (136 papers)
  5. Joseph Tighe (30 papers)
  6. Davide Modolo (30 papers)
Citations (47)