
Human-Object Interaction Detection via Disentangled Transformer (2204.09290v1)

Published 20 Apr 2022 in cs.CV

Abstract: Human-Object Interaction Detection tackles the problem of joint localization and classification of human-object interactions. Existing HOI transformers either adopt a single decoder for triplet prediction, or utilize two parallel decoders to detect individual objects and interactions separately and compose triplets by a matching process. In contrast, we decouple the triplet prediction into human-object pair detection and interaction classification. Our main motivation is that detecting human-object instances and classifying interactions accurately requires learning representations that focus on different regions. To this end, we present the Disentangled Transformer, where both the encoder and decoder are disentangled to facilitate learning of the two sub-tasks. To associate the predictions of the disentangled decoders, we first generate a unified representation for HOI triplets with a base decoder, and then utilize it as the input feature of each disentangled decoder. Extensive experiments show that our method outperforms prior work on two public HOI benchmarks by a sizeable margin. Code will be available.
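The association mechanism the abstract describes (a base decoder producing a unified per-query triplet representation, which then seeds two disentangled decoders) can be sketched in miniature. The following is a hedged toy illustration, not the paper's implementation: it collapses each decoder to a single-head attention layer over numpy arrays, and the dimensions, memory split, and `attention` helper are all assumptions for illustration only.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention (toy stand-in for a decoder layer)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
d, n_queries, n_tokens = 32, 5, 49
hoi_queries = rng.standard_normal((n_queries, d))

# Disentangled encoder (assumed split): separate memories carrying
# instance-localization cues vs. interaction cues.
inst_memory = rng.standard_normal((n_tokens, d))
rel_memory = rng.standard_normal((n_tokens, d))

# Base decoder: a unified representation per HOI triplet query.
fused_memory = inst_memory + rel_memory
unified = attention(hoi_queries, fused_memory, fused_memory)

# Disentangled decoders reuse the unified representation as their input
# queries, so the pair-detection and interaction-classification outputs
# stay associated by query index, with no post-hoc matching step.
inst_feat = attention(unified, inst_memory, inst_memory)  # -> human/object box heads
rel_feat = attention(unified, rel_memory, rel_memory)     # -> interaction classifier
```

Because both disentangled decoders are driven by the same unified queries, the i-th instance prediction and the i-th interaction prediction belong to the same triplet by construction, which is the decoupling the abstract contrasts with matching-based two-decoder designs.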

Authors (7)
  1. Desen Zhou (10 papers)
  2. Zhichao Liu (47 papers)
  3. Jian Wang (967 papers)
  4. Leshan Wang (2 papers)
  5. Tao Hu (146 papers)
  6. Errui Ding (156 papers)
  7. Jingdong Wang (236 papers)
Citations (51)
