
Detecting Human-Object Interaction via Fabricated Compositional Learning (2103.08214v2)

Published 15 Mar 2021 in cs.CV

Abstract: Human-Object Interaction (HOI) detection, inferring the relationships between human and objects from images/videos, is a fundamental task for high-level scene understanding. However, HOI detection usually suffers from the open long-tailed nature of interactions with objects, while human has extremely powerful compositional perception ability to cognize rare or unseen HOI samples. Inspired by this, we devise a novel HOI compositional learning framework, termed as Fabricated Compositional Learning (FCL), to address the problem of open long-tailed HOI detection. Specifically, we introduce an object fabricator to generate effective object representations, and then combine verbs and fabricated objects to compose new HOI samples. With the proposed object fabricator, we are able to generate large-scale HOI samples for rare and unseen categories to alleviate the open long-tailed issues in HOI detection. Extensive experiments on the most popular HOI detection dataset, HICO-DET, demonstrate the effectiveness of the proposed method for imbalanced HOI detection and significantly improve the state-of-the-art performance on rare and unseen HOI categories. Code is available at https://github.com/zhihou7/HOI-CL.

Authors (5)
  1. Zhi Hou (13 papers)
  2. Baosheng Yu (51 papers)
  3. Yu Qiao (563 papers)
  4. Xiaojiang Peng (59 papers)
  5. Dacheng Tao (830 papers)
Citations (86)

Summary

  • The paper introduces FCL, a novel framework that synthesizes artificial HOI samples to address data imbalance and improve detection of rare interactions.
  • It employs an object fabricator to merge verb and object features, achieving up to 4.22% mAP improvement on unseen categories in the HICO-DET benchmark.
  • Robust evaluations, including t-SNE visualizations, demonstrate FCL's versatility in enhancing scene understanding and its applicability to visual relation detection tasks.

Detecting Human-Object Interaction via Fabricated Compositional Learning

The paper "Detecting Human-Object Interaction via Fabricated Compositional Learning" presents a framework to improve Human-Object Interaction (HOI) detection, specifically addressing the open long-tailed distribution of interactions. HOI detection is central to high-level scene understanding in computer vision, yet a few interaction classes dominate the training data while many others are rare or entirely unseen. To tackle this, the authors introduce Fabricated Compositional Learning (FCL), a method that synthesizes new HOI samples from generated object features.

The proposed FCL framework is centered on the concept of compositional learning, drawing inspiration from human cognitive abilities to perceive and infer unseen or rare interactions from limited samples. By incorporating an object fabricator, the framework generates artificial object representations, which are then merged with verb representations to synthesize novel HOI samples. This compositional approach facilitates an enhanced understanding of interactions by creating balanced training samples for categories that are otherwise rare or unseen.
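
The fabricate-then-compose idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the fabricator here is a fixed random linear map with a ReLU, and the names `make_fabricator`, `compose_hoi`, `FEATURE_DIM`, and `NOISE_DIM`, as well as the choice of a one-hot object identity plus noise as input, are assumptions made only for illustration. In a real system the fabricator would be a learned network.

```python
import random

FEATURE_DIM = 8   # hypothetical feature size, chosen only for illustration
NOISE_DIM = 4     # hypothetical noise size


def make_fabricator(num_objects, seed=0):
    """Build a toy object fabricator: a fixed random linear map plus ReLU.

    Assumption: real weights would be learned; here they are random so the
    example runs without any training.
    """
    rng = random.Random(seed)
    in_dim = num_objects + NOISE_DIM  # one-hot object id concatenated with noise
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
               for _ in range(FEATURE_DIM)]

    def fabricate(object_id, rng_noise):
        # Encode the object category and draw fresh noise for variety.
        one_hot = [1.0 if i == object_id else 0.0 for i in range(num_objects)]
        noise = [rng_noise.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
        x = one_hot + noise
        # Linear map followed by ReLU gives a non-negative fake object feature.
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

    return fabricate


def compose_hoi(verb_feature, verb_id, object_feature, object_id):
    """Concatenate a verb feature and an object feature into one labeled sample."""
    return {"feature": verb_feature + object_feature,
            "label": (verb_id, object_id)}


fabricate = make_fabricator(num_objects=3)
noise_rng = random.Random(1)
verb_feature = [0.5] * FEATURE_DIM          # stand-in for a real verb feature
fake_object = fabricate(object_id=2, rng_noise=noise_rng)
sample = compose_hoi(verb_feature, verb_id=1,
                     object_feature=fake_object, object_id=2)
```

Because the object feature is generated rather than cropped from an image, any verb seen in training can in principle be paired with any object category, which is what allows samples to be composed for rare or unseen verb-object combinations.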

The method is evaluated on HICO-DET, a widely used benchmark for HOI detection. The experiments show that FCL improves performance, especially on rare and unseen categories, and outperforms several state-of-the-art methods. In particular, adding fabricated objects mitigates the effects of data imbalance and improves detection accuracy across the long-tailed distribution.
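
To make the imbalance concrete, the following minimal sketch splits HOI categories into rare and non-rare by training frequency and computes how many extra samples each category would need to reach a target count. It is an illustration only: the threshold of 10 training instances follows the convention commonly used for the HICO-DET rare split, while `split_rare`, `fabrication_budget`, the `target` value, and the toy label list are hypothetical and do not describe how FCL itself schedules fabricated samples.

```python
from collections import Counter

RARE_THRESHOLD = 10  # assumption: "rare" means fewer than 10 training instances


def split_rare(labels, threshold=RARE_THRESHOLD):
    """Return (rare, non_rare) lists of categories based on label frequency."""
    counts = Counter(labels)
    rare = sorted(c for c, n in counts.items() if n < threshold)
    non_rare = sorted(c for c, n in counts.items() if n >= threshold)
    return rare, non_rare


def fabrication_budget(labels, target):
    """Number of extra samples each observed category needs to reach `target`."""
    counts = Counter(labels)
    return {c: max(0, target - n) for c, n in counts.items()}


# Toy training labels as (verb, object) pairs.
labels = ([("ride", "bicycle")] * 25
          + [("wash", "bicycle")] * 3
          + [("ride", "horse")] * 12)

rare, non_rare = split_rare(labels)
budget = fabrication_budget(labels, target=12)
```

Unseen categories have no training instances at all, so they never appear in these counts; covering them requires composing a known verb with a known object into a pair that was not observed together.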

Several key results illustrate the efficacy of FCL: it achieves an mAP improvement of up to 4.22% on unseen categories in one evaluation setting, and it improves detection of rare interactions by 2.82% over the leading approach under the same object detector. Complementary analyses, such as t-SNE visualizations, indicate that fabricated object features cluster with real object features, which supports their usefulness for interaction detection.

Furthermore, the paper explores various architectural variations and optimizations, including step-wise optimization and the integration of auxiliary losses, which contribute to the improved performance. FCL proves to be robust, showing improved results not only in HOI detection scenarios but also when benchmarked on visual relation detection tasks, reinforcing its application potential across varied domains.

In conclusion, the paper advances HOI detection through fabricated compositional learning. The ability to synthesize rare interaction samples is promising for scene understanding models, especially in systems that process real-world data, where long-tailed distributions of visual entities are prevalent. As future work, more sophisticated fabrication techniques and integration with broader multi-modal datasets could further extend the method.