Transferable Interactiveness Knowledge for Human-Object Interaction Detection (1811.08264v4)

Published 20 Nov 2018 in cs.CV

Abstract: Human-Object Interaction (HOI) Detection is an important problem to understand how humans interact with objects. In this paper, we explore Interactiveness Knowledge which indicates whether human and object interact with each other or not. We found that interactiveness knowledge can be learned across HOI datasets, regardless of HOI category settings. Our core idea is to exploit an Interactiveness Network to learn the general interactiveness knowledge from multiple HOI datasets and perform Non-Interaction Suppression before HOI classification in inference. On account of the generalization of interactiveness, interactiveness network is a transferable knowledge learner and can be cooperated with any HOI detection models to achieve desirable results. We extensively evaluate the proposed method on HICO-DET and V-COCO datasets. Our framework outperforms state-of-the-art HOI detection results by a great margin, verifying its efficacy and flexibility. Code is available at https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network.

Summary

The paper introduces an Interactiveness Network that learns transferable interactiveness to significantly improve HOI detection performance.
It employs a Non-Interaction Suppression strategy to efficiently filter out irrelevant human-object pairs, reducing false positives.
The method achieves robust gains in mAP on HICO-DET and V-COCO benchmarks, particularly enhancing detection in rare HOI categories.

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

The paper "Transferable Interactiveness Knowledge for Human-Object Interaction Detection" addresses the detection of Human-Object Interactions (HOIs) in still images, focusing on how interactiveness knowledge transfers across datasets. The authors propose a framework built around an Interactiveness Network that improves HOI detection by exploiting interactiveness knowledge learned from multiple datasets.

Core Contributions

Interactiveness Network: The paper introduces an Interactiveness Network, which detects whether human-object pairs interact, independent of specific HOI categories. This network learns general interactiveness knowledge from multiple HOI datasets, which can then be used in conjunction with any HOI detection model.
Non-Interaction Suppression (NIS): The framework employs a two-stage inference mechanism. First, it uses the learned interactiveness to suppress non-interactive pairs, thereby reducing false positives significantly. Subsequently, the HOI detection task is performed solely on interactive pairs, improving efficiency and accuracy.
Transferable Knowledge: A key insight of this work is that interactiveness is transferable across datasets. Traditional one-stage methods classify HOI categories directly and therefore struggle with differing category settings and dataset scales. Interactiveness, by contrast, does not depend on the category set, so knowledge learned on one dataset can benefit detection on another.
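The two-stage inference described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Pair` type, the `threshold` value, and the `classify_hoi` callback are all hypothetical names introduced here, and the interactiveness score is assumed to be a single number in [0, 1] produced by some binary classifier.

```python
from typing import Callable, List, NamedTuple, Tuple


class Pair(NamedTuple):
    """A candidate human-object pair (hypothetical container for this sketch)."""
    human_id: int
    object_id: int
    interactiveness: float  # assumed score in [0, 1] from an interactiveness classifier


def non_interaction_suppression(pairs: List[Pair], threshold: float = 0.1) -> List[Pair]:
    """Stage 1: drop pairs whose interactiveness score is below the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    return [p for p in pairs if p.interactiveness >= threshold]


def detect_hois(
    pairs: List[Pair],
    classify_hoi: Callable[[Pair], str],
    threshold: float = 0.1,
) -> List[Tuple[Pair, str]]:
    """Stage 2: run the (arbitrary) HOI classifier only on the surviving pairs."""
    kept = non_interaction_suppression(pairs, threshold)
    return [(p, classify_hoi(p)) for p in kept]
```

Because the suppression step only needs a score per pair, any HOI classifier can be plugged in as `classify_hoi`, which mirrors the claim that the interactiveness component works alongside any HOI detection model.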

Methodology

The authors combine visual appearance and spatial configuration in a multi-stream architecture that integrates human, object, and spatial-pose information. By discriminating interactiveness explicitly, the network learns to filter out non-interactive pairs, which one-stage methods often misclassify as interactive. The approach also uses a logistic-function-based weighting mechanism, the Low-grade Instance Suppressive Function (LIS), which gives more weight to high-quality object detections and down-weights low-confidence ones when determining interactiveness.
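To make the idea of a logistic weighting on detection confidence concrete, here is a small sketch. The summary only states that LIS is logistic-function based, so the parameter names `T`, `k`, `w`, their default values, and the way the weights are multiplied into the interactiveness score are assumptions for illustration rather than the paper's settings.

```python
import math


def lis_weight(det_score: float, T: float = 1.0, k: float = 5.0, w: float = 10.0) -> float:
    """Logistic weight for a detection confidence score.

    T scales the output, k shifts the curve, and w controls its steepness.
    All three defaults are illustrative assumptions.
    """
    return T / (1.0 + math.exp(k - w * det_score))


def weighted_interactiveness(raw_score: float, human_det: float, object_det: float) -> float:
    """Scale a raw interactiveness score by the weights of both detections.

    Multiplying the two weights is one plausible combination (assumed here).
    """
    return raw_score * lis_weight(human_det) * lis_weight(object_det)
```

With these defaults the weight rises steeply around a detection score of 0.5, so a pair built on a weak detection is pushed toward a low score even when the raw interactiveness prediction is high.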

Evaluation and Results

The framework is evaluated on two standard benchmarks, HICO-DET and V-COCO. Compared with existing state-of-the-art detection models, it achieves considerable improvements in mean average precision (mAP), consistent with a reduction in false positives. Specifically, it improves on prior results by 2.38 mAP on HICO-DET and 4.0 mAP on V-COCO, with notable gains in the Rare HOI categories.
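The link between fewer false positives and higher mAP can be seen in a simplified calculation. The sketch below uses a plain, non-interpolated average precision over a ranked list of detections; it is not the official HICO-DET or V-COCO evaluation protocol, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(ranked_is_tp: Sequence[bool], num_gt: int) -> float:
    """Non-interpolated AP for one class.

    ranked_is_tp lists detections in descending score order, True for a
    true positive and False for a false positive. num_gt is the number of
    ground-truth instances for the class.
    """
    if num_gt == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_tp in enumerate(ranked_is_tp, start=1):
        if is_tp:
            hits += 1
            total += hits / rank  # precision at this true positive's rank
    return total / num_gt


def mean_average_precision(per_class: List[Tuple[Sequence[bool], int]]) -> float:
    """Mean of per-class AP values; returns 0.0 for an empty input."""
    aps = [average_precision(ranked, num_gt) for ranked, num_gt in per_class]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, a ranked list `[True, False, True]` with two ground-truth instances gives an AP of (1/1 + 2/3) / 2, about 0.83. If the false positive in the middle is suppressed before classification, the list becomes `[True, True]` and the AP rises to 1.0.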

Implications and Future Developments

The introduction of interactiveness as a transferable component has significant implications for the future of HOI detection and potentially broader computer vision problems. This work encourages further exploration into dataset-independent knowledge elements and highlights the value of reusable components in machine learning models. Future research could explore expanding the scope of interactiveness knowledge to other contexts or modalities, potentially integrating with temporal data to advance action recognition systems.

In summary, this paper presents an innovative approach to improving HOI detection by leveraging transferable interactiveness knowledge, providing both theoretical insights and practical advancements in the field of computer vision. The findings suggest a promising direction for future research on the cross-dataset transferability of learned knowledge in AI models.

GitHub

GitHub - DirtyHarryLYL/Transferable-Interactiveness-Network: Code for Transferable Interactiveness Knowledge for Human-Object Interaction Detection. (CVPR'19, TPAMI'21) (227 stars)