- The paper presents BEHAVE, a dataset of approximately 15,000 multi-view RGBD frames of human-object interactions, annotated with fitted body models, object fits, and contacts.
- It introduces a method that combines neural correspondence predictions with a portable multi-camera setup to fit the SMPL body model and estimate object orientations.
- Quantitative results show lower vertex-to-vertex alignment errors than prior work such as PHOSA (4.99 cm for SMPL; 21.20 cm for objects).
An Overview of the BEHAVE Dataset and Tracking Method for Human-Object Interactions
The paper advances the modeling of human-object interactions by introducing the BEHAVE dataset and a method for tracking these interactions in dynamic environments. The emphasis is on practical applications, ranging from gaming and virtual reality to human-robot collaboration, where understanding human-object dynamics is crucial.
BEHAVE Dataset
The BEHAVE dataset is a pivotal contribution: the first to offer multi-view RGBD frames of human-object interactions annotated with full-body human models, object fits, and the contacts between them. It comprises approximately 15,000 frames of 8 subjects interacting with 20 common objects across 5 distinct locations. The dataset overcomes a key limitation of existing datasets by capturing interactions without 4D scanners or marker-based capture systems, enabling a wider range of natural interactions.
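To make the dataset's composition concrete, here is a minimal, hypothetical sketch of indexing one multi-view sequence. The directory layout, file names, and the `BehaveFrame` record are illustrative assumptions for a BEHAVE-style capture, not the released dataset's actual structure.

```python
# Hypothetical sketch of indexing one BEHAVE-style sequence: several calibrated
# Kinects per frame plus SMPL, object, and contact annotations. All file and
# folder names below are assumptions for illustration.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class BehaveFrame:
    rgb_paths: list[Path]    # one RGB image per Kinect view
    depth_paths: list[Path]  # aligned depth maps, same camera order
    smpl_fit: Path           # SMPL pose/shape parameters for the subject
    object_fit: Path         # 6-DoF pose of the object template
    contacts: Path           # annotated human-object contact labels

def index_sequence(root: Path, num_cams: int = 4) -> list[BehaveFrame]:
    """Collect per-frame folders of one sequence into BehaveFrame records."""
    frames = []
    for frame_dir in sorted(root.glob("t*")):  # assumed per-frame folders
        frames.append(BehaveFrame(
            rgb_paths=[frame_dir / f"k{i}.color.jpg" for i in range(num_cams)],
            depth_paths=[frame_dir / f"k{i}.depth.png" for i in range(num_cams)],
            smpl_fit=frame_dir / "person" / "fit.pkl",
            object_fit=frame_dir / "object" / "fit.pkl",
            contacts=frame_dir / "contacts.npz",
        ))
    return frames
```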
Methodology
The paper introduces a method that leverages this dataset to track human-object interactions with a portable multi-camera setup. The core insight is to predict correspondences from the observed human and object surfaces to the SMPL statistical body model, which makes human-object contacts explicit during tracking. Neural network predictions drive the registration of humans and objects in 3D space, making the process robust to occlusions and the other challenges inherent in natural settings.
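As a rough illustration of this idea, the sketch below uses a simplified PointNet-style network that maps every observed point to a location on the SMPL surface plus a body-part label. The architecture, layer widths, and the 14-part split are illustrative assumptions; the paper's actual correspondence network differs.

```python
# Minimal PyTorch sketch of per-point correspondence prediction: each input
# point gets (a) a predicted location on the SMPL surface and (b) a body-part
# label. The backbone and sizes here are assumptions, not the paper's network.
import torch
import torch.nn as nn

class CorrespondenceNet(nn.Module):
    def __init__(self, num_parts: int = 14, hidden: int = 128):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.head_xyz = nn.Linear(2 * hidden, 3)           # SMPL-surface point
        self.head_part = nn.Linear(2 * hidden, num_parts)  # body-part logits

    def forward(self, points: torch.Tensor):
        # points: (B, N, 3) cloud fused from the multi-view depth maps
        feat = self.point_mlp(points)                    # (B, N, H) per-point
        pooled = feat.max(dim=1, keepdim=True).values    # (B, 1, H) global
        feat = torch.cat([feat, pooled.expand_as(feat)], dim=-1)
        return self.head_xyz(feat), self.head_part(feat)

net = CorrespondenceNet()
corr_xyz, part_logits = net(torch.randn(2, 1024, 3))
print(corr_xyz.shape, part_logits.shape)  # (2, 1024, 3) (2, 1024, 14)
```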
The methodology consists of three coupled components (a combined fitting sketch follows the list):
- Fitting a Human Model (SMPL): neural correspondences map observed surface points onto the SMPL body, keeping the fit robust to partial occlusions and noisy input data.
- Object Fitting: a predicted object orientation initializes and constrains the fit, avoiding the local minima that are common in tracking objectives.
- Contact Prediction: the network predicts contacts as 3D correspondences to the body surface, ensuring realistic interaction modeling by preventing non-physical outcomes such as floating or hovering objects.
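Read together, these three components form one joint objective: a human data term, an object data term, and a contact term that couples them. The toy PyTorch sketch below illustrates that energy structure under strong simplifying assumptions: the SMPL model is stubbed out as a fixed template plus a translation, the object moves rigidly, and the correspondences and contact pairs are taken as given. It is not the paper's solver.

```python
# Toy sketch of a joint human-object fitting energy. Assumptions: SMPL is
# reduced to a template plus translation, the object moves rigidly, and the
# predicted correspondences / contact pairs arrive as inputs.
import torch

def axis_angle_to_matrix(aa: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: 3-vector axis-angle -> 3x3 rotation matrix."""
    theta = aa.norm().clamp(min=1e-8)
    k = aa / theta
    K = torch.zeros(3, 3)
    K[0, 1], K[0, 2] = -k[2], k[1]
    K[1, 0], K[1, 2] = k[2], -k[0]
    K[2, 0], K[2, 1] = -k[1], k[0]
    return torch.eye(3) + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)

def fit_human_and_object(human_corr, human_template, obj_points, obj_template,
                         contact_pairs, steps=300, w_contact=0.1):
    t_h = torch.zeros(3, requires_grad=True)   # body translation (SMPL stub)
    t_o = torch.zeros(3, requires_grad=True)   # object translation
    aa_o = torch.zeros(3, requires_grad=True)  # object rotation (axis-angle)
    opt = torch.optim.Adam([t_h, t_o, aa_o], lr=0.01)
    for _ in range(steps):
        opt.zero_grad()
        body = human_template + t_h
        obj = obj_template @ axis_angle_to_matrix(aa_o).T + t_o
        e_body = ((body - human_corr) ** 2).sum(-1).mean()  # human data term
        e_obj = ((obj - obj_points) ** 2).sum(-1).mean()    # object data term
        e_contact = ((obj[contact_pairs[:, 0]]              # contacts pull the
                      - body[contact_pairs[:, 1]]) ** 2).sum(-1).mean()
        (e_body + e_obj + w_contact * e_contact).backward()
        opt.step()
    return t_h.detach(), t_o.detach(), aa_o.detach()
```

A quick sanity check is to feed synthetic inputs, e.g. human_corr = human_template + offset, and verify that the optimizer recovers the offset up to the contact term's pull.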
Numerical Results and Comparison
The paper provides quantitative comparisons demonstrating that the proposed method outperforms existing approaches, most notably PHOSA, with lower vertex-to-vertex alignment errors for both SMPL (4.99 cm) and object fits (21.20 cm). Fitting baselines that omit the neural correspondences and implicit modeling do not reach comparable accuracy.
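For reference, the vertex-to-vertex (v2v) metric behind these numbers is simply the mean Euclidean distance between corresponding predicted and ground-truth vertices. The sketch below assumes both meshes share a topology and vertex order and are expressed in meters; any alignment step the paper's evaluation protocol may apply is omitted.

```python
# Sketch of the vertex-to-vertex (v2v) error, assuming both vertex arrays
# share the same topology/order and are expressed in meters.
import numpy as np

def v2v_error_cm(pred_verts: np.ndarray, gt_verts: np.ndarray) -> float:
    """Mean per-vertex Euclidean distance, reported in centimeters."""
    assert pred_verts.shape == gt_verts.shape
    return float(np.linalg.norm(pred_verts - gt_verts, axis=-1).mean() * 100.0)
```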
Implications and Future Directions
This research matters for any field that requires accurate modeling of human-environment interactions. Practically, the work lays the groundwork for more advanced applications in virtual reality, ergonomic design in robotics, and interactive AI systems. Theoretically, it challenges existing paradigms in computer vision and human modeling by demonstrating the efficacy of combining RGBD input with neural correspondence predictions to solve complex interaction-tracking problems.
Future research may build on this work by extending it to single-camera setups or improving real-time processing. With the dataset publicly available, it invites further exploration of diverse applications, fostering innovation in AI systems that require an intricate understanding of human-object dynamics.