Overview of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image"
This essay examines the paper “InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image,” which addresses the challenge of estimating the 3D poses of two interacting hands from a single RGB image. Despite advances in 3D hand pose estimation, most prior work has focused on isolated single-hand scenarios. The paper introduces a large-scale dataset, InterHand2.6M, together with a baseline network, InterNet, designed to improve 3D hand pose estimation in interacting settings.
Dataset and Methodology
InterHand2.6M is a large-scale, real-world dataset of 2.6 million labeled frames depicting single and interacting hands in diverse poses. Captured in a multi-view studio with 80 to 140 calibrated cameras, it surpasses previous collections in scale and image resolution. To annotate it, the authors used a semi-automatic strategy that combines human annotation with machine-generated annotations, making labeling at this scale tractable while preserving accuracy.
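A central ingredient of such multi-view annotation is triangulating each keypoint's 3D position from its 2D locations in several calibrated cameras. The following is a minimal sketch of standard linear (DLT) triangulation under that assumption; it illustrates the general technique, not the authors' actual annotation pipeline, and the function name and inputs are illustrative.

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more calibrated views.

    proj_mats : list of 3x4 camera projection matrices P_i = K_i [R_i | t_i]
    points_2d : list of (x, y) pixel detections of the same keypoint, one per view
    Returns the 3D point in world coordinates.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point X:
        #   x * (P[2] @ X) - P[0] @ X = 0
        #   y * (P[2] @ X) - P[1] @ X = 0
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)          # shape (2 * num_views, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                  # right singular vector for the smallest singular value
    return X[:3] / X[3]         # de-homogenize
```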
The proposed InterNet model predicts 3D hand poses through three outputs: handedness, 2.5D hand pose, and the relative depth between the two hands. The handedness component estimates whether a right and/or left hand is present in the image, while the 2.5D pose component estimates each joint's 2D image coordinates together with its depth relative to the hand's root joint. A key innovation is estimating the relative depth between the two hands' root joints, which places both root-relative poses in a common 3D space and improves pose accuracy for interacting hands.
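To make this three-headed design concrete, below is a minimal PyTorch-style sketch, assuming a ResNet-style backbone and an image crop as input. The class name, layer sizes, and parameters (e.g. num_joints=21, depth_bins=64, a scalar regression for the inter-hand root depth) are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torchvision

class InterNetSketch(nn.Module):
    """Illustrative three-headed network: handedness, 2.5D pose, hand-to-hand relative root depth."""

    def __init__(self, num_joints=21, depth_bins=64):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        # Keep convolutional stages only -> feature map of shape (B, 2048, H/32, W/32).
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)

        # Handedness: probability that a right / left hand is present in the crop.
        self.handedness_head = nn.Linear(2048, 2)

        # 2.5D pose: per-joint volumes over (x, y, root-relative depth) for both hands.
        self.pose_head = nn.Conv2d(2048, 2 * num_joints * depth_bins, kernel_size=1)

        # Relative depth between the two hands' root joints (scalar regression for brevity).
        self.rel_root_depth_head = nn.Linear(2048, 1)

    def forward(self, img):
        feat = self.backbone(img)                              # (B, 2048, h, w)
        vec = self.pool(feat).flatten(1)                       # (B, 2048)

        handedness = torch.sigmoid(self.handedness_head(vec))  # (B, 2)
        pose_logits = self.pose_head(feat)                     # (B, 2*J*D, h, w)
        rel_root_depth = self.rel_root_depth_head(vec)         # (B, 1)
        return handedness, pose_logits, rel_root_depth
```

In this sketch, the 2.5D pose head produces per-joint likelihood volumes from which joint locations can be read off (e.g. via an argmax or soft-argmax over the spatial and depth dimensions), and the relative root depth places the two root-relative hand poses into a single 3D coordinate frame.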
Experimental Outcomes
Experiments show that including interacting-hand data significantly improves the accuracy of 3D hand pose estimation in interactive scenarios: InterNet trained and tested on InterHand2.6M achieves substantial reductions in interacting-hand pose error compared to baselines trained solely on single-hand data. Evaluation on benchmark datasets such as STB and RHP further shows that InterNet outperforms existing state-of-the-art methods in 3D hand pose estimation without requiring ground-truth scale or handedness information at inference time.
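Pose errors of this kind are commonly reported as the mean per-joint position error after aligning predictions and ground truth at the root joint. The short sketch below illustrates that metric under this assumption; it is a generic illustration rather than the paper's exact evaluation code.

```python
import numpy as np

def root_aligned_mpjpe(pred, gt, root_idx=0):
    """Mean per-joint position error after root alignment.

    pred, gt : (num_joints, 3) arrays of 3D joint positions in the same units (e.g. mm).
    """
    pred_aligned = pred - pred[root_idx]   # express joints relative to the root joint
    gt_aligned = gt - gt[root_idx]
    return np.linalg.norm(pred_aligned - gt_aligned, axis=1).mean()
```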
Implications and Future Directions
The implications of this research are both practical and theoretical. Practically, the large-scale InterHand2.6M dataset serves as a critical resource for developing and benchmarking new algorithms in 3D hand pose estimation. Theoretically, this work highlights the necessity of data diversity and multi-view capture in improving model performance in complex interacting scenarios.
Future research could explore integrating mesh-based models or investigating domain adaptation techniques to leverage synthetic datasets alongside real-world data for enhanced generalizability. Additionally, extending this work to dynamic sequences or real-time applications could further the impact of these findings in domains such as virtual reality and interactive robotics.
In conclusion, this paper provides a structured framework and essential resources for advancing the domain of 3D interacting hand pose estimation, opening avenues for more robust human-computer interaction interfaces. The comprehensive dataset and the methodological innovations presented by InterHand2.6M stand as notable contributions to the field, underpinning future scientific inquiry and application development.