- The paper introduces a novel dataset and multi-view annotation pipeline that produces 3D whole-body poses with 133 keypoints for 100K images.
- The paper defines three benchmark tasks for lifting complete and incomplete 2D poses and estimating 3D poses from a single RGB image.
- The paper demonstrates that transformer-based models such as Jointformer perform best on the new benchmarks when trained on H3WB, underscoring the dataset's practical value.
An Analysis of H3WB: A New Benchmark for 3D Whole-Body Pose Estimation
The paper "H3WB: Human3.6M 3D WholeBody Dataset and Benchmark" presents a comprehensive approach to addressing the challenges inherent in 3D human whole-body pose estimation. The researchers introduce a novel dataset, H3WB, which enhances the Human3.6M dataset with annotations based on the COCO WholeBody layout. This extension includes 133 whole-body keypoints across 100,000 images, encompassing facial, hand, body, and foot keypoints.
Technical Contributions
- Multi-View Annotation Pipeline: The authors devised a multi-view annotation pipeline that produces fully annotated 3D whole-body poses from existing multi-view datasets, addressing the limited keypoint coverage of prior resources for 3D whole-body pose estimation (the core triangulation idea is sketched after this list).
- New Benchmark Tasks: The paper defines three critical tasks for evaluating 3D whole-body pose estimation:
- Lifting 3D whole-body poses from complete 2D poses.
- Lifting from incomplete 2D poses (accounting for occlusions).
- Estimation from a single RGB image.
These tasks are structured to challenge current models and to stimulate further research on whole-body pose understanding.
- Automated Annotation for TotalCapture: In addition to H3WB, the authors provide automated 3D whole-body annotations for the TotalCapture dataset; models trained on these annotations together with H3WB perform better, highlighting the pipeline's ability to generate reliable annotations at scale.
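The paper's annotation pipeline involves more than plain triangulation, but the core step of combining calibrated 2D detections from several views into a 3D keypoint can be illustrated with standard linear (DLT) triangulation. The sketch below is generic: the projection matrices and detections are placeholders, not values from Human3.6M or TotalCapture.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more calibrated views.

    projections: list of 3x4 camera projection matrices P_i.
    points_2d:   list of (x, y) pixel detections, one per view.
    Returns the 3D point minimizing the algebraic triangulation error.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def triangulate_wholebody(projections, detections_per_view):
    """Triangulate all 133 keypoints given per-view (133, 2) detections."""
    detections = np.stack(detections_per_view)            # (n_views, 133, 2)
    return np.stack([
        triangulate_point(projections, detections[:, k])
        for k in range(detections.shape[1])
    ])                                                    # (133, 3)
```

A full pipeline would typically also weight views by detection confidence, filter outliers, and smooth over time before accepting a triangulated keypoint; none of that is shown here.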
Experimental Results
The paper presents baselines for each of the proposed tasks using prominent methods from the literature. Notably:
- Jointformer outperformed the other baselines on both the complete and the incomplete 2D-to-3D lifting tasks, underscoring the strength of transformer-based approaches for capturing whole-body pose structure (a minimal lifting sketch, unrelated to the paper's models, follows this list).
- Training on the TotalCapture annotations (T3WB) together with H3WB yielded significant performance improvements, particularly for 3D whole-body estimation from a single image. This underscores the importance of dataset diversity and volume in training robust pose estimation models.
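To make the lifting setup concrete, the snippet below sketches a minimal fully connected 2D-to-3D lifting network in PyTorch. It is not Jointformer or any baseline from the paper; it only illustrates the benchmark's input and output shapes (133 x 2 in, 133 x 3 out) and one simple way to mask missing keypoints for the incomplete-input task.

```python
import torch
import torch.nn as nn

NUM_KPTS = 133  # COCO WholeBody keypoints used by H3WB

class LiftingMLP(nn.Module):
    """Minimal 2D-to-3D whole-body lifting baseline (not the paper's model)."""

    def __init__(self, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_KPTS * 2, hidden), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(hidden, NUM_KPTS * 3),
        )

    def forward(self, kpts_2d, mask=None):
        # kpts_2d: (B, 133, 2); mask: (B, 133) with 1 for visible, 0 for missing.
        if mask is not None:
            # Zero out unobserved keypoints to emulate the incomplete-2D task.
            kpts_2d = kpts_2d * mask.unsqueeze(-1)
        out = self.net(kpts_2d.flatten(1))
        return out.view(-1, NUM_KPTS, 3)

# Shape check with random inputs standing in for normalized 2D detections.
model = LiftingMLP()
kpts_2d = torch.randn(4, NUM_KPTS, 2)
mask = (torch.rand(4, NUM_KPTS) > 0.2).float()   # drop roughly 20% of keypoints
print(model(kpts_2d, mask).shape)                # torch.Size([4, 133, 3])
```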
Implications and Future Work
The introduction of H3WB sets a new standard for benchmarking 3D whole-body pose estimation, providing a much-needed resource that combines body, face, and extremities in a unified framework. The alignment with COCO WholeBody's layout further facilitates integration with existing 2D keypoint detection systems, offering opportunities for the development of hybrid 2D/3D methods.
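As a sketch of what such a hybrid method might look like, the snippet below chains a 2D whole-body detector with a lifter trained on H3WB. Both `detect_2d_wholebody` and `lift_to_3d` are hypothetical placeholders standing in for whichever detector and lifting model one actually uses; the point is only that the shared COCO WholeBody layout removes the need for keypoint re-mapping between the two stages.

```python
import numpy as np

def detect_2d_wholebody(image: np.ndarray) -> np.ndarray:
    """Hypothetical wrapper around an off-the-shelf COCO WholeBody 2D detector.

    Returns (133, 2) pixel coordinates in the standard COCO WholeBody order.
    """
    raise NotImplementedError  # plug in a real 2D detector here

def lift_to_3d(kpts_2d: np.ndarray) -> np.ndarray:
    """Hypothetical 2D-to-3D lifter trained on H3WB; returns (133, 3)."""
    raise NotImplementedError  # plug in a trained lifting model here

def hybrid_pose_estimate(image: np.ndarray) -> np.ndarray:
    """Hybrid 2D/3D pipeline: 2D detection followed by learned lifting.

    Because H3WB shares the COCO WholeBody layout, the detector output can be
    passed straight to the lifter without re-ordering keypoints.
    """
    kpts_2d = detect_2d_wholebody(image)   # (133, 2)
    return lift_to_3d(kpts_2d)             # (133, 3)
```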
From a practical perspective, this dataset and benchmarking scheme are poised to advance applications in fields such as robotics, sports analysis, and ergonomic studies, where detailed and accurate human pose understanding is vital. The potential to leverage this dataset for real-world scenarios, where occlusions and incomplete poses are common, stands to significantly broaden the scope and applicability of human pose estimation technologies.
The paper opens several avenues for future work. For instance, improving mesh-fitting accuracy or leveraging generative models to refine incomplete keypoints could build on the foundation laid by H3WB. The community might also investigate cross-dataset transfer to improve generalizability using the annotations H3WB provides.
Overall, H3WB represents a substantial contribution to the field of computer vision and human pose estimation, with its robust framework and tasks challenging the community to develop more accurate and comprehensive models for human body tracking and analysis.