Adversarial Learning for 3D Human Pose Estimation in the Wild
The paper "3D Human Pose Estimation in the Wild by Adversarial Learning" addresses the challenge of estimating 3D human poses from monocular images in unconstrained, real-world scenes. While deep convolutional neural networks (DCNNs) have made substantial progress on 3D pose estimation in controlled laboratory environments, their accuracy degrades on in-the-wild images, for which 3D pose annotations are essentially unavailable. This research introduces an adversarial learning framework that bridges the domain gap by leveraging the 2D pose annotations that are plentiful for in-the-wild image datasets.
Key Contributions
- Adversarial Learning Framework: The core contribution is an adversarial learning paradigm that transfers 3D pose structures learned from fully annotated (laboratory) datasets to in-the-wild datasets that lack 3D annotations. The adversarial setup compels the pose estimator to produce anthropometrically valid poses even when direct 3D supervision is unavailable.
- Multi-Source Discriminator: A key innovation is a multi-source discriminator that judges the plausibility of predicted 3D poses by integrating three information sources: the original image, a geometric descriptor capturing pairwise joint relationships, and the heatmaps encoding joint positions and depths. These complementary channels sharpen the discriminator's judgments, which in turn improves the pose estimator (the generator).
- Geometric Descriptor: The design of a geometric descriptor built from pairwise relative positions and distances between joints is a methodological advance in its own right. This descriptor gives the model an explicit handle on body-articulation constraints, encouraging anatomically plausible predictions.
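The descriptor idea above can be sketched in a few lines of NumPy. The exact pairing, ordering, and normalization used in the paper may differ, so treat `geometric_descriptor` as an illustrative reconstruction rather than the authors' implementation:

```python
import numpy as np

def geometric_descriptor(joints):
    """Pairwise geometric descriptor for one 3D pose.

    joints: (K, 3) array of 3D joint coordinates.
    Returns a flat vector of relative position vectors and
    Euclidean distances for every ordered joint pair (i, j), i != j.
    """
    K = joints.shape[0]
    # rel[i, j] = joints[j] - joints[i], shape (K, K, 3)
    rel = joints[None, :, :] - joints[:, None, :]
    # Pairwise Euclidean distances, shape (K, K)
    dist = np.linalg.norm(rel, axis=-1)
    # Drop the trivial diagonal (i == j) and flatten into one vector
    mask = ~np.eye(K, dtype=bool)
    return np.concatenate([rel[mask].ravel(), dist[mask].ravel()])

# A 16-joint skeleton gives 16*15 = 240 ordered pairs:
# 240 * 3 relative coordinates + 240 distances = 960 features.
pose = np.random.randn(16, 3)
desc = geometric_descriptor(pose)
print(desc.shape)  # (960,)
```

Note that the descriptor is translation-invariant by construction: shifting the whole skeleton leaves every relative vector and distance unchanged, which is exactly the articulation-level information a discriminator needs.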
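The adversarial setup itself follows the standard GAN recipe: the discriminator is trained to score ground-truth 3D poses as real and predicted poses as fake, while the pose estimator receives the inverted objective. A minimal NumPy sketch of the two losses (binary cross-entropy; the score arrays `d_real` and `d_fake` are hypothetical discriminator outputs, not values from the paper):

```python
import numpy as np

def bce(scores, labels, eps=1e-7):
    """Binary cross-entropy over discriminator scores in (0, 1)."""
    scores = np.clip(scores, eps, 1 - eps)
    return -(labels * np.log(scores) + (1 - labels) * np.log(1 - scores)).mean()

def discriminator_loss(d_real, d_fake):
    # D should call ground-truth poses real (1) and predictions fake (0).
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    # The pose estimator is rewarded when D mistakes its output for real.
    return bce(d_fake, np.ones_like(d_fake))

d_real = np.array([0.9, 0.8])   # D is confident these are real
d_fake = np.array([0.2, 0.1])   # D is confident these are fake
print(discriminator_loss(d_real, d_fake) < generator_loss(d_fake))  # True
```

When the discriminator confidently rejects the generator's poses, the generator loss is large, which is the gradient signal that pushes the estimator toward anthropometrically valid outputs even without 3D labels.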
Experimental Validation
The framework's efficacy is validated through comprehensive experiments on benchmarks such as Human3.6M, MPI-INF-3DHP, and MPII Human Pose. The results show considerable improvements over prior state-of-the-art methods, particularly under Protocol #2, where a marked reduction in Mean Per Joint Position Error (MPJPE) is observed. The adversarially trained model improves not only quantitative performance but also cross-domain robustness, generalizing better to unseen datasets.
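The common convention on Human3.6M, which this summary assumes, is that Protocol #2 applies a rigid (Procrustes) alignment between prediction and ground truth before computing MPJPE. Both metrics can be sketched self-containedly in NumPy; `procrustes_align` is an illustrative helper name, not one from the paper:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error between two (K, 3) poses."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def procrustes_align(pred, gt):
    """Rigidly align pred to gt (similarity: scale + rotation + translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    X, Y = pred - mu_p, gt - mu_g
    # Orthogonal Procrustes: X^T Y = U S V^T gives optimal rotation R = U V^T
    U, S, Vt = np.linalg.svd(X.T @ Y)
    R = U @ Vt
    if np.linalg.det(R) < 0:        # disallow reflections
        U[:, -1] *= -1
        S[-1] *= -1
        R = U @ Vt
    scale = S.sum() / (X ** 2).sum()
    return scale * X @ R + mu_g

# Sanity check: a rotated, scaled, translated copy should align back exactly,
# so the Protocol-#2-style error drops to ~0 while the raw MPJPE stays large.
rng = np.random.default_rng(0)
pred = rng.standard_normal((16, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
gt = 1.5 * pred @ Rz + np.array([0.1, -0.2, 0.3])
print(mpjpe(pred, gt) > mpjpe(procrustes_align(pred, gt), gt))  # True
```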
Implications and Future Directions
The implications of this research are twofold. Practically, it demonstrates that adversarial learning can lift performance in scenarios where obtaining comprehensive 3D annotations is impractical or costly. Theoretically, the methodology opens avenues for multi-source adversarial setups that fold varied and complex domain knowledge into the discriminator to improve model generalization.
Moving forward, future research could augment the diversity of training views to better emulate real-world conditions and investigate more sophisticated discriminator architectures that could further improve the anatomical fidelity of estimated poses. As the field advances, such frameworks may also be adapted to more demanding applications, such as real-time pose tracking in dynamic environments, or integrated into robotics systems requiring richer interaction capabilities.
In conclusion, this paper presents a well-founded step toward addressing the challenges of 3D human pose estimation in the wild, providing substantial ground for both current application and future exploration in computer vision and related domains.