Self-Supervised Learning of 3D Human Pose using Multi-view Geometry (1903.02330v2)

Published 6 Mar 2019 in cs.CV

Abstract: Training accurate 3D human pose estimators requires a large amount of 3D ground-truth data, which is costly to collect. Various weakly or self-supervised pose estimation methods have been proposed due to the lack of 3D data. Nevertheless, these methods, in addition to 2D ground-truth poses, require either additional supervision in various forms (e.g. unpaired 3D ground-truth data, a small subset of labels) or the camera parameters in multi-view settings. To address these problems, we present EpipolarPose, a self-supervised learning method for 3D human pose estimation, which does not need any 3D ground-truth data or camera extrinsics. During training, EpipolarPose estimates 2D poses from multi-view images and then utilizes epipolar geometry to obtain a 3D pose and camera geometry, which are subsequently used to train a 3D pose estimator. We demonstrate the effectiveness of our approach on standard benchmark datasets, i.e., Human3.6M and MPI-INF-3DHP, where we set the new state of the art among weakly/self-supervised methods. Furthermore, we propose a new performance measure, Pose Structure Score (PSS), which is a scale-invariant, structure-aware measure to evaluate the structural plausibility of a pose with respect to its ground truth. Code and pretrained models are available at https://github.com/mkocabas/EpipolarPose

Authors (3)

Muhammed Kocabas (18 papers)
Salih Karagoz (2 papers)
Emre Akbas (32 papers)

Citations (262)

Summary

Self-Supervised Learning of 3D Human Pose using Multi-view Geometry: An Expert Overview

The paper "Self-Supervised Learning of 3D Human Pose using Multi-view Geometry" introduces EpipolarPose, a self-supervised approach to 3D human pose estimation that learns from multi-view images without 3D ground-truth data or camera extrinsics. It targets the scarcity of 3D-labeled data, which is costly to collect and especially hard to obtain outside laboratory environments.

Methodology and Contributions

EpipolarPose uses multi-view geometry to build 3D poses from 2D poses estimated in synchronized images from multiple cameras. Rather than requiring camera extrinsics, it applies epipolar geometry to the 2D estimates to recover both the camera geometry and a 3D pose, which then serves as the training target for the 3D pose estimator. Training uses two branches: an upper branch that learns to estimate 3D poses, and a lower branch that stays frozen and supplies reliable 2D pose estimates.
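
As an illustration of the geometric step above, the following minimal sketch triangulates 3D joints from two views with the linear (DLT) method. It assumes the two 3x4 projection matrices are already available; without known extrinsics they would first have to be derived from an estimated relative camera pose, a step not shown here. The function names `triangulate_point`, `triangulate_pose`, and `project`, as well as the toy camera values, are hypothetical and are not taken from the EpipolarPose code.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint seen in two views.

    P1, P2: 3x4 projection matrices (assumed known for this sketch).
    x1, x2: 2D pixel coordinates of the same joint in each view.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def triangulate_pose(P1, P2, pose1, pose2):
    """Triangulate every joint of a 2D pose pair, giving a (J, 3) array."""
    return np.array([triangulate_point(P1, P2, a, b)
                     for a, b in zip(pose1, pose2)])

def project(P, X):
    """Project (J, 3) points with a 3x4 projection matrix to (J, 2) pixels."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]

# Toy setup (all values are made up): two cameras with shared intrinsics K.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
theta = 0.3
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([[-1.0], [0.0], [0.2]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

joints_3d = np.array([[0.1, -0.2, 4.0],
                      [0.3, 0.5, 5.0],
                      [-0.4, 0.1, 3.5]])
recovered = triangulate_pose(P1, P2, project(P1, joints_3d), project(P2, joints_3d))
```

With noise-free 2D inputs, `recovered` matches `joints_3d` up to numerical precision. With detector noise, the linear solution is only approximate, which is one reason the quality of the 2D estimates matters.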

A second contribution is the Pose Structure Score (PSS), which evaluates the structural plausibility of a predicted pose with respect to its ground truth. Traditional metrics such as MPJPE and PCK often fail to capture structural discrepancies. PSS is scale-invariant and sensitive to structural errors; it relies on unsupervised clustering of ground-truth poses to assess pose plausibility.
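
One way to read the PSS description above is sketched below: normalize each pose, cluster the normalized ground-truth poses, and count a prediction as correct when it falls into the same cluster as its ground truth. This is a sketch under assumptions, not the paper's reference implementation. The normalization (mean-centering and unit norm), the plain k-means loop, the cluster count, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def normalize_pose(pose, eps=1e-12):
    """Center a (J, 3) pose and scale it to unit norm (assumed normalization)."""
    centered = pose - pose.mean(axis=0, keepdims=True)
    return (centered / max(np.linalg.norm(centered), eps)).ravel()

def assign(data, centers):
    """Index of the nearest cluster center for each row of data."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fit_centers(gt_poses, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on normalized ground-truth poses."""
    rng = np.random.default_rng(seed)
    data = np.stack([normalize_pose(p) for p in gt_poses])
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(data, centers)
        for c in range(k):
            members = data[labels == c]
            if len(members):  # keep the old center if a cluster empties out
                centers[c] = members.mean(axis=0)
    return centers

def pose_structure_score(pred_poses, gt_poses, centers):
    """Fraction of predictions assigned to the same cluster as their ground truth."""
    pred = np.stack([normalize_pose(p) for p in pred_poses])
    gt = np.stack([normalize_pose(p) for p in gt_poses])
    return float(np.mean(assign(pred, centers) == assign(gt, centers)))

# Toy data (random, for illustration only): 200 poses with 17 joints each.
rng = np.random.default_rng(1)
gt = rng.normal(size=(200, 17, 3))
centers = fit_centers(gt, k=5)
noisy_score = pose_structure_score(gt + 0.5 * rng.normal(size=gt.shape), gt, centers)
```

Because every pose is normalized before cluster assignment, uniformly rescaling a prediction leaves its score unchanged, which is the scale invariance the metric is meant to provide.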

Numerical Results and Implications

EpipolarPose sets a new state of the art among weakly and self-supervised methods on the Human3.6M and MPI-INF-3DHP benchmarks. Compared with the earlier approaches of Pavlakos and Rhodin, it improves MPJPE while requiring less supervision. The gains come from pairing a robust 2D pose detector with 3D supervision derived from multi-view geometry.
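
For reference, MPJPE under its usual definition is the Euclidean distance between predicted and ground-truth joints, averaged over joints. The sketch below assumes poses stored as `(J, 3)` arrays in a shared coordinate frame; the function name `mpjpe` and the toy values are hypothetical.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error, in the same units as the inputs."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Per-joint Euclidean distance, then the mean over all joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: every joint is displaced by the same offset.
gt_pose = np.zeros((17, 3))
shifted = gt_pose + np.array([3.0, 4.0, 0.0])
error = mpjpe(shifted, gt_pose)
```

Because the error is averaged per joint, a pose with a few badly misplaced limbs can score similarly to one that is slightly off everywhere, which is the kind of structural difference PSS is meant to expose.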

The paper also introduces a refinement unit, applied after training, that corrects noisy 3D predictions using learned patterns. It further reduces error and narrows the gap to fully supervised results. Because the unit is a separate module, it can be added or left out depending on the deployment.

Theoretical and Practical Implications

From a theoretical standpoint, the work shows how geometric constraints can be leveraged for pose estimation, and it provides a framework that could extend to other domains where supervision is scarce. In practice, this matters for fields such as autonomous driving, robotics, and AR/VR, where robust 3D pose understanding improves interaction and perception but extensive labeling is impractical.

Future Directions

Future work could integrate PSS into the learning pipeline as a loss function, moving it from a purely evaluative role to one that contributes to training. Extensions to other 3D tasks, or integration with unsupervised domain adaptation, could widen the method's applicability and robustness across environments and datasets. Whether epipolar-based self-supervision generalizes to other structured tasks remains an open question.

In summary, EpipolarPose advances self-supervised 3D human pose estimation by working around the shortage of 3D labels and exploiting multi-view geometric constraints. Its new metric and training framework are likely to inform future research and applications in 3D computer vision.

GitHub

GitHub - mkocabas/EpipolarPose: Self-Supervised Learning of 3D Human Pose using Multi-view Geometry (CVPR2019) (592 stars)
