Unite the People: Closing the Loop Between 3D and 2D Human Representations (1701.02468v3)

Published 10 Jan 2017 in cs.CV

Abstract: 3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits "in-the-wild". However, depending on the level of detail, it can be hard to impossible to acquire labeled data for training 2D estimators on large scale. We propose a hybrid approach to this problem: with an extended version of the recently introduced SMPLify method, we obtain high quality 3D body model fits for multiple human pose datasets. Human annotators solely sort good and bad fits. This procedure leads to an initial dataset, UP-3D, with rich annotations. With a comprehensive set of experiments, we show how this data can be used to train discriminative models that produce results with an unprecedented level of detail: our models predict 31 segments and 91 landmark locations on the body. Using the 91 landmark pose estimator, we present state-of-the-art results for 3D human pose and shape estimation using an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure. We show that UP-3D can be enhanced with these improved fits to grow in quantity and quality, which makes the system deployable on large scale. The data, code and models are available for research purposes.

Summary

The paper introduces UP-3D, a dataset of 3D body model fits from which annotations for 31 body segments and 91 landmarks are derived.
The paper extends SMPLify with a silhouette-based objective and uses human annotators to sort good from bad fits, improving 3D body shape estimation.
The paper shows that a 91-landmark pose estimator trained on UP-3D achieves state-of-the-art 3D pose and shape prediction on HumanEva and Human3.6M, and that the improved fits can be fed back to refine the dataset iteratively.

Overview of "Unite the People: Closing the Loop Between 3D and 2D Human Representations"

Lassner et al. propose a hybrid approach that links 3D and 2D human body representations, with the goal of improving 3D body model fitting from 2D images. The work builds a new dataset, UP-3D, by fitting a 3D body model to images from existing human pose datasets; the fits provide annotations for body segments and landmarks, which are then used to train models that produce detailed predictions. By combining an extended SMPLify method with human sorting of the resulting fits, the paper offers a systematic way to work around the scarcity of large-scale annotated data.

Methodology

The authors extend SMPLify, a method that fits a 3D body model (SMPL) to 2D keypoints to recover full-body pose and shape. They add a silhouette-based objective that improves body shape estimates, which are only weakly constrained when the fit relies on sparse keypoints alone. Human annotators then sort good fits from bad ones, yielding a curated set of high-quality 3D body model fits for images from existing datasets such as LSP, LSP-extended, and MPII-HumanPose.
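To illustrate how a silhouette objective complements keypoints, here is a minimal sketch of a combined fitting energy. It assumes the body model has already been posed, projected, and rendered to a binary mask. The silhouette term used here (1 minus the intersection-over-union of the two masks), the plain squared keypoint residuals, and the weight `w_sil` are illustrative assumptions rather than the paper's exact formulation, and SMPLify's pose, shape, and interpenetration priors are omitted.

```python
import numpy as np


def keypoint_term(projected, detected, confidence):
    """Confidence-weighted squared 2D reprojection error.

    SMPLify itself uses a robust Geman-McClure penalty; plain squares keep
    this sketch short.
    """
    residuals = np.sum((projected - detected) ** 2, axis=1)  # one value per keypoint
    return float(np.sum(confidence * residuals))


def silhouette_term(rendered_mask, target_mask):
    """1 - IoU of two boolean masks (an assumed stand-in for the paper's term)."""
    union = np.logical_or(rendered_mask, target_mask).sum()
    if union == 0:
        return 0.0  # both masks empty: nothing to disagree about
    mismatch = np.logical_xor(rendered_mask, target_mask).sum()
    return float(mismatch / union)


def fitting_energy(projected, detected, confidence,
                   rendered_mask, target_mask, w_sil=100.0):
    """Keypoint term plus weighted silhouette term; w_sil is hypothetical."""
    return (keypoint_term(projected, detected, confidence)
            + w_sil * silhouette_term(rendered_mask, target_mask))
```

The mask term penalizes fits whose outline disagrees with the person's silhouette even when every keypoint projects correctly. A gradient-based fitter would need a differentiable approximation of this term, since hard mask IoU has no useful gradient.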

Key Contributions

Data Enhancement and Annotation: UP-3D combines images from multiple human pose datasets with 3D body model fits, from which rich annotations are derived, including 31 body segments and 91 landmarks.
Predictive Model Training: Discriminative models trained on these annotations predict the 31 segments and 91 landmarks. Using the 91-landmark estimator, the authors report state-of-the-art 3D human pose and shape estimation with an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure.
Self-Improvement Mechanism: Fits obtained with the improved landmark predictions can be added back into UP-3D, growing the dataset in both quantity and quality, so that models trained on it can be improved iteratively.
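Because the landmarks and segments are read off a fitted 3D mesh, they can be generated for every image with a good fit. The sketch below assumes camera-frame vertices, a simple pinhole camera, a hypothetical list of landmark vertex indices, and a per-vertex segment id (1 to 31, with 0 as background). It approximates the segment map by splatting vertices into pixels with a z-buffer, whereas a real pipeline would rasterize the mesh faces.

```python
import numpy as np


def project(points_3d, focal, center):
    """Pinhole projection of camera-frame points (z > 0) to pixel coordinates."""
    xy = points_3d[:, :2] / points_3d[:, 2:3]
    return focal * xy + center


def landmarks_from_mesh(vertices, landmark_vertex_ids, focal, center):
    """2D landmarks as projections of a fixed set of mesh vertices.

    landmark_vertex_ids is hypothetical; the paper's 91 indices are not given here.
    """
    return project(vertices[landmark_vertex_ids], focal, center)


def splat_segments(vertices, vertex_segment, focal, center, height, width):
    """Approximate part-label image (0 = background).

    Each vertex writes its segment id to its nearest pixel, and the vertex
    closest to the camera wins (a point-splat z-buffer).
    """
    labels = np.zeros((height, width), dtype=np.int32)
    depth = np.full((height, width), np.inf)
    pixels = np.rint(project(vertices, focal, center)).astype(int)
    for (u, v), z, seg in zip(pixels, vertices[:, 2], vertex_segment):
        # u is the column (x), v the row (y); keep only the nearest vertex
        if 0 <= v < height and 0 <= u < width and z < depth[v, u]:
            depth[v, u] = z
            labels[v, u] = seg
    return labels
```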

Numerical Results

The main quantitative findings for models trained on UP-3D are:

3D pose estimates obtained with the 91-landmark estimator outperformed previous state-of-the-art results on HumanEva and Human3.6M.
Fitting to 91 predicted landmarks gave clearly higher accuracy than fitting to the traditional 14 keypoints.
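The summary reports no numbers, but 3D pose accuracy on HumanEva and Human3.6M is commonly scored as the mean per-joint distance between predicted and ground-truth 3D joints, often after a similarity (Procrustes) alignment. The following sketch computes that aligned error; whether the paper used exactly this protocol is an assumption here.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Similarity-align pred (N x 3) to gt (N x 3): rotation, scale, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Orthogonal Procrustes: SVD of the cross-covariance p^T g
    u, s, vt = np.linalg.svd(p.T @ g)
    d = np.ones(3)
    if np.linalg.det(u @ vt) < 0:
        d[-1] = -1.0  # flip the last axis so the result is a rotation, not a reflection
    rot = (u * d) @ vt  # equals u @ diag(d) @ vt
    scale = np.sum(s * d) / np.sum(p ** 2)
    return scale * p @ rot + mu_g


def reconstruction_error(pred, gt):
    """Mean Euclidean per-joint distance after similarity alignment."""
    aligned = procrustes_align(pred, gt)
    return float(np.mean(np.linalg.norm(aligned - gt, axis=1)))
```

Aligning first removes global rotation, scale, and translation, so the score reflects the articulated pose rather than camera placement.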

Theoretical Implications

This research uses a 3D body model as common ground between 2D and 3D human body representations. Because segmentations and landmarks at different levels of detail can all be derived from one fitted model, the approach suggests a way to unify human representation datasets across applications and to improve prediction accuracy.

Practical Applications

The more detailed and accurate body models have potential applications ranging from gaming and animation to medical imaging and surveillance. Applications with tight latency budgets could benefit from the paper's regression-tree model, which predicts 3D pose and shape directly from estimated 2D landmarks instead of running an optimization-based fit.

Future Directions

The iterative "closing-the-loop" approach proposed by the authors provides a framework for continuous enhancement with limited human oversight, potentially facilitating large-scale deployment. Future work could explore refining the regression tree model to achieve real-time performance, thus unlocking further applications in interactive environments and live video analysis.
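The closing-the-loop procedure can be sketched as a small driver that alternates training, refitting, and human sorting. Every callable here is a hypothetical stand-in: `train` for training the landmark estimator on the current fits, `refit` for an extended-SMPLify fit guided by its predictions, and `is_good` for the annotators' good/bad decision. The rule that an accepted refit replaces an image's earlier fit is also an assumption.

```python
from typing import Callable, Dict, Hashable, Iterable, TypeVar

Fit = TypeVar("Fit")
Model = TypeVar("Model")


def close_the_loop(
    image_ids: Iterable[Hashable],
    dataset: Dict[Hashable, Fit],
    train: Callable[[Dict[Hashable, Fit]], Model],
    refit: Callable[[Model, Hashable], Fit],
    is_good: Callable[[Hashable, Fit], bool],
    rounds: int,
) -> Dict[Hashable, Fit]:
    """Grow a dataset of fits by alternating training, refitting, and sorting."""
    image_ids = list(image_ids)
    dataset = dict(dataset)  # do not mutate the caller's dict
    for _ in range(rounds):
        model = train(dataset)  # e.g. a landmark estimator trained on current fits
        for image_id in image_ids:
            candidate = refit(model, image_id)  # fit guided by predicted landmarks
            if is_good(image_id, candidate):  # human good/bad sort
                dataset[image_id] = candidate  # add new fits, replace old ones
    return dataset
```

Keeping the human check as a plain predicate makes explicit where oversight enters: only fits that pass it ever reach the training set.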

In summary, the paper links 3D and 2D human representations through a semi-automatic pipeline of model fitting, human sorting, and retraining, and releases data, code, and models that support further work on human modeling in computer vision.
