AGORA: Avatars in Geography Optimized for Regression Analysis (2104.14643v1)

Published 29 Apr 2021 in cs.CV

Abstract: While the accuracy of 3D human pose estimation from images has steadily improved on benchmark datasets, the best methods still fail in many real-world scenarios. This suggests that there is a domain gap between current datasets and common scenes containing people. To obtain ground-truth 3D pose, current datasets limit the complexity of clothing, environmental conditions, number of subjects, and occlusion. Moreover, current datasets evaluate sparse 3D joint locations corresponding to the major joints of the body, ignoring the hand pose and the face shape. To evaluate the current state-of-the-art methods on more challenging images, and to drive the field to address new problems, we introduce AGORA, a synthetic dataset with high realism and highly accurate ground truth. Here we use 4240 commercially-available, high-quality, textured human scans in diverse poses and natural clothing; this includes 257 scans of children. We create reference 3D poses and body shapes by fitting the SMPL-X body model (with face and hands) to the 3D scans, taking into account clothing. We create around 14K training and 3K test images by rendering between 5 and 15 people per image using either image-based lighting or rendered 3D environments, taking care to make the images physically plausible and photoreal. In total, AGORA consists of 173K individual person crops. We evaluate existing state-of-the-art methods for 3D human pose estimation on this dataset and find that most methods perform poorly on images of children. Hence, we extend the SMPL-X model to better capture the shape of children. Additionally, we fine-tune methods on AGORA and show improved performance on both AGORA and 3DPW, confirming the realism of the dataset. We provide all the registered 3D reference training data, rendered images, and a web-based evaluation site at https://agora.is.tue.mpg.de/.

Authors (6)

Priyanka Patel
Chun-Hao P. Huang
Joachim Tesch
David T. Hoffmann
Shashank Tripathi
Michael J. Black

Summary

The paper introduces AGORA, a synthetic 3D human pose dataset that extends existing benchmarks with diverse demographics and complex scene conditions.
The paper fits the SMPL-X model to high-quality textured scans to obtain reference poses and shapes, and extends the model by interpolation to capture child body shapes.
The paper’s evaluations show that fine-tuning on AGORA improves 3D pose estimation accuracy on both AGORA and the real-world 3DPW dataset.

Avatars in Geography Optimized for Regression Analysis (AGORA)

Overview

The paper presents AGORA, a synthetic dataset for 3D human pose estimation that addresses limitations of existing datasets. It pairs photorealistic images with accurate 3D human pose and shape annotations, extending current benchmarks. AGORA targets challenging scenarios that previous datasets often neglect, such as diverse clothing, mixed-age groups, varied ethnicities, and complex environmental conditions.

Key Contributions

AGORA is built from 4,240 commercially available, high-quality, textured human scans in diverse poses and natural clothing, including 257 scans of children. Reference 3D poses and body shapes are obtained by fitting SMPL-X, a parametric 3D body model that includes the face and hands, to these scans while accounting for clothing. The dataset comprises around 14,000 training images and 3,000 test images, each rendered with multiple people in complex scenes with varied lighting and occlusion, for a total of 173,000 individual person crops.
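As a rough consistency check, the reported totals can be compared with the abstract's statement that each image contains between 5 and 15 people. The sketch below uses the rounded figures quoted above, so the result is only approximate; the variable and function names are illustrative.

```python
# Rounded figures from the summary; "around 14K" training images is approximate.
train_images = 14_000
test_images = 3_000
person_crops = 173_000


def mean_people_per_image(crops: int, images: int) -> float:
    """Average number of person crops per rendered image."""
    if images <= 0:
        raise ValueError("images must be positive")
    return crops / images


avg = mean_people_per_image(person_crops, train_images + test_images)
# The abstract states 5 to 15 people per image, so the mean should fall in that range.
within_stated_range = 5 <= avg <= 15
print(f"about {avg:.1f} people per image; within 5-15: {within_stated_range}")
```

With these rounded inputs the average is roughly ten people per image, which sits inside the stated range.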

A significant contribution of AGORA lies in its handling of child body shapes. The paper extends the SMPL-X model to better capture the unique characteristics of children's body shape by interpolating between adult and infant body templates.
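The interpolation idea can be illustrated with a minimal sketch. It assumes that both templates are vertex arrays with the same vertex count and ordering, and that a single scalar weight controls a linear blend; the function name, the `alpha` parameter, and the toy arrays are hypothetical and do not come from the paper or from the SMPL-X code.

```python
import numpy as np


def interpolate_template(adult_template, infant_template, alpha):
    """Linearly blend two (V, 3) vertex arrays.

    alpha = 0 returns the adult template, alpha = 1 the infant template.
    Assumes both meshes share the same topology (an assumption of this sketch).
    """
    adult = np.asarray(adult_template, dtype=float)
    infant = np.asarray(infant_template, dtype=float)
    if adult.shape != infant.shape:
        raise ValueError("templates must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * adult + alpha * infant


# Toy stand-ins for real templates: four vertices each.
adult = np.zeros((4, 3))
infant = np.ones((4, 3))
child_like = interpolate_template(adult, infant, alpha=0.25)
```

In this sketch an intermediate `alpha` gives a shape between the two templates; how the weight is chosen or estimated per subject is not specified here.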

Numerical Results and Evaluation

The paper evaluates existing state-of-the-art (SOTA) methods for 3D human pose estimation on AGORA and finds that most of them perform poorly on images of children. Fine-tuning methods on AGORA improves performance on both AGORA and the real-world 3DPW dataset, which the authors take as confirmation of the dataset's realism.
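To make the adult-versus-child comparison concrete, the sketch below computes a mean per-joint position error separately for each group. This metric is common in 3D pose estimation, but this summary does not state which metrics the paper reports, so treat the choice of metric, the function names, and the `is_child` flag as assumptions.

```python
import numpy as np


def per_person_error(pred, gt):
    """Mean Euclidean joint error per person; pred and gt have shape (N, J, 3)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=1)


def error_by_group(pred, gt, is_child):
    """Average the per-person error separately for adults and children."""
    errors = per_person_error(pred, gt)
    is_child = np.asarray(is_child, dtype=bool)

    def group_mean(mask):
        # An empty group has no defined mean, so report NaN.
        return float(errors[mask].mean()) if mask.any() else float("nan")

    return {"adult": group_mean(~is_child), "child": group_mean(is_child)}


# Toy example: two people, two joints each.
gt = np.zeros((2, 2, 3))
pred = gt.copy()
pred[0] += np.array([3.0, 4.0, 0.0])  # first person is off by 5 units at every joint
result = error_by_group(pred, gt, is_child=[True, False])
```

Reporting the error per group rather than as one pooled number is what exposes a gap such as the one described for children.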

Implications and Future Prospects

Practically, AGORA provides training and evaluation data for developing more robust 3D pose estimation algorithms, particularly for scenes with heavy occlusion and diverse demographics. Its scans and registrations may also be useful beyond pose estimation, for example in 3D clothing modeling and neural avatar creation.

Theoretically, AGORA pushes researchers toward open problems in multi-person pose estimation: occlusion, person detection accuracy, and diverse body shapes, including those of children. It also encourages further work on making synthetically generated data physically plausible and photorealistic.

Future work could extend the dataset with additional complexity, including varied scene heights, dynamic scenes, larger crowds, and multi-view sequences.
