
AMASS: Archive of Motion Capture as Surface Shapes (1904.03278v1)

Published 5 Apr 2019 in cs.CV and cs.GR

Abstract: Large datasets are the cornerstone of recent advances in computer vision using deep learning. In contrast, existing human motion capture (mocap) datasets are small and the motions limited, hampering progress on learning models of human motion. While there are many different datasets available, they each use a different parameterization of the body, making it difficult to integrate them into a single meta dataset. To address this, we introduce AMASS, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization. We achieve this using a new method, MoSh++, that converts mocap data into realistic 3D human meshes represented by a rigged body model; here we use SMPL [doi:10.1145/2816795.2818013], which is widely used and provides a standard skeletal representation as well as a fully rigged surface mesh. The method works for arbitrary marker sets, while recovering soft-tissue dynamics and realistic hand motion. We evaluate MoSh++ and tune its hyperparameters using a new dataset of 4D body scans that are jointly recorded with marker-based mocap. The consistent representation of AMASS makes it readily useful for animation, visualization, and generating training data for deep learning. Our dataset is significantly richer than previous human motion collections, having more than 40 hours of motion data, spanning over 300 subjects, more than 11,000 motions, and will be publicly available to the research community.

Authors (5)
  1. Naureen Mahmood (2 papers)
  2. Nima Ghorbani (5 papers)
  3. Nikolaus F. Troje (7 papers)
  4. Gerard Pons-Moll (81 papers)
  5. Michael J. Black (163 papers)
Citations (1,055)

Summary

AMASS: Archive of Motion Capture as Surface Shapes

The paper entitled "AMASS: Archive of Motion Capture as Surface Shapes" presents a unified and extensive database of human motion capture data, leveraging a novel method named MoSh++ to integrate data from various sources into a consistent format. AMASS addresses the limitations of existing human motion capture (mocap) datasets, which are typically small and highly domain-specific, thus inhibiting comprehensive model training for applications in computer vision and animation.

Core Contributions

The key contributions of this paper can be summarized as follows:

  1. MoSh++ Method: The authors introduce MoSh++, an enhanced version of the MoSh method, to convert motion capture data into realistic 3D human meshes using the SMPL model. The method is designed to handle arbitrary marker sets, recover soft-tissue dynamics, and incorporate hand motion.
  2. Unified Dataset: AMASS integrates 15 different mocap datasets into a single unified database consisting of over 300 subjects, 11,451 motions, and more than 40 hours of motion data, all represented in the SMPL format.
  3. Evaluation and Validation: The effectiveness of MoSh++ is validated using a newly collected dataset of 4D body scans that simultaneously recorded marker-based mocap data. The consistent representation achieved through AMASS makes it valuable for animation, visualization, and training data generation for deep learning models.
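The value of the unified representation is that every sequence, regardless of its source dataset, reduces to the same small set of SMPL parameters per frame. The sketch below illustrates this with a hypothetical record type; the dimensions follow the basic SMPL convention (24 joints × 3 axis-angle values = 72 pose parameters, 10 shape coefficients), whereas AMASS itself stores SMPL+H parameters that additionally include hand joints.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical per-frame record illustrating the unified parameterization.
# Real AMASS files use SMPL+H and richer metadata; this is a sketch only.
@dataclass
class SmplFrame:
    trans: List[float]   # global root translation (x, y, z)
    pose: List[float]    # axis-angle joint rotations, 24 joints x 3
    betas: List[float]   # identity shape coefficients

    def __post_init__(self):
        assert len(self.trans) == 3, "translation must be 3-dimensional"
        assert len(self.pose) == 72, "basic SMPL pose is 72-dimensional"
        assert len(self.betas) == 10, "basic SMPL shape is 10-dimensional"

# A neutral frame: zero pose, mean shape, at the origin.
frame = SmplFrame(trans=[0.0, 0.0, 0.0],
                  pose=[0.0] * 72,
                  betas=[0.0] * 10)
```

Because every mocap source is converted to this same format, downstream consumers (animation pipelines, deep learning dataloaders) need only one parser instead of fifteen.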

Technical Details

The authors detail the limitations of prior mocap datasets and methods, such as MoSh, and describe how MoSh++ overcomes these. The key technical advancements are:

  • SMPL Model Integration: By replacing the SCAPE model with SMPL, which is widely used and supported, MoSh++ provides a more flexible and compatible skeletal representation and includes a rigged surface mesh.
  • Dynamic Shape Representation: MoSh++ employs the DMPL model to realistically capture soft tissue dynamics, which is not possible with static identity-based models. Soft tissue motion is modeled using learned shape spaces derived from dynamic scans, producing more natural and metrically accurate results.
  • Hand Motion Capture: The integration of the MANO hand model into SMPL allows the capturing of intricate hand postures along with body motions, making the dataset richer and more versatile.
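The shape and soft-tissue components above share one mechanism: a template mesh is offset by a linear combination of learned shape directions, with identity coefficients held fixed per subject and dynamic (DMPL-style) coefficients varying per frame. The following toy sketch shows that additive blend-shape idea; the vertex counts and direction values are made up for illustration (the real SMPL mesh has 6,890 vertices).

```python
# Toy illustration of additive blend shapes, the mechanism underlying
# SMPL's identity shape space and DMPL's soft-tissue space. All numbers
# here are illustrative, not taken from the actual models.

def blend(template, directions, coeffs):
    """Offset each template vertex by a coefficient-weighted sum of
    per-vertex shape directions."""
    out = []
    for v_idx, vertex in enumerate(template):
        v = list(vertex)
        for direction, c in zip(directions, coeffs):
            for axis in range(3):
                v[axis] += c * direction[v_idx][axis]
        out.append(tuple(v))
    return out

template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]        # 2-vertex "mesh"
shape_dirs = [[(0.1, 0.0, 0.0), (0.1, 0.0, 0.0)]]    # one identity direction
dyn_dirs = [[(0.0, 0.02, 0.0), (0.0, -0.02, 0.0)]]   # one soft-tissue direction

# Identity shape is applied once per subject; dynamic offsets vary per frame.
shaped = blend(template, shape_dirs, [2.0])
jiggled = blend(shaped, dyn_dirs, [1.0])
```

The design choice worth noting is that both spaces are linear, so fitting their coefficients to marker data stays a tractable least-squares-style problem inside the MoSh++ optimization.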

Numerical Results

The paper reports significant numerical results, demonstrating the accuracy and utility of their method:

  • Shape Estimation: MoSh++ surpassed the original MoSh in accuracy, achieving a mean reconstruction error of 7.4mm compared to MoSh’s 12.1mm using a standard 46-marker set.
  • Pose Estimation with Dynamics: Incorporating dynamic components, MoSh++ further reduced errors to 7.3mm, highlighting its effectiveness in retaining realistic surface deformations.
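The millimetre figures above come from comparing the reconstructed body surface against ground-truth 4D scans. A minimal sketch of that kind of metric, assuming corresponding points are already paired and expressed in metres, is the mean Euclidean distance converted to millimetres (the point values below are illustrative only):

```python
import math

def mean_point_error_mm(estimated, reference):
    """Mean Euclidean distance between paired 3D points, in millimetres.
    Assumes inputs are equal-length lists of (x, y, z) tuples in metres."""
    assert len(estimated) == len(reference)
    total = 0.0
    for est_pt, ref_pt in zip(estimated, reference):
        total += math.dist(est_pt, ref_pt)
    return 1000.0 * total / len(estimated)  # metres -> millimetres

est = [(0.000, 0.000, 0.000), (0.100, 0.000, 0.000)]
ref = [(0.000, 0.005, 0.000), (0.100, 0.000, 0.005)]
print(mean_point_error_mm(est, ref))  # 5.0
```

Under such a metric, the drop from 12.1 mm to 7.4 mm corresponds to shaving several millimetres of average surface error off every reconstructed vertex.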

Implications and Future Work

The creation of AMASS has several practical and theoretical implications:

  • Deep Learning Training: AMASS provides a substantial volume of high-quality training data crucial for improving the learning and generalization of deep learning models in human pose and motion estimation.
  • Animation and Visualization: The dataset’s consistency and detail allow for the creation of realistic animations and improved visualization tools, which are valuable for both academic research and industrial applications.
  • Extended Research Opportunities: Future research can build upon AMASS to explore more sophisticated models of human motion, develop better mocap marker denoising techniques, and extend MoSh++ to handle facial mocap using models like FLAME.

In summary, the "AMASS: Archive of Motion Capture as Surface Shapes" paper delivers a significant resource to the research community, facilitating advancements in machine learning, computer vision, and animation through a comprehensive and unified mocap dataset.