- The paper introduces a dual-network skinning approach that generates realistic, pose-dependent avatars without relying on pre-defined garment templates.
- It employs a multi-stage training regime with cycle consistency constraints, achieving superior geometry and texture detail compared to state-of-the-art methods.
- The methodology streamlines digital avatar production for VR, film, and gaming, setting the stage for future advancements in handling diverse garment topologies.
SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks
The paper introduces SCANimate, an approach for automating the creation of realistic, animatable 3D human avatars, termed Scanimats, from raw scan data. The core contribution is the ability to produce pose-dependent deformations and textures without pre-defined garment templates or mesh registration, sidestepping major limitations of existing human modeling pipelines.
Methodology
SCANimate employs a weakly supervised learning methodology in which raw human scans captured in varying poses are aligned into a shared canonical pose space without requiring surface correspondences. The technique revolves around a dual-network system comprising forward and inverse skinning networks: multi-layer perceptrons that map 3D point coordinates, together with locally embedded pose features, to skinning weights used to warp points between the posed and canonical spaces.
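As a rough sketch of this design, the snippet below shows how such a skinning MLP and the accompanying linear-blend-skinning step could look in PyTorch. The class and function names, layer sizes, and the pose-embedding dimension are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class SkinningNet(nn.Module):
    """Hypothetical MLP that predicts per-joint skinning weights for a 3D point,
    conditioned on an embedded pose feature (sizes are illustrative)."""
    def __init__(self, num_joints=24, pose_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_joints),
        )

    def forward(self, points, pose_feat):
        # points: (B, N, 3), pose_feat: (B, pose_dim)
        pose = pose_feat[:, None, :].expand(-1, points.shape[1], -1)
        logits = self.mlp(torch.cat([points, pose], dim=-1))
        return torch.softmax(logits, dim=-1)  # weights sum to 1 over joints

def lbs(points, weights, joint_transforms):
    """Linear blend skinning: warp points with per-joint rigid transforms.
    points: (B, N, 3); weights: (B, N, J); joint_transforms: (B, J, 4, 4)."""
    homog = torch.cat([points, torch.ones_like(points[..., :1])], dim=-1)  # (B, N, 4)
    # Blend the per-joint transforms by the predicted skinning weights.
    blended = torch.einsum("bnj,bjkl->bnkl", weights, joint_transforms)    # (B, N, 4, 4)
    warped = torch.einsum("bnkl,bnl->bnk", blended, homog)
    return warped[..., :3]
```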
The learning process is structured in three stages: the skinning networks are first pretrained with carefully weighted losses, then jointly refined under cycle consistency constraints applied to the scan points, and finally frozen so that the pose-dependent geometry and texture networks can be trained on the canonicalized scans. This multi-stage regime allows the system to refine pose-dependent deformations accurately despite the noise and imperfections inherent in raw scan data.
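The cycle consistency idea from the second stage can be illustrated with the following sketch, which reuses the hypothetical SkinningNet instances and lbs helper from the snippet above: posed scan points are unposed into canonical space by the inverse network, re-posed by the forward network, and penalized for deviating from the input after the round trip. The additional regularization and weak-supervision terms used in the paper are omitted here.

```python
import torch

def cycle_consistency_loss(posed_pts, pose_feat, joint_T, fwd_net, inv_net):
    """Illustrative cycle-consistency term over a round trip through canonical space."""
    inv_T = torch.inverse(joint_T)                    # posed -> canonical transforms
    w_inv = inv_net(posed_pts, pose_feat)             # weights predicted in posed space
    canonical_pts = lbs(posed_pts, w_inv, inv_T)      # unpose the scan points
    w_fwd = fwd_net(canonical_pts, pose_feat)         # weights predicted in canonical space
    reposed_pts = lbs(canonical_pts, w_fwd, joint_T)  # re-pose from canonical space
    return ((reposed_pts - posed_pts) ** 2).mean()    # round trip should be the identity
```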
Results
The results presented demonstrate the model's efficacy in generating high-fidelity animatable avatars. The authors provide both qualitative and quantitative evaluations comparing SCANimate with state-of-the-art methods such as CAPE and NASA. The model shows a clear advantage when extrapolating to unseen poses, retaining superior geometry and texture detail, which is particularly noteworthy for complex clothing whose topology deviates from the body.
Implications
Practically, SCANimate's approach can significantly streamline the production of digital avatars for various applications including virtual reality, film, and gaming. Its ability to operate without garment-specific constraints enhances scalability and reduces the reliance on labor-intensive manual preparation of datasets.
Theoretically, the introduction of a dual-network skinning approach that conditions deformations on a localized pose embedding to distinguish pose-dependent variation in the input scans can be seen as a substantive step toward more generalized solutions in human modeling. It also raises intriguing questions about further gains from combining these concepts with generative models or other high-dimensional representation methods.
Future Directions
Future work could explore how SCANimate's methodology could be adapted to cover a broader range of garments, particularly those with topologies that differ significantly from the human body, such as skirts. Integrating advances in neural representations could also yield lighter models with improved performance, especially in real-time settings. There is also scope for investigating garment-specific hyperparameter tuning and robustness enhancements to handle diverse clothing types and styles more effectively.
In conclusion, SCANimate presents an influential approach in the domain of 3D avatar modeling, providing both practical benefits and setting the stage for further research developments in the field.