
Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies (2105.02872v2)

Published 6 May 2021 in cs.CV

Abstract: This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed to decompose a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as translational vector field or SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce neural blend weight fields to produce the deformation fields. Based on the skeleton-driven deformation, blend weight fields are used with 3D human skeletons to generate observation-to-canonical and canonical-to-observation correspondences. Since 3D human skeletons are more observable, they can regularize the learning of deformation fields. Moreover, the learned blend weight fields can be combined with input skeletal motions to generate new deformation fields to animate the human model. Experiments show that our approach significantly outperforms recent human synthesis methods. The code and supplementary materials are available at https://zju3dv.github.io/animatable_nerf/.

Citations (347)

Summary

  • The paper introduces a novel neural blend weight field for aligning 3D human skeletons with radiance fields, enabling controlled deformation for animation.
  • It employs skeleton-driven deformations to accurately capture and animate human motion, outperforming recent methods on H36M and ZJU-MoCap datasets.
  • Empirical results validate superior novel view synthesis and pose accuracy, paving the way for applications in VR, telepresence, and digital entertainment.

Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies

The paper, "Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies," presents an advancement in the field of computer vision and graphics by addressing the complex challenge of reconstructing an animatable human model from multi-view video data. This work builds upon the foundational concept of neural radiance fields (NeRF), extending its functionality to handle dynamic, non-rigidly deforming scenes, specifically human bodies in motion.

Key Contributions

  1. Neural Blend Weight Fields: The paper introduces a novel representation called neural blend weight fields. Unlike previous approaches that rely on under-constrained translational vector fields or SE(3) fields, neural blend weight fields are combined with 3D human skeletons, yielding a more constrained and controllable deformation model. This representation generates both observation-to-canonical and canonical-to-observation correspondences, which is what makes animation of the learned human model possible.
  2. Skeleton-Driven Deformation: By integrating blend weight fields with 3D human skeletons, the authors leverage the observable nature of skeletons to regularize the learning of deformation fields. This incorporation allows for explicit control over input skeletal motions, significantly enhancing the usability of the model in animation tasks.
  3. Empirical Validation: The paper demonstrates the efficacy of the approach on the H36M and ZJU-MoCap datasets. The results show substantial improvements in novel view and novel pose synthesis, outperforming recent state-of-the-art methods. The proposed method reconstructs 3D human shapes and animates them under unseen poses.
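The skeleton-driven deformation described above is, at its core, linear blend skinning: each 3D point is moved by a weighted blend of per-bone rigid transforms, with the weights supplied by the learned blend weight field. The sketch below illustrates that blending step in NumPy; the function name, argument shapes, and weight normalization are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def deform_to_canonical(x_obs, blend_weights, bone_transforms):
    """Map an observation-space point to canonical space via
    skeleton-driven linear blend skinning (illustrative sketch).

    x_obs           : (3,)      point in observation space
    blend_weights   : (K,)      per-bone weights predicted at x_obs
    bone_transforms : (K, 4, 4) per-bone rigid transforms taking
                                observation space to canonical space
    """
    w = blend_weights / blend_weights.sum()       # normalize weights
    # Blend the K rigid bone transforms using the predicted weights
    G = np.einsum("k,kij->ij", w, bone_transforms)
    x_h = np.append(x_obs, 1.0)                   # homogeneous coordinates
    return (G @ x_h)[:3]
```

Because the bone transforms come directly from the (observable) 3D skeleton, only the low-dimensional weights need to be learned, which is the source of the regularization the paper emphasizes.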

Methodological Details

The proposed approach involves decomposing a dynamic human scene into a canonical neural radiance field and a set of deformation fields, generated via neural blend weight fields. This decomposition allows for leveraging a skeleton-driven deformation framework, ensuring effective regularization and optimized rendering, even in novel poses.

By combining a differentiable volume renderer with these novel fields, the authors mitigate the under-constrained optimization that hampered earlier deformation representations, enabling more reliable learning from the sparse viewpoints typical of conventional multi-view capture setups.
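The differentiable renderer referenced here is the standard NeRF volume rendering quadrature: density and color samples along each ray are alpha-composited into a pixel color, and the loss on that color backpropagates into the canonical radiance field and the blend weight fields. A minimal NumPy version of that compositing step, with assumed array shapes, might look like:

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature).

    densities : (N,)   volume density sigma at each sample
    colors    : (N, 3) RGB at each sample
    deltas    : (N,)   distance between adjacent samples
    """
    alphas = 1.0 - np.exp(-densities * deltas)             # per-sample opacity
    trans = np.cumprod(np.append(1.0, 1.0 - alphas))[:-1]  # transmittance to each sample
    weights = trans * alphas                               # compositing weights
    return (weights[:, None] * colors).sum(axis=0)         # expected ray color
```

In the paper's setting, each sample point is first warped into canonical space (via the blend weight fields) before the canonical NeRF is queried for its density and color.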

Results and Implications

The paper presents strong numerical results, demonstrating superior performance over existing methods. The approach showcases robustness in reconstructing and animating complex human motions, while also presenting a pathway towards more scalable production of animatable digital humans.

The practical implications of this work are notable. By reducing the reliance on complex hardware setups and extensive manual labor, the technique opens avenues for applications in diverse domains such as telepresence, virtual reality, and the entertainment industry. Theoretically, it advances the understanding of integrating neural representations with traditional skeletal animation systems, potentially influencing future research in dynamic neural rendering and human-computer interaction.

Future Prospects

The paper leaves room for future exploration in enhancing the model’s ability to handle more complex, non-rigid deformations, such as those of loose clothing. Further, improving the system to autonomously refine human poses during training or reducing training time could yield considerable efficiency gains. Innovations in this area could extend the framework's applicability to real-time applications, enhancing its utility in interactive settings.

In conclusion, this research marks a significant step in the synthesis and animation of dynamic human bodies, while setting a groundwork for further developments in neural rendering technologies. The integration of neural radiance fields with explicit skeletal deformation models stands out as a promising approach to advancing the field towards broader and more effective practical applications.
