- The paper introduces a novel approach that reconstructs relightable, animatable neural avatars from sparse multi-view videos.
- It employs an invertible neural deformation field to accurately model non-rigid body movements and enhance dense correspondence between poses.
- It uses part-based light visibility networks to estimate dynamic shading cues, improving self-shadow effects under varied lighting conditions.
Introduction to Neural Avatars
The creation of digital avatars has applications ranging from online virtual presence to the entertainment and fashion industries. A central challenge in this area is constructing 3D avatars that are not only realistic but can also be relit and animated to fit different environments and actions. This requires disentangling the intricate interplay of geometry, materials, and lighting. Researchers have explored neural radiance fields (NeRF) for static objects, but dynamic subjects, especially humans, pose extra challenges: their non-rigid movements produce fast-changing shadows and appearance.
Methodology Developed
In this paper, a method is introduced for constructing neural avatars that can be animated and relit from sparse multi-view videos. The videos are captured under unknown lighting conditions, which adds to the difficulty of reconstructing accurate avatars.
The core of the approach lies in two main areas:
- Geometry Change Modeling: An invertible neural deformation field is constructed that maps between a canonical, pose-free space and the observation space of each video frame. This bidirectional mapping leverages the geometry of an extracted body mesh to solve the inverse skinning problem, yielding a high-quality reconstruction of the body's shape and motion.
- Temporal Shading Cues Handling: A pose-aware, part-based network is introduced to estimate light occlusion. Dividing the body into parts and modeling each part's light visibility separately simplifies the problem and improves generalization from limited training data. This enables the modeling of dynamic self-occlusions and correctly rendered shadows under varied lighting setups.
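The key property of the deformation field described above is exact invertibility: the same network maps canonical points to the observation space and back without a separate inverse solve. A minimal sketch of this idea, using a RealNVP-style affine coupling block in NumPy (the specific architecture and dimensions here are illustrative assumptions, not the paper's actual network):

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """One invertible coupling block: the first coordinate conditions an
    affine transform of the remaining coordinates, so the inverse is exact.
    A real deformation field would stack many blocks and alternate which
    coordinates are transformed; this is a toy illustration."""
    def __init__(self, dim=3, hidden=16):
        self.W1 = rng.normal(0, 0.1, (1, hidden))
        self.W2 = rng.normal(0, 0.1, (hidden, 2 * (dim - 1)))

    def _params(self, x_cond):
        h = np.tanh(x_cond @ self.W1)
        scale, shift = np.split(h @ self.W2, 2, axis=1)
        return np.exp(0.1 * np.tanh(scale)), shift  # bounded, positive scale

    def forward(self, x):
        s, t = self._params(x[:, :1])               # conditioner passes through unchanged
        return np.concatenate([x[:, :1], x[:, 1:] * s + t], axis=1)

    def inverse(self, y):
        s, t = self._params(y[:, :1])               # same conditioner -> exact inverse
        return np.concatenate([y[:, :1], (y[:, 1:] - t) / s], axis=1)

field = [AffineCoupling(), AffineCoupling()]

x_canonical = rng.normal(size=(5, 3))
y = x_canonical
for block in field:
    y = block.forward(y)            # canonical -> observation space
x_back = y
for block in reversed(field):
    x_back = block.inverse(x_back)  # observation -> canonical
print(np.allclose(x_back, x_canonical))  # True: the round trip is exact
```

The exact round trip is what gives dense correspondence between poses: every observed point has a unique, recoverable canonical counterpart.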
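The part-based visibility idea can be sketched as a set of small per-part networks whose soft visibility predictions are combined multiplicatively, since light reaches a point only if no part occludes it. Everything below (part names, input dimensions, the tiny untrained MLP) is a hypothetical illustration of the structure, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(1)

class PartVisibilityNet:
    """Tiny MLP: (query point, light direction, part pose) -> soft visibility
    in (0, 1). One network per body part keeps each input low-dimensional,
    which helps generalization from limited training data."""
    def __init__(self, pose_dim=6, hidden=32):
        in_dim = 3 + 3 + pose_dim
        self.W1 = rng.normal(0, 0.3, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.3, (hidden, 1))

    def __call__(self, x, light_dir, pose):
        inp = np.concatenate([x, light_dir, pose])
        h = np.maximum(inp @ self.W1 + self.b1, 0.0)       # ReLU
        return 1.0 / (1.0 + np.exp(-(h @ self.W2)[0]))     # sigmoid

# Hypothetical body partition; a real model would follow the skeleton parts.
parts = {name: PartVisibilityNet() for name in ["torso", "left_arm", "right_arm"]}

def visibility(x, light_dir, poses):
    """Light reaches x only if no part occludes it: multiply per-part terms."""
    v = 1.0
    for name, net in parts.items():
        v *= net(x, light_dir, poses[name])
    return v

x = np.array([0.1, 1.2, 0.0])
light = np.array([0.0, 0.0, 1.0])
poses = {n: rng.normal(size=6) for n in parts}
v = visibility(x, light, poses)
print(0.0 < v < 1.0)  # True: a soft visibility value
```

The multiplicative combination mirrors how hard visibility composes geometrically (occluded by any part means occluded overall), while keeping each factor differentiable and cheap to query per light sample.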
Experiments and Results
Extensive experiments were conducted on both synthetic and real-world datasets. Compared against state-of-the-art techniques, the method outperforms prior work in reconstructing high-quality geometry and in estimating materials accurately across varying poses and lighting conditions. It successfully disentangles the key components of human avatar reconstruction, enabling relighting and animation with photorealistic rendering quality.
Individual Contributions
The main contributions of this work are:
- They propose the first method that can reconstruct both relightable and animatable human avatars, including plausible shadow effects, from sparse multi-view videos.
- They introduce an invertible deformation field that enhances solving the inverse skinning problem, leading to accurate dense correspondence between different body poses.
- Part-based light visibility networks are suggested, which effectively estimate pose and light-related shading cues, even with limited data.
Conclusion
This work paves the way for realistic, relightable, and animatable neural avatars constructed from minimal, everyday video recordings, potentially benefiting a wide range of applications that rely on digital humans. Despite its success, the method does not yet model face and finger movements, and it still faces challenges with loose clothing and global illumination effects, leaving room for future improvements.