- The paper introduces a novel approach that reconstructs relightable, animatable neural avatars from sparse multi-view videos.
- It employs an invertible neural deformation field to accurately model non-rigid body movements and enhance dense correspondence between poses.
- It uses part-based light visibility networks to estimate dynamic shading cues, improving self-shadow effects under varied lighting conditions.
Introduction to Neural Avatars
The creation of digital avatars has applications ranging from online virtual presence to the entertainment and fashion industries. A central challenge in this area is constructing 3D avatars that are not only realistic but can also be relit and animated to fit different environments and actions. This requires disentangling the intricate interplay of geometry, materials, and lighting. Researchers have explored neural radiance fields (NeRF) for static objects, but dynamic subjects, especially humans, pose extra challenges: their non-rigid movements produce fast-changing shadows and appearance.
Methodology Developed
In this paper, a method is introduced for constructing neural avatars that can be animated and relit from sparse multi-view videos. The videos are captured under unknown lighting conditions, which adds to the difficulty of reconstructing accurate avatars.
The core of the approach lies in two main areas:
- Geometry Change Modeling: An invertible neural deformation field is constructed that maps between a canonical, pose-free space and the observation space of each video frame. This bidirectional mapping leverages the geometry of an extracted body mesh to solve the inverse skinning problem, yielding a high-quality reconstruction of the body's shape and motion.
- Temporal Shading Cues Handling: A pose-aware, part-based network is introduced to estimate light occlusion. Dividing the body into parts and modeling each part's light visibility separately simplifies the problem and improves generalization from limited training data. This enables the modeling of dynamic self-occlusions and correctly rendered shadows under varied lighting setups.
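The key property of the deformation field described above is exact invertibility: the same network maps canonical points to the observation space and back without a separate inverse solve. A minimal sketch of this idea, using a RealNVP-style affine coupling block in NumPy (the specific architecture and dimensions here are illustrative assumptions, not the paper's actual network):

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """One invertible coupling block: the first coordinate conditions an
    affine transform of the remaining coordinates, so the inverse is exact.
    A real deformation field would stack many blocks and alternate which
    coordinates are transformed; this is a toy illustration."""
    def __init__(self, dim=3, hidden=16):
        self.W1 = rng.normal(0, 0.1, (1, hidden))
        self.W2 = rng.normal(0, 0.1, (hidden, 2 * (dim - 1)))

    def _params(self, x_cond):
        h = np.tanh(x_cond @ self.W1)
        scale, shift = np.split(h @ self.W2, 2, axis=1)
        return np.exp(0.1 * np.tanh(scale)), shift  # bounded, positive scale

    def forward(self, x):
        s, t = self._params(x[:, :1])               # conditioner passes through unchanged
        return np.concatenate([x[:, :1], x[:, 1:] * s + t], axis=1)

    def inverse(self, y):
        s, t = self._params(y[:, :1])               # same conditioner -> exact inverse
        return np.concatenate([y[:, :1], (y[:, 1:] - t) / s], axis=1)

field = [AffineCoupling(), AffineCoupling()]

x_canonical = rng.normal(size=(5, 3))
y = x_canonical
for block in field:
    y = block.forward(y)            # canonical -> observation space
x_back = y
for block in reversed(field):
    x_back = block.inverse(x_back)  # observation -> canonical
print(np.allclose(x_back, x_canonical))  # True: the round trip is exact
```

The exact round trip is what gives dense correspondence between poses: every observed point has a unique, recoverable canonical counterpart.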
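The part-based visibility idea can be sketched as a set of small per-part networks whose soft visibility predictions are combined multiplicatively, since light reaches a point only if no part occludes it. Everything below (part names, input dimensions, the tiny untrained MLP) is a hypothetical illustration of the structure, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(1)

class PartVisibilityNet:
    """Tiny MLP: (query point, light direction, part pose) -> soft visibility
    in (0, 1). One network per body part keeps each input low-dimensional,
    which helps generalization from limited training data."""
    def __init__(self, pose_dim=6, hidden=32):
        in_dim = 3 + 3 + pose_dim
        self.W1 = rng.normal(0, 0.3, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.3, (hidden, 1))

    def __call__(self, x, light_dir, pose):
        inp = np.concatenate([x, light_dir, pose])
        h = np.maximum(inp @ self.W1 + self.b1, 0.0)       # ReLU
        return 1.0 / (1.0 + np.exp(-(h @ self.W2)[0]))     # sigmoid

# Hypothetical body partition; a real model would follow the skeleton parts.
parts = {name: PartVisibilityNet() for name in ["torso", "left_arm", "right_arm"]}

def visibility(x, light_dir, poses):
    """Light reaches x only if no part occludes it: multiply per-part terms."""
    v = 1.0
    for name, net in parts.items():
        v *= net(x, light_dir, poses[name])
    return v

x = np.array([0.1, 1.2, 0.0])
light = np.array([0.0, 0.0, 1.0])
poses = {n: rng.normal(size=6) for n in parts}
v = visibility(x, light, poses)
print(0.0 < v < 1.0)  # True: a soft visibility value
```

The multiplicative combination mirrors how hard visibility composes geometrically (occluded by any part means occluded overall), while keeping each factor differentiable and cheap to query per light sample.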
Experiments and Results
Extensive experiments were conducted on both synthetic and real-world datasets. Compared against state-of-the-art techniques, the method outperforms prior work in reconstructing high-quality geometry and in estimating materials accurately across varying poses and lighting conditions. It successfully disentangles the key components of human avatar reconstruction, enabling relighting and animation with photorealistic rendering quality.
Individual Contributions
The main contributions of this work are:
- They propose the first method that can reconstruct both relightable and animatable human avatars, including plausible shadow effects, from sparse multi-view videos.
- They introduce an invertible deformation field that enhances solving the inverse skinning problem, leading to accurate dense correspondence between different body poses.
- Part-based light visibility networks are suggested, which effectively estimate pose and light-related shading cues, even with limited data.
Conclusion
This work paves the way for realistic, relightable, and animatable neural avatars constructed from minimal, everyday video recordings, potentially benefiting a wide range of applications that rely on digital humans. Despite its success, the method does not yet model face and finger movements, and it still faces challenges with loose clothing and global illumination effects, leaving room for future improvements.