- The paper introduces a novel hybrid method that integrates explicit facial models with neural radiance fields to achieve high-fidelity head avatar synthesis with precise expression control.
- It employs synthetic renderings and feature plane generators to blend static facial priors with dynamic head details, ensuring robust performance under lightweight capture setups.
- Experimental results demonstrate significant improvements over state-of-the-art techniques, offering enhanced realism and stability for interactive virtual applications.
High-Fidelity Head Avatars Using Facial Model Conditioned Neural Radiance Fields
The paper presents a novel approach to animatable 3D human head avatar modeling using a hybrid explicit-implicit representation: a Neural Radiance Field (NeRF) conditioned on a parametric facial model. The method addresses the challenge of synthesizing realistic portrait images while maintaining precise expression control under lightweight capture setups. Previous 3D head modeling approaches often struggled to balance realism and accuracy, either requiring dense capture systems or failing to model expression dynamics effectively. This paper advances the state of the art by combining the expressiveness of NeRF with parametric facial models, providing both high-fidelity appearance and controllable dynamics.
Methodology
The core of this research is its hybrid representation, which describes the 3D head avatar with both an explicit parametric model and an implicit neural field. The Neural Radiance Field is conditioned on facial model renderings, injecting prior information without constraining the topological flexibility needed for complex head details such as hair or accessories. Key developments include:
- Synthetic-Renderings-Based Conditioning: The method leverages synthetic renderings of a parametric face model to create feature volumes for the canonical space of the dynamic head appearance. This enables robust, fine-grained control over expressions while accommodating topological variations.
- Feature Plane Generators: Using orthogonal renderings from front and side views, the system generates feature planes that feed a lightweight MLP for density and color prediction, relying on convolutional networks to fuse image features efficiently (a minimal sketch of this query step follows the list).
- Pose and Expression Embeddings: The neural representation is conditioned on learnable embeddings that are modulated together with the input expression via a convolutional network. This improves generalization to unseen expressions and stabilizes animation, preventing shape inconsistencies.
- Head Motion Decoupling: The framework separates head movement from the torso using a learned linear blend skinning (LBS) weight field, so that torso rendering remains unaffected by head poses, enabling more realistic animations (see the LBS sketch below).
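To make the feature-plane query step concrete, here is a minimal PyTorch sketch of how canonical-space points might be projected onto two orthogonal feature planes and decoded into density and color by a lightweight MLP. All shapes, the class name FeaturePlaneNeRF, and the exact plane layout are illustrative assumptions, not the paper's actual interface; the planes themselves are assumed to come from convolutional generators run on the synthetic face-model renderings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePlaneNeRF(nn.Module):
    """Sketch: decode density/color from two orthogonal feature planes."""

    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 RGB channels
        )

    @staticmethod
    def sample_plane(plane, coords2d):
        # plane: (1, C, H, W); coords2d: (N, 2) in [-1, 1]
        grid = coords2d.view(1, -1, 1, 2)
        feats = F.grid_sample(plane, grid, align_corners=True)  # (1, C, N, 1)
        return feats.squeeze(0).squeeze(-1).t()                 # (N, C)

    def forward(self, pts, front_plane, side_plane):
        # pts: (N, 3) canonical-space samples in [-1, 1]^3
        f_front = self.sample_plane(front_plane, pts[:, [0, 1]])  # x-y projection
        f_side = self.sample_plane(side_plane, pts[:, [2, 1]])    # z-y projection
        out = self.mlp(torch.cat([f_front, f_side], dim=-1))
        sigma = F.softplus(out[..., :1])   # non-negative density
        rgb = torch.sigmoid(out[..., 1:])  # color in [0, 1]
        return sigma, rgb

# Usage with placeholder planes (in practice, outputs of the plane generators):
model = FeaturePlaneNeRF()
front = torch.randn(1, 32, 128, 128)
side = torch.randn(1, 32, 128, 128)
sigma, rgb = model(torch.rand(1024, 3) * 2 - 1, front, side)
```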
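The head-motion decoupling can likewise be sketched as a learned blend between points that rigidly follow the head pose and points that stay with the torso. The MLP, its parameterization, and the blend formula below are assumptions for illustration; the paper's actual weight field may be defined differently.

```python
import torch
import torch.nn as nn

class HeadTorsoLBS(nn.Module):
    """Sketch: learned linear-blend-skinning weights for head vs. torso."""

    def __init__(self, hidden=64):
        super().__init__()
        self.weight_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # weight w in (0, 1)
        )

    def forward(self, pts, R, t):
        # pts: (N, 3) sample points; R: (3, 3) head rotation; t: (3,) translation
        w = self.weight_mlp(pts)               # w ~ 1 near the head, ~ 0 on the torso
        head_pts = pts @ R.t() + t             # points rigidly following the head pose
        return w * head_pts + (1.0 - w) * pts  # torso points stay put

# Usage: an identity pose leaves all points unchanged.
lbs = HeadTorsoLBS()
warped = lbs(torch.rand(1024, 3), torch.eye(3), torch.zeros(3))
```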
Experimental Evaluation
Under both monocular and sparse-view capture conditions, the proposed method outperforms existing techniques. Experiments show substantial improvements in visual quality and stability, yielding state-of-the-art performance on several benchmarks compared to methods such as Nerface and RigNeRF.
For quantitative comparison, higher PSNR and lower LPIPS scores indicate clear gains in photo-realism and detail preservation. Adversarial training with a GAN-based image-to-image translation network further improves perceptual quality.
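For reference, both reported metrics are straightforward to compute: PSNR is 10 log10(MAX^2 / MSE) in dB, and LPIPS is available through the third-party lpips package. The snippet below is a generic illustration with random tensors, not the paper's evaluation code.

```python
import torch
import lpips  # pip install lpips; learned perceptual image patch similarity

def psnr(pred, target, max_val=1.0):
    """PSNR in dB for images with values in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# LPIPS expects (N, 3, H, W) tensors scaled to [-1, 1]; lower is better.
lpips_fn = lpips.LPIPS(net='alex')
pred, target = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
print(f"PSNR:  {psnr(pred, target):.2f} dB")
print(f"LPIPS: {lpips_fn(pred * 2 - 1, target * 2 - 1).item():.4f}")
```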
Implications and Future Work
This research pushes the boundaries of head avatar modeling by integrating detailed facial dynamics without the need for complex capture setups. Practically, this can impact areas such as virtual reality, telepresence, and interactive media, where realistic and controllable virtual avatars are essential.
Theoretically, this work offers insights into hybrid methods that blend explicit and implicit representations. Future directions could include optimizing the system for challenging scenarios such as extreme expressions, or extending the approach to other body parts or full-body avatars.
This paper's integration of neural radiance fields conditioned on parametric models represents an important step forward in the realistic synthesis and control of animated avatars, with both theoretical implications and practical applications across various digital and interactive platforms.