- The paper introduces FlashAvatar, a method that uses a uniform 3D Gaussian field and spatial offsets to achieve rapid, high-fidelity avatar rendering.
- The paper shows that combining the geometric prior of a parametric face model with an explicit, non-neural Gaussian radiance field captures complex facial details and accessories such as hair and glasses more effectively.
- The paper achieves 300 FPS rendering at 512x512 resolution, nearly an order of magnitude faster than prior state-of-the-art methods, enabling interactive VR/AR applications.
The field of digital avatars has taken a significant leap forward with FlashAvatar, a method for reconstructing and rendering high-fidelity head avatars. Unlike previous models that could take hours to process and render, FlashAvatar achieves a rendering speed of 300 frames per second (FPS) at 512x512 resolution on a commonly available Nvidia RTX 3090 GPU. This advancement opens a new avenue for applications that require real-time interactive digital humans, such as virtual reality (VR) and augmented reality (AR).
At the core of FlashAvatar lies a uniform 3D Gaussian field embedded within the surface of a parametric face model. To account for fine details that do not lie on that surface, the model also learns additional spatial offsets. This approach efficiently captures not only broad facial expressions but also subtler structures such as hair and accessories, including glasses. The use of 3D Gaussians as rendering primitives is central to achieving both high visual fidelity and high rendering speed.
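To make this representation concrete, below is a minimal PyTorch sketch of the general idea: Gaussian centers pinned to a tracked face mesh via fixed barycentric coordinates, plus a small expression-conditioned network that predicts spatial offsets. The class name, the offset MLP architecture, and the toy mesh sizes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class SurfaceEmbeddedGaussians(nn.Module):
    """Gaussian centers attached to a parametric face mesh, plus learned offsets."""

    def __init__(self, num_gaussians: int, num_faces: int, expr_dim: int = 50):
        super().__init__()
        # Each Gaussian is pinned to one triangle with fixed barycentric
        # coordinates (the paper derives these from uniform UV sampling;
        # random values are used here purely for illustration).
        self.register_buffer("face_idx", torch.randint(0, num_faces, (num_gaussians,)))
        bary = torch.rand(num_gaussians, 3)
        self.register_buffer("bary", bary / bary.sum(dim=1, keepdim=True))
        # Per-Gaussian appearance/shape parameters (heavily simplified).
        self.log_scale = nn.Parameter(torch.full((num_gaussians, 3), -4.0))
        self.color = nn.Parameter(torch.rand(num_gaussians, 3))
        # Small MLP predicting expression-dependent spatial offsets so that
        # off-surface structures (hair, glasses) can still be represented.
        self.offset_net = nn.Sequential(
            nn.Linear(expr_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )

    def centers(self, verts, faces, expr):
        """verts: (V, 3) deformed mesh vertices, faces: (F, 3), expr: (expr_dim,)."""
        tri = verts[faces[self.face_idx]]                   # (N, 3, 3) triangle corners
        base = (self.bary.unsqueeze(-1) * tri).sum(dim=1)   # (N, 3) on-surface points
        expr_in = expr.expand(base.shape[0], -1)            # broadcast expression code
        offset = self.offset_net(torch.cat([expr_in, base], dim=-1))
        return base + offset                                # final 3D Gaussian centers

# Toy usage with a random stand-in mesh sized like FLAME
# (a real pipeline would use tracked face-model geometry).
verts = torch.rand(5023, 3)
faces = torch.randint(0, 5023, (9976, 3))
model = SurfaceEmbeddedGaussians(num_gaussians=10_000, num_faces=9976)
print(model.centers(verts, faces, torch.zeros(50)).shape)   # torch.Size([10000, 3])
```

Because the attachment is fixed, the Gaussians move with the mesh as expressions change, and only the residual offsets need to be learned.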
Traditional approaches built on 3D morphable models (3DMMs) struggle to capture complex hairstyles and other fine detail, while representations based on neural implicit functions can recover finer detail but are hampered by slow inference. FlashAvatar overcomes both limitations by pairing the geometric prior of a parametric face model with an explicit, non-neural Gaussian radiance field, enabling rapid rendering without compromising on detail.
Training FlashAvatar effectively relies on a few key techniques. In particular, uniform UV sampling and mesh-attached initialization sidestep the inefficiency of learning a dynamic offset field from scratch (see the sketch below). Together, these strategies let FlashAvatar produce renderings that surpass state-of-the-art methods while running almost an order of magnitude faster.
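The sketch below illustrates one way such a mesh-attached initialization could work, assuming the face model provides a UV parameterization: sample a regular grid in UV space, locate the texture triangle each sample falls in, and store the triangle index with barycentric coordinates so each Gaussian stays glued to the animated surface. The brute-force point-in-triangle search and the function names are illustrative; a practical implementation would rasterize the UV layout instead.

```python
import numpy as np

def barycentric_coords(p, a, b, c):
    """Barycentric coordinates of 2D point p with respect to triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    if abs(denom) < 1e-12:                  # skip degenerate UV triangles
        return np.array([-1.0, -1.0, -1.0])
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def init_attachments_from_uv(uv_verts, uv_faces, grid_res=64):
    """uv_verts: (V, 2) UV coordinates, uv_faces: (F, 3) triangle indices.
    Returns (face_index, barycentric) pairs, one per grid sample that lands
    inside the UV layout, giving a near-uniform spread over the face surface."""
    us, vs = np.meshgrid(np.linspace(0, 1, grid_res), np.linspace(0, 1, grid_res))
    samples = np.stack([us.ravel(), vs.ravel()], axis=1)
    attachments = []
    for p in samples:
        for f_idx, f in enumerate(uv_faces):
            bary = barycentric_coords(p, *uv_verts[f])
            if np.all(bary >= 0.0):         # the sample lies inside this UV triangle
                attachments.append((f_idx, bary))
                break                       # samples outside every triangle are skipped
    return attachments
```

Sampling in UV space rather than on the 3D mesh directly gives an even coverage of the face surface, so no region starts out over- or under-populated with Gaussians.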
FlashAvatar's representational ability has been tested on challenging cases. It delivers detailed, coherent, and dynamic renditions of facial expressions, fine structures, and complex accessories. Facial reenactment tasks that demand real-time rendering can be executed with no loss in visual quality, and the camera view can be adjusted freely, making the framework versatile for applications that require varied perspectives.
Despite its advantages, FlashAvatar has limitations worth noting. The system relies on an accurate initial arrangement of the surface-embedded Gaussian field and is sensitive to tracking errors, particularly in the global pose. The authors identify these as directions for future work aimed at further improving the robustness and fidelity of the model.
In conclusion, FlashAvatar represents a significant step forward in avatar modeling, offering a fast and efficient way to create detailed digital humans for interactive VR and AR platforms. The authors acknowledge the potential for misuse of digital avatars, encourage responsible use of their work, and note that advances of this kind can also inform and improve tools for detecting forgeries.