
GASP: Gaussian Avatars with Synthetic Priors (2412.07739v1)

Published 10 Dec 2024 in cs.CV, cs.AI, and cs.GR

Abstract: Gaussian Splatting has changed the game for real-time photo-realistic rendering. One of the most popular applications of Gaussian Splatting is to create animatable avatars, known as Gaussian Avatars. Recent works have pushed the boundaries of quality and rendering efficiency but suffer from two main limitations. Either they require expensive multi-camera rigs to produce avatars with free-view rendering, or they can be trained with a single camera but only rendered at high quality from this fixed viewpoint. An ideal model would be trained using a short monocular video or image from available hardware, such as a webcam, and rendered from any view. To this end, we propose GASP: Gaussian Avatars with Synthetic Priors. To overcome the limitations of existing datasets, we exploit the pixel-perfect nature of synthetic data to train a Gaussian Avatar prior. By fitting this prior model to a single photo or video and fine-tuning it, we get a high-quality Gaussian Avatar, which supports 360$^\circ$ rendering. Our prior is only required for fitting, not inference, enabling real-time application. Through our method, we obtain high-quality, animatable Avatars from limited data which can be animated and rendered at 70fps on commercial hardware. See our project page (https://microsoft.github.io/GASP/) for results.

Summary

  • The paper presents a novel method that uses synthetic priors to overcome single-camera limitations for creating realistic digital avatars.
  • It employs a three-stage fitting process that optimizes latent features and refines Gaussian properties to bridge the gap from synthetic to real data.
  • The approach enables real-time rendering at 70fps on commercial hardware, outperforming state-of-the-art techniques in quality and efficiency.

Overview of the GASP Framework for 3D Avatar Synthesis

The paper "GASP: Gaussian Avatars with Synthetic Priors" presents a novel approach to creating photorealistic, animatable avatars using Gaussian Splatting for real-time rendering. Gaussian Splatting has emerged as a popular methodology for generating high-quality digital avatars, but existing methods either require expensive multi-camera rigs for free-view rendering or, when trained from a single camera, only render well from that fixed viewpoint. GASP seeks to bridge this gap by leveraging synthetic priors to build avatars from easily captured data, such as webcam or smartphone footage, while supporting unrestricted 360-degree rendering.
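As background, each primitive in a splatting-based avatar is typically parameterized by a 3D position, a rotation, per-axis scales, an opacity, and view-dependent color coefficients; at render time the rotation and scales are combined into a covariance matrix. The sketch below illustrates this standard parameterization; the field names and structure are illustrative assumptions, not the paper's actual code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    # Illustrative parameterization of one splatting primitive;
    # names are assumptions, not taken from the GASP codebase.
    mean: np.ndarray       # (3,) center in 3D space
    rotation: np.ndarray   # (4,) unit quaternion (w, x, y, z)
    scale: np.ndarray      # (3,) per-axis log-scales
    opacity: float         # scalar in [0, 1]
    sh_coeffs: np.ndarray  # (k, 3) spherical-harmonic color coefficients

    def covariance(self) -> np.ndarray:
        """Build the 3x3 covariance Sigma = R S S^T R^T used at render time."""
        w, x, y, z = self.rotation / np.linalg.norm(self.rotation)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(np.exp(self.scale))  # exponentiate log-scales
        return R @ S @ S.T @ R.T
```

Storing log-scales and a quaternion keeps the covariance positive semi-definite by construction during optimization, which is why most splatting implementations use this factorization.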

Key Contributions and Methodology

The authors introduce several contributions that distinguish their work from prior methodologies:

  1. Synthetic Priors for Gaussian Avatars: The framework utilizes synthetic data with pixel-perfect annotations to train a Gaussian Avatar prior. This serves as a robust base model that fills in the viewpoints and regions missing from a single video or image capture.
  2. Three-Stage Fitting Process: The initial fitting in the GASP methodology involves optimizing a latent feature vector. This is followed by fine-tuning the model, and finally refining the Gaussian properties using optimization techniques specific to Gaussian Splatting. This staged approach is crucial for addressing the domain gap introduced by synthetic data and ensures the rendered avatars accurately capture a person's likeness from limited initial data.
  3. Real-Time Capabilities: A significant advantage of the proposed methodology is that it requires neural network computation only during the training and fitting phases. During inference, the avatar can be animated and rendered at 70fps on commercial hardware without extensive GPU resources, making it viable for practical deployment.
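The three stages above can be sketched as a simple control flow: first optimize a latent code against the frozen prior, then fine-tune the prior network itself, then refine the decoded Gaussians directly so the network is no longer needed at render time. The following is a hedged sketch of that pipeline; all class and function names are placeholders for illustration, not the paper's API.

```python
# Hedged sketch of the three-stage fitting pipeline. PriorStub stands in
# for the synthetic Gaussian Avatar prior; names are illustrative only.

class PriorStub:
    """Minimal stand-in for the pretrained synthetic prior."""
    def __init__(self):
        self.log = []                       # records which stage ran

    def init_latent(self):
        return 0.0                          # identity latent code

    def step_latent(self, latent, obs):     # stage 1: optimize latent only
        self.log.append("latent")
        return latent

    def step_network(self, latent, obs):    # stage 2: fine-tune prior weights
        self.log.append("network")

    def decode(self, latent):
        return ["gaussians"]                # decoded Gaussian parameters


def refine_gaussians(gaussians, obs):       # stage 3: splat-level refinement
    return gaussians


def fit_avatar(prior, observations, steps=(2, 2, 2)):
    latent = prior.init_latent()
    for _ in range(steps[0]):               # stage 1
        latent = prior.step_latent(latent, observations)
    for _ in range(steps[1]):               # stage 2
        prior.step_network(latent, observations)
    gaussians = prior.decode(latent)
    for _ in range(steps[2]):               # stage 3
        gaussians = refine_gaussians(gaussians, observations)
    return gaussians                        # prior not needed at render time
```

The key design point the sketch highlights is the last line: only the refined Gaussians survive fitting, which is what makes network-free, real-time inference possible.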

Experimental Results and Analysis

The paper presents quantitative and qualitative evaluations across multiple experimental setups: monocular video, single image, and multi-camera. Notably, in the monocular and single-image scenarios, the GASP method significantly outperforms existing state-of-the-art approaches, including Gaussian Avatar and DiffusionRig, on metrics such as PSNR, SSIM, LPIPS, and subjective quality measures. Even when trained with only a three-image setup, GASP maintains superior performance, demonstrating its flexibility.
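For reference, PSNR (one of the reported metrics) is a standard fidelity measure computed from the mean squared error between a rendered frame and its ground-truth image. A minimal implementation, unrelated to the paper's evaluation code:

```python
import numpy as np

def psnr(reference: np.ndarray, rendered: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - rendered.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR indicates closer pixel-level agreement, while SSIM and LPIPS respectively capture structural and perceptual similarity, which is why papers typically report all three.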

Implications for Computational Graphics and AI

The implications of the GASP framework extend beyond mere avatar synthesis. The method signifies a step towards democratizing the creation of high-fidelity digital humans by minimizing the need for complex capture setups. Moreover, the integration of synthetic priors demonstrates an innovative way to leverage vast amounts of synthetic data for realistic texture and geometry modelling in real-time applications. This has broad applicability in industries like gaming, virtual reality, and online conferencing, where photorealistic digital representations are increasingly essential.

Future Directions

While the method shows considerable promise, limitations persist, notably in rendering less visible regions such as the back of the head. Future research could enhance the realism of synthetic priors, possibly by integrating advanced lighting models or pursuing hybrid approaches that combine real and synthetic data. Moreover, there are ethical considerations, especially concerning privacy and consent, that must be addressed as this technology gains adoption.

In conclusion, GASP represents a significant development in using synthetic data and Gaussian modelling for avatar creation, offering practical solutions for real-time applications while maintaining high fidelity and realism.
