- The paper introduces GraphAvatar, a GNN-driven method that generates 3D Gaussians to render high-quality head avatars with minimal storage overhead.
- It employs a novel graph-guided optimization module to refine face-tracking data, improving visual fidelity and the reported image-quality metrics.
- The approach achieves a compact model size (~10MB) while outperforming NeRF-based and 3DGS-based methods, paving the way for scalable VR/AR applications.
GraphAvatar: Enhancing Head Avatars through GNN-Generated 3D Gaussians
The paper "GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians" presents a method for rendering photorealistic head avatars using Graph Neural Networks (GNNs) to generate attributes of 3D Gaussians, addressing critical challenges in rendering quality, speed, and storage efficiency. The proposed method, GraphAvatar, applies GNNs for generating compact 3D Gaussians, significantly reducing storage overhead while maintaining high visual fidelity, which is a notable contribution against existing techniques reliant on storage-intensive methods like Neural Radiance Fields (NeRF).
Problem Context and Limitations of Existing Methods
Rendering head avatars from arbitrary viewpoints has significant applications in virtual and augmented reality. Traditional approaches based on NeRF suffer from slow rendering and substantial storage requirements, largely due to NeRF's implicit volumetric representation. Recent methods built on 3D Gaussian Splatting (3DGS) improve real-time performance, but they still incur a heavy storage footprint from the large number of per-Gaussian parameters, and they depend on accurate face tracking, which is prone to errors.
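To see why storage becomes a problem, consider the standard 3DGS parameterization. The following back-of-the-envelope sketch uses an assumed Gaussian count for a head avatar, not a figure from the paper:

```python
# Standard per-Gaussian attributes in vanilla 3DGS: position (3), rotation
# quaternion (4), scale (3), opacity (1), degree-3 SH color (3 * 16 = 48).
floats_per_gaussian = 3 + 4 + 3 + 1 + 48   # = 59
num_gaussians = 100_000                    # assumed order of magnitude for a head
bytes_total = num_gaussians * floats_per_gaussian * 4  # float32

print(f"{bytes_total / 1e6:.1f} MB")       # ~23.6 MB before any compression
```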
Methodology
Graph-Based 3D Gaussian Generation:
GraphAvatar uses GNNs to generate 3D Gaussians directly from tracked facial meshes. Two GNN models, a geometric and an appearance Graph U-Net, regress the attributes of the 3D Gaussians from each mesh. Since only these GNN weights are stored, rather than the parameters of every Gaussian, GraphAvatar keeps the model at roughly 10 MB.
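A minimal sketch of this idea, assuming PyTorch Geometric's `GraphUNet`; the attribute split, channel sizes, and activations below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
from torch_geometric.nn import GraphUNet

class GaussianGenerator(torch.nn.Module):
    """Regress per-vertex Gaussian attributes from a tracked face mesh."""

    def __init__(self, hidden=64):
        super().__init__()
        # Geometric branch: position offset (3) + rotation quaternion (4) + scale (3)
        self.geometry = GraphUNet(3, hidden, 10, depth=3)
        # Appearance branch: RGB color (3) + opacity (1)
        self.appearance = GraphUNet(3, hidden, 4, depth=3)

    def forward(self, verts, edge_index):
        # verts: (V, 3) mesh vertex positions; edge_index: (2, E) mesh edges
        geo = self.geometry(verts, edge_index)
        app = self.appearance(verts, edge_index)
        offset, rot, scale = geo[:, :3], geo[:, 3:7], geo[:, 7:]
        return {
            "position": verts + offset,   # Gaussians anchored to the mesh
            "rotation": torch.nn.functional.normalize(rot, dim=-1),
            "scale": torch.exp(scale),    # keep scales positive
            "color": torch.sigmoid(app[:, :3]),
            "opacity": torch.sigmoid(app[:, 3:]),
        }
```

Only the two networks' weights need to be saved; the Gaussians are regenerated on the fly from each tracked mesh.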
Graph-Guided Optimization:
To mitigate errors in the face-tracking data, a novel graph-guided optimization module refines the tracked parameters during training: it models temporal dependencies across frames and refines pose and expression coefficients through cross-attention.
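A hypothetical sketch of such a refinement step, in which each frame's tracked coefficients attend over the surrounding clip and a small head predicts residual corrections; the coefficient dimensions (FLAME-like pose and expression) and the clip-wide attention are assumptions:

```python
import torch
import torch.nn as nn

class TrackingRefiner(nn.Module):
    def __init__(self, pose_dim=6, expr_dim=50, d_model=128):
        super().__init__()
        self.embed = nn.Linear(pose_dim + expr_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, pose_dim + expr_dim)

    def forward(self, coeffs):
        # coeffs: (T, D) raw tracked pose+expression coefficients for T frames
        tokens = self.embed(coeffs).unsqueeze(0)        # (1, T, d_model)
        # Each frame queries the whole clip, pooling temporal context.
        attended, _ = self.attn(tokens, tokens, tokens)
        # Predict a residual correction rather than absolute coefficients.
        return coeffs + self.head(attended.squeeze(0))
```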
3D-Aware Enhancer for Post-Processing:
To counteract the over-smoothing tendency of GNNs, the system integrates a 3D-aware enhancer as a post-processing step: rendered depth maps are fed into the enhancement network to recover fine image details.
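A sketch of what such an enhancer could look like, assuming a plain residual CNN conditioned on depth; layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class DepthAwareEnhancer(nn.Module):
    """Refine the rasterized RGB image using its rendered depth map."""

    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, ch, 3, padding=1), nn.ReLU(),  # RGB + depth in
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),             # residual RGB detail out
        )

    def forward(self, rgb, depth):
        # rgb: (B, 3, H, W) coarse render; depth: (B, 1, H, W) rendered depth
        x = torch.cat([rgb, depth], dim=1)
        return (rgb + self.net(x)).clamp(0.0, 1.0)      # add back fine detail
```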
Experimental Validation
GraphAvatar was evaluated against NeRF-based and 3DGS-based methods on diverse datasets, including INSTA and NeRFBlendShape (NBS). It achieved higher PSNR and SSIM and lower LPIPS than the baselines, with visible gains in detailed regions such as the eyes and mouth. Notably, its compact model size of 10.8 MB is a significant improvement over baselines such as FlashAvatar and Gaussian Head Avatar, which typically require much more storage.
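For reference, these metrics can be reproduced with torchmetrics (a generic evaluation sketch, not the paper's code); PSNR and SSIM are higher-is-better, LPIPS lower-is-better:

```python
import torch
from torchmetrics.image import (PeakSignalNoiseRatio,
                                StructuralSimilarityIndexMeasure)
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

psnr = PeakSignalNoiseRatio(data_range=1.0)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="vgg", normalize=True)

pred = torch.rand(1, 3, 512, 512)    # rendered frame, values in [0, 1]
target = torch.rand(1, 3, 512, 512)  # ground-truth frame, values in [0, 1]

print("PSNR:", psnr(pred, target).item())
print("SSIM:", ssim(pred, target).item())
print("LPIPS:", lpips(pred, target).item())  # normalize=True accepts [0, 1]
```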
Significance and Future Perspectives
The implications of GraphAvatar extend beyond immediate VR/AR applications. Its compact model size, achieved without compromising quality, makes it practical for complex systems where storage constraints are critical. Moreover, its robustness to face-tracking inaccuracies opens pathways to richer interactive experiences in digital environments.
Conclusion
GraphAvatar represents a substantial step forward in photorealistic head avatar rendering. Its use of GNN-generated 3D Gaussians jointly addresses the competing demands of high fidelity, low storage, and efficient rendering. Future directions might include further refinement of temporal dynamics and integration with other machine learning frameworks to extend its applicability to dynamic scenes and real-time user interaction.