MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting (2412.04955v2)

Published 6 Dec 2024 in cs.CV

Abstract: Reconstructing high-fidelity 3D head avatars is crucial in various applications such as virtual reality. The pioneering methods reconstruct realistic head avatars with Neural Radiance Fields (NeRF), which have been limited by training and rendering speed. Recent methods based on 3D Gaussian Splatting (3DGS) significantly improve the efficiency of training and rendering. However, the surface inconsistency of 3DGS results in subpar geometric accuracy; later, 2DGS uses 2D surfels to enhance geometric accuracy at the expense of rendering fidelity. To leverage the benefits of both 2DGS and 3DGS, we propose a novel method named MixedGaussianAvatar for realistically and geometrically accurate head avatar reconstruction. Our main idea is to utilize 2D Gaussians to reconstruct the surface of the 3D head, ensuring geometric accuracy. We attach the 2D Gaussians to the triangular mesh of the FLAME model and connect additional 3D Gaussians to those 2D Gaussians where the rendering quality of 2DGS is inadequate, creating a mixed 2D-3D Gaussian representation. These 2D-3D Gaussians can then be animated using FLAME parameters. We further introduce a progressive training strategy that first trains the 2D Gaussians and then fine-tunes the mixed 2D-3D Gaussians. We demonstrate the superiority of MixedGaussianAvatar through comprehensive experiments. The code will be released at: https://github.com/ChenVoid/MGA/.

Summary

The paper presents a novel hybrid method combining 2D and 3D Gaussian splatting for accurate 3D head avatar reconstruction.
It introduces a progressive training strategy that refines geometric precision and enhances color rendering for dynamic expressions.
Empirical evaluations report improvements over NeRF-based and 3DGS-based methods on metrics such as PSNR and LPIPS.

Overview of MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting

The paper presents MixedGaussianAvatar, an approach to 3D head avatar reconstruction that combines 2D and 3D Gaussian splatting. It targets the limitations of existing methods based on Neural Radiance Fields (NeRF), which are slow to train and render, and on 3D Gaussian Splatting (3DGS), whose surface inconsistency hurts geometric accuracy. The focus is the trade-off between rendering speed, geometric accuracy, and visual fidelity in dynamic avatar generation.

Key Contributions

Hybrid Methodology for Enhanced Reconstruction: The proposed method combines the geometric strengths of 2D Gaussian Splatting (2DGS) with the color rendering capabilities of 3DGS to form a mixed 2D-3D Gaussian representation. This dual-method approach aims to achieve a realistically and geometrically accurate reconstruction of 3D head avatars, a significant step towards overcoming the limitations inherent in using either method independently.
Progressive Training Strategy: Training proceeds in two phases. The 2D Gaussians are trained first to establish accurate geometry, and the mixed 2D-3D Gaussians are then fine-tuned to improve the realism and geometric consistency of the dynamic avatar. This strategy is intended to keep the avatars robust across expressions and viewpoints.
Impactful Implementation: Integration with the FLAME head model lets the Gaussians be driven by FLAME animation parameters, which enables dynamic, expressive 3D avatars. This is achieved by interpolating the parameters across different Gaussian scales, leading to a more coherent transformation of facial expressions and other head dynamics.
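The two-phase training described above can be sketched as a simple schedule that decides which parameter groups receive gradients at each step. This is a minimal illustration and not the authors' code: the `Stage` helper, the group names `gaussians_2d` and `gaussians_3d`, and the step count are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    start_step: int              # first optimizer step of this stage
    trainable: Tuple[str, ...]   # hypothetical parameter-group names


def build_schedule(stage1_steps: int) -> List[Stage]:
    """Stage 1 trains only the 2D Gaussians; stage 2 fine-tunes both kinds."""
    return [
        Stage("train_2d", 0, ("gaussians_2d",)),
        Stage("finetune_mixed", stage1_steps, ("gaussians_2d", "gaussians_3d")),
    ]


def active_stage(schedule: List[Stage], step: int) -> Stage:
    """Return the latest stage whose start_step has been reached."""
    current = schedule[0]
    for stage in schedule:
        if step >= stage.start_step:
            current = stage
    return current


# Assumed step count, chosen only for the example.
schedule = build_schedule(stage1_steps=1000)
for step in (0, 999, 1000):
    stage = active_stage(schedule, step)
    print(step, stage.name, stage.trainable)
```

In a real training loop, the returned `trainable` tuple would decide which optimizer parameter groups are updated at that step.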

Methodology

The paper delineates a comprehensive methodology involving:

Splatting Process: The mixed representation involves attaching 2D Gaussians to the triangular mesh of the FLAME head model, subsequently appending 3D Gaussians to these 2D counterparts where additional color correction is necessary.
Local-to-Global Transformation: A critical component that employs a mapping method to align FLAME parameters with Gaussian points, ensuring that the mixed Gaussian representation accurately reflects both local and global spatial transformations during animation.
Error-based Selection Algorithm: This computational strategy is utilized to dynamically refine the placement and the number of Gaussian representations based on multi-view consistency, thereby adapting the model to compensate for potential rendering inaccuracies.
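Two of the steps above lend themselves to a short sketch: mapping a mesh-bound Gaussian from a triangle's local frame to world space, and choosing which 2D Gaussians receive an extra 3D Gaussian. The code below is an illustration under stated assumptions, not the paper's implementation. The frame construction (centroid as origin, mean edge length as scale) is one common convention for mesh-bound Gaussians, and the mean-error threshold rule and all function names are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)


def triangle_frame(v0, v1, v2):
    """Build (origin, axes, scale) for one mesh triangle (assumed convention)."""
    origin = tuple((a + b + c) / 3.0 for a, b, c in zip(v0, v1, v2))
    e1 = unit(sub(v1, v0))                   # first in-plane axis
    n = unit(cross(e1, sub(v2, v0)))         # triangle normal
    e2 = cross(n, e1)                        # second in-plane axis
    scale = (norm(sub(v1, v0)) + norm(sub(v2, v1)) + norm(sub(v0, v2))) / 3.0
    return origin, (e1, e2, n), scale


def local_to_global(local_pos, frame):
    """Map a triangle-local position to world space: origin + scale * R @ local."""
    origin, axes, scale = frame
    return tuple(
        origin[i] + scale * sum(local_pos[j] * axes[j][i] for j in range(3))
        for i in range(3)
    )


def select_for_3d(errors_per_view, threshold):
    """Return indices of 2D Gaussians whose mean error over views exceeds threshold.

    errors_per_view: one list per view, each holding a per-Gaussian error value.
    """
    num_views = len(errors_per_view)
    num_gaussians = len(errors_per_view[0])
    mean_err = [sum(view[i] for view in errors_per_view) / num_views
                for i in range(num_gaussians)]
    return [i for i, e in enumerate(mean_err) if e > threshold]
```

Because the local position is stored relative to the triangle, recomputing `triangle_frame` on the deformed FLAME mesh and calling `local_to_global` again moves the Gaussian with the face.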

Experimental Evaluation

MixedGaussianAvatar reports state-of-the-art performance in both color rendering and geometric reconstruction, validated through experiments on datasets such as NeRSemble and INSTA. Quantitatively, it improves over existing NeRF and 3DGS methods on metrics such as PSNR and LPIPS. Qualitatively, it recovers finer detail in mesh reconstruction and texture rendering than baselines including FlashAvatar and Gaussian Head Avatar.
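For readers unfamiliar with the metrics, PSNR is derived from the mean squared error between a rendered image and its ground truth, and higher values are better. LPIPS is a learned perceptual distance where lower is better; it needs a pretrained network, so it is not shown here. The sketch below uses the standard PSNR definition on flat lists of pixel intensities; the function name and the `[0, 1]` intensity range are assumptions for the example.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two pixels, each off by 0.1, give an MSE of 0.01 and a PSNR of about 20 dB.
print(psnr([0.5, 0.5], [0.4, 0.6]))
```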

Implications and Future Directions

This work is relevant to applications such as virtual reality and digital avatar creation, where geometric precision and visual authenticity both matter. By addressing the surface inconsistency of 3DGS while compensating for the lower rendering fidelity of 2DGS, MixedGaussianAvatar aims to deliver accurate geometry and realistic rendering in a single representation.

Further developments could explore the expansion of this framework to full-body avatar reconstruction, integration with real-time performance capture systems, and refinement through diverse datasets to enhance robustness across various conditions. The potential for adaptation into consumer-grade technology for personalization in digital media is apparent, offering vast opportunities for innovation in AI-driven 3D modeling.

GitHub

GitHub - ChenVoid/MGA (4 stars)

Tweets

https://twitter.com/janusch_patas/status/1866013097942732860