- The paper presents a hybrid approach that integrates explicit 3D Gaussian splatting with StyleGAN-based 2D CNNs to capture dynamic, pose-dependent details.
- It learns a character-specific template from multi-view RGB videos and uses a PCA-based pose projection to keep novel driving poses within the training distribution.
- Extensive experiments demonstrate improved PSNR, SSIM, LPIPS, and FID metrics compared to state-of-the-art methods, enhancing avatar realism.
Animatable Gaussians: High-Fidelity Human Avatar Modeling
The paper "Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling" presents a novel approach to animatable human avatars, utilizing a combination of 2D CNNs and 3D Gaussian splatting. This approach aims to enhance the fidelity of avatars created from RGB videos, focusing on dynamic, pose-dependent garment details that have traditionally been challenging to capture.
Technical Approach
The research introduces a new avatar representation termed "Animatable Gaussians," which couples explicit 3D Gaussian splatting with StyleGAN-based 2D CNNs. The method first learns a parametric template from the input multi-view RGB videos, then parameterizes this template into two canonical (front and back) Gaussian maps on which pose-dependent dynamics can be predicted.
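To make the representation concrete, below is a minimal sketch, assuming PyTorch, of how a predicted Gaussian map could be flattened into per-Gaussian attributes for splatting. The channel layout, activations, and function names are illustrative assumptions, not the authors' code.

```python
import torch

# Illustrative sketch (not the authors' code): each pixel of a canonical
# front/back map stores one 3D Gaussian's attributes. A pose-conditioned
# StyleGAN-like 2D CNN predicts these maps, which are then flattened into
# a Gaussian point set for splatting.

def gaussian_map_to_gaussians(gauss_map, valid_mask):
    """Flatten an H x W x C Gaussian map into per-Gaussian attributes.

    gauss_map : (H, W, 14) tensor -- assumed layout:
                3 position offset + 3 log-scale + 4 rotation (quaternion)
                + 3 color + 1 opacity
    valid_mask: (H, W) bool tensor marking pixels covered by the template
    """
    attrs = gauss_map[valid_mask]                         # (N, 14)
    xyz_offset = attrs[:, 0:3]                            # offsets from template points
    scales     = torch.exp(attrs[:, 3:6])                 # positive scales
    rotations  = torch.nn.functional.normalize(attrs[:, 6:10], dim=-1)
    colors     = torch.sigmoid(attrs[:, 10:13])
    opacities  = torch.sigmoid(attrs[:, 13:14])
    return xyz_offset, scales, rotations, colors, opacities

# Usage (shapes only): predict front and back maps from a posed conditioning
# image, convert both to Gaussians, skin them to the target pose, and
# rasterize with a Gaussian-splatting renderer (omitted here).
```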
Key Highlights:
- Parametric Template Learning:
- The method reconstructs a character-specific template from the multi-view videos by fitting a signed distance field (SDF) and color field and extracting its zero level set (see the mesh-extraction sketch after this list). Initialized from the SMPL body model, the template captures the subject's basic garment shape, including looser clothing such as dresses.
- Template-Guided 2D Parameterization:
- The template is projected onto canonical front and back views, with each covered pixel storing one 3D Gaussian (as sketched in the snippet above). This lets a StyleGAN-based 2D CNN predict detailed, pose-dependent Gaussian maps that encode dynamic appearance.
- Pose Projection for Generalization:
- Because learning-based models generalize poorly to out-of-distribution poses, the paper introduces a PCA-based projection strategy that maps novel driving poses onto the span of the training poses, keeping the conditioning signal in-distribution and improving the realism of synthesized results (a minimal sketch follows this list).
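As a rough illustration of the template-learning step, the following sketch extracts a mesh from an already-fitted SDF using marching cubes from scikit-image. The `query_sdf` callable, grid resolution, and bound are hypothetical stand-ins; the paper's actual SDF fitting and extraction pipeline may differ.

```python
import numpy as np
from skimage import measure  # pip install scikit-image

# Illustrative sketch: extract a character-specific template mesh from a
# learned SDF by sampling it on a dense grid and running marching cubes.
# `query_sdf` is a hypothetical stand-in for the fitted SDF network.

def extract_template(query_sdf, resolution=256, bound=1.0):
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)   # (R, R, R, 3)
    sdf_values = query_sdf(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    # The zero level set of the SDF is the template surface.
    verts, faces, normals, _ = measure.marching_cubes(sdf_values, level=0.0)
    # Map voxel indices back to world coordinates.
    verts = verts / (resolution - 1) * 2.0 * bound - bound
    return verts, faces, normals
```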
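The pose-projection idea can likewise be sketched with plain NumPy: fit PCA on the training pose-conditioning vectors, then project any novel pose onto the leading components. The choice of conditioning signal and the component count here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def fit_pose_pca(train_poses, n_components=20):
    """Fit PCA on training pose-conditioning vectors.

    train_poses: (N, D) array, one flattened conditioning vector per frame.
    Returns the mean vector and the top-K principal directions.
    """
    mean = train_poses.mean(axis=0)
    centered = train_poses - mean
    # Principal directions via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                  # (K, D)
    return mean, basis

def project_pose(pose, mean, basis):
    """Constrain a novel pose vector to the span of the training poses."""
    coeffs = (pose - mean) @ basis.T           # (K,) coordinates in PCA space
    return mean + coeffs @ basis               # reconstructed, in-distribution (D,)
```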
Experimental Results
The authors conduct extensive experiments showing improved performance over state-of-the-art methods in high-fidelity dynamic avatar modeling: higher PSNR and SSIM together with lower LPIPS and FID, indicating gains in both per-pixel fidelity and perceptual quality.
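For readers who want to run this style of evaluation themselves, a minimal sketch using torchmetrics (an assumption; the paper's own evaluation script may differ) could look like this, with rendered and ground-truth frames as float tensors in [0, 1] of shape (B, 3, H, W):

```python
import torch
from torchmetrics.image import PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity
from torchmetrics.image.fid import FrechetInceptionDistance

psnr = PeakSignalNoiseRatio(data_range=1.0)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
fid = FrechetInceptionDistance(feature=2048, normalize=True)

def evaluate_batch(pred, target):
    """Compute per-batch metrics for rendered vs. ground-truth frames."""
    scores = {
        "psnr": psnr(pred, target).item(),
        "ssim": ssim(pred, target).item(),
        "lpips": lpips(pred, target).item(),
    }
    # FID is a set-level statistic, so only accumulate its state here.
    fid.update(target, real=True)
    fid.update(pred, real=False)
    return scores

# After looping over all frames: fid_score = fid.compute().item()
```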
Implications and Future Directions
The research marks a significant stride toward realistic avatar representations, with potential applications in holoportation, virtual reality, and gaming. Practically, blending an explicit Gaussian representation with a learned parametric template offers a path toward avatar rendering that is both detailed and computationally efficient.
Theoretically, the paper opens avenues for deeper integration of 2D CNN architectures with 3D data representations, highlighting their capacity to overcome the bias toward low-frequency detail inherent in MLP-based implicit representations.
Potential future developments could explore disentangling body and garment dynamics for more modular avatar components. Addressing physical realism in hair or accessory motion could further enhance applicability. Additionally, adapting this methodology to monocular inputs could democratize access to such high-fidelity representations.
In conclusion, "Animatable Gaussians" provides a robust framework that marries explicit geometric representations with deep learning, offering new directions in the field of human-computer interaction and digital personas. This work is a valuable contribution to both academic and practical discourse on avatar modeling.