GUAVA: Generalizable Upper Body 3D Gaussian Avatar (2505.03351v1)

Published 6 May 2025 in cs.CV

Abstract: Reconstructing a high-quality, animatable 3D human avatar with expressive facial and hand motions from a single image has gained significant attention due to its broad application potential. 3D human avatar reconstruction typically requires multi-view or monocular videos and training on individual IDs, which is both complex and time-consuming. Furthermore, limited by SMPLX's expressiveness, these methods often focus on body motion but struggle with facial expressions. To address these challenges, we first introduce an expressive human model (EHM) to enhance facial expression capabilities and develop an accurate tracking method. Based on this template model, we propose GUAVA, the first framework for fast animatable upper-body 3D Gaussian avatar reconstruction. We leverage inverse texture mapping and projection sampling techniques to infer Ubody (upper-body) Gaussians from a single image. The rendered images are refined through a neural refiner. Experimental results demonstrate that GUAVA significantly outperforms previous methods in rendering quality and offers significant speed improvements, with reconstruction times in the sub-second range (0.1s), and supports real-time animation and rendering.

Summary

The paper introduces GUAVA, a framework built on 3D Gaussian splatting for rapidly reconstructing animatable 3D upper-body avatars from a single image.
GUAVA reconstructs an avatar in under a second and outperforms existing 2D and 3D methods on PSNR, L1, SSIM, and LPIPS.
Because it creates expressive avatars quickly from minimal input, the framework is relevant to VR, gaming, and digital media production.

Generalizable Upper Body 3D Gaussian Avatar (GUAVA)

GUAVA is a framework for rapidly reconstructing animatable 3D upper-body avatars from a single image. It targets two weaknesses of existing 3D human avatar methods: they typically need multi-view or monocular video plus per-identity training, and they handle expressive facial and hand motion poorly. Its components are designed to improve both reconstruction speed and rendering quality over existing methods.

Key Innovations

Expressive Human Model (EHM): GUAVA introduces a human template that combines SMPL-X with the FLAME head model, capturing finer facial expressions than SMPL-X alone. EHM is the template for both the paper's tracking method and avatar reconstruction, so its ability to represent subtle facial detail and motion directly limits what the avatar can express.
Gaussian Representation: The avatar is represented with 3D Gaussian splatting (3DGS) in a canonical space. The resulting Ubody Gaussians are of two kinds: template Gaussians attached to the template model and obtained through projection sampling, which keep the geometry spatially consistent, and UV Gaussians rigged to the triangles of the mesh, which add fine texture detail. Combining the two aims to deliver both high-fidelity texture and realistic geometry.
Inverse Texture Mapping: This technique maps appearance features from screen space to UV space, transferring what is visible in the input image onto the template's texture layout. It is the mechanism by which detailed texture is extracted from a single image, and it contributes substantially to the visual realism of the reconstructed avatars.
Real-time Animation and Rendering: Reconstruction is a single feed-forward pass that takes about 0.1 s, and the resulting avatar can be animated and rendered in real time. Separately, a neural refiner improves the images produced by Gaussian splatting; it raises image quality rather than accounting for the speed. In the paper's comparisons, GUAVA outperforms existing 2D and 3D methods in both speed and quality.
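
SMPL-X poses its mesh with linear blend skinning (LBS). Assuming EHM keeps that mechanism, anything attached to the template, such as the template Gaussians, would follow a new pose as sketched below. This is a generic LBS illustration, not GUAVA's code; the helper name `linear_blend_skinning` is hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def linear_blend_skinning(rest: Vec3, weights: Sequence[float],
                          transforms: Sequence[Tuple[Mat3, Vec3]]) -> Vec3:
    """Pose one point: v' = sum_k w_k * (R_k v + t_k). Hypothetical helper.

    weights:    per-joint skinning weights for this point, summing to 1
    transforms: per-joint rigid transforms (R_k, t_k) from rest to posed space
    """
    out = [0.0, 0.0, 0.0]
    for w, (rot, trans) in zip(weights, transforms):
        if w == 0.0:
            continue  # this joint does not influence the point
        for i in range(3):
            out[i] += w * (sum(rot[i][j] * rest[j] for j in range(3)) + trans[i])
    return (out[0], out[1], out[2])


# Usage: a point split evenly between a static joint and one rotated 90 degrees about z.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
rot_z90 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
zero = (0.0, 0.0, 0.0)
print(linear_blend_skinning((1.0, 0.0, 0.0), (0.5, 0.5), [(identity, zero), (rot_z90, zero)]))
# (0.5, 0.5, 0.0)
```

The blended point lands about 0.71 from the origin instead of 1.0. This volume loss is a known limitation of plain LBS and one reason template-driven methods add learned detail on top of the skinned template.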
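
One common way to rig a Gaussian to a triangulated mesh is to store barycentric weights on a triangle plus an offset along the face normal, then recompute the Gaussian's center from the current triangle whenever the mesh moves. The sketch below assumes that scheme; the summary does not say exactly how GUAVA parameterizes its UV Gaussians, and `rigged_gaussian_center` is a hypothetical name. A full implementation would also rotate and scale each Gaussian with its triangle; that part is omitted.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    n = (a[0] ** 2 + a[1] ** 2 + a[2] ** 2) ** 0.5
    return (a[0] / n, a[1] / n, a[2] / n)


def rigged_gaussian_center(tri, bary, offset):
    """Center of a Gaussian attached to a mesh triangle (hypothetical helper).

    tri:    the triangle's three vertex positions in the current pose
    bary:   barycentric weights (w0, w1, w2) that sum to 1
    offset: signed distance from the surface along the unit face normal
    """
    a, b, c = tri
    w0, w1, w2 = bary
    # Point on the triangle selected by the barycentric weights.
    p = tuple(w0 * a[i] + w1 * b[i] + w2 * c[i] for i in range(3))
    # Unit face normal from two edge vectors (counter-clockwise winding).
    n = normalize(cross(sub(b, a), sub(c, a)))
    return tuple(p[i] + offset * n[i] for i in range(3))


# Usage: the same binding re-evaluated after the triangle moves up by 2.
rest = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
moved = tuple((x, y, z + 2.0) for x, y, z in rest)
bary = (1 / 3, 1 / 3, 1 / 3)
print(rigged_gaussian_center(rest, bary, 0.1))   # approximately (0.333, 0.333, 0.1)
print(rigged_gaussian_center(moved, bary, 0.1))  # approximately (0.333, 0.333, 2.1)
```

Because only the binding (triangle index, weights, offset) is stored, animating the mesh animates the Gaussians for free.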
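
The abstract names projection sampling as one of the two techniques used to infer Ubody Gaussians. A minimal sketch of the general idea: project each template vertex into the image with a pinhole camera and bilinearly sample a feature map at that location. The camera model, the feature map (assumed to come from some image encoder), and the helper names are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def project(point: Sequence[float], fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature vector at continuous pixel coordinates."""
    h, w = len(fmap), len(fmap[0])
    # Clamp so out-of-image projections read the nearest border feature.
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    out = []
    for ch in range(len(fmap[0][0])):
        top = (1 - ax) * fmap[y0][x0][ch] + ax * fmap[y0][x1][ch]
        bottom = (1 - ax) * fmap[y1][x0][ch] + ax * fmap[y1][x1][ch]
        out.append((1 - ay) * top + ay * bottom)
    return out


def projection_sample(vertices, fmap, fx, fy, cx, cy):
    """One feature vector per template vertex (hypothetical helper)."""
    return [bilinear_sample(fmap, *project(p, fx, fy, cx, cy)) for p in vertices]


# Usage: a 2x2 single-channel map whose value at (row, col) is 2*row + col.
fmap = [[[0.0], [1.0]], [[2.0], [3.0]]]
print(projection_sample([(0.0, 0.0, 1.0)], fmap, fx=1.0, fy=1.0, cx=0.5, cy=0.5))  # [[1.5]]
```

Note that vertices hidden from the camera still receive a feature here; a practical pipeline would need to handle occlusion, which this sketch ignores.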
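
3D Gaussian splatting forms each pixel by sorting the projected Gaussians by depth and alpha-blending them front to back, C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j). The sketch below shows that compositing rule for one pixel, assuming the Gaussians are already projected to 2D means and covariances and carry plain RGB colors. It does not model GUAVA's neural refiner or feature channels.

```python
import math


def gaussian_alpha(opacity, mean2d, cov2d, pixel):
    """Opacity contributed by one projected 2D Gaussian at a pixel.

    cov2d is the symmetric 2x2 screen-space covariance ((a, b), (b, d)).
    """
    (a, b), (_, d) = cov2d
    det = a * d - b * b
    inv_a, inv_b, inv_d = d / det, -b / det, a / det  # entries of the inverse covariance
    dx, dy = pixel[0] - mean2d[0], pixel[1] - mean2d[1]
    power = -0.5 * (inv_a * dx * dx + 2.0 * inv_b * dx * dy + inv_d * dy * dy)
    return min(0.99, opacity * math.exp(power))  # keep alpha < 1 so transmittance stays positive


def composite(splats, pixel):
    """Front-to-back alpha blending of (depth, rgb, opacity, mean2d, cov2d) splats.

    Returns the blended color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _, rgb, opacity, mean2d, cov2d in sorted(splats, key=lambda s: s[0]):
        alpha = gaussian_alpha(opacity, mean2d, cov2d, pixel)
        for k in range(3):
            color[k] += rgb[k] * alpha * transmittance
        transmittance *= 1.0 - alpha
    return color, transmittance


# Usage: a red splat in front of a blue one, both centered on the pixel.
identity = ((1.0, 0.0), (0.0, 1.0))
splats = [
    (2.0, (0.0, 0.0, 1.0), 0.5, (0.0, 0.0), identity),  # far, blue
    (1.0, (1.0, 0.0, 0.0), 0.5, (0.0, 0.0), identity),  # near, red
]
print(composite(splats, (0.0, 0.0)))  # ([0.5, 0.0, 0.25], 0.25)
```

The nearer red splat dominates even though it is listed second, because blending always proceeds in depth order.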
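
Inverse texture mapping moves features from screen space into UV space. A simplified, hypothetical version is a scatter-and-average: assume each image pixel already knows the (u, v) surface coordinate it sees (for instance from rasterizing the tracked mesh), then accumulate its feature into the matching texel. GUAVA's learned components are not modeled, and `inverse_texture_map` is an illustrative name.

```python
from typing import List, Optional, Tuple

UV = Optional[Tuple[float, float]]


def inverse_texture_map(pixel_uv: List[List[UV]],
                        pixel_feat: List[List[List[float]]], tex_size: int):
    """Scatter screen-space features into a tex_size x tex_size UV grid.

    pixel_uv[y][x] is the (u, v) coordinate seen at that pixel, or None for
    background. Returns the averaged UV feature map and a mask of texels
    that received at least one pixel.
    """
    channels = len(pixel_feat[0][0])
    sums = [[[0.0] * channels for _ in range(tex_size)] for _ in range(tex_size)]
    counts = [[0] * tex_size for _ in range(tex_size)]
    for uv_row, feat_row in zip(pixel_uv, pixel_feat):
        for uv, feat in zip(uv_row, feat_row):
            if uv is None:  # pixel does not cover the body
                continue
            tx = min(int(uv[0] * tex_size), tex_size - 1)
            ty = min(int(uv[1] * tex_size), tex_size - 1)
            for ch in range(channels):
                sums[ty][tx][ch] += feat[ch]
            counts[ty][tx] += 1
    uv_feat = [[[s / counts[y][x] if counts[y][x] else 0.0 for s in sums[y][x]]
                for x in range(tex_size)] for y in range(tex_size)]
    mask = [[counts[y][x] > 0 for x in range(tex_size)] for y in range(tex_size)]
    return uv_feat, mask


# Usage: two body pixels land in the same texel; the background pixel is skipped.
uv_map = [[(0.1, 0.1), (0.2, 0.2), None]]
feats = [[[1.0], [3.0], [9.0]]]
uv_feat, mask = inverse_texture_map(uv_map, feats, tex_size=2)
print(uv_feat[0][0], mask)  # [2.0] [[True, False], [False, False]]
```

The mask makes the single-image limitation explicit: texels for surfaces not visible in the photo (for example the back) stay empty and must be filled by other means.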

Numerical Results and Implications

Quantitative evaluations cover self-reenactment, where the avatar is driven by motion from the same subject so ground-truth frames exist for reference metrics such as PSNR, L1, SSIM, and LPIPS, and cross-reenactment, where it is driven by another person's motion. GUAVA outperforms the compared methods on these metrics and achieves high identity-preservation scores, indicating that the source subject's appearance is retained across different poses and expressions.
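
For readers less familiar with these metrics, the sketch below computes two of them, L1 (lower is better) and PSNR (higher is better), on flattened images with values in [0, 1]. SSIM and LPIPS are omitted because they need windowed statistics and a pretrained network, respectively. Identity preservation is often measured as cosine similarity between face-recognition embeddings; this summary does not state GUAVA's exact protocol, so that function is an assumption. All helper names are illustrative.

```python
import math
from typing import Sequence


def l1_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error over pixel values (lower is better)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE) (higher is better)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two identity embeddings, in [-1, 1] (higher is better)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Usage: every pixel of the rendering is off by 0.1.
rendered = [0.2, 0.4, 0.6, 0.8]
ground_truth = [0.3, 0.5, 0.7, 0.9]
print(round(l1_error(rendered, ground_truth), 6), round(psnr(rendered, ground_truth), 2))  # 0.1 20.0
```

A uniform error of 0.1 gives an MSE of 0.01 and therefore exactly 20 dB, which is a handy sanity check when reading PSNR tables.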

GUAVA is relevant to virtual reality, gaming, digital media production, and interactive avatar systems. Reconstructing expressive avatars quickly and accurately from a single image lowers the barrier for creators building rich, interactive experiences, and the approach advances the development of efficient, expressive digital characters.

Future Directions

The paper points to full-body avatar reconstruction and better modeling of clothing dynamics, both currently limited by the template model, as directions for further work. Incorporating advances in generative models may help produce more complete and detailed reconstructions. Responsible deployment will also require addressing ethical concerns, such as preventing misuse of AI-generated avatars and protecting personal biometric data.

By showing that fast, high-quality avatar reconstruction from a single image is achievable, the work sets a reference point for future research in avatar modeling and synthesis.
