- The paper proposes a novel framework for real-time animatable 2DGS avatars from monocular video, integrating 2D Gaussian Splatting with the SMPL model for enhanced detail and stable animation.
- Key technical contributions include a Rotation Compensation Network (RCN) to handle non-rigid deformations and a dense sampling strategy around joints to better capture highly non-rigid transformations.
- Experimental validation shows superior reconstruction quality on datasets like PeopleSnapshot, demonstrating applicability in areas such as gaming, augmented reality, and social media.
Real-Time Animatable 2DGS-Avatars with Detail Enhancement from Monocular Videos
This paper proposes a novel framework for real-time reconstruction of animatable human avatars from monocular video, leveraging 2D Gaussian Splatting (2DGS) for detail enhancement. The work tackles the main challenges of single-view avatar reconstruction, namely capturing fine geometric detail and keeping animation stable under dynamic poses, by introducing innovations in both the modeling and rendering stages.
Key Contributions
- 2D Gaussian Splatting (2DGS): The approach incorporates 2DGS into the avatar modeling framework to improve the fidelity of reconstructed surface geometry. This addresses a limitation of existing models, which struggle to reconstruct high-curvature, fine details because they rely on volumetric representations ill-suited to thin surfaces (a sketch of the 2DGS primitive follows this list).
- Integration with SMPL Model: By utilizing the SMPL model for human body parameterization, the framework compensates for global positional and rotational discrepancies and leverages the global SMPL pose parameters to produce robust, natural pose-driven animations (a linear blend skinning sketch follows this list).
- Rotation Compensation Network (RCN): A novel component introduced to address non-rigid deformations; the RCN learns rotation residuals by combining local geometric features with global pose parameters, producing smoother pose transitions and fewer artifacts (an illustrative RCN sketch follows this list).
- Dense Sampling Strategy: The method employs a joint-prior guided sampling approach, densifying samples around joint regions to better capture deformations in areas subject to strong non-rigid transformations (a densification sketch follows this list).
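For background, a 2D Gaussian primitive in 2DGS is an oriented planar disk: a center point, two orthogonal tangent vectors, and per-axis scales, with an isotropic Gaussian falloff in the local (u, v) frame. The numpy sketch below follows the original 2DGS parameterization; it is illustrative, not code from this paper.

```python
import numpy as np

def splat_point(center, t_u, t_v, s_u, s_v, u, v):
    """Map local splat coordinates (u, v) to a world-space point on the
    oriented planar disk defined by a 2D Gaussian primitive:
    P(u, v) = center + s_u * u * t_u + s_v * v * t_v."""
    return center + s_u * u * t_u + s_v * v * t_v

def splat_weight(u, v):
    """Standard Gaussian falloff in the splat's local tangent frame."""
    return np.exp(-0.5 * (u**2 + v**2))
```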
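SMPL pose parameters typically drive the Gaussians through linear blend skinning (LBS): each canonical point is deformed by a weighted blend of per-joint rigid transforms. Below is a minimal numpy sketch of standard LBS, assuming the joint transforms are already expressed relative to the canonical (rest) pose; the paper's exact deformation model may differ.

```python
import numpy as np

def lbs(points, skin_weights, joint_transforms):
    """Linear blend skinning.

    points:           (N, 3) canonical-space positions
    skin_weights:     (N, J) per-point skinning weights, rows sum to 1
    joint_transforms: (J, 4, 4) rigid transforms from the SMPL pose,
                      assumed relative to the canonical (rest) pose
    """
    # Blend the per-joint transforms for every point: (N, 4, 4)
    blended = np.einsum('nj,jab->nab', skin_weights, joint_transforms)
    # Apply the blended transform in homogeneous coordinates
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    deformed = np.einsum('nab,nb->na', blended, homo)
    return deformed[:, :3]
```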
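The summary does not give the RCN architecture, but a residual-rotation regressor of this kind can be sketched as a small MLP over concatenated local features and global pose. Everything below (layer sizes, the 72-dimensional SMPL axis-angle pose, the axis-angle residual output) is an assumption for illustration, not the paper's network.

```python
import torch
import torch.nn as nn

class RotationCompensationNet(nn.Module):
    """Hypothetical RCN-style regressor: maps per-Gaussian local
    geometric features concatenated with global SMPL pose parameters
    to a per-Gaussian rotation residual (axis-angle, 3 values)."""

    def __init__(self, feat_dim=32, pose_dim=72, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # axis-angle rotation residual
        )

    def forward(self, local_feats, global_pose):
        # local_feats: (N, feat_dim); global_pose: (pose_dim,)
        pose = global_pose.expand(local_feats.shape[0], -1)
        return self.mlp(torch.cat([local_feats, pose], dim=-1))
```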
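One plausible reading of the joint-prior guided strategy is to densify sample points whose distance to the nearest SMPL joint falls below a threshold. The sketch below is a hypothetical illustration; the radius, replication count, and jitter scale are invented parameters.

```python
import numpy as np

def joint_guided_densify(points, joints, radius=0.1, extra=4, noise=0.01):
    """Replicate points near joints with small positional jitter so that
    highly articulated regions receive more samples.

    points: (N, 3) sample positions; joints: (J, 3) SMPL joint positions
    """
    # Distance from every point to its nearest joint: (N,)
    d = np.linalg.norm(points[:, None, :] - joints[None, :, :], axis=-1).min(axis=1)
    near = points[d < radius]
    jitter = np.random.normal(scale=noise, size=(len(near) * extra, 3))
    dense = np.repeat(near, extra, axis=0) + jitter
    return np.concatenate([points, dense], axis=0)
```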
Experimental Validation
The methodology was validated on well-known benchmarks including the PeopleSnapshot and Synthetic datasets, where it demonstrated superior reconstruction quality and robustness over existing techniques. Quantitative assessment used PSNR, SSIM, and LPIPS (see the PSNR sketch below), with the proposed framework outperforming methods such as GaussianAvatar, GART, and SplattingAvatar across these metrics.
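Of these metrics, PSNR is the simplest to state: it is computed directly from the mean squared error between images, as in the sketch below (SSIM and LPIPS are typically taken from library implementations such as scikit-image and the lpips package).

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in
    [0, max_val]; higher is better."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)
```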
Implications and Future Directions
The implications of this research extend to real-time applications in gaming, augmented reality, and social media, where animated avatars play a central role. Moreover, applying 2DGS within a parametric body model opens new possibilities for improving avatar rendering quality and efficiency.
The integration of 2DGS with global pose parameters points to future work on models that dynamically predict and adjust person-specific traits. The work also lays the groundwork for incorporating more complex motion dynamics, such as interactions with objects, which could enable more immersive VR environments.
Finally, while the paper addresses key shortcomings of current 3D representations, challenges remain, particularly the accuracy of pose estimation from monocular video. Addressing these would further broaden the framework's applicability to less controlled capture environments.