- The paper introduces a novel neural rendering method, HUGS, which leverages 3D Gaussian Splatting to generate animatable human avatars with significantly reduced training time.
- The methodology combines SMPL-based initialization with innovative Gaussian deformation techniques to capture fine details such as clothing and hair.
- The framework achieves state-of-the-art image quality on benchmarks such as NeuMan and ZJU-MoCap while rendering in real time at 60 FPS.
Overview of "HUGS: Human Gaussian Splats"
The paper introduces Human Gaussian Splats (HUGS), a neural rendering method that uses 3D Gaussian Splatting (3DGS) to render dynamic humans in complex environments. The authors tackle the challenge of rendering animatable humans from monocular video, proposing a solution that learns disentangled representations of the human and the scene in a notably short training time of 30 minutes. By building on 3DGS, the approach significantly improves both training speed and rendering quality over prior state-of-the-art methods.
Methodology
HUGS takes as input a monocular video of 50-100 frames and constructs an animatable human avatar together with a scene representation. The system is distinguished by several key components:
- Representation: The method initializes the human Gaussians from the SMPL body model, then lets them deviate from the template surface to capture details SMPL does not model, such as clothing and hair.
- Deformation: A deformation model predicts per-Gaussian transformations, driving the avatar through intricate movements and novel poses while keeping the splats coherent on the body surface.
- Optimization: HUGS optimizes the Gaussian centers directly; a triplane feature grid queried at each center feeds Multi-Layer Perceptrons (MLPs) that predict the remaining Gaussian properties, including shape, orientation, and color.
- Rendering: The learned Gaussians render efficiently at 60 FPS, outperforming traditional methods by avoiding costly neural field evaluations at inference time.
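The deformation step above can be sketched in plain NumPy: given per-Gaussian skinning weights over the body joints, linear blend skinning moves each canonical Gaussian center into the target pose. This is a minimal illustration, not the paper's implementation; the joint count, weights, and transforms below are invented for the example.

```python
import numpy as np

def lbs_deform(centers, weights, joint_transforms):
    """Linear blend skinning: move each Gaussian center into the target pose.

    centers:          (N, 3) Gaussian centers in canonical space
    weights:          (N, J) per-Gaussian skinning weights (rows sum to 1)
    joint_transforms: (J, 4, 4) canonical-to-posed transform per joint
    """
    # Blend the per-joint transforms with the skinning weights: (N, 4, 4)
    blended = np.einsum("nj,jab->nab", weights, joint_transforms)
    # Apply each blended transform to its center in homogeneous coordinates
    homo = np.concatenate([centers, np.ones((centers.shape[0], 1))], axis=1)
    posed = np.einsum("nab,nb->na", blended, homo)
    return posed[:, :3]

# Toy example: two Gaussians, two joints
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
T0 = np.eye(4)                         # joint 0: identity
T1 = np.eye(4); T1[:3, 3] = [0, 1, 0]  # joint 1: translate +y
posed = lbs_deform(centers, weights, np.stack([T0, T1]))
```

The second center, skinned half to each joint, lands halfway along the translation, which is the coherence property the deformation model has to preserve at scale.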
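The triplane lookup in the optimization step can likewise be illustrated with a small NumPy sketch: each 3D Gaussian center is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the per-plane features are fused (summed here; the paper's exact fusion and the downstream MLP heads are omitted). The resolution and channel count below are illustrative assumptions.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample a feature grid. plane: (R, R, C); uv: (N, 2) in [0, 1]."""
    R = plane.shape[0]
    xy = np.clip(uv, 0.0, 1.0) * (R - 1)
    x0 = np.floor(xy).astype(int)
    x1 = np.minimum(x0 + 1, R - 1)
    f = xy - x0  # fractional offsets, (N, 2)
    c00 = plane[x0[:, 0], x0[:, 1]]
    c10 = plane[x1[:, 0], x0[:, 1]]
    c01 = plane[x0[:, 0], x1[:, 1]]
    c11 = plane[x1[:, 0], x1[:, 1]]
    top = c00 * (1 - f[:, :1]) + c10 * f[:, :1]
    bot = c01 * (1 - f[:, :1]) + c11 * f[:, :1]
    return top * (1 - f[:, 1:]) + bot * f[:, 1:]

def triplane_features(points, plane_xy, plane_xz, plane_yz):
    """Query three axis-aligned planes at normalized 3D points and sum."""
    fxy = bilinear_sample(plane_xy, points[:, [0, 1]])
    fxz = bilinear_sample(plane_xz, points[:, [0, 2]])
    fyz = bilinear_sample(plane_yz, points[:, [1, 2]])
    return fxy + fxz + fyz

# Toy query: constant planes, so every point receives the same summed feature
R, C = 8, 4
planes = [np.ones((R, R, C)) for _ in range(3)]
pts = np.array([[0.2, 0.5, 0.9], [0.0, 0.0, 1.0]])
feats = triplane_features(pts, *planes)
```

Storing features on three 2D planes instead of a dense 3D grid is what keeps this representation compact enough for fast optimization.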
Results and Implications
Empirically, HUGS demonstrates superior performance on established benchmarks such as the NeuMan and ZJU-MoCap datasets, surpassing previous methods on image quality metrics including PSNR, SSIM, and LPIPS. The framework achieves state-of-the-art reconstruction and animation quality, capturing details such as hand articulation and clothing texture. The authors emphasize HUGS’ ability to generalize across scenes and poses, underpinning its potential utility in augmented reality, visual effects, and other applications requiring rapid avatar creation and rendering.
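Of the reported metrics, PSNR has a simple closed form worth stating (SSIM and LPIPS require patch statistics and a learned network, respectively, so library implementations are the practical choice). A minimal NumPy version, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 gives MSE = 0.01, i.e. 20 dB
pred = np.full((8, 8), 0.5)
target = np.full((8, 8), 0.6)
score = psnr(pred, target)
```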
Future Directions
The paper notes limitations tied to the SMPL model’s constraints on modeling non-rigid deformations, implying future work could incorporate more sophisticated deformation models or integrate generative techniques for enhanced realism. Additionally, developing techniques to better account for varying lighting conditions across environments presents a promising avenue for further research.
Conclusion
By utilizing 3D Gaussian Splatting, HUGS establishes itself as a significant contribution to the field of animatable human rendering. Its innovative methodologies, combined with efficient training and rendering processes, position it as a valuable tool for advancing practical applications in dynamic human avatar creation. The approach’s adaptability and detail-oriented results signify its potential for influencing future developments in the field of neural rendering and human-computer interaction.