
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars (2311.17113v2)

Published 28 Nov 2023 in cs.CV and cs.GR

Abstract: This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While the classical approaches to model and render virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real-time and their quality degrades when the character is animated with body poses different than the training observations. We propose an animatable human model based on 3D Gaussian Splatting, that has recently emerged as a very efficient alternative to neural radiance fields. The body is represented by a set of gaussian primitives in a canonical space which is deformed with a coarse to fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting (HuGS) model in an end-to-end fashion from multi-view observations, and evaluate it against the state-of-the-art approaches for novel pose synthesis of clothed body. Our method achieves 1.5 dB PSNR improvement over the state-of-the-art on THuman4 dataset while being able to render in real-time (80 fps for 512x512 resolution).


Summary

  • The paper introduces a novel 3D Gaussian Splatting method that represents human bodies using Gaussian primitives in a canonical space for high-fidelity rendering.
  • It employs a two-step deformation technique combining forward skinning with local non-rigid refinement to accurately capture both broad and fine body motions.
  • The approach delivers real-time performance (80 FPS at 512x512 resolution) and a 1.5 dB PSNR improvement on the THuman4 dataset over prior state-of-the-art methods.

The paper "Human Gaussian Splatting: Real-time Rendering of Animatable Avatars" addresses the challenge of real-time rendering of photorealistic avatars, particularly focusing on achieving high visual quality for animatable human models. Traditional methods largely relied on textured meshes for modeling virtual humans, but these often fall short in rendering quality and real-time performance, especially when the avatars are animated with new poses.

Key contributions of this work include the use of 3D Gaussian Splatting, which has recently emerged as an efficient alternative to neural radiance fields. The researchers formulate an animatable human model that represents the body as a set of Gaussian primitives in a canonical space. Deformations are handled with a coarse-to-fine approach that combines forward skinning with local non-rigid refinement.

The methodology includes:

  1. Representation: The human body is represented by a cloud of Gaussian splats, which are defined in a canonical pose. This allows the model to handle various body shapes more flexibly.
  2. Deformation Technique: The transformation from the canonical space to the animated pose is achieved through a coarse-to-fine deformation strategy. It starts with forward skinning for broad deformations and follows up with local refinements to capture finer details.
  3. Training and Rendering: The model is trained end-to-end using multi-view video data. This approach helps the model learn accurate geometric and photometric details required for high-fidelity rendering.
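The coarse-to-fine deformation in step 2 can be sketched as follows. This is an illustrative toy, not the paper's implementation: canonical Gaussian centers are first posed with linear blend skinning (the coarse forward-skinning step), then shifted by per-Gaussian non-rigid offsets (standing in for the learned local refinement). All function and variable names here are hypothetical.

```python
import numpy as np

def forward_skin(centers, weights, bone_transforms):
    """Linear blend skinning: blend per-bone rigid transforms per point.

    centers:         (N, 3) canonical Gaussian centers
    weights:         (N, B) skinning weights, each row summing to 1
    bone_transforms: (B, 4, 4) homogeneous transforms, canonical -> posed
    """
    # Homogeneous coordinates for the canonical centers.
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)  # (N, 4)
    # Blend the 4x4 bone transforms with the skinning weights, per point.
    blended = np.einsum("nb,bij->nij", weights, bone_transforms)          # (N, 4, 4)
    # Apply each blended transform to its point; drop the homogeneous coordinate.
    return np.einsum("nij,nj->ni", blended, homo)[:, :3]

def deform(centers, weights, bone_transforms, nonrigid_offsets):
    """Coarse LBS pose followed by a fine non-rigid residual."""
    return forward_skin(centers, weights, bone_transforms) + nonrigid_offsets

# Toy usage: two Gaussians, two bones; the second bone translates by +0.5 in x.
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
T = np.stack([np.eye(4), np.eye(4)])
T[1, 0, 3] = 0.5
offsets = np.zeros_like(centers)

posed = deform(centers, weights, T, offsets)
print(posed)  # first point unchanged, second shifted to [1.5, 0, 0]
```

In the actual model the offsets (and corrections to each Gaussian's rotation and scale) would come from a learned network conditioned on the body pose, so the refinement can capture pose-dependent cloth and soft-tissue motion that rigid skinning misses.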

The authors report substantial improvements over existing methods, with a 1.5 dB increase in Peak Signal-to-Noise Ratio (PSNR) on the THuman4 dataset, a benchmark for novel pose synthesis of clothed humans. Notably, their approach renders in real time, achieving approximately 80 frames per second at 512x512 resolution, a significant efficiency gain over prior neural representations.
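For reference, PSNR, the metric behind the reported 1.5 dB gain, is computed as 10 log10(MAX^2 / MSE); a 1.5 dB improvement corresponds to roughly a 29% reduction in mean squared error. A minimal implementation for images normalized to [0, 1]:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for arrays in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a uniform error of 0.1 gives MSE = 0.01, hence PSNR = 20 dB.
a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)
print(psnr(b, a))  # -> 20.0 (up to floating-point rounding)
```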

Overall, the paper showcases a breakthrough in creating high-quality, animatable human models that can be rendered in real-time, advancing the possibilities for applications in virtual reality, gaming, and digital human creation.
