
GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting (2401.09720v2)

Published 18 Jan 2024 in cs.CV

Abstract: In this work, we propose a novel clothed human reconstruction method called GaussianBody, based on 3D Gaussian Splatting. Compared with the costly neural radiance based models, 3D Gaussian Splatting has recently demonstrated great performance in terms of training time and rendering quality. However, applying the static 3D Gaussian Splatting model to the dynamic human reconstruction problem is non-trivial due to complicated non-rigid deformations and rich cloth details. To address these challenges, our method considers explicit pose-guided deformation to associate dynamic Gaussians across the canonical space and the observation space, introducing a physically-based prior with regularized transformations helps mitigate ambiguity between the two spaces. During the training process, we further propose a pose refinement strategy to update the pose regression for compensating the inaccurate initial estimation and a split-with-scale mechanism to enhance the density of regressed point clouds. The experiments validate that our method can achieve state-of-the-art photorealistic novel-view rendering results with high-quality details for dynamic clothed human bodies, along with explicit geometry reconstruction.


Summary

  • The paper introduces a 3D Gaussian splatting framework for reconstructing detailed, clothed human figures from monocular video.
  • It employs explicit pose-guided deformation and forward linear blend skinning to accurately capture dynamic human motion with reduced training time.
  • Rigorous experiments show that GaussianBody outperforms baselines by enhancing clothing detail fidelity and managing complex non-rigid movements.

Overview

Researchers have introduced GaussianBody, a method for reconstructing high-fidelity clothed human figures from monocular video. It builds on 3D Gaussian Splatting (3D-GS), a differentiable rendering approach that trains quickly and renders with high quality. Traditional techniques struggle with complex non-rigid human motion and intricate clothing details; GaussianBody addresses these challenges with explicit pose-guided deformation, a physically-based regularization, a pose refinement strategy, and a split-with-scale mechanism that enhances point cloud density.
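In 3D-GS, each scene point is an anisotropic 3D Gaussian whose covariance is parameterized by a per-axis scale and a rotation, typically composed as Σ = R S Sᵀ Rᵀ so that optimization keeps Σ positive semi-definite. A minimal NumPy sketch of that construction (function and variable names are our own, not from the paper):

```python
import numpy as np

def gaussian_covariance(scale, quat):
    """Build the 3x3 covariance of one anisotropic 3D Gaussian from its
    per-axis scale vector and unit rotation quaternion: Sigma = R S S^T R^T."""
    w, x, y, z = quat / np.linalg.norm(quat)
    # Rotation matrix from the normalized quaternion (w, x, y, z).
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

# With an identity rotation the covariance is simply diag(scale**2).
cov = gaussian_covariance(np.array([0.1, 0.2, 0.3]),
                          np.array([1.0, 0.0, 0.0, 0.0]))
```

Because scale and rotation are optimized separately, Σ stays a valid covariance throughout training, which is one reason the representation is easy to optimize with gradient descent.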

The Problem Domain

Reconstructing realistic 3D models of clothed humans from images or videos is a crucial task with applications ranging from entertainment to virtual reality. Mesh-based and implicit methods have shown promising results, but they struggle with dynamic motion and often require extensive computational resources. By contrast, 3D-GS, previously applied to static scenes, renders efficiently but needed adaptation for the dynamic nature of human motion. GaussianBody advances this space, offering both improved precision and efficiency.

The GaussianBody Approach

At the heart of GaussianBody lies the combination of 3D-GS and an articulated human model. The method uses forward linear blend skinning (LBS) to deform the 3D Gaussians from the canonical space to the observation space, following the human pose across frames. A split-with-scale mechanism dynamically increases point cloud density, handling the sparse regions that arise from the lack of multi-view information. Moreover, a pose refinement strategy corrects inaccuracies in the initial pose estimation. The result is high-quality, detailed renderings of dynamic clothed humans in near-real-time.
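The forward-LBS warp described above can be sketched as follows: each canonical Gaussian center is moved by a per-point blend of the frame's rigid bone transforms. This is a minimal NumPy illustration of the general LBS equation, not the paper's implementation (the array shapes, names, and homogeneous-transform convention are assumptions):

```python
import numpy as np

def lbs_deform(points, skin_weights, bone_transforms):
    """Forward linear blend skinning: warp canonical points into the
    observation space by a weighted blend of per-bone rigid transforms.

    points:          (N, 3) canonical Gaussian centers
    skin_weights:    (N, J) per-point skinning weights, each row sums to 1
    bone_transforms: (J, 4, 4) homogeneous bone transforms for this frame
    """
    # Blend the J bone transforms per point: T_i = sum_j w_ij * G_j.
    blended = np.einsum('nj,jab->nab', skin_weights, bone_transforms)  # (N, 4, 4)
    # Apply each blended transform to its point in homogeneous coordinates.
    homog = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    warped = np.einsum('nab,nb->na', blended, homog)
    return warped[:, :3]

# One point fully bound to one bone that translates by +1 along z.
T = np.eye(4)[None].copy()
T[0, 2, 3] = 1.0
out = lbs_deform(np.array([[1.0, 2.0, 3.0]]), np.array([[1.0]]), T)
# → [[1. 2. 4.]]
```

In the actual method the same blended rotations also transform each Gaussian's covariance, and the skinning weights come from the underlying parametric body model rather than being free parameters.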

A Leap in Performance

GaussianBody has been validated through experiments against baseline models and state-of-the-art methods. It performs notably well in areas that typically challenge reconstruction accuracy, such as the fidelity of clothing details and the recovery of explicit geometry. Compared with prior approaches, GaussianBody offers superior reconstruction quality while significantly reducing the required training time, marking measurable progress in 3D human modeling, particularly for rendering dynamic clothed humans.