
OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering (2404.08449v3)

Published 12 Apr 2024 in cs.CV

Abstract: Rendering dynamic 3D humans from monocular videos is crucial for applications such as virtual reality and digital entertainment. Most methods assume the person is in an unobstructed scene, whereas in real-life scenarios various objects may occlude parts of the body. Previous work uses NeRF-based surface rendering to recover the occluded areas, but it requires more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian, based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings at up to 160 FPS from occluded input. OccGaussian initializes 3D Gaussian distributions in canonical space and performs occlusion feature queries in occluded regions, extracting an aggregated pixel-aligned feature to compensate for the missing information. A Gaussian Feature MLP then further processes this feature, together with occlusion-aware loss functions, to better perceive the occluded areas. Extensive experiments on both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance to the state-of-the-art method, while improving training and inference speeds by 250x and 800x, respectively. Our code will be made available for research purposes.


Summary

  • The paper introduces a novel method that leverages 3D Gaussian splatting to overcome occlusion challenges in human rendering.
  • It employs 3D Gaussian forward skinning and occlusion feature queries to efficiently capture and enhance missing details in occluded regions.
  • Experiments demonstrate that OccGaussian achieves up to 160 FPS and roughly 250x faster training while matching or surpassing state-of-the-art rendering quality.

3D Gaussian Splatting for Occluded Human Rendering: A Study on OccGaussian

Introduction

Rendering dynamic 3D humans from monocular videos is crucial for virtual reality and digital entertainment. However, occlusion poses a significant challenge, as conventional methods struggle to maintain high-quality renderings when parts of the human body are obstructed. The recently introduced OccGaussian method addresses these limitations by leveraging 3D Gaussian Splatting, achieving rapid training and real-time rendering while producing high-quality human figures in occluded scenarios.

Technical Summary

OccGaussian initializes 3D Gaussian distributions in canonical space and conducts occlusion feature queries in occluded regions. It then uses a Gaussian Feature MLP to process the aggregated pixel-aligned features extracted to compensate for the missing information. Remarkably, OccGaussian trains roughly 250 times faster than its NeRF-based predecessor and renders at up to 160 FPS, roughly an 800-fold speedup. This efficiency does not compromise quality: the method demonstrates comparable or superior performance against state-of-the-art methods.
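
The canonical-space Gaussians must be warped into each frame's pose before they can be splatted. The snippet below is a minimal PyTorch sketch of that linear-blend-skinning step (the "3D Gaussian Forward Skinning" listed under Methodological Innovations below), assuming SMPL-style per-joint transforms; the function name and tensor layout are illustrative, not the paper's exact implementation.

```python
import torch

def forward_skin_gaussians(mu_c, R_c, lbs_weights, joint_transforms):
    """Warp canonical 3D Gaussian centers and rotations into posed space
    with linear blend skinning (LBS).

    mu_c:             (N, 3)    canonical Gaussian centers
    R_c:              (N, 3, 3) canonical Gaussian rotation matrices
    lbs_weights:      (N, J)    per-Gaussian skinning weights (e.g. copied from
                                the nearest SMPL vertex)
    joint_transforms: (J, 4, 4) rigid transform of each joint for the current pose
    """
    # Blend the per-joint transforms into one 4x4 transform per Gaussian.
    T = torch.einsum('nj,jab->nab', lbs_weights, joint_transforms)   # (N, 4, 4)

    # Transform the centers using homogeneous coordinates.
    mu_h = torch.cat([mu_c, torch.ones_like(mu_c[:, :1])], dim=-1)   # (N, 4)
    mu_posed = torch.einsum('nab,nb->na', T, mu_h)[:, :3]            # (N, 3)

    # Rotate each Gaussian's orientation with the blended rotation part.
    R_posed = T[:, :3, :3] @ R_c                                     # (N, 3, 3)
    return mu_posed, R_posed
```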

Methodological Innovations

  • 3D Gaussian Forward Skinning: Adapts the 3D Gaussian Splatting technique for occluded human rendering, leveraging the efficiency of 3DGS while ensuring high-quality renderings of dynamic human figures under occlusion.
  • Occlusion Feature Query: Performs a K-nearest feature query in occluded regions, followed by extraction of aggregated pixel-aligned features, to exploit local information and compensate for the absence of ground truth in these areas.
  • Gaussian Feature MLP: Further processes the features of occluded regions, predicting spherical harmonic coefficients and opacity values through an MLP to enhance rendering quality in occluded areas (see the sketch after this list).
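
The following is a minimal PyTorch sketch of how the occlusion feature query and Gaussian Feature MLP could fit together: each occluded Gaussian gathers features from its K nearest visible Gaussians, and an MLP maps the aggregated feature to spherical-harmonic coefficients and an opacity value. The class name, feature dimensions, and inverse-distance weighting are assumptions made for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class OcclusionFeatureHead(nn.Module):
    """Aggregate features from the K nearest visible Gaussians for each occluded
    Gaussian, then predict spherical-harmonic (SH) color coefficients and opacity."""

    def __init__(self, feat_dim=32, k=5, sh_degree=3):
        super().__init__()
        self.k = k
        n_sh = (sh_degree + 1) ** 2        # SH basis functions per color channel
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 3 * n_sh + 1),  # RGB SH coefficients + one opacity logit
        )

    def forward(self, occluded_xyz, visible_xyz, visible_feat):
        # occluded_xyz: (M, 3), visible_xyz: (V, 3), visible_feat: (V, feat_dim)
        dist = torch.cdist(occluded_xyz, visible_xyz)                  # (M, V)
        knn_dist, knn_idx = dist.topk(self.k, dim=-1, largest=False)   # (M, K)

        # Inverse-distance weighting of the K nearest visible features.
        w = 1.0 / (knn_dist + 1e-6)
        w = w / w.sum(dim=-1, keepdim=True)                            # (M, K)
        feat = (visible_feat[knn_idx] * w.unsqueeze(-1)).sum(dim=1)    # (M, feat_dim)

        out = self.mlp(feat)
        n_sh = (out.shape[-1] - 1) // 3
        sh_coeffs = out[:, :-1].view(-1, 3, n_sh)                      # (M, 3, n_sh)
        opacity = torch.sigmoid(out[:, -1:])                           # (M, 1)
        return sh_coeffs, opacity
```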

Experimental Insights

The effectiveness of OccGaussian is demonstrated through experiments on the ZJU-MoCap and OcMotion datasets, which evaluate rendering quality, training speed, and rendering frame rate under both simulated and real-world occlusions. The method matches or exceeds state-of-the-art rendering quality while offering remarkable improvements in efficiency, making it particularly suitable for real-time applications.

Practical Implications and Future Prospects

OccGaussian represents a significant advancement in the field of 3D human rendering, particularly for scenarios complicated by occlusions. The method's efficiency and quality make it an appealing option for a wide range of applications, from virtual try-on and augmented reality to virtual production in films.

Future research may explore incorporating temporal information to enhance the reconstruction of severely occluded regions, a limitation currently faced by OccGaussian. Additionally, improving the method's robustness to inaccuracies in pose and camera parameters could extend its applicability to in-the-wild videos. The remarkable improvements in efficiency and rendering quality position OccGaussian as a promising avenue for future developments in the field of 3D human rendering.

Conclusion

OccGaussian introduces a novel approach to rendering occluded humans in monocular videos by leveraging 3D Gaussian Splatting. Its efficiency in training and rendering, combined with its ability to produce high-quality renderings in the presence of occlusions, marks a notable advancement in the field. As the method opens new doors for real-time applications and beyond, OccGaussian is poised to drive further innovations in 3D human rendering technology.