
DiffGS: Functional Gaussian Splatting Diffusion (2410.19657v2)

Published 25 Oct 2024 in cs.CV

Abstract: 3D Gaussian Splatting (3DGS) has shown convincing performance in rendering speed and fidelity, yet the generation of Gaussian Splatting remains a challenge due to its discreteness and unstructured nature. In this work, we propose DiffGS, a general Gaussian generator based on latent diffusion models. DiffGS is a powerful and efficient 3D generative model which is capable of generating Gaussian primitives at arbitrary numbers for high-fidelity rendering with rasterization. The key insight is to represent Gaussian Splatting in a disentangled manner via three novel functions to model Gaussian probabilities, colors and transforms. Through the novel disentanglement of 3DGS, we represent the discrete and unstructured 3DGS with continuous Gaussian Splatting functions, where we then train a latent diffusion model with the target of generating these Gaussian Splatting functions both unconditionally and conditionally. Meanwhile, we introduce a discretization algorithm to extract Gaussians at arbitrary numbers from the generated functions via octree-guided sampling and optimization. We explore DiffGS for various tasks, including unconditional generation, conditional generation from text, image, and partial 3DGS, as well as Point-to-Gaussian generation. We believe that DiffGS provides a new direction for flexibly modeling and generating Gaussian Splatting.

References (86)
  1. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
  2. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022.
  3. Demystifying MMD GANs. arXiv preprint arXiv:1801.01401, 2018.
  4. Large-vocabulary 3D diffusion model with transformer. arXiv preprint arXiv:2309.07920, 2023.
  5. Efficient geometry-aware 3d generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16123–16133, 2022.
  6. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015.
  7. TensoRF: Tensorial radiance fields. In European Conference on Computer Vision, pages 333–350. Springer, 2022.
  8. Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2416–2425, 2023.
  9. Text-to-3d using gaussian splatting. arXiv preprint arXiv:2309.16585, 2023.
  10. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5939–5948, 2019.
  11. SDFusion: Multimodal 3D shape completion, reconstruction, and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4456–4465, 2023.
  12. Diffusion-SDF: Conditional generative modeling of signed distance functions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2262–2272, 2023.
  13. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023.
  14. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
  15. Geo-Neus: Geometry-consistent neural implicit surfaces learning for multi-view reconstruction. Advances in Neural Information Processing Systems (NeurIPS), 2022.
  16. Get3d: A generative model of high quality 3d textured shapes learned from images. Advances In Neural Information Processing Systems, 35:31841–31854, 2022.
  17. 3DGen: Triplane latent diffusion for textured mesh generation. arXiv preprint arXiv:2303.05371, 2023.
  18. Binocular-guided 3d gaussian splatting with view consistency for sparse view synthesis. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
  19. GVGEN: Text-to-3D generation with volumetric representation. arXiv preprint arXiv:2403.12957, 2024.
  20. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
  21. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  22. Lrm: Large reconstruction model for single image to 3D. arXiv preprint arXiv:2311.04400, 2023.
  23. 2d gaussian splatting for geometrically accurate radiance fields. arXiv preprint arXiv:2403.17888, 2024.
  24. NeuSurf: On-surface priors for neural surface reconstruction from sparse input views. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
  25. Music-udf: Learning multi-scale dynamic grid representation for high-fidelity surface reconstruction from point clouds. Computers & Graphics, page 104081, 2024.
  26. Multi-grid representation with field regularization for self-supervised surface reconstruction from point clouds. Computers & Graphics, 2023.
  27. Shap-e: Generating conditional 3d implicit functions. arXiv preprint arXiv:2305.02463, 2023.
  28. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):1–14, 2023.
  29. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  30. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization. arXiv preprint arXiv:2403.06912, 2024.
  31. NeAF: Learning neural angle fields for point normal estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
  32. Learning continuous implicit field with local distance indicator for arbitrary-scale point cloud upsampling. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
  33. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023.
  34. Marching cubes: A high resolution 3D surface construction algorithm. ACM Siggraph Computer Graphics, 21(4):163–169, 1987.
  35. Diffusion probabilistic models for 3d point cloud generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2837–2845, 2021.
  36. GeoDream: Disentangling 2D and geometric priors for high-fidelity and consistent 3D generation. arXiv preprint arXiv:2311.17971, 2023.
  37. Towards better gradient consistency for neural signed distance functions via level set alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17724–17734, 2023.
  38. Donald Meagher. Geometric modeling using octree encoding. Computer graphics and image processing, 19(2):129–147, 1982.
  39. Occupancy networks: Learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4460–4470, 2019.
  40. Latent-nerf for shape-guided generation of 3d shapes and textures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12663–12673, 2023.
  41. NeRF: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision, 2020.
  42. Diffrf: Rendering-guided 3d radiance field diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4328–4338, 2023.
  43. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG), 41(4):1–15, 2022.
  44. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
  45. Multipull: Detailing signed distance functions by pulling multi-level queries at multi-step. In Advances in Neural Information Processing Systems, 2024.
  46. DeepSDF: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 165–174, 2019.
  47. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5865–5874, 2021.
  48. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
  49. D-nerf: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10318–10327, 2021.
  50. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
  51. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  52. Dreambooth3d: Subject-driven text-to-3d generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2349–2359, 2023.
  53. Hierarchical text-conditional image generation with clip latents, 2022.
  54. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
  55. 3d neural field generation using triplane diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20875–20886, 2023.
  56. Dreamcraft3d: Hierarchical 3d generation with bootstrapped diffusion prior. arXiv preprint arXiv:2310.16818, 2023.
  57. Splatter image: Ultra-fast single-view 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10208–10217, 2024.
  58. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In European Conference on Computer Vision, pages 1–18. Springer, 2025.
  59. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation. arXiv preprint arXiv:2309.16653, 2023.
  60. Make-it-3d: High-fidelity 3d creation from a single image with diffusion prior. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22819–22829, 2023.
  61. Volumediffusion: Flexible text-to-3d generation with efficient volumetric encoder. arXiv preprint arXiv:2312.11459, 2023.
  62. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. Advances in Neural Information Processing Systems, 34:27171–27183, 2021.
  63. Rodin: A generative model for sculpting 3d digital avatars using diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4563–4573, 2023.
  64. Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3295–3306, 2023.
  65. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems, 36, 2024.
  66. 3D shape reconstruction from 2D images with disentangled attribute flow. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3803–3813, 2022.
  67. Octrees for faster isosurface generation. ACM Transactions on Graphics (TOG), 11(3):201–227, 1992.
  68. 4d gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528, 2023.
  69. Dream3d: Zero-shot text-to-3d synthesis using 3d shape prior and text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20908–20918, 2023.
  70. Grm: Large gaussian reconstruction model for efficient 3d reconstruction and generation. arXiv preprint arXiv:2403.14621, 2024.
  71. Gaussiandreamer: Fast generation from text to 3d gaussian splatting with point cloud priors. arXiv preprint arXiv:2310.08529, 2023.
  72. Gaussiancube: Structuring gaussian splatting using optimal transport for 3d generative modeling. arXiv preprint arXiv:2403.19655, 2024.
  73. Gs-lrm: Large reconstruction model for 3d gaussian splatting. arXiv preprint arXiv:2404.19702, 2024.
  74. Neural signed distance function inference through splatting 3d gaussians pulled on zero-level set. In Advances in Neural Information Processing Systems, 2024.
  75. Learning unsigned distance functions from multi-view images with volume rendering priors. European Conference on Computer Vision, 2024.
  76. Zero-shot scene reconstruction from single images with deep prior assembly. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
  77. Cap-udf: Learning unsigned distance functions progressively from raw point clouds with consistency-aware field optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
  78. Learning a more continuous zero level set in unsigned distance fields through level set projection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
  79. Fast learning of signed distance functions from noisy point clouds via noise to noise mapping. IEEE transactions on pattern analysis and machine intelligence, 2024.
  80. Learning consistency-aware unsigned distance functions progressively from raw point clouds. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
  81. Differentiable registration of images and lidar point clouds with voxelpoint-to-pixel matching. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
  82. Uni3D: Exploring Unified 3D Representation at Scale. In International Conference on Learning Representations (ICLR), 2024.
  83. 3d-oae: Occlusion auto-encoders for self-supervised learning on point clouds. IEEE International Conference on Robotics and Automation (ICRA), 2024.
  84. Udiff: Generating conditional unsigned distance fields with optimal wavelet diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
  85. Deep fashion3d: A dataset and benchmark for 3d garment reconstruction from single images. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16, pages 512–530. Springer, 2020.
  86. Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers. arXiv preprint arXiv:2312.09147, 2023.
Authors (3)
  1. Junsheng Zhou (28 papers)
  2. Weiqi Zhang (21 papers)
  3. Yu-Shen Liu (79 papers)
Citations (7)

Summary

Analysis of "DiffGS: Functional Gaussian Splatting Diffusion"

The paper "DiffGS: Functional Gaussian Splatting Diffusion" presents an advanced approach to the generation of 3D Gaussian Splatting (3DGS), a representation known for its real-time rendering capabilities and potential for high-fidelity visual output. The authors propose DiffGS, a novel model that leverages latent diffusion to address the inherent challenges of unstructured and discrete Gaussian Splatting.

Core Contributions

  1. Functional Representation: DiffGS introduces a unique method of representing Gaussian Splatting through three disentangled functions: the Gaussian Probability Function (GauPF), Gaussian Color Function (GauCF), and Gaussian Transform Function (GauTF). This representation allows for continuous modeling of 3DGS, overcoming the limitations posed by its discrete nature.
  2. Generative Framework: The authors propose a Gaussian Variational Auto-Encoder (VAE) coupled with a Latent Diffusion Model (LDM) to generate these continuous functions. The VAE encodes 3DGS into latent vectors, while the LDM learns to generate new 3D shapes in this latent space, enabling both unconditional and conditional generation.
  3. Discretization Algorithm: An innovative octree-guided sampling and optimization algorithm is introduced. This method allows for efficient geometry extraction from generated Gaussian probabilities, providing a scalable way to generate Gaussians at arbitrary resolutions.
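The octree-guided extraction in the third contribution can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: `gau_pf`, `gau_cf`, and `gau_tf` are toy analytic stand-ins for the learned neural fields (here modeling Gaussians concentrated on a unit sphere), and the depth limits and probability threshold are arbitrary choices for the example.

```python
import numpy as np

# Toy analytic stand-ins for the three learned fields (the paper's
# GauPF / GauCF / GauTF are neural networks; these are illustrative only).
def gau_pf(p):                      # Gaussian Probability Function
    return np.exp(-8.0 * abs(np.linalg.norm(p) - 1.0))

def gau_cf(p):                      # Gaussian Color Function (RGB in [0,1])
    return 0.5 * (p / max(np.linalg.norm(p), 1e-8) + 1.0)

def gau_tf(p):                      # Gaussian Transform Function
    # Placeholder (scale, rotation quaternion, opacity) attributes.
    return np.full(3, 0.05), np.array([1.0, 0.0, 0.0, 0.0]), gau_pf(p)

def octree_sample(center, half, depth, max_depth, thresh, out):
    """Subdivide cells toward high-probability regions; emit leaf centers."""
    # Subdivide unconditionally near the root so thin structures are not
    # pruned before any cell center lands close to them.
    if depth >= 3 and gau_pf(center) < thresh:
        return
    if depth == max_depth:
        out.append(center)
        return
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                child = center + np.array([dx, dy, dz]) * half
                octree_sample(child, half / 2.0, depth + 1,
                              max_depth, thresh, out)

# Extract candidate Gaussian positions, then query the attribute fields
# at each position to assemble full Gaussian primitives.
positions = []
octree_sample(np.zeros(3), 1.0, 0, 5, 0.05, positions)
gaussians = [(p, gau_cf(p), *gau_tf(p)) for p in positions]
```

In the paper's full pipeline, the sampled positions would additionally be optimized against the generated GauPF before attribute queries, since the discretization algorithm couples octree sampling with optimization; deepening the octree yields more Gaussians, matching the "arbitrary numbers" property described above.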

Empirical Evaluation

The researchers demonstrate the efficacy of DiffGS across several tasks:

  • Unconditional Generation: Tested on ShapeNet's airplane and chair classes, DiffGS surpasses existing methods such as GET3D and DiffTF on both the FID and KID metrics.
  • Conditional Generation: The model shows strong results in conditional generation based on text, images, and partial 3DGS inputs, further illustrating its versatility and applicability in different contexts.
  • Point-to-Gaussian Generation: Tested on ShapeNet and DeepFashion3D datasets, DiffGS effectively translates point cloud data into high-quality Gaussian primitives.

Implications and Speculation

The functional approach of DiffGS opens new avenues in 3D content generation by fostering more flexible and efficient modeling. Practically, this could enhance tools for virtual reality, game development, and film production, where real-time rendering and high-quality visualization are critical. Theoretically, the disentangled functional representation might inspire further research in continuous modeling techniques for inherently discrete data.

Considering future applications in AI and related fields, DiffGS's method of leveraging diffusion models could pave the way for more adaptive and robust 3D generative frameworks. The seamless integration with existing 2D and 3D data generators could also enhance cross-domain synthesis capabilities.

Conclusion

DiffGS offers a substantial contribution to the field of 3D generative modeling. By efficiently marrying the strengths of diffusion models with a novel representational schema, it sets a strong precedent for future research and application in graphics and beyond. The model's ability to accommodate varying granularities of Gaussian primitives without sacrificing quality or computational efficiency makes it a robust tool for both academic exploration and practical deployment.
