
GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling (2403.19655v4)

Published 28 Mar 2024 in cs.CV

Abstract: We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling. Existing radiance representations either require an implicit feature decoder, which significantly degrades the modeling power of the representation, or are spatially unstructured, making them difficult to integrate with mainstream 3D diffusion methods. We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting using a fixed number of free Gaussians, and then rearranging these Gaussians into a predefined voxel grid via Optimal Transport. Since GaussianCube is a structured grid representation, it allows us to use a standard 3D U-Net as our backbone in diffusion modeling without elaborate designs. More importantly, the high-accuracy fitting of the Gaussians allows us to achieve a high-quality representation with one to two orders of magnitude fewer parameters than previous structured representations of comparable quality. The compactness of GaussianCube greatly eases the difficulty of 3D generative modeling. Extensive experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D synthesis all show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a highly accurate and versatile radiance representation for 3D generative modeling. Project page: https://gaussiancube.github.io/.


Summary

  • The paper introduces a densification-constrained fitting algorithm that preserves Gaussian expressiveness while enforcing structural constraints for generative modeling.
  • It employs Optimal Transport to arrange Gaussians into a coherent voxel grid, ensuring spatial coherence while minimizing total transport distance.
  • Experimental results on ShapeNet and OmniObject3D demonstrate GaussianCube's efficiency and accuracy in generating semantically rich 3D objects.

Introducing GaussianCube: A Structured Approach for 3D Generative Modeling with Gaussian Splatting

Overview of GaussianCube

In 3D generative modeling, obtaining a structured representation poses a significant challenge, especially given the spatially unstructured nature of approaches such as 3D Gaussian Splatting (GS). GaussianCube bridges this gap: it retains the strengths of GS, namely high-fidelity 3D fitting and efficient rendering, while producing a structured representation suitable for generative modeling. By combining a novel densification-constrained GS fitting algorithm with an arrangement step based on Optimal Transport, GaussianCube organizes scattered Gaussians into a coherent voxel grid without compromising their expressiveness.
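The paper's exact densification rules are more involved, but the core idea of fitting under a fixed Gaussian budget can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function name `densify_with_budget`, the `grad_threshold` value, and the clone-only strategy (a real split would also perturb positions and shrink scales) are all assumptions. High-gradient Gaussians are ranked and duplicated only up to the remaining budget, so the total count never exceeds `n_max`.

```python
import numpy as np

def densify_with_budget(positions, grads, n_max, grad_threshold=2e-4):
    """Illustrative budgeted densification step.

    Gaussians whose accumulated view-space gradient exceeds the
    threshold become densification candidates; only as many as the
    remaining budget allows are cloned, keeping the total <= n_max.
    """
    n = positions.shape[0]
    budget = n_max - n
    if budget <= 0:
        return positions  # budget exhausted: no further densification
    candidates = np.where(grads > grad_threshold)[0]
    # Take the highest-gradient candidates first, capped by the budget.
    order = candidates[np.argsort(-grads[candidates])][:budget]
    clones = positions[order]  # clone; a split would also adjust scale
    return np.concatenate([positions, clones], axis=0)
```

Because the final count is fixed, every fitted object yields the same number of free Gaussians, which is what makes the later grid arrangement possible.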

Key Contributions

  • Densification-Constrained Fitting Algorithm: This methodology ensures high-quality fitting results with a fixed number of free Gaussians, maintaining the expressiveness of the GS fitting while imposing structural constraints to facilitate generative modeling.
  • Optimal Transport for Structured Arrangement: By arranging Gaussians into a predefined voxel grid via Optimal Transport, GaussianCube achieves a spatially coherent structure, optimizing for minimal total transport distances and maximal spatial coherence.
  • Efficient and Expressive 3D Generative Modeling: Using a standard 3D U-Net backbone, GaussianCube supports both unconditional and conditional generation tasks with state-of-the-art quality and efficiency.
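When the number of Gaussians equals the number of grid cells, the Optimal Transport arrangement above reduces to a linear assignment problem. The following is a minimal sketch under that assumption; `arrange_into_grid` and `grid_res` are illustrative names, centers are assumed normalized to the unit cube, and SciPy's generic solver stands in for the specialized Jonker-Volgenant-style solver a full implementation would likely use.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def arrange_into_grid(centers, grid_res):
    """Assign each Gaussian to a unique voxel, minimizing the total
    squared transport distance (a linear assignment problem).

    centers: (N, 3) Gaussian means in [0, 1]^3, with N == grid_res**3.
    Returns an index array mapping grid cells to Gaussian indices.
    """
    # Voxel-center coordinates of the target grid.
    ticks = (np.arange(grid_res) + 0.5) / grid_res
    gx, gy, gz = np.meshgrid(ticks, ticks, ticks, indexing="ij")
    voxels = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)

    # Pairwise squared distances: cost[i, j] = ||center_i - voxel_j||^2.
    cost = ((centers[:, None, :] - voxels[None, :, :]) ** 2).sum(-1)

    # Optimal one-to-one assignment between Gaussians and voxels.
    row, col = linear_sum_assignment(cost)
    perm = np.empty(len(col), dtype=int)
    perm[col] = row  # grid cell j holds Gaussian perm[j]
    return perm
```

The resulting permutation places each Gaussian's parameter vector into exactly one voxel, yielding the structured (channels, D, H, W) tensor that a 3D U-Net can consume directly.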

Experimental Validation

Extensive experiments were conducted on both ShapeNet and OmniObject3D datasets, demonstrating superior qualitative and quantitative generation results compared to existing methods. Notably, GaussianCube showcases its robustness in producing semantically accurate 3D objects with intricate geometries and textures across a variety of classes.

Implications and Future Directions

The introduction of GaussianCube represents a significant advance in structuring Gaussian Splatting for 3D generative modeling. Its ability to provide a coherent and structured representation while retaining the expressiveness and efficiency of GS opens new avenues for research and applications. Future work may explore the adaptability of GaussianCube to other forms of 3D data and its potential integration with different generative frameworks, further expanding the capabilities and applications of 3D generative modeling.

Concluding Remarks: GaussianCube's novel approach to structuring 3D Gaussian Splatting using Optimal Transport for generative modeling addresses a critical gap in the field. By maintaining the expressiveness and efficiency of GS while providing a structured representation, GaussianCube sets a new standard for 3D content creation, offering promising directions for both theoretical advancements and practical applications in AI and 3D modeling.
