DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation (2309.16653v2)

Published 28 Sep 2023 in cs.CV

Abstract: Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods.

Authors (5)
  1. Jiaxiang Tang
  2. Jiawei Ren
  3. Hang Zhou
  4. Ziwei Liu
  5. Gang Zeng
Citations (443)

Summary

An Analysis of DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Recent advances in 3D content creation have shifted toward optimization-based generation, particularly methods that leverage score distillation sampling (SDS). Despite promising results, these methods are often hindered by slow per-sample optimization, which constrains their practical use. The paper "DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation" introduces a framework aimed at resolving this efficiency bottleneck without compromising quality.
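To make the bottleneck concrete, below is a minimal sketch of a single SDS update in PyTorch. The name `diffusion_prior` is a hypothetical stand-in for a pretrained 2D diffusion model's noise predictor (not DreamGaussian's actual API), and the timestep range and weighting are common choices rather than the paper's exact settings. An optimization run repeats a step like this for every sampled viewpoint, which is where the per-sample cost accumulates.

```python
import torch

def sds_loss(image, diffusion_prior, alphas_cumprod):
    """One score distillation sampling (SDS) step on a rendered image.

    `diffusion_prior(noisy_image, t)` stands in for a pretrained 2D diffusion
    model's noise predictor; `alphas_cumprod` is its noise schedule.
    """
    t = torch.randint(20, 980, (1,)).item()            # random diffusion timestep
    noise = torch.randn_like(image)
    a = alphas_cumprod[t]
    noisy = a.sqrt() * image + (1 - a).sqrt() * noise  # forward diffusion q(x_t | x_0)

    with torch.no_grad():
        pred_noise = diffusion_prior(noisy, t)         # prior's guess at the injected noise

    w = 1.0 - a                                        # a common SDS weighting choice
    grad = w * (pred_noise - noise)                    # SDS gradient w.r.t. the image
    # Surrogate loss: its gradient w.r.t. `image` equals `grad`, which then
    # backpropagates through the differentiable renderer into the 3D parameters.
    return (grad.detach() * image).sum()
```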

Summary of Contributions

The core contribution of the paper is DreamGaussian, a generative 3D content creation framework. Its primary innovation is a generative 3D Gaussian Splatting model coupled with mesh extraction and UV-space texture refinement, which together improve both the efficiency and the quality of 3D content generation. The paper critiques the occupancy pruning used in Neural Radiance Fields (NeRF) pipelines and shows that progressive densification of 3D Gaussians converges significantly faster on generative tasks.
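For intuition, here is a minimal sketch of the clone-and-split densification heuristic from the original 3D Gaussian Splatting work (Kerbl et al., 2023), which DreamGaussian adapts for generation; the parameterization, thresholds, and shrink factor below are illustrative assumptions, not the paper's exact values.

```python
import torch

def densify(positions, scales, grad_accum, grad_thresh=0.01, scale_thresh=0.05):
    """Clone small Gaussians and split large ones where the accumulated
    positional gradient is high (i.e., the region is under-reconstructed)."""
    needs_densify = grad_accum.norm(dim=-1) > grad_thresh
    is_large = scales.max(dim=-1).values > scale_thresh

    # Clone: duplicate small, under-reconstructed Gaussians in place.
    clone_mask = needs_densify & ~is_large
    clone_pos, clone_scale = positions[clone_mask], scales[clone_mask]

    # Split: replace each large Gaussian with two smaller offset children.
    split_mask = needs_densify & is_large
    n_split = int(split_mask.sum())
    offsets = torch.randn(n_split, 3) * scales[split_mask]
    split_pos = torch.cat([positions[split_mask] + offsets,
                           positions[split_mask] - offsets])
    split_scale = scales[split_mask].repeat(2, 1) / 1.6   # shrink the children

    keep = ~split_mask                                    # split parents are removed
    positions = torch.cat([positions[keep], clone_pos, split_pos])
    scales = torch.cat([scales[keep], clone_scale, split_scale])
    return positions, scales
```

Density-driven growth of this kind allocates capacity only where the gradients demand it, which is the convergence advantage the paper highlights over NeRF-style occupancy pruning.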

One of DreamGaussian's most notable results is its ability to produce a high-quality textured mesh from a single-view image in approximately 2 minutes, roughly a tenfold speedup over existing methods. This efficiency stems from several key design choices:

  1. 3D Gaussian Splatting: Representing the scene as a set of 3D Gaussians simplifies the optimization process, reducing the time and computational resources typically required by NeRF-style volumetric rendering.
  2. Mesh Extraction and Refinement: An efficient algorithm converts the optimized 3D Gaussians into a textured mesh, followed by a UV-space refinement stage that significantly enhances texture detail and prepares the asset for downstream applications (a minimal sketch of the extraction step follows this list).
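As a rough illustration of the extraction step, the sketch below evaluates an opacity-weighted density field from the Gaussians on a dense grid and runs Marching Cubes. DreamGaussian itself uses a more efficient block-wise local density query; the isotropic Gaussians, dense grid, and iso-level here are simplifying assumptions for clarity.

```python
import torch
from skimage import measure  # Marching Cubes implementation

def gaussians_to_mesh(positions, scales, opacities, grid_res=64, iso=0.5):
    """Extract a mesh from isotropic 3D Gaussians via a dense density query.

    positions: (N, 3) Gaussian centers in [-1, 1]^3
    scales:    (N,)   isotropic standard deviations
    opacities: (N,)   per-Gaussian opacity
    """
    lin = torch.linspace(-1, 1, grid_res)
    grid = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)
    pts = grid.reshape(-1, 3)                              # (R^3, 3) query points

    # Sum each Gaussian's contribution: opacity * exp(-||x - mu||^2 / (2 s^2)).
    d2 = torch.cdist(pts, positions) ** 2                  # (R^3, N) squared distances
    density = (opacities[None, :]
               * torch.exp(-d2 / (2 * scales[None, :] ** 2))).sum(-1)
    field = density.reshape(grid_res, grid_res, grid_res).numpy()

    # iso must lie within the field's value range for Marching Cubes to succeed.
    verts, faces, _, _ = measure.marching_cubes(field, level=iso)
    return verts, faces
```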

These contributions mark a notable shift in strategy for 3D generative modeling, alleviating the speed constraints of earlier optimization-based methods.

Implications and Future Directions

The implications of this framework are multifaceted, affecting both practical applications and theoretical understanding within the field. Practically, the reduction in optimization time and computational overhead opens new avenues for deploying 3D content generation in real-world applications, such as gaming and virtual reality, where rapid asset creation is valuable.

Theoretically, the adoption of Gaussian splatting as an alternative to traditional occupancy-based methods could inspire further research into optimizing 3D representations for generative tasks. This shift could lead to a reevaluation of how spatial information is modeled and reconstructed, possibly reducing the reliance on extensive 3D datasets that are currently a bottleneck due to their resource-intensive nature.

Looking ahead, the framework could plausibly be extended to more complex 3D environments and real-time applications. Integrating DreamGaussian with other emerging technologies, such as reinforcement learning or real-time 3D rendering engines, could further broaden its application scope.

Conclusion

DreamGaussian represents a notable advancement in the field of 3D content creation, providing a highly efficient solution for generating 3D assets with competitive quality. By moving towards a Gaussian-based framework, the paper opens pathways for further research into efficient and scalable 3D generation techniques. This work suggests a promising future for optimization-based methods in unlocking rapid and high-quality 3D content generation for broader industry applications.
