LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes (2311.13384v2)

Published 22 Nov 2023 in cs.CV

Abstract: With the widespread use of VR devices and content, demand for 3D scene generation techniques is growing. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily because they are trained on 3D scan datasets that are far removed from the real world. To address this limitation, we propose LucidDreamer, a domain-free scene generation pipeline that fully leverages the power of existing large-scale diffusion-based generative models. LucidDreamer alternates between two steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we use the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting with the generative model. The inpainted images are lifted into 3D space using estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an alignment algorithm that harmoniously integrates the newly generated portions of the 3D scene. The resulting 3D scene serves as the initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene. Project page: https://luciddreamer-cvlab.github.io/

References (59)
  1. Learning representations and generative models for 3D point clouds. In ICML, 2018.
  2. ZoeDepth: Zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288, 2023.
  3. pi-GAN: Periodic implicit generative adversarial networks for 3D-aware image synthesis. In CVPR, 2021.
  4. Efficient geometry-aware 3D generative adversarial networks. In CVPR, 2022.
  5. TensoRF: Tensorial radiance fields. In ECCV, 2022.
  6. Single-stage diffusion NeRF: A unified approach to 3D generation and reconstruction. arXiv preprint arXiv:2304.06714, 2023.
  7. MobileNeRF: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures. In CVPR, 2023.
  8. Set-the-Scene: Global-local training for generating controllable NeRF scenes. arXiv preprint arXiv:2303.13450, 2023.
  9. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In CVPR, 2017.
  10. CvxNet: Learnable convex decomposition. In CVPR, 2020.
  11. From data to functa: Your data point is a function and you can treat it like one. arXiv preprint arXiv:2201.12204, 2022.
  12. Extended Bayesian information criteria for Gaussian graphical models. In NeurIPS, 2010.
  13. SceneScape: Text-driven consistent scene generation. arXiv preprint arXiv:2302.01133, 2023.
  14. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.
  15. FastNeRF: High-fidelity neural rendering at 200fps. In ICCV, 2021.
  16. Learning shape templates with structured implicit functions. In ICCV, 2019.
  17. Generative adversarial nets. In NeurIPS, 2014.
  18. CLIPScore: A reference-free evaluation metric for image captioning. In EMNLP, 2021.
  19. Denoising diffusion probabilistic models. In NeurIPS, 2020.
  20. 3D Gaussian splatting for real-time radiance field rendering. ACM TOG, 2023.
  21. RGBD2: Generative scene synthesis via incremental view inpainting using RGBD diffusion models. In CVPR, 2023.
  22. LAVIS: A one-stop library for language-vision intelligence. In ACL, 2023.
  23. Neural sparse voxel fields. In NeurIPS, 2020.
  24. Diffusion probabilistic models for 3D point cloud generation. In CVPR, 2021.
  25. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 2021.
  26. Real-time neural radiance caching for path tracing. arXiv preprint arXiv:2106.12372, 2021.
  27. Instant neural graphics primitives with a multiresolution hash encoding. ACM TOG, 2022.
  28. HoloGAN: Unsupervised learning of 3D representations from natural images. In ICCV, 2019.
  29. BlockGAN: Learning 3D object-aware scene representations from unlabelled images. In NeurIPS, 2020.
  30. GIRAFFE: Representing scenes as compositional generative neural feature fields. In CVPR, 2021.
  31. DeepSDF: Learning continuous signed distance functions for shape representation. In CVPR, 2019.
  32. Superquadrics revisited: Learning 3D shape parsing beyond cuboids. In CVPR, 2019.
  33. DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988, 2022.
  34. Magic123: One image to high-quality 3D object generation using both 2D and 3D diffusion priors. arXiv preprint arXiv:2306.17843, 2023.
  35. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  36. Zero-shot text-to-image generation. In ICML, 2021.
  37. KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs. In ICCV, 2021.
  38. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
  39. Structure-from-motion revisited. In CVPR, 2016.
  40. Pixelwise view selection for unstructured multi-view stereo. In ECCV, 2016.
  41. GRAF: Generative radiance fields for 3D-aware image synthesis. In NeurIPS, 2020.
  42. 3D point cloud generative adversarial network based on tree structured graph convolutions. In ICCV, 2019.
  43. 3D neural field generation using triplane diffusion. In CVPR, 2023.
  44. Indoor segmentation and support inference from RGBD images. In ECCV, 2012.
  45. Implicit neural representations with periodic activation functions. In NeurIPS, 2020.
  46. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
  47. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In CVPR, 2022.
  48. Neural geometric level of detail: Real-time rendering with implicit 3D shapes. In CVPR, 2021.
  49. DreamGaussian: Generative Gaussian splatting for efficient 3D content creation. arXiv preprint arXiv:2309.16653, 2023.
  50. MVDiffusion: Enabling holistic multi-view image generation with correspondence-aware diffusion. arXiv preprint arXiv:2307.01097, 2023.
  51. Learning shape abstractions by assembling volumetric primitives. In CVPR, 2017.
  52. Exploring CLIP for assessing the look and feel of images. In AAAI, 2023.
  53. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In NeurIPS, 2016.
  54. Point-NeRF: Point-based neural radiance fields. In CVPR, 2022.
  55. 3DIAS: 3D shape reconstruction with implicit algebraic surfaces. In ICCV, 2021.
  56. Generative neural fields by mixtures of neural implicit functions. arXiv preprint arXiv:2310.19464, 2023.
  57. PlenOctrees for real-time rendering of neural radiance fields. In ICCV, 2021.
  58. LION: Latent point diffusion models for 3D shape generation. arXiv preprint arXiv:2210.06978, 2022.
  59. 3D shape generation and completion through point-voxel diffusion. In ICCV, 2021.
Authors (5)
  1. Jaeyoung Chung (8 papers)
  2. Suyoung Lee (13 papers)
  3. Hyeongjin Nam (8 papers)
  4. Jaerin Lee (6 papers)
  5. Kyoung Mu Lee (107 papers)
Citations (68)

Summary

Overview of "LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes"

In the paper titled "LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes," the authors present an approach to 3D scene generation aimed at overcoming the limitations imposed by reliance on 3D scan training data, which restricts the diversity and quality of generated scenes. The paper introduces a pipeline named LucidDreamer, which combines Stable Diffusion with 3D Gaussian splatting to produce high-quality, diverse 3D scenes from various input types, including text, RGB images, and RGBD images.

Key Contributions

  1. Domain-Free Generation: LucidDreamer allows for generating high-quality 3D scenes without constraints on the domain. This is a notable advancement as it supports the generation of multi-view consistent images across various styles, such as realistic, anime, and Lego.
  2. Pipeline Methodology: The authors introduce a two-step pipeline, Dreaming and Alignment, applied in alternation. The Dreaming step generates geometrically consistent images, while the Alignment step integrates the resulting geometry seamlessly into a unified 3D scene (see the sketch after this list). This iterative approach leads to highly detailed and realistic outputs.
  3. Flexible Input Handling: LucidDreamer accommodates a range of input types and conditions, demonstrating its flexibility and utility across different scenarios. This includes the ability to modify input conditions dynamically during the generation process.
  4. Gaussian Splatting Optimization: The use of 3D Gaussian splatting enables the rendering of photo-realistic scenes by filling voids in point clouds with a continuous representation. This method enhances scene realism and addresses depth discrepancies typically observed in traditional representations.
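
Taken together, the two steps form a simple alternation: dream a new view, lift it to 3D, and align the new points into the scene before moving to the next view. The following is a minimal schematic of that loop, not the authors' implementation; all five helper callables are hypothetical stand-ins for components the paper names (point-cloud projection, Stable Diffusion inpainting, monocular depth estimation, unprojection, and alignment).

```python
def lucid_dreamer_loop(initial_rgbd, cameras,
                       project, inpaint, estimate_depth, lift_to_3d, align):
    """Alternate Dreaming and Alignment over a camera trajectory.

    All five callables are hypothetical stand-ins for the paper's
    components; only the control flow is illustrated here.
    """
    rgb, depth = initial_rgbd
    point_cloud = lift_to_3d(rgb, depth, cameras[0])   # seed the scene
    for cam in cameras[1:]:
        # Dreaming: render the partial cloud into the new view; the
        # mask marks pixels with no geometry behind them.
        partial_rgb, mask = project(point_cloud, cam)
        full_rgb = inpaint(partial_rgb, mask)       # diffusion inpainting
        new_depth = estimate_depth(full_rgb)        # lifting needs depth
        new_points = lift_to_3d(full_rgb, new_depth, cam)
        # Alignment: merge the new partial cloud so that overlapping
        # regions agree before the next view is dreamed.
        point_cloud = align(point_cloud, new_points)
    return point_cloud  # later used to initialize the Gaussian splats
```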

Technical Insights

LucidDreamer capitalizes on pre-trained models for inpainting and depth estimation, rather than training from scratch, which enhances generalization capabilities. The initial point cloud is constructed by lifting pixels from input RGBD images into 3D space. This setup is progressively refined through image generation (via Stable Diffusion) and depth estimation, allowing for cohesive 3D modeling.
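
The lifting step described above is standard pinhole back-projection. Below is a minimal NumPy sketch under the usual assumptions (a 3x3 intrinsics matrix `K` and a 4x4 camera-to-world pose `c2w`, neither of which the summary specifies):

```python
import numpy as np

def unproject_rgbd(rgb, depth, K, c2w):
    """Lift every pixel of an RGBD image to a colored 3D point.

    rgb:   (H, W, 3) colors; depth: (H, W) metric depth per pixel.
    K:     (3, 3) pinhole intrinsics; c2w: (4, 4) camera-to-world pose.
    Returns (H*W, 3) world-space points and their (H*W, 3) colors.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Back-project through the inverse intrinsics, then scale by depth.
    pts_cam = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Homogeneous transform from camera to world coordinates.
    pts_h = np.concatenate([pts_cam, np.ones((H * W, 1))], axis=1)
    return (pts_h @ c2w.T)[:, :3], rgb.reshape(-1, 3)
```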

Subsequently, 3D Gaussian splatting is used to optimize the scene: the aggregated point cloud initializes the Gaussians, and images re-projected from it serve as ground truth for photometric supervision. This process handles depth discrepancies and incomplete image regions, culminating in a high-fidelity 3D rendering.
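
Conceptually, this stage is a standard differentiable-rendering optimization. The PyTorch sketch below illustrates the idea only: `render` and the structure of `gaussians` are hypothetical stand-ins, and a plain L1 photometric loss replaces whatever objective the authors use (the original 3D Gaussian splatting method combines L1 with a D-SSIM term and adaptively densifies Gaussians, both omitted here).

```python
import torch

def optimize_splats(gaussians, render, views, n_iters=3000, lr=1e-2):
    """Fit Gaussian parameters to re-projected ground-truth images.

    gaussians: dict of learnable tensors (positions, scales, rotations,
               opacities, colors), initialized from the point cloud.
    render:    differentiable rasterizer, (gaussians, camera) -> image.
    views:     list of (camera, ground_truth_image) pairs.
    """
    params = [p.requires_grad_(True) for p in gaussians.values()]
    opt = torch.optim.Adam(params, lr=lr)
    for step in range(n_iters):
        cam, gt = views[step % len(views)]   # cycle through the views
        pred = render(gaussians, cam)        # differentiable rendering
        loss = (pred - gt).abs().mean()      # L1 photometric loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return gaussians
```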

Empirical Results

The empirical evaluations demonstrate that LucidDreamer produces more realistic and visually pleasing scenes than existing models such as RGBD2. The authors present qualitative and quantitative results across diverse datasets, underscoring the pipeline's ability to maintain visual consistency and to adapt to different input domains.
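
The summary does not name the metrics used. The reference list does include CLIPScore, a reference-free measure of image-text agreement, so a plausible quantitative check would score rendered views against the conditioning prompt. A minimal sketch using the torchmetrics implementation (an assumption; the authors may compute the metric differently):

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# Higher CLIPScore means the rendered view agrees better with the prompt.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# A rendered view as an 8-bit (C, H, W) tensor; random here as a placeholder.
rendered_view = torch.randint(0, 255, (3, 224, 224), dtype=torch.uint8)
score = metric(rendered_view, "a cozy living room in anime style")
print(float(score))
```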

Future Implications

This research suggests significant potential for applications in virtual reality (VR), gaming, and simulation environments where domain-free and high-fidelity 3D scene generation is beneficial. The flexibility and generalization capabilities inherent in LucidDreamer could facilitate more personalized and adaptable digital content creation.

Conclusion

LucidDreamer represents a robust methodology for 3D scene generation, offering flexibility across domains and input types while delivering high-quality results. It addresses current limitations in traditional 3D scene modeling by integrating Gaussian splatting with state-of-the-art diffusion models. Future work could explore improvements in rendering efficiency and extend the method's applications across interdisciplinary fields.
