
ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation (2405.10508v1)

Published 17 May 2024 in cs.CV

Abstract: In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate a point cloud map, addressing domain differences. Additionally, we propose a depth consistency module to enhance 3D scene consistency. Finally, the 3D scene serves as initial points for optimizing Gaussian splats. Experimental results demonstrate ART3D's superior performance in both content and structural consistency metrics when compared to existing methods. ART3D significantly advances the field of AI in art creation by providing an innovative solution for generating high-quality 3D artistic scenes.

Authors (4)
  1. Pengzhi Li (7 papers)
  2. Chengshuai Tang (2 papers)
  3. Zhiheng Li (67 papers)
  4. Qinxuan Huang (2 papers)
Citations (9)

Summary

ART3D: Generating 3D Artistic Scenes Using AI

Let's dive into the world of AI-driven art with ART3D, a framework that merges diffusion models and 3D Gaussian splatting to generate 3D artistic scenes from text descriptions or reference images. The paper tackles several prevailing challenges in 3D art generation, presenting a solution that is both creative and technically strong.

The Core Innovation

ART3D stands out because it effectively bridges the gap between artistic and realistic images, making 3D art generation more consistent. Here's a glance at its components:

  • Diffusion Models: These are powerful tools for complex data modeling, often used in 2D art generation.
  • 3D Gaussian Splatting: This technique allows for fast and high-quality reconstruction of 3D scenes.

By combining these methodologies, ART3D produces 3D scenes that are stylistically consistent and visually appealing.

Key Components of ART3D

1. Image Semantic Transfer

One main challenge lies in generating realistic images from artistic styles. ART3D tackles this through an image semantic transfer algorithm:

  • Uses the attention mechanism of the Stable Diffusion model.
  • Ensures the semantic layout of realistic images aligns closely with the artistic ones.
  • Generates depth maps from these realistic images, bridging the artistic and realistic domain gap.
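The paper's transfer algorithm operates inside Stable Diffusion's attention layers. Its core mechanism, reusing attention weights from one generation branch in another so the semantic layout carries over, can be sketched in a few lines. The toy `attention` helper and two-branch setup below are illustrative, not the paper's implementation:

```python
import numpy as np

def attention(q, k, v, shared_probs=None):
    """Scaled dot-product attention. If shared_probs is supplied, those
    weights are reused instead of being recomputed from q and k -- this is
    how a realistic branch can inherit an artistic branch's spatial layout."""
    if shared_probs is None:
        scores = q @ k.T / np.sqrt(q.shape[-1])
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        shared_probs = e / e.sum(axis=-1, keepdims=True)  # softmax rows
    return shared_probs @ v, shared_probs

rng = np.random.default_rng(0)
q_art, k_art = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
v_art, v_real = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))

# The artistic branch computes its own attention; the realistic branch
# reuses those weights, so both attend to the same layout.
out_art, probs = attention(q_art, k_art, v_art)
out_real, _ = attention(None, None, v_real, shared_probs=probs)
```

Because only the values differ between branches, content changes while the attention pattern (and hence the layout) stays fixed.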

2. Point Cloud Map

To create 3D point clouds from the generated depth information, ART3D:

  • Projects depth pixels onto 3D space.
  • Reprojects these points to novel camera views.
  • Utilizes inpainting techniques to complete hollow areas in the projected images.
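The projection step is standard pinhole unprojection. A minimal sketch, where the intrinsics `fx, fy, cx, cy` are assumed values rather than anything taken from the paper:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Unproject a depth map of shape (H, W) into an (H*W, 3) point cloud
    using a pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx  # back-project along the image x axis
    y = (v - cy) * depth / fy  # back-project along the image y axis
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# A flat depth map at z = 1 with the principal point at the image center
pts = depth_to_pointcloud(np.ones((2, 2)), fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Reprojecting these points into a novel view is the inverse operation after applying the new camera's pose; pixels with no projected point are the "hollow areas" that the inpainting step fills.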

3. Depth Consistency Module

Consistency across different views is critical:

  • Introduces a depth consistency module that learns depth residuals to align depth maps from different viewpoints.
  • Ensures a unified depth range, improving the overall consistency of the 3D scene.
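The module in the paper is learned. As a simplified, non-learned stand-in, the same goal, bringing a new view's depth into the reference view's range, can be illustrated with a least-squares scale-and-shift fit over the overlapping pixels:

```python
import numpy as np

def align_depth(d_new, d_ref, mask):
    """Fit d_ref ~ s * d_new + t on the overlap (mask), then apply the
    correction to the whole new depth map."""
    a, b = d_new[mask], d_ref[mask]
    A = np.stack([a, np.ones_like(a)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, b, rcond=None)
    return s * d_new + t

d_new = np.array([[1.0, 2.0], [3.0, 4.0]])
d_ref = 2.0 * d_new + 1.0                       # reference differs by scale/shift
mask = np.array([[True, True], [True, False]])  # overlapping pixels only
aligned = align_depth(d_new, d_ref, mask)
```

A learned residual predictor can correct locally varying errors that a single global scale and shift cannot, which is the motivation for the paper's module.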

4. 3D Gaussian Splatting for Rendering

Finally, ART3D employs 3D Gaussian splatting to render the scene: the point cloud serves as the initialization, and each Gaussian's position, scale, opacity, and color are then optimized during training.
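Using the point cloud as initial Gaussians follows the standard 3D Gaussian splatting recipe. Below is a sketch of the initialization; the nearest-neighbour scale heuristic is a common choice in splatting pipelines, and the paper's exact scheme is not reproduced here:

```python
import numpy as np

def init_gaussians(points, colors):
    """Turn an (N, 3) point cloud into initial per-point Gaussian
    parameters: position, color, isotropic scale, identity rotation
    (as a quaternion), and full opacity."""
    n = len(points)
    # Mean nearest-neighbour distance as a scale heuristic
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    scale = float(d.min(axis=1).mean())
    return {
        "xyz": points,                                  # optimized positions
        "rgb": colors,
        "scale": np.full((n, 3), scale),
        "rot": np.tile([1.0, 0.0, 0.0, 0.0], (n, 1)),   # identity quaternion
        "opacity": np.ones((n, 1)),
    }

g = init_gaussians(np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
                   np.ones((2, 3)))
```

All of these parameters are then refined by gradient descent against rendered views, with the point-cloud initialization giving the optimizer a geometrically sensible starting point.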

Performance and Comparisons

Quantitative Results

ART3D excels in both style consistency and continuity metrics. Here's how the scores stack up:

  • CLIP-I (Image Similarity): ART3D achieves a score of 68.15, outperforming other methods like Text2Room (53.44) and LucidDreamer (64.43).
  • CLIP-T (Text Similarity): With a score of 26.81, ART3D again surpasses other approaches.
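Both metrics reduce to cosine similarity between CLIP embeddings (image-image for CLIP-I, image-text for CLIP-T), typically reported on a 0-100 scale. A sketch of the scoring step, with the CLIP encoder itself omitted:

```python
import numpy as np

def clip_score(feat_a, feat_b):
    """Cosine similarity between two embedding vectors, scaled to 0-100.
    The vectors are assumed to come from a CLIP image or text encoder."""
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return 100.0 * float(a @ b)

# Parallel embeddings score 100; unrelated ones score near 0.
s = clip_score(np.array([1.0, 2.0, 2.0]), np.array([2.0, 4.0, 4.0]))
```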

Qualitative Results

Comparing ART3D with other models like LucidDreamer and Text2Room shows that:

  • ART3D produces more continuous and structurally consistent 3D scenes.
  • Handles artistic styles better, avoiding the structural distortions observed in other methods.

User Studies

Participants rated ART3D's outputs highly in terms of structural consistency and content alignment with textual descriptions, achieving top scores in user studies.

Implications and Future Directions

ART3D takes a significant step forward in the fusion of AI and art. Practically, it allows artists and designers to generate intricate 3D scenes with minimal input, potentially revolutionizing fields like virtual reality, game design, and digital art.

Theoretically, this work opens doors to further advancements in AI models that can handle diverse artistic styles and complex scene structures. Future developments could include:

  • Enhancements in the image semantic transfer algorithms.
  • More robust methods for depth consistency across dynamic scenes.
  • Expanded datasets to train AI models for a wider range of artistic styles.

ART3D is a notable contribution to the interdisciplinary field of AI and art, showcasing the potential of merging advanced AI techniques to create stunning and consistent 3D artistic scenes. With further improvement and adoption, it could become a vital tool in digital creative processes.
