
Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication (2405.18515v2)

Published 28 May 2024 in cs.LG

Abstract: Existing diffusion-based text-to-3D generation methods primarily focus on producing visually realistic shapes and appearances, often neglecting the physical constraints necessary for downstream tasks. Generated models frequently fail to maintain balance when placed in physics-based simulations or 3D printed. This balance is crucial for satisfying user design intentions in interactive gaming, embodied AI, and robotics, where stable models are needed for reliable interaction. Additionally, stable models ensure that 3D-printed objects, such as figurines for home decoration, can stand on their own without requiring additional supports. To fill this gap, we introduce Atlas3D, an automatic and easy-to-implement method that enhances existing Score Distillation Sampling (SDS)-based text-to-3D tools. Atlas3D ensures the generation of self-supporting 3D models that adhere to physical laws of stability under gravity, contact, and friction. Our approach combines a novel differentiable simulation-based loss function with physically inspired regularization, serving as either a refinement or a post-processing module for existing frameworks. We verify Atlas3D's efficacy through extensive generation tasks and validate the resulting 3D models in both simulated and real-world environments.
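To make the notion of a "self-supporting" model concrete, here is a minimal sketch of the classical static balance test: a rigid object resting on flat ground stays upright if the vertical projection of its center of mass falls inside the convex hull of its ground contact points. This is not Atlas3D's differentiable simulation-based loss, which the abstract does not spell out; it is a non-differentiable check under simplifying assumptions. The function names, the `contact_tol` parameter, and the treatment of mesh vertices as equal point masses are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def convex_hull(points: List[Point2]) -> List[Point2]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point2, a: Point2, b: Point2) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point2] = []
    upper: List[Point2] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def is_statically_stable(vertices: List[Point3], contact_tol: float = 1e-3) -> bool:
    """True if the projected centroid lies strictly inside the support polygon.

    Assumptions (illustrative only): gravity points along -z, the ground is the
    plane through the lowest vertex, and every vertex carries equal mass.
    """
    n = len(vertices)
    cx = sum(v[0] for v in vertices) / n
    cy = sum(v[1] for v in vertices) / n

    # Vertices within contact_tol of the lowest point count as ground contacts.
    z_min = min(v[2] for v in vertices)
    contacts = [(v[0], v[1]) for v in vertices if v[2] - z_min <= contact_tol]

    hull = convex_hull(contacts)
    if len(hull) < 3:
        return False  # point or line contact gives no supporting area

    # The centroid must lie to the left of every counter-clockwise hull edge.
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        if (b[0] - a[0]) * (cy - a[1]) - (b[1] - a[1]) * (cx - a[0]) <= 0:
            return False
    return True
```

Calling `is_statically_stable` on the eight corners of a unit cube returns `True`, while shearing the top face far enough sideways that the centroid leaves the base returns `False`. A test of this kind only reports whether a shape balances; it provides no gradient with which to improve the shape during generation.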

