Text-to-3D Shape Generation (2403.13289v1)

Published 20 Mar 2024 in cs.CV

Abstract: Recent years have seen an explosion of work and interest in text-to-3D shape generation. Much of the progress is driven by advances in 3D representations, large-scale pretraining and representation learning for text and image data enabling generative AI models, and differentiable rendering. Computational systems that can perform text-to-3D shape generation have captivated the popular imagination as they enable non-expert users to easily create 3D content directly from text. However, there are still many limitations and challenges remaining in this problem space. In this state-of-the-art report, we provide a survey of the underlying technology and methods enabling text-to-3D shape generation to summarize the background literature. We then derive a systematic categorization of recent work on text-to-3D shape generation based on the type of supervision data required. Finally, we discuss limitations of the existing categories of methods, and delineate promising directions for future work.
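To make concrete how a frozen 2D generative prior and differentiable rendering can be combined for text-to-3D generation without 3D supervision, the following is a minimal, self-contained PyTorch sketch of a score-distillation-style optimization loop. It is illustrative only and not code from the paper or any surveyed method: the ToyDenoiser class, the learnable-image stand-in for a 3D scene, the simplified noising schedule, and all hyperparameters are hypothetical placeholders for a real pretrained text-to-image diffusion model and a real NeRF, mesh, or Gaussian-splat renderer.

    import torch
    import torch.nn as nn

    class ToyDenoiser(nn.Module):
        """Hypothetical stand-in for a frozen, pretrained text-conditioned denoiser."""
        def __init__(self, channels: int = 3):
            super().__init__()
            self.net = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, noisy, t, text_emb):
            # A real model would also condition on the timestep t and the text embedding.
            return self.net(noisy)

    # "Scene parameters": here just a learnable image; a real system would instead
    # optimize a NeRF / mesh / Gaussian-splat scene and render it from random cameras.
    scene = nn.Parameter(torch.zeros(1, 3, 64, 64))
    denoiser = ToyDenoiser().requires_grad_(False)   # the 2D prior stays frozen
    optimizer = torch.optim.Adam([scene], lr=1e-2)
    text_emb = torch.zeros(1, 16)                    # placeholder text conditioning

    for step in range(200):
        rendered = torch.sigmoid(scene)              # trivial "differentiable render"
        t = torch.rand(1)                            # random noise level in (0, 1)
        noise = torch.randn_like(rendered)
        noisy = (1 - t).view(1, 1, 1, 1) * rendered + t.view(1, 1, 1, 1) * noise
        with torch.no_grad():
            pred_noise = denoiser(noisy, t, text_emb)
        # Score-distillation-style surrogate loss: the residual (pred_noise - noise)
        # acts as a gradient on the rendering and is not backpropagated through
        # the frozen denoiser.
        loss = ((pred_noise - noise).detach() * rendered).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The design choice this sketch illustrates is that gradients flow only through the differentiable rendering into the scene parameters, while the pretrained 2D prior remains frozen and supplies guidance via its noise-prediction residual.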

Authors (3)
  1. Han-Hung Lee (6 papers)
  2. Manolis Savva (64 papers)
  3. Angel X. Chang (58 papers)
Citations (6)
