Text-to-3D Shape Generation (2403.13289v1)
Abstract: Recent years have seen an explosion of work and interest in text-to-3D shape generation. Much of the progress is driven by advances in 3D representations, large-scale pretraining and representation learning for text and image data enabling generative AI models, and differentiable rendering. Computational systems that can perform text-to-3D shape generation have captivated the popular imagination as they enable non-expert users to easily create 3D content directly from text. However, there are still many limitations and challenges remaining in this problem space. In this state-of-the-art report, we provide a survey of the underlying technology and methods enabling text-to-3D shape generation to summarize the background literature. We then derive a systematic categorization of recent work on text-to-3D shape generation based on the type of supervision data required. Finally, we discuss limitations of the existing categories of methods, and delineate promising directions for future work.
Authors: Han-Hung Lee, Manolis Savva, Angel X. Chang