A Comprehensive Survey on 3D Content Generation (2402.01166v2)

Published 2 Feb 2024 in cs.CV and cs.AI

Abstract: Recent years have witnessed remarkable advances in artificial intelligence generated content (AIGC) across diverse input modalities, e.g., text, image, video, audio, and 3D. 3D is the visual modality closest to the real-world 3D environment and carries enormous knowledge. 3D content generation has both academic and practical value while also presenting formidable technical challenges. This review aims to consolidate developments within the burgeoning domain of 3D content generation. Specifically, a new taxonomy is proposed that categorizes existing approaches into three types: 3D native generative methods, 2D prior-based 3D generative methods, and hybrid 3D generative methods. The survey covers approximately 60 papers spanning the major techniques. In addition, we discuss the limitations of current 3D content generation techniques and point out open challenges as well as promising directions for future work. Accompanying this survey, we have established a project website where resources on 3D content generation research are provided. The project page is available at https://github.com/hitcslj/Awesome-AIGC-3D.

References (66)
  1. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
  2. 4d-fy: Text-to-4d generation using hybrid score distillation sampling. arXiv preprint arXiv:2311.17984, 2023.
  3. Gaudi: A neural architect for immersive 3d scene generation. NeurIPS, 2022.
  4. Improving image generation with better captions. Computer Science, 2023.
  5. Face recognition based on fitting a 3d morphable model. TPAMI, 2003.
  6. Texfusion: Synthesizing 3d textures with text-guided image diffusion models. In ICCV, 2023.
  7. Text2shape: Generating shapes from natural language by learning joint embeddings. In ACCV, 2019.
  8. Towards efficient and photorealistic 3d human reconstruction: a brief survey. Visual Informatics, 2021.
  9. Sofgan: A portrait image generator with dynamic styling. TOG, 2022.
  10. gDNA: Towards generative detailed neural avatars. In CVPR, 2022.
  11. Scenetex: High-quality texture synthesis for indoor scenes via diffusion priors. arXiv preprint arXiv:2311.17261, 2023.
  12. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. In ICCV, 2023.
  13. Scenedreamer: Unbounded 3d scene generation from 2d image collections. TPAMI, 2023.
  14. Text-to-3d using gaussian splatting. arXiv preprint arXiv:2309.16585, 2023.
  15. Sdfusion: Multimodal 3d shape completion, reconstruction, and generation. In CVPR, 2023.
  16. Luciddreamer: Domain-free generation of 3d gaussian splatting scenes. arXiv preprint arXiv:2311.13384, 2023.
  17. Smplicit: Topology-aware generative model for clothed people. In CVPR, 2021.
  18. Shapecrafter: A recursive text-conditioned 3d shape generation model. NeurIPS, 2022.
  19. Headsculpt: Crafting 3d head avatars with text. In NeurIPS, 2023.
  20. Text2room: Extracting textured 3d meshes from 2d text-to-image models. arXiv preprint arXiv:2303.11989, 2023.
  21. Avatarclip: Zero-shot text-driven generation and animation of 3d avatars. arXiv preprint arXiv:2205.08535, 2022.
  22. Headnerf: A real-time nerf-based parametric head model. In CVPR, 2022.
  23. Lrm: Large reconstruction model for single image to 3d. ICLR, 2024.
  24. Textfield3d: Towards enhancing open-vocabulary 3d generation with noisy text fields. arXiv preprint arXiv:2309.17175, 2023.
  25. Dreamcontrol: Control-based text-to-3d generation with 3d self-prior. arXiv preprint arXiv:2312.06439, 2023.
  26. Humannorm: Learning normal diffusion model for high-quality and realistic 3d human generation. arXiv preprint arXiv:2310.01406, 2023.
  27. Dreamwaltz: Make a scene with complex 3d animatable avatars. arXiv preprint arXiv:2305.12529, 2023.
  28. Shap-e: Generating conditional 3d implicit functions. arXiv preprint arXiv:2305.02463, 2023.
  29. 3d gaussian splatting for real-time radiance field rendering. TOG, 2023.
  30. Neuralfield-ldm: Scene generation with hierarchical latent diffusion models. In CVPR, 2023.
  31. Dreamhuman: Animatable 3d avatars from text. arXiv preprint arXiv:2306.09329, 2023.
  32. Generative ai meets 3d: A survey on text-to-3d in aigc era. arXiv preprint arXiv:2305.06131, 2023.
  33. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214, 2023.
  34. Magic3d: High-resolution text-to-3d content creation. In CVPR, 2023.
  35. Deep learning for procedural content generation. Neural Computing and Applications, 2021.
  36. One-2-3-45++: Fast single image to 3d objects with consistent multi-view generation and 3d diffusion. arXiv preprint arXiv:2311.07885, 2023.
  37. One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. In NeurIPS, 2023.
  38. Zero-1-to-3: Zero-shot one image to 3d object. In ICCV, 2023.
  39. 3dall-e: Integrating text-to-image ai in 3d design workflows. In ACM DIS, 2023.
  40. Humangaussian: Text-driven 3d human generation with gaussian splatting. arXiv preprint arXiv:2311.17061, 2023.
  41. Syncdreamer: Generating multiview-consistent images from a single-view image. ICLR, 2024.
  42. Wonder3d: Single image to 3d using cross-domain diffusion. arXiv preprint arXiv:2310.15008, 2023.
  43. SMPL: A skinned multi-person linear model. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2. 2023.
  44. Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751, 2022.
  45. Dreamfusion: Text-to-3d using 2d diffusion. In ICLR, 2023.
  46. Dreamgaussian4d: Generative 4d gaussian splatting. arXiv preprint arXiv:2312.17142, 2023.
  47. Texture: Text-guided texturing of 3d shapes. In SIGGRAPH, 2023.
  48. PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization. In ICCV, 2019.
  49. SCULPT: Shape-conditioned unpaired learning of pose-dependent clothed and textured human meshes. arXiv preprint arXiv:2308.10638, 2023.
  50. Controlroom3d: Room generation using semantic proxy rooms. arXiv preprint arXiv:2312.05208, 2023.
  51. Graf: Generative radiance fields for 3d-aware image synthesis. NeurIPS, 2020.
  52. Deep generative models on 3d representations: A survey. arXiv preprint arXiv:2210.15663, 2022.
  53. Mvdream: Multi-view diffusion for 3d generation. ICLR, 2024.
  54. Text-to-4d dynamic scene generation. arXiv preprint arXiv:2301.11280, 2023.
  55. Dreamcraft3d: Hierarchical 3d generation with bootstrapped diffusion prior. arXiv preprint arXiv:2310.16818, 2023.
  56. Mvdiffusion: Enabling holistic multi-view image generation with correspondence-aware diffusion, 2023.
  57. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation. ICLR, 2024.
  58. Rodin: A generative model for sculpting 3d digital avatars using diffusion. In CVPR, 2023.
  59. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. In NeurIPS, 2023.
  60. Gpt-4v (ision) is a human-aligned evaluator for text-to-3d generation. arXiv preprint arXiv:2401.04092, 2024.
  61. Get3DHuman: Lifting StyleGAN-Human into a 3D generative model using pixel-aligned reconstruction priors. In ICCV, 2023.
  62. Dmv3d: Denoising multi-view diffusion using 3d large reconstruction model. ICLR, 2024.
  63. 4dgen: Grounded 4d content generation with spatial-temporal consistency. arXiv preprint arXiv:2312.17225, 2023.
  64. Dreamface: Progressive generation of animatable 3d faces under text guidance. arXiv preprint arXiv:2304.03117, 2023.
  65. Scenewiz3d: Towards text-guided 3d scene composition. arXiv preprint arXiv:2312.08885, 2023.
  66. Animate124: Animating one image to 4d dynamic scene. arXiv preprint arXiv:2311.14603, 2023.
Authors (11)
  1. Jian Liu
  2. Xiaoshui Huang
  3. Tianyu Huang
  4. Lu Chen
  5. Yuenan Hou
  6. Shixiang Tang
  7. Ziwei Liu
  8. Wanli Ouyang
  9. Wangmeng Zuo
  10. Junjun Jiang
  11. Xianming Liu
Citations (15)

Summary

A Comprehensive Survey on 3D Content Generation

The paper "A Comprehensive Survey on 3D Content Generation" conducts an in-depth examination of the current landscape of three-dimensional (3D) content generation. It is motivated by the burgeoning interest in 3D Artificial Intelligence Generated Content (AIGC), which presents both significant academic interest and practical applications across various domains such as gaming, entertainment, construction, and industrial design. In this survey, the authors propose a new taxonomy for classifying 3D content generation methodologies into three categories: 3D native generative methods, 2D prior-based 3D generative methods, and hybrid 3D generative methods.

Taxonomy and Techniques

3D Native Generative Methods generate 3D content directly from 3D data, covering objects, scenes, and human avatars and employing representations such as point clouds, voxels, meshes, and neural fields. Their principal limitation stems from the scarcity of large, comprehensive 3D datasets, which restricts the vocabulary and richness of the content these methods can generate.
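To make the representations above concrete, the following sketch (an illustrative example, not code from the survey) builds a sphere as an implicit signed distance field, voxelizes it into an explicit occupancy grid, and extracts a point cloud view of the same shape:

```python
import numpy as np

# Three representations of one shape, as mentioned above:
# an implicit signed distance field (SDF), an explicit voxel
# occupancy grid, and a point cloud. 3D native generative
# methods learn to produce such fields or grids directly
# from 3D training data.

N = 16
coords = np.linspace(-1.0, 1.0, N)
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")

# SDF of a sphere with radius 0.5: negative inside, positive outside.
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5

# Voxelize: a cell is occupied when its center lies inside the surface.
occupancy = sdf < 0

# Point cloud view: centers of the occupied voxels.
points = np.stack([x[occupancy], y[occupancy], z[occupancy]], axis=1)
```

Each form trades off differently: the SDF is resolution-free but needs a decoder to query, while the voxel grid and point cloud are direct but memory-bound, which is why generative methods in this family differ mainly in the representation they target.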

2D Prior-based 3D Generative Methods leverage the wealth of existing 2D image diffusion models for 3D content synthesis. Techniques such as DreamFusion distill a pretrained 2D diffusion model into a 3D representation via score distillation. By exploiting multi-view and image-based priors, these methods circumvent the 3D data limitation, but they face challenges such as multi-view consistency, slow per-scene optimization, and preserving geometric detail.
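The core mechanism is score distillation sampling (SDS), popularized by DreamFusion: render the 3D parameters, perturb the rendering with noise, and push the frozen 2D model's noise residual back through the renderer. The toy sketch below is an assumed, heavily simplified illustration; `render` and `fake_denoiser` are hypothetical stand-ins for a differentiable renderer and a pretrained text-conditioned diffusion model:

```python
import numpy as np

rng = np.random.default_rng(0)

def render(theta):
    # Stand-in differentiable renderer: identity mapping for simplicity.
    return theta

def fake_denoiser(x_noisy, t):
    # Stand-in for eps_phi(x_t; y, t): behaves as if the 2D prior's
    # preferred clean image were all-ones (a hypothetical "target").
    target = np.ones_like(x_noisy)
    return (x_noisy - np.sqrt(1.0 - t) * target) / np.sqrt(t)

def sds_grad(theta, t=0.5):
    x = render(theta)
    eps = rng.standard_normal(x.shape)                 # sampled noise
    x_noisy = np.sqrt(1.0 - t) * x + np.sqrt(t) * eps  # diffuse the render
    eps_hat = fake_denoiser(x_noisy, t)                # frozen 2D prior
    w = 1.0                                            # weighting w(t)
    # SDS gradient: w(t) * (eps_hat - eps) * dx/dtheta (identity here).
    return w * (eps_hat - eps)

theta = np.zeros(4)          # "3D" parameters being optimized
for _ in range(200):
    theta -= 0.05 * sds_grad(theta)
# theta is pulled toward the prior's preferred image (all-ones here).
```

The key design choice is that the diffusion model is never fine-tuned; only the 3D parameters receive gradients, which is what lets a single 2D prior supervise arbitrarily many 3D scenes.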

Hybrid 3D Generative Methods aim to combine the strengths of both native and prior-based approaches, integrating 3D data with powerful 2D priors. Methods such as Zero-1-to-3 and its derivatives employ multi-view fine-tuning and large-scale reconstruction models to produce coherent 3D assets efficiently. This category represents a convergence of methodologies, aiming to combine the geometric accuracy afforded by 3D data with the creative breadth of 2D priors.

Key Findings and Numerical Results

The survey covers approximately 60 influential papers, highlighting significant methodologies and developments in the field. Notably, methods built on 3D Gaussian Splatting have shown substantial speedups (up to 10x faster) over NeRF-based counterparts, a notable advance toward rapid 3D generative tasks.
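The speed advantage comes from rasterizing explicit Gaussian primitives in closed form rather than evaluating a neural field along every camera ray. A toy 2D analogue (an assumed sketch: isotropic Gaussians and additive compositing, whereas real 3DGS uses anisotropic 3D Gaussians, projection, and depth-ordered alpha blending) shows the idea:

```python
import numpy as np

# Toy 2D "Gaussian splatting": render a few isotropic Gaussian
# primitives onto a pixel grid by evaluating each in closed form.
# No per-sample network queries are needed, which is the source
# of the speedups reported over NeRF-style volume rendering.

H = W = 32
ys, xs = np.mgrid[0:H, 0:W].astype(float)

# Each splat: (center_x, center_y, std, intensity) -- toy values.
splats = [(8.0, 8.0, 2.0, 1.0), (20.0, 24.0, 3.0, 0.5)]

image = np.zeros((H, W))
for cx, cy, sigma, a in splats:
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    # Additive compositing (simplification of alpha blending).
    image += a * np.exp(-d2 / (2.0 * sigma**2))

# The brightest pixel lands at the center of the strongest splat.
peak = np.unravel_index(np.argmax(image), image.shape)  # → (8, 8)
```

Because each primitive touches only a bounded pixel footprint and all evaluations are simple closed-form expressions, the whole pass parallelizes trivially on a GPU.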

Challenges and Future Directions

The paper provides a critical analysis of unresolved challenges such as maintaining high-quality generation, ensuring multi-view consistency, and improving speed without compromising fidelity. From a data perspective, there is an imperative need for larger, more diverse 3D datasets. Model-wise, the advancement of foundational 3D models and architectures tailored to large datasets is a potential avenue for development.

The paper also highlights the importance of establishing robust benchmarks to evaluate 3D content quality, suggesting that automated evaluation metrics should evolve to comprehensively address both geometric and textural fidelity.

Implications and Outlook

The insights offered by this survey lay essential groundwork for further research and development in the field of 3D content generation. As 3D generation techniques evolve, they promise to revolutionize applications across industries by providing innovative and efficient design methods. Furthermore, the integration of future LLMs and multimodal intelligence systems poses an intriguing direction for developing advanced 3D generative frameworks capable of seamless operation within the digital content creation domain.

In conclusion, this paper serves as a pivotal resource charting the trajectory of 3D generative content technologies, providing both an overview of past work and a roadmap for future exploration and application in artificial intelligence and beyond.
