
Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human

Published 5 Jan 2024 in cs.AI and cs.GR | arXiv:2401.02620v1

Abstract: While AI-generated text and 2D images continue to expand their territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since 2023, an abundance of research papers has emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity of stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models such as SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. Advances in neural network-based 3D storage and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have improved the efficiency and realism of neurally rendered models. Furthermore, the multimodal capabilities of LLMs enable language inputs to be translated into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half of 2023. It begins by discussing AI-generated 3D object models, followed by generated 3D human models and, finally, generated 3D human motions, culminating in a concluding summary and a vision for the future.

Citations (3)

Summary

  • The paper surveys major advances in 3D generative AI, in which stable diffusion, NeRF, and 3D Gaussian Splatting combine to improve realism and multi-view consistency.
  • It covers methods that use iterative refinement or rapid multi-view synthesis to generate detailed static objects, and dynamic human figures built on parametric models such as SMPL-X.
  • The surveyed work paves the way for immersive applications in gaming, AR/VR, and digital media by enabling efficient 3D scene construction and interactive human motion synthesis.

Overview of 3D Generative AI

The field of generative AI has made notable strides in 3D content. This progress is evident not only in the creation of static 3D objects but also in dynamically rendered 3D characters and motion generation. Recent approaches employ stable diffusion processes with control methods that ensure multi-view consistency, and leverage parametric models such as SMPL-X for highly realistic human figures. Moreover, the advent of rendering techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) has raised the realism and efficiency of neurally rendered models. LLMs have also entered the arena, converting language inputs into corresponding human motions.
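To make the NeRF rendering step concrete, the sketch below shows the alpha-compositing rule that turns per-sample densities and colors along a camera ray into a single pixel color. This is a minimal plain-Python illustration; the function name `composite_ray` and the toy inputs are ours, and real implementations vectorize this over batched rays on a GPU.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, NeRF-style.

    densities: per-sample volume density sigma_i
    colors:    per-sample RGB color (tuple of 3 floats)
    deltas:    distance between adjacent samples
    Returns the rendered RGB color for the ray.
    """
    transmittance = 1.0       # fraction of light surviving so far
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        weight = transmittance * alpha          # its contribution to the pixel
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha            # light left for later samples
    return rgb
```

A nearly opaque first sample occludes everything behind it, while zero density along the ray yields black; 3DGS replaces the ray-marching samples with depth-sorted Gaussians but keeps the same compositing idea.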

Innovations in 3D Object and Human Model Generation

Within the field of singular 3D object generation, two main pathways are discernible. Some approaches refine a model iteratively against a 2D diffusion prior, achieving high levels of detail at the cost of optimization time. Others are designed for efficiency: they synthesize multi-view images in a single step and transform them swiftly into 3D models. For human modeling, the Skinned Multi-Person Linear model (SMPL) and its extended form, SMPL-X, play pivotal roles. These parametric models anchor the generation process, allowing systems trained on images to produce human figures in 3D with higher fidelity.
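The mechanism that makes SMPL-family models animatable is linear blend skinning: each vertex is moved by a weighted mix of its bones' rigid transforms. The toy 2D sketch below (helper names ours; real SMPL operates on a full 3D mesh with shape and pose-dependent corrective blend shapes) shows the core blending rule:

```python
import math

def rotate(point, angle, pivot):
    """Rotate a 2D point around a pivot: a toy 'bone transform'."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)

def linear_blend_skinning(vertex, bones, weights):
    """Blend the positions a vertex would take under each bone.

    bones:   list of (angle, pivot) transforms, one per bone
    weights: per-bone skinning weights for this vertex, summing to 1
    """
    x = y = 0.0
    for (angle, pivot), w in zip(bones, weights):
        tx, ty = rotate(vertex, angle, pivot)
        x += w * tx
        y += w * ty
    return (x, y)
```

A vertex weighted half-and-half between a fixed bone and a 90-degree-rotated bone lands halfway between the two rigid results, which is exactly the smooth deformation that lets one template mesh cover many poses.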

Refinement of 3D Scene Generation Methods

3D scene generation has gained traction, though at a slower pace than object and human model generation. Methods range from constructing full 3D scenes from sets of 2D images without 3D annotations to generating expansive, diverse worlds. The integration of techniques such as RGBD inpainting and progressive inpainting-and-erasing strategies has enriched the 3D scene generation process, enabling the completion and stylization of panoramic images.
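The spirit of progressive inpainting can be sketched with a toy grid fill: unknown regions are completed outward from known content, pass by pass. This is a deliberately simplified stand-in (function name and neighbor-averaging rule are ours); the surveyed methods fill holes with a diffusion inpainting model over RGBD panoramas rather than by averaging.

```python
def progressive_inpaint(grid):
    """Fill unknown cells (None) by averaging known 4-neighbors, pass by pass.

    grid: 2D list of floats, with None marking holes to inpaint.
    Returns a new grid; holes with no reachable known cell stay None.
    """
    h, w = len(grid), len(grid[0])
    grid = [row[:] for row in grid]
    while any(cell is None for row in grid for cell in row):
        progressed = False
        for i in range(h):
            for j in range(w):
                if grid[i][j] is not None:
                    continue
                known = [grid[x][y]
                         for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= x < h and 0 <= y < w and grid[x][y] is not None]
                if known:
                    grid[i][j] = sum(known) / len(known)  # fill from the frontier
                    progressed = True
        if not progressed:
            break  # nothing left to propagate from
    return grid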

Evolution of Human Motion Synthesis

Moving towards human motion synthesis, a wide array of solutions now exists, some able to guide motions via marked waypoints or interactive text prompts. These advances point to a future in which motion synthesis merges seamlessly with interactive environments, improving virtual and augmented reality experiences.
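At the base of waypoint-guided motion lies pose interpolation between keyed moments. The minimal sketch below (names and linear blending are ours; the surveyed systems replace linear interpolation with learned motion models and physics-based controllers) fills in a pose at any time between two keyframes:

```python
def interpolate_pose(keyframes, t):
    """Linearly interpolate joint angles between timed keyframes.

    keyframes: list of (time, pose) sorted by time; pose = list of joint angles.
    t: query time, clamped to the keyframe range.
    """
    if t <= keyframes[0][0]:
        return list(keyframes[0][1])
    if t >= keyframes[-1][0]:
        return list(keyframes[-1][1])
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)  # normalized position in this segment
            return [(1 - u) * a + u * b for a, b in zip(p0, p1)]
```

Text- or waypoint-conditioned systems can be seen as replacing the hand-placed keyframes here with poses proposed by a generative model, then synthesizing the in-between motion.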

Conclusion and Future Perspective

The advancements in 3D generative AI signify a burgeoning era in which the boundary between reality and AI-generated content is increasingly blurred. With improvements in fidelity, realism, and rendering efficiency, industries such as gaming, education, and advertising could see a surge of more immersive and visually appealing content. Moreover, as 3D generative AI continues to develop, the prospect of creating more sophisticated and personalized 3D content is becoming a tangible reality, reshaping the landscape of digital content creation.

Authors (2)
