
Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human

Published 5 Jan 2024 in cs.AI and cs.GR | arXiv:2401.02620v1

Abstract: While AI-generated text and 2D images continue to expand their territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since 2023, an abundance of research papers has emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity of stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models such as SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. Advances in neural network-based 3D storage and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have improved the efficiency and realism of neurally rendered models. Furthermore, the multimodal capabilities of LLMs enable language inputs to be translated into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half of 2023. It begins by discussing AI-generated 3D object models, followed by generated 3D human models and, finally, generated 3D human motions, culminating in a concluding summary and a vision for the future.

Citations (3)

Summary

  • The paper surveys major advances in 3D generative AI, in which stable diffusion, NeRF, and 3D Gaussian Splatting combine to improve realism and multi-view consistency.
  • It covers methods that use iterative refinement or rapid multi-view synthesis to generate detailed static objects, and dynamic human figures built on parametric models such as SMPL-X.
  • The surveyed work paves the way for immersive applications in gaming, AR/VR, and digital media by enabling efficient 3D scene construction and interactive human motion synthesis.

Overview of 3D Generative AI

The field of generative AI has made notable strides in 3D content. This progress is evident not only in the creation of static 3D objects but also in dynamically rendered 3D characters and motion generation. Recent approaches employ stable diffusion processes with control methods that ensure multi-view consistency, and leverage parametric models such as SMPL-X for highly realistic human figures. Moreover, the advent of rendering techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) has raised the realism and efficiency of neurally rendered models. LLMs have also entered the arena, converting language inputs into corresponding human motions.
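To make the NeRF rendering step concrete, the sketch below shows the alpha-compositing rule that turns per-sample densities and colors along a camera ray into a single pixel color. This is a minimal plain-Python illustration; the function name `composite_ray` and the toy inputs are ours, and real implementations vectorize this over batched rays on a GPU.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, NeRF-style.

    densities: per-sample volume density sigma_i
    colors:    per-sample RGB color (tuple of 3 floats)
    deltas:    distance between adjacent samples
    Returns the rendered RGB color for the ray.
    """
    transmittance = 1.0       # fraction of light surviving so far
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        weight = transmittance * alpha          # its contribution to the pixel
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha            # light left for later samples
    return rgb
```

A nearly opaque first sample occludes everything behind it, while zero density along the ray yields black; 3DGS replaces the ray-marching samples with depth-sorted Gaussians but keeps the same compositing idea.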

Innovations in 3D Object and Human Model Generation

Within the field of singular 3D object generation, two main pathways are discernible. Some approaches refine a model iteratively against a 2D diffusion prior, achieving high levels of detail at the cost of optimization time. Others are designed for efficiency: they synthesize multi-view images in a single step and transform them swiftly into 3D models. For human modeling, the Skinned Multi-Person Linear model (SMPL) and its extended form, SMPL-X, play pivotal roles. These parametric models anchor the generation process, allowing systems trained on images to produce human figures in 3D with higher fidelity.
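The mechanism that makes SMPL-family models animatable is linear blend skinning: each vertex is moved by a weighted mix of its bones' rigid transforms. The toy 2D sketch below (helper names ours; real SMPL operates on a full 3D mesh with shape and pose-dependent corrective blend shapes) shows the core blending rule:

```python
import math

def rotate(point, angle, pivot):
    """Rotate a 2D point around a pivot: a toy 'bone transform'."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)

def linear_blend_skinning(vertex, bones, weights):
    """Blend the positions a vertex would take under each bone.

    bones:   list of (angle, pivot) transforms, one per bone
    weights: per-bone skinning weights for this vertex, summing to 1
    """
    x = y = 0.0
    for (angle, pivot), w in zip(bones, weights):
        tx, ty = rotate(vertex, angle, pivot)
        x += w * tx
        y += w * ty
    return (x, y)
```

A vertex weighted half-and-half between a fixed bone and a 90-degree-rotated bone lands halfway between the two rigid results, which is exactly the smooth deformation that lets one template mesh cover many poses.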

Refinement of 3D Scene Generation Methods

3D scene generation has gained traction, though at a slower pace than object and human model generation. Methods range from constructing full 3D scenes from sets of 2D images without 3D annotations to generating expansive, diverse worlds. The integration of techniques such as RGBD inpainting and progressive inpainting-and-erasing strategies has enriched the 3D scene generation process, enabling the completion and stylization of panoramic images.
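The spirit of progressive inpainting can be sketched with a toy grid fill: unknown regions are completed outward from known content, pass by pass. This is a deliberately simplified stand-in (function name and neighbor-averaging rule are ours); the surveyed methods fill holes with a diffusion inpainting model over RGBD panoramas rather than by averaging.

```python
def progressive_inpaint(grid):
    """Fill unknown cells (None) by averaging known 4-neighbors, pass by pass.

    grid: 2D list of floats, with None marking holes to inpaint.
    Returns a new grid; holes with no reachable known cell stay None.
    """
    h, w = len(grid), len(grid[0])
    grid = [row[:] for row in grid]
    while any(cell is None for row in grid for cell in row):
        progressed = False
        for i in range(h):
            for j in range(w):
                if grid[i][j] is not None:
                    continue
                known = [grid[x][y]
                         for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= x < h and 0 <= y < w and grid[x][y] is not None]
                if known:
                    grid[i][j] = sum(known) / len(known)  # fill from the frontier
                    progressed = True
        if not progressed:
            break  # nothing left to propagate from
    return grid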

Evolution of Human Motion Synthesis

Moving towards human motion synthesis, a wide array of solutions now exists, some able to guide motions via marked waypoints or interactive text prompts. These advances point to a future in which motion synthesis merges seamlessly with interactive environments, improving virtual and augmented reality experiences.
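At the base of waypoint-guided motion lies pose interpolation between keyed moments. The minimal sketch below (names and linear blending are ours; the surveyed systems replace linear interpolation with learned motion models and physics-based controllers) fills in a pose at any time between two keyframes:

```python
def interpolate_pose(keyframes, t):
    """Linearly interpolate joint angles between timed keyframes.

    keyframes: list of (time, pose) sorted by time; pose = list of joint angles.
    t: query time, clamped to the keyframe range.
    """
    if t <= keyframes[0][0]:
        return list(keyframes[0][1])
    if t >= keyframes[-1][0]:
        return list(keyframes[-1][1])
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)  # normalized position in this segment
            return [(1 - u) * a + u * b for a, b in zip(p0, p1)]
```

Text- or waypoint-conditioned systems can be seen as replacing the hand-placed keyframes here with poses proposed by a generative model, then synthesizing the in-between motion.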

Conclusion and Future Perspective

The advancements in 3D generative AI signify a burgeoning era in which the boundary between reality and AI-generated content is increasingly blurred. With improvements in fidelity, realism, and rendering efficiency, industries such as gaming, education, and advertising could see a surge of more immersive and visually appealing content. Moreover, as 3D generative AI continues to develop, the prospect of creating more sophisticated and personalized 3D content is becoming a tangible reality, reshaping the landscape of digital content creation.

Authors (2)
