Advances in 3D Generation: A Survey (2401.17807v1)

Published 31 Jan 2024 in cs.CV and cs.GR

Abstract: Generating 3D models lies at the core of computer graphics and has been the focus of decades of research. With the emergence of advanced neural representations and generative models, the field of 3D content generation is developing rapidly, enabling the creation of increasingly high-quality and diverse 3D models. The rapid growth of this field makes it difficult to stay abreast of all recent developments. In this survey, we aim to introduce the fundamental methodologies of 3D generation methods and establish a structured roadmap, encompassing 3D representation, generation methods, datasets, and corresponding applications. Specifically, we introduce the 3D representations that serve as the backbone for 3D generation. Furthermore, we provide a comprehensive overview of the rapidly growing literature on generation methods, categorized by the type of algorithmic paradigms, including feedforward generation, optimization-based generation, procedural generation, and generative novel view synthesis. Lastly, we discuss available datasets, applications, and open challenges. We hope this survey will help readers explore this exciting topic and foster further advancements in the field of 3D content generation.

Authors (10)
  1. Xiaoyu Li
  2. Qi Zhang
  3. Di Kang
  4. Weihao Cheng
  5. Yiming Gao
  6. Jingbo Zhang
  7. Zhihao Liang
  8. Jing Liao
  9. Yan-Pei Cao
  10. Ying Shan
Citations (24)

Summary

  • The paper presents a comprehensive survey of 3D generation techniques, organized around explicit, implicit, and hybrid scene representations.
  • The paper details methodologies such as GANs, diffusion models, autoregressive models, and optimization-based approaches for synthesizing high-quality 3D models.
  • The paper highlights practical applications and identifies open challenges, such as evaluation metrics and data scarcity, that must be addressed before generated content meets industrial standards.

Introduction

The synthesis of 3D models is an intricate task at the intersection of computer vision, graphics, and machine learning. The demand for 3D content spans several domains, from entertainment to virtual reality, each requiring a rich repository of 3D assets. While traditional 3D content creation relied on labor-intensive modeling by artists, recent strides in AI have paved the way for automated, high-quality, and scalable 3D model generation.

3D Representations and Generation Methods

At the core of 3D generation are the representations that embody the geometry and appearance of 3D objects. The survey highlights three fundamental types of scene representations: explicit, implicit, and hybrid. Explicit representations describe scenes with primitives such as point clouds and meshes, which are straightforward to render and edit but can struggle to capture fine detail or complex topology. Implicit representations such as Neural Radiance Fields (NeRFs) encode volumetric properties as continuous functions of position, enabling detailed modeling at the cost of slower optimization. Hybrid representations integrate the strengths of both, offering efficient optimization and flexible topology; a minimal sketch of an implicit field follows.
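
To make the implicit case concrete, here is a minimal sketch in PyTorch: a small MLP that maps a 3D coordinate to a color and a volume density, in the spirit of NeRF. The layer sizes are arbitrary choices for illustration, and real systems add positional encodings and view-dependent effects omitted here.

```python
import torch
import torch.nn as nn

class ImplicitField(nn.Module):
    """Toy NeRF-style implicit representation: maps a 3D point to an
    RGB color and a volume density. Sizes are illustrative only."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, sigma)
        )

    def forward(self, xyz):
        out = self.mlp(xyz)
        rgb = torch.sigmoid(out[..., :3])   # colors constrained to [0, 1]
        sigma = torch.relu(out[..., 3:])    # non-negative volume density
        return rgb, sigma

# The field is continuous: it can be queried at arbitrary 3D points.
field = ImplicitField()
points = torch.rand(1024, 3) * 2 - 1        # random points in [-1, 1]^3
rgb, sigma = field(points)
```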

The core methodologies in 3D generation also vary, ranging from generative adversarial networks (GANs) and diffusion models to autoregressive models and optimization-based approaches. GANs have shown remarkable success in synthesizing realistic textures and geometries, while diffusion models have demonstrated strength in capturing the structure and variability of natural objects. Autoregressive models have been used effectively for sequential generation of 3D points or polygons, whereas optimization-based approaches leverage pre-trained large-scale 2D models to distill 3D content from textual or image-based prompts, as sketched below.
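
The distillation idea behind these optimization-based methods, popularized as score distillation sampling, fits in a few lines. This is a schematic sketch under stated assumptions, not any paper's exact procedure: `noise_predictor` is a hypothetical stand-in for a pretrained 2D diffusion model's noise network, and the weighting (1 - ᾱ_t) is one common choice.

```python
import torch

def sds_gradient(rendered, t, noise_predictor, alphas_cumprod):
    """Schematic score-distillation step: noise a rendering of the
    current 3D model, ask a pretrained 2D diffusion model to predict
    that noise, and use the residual as a gradient signal.
    `noise_predictor(noisy, t)` is a hypothetical interface."""
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(rendered)
    noisy = a_bar.sqrt() * rendered + (1 - a_bar).sqrt() * noise
    with torch.no_grad():
        eps_pred = noise_predictor(noisy, t)
    # This quantity is backpropagated through the differentiable
    # renderer into the 3D representation's parameters.
    return (1 - a_bar) * (eps_pred - noise)
```

Applied across many rendered viewpoints, this gradient pulls the 3D representation toward configurations whose renderings the 2D model considers plausible.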

Datasets and Applications

Training models for 3D generation invariably requires substantial data. The survey provides an insightful rundown of datasets tailored to different facets of 3D vision, ranging from object-centric datasets such as ShapeNet, to multi-view image collections like ScanNet, to single-view image datasets including FFHQ and AFHQ. The advent of larger, more diverse datasets such as Objaverse-XL indicates a trend toward enhancing 3D model quality and variety through enriched training sources.

3D generation finds utility in various applications, from generating photorealistic human avatars and facial structures to general object and scene creation. The evolution from textureless models to fully textured assets epitomizes the advancements in the field, offering promising perspectives for practical applications and broader creative possibilities.

Open Challenges

Despite considerable progress, challenges remain that prevent 3D generated content from fully meeting industry standards. The survey identifies evaluation metrics, data scarcity, content representation, controllability, and the role of large-scale models as areas requiring further research. The discussion of these open challenges underscores the complexity of 3D content generation while inviting innovative solutions and perspectives.

Conclusion

This comprehensive survey meticulously presents the dynamic landscape of 3D content generation, offering a structured compilation of methodologies, datasets, applications, and challenges. The synergy of diverse algorithmic paradigms and representation strategies highlighted in the survey not only reflects the current state of the field but also kindles the potential for future breakthroughs that could transform the way we create and interact with 3D content.