EVA3D: Compositional 3D Human Generation from 2D Image Collections (2210.04888v1)

Published 10 Oct 2022 in cs.CV

Abstract: Inverse graphics aims to recover 3D models from 2D observations. Utilizing differentiable rendering, recent 3D-aware generative models have shown impressive results of rigid object generation using 2D images. However, it remains challenging to generate articulated objects, like human bodies, due to their complexity and diversity in poses and appearances. In this work, we propose EVA3D, an unconditional 3D human generative model learned from 2D image collections only. EVA3D can sample 3D humans with detailed geometry and render high-quality images (up to 512x256) without bells and whistles (e.g. super resolution). At the core of EVA3D is a compositional human NeRF representation, which divides the human body into local parts. Each part is represented by an individual volume. This compositional representation enables 1) inherent human priors, 2) adaptive allocation of network parameters, 3) efficient training and rendering. Moreover, to accommodate the characteristics of sparse 2D human image collections (e.g. imbalanced pose distribution), we propose a pose-guided sampling strategy for better GAN learning. Extensive experiments validate that EVA3D achieves state-of-the-art 3D human generation performance regarding both geometry and texture quality. Notably, EVA3D demonstrates great potential and scalability to "inverse-graphics" diverse human bodies with a clean framework.

Citations (105)

Summary

  • The paper introduces a compositional NeRF representation that splits the human body into local parts, yielding an unconditional 3D human generative model learned from 2D image collections only.
  • The paper employs a pose-guided sampling strategy to mitigate imbalanced pose distributions and enhance generative performance.
  • The paper reports state-of-the-art 3D human generation in both geometry and texture quality, while the compositional design keeps training and rendering efficient.

"EVA3D: Compositional 3D Human Generation from 2D Image Collections" addresses the challenge of generating detailed 3D human models from 2D image collections, a crucial aspect of inverse graphics. While recent advancements in 3D-aware generative models have succeeded with rigid objects, generating articulated objects like human bodies remains complex due to diverse poses and appearances.

The cornerstone of EVA3D is its compositional human NeRF representation, dividing the human body into local parts. Each body part is represented as an individual volume, leading to several benefits:

  1. Inherent Human Priors: Dividing the body into parts builds knowledge of human structure into the representation itself, which supports coherent body generation.
  2. Adaptive Allocation of Network Parameters: Each part has its own volume, so network capacity can be assigned per part and concentrated where more detail is needed.
  3. Efficient Training and Rendering: According to the authors, modeling the body as a set of local volumes makes both training and rendering efficient.
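
The part-based idea above can be illustrated with a small sketch. This is not the authors' implementation: the class `PartField`, the box coordinates, the hidden sizes, the random untrained weights, and the plain averaging in overlapping regions are all assumptions made for illustration. It only shows how query points can be routed to per-part networks of different capacity, and how points outside every part can be treated as empty space.

```python
import numpy as np

rng = np.random.default_rng(0)

class PartField:
    """One body part: an axis-aligned box with its own tiny MLP (hypothetical)."""

    def __init__(self, box_min, box_max, hidden):
        self.box_min = np.asarray(box_min, dtype=float)
        self.box_max = np.asarray(box_max, dtype=float)
        # Random, untrained weights; `hidden` sets this part's capacity.
        self.w1 = rng.normal(0.0, 0.5, (3, hidden))
        self.w2 = rng.normal(0.0, 0.5, (hidden, 4))  # 3 color channels + density

    def contains(self, pts):
        return np.all((pts >= self.box_min) & (pts <= self.box_max), axis=1)

    def query(self, pts):
        # Map points to the part's local [-1, 1] coordinate frame.
        local = 2.0 * (pts - self.box_min) / (self.box_max - self.box_min) - 1.0
        out = np.maximum(local @ self.w1, 0.0) @ self.w2
        rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))   # sigmoid -> colors in [0, 1]
        sigma = np.log1p(np.exp(out[:, 3]))       # softplus -> density >= 0
        return rgb, sigma

def query_body(parts, pts):
    """Route each point to the parts that contain it and average overlaps."""
    n = len(pts)
    rgb_sum, sigma_sum, count = np.zeros((n, 3)), np.zeros(n), np.zeros(n)
    for part in parts:
        mask = part.contains(pts)
        if not mask.any():
            continue
        rgb, sigma = part.query(pts[mask])
        rgb_sum[mask] += rgb
        sigma_sum[mask] += sigma
        count[mask] += 1
    covered = count > 0
    rgb_sum[covered] /= count[covered, None]
    sigma_sum[covered] /= count[covered]
    # Points outside every part keep zero density, i.e. empty space.
    return rgb_sum, sigma_sum, covered

parts = [
    PartField([-0.3, 0.0, -0.2], [0.3, 0.6, 0.2], hidden=16),       # "torso"
    PartField([-0.15, 0.55, -0.15], [0.15, 0.85, 0.15], hidden=64),  # "head", more capacity
]
pts = np.array([[0.0, 0.7, 0.0], [0.0, 0.58, 0.0], [2.0, 2.0, 2.0]])
rgb, sigma, covered = query_body(parts, pts)
print(covered)
```

In this sketch the "head" part gets a wider hidden layer than the "torso", which is one simple way to picture adaptive parameter allocation. Skipping points that fall outside all boxes avoids network evaluations in empty space.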

EVA3D is trained on 2D image collections alone, with no 3D supervision, yet it samples 3D humans with detailed geometry and renders images at resolutions up to 512x256 without a super-resolution stage. A second contribution is a pose-guided sampling strategy, designed to handle the imbalanced pose distribution typical of sparse 2D human image collections. According to the authors, this strategy improves GAN learning.
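
To make the pose imbalance problem concrete, the following sketch rebalances a synthetic dataset by a single pose angle. It is a generic inverse-frequency reweighting, not the paper's pose-guided sampling procedure, which this summary does not specify. The function name `pose_balanced_weights`, the choice of one scalar angle per image, the bin count, and the synthetic data are all assumptions.

```python
import numpy as np

def pose_balanced_weights(angles, num_bins=12, smoothing=1.0):
    """Per-sample probabilities that down-weight over-represented pose bins."""
    angles = np.asarray(angles, dtype=float)
    edges = np.linspace(-np.pi, np.pi, num_bins + 1)
    # digitize gives 1..num_bins inside the range; clip handles angle == pi.
    bins = np.clip(np.digitize(angles, edges) - 1, 0, num_bins - 1)
    counts = np.bincount(bins, minlength=num_bins)
    weights = 1.0 / (counts[bins] + smoothing)
    return weights / weights.sum(), bins

rng = np.random.default_rng(0)
num_bins = 12

# Synthetic, deliberately imbalanced data: most images are near-frontal.
angles = np.concatenate([
    rng.normal(0.0, 0.2, 900),        # near-frontal poses
    rng.uniform(-np.pi, np.pi, 100),  # everything else
])
angles = np.clip(angles, -np.pi, np.pi)

probs, bins = pose_balanced_weights(angles, num_bins=num_bins)
batch = rng.choice(len(angles), size=4096, p=probs)

orig_frac = np.bincount(bins, minlength=num_bins) / len(angles)
sampled_frac = np.bincount(bins[batch], minlength=num_bins) / len(batch)
print("largest bin share before:", orig_frac.max())
print("largest bin share after: ", sampled_frac.max())
```

Flattening the distribution completely is only one possible target. A training pipeline could equally reweight toward any other pose distribution by replacing the inverse-frequency weights.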

In the authors' experiments, EVA3D achieves state-of-the-art 3D human generation in both geometry and texture quality. The authors also argue that the clean framework scales well, showing potential for the "inverse graphics" of diverse human bodies.

Overall, EVA3D advances 3D human generative modeling by pairing a compositional, part-based representation with a sampling strategy suited to imbalanced 2D image collections.