
GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data (2411.18624v2)

Published 27 Nov 2024 in cs.CV

Abstract: Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model. Existing methods face difficulties including a) the varying body proportions captured by in-the-wild human images; b) diverse personal belongings within the shot; and c) ambiguities in human postures and inconsistency in human textures. In addition, the scarcity of high-quality human data intensifies the challenge. To address these problems, we propose a Generalizable image-to-3D huMAN reconstruction framework, dubbed GeneMAN, building upon a comprehensive multi-source collection of high-quality human data, including 3D scans, multi-view videos, single photos, and our generated synthetic human data. GeneMAN encompasses three key modules. 1) Without relying on parametric human models (e.g., SMPL), GeneMAN first trains a human-specific text-to-image diffusion model and a view-conditioned diffusion model, serving as GeneMAN 2D human prior and 3D human prior for reconstruction, respectively. 2) With the help of the pretrained human prior models, the Geometry Initialization-&-Sculpting pipeline is leveraged to recover high-quality 3D human geometry given a single image. 3) To achieve high-fidelity 3D human textures, GeneMAN employs the Multi-Space Texture Refinement pipeline, consecutively refining textures in the latent and the pixel spaces. Extensive experimental results demonstrate that GeneMAN could generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods. Notably, GeneMAN could reveal much better generalizability in dealing with in-the-wild images, often yielding high-quality 3D human models in natural poses with common items, regardless of the body proportions in the input images.

Summary

  • The paper introduces GeneMAN, which leverages diverse multi-source data to enable accurate 3D human reconstructions from single in-the-wild images.
  • The paper employs a NeRF-based geometry initialization followed by sculpting and multi-space texture refinement for detailed, realistic results.
  • The paper demonstrates enhanced performance, with superior PSNR and LPIPS scores and robust multi-view consistency in real-world scenarios.

An Overview of "GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data"

The paper "GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data" presents a framework engineered to tackle the intricacies and challenges of reconstructing high-fidelity 3D human models from single in-the-wild images. Despite the recent advancements in image-to-3D human reconstruction approaches, limitations persist, particularly in dealing with varying body proportions, unconventional poses, or individuals with diverse clothing. Additionally, the generalization to scenarios beyond controlled environments remains a significant challenge.

Core Contributions and Methodological Developments

The authors introduce GeneMAN, a framework that capitalizes on human-specific priors trained on a broad and diverse data collection comprising 3D scans, multi-view video datasets, single-image datasets, and synthetic human images. This collection forms the basis for training the GeneMAN 2D and 3D human prior models, providing a generalizable, comprehensive prior that does not depend on parametric body models such as SMPL. The priors take the form of a human-specific text-to-image diffusion model and a view-conditioned diffusion model, which together support detailed, consistent, and high-fidelity reconstructions with realistic textures and natural human geometry.
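The overview does not specify how these pretrained priors steer the reconstruction; a common mechanism for distilling a 2D or view-conditioned diffusion prior into a 3D representation is Score Distillation Sampling (SDS). The sketch below is illustrative only, with `denoiser` standing in for a pretrained human-specific diffusion model and `alphas_cumprod` for its noise schedule; it is not GeneMAN's released implementation.

```python
import torch

def sds_gradient(denoiser, latents, text_embed, alphas_cumprod, guidance_scale=7.5):
    """Illustrative Score Distillation Sampling (SDS) step.

    `denoiser` is a hypothetical handle to a pretrained (human-specific)
    diffusion model that predicts the noise eps_hat(x_t, t, text).
    """
    b = latents.shape[0]
    # Sample a random diffusion timestep and noise the current latents.
    t = torch.randint(20, 980, (b,), device=latents.device)
    noise = torch.randn_like(latents)
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy = a_t.sqrt() * latents + (1 - a_t).sqrt() * noise

    with torch.no_grad():
        eps_uncond = denoiser(noisy, t, None)
        eps_text = denoiser(noisy, t, text_embed)
    # Classifier-free guidance combines conditional and unconditional predictions.
    eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)

    # SDS gradient: w(t) * (eps_hat - eps), back-propagated into the 3D
    # representation (NeRF or textured mesh) through the rendered latents.
    w = 1 - a_t
    return w * (eps - noise)
```

In an SDS-style setup, this gradient would be applied to rendered views of the 3D representation, so that optimizing the NeRF or textured mesh gradually satisfies the diffusion prior.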

Building on these priors, GeneMAN reconstructs a 3D human through two main pipelines:

  1. Geometry Initialization and Sculpting: The framework first initializes the human geometry with a NeRF-based representation, without depending on parametric models such as SMPL, and then refines it through a sculpting stage that converts the NeRF output into a high-resolution mesh, yielding detailed and realistic geometry (a sketch of the NeRF-to-mesh conversion follows this list).
  2. Multi-Space Texture Refinement: This pipeline targets high-fidelity textures by refining them first in the latent space, aided by the 2D diffusion prior, and then in the pixel space, where the UV maps are optimized with a ControlNet-based 2D prior to produce coherent, detailed textures (see the second sketch after the list).
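The summary does not detail how the initialized NeRF is converted into a mesh for sculpting; a typical recipe samples the NeRF's density field on a voxel grid and extracts an iso-surface with marching cubes. The minimal sketch below assumes scikit-image and trimesh, and uses a hypothetical `density_fn` (with a toy blob standing in for a trained NeRF's density head).

```python
import numpy as np
from skimage import measure
import trimesh

def nerf_density_to_mesh(density_fn, resolution=128, threshold=10.0, bound=1.0):
    """Extract a triangle mesh from a NeRF-style density field.

    `density_fn` is a hypothetical callable mapping (N, 3) points to (N,)
    densities; a trained NeRF's density head would be plugged in here.
    """
    # Sample the density field on a regular 3D grid inside [-bound, bound]^3.
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    densities = density_fn(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)

    # Marching cubes recovers the iso-surface at the chosen density threshold.
    verts, faces, normals, _ = measure.marching_cubes(densities, level=threshold)
    # Rescale vertices from grid indices back to world coordinates.
    verts = verts / (resolution - 1) * 2 * bound - bound
    return trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)

# Toy usage: a sphere-like density blob standing in for a trained NeRF.
mesh = nerf_density_to_mesh(lambda p: 50.0 * (0.5 - np.linalg.norm(p, axis=-1)).clip(0))
```

The extracted mesh would then be refined in the sculpting stage under the diffusion priors, for example with SDS-style gradients as sketched earlier.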
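The texture stage is likewise described only at a high level. The following sketch illustrates the general idea of multi-space refinement under stated assumptions: rendered views are improved in a latent space by a diffusion-based refiner, and the improvements are baked back into the UV texture by optimizing it in pixel space. All components here (`render_view`, `vae`, `latent_refiner`) are hypothetical placeholders, not GeneMAN's API.

```python
import torch
import torch.nn.functional as F

def refine_texture(mesh, uv_texture, vae, latent_refiner, render_view,
                   view_angles, steps=200, lr=1e-2):
    """Illustrative multi-space texture refinement loop.

    `vae` (encoder/decoder), `latent_refiner` (diffusion-based, e.g. guided
    by a ControlNet-style condition), and `render_view` (a differentiable
    renderer) are hypothetical stand-ins for the components described in
    the paper summary.
    """
    uv_texture = uv_texture.clone().requires_grad_(True)
    opt = torch.optim.Adam([uv_texture], lr=lr)

    for step in range(steps):
        angle = view_angles[step % len(view_angles)]
        # Pixel space: render the textured mesh from the current viewpoint.
        rgb = render_view(mesh, uv_texture, angle)          # (1, 3, H, W) in [0, 1]

        # Latent space: encode, refine with the diffusion prior, decode.
        with torch.no_grad():
            z = vae.encode(rgb * 2 - 1)
            z_ref = latent_refiner(z, angle)
            target = (vae.decode(z_ref) + 1) / 2            # refined target view

        # Pixel space: pull the rendered view toward the refined target,
        # propagating gradients into the UV texture map.
        loss = F.mse_loss(rgb, target)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return uv_texture.detach()
```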

Quantitative and Qualitative Insights

The proposed framework demonstrates significant advances over previous state-of-the-art methods, both quantitatively and qualitatively. In evaluations on datasets such as CAPE and on a variety of in-the-wild images, GeneMAN consistently achieves superior reconstruction quality, as evidenced by metrics such as PSNR and LPIPS. It also excels at multi-view consistency, an aspect that has remained challenging in image-to-3D human reconstruction.
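For readers who want to reproduce the reported image metrics, PSNR follows directly from the mean squared error, and LPIPS is available through the `lpips` package. The snippet below uses that package's AlexNet backbone and expected [-1, 1] input range, which may differ from the paper's exact evaluation settings.

```python
import torch
import lpips  # pip install lpips

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# LPIPS expects inputs in [-1, 1], shaped (N, 3, H, W).
lpips_fn = lpips.LPIPS(net="alex")
pred = torch.rand(1, 3, 256, 256)      # rendered view of the reconstruction
target = torch.rand(1, 3, 256, 256)    # ground-truth view
print("PSNR :", psnr(pred, target).item())
print("LPIPS:", lpips_fn(pred * 2 - 1, target * 2 - 1).item())
```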

The framework's generalization capabilities are highlighted as it robustly handles various real-world scenarios involving disparate poses, diverse clothing, and different body proportions, outperforming template-based methods that frequently struggle due to their reliance on accurate human pose and shape (HPS) estimates.

Implications and Future Directions

The implications of GeneMAN are profound in the context of virtual reality (VR), augmented reality (AR), gaming, and digital human interaction interfaces, where high-quality 3D reconstructions are crucial. Its ability to function with in-the-wild data opens new fronts for applications in personal avatars, digital fashion, and telepresence.

Looking toward the future, the authors acknowledge the framework's considerably longer processing times compared to feed-forward models, suggesting room for optimization. Moreover, the challenge of modeling complex interactions, such as occlusions or humans interacting with large objects, remains open. Addressing these limitations will extend GeneMAN's applicability to more complex real-world scenarios.

In conclusion, by ingeniously combining multi-source data with advanced diffusion models, GeneMAN sets new benchmarks in the domain of single-image 3D human reconstruction, paving the way for more universal and robust applications across multiple industries.
