Overview of "SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes"
The paper "SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes" presents a novel approach to generating 3D clothed human figures with a generative model that produces explicit geometry and appearance as meshes and texture maps. This addresses a limitation of previous efforts, which often relied on implicit representations that are difficult to integrate into existing graphics pipelines.
Methodology and Approach
The paper introduces SCULPT, a deep neural network that learns the geometry and appearance distributions of clothed human figures. The innovation of SCULPT lies in combining a medium-sized 3D dataset with large-scale 2D image datasets to overcome the data scarcity typical of this research area. The system learns pose-dependent geometry from 3D scan data and represents it as per-vertex displacements relative to the SMPL body model. This strategy allows SCULPT to generate human meshes effectively conditioned on shape and pose.
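The displacement representation can be illustrated with a minimal sketch. Here the clothed mesh is formed by adding pose-dependent per-vertex offsets to the unclothed SMPL body; `predict_displacements` is a hypothetical stand-in for the paper's trained geometry generator, and all function names are illustrative, not SCULPT's actual API.

```python
import numpy as np

def predict_displacements(pose, n_vertices, scale=0.02):
    """Stand-in for the geometry generator: maps a pose vector to
    per-vertex offsets. A real model would be a trained network;
    here we just draw small pose-seeded random offsets."""
    rng = np.random.default_rng(abs(hash(pose.tobytes())) % (2**32))
    return scale * rng.standard_normal((n_vertices, 3))

def clothed_mesh(smpl_vertices, pose):
    """Clothed geometry = unclothed SMPL vertices + pose-dependent displacements."""
    return smpl_vertices + predict_displacements(pose, smpl_vertices.shape[0])

# Toy body with 6890 vertices (the SMPL vertex count) and a 72-D pose
# vector (24 joints x 3 axis-angle parameters, as in SMPL).
body = np.zeros((6890, 3))
pose = np.zeros(72)
mesh = clothed_mesh(body, pose)
print(mesh.shape)  # (6890, 3)
```

Because the clothing lives as offsets on a fixed-topology template, the output remains an ordinary mesh that standard engines can pose and render.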
The training procedure follows an unpaired learning methodology that combines 3D and 2D data modalities. The geometry generator is trained on 3D scans from the CAPE dataset to produce displacement maps, while the texture generator is trained in an unpaired fashion on 2D clothing images. A significant aspect of the framework is its StyleGAN-inspired architecture, which synthesizes high-fidelity textures conditioned on intermediate activations of the geometry branch. This conditioning, together with attribute labels derived from vision-language models such as BLIP and CLIP, mitigates the entanglement between pose, clothing type, and color appearance.
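The conditioning scheme can be sketched as two toy branches: the texture branch receives the geometry branch's intermediate activation plus a text-derived attribute embedding. This is a hedged simplification with fixed weights; all names (`geometry_branch`, `texture_branch`, `attribute_embedding`) are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def geometry_branch(latent):
    """Toy geometry network: returns a geometry output and the
    intermediate activation later fed to the texture branch."""
    hidden = np.tanh(latent @ np.full((8, 16), 0.1))    # intermediate activation
    displacement = hidden @ np.full((16, 3), 0.05)      # final geometry output
    return displacement, hidden

def texture_branch(latent, geometry_activation, attribute_embedding):
    """Toy texture network conditioned on the geometry activation and on
    attribute labels (standing in for BLIP/CLIP-derived embeddings)."""
    cond = np.concatenate([latent, geometry_activation, attribute_embedding])
    return np.tanh(cond @ np.full((cond.size, 3), 0.01))  # one RGB value

z = np.ones(8)                                  # shared latent code
disp, act = geometry_branch(z)
rgb = texture_branch(z, act, attribute_embedding=np.ones(4))
print(disp.shape, rgb.shape)  # (3,) (3,)
```

The design point is that texture is generated *given* geometry, so appearance can vary (via the attribute embedding) without disturbing the underlying shape.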
Results and Comparisons
The empirical validations presented in the paper demonstrate that SCULPT can produce high-quality 3D clothed human figures with realistic textures and pose-dependent geometry. The model outperforms several existing state-of-the-art models, including EG3D and EVA3D, particularly in rendering quality and geometric detail. Notably, quantitative metrics such as the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) show the efficacy of SCULPT over its contemporaries. The authors also highlight the model's ability to generate nuanced variations in clothing style and appearance, offering significant user control.
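FID compares the feature statistics of generated and real images. The real metric extracts Inception-network features and uses full covariance matrices; the sketch below is a simplified, diagonal-covariance version on raw 2-D features, intended only to show the mean-and-variance comparison at the metric's core.

```python
import numpy as np

def fid_diagonal(feats_a, feats_b):
    """Simplified FID under a diagonal-covariance assumption:
    ||mu_a - mu_b||^2 + sum(var_a + var_b - 2*sqrt(var_a * var_b)).
    (Full FID uses a matrix square root of the covariance product.)"""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    var_a, var_b = feats_a.var(axis=0), feats_b.var(axis=0)
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.sum(var_a + var_b - 2.0 * np.sqrt(var_a * var_b)))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(1000, 2))   # stand-in for real-image features
fake = rng.normal(0.5, 1.0, size=(1000, 2))   # features with a shifted mean
same = fid_diagonal(real, real)               # ~0 for identical statistics
diff = fid_diagonal(real, fake)               # larger for mismatched statistics
print(same, diff)
```

Lower is better for both FID and KID, so SCULPT scoring below its baselines indicates its renderings are statistically closer to real photographs.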
Implications and Future Directions
The practical implications of SCULPT stem from its compatibility with current graphics and game engines, owing to its explicit mesh outputs. This compatibility is a notable advantage over models using implicit representations. The work fits well within the broader context of augmenting virtual and augmented reality environments, enhancing virtual assistant avatars, and contributing to privacy-centric synthetic data generation for machine learning applications.
Theoretically, SCULPT shifts the paradigm for generative modeling of 3D humans by integrating classical and modern machine learning elements, such as leveraging pose-conditioned geometry and language-driven attribute conditioning.
Future research could explore expanding the dataset diversity to include varied body types and clothing styles. Furthermore, incorporating real-time pose estimation might lead to dynamic, interactive 3D avatar systems. Another exciting avenue would be optimizing the underlying computational framework to ensure scalability and robustness across different hardware platforms.
In conclusion, the SCULPT framework advances the field of 3D generative modeling by presenting a controllable and nuanced synthesis technique that is well aligned with current graphics infrastructure and with future applications in digital humans.