- The paper introduces a method that recovers complete 360-degree textures of people from a single image by encoding appearance in a common UV-space.
- It employs a tripartite pipeline—texture completion, segmentation, and geometry prediction—with specialized neural networks to address complex 3D reconstruction challenges.
- The approach generalizes to novel poses, shapes, and clothing, offering scalable solutions for virtual reality, augmented reality, and digital fashion applications.
Overview of 360-Degree Textures of People in Clothing from a Single Image
This paper presents an approach to generating 3D avatars from a single image, focusing on complete texture prediction and geometric detail within the UV-space of the SMPL body model. The technique uses image-to-image translation to infer a full texture map and additionally predicts clothing segmentation and displacement maps, capturing the subject's complete appearance. The training data is built by non-rigidly registering thousands of 3D body scans to SMPL, so that both texture and geometry can be encoded as images in a shared UV-space, which notably simplifies the otherwise complex task of 3D inference.
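Because appearance and shape share the SMPL UV parameterization, a predicted UV-space image can be mapped back onto the mesh with a simple per-vertex lookup. The sketch below illustrates this round-trip with nearest-neighbour sampling; the names (`smpl_uv_coords`, `displacement_map`) and the resolution are illustrative assumptions, not the paper's actual data layout.

```python
import numpy as np

def sample_uv_map(uv_map: np.ndarray, uv_coords: np.ndarray) -> np.ndarray:
    """Sample a UV-space image (H, W, C) at per-vertex UV coordinates (V, 2) in [0, 1].

    Nearest-neighbour lookup for brevity; the point is that once geometry
    (3D offsets) and appearance (RGB) live in the same UV image, predicting
    them becomes an image-to-image problem.
    """
    h, w = uv_map.shape[:2]
    px = np.clip(np.round(uv_coords[:, 0] * (w - 1)).astype(int), 0, w - 1)
    py = np.clip(np.round((1.0 - uv_coords[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    return uv_map[py, px]  # (V, C)

# Hypothetical usage: decode a predicted displacement map back onto SMPL vertices.
# smpl_vertices: (6890, 3) posed template; displacement_map: (256, 256, 3) network output.
# clothed_vertices = smpl_vertices + sample_uv_map(displacement_map, smpl_uv_coords)
```

In practice a bilinear lookup would replace the nearest-neighbour one, but the mesh-to-image round-trip is the essential idea.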
A significant contribution of the paper is its methodology for recovering full-textured 3D avatars, which can generalize to novel poses, shapes, and even new clothing with high plausibility. The approach comprehensively addresses the challenge of predicting complete textures from limited visual information, enabling diverse applications across virtual reality, augmented reality, gaming, and human surveillance systems.
Methodology
The core methodology is a tripartite pipeline: texture completion, segmentation completion, and geometry prediction. Each component is trained separately on a dataset of registered scans brought into a common UV-space, which provides dense correspondence and spatial localization. The authors employ DensePose to extract partial texture maps from the input image; a neural network based on image-to-image translation then completes these maps while also correcting the distortions introduced during extraction. Separate networks predict the clothing segmentation and the geometric detail (as UV-space displacement maps), giving independent control over appearance, clothing layout, and shape when assembling the final 3D model.
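As a rough illustration of how the three stages could be chained, the sketch below wires three small encoder-decoder networks in sequence. The architecture, channel counts, and resolution are placeholders chosen for brevity and are not the authors' exact networks; the paper's stages follow established image-to-image translation designs and are trained separately.

```python
import torch
import torch.nn as nn

class UVTranslator(nn.Module):
    """Minimal encoder-decoder standing in for one image-to-image stage.

    The paper trains three separate networks of this flavour (texture completion,
    segmentation completion, geometry prediction); this toy architecture only
    shows the data flow between them.
    """
    def __init__(self, in_ch: int, out_ch: int, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, out_ch, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Hypothetical wiring: partial UV texture + visibility mask in, avatar components out.
texture_net = UVTranslator(in_ch=4, out_ch=3)        # partial RGB + mask -> complete texture
segmentation_net = UVTranslator(in_ch=3, out_ch=6)   # complete texture -> clothing part labels
geometry_net = UVTranslator(in_ch=3 + 6, out_ch=3)   # texture + labels -> displacement map

partial_texture = torch.rand(1, 4, 256, 256)         # e.g., from a DensePose-based unprojection
full_texture = texture_net(partial_texture)
segmentation = segmentation_net(full_texture)
displacements = geometry_net(torch.cat([full_texture, segmentation], dim=1))
```

Training each stage separately, as the paper does, keeps the supervision targets (texture, labels, displacements) cleanly decoupled in the shared UV-space.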
Implications and Future Directions
This research has significant theoretical and practical implications. The model offers a robust way to generate visually coherent, detailed avatars from minimal input data, which could streamline workflows in digital media creation and interactive technologies. By encoding and manipulating textures and geometry in a common UV-space, the technique also lays a foundation for real-time avatar creation and virtual fashion applications.
In future work, handling clothing whose topology departs from the body surface, such as skirts and loose dresses, which a fixed-topology displacement representation cannot capture, and improving texture detail remain crucial. Exploring implicit function-based representations may help manage varying topologies, while incorporating neural rendering techniques could advance photo-realism and the rendering of complex textured clothing.
In conclusion, this paper represents a significant step towards democratizing the creation of personalized 3D avatars, offering detailed control over appearance and geometry from a single image input. The approach circumvents traditional complexities tied to multi-view capture, providing a scalable solution with notable implications in entertainment, surveillance, and virtual try-on applications.