- The paper introduces a dual-GAN framework that decomposes synthesis into human segmentation and texture rendering stages.
- It maintains structural coherence by preserving body shape and pose while incorporating text-guided clothing details.
- Quantitative and user studies on an annotated DeepFashion dataset demonstrate superior attribute consistency and perceptual realism.
Analyzing "Be Your Own Prada: Fashion Synthesis with Structural Coherence"
The paper "Be Your Own Prada: Fashion Synthesis with Structural Coherence" presents a nuanced approach to the synthesis of fashion images through the use of generative adversarial networks (GANs). The methodology focuses on generating new outfits onto existing images of individuals using textual descriptions as input. Meticulously, it maintains structural coherence by ensuring that body shape and posture are preserved in the generated outputs.
Methodology and Approach
The pivotal contribution of this research is the decomposition of the generative process into two distinct GAN stages: one for human segmentation and one for texture rendering. This decomposition separates concerns, allowing for more focused training and potentially more accurate generative results. The first GAN stage, termed the "shape generation" stage, produces a human segmentation map from spatial constraints and a merged image representation that captures the wearer's body without clothing details, ensuring that the synthetic output aligns with the real-world pose and structure.
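To make the staging concrete, the following is a minimal PyTorch-style sketch of a first-stage shape generator that conditions a segmentation prediction on a coarse body representation, a text embedding, and a noise vector. The module layout, tensor shapes, class count, and names are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal sketch of the first (shape-generation) stage. Layer sizes and the
# simple encoder/decoder layout are assumptions for illustration only.
import torch
import torch.nn as nn

class ShapeGenerator(nn.Module):
    """Predicts a soft human segmentation map from a coarse body representation,
    a text embedding, and a noise vector."""
    def __init__(self, n_classes=7, text_dim=128, z_dim=100):
        super().__init__()
        self.encode = nn.Sequential(  # encode the merged body representation
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(  # decode to per-pixel class scores
            nn.ConvTranspose2d(128 + text_dim + z_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, n_classes, 4, stride=2, padding=1),
        )

    def forward(self, body_map, text_emb, z):
        h = self.encode(body_map)                          # (B, 128, H/4, W/4)
        cond = torch.cat([text_emb, z], dim=1)             # condition vector (B, text_dim + z_dim)
        cond = cond[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
        logits = self.decode(torch.cat([h, cond], dim=1))  # (B, n_classes, H, W)
        return logits.softmax(dim=1)                       # soft segmentation map

# Usage (assumed shapes): seg = ShapeGenerator()(body_map, text_emb, torch.randn(B, 100))
```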
The second GAN, responsible for texture rendering, takes the segmentation map produced in the previous stage along with the textual description. A compositional mapping layer is introduced at this stage to enable region-specific rendering of textures, which not only enriches the final output but also keeps the synthesized clothing coherent with the body's segmentation map.
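The core idea of region-specific composition can be sketched as masking each region's rendered texture with its segmentation channel and summing the results. This is a simplified illustration under assumed tensor shapes, not the paper's exact compositional operator.

```python
# Region-specific compositional mapping sketch: each region's rendered texture is
# weighted by its (soft) segmentation channel and the pieces are summed per pixel.
import torch

def compose_regions(region_textures, seg_map):
    """region_textures: (B, K, 3, H, W)  one RGB rendering per region
    seg_map:          (B, K, H, W)     soft segmentation, channels sum to 1
    returns:          (B, 3, H, W)     composited image"""
    masks = seg_map.unsqueeze(2)               # (B, K, 1, H, W), broadcast over RGB
    return (region_textures * masks).sum(dim=1)
```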
Dataset and Evaluations
The authors adapted the DeepFashion dataset by annotating a subset of images with sentences describing the depicted clothing. With roughly 79,000 annotated images, this enhanced dataset supports both training and evaluation of the proposed framework. Quantitative evaluations centered on attribute prediction showed that the method outperforms conventional GAN baselines, including one-step GAN models, and exhibits robust structural correctness and attribute alignment as measured by automated attribute detectors.
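An attribute-consistency check of this kind can be approximated by running a pretrained attribute classifier over the generated images and comparing its predictions with the attributes implied by the input text. The sketch below assumes a hypothetical `attribute_classifier` and a multi-hot label encoding; it is not the authors' exact evaluation protocol.

```python
# Hedged sketch of an attribute-consistency metric over generated images.
import torch

def attribute_consistency(generated_images, target_attributes, attribute_classifier, threshold=0.5):
    """generated_images: (N, 3, H, W); target_attributes: (N, A) multi-hot labels
    derived from the input text; attribute_classifier: any multi-label model."""
    with torch.no_grad():
        probs = torch.sigmoid(attribute_classifier(generated_images))  # (N, A)
    predicted = (probs > threshold).float()
    # Fraction of attribute slots where the prediction matches the text-derived label.
    return (predicted == target_attributes).float().mean().item()
```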
Qualitative Assessment and User Studies
Qualitative assessments are supported by visual examples showing how varied text inputs yield contextually appropriate images. The user study, involving 50 participants, highlights the perceptual fidelity of the generated results, with a substantial share of participants unable to distinguish the synthesized segmentation maps from real ones.
Implications and Future Directions
The implications of this research are broad, with potential applications in personalized fashion design, virtual try-on, and augmented reality. The method’s ability to anchor synthetic variations in text provides a flexible interface for users to interactively drive creativity in fashion design. The decomposition of the task into two GAN stages may inform similar strategies in other domains of image synthesis and conditional GAN applications.
The research paves the way for future explorations into domain-specific adaptability, such as handling more complex and varied backgrounds and refining garment texture details. Future developments could scale the system to a broader range of fabric types and integrate advances in neural architectures focused on fine-grained texture synthesis.
In summary, the combination of technical innovation and practical utility positions this paper as a valuable contribution to image synthesis and fashion technology, offering a scaffold for further exploration and enhancement of GAN-based image generation.