Fashion-Gen: The Generative Fashion Dataset and Challenge (1806.08317v2)

Published 21 Jun 2018 in stat.ML and cs.LG

Abstract: We introduce a new dataset of 293,008 high definition (1360 x 1360 pixels) fashion images paired with item descriptions provided by professional stylists. Each item is photographed from a variety of angles. We provide baseline results on 1) high-resolution image generation, and 2) image generation conditioned on the given text descriptions. We invite the community to improve upon these baselines. In this paper, we also outline the details of a challenge that we are launching based upon this dataset.

Summary

The paper introduces Fashion-Gen, a high-resolution generative fashion dataset with 293,008 images and detailed professional captions, along with a challenge to advance text-to-image synthesis.
Baseline results using P-GANs show promising high-resolution image generation, while text-to-image experiments with StackGANs highlight the importance of text encoding for generating fashion items.
Fashion-Gen has practical implications for fashion design by providing tools for visualization and iteration, and theoretical implications for refining generative modeling techniques.

Fashion-Gen: The Generative Fashion Dataset and Challenge

The paper, "Fashion-Gen: The Generative Fashion Dataset and Challenge," makes a substantial contribution to text-to-image synthesis in fashion by presenting a comprehensive, high-resolution dataset together with an associated challenge. The authors have curated 293,008 fashion images, each paired with a detailed item description written by a professional stylist. The images are captured under controlled studio conditions, and each item is photographed from multiple angles. The dataset is a significant resource for advancing research in text-conditioned image generation.

Key Contributions

The core contributions of the paper can be summarized as follows:

Dataset Introduction: The Fashion-Gen dataset surpasses existing text-to-image synthesis datasets in both number of images and resolution. Its captions are finely detailed, and each fashion item is photographed from 1 to 6 angles.
Baseline Methods and Results: The authors provide baseline results on the generation of high-resolution images using Progressive Generative Adversarial Networks (P-GANs) and text-conditioned image generation utilizing StackGAN-v1 and StackGAN-v2 models.
Challenge and Community Engagement: A unique challenge is introduced to stimulate further research in text-to-image synthesis, with an evaluation process combining inception scores and human evaluations to offer a holistic assessment.
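To make the dataset structure described above concrete, here is a minimal sketch of how one item might be represented in code. The class name `FashionItem`, its field names, and the file paths are hypothetical and are not taken from the paper or its released files; the sketch also assumes that all views of an item share the same stylist description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FashionItem:
    """Hypothetical record for one item: one description, several views."""

    item_id: str
    description: str  # stylist-written text, assumed shared by all views
    view_paths: List[str] = field(default_factory=list)  # one path per angle

    def __post_init__(self) -> None:
        # The summary states that items are photographed from 1 to 6 angles.
        if not 1 <= len(self.view_paths) <= 6:
            raise ValueError("expected between 1 and 6 views per item")

    def pairs(self) -> List[Tuple[str, str]]:
        """Return (image_path, description) pairs for text-conditioned training."""
        return [(path, self.description) for path in self.view_paths]


# Illustrative values only; these are not real dataset entries.
item = FashionItem(
    item_id="demo-001",
    description="Long sleeve wool coat in black.",
    view_paths=["demo-001_front.jpg", "demo-001_back.jpg"],
)
training_pairs = item.pairs()
```

Flattening items into (image, text) pairs in this way is one simple option for feeding a text-conditioned generator; a model that exploits the multiple angles would likely keep the views grouped instead.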

Technical Insights and Results

The paper provides essential insights into the generative modeling techniques employed:

High-Resolution Image Generation: Progressive Growing of GANs (P-GANs) was used to generate high-resolution images, with promising results in maintaining global coherence while reflecting the intricate details typical of fashion items. The validation inception score of 7.91, against a target score of 9.71, indicates substantial progress with room for improvement.
Text-to-Image Synthesis: Text-to-image experiments showed that StackGAN-v1 outperformed StackGAN-v2 in terms of inception scores, despite StackGAN-v2 producing qualitatively better-looking images. The experiments highlighted the importance of the text encoding method, with a pretrained bi-LSTM encoder yielding the most promising results.
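Both results above are reported as inception scores. As a reminder of what that metric measures, the score is usually defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's predicted class distribution for a generated image and p(y) is the average of those distributions over all images. The following is a minimal sketch, assuming the class probabilities have already been computed (in practice by a pretrained Inception network); the function name and the `eps` parameter are illustrative, and this is not the authors' evaluation code.

```python
import numpy as np


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Compute exp(mean_x KL(p(y|x) || p(y))) from predicted class probabilities.

    probs has shape (n_images, n_classes); each row is assumed to sum to 1.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal class distribution p(y), averaged over all images.
    marginal = probs.mean(axis=0, keepdims=True)
    # Per-image KL divergence; eps guards against log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


# Uninformative predictions: every image gets the same uniform distribution.
uniform_score = inception_score(np.full((8, 4), 0.25))

# Confident and diverse predictions: one-hot rows spread evenly over 4 classes.
diverse_score = inception_score(np.tile(np.eye(4), (2, 1)))
```

The score is lowest (1) when every image receives the same prediction and approaches the number of classes when predictions are both confident and evenly spread. Because it depends only on classifier outputs, it can disagree with visual quality, which is consistent with the StackGAN-v1 versus StackGAN-v2 observation above.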

Comparisons with Existing Datasets

The authors compare Fashion-Gen with existing datasets such as DeepFashion, CelebA, and MS COCO, highlighting what sets it apart: multiple viewing angles per item and higher image resolution. The detailed professional caption attached to each item makes the dataset particularly useful for fashion design applications.

Implications and Future Directions

This research sets a significant precedent for combining generative models with real-world applications in fashion design. The implications are twofold:

Practical Implications: Enhanced tools for fashion designers can facilitate rapid visualization and iteration of designs, bridging the conceptual-to-visual gap.
Theoretical Implications: The detailed dataset allows the community to refine and test novel generative modeling techniques, potentially advancing understanding in conditional generative networks and text-to-image synthesis.

Looking ahead, improvements in synthesizing finer details from textual descriptions, exploring the role of various facets of the dataset such as multiple angles, and refining evaluation metrics could be promising directions. This dataset and accompanying challenge invite a collaborative effort within the research community to push the boundaries of what can be achieved with generative models in fashion technology.

The release of the Fashion-Gen dataset alongside the challenge represents a robust platform for researchers to test, evaluate, and refine their models, fostering continued innovation in fashion technology and generative image modeling.
