- The paper introduces a unified multi-stage framework that enhances garment alignment and texture synthesis via a coarse-to-fine warping process and duelling triplet loss.
- It integrates conditional segmentation mask generation to effectively mitigate texture bleeding and reduce artifacts in challenging poses.
- Experimental evaluations show improved FID and PSNR scores, demonstrating the framework's robustness in image-based virtual try-on applications.
SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On
The paper "SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On" investigates how to improve the quality and robustness of virtual try-on systems, which are pivotal for enhancing online shopping experiences in the fashion industry. The authors identify significant challenges in existing models, particularly artifacts and geometric distortions, and introduce the SieveNet framework to address these limitations. The proposed methodology is a multi-stage pipeline combining a coarse-to-fine warping module, a conditional segmentation mask generation step, and a novel duelling triplet loss for refining texture translation.
Key Contributions
- Coarse-to-Fine Warping Module: This module optimizes the alignment of the try-on garment with the pose and body shape of the target model. The multi-stage warping approach significantly improves the modeling of fine-grained shape intricacies. By incorporating a perceptual geometric matching loss, the module departs from traditional single-stage warping techniques, applying a sequential refinement process that yields more accurate geometric transformations.
- Conditional Segmentation Mask Generation: This module tackles texture bleeding and skin artifacts by predicting a segmentation mask conditioned on the try-on garment. Because the predicted mask reflects the expected post-try-on layout of body and clothing regions, it facilitates improved synthesis of the final try-on image amidst complex poses and occlusions.
- Segmentation Assisted Texture Translation: The texture translation network benefits from the well-formed segmentation mask, ensuring seamless integration of the try-on garment with the preserved details of the target model. A duelling triplet loss guides the fine-tuning phase through hard negative mining, progressively refining the realism of the try-on image.
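To make the coarse-to-fine warping idea concrete: the paper's module predicts geometric transformation parameters in a coarse stage and then refines them in a second stage. The NumPy sketch below illustrates only that general structure, substituting a simple global affine transform plus per-pixel residual offsets; the function names and parameterization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bilinear_sample(img, grid_x, grid_y):
    """Sample a 2-D image at fractional coordinates via bilinear interpolation."""
    H, W = img.shape
    x0 = np.clip(np.floor(grid_x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(grid_y).astype(int), 0, H - 2)
    dx, dy = grid_x - x0, grid_y - y0
    return (img[y0, x0] * (1 - dx) * (1 - dy)
            + img[y0, x0 + 1] * dx * (1 - dy)
            + img[y0 + 1, x0] * (1 - dx) * dy
            + img[y0 + 1, x0 + 1] * dx * dy)

def coarse_to_fine_warp(garment, affine, residual):
    """Warp a garment image in two stages: a coarse global affine transform
    of the sampling grid, followed by fine per-pixel residual corrections
    (standing in for the refinement stage of a learned warping module)."""
    H, W = garment.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    # Coarse stage: global affine grid transform with params (a, b, tx, c, d, ty).
    a, b, tx, c, d, ty = affine
    gx = a * xs + b * ys + tx
    gy = c * xs + d * ys + ty
    # Fine stage: add predicted per-pixel offsets to the coarse grid.
    gx = gx + residual[..., 0]
    gy = gy + residual[..., 1]
    return bilinear_sample(garment, np.clip(gx, 0, W - 1), np.clip(gy, 0, H - 1))
```

In a learned system both the affine parameters and the residual field would come from networks supervised by the warping losses; here they are plain inputs so the two-stage composition is easy to see.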
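The duelling triplet loss can be sketched as a hinge-style triplet objective in which the hard negative is the output produced by the model in the previous training phase. The NumPy formulation below is an illustrative reading of that idea, not the paper's exact loss.

```python
import numpy as np

def duelling_triplet_loss(pred, prev_pred, ground_truth, margin=1.0):
    """Hinge-style triplet loss: the current prediction (anchor) should be
    closer to the ground truth (positive) than to the output of the
    previous training phase (hard negative), by at least `margin`."""
    d_pos = np.linalg.norm(pred - ground_truth)   # distance to ground truth
    d_neg = np.linalg.norm(pred - prev_pred)      # distance to previous-phase output
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the current output beats the previous phase's output by the margin, so each fine-tuning phase is pushed to improve on the last.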
Experimental Evaluation and Results
The authors conduct extensive evaluations using benchmark metrics including SSIM, MS-SSIM, FID, PSNR, and IS to substantiate the performance improvements of SieveNet over the state-of-the-art CP-VTON method. Notably, SieveNet reduces the FID score from 20.331 to 14.65 (lower is better), a significant gain in image distribution fidelity, while PSNR improves by approximately 17%, reflecting better image reconstruction quality.
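For reference, PSNR follows its standard definition as the log-scaled ratio of peak signal power to mean squared error; the minimal NumPy sketch below is not tied to the paper's evaluation code.

```python
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstruction; higher values indicate a closer reconstruction."""
    mse = np.mean((reference.astype(float) - reconstructed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

FID, by contrast, compares feature statistics of generated and real image distributions, which is why a drop from 20.331 to 14.65 signals better distribution-level fidelity rather than per-image reconstruction accuracy.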
Implications and Future Directions
The practical implications of this research are profound for e-commerce platforms, especially in enhancing customer interactions through more realistic virtual try-on tools. Theoretical advancements illustrated by the integration of perceptual and duelling losses may offer valuable insights for broader AI applications, especially in fields where image synthesis holds critical importance.
Looking forward, expanding upon the current findings could involve integrating more context-aware features and exploring reinforcement learning mechanisms tailored for optimization in virtual environments. Additionally, leveraging the framework alongside emerging AI trends, such as synthetic data generation or improved human pose estimation techniques, could foster new avenues for research and application.
In conclusion, the SieveNet framework stands as a testament to the potential for methodological innovation in image-based virtual try-on systems. By synthesizing several novel computational strategies, this paper sets a new benchmark and opens multiple avenues for subsequent research in the domain.