Style-Based Global Appearance Flow for Virtual Try-On (2204.01046v1)

Published 3 Apr 2022 in cs.CV

Abstract: Image-based virtual try-on aims to fit an in-shop garment into a clothed person image. To achieve this, a key step is garment warping, which spatially aligns the target garment with the corresponding body parts in the person image. Prior methods typically adopt a local appearance flow estimation model. They are thus intrinsically susceptible to difficult body poses/occlusions and large mis-alignments between person and garment images (see Fig. 1). To overcome this limitation, a novel global appearance flow estimation model is proposed in this work. For the first time, a StyleGAN based architecture is adopted for appearance flow estimation. This enables us to take advantage of a global style vector to encode a whole-image context to cope with the aforementioned challenges. To guide the StyleGAN flow generator to pay more attention to local garment deformation, a flow refinement module is introduced to add local context. Experiment results on a popular virtual try-on benchmark show that our method achieves new state-of-the-art performance. It is particularly effective in an 'in-the-wild' application scenario where the reference image is full-body, resulting in a large mis-alignment with the garment image (Fig. 1, Top). Code is available at: https://github.com/SenHe/Flow-Style-VTON

Citations (103)

Summary

  • The paper introduces a novel StyleGAN-based global appearance flow model that significantly improves garment-person alignment in virtual try-on systems.
  • It presents a two-phase framework combining global context capture with local flow refinement to achieve precise garment warping.
  • Experimental results on the VITON dataset show enhanced performance with an SSIM of 0.91 and an FID of 8.89, surpassing previous state-of-the-art methods.

Style-Based Global Appearance Flow for Virtual Try-On

The paper "Style-Based Global Appearance Flow for Virtual Try-On" by Sen He, Yi-Zhe Song, and Tao Xiang introduces a novel approach to image-based virtual try-on (VTON), which aims to superimpose in-shop garments onto images of clothed persons. The research addresses a significant limitation of previous methods that relied on local appearance flow estimation, which often falters under complex body poses or substantial misalignment between the person and garment images.

Methodology Overview

The core novelty of this work is a style-based global appearance flow estimation model built on StyleGAN. The approach decomposes the garment warping process into two phases: global context capture and local refinement:

  1. Global Appearance Flow Estimation: A StyleGAN-based architecture is employed for the first time in VTON to predict the global appearance flow. This involves the extraction of a global style vector that modulates the generation process, allowing it to encode entire image contexts and effectively manage extensive garment-person misalignments.
  2. Local Flow Refinement Module: To complement the global style modulation, a local refinement module is introduced that incorporates local garment context for precise deformation, ensuring fine-grained alignment.
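Both the global model and earlier local methods ultimately produce a dense appearance flow field that is used to warp the garment onto the person. As an illustration of that warping step, here is a minimal NumPy sketch of bilinear flow-based sampling; the function name `warp_with_flow` and the zero-padding convention are illustrative choices, not the paper's implementation (which operates on feature maps inside a deep network):

```python
import numpy as np

def warp_with_flow(garment, flow):
    """Warp a garment image with a dense appearance flow field.

    garment: (H, W, C) float array, the in-shop garment image.
    flow:    (H, W, 2) float array; flow[y, x] = (dx, dy) gives the
             source location (x + dx, y + dy) sampled for output pixel (y, x).
    Uses bilinear interpolation with zero padding outside the image.
    """
    H, W, C = garment.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    sx = xs + flow[..., 0]          # source x-coordinates
    sy = ys + flow[..., 1]          # source y-coordinates

    # Integer corners around each (possibly fractional) source location.
    x0 = np.floor(sx).astype(int); x1 = x0 + 1
    y0 = np.floor(sy).astype(int); y1 = y0 + 1
    wx = sx - x0; wy = sy - y0      # fractional interpolation weights

    def gather(yy, xx):
        # Fetch pixels, returning zeros for out-of-bounds coordinates.
        valid = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
        out = np.zeros((H, W, C))
        out[valid] = garment[yy[valid], xx[valid]]
        return out

    # Standard bilinear blend of the four corner samples.
    return ((1 - wx) * (1 - wy))[..., None] * gather(y0, x0) \
         + (wx * (1 - wy))[..., None] * gather(y0, x1) \
         + ((1 - wx) * wy)[..., None] * gather(y1, x0) \
         + (wx * wy)[..., None] * gather(y1, x1)
```

A zero flow field returns the garment unchanged, and a constant flow translates it; the paper's contribution is in how the flow field itself is predicted (globally, via a style vector, then refined locally), not in the sampling step.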

Experimental Evaluation

The performance of the proposed model was assessed using the VITON dataset, widely recognized in VTON research. The paper reports significant improvements over state-of-the-art methods:

  • Quantitative Metrics: The model achieved a Structural Similarity (SSIM) index of 0.91 and a Fréchet Inception Distance (FID) of 8.89, outperforming the closest competitor (PF-AFN) which achieved an SSIM of 0.89 and an FID of 10.09.
  • Qualitative Assessments: In scenarios marked by difficult poses and occlusions, the proposed model maintained robustness, generating realistic try-on images with higher fidelity in garment features and alignment.
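For readers unfamiliar with the SSIM metric cited above, the sketch below shows its per-window formula over whole-image statistics. This is a simplified, single-window variant for illustration only: the standard SSIM used in VTON evaluation averages this score over local (typically 11x11 Gaussian-weighted) windows, and FID additionally requires an Inception network, so it is not reproduced here.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM computed from whole-image statistics.

    x, y: float arrays of identical shape with values in [0, data_range].
    The full SSIM metric applies this same formula per local window
    and averages the results; C1 and C2 are the standard stabilizers.
    """
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```

Identical images score exactly 1.0; the reported 0.91 vs. 0.89 gap thus reflects a measurable improvement in structural fidelity of the synthesized try-on images.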

Implications and Future Directions

The success of integrating a global style-based modulation technique marks a notable shift in VTON capabilities, suggesting broader applications in tasks demanding large-scale feature alignment and realistic image synthesis. The resilience to misalignments and complex poses broadens the practicality of VTON models for real-world applications, particularly in e-commerce.

Future work may explore deeper integration with 3D modeling for enhanced virtual try-on experiences, and potentially extend the style-based framework to other domains such as augmented reality and fashion design simulation. Furthermore, the scalability and efficiency of the model could be tested against larger datasets and more varied garment types, and the approach could potentially be integrated into real-time try-on systems.

This contribution provides a significant step forward in virtual try-on technology, offering concrete improvements and advancing the understanding of garment alignment methods within computer vision.
