
Towards Photo-Realistic Virtual Try-On by Adaptively Generating$\leftrightarrow$Preserving Image Content (2003.05863v1)

Published 12 Mar 2020 in cs.CV, cs.GR, and eess.IV

Abstract: Image visual try-on aims at transferring a target clothing image onto a reference person, and has become a hot topic in recent years. Prior arts usually focus on preserving the character of a clothing image (e.g. texture, logo, embroidery) when warping it to arbitrary human pose. However, it remains a big challenge to generate photo-realistic try-on images when large occlusions and human poses are presented in the reference person. To address this issue, we propose a novel visual try-on network, namely Adaptive Content Generating and Preserving Network (ACGPN). In particular, ACGPN first predicts semantic layout of the reference image that will be changed after try-on (e.g. long sleeve shirt$\rightarrow$arm, arm$\rightarrow$jacket), and then determines whether its image content needs to be generated or preserved according to the predicted semantic layout, leading to photo-realistic try-on and rich clothing details. ACGPN generally involves three major modules. First, a semantic layout generation module utilizes semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on. Second, a clothes warping module warps clothing images according to the generated semantic layout, where a second-order difference constraint is introduced to stabilize the warping process during training. Third, an inpainting module for content fusion integrates all information (e.g. reference image, semantic layout, warped clothes) to adaptively produce each semantic part of human body. In comparison to the state-of-the-art methods, ACGPN can generate photo-realistic images with much better perceptual quality and richer fine-details.

Citations (238)

Summary

  • The paper introduces the Adaptive Content Generating and Preserving Network (ACGPN) to produce photo-realistic virtual try-on images.
  • It employs semantic layout generation, a clothes warping module with a second-order difference constraint, and an inpainting stage for effective content fusion.
  • ACGPN outperforms methods like VITON and CP-VTON, achieving higher SSIM and Inception Scores for complex textures and occluded images.

Towards Photo-Realistic Virtual Try-On by Adaptively Generating and Preserving Image Content

The paper presents a novel approach to image-based virtual try-on, utilizing an Adaptive Content Generating and Preserving Network (ACGPN) to achieve photo-realistic results. The central challenge addressed is generating high-quality, naturalistic images of a person wearing arbitrary target clothing, even when the original image features complicated human poses or occlusions.

Architectural Overview

The ACGPN comprises three main modules:

  1. Semantic Layout Generation: This module is responsible for predicting the semantic layout of the image that indicates areas of expected change due to the try-on process. It employs semantic segmentation to progressively create a layout that distinguishes between body parts and clothing regions, enabling the correct identification of areas that require alteration or preservation.
  2. Clothes Warping Module: This component warps the target clothing image according to the predicted semantic layout. A second-order difference constraint is introduced here to stabilize the warping process, crucial for maintaining the integrity of the clothing's texture, logo, and other characteristics during deformation.
  3. Inpainting and Content Fusion: This final stage integrates all the information—warped clothing, semantic layout, and original image content—to produce the complete try-on image. It adaptively generates content where necessary while preserving regions that do not require alteration.
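To make the second-order difference constraint in the warping module concrete, the sketch below shows one plausible form of such a penalty on a grid of TPS control points: each interior point is pushed toward the midpoint of its horizontal and vertical neighbours, discouraging abrupt bends in the learned deformation. This is an illustrative simplification, not the paper's exact loss; the function name and grid layout are assumptions.

```python
import numpy as np

def second_order_diff_penalty(points):
    """Second-order difference penalty on a 2-D grid of warping control points.

    points: array of shape (H, W, 2) holding (x, y) control-point coordinates.
    For each interior point p_i, penalizes || p_{i-1} + p_{i+1} - 2 * p_i ||^2
    along both grid axes, so locally linear, evenly spaced deformations incur
    zero cost while sharp kinks are penalized.
    """
    # Horizontal neighbours: compare each point to its left/right pair.
    horiz = points[:, :-2] + points[:, 2:] - 2.0 * points[:, 1:-1]
    # Vertical neighbours: compare each point to its up/down pair.
    vert = points[:-2, :] + points[2:, :] - 2.0 * points[1:-1, :]
    return float(np.square(horiz).sum() + np.square(vert).sum())
```

An undeformed (affine) grid yields zero penalty, while displacing a single control point immediately incurs a positive cost, which is the stabilizing behaviour the warping module relies on during training.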

Numerical Performance and Implications

The paper provides quantitative assessments using the Structural Similarity (SSIM) index and the Inception Score (IS), demonstrating that ACGPN outperforms existing methods such as VITON and CP-VTON across varying levels of image complexity. Experimental data indicate significant improvements in the photorealism of outputs, particularly in the handling of complex textures and occlusions.
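For reference, the SSIM metric used in the evaluation has the following structure. The sketch below is a single-window simplification (the standard metric averages over local Gaussian windows) and is not the paper's evaluation code; the constants follow the common choice of C1 = (0.01 L)^2 and C2 = (0.03 L)^2 with dynamic range L = 255.

```python
import numpy as np

def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global (single-window) SSIM between two grayscale images in [0, 255].

    Combines luminance, contrast, and structure comparisons:
    SSIM = (2*mu_x*mu_y + C1)(2*cov_xy + C2) /
           ((mu_x^2 + mu_y^2 + C1)(var_x + var_y + C2))
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )
```

Identical images score exactly 1.0, and any distortion lowers the score, which is why SSIM serves as a structure-aware complement to the Inception Score in the comparison against VITON and CP-VTON.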

Contributions and Future Prospects

The key contributions of this work include the development of the ACGPN framework for virtual try-on applications, a methodology for adaptive content generation and preservation using semantic layouts, and the introduction of a warping constraint that enhances the stability and accuracy of virtual garment fitting.

The implications of this research extend to the theoretical domain, where it offers novel strategies for managing complex image synthesis tasks, and to practical applications, notably in the fashion industry and in digital retail environments where realistic virtual fitting can greatly enhance consumer experience.

Looking ahead, this work could inspire new research directions in synthesizing images where significant variance exists between target and source images—in essence, any domain where the midground between generation and preservation must be precisely navigated. The adaptive mechanisms proposed here could serve as a foundation for further exploration in real-time applications and for extending virtual try-on technologies to more dynamic use cases, such as animated characters or AR-enhanced shopping experiences.

ACGPN’s promising results pave the way for integrating AI-driven solutions in online retail and fashion, marking a step toward more immersive and personalized digital shopping possibilities. As AI methodologies continue evolving, similar adaptive systems could have broader implications, potentially influencing how virtual environments are constructed and manipulated in multiple domains.