- The paper presents MG-VTON, a novel framework integrating conditional parsing, Warp-GAN, and refinement to synthesize high-fidelity images.
- It employs precise geometric and TPS-based warping to align clothing with complex body poses, effectively reducing misalignment and texture loss.
- Empirical evaluations on MPV and DeepFashion datasets demonstrate superior SSIM and Inception Scores, validating its advanced virtual try-on capabilities.
An Expert Overview of the Multi-Pose Guided Virtual Try-On Network (MG-VTON)
The paper introduces a technologically advanced approach to virtual try-on systems, tackling the challenge of synthesizing person images under varying poses. The Multi-Pose Guided Virtual Try-On Network (MG-VTON) marks a significant stride toward seamlessly rendering a desired garment onto an individual across diverse body poses. Such capabilities have pertinent applications in virtual reality and e-commerce, notably in online shopping platforms requiring virtual garment fitting.
Methodological Advancements
The MG-VTON framework is intricately structured into three stages, each serving a functional role in the image synthesis pipeline:
- Conditional Parsing Learning: This phase employs a pose-clothes-guided human parsing network that synthesizes a human parsing map, conditioned on the input pose and clothing details. This parsing map comprehensively delineates the body structure and attire, addressing common synthesis problems like misalignment and texture detail loss.
- Warp-GAN: Central to alleviating misalignments between the input and desired poses, the Warp-GAN warps the clothing to fit the target human parsing map while preserving clothing details and authenticity. It employs a feature warping strategy based on a geometric transformation estimated via thin-plate spline (TPS) mapping, improving alignment precision at the pixel level.
- Refinement Render: Here, a multi-pose composition mask is used to refine the synthesized image, correcting artifacts introduced in earlier stages. This stage ensures the finer details of clothing textures are preserved, culminating in a realistic depiction of the subject wearing the desired outfit.
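The refinement step can be illustrated with a minimal numpy sketch of mask-based composition. In the paper the composition mask is predicted by a learned render network; here it is simply a given array in [0, 1], and the function name is my own:

```python
import numpy as np

def compose_with_mask(coarse_img, warped_clothes, mask):
    """Blend a coarse synthesized image with the warped clothing image
    using a per-pixel composition mask in [0, 1].

    Sketch only: MG-VTON learns this mask with its refinement render
    network; here the mask is supplied directly. Arrays are (H, W, 3)
    images in [0, 1]; mask is (H, W, 1) and broadcasts over channels.
    """
    mask = np.clip(mask, 0.0, 1.0)
    # Mask = 1 keeps the warped clothing pixel; mask = 0 keeps the
    # coarse synthesis, so clothing texture is pasted where it aligns.
    return mask * warped_clothes + (1.0 - mask) * coarse_img
```

The blend is a convex combination per pixel, so the output stays in the valid intensity range whenever both inputs do.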
Empirical Evaluation
The efficacy of MG-VTON is underscored by extensive experiments on a newly collected dataset (MPV) and the DeepFashion dataset. Objective metrics such as Structural Similarity (SSIM) and Inception Score (IS) demonstrate quantitatively superior performance relative to existing methods, including the VITON and CP-VTON models. For instance, MG-VTON achieves an IS of 3.154 on MPV, a notable improvement over CP-VTON's score of 2.459. These findings are corroborated by human perceptual studies affirming MG-VTON's high-fidelity image synthesis and identity preservation.
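For context, the Inception Score reported above is defined as exp(E_x[KL(p(y|x) || p(y))]), computed from an image classifier's class probabilities. A minimal numpy sketch, assuming the (N, C) probability array has already been produced by a classifier such as Inception-v3:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from an (N, C) array of per-image class
    probabilities p(y|x): the exponential of the mean KL divergence
    between each conditional and the marginal p(y).

    Sketch of the metric's formula only; in practice probs come from
    a pretrained Inception-v3 network, which is omitted here.
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)     # p(y), shape (1, C)
    # KL(p(y|x) || p(y)) per image; eps guards log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)),
                axis=1)
    return float(np.exp(kl.mean()))
```

A higher score means the conditionals are confident yet diverse across images: identical predictions for every image give a score of 1, while N images spread one-hot over C classes approach the maximum of C.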
Implications and Future Prospects
The successful development of MG-VTON represents a noteworthy progression in virtual try-on technology, promising applications that go beyond current e-commerce platforms to augmented and virtual reality systems. The incorporation of multi-pose capabilities into virtual try-on systems broadens the scope of digital clothing try-on and sets the stage for more immersive virtual environments.
Future work can build on MG-VTON by exploring its integration with dynamic 3D modeling techniques and real-time processing capabilities. Additionally, further research could refine the conditional parsing networks to handle more complex interplays of clothing textures and body dynamics.
In summary, the Multi-Pose Guided Virtual Try-On Network puts forward an effective approach to synthesizing realistic person images under varying poses and clothing, significantly enhancing the potential of virtual try-on systems in practical applications.