
High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions (2206.14180v2)

Published 28 Jun 2022 in cs.CV and cs.AI

Abstract: Image-based virtual try-on aims to synthesize an image of a person wearing a given clothing item. To solve the task, the existing methods warp the clothing item to fit the person's body and generate the segmentation map of the person wearing the item before fusing the item with the person. However, when the warping and the segmentation generation stages operate individually without information exchange, the misalignment between the warped clothes and the segmentation map occurs, which leads to the artifacts in the final image. The information disconnection also causes excessive warping near the clothing regions occluded by the body parts, so-called pixel-squeezing artifacts. To settle the issues, we propose a novel try-on condition generator as a unified module of the two stages (i.e., warping and segmentation generation stages). A newly proposed feature fusion block in the condition generator implements the information exchange, and the condition generator does not create any misalignment or pixel-squeezing artifacts. We also introduce discriminator rejection that filters out the incorrect segmentation map predictions and assures the performance of virtual try-on frameworks. Experiments on a high-resolution dataset demonstrate that our model successfully handles the misalignment and occlusion, and significantly outperforms the baselines. Code is available at https://github.com/sangyun884/HR-VITON.

Citations (107)

Summary

  • The paper introduces a unified Try-On Condition Generator that concurrently manages clothing warping and segmentation to eliminate misalignment.
  • It demonstrates significant improvements in photorealism and robustness using metrics like FID and KID on a dataset of over 13,000 image pairs.
  • The method effectively handles body-part occlusions to preserve garment details, setting a new benchmark for high-resolution virtual try-on.

High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions

This paper addresses the challenges posed by misalignment and occlusion in high-resolution image-based virtual try-on systems. The proposed framework aims to synthesize realistic images of individuals wearing specified clothing items, with a focus on high-resolution output (1024×768 pixels). The authors introduce a novel architecture that combines the warping of clothing items with segmentation map generation, ensuring these two vital processes are aligned and interconnected.

Methodology

The authors identify two significant challenges in current virtual try-on methodologies: misalignment between warped clothing and segmentation maps, and pixel-squeezing artifacts due to occlusions. To tackle these issues, the paper proposes a unified module—termed the "Try-On Condition Generator"—that addresses both warping and segmentation in tandem.
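At a high level, the pipeline first runs the condition generator to jointly produce a warped garment and an aligned segmentation map, then fuses them with the person image. The sketch below illustrates only this two-stage structure; the function bodies are placeholder stand-ins for the paper's learned networks, and all names are illustrative.

```python
import numpy as np

# Illustrative stand-ins for the two learned networks. In the paper both
# are CNNs; here they are placeholders operating on plain arrays.
def condition_generator(person, cloth):
    """Jointly predict the warped garment and an aligned segmentation map."""
    warped_cloth = cloth                    # flow-based warping would go here
    seg_map = (person > 0).astype(float)    # aligned clothing-region mask
    return warped_cloth, seg_map

def image_generator(person, warped_cloth, seg_map):
    """Fuse the warped garment into the person image where the
    segmentation map marks clothing regions."""
    return seg_map * warped_cloth + (1 - seg_map) * person

# Toy 1024x768 RGB inputs standing in for real photographs.
person = np.full((1024, 768, 3), 0.5)
cloth = np.full((1024, 768, 3), 0.8)
warped, seg = condition_generator(person, cloth)
out = image_generator(person, warped, seg)
```

Because the two stages share one module, the mask used for fusion is by construction the same one the warping pathway produced, which is what rules out misalignment at the seams.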

Key components of the proposed methodology include:

  • Try-On Condition Generator: This module integrates two pathways (for flow and segmentation) that share and exchange information, preventing misalignment and appropriately handling occlusions caused by body parts. A feature fusion block allows the simultaneous prediction of a warped garment and a perfectly aligned segmentation map.
  • Condition Aligning: Ensures that the segmentation map aligns with the warped clothing, completely eliminating regions of misalignment.
  • Body Part Occlusion Handling: Handles occlusions naturally, avoiding excessive warping and preserving clothing details without pixel-squeezing artifacts.
  • Discriminator Rejection: Incorporates a method for rejecting low-quality segmentation maps during testing, enhancing the robustness of the virtual try-on application for real-world scenarios.
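The discriminator-rejection step amounts to scoring each predicted segmentation map with the discriminator and discarding low-scoring ones at test time. The sketch below is a minimal illustration of that filtering; the threshold value and function names are assumptions for illustration, not values from the paper.

```python
import numpy as np

def discriminator_rejection(seg_maps, disc_scores, threshold=0.5):
    """Keep only segmentation maps whose discriminator realism score
    meets a threshold. `disc_scores` holds one score per map; the
    threshold is an illustrative hyperparameter."""
    kept = [m for m, s in zip(seg_maps, disc_scores) if s >= threshold]
    rejected = len(seg_maps) - len(kept)
    return kept, rejected

# Toy usage: three candidate maps, one scored too low to keep.
maps = [np.zeros((4, 4)), np.ones((4, 4)), np.full((4, 4), 2.0)]
scores = [0.9, 0.2, 0.7]
kept, rejected = discriminator_rejection(maps, scores, threshold=0.5)
```

In a deployed system, a rejected prediction would trigger a fallback (e.g., declining to render the try-on result) rather than showing the user an artifact-ridden image.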

Experimental Results

The authors conducted experiments on a high-resolution dataset comprising over 13,000 pairs of garment and person images, demonstrating significant improvements over existing methods. Quantitative metrics such as FID and KID underscore the performance superiority of the proposed model. Visual comparisons show enhanced photorealism and preservation of clothing details, with the model effectively handling complex body poses and occlusions.
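Both FID and KID compare feature statistics of generated and real images. As a rough illustration of the KID side, the unbiased squared-MMD estimate with the standard cubic polynomial kernel can be computed as below; real evaluations extract Inception features first, which this sketch replaces with random vectors.

```python
import numpy as np

def polynomial_kernel(X, Y):
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3 used by KID."""
    d = X.shape[1]
    return (X @ Y.T / d + 1.0) ** 3

def kid(X, Y):
    """Unbiased squared-MMD estimate between feature sets X and Y,
    each an (n, d) array of image features."""
    m, n = len(X), len(Y)
    Kxx = polynomial_kernel(X, X)
    Kyy = polynomial_kernel(Y, Y)
    Kxy = polynomial_kernel(X, Y)
    # Exclude diagonal terms for the unbiased estimator.
    sum_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    sum_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return sum_xx + sum_yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 16))
same = rng.normal(size=(200, 16))               # same distribution
shifted = rng.normal(loc=1.0, size=(200, 16))   # shifted distribution
kid_same, kid_shift = kid(real, same), kid(real, shifted)
```

KID is close to zero when the two feature distributions match and grows as they diverge, which is why lower scores indicate more realistic synthesis.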

The approach outperforms notable baselines such as CP-VTON, ACGPN, VITON-HD, and PF-AFN across multiple resolutions. The qualitative analyses further illustrate the model's ability to generate coherent, artifact-free outputs despite the high variability of the input data.

Implications and Future Directions

The research offers a significant advancement in high-resolution virtual try-on techniques, with practical implications for the online retail industry, where such systems can enhance customer experiences by providing realistic simulations of clothing on various body types and poses.

From a theoretical standpoint, the paper demonstrates the power of integrating different stages of image synthesis into a cohesive system that handles both global alignment and local occlusions. Future research directions may explore extending these techniques to other domains requiring precise feature alignment and integration, such as augmented reality applications and more diverse clothing types.

The proposed architecture also paves the way for further investigation into improving generative adversarial networks (GANs) through enhanced discriminator functionalities and the integration of multi-modal inputs for richer try-on experiences.

In conclusion, this paper presents a robust solution to some of the pressing challenges in high-resolution virtual try-on systems, setting a benchmark for future advancements in the field while providing clear pathways for practical and theoretical developments in AI-driven image synthesis.