VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization (2103.16874v2)

Published 31 Mar 2021 in cs.CV

Abstract: The task of image-based virtual try-on aims to transfer a target clothing item onto the corresponding region of a person, which is commonly tackled by fitting the item to the desired body part and fusing the warped item with the person. While an increasing number of studies have been conducted, the resolution of synthesized images is still limited to low (e.g., 256x192), which acts as the critical limitation against satisfying online consumers. We argue that the limitation stems from several challenges: as the resolution increases, the artifacts in the misaligned areas between the warped clothes and the desired clothing regions become noticeable in the final results; the architectures used in existing methods have low performance in generating high-quality body parts and maintaining the texture sharpness of the clothes. To address the challenges, we propose a novel virtual try-on method called VITON-HD that successfully synthesizes 1024x768 virtual try-on images. Specifically, we first prepare the segmentation map to guide our virtual try-on synthesis, and then roughly fit the target clothing item to a given person's body. Next, we propose ALIgnment-Aware Segment (ALIAS) normalization and ALIAS generator to handle the misaligned areas and preserve the details of 1024x768 inputs. Through rigorous comparison with existing methods, we demonstrate that VITON-HD highly surpasses the baselines in terms of synthesized image quality both qualitatively and quantitatively. Code is available at https://github.com/shadow2496/VITON-HD.

Authors (4)

Seunghwan Choi (8 papers)
Sunghyun Park (38 papers)
Minsoo Lee (4 papers)
Jaegul Choo (161 papers)

Citations (198)

Summary

Analysis of VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization

The paper presents VITON-HD, an image-based virtual try-on method that synthesizes 1024×768 images. The goal is to transfer a target clothing item onto a reference image of a person while preserving the details of both the garment and the person. The task is difficult because the garment must be fitted to the person's body and then fused with the person image without disturbing the regions that should stay unchanged.

Technical Innovations

The authors identify two key challenges in existing virtual try-on methods: the low output resolution (typically 256×192, one sixteenth of the pixel count of 1024×768) and the artifacts that appear where the warped clothing does not align with the desired clothing region, which become more noticeable as the resolution increases. To address these, VITON-HD introduces several technical innovations:

Clothing-Agnostic Person Representation: This representation eliminates any dependency on the original clothing item by using pose and segmentation maps, effectively removing confounding clothing details while retaining relevant body and pose information.
ALIAS Normalization: A central contribution of the paper is ALIgnment-Aware Segment (ALIAS) normalization. Rather than computing a single set of statistics as in traditional instance normalization, it standardizes activations in the misaligned regions separately from the rest, which reduces the artifacts caused by background pixels that leak in where the warped clothing does not cover the desired clothing region.
Multi-Scale Refinement: The ALIAS generator uses a simplified architecture without an encoder, built on ALIAS normalization, and performs multi-scale refinement at the feature level to preserve clothing texture and details at resolutions up to 1024×768.
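
The region-wise standardization behind ALIAS normalization can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' PyTorch implementation: the function names are hypothetical, the misalignment mask is assumed to be "pixels the segmentation labels as clothing that the warped item does not cover", and the modulation parameters gamma and beta are passed in directly rather than predicted by a network.

```python
import numpy as np

def misalignment_mask(seg_cloth, warped_mask):
    """Assumed definition: clothing pixels in the segmentation map
    that the warped clothing mask fails to cover."""
    return seg_cloth.astype(bool) & ~warped_mask.astype(bool)

def region_standardize(h, mask, eps=1e-5):
    """Standardize activations h of shape (C, H, W) per channel,
    separately inside and outside the boolean mask of shape (H, W)."""
    mask = mask.astype(bool)
    out = np.zeros_like(h, dtype=np.float64)
    for region in (mask, ~mask):
        if not region.any():
            continue  # nothing to normalize in an empty region
        vals = h[:, region]                      # shape (C, N)
        mean = vals.mean(axis=1, keepdims=True)
        std = vals.std(axis=1, keepdims=True)
        out[:, region] = (vals - mean) / (std + eps)
    return out

def alias_norm(h, mask, gamma, beta):
    """Region-wise standardization followed by an affine modulation.
    gamma and beta are plain arrays here (an assumption for the sketch)."""
    return gamma * region_standardize(h, mask) + beta

if __name__ == "__main__":
    seg = np.zeros((8, 6), dtype=bool)
    seg[2:6, 1:5] = True                         # desired clothing region
    warp = np.zeros((8, 6), dtype=bool)
    warp[2:6, 1:3] = True                        # warped item covers only part of it
    m = misalignment_mask(seg, warp)

    rng = np.random.default_rng(0)
    h = rng.normal(size=(3, 8, 6))
    h[:, m] += 5.0                               # imitate background leaking into features
    z = alias_norm(h, m, gamma=np.ones_like(h), beta=np.zeros_like(h))
    print(z[:, m].mean(axis=1), z[:, ~m].mean(axis=1))
```

Because each region is standardized with its own statistics, the offset injected into the misaligned pixels no longer shifts the statistics of the remaining pixels, which is the intuition the summary describes.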

Experimental Evaluation

The paper’s empirical evaluation demonstrates that VITON-HD significantly outperforms existing methods such as CP-VTON and ACGPN:

Quantitative Metrics: The model achieves notable improvements in SSIM and LPIPS across various resolutions, with substantial gains at 1024×768. The FID score also reflects improved realism in unpaired settings, highlighting the model’s ability to produce convincing virtual try-on images.
Qualitative Analysis: Visual comparisons show that VITON-HD preserves the detail of the target clothing better than the other approaches. Misalignment artifacts are largely removed, which supports the efficacy of ALIAS normalization.
User Study: Supporting the quantitative results, a user study indicates a strong preference for VITON-HD's outputs in terms of realism and detail preservation.
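
To make the SSIM metric mentioned above concrete, here is a minimal sketch of the SSIM formula evaluated once over a whole grayscale image. It is a simplification: standard SSIM averages the same expression over local windows, and the paper's evaluation code is not reproduced here. The function name global_ssim and the default data_range are assumptions for illustration.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """SSIM formula computed from global image statistics
    (single window, a simplification of the usual local-window SSIM)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2                # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2                # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.uniform(0, 255, size=(64, 48))
    degraded = np.clip(reference + rng.normal(0, 25, size=reference.shape), 0, 255)
    print(global_ssim(reference, reference))     # identical images score 1 (up to rounding)
    print(global_ssim(reference, degraded))      # noise lowers the score
```

Higher SSIM means the synthesized image is structurally closer to the ground truth, whereas for LPIPS and FID lower values are better.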

Implications and Future Work

VITON-HD has practical implications for online shopping, since realistic, high-resolution virtual try-on gives consumers a clearer picture of how an item would look on them. This could in turn help reduce return rates and improve customer satisfaction. On the research side, alignment-aware normalization may inspire further work on region-aware normalization techniques in other image synthesis tasks.

For future work, expanding dataset diversity and addressing the current limitations in handling varied body shapes and out-of-dataset images could make the approach more robust and more widely applicable. Reducing the computational cost, for example to support real-time applications, is another direction worth investigating.

In conclusion, VITON-HD raises the output resolution of image-based virtual try-on to 1024×768 by combining a clothing-agnostic person representation with misalignment-aware normalization, and it outperforms prior methods in the reported quantitative, qualitative, and user-study comparisons. The work contributes to both the academic understanding and the practical application of virtual try-on systems.

GitHub

GitHub - shadow2496/VITON-HD: Official PyTorch implementation of "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization" (CVPR 2021) (814 stars)