- The paper introduces IDM-VTON, a dual-module diffusion framework that enhances garment fidelity by integrating high-level garment semantics with low-level detail extraction.
- It introduces complementary cross-attention and self-attention conditioning modules, achieving lower LPIPS and FID scores and higher SSIM and CLIP image-similarity scores than previous models.
- The framework's robust performance on real-world datasets, including 'In-the-Wild', demonstrates its practical potential for e-commerce and paves the way for future AI-driven fashion research.
Improving Diffusion Models for Authentic Virtual Try-on in the Wild
This paper addresses key challenges in image-based virtual try-on (VTON) with diffusion models, presenting a novel framework called IDM-VTON. By leveraging advances in diffusion models, the authors aim to generate more authentic try-on images while preserving garment fidelity in complex real-world scenarios.
Methodological Advancements
The authors introduce IDM-VTON, which improves upon existing exemplar-based inpainting diffusion models through two conditioning modules:
- Image Prompt Adapter (IP-Adapter): This module encodes the high-level semantics of the garment using visual encoders, feeding this abstraction into the cross-attention layers of the diffusion model.
- GarmentNet: Acting as a parallel UNet encoder, GarmentNet captures low-level features of the garment to preserve intricate details, passing this information to the self-attention layers.
This dual-module architecture allows IDM-VTON to substantially improve garment fidelity over previous methods by simultaneously capturing both macro and micro aspects of the garment image; a minimal sketch of the two conditioning paths follows.
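To make the two paths concrete, here is a minimal, single-head PyTorch sketch of how IP-Adapter-style decoupled cross-attention and GarmentNet-style self-attention concatenation could be wired. This is an illustration under stated assumptions, not the authors' implementation: multi-head splitting, output projections, normalization, and masking are all omitted, and the module names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCrossAttention(nn.Module):
    """IP-Adapter-style cross-attention: an image-conditioned attention
    branch is added on top of the usual text-conditioned branch."""
    def __init__(self, dim: int, ctx_dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        # Text branch (frozen in the real adapter; trainable here for brevity).
        self.to_k_txt = nn.Linear(ctx_dim, dim, bias=False)
        self.to_v_txt = nn.Linear(ctx_dim, dim, bias=False)
        # Image branch: the newly trained key/value projections.
        self.to_k_img = nn.Linear(ctx_dim, dim, bias=False)
        self.to_v_img = nn.Linear(ctx_dim, dim, bias=False)

    def forward(self, x, txt_ctx, img_ctx, scale: float = 1.0):
        q = self.to_q(x)
        out_txt = F.scaled_dot_product_attention(
            q, self.to_k_txt(txt_ctx), self.to_v_txt(txt_ctx))
        out_img = F.scaled_dot_product_attention(
            q, self.to_k_img(img_ctx), self.to_v_img(img_ctx))
        return out_txt + scale * out_img

class GarmentSelfAttention(nn.Module):
    """Self-attention where garment features from a parallel UNet are
    concatenated along the sequence axis, so every person token can
    attend directly to low-level garment tokens."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

    def forward(self, x, garment_feats):
        kv = torch.cat([x, garment_feats], dim=1)  # (B, N_x + N_g, C)
        return F.scaled_dot_product_attention(
            self.to_q(x), self.to_k(kv), self.to_v(kv))  # queries stay person-only

# Shape check with illustrative dimensions:
x = torch.randn(2, 64, 320)    # person/latent tokens
txt = torch.randn(2, 77, 768)  # text embeddings
img = torch.randn(2, 4, 768)   # image-prompt tokens from the adapter
g = torch.randn(2, 64, 320)    # garment features from the parallel UNet
y = GarmentSelfAttention(320)(DecoupledCrossAttention(320, 768)(x, txt, img), g)
```

Concatenating garment tokens into self-attention lets each spatial location of the person image attend directly to fine garment detail, while the adapter branch contributes only a compact semantic summary of the garment.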
Additionally, the authors enhance the model's performance by integrating detailed textual prompts related to the garment and person images, exploiting the rich generative prior of pretrained text-to-image diffusion models.
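As a small illustration of this conditioning, the snippet below composes templated prompts from a detailed garment caption. The exact template wording is an assumption based on the scheme the paper describes, not a verbatim reproduction.

```python
def build_prompts(garment_caption: str) -> tuple[str, str]:
    """Compose text prompts from a detailed garment caption, e.g.
    'short sleeve round neck t-shirt'. Template wording is illustrative."""
    person_prompt = f"model is wearing {garment_caption}"  # conditions the try-on UNet
    garment_prompt = f"a photo of {garment_caption}"       # conditions GarmentNet
    return person_prompt, garment_prompt

person_prompt, garment_prompt = build_prompts("short sleeve round neck t-shirt")
```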
Experimental Framework
IDM-VTON is trained and evaluated on multiple datasets, including VITON-HD and DressCode. The authors also introduce a more challenging dataset, "In-the-Wild," to simulate real-world scenarios: it contains garments with intricate patterns and people in diverse poses and backgrounds, conditions under which previous models often perform inadequately.
Results and Implications
The IDM-VTON model demonstrates superior performance over existing GAN-based and diffusion-based methods. On standard datasets, it achieves lower LPIPS and higher SSIM for reconstruction, higher CLIP image similarity, and lower FID scores, reflecting enhanced image fidelity and garment accuracy.
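For readers who want to reproduce this style of evaluation, a hypothetical sketch using common off-the-shelf metric libraries (lpips, torchmetrics, and a CLIP model from transformers) follows. This is not the authors' evaluation code; choices such as the AlexNet LPIPS backbone and the CLIP checkpoint are assumptions.

```python
import torch
import lpips                                              # pip install lpips
from torchmetrics.image import StructuralSimilarityIndexMeasure
from torchmetrics.image.fid import FrechetInceptionDistance
from transformers import CLIPModel, CLIPProcessor

lpips_fn = lpips.LPIPS(net="alex")            # lower is better; inputs in [-1, 1]
ssim_fn = StructuralSimilarityIndexMeasure()  # higher is better; inputs in [0, 1]
fid_fn = FrechetInceptionDistance(feature=2048)  # lower is better; uint8 batches

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_image_similarity(img_a, img_b) -> float:
    """Cosine similarity between CLIP image embeddings (higher is better)."""
    inputs = proc(images=[img_a, img_b], return_tensors="pt")
    with torch.no_grad():
        emb = clip.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return (emb[0] @ emb[1]).item()

# Paired metrics take float tensors of shape (B, 3, H, W):
#   d = lpips_fn(gen, ref); s = ssim_fn(gen, ref)
# FID accumulates real and generated sets separately before computing:
#   fid_fn.update(real_u8, real=True); fid_fn.update(gen_u8, real=False)
#   fid = fid_fn.compute()
```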
Notably, IDM-VTON's customization capability, which adapts the model using a single pair of person-garment images, yields significant improvements in the challenging real-world scenarios of the "In-the-Wild" dataset. This indicates practical applicability in e-commerce environments, where garment and person images vary widely in appearance and context.
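In outline, such single-pair customization could look like the hedged sketch below: freeze the try-on UNet and fine-tune only its decoder blocks with the standard noise-prediction loss. The `unet` interface follows diffusers' UNet2DConditionModel, the scheduler is assumed to expose `add_noise`, and `encode_pair` is a hypothetical helper standing in for the latent and conditioning preparation; none of this is the authors' released code.

```python
import torch
import torch.nn.functional as F

def customize_on_pair(unet, scheduler, encode_pair, person_img, garment_img,
                      steps: int = 100, lr: float = 1e-5):
    """Single-pair customization sketch: freeze the try-on UNet and
    fine-tune only its decoder blocks (`up_blocks` in diffusers).
    `encode_pair` is a hypothetical helper returning clean latents and
    conditioning embeddings for the person-garment pair."""
    unet.requires_grad_(False)
    unet.up_blocks.requires_grad_(True)
    opt = torch.optim.AdamW(unet.up_blocks.parameters(), lr=lr)
    for _ in range(steps):
        latents, cond = encode_pair(person_img, garment_img)
        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
        noisy = scheduler.add_noise(latents, noise, t)
        pred = unet(noisy, t, encoder_hidden_states=cond).sample
        loss = F.mse_loss(pred, noise)  # epsilon-prediction objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return unet
```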
Implications for Future Research
The methodological advancements proposed in this paper have broader implications for the field of AI-driven fashion technology. The incorporation of diffusion models in VTON tasks, combined with detailed textual prompts and advanced conditioning techniques, may inspire further research into more nuanced and realistic virtual try-on applications. The dual-module strategy may also be applicable to other areas needing precise visual and semantic fidelity.
Future research could explore more comprehensive conditioning of other human attributes (e.g., tattoos) and further integration of textual control in garment generation. Exploring these areas could lead to even more robust and flexible virtual try-on models capable of handling increasingly complex scenarios.
In conclusion, this paper enriches the discourse on diffusion models in computational fashion, providing substantial advancements in garment fidelity and image realism for virtual try-on systems. The results mark a promising trajectory for further exploring diffusion-based synthesis methods in real-world applications.