FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on (2404.14162v3)
Abstract: Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues, which are caused by the stochastic nature of diffusion and by latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we incorporate warped clothes as both the starting point and a local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain the generated try-on images, providing clothes-consistent faithful supervision. Third, we devise clothes-posterior sampling for faithful inference, which further improves model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that FLDM-VTON outperforms state-of-the-art baselines and generates photo-realistic try-on images with faithful clothing details.