
Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images (2311.16094v3)

Published 27 Nov 2023 in cs.CV and cs.GR

Abstract: Most virtual try-on research is motivated by the fashion business: it generates images that show garments on studio models at lower cost. However, virtual try-on should be a broader application that also lets customers visualize garments on themselves using their own casual photos, a setting known as in-the-wild try-on. Unfortunately, existing methods, which achieve plausible results in studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing the StreetTryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method that learns virtual try-on directly from a set of in-the-wild person images, without requiring paired data. We tackle the unique challenges of this setting, including warping garments to more diverse human poses and rendering more complex backgrounds faithfully, with a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance on standard studio try-on tasks and state-of-the-art performance on street try-on and cross-domain try-on tasks.
