PG-VTON: A Novel Image-Based Virtual Try-On Method via Progressive Inference Paradigm (2304.08956v2)

Published 18 Apr 2023 in cs.CV

Abstract: Virtual try-on is a promising computer vision topic with high commercial value, in which a new garment is visually worn on a person with a photo-realistic effect. Previous studies conduct shape and content inference in a single stage, employing a single-scale warping mechanism and a relatively unsophisticated content inference mechanism. These approaches lead to suboptimal results in garment warping and skin preservation under challenging try-on scenarios. To address these limitations, we propose a novel virtual try-on method via a progressive inference paradigm (PG-VTON) that leverages a top-down inference pipeline and a general garment try-on strategy. Specifically, we propose a robust try-on parsing inference method by disentangling semantic categories and introducing consistency. Using the try-on parsing as shape guidance, we implement the garment try-on via warping-mapping-composition. To facilitate adaptation to a wide range of try-on scenarios, we adopt a "covering more and selecting one" warping strategy and explicitly distinguish tasks based on alignment. Additionally, we adapt StyleGAN2 to perform re-naked skin inpainting, conditioned on the target skin shape and spatial-agnostic skin features. Experiments demonstrate that our method achieves state-of-the-art performance under two challenging scenarios. The code will be available at https://github.com/NerdFNY/PGVTON.
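The abstract only sketches the progressive (top-down) pipeline at a high level, so the following is a minimal, hypothetical illustration of its control flow: first infer the try-on parsing, then warp and composite the garment under that shape guidance, then inpaint newly exposed ("re-naked") skin. All module names, channel counts, and the StubNet placeholder are assumptions for illustration only, not the authors' architecture; the actual implementation is at https://github.com/NerdFNY/PGVTON.

```python
# Hypothetical sketch of the three-stage progressive try-on pipeline:
# parsing inference -> warping-mapping-composition -> skin inpainting.
import torch
import torch.nn as nn

class StubNet(nn.Module):
    """Placeholder for the real sub-networks (parser, warper, inpainter)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, *tensors: torch.Tensor) -> torch.Tensor:
        # Concatenate all conditioning inputs along the channel axis.
        return self.net(torch.cat(tensors, dim=1))

def progressive_tryon(person: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
    parser = StubNet(6, 1)      # Stage 1: try-on parsing (shape guidance)
    warper = StubNet(4, 3)      # Stage 2: garment warping toward the parsed shape
    inpainter = StubNet(7, 3)   # Stage 3: composition + re-naked skin inpainting

    parsing = torch.sigmoid(parser(person, garment))       # soft garment-region mask
    warped = warper(garment, parsing)                       # warped garment content
    composed = parsing * warped + (1 - parsing) * person    # mapping + composition
    return inpainter(composed, person, parsing)             # fill newly exposed skin

# Usage with dummy 256x192 inputs (a resolution commonly used in try-on work).
person = torch.randn(1, 3, 256, 192)
garment = torch.randn(1, 3, 256, 192)
result = progressive_tryon(person, garment)
print(result.shape)  # torch.Size([1, 3, 256, 192])
```

The key design point this sketch mirrors is that each stage consumes the output of the previous one, so shape decisions (the parsing) are fixed before content decisions (warping and inpainting) are made.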
