
Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models (2306.05182v1)

Published 15 May 2023 in cs.CV and cs.LG

Abstract: Fashionable image generation aims to synthesize images of diverse fashion styles prevalent around the globe, helping fashion designers with real-time visualization by giving them a basic customized structure of how a specific design preference would look in real life and what further improvements could be made for enhanced customer satisfaction. Moreover, users themselves can interact and generate fashionable images simply by giving a few prompts. Recently, diffusion models have gained popularity as generative models owing to their flexibility and their ability to generate realistic images from Gaussian noise. Latent diffusion models are a type of generative model that uses diffusion processes to model the generation of complex data, such as images, audio, or text. They are called "latent" because they learn a hidden representation, or latent variable, of the data that captures its underlying structure. We propose a method that exploits the equivalence between diffusion models and energy-based models (EBMs) and suggests ways to compose multiple probability distributions. We describe a pipeline showing how our method can be applied to new fashionable outfit generation and virtual try-on using LLM-guided text-to-image generation. Our results indicate that using an LLM to refine the prompts given to the latent diffusion model helps generate globally creative and culturally diverse fashion styles while reducing bias.
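The compositional idea the abstract leans on — reading diffusion models as EBMs, so that a product of distributions corresponds to a sum of log-densities and hence a sum of scores — can be illustrated with a minimal 1-D sketch. This is a toy product-of-experts demonstration using closed-form Gaussian scores and unadjusted Langevin sampling, not the paper's actual implementation; all names and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "expert" densities, each standing in for one model's learned
# distribution. A score is the gradient of the log-density.
m1, s1 = -1.0, 1.0
m2, s2 = 2.0, 1.0

def score1(x):
    # grad log N(x; m1, s1^2)
    return (m1 - x) / s1**2

def score2(x):
    # grad log N(x; m2, s2^2)
    return (m2 - x) / s2**2

def composed_score(x):
    # Product of experts: log p is proportional to log p1 + log p2,
    # so the scores simply add.
    return score1(x) + score2(x)

# Unadjusted Langevin dynamics targeting the composed distribution,
# run on many parallel chains for stable statistics.
x = rng.standard_normal(5000)
eps = 1e-2
for _ in range(2000):
    x = x + eps * composed_score(x) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)

# The product of these two unit-variance Gaussians is itself Gaussian
# with precision 1/s1^2 + 1/s2^2 = 2, i.e. mean 0.5 and variance 0.5;
# the Langevin samples should concentrate around that mean.
prec = 1 / s1**2 + 1 / s2**2
mean = (m1 / s1**2 + m2 / s2**2) / prec
```

In the paper's setting the analogue of `score1`/`score2` would be learned noise predictions from diffusion models conditioned on different (LLM-refined) prompts, combined at each denoising step rather than in a flat Langevin loop.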
