
HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models (2401.07450v4)

Published 15 Jan 2024 in cs.CV and cs.AI

Abstract: Fashion design is a challenging and complex process. Recent works on fashion generation and editing are all agnostic of the actual fashion design process, which limits their usage in practice. In this paper, we propose a novel hierarchical diffusion-based framework tailored for fashion design, called HieraFashDiff. Our model is designed to mimic the practical fashion design workflow by unraveling the denoising process into two successive stages: 1) an ideation stage that generates design proposals given high-level concepts and 2) an iteration stage that continuously refines the proposals using low-level attributes. Our model supports fashion design generation and fine-grained local editing in a single framework. To train our model, we contribute a new dataset of full-body fashion images annotated with hierarchical text descriptions. Extensive evaluations show that, compared to prior approaches, our method generates fashion designs and edited results with higher fidelity and better prompt adherence, showing its potential to augment the practical fashion design workflow. Code and dataset are available at https://github.com/haoli-zbdbc/hierafashdiff.
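
The two-stage split described in the abstract can be pictured with a small scheduling sketch. This is not the authors' implementation: the function name `two_stage_schedule`, the `split_step` parameter, and the idea of choosing the stage purely by timestep index are assumptions made for illustration. It only shows how a reverse (denoising) pass might route early, high-noise steps to high-level concept text and later steps to low-level attribute text.

```python
from typing import List, Tuple


def two_stage_schedule(
    total_steps: int,
    split_step: int,
    high_level: str,
    low_level: str,
) -> List[Tuple[int, str, str]]:
    """Return (timestep, stage, conditioning text) for each denoising step.

    Hypothetical helper: timesteps run from total_steps - 1 (most noise)
    down to 0 (clean image). Steps with t >= split_step are assigned to the
    "ideation" stage, the remaining steps to the "iteration" stage.
    """
    if not 0 < split_step < total_steps:
        raise ValueError("split_step must lie strictly between 0 and total_steps")

    schedule = []
    for t in range(total_steps - 1, -1, -1):
        if t >= split_step:
            # Early, high-noise steps: condition on high-level concepts.
            schedule.append((t, "ideation", high_level))
        else:
            # Later, low-noise steps: condition on low-level attributes.
            schedule.append((t, "iteration", low_level))
    return schedule


if __name__ == "__main__":
    # Example prompts are made up for illustration only.
    for step in two_stage_schedule(10, 6, "casual summer dress", "short sleeves, v-neck"):
        print(step)
```

In a real diffusion sampler, each tuple would select the text embedding passed to the denoiser at that step; where the split falls and how the two conditions are combined are design choices that this sketch does not attempt to reproduce.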
