
StyleHumanCLIP: Text-guided Garment Manipulation for StyleGAN-Human (2305.16759v4)

Published 26 May 2023 in cs.CV and cs.GR

Abstract: This paper tackles text-guided control of StyleGAN for editing garments in full-body human images. Existing StyleGAN-based methods struggle to handle the rich diversity of garments, body shapes, and poses. We propose a framework for text-guided full-body human image synthesis via an attention-based latent code mapper, which enables more disentangled control of StyleGAN than existing mappers. Our latent code mapper adopts an attention mechanism that adaptively manipulates individual latent codes on different StyleGAN layers under text guidance. In addition, we introduce feature-space masking at inference time to avoid unwanted changes caused by text inputs. Our quantitative and qualitative evaluations reveal that our method controls generated images more faithfully to given texts than existing methods.
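The two key ideas in the abstract (a per-layer attention mapper over StyleGAN's W+ latent codes, and inference-time feature-space masking) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; all names, dimensions, and the single-head attention form are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_latent_mapper(w_plus, text_tokens, Wq, Wk, Wv, alpha=0.1):
    """Hypothetical cross-attention from layer-wise latent codes to text.

    w_plus:      (L, d) -- one latent code per StyleGAN layer (W+ space)
    text_tokens: (T, d) -- text features, e.g. CLIP token embeddings
    Each layer forms its own query, so the text guidance can move
    different layers by different amounts (coarse layers tend to
    control shape, fine layers texture/color).
    """
    q = w_plus @ Wq                                           # (L, d) queries
    k = text_tokens @ Wk                                      # (T, d) keys
    v = text_tokens @ Wv                                      # (T, d) values
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (L, T) weights
    delta = attn @ v                                          # per-layer edit direction
    return w_plus + alpha * delta                             # edited W+ codes

def feature_space_mask(feat_edit, feat_orig, mask):
    """Inference-time masking: keep the edit only inside the garment
    region, restoring original features elsewhere (e.g. face, background)."""
    return mask * feat_edit + (1.0 - mask) * feat_orig

# Toy shapes: 18 StyleGAN layers, 8 text tokens, 512-dim latents.
L, T, d = 18, 8, 512
w = rng.standard_normal((L, d))
text = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
w_edit = attention_latent_mapper(w, text, Wq, Wk, Wv)
print(w_edit.shape)  # (18, 512)
```

The layer-wise queries are the point of the design: a single shared edit direction would entangle garment shape and texture, whereas attending per layer lets the mapper modulate each resolution band of StyleGAN independently.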

Authors (3)
  1. Takato Yoshikawa
  2. Yuki Endo
  3. Yoshihiro Kanamori

