StyleHumanCLIP: Text-guided Garment Manipulation for StyleGAN-Human (2305.16759v4)
Abstract: This paper tackles text-guided control of StyleGAN for editing garments in full-body human images. Existing StyleGAN-based methods struggle to handle the rich diversity of garments, body shapes, and poses. We propose a framework for text-guided full-body human image synthesis via an attention-based latent code mapper, which enables more disentangled control of StyleGAN than existing mappers. Our latent code mapper adopts an attention mechanism that adaptively manipulates the individual latent codes of different StyleGAN layers under text guidance. In addition, we introduce feature-space masking at inference time to suppress unwanted changes caused by text inputs. Our quantitative and qualitative evaluations show that our method controls generated images more faithfully to given texts than existing methods.
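The two ideas in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: it assumes the W+ space of StyleGAN (one 512-dim code per layer, e.g. 18 layers) and a set of CLIP text token embeddings; each layer's latent code attends to the text tokens and receives its own additive offset, and a spatial mask blends edited and original feature maps at inference time. The function names, shapes, and the `scale` parameter are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_mapper(w_plus, text_tokens, scale=0.1):
    """Hypothetical per-layer latent edit: each StyleGAN layer's code
    attends to text token embeddings and gets a layer-specific offset.
    Shapes: w_plus (L, D), e.g. (18, 512); text_tokens (T, D)."""
    d = w_plus.shape[1]
    # (L, T): how strongly each layer responds to each text token
    attn = softmax(w_plus @ text_tokens.T / np.sqrt(d), axis=-1)
    delta = attn @ text_tokens  # (L, D) per-layer edit direction
    return w_plus + scale * delta

def masked_blend(feat_edit, feat_orig, mask):
    """Inference-time feature-space masking: keep edited features only
    inside the (garment) mask, original features elsewhere.
    feat_*: (C, H, W); mask: (H, W) with values in [0, 1]."""
    return mask[None] * feat_edit + (1.0 - mask[None]) * feat_orig
```

The per-layer attention weights are what allow disentangled control: layers irrelevant to the text attend weakly and are barely shifted, while the masked blend confines any residual change to the garment region.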
Authors: Takato Yoshikawa, Yuki Endo, Yoshihiro Kanamori