
GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models (2403.19645v1)

Published 28 Mar 2024 in cs.CV

Abstract: The rapid advancement of image generation models has predominantly been driven by diffusion models, which have demonstrated unparalleled success in generating high-fidelity, diverse images from textual prompts. Despite their success, diffusion models encounter substantial challenges in the domain of image editing, particularly in executing disentangled edits: changes that target a specific attribute of an image while leaving irrelevant parts untouched. In contrast, Generative Adversarial Networks (GANs) have been recognized for their success at disentangled editing through their interpretable latent spaces. We introduce GANTASTIC, a novel framework that takes existing directions from pre-trained GAN models, each representative of a specific, controllable attribute, and transfers these directions into diffusion-based models. This approach not only maintains the generative quality and diversity that diffusion models are known for, but also significantly enhances their capability to perform precise, targeted image edits, thereby leveraging the best of both worlds.
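The abstract does not spell out the transfer mechanism, but the "interpretable directions" it refers to are latent-space vectors: editing means shifting a latent code along a direction vector scaled by an edit strength. A minimal sketch of that operation (NumPy only; the 512-dimensional latent size, the direction vector, and the `alpha` strength are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def apply_direction(latent, direction, alpha):
    """Shift a latent code along a unit-normalized direction.

    latent:    1-D latent code (e.g. a GAN z- or w-space vector).
    direction: 1-D vector encoding one attribute (e.g. "smile").
    alpha:     signed edit strength; larger |alpha| = stronger edit.
    """
    d = direction / np.linalg.norm(direction)  # normalize so alpha has a consistent scale
    return latent + alpha * d

# Toy usage with random stand-ins for a real latent code and a discovered direction.
rng = np.random.default_rng(0)
z = rng.standard_normal(512)          # hypothetical latent code
smile_dir = rng.standard_normal(512)  # hypothetical attribute direction
z_edited = apply_direction(z, smile_dir, alpha=3.0)
```

Because the direction is unit-normalized, the edited code sits exactly `alpha` away from the original in latent space; GANTASTIC's contribution, per the abstract, is making such GAN-derived directions usable inside a diffusion model rather than discovering new ones.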

Authors (3)
  1. Yusuf Dalva (12 papers)
  2. Hidir Yesiltepe (9 papers)
  3. Pinar Yanardag (34 papers)
Citations (1)
