
NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models (2312.05390v1)

Published 8 Dec 2023 in cs.CV

Abstract: Generative models have been very popular in recent years for their image generation capabilities. GAN-based models are highly regarded for their disentangled latent space, a key feature behind their success in controlled image editing. Diffusion models, on the other hand, have emerged as powerful tools for generating high-quality images, yet their latent space is not as thoroughly explored or understood. Existing methods that aim to explore the latent space of diffusion models usually rely on text prompts to pinpoint specific semantics. However, this approach may be restrictive in areas such as art, fashion, or specialized fields like medicine, where suitable text prompts might not be available or easy to conceive, thus limiting the scope of existing work. In this paper, we propose an unsupervised method to discover latent semantics in text-to-image diffusion models without relying on text prompts. Our method takes a small set of unlabeled images from a specific domain, such as faces or cats, and a pre-trained diffusion model, and discovers diverse semantics in an unsupervised fashion using a contrastive learning objective. Moreover, the learned directions can be applied simultaneously, either within the same domain (such as various types of facial edits) or across different domains (such as applying cat and face edits within the same image) without interfering with each other. Our extensive experiments show that our method achieves highly disentangled edits, outperforming existing diffusion-based and GAN-based latent space editing methods.
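The contrastive objective described in the abstract pairs edits by which learned direction produced them: edits made by the same direction on different images attract, while edits made by different directions repel. A minimal InfoNCE-style sketch of such a loss is shown below. This is a hypothetical illustration, not the authors' implementation: `deltas` stands in for the feature change a direction induces in the denoising network, and `direction_ids` records which learned direction produced each edit.

```python
import numpy as np

def info_nce_direction_loss(deltas, direction_ids, temperature=0.5):
    """InfoNCE-style contrastive loss over direction-induced edits (sketch).

    deltas: (N, D) feature differences between an edited and an unedited
        denoising step, one row per (image, direction) pair.
    direction_ids: (N,) index of the learned direction behind each row.
    Rows sharing a direction id are positives for each other; all other
    rows are negatives. Every row must have at least one positive.
    """
    # Cosine similarities, scaled by a temperature.
    z = deltas / np.linalg.norm(deltas, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-pairs

    # Numerically stable softmax denominator per row.
    exp = np.exp(sim - sim.max(axis=1, keepdims=True))

    same = direction_ids[:, None] == direction_ids[None, :]
    np.fill_diagonal(same, False)

    pos = (exp * same).sum(axis=1)   # mass on same-direction edits
    total = exp.sum(axis=1)          # mass on all other edits
    return float(-np.log(pos / total).mean())
```

With two directions and deltas that cluster by direction, the loss is small; mislabeling which direction produced each edit raises it, which is the pressure that drives the learned directions apart.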

Authors (2)
  1. Yusuf Dalva (12 papers)
  2. Pinar Yanardag (34 papers)
Citations (10)