Concept Algebra for (Score-Based) Text-Controlled Generative Models (2302.03693v6)

Published 7 Feb 2023 in cs.CL, cs.LG, and stat.ML

Abstract: This paper concerns the structure of learned representations in text-guided generative models, focusing on score-based models. A key property of such models is that they can compose disparate concepts in a 'disentangled' manner. This suggests these models have internal representations that encode concepts in a 'disentangled' manner. Here, we focus on the idea that concepts are encoded as subspaces of some representation space. We formalize what this means, show there's a natural choice for the representation, and develop a simple method for identifying the part of the representation corresponding to a given concept. In particular, this allows us to manipulate the concepts expressed by the model through algebraic manipulation of the representation. We demonstrate the idea with examples using Stable Diffusion. Code at https://github.com/zihao12/concept-algebra-code
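
The "algebraic manipulation" described in the abstract can be made concrete with a small sketch: treat the model's score output as a vector-valued representation, estimate the subspace corresponding to a concept from score differences across prompts that vary only that concept, and edit by swapping the component of the score that lies in that subspace. The following NumPy sketch illustrates the projection idea only; the function names, the flattened-vector view of the score, and the SVD-based subspace estimate are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import numpy as np

def concept_projection(score_diffs, rank):
    """Estimate an orthonormal basis for a concept subspace.

    score_diffs: (k, d) array of flattened score differences
        s(x_t, t | prompt_i) - s(x_t, t | prompt_0), where the
        prompts vary only the concept of interest.
    rank: assumed dimension of the concept subspace.
    """
    # The top right-singular vectors of the stacked differences
    # (approximately) span the concept subspace.
    _, _, vt = np.linalg.svd(score_diffs, full_matrices=False)
    return vt[:rank]  # (rank, d), orthonormal rows

def edit_score(s_orig, s_target, basis):
    """Replace the concept component of s_orig with that of s_target.

    Only the component inside the concept subspace is moved; the
    orthogonal complement (everything else the score encodes) is
    left untouched.
    """
    diff = s_target - s_orig
    proj = basis.T @ (basis @ diff)  # project diff onto the subspace
    return s_orig + proj

if __name__ == "__main__":
    # Toy demo with random stand-ins for flattened score vectors.
    rng = np.random.default_rng(0)
    d = 16
    diffs = rng.normal(size=(5, d))       # hypothetical score differences
    basis = concept_projection(diffs, rank=2)
    s_a, s_b = rng.normal(size=d), rng.normal(size=d)
    s_edit = edit_score(s_a, s_b, basis)  # s_a with s_b's concept value
```

In a full sampling pipeline one would, at each denoising step, compute the score of the noisy latent under both an original and a target prompt, apply the edit, and continue sampling with the edited score; the subspace rank is a tuning choice.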
