Concept Algebra for (Score-Based) Text-Controlled Generative Models (2302.03693v6)
Abstract: This paper concerns the structure of learned representations in text-guided generative models, focusing on score-based models. A key property of such models is that they can compose disparate concepts in a 'disentangled' manner. This suggests these models have internal representations that encode concepts in a 'disentangled' manner. Here, we focus on the idea that concepts are encoded as subspaces of some representation space. We formalize what this means, show there is a natural choice for the representation, and develop a simple method for identifying the part of the representation corresponding to a given concept. In particular, this allows us to manipulate the concepts expressed by the model through algebraic manipulation of the representation. We demonstrate the idea with examples using Stable Diffusion. Code is available at https://github.com/zihao12/concept-algebra-code
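For intuition only, here is a minimal numerical sketch of the "concept as subspace" idea described in the abstract. It is not the paper's implementation: the representation vectors below are random stand-ins for whatever the model actually exposes (in the paper, the diffusion model's score), and the helper names `concept_subspace` and `edit_concept` are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's code): estimate the subspace a
# concept spans from representations of prompts that differ only in that
# concept, then edit a representation algebraically within that subspace.
import numpy as np

def concept_subspace(reps_varying_concept, rank):
    """Estimate a basis for the concept subspace from representations of
    prompts that vary only the target concept (e.g. 'a photo of a king'
    vs. 'a photo of a queen')."""
    diffs = reps_varying_concept - reps_varying_concept.mean(axis=0, keepdims=True)
    # Top right-singular vectors of the centered representations span the subspace.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:rank]                              # shape (rank, dim)

def edit_concept(rep, target_rep, basis):
    """Replace the component of `rep` lying in the concept subspace with the
    corresponding component of `target_rep`, leaving the rest untouched."""
    proj = basis.T @ basis                        # projector onto the subspace
    return rep - proj @ rep + proj @ target_rep

# Toy usage with random vectors standing in for model representations.
rng = np.random.default_rng(0)
dim = 64
reps = rng.normal(size=(8, dim))                  # prompts varying one concept
basis = concept_subspace(reps, rank=2)
edited = edit_concept(rng.normal(size=dim), reps[0], basis)
```

The design choice mirrors the abstract's claim: because the concept is encoded in a subspace, changing only the in-subspace component edits that concept while leaving other attributes of the representation unchanged.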