On the Origins of Linear Representations in Large Language Models (2403.03867v1)
Abstract: Recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of LLMs. In this work, we study the origins of such linear representations. To that end, we introduce a simple latent variable model that abstracts and formalizes the concept dynamics of next-token prediction. We use this formalism to show that the next-token prediction objective (softmax with cross-entropy) and the implicit bias of gradient descent together promote linear representations of concepts. Experiments show that linear representations emerge when learning from data matching the latent variable model, confirming that this simple structure already suffices to yield them. We additionally confirm some predictions of the theory using the LLaMA-2 LLM, giving evidence that the simplified model yields generalizable insights.
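To make the abstract's setup concrete, here is a minimal sketch of the kind of experiment it describes: sample binary latent concepts, generate context and next tokens from them, train a small model with softmax cross-entropy by gradient descent, and then test whether each concept is linearly readable from the learned context representations. The Bernoulli data-generating process, the `BagModel` architecture, and the difference-of-means probe below are illustrative assumptions for this sketch, not the paper's exact latent variable model or theoretical construction.

```python
# Minimal, illustrative sketch (assumed toy setup, not the paper's exact model).
import torch
import torch.nn as nn

torch.manual_seed(0)
m = 4                       # number of binary latent concepts (illustrative)
ctx_vocab = 2 * m           # one context token per (concept, value) pair
out_vocab = 2 * m           # output tokens indexed the same way
n, d = 5000, 32             # number of samples, embedding dimension

# Toy data: latents z ~ Bernoulli(0.5)^m; the context reveals every concept's
# value as a token; the "next token" reveals one randomly chosen concept.
z = torch.randint(0, 2, (n, m))                    # (n, m) binary latents
ctx = 2 * torch.arange(m) + z                      # (n, m) context token ids
which = torch.randint(0, m, (n,))                  # concept revealed by target
y = 2 * which + z[torch.arange(n), which]          # next-token ids

class BagModel(nn.Module):
    """Sum of context-token embeddings, then a linear unembedding to logits."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(ctx_vocab, d)
        self.unemb = nn.Linear(d, out_vocab, bias=False)
    def represent(self, tokens):
        return self.emb(tokens).sum(dim=1)          # context representation
    def forward(self, tokens):
        return self.unemb(self.represent(tokens))   # next-token logits

model = BagModel()
opt = torch.optim.SGD(model.parameters(), lr=0.2)   # plain gradient descent
loss_fn = nn.CrossEntropyLoss()                      # softmax cross-entropy
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(ctx), y)
    loss.backward()
    opt.step()

# Linearity check: for each concept, the difference of the mean representations
# over z_j = 1 vs. z_j = 0 contexts defines a single direction; projecting onto
# it should separate the two groups (a difference-of-means linear probe).
with torch.no_grad():
    reps = model.represent(ctx)
    for j in range(m):
        on = z[:, j] == 1
        direction = reps[on].mean(0) - reps[~on].mean(0)
        scores = reps @ direction
        acc = ((scores > scores.mean()) == on).float().mean().item()
        print(f"concept {j}: difference-of-means probe accuracy = {acc:.3f}")
```

If each concept's probe accuracy is near 1 after training, each concept corresponds to a single direction in representation space, which is the kind of linear structure the paper's latent variable analysis aims to explain.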