
On the Origins of Linear Representations in Large Language Models (2403.03867v1)

Published 6 Mar 2024 in cs.CL, cs.LG, and stat.ML

Abstract: Recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of LLMs. In this work, we study the origins of such linear representations. To that end, we introduce a simple latent variable model to abstract and formalize the concept dynamics of the next token prediction. We use this formalism to show that the next token prediction objective (softmax with cross-entropy) and the implicit bias of gradient descent together promote the linear representation of concepts. Experiments show that linear representations emerge when learning from data matching the latent variable model, confirming that this simple structure already suffices to yield linear representations. We additionally confirm some predictions of the theory using the LLaMA-2 LLM, giving evidence that the simplified model yields generalizable insights.


Summary

  • The paper shows that log-odds matching and the implicit bias of gradient descent together drive the formation of linear concept representations.
  • It employs a latent variable model with binary variables to analyze token prediction and the dynamics of semantic concepts.
  • Empirical experiments on simulated data and LLaMA-2 validate the emergence of both linear and orthogonal structures in the representation space.

On the Origins of Linear Representations in LLMs

In the landscape of interpretability research for LLMs, the encoding of high-level semantic concepts within model representations presents a fascinating area of study. A recurring observation in this domain is the linear nature of these representations. This post explores a paper that provides a theoretical framework explaining how such linear representations emerge in LLMs.

Latent Variable Model for LLMs

The paper introduces a latent variable model designed to abstract and analyze the concept dynamics inherent in next token prediction tasks—central to the functioning of LLMs. This model posits a latent space, represented as a set of binary variables, each embodying a distinct 'concept.' These latent concepts, ranging from grammatical structures to thematic elements, serve as the underlying drivers for the generation of tokens (words or characters) and context sentences.

Crucially, the model captures the relationship between context sentences, latent concepts, and next tokens through a formal structure. It assumes that each context sentence conveys partial information about the latent concepts, which, in turn, probabilistically determine the next token. The learning objective for LLMs, thus, focuses on accurately estimating these conditional probabilities.
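The setup can be sketched in a few lines of code. This is a toy instance of the general idea, not the paper's exact construction: the vocabulary, concept count, and weights below are illustrative assumptions. Binary latent variables encode concepts, and next-token probabilities come from a softmax whose logits depend linearly on the active concepts.

```python
import math
import random

random.seed(0)

# Toy latent variable model (illustrative assumptions, not the paper's
# exact construction): each latent concept is a binary variable, and
# next-token logits depend linearly on which concepts are active.
NUM_CONCEPTS = 3
VOCAB = ["walk", "walks", "walked", "run", "runs", "ran"]

# Assumed per-token weight vector, one coordinate per concept.
weights = {tok: [random.gauss(0, 1) for _ in range(NUM_CONCEPTS)]
           for tok in VOCAB}

def next_token_probs(concepts):
    """Softmax over the vocabulary given a binary concept vector."""
    logits = {tok: sum(w * c for w, c in zip(weights[tok], concepts))
              for tok in VOCAB}
    m = max(logits.values())                       # for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# A context that reveals concepts 0 and 2 (but not concept 1):
probs = next_token_probs([1, 0, 1])
```

Training a model with softmax cross-entropy on data sampled this way amounts to estimating exactly these conditional probabilities.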

Insights into Linear Representations

The paper rigorously shows that under this model, concepts are indeed linearly represented in the learned representation space. This phenomenon is discussed from two key perspectives:

  1. Log-Odds Matching: Echoing earlier results on word embeddings, the paper demonstrates that a condition known as 'log-odds matching' leads to linear structure. This condition requires the learned conditional probabilities to match the true probabilities, which in turn forces concept representations into linear form.
  2. Implicit Bias of Gradient Descent: More significantly, the paper highlights the role of gradient descent's implicit bias in fostering linear representations. It shows that optimizing the relevant sub-tasks of the LLM objective with gradient descent naturally drives the model toward linearly encoded concepts in the representation space.
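A minimal numerical illustration of the first point (a sanity check, not the paper's proof): if flipping a binary concept corresponds to adding a fixed direction `delta` in representation space, then under a softmax readout the change in log-odds between any two tokens is the same for every context, which is precisely the context-independence that log-odds matching demands. All vectors below are hypothetical stand-ins.

```python
import random

random.seed(1)

D = 8  # hypothetical representation dimension
dot = lambda u, v: sum(x * y for x, y in zip(u, v))

# Assumed unembedding vectors for two tokens, and a concept direction.
g_a = [random.gauss(0, 1) for _ in range(D)]
g_b = [random.gauss(0, 1) for _ in range(D)]
delta = [random.gauss(0, 1) for _ in range(D)]

def log_odds(rep):
    # Under softmax, log p(a|rep) - log p(b|rep) is a difference of logits.
    return dot(g_a, rep) - dot(g_b, rep)

# Flipping the concept (adding delta) shifts the log-odds by the same
# amount regardless of the context representation:
shifts = []
for _ in range(10):
    rep = [random.gauss(0, 1) for _ in range(D)]
    flipped = [r + d for r, d in zip(rep, delta)]
    shifts.append(log_odds(flipped) - log_odds(rep))

spread = max(shifts) - min(shifts)  # zero up to floating-point error
```

The shift always equals the inner product of `g_a - g_b` with `delta`, independent of the context; a nonlinear concept encoding would not have this property.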

The practical implications of these results are notable. They suggest that the observed linear structure of concept representations in LLMs is not an artifact of model architecture, but a consequence of the learning objective and the optimization dynamics.
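The implicit-bias phenomenon itself can be seen in miniature with plain logistic regression, the classic setting from the implicit-bias literature the paper builds on. On linearly separable data, gradient descent on the logistic loss converges in direction to the max-margin separator, a structured solution the loss alone does not single out. The data and hyperparameters below are illustrative choices.

```python
import math
import random

random.seed(2)

# Separable 2D data: the label is the sign of the first coordinate, so the
# max-margin separator direction is (1, 0); the second coordinate is noise.
data = [((random.choice([1.0, 2.0]), random.uniform(-1, 1)), 1)
        for _ in range(20)]
data += [((random.choice([-1.0, -2.0]), random.uniform(-1, 1)), -1)
         for _ in range(20)]

w = [0.0, 0.0]
lr = 0.1
for _ in range(5000):
    grad = [0.0, 0.0]
    for (x1, x2), y in data:
        margin = y * (w[0] * x1 + w[1] * x2)
        s = 1.0 / (1.0 + math.exp(margin))  # d/d(margin) of log(1+e^-margin)
        grad[0] -= s * y * x1
        grad[1] -= s * y * x2
    w = [w[0] - lr * grad[0] / len(data),
         w[1] - lr * grad[1] / len(data)]

norm = math.sqrt(w[0] ** 2 + w[1] ** 2)
direction = (w[0] / norm, w[1] / norm)  # aligns with (1, 0)
```

The weight norm grows without bound, but the *direction* concentrates on the max-margin separator; the paper's argument applies an analogous bias to sub-tasks of the next-token objective.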

Orthogonal Representations of Concepts

An interesting extension of the discussion on linear representations is the exploration of concept orthogonality. The paper shows that unrelated concepts—those without direct probabilistic dependence—tend to be represented orthogonally in the unembedding space. This finding aligns with empirical observations that Euclidean geometry captures semantic structure in LLMs, even though the training objective does not explicitly single out any particular inner product.
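A heuristic numerical intuition for why near-orthogonality is plausible (consistent with, but much weaker than, the paper's result): if the directions for probabilistically unrelated concepts behave like independent random vectors, their cosine similarity concentrates near zero as the unembedding dimension grows. The Gaussian vectors below are stand-ins for learned concept directions.

```python
import math
import random

random.seed(3)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def random_direction(d):
    # Stand-in for a learned concept direction in a d-dim unembedding space.
    return [random.gauss(0, 1) for _ in range(d)]

# |cosine| of independent Gaussian vectors scales like 1/sqrt(d):
low = abs(cosine(random_direction(4), random_direction(4)))
high = abs(cosine(random_direction(4096), random_direction(4096)))
```

In a 4096-dimensional space (roughly LLaMA-2-scale), independent directions are nearly orthogonal by default; the paper's contribution is showing that training actively produces this geometry for unrelated concepts rather than it merely occurring by chance.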

Empirical Validation

The theoretical insights are further substantiated through experiments conducted on simulated data, confirming the emergence of linear and orthogonal representations in accordance with the predictions of the latent variable model. Additionally, analyses performed on the LLaMA-2 model reveal alignment between embedding and unembedding representations for matching concepts, lending further credence to the paper's theoretical contributions.

Concluding Remarks

This paper makes significant strides in demystifying the phenomenon of linearly encoded representations in LLMs. By leveraging a simple yet effective latent variable model, it provides a compelling theoretical basis for understanding how high-level semantic concepts are represented within these models. Moreover, the findings underscore the intricate interplay between model learning objectives, optimization dynamics, and the resultant geometrical structure of representations.

The implications of this research are far-reaching, opening avenues for further inquiries into the interpretability of LLMs and the optimization strategies that shape their learning process. It invites us to reevaluate our understanding of how abstract concepts are encoded and manipulated within the confines of large-scale machine learning models.