- The paper presents a theoretical framework showing that LLMs trained on next-token prediction learn approximately linear representations of human-interpretable concepts.
- Empirical evidence on Pythia, Llama, and DeepSeek models supports the theoretical findings regarding linear concept representations.
- Findings reinforce the linear representation hypothesis, suggesting potential for enhancing LLM interpretability, alignment, and causal reasoning.
Overview of "I Predict Therefore I Am" Paper
The paper "I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?" asks whether large language models (LLMs) can learn and represent human-interpretable concepts through next-token prediction alone. The authors pursue a combined theoretical and empirical investigation of whether so simple a training objective suffices to confer sophisticated intelligence akin to human reasoning.
Core Contributions and Findings
The paper introduces a generative model in which tokens are generated from latent discrete variables representing human-interpretable concepts. The pivotal theoretical result is an identifiability guarantee under this model: the representations learned by LLMs approximate a logarithmic transformation of the posterior probabilities of the latent concepts, up to an invertible linear transformation. This substantiates the linear representation hypothesis, which posits that LLMs encode human-interpretable concepts along linear directions in representation space.
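In symbols, the claim can be summarized roughly as follows (the notation below is ours, chosen for illustration, and not necessarily the paper's):

```latex
% Informal restatement of the identifiability result (our notation, an
% illustrative paraphrase rather than a verbatim statement from the paper).
% f(x): the LLM's learned representation of context x
% c:    the latent discrete concept variable; log is applied elementwise
%       to the posterior vector over concept values
% A, b: an invertible matrix and an offset, otherwise unconstrained
\[
  f(x) \;\approx\; A \, \log p(c \mid x) \;+\; b,
  \qquad A \text{ invertible}.
\]
% In words: the representation is an affine transform of the log-posterior
% over human-interpretable concepts, so concept information is linearly decodable.
```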
The authors back the theory with strong numerical evidence on both simulated and real data, using model families such as Pythia, Llama, and DeepSeek. Across these settings, the experiments align with the theoretical predictions, demonstrating that LLMs learn approximately linear representations of latent concepts.
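A minimal linear-probing sketch in this spirit is shown below. It assumes a small Pythia checkpoint and an illustrative binary concept (sentiment labels we supply ourselves); this is the generic recipe of fitting a linear map from frozen hidden states to concept labels, not the paper's exact experimental protocol.

```python
# Sketch: linear probe on Pythia hidden states (illustrative setup,
# not the paper's exact experiments).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "EleutherAI/pythia-160m"  # small checkpoint chosen for the example
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Toy "concept" dataset: positive vs. negative sentiment (labels are ours).
texts = ["The movie was wonderful.", "I loved every minute.",
         "The movie was terrible.", "I hated every minute."]
labels = [1, 1, 0, 0]

feats = []
with torch.no_grad():
    for t in texts:
        inputs = tok(t, return_tensors="pt")
        out = model(**inputs)
        # Use the last layer's representation of the final token as the feature.
        feats.append(out.hidden_states[-1][0, -1].numpy())

# If the concept is (approximately) linearly represented, a linear classifier
# on frozen hidden states should separate the two classes.
probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print("train accuracy:", probe.score(feats, labels))
```

In practice one would use many more examples and held-out data; the point here is only the shape of the experiment: frozen representations, a linear readout, and a concept label.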
Theoretical and Practical Implications
The paper's theoretical implications are significant, particularly in reinforcing the linear representation hypothesis for LLM architectures. The result suggests new avenues for studying concept directionality, manipulability, and linear probing in LLM systems. Practically, the findings point toward a deeper understanding of LLM mechanisms and could guide the refinement of LLM architectures to improve interpretability and alignment with human cognition and reasoning.
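As one hedged illustration of what directionality and manipulability mean in practice, a common recipe (ours, not taken from the paper) estimates a concept direction as the difference of mean representations between contrasting examples; the synthetic sketch below shows the idea.

```python
# Sketch: concept "directions" in representation space (synthetic data, our own
# illustration; the paper's experiments use real LLM hidden states).
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # representation dimension (arbitrary)
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

# Representations of two concept classes: same noise, shifted along true_dir.
neg = rng.normal(size=(200, d))
pos = rng.normal(size=(200, d)) + 3.0 * true_dir

# Difference-of-means estimate of the concept direction.
est_dir = pos.mean(axis=0) - neg.mean(axis=0)
est_dir /= np.linalg.norm(est_dir)
print("cosine(true, estimated):", float(true_dir @ est_dir))

# "Manipulability": moving a representation along the estimated direction
# changes its projection onto the concept axis, which a linear probe would read.
x = neg[0]
x_steered = x + 3.0 * est_dir
print("projection before/after:", float(x @ est_dir), float(x_steered @ est_dir))
```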
Speculative Future Directions in AI Research
Because the paper shows that next-token prediction can capture complex, meaningful representations, future research may build on these insights to strengthen causal reasoning in AI models. In particular, the prospect of recovering latent concepts through linear unmixing techniques in LLMs is an intriguing route toward systems that understand their data more deeply and make more nuanced predictions and decisions.
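As a hedged toy illustration of linear unmixing in general (not a method proposed in the paper), the sketch below recovers independent latent sources from linearly mixed observations with FastICA; the mixing matrix plays the role of the unknown invertible linear transformation relating latent concepts to learned representations.

```python
# Toy linear-unmixing sketch with ICA (generic technique for illustration only).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n, k = 2000, 3

# Independent, non-Gaussian latent "concept" signals.
S = rng.laplace(size=(n, k))
# Unknown invertible mixing matrix applied to the latents.
A = rng.normal(size=(k, k))
X = S @ A.T  # observed, linearly mixed signals

# FastICA recovers the sources up to permutation and scaling.
ica = FastICA(n_components=k, random_state=0)
S_hat = ica.fit_transform(X)

# Each estimated source should correlate strongly with exactly one true source.
corr = np.corrcoef(S.T, S_hat.T)[:k, k:]
print(np.round(np.abs(corr), 2))
```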
Additionally, the paper challenges the invertibility assumption common in causal representation learning, encouraging future work to relax this constraint and explore approximate identifiability under non-invertible mappings, thereby broadening applicability to real-world settings with complex data interactions.
Concluding Remarks
The paper "I Predict Therefore I Am" makes notable contributions to understanding the intersection of next-token prediction and the learning of human-interpretable concepts within LLMs. It not only theoretically validates core hypotheses about LLM linearity but also empirically supports these notions, setting the stage for future advancements in AI that bridge LLM capabilities with elements of human-like intelligence and reasoning.