This paper, "From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning" (Shani et al., 21 May 2025), investigates whether LLMs develop internal conceptual representations that are analogous to human conceptual structures, particularly focusing on the trade-off between informational compression and semantic fidelity. The authors introduce a novel information-theoretic framework to quantitatively compare these strategies, leveraging seminal human categorization benchmarks.
The core research questions addressed are:
- [RQ1]: To what extent do concepts emergent in LLMs align with human-defined conceptual categories?
- [RQ2]: Do LLMs and humans exhibit similar internal geometric structures within these concepts, especially concerning item typicality?
- [RQ3]: How do humans and LLMs differ in their strategies for balancing representational compression with the preservation of semantic fidelity when forming concepts?
To answer these questions, the paper utilizes data from three classic cognitive psychology experiments: Rosch (1973), Rosch (1975), and McCloskey & Glucksberg (1978). These datasets provide human judgments on category membership and item typicality for a total of 1,049 items across 34 categories, which the authors digitized and aggregated. A diverse suite of LLMs is analyzed, including encoder-only models (BERT family) and various decoder-only models (Llama, Gemma, Qwen, Phi, Mistral families) ranging from 300 million to 72 billion parameters. The analysis focuses on static token-level embeddings from the input embedding layer of these models.
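Because the analysis operates on static input-layer embeddings rather than contextual activations, extracting the relevant vectors is mechanically simple. Below is a minimal sketch assuming the Hugging Face `transformers` library; the checkpoint name, the sub-token averaging strategy, and the example items are illustrative placeholders rather than the paper's exact pipeline.

```python
# Minimal sketch: pulling static, input-layer token embeddings for a word list.
# The model name and the multi-token averaging strategy are assumptions for
# illustration, not the authors' exact recipe.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-large-uncased"  # any encoder- or decoder-only checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
embedding_matrix = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)

def static_embedding(word: str) -> torch.Tensor:
    """Look up input-embedding rows for a word; average if it splits into sub-tokens."""
    token_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    return embedding_matrix[token_ids].mean(dim=0)

items = ["robin", "penguin", "chair", "apple"]  # placeholder items, not the benchmark lists
item_vectors = torch.stack([static_embedding(w) for w in items])
print(item_vectors.shape)  # (num_items, hidden_dim)
```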
The central methodological contribution is an information-theoretic framework inspired by Rate-Distortion Theory (RDT) and the Information Bottleneck (IB) principle. This framework evaluates conceptual clusters $C$ formed over items $X$ (token embeddings) using an objective function

$$\mathcal{L}(X, C; \beta) = \mathrm{Complexity}(X, C) + \beta \cdot \mathrm{Distortion}(X, C),$$
where:
- Complexity(X, C) is the mutual information $I(X; C)$ between items and their cluster assignments, measuring the informational cost of representing items $X$ through clusters $C$. Lower $I(X; C)$ implies greater compression.
- Distortion(X, C) is the average intra-cluster variance of item embeddings, quantifying the loss of semantic fidelity: $\mathrm{Distortion}(X, C) = \frac{1}{|X|} \sum_{c \in C} \sum_{x \in c} \lVert x - \bar{x}_c \rVert^2$, where $\bar{x}_c$ is the centroid (mean embedding) of cluster $c$. Lower distortion means items lie closer to their cluster centroids. A lower $\mathcal{L}$ score indicates a more statistically "efficient" representation under this framework (see the code sketch below).
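As a concrete illustration of this objective, here is a minimal sketch assuming hard cluster assignments and a uniform prior over items, in which case $I(X; C)$ reduces to the entropy of the cluster-size distribution. The function names and default $\beta$ are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the RDT/IB-style objective described above:
#   L(X, C; beta) = I(X; C) + beta * Distortion(X, C).
# Assumes hard assignments and uniform p(x), so I(X; C) = H(C).
import numpy as np

def complexity(labels: np.ndarray) -> float:
    """I(X;C) for deterministic assignments: entropy (bits) of the cluster-size distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def distortion(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Average squared distance of each item embedding to its cluster centroid."""
    total = 0.0
    for c in np.unique(labels):
        members = embeddings[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total / len(embeddings)

def objective(embeddings: np.ndarray, labels: np.ndarray, beta: float = 1.0) -> float:
    """L(X, C; beta) = Complexity + beta * Distortion (beta value here is a placeholder)."""
    return complexity(labels) + beta * distortion(embeddings, labels)
```

Human categories and LLM-derived clusters can then be compared directly by evaluating `objective(embeddings, labels)` under both labelings of the same items.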
The empirical investigation yields several key findings:
- [RQ1] Broad Conceptual Alignment: LLM-derived clusters (obtained via k-means on token embeddings, with $k$ set to the number of human categories) show significant alignment with human-defined conceptual categories, as measured by Adjusted Mutual Information (AMI), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI). Notably, some encoder models (such as BERT-large-uncased) exhibit strong alignment, sometimes outperforming much larger decoder-only models, suggesting that factors beyond scale influence human-like categorical abstraction (see the code sketch after this list).
- [RQ2] Limited Fidelity to Fine-Grained Semantics: LLMs demonstrate only modest alignment with human-perceived fine-grained semantic distinctions, such as item typicality. This was assessed by correlating (Spearman's $\rho$) human typicality ratings with the cosine similarity between an item's token embedding and the embedding of its human-assigned category name (e.g., 'robin' to 'bird'). The correlations were generally weak, indicating that LLMs do not consistently represent human-perceived typical items as significantly more similar to their category label's embedding.
- [RQ3] Divergent Efficiency Strategies in the Compression-Meaning Trade-off: LLMs exhibit markedly superior information-theoretic efficiency compared to human conceptual structures when evaluated by the $\mathcal{L}(X, C; \beta)$ objective and by mean cluster entropy. LLM-derived clusters consistently achieve lower (more "optimal" by this statistical measure) $\mathcal{L}$ values and lower entropy than human conceptual categories. This suggests LLMs are highly optimized for statistical compactness.
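The RQ1 and RQ2 analyses referenced above can be sketched as follows, assuming the item embeddings from the earlier snippet together with placeholder arrays for human category labels, per-item category-name embeddings, and typicality ratings; scikit-learn and SciPy are assumed, and none of this is the authors' released code.

```python
# Minimal sketch of the RQ1 (cluster alignment) and RQ2 (typicality correlation)
# analyses. All inputs are placeholder arrays standing in for the benchmark data.
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import KMeans
from sklearn.metrics import (adjusted_mutual_info_score,
                             adjusted_rand_score,
                             normalized_mutual_info_score)

def cluster_alignment(embeddings: np.ndarray, human_labels: np.ndarray, seed: int = 0) -> dict:
    """RQ1: k-means with k = number of human categories, scored against human labels."""
    k = len(np.unique(human_labels))
    pred = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(embeddings)
    return {
        "AMI": adjusted_mutual_info_score(human_labels, pred),
        "NMI": normalized_mutual_info_score(human_labels, pred),
        "ARI": adjusted_rand_score(human_labels, pred),
    }

def typicality_correlation(item_vecs: np.ndarray, category_vecs: np.ndarray,
                           typicality: np.ndarray) -> float:
    """RQ2: Spearman rho between human typicality ratings and item-to-category-name
    cosine similarity (category_vecs[i] is the embedding of item i's category name)."""
    norms = np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(category_vecs, axis=1)
    cosine = (item_vecs * category_vecs).sum(axis=1) / norms
    rho, _ = spearmanr(typicality, cosine)
    return rho
```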
The discussion and conclusion highlight a fundamental divergence: LLMs appear optimized for aggressive statistical compression, likely due to their training on vast text corpora, achieving information-theoretically efficient representations. This focus, however, limits their ability to capture the richer, prototype-based semantic nuances crucial for deep human understanding. In contrast, human conceptual systems seem to prioritize adaptive richness, contextual flexibility, and functional utility. The apparent statistical "suboptimality" of human concepts (higher entropy and higher $\mathcal{L}$ scores) likely reflects optimization for a broader set of cognitive demands, such as robust generalization, inferential power, and effective communication.
The authors suggest that merely scaling current LLM approaches may be insufficient for achieving human-like understanding. Future AI development could benefit from incorporating principles that foster richer conceptual structures, potentially using frameworks like the $\mathcal{L}$ objective for guidance and evaluation. For cognitive science, LLMs serve as valuable computational models for testing theories of human concept formation, highlighting the unique optimization pressures shaping human cognition. The paper concludes that moving "from tokens to thoughts" will require AI to embrace principles that cultivate richer, contextually aware conceptual structures, recognizing that what appears as statistical "inefficiency" might be a hallmark of robust intelligence.