
Towards Atoms of Large Language Models

Published 25 Sep 2025 in cs.CL and cs.AI | (2509.20784v1)

Abstract: The fundamental units of internal representations in LLMs remain undefined, limiting further understanding of their mechanisms. Neurons or features are often regarded as such units, yet neurons suffer from polysemy, while features face concerns of unreliable reconstruction and instability. To address this issue, we propose the Atoms Theory, which defines such units as atoms. We introduce the atomic inner product (AIP) to correct representation shifting, formally define atoms, and prove the conditions that atoms satisfy the Restricted Isometry Property (RIP), ensuring stable sparse representations over atom set and linking to compressed sensing. Under stronger conditions, we further establish the uniqueness and exact $\ell_1$ recoverability of the sparse representations, and provide guarantees that single-layer sparse autoencoders (SAEs) with threshold activations can reliably identify the atoms. To validate the Atoms Theory, we train threshold-activated SAEs on Gemma2-2B, Gemma2-9B, and Llama3.1-8B, achieving 99.9% sparse reconstruction across layers on average, and more than 99.8% of atoms satisfy the uniqueness condition, compared to 0.5% for neurons and 68.2% for features, showing that atoms more faithfully capture intrinsic representations of LLMs. Scaling experiments further reveal the link between SAEs size and recovery capacity. Overall, this work systematically introduces and validates Atoms Theory of LLMs, providing a theoretical framework for understanding internal representations and a foundation for mechanistic interpretability. Code available at https://github.com/ChenhuiHu/towards_atoms.

Summary

  • The paper introduces Atoms Theory, a framework that defines the fundamental units of LLM internal representations as atoms. More than 99.8% of atoms satisfy the uniqueness condition, compared to 0.5% for neurons and 68.2% for features.
  • It introduces the atomic inner product (AIP) to correct representation shifting, so that the distribution of angles between internal representations centers close to 90°.
  • Threshold-activated sparse autoencoders trained on Gemma2-2B, Gemma2-9B, and Llama3.1-8B achieve 99.9% sparse reconstruction across layers on average.

Towards Atoms of LLMs

Introduction to Atoms Theory

The paper "Towards Atoms of Large Language Models" proposes a theoretical framework, the Atoms Theory, which aims to define the fundamental units of internal representations in LLMs. Neurons or features are usually treated as such units, but neurons suffer from polysemy, while features face concerns of unreliable reconstruction and instability. Atoms Theory instead defines these units as 'atoms', which the authors argue capture the internal representations of LLMs more faithfully and support more reliable sparse reconstruction.

Atomic Inner Product and Representation Shifting

A critical advancement in the paper is the introduction of the Atomic Inner Product (AIP), designed to correct representation shifting observed when using the Euclidean inner product. Representation shifting refers to the deviation of angle distributions between representations from $90^\circ$ due to the inherent properties of the softmax function in LLMs.

Figure 1: Representation shifting caused by adopting the Euclidean inner product, where the centroid of the angle distribution between representations deviates substantially from $90^\circ$.

The AIP centers the angle distributions close to $90^\circ$, providing a more accurate depiction of the geometric relationships between internal representations.

Figure 2: Correcting representation shifting by identifying and adopting the atomic inner product, where the centroid of the angle distribution between representations approaches $90^\circ$.
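To make the idea of measuring angles under a different inner product concrete, here is a minimal Python sketch. It does not reproduce the paper's AIP, whose definition is not given in this summary. As a stand-in it assumes a generic inner product of the form $\langle x, y \rangle_M = x^\top M y$, with $M$ chosen (hypothetically) as the inverse of the uncentered second-moment matrix, and it uses synthetic vectors with a shared offset rather than real LLM representations.

```python
import numpy as np

def mean_pairwise_angle_deg(X, M=None):
    """Mean angle in degrees between distinct rows of X under <x, y>_M = x^T M y."""
    if M is None:
        M = np.eye(X.shape[1])               # plain Euclidean inner product
    G = X @ M @ X.T                          # Gram matrix under the chosen inner product
    norms = np.sqrt(np.diag(G))
    cos = G / np.outer(norms, norms)
    iu = np.triu_indices(X.shape[0], k=1)    # count each unordered pair once
    angles = np.degrees(np.arccos(np.clip(cos[iu], -1.0, 1.0)))
    return float(angles.mean())

rng = np.random.default_rng(0)
n, d = 500, 64
# Synthetic stand-in for representations (assumption): isotropic noise plus
# a shared offset of 1.0 on every coordinate, which skews Euclidean angles.
X = rng.standard_normal((n, d)) + 1.0

# Hypothetical corrective metric, NOT the paper's AIP:
# the inverse of the uncentered second-moment matrix of the data.
M = np.linalg.inv(X.T @ X / n)

euclidean_angle = mean_pairwise_angle_deg(X)
corrected_angle = mean_pairwise_angle_deg(X, M)
print(f"Euclidean: {euclidean_angle:.1f} deg, corrected: {corrected_angle:.1f} deg")
```

On this toy data the mean Euclidean angle falls well below 90 degrees, while the mean angle under the stand-in metric sits close to 90 degrees. This mirrors the qualitative effect described above, but says nothing about how the paper actually identifies the AIP.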

Validation of Atoms Theory

The paper validates the Atoms Theory through experiments using sparse autoencoders (SAEs) equipped with threshold activations. These are trained on Gemma2-2B, Gemma2-9B, and Llama3.1-8B, achieving 99.9% sparse reconstruction across layers on average. More than 99.8% of the atoms satisfy the uniqueness condition, compared to 0.5% for neurons and 68.2% for features.

Figure 3: Sparse reconstruction $R^2$ scores across models. GemmaScope and LlamaScope serve as standard tools for extracting features from representations.
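The setup above can be sketched as a single-layer SAE forward pass with a threshold activation, followed by an $R^2$ reconstruction score. This is a minimal illustration under stated assumptions, not the authors' implementation. The threshold rule (keep a pre-activation only if it exceeds `theta`), the function names, the toy orthonormal dictionary, and the use of the standard coefficient of determination are all assumptions; the paper's exact activation and metric may differ.

```python
import numpy as np

def threshold_sae_forward(x, W_enc, b_enc, W_dec, b_dec, theta):
    """Single-layer SAE: linear encoder, threshold activation, linear decoder."""
    pre = x @ W_enc + b_enc
    codes = np.where(pre > theta, pre, 0.0)   # keep only pre-activations above theta
    recon = codes @ W_dec + b_dec
    return codes, recon

def r2_score(x, recon):
    """Coefficient of determination over all entries (assumed metric)."""
    ss_res = np.sum((x - recon) ** 2)
    ss_tot = np.sum((x - x.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
d, n, k = 32, 256, 3                          # dimension, samples, active units per sample

# Toy dictionary (assumption): rows of an orthogonal matrix act as decoder directions.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
W_dec = Q

# Synthetic k-sparse codes with magnitudes in [1, 2], mapped through the dictionary.
true_codes = np.zeros((n, d))
for i in range(n):
    active = rng.choice(d, size=k, replace=False)
    true_codes[i, active] = rng.uniform(1.0, 2.0, size=k)
x = true_codes @ W_dec

# Tied encoder (W_enc = W_dec^T) and zero biases; theta separates signal from float noise.
codes, recon = threshold_sae_forward(x, W_dec.T, np.zeros(d), W_dec, np.zeros(d), theta=0.5)
r2 = r2_score(x, recon)
avg_active = float((codes != 0).sum(axis=1).mean())
print(f"R^2 = {r2:.6f}, average active units = {avg_active:.1f}")
```

Because the toy dictionary is orthonormal, the tied encoder recovers the sparse codes exactly and the reconstruction is essentially perfect. Real SAE dictionaries are overcomplete and learned, so this only illustrates what the reported quantities measure.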

Comparison with Neurons and Features

The Atoms Theory fundamentally challenges existing paradigms by demonstrating that atoms provide superior reconstructive fidelity and stability compared to neurons and features. Through systematic evaluation, atoms are shown to fulfill conditions for uniqueness and recoverability, supporting their role as more robust fundamental units.

Figure 4: Spontaneous alignment between the encoder and decoder during training on Gemma2-2B.
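For intuition about what a uniqueness condition on a dictionary can look like, the following sketch uses the classical mutual-coherence result from sparse representation theory: if a vector has fewer than $(1 + 1/\mu)/2$ nonzero coefficients over a dictionary with mutual coherence $\mu$, that sparse representation is unique. This is a well-known textbook condition offered as an illustration. It is an assumption that it resembles the condition checked in the paper, which may be stated differently (for example, in terms of the AIP). The function names are hypothetical.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute cosine between two distinct atoms; rows of D are atoms."""
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)   # normalize each atom
    G = np.abs(Dn @ Dn.T)
    np.fill_diagonal(G, 0.0)                            # ignore self-similarity
    return float(G.max())

def max_unique_sparsity(mu):
    """Largest integer k with k < (1 + 1/mu) / 2; assumes mu > 0."""
    bound = 0.5 * (1.0 + 1.0 / mu)
    return int(np.ceil(bound)) - 1                      # strict inequality

# Tiny illustrative dictionary: two orthogonal atoms plus one diagonal atom.
D = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
mu = mutual_coherence(D)
k_max = max_unique_sparsity(mu)
print(f"coherence = {mu:.4f}, uniqueness guaranteed up to sparsity {k_max}")
```

Lower coherence between atoms permits denser representations while keeping them unique, which is one way to read the comparison between atoms, neurons, and features above.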

Impacts and Future Directions

The theoretical underpinnings of Atoms Theory align closely with the principles of compressed sensing and provide rigorous guarantees such as the Restricted Isometry Property (RIP). This establishes a foundation for mechanistic interpretability of LLMs, significantly contributing to both theoretical and practical understanding. Future work will focus on refining computational methods for atom identification and exploring broader applications within AI interpretability frameworks.
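For reference, the standard statement of the Restricted Isometry Property from compressed sensing is given below. The notation is an assumption made here for illustration: $D$ is a matrix whose columns are the atoms, $x$ is a coefficient vector, and $\delta_k$ is the order-$k$ restricted isometry constant. The paper's version is presumably stated with respect to the norm induced by the atomic inner product rather than the Euclidean norm used in this textbook form.

```latex
% Standard (Euclidean) RIP of order k; D and x are assumed notation, not the paper's.
(1 - \delta_k)\,\|x\|_2^2 \;\le\; \|D x\|_2^2 \;\le\; (1 + \delta_k)\,\|x\|_2^2
\quad \text{for all } x \text{ with } \|x\|_0 \le k, \qquad 0 \le \delta_k < 1 .
```

A small $\delta_k$ means that every set of at most $k$ atoms behaves almost like an orthonormal system, which is what makes sparse representations over the atom set stable.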

Conclusion

"Towards Atoms of LLMs" presents a robust framework that effectively transcends limitations of traditional neuron and feature paradigms, offering a comprehensive approach to interpreting and reconstructing LLM representations. As the field progresses, the Atoms Theory may reshape our understanding of the fundamental structures underlying LLMs, highlighting the intricate balance between theoretical innovation and practical application.
