Convergent Evolution: How Different Language Models Learn Similar Number Representations
This presentation examines a rigorous analysis of how language models develop periodic number representations. The analysis reveals a crucial two-tiered hierarchy: while all models exhibit spectral convergence (universal Fourier spikes at periods 2, 5, and 10), only certain combinations of architecture, data, and optimizer achieve geometric convergence, where these periodic features support actual modular arithmetic through linearly separable residue classes. The work demonstrates that visible periodic signals substantially overestimate a model's capacity for structured reasoning, with profound implications for mechanistic interpretability and foundation model design.

Script
Different language models, trained on different data with different architectures, all learn to represent numbers with the same periodic structure. It's convergent evolution in neural networks, and it hides a crucial distinction between what looks similar and what actually computes.
Every model the researchers examined—Transformers, recurrent networks, even classical word embeddings like GloVe—displays sharp Fourier spikes at periods 2, 5, and 10. This spectral convergence is universal, arising from the statistical structure of natural language data itself, not from any particular training objective or architecture.
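As a minimal sketch of how such spikes can be detected: take the embedding of each number token, run an FFT along the number axis, and average the power spectra across embedding dimensions. The matrix shapes and the injected period-10 component below are hypothetical stand-ins, not the paper's setup.

```python
import numpy as np

# Hypothetical setup: `emb` is an (N, d) matrix whose i-th row is the
# embedding of the number token "i". Random data stands in for a real model.
N, d = 1000, 256
rng = np.random.default_rng(0)
emb = rng.normal(size=(N, d))  # replace with real number-token embeddings

# Inject a toy period-10 component so a spike is visible in this sketch.
emb[:, 0] += 3.0 * np.cos(2 * np.pi * np.arange(N) / 10)

# FFT each embedding dimension along the number axis (mean-centered),
# then average the power spectra across dimensions.
centered = emb - emb.mean(axis=0)
power = np.abs(np.fft.rfft(centered, axis=0)) ** 2
mean_power = power.mean(axis=1)

freqs = np.fft.rfftfreq(N)  # cycles per unit step along the number line
periods = np.divide(1.0, freqs, out=np.full_like(freqs, np.inf), where=freqs > 0)

# Report the strongest non-DC frequencies; peaks near periods 10, 5, and 2
# would correspond to the spectral convergence described above.
top = np.argsort(mean_power[1:])[::-1][:5] + 1
for k in top:
    print(f"period ~{periods[k]:6.2f}  power {mean_power[k]:.1f}")
```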
But here's the critical insight: Fourier spikes are necessary but not sufficient for arithmetic capability. The researchers prove that spectral convergence—those periodic signals—can coexist with complete failure to perform modular arithmetic. Only when embeddings achieve geometric convergence, where residue classes become linearly separable, does the model gain functional arithmetic capacity.
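A linear probe is the standard test for this kind of separability. The sketch below, again with hypothetical stand-in embeddings, fits a logistic-regression probe to predict each number's residue mod 10; probe accuracy near chance (10%) despite clear Fourier spikes would be exactly the dissociation described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: `emb[i]` is the embedding of number token "i".
# The probe asks: are residue classes mod 10 separable by a hyperplane?
N = 1000
emb = np.random.default_rng(1).normal(size=(N, 256))  # stand-in embeddings
labels = np.arange(N) % 10                            # residue-class targets

X_train, X_test, y_train, y_test = train_test_split(
    emb, labels, test_size=0.2, random_state=0, stratify=labels
)
probe = LogisticRegression(max_iter=2000).fit(X_train, y_train)

# Chance is 10%; accuracy well above chance indicates linearly separable
# residue classes (geometric convergence), not merely Fourier spikes.
print(f"probe accuracy: {probe.score(X_test, y_test):.2%}")
```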
Through controlled ablations, the authors isolate the sources of geometric convergence. Spectral features survive almost any data perturbation—even replacing numbers with random tokens preserves the Fourier spikes. But geometric convergence collapses when you remove contextual associations, restrict attention windows, or use the wrong architecture. Transformers and modern recurrent models succeed; deep LSTMs completely fail despite identical spectral structure.
Tokenization determines everything. When the researchers trained models on 9-digit addition, where each number spans multiple tokens, both spectral and geometric convergence emerged universally, independent of optimizer or random seed. But single-token arithmetic, like 3-digit problems where each operand is one token, produced no convergence at all. The embeddings scattered randomly, probe accuracy stayed at chance, and results varied wildly across training runs. A toy illustration of this contrast follows.
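The sketch below assumes a tokenizer that keeps numbers of up to three digits as single tokens and splits longer numbers into three-digit chunks; the helper and its chunking direction are hypothetical illustrations, not the paper's tokenizer.

```python
def chunk_number(s: str, chunk: int = 3) -> list[str]:
    """Split a digit string into tokenizer-style chunks of up to `chunk` digits."""
    out = []
    while s:
        out.append(s[:chunk])
        s = s[chunk:]
    return out

print(chunk_number("123"))        # ['123']: one token, no convergence observed
print(chunk_number("123456789"))  # ['123', '456', '789']: multi-token, convergence
```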
Unlike the sudden phase transitions seen in grokking, both tiers of convergence emerge gradually and continuously during natural language pretraining. This work reframes how we interpret learned representations: periodic signals are everywhere, but functional modularity requires precise alignment of data, architecture, and optimization. To explore more research like this and create your own video presentations, visit EmergentMind.com.