Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations

Published 2 May 2026 in cs.LG and cs.AI | (2605.01609v1)

Abstract: We test whether the causal inner product of \citet{park2024linear} -- defined by the unembedding covariance $Σ$ -- enables cross-lingual concept transport. Across 17 models and 4 language pairs, a matched-spectrum randomization test finds that Whitened Causal Alignment is indistinguishable from spectral regularization alone ($p = 0.95$). However, this failure reveals a broader phenomenon: anti-concentration is observed in residual-stream difference-of-means vectors across five architecture families ($p < 10^{-33}$) and supported by SAE features (e.g., $p = 4.5 \times 10^{-19}$) and linear probes on Gemma and Llama. We discover a \emph{dual geometry}: activation-space concept directions anti-concentrate in the spectral tail, while static unembedding-row contrasts \emph{concentrate} in high-variance directions ($p < 10^{-4}$). Split-injection causal interventions support the functional basis on Gemma and Llama (Cohen's $d$ up to $1.80$), and POS-tag probing across 8 models shows syntax preferentially encodes in the high-variance subspace in 6 of 8 architectures ($p < 0.013$), with the Qwen~2.5 family showing a significant reversal consistent with architecture-specific spectral structure. These results suggest transformers may rotate semantic content into spectrally quiet regions during contextualized processing, encoding concepts where they can be manipulated with reduced grammatical disruption.


Authors (3)

Summary

The paper reports that activation-space concept directions place disproportionate energy in the spectral tail of the unembedding covariance, while syntactic information concentrates in high-variance directions in 6 of 8 probed architectures.
It analyzes 17 models, extracts concept directions with three methods (difference-of-means on all models, SAE features and linear probes on Gemma and Llama), and quantifies how their spectral energy is distributed, finding significant anti-concentration.
The study finds no support for the causal geometry hypothesis in cross-lingual transfer and draws out implications for interpretability and steering in LLMs.

Dual Geometry and Spectral Anti-Concentration in Transformer Representations

Introduction

The paper "Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations" (2605.01609) presents a spectral analysis of concept encoding in LLMs and of the geometric structure underlying semantic and syntactic information. The initial hypothesis is that the causal inner product, constructed from the inverse covariance of the unembedding matrix, provides a meaningful basis for cross-lingual concept transfer. A matched-spectrum randomization test across multiple architectures finds instead that Whitened Causal Alignment (WCA) is indistinguishable from spectral regularization alone, which argues against the causal geometry hypothesis for cross-lingual transport. The core insight comes from characterizing where concepts are encoded in activation space relative to the eigenspectrum of the unembedding covariance.

Empirical Framework and Methodological Rigour

Seventeen transformer models, drawn from the Llama, Qwen, Gemma, Mistral, and SmolLM families together with MoE models, are analyzed across four language pairs. Concept directions are extracted using three mechanistically distinct methods:

Difference-of-means on residual activations (subtractive, activation-based)
Sparse autoencoder features (contextualized, unsupervised)
Linear probes via L2-regularized logistic regression (contextualized, supervised)
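The first method in the list above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `difference_of_means`, the unit-norm scaling, and the synthetic activations are all assumptions made for the example.

```python
import numpy as np

def difference_of_means(pos_acts, neg_acts):
    """Return a unit-norm direction: mean(pos_acts) - mean(neg_acts).

    pos_acts, neg_acts: arrays of shape (n_examples, d_model) holding
    residual-stream activations for the two sides of a concept contrast.
    Normalizing to unit length is an assumption made for this sketch.
    """
    pos_acts = np.asarray(pos_acts, dtype=float)
    neg_acts = np.asarray(neg_acts, dtype=float)
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        raise ValueError("class means coincide; no direction to extract")
    return direction / norm

# Synthetic stand-in for real activations: the contrast lives on axis 3.
rng = np.random.default_rng(0)
d_model = 16
true_dir = np.zeros(d_model)
true_dir[3] = 1.0
pos = rng.normal(size=(200, d_model)) + 2.0 * true_dir
neg = rng.normal(size=(200, d_model)) - 2.0 * true_dir
concept = difference_of_means(pos, neg)
```

In practice `pos` and `neg` would be activations collected at a chosen layer for the two halves of each counterfactual pair; the layer choice is not specified here.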

Spectral analysis projects each concept direction onto the eigenbasis of language-specific unembedding covariance, quantifying spectral energy distribution and anti-concentration via Gini deviation and Spectral Center of Mass. All methods converge, revealing a robust phenomenon: concept directions systematically anti-concentrate in the spectral tail, while unembedding-row contrasts concentrate in high-variance directions.
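The projection step can be sketched as follows. The exact definitions of Gini deviation and Spectral Center of Mass are not given in this summary, so `signed_gini` and `center_of_mass` below are illustrative stand-ins chosen to match the sign convention described here (negative means energy sits in the tail, positive means energy sits in high-variance directions). The covariance is uncentered and uniformly weighted, as stated in the limitations.

```python
import numpy as np

def spectral_energy(direction, unembedding):
    """Fraction of a direction's squared norm along each covariance eigenvector.

    unembedding: (vocab, d_model). Eigenpairs are sorted by descending eigenvalue.
    """
    W = np.asarray(unembedding, dtype=float)
    cov = W.T @ W / W.shape[0]                  # uncentered, uniform weights
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coeffs = eigvecs.T @ np.asarray(direction, dtype=float)
    return eigvals, coeffs**2 / np.sum(coeffs**2)

def center_of_mass(energy):
    """Energy-weighted normalized rank: 0 = top eigenvector, 1 = last (assumed form)."""
    ranks = np.linspace(0.0, 1.0, len(energy))
    return float(np.sum(energy * ranks))

def signed_gini(energy):
    """Signed area between cumulative energy and the uniform line (assumed form).

    Positive: energy front-loaded on high-variance directions.
    Negative: energy pushed into the spectral tail.
    """
    d = len(energy)
    uniform = np.arange(1, d + 1) / d
    return float(2.0 * np.mean(np.cumsum(energy) - uniform))

# Toy unembedding whose column scales decay from 8 down to 1.
rng = np.random.default_rng(0)
scales = np.arange(8, 0, -1, dtype=float)
W = rng.normal(size=(2000, 8)) * scales
_, e_top = spectral_energy(np.eye(8)[0], W)    # direction on the loudest axis
_, e_tail = spectral_energy(np.eye(8)[7], W)   # direction on the quietest axis
gini_top, gini_tail = signed_gini(e_top), signed_gini(e_tail)
com_top, com_tail = center_of_mass(e_top), center_of_mass(e_tail)
```

With this convention a direction with uniform energy across eigenvectors scores zero, which makes the sign of the statistic directly readable as concentration versus anti-concentration.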

Negative Result for Causal Geometry in Cross-Lingual Transport

The causal geometry hypothesis predicts that WCA, which aligns representation spaces after whitening with the inverse square root of the unembedding covariance, should outperform naive Procrustes alignment. Across all model and language-pair combinations, however, the matched-spectrum randomization test shows that real WCA yields a 50.3% win rate, indistinguishable from randomized WCA (51.0%) with $p=0.95$. This indicates that any benefit is attributable to spectral regularization rather than to the semantic content of the causal eigendirections.
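A rough sketch of the comparison described above follows. It assumes that the matched-spectrum control keeps the eigenvalues of the covariance and swaps the eigenvectors for a random orthogonal basis; the summary does not spell out the construction, so treat `whitener`, `random_orthogonal`, and `alignment_error` as hypothetical helpers rather than the paper's code.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))              # fix column signs

def whitener(cov, rng=None, eps=1e-8):
    """Inverse square root of cov. If rng is given, keep the eigenvalues but
    replace the eigenvectors with a random basis (assumed matched-spectrum control)."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    if rng is not None:
        eigvecs = random_orthogonal(len(eigvals), rng)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T

def procrustes(X, Y):
    """Orthogonal R minimizing ||X R - Y||_F, via the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def alignment_error(X, Y, W_src, W_tgt):
    """Relative residual after whitening both sides and aligning with Procrustes."""
    Xw, Yw = X @ W_src, Y @ W_tgt
    R = procrustes(Xw, Yw)
    return float(np.linalg.norm(Xw @ R - Yw) / np.linalg.norm(Yw))

# Synthetic stand-ins for an unembedding covariance and paired representations.
rng = np.random.default_rng(0)
d = 6
A = rng.normal(size=(200, d))
cov = A.T @ A / A.shape[0]
W_real = whitener(cov)
W_rand = whitener(cov, rng=rng)

X = rng.normal(size=(50, d))
R_true = random_orthogonal(d, rng)
Y = X @ R_true
R_plain = procrustes(X, Y)                      # naive Procrustes baseline
err_real = alignment_error(X, Y, W_real, W_real)
err_rand = alignment_error(X, Y, W_rand, W_rand)
```

A win rate like the one quoted above would come from repeating the real-versus-baseline comparison over many concept sets and model pairs, once with `W_real` and once with `W_rand`; that evaluation loop is omitted here.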

Spectral Anti-Concentration: Mechanistic Interpretability

Difference-of-means vectors, SAE features, and probe-derived concept directions all exhibit highly significant spectral anti-concentration (mean Gini deviation = −0.282, all within-model $p<10^{-6}$), consistently placing disproportionate energy in low-eigenvalue directions. Importantly, this effect is not an artifact of the extraction method or a byproduct of subtractive cancellation: the linear probe and SAE methods, which do not rely on subtraction, show the same anti-concentration.

Unembedding-row contrasts, on the other hand, concentrate in high-variance directions (positive Gini deviation, $p<10^{-4}$), whereas random token-pair differences only mildly anti-concentrate (mean Gini deviation ≈ −0.10). Concept-specific pairs are therefore the anomaly: they actively concentrate, which suggests that the unembedding matrix encodes concept contrasts along principal axes, where they have the greatest effect on token prediction.

Dual Geometry: Concept vs. Syntax Encoding

The observed geometry is dual: vocabulary space (unembedding matrix) maximally influences logit prediction via principal axes, while activation-space representations push concept information into the spectral tail. This suggests that transformers spectrally rotate semantic content into low-variance subspaces during forward processing, enabling semantic manipulation with reduced syntactic disruption.

Functional evidence is provided via split-injection steering, which decomposes concept vectors into spectral components and measures perplexity degradation on held-out text. Injection along high-variance directions (shouting) consistently increases perplexity and grammatical interference more than low-variance injection (whispering), as quantified by Cohen's $d$ up to 1.80 and significant $p$-values in Gemma and Llama models.
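The decomposition step and the effect-size calculation can be sketched as below. The split point (`top_fraction`), the helper names, and the placeholder perplexity numbers are assumptions for illustration; running the actual injection requires a model and is left out.

```python
import numpy as np

def split_by_spectrum(direction, eigvecs_desc, top_fraction=0.1):
    """Split a vector into its high-variance and low-variance components.

    eigvecs_desc: (d_model, d_model) orthonormal eigenvectors as columns,
    sorted by descending eigenvalue. top_fraction is a hypothetical cut-off.
    """
    d = eigvecs_desc.shape[1]
    k = max(1, int(round(top_fraction * d)))
    top = eigvecs_desc[:, :k]
    high = top @ (top.T @ direction)    # projection onto the top-k subspace
    low = direction - high              # remainder lies in the spectral tail
    return high, low

def cohens_d(a, b):
    """Cohen's d with the pooled sample standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (
        len(a) + len(b) - 2
    )
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

# Stand-in orthonormal basis and concept vector (not real model data).
rng = np.random.default_rng(0)
d = 20
eigvecs_desc, _ = np.linalg.qr(rng.normal(size=(d, d)))
concept = rng.normal(size=d)
high, low = split_by_spectrum(concept, eigvecs_desc, top_fraction=0.1)

# Placeholder perplexity increases; real values would come from injecting
# `high` and `low` into the residual stream and scoring held-out text.
ppl_high = [2.0, 4.0, 6.0]
ppl_low = [1.0, 3.0, 5.0]
effect = cohens_d(ppl_high, ppl_low)
```

Because `high` and `low` generally have different norms, a fair comparison would likely rescale the two components to a common magnitude before injection; whether the paper does so is not stated here.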

POS-tag probing further corroborates this interpretation: classifiers trained on activations projected onto top-10% vs. bottom-10% eigenvectors reveal that syntactic information is preferentially encoded in the high-variance subspace in 6 out of 8 architectures, with the Qwen 2.5 family anomalously reversing this trend.
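The mechanics of probing within a spectral band can be illustrated with a toy example. This sketch swaps in a nearest-centroid classifier for whatever probe the paper trains, and uses synthetic activations in which the label signal is deliberately planted on the top axis, so the numbers it produces say nothing about real models.

```python
import numpy as np

def band_coordinates(acts, eigvecs_desc, band, fraction=0.1):
    """Coordinates of activations in the top or bottom fraction of eigenvectors.

    eigvecs_desc: eigenvectors as columns, sorted by descending eigenvalue.
    band: "top" or "bottom".
    """
    d = eigvecs_desc.shape[1]
    k = max(1, int(round(fraction * d)))
    basis = eigvecs_desc[:, :k] if band == "top" else eigvecs_desc[:, -k:]
    return acts @ basis

def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Stand-in probe: assign each test point to the closest class centroid."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    pred = classes[np.argmin(dists, axis=1)]
    return float(np.mean(pred == test_y))

# Synthetic data: two "tags", with the label signal planted on axis 0.
rng = np.random.default_rng(0)
n, d = 400, 20
eigvecs_desc = np.eye(d)                 # assume axes are already the eigenbasis
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d))
acts[:, 0] += 3.0 * (2 * labels - 1)

train, test = slice(0, 200), slice(200, n)
acc = {
    band: nearest_centroid_accuracy(
        band_coordinates(acts[train], eigvecs_desc, band), labels[train],
        band_coordinates(acts[test], eigvecs_desc, band), labels[test],
    )
    for band in ("top", "bottom")
}
```

Comparing `acc["top"]` with `acc["bottom"]` mirrors the top-10% versus bottom-10% comparison described above; with real activations the eigenbasis would come from the unembedding covariance and the labels from a POS-tagged corpus.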

Implications and Theoretical Connections

This spectral allocation has immediate implications for interpretability and steering. PCA-based interpretability methods that examine only the top principal components are likely to miss substantive concept-level information, because semantic representations reside in the spectral tail. Steering strategies that favor low-variance projections should cause less disruption to syntactic routing, a finding directly relevant to activation engineering frameworks.

The anti-concentration phenomenon connects to superposition theory [Elhage et al., 2022]: the high-variance subspace is claimed by frequent syntactic patterns, constraining effective semantic dimensionality and forcing semantic superposition even in high-capacity models. The spectral signature of anti-concentration reflects how transformers allocate representational capacity: syntax “shouts” in robust high-variance directions, semantics “whispers” in the quiet subspace.

Architecture-dependent effects (e.g., Qwen 2.5’s reversal) indicate that spectral allocation is not universal, warranting deeper investigation into how training processes, vocabulary structure, and normalization interact with covariance spectra.

Limitations and Future Directions

Current concept sets are derived from word-level counterfactual pairs, leaving open whether anti-concentration generalizes to abstract or higher-order concepts. Spectral analysis is restricted to uncentered, uniformly weighted covariance from the unembedding matrix; alternate constructions (layerwise, frequency-weighted, activation-covariance spectra) may reveal other phenomena. Layer-wise temporal analysis of spectral rotation is an immediate direction for future work, as is extension to encoder and encoder-decoder architectures.

Further investigation should address the Qwen 2.5 architectural anomaly, test broader conceptual categories, and explore the role of spectral allocation in reasoning tasks and adversarial steering.

Conclusion

Through experiments across a diverse set of LLM architectures and extraction methods, this paper finds that activation-space concept representations consistently anti-concentrate in the low-variance spectral tail of the unembedding covariance, while syntactic information is preferentially encoded in high-variance directions in most architectures tested. The dual geometry between activation and vocabulary spaces suggests that transformers rotate concepts into “quiet” regions, facilitating semantic manipulation with reduced grammatical interference. These findings challenge prior assumptions about geometric transfer and inform the development of more effective interpretability and steering protocols.
