- The paper demonstrates that LLMs construct low-dimensional, curved manifolds to encode character counts for effective linebreaking.
- It reveals that distributed attention-head algorithms employ rotational alignment of manifold representations for robust boundary detection.
- Causal interventions and ablations confirm that manipulating these manifolds drastically alters newline predictions without impacting general token output.
Introduction and Motivation
This paper provides a detailed mechanistic analysis of how LLMs, specifically Claude 3.5 Haiku, perform linebreaking under fixed-width constraints, an implicitly visual-spatial task, despite operating solely on token sequences. The study reveals that the model constructs low-dimensional, curved manifolds in its residual stream to represent character counts, using features and computations analogous to biological place and boundary cells. The work unifies dictionary-feature-based and geometric perspectives to reverse-engineer model behavior on a task that is both prevalent in pretraining data and challenging because of its latent spatial structure.
Figure 1: Schematic overview of the linebreaking task, examined feature families, and the geometric substrate of the learned representations and computations.
Task Description and Computational Challenge
The linebreaking task is formalized as follows: given tokenized input drawn from line-wrapped text, the model must infer the character position within the current line, the overall line width constraint, and decide—using only sequential context and token identities—whether the next token should be a newline or additional inline content.
Figure 2: Example prompts illustrating the difficulty of the linebreaking task; the next word may or may not fit, influencing the model's decision boundary.
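The target behavior can be written down as a few lines of ordinary code, which makes clear what the model has to approximate from tokens alone. This is a minimal sketch of the ground-truth rule, not the model's mechanism; the function names and the convention that a token's leading space counts toward its length are assumptions made for illustration.

```python
def char_count_per_token(tokens):
    """Characters on the current line after each token; a newline resets the count."""
    counts, current = [], 0
    for tok in tokens:
        if tok == "\n":
            current = 0
        else:
            current += len(tok)
        counts.append(current)
    return counts


def should_break(char_count, next_token, line_width):
    """True if appending next_token would exceed the line width (assumed rule)."""
    return char_count + len(next_token) > line_width


tokens = ["The", " quick", " brown", "\n", "fox"]
print(char_count_per_token(tokens))    # [3, 9, 15, 0, 3]
print(should_break(15, " fox", 18))    # True: 15 + 4 > 18
print(should_break(15, " fox", 19))    # False: 15 + 4 fits exactly
```

The model never sees `len(tok)` or `line_width` directly, so each of these three quantities has to be inferred and stored in its activations.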
The inherent difficulty stems from the need for accurate incremental character counting, adaptive boundary detection across varying line widths, and integration with semantic prediction of the next token. The attribution graph (Figure 3) distills this process into a series of compositional, causally upstream features culminating in the "predict newline" decision.
Figure 3: Attribution graph showing how features for line width and character count synergize to activate downstream features indicating proximity to the line boundary, ultimately controlling the newline prediction.
Manifold Representations of Scalar Quantities
The central insight is that LLMs eschew both trivial one-hot and degenerate one-dimensional representations for scalar variables (e.g., character count). Instead, the model encodes counts as points on a rippled, highly curved 1D manifold, embedded in a low-dimensional subspace of dimension d≪N (with N the full representational dimensionality). Sparse feature families tile this manifold, providing local coordinates and facilitating causal interventions for interpretability.
Figure 4: Key computational steps in the linebreaking behavior, all characterized as operations on geometric manifolds.
Figure 5: Family of features tuned to different ranges of character count; receptive fields become broader with higher counts, mirroring biological number representation.
This manifold structure allows efficient use of representational capacity, high-fidelity discrimination across a large range of counts, and the application of linear computations (e.g., rotations) necessary for boundary detection circuits.
Figure 6: Character count as a jagged line (manifold) in 6D subspace; local parametrization by identified features.
The causal sufficiency of these subspaces is empirically validated:
- Ablating the character count subspace dramatically degrades newline predictions, but not generic token prediction (Figure 7).
- Direct interventions on the 6D manifold can causally manipulate the linebreaking behavior (Figure 8).
Figure 7: Ablation of the character count subspace impairs model performance specifically for newline tokens.
Figure 8: Intervening on the character count manifold is sufficient to alter linebreaking decisions.
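A subspace ablation of the kind described above can be sketched as a projection that removes the component of each activation lying in the chosen subspace. The following is a minimal sketch that assumes zero-ablation by orthogonal projection and uses random stand-in data; the summary does not specify the paper's exact ablation procedure, and `ablate_subspace` is a hypothetical helper.

```python
import numpy as np


def ablate_subspace(activations, basis):
    """Remove the part of each activation that lies in span(basis).

    activations: (n, d) array of residual-stream vectors.
    basis: (d, k) array whose columns span the subspace (need not be orthonormal).
    """
    q, _ = np.linalg.qr(basis)                 # orthonormal basis for the subspace
    return activations - (activations @ q) @ q.T


rng = np.random.default_rng(0)
acts = rng.normal(size=(5, 32))                # stand-in activations (assumption)
basis = rng.normal(size=(32, 6))               # stand-in 6D count subspace (assumption)
ablated = ablate_subspace(acts, basis)

# After ablation, nothing remains along any direction of the subspace.
print(np.allclose(ablated @ basis, 0.0))       # True
```

Comparing the loss on newline tokens against the loss on all other tokens, before and after such a projection, is what separates a task-specific subspace from a generally important one.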
Probe Analysis and the Geometry of Rippling
Supervised probes for character count exhibit both widening receptive fields and off-diagonal "ringing," analytically predicted by the embedding of a discretized circle or interval into low-dimensional space, subject to capacity constraints.
Figure 9: Probes demonstrate broadening and ringing, a signature of optimal low-dimensional embedding.
Figure 10: Pairwise cosine similarity among character count representations highlights periodic, oscillatory similarity (ringing) in both mean activations and feature decoders.
Figure 11: Ideal and PCA-approximated similarity matrices for discrete points on a circle; rippling emerges naturally from dimensionality reduction.
Figure 12: Particle simulation visualizes the physical analog of constructing a rippled manifold under attraction-repulsion forces in low-dimensional space.
Figure 13: Curves similar to those observed in LLMs arise from high-curvature embeddings in vision and feature manifold studies.
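The claim that rippling emerges from dimensionality reduction can be checked numerically on the circle example. The sketch below builds an ideal, strictly positive similarity matrix for points on a circle and keeps only its top eigencomponents; the Gaussian kernel, the 100 points, the kernel width, and the rank of 5 are all illustrative assumptions rather than values from the paper.

```python
import numpy as np


def circle_similarity(n, sigma):
    """Gaussian similarity between n evenly spaced points on a circle."""
    idx = np.arange(n)
    diff = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(diff, n - diff)          # circular distance in steps
    return np.exp(-dist**2 / (2 * sigma**2))


def low_rank_approx(sim, k):
    """Best rank-k approximation of a symmetric matrix via its top eigenpairs."""
    vals, vecs = np.linalg.eigh(sim)           # eigenvalues in ascending order
    top_vals, top_vecs = vals[-k:], vecs[:, -k:]
    return (top_vecs * top_vals) @ top_vecs.T


ideal = circle_similarity(100, 5.0)
approx = low_rank_approx(ideal, 5)

print(ideal.min() > 0)     # True: the ideal similarity is never negative
print(approx.min() < 0)    # True: truncation introduces negative side lobes
```

The negative side lobes in `approx` are the off-diagonal ringing: with only a few dimensions available, the sharp similarity peak can only be approximated by a handful of low-frequency components, which overshoot away from the diagonal.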
Boundary Detection: Alignment, Twisting, and Stereoscopy
To sense proximity to the line boundary, the model compares the current counted position with the line width constraint. "Boundary heads" in attention layers perform a learned rotational alignment of these two manifold representations via their QK matrices, aligning counts such that i (position) aligns with k=i+ϵ (boundary), thus flagging imminent overflow.
Figure 14: QK transformations ("twisting") align the position and boundary manifolds for efficient boundary detection.
Figure 15: Boundary head QK operations maximize probe alignment in an offset-dependent fashion.
Rather than relying on a single head, multiple heads with distinct offsets "tile" the residual characters-remaining manifold, achieving high resolution and robustness across diverse line widths by distributed, overlapping outputs.
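The rotational alignment can be illustrated with a toy model in which counts are embedded as a few cosine and sine components and the QK map is a rotation by a fixed offset. This is a sketch under strong assumptions: the period `N`, the frequencies `FREQS`, and the exact Fourier form of the embedding are hypothetical choices, not the learned representation.

```python
import numpy as np

N = 150               # hypothetical period of the count embedding
FREQS = (1, 2, 3)     # hypothetical frequencies, giving a 6D embedding


def embed(count):
    """Place a count on a curve made of cosine and sine components."""
    angles = 2 * np.pi * np.array(FREQS) * count / N
    return np.concatenate([np.cos(angles), np.sin(angles)])


def rotation(offset):
    """Linear map sending embed(i) to embed(i + offset)."""
    phi = 2 * np.pi * np.array(FREQS) * offset / N
    c, s = np.diag(np.cos(phi)), np.diag(np.sin(phi))
    return np.block([[c, -s], [s, c]])


def boundary_scores(count, offset):
    """Score of the rotated count embedding against every candidate line width."""
    query = rotation(offset) @ embed(count)
    keys = np.stack([embed(k) for k in range(N)])
    return keys @ query


scores = boundary_scores(40, 5)
print(int(np.argmax(scores)))   # 45: the score peaks where width = count + offset
```

A head with offset 5 therefore responds most strongly when five characters remain; several heads with different offsets cover a range of distances to the boundary.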
Predicting the Newline: Geometric Structure Enables Linear Decision
At the final decision point, the model merges character count, boundary, and next-token-length features. Analysis reveals that representations of "characters remaining" and "next token length" are arranged on near-orthogonal subspaces, enabling a linear hyperplane to implement the comparison i−j≥0 (with i the characters remaining and j the length of the next token) that determines whether the next token still fits on the line. This hyperplane separates break from no-break states with an AUC of 0.91.
This orthogonalization substantially simplifies the computation, reducing the complexity of integrating semantic and structural information at the output.
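Why orthogonality makes the decision linear can be seen in a deliberately simplified toy. The sketch assumes each quantity is encoded linearly along its own direction, which flattens the curved manifolds described earlier; the directions `u` and `v`, the 16-dimensional space, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(16, 2)))
u, v = q[:, 0], q[:, 1]        # orthonormal directions (assumption)


def represent(chars_remaining, next_len):
    """Toy activation: each quantity is written along its own direction."""
    return chars_remaining * u + next_len * v


w = u - v                      # readout weight: w @ x equals i - j


def fits_on_line(chars_remaining, next_len):
    """Linear test for i - j >= 0, with a small tolerance for rounding."""
    return float(w @ represent(chars_remaining, next_len)) >= -1e-9


print(fits_on_line(5, 3))      # True: 5 characters remain, the token needs 3
print(fits_on_line(3, 5))      # False: the token does not fit, so break
```

If the two directions overlapped, the readout would mix the quantities and a single hyperplane could no longer isolate their difference.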
Distributed Algorithms and Mechanistic Construction
Beyond final behavioral analysis, the study deconstructs the distributed counting algorithm. Multiple Layer 0 and Layer 1 attention heads act in concert, employing the attention pattern as an offset-specific sink and adjusting OV contributions according to the attended tokens' lengths. The sum of their outputs reconstructs the counting manifold with required curvature and resolution, while the embedding layer initially seeds the cyclic structure.
Ablation and visualization show:
- Each head tiles the count subspace with distinct offsets.
- OV circuits imbue representations with corrections for local token length statistics.
- Layer 1 heads sharpen and refine Layer 0's coarse count estimates.
- The distributed nature is necessary due to the limited rank and per-head expressiveness constraints.
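The rank argument in the last point can be illustrated with a toy calculation: if each head can only write a low-rank piece, several heads are needed to build a higher-dimensional curve. The sketch assumes three hypothetical heads that each contribute one cosine and sine pair; it illustrates the counting argument only and is not a reconstruction of the actual heads.

```python
import numpy as np

N_COUNTS = 150        # hypothetical number of distinct count values
FREQS = (1, 2, 3)     # one hypothetical frequency per head


def head_output(freq_index):
    """Rank-2 contribution of one toy head across all count values."""
    counts = np.arange(N_COUNTS)
    angle = 2 * np.pi * FREQS[freq_index] * counts / N_COUNTS
    out = np.zeros((N_COUNTS, 2 * len(FREQS)))
    out[:, 2 * freq_index] = np.cos(angle)
    out[:, 2 * freq_index + 1] = np.sin(angle)
    return out


heads = [head_output(i) for i in range(len(FREQS))]
combined = sum(heads)

ranks = [int(np.linalg.matrix_rank(h)) for h in heads]
combined_rank = int(np.linalg.matrix_rank(combined))
print(ranks, combined_rank)    # [2, 2, 2] 6
```

No single rank-2 contribution can trace a curve that spans six dimensions, but their sum can.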
Visual Illusions and Model Vulnerabilities
A further mechanistically guided finding is that certain context cues (e.g., code delimiters like @@) can "distract" counting heads, acting as visual illusions that disrupt accurate linebreak perception. Systematic analysis across 180 sequence insertions confirms that only a subset, typically reminiscent of code markers, causes significant degradation. The effects correlate with redirected head attention and transfer to explicit comparison prompts.
This reveals vulnerabilities (especially in nonstandard contexts) and informs defenses for safety and robustness in model deployment.
Theoretical and Practical Implications
The findings provide strong evidence for:
- Manifolds with intrinsic curvature as the optimal substrate for representing large discrete variables (R² = 0.985 for character count after layer 1).
- Distributed attention-head algorithms as a mechanism for constructing nontrivial geometric representations and enabling linear computation for task-relevant decisions.
- Duality between dictionary features and continuous manifolds, suggesting future interpretability methods must incorporate both perspectives.
- Manifold-based errors and illusions as a novel concrete handle for adversarial and red-teaming diagnostics.
The work grounds future interpretability in explicit geometric circuit modeling and motivates new directions in automatic geometric structure discovery, causal testing, and robust model training for tasks necessitating spatial or continuous variable reasoning.
Conclusion
This study systematically deconstructs the perceptual algorithm for linebreaking in LLMs, revealing intricate circuits for counting, boundary-sensing, and prediction, all grounded in low-dimensional geometric manifolds. It establishes rigorous causal sufficiency of these representations for model behavior, characterizes the distributed computation needed to construct them, and presents a new approach for interpretability that integrates geometric, feature-based, and causal perspectives. The implications extend to both the theory of representation learning and practical directions for transparent, robust, and safe deployment of LLMs (2601.04480).