Representation Manifolds in Superposition

Updated 20 May 2026

Representation manifolds in superposition are geometric mappings that embed low-dimensional concept spaces into high-dimensional, overlapping feature representations.
They exhibit regimes from sparse, nearly orthogonal configurations to strong, overlapping interference that underpin scaling laws in machine learning and quantum systems.
Probing methods like manifold analysis and geodesic evaluation offer practical tools to disentangle polysemantic representations for improved interpretability and model design.

Representation manifolds in superposition describe the geometric, algebraic, and statistical structure of how high-dimensional models (notably neural networks and quantum systems) encode more features than their representational dimension permits, by allowing features to overlap in the underlying vector space and arranging them into manifolds whose points correspond to interpretable concepts. This phenomenon is central to the scaling laws of large neural models, to mechanistic interpretability, to the representation of concepts in both classical and quantum systems, and to sophisticated mathematical frameworks used in supergeometry and functional analysis.

1. Definition and Mathematical Formalism

A representation manifold is the image, in activation (or representation) space, of a low-dimensional concept space $\mathcal Z$ under an embedding $\phi : \mathcal Z \to \mathbb R^d$ or $\phi : \mathcal Z \to \mathbb S^{d-1}$. Features are encoded as continuous or discrete-valued parameters $z \in \mathcal Z$, mapped into curved submanifolds $\mathcal M = \phi(\mathcal Z)$ within a high-dimensional sphere or linear space, rather than as isolated orthogonal directions (Modell et al., 23 May 2025).

Superposition refers to the regime in which more features are represented than there are dimensions in the latent space ($n \gg m$), such that feature representations overlap (i.e., rows or columns of the decoding matrix $W$ are non-orthogonal) (Liu et al., 15 May 2025, Elhage et al., 2022, Prieto et al., 10 Mar 2026). Mathematically, in neural models, this is formalized as

$h(x) = \sum_i p_i(x) w_i,$

where $w_i \in \mathbb R^m$ are feature directions, $p_i$ are interpretable properties of the data, and $h(x) \in \mathbb R^m$ is the latent representation. In quantum mechanics, the associated manifold is a submanifold of the generalized Bloch sphere, where pure states correspond to extremal points and superpositions trace orbits under phase actions (Aerts et al., 2015).

Superposition can be categorized into weak (at most $m$ features, nearly orthogonal) and strong (all or nearly all $n$ features represented, significant overlap), with the geometry and interference properties of the embedded manifold varying accordingly (Liu et al., 15 May 2025).
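The formula $h(x) = \sum_i p_i(x) w_i$ can be made concrete with a minimal NumPy sketch. The sizes (`n = 64`, `m = 16`), the random unit feature directions, and the choice of three active features are illustrative assumptions, not values from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 16  # n features, m latent dimensions (n > m: superposition)

# Hypothetical feature directions w_i: random unit vectors, stored as rows of W.
W = rng.normal(size=(n, m))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Sparse feature activations p_i(x): only a few features are active.
p = np.zeros(n)
active = rng.choice(n, size=3, replace=False)
p[active] = 1.0

# Latent representation h(x) = sum_i p_i(x) w_i.
h = p @ W

# Linear readout of every feature: active features read 1 plus interference,
# inactive features read pure interference from the overlapping directions.
readout = W @ h
gram = W @ W.T
off_diag = gram[~np.eye(n, dtype=bool)]

print("latent shape:", h.shape)
print("readout on active features:", np.round(readout[active], 2))
print("largest |overlap| between distinct features:", np.abs(off_diag).max().round(2))
```

Because the Gram matrix of $n$ vectors in $\mathbb R^m$ has rank at most $m$, it cannot be the identity when $n > m$, so some overlap between feature directions is unavoidable.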

2. Geometric Properties and Interference

The geometry of representation manifolds in superposition is intrinsically tied to the arrangement of feature directions:

In the uncorrelated/sparse regime, features arrange to minimize mutual interference, forming locally regular polytopes (simplices, regular polygons, or higher-dimensional analogs) on the sphere $\mathbb S^{m-1}$ (Elhage et al., 2022, Prieto et al., 10 Mar 2026). Non-negative activation regions are separated by large angles, and neurons become "polysemantic", responding to multiple unrelated features.
In the correlated or data-driven regime, feature directions cluster, form cycles, or align with the principal components of the data covariance, resulting in constructive interference where overlapping representations facilitate functionally useful signal sharing (Prieto et al., 10 Mar 2026). This is prevalent in language data, where features (e.g., months, semantic clusters) geometrically arrange into cyclic or clustered manifolds reflecting co-occurrence statistics.

The average squared overlap between normalized representation vectors (rows of $W$) empirically concentrates at $\sim 1/m$ in large-scale models, evidencing strong superposition and geometric isotropy or near-ETF (Equiangular Tight Frame) structures (Liu et al., 15 May 2025).
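As background, the Welch bound gives a reference point for how small the mean squared overlap of $n$ unit vectors in $\mathbb R^m$ can be. It is a standard frame-theory fact and is stated here as context, not as a result of the cited works. For unit vectors $w_1, \dots, w_n \in \mathbb R^m$ with $n \ge m$:

$\sum_{i,j} (w_i \cdot w_j)^2 \ge \frac{n^2}{m} \quad \Longrightarrow \quad \frac{1}{n(n-1)} \sum_{i \ne j} (w_i \cdot w_j)^2 \ge \frac{n-m}{m(n-1)},$

with equality exactly when the $w_i$ form a tight frame. The right-hand side approaches $1/m$ when $n \gg m$, so an overlap of order $1/m$ is what near-optimal packing of many features into $m$ dimensions looks like.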

These geometric considerations determine:

The error scaling (with per-feature error $\sim 1/m$ under strong superposition).
The nature and structure of interference: whether it is filtered out as noise (harmful) or harnessed to aid signal (constructive).
The adversarial vulnerability and interpretability of internal representations.
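How interference can be filtered out is easiest to see in a small case: five features placed on a regular pentagon in two dimensions, read out through a ReLU with a negative bias. The sketch below is a hand-constructed illustration under the assumption of one-hot (maximally sparse) inputs. The scaling of `W` and the bias `b` are chosen by hand for this example, not learned and not taken from the cited works.

```python
import numpy as np

n = 5  # five features packed into m = 2 dimensions
angles = 2 * np.pi * np.arange(n) / n
c = np.cos(2 * np.pi / n)  # cosine between adjacent pentagon vertices (> 0)

# Columns of W are pentagon vertices, scaled so the filtered readout is exact.
scale = 1.0 / (1.0 - c)
W = np.sqrt(scale) * np.stack([np.cos(angles), np.sin(angles)])  # shape (2, n)
b = -scale * c  # negative bias cancels the largest positive interference


def reconstruct(x):
    """Toy readout x_hat = ReLU(W^T W x + b)."""
    return np.maximum(W.T @ (W @ x) + b, 0.0)


X = np.eye(n)  # each row is a one-hot input
linear = np.array([W.T @ (W @ x) for x in X])
filtered = np.array([reconstruct(x) for x in X])

print("linear readout of feature 0:  ", np.round(linear[0], 3))
print("filtered readout of feature 0:", np.round(filtered[0], 3))
```

The linear readout leaks into neighbouring features, while the biased ReLU recovers each one-hot input. The recovery is exact only in this sparse setting: with several features active at once, the interference terms add up and the same bias no longer removes them.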

3. Learning Dynamics, Scaling Laws, and Phase Transitions

Representation manifolds in superposition are emergent objects shaped by the optimization and data properties:

Scaling Laws: In large language models (LLMs), empirical loss scales robustly as $\propto 1/m$ with the hidden dimension $m$ in the strong superposition regime. This scaling can be derived from the geometric mean squared overlap and is observed across open-source and proprietary LLMs (Liu et al., 15 May 2025).
Phase Transitions: Toy model analyses reveal sharp first-order transitions as a function of sparsity or feature importance, between regimes where features are dropped, paired antipodally, packed into polytopes, or given dedicated subspaces (Elhage et al., 2022). Each transition corresponds to a discrete change in the geometry of the manifold, observable as discontinuities in loss during training.
Constructive Interference: In data regimes with correlated features (e.g., LLMs with Zipfian frequency distributions), the classical loss-minimizing strategy shifts to align feature directions so that their interference is beneficial for reconstruction, often without requiring non-linear filtering (Prieto et al., 10 Mar 2026).

The BOWS (Bag-of-Words Superposition) model formalizes these ideas in a controlled setting, demonstrating how real-data statistics induce anisotropic, semantically meaningful feature manifold structures beyond what is predicted by uncorrelated toy models (Prieto et al., 10 Mar 2026).
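The dependence of overlap on width can be checked numerically with a small sketch. It uses random unit vectors as a stand-in for learned feature directions, which is an assumption, and it measures only the overlap statistic, not the loss of a trained model. The feature count `n = 1024` and the list of widths are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024  # number of features, held fixed while the width m varies


def mean_squared_overlap(m, n=n):
    """Mean squared cosine between distinct random unit vectors in R^m."""
    W = rng.normal(size=(n, m))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    gram_sq = (W @ W.T) ** 2
    return (gram_sq.sum() - n) / (n * (n - 1))  # drop the n diagonal ones


widths = np.array([16, 32, 64, 128, 256])
overlaps = np.array([mean_squared_overlap(m) for m in widths])

# Slope of log(overlap) against log(m); a value near -1 means overlap ~ 1/m.
slope, _ = np.polyfit(np.log(widths), np.log(overlaps), 1)

for m, v in zip(widths, overlaps):
    print(f"m={m:4d}  mean squared overlap={v:.5f}  1/m={1 / m:.5f}")
print("fitted log-log slope:", round(slope, 3))
```

A fitted slope close to $-1$ on the log-log scale corresponds to inverse-dimension decay of the overlap. Whether a given trained model follows the same curve has to be measured on its actual weight matrix.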

4. Methodologies for Discovery and Probing

Identification and analysis of representation manifolds in superposition employ structured probing techniques:

Manifold Probe: A supervised method generalizing linear probes, designed to extract the embedding $\phi$ and the set of linearly accessible features for a target concept in a neural representation (Modell, 18 May 2026). It solves sequential regression/eigenproblems to recover basis directions and feature functions, enabling visualization and quantification of manifold structure (e.g., for time or spatial concepts in Llama 2-7B).
Geodesic Analysis and Cosine Similarity: By validating that graph-based geodesic distances among manifold points match intrinsic concept-space distances, and that cosine similarity decays monotonically with squared feature-space distance, the structure of the manifold and its relationship to semantic similarity can be empirically confirmed (Modell et al., 23 May 2025).
Experimental Interventions: Steerability tests, whereby intervention along a discovered manifold direction produces measurable and interpretable output changes (e.g., shifting LLM predictions for document release years), confirm not only the descriptive adequacy but also the causal involvement of the extracted manifold in model behavior (Modell, 18 May 2026).
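The geodesic and cosine-similarity checks described above can be sketched on synthetic data. The example assumes a hypothetical circular concept embedded isometrically into $\mathbb R^d$ through a random orthonormal frame `Q`. The point count, ambient dimension, and neighbourhood size are illustrative, and the procedure is a generic nearest-neighbour-graph approximation, not the exact pipeline of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 60, 32, 2  # manifold points, ambient dimension, neighbours per side

# Hypothetical circular concept with angle theta, embedded into R^d
# through a random orthonormal 2-frame Q (columns of Q are orthonormal).
theta = 2 * np.pi * np.arange(N) / N
Q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
points = np.stack([np.cos(theta), np.sin(theta)], axis=1) @ Q.T  # (N, d)

# Intrinsic concept-space distance: arc length along the circle.
diff = np.abs(theta[:, None] - theta[None, :])
intrinsic = np.minimum(diff, 2 * np.pi - diff)

# Nearest-neighbour graph weighted by Euclidean distance in R^d.
euclid = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
graph = np.full((N, N), np.inf)
nearest = np.argsort(euclid, axis=1)[:, : 2 * k + 1]  # includes the point itself
rows = np.repeat(np.arange(N), 2 * k + 1)
graph[rows, nearest.ravel()] = euclid[rows, nearest.ravel()]
graph = np.minimum(graph, graph.T)  # symmetrise

# Floyd-Warshall shortest paths give the graph geodesic distances.
geodesic = graph.copy()
for j in range(N):
    geodesic = np.minimum(geodesic, geodesic[:, [j]] + geodesic[[j], :])

# Points have unit norm, so the Gram matrix is the cosine similarity.
cosine = points @ points.T
order = np.argsort(intrinsic[0])
monotone = bool(np.all(np.diff(cosine[0][order]) <= 1e-9))

corr = np.corrcoef(geodesic.ravel(), intrinsic.ravel())[0, 1]
print("correlation of graph geodesic with intrinsic distance:", round(corr, 4))
print("cosine similarity non-increasing in intrinsic distance:", monotone)
```

On real activations the points would come from a model rather than a known embedding, so the intrinsic distances would be those of the labelled concept (for example, differences between dates), and the neighbourhood size would need tuning to avoid short-circuiting the manifold.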

5. Connections to Quantum, Supergeometry, and Representation Theory

The mathematical formalism of representation manifolds in superposition is both inspired by and directly connected to frameworks in quantum mechanics, supergeometry, and representation theory:

Quantum Mechanics: The manifold of pure quantum states forms a curved submanifold (complex projective space) within a generalized Bloch sphere. Superposition manifests as nonlinear orbits, with interference, entanglement, and phase structure geometrically encoded (Aerts et al., 2015, Fedorov et al., 2019). The nonlinear addition law for mean-spin vectors in qubits projects Hilbert-space superposition onto the Bloch sphere manifold (Fedorov et al., 2019).
Supermanifolds and Integral Representations: In supergeometry, superposition of ordinary manifolds into supermanifolds is realized via integral kernels and Berezin–Fourier transforms, underpinning constructions such as the super Hodge dual, Liouville volumes, and picture-changing operators in string theory (Castellani et al., 2016). Representation manifolds in superposition thus generalize to functorial constructions on supermanifolds.
Convenient Category of Supermanifolds: The mathematical apparatus supports global objects such as mapping supermanifolds, loop supergroups, and section spaces, each encoding the totality of "superposed" classical or field-theoretic data as points of a representable functor (Alldridge, 2011).
Representation-Theoretic Constructions: Induced modules, convolution algebras, and Frobenius reciprocity can be viewed as organizing the superposition of possible representations into geometric or algebraic manifolds, with functional-analytic properties carrying over to the infinite-dimensional setting (Alldridge, 2011).
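For the single-qubit case, the statement that Hilbert-space superposition does not act linearly on the Bloch sphere can be illustrated with the textbook Bloch-vector formula. The sketch below is a generic illustration with hypothetical helper names (`bloch_vector`, `normalise`). It does not implement the specific addition law of the cited work.

```python
import numpy as np


def bloch_vector(psi):
    """Bloch vector (<X>, <Y>, <Z>) of a normalised qubit state a|0> + b|1>."""
    a, b = psi
    return np.array([
        2 * (np.conj(a) * b).real,
        2 * (np.conj(a) * b).imag,
        abs(a) ** 2 - abs(b) ** 2,
    ])


def normalise(psi):
    """Return psi as a unit-norm complex vector."""
    psi = np.asarray(psi, dtype=complex)
    return psi / np.linalg.norm(psi)


zero = normalise([1, 0])             # |0>, Bloch vector (0, 0, 1)
plus = normalise([1, 1])             # |+>, Bloch vector (1, 0, 0)
superposed = normalise(zero + plus)  # Hilbert-space superposition, renormalised

r_zero, r_plus, r_sup = map(bloch_vector, (zero, plus, superposed))
naive = (r_zero + r_plus) / 2  # linear average of the two Bloch vectors

print("Bloch vector of superposition:", np.round(r_sup, 4))
print("norm of superposition vector: ", round(float(np.linalg.norm(r_sup)), 4))
print("norm of linear average:       ", round(float(np.linalg.norm(naive)), 4))
```

The superposed state stays on the surface of the sphere because it is pure, whereas the linear average of the two Bloch vectors falls inside the ball and describes the equal mixture of the two states instead.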

6. Implications for Mechanistic Interpretability and Model Design

Representation manifolds in superposition have direct and critical implications for interpretability and practical model architecture:

Polysemanticity: The necessity of superposing features in constrained dimensions leads to neurons that respond to multiple, sometimes unrelated, features, complicating the mapping between units and human concepts (Elhage et al., 2022). This calls into question the viability of naive circuit-level interpretability and necessitates advanced dictionary learning or probing methods to disentangle features.
Adversarial Vulnerabilities: Overlapping manifold structure entails that adversarial perturbations can exploit interference directions to manipulate activations, with vulnerability scaling with the degree of superposition (Elhage et al., 2022).
Optimization of Capacity: Understanding the manifold geometry allows for the design of architectures and regularizers (e.g., enforcing ETF-like row structures in decoding matrices, tuning weight decay) that maximize representation capacity and error-correcting potential (Liu et al., 15 May 2025).
Semantic Structure Emergence: In real networks, where feature correlations are strong, emergent clusters and cycles in representation space directly track data-driven semantics, as confirmed in BOWS experiments (Prieto et al., 10 Mar 2026). This suggests that probing and interpreting these manifolds is essential to extract latent knowledge and causal dependencies.

7. Broader Theoretical and Empirical Context

The emergence, geometry, and scaling of representation manifolds in superposition synthesize diverse conceptual threads:

Scaling laws of LLMs are linked to the universal geometry of strong superposition, leading to robust inverse-dimension loss decay (Liu et al., 15 May 2025).
Identification of value-manifolds versus presence-manifolds and their diagnostic separation is a target for future interpretability methods (Prieto et al., 10 Mar 2026).
The global structure of representation manifolds is both an object of modern geometric analysis (as in supergeometry and infinite-dimensional representation theory) and the focus of empirical statistical analysis in state-of-the-art neural models (Modell et al., 23 May 2025, Modell, 18 May 2026).

In summary, representation manifolds in superposition constitute a unifying geometric and algebraic backbone for understanding high-dimensional representations across modern machine learning, quantum mechanics, and supergeometry, with far-reaching consequences for interpretability, model scaling, and the mathematical foundations of feature encoding.