Representational Superposition Overview

Updated 4 July 2026

Representational superposition is the phenomenon in which a system encodes more features than it has orthogonal directions available, so that neural codes overlap and are compressed.
Linear, sparse, and manifold coding models are used to recover the latent features and to expose alignment and metric distortions hidden in raw representational spaces.
The idea is studied mainly in machine learning and neuroscience, with formally distinct analogues in quantum theory and logic, and it informs work on model interpretability, robustness, and feature disentanglement.

Representational superposition denotes a regime in which a system encodes more representational content than can be assigned to mutually orthogonal axes of its ambient space, so multiple features, concepts, or alternatives are stored in overlapping directions. In the neural literature surveyed here, the canonical formalization is a low-dimensional linear code $y = Az$ with $A \in \mathbb{R}^{m \times n}$ and $m<n$, where latent features $z$ are compressed into neuron activity $y$; related work generalizes this from discrete feature dictionaries to continuous concept manifolds and from passive storage to active computation in superposition (Liu et al., 31 Mar 2026, Modell, 18 May 2026, Hänni et al., 2024). Outside machine learning, closely related notions appear in quantum foundations, biorthogonal resource theory, reference-frame formalisms, and symbolic logic, but these uses are formally distinct and should not be conflated (Ronde, 2016, Pusuluk, 2022, Tammaro et al., 2023, Tzouvaras, 2023).

1. Formal definitions and scope

In the most explicit neural definition, a system is in superposition when latent features $z \in \mathbb{R}^n$ are linearly encoded into a lower-dimensional activity vector $y \in \mathbb{R}^m$ by

$y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$

Because $m<n$ and at most $m$ nonzero vectors in $\mathbb{R}^m$ can be mutually orthogonal, the $n$ columns of $A$ cannot all be mutually orthogonal, so different features must overlap in neuron space. In that formulation, superposition is the mathematical expression of mixed selectivity or polysemanticity: single neurons respond to multiple latent features because the features are linearly superposed (Liu et al., 31 Mar 2026).
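The forced overlap can be checked numerically. The following is a minimal sketch, assuming a random Gaussian mixing matrix with unit-norm columns; the sizes `m` and `n` and the 0.1 threshold used to count "responsive" neurons are illustrative choices, not values from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 10  # m neurons, n latent features (hypothetical sizes), with m < n

# Mixing matrix with unit-norm columns; column j is the direction of feature j.
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0, keepdims=True)

# Gram matrix of feature directions; off-diagonal entries are feature overlaps.
gram = A.T @ A
off_diag = gram - np.diag(np.diag(gram))
max_overlap = float(np.abs(off_diag).max())

# rank(A^T A) = rank(A) <= m < n, so the Gram matrix cannot be the n x n identity.
rank_A = int(np.linalg.matrix_rank(A))

# Polysemanticity: number of features that load on each neuron above the threshold.
features_per_neuron = (np.abs(A) > 0.1).sum(axis=1)

print(f"rank of A: {rank_A} (n = {n})")
print(f"largest overlap between two distinct features: {max_overlap:.3f}")
print(f"features per neuron: {features_per_neuron}")
```

Because the rank of the Gram matrix is bounded by $m$, at least some off-diagonal overlaps are nonzero for any choice of $A$, not only for the random draw used here.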

A complementary formalization treats an activation vector as a sparse superposition of feature directions,

$a(x) = \sum_{k} f_k(x)\, v_k,$

with more Boolean features $f_k$ than ambient dimensions. This literature distinguishes representational superposition, where superposition compresses features through a bottleneck, from computation in superposition, where the compressed code is actively used to implement nonlinear operations. In the same framework, features are said to be $\varepsilon$-linearly represented if there exists a read-off matrix $R$ such that

$\left| \left(R\, a(x)\right)_k - f_k(x) \right| < \varepsilon$

for all features $k$ and inputs $x$ (Hänni et al., 2024).
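A sparse superposition of Boolean features and a linear read-off can be illustrated with a small simulation. This is a sketch under stated assumptions: the feature directions are random unit vectors, the read-off matrix is simply the transpose of the feature matrix, and all sizes are hypothetical. It is not the construction used in the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, active, n_inputs = 1024, 2048, 3, 100  # hypothetical sizes, with n > m

# Random unit feature directions: n features packed into m dimensions.
V = rng.standard_normal((m, n))
V /= np.linalg.norm(V, axis=0, keepdims=True)

# Sparse Boolean features: exactly `active` features are on for each input.
F = np.zeros((n, n_inputs))
for j in range(n_inputs):
    F[rng.choice(n, size=active, replace=False), j] = 1.0

# Activation vectors as sparse superpositions of feature directions.
acts = V @ F

# Simplest linear read-off (an assumption): project onto each feature direction.
readout = V.T @ acts

# Worst-case read-off error over all features and inputs.
eps = float(np.abs(readout - F).max())

# Thresholding at 0.5 recovers the Boolean features whenever eps < 0.5.
recovered = (readout > 0.5).astype(float)

print(f"worst-case read-off error: {eps:.3f}")
print(f"all features recovered: {np.array_equal(recovered, F)}")
```

The read-off error here is pure interference between non-orthogonal directions. It grows as more features are active at once, which is why sparsity matters for this kind of code.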

Recent work broadens the notion further from isolated directions to continuous manifolds. In that setting, a representation has the form

$A \in \mathbb{R}^{m \times n}$ 6

or, for one target concept,

$A \in \mathbb{R}^{m \times n}$ 7

so each concept is represented not by a single direction but by an embedded manifold $A \in \mathbb{R}^{m \times n}$ 8 in activation space. This generalization is explicitly described as “representation manifolds in superposition,” because the concept manifold is additively embedded with nuisance semantics rather than cleanly isolated (Modell, 18 May 2026).

Taken together, these definitions suggest a family resemblance rather than a single theory. The shared core is representational overlap under limited dimensionality or contextual decomposition; the specific mathematical object varies from sparse feature dictionaries to continuous manifolds.

2. Alignment, identifiability, and the problem of hidden similarity

A central result in this literature is that superposition can make systems with the same latent content appear misaligned. For neural responses $Y \in \mathbb{R}^{m \times T}$ (one column per stimulus), the representational similarity matrix is

$K = Y^\top Y.$

Under superposition, with $Y = AZ$,

$K = Z^\top A^\top A Z.$

Writing $G = A^\top A$, raw similarity is therefore computed not in the latent Euclidean geometry but in the geometry warped by $G$. For two systems exposed to the same latent inputs,

$Y_1 = A_1 Z, \qquad Y_2 = A_2 Z,$

the asymptotic RSA score is

$m<n$ 5

with $G_1 = A_1^\top A_1$ and $G_2 = A_2^\top A_2$. Linear CKA collapses to the same expression asymptotically. The consequence is that alignment depends on similarity between the induced latent metrics, not simply on shared latent features. In the paper’s phrase, raw-space metrics conflate “what a system represents” with “how it represents it” (Liu et al., 31 Mar 2026).

The deflation can be large. Under independent random projections with equal output dimension $m$,

$m<n$ 9

Thus expected alignment shrinks roughly like the compression ratio $m/n$. Under partial feature overlap, the same paper shows that ordering can invert: a pair of systems with smaller feature overlap can receive a higher raw alignment score than a fully overlapping but more strongly compressed pair. This suggests that low RSA, CKA, or regression scores may reflect coding geometry rather than different represented content (Liu et al., 31 Mar 2026).
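The deflation can be reproduced in a small simulation. The sketch below assumes isotropic Gaussian latents and independent Gaussian mixing matrices; the sizes are hypothetical, and `linear_cka` is a standard linear CKA written here for illustration rather than code from the cited paper. Two systems share exactly the same latent features, yet their raw-space score falls as the output dimension shrinks.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two response matrices of shape (units, stimuli)."""
    X = X - X.mean(axis=1, keepdims=True)  # center each unit across stimuli
    Y = Y - Y.mean(axis=1, keepdims=True)
    cross = np.linalg.norm(X @ Y.T, "fro") ** 2
    return float(cross / (np.linalg.norm(X @ X.T, "fro") * np.linalg.norm(Y @ Y.T, "fro")))

rng = np.random.default_rng(0)
n, n_stimuli = 256, 2000                   # hypothetical latent dimension and stimulus count
Z = rng.standard_normal((n, n_stimuli))    # shared latent features, one column per stimulus

latent_score = linear_cka(Z, Z)            # comparing latents directly: identical content

raw_scores = {}
for m in (16, 64, 128):
    # Both systems mix the same latents with independent random projections.
    A1 = rng.standard_normal((m, n)) / np.sqrt(m)
    A2 = rng.standard_normal((m, n)) / np.sqrt(m)
    raw_scores[m] = linear_cka(A1 @ Z, A2 @ Z)

print(f"latent-space CKA: {latent_score:.3f}")
for m, score in raw_scores.items():
    print(f"m = {m:3d}, m/n = {m / n:.3f}, raw-space CKA = {score:.3f}")
```

The latent-space score is 1 by construction, so any shortfall in the raw-space scores comes from the mixing matrices alone and not from differences in represented content.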

The same issue appears in neuron-matching metrics. If two systems are generated from the same latent feature matrix $z$ 1 but with different mixing matrices,

$z$ 2

then strict matching depends on the cross-structure $z$ 3. In the permutation metric,

$z$ 4

the score reaches $z$ 5 only when columns of the two mixing matrices coincide up to permutation and sign. With different superposition arrangements, even identical latent features yield a lower score. The paper “Superposition disentanglement of neural representations reveals hidden alignment” formalizes this point and proves that if sparse recovery succeeds exactly under a $z$ 6-RIP assumption with $z$ 7, then the recovered latent codes align perfectly and the deflation disappears (Longon et al., 3 Oct 2025).

A further nuance is that apparent misalignment need not imply information loss. Under sparsity assumptions $\|z\|_0 \le k$ and a projection matrix satisfying the Restricted Isometry Property, compressed sensing guarantees recoverability when

$m \gtrsim k \log(n/k).$

Above that regime, low raw-space alignment may indicate different mixtures of the same features rather than loss of latent information; below it, reduced alignment can reflect genuine irrecoverability (Liu et al., 31 Mar 2026).
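Recoverability of a sparse latent code from compressed activity can be illustrated with a basic sparse-recovery routine. The sketch below uses orthogonal matching pursuit, which is one standard algorithm chosen here for brevity and not necessarily the method analyzed in the cited work. The Gaussian mixing matrix, unit-magnitude nonzeros, and all sizes are assumptions.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily select k columns of A to explain y."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the not-yet-selected column most correlated with the residual.
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = 0.0
        support.append(int(np.argmax(scores)))
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    z_hat = np.zeros(A.shape[1])
    z_hat[support] = coef
    return z_hat

rng = np.random.default_rng(0)
n, k = 400, 3  # hypothetical latent size and sparsity

def recovery_error(m):
    """Draw a k-sparse z, compress it to m measurements, return the OMP recovery error."""
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    z = np.zeros(n)
    z[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
    return float(np.linalg.norm(omp(A, A @ z, k) - z))

err_many = recovery_error(200)  # far more measurements than k * log(n / k)
err_few = recovery_error(4)     # too few measurements for reliable recovery

print(f"m = 200: recovery error {err_many:.2e}")
print(f"m = 4:   recovery error {err_few:.2e}")
```

With ample measurements the latent code is recovered even though $m<n$, which is the sense in which compression alone does not imply information loss.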

3. Disentangling superposition and recovering latent structure

The most common practical response is to extract latent features before comparing or interpreting representations. The feature-recovery agenda is explicit in the recommendation to use sparse autoencoders, dictionary learning or sparse coding approaches, and more general feature-disentangling pipelines rather than raw activations (Liu et al., 31 Mar 2026).

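One concrete implementation is the TopK sparse autoencoder. Given activations $y$, the encoder computes

$\hat{z} = \mathrm{TopK}\left(W_{\mathrm{enc}}\, y + b_{\mathrm{enc}}\right)$

and reconstructs

$\hat{y} = W_{\mathrm{dec}}\, \hat{z} + b_{\mathrm{dec}}.$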

In the alignment study, SAEs trained on toy-model, ResNet50, and ViT-B/16 activations frequently increased soft-matching and source-to-target regression alignment when raw units were replaced by sparse latent codes. For one Natural Scenes Dataset subject, DNN-to-brain linear predictivity increased by $y$ 3 for ResNet50 and $y$ 4 for ViT-B/16 after disentangling the DNN source representation, supporting the claim that hidden feature-level alignment can be masked by different neuron mixtures (Longon et al., 3 Oct 2025).
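The forward pass of a TopK sparse autoencoder can be sketched in a few lines. This is a minimal illustration with untrained random weights. The parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`), the bias convention, and the clipping of kept values at zero are assumptions following common practice, not details confirmed by the cited study.

```python
import numpy as np

def topk_sae_forward(y, W_enc, b_enc, W_dec, b_dec, k):
    """Forward pass of a TopK sparse autoencoder for one activation vector y."""
    pre = W_enc @ y + b_enc                 # encoder pre-activations
    z = np.zeros_like(pre)
    top = np.argpartition(pre, -k)[-k:]     # indices of the k largest pre-activations
    z[top] = np.maximum(pre[top], 0.0)      # keep them (clipped at zero), zero the rest
    y_hat = W_dec @ z + b_dec               # linear reconstruction
    return z, y_hat

rng = np.random.default_rng(0)
m, n_latents, k = 64, 512, 8                # hypothetical sizes; dictionary wider than input

W_enc = rng.standard_normal((n_latents, m)) / np.sqrt(m)
W_dec = W_enc.T.copy()                      # tied initialization, a common choice
b_enc = np.zeros(n_latents)
b_dec = np.zeros(m)

y = rng.standard_normal(m)
z, y_hat = topk_sae_forward(y, W_enc, b_enc, W_dec, b_dec, k)

print(f"active latents: {np.count_nonzero(z)} of {n_latents}")
print(f"reconstruction error (untrained): {np.linalg.norm(y - y_hat):.3f}")
```

In practice the weights would be trained to minimize reconstruction error, and the sparse code `z` rather than the raw activation `y` would be passed to the alignment metric.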

A related line of work targets continuous concepts rather than discrete sparse features. The Manifold Probe factorizes a concept representation as

$y$ 5

where $y$ 6 are scalar concept features and $y$ 7 are encoding directions in activation space. The method first learns which functions of a concept are linearly predictable from activations and then learns the directions that encode them. Applied to Llama 2-7b, it recovers interpretable manifolds for time and geographic space, and time-manifold steering with

$y$ 8

shifts year completions toward the targeted year, with the main evaluation metric being the probability that the model’s completion lies within two years of the steering target (Modell, 18 May 2026).

Biological imaging work pushes the same logic into a multimodal setting. In “Resolving superposition in AI for interpretability and cross-modal alignment in patient-neuronal images,” gated SAEs were trained on over $y$ 9 multiplexed images of patient-derived Parkinson’s disease and healthy neurons. The paper argues theoretically that superposition contaminates representational metric spaces and empirically reports improved effective rank, lower local semantic dispersion, higher Moran’s $z \in \mathbb{R}^n$ 0, and stronger downstream cell-death prediction after SAE expansion; for LRRK2, reported $z \in \mathbb{R}^n$ 1 improved from $z \in \mathbb{R}^n$ 2 to $z \in \mathbb{R}^n$ 3. Those SAE codes are then treated as single-cell state vectors and aligned de novo to scRNA-seq with Gromov-Wasserstein optimal transport (Park et al., 30 Jun 2026).

The same tools also generate negative evidence. In the analysis of PatchTST for time-series forecasting, SAEs trained on post-GELU FFN activations with dictionary sizes from $z \in \mathbb{R}^n$ 4 to $z \in \mathbb{R}^n$ 5 the native dimensionality yielded only an average $z \in \mathbb{R}^n$ 6 downstream MSE difference between the smallest and largest dictionaries, and $z \in \mathbb{R}^n$ 7 amplification of dominant latent features shifted forecasts by only about $z \in \mathbb{R}^n$ 8 MAE on average. The conclusion there is deliberately narrow: no empirical evidence of strong superposition was found in the analyzed layer/model regime, so superposition should not be treated as a universal explanation for transformer behavior (Yıldırım, 6 May 2026).

4. Geometric, spectral, and capacity-theoretic perspectives

Several papers recast superposition as a global geometry problem rather than only a sparse-recovery problem. In “Spectral Superposition: A Theory of Feature Geometry,” the basic object is a weight matrix $z \in \mathbb{R}^n$ 9 whose columns are feature vectors. Two derived operators are central: $y \in \mathbb{R}^m$ 0 The frame operator $y \in \mathbb{R}^m$ 1 is treated as the basis-invariant representation of global feature geometry. For each feature $y \in \mathbb{R}^m$ 2, the paper defines a spectral measure

$y \in \mathbb{R}^m$ 3

which records how the feature allocates norm across eigenspaces of $y \in \mathbb{R}^m$ 4. In the toy-model regime, if fractional dimensionality saturates the capacity bound,

$y \in \mathbb{R}^m$ 5

then every feature localizes spectrally: $y \in \mathbb{R}^m$ 6 for some eigenvalue $y \in \mathbb{R}^m$ 7, equivalently $y \in \mathbb{R}^m$ 8. The paper then shows that localized features partition into eigenspace clusters, each forming a tight frame, and uses association schemes to recover geometries such as simplices, polygons, and antiprisms (Ivanov et al., 2 Feb 2026).

A complementary perspective asks how many near-orthogonal directions a transformer can support at all. “Representational Capacity: Geometric Limits on Feature Representation in Transformer LLMs” defines a set of unit vectors $\{v_i\}$ to be $\epsilon$-nearly orthogonal when

$|\langle v_i, v_j \rangle| \le \epsilon \quad \text{for all } i \neq j.$

Using pairwise cosine distributions of embedding matrices as a measurable proxy, the paper estimates the accepted deviation from orthogonality by the heuristic

$y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$ 2

It then contrasts the standard Johnson–Lindenstrauss-style estimate

$y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$ 3

with an adjusted formula

$y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$ 4

or, equivalently, $y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$ 5 depending on $y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$ 6 rather than $y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$ 7. A flexible fit yields

$y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$ 8

Across dozens of open-source models, the paper reports two classes: high- $y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$ 9 models whose embeddings lack near-orthogonal structure, and low- $m<n$ 0 models that maintain it. Capacity is exponentially sensitive to $m<n$ 1; at fixed $m<n$ 2, increasing $m<n$ 3 from $m<n$ 4 to $m<n$ 5 changes the estimated capacity from $m<n$ 6 to $m<n$ 7 (Guha, 1 Jun 2026).
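Near-orthogonality of a set of vectors can be checked directly from pairwise cosines. The sketch below is a generic illustration on random vectors, assuming the usual criterion that every off-diagonal absolute cosine stays below a tolerance. The helper names and sizes are hypothetical, and the paper's heuristic for choosing the tolerance is not reproduced.

```python
import numpy as np

def max_abs_cosine(vectors):
    """Largest |cosine| between distinct rows of `vectors` (shape: count x dim)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    cos = unit @ unit.T
    np.fill_diagonal(cos, 0.0)  # ignore each vector's similarity with itself
    return float(np.abs(cos).max())

def is_nearly_orthogonal(vectors, eps):
    """True when every pair of distinct rows has |cosine| <= eps."""
    return max_abs_cosine(vectors) <= eps

rng = np.random.default_rng(0)
d = 256  # hypothetical embedding dimension

worst = {}
for count in (64, 256, 1024):
    vecs = rng.standard_normal((count, d))
    worst[count] = max_abs_cosine(vecs)
    print(f"{count:5d} random vectors in d = {d}: max |cos| = {worst[count]:.3f}")
```

Packing more vectors into a fixed dimension raises the worst-case cosine, so the tolerance that must be accepted grows with the number of directions stored.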

These theories suggest that superposition is constrained simultaneously by local recoverability, global eigenspace structure, and packing limits. A plausible implication is that sparse-feature, manifold, and near-orthogonal-direction accounts are not alternatives so much as different coordinate systems on the same capacity problem.

5. Architectural manifestations and dynamical consequences

Superposition is also studied as a computational and dynamical phenomenon. “Mathematical Models of Computation in Superposition” argues that earlier theory emphasized representational bottlenecks, whereas some networks perform useful nonlinear computation while remaining fully superposed. Its flagship construction computes all pairwise ANDs of $m<n$ 8 Boolean features in a one-layer MLP while keeping both inputs and outputs in superposition; for superposed inputs, the paper gives a construction with width

$m<n$ 9

and the abstract emphasizes a $A$ 0-neuron realization of the universal pairwise-AND task. The deeper result adds error-correction layers so that fully connected networks of width $A$ 1 can emulate circuits of width $A$ 2 and polynomial depth while staying in superposition (Hänni et al., 2024).

The same geometry can produce vulnerabilities. In “Adversarial Attacks Leverage Interference Between Features in Superposition,” a layer is modeled as

$A$ 3

with $A$ 4, non-orthogonal feature directions, and sparse activations. For targeted attacks between source class $A$ 5 and target class $A$ 6, the optimal perturbation under an $A$ 7 constraint is

$A$ 8

In a synthetic setting without superposition, obtained by setting the bottleneck dimension equal to the class count, the paper reports zero successful adversarial examples across $A$ 9 attempts at all tested $A \in \mathbb{R}^{m \times n}$ 00, whereas superposed bottlenecks are vulnerable. This is used to argue that adversarial vulnerability can be a byproduct of representational compression rather than only of irregular decision boundaries or non-robust features (Stevinson et al., 13 Oct 2025).

Continual-learning work makes the dynamics explicit. In the synthetic framework of “Sparsity, Superposition, and Forgetting,” the learned representation of pure feature $A \in \mathbb{R}^{m \times n}$ 01 at time $A \in \mathbb{R}^{m \times n}$ 02 is

$A \in \mathbb{R}^{m \times n}$ 03

with representation strength

$A \in \mathbb{R}^{m \times n}$ 04

unit direction

$A \in \mathbb{R}^{m \times n}$ 05

superposition strength

$A \in \mathbb{R}^{m \times n}$ 06

and retention proxy

$A \in \mathbb{R}^{m \times n}$ 07

The empirical summary is nuanced: superposition tends to increase over time with transient dips at task boundaries; higher feature sparsity induces more superposition; but forgetting is concentrated in features that are both weakly represented and highly overlapped, while strongly represented features can remain stable despite overlap. At the task level, effective rank

$A \in \mathbb{R}^{m \times n}$ 08

increases with sparsity, suggesting broader capacity usage under sparse regimes (Wasilewski et al., 18 Jun 2026).
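Effective rank can be computed from the singular values of a representation matrix. The sketch below uses the entropy-based definition (the exponential of the Shannon entropy of the normalized singular values), which is one widely used convention. Whether the cited paper uses exactly this definition is an assumption.

```python
import numpy as np

def effective_rank(X):
    """Entropy-based effective rank: exp of the entropy of normalized singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so the logarithm is defined
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)

full = np.eye(5)                                     # five equally used directions
rank_one = np.outer(rng.standard_normal(5), rng.standard_normal(8))
mixed = np.diag([1.0, 1.0, 0.1, 0.1, 0.1])           # two strong, three weak directions

print(f"identity:  {effective_rank(full):.3f}")
print(f"rank one:  {effective_rank(rank_one):.3f}")
print(f"mixed:     {effective_rank(mixed):.3f}")
```

Unlike the algebraic rank, this quantity varies continuously, so it can register a gradual spreading of variance across directions as sparsity changes.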

Graph neural networks add topological structure to the same problem. “Superposition in Graph Neural Networks” extracts graph-level class centroids and node-level linear-probe directions, then quantifies overlap using

$A \in \mathbb{R}^{m \times n}$ 09

and a Welch-normalized overlap score. Across GCN, GIN, and GAT, increasing width produces a three-phase pattern in overlap; topology imprints overlap onto node-level features; pooling partially remixes node geometry into task-aligned graph axes; sharper pooling increases axis alignment and reduces channel sharing; and shallow models can settle into metastable low-rank embeddings. These results place superposition at the intersection of width, topology, pooling, and final-layer activation choice rather than treating it as a width-only bottleneck phenomenon (Pertl et al., 31 Aug 2025).

6. Broader formalisms outside neural representation

Outside machine learning and neuroscience, the phrase “representational superposition” refers to formally different constructions. In quantum foundations, Christian de Ronde’s representational-realist program argues that the traditional measurement problem should be replaced by the superposition problem: $A \in \mathbb{R}^{m \times n}$ 10 where each term is linked by the Born rule to a meaningful physical statement, and the central question is how to conceptually represent the superposition itself rather than merely explain why one outcome is observed. In that program, superposition and contextuality are treated as representational problems requiring non-classical concepts rather than as defects to be reduced to classical actuality (Ronde, 2016, Ronde, 2016).

A mathematically sharper quantum extension appears in the biorthogonal resource theory of genuine quantum superposition. For a non-orthogonal basis $A \in \mathbb{R}^{m \times n}$ 11 with Gram matrix $A \in \mathbb{R}^{m \times n}$ 12, the paper defines the dual basis $A \in \mathbb{R}^{m \times n}$ 13 and a biorthogonal representation

$A \in \mathbb{R}^{m \times n}$ 14

Its key claim is that standard off-diagonal superposition misses a second source of nonclassicality residing in the overlaps of non-orthogonal basis states themselves. Genuine quantum superposition therefore includes both inter-basis and intra-basis contributions (Pusuluk, 2022).

Other papers develop alternative coordinate systems for known quantum superpositions. In the spin-projection mean representation for qubits, the state is parameterized by mean spin projections $A \in \mathbb{R}^{m \times n}$ 15, with density matrix

$A \in \mathbb{R}^{m \times n}$ 16

and Hilbert-space superposition becomes a nonlinear addition rule for these expectation values rather than linear addition of amplitudes (Fedorov et al., 2019).
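The non-additivity of expectation values under superposition can be seen with textbook qubit algebra. The sketch below uses Pauli expectation values as the mean projections and the standard Bloch form $\rho = \tfrac{1}{2}(I + s \cdot \sigma)$; the exact normalization and addition rule in the cited paper may differ, so this is an illustration of the general point only.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def mean_projections(psi):
    """Expectation values of the three Pauli operators in the pure state psi."""
    psi = psi / np.linalg.norm(psi)
    return np.array([np.real(np.vdot(psi, S @ psi)) for S in (SX, SY, SZ)])

def density_from_means(s):
    """Density matrix rebuilt from the three mean projections."""
    return 0.5 * (np.eye(2) + s[0] * SX + s[1] * SY + s[2] * SZ)

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
plus = (up + down) / np.sqrt(2)  # equal-weight superposition of up and down

s_up, s_down, s_plus = (mean_projections(v) for v in (up, down, plus))

print("up:   ", s_up)
print("down: ", s_down)
print("plus: ", s_plus)
```

The amplitudes of `plus` are a linear combination of those of `up` and `down`, but its mean projections point along x, while any linear combination of the `up` and `down` projections stays on the z axis.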

A different generalization puts the representation relation itself into superposition. “Considering a superposition of classical reference frames” introduces a wavefunctional over coordinate transformations,

$A \in \mathbb{R}^{m \times n}$ 17

with a Born-rule interpretation

$A \in \mathbb{R}^{m \times n}$ 18

The superposed object is not primarily a particle state but the map between two otherwise classical coordinate descriptions, and transformed wavefunctions are obtained by integrating over possible inter-frame transformations (Tammaro et al., 2023).

Finally, “Propositional superposition logic” introduces a symbolic connective $\varphi \mid \psi$ with choice-based collapse semantics. Its central theorem places the connective strictly between conjunction and disjunction: $\varphi \wedge \psi \models \varphi \mid \psi \models \varphi \vee \psi$, with both entailments strict in general. Here superposition is neither vector addition nor neural feature packing, but a non-truth-functional connective interpreted by a collapse-via-choice map (Tzouvaras, 2023).

These broader uses share a concern with representation under overlap, contextuality, or deferred resolution, but the papers are explicit that symbolic, quantum, and neural superposition are not interchangeable notions. A careful reading of the literature therefore treats “representational superposition” as a family of formally distinct theories linked by a common pressure: finite representational resources must encode more alternatives, features, or relations than a naive one-to-one scheme allows.