Representational Superposition Overview

Updated 4 July 2026
  • Representational Superposition is the phenomenon where systems encode more features than available orthogonal directions, leading to overlapping and compressed neural codes.
  • The concept employs linear, sparse, and manifold coding methods to recover latent features, revealing hidden alignment and metric distortions in representational spaces.
  • It bridges machine learning, quantum theory, and neuroscience, offering actionable insights for improving model interpretability, robustness, and feature disentanglement.

Representational superposition denotes a regime in which a system encodes more representational content than can be assigned to mutually orthogonal axes of its ambient space, so multiple features, concepts, or alternatives are stored in overlapping directions. In the neural literature surveyed here, the canonical formalization is a low-dimensional linear code $y = Az$ with $A \in \mathbb{R}^{m \times n}$ and $m<n$, where latent features $z$ are compressed into neuron activity $y$; related work generalizes this from discrete feature dictionaries to continuous concept manifolds and from passive storage to active computation in superposition (Liu et al., 31 Mar 2026, Modell, 18 May 2026, Hänni et al., 2024). Outside machine learning, closely related notions appear in quantum foundations, biorthogonal resource theory, reference-frame formalisms, and symbolic logic, but these uses are formally distinct and should not be conflated (Ronde, 2016, Pusuluk, 2022, Tammaro et al., 2023, Tzouvaras, 2023).

1. Formal definitions and scope

In the most explicit neural definition, a system is in superposition when latent features $z \in \mathbb{R}^n$ are linearly encoded into a lower-dimensional activity vector $y \in \mathbb{R}^m$ by

$$y = Az, \qquad A \in \mathbb{R}^{m \times n}, \qquad m<n.$$

Because $m<n$, the $n$ columns of $A$ cannot all be mutually orthogonal, so different features must overlap in neuron space. In that formulation, superposition is the mathematical expression of mixed selectivity or polysemanticity: single neurons respond to multiple latent features because the features are linearly superposed (Liu et al., 31 Mar 2026).
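The overlap argument can be checked numerically. The following is a minimal sketch, assuming a hypothetical random mixing matrix `A` with unit-norm columns; the sizes and the 0.1 threshold used to count features per neuron are illustrative choices, not values from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 8, 3          # n > m, so the code is superposed

# Hypothetical mixing matrix A (m x n) with unit-norm feature columns.
A = rng.normal(size=(n_neurons, n_features))
A /= np.linalg.norm(A, axis=0, keepdims=True)

# Gram matrix of feature directions; off-diagonal entries are overlaps.
gram = A.T @ A
max_overlap = np.abs(gram - np.eye(n_features)).max()

# rank(A^T A) <= m < n, so the Gram matrix cannot be the n x n identity.
gram_rank = np.linalg.matrix_rank(gram)

# Count how many features load on each neuron (illustrative threshold).
features_per_neuron = (np.abs(A) > 0.1).sum(axis=1)

print(f"rank(A^T A) = {gram_rank} <= m = {n_neurons}")
print(f"largest feature overlap = {max_overlap:.3f}")
print("features per neuron:", features_per_neuron)
```

Because the Gram matrix has rank at most `m`, at least one pair of feature directions must have nonzero inner product, which is the overlap the paragraph describes.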

A complementary formalization treats an activation vector as a sparse superposition of feature directions,

$$a(x) = \sum_{i=1}^{n} f_i(x)\, v_i, \qquad f_i(x) \in \{0, 1\}, \quad v_i \in \mathbb{R}^{m},$$

with more Boolean features than ambient dimensions ($n > m$). This literature distinguishes representational superposition, where superposition compresses features through a bottleneck, from computation in superposition, where the compressed code is actively used to implement nonlinear operations. In the same framework, features are said to be $\varepsilon$-linearly represented if there exists a read-off matrix $R$ such that

$$\left| (R\, a(x))_i - f_i(x) \right| \le \varepsilon$$

for all features $i$ and inputs $x$ (Hänni et al., 2024).
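A linear read-off of sparse Boolean features can be sketched as follows. This is an illustration under stated assumptions: the feature directions are random unit vectors, the read-off matrix is simply their transpose, and all sizes are hypothetical. None of these choices is taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, n_inputs = 1024, 2048, 2, 50   # dims, features, active features, inputs

# Hypothetical feature directions: random unit vectors in R^m (columns of V).
V = rng.normal(size=(m, n))
V /= np.linalg.norm(V, axis=0, keepdims=True)

# Sparse Boolean features: exactly k active per input (columns of F).
F = np.zeros((n, n_inputs))
for j in range(n_inputs):
    F[rng.choice(n, size=k, replace=False), j] = 1.0

acts = V @ F             # activations as sparse superpositions of directions
R = V.T                  # simplest read-off: project onto each direction
readout = R @ acts

eps = np.abs(readout - F).max()             # worst-case read-off error
recovered = (readout > 0.5).astype(float)   # threshold back to Booleans

print(f"worst-case read-off error: {eps:.3f}")
print("all features recovered:", np.array_equal(recovered, F))
```

The read-off error comes entirely from interference between non-orthogonal directions, so it grows as more features are active at once or as the ambient dimension shrinks.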

Recent work broadens the notion further from isolated directions to continuous manifolds. In that setting, the activation is modeled as a sum of embedded concept terms or, for one target concept, as that concept's embedding plus a nuisance term, so each concept is represented not by a single direction but by an embedded manifold in activation space. This generalization is explicitly described as “representation manifolds in superposition,” because the concept manifold is additively embedded with nuisance semantics rather than cleanly isolated (Modell, 18 May 2026).

Taken together, these definitions suggest a family resemblance rather than a single theory. The shared core is representational overlap under limited dimensionality or contextual decomposition; the specific mathematical object varies from sparse feature dictionaries to continuous manifolds.

2. Alignment, identifiability, and the problem of hidden similarity

A central result in this literature is that superposition can make systems with the same latent content appear misaligned. For neural responses $Y \in \mathbb{R}^{m \times T}$ to $T$ stimuli, the representational similarity matrix is

$$K = Y^\top Y.$$

Under superposition, $Y = AZ$ with latent features $Z \in \mathbb{R}^{n \times T}$, so

$$K = Z^\top A^\top A Z.$$

Writing $M = A^\top A$, raw similarity is therefore computed not in the latent Euclidean geometry but in the geometry warped by $M$. For two systems exposed to the same latent inputs,

$$Y_a = A_a Z, \qquad Y_b = A_b Z,$$

the asymptotic RSA score is

$$\frac{\operatorname{tr}(M_a M_b)}{\|M_a\|_F \, \|M_b\|_F},$$

with $M_a = A_a^\top A_a$ and $M_b = A_b^\top A_b$. Linear CKA collapses to the same expression asymptotically. The consequence is that alignment depends on similarity between the induced latent metrics, not simply on shared latent features. In the paper’s phrase, raw-space metrics conflate “what a system represents” with “how it represents it” (Liu et al., 31 Mar 2026).

The deflation can be large. Under independent random projections with equal output dimension $m$, expected alignment shrinks roughly like the compression ratio $m/n$. Under partial feature overlap, the same paper shows that ordering can invert: a pair of systems with smaller feature overlap can receive a higher raw alignment score than a fully overlapping but more strongly compressed pair. This suggests that low RSA, CKA, or regression scores may reflect coding geometry rather than different represented content (Liu et al., 31 Mar 2026).
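The deflation effect can be reproduced in a small simulation. The sketch below assumes isotropic Gaussian latent features and two independent Gaussian projections of the same latents; `linear_cka` is the standard linear CKA formula, and all names and sizes are hypothetical rather than taken from the cited paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (units x stimuli) response matrices."""
    X = X - X.mean(axis=1, keepdims=True)   # center each unit across stimuli
    Y = Y - Y.mean(axis=1, keepdims=True)
    cross = np.linalg.norm(X @ Y.T, "fro") ** 2
    return cross / (np.linalg.norm(X @ X.T, "fro") * np.linalg.norm(Y @ Y.T, "fro"))

rng = np.random.default_rng(0)
n, m, T = 100, 10, 5000                      # latent dim, neurons, stimuli

Z = rng.normal(size=(n, T))                  # shared latent features
A_a = rng.normal(size=(m, n)) / np.sqrt(m)   # independent random projections
A_b = rng.normal(size=(m, n)) / np.sqrt(m)

cka_latent = linear_cka(Z, Z)                # identical content, no mixing
cka_raw = linear_cka(A_a @ Z, A_b @ Z)       # same content, different mixing

# Cosine similarity between the induced latent metrics M = A^T A.
M_a, M_b = A_a.T @ A_a, A_b.T @ A_b
metric_cos = np.trace(M_a @ M_b) / (np.linalg.norm(M_a) * np.linalg.norm(M_b))

print(f"CKA on latents:        {cka_latent:.3f}")
print(f"CKA on raw activity:   {cka_raw:.3f}")
print(f"metric cosine:         {metric_cos:.3f}")
print(f"compression ratio m/n: {m / n:.3f}")
```

Both systems carry exactly the same latent content, yet the raw-space score lands near the compression ratio rather than near one, and it tracks the similarity of the two induced metrics.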

The same issue appears in neuron-matching metrics. If two systems are generated from the same latent feature matrix $Z$ but with different mixing matrices,

$$Y_a = A_a Z, \qquad Y_b = A_b Z,$$

then strict matching depends on the cross-structure between $A_a$ and $A_b$. In the permutation metric, the score reaches its maximum only when columns of the two mixing matrices coincide up to permutation and sign. With different superposition arrangements, even identical latent features yield a lower score. The paper “Superposition disentanglement of neural representations reveals hidden alignment” formalizes this point and proves that if sparse recovery succeeds exactly under a Restricted Isometry Property (RIP) assumption, then the recovered latent codes align perfectly and the deflation disappears (Longon et al., 3 Oct 2025).

A further nuance is that apparent misalignment need not imply information loss. Under the sparsity assumption $\|z\|_0 \le k$ and a projection matrix satisfying the Restricted Isometry Property, compressed sensing guarantees recoverability when

$$m \gtrsim k \log(n/k).$$

Above that regime, low raw-space alignment may indicate different mixtures of the same features rather than loss of latent information; below it, reduced alignment can reflect genuine irrecoverability (Liu et al., 31 Mar 2026).
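The recoverability claim can be illustrated with a small sparse-recovery sketch. Orthogonal matching pursuit is used here only as a convenient standard algorithm; it is a swapped-in technique, not the method of the cited paper, and the dimensions, sparsity level, and helper name `omp` are hypothetical.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse solution of y = A z."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    z_hat = np.zeros(A.shape[1])
    z_hat[support] = coef
    return z_hat

rng = np.random.default_rng(0)
n, m, k = 1000, 400, 5                       # latent dim, neurons, sparsity

# k-sparse latent vector with nonzero entries of magnitude in [1, 2].
z = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
z[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

A = rng.normal(size=(m, n)) / np.sqrt(m)     # random Gaussian projection
y = A @ z                                    # superposed activity

z_hat = omp(A, y, k)
recovery_error = np.linalg.norm(z_hat - z)
print(f"recovery error: {recovery_error:.2e}")
```

Here the activity vector has fewer dimensions than the latent vector, yet the sparse latent code is recovered, which is the sense in which compression need not destroy information.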

3. Disentangling superposition and recovering latent structure

The most common practical response is to extract latent features before comparing or interpreting representations. The feature-recovery agenda is explicit in the recommendation to use sparse autoencoders, dictionary learning or sparse coding approaches, and more general feature-disentangling pipelines rather than raw activations (Liu et al., 31 Mar 2026).

One concrete implementation is the TopK sparse autoencoder. Given activations $y$, the encoder computes

$$\hat{z} = \operatorname{TopK}\!\left(W_{\mathrm{enc}}\, y + b_{\mathrm{enc}}\right)$$

and reconstructs

$$\hat{y} = W_{\mathrm{dec}}\, \hat{z} + b_{\mathrm{dec}}.$$

In the alignment study, SAEs trained on toy-model, ResNet50, and ViT-B/16 activations frequently increased soft-matching and source-to-target regression alignment when raw units were replaced by sparse latent codes. For one Natural Scenes Dataset subject, DNN-to-brain linear predictivity increased for both ResNet50 and ViT-B/16 after disentangling the DNN source representation, supporting the claim that hidden feature-level alignment can be masked by different neuron mixtures (Longon et al., 3 Oct 2025).
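A TopK sparse autoencoder forward pass can be sketched in a few lines. This is a generic illustration, not the cited study's implementation: the function name, the bias placement, the ReLU on the kept entries, and the random untrained weights are all assumptions, so the reconstruction here is not meaningful.

```python
import numpy as np

def topk_sae_forward(y, W_enc, b_enc, W_dec, b_dec, k):
    """Forward pass of a TopK sparse autoencoder on a batch of activations."""
    pre = y @ W_enc + b_enc                        # (batch, dict_size)
    # Keep the k largest pre-activations per row and zero out the rest.
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    codes = np.zeros_like(pre)
    rows = np.arange(pre.shape[0])[:, None]
    codes[rows, top_idx] = np.maximum(pre[rows, top_idx], 0.0)
    recon = codes @ W_dec + b_dec                  # (batch, act_dim)
    return codes, recon

rng = np.random.default_rng(0)
act_dim, dict_size, batch, k = 16, 64, 8, 4       # hypothetical sizes

y = rng.normal(size=(batch, act_dim))
W_enc = rng.normal(size=(act_dim, dict_size)) / np.sqrt(act_dim)
W_dec = rng.normal(size=(dict_size, act_dim)) / np.sqrt(dict_size)
b_enc, b_dec = np.zeros(dict_size), np.zeros(act_dim)

codes, recon = topk_sae_forward(y, W_enc, b_enc, W_dec, b_dec, k)
print("active latents per sample:", (codes != 0).sum(axis=1))
```

In an alignment pipeline, the sparse `codes` rather than the raw activations `y` would be passed to the similarity metric.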

A related line of work targets continuous concepts rather than discrete sparse features. The Manifold Probe factorizes a concept representation as

$$\phi(c) = \sum_{k} f_k(c)\, v_k,$$

where $f_k(c)$ are scalar concept features and $v_k$ are encoding directions in activation space. The method first learns which functions of a concept are linearly predictable from activations and then learns the directions that encode them. Applied to Llama 2-7b, it recovers interpretable manifolds for time and geographic space, and steering along the time manifold shifts year completions toward the targeted year, with the main evaluation metric being the probability that the model’s completion lies within two years of the steering target (Modell, 18 May 2026).

Biological imaging work pushes the same logic into a multimodal setting. In “Resolving superposition in AI for interpretability and cross-modal alignment in patient-neuronal images,” gated SAEs were trained on multiplexed images of patient-derived Parkinson’s disease and healthy neurons. The paper argues theoretically that superposition contaminates representational metric spaces and empirically reports improved effective rank, lower local semantic dispersion, higher Moran’s $I$, and stronger downstream cell-death prediction after SAE expansion, including for LRRK2. Those SAE codes are then treated as single-cell state vectors and aligned de novo to scRNA-seq with Gromov-Wasserstein optimal transport (Park et al., 30 Jun 2026).

The same tools also generate negative evidence. In the analysis of PatchTST for time-series forecasting, SAEs trained on post-GELU FFN activations with dictionary sizes spanning a range of multiples of the native dimensionality yielded only a small average downstream MSE difference between the smallest and largest dictionaries, and amplifying dominant latent features shifted forecasts only slightly in MAE on average. The conclusion there is deliberately narrow: no empirical evidence of strong superposition was found in the analyzed layer/model regime, so superposition should not be treated as a universal explanation for transformer behavior (Yıldırım, 6 May 2026).

4. Geometric, spectral, and capacity-theoretic perspectives

Several papers recast superposition as a global geometry problem rather than only a sparse-recovery problem. In “Spectral Superposition: A Theory of Feature Geometry,” the basic object is a weight matrix $W \in \mathbb{R}^{m \times n}$ whose columns $w_i$ are feature vectors. Two derived operators are central: the Gram matrix $G = W^\top W$ and the frame operator $F = W W^\top$. The frame operator $F$ is treated as the basis-invariant representation of global feature geometry. For each feature $i$, the paper defines a spectral measure

$$\mu_i(\lambda) = \| P_\lambda w_i \|^2,$$

where $P_\lambda$ projects onto the $\lambda$-eigenspace of $F$, which records how the feature allocates norm across eigenspaces of $F$. In the toy-model regime, if fractional dimensionality $D_i$ saturates the capacity bound,

$$\sum_{i=1}^{n} D_i = m,$$

then every feature localizes spectrally: $\mu_i$ is concentrated on a single eigenvalue $\lambda_i$, equivalently $F w_i = \lambda_i w_i$. The paper then shows that localized features partition into eigenspace clusters, each forming a tight frame, and uses association schemes to recover geometries such as simplices, polygons, and antiprisms (Ivanov et al., 2 Feb 2026).
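The polygon case gives a compact numerical check of the tight-frame picture. The sketch below assumes five unit feature vectors arranged as a regular pentagon in two dimensions and uses the toy-model definition of fractional dimensionality, $D_i = \|w_i\|^4 / \sum_j (w_i \cdot w_j)^2$; that definition is an assumption here, since the text above does not spell it out.

```python
import numpy as np

m, n = 2, 5                                        # 5 features in 2 dimensions
angles = 2 * np.pi * np.arange(n) / n
W = np.stack([np.cos(angles), np.sin(angles)])     # (m, n), unit-norm columns

gram = W.T @ W                                     # n x n feature overlaps
frame = W @ W.T                                    # m x m frame operator

# Fractional dimensionality D_i = ||w_i||^4 / sum_j (w_i . w_j)^2.
norms_sq = np.diag(gram)
frac_dim = norms_sq**2 / (gram**2).sum(axis=1)

# Rayleigh quotient of each feature, then check F w_i = lambda_i w_i.
eigvals = np.einsum("ij,ij->j", W, frame @ W) / norms_sq
is_eigvec = np.allclose(frame @ W, W * eigvals)

print("frame operator:\n", np.round(frame, 6))
print("fractional dimensions:", np.round(frac_dim, 6))
print("sum of D_i:", round(float(frac_dim.sum()), 6), "vs m =", m)
print("every feature is an eigenvector of F:", is_eigvec)
```

For the pentagon the frame operator is a multiple of the identity, every feature has fractional dimension $2/5$, and the dimensions sum to $m = 2$, so the capacity bound is saturated.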

A complementary perspective asks how many near-orthogonal directions a transformer can support at all. “Representational Capacity: Geometric Limits on Feature Representation in Transformer LLMs” defines a set of unit vectors $\{v_1, \dots, v_N\} \subset \mathbb{R}^d$ to be $\varepsilon$-nearly orthogonal when

$$|\langle v_i, v_j \rangle| \le \varepsilon \quad \text{for all } i \ne j.$$

Using pairwise cosine distributions of embedding matrices as a measurable proxy, the paper estimates the accepted deviation from orthogonality heuristically. It then contrasts the standard Johnson–Lindenstrauss-style capacity estimate with an adjusted formula and a flexible empirical fit.

Across dozens of open-source models, the paper reports two classes: high-$\varepsilon$ models whose embeddings lack near-orthogonal structure, and low-$\varepsilon$ models that maintain it. Capacity is exponentially sensitive to $\varepsilon$: at fixed embedding dimension, a small change in $\varepsilon$ produces a large change in the estimated capacity (Guha, 1 Jun 2026).
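Measuring near-orthogonality from pairwise cosines is straightforward to sketch. The helper `near_orthogonality` and the random stand-in "embedding matrix" below are hypothetical; a real analysis would load a model's embedding matrix instead, and the paper's specific heuristic for choosing the accepted deviation is not reproduced here.

```python
import numpy as np

def near_orthogonality(E):
    """Return (max, mean) absolute pairwise cosine between rows of E."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    cos = unit @ unit.T
    off_diag = np.abs(cos[~np.eye(len(E), dtype=bool)])
    return off_diag.max(), off_diag.mean()

rng = np.random.default_rng(0)
d = 256                                            # hypothetical embedding dim

# An orthonormal basis is exactly orthogonal: deviation 0.
eps_basis, _ = near_orthogonality(np.eye(d))

# 4d random vectors in d dimensions must overlap, but only slightly.
eps_random, mean_random = near_orthogonality(rng.normal(size=(4 * d, d)))

print(f"orthonormal basis: max |cos| = {eps_basis:.3f}")
print(f"random vectors:    max |cos| = {eps_random:.3f}, mean |cos| = {mean_random:.3f}")
```

The maximum absolute cosine plays the role of the smallest $\varepsilon$ for which the set is $\varepsilon$-nearly orthogonal, while the mean gives a less outlier-sensitive summary of the same distribution.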

These theories suggest that superposition is constrained simultaneously by local recoverability, global eigenspace structure, and packing limits. A plausible implication is that sparse-feature, manifold, and near-orthogonal-direction accounts are not alternatives so much as different coordinate systems on the same capacity problem.

5. Architectural manifestations and dynamical consequences

Superposition is also studied as a computational and dynamical phenomenon. “Mathematical Models of Computation in Superposition” argues that earlier theory emphasized representational bottlenecks, whereas some networks perform useful nonlinear computation while remaining fully superposed. Its flagship construction computes all pairwise ANDs of $m$ Boolean features in a one-layer MLP while keeping both inputs and outputs in superposition; the paper also gives a construction for superposed inputs, and the abstract emphasizes an $\tilde{O}(m^{2/3})$-neuron realization of the universal pairwise-AND task. The deeper result adds error-correction layers so that fully connected networks of width $d$ can emulate circuits of width $\tilde{O}(d^{1.5})$ and polynomial depth while staying in superposition (Hänni et al., 2024).

The same geometry can produce vulnerabilities. In “Adversarial Attacks Leverage Interference Between Features in Superposition,” a layer is modeled as a linear combination of more features than dimensions, with non-orthogonal feature directions and sparse activations. For targeted attacks between a source class and a target class, the paper gives the optimal perturbation under a norm constraint. In a synthetic setting without superposition, obtained by setting the bottleneck dimension equal to the class count, the paper reports zero successful adversarial examples across all attempts at all tested settings, whereas superposed bottlenecks are vulnerable. This is used to argue that adversarial vulnerability can be a byproduct of representational compression rather than only of irregular decision boundaries or non-robust features (Stevinson et al., 13 Oct 2025).

Continual-learning work makes the dynamics explicit. In the synthetic framework of “Sparsity, Superposition, and Forgetting,” the learned representation of each pure feature is tracked over time and decomposed into a representation strength and a unit direction, from which a superposition strength and a retention proxy are computed.

The empirical summary is nuanced: superposition tends to increase over time with transient dips at task boundaries; higher feature sparsity induces more superposition; but forgetting is concentrated in features that are both weakly represented and highly overlapped, while strongly represented features can remain stable despite overlap. At the task level, effective rank increases with sparsity, suggesting broader capacity usage under sparse regimes (Wasilewski et al., 18 Jun 2026).

Graph neural networks add topological structure to the same problem. “Superposition in Graph Neural Networks” extracts graph-level class centroids and node-level linear-probe directions, then quantifies overlap between these directions, including through a Welch-normalized overlap score. Across GCN, GIN, and GAT, increasing width produces a three-phase pattern in overlap; topology imprints overlap onto node-level features; pooling partially remixes node geometry into task-aligned graph axes; sharper pooling increases axis alignment and reduces channel sharing; and shallow models can settle into metastable low-rank embeddings. These results place superposition at the intersection of width, topology, pooling, and final-layer activation choice rather than treating it as a width-only bottleneck phenomenon (Pertl et al., 31 Aug 2025).

6. Broader formalisms outside neural representation

Outside machine learning and neuroscience, the phrase “representational superposition” refers to formally different constructions. In quantum foundations, Christian de Ronde’s representational-realist program argues that the traditional measurement problem should be replaced by the superposition problem: given a superposition such as $c_1 |\alpha_1\rangle + c_2 |\alpha_2\rangle$, where each term is linked by the Born rule to a meaningful physical statement, the central question is how to conceptually represent the superposition itself rather than merely explain why one outcome is observed. In that program, superposition and contextuality are treated as representational problems requiring non-classical concepts rather than as defects to be reduced to classical actuality (Ronde, 2016, Ronde, 2016).

A mathematically sharper quantum extension appears in the biorthogonal resource theory of genuine quantum superposition. For a non-orthogonal basis $\{|a_i\rangle\}$ with Gram matrix $G_{ij} = \langle a_i | a_j \rangle$, the paper defines the dual basis $\{|a^i\rangle\}$, which satisfies $\langle a^i | a_j \rangle = \delta_{ij}$, and a biorthogonal representation of states in terms of these two bases.

Its key claim is that standard off-diagonal superposition misses a second source of nonclassicality residing in the overlaps of non-orthogonal basis states themselves. Genuine quantum superposition therefore includes both inter-basis and intra-basis contributions (Pusuluk, 2022).
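The dual-basis construction is standard linear algebra and can be sketched directly. The two-dimensional basis and the test state below are hypothetical choices for illustration; the paper's resource-theoretic measures are not reproduced.

```python
import numpy as np

# Hypothetical non-orthogonal basis of C^2, stored as columns of B.
B = np.array([[1.0, np.cos(0.4)],
              [0.0, np.sin(0.4)]], dtype=complex)

G = B.conj().T @ B                     # Gram matrix G_ij = <a_i | a_j>
B_dual = B @ np.linalg.inv(G)          # columns are the dual vectors |a^i>

# Biorthogonality: <a^i | a_j> = delta_ij.
biorth = B_dual.conj().T @ B

# Expand a state in the original basis using dual-basis coefficients.
psi = np.array([0.6, 0.8j])
coeffs = B_dual.conj().T @ psi         # c_i = <a^i | psi>
psi_rebuilt = B @ coeffs

print("overlap <a_1|a_2>:", np.round(G[0, 1], 4))
print("biorthogonality matrix:\n", np.round(biorth, 10))
print("state rebuilt correctly:", np.allclose(psi_rebuilt, psi))
```

The nonzero off-diagonal Gram entry is the overlap between basis states themselves, which is the ingredient an orthonormal-basis treatment has no counterpart for.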

Other papers develop alternative coordinate systems for known quantum superpositions. In the spin-projection mean representation for qubits, the state is parameterized by mean spin projections $(x, y, z) = (\langle \sigma_x \rangle, \langle \sigma_y \rangle, \langle \sigma_z \rangle)$, with density matrix

$$\rho = \tfrac{1}{2}\left(I + x\,\sigma_x + y\,\sigma_y + z\,\sigma_z\right),$$

and Hilbert-space superposition becomes a nonlinear addition rule for these expectation values rather than linear addition of amplitudes (Fedorov et al., 2019).
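That superposition does not act linearly on expectation values can be seen in a short check. The sketch uses the standard Pauli-matrix parameterization of a qubit; the helper names `bloch` and `density` are hypothetical, and the paper's explicit addition rule is not reproduced.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch(psi):
    """Mean Pauli projections (x, y, z) of a pure qubit state."""
    psi = psi / np.linalg.norm(psi)
    return np.real([psi.conj() @ S @ psi for S in (SX, SY, SZ)])

def density(r):
    """Density matrix rebuilt from the mean projections."""
    return 0.5 * (np.eye(2) + r[0] * SX + r[1] * SY + r[2] * SZ)

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
plus = (up + down) / np.sqrt(2)        # equal superposition of up and down

r_up, r_down, r_plus = bloch(up), bloch(down), bloch(plus)

print("r(up)   =", r_up)
print("r(down) =", r_down)
print("r(plus) =", r_plus)
print("r(up) + r(down) =", r_up + r_down)
```

The two component states have opposite projections that sum to zero, while their superposition has a unit projection along a different axis, so the rule combining projections cannot be plain vector addition.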

A different generalization puts the representation relation itself into superposition. “Considering a superposition of classical reference frames” introduces a wavefunctional over coordinate transformations, with a Born-rule interpretation in which its squared modulus gives the probability density of a given transformation. The superposed object is not primarily a particle state but the map between two otherwise classical coordinate descriptions, and transformed wavefunctions are obtained by integrating over possible inter-frame transformations (Tammaro et al., 2023).

Finally, “Propositional superposition logic” introduces a symbolic connective $\varphi \mid \psi$ with choice-based collapse semantics. Its central theorem places the connective strictly between conjunction and disjunction: $\varphi \wedge \psi \models \varphi \mid \psi \models \varphi \vee \psi$, with both entailments strict in general. Here superposition is neither vector addition nor neural feature packing, but a non-truth-functional connective interpreted by a collapse-via-choice map (Tzouvaras, 2023).
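The sandwich between conjunction and disjunction can be checked by brute force in a heavily simplified setting. The sketch below assumes atomic operands only and models the choice map as a single Boolean flag that selects which operand the superposition collapses to; this is an illustrative reduction, not the paper's full semantics.

```python
from itertools import product

def sup(phi, psi, choose_first):
    """Collapse the superposition to one operand using a fixed choice."""
    return phi if choose_first else psi

models = list(product([False, True], repeat=3))   # (phi, psi, choice)

# Entailment checked as implication in every model: and -> sup -> or.
# On Python booleans, a <= b is material implication.
all_hold = all(
    ((p and q) <= sup(p, q, c)) and (sup(p, q, c) <= (p or q))
    for p, q, c in models
)

# Strictness: models where the converse implications fail.
strict_lower = any(sup(p, q, c) and not (p and q) for p, q, c in models)
strict_upper = any((p or q) and not sup(p, q, c) for p, q, c in models)

print("and => sup => or in every model:", all_hold)
print("sup does not entail and:", strict_lower)
print("or does not entail sup:", strict_upper)
```

Because the value of the superposition depends on the choice as well as on the operands' truth values, the connective is not truth-functional, which matches the description above.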

These broader uses share a concern with representation under overlap, contextuality, or deferred resolution, but the papers are explicit that symbolic, quantum, and neural superposition are not interchangeable notions. A careful reading of the literature therefore treats “representational superposition” as a family of formally distinct theories linked by a common pressure: finite representational resources must encode more alternatives, features, or relations than a naive one-to-one scheme allows.
