
From Data Statistics to Feature Geometry: How Correlations Shape Superposition

Published 10 Mar 2026 in cs.LG, cs.AI, and cs.CV | (2603.09972v1)

Abstract: A central idea in mechanistic interpretability is that neural networks represent more features than they have dimensions, arranging them in superposition to form an over-complete basis. This framing has been influential, motivating dictionary learning approaches such as sparse autoencoders. However, superposition has mostly been studied in idealized settings where features are sparse and uncorrelated. In these settings, superposition is typically understood as introducing interference that must be minimized geometrically and filtered out by non-linearities such as ReLUs, yielding local structures like regular polytopes. We show that this account is incomplete for realistic data by introducing Bag-of-Words Superposition (BOWS), a controlled setting to encode binary bag-of-words representations of internet text in superposition. Using BOWS, we find that when features are correlated, interference can be constructive rather than just noise to be filtered out. This is achieved by arranging features according to their co-activation patterns, making interference between active features constructive, while still using ReLUs to avoid false positives. We show that this kind of arrangement is more prevalent in models trained with weight decay and naturally gives rise to semantic clusters and cyclical structures which have been observed in real LLMs yet were not explained by the standard picture of superposition. Code for this paper can be found at https://github.com/LucasPrietoAl/correlations-feature-geometry.

Summary

  • The paper introduces the BOWS framework to systematically study superposition in neural networks using realistic data correlations.
  • It demonstrates that correlated features enable constructive interference, leading to efficient, low-rank geometric feature arrangements.
  • Empirical results on text corpora validate theoretical predictions, bridging the gap between toy models and real-world deep learning architectures.

Introduction and Motivation

The paper "From Data Statistics to Feature Geometry: How Correlations Shape Superposition" (2603.09972) advances the theoretical framework and empirical understanding of feature superposition in neural networks, with a focus on its geometric and statistical characteristics in realistic high-dimensional datasets. Mechanistic interpretability (MI) has conventionally adopted an idealized view, where features are sparse and uncorrelated, leading to interpretations of superposition as a source of deleterious interference that must be actively filtered by non-linearities. This framing motivates dictionary learning approaches such as sparse autoencoders (SAEs). However, these assumptions fall short when considering linguistic or natural data, where feature correlations and redundant informational structure are pervasive.

The authors introduce Bag-of-Words Superposition (BOWS), a controlled yet realistic framework for studying superposition encodings in neural representations, utilizing binary bag-of-words representations compressed via autoencoders. By exploiting known ground-truth features endowed with empirical co-activation statistics, BOWS enables systematic investigations into how the covariance structure of data imprints itself onto the geometry of learned feature representations.
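As a concrete illustration, binary bag-of-words features can be built as follows; the chunking, whitespace tokenization, and tiny vocabulary here are simplified assumptions for exposition, not the paper's exact preprocessing:

```python
def bows_vectors(chunks, vocab):
    """Binary bag-of-words: x_i = 1 iff vocabulary word i appears in the chunk."""
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for chunk in chunks:
        x = [0] * len(vocab)
        for word in chunk.lower().split():
            if word in index:
                x[index[word]] = 1
        vectors.append(x)
    return vectors

chunks = ["In December we celebrate Christmas",
          "December snow fell in December"]
vocab = ["december", "christmas", "snow"]
X = bows_vectors(chunks, vocab)
# Features are presence indicators, so repetition within a chunk does not matter,
# but empirical co-activation statistics (december with christmas) are preserved.
```

Because each vector records only presence, the ground-truth features and their co-occurrence structure are known exactly, which is what makes the setting controlled.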

Theoretical Framework: From Noise to Constructive Interference

The manuscript formalizes distinctions between linear and non-linear superposition, specifying the recoverability conditions under which features can be decoded from compressed representations. In the classical regime, where features are uncorrelated and sparse, superposition introduces unstructured cross-feature interference, which must be actively filtered (e.g., using ReLU nonlinearities) to ensure faithful feature recovery. This scenario yields geometric arrangements that minimize pairwise dot products, with feature vectors organizing as regular polytopes.
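A minimal numeric sketch of this classical picture: two sparse, uncorrelated features share a single dimension as an antipodal pair, and a ReLU filters out the negative interference. The weights are toy values chosen for illustration, not taken from the paper:

```python
def relu(v):
    return [max(0.0, t) for t in v]

def encode(f):
    # Two features share one dimension with opposite signs (antipodal pair).
    return f[0] - f[1]

def decode(h):
    # The ReLU filters the negative interference coming from the other feature.
    return relu([h, -h])

r0 = decode(encode([1, 0]))        # [1.0, 0.0]: feature 0 recovered cleanly
r1 = decode(encode([0, 1]))        # [0.0, 1.0]: feature 1 recovered cleanly
collision = decode(encode([1, 1])) # [0.0, 0.0]: joint activation collides,
                                   # which sparsity is assumed to make rare
```

The collision case shows why this regime needs sparsity: interference here is purely destructive, so the geometry only works when features rarely co-activate.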

Crucially, the authors demonstrate that when features are correlated, interference can be constructive: superposition aligns with the underlying principal components of the covariance structure. In this regime, feature arrangements exploit low-rank structure and co-activation patterns, leading to enhanced computational and norm efficiency. The paper highlights that such solutions naturally arise under capacity constraints (tight bottlenecks) or weight decay, and that interference filtering and constructive mechanisms are complementary, not mutually exclusive.
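The norm-efficiency argument can be illustrated with a one-dimensional bottleneck and two perfectly correlated features; the tied encoder/decoder weights and the specific numbers below are illustrative assumptions, not the paper's trained model:

```python
import math

def relu(v):
    return [max(0.0, t) for t in v]

# Two perfectly correlated features share one latent direction (tied weights).
a = 1 / math.sqrt(2)
w = [a, a]   # total squared weight norm is 1, versus 2 for an orthogonal solution

def reconstruct(f, b=0.0):
    h = sum(wi * fi for wi, fi in zip(w, f))  # encode into the 1-d bottleneck
    return relu([wi * h + b for wi in w])     # tied decode followed by ReLU

both = reconstruct([1, 1])  # each output: 0.5 (self) + 0.5 (partner) = 1.0
none = reconstruct([0, 0])  # stays at zero
```

Each feature's own projection contributes only half of its reconstruction; the correlated partner supplies the rest constructively. Because the shared direction halves the total weight norm relative to an orthogonal layout, weight decay favors exactly this arrangement.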

Empirical Analysis: Geometry Emerges from Data Statistics

Using the BOWS framework on realistic corpora such as WikiText-103 and OpenWebText, the authors provide empirical evidence for the emergence of geometric feature structures, from semantic clusters to cyclical arrangements. Non-linear autoencoders with ReLU decoding and tight bottlenecks are shown to recover semantic clusters (anisotropic superposition) and circle-like arrangements for features such as months or days, mirroring structures unearthed in large-scale LLMs.

Notably, the circular structure observed in month representations correlates directly with cyclic co-activation statistics found in natural text. Linear decoding achieves R² ≈ 0.98 for these cyclically arranged features, establishing that the observed geometric phenomena are bona fide cases of linear superposition, not artifacts of model nonlinearity.
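A sketch of such a linear-decodability check; the function name, the pooled multi-output R² definition, and the synthetic data are illustrative assumptions, not the paper's evaluation code:

```python
import numpy as np

def linear_decode_r2(H, F):
    """R^2 of the best least-squares affine map from latents H to features F.

    A high R^2 indicates linear superposition: the features are linearly
    readable from the compressed code without any nonlinearity.
    """
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(H1, F, rcond=None)
    resid = F - H1 @ W
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((F - F.mean(axis=0)) ** 2)
    return 1 - ss_res / ss_tot

# Toy check: latents that are an invertible linear mix of the features
# decode back almost exactly, so R^2 is close to 1.
rng = np.random.default_rng(0)
F = rng.integers(0, 2, size=(200, 3)).astype(float)
H = F @ rng.normal(size=(3, 3))
r2 = linear_decode_r2(H, F)
```

Running the same probe on features stored non-linearly (recoverable only through the ReLU) would yield a markedly lower R², which is what separates the two regimes.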

The analysis extends to features with non-binary reconstruction profiles (e.g., Beatles-related words), providing strong evidence that constructive interference from correlated context terms frequently enhances reconstruction quality. The coexistence of constructive and filtering mechanisms is confirmed: ReLU and bias terms suppress spurious activations in non-supporting contexts, while correlated co-occurrences amplify reconstruction in relevant contexts.

Theoretical Implications and Extensions

The paper distinguishes between presence-coding and value-coding features. Presence-coding features correspond to binary semantic properties, whose geometry is contingent on co-activation statistics and architectural constraints. Value-coding features, by contrast, are defined by the explicit linear decodability of real-valued variables (trigonometric functions in modular arithmetic, geographic coordinates in mapping experiments). The existence of geometric feature manifolds in the absence of co-activation (e.g., circular and spatial arrangements in modular and map tasks) is attributed to the necessity for the model to encode continuous latent variables, not superposition arising from feature correlation. This suggests a principled way to distinguish superposition-induced geometry from intrinsically value-coding representations.
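For instance, a value code alone produces circular geometry: a hypothetical (sin, cos) embedding of month indices yields a circle whose nearest-neighbor structure is cyclic, with no co-occurrence statistics involved at all:

```python
import math

def month_code(m):
    """Value-coding: embed month index m on the unit circle via (sin, cos)."""
    theta = 2 * math.pi * m / 12
    return (math.sin(theta), math.cos(theta))

codes = [month_code(m) for m in range(12)]

def dist(a, b):
    return math.dist(a, b)

# The circular geometry follows from the value code alone: month 0's nearest
# neighbour is an adjacent month (1 or 11, wrapping December to January).
nearest = min(range(1, 12), key=lambda m: dist(codes[0], codes[m]))
```

Distances grow monotonically with circular separation here, so the same "circle" diagnostic that flags correlation-driven superposition would also fire on this purely value-coded representation, which is why the paper's distinction between the two mechanisms matters.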

The results challenge simplistic formulations of the Linear Representation Hypothesis (LRH), demonstrating that while feature geometry such as circles or clusters can be explained via linear superposition, not all low-dimensional geometric structure implies linear representability or superposition. The findings harmonize conflicting observations in prior MI studies and refine theoretical expectations on how neural networks exploit data statistics.

Practical Implications and Future Directions

The identification of constructive interference as a mechanism for norm- and rank-efficient encoding has direct implications for the design and interpretation of sparse probing and dictionary learning approaches, particularly in settings with high feature correlation or heavily bottlenecked architectures. The BOWS framework provides a benchmark for systematic evaluation, as ground-truth feature geometry is known. The results support the adoption of regularization strategies (e.g., weight decay) that bias models towards exploiting low-rank structure for improved feature disentanglement.

Unresolved questions include the exact mathematical characterization of the dominance of constructive versus filtering mechanisms, the extension to untied autoencoders, and the interplay with more realistic neural architectures. The observed heterogeneity in representation strategies for different feature types (linear vs. non-linear superposition, value- vs. presence-coding) signals a need for more nuanced analyses of interpretability and robustness in LLMs and other complex DL models.

Conclusion

This work provides a rigorous reconceptualization of superposition in deep networks, replacing the simplistic view of interference as pure noise with a richer theory encompassing both constructive and destructive interference shaped by data statistics. The BOWS framework bridges the gap between toy models and real-world datasets, empirically substantiating claims about the emergence of geometric structure from feature co-activation patterns. The theoretical and experimental results elucidate how semantic clusters and cyclical feature arrangements observed in LLMs can arise purely from exploiting low-rank structure, with practical implications for interpretability, architecture design, and future mechanistic studies.


Explain it Like I'm 14

What is this paper about?

This paper looks at how neural networks "pack" lots of ideas into a small mental space. In many models, there are more concepts to remember than there are slots to store them. So the model overlaps concepts in the same space, a bit like layering multiple transparent images on top of each other. This overlap is called superposition.

Most past work said this overlap creates "interference" (like static on a radio) that the model must block using a function called ReLU (which keeps only positive signals). But the authors show that, in real data where related ideas often appear together, interference isn't always bad. Sometimes it helps, like two voices harmonizing. They introduce a simple, controlled setup, called BOWS (Bag-of-Words Superposition), to study this. It explains why models form meaningful patterns like clusters (similar words close together) and circles (like the months arranged in a loop).

What questions did the researchers ask?

They asked:

  • How do models arrange many related features (like words) when they must share limited space?
  • Can interference between features be helpful, not just harmful?
  • When will a model choose to "lean into" shared patterns instead of trying to separate everything?
  • Do these choices explain real patterns seen in LLMs, like semantic clusters and circular structures (months of the year)?
  • How can we tell apart patterns caused by co-occurrence (things happen together) from patterns caused by encoding continuous values (like angles or map coordinates)?

How did they study it?

They built a simple but realistic testbed and ran controlled experiments.

  • Bag-of-Words Superposition (BOWS): They turned internet text into "bag-of-words" vectors. Each vector says which words appear in a chunk of text (1 = present, 0 = absent). This keeps real-life co-occurrence patterns (e.g., "December" often appears near "Christmas").
  • Autoencoders: They trained small "compress-and-reconstruct" models:
    • A linear autoencoder (no nonlinearity), which acts like finding the main trends in the data.
    • A ReLU autoencoder, which can block negative interference (think: keep only helpful, positive parts).
  • Bottleneck and weight decay: They limited the model's "memory size" (the bottleneck) and sometimes used weight decay (a gentle penalty that encourages simpler, smaller weights). Both push the model to share space efficiently.
  • Synthetic tests: They first made fake data with 12 features arranged in a circle (like months) to see how models handle structured correlations.
  • Real tests: They applied the approach to real text (WikiText-103) and visualized the learned feature directions using simple 2D projections to see patterns.
  • Case studies: They examined:
    • Months of the year (do they form a circle?),
    • Semantic clusters (do similar words group together?),
    • The Beatles (do related words support each other's reconstruction?),
    • Months vs Roman numerals (do patterns vanish at different model sizes?),
    • "Value-coding" tasks (modular addition circles and city map coordinates), to show when circles/maps appear without co-occurrence.
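The synthetic circular setup can be mimicked in a few lines. The sampling scheme below is a guess at the spirit of the construction (a center feature co-activating its circular neighbors), not the authors' exact generator:

```python
import random

random.seed(0)

def sample(n_features=12, spread=1):
    """One synthetic sample: pick a centre feature and co-activate its
    circular neighbours, mimicking seasonal co-occurrence of months."""
    centre = random.randrange(n_features)
    x = [0] * n_features
    for offset in range(-spread, spread + 1):
        x[(centre + offset) % n_features] = 1
    return x

data = [sample() for _ in range(2000)]

def coact(i, j):
    """How often features i and j are active in the same sample."""
    return sum(x[i] * x[j] for x in data)

# Adjacent features co-activate often; opposite features never do here.
adjacent, opposite = coact(0, 1), coact(0, 6)
```

The resulting co-activation matrix is circulant, so its top principal components are sinusoidal, which is the data-side reason a model trained on it can arrange the twelve features in a circle.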

Key ideas explained in plain terms

  • Superposition: Storing more features than there are dimensions by letting them share directions.
  • Interference: When overlapping features affect each other; this can be bad (static) or good (harmony).
  • ReLU: A gate that keeps positive signals and blocks negative ones, preventing false alarms.
  • Constructive interference: When related features overlap in ways that reinforce each other (e.g., "December" boosting "Christmas").
  • Principal components (PCA): The main directions in which the data varies, like the biggest trends in a crowd.
  • Bottleneck: A small number of "lanes" for many features; forces sharing and clever packing.
  • Weight decay: Encourages simpler, smaller weights; nudges the model to find shared structure.
  • Presence-coding vs value-coding:
    • Presence-coding: "Is this word here?" (binary detection)
    • Value-coding: "What is the value/position?" (like an angle or coordinates). Value-coding can create circles/maps even when there are no co-occurrence patterns.

What did they find, and why is it important?

  1. Interference can be helpful when features are correlated
  • In toy examples with circularly related features and in real text, both linear and ReLU autoencoders often arrange features so that related ones share directions.
  • This lets interference become constructive: the overlap helps the model reconstruct what's present with less effort.
  2. Two strategies can work together
  • The model often combines:
    • Constructive arrangement (pack related features together so they help each other), and
    • ReLU filtering (block any leftover harmful interference and avoid false positives).
  • Example: With "Beatles" words, related names (Lennon, McCartney, etc.) often improve reconstruction when they appear together; when similar context appears without "Beatles," the ReLU blocks false activation.
  3. Real patterns match what's seen in LLMs
  • Semantic clusters: When the model's memory is tight or weights are kept small, word features group by meaning (verbs together, sports terms together, etc.).
  • Circular structures: The months of the year form a circle because their co-occurrence follows the seasons. The model mirrors this structure, and "December" can help reconstruct "Christmas" in context.
  4. Not all circles come from co-occurrence: sometimes it's value-coding
  • In modular addition (math with wrap-around) and city coordinates, circles/maps appear because the model encodes continuous values (like sine/cosine or latitude/longitude), not because words co-occur.
  • When the authors "ablate" (remove) everything except these value-coding directions, the model still works well, proof that these value features are doing the heavy lifting.
  5. Different features "de-superpose" at different speeds
  • As the model's memory grows, some groups (like months) become nearly independent (orthogonal) sooner than others (like Roman numerals).
  • This shows real data is a mix: some features rely more on constructive sharing; others can be split apart with more capacity.
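A toy numeric version of the "two strategies work together" finding, with entirely hypothetical weights: correlated cluster words share a latent direction so co-occurrence is constructive, while a negative bias plus the ReLU blocks weak, off-cluster activation:

```python
def relu(t):
    return max(0.0, t)

# Hypothetical weights: "lennon" and "mccartney" share one latent direction
# with "beatles"; a generic context word projects onto it only weakly.
W_ENC = {"lennon": 0.7, "mccartney": 0.7, "context": 0.3}
W_DEC_BEATLES = 0.7
BIAS = -0.3

def read_beatles(active_words):
    h = sum(W_ENC[w] for w in active_words)  # one-dimensional latent code
    return relu(W_DEC_BEATLES * h + BIAS)    # decode, then filter with bias+ReLU

supported = read_beatles(["lennon", "mccartney"])  # constructive: reads ~0.68
spurious  = read_beatles(["context"])              # blocked by the bias: 0.0
```

The same direction thus carries both mechanisms at once: in supporting contexts the correlated names push the readout well past the threshold, while a lone context word lands below it and is zeroed out.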

So what's the big picture?

  • New perspective: Superposition isn't just a problem to be fixed; it can be a strategy. When features are related, the model can arrange them so overlap helps rather than hurts.
  • Explains real observations: The paper helps explain why LLMs show semantic clusters and neat circles (like months) in their internal spaces.
  • Practical impact:
    • Better interpretability tools: Knowing when overlap is helpful can guide how we design and evaluate feature-finding methods (like sparse autoencoders).
    • Training choices: Tight bottlenecks and weight decay encourage efficient, constructive sharing, which is useful for compact or robust models.
    • Safer edits and robustness: Understanding feature geometry may help with knowledge editing and defending against adversarial tricks.

Limitations and future work: BOWS is intentionally simple and doesn't capture everything about LLMs. The next steps include analyzing more realistic settings and precisely predicting when a model will prefer constructive sharing versus strict separation.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise, actionable list of what remains missing, uncertain, or unexplored in the paper's account of how correlations shape superposition.

  • Formal dominance conditions: Derive precise, testable conditions (in terms of the spectrum of E, latent size m, weight decay λ, and bias b) that predict when constructive interference (linear superposition) will dominate over interference filtering (non-linear superposition) and vice versa, including error bounds on R² as a function of spectral decay.
  • Transition thresholds with capacity: Characterize and validate the critical latent dimension(s) at which the model transitions from PCA-like circular structure to antipodal/non-linear regimes; study how these thresholds scale with d, spectral gaps, and feature sparsity.
  • Beyond tied, single-layer AEs: Extend analysis and experiments to untied-weight decoders, deeper autoencoders, and architectures closer to transformer blocks to test whether linear superposition emerges under realistic model inductive biases.
  • Application to real LM activations: Directly test in pretrained LLM hidden states whether the constructive-interference account explains observed clusters/cycles; quantify how much of the geometry can be attributed to low-rank co-activation structure versus other factors.
  • Robustness to distribution shift: Quantify how constructive interference degrades when test-time co-activation patterns deviate from training (e.g., altered seasonality, topic reweighting); report false-positive/false-negative rates as a function of bias thresholds and correlation mismatch.
  • Systematic regularization study: Map how different regularizers (L2, L1, sparsity penalties, orthogonality constraints), dropout, and initialization affect the balance between constructive and filtering solutions and the resulting feature geometry.
  • Nonlinearity choice: Evaluate how GELU, leaky-ReLU, soft-thresholding, or sigmoid affect the coexistence of constructive interference and filtering, including how learned biases adapt across nonlinearities.
  • Quantifying constructive vs destructive interference: Develop a general contribution-decomposition metric that partitions each feature's reconstruction into self vs contextual contributions and positive vs negative interference, and report distributions across the vocabulary.
  • Reliable geometry metrics beyond UMAP: Supplement visualizations with quantitative, reproducible measures (e.g., PCA variance explained, pairwise cosine matrices, anisotropy indices, silhouette scores, mutual information with human annotations) and report seed variability.
  • Causal tests on corpus statistics: Perform interventions that preserve marginals but scramble co-occurrences (e.g., permuting time-related words) to causally verify that observed circles/clusters arise from covariance structure rather than visualization artifacts.
  • Sensitivity to BOWS design choices: Probe how vocabulary size V, context window size c, binary vs count features, TF-IDF weighting, and inclusion of stop-words/subwords affect the covariance spectrum and the learned geometry.
  • Higher-order interactions: Assess whether higher-order (beyond pairwise) co-activations meaningfully shape geometry; construct datasets with controlled higher-order structure and test resulting arrangements.
  • Negative correlations and exclusivity: Study geometry for negatively correlated or mutually exclusive features (e.g., antonyms, disjoint categories) to see if specific antagonistic structures emerge and how ReLUs handle them.
  • Rare-word regime under Zipf's law: Systematically analyze how frequency and context diversity predict whether a feature becomes linear-superposed, non-linear, or nearly orthogonal; model the frequency-correlation-geometry trade-off.
  • Sample complexity and generalization: Establish how many samples are required to learn a projector close to the true principal subspace and to keep constructive interference aligned at test time; provide concentration bounds or empirical scaling laws.
  • Bias-setting theory: Develop predictions for learned negative biases from data statistics (e.g., means, variances, off-subspace residuals) to control false positives; test biasโ€“threshold calibration procedures.
  • Downstream impact: Connect geometry to task performance by measuring whether constructive interference improves downstream metrics (e.g., perplexity, classification, probing) and whether forcing PCA-like geometry via regularization helps or hurts.
  • Cross-corpora and modalities: Validate findings across multiple corpora (beyond WikiText/OpenWebText) and modalities (vision/audio) where co-occurrence structure differs, and test whether constructive interference consistently creates semantic clusters.
  • Relation to classic embeddings: Compare AE-induced geometry with Word2Vec/GloVe/PMI factorization on the same corpora; isolate when similarities arise from shared low-rank co-occurrence structure versus differences due to reconstruction objectives.
  • SAEs on BOWS as a benchmark: Operationalize BOWS as a benchmark for sparse autoencoders by defining quantitative geometry recovery metrics (e.g., alignment with known PCs, cluster ordering scores) and evaluating SAEs' ability to recover ground-truth structures.
  • Distinguishing presence- vs value-coding in practice: Propose diagnostics and pipelines to separate manifolds arising from value codes (e.g., (sin θ, cos θ)) from correlation-driven superposition in real models; apply to calendar/rotation/geography features in LLMs.
  • Ablations in BOWS: Mirror the value-coding ablations by ablating principal-subspace vs orthogonal components in BOWS-trained AEs to quantify how much reconstruction depends on low-rank projections vs ReLU filtering.
  • Scaling laws in high dimensions: Explore how geometry evolves as d, V, and m scale to realistic LM sizes; assess whether PCA-like structures persist or break under extreme overcompleteness.
  • Training dynamics and path dependence: Investigate whether solutions converge to different geometries depending on optimization schedule, learning rates, or early stopping, and whether phase transitions occur during training.
  • Formal links to constrained PCA: Provide a rigorous connection between ReLU AEs with weight decay and constrained PCA (or projectors with thresholds), clarifying when the non-linear model effectively implements a soft projector.
  • Interaction with adversarial robustness and editing: Empirically test whether constructive-interference geometries alter adversarial vulnerability or the locality of knowledge edits by measuring transfer/interference among correlated features.
  • Tokenization and sequential structure: Move beyond bag-of-words to subword tokenization and sequential models that capture order/syntax; examine whether constructive interference remains a primary driver of geometry when order-sensitive correlations are present.
  • Untested hyperparameter spaces: Report systematic sweeps over m, weight decay, and biases (rather than snapshots) with confidence intervals to ensure robustness of the claimed regimes.
  • Generalized decoders: Compare per-feature linear decoders to joint multi-output decoders and to sparse readouts to ensure the "linear superposition" designation is not an artifact of decoder choice.
  • Practical detection algorithms: Develop tools to automatically identify cycles, clusters, and antipodal pairs in learned features and to classify each feature's regime (linear-superposed vs non-linear vs orthogonal) at scale.
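Two of the quantitative supplements to UMAP plots suggested above can be sketched directly; the conventions here (feature directions as rows of a weight matrix, anisotropy as the top singular direction's share of squared norm) are assumptions for illustration:

```python
import numpy as np

def cosine_matrix(W):
    """Pairwise cosine similarities between feature directions (rows of W)."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    return Wn @ Wn.T

def anisotropy(W):
    """Share of total squared norm captured by the top singular direction.

    Values near 1 indicate a highly anisotropic (tightly clustered) geometry;
    values near 1/min(W.shape) indicate a nearly isotropic spread.
    """
    s = np.linalg.svd(W, compute_uv=False)
    return (s[0] ** 2) / np.sum(s ** 2)

# A tight cluster of nearly parallel feature vectors is flagged by both metrics.
rng = np.random.default_rng(1)
base = rng.normal(size=8)
W = np.stack([base + 0.05 * rng.normal(size=8) for _ in range(6)])
C = cosine_matrix(W)
```

Unlike 2D projections, both quantities are deterministic given W, so they can be reported with seed variability across training runs as the bullet above requests.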

Practical Applications

Immediate Applications

The following applications can be deployed now using the paper's findings, code, and training recipes. They focus on exploiting constructive interference in correlated features, distinguishing linear vs non-linear superposition, and recognizing value-coding features.

  • Interpretability toolkit upgrade for LLMs and foundation models
    • What: Add BOWS-based diagnostics (from the paper's code) to existing pipelines: (i) linear vs non-linear superposition R² tests via linear decoders on selected feature sets; (ii) geometry dashboards (PCA/UMAP, off-diagonal Frobenius norms) to detect semantic clusters and circular structures; (iii) bias/threshold checks for ReLU filtering.
    • Sector(s): Software/AI research, safety.
    • Tools/products/workflows: "BOWSBench" module integrated into interpretability suites; CI jobs that flag geometry shifts after fine-tuning; feature-geometry reports in model cards.
    • Assumptions/dependencies: Linear Representation Hypothesis (LRH) approximately holds for targeted features; access to hidden activations and weights; tied or inspectable weights; negative biases available or emulatable in decoders.
  • Better sparse autoencoder (SAE) training recipes for feature discovery
    • What: Train SAEs with tighter bottlenecks and modest weight decay to bias toward constructive (low-rank) superposition when data are correlated; monitor weight norms and rank proxies to avoid over-orthogonalization.
    • Sector(s): Software, academia.
    • Tools/products/workflows: Updated SAE configs; norm/rank dashboards; curriculum to alternate bottleneck size and weight decay; SAE evaluation using BOWS with known ground-truth geometry.
    • Assumptions/dependencies: Availability of correlated features in the target layer; compute for hyperparameter sweeps; willingness to accept anisotropic feature clusters when beneficial.
  • Safer knowledge editing and fine-tuning via cluster-aware adjustments
    • What: When editing a concept (e.g., "December"), propagate edits across its correlated cluster (e.g., other months, season words) to preserve constructive interference and avoid regressions; verify with linear-superposition tests.
    • Sector(s): NLP products, AI Ops.
    • Tools/products/workflows: Geometry-aware editing scripts; batch re-biasing for ReLU thresholds; regression tests that compare one-hot vs contextual reconstructions.
    • Assumptions/dependencies: Identifiable clusters/cycles in the target layer; access to model internals; automated evaluation sets reflecting real co-occurrence patterns.
  • Low-rank compression and distillation guided by data covariance
    • What: Use PCA/linear AEs to compress representations where feature covariance is approximately low-rank; exploit constructive interference to preserve semantics with fewer dimensions; deploy in on-device or latency-sensitive settings.
    • Sector(s): Mobile/edge AI, enterprise software.
    • Tools/products/workflows: Low-rank adapters; layer-wise PCA distillation; post-training projection matrices embedded as lightweight projectors.
    • Assumptions/dependencies: Spectrum concentration in target layers; acceptable accuracy-latency trade-offs; calibration to avoid false positives on residual variance.
  • Retrieval and search indexing with bag-of-words constructive compression
    • What: For document-level search or RAG, index documents via BoW+PCA encodings tuned to capture semantic clusters and cycles (e.g., time/season topics), improving recall and memory footprint.
    • Sector(s): Information retrieval, enterprise search.
    • Tools/products/workflows: Indexers that build co-occurrence covariance and project onto top PCs; hybrid BoW+embedding pipelines.
    • Assumptions/dependencies: Document domains with stable co-occurrence structure; precomputed covariance remains valid over time; monitoring drift.
  • Adversarial robustness and red-teaming diagnostics
    • What: Add tests that distinguish constructive vs harmful interference; evaluate whether weight decay and bottlenecks reduce vulnerability by aligning interference with signal; probe features that flip under adversarial contexts.
    • Sector(s): Security, AI safety.
    • Tools/products/workflows: Robustness evals reporting interference alignment scores; off-diagonal norm tracking under adversarial prompts.
    • Assumptions/dependencies: Attack scenarios that exploit interference; representative adversarial contexts; minimal false sense of security, as non-correlated regimes still need ReLU filtering.
  • Time-series and seasonal modeling with feature cycles
    • What: Encode seasonal/cyclical structure (months/days) using linear superposition to reduce dimensionality while preserving seasonality for forecasting and anomaly detection.
    • Sector(s): Energy, retail, finance (forecasting).
    • Tools/products/workflows: Preprocessors that project seasonal indicators onto cyclic PCs; lighter models for seasonal components.
    • Assumptions/dependencies: Clear periodic components; stable seasonal co-occurrence; integration with existing ML stacks.
  • Recommender systems: capacity sharing for correlated items
    • What: Encourage constructive interference by applying weight decay and controlled bottlenecks in item/user embeddings, allowing correlated items to share dimensions without harming ranking.
    • Sector(s): E-commerce, media.
    • Tools/products/workflows: Embedding training with norm constraints; interference-aware negative sampling; cluster-level calibration.
    • Assumptions/dependencies: Sufficiently correlated item groups; careful mitigation of popularity bias.
  • Robotics and mapping: value-coding probes for spatial variables
    • What: Use linear probes to verify value-coding for coordinates or angles (e.g., sin/cos), ensuring models learned the intended continuous variables, and ablate non-value subspaces for explainability checks.
    • Sector(s): Robotics, autonomous systems.
    • Tools/products/workflows: Probe libraries; ablation tests that preserve value-coding while removing orthogonal subspaces.
    • Assumptions/dependencies: Tasks that require continuous variables; robust linear decodability.
  • Education and training materials on superposition regimes
    • What: Use BOWS notebooks to teach linear vs non-linear superposition, constructive interference, and value-coding; illustrate cycles (months) and clusters in small AEs.
    • Sector(s): Education, workforce upskilling.
    • Tools/products/workflows: Course modules, assignments using the paper's repo; visual dashboards.
    • Assumptions/dependencies: Classroom compute and familiarity with AEs.
  • Governance and audits: representation-geometry reports
    • What: Include feature-geometry diagnostics (clusters/cycles, linear-vs-non-linear R²) in model audits and documentation; track shifts post-fine-tuning.
    • Sector(s): Policy, compliance, enterprise ML governance.
    • Tools/products/workflows: Audit checklists and standardized plots; regression thresholds for geometry metrics.
    • Assumptions/dependencies: Access to representations; organizational willingness to include interpretability criteria in go/no-go gates.

Long-Term Applications

These applications require further research, scaling, or ecosystem maturation (e.g., broader agreement on interpretability standards, generalization from AEs/BOWS to large-scale LMs and multimodal models).

  • Superposition-aware architectures and regularizers
    • What: Build training objectives that explicitly encourage constructive interference for correlated features while penalizing harmful interference; dynamic biasing for ReLU thresholds based on estimated residuals.
    • Sector(s): Software/AI platforms.
    • Tools/products/workflows: New loss terms (e.g., covariance-aligned projectors), adaptive weight decay, learnable thresholding layers.
    • Assumptions/dependencies: Reliable covariance estimation during training; stability in deep architectures beyond tied AEs.
  • Automated geometry-aware knowledge editors
    • What: Tools that identify correlated clusters and cyclic structures and apply minimal edits (projector updates, bias shifts) that preserve constructive interference while altering targeted knowledge.
    • Sector(s): AI Ops, NLP products.
    • Tools/products/workflows: “Geometry Editor” with constraints/solvers; sandboxed simulation of contextual reconstructions vs one-hot reconstructions.
    • Assumptions/dependencies: Accurate cluster discovery; robust simulators that extrapolate to deployment contexts.
  • Standard benchmarks and certifications for feature geometry
    • What: Industry-wide benchmarks (extending BOWS) and certifications that require reporting on superposition regimes (linear vs non-linear prevalence, interference utilization).
    • Sector(s): Policy, standards, procurement.
    • Tools/products/workflows: Shared datasets with known geometry; certification rubrics; third-party audit services.
    • Assumptions/dependencies: Community consensus; reproducibility across model families; regulator buy-in.
  • Hardware and systems that exploit low-rank constructive interference
    • What: Accelerators with fast low-rank projectors and dynamic biasing to benefit from covariance-aligned representations; memory hierarchies tuned for projector reuse.
    • Sector(s): Semiconductors, cloud providers.
    • Tools/products/workflows: Kernel libraries for projector inference; compiler passes that detect and fuse projection patterns.
    • Assumptions/dependencies: Persistent lowโ€‘rank structure in production workloads; benefits outweigh complexity.
  • Multimodal generalization
    • What: Apply constructive interference principles to vision/audio (e.g., co-occurring visual attributes, phoneme-word co-activations), guiding feature geometry and compression.
    • Sector(s): Multimodal AI (vision, speech).
    • Tools/products/workflows: Cross-modal BOWS analogs (e.g., bag-of-attributes); joint projectors across modalities.
    • Assumptions/dependencies: Robust co-occurrence statistics; alignment with downstream task performance.
  • Bias and safety interventions using cluster maps
    • What: Map and monitor harmful or biased associations as correlated clusters; reconfigure geometry to weaken undesirable constructive interference while preserving utility.
    • Sector(s): Public policy, platform safety.
    • Tools/products/workflows: “Bias-cluster” dashboards; geometry-preserving debiasing operations (e.g., nullspace projections).
    • Assumptions/dependencies: Reliable identification of harmful clusters; minimal sideโ€‘effects on benign correlations.
  • Clinical NLP with correlationโ€‘aware representations
    • What: Use constructive interference to represent symptom clusters and comorbidities efficiently, supporting summarization and decision support while minimizing false positives via thresholding.
    • Sector(s): Healthcare.
    • Tools/products/workflows: Clinical AEs trained with weight decay; safety-critical ReLU threshold calibration; audit logs for edits to medical concept clusters.
    • Assumptions/dependencies: High-quality labeled or weakly supervised clinical corpora; regulatory validation; privacy controls.
  • Finance: factor modeling via learned constructive interference
    • What: Align embeddings with correlated risk/alpha factors using low-rank projectors; enable compact models that preserve factor structure and improve interpretability of signals.
    • Sector(s): Finance.
    • Tools/products/workflows: Representation audits against known factor covariance; geometry-aware portfolio construction features.
    • Assumptions/dependencies: Stable factor correlations; stringent validation against overfitting and regime changes.
  • Continual and multitask learning with shared low-rank cores
    • What: Share capacity across correlated tasks/features through a learned projector core; add task-specific ReLU thresholds to suppress residuals.
    • Sector(s): Enterprise ML, AutoML.
    • Tools/products/workflows: Adapter stacks composed of projectors + threshold layers; task-aware covariance tracking.
    • Assumptions/dependencies: Task correlation; effective avoidance of negative transfer.
  • Human-in-the-loop UIs for feature geometry exploration
    • What: Interfaces that visualize clusters and cycles, let users test one-hot vs contextual reconstructions, and propose safe edit plans that maintain constructive interference.
    • Sector(s): Productivity tools, ML platforms.
    • Tools/products/workflows: Interactive PCA/UMAP canvases; R² probe widgets; “edit impact” previews.
    • Assumptions/dependencies: Usable performance at scale; clear mental models for nonโ€‘experts.
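The nullspace projections mentioned in the bias-and-safety item can be sketched in a few lines. This is an illustrative toy (the cluster directions are random placeholders; in practice they would come from cluster discovery on real representations):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a harmful cluster spans a few known directions in
# representation space; project it out while preserving the orthogonal
# (benign) subspace.
d = 64
cluster_dirs = rng.normal(size=(3, d))   # assumed cluster directions (placeholder)
Q, _ = np.linalg.qr(cluster_dirs.T)      # orthonormal basis for the cluster, (d, 3)
P_null = np.eye(d) - Q @ Q.T             # orthogonal projector onto the nullspace

x = rng.normal(size=d)                   # a representation vector
x_debiased = P_null @ x

# The cluster component is removed; everything orthogonal to it is untouched.
print(np.abs(Q.T @ x_debiased).max())    # ~0: no remaining cluster signal
```

Because `P_null` is idempotent, repeated application is a no-op, which makes the edit easy to audit; the open question flagged above is whether removing a cluster's constructive interference has side effects on benign correlated features.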

Notes on overarching assumptions and limitations:

  • Constructive interference is most effective when feature covariance is approximately low-rank; spectrum concentration is a key dependency.
  • Results are demonstrated in tiedโ€‘weight autoencoders with ReLU decoders and bagโ€‘ofโ€‘words data; generalization to large transformers and untied architectures requires additional validation.
  • The value-coding vs presence-coding distinction is crucial: geometry arising from value codes can exist without superposition and requires different diagnostics and interventions.

Glossary

  • Anisotropic superposition: A feature arrangement where related features cluster rather than minimizing pairwise overlaps, contrary to isotropic/regular arrangements. "as well as anisotropic superposition, where related features cluster together rather than minimizing dot products"
  • Antipodal pairs: A geometry where features are placed as opposite (negatively correlated) directions so one suppresses the other via a nonlinearity. "represents features as antipodal pairs"
  • Bag-of-Words Superposition (BOWS): A controlled framework that encodes binary bag-of-words text in superposition to study realistic feature correlations. "we introduce Bag-of-Words Superposition (BOWS), a framework in which an autoencoder is trained to encode binary bag-of-words representations of internet text in superposition."
  • Bottlenecks (tight bottlenecks): Strong compression regimes where the latent dimension is much smaller than the number of features. "these solutions emerge prominently under tight bottlenecks or weight decay"
  • Co-activation patterns: Patterns of features that tend to be active together, used to arrange features so interference is constructive. "arranging features according to their co-activation patterns naturally gives rise to semantic clusters"
  • Coefficient of determination (R2): A per-feature measure of reconstruction quality comparing predicted and true values. "we define the per-feature coefficient of determination"
  • Constructive interference: Interference among features that aligns with and reinforces the target signal rather than harming it. "interference can be constructive rather than just noise to be filtered out."
  • Cyclical structures: Circular arrangements of related features (e.g., months) in representation space. "cyclical structures which have been observed in real LLMs"
  • Dictionary learning: Methods that learn sparse, interpretable components (atoms) composing representations. "sparse dictionary learning approaches like sparse autoencoders to decompose model activations into an overcomplete basis of linear features"
  • Frobenius norm (off-diagonal): A matrix norm used here to quantify interference via the magnitude of off-diagonal entries. "off-diagonal Frobenius norms"
  • Latent dimension (m): The size of the hidden representation (bottleneck) in the autoencoder. "with varying latent dimensions m."
  • Linear decoder: A linear mapping used to reconstruct features from the latent representation. "train a linear decoder (without a ReLU) to reconstruct their inputs."
  • Linear Representation Hypothesis (LRH): The hypothesis that high-level concepts are linearly represented in model activations. "Definition 4 (Linear Representation Hypothesis)."
  • Linear superposition: A regime where correlated features can be recovered with a linear decoder due to low-rank structure. "We refer to the regime in which low-rank structure in the data supports constructive interference as linear superposition."
  • Low-rank structure: Data covariance concentrated in a few principal components, enabling efficient projection-based reconstruction. "including approximately low-rank structure"
  • Non-linear autoencoder: An autoencoder that uses a nonlinearity (e.g., ReLU) in its reconstruction pathway. "non-linear autoencoders can exploit interference constructively"
  • Non-linear superposition: Superposition where accurate recovery requires a non-linear decoder and cannot be achieved linearly. "as an example of non-linear superposition."
  • Orthogonal complement: The subspace perpendicular to a chosen feature subspace; often ablated to test reliance on particular codes. "zeros its orthogonal complement"
  • Orthogonal projector: A linear operator projecting data onto a subspace, such as the top principal components. "the orthogonal projector onto the top-m principal components"
  • Overcomplete basis: A set of representing directions exceeding the ambient dimensionality, necessitating superposition. "over-complete basis"
  • PCA (Principal Component Analysis): A method that identifies directions of maximal variance; used to reveal circular structures in features. "PCA applied directly to the 12 month dimensions"
  • Pointwise Mutual Information (PMI): A measure of word association used in embedding theory to relate co-occurrence and vector factorization. "Pointwise Mutual Information (PMI) matrix"
  • Presence-coding features: Features that act as detectors for discrete properties, decodable by linear classifiers. "Presence-coding features."
  • Principal subspace: The subspace spanned by the leading principal components identified by PCA. "projection onto the principal subspace"
  • Regular polytopes: Highly symmetric geometric arrangements where pairwise dot products are minimized or uniform. "yielding local structures like regular polytopes."
  • ReLU (Rectified Linear Unit): A nonlinearity that zeroes negative inputs, used to suppress harmful interference. "ReLU filters out interference"
  • ReLU-based filtering: Using ReLU and biases to eliminate negative or spurious activations arising from interference. "ReLU-based filtering remains important for suppressing harmful interference"
  • Semantic clusters: Groupings of features by meaning that emerge in representation space under correlations and constraints. "giving rise to semantic clusters"
  • Sparse autoencoders (SAEs): Autoencoders trained with sparsity to recover interpretable features from activations. "sparse autoencoders (SAEs)"
  • Superposition: Representing more features than dimensions by sharing directions, allowing interference among features. "arranging them in superposition to form an overcomplete basis"
  • Tied weights: An architecture where decoder weights are the transpose of encoder weights. "For tied-weight AEs"
  • UMAP: A non-linear dimensionality reduction technique used to visualize learned feature geometries. "UMAP projections of the word embeddings"
  • Value-coding features: Features that linearly encode continuous variables (e.g., angles, coordinates) used for computation. "we say that a representation h(x) contains a value-coding feature"
  • Weight decay: L2 regularization that penalizes large weights, biasing models toward low-norm solutions. "more prevalent in models trained with weight decay"
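Several glossary terms (antipodal pairs, tied weights, superposition, ReLU-based filtering) can be illustrated with one toy tied-weight autoencoder, kept deliberately minimal and not taken from the paper's experiments:

```python
import numpy as np

# Two features stored as an antipodal pair in a single latent dimension
# (2 features, 1 dimension: superposition), with tied decoder weights and a
# ReLU that filters out the negative interference from the inactive feature.
W = np.array([[1.0], [-1.0]])   # encoder: feature 0 -> +1, feature 1 -> -1
b = np.zeros(2)

def reconstruct(x):
    h = x @ W                              # 1-D latent code
    return np.maximum(h @ W.T + b, 0.0)    # tied decoder + ReLU filtering

print(reconstruct(np.array([1.0, 0.0])))   # feature 0 active -> [1, 0]
print(reconstruct(np.array([0.0, 1.0])))   # feature 1 active -> [0, 1]
```

Activating feature 0 pushes feature 1's pre-activation to −1, which the ReLU zeroes out; this is the standard "interference as noise to be filtered" picture that the paper's constructive-interference results extend.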
