ATLAS: Constitution-Conditioned Latent Geometry and Redistribution Across Language Models and Neural Perturbation Data

Published 19 Apr 2026 in cs.LG, cs.AI, and cs.CL | (2604.17663v1)

Abstract: Constitution-conditioned post-training can be analysed as a structured perturbation of a model's learned representational geometry. We introduce ATLAS, a geometry-first program that traces constitution-induced hidden-state structure across charts, models, and substrates. Instead of treating the relevant unit as a single behaviour, neuron, vector, or patch, ATLAS tests a local chart whose tangent structure, occupancy distribution, and behavioural coupling can be measured under system change. On Gemma, the anchored source-local chart captures 310 / 320 reviewed source rows and all 84 / 84 reviewed score-flip rows, but compact exact-patch sufficiency does not close, so the exportable unit is the broader source-defined family. Freezing that family, we re-identify a target-local realisation in an unadapted Phi model, where the fully adjudicated confirmatory contrast separates with AUC 0.984 and mean gap 5.50. In held-out ALM8 mouse frontal-cortex perturbation data, the same source-defined family receives support across 5/5 folds, with mean held-out AUC 0.72 and mean fold gap 4.50. A multiple-choice analysis provides the main boundary: nearby target-local signals can appear without source-faithful closure. The resulting correspondence is not coordinate identity, site identity, or a target-side mediation theorem. It is geometric recurrence under redistribution: written constitutions can induce recoverable latent geometry whose organisation remains detectable across model and substrate changes while its local coordinates, occupancy, and behavioural expression shift.


Authors (5)

Summary

The paper introduces ATLAS, demonstrating that constitution-conditioned post-training imprints a recoverable latent geometry in language models.
It employs quantitative metrics such as tangent space overlap, occupancy distribution, and AUC to validate redistributive geometric patterns across models and neural data.
The study finds that latent geometry is re-identified at the family level, with behavioral separation persisting despite occupancy redistribution.

Constitution-Conditioned Latent Geometry and Redistribution in LLMs and Neural Perturbation Data

Introduction and Problem Setup

This work develops ATLAS, a representational-geometry and cross-system analysis framework, to examine the effects of constitutional post-training on LMs and their correspondence with biological neural data. The central question is whether constitution-conditioned post-training imprints a recoverable and reusable structure in the hidden-state space of an LM, and whether this structure constitutes a latent "geometry" that can be robustly re-identified across models and even in biological recordings. The paper uses Gemma 3 1B IT as the constitution-trained source model, Phi-4 Mini Instruct as an unadapted cross-model target, and ALM8 mouse frontal-cortex perturbation data as a held-out biological substrate.

Figure 1: Representation of the experiment's stages—Gemma source-local chart, source-defined family, Phi target-local realization, ALM8 held-out bridge realization, and conceptual claim tier.

The analysis moves beyond output-centric evaluations typical of post-training alignment work, instead establishing a geometry-first empirical program. ATLAS defines and operationalizes local "charts"—tangent subspaces in hidden state space whose organization, distribution, and behavioral coupling are quantitatively tracked through post-training and model transfer.

Formal Framework: Geometry Contracts and Redistributive Recurrence

ATLAS conceptualizes system-internal organization via local charts—low-dimensional structures on specific hidden-state differences, e.g., the adapter-on vs. adapter-off surface. Constitution conditioning induces such a chart on the Gemma source model, quantifiable through measures of tangent space overlap, occupancy distribution, and behavioral separability. Importantly, the underlying structure of interest is not a single neuron or direction, but a localized chart within a broader "source-defined family."

Within this framework, several operational distinctions are made:

Chart-level support: Preservation of tangent structure (basis-angle metrics, empirical tangent measures).
Occupancy: Agreement between the distributions of points over the chart (e.g., Wasserstein and energy distances).
Behavioral coupling: Ability to separate constitution rows from null and random controls behaviorally (AUC, mean gap).
Specificity: Resistance to being explained by null, random, or orthogonal controls.
Redistribution: The regime where tangent structure is preserved but occupancy is not: local coordinates and occupancy shift, while the chart is still present and behaviorally functional.

These distinctions allow precisely scoped claims: what is exported is the source-defined family, not a compact patch or a single direction, and cross-model tests look for re-identification of this family rather than coordinate- or site-faithful transfer.
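The distinction between chart-level support, occupancy, and redistribution can be made concrete with a small sketch. This is an illustration under stated assumptions, not the paper's procedure: the chart is approximated by a top-k PCA basis, tangent overlap by the smallest principal-angle cosine, and occupancy by a per-axis 1-D Wasserstein distance in source chart coordinates. The function names, the thresholds `overlap_min` and `occupancy_max`, and the synthetic `demo_rows` data are all hypothetical.

```python
import numpy as np

def chart_basis(rows, k):
    """Top-k principal directions of the centered rows, as columns (d x k)."""
    centered = rows - rows.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def tangent_overlap(basis_a, basis_b):
    """Smallest principal-angle cosine between two orthonormal bases."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.clip(cosines, 0.0, 1.0).min())

def wasserstein_1d(u, v):
    """1-Wasserstein distance between two equal-size 1-D samples."""
    assert len(u) == len(v), "this shortcut assumes equal sample sizes"
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

def classify(source, target, k=2, overlap_min=0.9, occupancy_max=0.5):
    """Label a source/target pair; both thresholds are illustrative assumptions."""
    basis_s = chart_basis(source, k)
    basis_t = chart_basis(target, k)
    overlap = tangent_overlap(basis_s, basis_t)
    # Compare occupancy in source chart coordinates with a shared origin,
    # so a displacement inside the chart shows up as an occupancy change.
    origin = source.mean(axis=0, keepdims=True)
    coords_s = (source - origin) @ basis_s
    coords_t = (target - origin) @ basis_s
    occupancy = max(
        wasserstein_1d(coords_s[:, j], coords_t[:, j]) for j in range(k)
    )
    if overlap < overlap_min:
        label = "no chart support"
    elif occupancy > occupancy_max:
        label = "redistribution"
    else:
        label = "identity"
    return {"overlap": overlap, "occupancy": occupancy, "label": label}

def demo_rows(shift=0.0, columns=(0, 1), seed=0, n=200, d=8):
    """Synthetic rows on a 2-D plane in d dimensions (hypothetical data)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    latent = rng.normal(size=(n, 2))
    latent[:, 0] += shift
    noise = 0.01 * rng.normal(size=(n, d))
    return latent @ q[:, list(columns)].T + noise
```

Calling `classify(demo_rows(), demo_rows(shift=3.0))` compares a plane with a copy whose points are displaced within the same plane, which is the situation the redistribution regime describes: the tangent structure matches while the occupancy does not.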

Figure 2: Schematic of the shared experimental structure, analysis pathways for structural validation, and discovery procedures.

Gemma: Source-Local Chart and Exportable Family

On Gemma, constitution-conditioned post-training reliably expresses a locally recoverable chart coupled with behavior. The core source-local chart (anchored at reason@checkpoint-2250 in high_effective_mi) captures 310/320 reviewed source rows and all 84/84 reviewed score-flip rows, showing strong linkage between geometry and behavioral change. Compact exact-patch sufficiency is not established, however: informative-constitution patches do not generalize as sole sufficient units, so the analysis relies on a broader family.

Figure 3: Visualization of the Gemma source-local chart, the broader source-defined family, and delineation of exact-patch sufficiency boundaries.

The correct exportable unit for downstream correspondence claims is therefore not a minimal patch but the broader, behaviorally active high_effective_mi family.

Phi: Downstream Structural Validation and Redistribution

The downstream analysis asks whether the frozen Gemma-defined family can be robustly "rediscovered" in an unadapted Phi model under a fixed search contract. The search is constrained to a six-candidate band (layers 23–25, reason and late_reason modes). The selected target-local realization (Phi layer 24, reason, delta, candidate index 5) demonstrates strong downstream behavioral separation on the canonical evaluation: AUC 0.984 (bootstrap 95% CI [0.952, 1.0]) and mean gap 5.50 between constitution and control.

Occupancy support is incomplete: the Phi result shows redistribution rather than strict identity, with tangent structure preserved but occupancy metrics falling short of the threshold for exact identity. Control-suite AUCs do not exceed 0.669, well below the 0.984 of the confirmatory contrast, which supports specificity. The MCQ boundary analysis (discussed below) further clarifies this non-identity.
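The behavioral-coupling quantities reported above (AUC, a bootstrap 95% interval, and the mean gap between constitution and control) can be computed along the following lines. This is a minimal sketch using only the standard library; the percentile bootstrap, the resample count, and the function names are assumptions, since the summary does not specify how the paper's interval was obtained.

```python
import random
from statistics import mean

def auc(pos, neg):
    """Probability that a positive score exceeds a negative one (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_gap(pos, neg):
    """Difference between the mean positive and mean negative score."""
    return mean(pos) - mean(neg)

def bootstrap_auc_ci(pos, neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        p = rng.choices(pos, k=len(pos))
        n = rng.choices(neg, k=len(neg))
        stats.append(auc(p, n))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Here `pos` would hold scores for constitution rows and `neg` scores for control rows. Running the same `auc` function on each control contrast gives the values that a specificity check compares against the confirmatory AUC.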

Figure 4: Structure of the Phi search band, frozen target-local lane, and confirmed cross-model re-identification.

ALM8 Mouse Perturbation: Cross-Substrate Corroboration and Redistribution

The study probes the persistence of these constitution-conditioned families beyond artificial neural systems using ALM8 mouse frontal-cortex photoinhibition data. The cross-substrate protocol is stringent: all assignments (family, relation, atlas rows) are frozen before held-out testing. The held-out test supports the family in all five folds, with mean held-out AUC 0.72 and mean fold gap 4.50. Chart-level structural support holds in every fold, but one of the five folds loses strict occupancy support, which reinforces the theme of redistribution rather than isomorphic replay.
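The fold-level summary above combines per-fold metrics with two support flags. A sketch of that aggregation follows; the `FoldResult` record, its field names, and the example values are hypothetical placeholders and are not the paper's per-fold results.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class FoldResult:
    """Held-out metrics for one fold (hypothetical record layout)."""
    auc: float
    gap: float
    chart_support: bool
    occupancy_support: bool

def summarize(folds):
    """Aggregate per-fold results into the quantities a fold summary reports."""
    return {
        "n_folds": len(folds),
        "mean_auc": mean(f.auc for f in folds),
        "mean_gap": mean(f.gap for f in folds),
        "chart_supported": sum(f.chart_support for f in folds),
        # Chart structure holds but occupancy does not: the redistribution regime.
        "redistributed": sum(
            f.chart_support and not f.occupancy_support for f in folds
        ),
    }

# Placeholder values for illustration only, not taken from the paper.
example_folds = [
    FoldResult(auc=0.6, gap=3.0, chart_support=True, occupancy_support=True),
    FoldResult(auc=0.7, gap=4.0, chart_support=True, occupancy_support=False),
    FoldResult(auc=0.8, gap=5.0, chart_support=True, occupancy_support=True),
]
summary = summarize(example_folds)
```

Keeping the records immutable (`frozen=True`) mirrors the protocol's requirement that nothing is adjusted after held-out testing begins, although that correspondence is only an analogy in this sketch.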

Figure 5: Demonstration of ALM8 held-out corroboration and explicitly visualized redistribution effects across held-out mouse folds.

The cross-substrate recurrence is family-level, not slot-level: local expression in neural data is broader and shifted outward, yet the induced geometry remains detectable.

Boundary Conditions and Negative Results: Local Signal and Failure of Source-Faithful Closure

The multiple-choice (MCQ) route provides the strongest limiting test. Here, a local signal is present in Phi, but without behavioral separation and without closing structural correspondence. Both the reasoning and the answer-inclusive subspaces yield AUCs below 0.5 (0.434 and 0.452) with negative mean gaps, and neither beats random or orthogonal controls. Diagnostic hidden-state analysis reveals one-sided re-entry (110/192 informative rows vs. 0 for control), with failure modes dominated by displacement rather than rotation. This decoupling demonstrates that local target activity does not imply source-faithful or behaviorally causal structure.

Figure 6: Illustration of the MCQ boundary, indicating local signal, one-sided structural re-entry, and predominant displacement driving non-identity.

Operational Consequences, Bounded Auditability, and Future Directions

The implications for model auditing and interpretability are sharply delimited. Family-level recurrent structures, once identified, can be frozen and subjected to post hoc audits and targeted prompt manipulation. However, these units are not "replay laws," causal monitors, or universal repair primitives. Prompt manipulations can change local structure without a proportional effect on final behavior, indicating that behavioral coupling is weaker than geometric mobility.

Figure 7: Bounded operational utility: fixed local targets are auditable and prompt manipulable, but not universally replayable.

Theoretically, the results suggest that constitution-conditioned geometry occupies a mesoscopic scale: families of nearby local subspaces, rather than rigid transformations, persist across architectures and even across the biological-artificial boundary under redistribution. This reframes representational transfer and post-training effects as redistributive and recurrent at the family level, not site- or coordinate-specific. The limits are equally notable: exact occupancy-faithful cross-system replay is not achieved. Future work should focus on geometry-first post-training protocols, explicit family identification, and potentially tighter behavioral coupling under transfer.

Conclusion

ATLAS provides evidence that constitution-conditioned post-training imprints a recoverable, behavior-linked local chart and family structure in LM representational geometry. This structure recurs with tangent-level correspondence and behavioral separability in both an unadapted LM and neural perturbation data, but only at the family or redistributive level: exact site and occupancy identity do not persist. The methodology points toward family-level, auditable, recurrent latent geometry as a scientifically tractable, operationally bounded target for future alignment, model interpretability, and cross-modal neural correspondence studies.

Reference: "ATLAS: Constitution-Conditioned Latent Geometry and Redistribution Across Language Models and Neural Perturbation Data" (2604.17663)
