Beyond Activation Alignment: The Geometry of Neural Sensitivity

Published 4 May 2026 in cs.LG and stat.ML | (2605.03222v2)

Abstract: Activation-alignment measures such as Representational Similarity Analysis (RSA), Canonical Correlation Analysis (CCA), and Centered Kernel Alignment (CKA) are widely used to compare biological and artificial neural representations. Recent theoretical work interprets many of these methods as assessing agreement between optimal linear readouts over broad families of global tasks. However, agreement at the level of global readouts does not determine how a system uses local stimulus evidence. Specifically, representations may align in activation space yet differ in their sensitivity to small perturbations. To address this challenge, we introduce a complementary framework based on local decodable information, which focuses on a representation's ability, under noise, to discriminate small perturbations within a specified stimulus-coordinate subspace. Building on Fisher information and local representation geometry, we summarize each representation using the expected projected pullback/Fisher metric over that subspace. This formulation induces a second-moment family of local discrimination tasks, for which the resulting operator provides a minimal, complete dataset-level summary of expected discriminability. We compare these regularized signatures using a log-spectral distance on the manifold of symmetric positive definite (SPD) matrices, yielding the Spectral Riemannian Alignment Score (S-RAS) and a uniform multiplicative certificate over the corresponding family of lifted task values. Empirically, this framework enables the recovery of corresponding layers across independently trained artificial neural networks, supports transferable class-conditional probes, reveals controlled dissociations between standard and robust training, and uncovers stimulus-coordinate family effects across mouse visual cortex using the Allen Brain Observatory static gratings dataset.

Abstract PDF Upgrade to Chat

Authors (2)

Summary

The paper introduces S-RAS, a framework that uses Fisher pullback metrics to directly capture local discriminability in neural representations.
It rigorously compares neural sensitivity by applying an affine-invariant Riemannian log-spectral distance across different perturbation families, yielding sharp layer correspondence.
Empirical results on CNNs and mouse visual cortex demonstrate that local geometry-based summaries reveal task-relevant differences beyond traditional activation alignment measures.

The Geometry of Neural Sensitivity: A Dataset-Level Framework for Representation Comparison

Introduction

"Beyond Activation Alignment: The Geometry of Neural Sensitivity" (2605.03222) addresses a critical limitation in prevailing methodologies for neural representation comparison. Classical measures—such as RSA, CCA, and CKA—concentrate on activation alignment, which quantifies similarity at the level of evoked activity or linear decodeability across distributed representations. While effective in many respects, these approaches provide at most indirect evidence about whether two systems use local stimulus evidence analogously, that is, whether aligned readouts depend on the same directions in input space. Distinct encodings may yield high alignment under these metrics while diverging in their local geometric sensitivity and task-relevant input processing.

The paper formalizes a dataset-level geometry-grounded alternative, focusing on the system's statistical ability to discriminate infinitesimal perturbations in input space, as measured by the expected Fisher pullback metric projected onto explicit coordinate subspaces. This induces a family of second-moment local discrimination tasks, equipping each representation with a minimal, regularized summary—thereby establishing a direct operational link to the system's expected local evidence use under noise. Comparison is performed using an affine-invariant Riemannian log-spectral distance on SPD matrices, termed the Spectral Riemannian Alignment Score (S-RAS), yielding direct uniform certificates for task-value multiplicative agreement. The framework is systematically validated on diverse settings, including artificial neural networks (NN) and in vivo calcium measurements in mouse visual cortex.

Theoretical Framework: From Activation to Sensitivity Geometry

The core theoretical advance is the definition and operationalization of dataset-level summaries derived from local geometry in stimulus space, as opposed to global readout alignment.

A neural representation $f:\mathbb{R}^d\rightarrow \mathbb{R}^m$ is interrogated by analyzing how its output changes under small input perturbations. The local metric $M_f(x) = J_f(x)^\top J_f(x)$ , with $J_f(x)$ the Jacobian, encodes the amplification and suppression of input directions in stimulus space, directly corresponding to Fisher information under isotropic noise. Crucially, the framework allows the restriction of these local sensitivity calculations to explicit subspaces (e.g., PCA directions, random projections, or natural parameterizations in biological visual protocols).

To lift from pointwise to dataset-level comparison, the paper rigorously establishes that the projected expected Fisher operator fully characterizes the second-moment local discriminability over the specified subspace and stimulus distribution. Invariance to basis changes is formally shown, with statistical sufficiency arguments delineating completeness and minimality of the resulting SPD operator as a representation-level summary.

Figure 1: Illustration of the transition from activation alignment to local geometry-based comparison—structured data is mapped via two representations (f, g) with distinct sensitivity structure; local Fisher ellipses capture directional amplification or suppression; these are aggregated as dataset-level signatures over a perturbation family.

The comparison between two models' geometric summaries is framed via the affine-invariant Riemannian (AIRM) log-spectral distance on SPD matrices, rendering the S-RAS score. This metric provides a normalized, multiplicative control over all lifted second-moment discrimination tasks, as formalized by an explicit sandwich bound: for any quadratic probe, multiplicative similarity under S-RAS bounds the discrepancy between representations.

Empirical Results: Artificial and Biological Application Regimes

Layer Correspondence and Task-Family Dependence

The method is first tested on the classic layer-matching task: identifying corresponding representations across independently trained CNNs using S-RAS as the similarity metric.

Figure 3: S-RAS and CKA layer-matching: mean similarity by layer distance and accuracy as a function of subspace family dimension, demonstrating sharp diagonal matching and improved off-diagonal suppression for local geometry-based signatures.

Under random perturbation families, S-RAS achieves substantial correspondence signal with increasing subspace dimension $K$ (e.g., $85.1\%$ accuracy at $K=768$ ), with PCA-basis family further increasing performance to $93.8\%$ , comparable to linear and RBF CKA. Notably, S-RAS produces stronger off-diagonal suppression, with the correct correspondence better separated from distractor layers—indicating sharper layer-identity disambiguation at the level of local evidence geometry. The PCA-basis family, aligned with principal directions of dataset variation, yields higher identification accuracy than random projections, quantifying the dependence of the sensitivity structure on the chosen perturbation family.

Figure 5: S-RAS similarity matrices under random-basis subspaces as dimension increases from 16 to 1024—strong diagonal concentration and off-diagonal suppression emerge as $K$ grows.

Figure 7: S-RAS similarity matrices under PCA-basis subspaces, showing even tighter diagonal concentration and maximal accuracy at high $K$ when probing the dominant data-variate axes.

Task-family dependence is dissected: as the family aligns more closely with intrinsic data structure (PCA), S-RAS finds correspondence more robustly; with random subspaces, performance tracks expected projection distortion, with accuracy increasing as the subspace grows and distortion vanishes.

Transferable Discriminative Probes

A crucial demonstration is that class-conditional sensitivity contrast derived from S-RAS can be externalized as a small set of explicit finite perturbations ("probes"), whose effects transfer to both held-out images and independently trained models. In the CIFAR-10 ResNet-18 versus ViT setting, the method constructs between-group contrasts in class-conditional projected Fisher summaries and extracts probe directions maximizing group separation in expected local discriminability. When applied to new data, these finite probes yield significant separation ($1.10$ mean image separation at $M_f(x) = J_f(x)^\top J_f(x)$ 0) between model families, outperforming all relevant random/control baselines (Wilcoxon $M_f(x) = J_f(x)^\top J_f(x)$ 1).

Figure 2: Held-out class-conditional diagnostic probes: class-specific contrast probes produce substantial separation across architectures and robust transfer to unseen data and models.

The effect persists across family dimensions, probe counts, and amplitude, evidencing strong family-level stability of the sensitivity structure.

Regime Shifts Under Robust Training

The framework is then applied to detect and characterize regime shifts induced by adversarial training (PGD, TRADES, MART) in ResNet-18, holding architecture constant. S-RAS-based probes derived from regime-level contrasts robustly reveal systematic shifts: contrast probes distinguishing standard from robust training yield high effect sizes on unseen data, with the strongest effects observed between standard and robust regimes.

Figure 8: Controlled regime shifts: class-conditional probes derived from S-RAS contrast reliably separate standard and robust-trained models in held-out evaluation, indicating regime-specific divergence in local evidence geometry.

Discrimination between different robust-training regimes is weaker but present (e.g., PGD versus TRADES)—these selective effects confirm that S-RAS does not trivially flag every regime change, but specifically highlights local evidence-use differences induced by the training objective.

Biological Population Codes: Allen Mouse Visual Cortex

Moving beyond artificial systems, the theory is instantiated using the Allen Brain Observatory static-gratings dataset, where the population response is parameterized by orientation, log-spatial frequency, and phase. For each experiment, the dataset-level Fisher summary is estimated from finite-difference response Jacobians and regularized covariance, yielding a $M_f(x) = J_f(x)^\top J_f(x)$ 2 sensitivity metric for each animal.

Figure 4: Biological sensitivity geometry in mouse visual cortex—experiment-level Fisher summaries and their regularized S-RAS similarities support area and depth identity at above-chance accuracy, outperforming activation baselines and highlighting the role of correlated variability.

Top-1 identity retrieval accuracy (cross-area, matched-count) reaches $M_f(x) = J_f(x)^\top J_f(x)$ 3 for the full Fisher metric, outperforming CKA ( $M_f(x) = J_f(x)^\top J_f(x)$ 4) and unwhitened geometry ( $M_f(x) = J_f(x)^\top J_f(x)$ 5). The best separability is obtained not always by the full coordinate set, but often by coordinate pairs (e.g., $M_f(x) = J_f(x)^\top J_f(x)$ 6), indicating that direction-specific sensitivity is essential for resolving biological population code differences. Shape-normalized (allocation) Fisher is more informative for diagonal AUC and provides incremental utility beyond activation and decoder-profile baselines.

Contrasts and Implications

The major implications are twofold:

Framework Completeness and Family Dependence: S-RAS, encoding projective Fisher/pullback geometry, is both minimal and maximally informative for the specified family of second-moment local discrimination tasks. Agreement under S-RAS implies uniform upper and lower multiplicative bounds across all SDP-probe quadratic discriminabilities, a property provably not shared by activation alignment measures, especially in the presence of post-hoc representation-space transformations.
Operational Utility and Interpretability: Empirical results show that S-RAS captures structure lost to pure activation comparisons—both in engineered and biological systems. Layer correspondence, regime shifts, and class-conditional axes of sensitivity can be precisely and robustly diagnosed, with successful transfer to new data and individuals. In mouse cortex, the method clarifies how structure in local evidence use aligns with anatomical area and depth, above and beyond classical activation- and mapping-based measures.

Limitations and Directions for Future Work

The proposed methodology operates at the dataset- and family-averaged second-moment level. It does not recover image-specific optimal directions, nonlinear finite-perturbation phenomena, or higher-order (beyond quadratic) sensitivity structure. Direct connection to nonlinear readout robustness, information geometry under non-Gaussian noise, and single-trial decoding remain as outstanding avenues. Practically, large-scale architectures and high-dimensional families may necessitate subsampling, lower-rank projections, and computational acceleration (JVP batching).

Future developments could extend the framework to probe higher-order local information structure, expand biological benchmarks to more complex and heterogeneous populations, and explore task-driven adaptivity in the specification of perturbation families. Integration with recent manifold-aware and functional similarity methods provides a direct path toward a more unified theory of evidence accessibility and invariance in neural systems.

Conclusion

This work establishes S-RAS as a mathematically rigorous, experimentally validated, and operationally useful framework for comparing the geometry of local evidence in artificial and biological neural representations. By prioritizing the minimal dataset-level summary of local discriminability, and enabling direct uniform certificates for second-moment task families, it offers interpretable insight into when, how, and why two systems rely on shared or divergent local stimulus evidence, independent of activation alignment. This complements and, in specific critical settings, surpasses the utility of classical representational similarity analysis.

Markdown Report Issue