Differentiable Optimization of Similarity Scores
- Differentiable Optimization of Similarity Scores is a framework for optimizing neural alignment metrics through gradient-based loss functions.
- It integrates various similarity measures, including representational geometry, unit-level matching, and behavioral error metrics, into a unified benchmarking approach.
- The method addresses challenges like bias, computational complexity, and superposition effects, providing actionable insights for both model-to-brain and inter-model comparisons.
Neural alignment metrics quantify the similarity, correspondence, or structural equivalence between neural representations—either within artificial neural networks, between networks and biological systems, or across distinct populations and modalities. These metrics probe alignment at multiple levels: representational geometry, unit correspondence, predictive mapping, behavioral consistency, attention distribution, concept locality, and time-varying embedding stability. The field is highly multidimensional: different metrics capture distinct, sometimes orthogonal, aspects of neural organization and functional correspondence, which is reflected in low correlations and contrasting operational regimes among them (Ahlert et al., 10 Jul 2024). This article details major classes of neural alignment metrics, their mathematical foundations, computational properties, practical domains of use, and their interplay in research and benchmarking contexts.
1. Representational Similarity and Geometric Metrics
Representational similarity metrics compare high-dimensional activation spaces—often stimulus-by-stimulus matrices—produced by neural populations, model layers, or brain data. The most prevalent approaches are as follows:
- Centered Kernel Alignment (CKA): Measures the similarity of Gram (kernel) matrices built from representations of matched stimuli. With linear kernels $K = XX^\top$ and $L = YY^\top$ on centered representations, the linear CKA score is the HSIC-normalized Frobenius inner product of the covariance matrices:

$$\mathrm{CKA}(K, L) = \frac{\mathrm{HSIC}(K, L)}{\sqrt{\mathrm{HSIC}(K, K)\,\mathrm{HSIC}(L, L)}}$$
Correct application in low-data/high-dimensional settings requires an unbiased HSIC estimator that removes spurious inflation due to feature-sample ratio mismatches (Murphy et al., 2 May 2024, Chun et al., 20 Feb 2025). Biased CKA is sensitive to feature/sample imbalances and may report artificially high similarity on random or shuffled data.
- Representational Similarity Analysis (RSA): Correlates representational dissimilarity matrices (RDMs) computed from pairwise dissimilarities between stimulus representations. RSA is invariant to invertible linear re-embeddings and measures the preservation of relational geometry (Bo et al., 21 Nov 2024, Wu et al., 4 Sep 2025).
- Procrustes Distance: Seeks the optimal orthogonal transformation to align two whitened (centered and standardized) representations, minimizing the Frobenius distance $d_{\mathrm{Proc}}(X, Y) = \min_{Q \in \mathcal{O}} \|X - YQ\|_F$. The similarity is often reported as the nuclear norm $\|X^\top Y\|_*$ of the cross-covariance between the normalized representations, maximizing at 1 for identical point clouds after rigid rotation (Bo et al., 21 Nov 2024).
- Normalized Space Alignment (NSA): Decomposes into local (LNSA) and global (GNSA) measures comparing local intrinsic dimensionality and normalized pairwise distances, respectively. NSA can be used as a differentiable loss for training and structure-preserving analysis, with mathematically controlled invariances and unbiased mini-batch properties (Ebadulla et al., 7 Nov 2024).
These geometry-preserving metrics excel at distinguishing trained from untrained models and capturing structure reflected in downstream behavior. However, they lack sensitivity to individual neuron tuning (see below). A minimal sketch of unbiased linear CKA and Procrustes similarity follows.
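The following NumPy sketch illustrates the biased/unbiased HSIC distinction for linear CKA and the nuclear-norm form of Procrustes similarity. It is an illustrative implementation under the definitions above, not code from the cited papers, and the function names are ours.

```python
import numpy as np

def _hsic_unbiased(K, L):
    """Unbiased HSIC estimator (Song et al., 2012); requires n >= 4.
    Diagonals are excluded to remove finite-sample bias."""
    n = K.shape[0]
    K = K - np.diag(np.diag(K))
    L = L - np.diag(np.diag(L))
    ones = np.ones(n)
    term1 = np.trace(K @ L)
    term2 = (ones @ K @ ones) * (ones @ L @ ones) / ((n - 1) * (n - 2))
    term3 = 2.0 * (ones @ K @ L @ ones) / (n - 2)
    return (term1 + term2 - term3) / (n * (n - 3))

def linear_cka(X, Y, unbiased=True):
    """Linear CKA between representations X (n x p) and Y (n x q).
    The biased variant can report inflated similarity when the
    feature/sample ratio is large (see text)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    K, L = X @ X.T, Y @ Y.T
    if unbiased:
        hsic = _hsic_unbiased
    else:
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
        hsic = lambda A, B: np.trace(H @ A @ H @ B) / (n - 1) ** 2
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))

def procrustes_similarity(X, Y):
    """Nuclear-norm Procrustes similarity: after centering and Frobenius
    normalization, the sum of singular values of X^T Y; equals 1 when the
    two point clouds coincide up to an orthogonal transformation."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    return np.linalg.svd(X.T @ Y, compute_uv=False).sum()
```

Comparing `linear_cka(X, Y, unbiased=False)` against the unbiased variant on random data with many more features than samples makes the inflation described above directly visible.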
2. Unit-level and Mapping-based Metrics
Unit-to-unit, permutation-based, and mapping-based metrics seek to establish neuron-level correspondence, crucial for studies where basis-specific properties—such as tuning curves—matter.
- Soft Matching Distance: Defines a metric based on the minimum average squared distance over a doubly stochastic (soft permutation) matching between neuronal tuning curves. For representations $X, Y \in \mathbb{R}^{M \times N}$ whose columns $x_i$, $y_j$ are tuning curves over $M$ stimuli, the soft matching distance is

$$d_{\mathrm{SM}}(X, Y) = \left( \min_{P \in \mathcal{B}_N} \sum_{i,j} P_{ij}\, \|x_i - y_j\|_2^2 \right)^{1/2},$$

where $\mathcal{B}_N$ is the set of $N \times N$ doubly stochastic matrices. This is exactly the Wasserstein-2 distance between the empirical distributions of tuning curves and retains all metric properties (Khosla et al., 2023). For equal-size representations, the optimum is attained at a hard permutation. This approach captures single-unit tuning differences missed by rotation-invariant metrics (see the sketch after this list).
- Semi/Soft-Matching and Regression-based Metrics: Semi-matching maximizes pairwise matching via assignment constraints, while soft-matching relaxes this via optimal transport (Longon et al., 3 Oct 2025). Linear regression (ridge) metrics fit an unconstrained mapping to predict one population's responses from another, quantifying linear overlap irrespective of geometric structure (Bo et al., 21 Nov 2024, Wu et al., 4 Sep 2025).
- Effect of Superposition: Mapping-based metrics are sensitive to the so-called "superposition problem," where representations of the same features can be distributed across different mixtures of neurons. This often deflates mapping-alignment scores. Disentanglement via sparse autoencoders can recover latent codes and boost alignment scores, revealing that basis-dependent metrics systematically underestimate alignment when superposition arrangements differ (Longon et al., 3 Oct 2025).
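As a concrete illustration, the sketch below implements the equal-size hard-matching case of the soft matching distance via the Hungarian solver, plus a closed-form ridge predictivity score. Both are simplified stand-ins for the cited methods: real evaluations would use held-out splits, cross-validated regularization, and, for unequal population sizes, a true optimal-transport solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hard_matching_distance(X, Y):
    """Equal-size special case of the soft matching distance: columns of
    X and Y (M stimuli x N neurons) are tuning curves, and the optimal
    doubly stochastic matching is attained at a hard permutation, so the
    Hungarian algorithm suffices."""
    # cost[i, j] = squared Euclidean distance between tuning curves i and j
    cost = ((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())

def ridge_predictivity(X, Y, alpha=1.0):
    """Mapping-based alignment: closed-form ridge regression from X (n x p)
    to Y (n x q), scored as mean R^2 across target units. In-sample only;
    real evaluations use held-out splits and cross-validated alpha."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    W = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)
    resid = Y - X @ W
    r2 = 1.0 - resid.var(axis=0) / Y.var(axis=0)
    return r2.mean()
```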
3. Specialized Alignment Measures: Graphs, Attention, and Interpretability
Certain domains require customized alignment measures suited to their structure or interpretability needs.
- Graph Alignment Metrics: Subspace Alignment Measure (SAM) quantifies the geometric misalignment among the feature, graph, and label subspaces in graph neural networks: principal angles between pairs of subspaces yield chordal distances, which are aggregated (as the Frobenius norm of the triangle of pairwise subspace distances) into a single score that predicts performance under controlled randomization (Qian et al., 2019). Node-alignment methods via neural networks use constrained quadratic or linear assignment and Gumbel-Sinkhorn relaxations for interpretable one-to-one or soft node matching (Wang et al., 13 Dec 2024).
- Attention/Alignment in Sequence Models: For neural machine translation, metrics such as attention entropy and alignment agreement (with a statistical aligner) provide interpretable, quantitative probes into attention focus and correspondence with reference alignments. Correlating these with translation-quality metrics probes the link between alignment and generation fidelity (Mishra, 24 Dec 2024); a minimal sketch follows this list.
- Concept Alignment and Attribution: Robust evaluation of concept activation vectors (CAVs) in interpretable models requires more than probe accuracy. Alignment metrics include hard accuracy for context-invariant detection, spatial attribution/segmentation overlap for locational correctness, and augmentation robustness for invariance to image transforms. Translation-invariant probes and explicit localization mapping mitigate spurious correlation and provide stronger concept alignment (Lysnæs-Larsen et al., 6 Nov 2025).
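A minimal sketch of the two attention-based probes, under the assumption that each row of the attention matrix is a normalized distribution over source positions; the argmax-agreement form is our simplification of aligner-based agreement, not the cited implementation.

```python
import numpy as np

def attention_entropy(attn):
    """Mean entropy (nats) of attention rows; attn is (num_target,
    num_source) with each row a distribution over source positions.
    Lower values indicate sharper, more alignment-like attention."""
    eps = 1e-12
    return -(attn * np.log(attn + eps)).sum(axis=1).mean()

def alignment_agreement(attn, reference_links):
    """Fraction of target positions whose argmax attention source matches
    a reference alignment (e.g., from a statistical aligner). A simplified
    proxy for aligner-agreement scores."""
    return (attn.argmax(axis=1) == np.asarray(reference_links)).mean()
```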
4. Behavioral and Error Alignment Metrics
Alignment is not restricted to representational space. Behavioral and error pattern metrics compare output similarity and error structures across systems—human, animal, or model.
- Misclassification Agreement (MA): Computes instance-level Cohen's κ agreement over joint error labels for two classifiers. This reflects whether systems make the same mistakes on the same inputs beyond chance (Xu et al., 20 Sep 2024).
- Class-Level Error Similarity (CLES): Compares class-by-class error distributions using Jensen-Shannon divergence, providing an instance-independent similarity score robust to class imbalance and overall accuracy (Xu et al., 20 Sep 2024). Simplified sketches of both metrics follow this list.
- Integrative Benchmarking: Large-scale neuroethological, vision/AI-human comparison platforms (e.g., Brain-Score, Mouse vs. AI) employ an ensemble of neural alignment, behavioral, and similarity metrics, revealing the multidimensionality of alignment and the necessity of multifaceted benchmarks (Ahlert et al., 10 Jul 2024, Schneider et al., 17 Sep 2025).
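The following sketch gives simplified readings of MA and CLES: Cohen's κ over binary error indicators, and one minus the Jensen-Shannon distance between per-class error distributions. The exact normalizations in the cited work may differ.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def misclassification_agreement(err_a, err_b):
    """Cohen's kappa between two binary error indicators (1 = misclassified):
    do the systems err on the same inputs beyond chance?"""
    err_a, err_b = np.asarray(err_a), np.asarray(err_b)
    p_obs = (err_a == err_b).mean()               # observed agreement
    p_a, p_b = err_a.mean(), err_b.mean()
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return (p_obs - p_chance) / (1 - p_chance)

def class_level_error_similarity(misclassified_a, misclassified_b, num_classes):
    """CLES-style score: one minus the Jensen-Shannon distance (base 2, so
    the result lies in [0, 1]) between per-class error distributions.
    Inputs are the true class labels of each system's misclassified items."""
    hist_a = np.bincount(misclassified_a, minlength=num_classes).astype(float)
    hist_b = np.bincount(misclassified_b, minlength=num_classes).astype(float)
    return 1.0 - jensenshannon(hist_a / hist_a.sum(),
                               hist_b / hist_b.sum(), base=2)
```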
5. Theoretical and Spectral Analyses of Alignment
Recent work has deepened the theoretical understanding of alignment metrics, particularly regression-based predictivity, from a spectral perspective.
- Spectral Decomposition of Regression Alignment: The normalized mean-squared error of linear regression from model to brain (or between models) decomposes as a sum over eigenmodes:

$$E_g = \sum_{\rho} E_\rho \, W_\rho,$$

where $E_\rho$ depends on the model spectrum, training set size, and regularization, and $W_\rho$ quantifies the projection of neural data onto each model eigenvector. This framework guides the understanding of how representational structure constrains and enables alignment, indicating that both spectrum shape and vector alignment must be jointly considered for optimizing predictive success (Canatar et al., 2023). A simplified sketch of the per-mode alignment profile follows this list.
- Spectral Alignment for Diagnostics: Spectral Alignment (SA) between input batches and principal singular vectors of weights provides an early-warning indicator for impending loss divergence during neural network training. Sign diversity of the SA distribution is a mechanistically grounded predictor of pathologies, outperforming traditional scalar metrics (Qiu et al., 5 Oct 2025).
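In this spirit, a simplified diagnostic computes the model eigenspectrum and the fraction of neural variance projected onto each model eigenvector (the $W_\rho$ profile). The sketch below omits the learning-curve terms $E_\rho$, which require the ridge-regression theory of the cited work.

```python
import numpy as np

def spectral_alignment_profile(X, Y):
    """Per-mode alignment weights: eigendecompose the model Gram matrix
    K = X X^T and return, for each eigenmode, the fraction of (centered)
    neural variance in Y projecting onto that model eigenvector."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    eigvals, U = np.linalg.eigh(X @ X.T)    # ascending eigenvalue order
    eigvals, U = eigvals[::-1], U[:, ::-1]  # sort descending
    W = ((U.T @ Y) ** 2).sum(axis=1)
    return eigvals, W / W.sum()
```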
6. Aggregation, Benchmarking, and Interpretation
Alignment is inherently multidimensional. Pairwise correlations among distinct alignment metrics are often low even when applied to the same models (Ahlert et al., 10 Jul 2024). Aggregating alignment for model ranking or benchmarking requires careful normalization and weighting:
- Aggregation schemes include arithmetic mean, z-transformed mean, mean-rank, and weighted sum. The choice impacts whether behavioral or neural metrics dominate the overall score (a minimal sketch of these schemes appears after this list).
- Interpretation: Researchers must report the full vector of alignment scores and carefully select metrics that match their scientific or application-driven objectives (e.g., global geometry for behavioral predictivity, unit-level alignment for mechanistic interpretability, error pattern similarity for ethical/safety assessments).
- Guidelines: Geometry-preserving metrics (CKA, Procrustes, RSA) are robust for discriminating trained/untrained status and reflecting functionally relevant structure. Mapping metrics must consider the potential confound of superposition, and behavioral alignment should always be contextualized with representational results.
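A minimal sketch of the four aggregation schemes over a models-by-metrics score matrix; the scheme names are ours, and real benchmarks would add missing-value handling and metric-specific normalization.

```python
import numpy as np
from scipy.stats import rankdata, zscore

def aggregate_scores(scores, scheme="z_mean", weights=None):
    """Aggregate a (num_models x num_metrics) score matrix into a single
    score per model under one of four common schemes."""
    scores = np.asarray(scores, dtype=float)
    if scheme == "mean":
        return scores.mean(axis=1)
    if scheme == "z_mean":     # z-transform each metric across models first
        return zscore(scores, axis=0).mean(axis=1)
    if scheme == "mean_rank":  # average rank of each model across metrics
        ranks = np.apply_along_axis(rankdata, 0, scores)
        return ranks.mean(axis=1)
    if scheme == "weighted":   # explicit per-metric weights
        return scores @ np.asarray(weights, dtype=float)
    raise ValueError(f"unknown scheme: {scheme}")
```

Running all four schemes on the same score matrix and comparing the induced model rankings makes the sensitivity to the aggregation choice explicit.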
7. Practical Considerations, Limitations, and Downstream Use
- Bias and Control: Many metrics (especially CKA) are subject to high-dimensionality and finite-sample bias. Proper control with null models, shuffled data, and unbiased estimators (e.g., unbiased HSIC for CKA) is required for valid inference (Murphy et al., 2 May 2024, Chun et al., 20 Feb 2025); a generic permutation-null sketch appears at the end of this list.
- Superposition and Latent Recovery: When features are encoded via overlapping basis functions (superposition), basis-dependent metrics severely underestimate true alignment. Sparse autoencoders and dictionary learning techniques can recover latent codes and reveal hidden correspondence (Longon et al., 3 Oct 2025).
- Computational Complexity: NSA and most geometric metrics scale quadratically with sample size, while optimal-transport/soft-matching metrics additionally require network simplex solvers or entropic regularization (Sinkhorn), trading tractability against fidelity (Khosla et al., 2023, Ebadulla et al., 7 Nov 2024).
- Behavioral Validity: Alignment correlates with but does not guarantee functional/behavioral correspondence—metrics must be interpreted alongside domain-relevant behavioral scores, particularly in applications such as interpretability, safety, and cognitive modeling.
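As a generic control, the sketch below builds a permutation null for any pairwise alignment metric (e.g., the `linear_cka` function sketched in Section 1) by shuffling the stimulus correspondence. This complements, rather than replaces, unbiased estimators.

```python
import numpy as np

def permutation_null(metric, X, Y, n_perm=1000, seed=0):
    """Build a permutation null for any pairwise alignment metric by
    shuffling the stimulus correspondence in Y. Returns the observed
    score and an empirical (add-one-smoothed) p-value."""
    rng = np.random.default_rng(seed)
    observed = metric(X, Y)
    null = np.array([metric(X, Y[rng.permutation(len(Y))])
                     for _ in range(n_perm)])
    p_value = (1 + (null >= observed).sum()) / (n_perm + 1)
    return observed, p_value
```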
In conclusion, neural alignment metrics form a mathematically principled, algorithmically diverse, and context-sensitive toolkit for quantifying similarity, correspondence, and transformation-invariance across neural representations. Their rigorous definition, correct usage, and domain-specific adaptation are foundational for cross-model, model-to-brain, and biological versus artificial system comparison in contemporary neuroscience, machine learning, and AI research.