- The paper presents a novel partial soft-matching distance method that selectively aligns corresponding neural units while filtering out noise and inactive elements.
- It leverages a partial optimal transport framework with an L-curve heuristic to optimize matching mass, enhancing interpretability through a correlational reformulation.
- Empirical validations on fMRI data and deep neural network models show that the method outperforms classical techniques in identifying true signal components.
Partial Soft-Matching Distance for Neural Representational Comparison with Partial Unit Correspondence
The alignment of neural representations across systems is central to quantifying convergence in learned or biological function. Canonical approaches—such as CKA, RSA, Procrustes, and CCA—provide rotation-invariant and permutation-invariant metrics, abstracting away correspondence between individual units and the axes on which information is encoded. This precludes interpretable neuron-level analysis and confounds understanding of whether homologous computations have emerged, particularly when representations are highly structured or axis-aligned.
The soft-matching distance, building on discrete optimal transport (OT), offers rotation sensitivity but, by design, forces all units across compared populations into correspondence. This mandatory matching produces spurious alignments in the presence of noisy, inactive, or genuinely non-corresponding units—a common scenario in both empirical neural recordings and deep models with architectural or training-induced specialization. Addressing this, the partial soft-matching distance relaxes the marginal constraints of classic OT, allowing only a fraction of the mass (neurons/voxels) to be matched, thus supporting comparison under partial correspondence with robust separation of well-matched and unmatched units.
Methodological Framework
Partial Optimal Transport Extension
Let $X \in \mathbb{R}^{M \times N_X}$ and $Y \in \mathbb{R}^{M \times N_Y}$ denote two neural populations' tuning curves over $M$ stimuli. The approach casts representational comparison as a partial optimal transport problem, optimizing over couplings $T$ with total mass $s \le 1$:

$$\min_{T \in \mathcal{T}_s(N_X, N_Y)} \langle C, T \rangle_F,$$

where $C_{ij}$ is a unit-pairwise cost (preferably cosine distance), and $\mathcal{T}_s$ denotes the set of admissible partial couplings.
Unlike classical OT, row and column sums may not saturate, with the hyperparameter $s$ controlling the total matched mass. Zeroed rows/columns in the optimal $T^*$ identify unmatched units, facilitating structured treatment of noise or divergence. This is particularly useful for empirical neurophysiology data (e.g., fMRI), where many units (voxels) are inherently unreliable or lack clear correspondence.
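The relaxed-marginal problem can be written as a small linear program. The sketch below is a minimal illustration, not the paper's implementation: the names `partial_ot` and `cosine_cost`, the uniform marginal caps, and the toy data are all assumptions made here for concreteness.

```python
import numpy as np
from scipy.optimize import linprog

def cosine_cost(X, Y):
    """Pairwise cosine distance between the columns (units) of X and Y."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=0, keepdims=True)
    return 1.0 - Xn.T @ Yn

def partial_ot(C, s):
    """Minimize <C, T>_F over partial couplings T with uniform marginal
    caps (row sums <= 1/n_x, column sums <= 1/n_y) and total mass s."""
    n_x, n_y = C.shape
    c = C.ravel()  # variables: T flattened row-major
    # Inequality constraints: each row/column sum at most its marginal.
    A_ub = np.zeros((n_x + n_y, n_x * n_y))
    for i in range(n_x):
        A_ub[i, i * n_y:(i + 1) * n_y] = 1.0   # row i of T
    for j in range(n_y):
        A_ub[n_x + j, j::n_y] = 1.0            # column j of T
    b_ub = np.concatenate([np.full(n_x, 1.0 / n_x),
                           np.full(n_y, 1.0 / n_y)])
    # Equality constraint: total transported mass equals s.
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, n_x * n_y)), b_eq=np.array([s]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(n_x, n_y), res.fun

# Toy example (illustrative): three shared units plus one pure-noise
# unit on each side; with s = 3/4 the noise units should go unmatched.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y = np.column_stack([X[:, :3] + 0.05 * rng.normal(size=(50, 3)),
                     rng.normal(size=50)])
T, cost = partial_ot(cosine_cost(X, Y), s=0.75)
```

With this setup the optimal coupling places all mass on the three true pairs and leaves the noise units with zeroed rows/columns, mirroring the unmatched-unit behavior described above.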
Regularization and L-Curve Heuristic
Selection of s is posed as a data-dependent regularization problem. An L-curve method traces the Pareto frontier between matching cost and unmatched mass, identifying the optimal s at maximal curvature—akin to classical Tikhonov-style regularization. This approach generalizes across settings without prior knowledge of noise magnitude or outlier abundance.
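The knee-finding step can be sketched with a discrete curvature criterion over a precomputed sweep of $s$. This is one plausible reading of the L-curve heuristic, not the paper's exact procedure; the function name and the synthetic trade-off curve are illustrative.

```python
import numpy as np

def l_curve_knee(costs, unmatched):
    """Return the index of maximal discrete curvature along the
    (unmatched mass, matching cost) trade-off curve."""
    x = np.asarray(unmatched, dtype=float)
    y = np.asarray(costs, dtype=float)
    # Rescale both axes to [0, 1] so curvature is scale-free.
    x = (x - x.min()) / (np.ptp(x) + 1e-12)
    y = (y - y.min()) / (np.ptp(y) + 1e-12)
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    kappa = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2 + 1e-12)**1.5
    return int(np.argmax(kappa))

# Synthetic L-shaped trade-off (illustrative): cost falls steeply while
# noise units are being excluded, then plateaus once only signal remains.
unmatched = np.linspace(0.0, 1.0, 21)
costs = np.maximum(1.0 - 5.0 * unmatched, 0.0)
knee = l_curve_knee(costs, unmatched)
```

The returned index lands at the bend of the "L", which corresponds to the data-driven choice of matched mass $s$.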
When tuning curves are normalized, the objective admits a correlational reinterpretation, transforming the transport optimization into maximization of mean-matched Pearson correlations. Consequently, the method directly quantifies alignment quality in familiar statistical terms, which enhances interpretability in neuroscientific and ANN analysis pipelines.
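The reinterpretation rests on the identity that, for mean-centered unit-norm tuning curves, cosine distance equals one minus Pearson correlation. A quick numerical check (assuming that normalization; the data here are illustrative):

```python
import numpy as np

# Two correlated "tuning curves" (illustrative data).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.6 * x + 0.8 * rng.normal(size=200)

def znorm(v):
    """Mean-center and scale to unit norm."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

# Cosine distance on normalized curves ...
cos_dist = 1.0 - znorm(x) @ znorm(y)
# ... equals 1 - Pearson r, so minimizing total transport cost is the
# same as maximizing the mean correlation over matched pairs.
pearson = np.corrcoef(x, y)[0, 1]
```

Because the identity holds entry-wise in the cost matrix, the transport objective becomes an average of matched-pair correlations, which is the statistical quantity neuroscientists already interpret.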
Empirical Validation
Simulations: Noise Robustness and Model Identification
Simulated populations with varying numbers of signal (matched) and noise (outlier) neurons demonstrate that forced matching (classical soft-matching) leads to inflated transport costs and spurious correspondences. In contrast, partial soft-matching robustly matches the true signal components, discarding noise and recovering ground-truth correspondences. In model selection scenarios with embedded noise, only the partial matching score reliably identifies the model with maximal shared signal, with classical methods often preferring mismatched models due to forced assignment of noise units.
fMRI Comparisons: Neural Population Alignment
Applying partial soft-matching to fMRI data from the Natural Scenes Dataset, the approach automatically excludes low-reliability voxels—those with low noise ceilings—when aligning homologous brain regions across subjects. As the matched mass hyperparameter decreases, the mean noise ceiling of retained voxels increases, demonstrating data-driven unit quality selection. Across all tested visual areas, this method yields higher within-area alignment precision compared to thresholding and forced soft-matching.
Further, the partial soft-matching ranking of voxels by alignment quality matches that of computationally expensive brute-force ablation (which exhaustively recomputes alignments per unit), but with much lower computational cost ($O(n^3 \log n)$ versus $O(n^4 \log n)$).
Deep Neural Networks: Aligned and Divergent Computation
Analyses of ResNet-18 convolutional layers show that units ranked as highly matched by partial soft-matching yield nearly identical maximally exciting images (MEIs) across independently trained models, while unmatched units differ qualitatively. This provides concrete evidence that partial soft-matching discovers genuinely aligned features, and can partition convergent from divergent computational subspaces at scale. Correlation-based heuristics, by contrast, fail to recover these interpretable partitions.
Specificity and Privileged Axes
The method improves alignment specificity by correctly matching homologous regions and not forcing alignment of non-overlapping regions, a property lacking in alternative approaches. Analysis of subpopulations defined by the partial soft-matching scores further enables probing hypotheses about privileged axes in representation, demonstrating that alignment degrades under random orthogonal transformations even within the best-matched unit subsets. This supports the existence of axis-aligned—or “privileged”—coordinates in both artificial and biological systems.
Theoretical and Practical Implications
Partial soft-matching distance advances the toolkit for neural representational analysis by supporting meaningful comparison under partial correspondence. It overcomes the limitations of metric-based or full-matching methods in realistic datasets characterized by noise, outliers, and incomplete overlap. This is crucial for cross-system comparison (e.g., brain-to-brain, ANN-to-ANN, brain-to-ANN) where full one-to-one correspondence is the exception rather than the rule.
The method's ability to partition neural populations enables focused analyses of convergence versus divergence, offering new directions for both explainability and mechanistic hypothesis testing in neuroscience and machine learning. Its computational efficiency means it can be applied to modern large-scale neural datasets, a setting where brute-force approaches become rapidly intractable.
Limitations and Future Directions
The approach relies on heuristic selection of the matched mass via the L-curve, which may not be universally robust. Strict metric properties (e.g., the triangle inequality) are not guaranteed, though the use of newly introduced partial Wasserstein variants satisfying these properties presents an avenue for further extension. The cubic scaling in population size remains a bottleneck for extremely large datasets, motivating further work on scalable solvers.
Potential future work includes formalizing the metric properties of the partial matching under alternative cost functions, exploring Bayesian or learned selection of the matched fraction, and integrating partial soft-matching into downstream analytic and generative pipelines such as structure-preserving alignment in model distillation or cross-modal neuroscience analysis.
Conclusion
Partial soft-matching distance provides a rigorous, interpretable, and efficient approach for representational comparison that is robust to noise, outliers, and incomplete correspondence. Its empirical success in both neuroscience and deep learning settings—and its ability to support nuanced analyses of convergence, alignment specificity, and privileged axes—establish it as a principled alternative to existing metric- and matching-based representational similarity methods. Integration with scalable solvers and further formal extensions will render it applicable to ongoing advances in empirical neuroimaging and AI representation research.
(2602.19331)