
Normalized Bures Similarity (NBS) Overview

Updated 10 December 2025
  • Normalized Bures Similarity (NBS) is a metric that quantifies neural representation similarity using quantum-information fidelity and kernel summary statistics.
  • It achieves key invariances—including orthogonal rotation, translation, permutation, and scale—by aligning geometric (Riemannian) and statistical (quantum-information) perspectives.
  • Efficient computational methods such as nuclear norm-based procedures and differentiable optimization make NBS practical for comparing both artificial and biological neural systems.

Normalized Bures Similarity (NBS) is a geometric similarity measure for comparing neural representations, based on the quantum-information fidelity between covariance matrices of activation patterns. Unlike methods based on explicit unit-wise mapping, NBS quantifies similarity in kernel summary statistics, conferring invariance to orthogonal rotations, permutations, translational shifts, and scale. Its foundational identity reveals deep connections to both Riemannian geometry and quantum-information theory, allowing unification of mapping-based and kernel-based similarity frameworks. NBS enjoys properties including metric validity, mapping-free computation, and scale-rotation invariance, and further admits efficient computational schemes and differentiable optimization.

1. Mathematical Definition

Given neural activations from two systems over $M$ stimuli, stored in matrices $X \in \mathbb{R}^{M \times N_x}$ and $Y \in \mathbb{R}^{M \times N_y}$, the centered linear kernel (stimulus-by-stimulus covariance) matrices are

K_X = C X X^\top C, \qquad K_Y = C Y Y^\top C,

where $C = I_M - \frac{1}{M}\mathbf{1}\mathbf{1}^\top$ is the $M \times M$ centering matrix. The fidelity between two positive semidefinite (PSD) matrices is

F(K_X, K_Y) = \mathrm{Tr}\left[\left(K_X^{1/2} K_Y K_X^{1/2}\right)^{1/2}\right].

The normalized Bures similarity is then defined by

\mathrm{NBS}(K_X, K_Y) = \frac{F(K_X, K_Y)}{\sqrt{\mathrm{Tr}\,K_X\,\mathrm{Tr}\,K_Y}}, \qquad \mathrm{NBS} \in [0, 1].

Alternatively,

\mathrm{NBS}(X, Y) = \frac{\|X^\top C Y\|_*}{\sqrt{\mathrm{Tr}(X^\top C X)\,\mathrm{Tr}(Y^\top C Y)}},

where $\|\cdot\|_*$ denotes the nuclear norm (sum of singular values). This construction is equivalent to the cosine of the Riemannian shape distance $\theta$ between centered neural configurations, $\mathrm{NBS}(X, Y) = \cos \theta(X, Y)$. The equivalence between NBS as kernel fidelity and as shape alignment is established rigorously (see Harvey et al., 2023).

2. Geometric and Statistical Interpretations

NBS is invariant under permutations and orthogonal rotations of neuron axes; centering removes translational degrees of freedom, and normalization ensures scale invariance. From the geometric perspective, NBS measures the cosine of the Riemannian (geodesic) angle $\theta$ between two centered configurations, corresponding to optimal Procrustes alignment on the shape manifold. Statistically, $K_X$ and $K_Y$ are empirical covariance matrices and $F(K_X, K_Y)$ is their quantum-information fidelity. The associated Bures distance

d_B(K_X, K_Y) = \sqrt{\mathrm{Tr}\,K_X + \mathrm{Tr}\,K_Y - 2\,F(K_X, K_Y)}

is the 2-Wasserstein distance between zero-mean Gaussians with covariances $K_X$ and $K_Y$. By Uhlmann's theorem,

F(K_X, K_Y) = \max_{\substack{X' :\, X' X'^\top = K_X \\ Y' :\, Y' Y'^\top = K_Y}} \mathrm{Tr}\!\left(X'^\top Y'\right),

so NBS represents the maximum normalized Hilbert-Schmidt overlap achievable for neural activations with fixed covariances (Harvey et al., 2023).
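In the data picture, the maximizing alignment is the orthogonal Procrustes rotation obtained from the SVD of the cross-covariance. A NumPy sketch of this max-overlap reading (equal widths $N_x = N_y$ assumed for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 40, 5
X = rng.standard_normal((M, N))
Y = rng.standard_normal((M, N))
Xc = X - X.mean(axis=0)          # mean-centering applies C implicitly
Yc = Y - Y.mean(axis=0)

A = Xc.T @ Yc                    # cross-covariance
U, s, Vt = np.linalg.svd(A)
Q_opt = Vt.T @ U.T               # orthogonal Procrustes rotation

overlap_opt = np.trace(A @ Q_opt)           # equals the fidelity ||A||_*
print(np.isclose(overlap_opt, s.sum()))     # True

# Any other rotation yields a smaller Hilbert-Schmidt overlap:
Q_rand, _ = np.linalg.qr(rng.standard_normal((N, N)))
print(np.trace(A @ Q_rand) <= overlap_opt)  # True
```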

3. Computational Procedures

NBS can be computed in two equivalent ways:

  • Kernel Fidelity Method:
  1. Center the data: $\tilde{X} = CX$, $\tilde{Y} = CY$.
  2. Compute $K_X = \tilde{X}\tilde{X}^\top$, $K_Y = \tilde{Y}\tilde{Y}^\top$.
  3. Eigendecompose $K_X$ to obtain $K_X^{1/2}$.
  4. Form $K_X^{1/2} K_Y K_X^{1/2}$ and compute its PSD square root.
  5. Calculate $F(K_X, K_Y)$ as the trace of that square root, and the denominator $\sqrt{\mathrm{Tr}\,K_X\,\mathrm{Tr}\,K_Y}$.
  6. Output NBS.
  • Cross-Covariance/Nuclear Norm Method:
  1. Center $X$ and $Y$ with $C$: $\tilde{X} = CX$, $\tilde{Y} = CY$.
  2. Compute the cross-covariance $A = \tilde{X}^\top \tilde{Y}$.
  3. Obtain the singular values $\{\sigma_i\}$ of $A$ and set $\|A\|_* = \sum_i \sigma_i$.
  4. Compute the traces $\mathrm{Tr}(\tilde{X}^\top \tilde{X})$ and $\mathrm{Tr}(\tilde{Y}^\top \tilde{Y})$.
  5. Output NBS.

When the numbers of units are small relative to the number of stimuli ($N_x, N_y \ll M$), the nuclear norm method is computationally preferable. As code, NBS amounts to mean-centering, computing the cross-covariance, applying an SVD, and normalizing by the geometric mean of marginal nuclear norms (Cloos et al., 2024). Differentiable optimization is enabled by autograd-capable SVD implementations; the nuclear norm gradient is stable for distinct singular values (Cloos et al., 2024).
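Following that recipe, a minimal NumPy implementation of the nuclear-norm route, together with checks of the stated invariances (function name and shapes here are illustrative, not taken from the cited code):

```python
import numpy as np

def nbs(X, Y):
    """Normalized Bures Similarity between activation matrices.

    X: (M, Nx), Y: (M, Ny) responses of two systems to the same M stimuli.
    """
    Xc = X - X.mean(axis=0)     # mean-centering applies C implicitly
    Yc = Y - Y.mean(axis=0)
    fidelity = np.linalg.norm(Xc.T @ Yc, ord="nuc")   # sum of singular values
    denom = np.sqrt(np.trace(Xc.T @ Xc) * np.trace(Yc.T @ Yc))
    return fidelity / denom

# Invariance checks: rotation, permutation, translation, and scale.
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 6))
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))     # random rotation
perm = rng.permutation(6)

print(np.isclose(nbs(X, X @ Q), 1.0))          # orthogonal rotation
print(np.isclose(nbs(X, X[:, perm]), 1.0))     # unit permutation
print(np.isclose(nbs(X, 3.0 * X + 7.0), 1.0))  # scale and translation
```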

4. Sensitivity to Principal Components

NBS exhibits linear sensitivity to principal component (PC) variances. If $X$ has singular value decomposition $X = U \Sigma V^\top$ and a copy $\tilde{X}$ is constructed by scrambling the $k$-th left singular vector (preserving its variance $\lambda_k$), then

\mathrm{NBS}(X, \tilde{X}) \approx 1 - \frac{\lambda_k}{\sum_i \lambda_i},

where $\lambda_i$ are the eigenvalues of $K_X$ (Cloos et al., 2024). Thus the decrease in NBS upon destroying PC $k$ is linear in its variance. By contrast, CKA's dependence is quadratic ($\propto \lambda_k^2$), making NBS more sensitive to mid-range PCs than CKA, but less so than angular Procrustes, which is most strongly sensitive to low-variance directions.
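The linear scaling can be illustrated on synthetic data with known PC variances (the construction and variance profile below are illustrative, not from the cited paper): scrambling one left singular vector by permuting its entries reduces NBS by approximately $\lambda_k / \sum_i \lambda_i$.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 500, 5
lam = np.array([4.0, 2.0, 1.0, 0.5, 0.25])   # illustrative PC variances

# Mean-zero orthonormal left singular vectors (so centering is a no-op).
B = rng.standard_normal((M, N))
U, _ = np.linalg.qr(B - B.mean(axis=0))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
X = U @ np.diag(np.sqrt(lam)) @ V.T

def nbs(X, Y):
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    f = np.linalg.norm(Xc.T @ Yc, ord="nuc")
    return f / np.sqrt(np.trace(Xc.T @ Xc) * np.trace(Yc.T @ Yc))

k = 1                                        # destroy the second PC
U_s = U.copy()
U_s[:, k] = U[rng.permutation(M), k]         # scramble; variance preserved
X_s = U_s @ np.diag(np.sqrt(lam)) @ V.T

drop = 1.0 - nbs(X, X_s)
print(drop, lam[k] / lam.sum())              # drop is close to lambda_k / sum
```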

5. Metric Properties, Theorems, and Comparison to Other Measures

NBS provides a true metric (satisfying symmetry and the triangle inequality) over stimulus–response geometries via the angular distance $d(X, Y) = \arccos \mathrm{NBS}(X, Y)$. The associated Bures distance $d_B$ is dual to the Procrustes “size-and-shape” distance: $d_B(K_X, K_Y) = \min_{Q} \|CX - CYQ\|_F$, with the minimum taken over orthogonal alignment maps $Q$, valid for neural configurations of unequal widths $N_x \neq N_y$ (Harvey et al., 2023). Asymptotically, as $M \to \infty$ or $N_x, N_y \to \infty$, the normalized Bures distance converges to its limiting form under the law of large numbers. Empirically, NBS values between random and real data start near 0, approach 0.98–1.0 under optimization, and dataset-dependent thresholds exist for meaningful “task-relevant” encoding.
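The duality can be verified numerically for equal-width configurations (the unequal-width case requires semi-orthogonal alignment maps and is omitted from this sketch): the Bures distance computed from kernel summary statistics coincides with the Procrustes-aligned Frobenius distance between centered configurations.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 30, 4
X = rng.standard_normal((M, N))
Y = rng.standard_normal((M, N))
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Bures distance from the kernel summary statistics.
F = np.linalg.norm(Xc.T @ Yc, ord="nuc")
d_bures = np.sqrt(np.trace(Xc.T @ Xc) + np.trace(Yc.T @ Yc) - 2.0 * F)

# Procrustes size-and-shape distance: min over rotations Q of ||Xc - Yc Q||_F.
U, s, Vt = np.linalg.svd(Yc.T @ Xc)
Q = U @ Vt                                # optimal rotation
d_procrustes = np.linalg.norm(Xc - Yc @ Q, "fro")

print(abs(d_bures - d_procrustes))        # ~0 up to numerical error
```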

Compared to other metrics:

  • RSA compares vectorized representational dissimilarity matrices, does not define a true metric, and ignores global scaling.
  • CKA computes

\mathrm{CKA}(K_X, K_Y) = \frac{\mathrm{Tr}(K_X K_Y)}{\|K_X\|_F \, \|K_Y\|_F}

and is the cosine of the Hilbert–Schmidt angle. Unlike NBS, CKA neither exploits PSD geometry nor satisfies the triangle inequality, and it can be insensitive to alignment of dominant covariance subspaces. Tight bounds relate CKA and NBS, yet empirical discrepancies between the two scores can be two- to three-fold.

  • CCA fits optimal linear mappings to maximize correlation in a shared subspace; it is mapping-based and affine-invariant, requires solving generalized eigenproblems, and returns multiple canonical coefficients rather than a single score. NBS is mapping-free, giving a single overlap scalar.
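The CKA–NBS comparison above can be run side by side (a sketch using linear CKA on centered kernels; data here are random and purely illustrative):

```python
import numpy as np

def centered_kernel(X):
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T

def cka(X, Y):
    """Linear CKA: cosine of the Hilbert-Schmidt angle between kernels."""
    Kx, Ky = centered_kernel(X), centered_kernel(Y)
    return np.trace(Kx @ Ky) / (np.linalg.norm(Kx, "fro") * np.linalg.norm(Ky, "fro"))

def nbs(X, Y):
    """Normalized Bures Similarity via the nuclear-norm identity."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    f = np.linalg.norm(Xc.T @ Yc, ord="nuc")
    return f / np.sqrt(np.trace(Xc.T @ Xc) * np.trace(Yc.T @ Yc))

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 6))
Y = rng.standard_normal((40, 6))
print(cka(X, Y), nbs(X, Y))   # generally different values on the same pair
```

Both scores equal 1 for identical inputs, but on generic pairs they weight the covariance spectrum differently, consistent with the quadratic-versus-linear PC sensitivity discussed above.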

NBS should be preferred when metric validity, PSD-manifold geometry, and mapping-free similarity are desired.

6. Empirical Behavior, Optimization, and Interpretation

Differentiable optimization allows maximizing NBS directly. Under optimization, synthetic data first captures the highest-variance PC of the target dataset, with NBS capturing lower-variance PCs more rapidly than CKA but less rapidly than angular Procrustes. While high NBS scores (approaching 1) can be achieved, they do not guarantee encoding of all task-relevant dimensions; “overfitting” to the highest-variance PCs is possible. No single threshold for “good” NBS exists; the appropriate value depends on dataset structure (e.g., a moderate NBS suffices for high task-decoding accuracy in some prefrontal recordings, while others require substantially higher values).

Empirical scatter plots show that CKA and NBS are correlated, but substantial envelope width remains due to kernel ranks and matrix square-root non-commutativity. Joint optimization experiments indicate that high angular Procrustes score entails high NBS and CKA, but not vice-versa; very high CKA or NBS can still omit lower-variance, task-relevant structure (Cloos et al., 2024).
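The stability claim for the nuclear norm gradient can be sanity-checked: for $A = U S V^\top$ with distinct singular values, the gradient of $\|A\|_*$ with respect to $A$ is $U V^\top$, which a finite-difference comparison confirms (a NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 4))           # distinct singular values a.s.

U, s, Vt = np.linalg.svd(A, full_matrices=False)
grad_analytic = U @ Vt                    # d||A||_* / dA for distinct s_i

# Central finite differences, entry by entry.
eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        plus = np.linalg.norm(A + E, ord="nuc")
        minus = np.linalg.norm(A - E, ord="nuc")
        grad_fd[i, j] = (plus - minus) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_fd)))   # small discrepancy
```

This is the gradient that autograd frameworks propagate through the SVD when NBS is maximized directly.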

7. Applications and Limitations

NBS is applicable for quantifying similarity between neural representations, including artificial and biological systems, without explicit neuron correspondences. It is effective for characterizing neural encoding overlap under PSD-manifold geometry. NBS’s mapping-free nature, scale and rotation invariance, and metric validity offer distinct advantages over ad hoc measures. However, NBS (like kernel-based measures) is susceptible to “overfitting” dominant PCs and may not ensure task-relevant dimensionality recovery. For practitioners, careful interpretation is required, especially when optimizing representations: additional analysis concerning encoding of low-variance dimensions and task variables is necessary for robust assessment.

NBS unifies geometric (shape manifold) and statistical (quantum-information and Wasserstein) perspectives on neural similarity, and its properties are increasingly prominent in comparative studies of neural representations in deep learning and neuroscience (Harvey et al., 2023, Cloos et al., 2024).
