
Bures–Wasserstein Distance

Updated 6 May 2026
  • Bures–Wasserstein distance is a canonical metric on positive-definite matrices that unifies optimal transport, quantum fidelity, and Riemannian geometry.
  • It offers multiple formulations—including Procrustes, SDP, and Gaussian optimal transport—that facilitate efficient computation of geodesics and barycenters.
  • Its applications extend to machine learning, quantum information, and statistical inference, enhancing covariance averaging and optimization in high-dimensional settings.

The Bures–Wasserstein distance is a canonical metric on the space of symmetric (or Hermitian) positive-definite or positive semi-definite matrices, unifying notions from quantum information theory, optimal transport, and Riemannian geometry. Formally, it arises as the geodesic distance on the manifold of positive-definite matrices equipped with a specific Riemannian structure, and coincides with the 2-Wasserstein distance between centered Gaussian laws parameterized by those matrices. It also connects directly to quantum fidelity and appears as a central object in statistical learning, optimization, and quantum statistical inference.

1. Formal Definition and Equivalent Characterizations

Let $A, B$ be $n \times n$ symmetric (or Hermitian) positive-definite, or more generally positive semi-definite, matrices. The Bures–Wasserstein distance is given by

$$d_{BW}(A,B)^2 = \operatorname{Tr}(A) + \operatorname{Tr}(B) - 2\,\operatorname{Tr}\Bigl((A^{1/2} B\,A^{1/2})^{1/2}\Bigr).$$

Equivalent characterizations include:

  • Procrustes form: $d_{BW}(A,B) = \min_{U\in O(n)} \|A^{1/2} - B^{1/2} U\|_F$, where $\|\cdot\|_F$ is the Frobenius norm and $O(n)$ is the orthogonal (or unitary) group (Oostrum, 2020, Bhatia et al., 2017).
  • Optimal transport: $d_{BW}(A,B)$ coincides with the $2$-Wasserstein distance between zero-mean Gaussians with covariances $A$ and $B$ (Maunu et al., 2022, Tankala et al., 27 Sep 2025).
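  • Quantum information: For density matrices (trace-one Hermitian PSD matrices) $\rho, \sigma$, the Bures distance specializes to $d_{BW}(\rho,\sigma)^2 = 2\bigl(1 - \sqrt{F(\rho,\sigma)}\bigr)$, where $F(\rho,\sigma) = \bigl(\operatorname{Tr}\,(\rho^{1/2}\sigma\,\rho^{1/2})^{1/2}\bigr)^2$ is the quantum fidelity (Kroshnin et al., 2019).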
  • SDP formulation: An exact convex semidefinite program representation is available for $d_{BW}^2$ and its barycenter (Mohan, 2023).
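The trace formula and the Procrustes form above can be compared numerically. The following is a minimal sketch, assuming real symmetric PSD inputs and NumPy; the helper names `psd_sqrt`, `bw_distance`, and `bw_distance_procrustes` are illustrative and not taken from any of the cited works. The Procrustes minimizer is obtained from the singular value decomposition of $A^{1/2}B^{1/2}$.

```python
import numpy as np

def psd_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = (S + S.T) / 2.0                      # symmetrize against round-off
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_distance(A, B):
    """Bures-Wasserstein distance from the trace formula."""
    rA = psd_sqrt(A)
    cross = psd_sqrt(rA @ B @ rA)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(d2, 0.0)))      # clamp tiny negative round-off

def bw_distance_procrustes(A, B):
    """Same distance from the Procrustes form min_U ||A^{1/2} - B^{1/2} U||_F."""
    rA, rB = psd_sqrt(A), psd_sqrt(B)
    W, _, Vt = np.linalg.svd(rA @ rB)        # rA @ rB = W diag(s) Vt
    U = Vt.T @ W.T                           # orthogonal U maximizing Tr(rA rB U)
    return float(np.linalg.norm(rA - rB @ U, "fro"))

A = np.diag([4.0, 1.0])
B = np.eye(2)
print(bw_distance(A, B), bw_distance_procrustes(A, B))  # both approximately 1.0
```

For this diagonal pair the formula gives $d_{BW}^2 = 5 + 2 - 2\cdot 3 = 1$, so both routines should return a value close to 1.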

2. Riemannian Geometry and Geodesics

The manifold of positive-definite matrices carries a canonical Riemannian metric such that $d_{BW}$ is its geodesic distance. Tangent vectors $X, Y$ at $A$ have pairing

$$g_A(X,Y) = \tfrac{1}{2}\,\operatorname{Tr}\bigl(\mathcal{L}_A[X]\,Y\bigr),$$

where $\mathcal{L}_A$ is the Lyapunov operator: $\mathcal{L}_A[X]$ is the unique solution of $A\,\mathcal{L}_A[X] + \mathcal{L}_A[X]\,A = X$ (Oostrum, 2020, Han et al., 2021). The geodesic from $A$ to $B$ is given by

$$\gamma(t) = \bigl((1-t)I + tT\bigr)\,A\,\bigl((1-t)I + tT\bigr), \qquad T = A^{-1/2}\bigl(A^{1/2} B\,A^{1/2}\bigr)^{1/2} A^{-1/2}, \quad t \in [0,1],$$

and its length realizes $d_{BW}(A,B)$ (Oostrum, 2020). For covariance matrices of different ranks, the set of BW-minimizing geodesics can be parametrized explicitly; uniqueness fails precisely when the intersection rank is not minimal (Thanwerdas et al., 2022).
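As an illustration of the geodesic, the sketch below assumes the standard transport-map parametrization $\gamma(t) = ((1-t)I + tT)\,A\,((1-t)I + tT)$ with $T = A^{-1/2}(A^{1/2}BA^{1/2})^{1/2}A^{-1/2}$, which requires $A$ to be positive definite. The function names are hypothetical. It checks that $\gamma(1) = B$ and that the curve has constant speed, i.e. $d_{BW}(A, \gamma(t)) = t\, d_{BW}(A,B)$.

```python
import numpy as np

def psd_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_distance(A, B):
    rA = psd_sqrt(A)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))
    return float(np.sqrt(max(d2, 0.0)))

def bw_geodesic(A, B, t):
    """Point at time t on the BW geodesic from A (t=0) to B (t=1); A must be invertible."""
    rA = psd_sqrt(A)
    rA_inv = np.linalg.inv(rA)
    T = rA_inv @ psd_sqrt(rA @ B @ rA) @ rA_inv   # symmetric map with T A T = B
    M = (1.0 - t) * np.eye(A.shape[0]) + t * T
    return M @ A @ M

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
midpoint = bw_geodesic(A, B, 0.5)
# The two printed values should agree up to round-off.
print(bw_distance(A, midpoint), 0.5 * bw_distance(A, B))
```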

3. Fundamental Metric and Analytic Properties

The Bures–Wasserstein distance is a genuine metric:

  • Nonnegativity and symmetry: $d_{BW}(A,B) \ge 0$ with equality if and only if $A = B$, and $d_{BW}(A,B) = d_{BW}(B,A)$;
  • Triangle inequality: $d_{BW}(A,C) \le d_{BW}(A,B) + d_{BW}(B,C)$;
  • Strict convexity: for fixed positive-definite $B$, the map $A \mapsto d_{BW}^2(A,B)$ is strictly convex on the cone of positive semidefinite matrices;
  • Extension to PSD matrices: The mapping remains well-defined on the boundary (rank-deficient case) by continuity of the principal square root (Kroshnin et al., 2019).
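The properties listed above can be spot-checked numerically, including on rank-deficient inputs. The sketch below is illustrative only: it assumes real symmetric PSD matrices, uses hypothetical helper names, and a numerical check on a few samples is of course not a proof.

```python
import numpy as np

def psd_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_distance(A, B):
    rA = psd_sqrt(A)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))
    return float(np.sqrt(max(d2, 0.0)))

# Two rank-one projectors onto orthogonal directions: the cross term vanishes,
# so d_BW^2 = Tr(A) + Tr(B) = 2.
A = np.diag([1.0, 0.0])
B = np.diag([0.0, 1.0])
print(bw_distance(A, B))  # approximately sqrt(2)

# Symmetry and triangle inequality on random rank-2 PSD matrices in dimension 4.
rng = np.random.default_rng(1)
P, Q, R = [(lambda F: F @ F.T)(rng.standard_normal((4, 2))) for _ in range(3)]
symmetric_ok = abs(bw_distance(P, Q) - bw_distance(Q, P)) < 1e-6
triangle_ok = bw_distance(P, R) <= bw_distance(P, Q) + bw_distance(Q, R) + 1e-6
print(symmetric_ok, triangle_ok)
```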

On the trace-one (quantum state) subset, the BW metric induces the so-called Bures–Wasserstein angle,

$$\theta_{BW}(A,B) = \arccos\,\operatorname{Tr}\Bigl((A^{1/2} B\,A^{1/2})^{1/2}\Bigr),$$

paralleling the Fisher–Rao angle for classical measures (Oostrum, 2020).
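A small numerical illustration follows, assuming the angle is the arccosine of the overlap $\operatorname{Tr}((A^{1/2}BA^{1/2})^{1/2})$ on trace-one matrices. Under that assumption the distance is the chord of the angle, $d_{BW} = 2\sin(\theta/2)$, since $d_{BW}^2 = 2 - 2\cos\theta$ when both traces equal one. The helper names are hypothetical.

```python
import numpy as np

def psd_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_distance(A, B):
    rA = psd_sqrt(A)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))
    return float(np.sqrt(max(d2, 0.0)))

def bw_angle(rho, sigma):
    """Angle between two trace-one PSD matrices (inputs are assumed normalized)."""
    r = psd_sqrt(rho)
    overlap = np.trace(psd_sqrt(r @ sigma @ r))
    return float(np.arccos(np.clip(overlap, -1.0, 1.0)))

rho = np.diag([0.5, 0.5])     # maximally mixed state on two levels
sigma = np.diag([1.0, 0.0])   # pure state
theta = bw_angle(rho, sigma)  # overlap is sqrt(0.5), so theta is about pi/4
print(theta, bw_distance(rho, sigma), 2.0 * np.sin(theta / 2.0))
```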

4. Bures–Wasserstein Barycenter (Fréchet Mean) Theory

Given matrices $A_1, \dots, A_m$ with weights $w_i \ge 0$, $\sum_i w_i = 1$ (or, in the probabilistic setting, a law $P$ on positive semidefinite matrices), the BW barycenter is the unique minimizer

$$\bar{A} = \arg\min_{X \succeq 0}\ \sum_{i=1}^{m} w_i\, d_{BW}^2(X, A_i),$$

which can be characterized by the fixed-point equation

$$\bar{A} = \sum_{i=1}^{m} w_i \bigl(\bar{A}^{1/2} A_i\, \bar{A}^{1/2}\bigr)^{1/2}.$$

Practical computation is achieved by fixed-point or gradient-based iteration, with global convergence guaranteed by strict convexity (Bhatia et al., 2017, Maunu et al., 2022). Statistical theory (CLT, concentration, empirical barycenters) is available for barycenters estimated from random samples, extending the classical Fréchet-mean framework to matrices (Kroshnin et al., 2019).
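A fixed-point scheme of this kind can be sketched as follows. The specific update $X \leftarrow X^{-1/2}\bigl(\sum_i w_i (X^{1/2} A_i X^{1/2})^{1/2}\bigr)^2 X^{-1/2}$ is one commonly used iteration and is an assumption here; the cited works may use a different variant. The sketch also assumes the iterates stay positive definite (for example, when at least one $A_i$ is positive definite), and the function name `bw_barycenter` is hypothetical.

```python
import numpy as np

def psd_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_barycenter(mats, weights, iters=200, tol=1e-12):
    """Fixed-point iteration for the weighted BW barycenter (illustrative sketch)."""
    X = np.eye(mats[0].shape[0])
    for _ in range(iters):
        rX = psd_sqrt(X)
        rX_inv = np.linalg.inv(rX)
        # S = sum_i w_i (X^{1/2} A_i X^{1/2})^{1/2}
        S = sum(w * psd_sqrt(rX @ A @ rX) for A, w in zip(mats, weights))
        X_new = rX_inv @ S @ S @ rX_inv
        X_new = (X_new + X_new.T) / 2.0
        converged = np.linalg.norm(X_new - X) < tol
        X = X_new
        if converged:
            break
    return X

# For commuting (here diagonal) inputs the barycenter is (sum_i w_i A_i^{1/2})^2.
mats = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
print(bw_barycenter(mats, [0.5, 0.5]))  # close to diag(4, 9)
```

At a fixed point of this update, $X^2 = S^2$ and hence $X = S$, which is exactly the fixed-point equation stated above.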

5. Optimal Transport and Connections to Gaussian Geometry

The Bures–Wasserstein metric is intimately tied to optimal mass transport: for $\mu = \mathcal{N}(0, A)$, $\nu = \mathcal{N}(0, B)$,

$$W_2^2(\mu,\nu) = \operatorname{Tr}(A) + \operatorname{Tr}(B) - 2\,\operatorname{Tr}\Bigl((A^{1/2} B\,A^{1/2})^{1/2}\Bigr) = d_{BW}^2(A,B),$$

where optimal coupling is realized by a linear map determined by $T = A^{-1/2}\bigl(A^{1/2} B\,A^{1/2}\bigr)^{1/2} A^{-1/2}$ (Tankala et al., 27 Sep 2025, Maunu et al., 2022).
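The role of the linear map can be checked directly. The sketch below assumes the standard Gaussian transport map $T = A^{-1/2}(A^{1/2}BA^{1/2})^{1/2}A^{-1/2}$ with $A$ positive definite. It verifies that $T$ pushes the covariance $A$ to $B$ (that is, $TAT = B$) and that the transport cost $\mathbb{E}\|x - Tx\|^2 = \operatorname{Tr}((I-T)A(I-T))$ for $x \sim \mathcal{N}(0,A)$ matches the closed-form squared distance. The function name is hypothetical.

```python
import numpy as np

def psd_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_ot_map(A, B):
    """Symmetric matrix T with T A T = B; requires A positive definite."""
    rA = psd_sqrt(A)
    rA_inv = np.linalg.inv(rA)
    return rA_inv @ psd_sqrt(rA @ B @ rA) @ rA_inv

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
T = gaussian_ot_map(A, B)
I = np.eye(2)

# Expected squared displacement E||x - Tx||^2 for x ~ N(0, A).
transport_cost = np.trace((I - T) @ A @ (I - T))
rA = psd_sqrt(A)
closed_form = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))
print(transport_cost, closed_form)  # should agree up to round-off
```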

Geodesics and the Riemannian structure associated with $d_{BW}$ equip the Gaussian family with explicit geodesic curves, closed-form distances, and tractable mean and barycenter computations, affording concrete algorithmic advantages in applications (Diao et al., 2023, Jiang et al., 4 Feb 2026).

6. Generalizations and Weighted Variants

Multiple generalizations of the Bures–Wasserstein geometry have been developed:

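  • Generalized Bures–Wasserstein geometry (GBW): A deformation parameterized by a symmetric positive-definite matrix $M$, with metric and distance

$$g_X(U,V) = \tfrac{1}{2}\,\operatorname{Tr}\bigl(\mathcal{L}_{X,M}[U]\,V\bigr), \qquad d_{GBW}^2(X,Y) = \operatorname{Tr}(M^{-1}X) + \operatorname{Tr}(M^{-1}Y) - 2\,\operatorname{Tr}\Bigl((X^{1/2} M^{-1} Y M^{-1} X^{1/2})^{1/2}\Bigr),$$

where $\mathcal{L}_{X,M}[U]$ solves the generalized Lyapunov equation $X\,\mathcal{L}_{X,M}[U]\,M + M\,\mathcal{L}_{X,M}[U]\,X = U$,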

adapting the cost from Euclidean to Mahalanobis norm; explicit formulas for gradients, exponential/logarithm maps, and curvature are available (Han et al., 2021).

  • Weighted and matrix-valued optimal transport: The weighted Wasserstein–Bures distance incorporates reaction and transport dynamics for matrix-valued measures, reducing to $d_{BW}$ in the reaction-only regime and providing a metric on matrix-valued measure spaces (Li et al., 2020).
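The Mahalanobis interpretation of the GBW geometry in the first item can be illustrated numerically. The sketch assumes the distance takes the form $d_{GBW}^2(X,Y) = \operatorname{Tr}(M^{-1}X) + \operatorname{Tr}(M^{-1}Y) - 2\operatorname{Tr}((X^{1/2}M^{-1}YM^{-1}X^{1/2})^{1/2})$ for a symmetric positive-definite parameter $M$; this form and the function names are assumptions for illustration. Under this assumption, $M = I$ recovers the BW distance, and a general $M$ corresponds to the BW distance after the change of variables $X \mapsto M^{-1/2} X M^{-1/2}$.

```python
import numpy as np

def psd_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_distance(A, B):
    rA = psd_sqrt(A)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))
    return float(np.sqrt(max(d2, 0.0)))

def gbw_distance(X, Y, M):
    """Generalized BW distance with SPD parameter M (assumed form, see lead-in)."""
    M_inv = np.linalg.inv(M)
    rX = psd_sqrt(X)
    cross = psd_sqrt(rX @ M_inv @ Y @ M_inv @ rX)
    d2 = np.trace(M_inv @ X) + np.trace(M_inv @ Y) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(d2, 0.0)))

X = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = np.array([[1.0, -0.3], [-0.3, 3.0]])
M = np.array([[1.5, 0.2], [0.2, 0.8]])
rM_inv = np.linalg.inv(psd_sqrt(M))

# GBW with parameter M versus BW after whitening both arguments by M^{-1/2}.
print(gbw_distance(X, Y, M), bw_distance(rM_inv @ X @ rM_inv, rM_inv @ Y @ rM_inv))
```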

7. Applications and Computational Practices

The Bures–Wasserstein distance underpins numerous applications:

  • Machine Learning: Classification, clustering, and averaging of covariance descriptors (SPD matrices) utilize the Bures mean and barycenter in BCI, vision, and domain adaptation tasks—offering robustness and speed advantages over affine-invariant metrics (Zheng et al., 2023).
  • GANs: Penalizing generators by $d_{BW}$ in feature space improves diversity and mitigates mode collapse (Meulemeester et al., 2020).
  • Variational Inference: Optimizing evidence lower bounds in Bures–Wasserstein space yields more stable and effective algorithms, particularly with importance sampling, as gradients remain stable for large sample sizes (Jiang et al., 4 Feb 2026).
  • Low-rank and convex optimization: Fixed-point and SDP-based formulations allow efficient computation in matrix recovery and learning tasks (Mohan, 2023).
  • Quantum Information: Bures–Wasserstein distance characterizes distances and fidelities between quantum states and is central in quantum statistical inference (Kroshnin et al., 2019, Afham et al., 2024).

Algorithmic schemes for barycenter and geodesic computation exploit explicit geodesics, gradient flows, and proximal/JKO steps, often accelerated compared to alternative metrics. Empirical studies support the theoretical robustness of BW-based averaging and inference in high-dimensional, low-rank, or near-singular regimes (Zheng et al., 2023).

8. Relationships with Classical and Quantum Geometries

The Bures–Wasserstein paradigm offers a non-commutative analog of Fisher–Hellinger geometry:

  • Classical limit: For diagonal matrices (probability vectors), $d_{BW}$ recovers the Hellinger distance, and the angle on the trace-one subset yields the Fisher–Rao metric (Oostrum, 2020).
  • Quantum generalization: Lifting from diagonal to general Hermitian matrices replaces the orthogonal group by the unitary group, leading to a Riemannian submersion framework in which quantum fidelity, the Bures angle, and generalized quantum Rényi divergences all give rise to invariant distances in the BW geometry (Afham et al., 2024).
  • Comparative geometry: The BW structure provides a flat cone metric over the normalized sphere, a result mirrored in the cone construction for matrix-valued measure spaces (Li et al., 2020).
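The classical limit in the first item can be verified on a small example. For diagonal matrices the cross term reduces to $\sum_i \sqrt{p_i q_i}$, so $d_{BW}^2 = \sum_i (\sqrt{p_i} - \sqrt{q_i})^2$. The sketch below uses the unnormalized Hellinger convention (some references include an extra factor of $1/\sqrt{2}$), and the helper names are illustrative.

```python
import numpy as np

def psd_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_distance(A, B):
    rA = psd_sqrt(A)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))
    return float(np.sqrt(max(d2, 0.0)))

def hellinger_unnormalized(p, q):
    """Euclidean distance between the square-root vectors of p and q."""
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
# The two printed values should agree up to round-off.
print(bw_distance(np.diag(p), np.diag(q)), hellinger_unnormalized(p, q))
```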

9. Computational and Algorithmic Considerations

Efficient computation of $d_{BW}$ leverages eigendecomposition, matrix square roots, and convex optimization:

  • $O(n^3)$ complexity for dense matrices via eigen/SVD methods; reduced complexity for low-rank or diagonal cases.
  • Newton–Schulz and polynomial iterations improve practical performance in large dimensions (Meulemeester et al., 2020).
  • Robustness: BW barycenters do not require full rank, avoid inverses, and are stable under near-singular perturbations (Zheng et al., 2023, Maunu et al., 2022).
  • SDP formulations make $d_{BW}^2$ readily incorporable as a constraint or regularizer in convex programs (Mohan, 2023).
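The Newton–Schulz approach mentioned in the list replaces the eigendecomposition with matrix multiplications only, which suits GPU execution. The following is a minimal sketch of the coupled Newton–Schulz iteration for the square root of a symmetric positive-definite matrix, assuming the input is first scaled by its Frobenius norm so that the iteration converges. It is not taken from the cited work, and the iteration count is an arbitrary illustrative choice.

```python
import numpy as np

def newton_schulz_sqrt(A, iters=30):
    """Approximate A^{1/2} for SPD A using only matrix products."""
    n = A.shape[0]
    I = np.eye(n)
    c = np.linalg.norm(A, "fro")
    Y = A / c            # scaled so all eigenvalues lie in (0, 1]
    Z = I.copy()         # Z converges to the inverse square root of A / c
    for _ in range(iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T        # Y converges to (A / c)^{1/2}
        Z = T @ Z
    return np.sqrt(c) * Y  # undo the scaling

A = np.array([[4.0, 1.0], [1.0, 3.0]])
R = newton_schulz_sqrt(A)
print(np.linalg.norm(R @ R - A))  # residual should be near machine precision
```

Convergence slows when the scaled matrix has eigenvalues close to zero, so ill-conditioned inputs may need more iterations or a small diagonal regularization.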

10. Extensions: Block Structures, Generalized Fidelities, and Multivariate Generalizations

Recently, the geometric underpinnings of $d_{BW}$ have been extended to support:

  • Generalized fidelities: Unified block-matrix and Riemannian-geometric formalisms yield new families of quantum fidelities, with invariance and overlap properties, extending the Uhlmann, Holevo, and Matsumoto fidelities (Afham et al., 2024).
  • Multivariate and $\alpha$-Rényi divergences: The generalized BW metric by linearization at a base point underpins a class of distances applicable to multivariate and quantum Rényi divergence settings, broadening the analytic scope (Afham et al., 2024).

References:

  • "Bures–Wasserstein geometry for positive-definite Hermitian matrices and their trace-one subset" (Oostrum, 2020)
  • "Learning with symmetric positive definite matrices via generalized Bures–Wasserstein geometry" (Han et al., 2021)
  • "Statistical inference for Bures–Wasserstein barycenters" (Kroshnin et al., 2019)
  • "Barycenter Estimation of Positive Semi-Definite Matrices with Bures–Wasserstein Distance" (Zheng et al., 2023)
  • "Bures–Wasserstein Barycenters and Low-Rank Matrix Recovery" (Maunu et al., 2022)
  • "Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures–Wasserstein Loss" (Bréchet et al., 2023)
  • "The Bures Metric for Generative Adversarial Networks" (Meulemeester et al., 2020)
  • "A note on the Bures–Wasserstein metric" (Mohan, 2023)
  • "Dense associative memory on the Bures–Wasserstein space" (Tankala et al., 27 Sep 2025)
  • "On Wasserstein distances for affine transformations of random vectors" (Hamm et al., 2023)
  • "Riemannian-geometric generalizations of quantum fidelities and Bures–Wasserstein distance" (Afham et al., 2024)
  • "Forward-backward Gaussian variational inference via JKO in the Bures–Wasserstein Space" (Diao et al., 2023)
  • "Bures–Wasserstein minimizing geodesics between covariance matrices of different ranks" (Thanwerdas et al., 2022)
  • "On a general matrix-valued unbalanced optimal transport problem" (Li et al., 2020)
  • "Bures–Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications" (Jiang et al., 4 Feb 2026)
  • "On the Bures–Wasserstein distance between positive definite matrices" (Bhatia et al., 2017)
