Matrix Wasserstein Distance Formulation

Updated 22 November 2025

The matrix Wasserstein distance formulation extends scalar optimal transport to matrix-valued measures and quantifies the minimal cost of transporting one such measure into another.
It has both primal and dual formulations, covers the 1-Wasserstein and 2-Wasserstein (Bures) metrics, and can be posed as a convex optimization problem or in dynamic (Benamou–Brenier) form.
The approach underpins applications in quantum information, statistics, machine learning, and signal processing, supported by scalable algorithms and matrix geometry.

The matrix Wasserstein distance formulation provides a rigorous framework to quantify the minimal "cost" required to transport one matrix-valued distribution into another, generalizing scalar optimal transport theory to matrix-valued and positive semidefinite matrix measures. This extension encompasses several regimes, including the 1-Wasserstein (Earth Mover’s) distance, the 2-Wasserstein (Bures or Kantorovich-Bures) distance, and their entropy-regularized analogues, with direct applications in quantum information, statistics, machine learning, and signal processing.

1. Matrix-Valued Probability Measures and Mass Transport

Matrix-valued probability densities and positive semidefinite (PSD) matrix-valued measures are the foundational objects in this framework. A matrix probability density is a map $\rho: \Omega \to \mathcal{H}_+$ , where $\mathcal{H}_+$ is the cone of $n\times n$ Hermitian PSD matrices, subject to $\int_\Omega \operatorname{Tr}\rho(x)\,dx=1$ . More generally, one considers PSD-valued Radon measures on $\Omega$ with finite total mass $\int \operatorname{Tr}\rho < \infty$ (Chen et al., 2017). Matrix-valued mass transport, both balanced and unbalanced, extends scalar optimal transport; the unbalanced version introduces a “quantum source” term $v(x)\in\mathcal{H}$ , with $\mathcal{H}$ the space of $n\times n$ Hermitian matrices, that allows for creation and destruction of matrix-valued “mass.” The unbalanced continuity equation is

$\rho_0 - \rho_1 = -\operatorname{div}_x u_1 + \nabla_L^*u_2 + v,$

where $u_1$ is a spatial matrix-valued flux, $u_2$ encodes flux along Hermitian directions $\{L_k\}$ , and $v$ is penalized via the nuclear norm.

2. Primal and Dual Formulations

2.1 Wasserstein-1: Matrix Formulation

The primal formulation for matricial $W_1$ is

$W^{\rm mat}_1(\rho_0,\rho_1) = \inf_{u_1,u_2} \int_\Omega \left\|\begin{bmatrix} u_1(x) \\ u_2(x) \end{bmatrix}\right\|_* \,dx$

subject to the matrix-valued flux continuity equation

$\rho_0 - \rho_1 + \nabla_x\!\cdot u_1 - \nabla_L^* u_2 = 0.$

Here, the nuclear (trace) norm induces convexity and regularity, and the variables $(u_1, u_2)$ encode classical and non-commutative (quantum) transport modes (Chen et al., 2017).

The classical dual (Kantorovich–Rubinstein type) for matrices is

$W^{\rm mat}_1(\rho_0,\rho_1) = \sup_{f\in C^1_c(\Omega,\mathcal{H})} \int_\Omega \operatorname{Tr}\left[ f(x)\bigl(\rho_0(x)-\rho_1(x)\bigr)\right]dx$

subject to the gradient constraint

$\left\|\begin{bmatrix}\nabla_x f \\ \nabla_L f\end{bmatrix}\right\| \leq 1,$

where $\nabla_Lf=(L_1f-fL_1,\ldots,L_Nf-fL_N)$ . This operator-norm constraint specifies a non-commutative Lipschitz class.
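A small numerical sketch can make the non-commutative gradient concrete. It assumes the trace inner product $\langle X,Y\rangle=\operatorname{Tr}(X^*Y)$ , under which the adjoint works out to $\nabla_L^*u=\sum_k (L_ku_k-u_kL_k)$ for Hermitian $L_k$ . The random matrices and the helper names (`grad_L`, `grad_L_adjoint`) are illustrative choices, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n):
    """Draw a random n x n Hermitian matrix (illustrative test data only)."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (Z + Z.conj().T) / 2

def grad_L(f, Ls):
    """Non-commutative gradient: the list of commutators L_k f - f L_k."""
    return [L @ f - f @ L for L in Ls]

def grad_L_adjoint(us, Ls):
    """Adjoint of grad_L under <X, Y> = Tr(X^* Y), assuming Hermitian L_k."""
    return sum(L @ u - u @ L for L, u in zip(Ls, us))

n, N = 3, 2
Ls = [random_hermitian(n) for _ in range(N)]   # hypothetical directions L_k
f = random_hermitian(n)                         # a Hermitian test "function" value
us = [rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
      for _ in range(N)]                        # arbitrary flux components u_k

# Adjoint identity: <grad_L f, u> == <f, grad_L^* u>
lhs = sum(np.trace(g.conj().T @ u) for g, u in zip(grad_L(f, Ls), us))
rhs = np.trace(f.conj().T @ grad_L_adjoint(us, Ls))
adjoint_gap = abs(lhs - rhs)

# Operator (spectral) norm of the stacked commutators; for a spatially
# constant f the term nabla_x f vanishes, so this is the constrained quantity.
lipschitz_norm = np.linalg.norm(np.vstack(grad_L(f, Ls)), 2)

# Multiples of the identity commute with every L_k, so their gradient is zero.
identity_gradient = grad_L(np.eye(n), Ls)
```

Rescaling `f` by `1 / lipschitz_norm` (when it is nonzero) would place it on the boundary of the constraint set in this spatially constant setting.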

2.2 Wasserstein-2: Bures and Matrix-Bures Formulations

For positive-definite matrices $A,B\in P(n)$ , the 2-Wasserstein/Bures metric is defined by

$d_{\rm BW}(A,B) = \left[\operatorname{Tr}A + \operatorname{Tr}B - 2\operatorname{Tr}(A^{1/2}BA^{1/2})^{1/2}\right]^{1/2},$

arising equivalently as the minimal mean-square transport cost between zero-mean Gaussians with covariances $A,B$ (Bhatia et al., 2017, Bhatia et al., 2019, Maunu et al., 2022). The matrix Monge–Kantorovich problem also admits a Benamou–Brenier-type dynamic form:

$W_2^2(\rho_0, \rho_1)=\inf_{\rho,v} \int_0^1 \operatorname{Tr}[\rho(t)v(t)^*v(t)]\,dt,$

subject to a quantum continuity equation

$\partial_t \rho(t) = \frac{1}{2}\nabla_L^*\!\left(\rho(t)v(t)+v(t)\rho(t)\right),$

with optimization over time-dependent density trajectories and skew-Hermitian velocities (Chen et al., 2017, Li et al., 2020).
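The closed-form Bures–Wasserstein distance between two positive semidefinite matrices is simple to evaluate numerically. The following is a minimal sketch that assumes Hermitian PSD inputs and computes matrix square roots by eigendecomposition; the function names are illustrative.

```python
import numpy as np

def psd_sqrt(M):
    """Principal square root of a Hermitian PSD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)          # clip tiny negative eigenvalues from round-off
    return (U * np.sqrt(w)) @ U.conj().T

def bures_wasserstein(A, B):
    """d_BW(A, B) = [Tr A + Tr B - 2 Tr (A^{1/2} B A^{1/2})^{1/2}]^{1/2}."""
    rA = psd_sqrt(A)
    cross = psd_sqrt(rA @ B @ rA)
    d2 = np.trace(A).real + np.trace(B).real - 2.0 * np.trace(cross).real
    return np.sqrt(max(d2, 0.0))       # guard against slightly negative round-off

# Commuting example: the squared distance reduces to a sum over eigenvalue pairs.
A = np.diag([1.0, 4.0])
B = np.diag([9.0, 16.0])
d_AB = bures_wasserstein(A, B)
d_AA = bures_wasserstein(A, A)
```

For this diagonal pair the squared distance is $(1-3)^2+(2-4)^2=8$ , so `d_AB**2` should agree with 8 up to floating-point error.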

3. Geometry, Metric Properties, and Commuting Cases

The matrix Wasserstein distances are bona fide metrics: they are nonnegative, symmetric, satisfy the triangle inequality, and vanish if and only if the arguments are equal. In the commuting (simultaneously diagonalizable) case, both $W_1^{\rm mat}$ and $W_2$ decouple into sums of scalar Wasserstein distances over eigenvalue densities:

$W_1^{\rm mat}(\rho_0,\rho_1) = \sum_{i=1}^n W_1(\lambda^0_i, \lambda^1_i),$

$W_2^2(N(0,A),N(0,B)) = \sum_{i=1}^n W_2^2(N(0,\lambda^A_i), N(0,\lambda^B_i)),$

where $\{\lambda^A_i\}$ , $\{\lambda^B_i\}$ are eigenvalues of $A$ , $B$ (Chen et al., 2017, Chen et al., 2017). In general, these matrix-valued distances metrize the weak-* topology and capture convergence of both density and mass.
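
As a worked check of the Gaussian identity, assume $A$ and $B$ share a unitary eigenbasis $U$ , with eigenvalues paired by common eigenvector (this pairing is the assumption that makes the sum well defined). Then $A^{1/2}BA^{1/2} = U\,\mathrm{diag}(\lambda^A_i\lambda^B_i)\,U^*$ , and substituting into the formula for $d_{\rm BW}$ gives

$d_{\rm BW}^2(A,B) = \sum_{i=1}^n \left(\lambda^A_i + \lambda^B_i - 2\sqrt{\lambda^A_i\lambda^B_i}\right) = \sum_{i=1}^n \left(\sqrt{\lambda^A_i}-\sqrt{\lambda^B_i}\right)^2,$

which is the sum of the scalar values $W_2^2(N(0,\lambda^A_i),N(0,\lambda^B_i)) = (\sqrt{\lambda^A_i}-\sqrt{\lambda^B_i})^2$ . For instance, $A=\mathrm{diag}(1,4)$ and $B=\mathrm{diag}(9,16)$ give $(1-3)^2+(2-4)^2 = 8$ .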

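The Riemannian geometry induced by $W_2$ is explicitly computable: the metric tensor at $A$ is given via the Lyapunov operator $\mathcal{L}_A[H]=X$ , where $X$ solves $AX + XA=H$ . With the normalization of $d_{\rm BW}$ given above, this yields

$g_A(H,K) = \operatorname{Tr}[\mathcal{L}_A[H]\, A\, \mathcal{L}_A[K]] = \tfrac12 \operatorname{Tr}[\mathcal{L}_A[H] K].$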
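
The Lyapunov operator can be evaluated in the eigenbasis of $A$ , where the equation $AX+XA=H$ becomes an entrywise division. The sketch below assumes real symmetric inputs with $A$ positive definite; `lyapunov_solve` is an illustrative helper name. It checks the residual of the equation and the symmetry of the pairing $\operatorname{Tr}[\mathcal{L}_A[H]K]$ in $H$ and $K$ .

```python
import numpy as np

def lyapunov_solve(A, H):
    """Solve A X + X A = H for real symmetric positive definite A."""
    lam, U = np.linalg.eigh(A)
    H_tilde = U.T @ H @ U                               # H in the eigenbasis of A
    X_tilde = H_tilde / (lam[:, None] + lam[None, :])   # entrywise: H_ij / (lam_i + lam_j)
    return U @ X_tilde @ U.T

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
A = G @ G.T + np.eye(4)                 # random symmetric positive definite matrix
H = rng.standard_normal((4, 4)); H = (H + H.T) / 2   # symmetric tangent direction
K = rng.standard_normal((4, 4)); K = (K + K.T) / 2   # second tangent direction

X = lyapunov_solve(A, H)
residual = np.linalg.norm(A @ X + X @ A - H)

# The pairing Tr[L_A[H] K] is symmetric in (H, K), as a metric tensor must be.
pairing_HK = np.trace(lyapunov_solve(A, H) @ K)
pairing_KH = np.trace(lyapunov_solve(A, K) @ H)
```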

The geodesic from $A$ to $B$ is

$\Sigma(t) = [(1-t)I + tT]\, A \, [(1-t)I + tT], \quad T = A^{-1/2}(A^{1/2}BA^{1/2})^{1/2}A^{-1/2},$

with constant speed along the path (Malagò et al., 2018).
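
The geodesic formula can be checked numerically. The sketch below assumes real symmetric positive definite endpoints and uses illustrative helper names; it verifies that the curve ends at $B$ and that the midpoint lies at half the total distance from $A$ , as constant speed requires.

```python
import numpy as np

def psd_sqrt(M):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

def bw_distance(A, B):
    """Bures-Wasserstein distance between symmetric PSD matrices."""
    rA = psd_sqrt(A)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))
    return np.sqrt(max(d2, 0.0))

def bw_geodesic(A, B, t):
    """Sigma(t) = [(1-t) I + t T] A [(1-t) I + t T], with T the map taking A to B."""
    rA = psd_sqrt(A)
    rA_inv = np.linalg.inv(rA)          # requires A to be positive definite
    T = rA_inv @ psd_sqrt(rA @ B @ rA) @ rA_inv
    P = (1.0 - t) * np.eye(A.shape[0]) + t * T
    return P @ A @ P

rng = np.random.default_rng(2)
GA, GB = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A = GA @ GA.T + np.eye(3)               # random positive definite endpoints
B = GB @ GB.T + np.eye(3)

start, mid, end = (bw_geodesic(A, B, t) for t in (0.0, 0.5, 1.0))
total = bw_distance(A, B)
half = bw_distance(A, mid)
```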

4. Computational Aspects and Algorithms

The primal "flux" and dual formulations are convex and, in the discrete domain, reduce to linear or conic programs. The "dual of the dual" (flux) formulation allows significant reduction in variables: for scalar $W_1$ , from $O(n^2)$ (coupling matrix) to $O(nd)$ (for degree- $d$ graphs). The matrix-analogous reduction is realized by expressing the transport in terms of two flux fields $(u_1,u_2)$ , especially efficient for high-dimensional and sparse data (Chen et al., 2017).
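
To illustrate the variable reduction in the scalar case, the following sketch solves $W_1$ on a graph with one flux variable per edge rather than one coupling variable per pair of nodes. It assumes unit edge lengths and equal total mass, and uses `scipy.optimize.linprog`; the function name `w1_flux` and the splitting $u=u^+-u^-$ are illustrative choices, and this is the scalar analogue only, not the matrix-valued solver of the cited work.

```python
import numpy as np
from scipy.optimize import linprog

def w1_flux(rho0, rho1, edges):
    """Scalar W1 on a graph with unit edge lengths, via edge fluxes."""
    rho0 = np.asarray(rho0, dtype=float)
    rho1 = np.asarray(rho1, dtype=float)
    n, m = len(rho0), len(edges)
    # Discrete divergence D: (D @ u)[i] is the net flux leaving node i.
    D = np.zeros((n, m))
    for e, (i, j) in enumerate(edges):
        D[i, e] = 1.0    # edge e is oriented out of node i
        D[j, e] = -1.0   # and into node j
    # Split u = u_plus - u_minus with both parts >= 0 so that |u| is linear.
    cost = np.ones(2 * m)
    A_eq = np.hstack([D, -D])
    b_eq = rho1 - rho0   # discrete form of rho0 - rho1 + div u = 0
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    flux = res.x[:m] - res.x[m:]
    return res.fun, flux

# Path graph 0 - 1 - 2 - 3: move a unit mass from node 0 to node 3.
edges = [(0, 1), (1, 2), (2, 3)]
w1, flux = w1_flux([1, 0, 0, 0], [0, 0, 0, 1], edges)
```

On this path the unit mass crosses three unit-length edges, so the optimal cost is 3. The linear program has $2m$ variables for $m$ edges, compared with $n^2$ for a full coupling matrix.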

For $W_2$ (Bures), the barycenter of positive definite matrices $\{A_j\}$ under weights $w_j \ge 0$ with $\sum_j w_j = 1$ is the unique positive definite solution $X$ of

$X = \sum_{j=1}^m w_j (X^{1/2}A_j X^{1/2})^{1/2},$

which can be efficiently computed by a fixed-point iteration

$S_{k+1} = S_k^{-1/2} \left[\sum_{j=1}^m w_j (S_k^{1/2}A_j S_k^{1/2})^{1/2}\right]^2 S_k^{-1/2}$

(Bhatia et al., 2017, Maunu et al., 2022). The entropic regularization of matrix OT problems introduces strict convexity and differentiability, enabling the use of scalable iterative algorithms such as Sinkhorn scaling (Zhang, 2021, Quang, 2020).
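
The fixed-point iteration for the Bures barycenter translates directly into code. This is a minimal sketch assuming Hermitian positive definite inputs and weights that sum to one; the identity initial guess, the stopping tolerance, and the function names are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(M):
    """Principal square root of a Hermitian PSD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def bw_barycenter(mats, weights, iters=200, tol=1e-12):
    """Iterate S <- S^{-1/2} [sum_j w_j (S^{1/2} A_j S^{1/2})^{1/2}]^2 S^{-1/2}."""
    S = np.eye(mats[0].shape[0])        # assumed initial guess (any PD matrix)
    for _ in range(iters):
        rS = psd_sqrt(S)
        rS_inv = np.linalg.inv(rS)
        M = sum(w * psd_sqrt(rS @ A @ rS) for w, A in zip(weights, mats))
        S_new = rS_inv @ M @ M @ rS_inv
        S_new = (S_new + S_new.conj().T) / 2   # re-symmetrize against round-off
        converged = np.linalg.norm(S_new - S) < tol
        S = S_new
        if converged:
            break
    return S

def fixed_point_residual(X, mats, weights):
    """Norm of X - sum_j w_j (X^{1/2} A_j X^{1/2})^{1/2}."""
    rX = psd_sqrt(X)
    return np.linalg.norm(X - sum(w * psd_sqrt(rX @ A @ rX)
                                  for w, A in zip(weights, mats)))

mats = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
weights = [0.5, 0.5]
X = bw_barycenter(mats, weights)
res = fixed_point_residual(X, mats, weights)
```

In this commuting example the barycenter averages the square roots of the eigenvalues and squares the result, giving $\mathrm{diag}(4,9)$ .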

5. Extensions: Unbalanced, Weighted, and Entropic Matrix OT

Matrix-valued Wasserstein metrics admit natural unbalanced and weighted generalizations. The weighted Wasserstein–Bures distance ${\rm WB}_\Lambda$ on $\mathcal{M}(\Omega,\mathbb{S}_+^n)$ is given by a dynamic (Benamou–Brenier) program over $(G,q,R)$ ,

${\rm WB}_\Lambda^2(G_0,G_1)=\inf_{(G,q,R)} \int_0^1 \int_\Omega J_\Lambda( G, q, R )\,dx\,dt$

subject to a generalized continuity equation incorporating both momentum and reaction (mass change). The functional $J_\Lambda$ is a convex quadratic in the momentum and reaction fields, weighted by matrices $\Lambda_1$ , $\Lambda_2$ corresponding to spatial and non-spatial modes (Li et al., 2020). The corresponding dual (Kantorovich) characterization enforces a pointwise matrix Hamilton–Jacobi inequality.

Entropy regularization yields strictly convex, differentiable variants of the Wasserstein distance between Gaussians, defined by

$W_{2,\varepsilon}^2(\mu_0,\mu_1) = \|m_0-m_1\|^2 + \operatorname{Tr}(C_0)+\operatorname{Tr}(C_1) - \frac{\varepsilon}{2}\operatorname{Tr}(M_{01}^\varepsilon) + \frac{\varepsilon}{2}\log\det(I+\tfrac12 M_{01}^\varepsilon),$

where $M_{01}^\varepsilon = -I + \left(I + \tfrac{16}{\varepsilon^2} C_0^{1/2} C_1 C_0^{1/2}\right)^{1/2}$ , and the barycenter of several Gaussians is characterized by an explicit fixed-point equation (Quang, 2020).
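
The entropic closed form can be evaluated directly. The sketch below assumes real symmetric PSD covariances and a positive regularization parameter; `entropic_w2_gaussian` is an illustrative name. As a sanity check, for small $\varepsilon$ the value should approach the unregularized squared distance $\|m_0-m_1\|^2 + d_{\rm BW}^2(C_0,C_1)$ .

```python
import numpy as np

def psd_sqrt(M):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

def entropic_w2_gaussian(m0, C0, m1, C1, eps):
    """Closed-form entropic squared distance between N(m0, C0) and N(m1, C1)."""
    n = C0.shape[0]
    I = np.eye(n)
    r0 = psd_sqrt(C0)
    # M = -I + (I + (16 / eps^2) C0^{1/2} C1 C0^{1/2})^{1/2}
    M = -I + psd_sqrt(I + (16.0 / eps**2) * (r0 @ C1 @ r0))
    _, logdet = np.linalg.slogdet(I + 0.5 * M)   # log det(I + M/2), stable form
    mean_term = float(np.sum((np.asarray(m0) - np.asarray(m1)) ** 2))
    return (mean_term + np.trace(C0) + np.trace(C1)
            - 0.5 * eps * np.trace(M) + 0.5 * eps * logdet)

m0 = np.zeros(2)
m1 = np.zeros(2)
C0 = np.diag([1.0, 4.0])
C1 = np.diag([9.0, 16.0])

small_eps_value = entropic_w2_gaussian(m0, C0, m1, C1, eps=1e-3)
shifted_value = entropic_w2_gaussian(m0, C0, m1 + np.array([3.0, 4.0]), C1, eps=1e-3)
```

For these covariances the unregularized squared Bures distance is 8, so `small_eps_value` should be close to 8, and shifting one mean by $(3,4)$ adds exactly 25.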

6. Practical Applications and Implications

Matrix Wasserstein distances underpin data analysis methodologies in quantum information, imaging, covariance-based statistics, and network theory. In quantum settings, the Bures (Wasserstein-2) metric is especially significant for comparing quantum states. The barycenter (Wasserstein mean) construction provides an intrinsic notion of averaging positive definite matrices in a structure-preserving way (Bhatia et al., 2017, Bhatia et al., 2019, Maunu et al., 2022). Scalable algorithms exploiting matrix structure have enabled applications to high-dimensional imaging, network diffusion, and stochastic processes (Li et al., 2020).

Generalizations to tensor data, multi-marginal OT, and tree-structured metrics are also active areas, with the matrix structure enabling the capture of correlations and non-commutative relationships that scalar OT cannot (Takezawa et al., 2021).

7. Open Problems and Future Directions

Current open directions include the proper choice and interpretation of quantum gradient directions $\{L_k\}$ , characterizing robustness and geometric curvature in matrix-valued transport on networks, extensions to higher-order costs (e.g., Wasserstein-2 for general matrix-valued measures), and the design of efficient, scalable algorithms for unbalanced and entropic matrix OT in large-scale and infinite-dimensional regimes (Chen et al., 2017, Li et al., 2020, Quang, 2020). Theoretical analysis of convergence, regularity, and geometric structure continues to be an active research frontier.