Papers
Topics
Authors
Recent
2000 character limit reached

Matrix Wasserstein Distance Formulation

Updated 22 November 2025
  • Wasserstein Distance Matrix Formulation is a framework extending scalar optimal transport to matrix-valued measures, rigorously quantifying minimal transport cost.
  • It utilizes both primal and dual formulations, including 1-Wasserstein and Wasserstein-2 (Bures) metrics, to tackle transport with convex optimization and dynamic formulations.
  • The approach underpins applications in quantum information, statistics, machine learning, and signal processing, supported by scalable algorithms and matrix geometry.

The Wasserstein distance matrix formulation provides a rigorous framework to quantify the minimal "cost" required to transport one matrix-valued distribution into another, generalizing the scalar optimal transport theory to matrix-valued and positive semidefinite matrix measures. This extension encompasses various regimes, including the 1-Wasserstein (Earth Mover’s), the 2-Wasserstein (Bures or Kantorovich-Bures), and their entropy-regularized analogues, with direct applications in quantum information, statistics, machine learning, and signal processing.

1. Matrix-Valued Probability Measures and Mass Transport

Matrix-valued probability densities and positive semidefinite (PSD) matrix-valued measures form the foundational objects in this framework. A matrix probability density is a map ρ:ΩH+\rho: \Omega \to \mathcal{H}_+, where H+\mathcal{H}_+ is the cone of n×nn\times n Hermitian PSD matrices, subject to ΩTrρ(x)dx=1\int_\Omega \operatorname{Tr}\rho(x)\,dx=1. In more generality, one considers PSD-valued Radon measures on Ω\Omega, admitting total mass Trρ<\int \operatorname{Tr}\rho < \infty (Chen et al., 2017). Matrix-valued mass transport, both balanced and unbalanced, extends scalar optimal transport by introducing a “quantum source” term v(x)Hv(x)\in\mathcal{H} that allows for creation/destruction of matrix-valued “mass.” The unbalanced continuity equation is

ρ0ρ1=divxu1+Lu2+v,\rho_0 - \rho_1 = -\operatorname{div}_x u_1 + \nabla_L^*u_2 + v,

where u1u_1 is a spatial matrix-valued flux, u2u_2 encodes flux along Hermitian directions {Lk}\{L_k\}, and vv is penalized via the nuclear norm.

2. Primal and Dual Formulations

2.1 Wasserstein-1: Matrix Formulation

The primal formulation for matricial W1W_1 is

W1mat(ρ0,ρ1)=infu1,u2Ω[u1(x) u2(x)]dxW^{\textrm{mat}}_1(\rho_0,\rho_1) = \inf_{u_1,u_2} \int_\Omega \left\|\begin{bmatrix} u_1(x) \ u_2(x) \end{bmatrix}\right\|_* dx

subject to the matrix-valued flux continuity equation

ρ0ρ1+x ⁣u1Lu2=0.\rho_0 - \rho_1 + \nabla_x\!\cdot u_1 - \nabla_L^* u_2 = 0.

Here, the nuclear (trace) norm induces convexity and regularity, and the variables (u1,u2)(u_1, u_2) encode classical and non-commutative (quantum) transport modes (Chen et al., 2017).

The classical dual (Kantorovich–Rubinstein type) for matrices is

W1mat(ρ0,ρ1)=supfCc1(Ω,H)ΩTr[f(x)(ρ0(x)ρ1(x))]dxW^{\rm mat}_1(\rho_0,\rho_1) = \sup_{f\in C^1_c(\Omega,\mathcal{H})} \int_\Omega \operatorname{Tr}\left[ f(x)\bigl(\rho_0(x)-\rho_1(x)\bigr)\right]dx

subject to the gradient constraint

[xf Lf]1,\left\|\begin{bmatrix}\nabla_x f\ \nabla_L f\end{bmatrix}\right\| \leq 1,

where Lf=(L1ffL1,,LNffLN)\nabla_Lf=(L_1f-fL_1,\ldots,L_Nf-fL_N). This operator-norm constraint specifies a non-commutative Lipschitz class.

2.2 Wasserstein-2: Bures and Matrix-Bures Formulations

For positive-definite matrices A,BP(n)A,B\in P(n), the 2-Wasserstein/Bures metric is defined by

dBW(A,B)=[TrA+TrB2Tr(A1/2BA1/2)1/2]1/2,d_{\rm BW}(A,B) = \left[\operatorname{Tr}A + \operatorname{Tr}B - 2\operatorname{Tr}(A^{1/2}BA^{1/2})^{1/2}\right]^{1/2},

arising equivalently as the minimal mean-square transport cost between zero-mean Gaussians with covariances A,BA,B (Bhatia et al., 2017, Bhatia et al., 2019, Maunu et al., 2022). The matrix Monge–Kantorovich problem also admits a Benamou–Brenier-type dynamic form: W22(ρ0,ρ1)=infρ,v01Tr[ρ(t)v(t)v(t)]dt,W_2^2(\rho_0, \rho_1)=\inf_{\rho,v} \int_0^1 \operatorname{Tr}[\rho(t)v(t)^*v(t)] dt, subject to a quantum continuity equation

tρ(t)=12L ⁣(ρ(t)v(t)+v(t)ρ(t)),\partial_t \rho(t) = \frac{1}{2}\nabla_L^*\!\left(\rho(t)v(t)+v(t)\rho(t)\right),

with optimization over time-dependent density trajectories and skew-Hermitian velocities (Chen et al., 2017, Li et al., 2020).

3. Geometry, Metric Properties, and Commuting Cases

The matrix Wasserstein distances are bona fide metrics: they are nonnegative, symmetric, satisfy the triangle inequality, and vanish if and only if the arguments are equal. In the commuting (simultaneously diagonalizable) case, both W1matW_1^{\rm mat} and W2W_2 decouple into sums of scalar Wasserstein distances over eigenvalue densities: W1mat(ρ0,ρ1)=i=1nW1(λi0,λi1),W_1^{\rm mat}(\rho_0,\rho_1) = \sum_{i=1}^n W_1(\lambda^0_i, \lambda^1_i),

W22(N(0,A),N(0,B))=i=1nW22(N(0,λiA),N(0,λiB)),W_2^2(N(0,A),N(0,B)) = \sum_{i=1}^n W_2^2(N(0,\lambda^A_i), N(0,\lambda^B_i)),

where {λiA}\{\lambda^A_i\}, {λiB}\{\lambda^B_i\} are eigenvalues of AA, BB (Chen et al., 2017, Chen et al., 2017). In general, these matrix-valued distances metrize the weak-* topology and capture convergence of both density and mass.

The Riemannian geometry induced by W2W_2 is explicitly computable: the metric tensor at AA is given via the Lyapunov operator LA[H]=X\mathcal{L}_A[H]=X, where AX+XA=HAX + XA=H, yielding

gA(H,K)=Tr[LA[H]K].g_A(H,K) = \operatorname{Tr}[\mathcal{L}_A[H] K].

The geodesic from AA to BB is

Σ(t)=[(1t)I+tT]A[(1t)I+tT],T=A1/2(A1/2BA1/2)1/2A1/2,\Sigma(t) = [(1-t)I + tT]\, A \, [(1-t)I + tT], \quad T = A^{-1/2}(A^{1/2}BA^{1/2})^{1/2}A^{-1/2},

with constant speed along the path (Malagò et al., 2018).

4. Computational Aspects and Algorithms

The primal "flux" and dual formulations are convex and, in the discrete domain, reduce to linear or conic programs. The "dual of the dual" (flux) formulation allows significant reduction in variables: for scalar W1W_1, from O(n2)O(n^2) (coupling matrix) to O(nd)O(nd) (for degree-dd graphs). The matrix-analogous reduction is realized by expressing the transport in terms of two flux fields (u1,u2)(u_1,u_2), especially efficient for high-dimensional and sparse data (Chen et al., 2017).

For W2W_2 (Bures), the barycenter of semidefinite matrices {Aj}\{A_j\} under weights wjw_j is the unique solution XX of

X=j=1mwj(X1/2AjX1/2)1/2,X = \sum_{j=1}^m w_j (X^{1/2}A_j X^{1/2})^{1/2},

which can be efficiently computed by a fixed-point iteration

Sk+1=Sk1/2[j=1mwj(Sk1/2AjSk1/2)1/2]2Sk1/2S_{k+1} = S_k^{-1/2} \left[\sum_{j=1}^m w_j (S_k^{1/2}A_j S_k^{1/2})^{1/2}\right]^2 S_k^{-1/2}

(Bhatia et al., 2017, Maunu et al., 2022). The entropic regularization of matrix OT problems introduces strict convexity and differentiability, enabling the use of scalable iterative algorithms such as Sinkhorn scaling (Zhang, 2021, Quang, 2020).

5. Extensions: Unbalanced, Weighted, and Entropic Matrix OT

Matrix-valued Wasserstein metrics admit natural unbalanced and weighted generalizations. The weighted Wasserstein–Bures distance WBΛ{\rm WB}_\Lambda on M(Ω,S+n)\mathcal{M}(\Omega,\mathbb{S}_+^n) is given by a dynamic (Benamou–Brenier) program over (G,q,R)(G,q,R),

WBΛ2(G0,G1)=inf(G,q,R)01ΩJA(G,q,R)dxdt{\rm WB}_\Lambda^2(G_0,G_1)=\inf_{(G,q,R)} \int_0^1 \int_\Omega J_A( G, q, R ) dx dt

subject to a generalized continuity equation incorporating both momentum and reaction (mass change). The functional JAJ_A is a convex quadratic in the momentum and reaction fields, regularized by weight matrices A1A_1, A2A_2 corresponding to spatial and non-spatial modes (Li et al., 2020). The corresponding dual (Kantorovich) characterization enforces a pointwise matrix Hamilton–Jacobi inequality.

Entropy regularization yields strictly convex, differentiable variants of the Wasserstein distance between Gaussians, defined by

W2,ε2(μ0,μ1)=m0m12+Tr(C0)+Tr(C1)ε2Tr(M01ε)+ε2logdet(I+12M01ε),W_{2,\varepsilon}^2(\mu_0,\mu_1) = \|m_0-m_1\|^2 + \operatorname{Tr}(C_0)+\operatorname{Tr}(C_1) - \frac{\varepsilon}{2}\operatorname{Tr}(M_{01}^\varepsilon) + \frac{\varepsilon}{2}\log\det(I+\tfrac12 M_{01}^\varepsilon),

where M01ε=I+(I+16ε2C01/2C1C01/2)1/2M_{01}^\varepsilon = -I + \left(I + \tfrac{16}{\varepsilon^2} C_0^{1/2} C_1 C_0^{1/2}\right)^{1/2}, and the barycenter of several Gaussians is characterized by an explicit fixed-point equation (Quang, 2020).

6. Practical Applications and Implications

Matrix Wasserstein distances underpin data analysis methodologies in quantum information, imaging, covariance-based statistics, and network theory. In quantum settings, the Bures (Wasserstein-2) metric is especially significant for comparing quantum states. The barycenter (Wasserstein mean) construction provides intrinsic notion of averaging positive definite matrices in structure-preserving ways (Bhatia et al., 2017, Bhatia et al., 2019, Maunu et al., 2022). Scalable algorithms exploiting matrix structure have enabled application to high-dimensional imaging, network diffusion, and stochastic processes (Li et al., 2020).

Generalizations to tensor data, multi-marginal OT, and tree-structured metrics are also active areas, with the matrix structure enabling the capture of correlations and non-commutative relationships that scalar OT cannot (Takezawa et al., 2021).

7. Open Problems and Future Directions

Current open directions include the proper choice and interpretation of quantum gradient directions {Lk}\{L_k\}, characterizing robustness and geometric curvature in matrix-valued transport on networks, extensions to higher-order costs (e.g., Wasserstein-2 for general matrix-valued measures), and the design of efficient, scalable algorithms for unbalanced and entropic matrix OT in large-scale and infinite-dimensional regimes (Chen et al., 2017, Li et al., 2020, Quang, 2020). Theoretical analysis of convergence, regularity, and geometric structure continues to be an active research frontier.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Forward Email Streamline Icon: https://streamlinehq.com

Follow Topic

Get notified by email when new papers are published related to Wasserstein Distance Matrix Formulation.