Matrix Wasserstein Distance Formulation
- Wasserstein Distance Matrix Formulation is a framework extending scalar optimal transport to matrix-valued measures, rigorously quantifying minimal transport cost.
- It uses both primal and dual formulations, covering the 1-Wasserstein and 2-Wasserstein (Bures) metrics, and solves the resulting transport problems through convex optimization and dynamic (Benamou–Brenier-type) formulations.
- The approach underpins applications in quantum information, statistics, machine learning, and signal processing, supported by scalable algorithms and matrix geometry.
The Wasserstein distance matrix formulation provides a rigorous framework to quantify the minimal "cost" required to transport one matrix-valued distribution into another, generalizing the scalar optimal transport theory to matrix-valued and positive semidefinite matrix measures. This extension encompasses various regimes, including the 1-Wasserstein (Earth Mover’s), the 2-Wasserstein (Bures or Kantorovich-Bures), and their entropy-regularized analogues, with direct applications in quantum information, statistics, machine learning, and signal processing.
1. Matrix-Valued Probability Measures and Mass Transport
Matrix-valued probability densities and positive semidefinite (PSD) matrix-valued measures form the foundational objects in this framework. A matrix probability density is a map $\rho: \mathbb{R}^m \to \mathcal{H}_+$, where $\mathcal{H}_+$ is the cone of $n \times n$ Hermitian PSD matrices, subject to $\int \operatorname{tr}\rho(x)\,dx = 1$. More generally, one considers PSD-valued Radon measures on $\mathbb{R}^m$ whose total mass $\int \operatorname{tr}\rho(x)\,dx$ need not equal one (Chen et al., 2017). Matrix-valued mass transport, both balanced and unbalanced, extends scalar optimal transport by introducing a “quantum source” term that allows for creation/destruction of matrix-valued “mass.” The unbalanced continuity equation is
$$\rho_0 - \rho_1 + \nabla_x \cdot u - \nabla_L^* v - s = 0,$$
where $u$ is a spatial matrix-valued flux, $v = (v_1, \dots, v_N)$ encodes flux along Hermitian directions $L_1, \dots, L_N$ through $\nabla_L^* v = \sum_k (L_k v_k - v_k L_k)$, and the source term $s$ is penalized via the nuclear norm.
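The non-commutative gradient and its adjoint can be made concrete with a small numerical sketch. It assumes the convention $\nabla_L X = (L_k X - X L_k)_k$ with Hermitian $L_k$ and the trace inner product; the helper names (`rand_hermitian`, `grad_L`, `div_L`, `inner`) are illustrative and do not come from the cited papers.

```python
import numpy as np

def rand_hermitian(n, rng):
    """Random n x n Hermitian matrix (illustrative helper)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

def grad_L(X, Ls):
    """Non-commutative gradient: list of commutators L_k X - X L_k."""
    return [L @ X - X @ L for L in Ls]

def div_L(Ys, Ls):
    """Adjoint of grad_L under <A, B> = tr(A^* B), valid for Hermitian L_k."""
    return sum(L @ Y - Y @ L for L, Y in zip(Ls, Ys))

def inner(As, Bs):
    """Real part of sum_k tr(A_k^* B_k)."""
    return sum(np.trace(A.conj().T @ B) for A, B in zip(As, Bs)).real

rng = np.random.default_rng(0)
n, N = 3, 2
Ls = [rand_hermitian(n, rng) for _ in range(N)]
X = rand_hermitian(n, rng)
Ys = [1j * rand_hermitian(n, rng) for _ in range(N)]  # skew-Hermitian fluxes

lhs = inner(grad_L(X, Ls), Ys)       # <grad_L X, Y>
rhs = inner([X], [div_L(Ys, Ls)])    # <X, div_L Y>
quantum_term = div_L(Ys, Ls)
```

Under these assumptions `lhs` and `rhs` agree, and `quantum_term` is Hermitian and traceless: the flux along the directions $L_k$ redistributes mass among matrix entries without changing the total trace.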
2. Primal and Dual Formulations
2.1 Wasserstein-1: Matrix Formulation
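The primal formulation for the matricial $W_1$ distance is
$$W_1(\rho_0, \rho_1) = \inf_{u, v} \int \left\| \begin{bmatrix} u(x) \\ v(x) \end{bmatrix} \right\|_* dx$$
subject to the matrix-valued flux continuity equation
$$\rho_0 - \rho_1 + \nabla_x \cdot u - \nabla_L^* v = 0.$$
Here $\|\cdot\|_*$ is the nuclear (trace) norm, which keeps the problem convex, and the variables $u$ (spatial flux) and $v$ (flux along the Hermitian directions $L_k$) encode the classical and non-commutative (quantum) transport modes, respectively (Chen et al., 2017).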
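The classical dual (Kantorovich–Rubinstein type) for matrices is
$$W_1(\rho_0, \rho_1) = \sup_{f} \int \operatorname{tr}\left[ f(x)\,(\rho_0(x) - \rho_1(x)) \right] dx$$
subject to the gradient constraint
$$\left\| \begin{bmatrix} \nabla_x f \\ \nabla_L f \end{bmatrix} \right\| \le 1,$$
where $f$ is Hermitian-valued, $\nabla_L f = (L_k f - f L_k)_{k=1}^N$, and $\|\cdot\|$ denotes the operator norm. This operator-norm constraint specifies a non-commutative Lipschitz class.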
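Weak duality between the flux and potential formulations can be checked numerically for plain density matrices, with no spatial variable. The sketch assumes the sign convention $\rho_0 - \rho_1 = \nabla_L^* v$ and the commutator gradient $\nabla_L f = (L_k f - f L_k)_k$; all function names are illustrative. It builds a feasible pair $(f, v)$ and compares the dual value with the nuclear-norm cost.

```python
import numpy as np

def rand_hermitian(n, rng):
    """Random n x n Hermitian matrix (illustrative helper)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

def grad_L(X, Ls):
    return [L @ X - X @ L for L in Ls]

def div_L(Ys, Ls):
    return sum(L @ Y - Y @ L for L, Y in zip(Ls, Ys))

rng = np.random.default_rng(1)
n, N = 3, 2
Ls = [rand_hermitian(n, rng) for _ in range(N)]

# Dual-feasible potential: rescale f so the stacked gradient has operator norm 1.
f = rand_hermitian(n, rng)
f = f / np.linalg.norm(np.vstack(grad_L(f, Ls)), 2)
grad_norm = np.linalg.norm(np.vstack(grad_L(f, Ls)), 2)

# Primal-feasible flux: choose v first, then define the difference it transports.
v = [1j * rand_hermitian(n, rng) for _ in range(N)]
delta = div_L(v, Ls)                       # plays the role of rho_0 - rho_1

dual_value = np.trace(f @ delta).real      # tr[f (rho_0 - rho_1)]
primal_cost = np.linalg.norm(np.vstack(v), 'nuc')
```

Because the operator norm and the nuclear norm are dual to each other, `dual_value` can never exceed `primal_cost`. The optimal $f$ and $v$ close this gap, which the sketch does not attempt to compute.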
2.2 Wasserstein-2: Bures and Matrix-Bures Formulations
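For positive-definite matrices $A$ and $B$, the 2-Wasserstein/Bures metric is defined by
$$W_2^2(A, B) = \operatorname{tr}A + \operatorname{tr}B - 2\operatorname{tr}\left(A^{1/2} B A^{1/2}\right)^{1/2},$$
arising equivalently as the minimal mean-square transport cost between zero-mean Gaussians with covariances $A$ and $B$ (Bhatia et al., 2017, Bhatia et al., 2019, Maunu et al., 2022). The matrix Monge–Kantorovich problem also admits a Benamou–Brenier-type dynamic form:
$$W_2^2(\rho_0, \rho_1) = \inf_{\rho, v} \int_0^1 \operatorname{tr}\Big(\rho \sum_k v_k^* v_k\Big)\, dt$$
subject to a quantum continuity equation
$$\dot\rho = \tfrac{1}{2} \nabla_L^* (\rho v + v \rho), \qquad \rho(0) = \rho_0, \quad \rho(1) = \rho_1,$$
with optimization over time-dependent density trajectories $\rho(t)$ and skew-Hermitian velocities $v(t) = (v_1(t), \dots, v_N(t))$ (Chen et al., 2017, Li et al., 2020).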
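The closed-form Bures distance is straightforward to evaluate. The following is a minimal sketch for real symmetric PSD inputs, using an eigendecomposition for the matrix square root; `psd_sqrt` and `bures_sq` are illustrative names, not taken from the cited works.

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a real symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_sq(A, B):
    """Squared Bures distance: tr A + tr B - 2 tr (A^{1/2} B A^{1/2})^{1/2}."""
    rA = psd_sqrt(A)
    cross = psd_sqrt(rA @ B @ rA)
    val = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    return max(float(val), 0.0)  # clip tiny negative rounding errors

# Commuting example: the formula reduces to sum_i (sqrt(a_i) - sqrt(b_i))^2.
A = np.diag([1.0, 4.0])
B = np.diag([9.0, 16.0])
d2 = bures_sq(A, B)  # (1 - 3)^2 + (2 - 4)^2 = 8, up to rounding
```

For larger or ill-conditioned matrices a library routine such as `scipy.linalg.sqrtm` may be preferable; the eigendecomposition is used here only to keep the sketch dependency-free.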
3. Geometry, Metric Properties, and Commuting Cases
The matrix Wasserstein distances are bona fide metrics: they are nonnegative, symmetric, satisfy the triangle inequality, and vanish if and only if the arguments are equal. In the commuting (simultaneously diagonalizable) case, both $W_1$ and $W_2$ decouple into sums of scalar Wasserstein distances between the eigenvalue densities $\lambda_i^{0}$ and $\lambda_i^{1}$ of $\rho_0$ and $\rho_1$ (Chen et al., 2017, Chen et al., 2017). For the Bures metric between commuting matrices with eigenvalues $a_i$ and $b_i$, for instance, $W_2^2(A, B) = \sum_i (\sqrt{a_i} - \sqrt{b_i})^2$. In general, these matrix-valued distances metrize the weak-* topology and capture convergence of both density and mass.
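The Riemannian geometry induced by $W_2$ is explicitly computable: the metric tensor at $\Sigma$ is given via the Lyapunov operator $\mathcal{L}_\Sigma$, where $X = \mathcal{L}_\Sigma[U]$ solves $\Sigma X + X \Sigma = U$, yielding
$$g_\Sigma(U, V) = \operatorname{tr}\left(\mathcal{L}_\Sigma[U]\, \Sigma\, \mathcal{L}_\Sigma[V]\right) = \tfrac{1}{2} \operatorname{tr}\left(\mathcal{L}_\Sigma[U]\, V\right).$$
The geodesic from $A$ to $B$ is
$$\Sigma(t) = \left((1 - t) I + t T\right) A \left((1 - t) I + t T\right), \qquad T = A^{-1/2} \left(A^{1/2} B A^{1/2}\right)^{1/2} A^{-1/2}, \qquad t \in [0, 1],$$
with constant speed along the path (Malagò et al., 2018).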
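The geodesic and the Lyapunov-based metric tensor can be checked against each other numerically. The sketch below assumes real symmetric positive-definite inputs, the standard displacement-interpolation form $\Sigma(t) = ((1-t)I + tT)\,A\,((1-t)I + tT)$ with $T = A^{-1/2}(A^{1/2} B A^{1/2})^{1/2} A^{-1/2}$, and the metric $g_\Sigma(U, U) = \operatorname{tr}(X \Sigma X)$ where $\Sigma X + X \Sigma = U$. All function names are illustrative.

```python
import numpy as np

def psd_sqrt(A):
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures(A, B):
    rA = psd_sqrt(A)
    val = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA))
    return np.sqrt(max(float(val), 0.0))

def transport_map(A, B):
    """Symmetric map T with T A T = B (A must be positive definite)."""
    rA = psd_sqrt(A)
    rA_inv = np.linalg.inv(rA)
    return rA_inv @ psd_sqrt(rA @ B @ rA) @ rA_inv

def geodesic(A, B, t):
    M = (1.0 - t) * np.eye(A.shape[0]) + t * transport_map(A, B)
    return M @ A @ M

def lyapunov_solve(S, U):
    """Solve S X + X S = U for symmetric positive-definite S."""
    w, V = np.linalg.eigh(S)
    return V @ ((V.T @ U @ V) / (w[:, None] + w[None, :])) @ V.T

rng = np.random.default_rng(0)
n = 3
G1, G2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = G1 @ G1.T + n * np.eye(n)
B = G2 @ G2.T + n * np.eye(n)

d = bures(A, B)
d_st = bures(geodesic(A, B, 0.2), geodesic(A, B, 0.7))  # constant speed: 0.5 * d

# Initial velocity of the geodesic and its squared length in the metric tensor.
T = transport_map(A, B)
U0 = (T - np.eye(n)) @ A + A @ (T - np.eye(n))
X0 = lyapunov_solve(A, U0)
speed_sq = np.trace(X0 @ A @ X0)  # should match d**2
```

The two checks are complementary: `d_st` tests the constant-speed property between interior points, while `speed_sq` tests that the metric tensor evaluated on the initial velocity reproduces the squared distance.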
4. Computational Aspects and Algorithms
The primal "flux" and dual formulations are convex and, in the discrete domain, reduce to linear or conic programs. The "dual of the dual" (flux) formulation allows a significant reduction in the number of variables: for scalar $W_1$ on a graph with $K$ nodes, from $O(K^2)$ (coupling matrix) to $O(dK)$ (for degree-$d$ graphs). The matrix analogue of this reduction is realized by expressing the transport in terms of two flux fields $(u, v)$, which is especially efficient for high-dimensional and sparse data (Chen et al., 2017).
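For the scalar case, the flux formulation can be sketched as a small linear program. This example assumes a path graph with unit edge lengths and uses `scipy.optimize.linprog`; the function name `w1_flux_path` and the splitting of each edge flux into nonnegative parts are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def w1_flux_path(rho0, rho1):
    """Scalar W1 on a path graph via min sum_e |u_e| s.t. rho0 + D u = rho1."""
    rho0 = np.asarray(rho0, dtype=float)
    rho1 = np.asarray(rho1, dtype=float)
    K = len(rho0)
    E = K - 1
    # Incidence matrix: edge e carries flux from node e to node e + 1.
    D = np.zeros((K, E))
    for e in range(E):
        D[e, e] = -1.0
        D[e + 1, e] = 1.0
    # Split u = u_plus - u_minus with both parts >= 0 so that |u| is linear.
    A_eq = np.hstack([D, -D])
    res = linprog(np.ones(2 * E), A_eq=A_eq, b_eq=rho1 - rho0,
                  bounds=(0, None), method="highs")
    return res.fun, res.x[:E] - res.x[E:]

rho0 = [0.5, 0.5, 0.0, 0.0]
rho1 = [0.0, 0.0, 0.5, 0.5]
w1, flux = w1_flux_path(rho0, rho1)  # each half unit of mass moves two edges: W1 = 2

# On a line, W1 also equals the sum of absolute cumulative differences.
closed_form = np.abs(np.cumsum(np.subtract(rho0, rho1)))[:-1].sum()
```

The program has $2(K-1)$ variables instead of the $K^2$ entries of a coupling matrix, which is the reduction described above in its simplest form.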
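For $W_2$ (Bures), the barycenter of positive definite matrices $A_1, \dots, A_m$ under weights $w_1, \dots, w_m$ (with $w_j \ge 0$, $\sum_j w_j = 1$) is the unique solution of
$$\min_{X \succ 0} \sum_{j=1}^m w_j\, W_2^2(X, A_j),$$
which can be efficiently computed by a fixed-point iteration
$$X_{k+1} = X_k^{-1/2} \Big( \sum_{j=1}^m w_j \left(X_k^{1/2} A_j X_k^{1/2}\right)^{1/2} \Big)^2 X_k^{-1/2}$$
(Bhatia et al., 2017, Maunu et al., 2022). The entropic regularization of matrix OT problems introduces strict convexity and differentiability, enabling the use of scalable iterative algorithms such as Sinkhorn scaling (Zhang, 2021, Quang, 2020).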
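The barycenter fixed-point iteration can be sketched in a few lines. The version below assumes real symmetric positive-definite inputs and the update $X \leftarrow X^{-1/2} \big(\sum_j w_j (X^{1/2} A_j X^{1/2})^{1/2}\big)^2 X^{-1/2}$; the initialization at the arithmetic mean, the stopping rule, and the function names are illustrative choices.

```python
import numpy as np

def psd_sqrt(A):
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_barycenter(mats, weights, iters=500, tol=1e-10):
    """Fixed-point iteration for the Bures-Wasserstein barycenter."""
    X = sum(w * A for w, A in zip(weights, mats))  # start at the arithmetic mean
    for _ in range(iters):
        rX = psd_sqrt(X)
        rX_inv = np.linalg.inv(rX)
        S = sum(w * psd_sqrt(rX @ A @ rX) for w, A in zip(weights, mats))
        X_new = rX_inv @ S @ S @ rX_inv
        converged = np.linalg.norm(X_new - X) < tol
        X = X_new
        if converged:
            break
    return X

# Commuting example: the barycenter is (sum_j w_j sqrt(a_j))^2 per eigenvalue.
bar_diag = bures_barycenter([np.diag([1.0, 4.0]), np.diag([9.0, 16.0])], [0.5, 0.5])

# Generic example: check the fixed-point condition sum_j w_j (X^{1/2} A_j X^{1/2})^{1/2} = X.
rng = np.random.default_rng(0)
mats = []
for _ in range(3):
    G = rng.standard_normal((3, 3))
    mats.append(G @ G.T + 3 * np.eye(3))
weights = [0.2, 0.3, 0.5]
X = bures_barycenter(mats, weights)
rX = psd_sqrt(X)
residual = np.linalg.norm(sum(w * psd_sqrt(rX @ A @ rX) for w, A in zip(weights, mats)) - X)
```

In the commuting example the iteration reaches `diag(4, 9)` after a single update, since every factor shares the same eigenbasis.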
5. Extensions: Unbalanced, Weighted, and Entropic Matrix OT
Matrix-valued Wasserstein metrics admit natural unbalanced and weighted generalizations. The weighted Wasserstein–Bures distance on the space of PSD matrix-valued measures is given by a dynamic (Benamou–Brenier) program over a time-dependent measure together with its momentum and reaction fields, subject to a generalized continuity equation incorporating both momentum and reaction (mass change). The action functional is a convex quadratic in the momentum and reaction fields, weighted by a pair of weight matrices corresponding to the spatial and non-spatial modes (Li et al., 2020). The corresponding dual (Kantorovich) characterization enforces a pointwise matrix Hamilton–Jacobi inequality.
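Entropy regularization yields strictly convex, differentiable variants of the Wasserstein distance between Gaussians $\mathcal{N}(m_0, A)$ and $\mathcal{N}(m_1, B)$, defined for a regularization parameter $\epsilon > 0$ by
$$\mathrm{OT}_\epsilon = \|m_0 - m_1\|^2 + \operatorname{tr}A + \operatorname{tr}B - \frac{\epsilon}{2} \operatorname{tr}M_\epsilon + \frac{\epsilon}{2} \log\det\left(I + \tfrac{1}{2} M_\epsilon\right),$$
where $M_\epsilon = -I + \left(I + \frac{16}{\epsilon^2} A^{1/2} B A^{1/2}\right)^{1/2}$, and the barycenter of several Gaussians is characterized by an explicit fixed-point equation (Quang, 2020).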
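The entropic Gaussian formula can be evaluated directly. The sketch below takes as an assumption the closed form $\mathrm{OT}_\epsilon = \|m_0 - m_1\|^2 + \operatorname{tr}A + \operatorname{tr}B - \tfrac{\epsilon}{2}\operatorname{tr}M_\epsilon + \tfrac{\epsilon}{2}\log\det(I + \tfrac{1}{2}M_\epsilon)$ with $M_\epsilon = -I + (I + \tfrac{16}{\epsilon^2} A^{1/2} B A^{1/2})^{1/2}$, for squared Euclidean cost and real symmetric covariances. The function names are illustrative, and the small-$\epsilon$ comparison with the Bures distance serves only as a sanity check.

```python
import numpy as np

def psd_sqrt(A):
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_sq(A, B):
    rA = psd_sqrt(A)
    return float(np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(rA @ B @ rA)))

def entropic_gaussian_ot(m0, A, m1, B, eps):
    """Entropic OT cost between N(m0, A) and N(m1, B) (assumed closed form)."""
    n = A.shape[0]
    I = np.eye(n)
    rA = psd_sqrt(A)
    M = -I + psd_sqrt(I + (16.0 / eps ** 2) * (rA @ B @ rA))
    _, logdet = np.linalg.slogdet(I + 0.5 * M)
    mean_term = float(np.sum((np.asarray(m0) - np.asarray(m1)) ** 2))
    return (mean_term + np.trace(A) + np.trace(B)
            - 0.5 * eps * np.trace(M) + 0.5 * eps * logdet)

rng = np.random.default_rng(0)
G1, G2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A = G1 @ G1.T + 3 * np.eye(3)
B = G2 @ G2.T + 3 * np.eye(3)
m0, m1 = np.zeros(3), np.ones(3)

ot_small = entropic_gaussian_ot(m0, A, m1, B, eps=1e-6)
unregularized = 3.0 + bures_sq(A, B)  # ||m0 - m1||^2 + Bures term
ot_unit = entropic_gaussian_ot(m0, A, m1, B, eps=1.0)
```

As $\epsilon \to 0$ the value approaches the unregularized squared 2-Wasserstein distance between the two Gaussians, which is what `ot_small` and `unregularized` compare.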
6. Practical Applications and Implications
Matrix Wasserstein distances underpin data analysis methodologies in quantum information, imaging, covariance-based statistics, and network theory. In quantum settings, the Bures (Wasserstein-2) metric is especially significant for comparing quantum states. The barycenter (Wasserstein mean) construction provides an intrinsic, structure-preserving notion of averaging for positive definite matrices (Bhatia et al., 2017, Bhatia et al., 2019, Maunu et al., 2022). Scalable algorithms exploiting matrix structure have enabled applications to high-dimensional imaging, network diffusion, and stochastic processes (Li et al., 2020).
Generalizations to tensor data, multi-marginal OT, and tree-structured metrics are also active areas, with the matrix structure enabling the capture of correlations and non-commutative relationships that scalar OT cannot (Takezawa et al., 2021).
7. Open Problems and Future Directions
Current open directions include the proper choice and interpretation of the quantum gradient directions $L_k$, characterizing robustness and geometric curvature in matrix-valued transport on networks, extensions to higher-order costs (e.g., Wasserstein-2 for general matrix-valued measures), and the design of efficient, scalable algorithms for unbalanced and entropic matrix OT in large-scale and infinite-dimensional regimes (Chen et al., 2017, Li et al., 2020, Quang, 2020). Theoretical analysis of convergence, regularity, and geometric structure remains an active research frontier.