Square Wasserstein Distance Matrices
- Square Wasserstein distance matrices are defined as matrices encoding squared optimal transport costs based on the Benamou–Brenier formulation.
- They extend to generalized and matrix-valued forms, allowing comparisons of distributions with unequal mass and quantum states using efficient computational techniques.
- These matrices underpin manifold learning, clustering, imaging, and information geometry by capturing non-Euclidean distances and Riemannian structures.
Square Wasserstein distance matrices are central data structures in optimal transport, encoding the pairwise Wasserstein distances (typically the squared 2-Wasserstein distance $W_2^2$) between a collection of distributions or structured measures. These matrices play foundational roles in clustering, manifold learning, image analysis, quantum information, kernel methods, and matrix analysis. They bridge dynamic and static optimal transport, encode non-Euclidean geometry, and enable computational strategies for large-scale and high-dimensional data problems.
1. The Classical Square Wasserstein Distance and Its Matrix Structure
The quadratic Wasserstein distance $W_2$ arises from the Monge–Kantorovich framework and is defined for measures $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$ with equal mass. Its square admits the dynamic Benamou–Brenier formulation
$$W_2^2(\mu,\nu) \;=\; \inf_{(\rho_t, v_t)} \int_0^1 \int_{\mathbb{R}^d} |v_t(x)|^2 \, d\rho_t(x)\, dt,$$
subject to the continuity equation
$$\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \qquad \rho_0 = \mu, \quad \rho_1 = \nu.$$
Computing all pairwise values $W_2^2(\mu_i, \mu_j)$ over a set of measures $\{\mu_i\}$ yields a symmetric matrix with zero diagonal and nonnegative off-diagonal entries encoding the minimal transport costs. These matrices serve as distance or kernel matrices for downstream tasks such as clustering, embedding, and statistical analysis, reflecting the underlying geometry and “energy” landscape among the input measures.
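As a concrete illustration, the following minimal sketch assembles such a matrix for one-dimensional empirical measures with equal numbers of uniformly weighted samples, where $W_2^2$ reduces to the mean squared difference of sorted samples; the function names and the NumPy-based setup are illustrative rather than canonical.

```python
import numpy as np

def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between two 1-D empirical measures with the
    same number of uniformly weighted samples: the optimal coupling matches
    sorted samples (quantile coupling)."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    return np.mean((x - y) ** 2)

def w2_squared_matrix(samples):
    """Symmetric matrix of pairwise squared W2 distances with zero diagonal."""
    n = len(samples)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = w2_squared_1d(samples[i], samples[j])
    return D

# Example: three empirical measures, each given by 500 samples.
rng = np.random.default_rng(0)
measures = [rng.normal(0, 1, 500), rng.normal(2, 1, 500), rng.normal(0, 3, 500)]
print(w2_squared_matrix(measures))
```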
2. Extensions: Generalized and Matrix-valued Wasserstein Matrices
The limitation of $W_2$ to equal-mass measures is addressed by introducing generalized Wasserstein distances $W_p^{a,b}$, which combine the standard transport cost with total variation penalties:
$$W_p^{a,b}(\mu,\nu) \;=\; \inf_{\tilde\mu,\tilde\nu} \; a\bigl(|\mu - \tilde\mu|(\mathbb{R}^d) + |\nu - \tilde\nu|(\mathbb{R}^d)\bigr) + b\, W_p(\tilde\mu,\tilde\nu),$$
where the infimum runs over pairs of measures $\tilde\mu, \tilde\nu$ of equal mass. For the quadratic case, this formulation admits a generalized Benamou–Brenier formula, minimizing a functional that penalizes both kinetic energy and source-driven mass creation/removal,
$$\inf_{(\mu_t, v_t, h_t)} \int_0^1 \Bigl( \int_{\mathbb{R}^d} |v_t|^2 \, d\mu_t + a\, |h_t|(\mathbb{R}^d) \Bigr)\, dt,$$
with the modified evolution
$$\partial_t \mu_t + \nabla \cdot (v_t \mu_t) = h_t, \qquad \mu_0 = \mu, \quad \mu_1 = \nu.$$
This enables the construction of “unbalanced” square Wasserstein matrices (i.e., matrices that can compare measures of possibly unequal total mass), which arise in systems featuring sources and sinks, dissipative transport, or intensity-varying data (Piccoli et al., 2013).
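As a rough illustration of the static side of this construction, the sketch below solves a Piccoli–Rossi-type unbalanced problem for the linear ($p=1$) case on a discrete grid, where transporting mass costs $b$ per unit distance and creating or removing mass costs $a$ per unit; the grid, cost matrix, and parameter values are illustrative, and the quadratic case requires a different (non-linear-programming) treatment.

```python
import numpy as np
from scipy.optimize import linprog

def generalized_w1(mu, nu, C, a=1.0, b=1.0):
    """Discrete generalized (unbalanced) W1 of Piccoli-Rossi type: transport part
    of the mass at cost b*C and pay a per unit of mass created or removed.
    Variables are the entries of a partial plan pi with row sums <= mu and
    column sums <= nu."""
    n, m = len(mu), len(nu)
    # Objective: b*<C, pi> - 2a*sum(pi); the constant a*(mass(mu)+mass(nu)) is added back below.
    c = (b * C - 2.0 * a).ravel()
    # Inequality constraints: pi @ 1 <= mu and pi.T @ 1 <= nu (row-major flattening).
    A = np.zeros((n + m, n * m))
    for i in range(n):
        A[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A[n + j, j::m] = 1.0
    res = linprog(c, A_ub=A, b_ub=np.concatenate([mu, nu]),
                  bounds=(0, None), method="highs")
    return res.fun + a * (mu.sum() + nu.sum())

# Illustrative example on a 1-D grid with unequal total masses.
x = np.linspace(0.0, 1.0, 20)
C = np.abs(x[:, None] - x[None, :])                            # ground cost |x - y|
mu = np.exp(-((x - 0.3) ** 2) / 0.01); mu *= 1.0 / mu.sum()
nu = np.exp(-((x - 0.7) ** 2) / 0.01); nu *= 1.5 / nu.sum()    # 50% more mass
print(generalized_w1(mu, nu, C))
```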
3. Square Wasserstein Matrices in Matrix and Operator Domains
For noncommutative analogues, densities are replaced by density matrices (positive definite or positive semidefinite matrices, e.g., in quantum information). The square Wasserstein distance between density matrices $\rho_0, \rho_1$ is defined by
$$W_2^2(\rho_0, \rho_1) \;=\; \inf_{(\rho_t, v_t)} \int_0^1 \operatorname{tr}\!\bigl(\rho_t\, v_t^{\dagger} v_t\bigr)\, dt,$$
where $\rho_t$ solves a suitable matrix continuity equation interpolating between $\rho_0$ and $\rho_1$, and $v_t$ is an operator-valued velocity. The minimization admits a convex reformulation and strong duality, leading to mathematically well-posed noncommutative generalizations (Chen et al., 2017).
Analogous generalizations exist for the Wasserstein-1 metric on matrices and measures valued in matrix spaces, with associated dual and “dual of the dual” (flux) characterizations that ensure computational amenability and theoretical rigor (Chen et al., 2017).
These constructions allow the development of square Wasserstein distance matrices for sets of density matrices, spectral data, or even matrix-valued images.
4. Geometric and Riemannian Structure in Wasserstein Matrices
The quadratic Wasserstein distance endows natural Riemannian geometry on various spaces. For multivariate Gaussian measures $\mathcal{N}(m_1,\Sigma_1)$ and $\mathcal{N}(m_2,\Sigma_2)$ with covariance matrices $\Sigma_1, \Sigma_2$, the $W_2$-distance specializes to the closed form
$$W_2^2\bigl(\mathcal{N}(m_1,\Sigma_1), \mathcal{N}(m_2,\Sigma_2)\bigr) \;=\; \|m_1 - m_2\|^2 + \operatorname{tr}\!\Bigl(\Sigma_1 + \Sigma_2 - 2\bigl(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\bigr)^{1/2}\Bigr).$$
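This closed form makes the corresponding distance matrix inexpensive to assemble; the following minimal sketch, assuming zero-mean Gaussians with SPD covariance matrices and using scipy.linalg.sqrtm, is illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_sq(S1, S2):
    """Squared 2-Wasserstein (Bures-Wasserstein) distance between zero-mean
    Gaussians with SPD covariances S1, S2."""
    r1 = np.real(sqrtm(S1))
    cross = np.real(sqrtm(r1 @ S2 @ r1))
    return np.trace(S1 + S2 - 2.0 * cross)

def bw_distance_matrix(covs):
    """Pairwise squared Bures-Wasserstein distances among a list of covariances."""
    n = len(covs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = bures_wasserstein_sq(covs[i], covs[j])
    return D

covs = [np.eye(3), 2.0 * np.eye(3), np.diag([1.0, 2.0, 3.0])]
print(bw_distance_matrix(covs))
```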
The corresponding space of symmetric positive-definite matrices, equipped with this metric, forms a Riemannian manifold with explicit geodesics, exponential and logarithm maps, parallel transport, explicit curvature tensor formulas, and controllable curvature (governed, for instance, by the minimal eigenvalue) (Malagò et al., 2018, Luo et al., 2020).
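A minimal sketch of the explicit geodesic in this geometry, expressed through the optimal linear transport map between the associated centered Gaussians (the helper name bw_geodesic is illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def bw_geodesic(S0, S1, t):
    """Point at time t on the Bures-Wasserstein geodesic from S0 to S1:
    Sigma_t = ((1-t) I + t T) S0 ((1-t) I + t T), where T is the optimal
    transport map between the centered Gaussians N(0, S0) and N(0, S1)."""
    r0 = np.real(sqrtm(S0))
    T = np.real(inv(r0) @ sqrtm(r0 @ S1 @ r0) @ inv(r0))
    A = (1.0 - t) * np.eye(S0.shape[0]) + t * T
    return A @ S0 @ A

S0, S1 = np.diag([1.0, 4.0]), np.diag([4.0, 1.0])
print(bw_geodesic(S0, S1, 0.5))   # midpoint of the geodesic
```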
The conic and geodesic spaces induced by matrix-valued Wasserstein-Bures metrics, as developed for matrix-valued Radon measures, have complete geodesic structures, well-defined tangent spaces, and admit Riemannian interpretations important for imaging and quantum information (Li et al., 2020, Thanwerdas et al., 2022).
5. Efficient Computation and Approximation of Distance Matrices
Computation of square Wasserstein matrices is challenging due to high-dimensional optimal transport. Key algorithmic strategies include:
- Entropic Regularization and Sinkhorn Algorithm: Adding an entropic penalty to the transport problem allows scalable computation of regularized Wasserstein distances via Sinkhorn iterations, which is critical for discriminant analysis tasks and for matrix construction in high dimensions (Flamary et al., 2016); a minimal Sinkhorn sketch follows this list.
- Slicing and Projection Methods: Approximations such as the sliced Wasserstein, mixture sliced Wasserstein (MSW), and double-sliced mixture Wasserstein (DSMW) distances project high-dimensional problems onto random lines, where closed-form solutions are leveraged. These approximations are rigorously characterized as strongly or weakly equivalent to the exact distances, and preserve topological properties for the purpose of building robust square distance matrices among Gaussian mixtures (Piening et al., 11 Apr 2025).
- Low-Rank Completion and Nyström Methods: Since Wasserstein distance matrices often have low-rank structure linked to underlying geometric embedding, sampling a small subset of entries or columns (upper triangle samples, Nyström columns) suffices to approximate the entire matrix efficiently. Stability guarantees for manifold learning embeddings constructed from these approximations are established, with significant practical impact on large-scale applications (Rana et al., 23 Sep 2025).
- Fast PDE-based Linearization: For measures that are close, linearizing $W_2$ yields a weighted negative-order Sobolev norm, computable via fast elliptic PDE solvers; this enables efficient construction of approximate Wasserstein matrices for large grids or images (Greengard et al., 2022).
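As a concrete illustration of the first strategy above, the following minimal Sinkhorn sketch computes the entropic-regularized transport cost between two discrete histograms; the regularization parameter, iteration count, and example data are illustrative, and the returned value approximates $W_2^2$ only for small regularization.

```python
import numpy as np

def sinkhorn_cost(mu, nu, C, eps=0.05, n_iter=500):
    """Entropic-regularized optimal transport via Sinkhorn iterations.
    mu, nu: histograms (nonnegative, equal total mass); C: ground cost matrix.
    Returns <C, pi> for the regularized optimal plan pi = diag(u) K diag(v)."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)               # alternate marginal scalings
        u = mu / (K @ v)
    pi = u[:, None] * K * v[None, :]
    return np.sum(pi * C)

# Two histograms on a 1-D grid with squared-distance ground cost.
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
mu = np.exp(-((x - 0.2) ** 2) / 0.005); mu /= mu.sum()
nu = np.exp(-((x - 0.8) ** 2) / 0.005); nu /= nu.sum()
print(sinkhorn_cost(mu, nu, C))          # approximates W2^2 for small eps
```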
Efficient computation makes it feasible to deploy square Wasserstein matrices in manifold learning, clustering, and other computationally intensive data analysis tasks.
6. Applications Across Domains
Square Wasserstein distance matrices are exploited in a wide array of domains:
- Discriminant and Kernel Methods: Wasserstein discriminant analysis (WDA) employs regularized Wasserstein distances in the construction of scatter matrices for supervised dimensionality reduction, leveraging the ability of the Wasserstein metric to capture global and local data geometry (Flamary et al., 2016). Wasserstein exponential kernels and Wasserstein feature maps utilize $W_2^2$ in the construction of positive-definite kernels for classification tasks, outperforming traditional Euclidean-based kernels in structured data settings (Plaen et al., 2020); a short kernel-construction sketch follows this list.
- Manifold Learning and Visualization: Wasserstein-based t-SNE and related manifold learning techniques rely on square Wasserstein distance matrices to capture both central tendency and spread (mean, covariance, higher moments) among hierarchical or distributional data units, yielding more informative low-dimensional embeddings (Bachmann et al., 2022).
- Random Matrix Theory and Quantum Information: In spectral analysis of random matrices drawn from classical compact groups, the square Wasserstein distance between empirical spectral measures and the uniform measure is linked precisely to Fourier coefficients and discrepancies in eigenvalue spacings, and connects to properties of characteristic polynomials and moments (Borda, 2023). In quantum settings, Wasserstein distances generalize to comparisons among quantum states, linking to quantum channel and control problems (Chen et al., 2017).
- Imaging, Clustering, and Perceptual Metrics: Image comparison, clustering and structural analysis, and the evaluation of differences between distributions in deep learning pipelines leverage square Wasserstein matrices for both direct and kernelized comparison, with applications ranging from medical imaging to generative modeling (Oh et al., 2019, Piening et al., 11 Apr 2025).
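To illustrate how such matrices feed kernel methods, the following sketch builds an exponential kernel of the common form $K_{ij} = \exp\bigl(-W_2^2(\mu_i,\mu_j)/(2\sigma^2)\bigr)$ from a precomputed squared-distance matrix; since such kernels need not be exactly positive semidefinite for general ground spaces, a simple eigenvalue-clipping step is included, and the bandwidth and stand-in data are illustrative.

```python
import numpy as np

def wasserstein_exponential_kernel(D_sq, sigma=1.0):
    """Exponential kernel built from a matrix of squared Wasserstein distances.
    Such kernels are not guaranteed to be positive semidefinite for general
    ground spaces, so negative eigenvalues are clipped as a simple repair."""
    K = np.exp(-D_sq / (2.0 * sigma ** 2))
    w, V = np.linalg.eigh((K + K.T) / 2.0)    # symmetrize, then eigendecompose
    return (V * np.clip(w, 0.0, None)) @ V.T  # project onto the PSD cone

# Illustrative use with a stand-in squared-distance matrix.
rng = np.random.default_rng(0)
P = rng.random((4, 2))
D_sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # stand-in for W2^2 entries
print(wasserstein_exponential_kernel(D_sq, sigma=0.5))
```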
7. Theoretical Properties and Equivalence Results
Construction and analysis of square Wasserstein distance matrices have illuminated several deeper structural themes:
- Equivalence and Approximation: Approximate versions (e.g., through slicing, kernelization, regularization) are shown to be weakly or strongly equivalent to exact Wasserstein matrices under controlled conditions; this ensures that substituting lower-cost computations does not violate the geometry or qualitative behavior needed for downstream analysis (Piening et al., 11 Apr 2025).
- Geodesic Structure and Interpolation: In both full-rank and fixed-rank settings, the set of minimizing geodesics under the Bures–Wasserstein metric is characterized explicitly, and uniqueness conditions are fully described, with direct implications for statistical estimation and interpolation in structured matrix spaces (Thanwerdas et al., 2022).
- Topology and Conic Structure: The metric geometry induced by Wasserstein and its generalizations ensures that convergence and continuity properties reflect in the induced topology, for example in the weighted Wasserstein–Bures setting, where the space is a complete geodesic cone (Li et al., 2020).
- Robustness to Noise and Sampling: The low-rank and geometric properties confer robustness in the context of missing data, random sampling, and matrix completion, as shown in matrix recovery and manifold learning settings (Rana et al., 23 Sep 2025).
8. Computational and Statistical Implications
Advances in constructing and estimating square Wasserstein distance matrices have led to improved methods for high-dimensional covariance estimation, clustering, and barycenter computation. Random-matrix-improved estimators outperform classical plug-in methods, particularly when the dimension is comparable to the number of samples (Tiomoko et al., 2019). Approximations such as the Quasi Manhattan Wasserstein Distance (QMWD) balance accuracy and efficiency for comparing large-scale matrices (such as images) while avoiding the quadratic scaling of the classical Manhattan Wasserstein Distance (Lim, 2023).
Summary Table: Classes of Square Wasserstein Distance Matrices
| Type | Key Properties | Primary Application Domains |
|---|---|---|
| Classical scalar $W_2^2$ | Infimum of kinetic action, Benamou–Brenier | Imaging, manifold learning, clustering |
| Generalized/unbalanced | Also penalizes mass creation/removal | Source/sink modeling, unbalanced data |
| Matrix-valued (density/states) | Noncommutative continuity and duality | Quantum information, matrix-valued data |
| SPD/Bures–Wasserstein | Riemannian structure, explicit geodesics | Covariance clustering, information geometry |
| Sliced/approximate variants | Random projection, strong/weak equivalence | High-dimensional, large-scale applications |
| Kernel-based and regularized | Nonlinear structure, positive-definite kernels | Supervised learning, similarity search |
| Low-rank/completed | Nyström/sample-efficient estimation | Scalable manifold learning, classification |
Square Wasserstein distance matrices are thus vital to both theoretical and applied research: they encode non-Euclidean structure among empirical distributions, enable scalable algorithms across scientific fields, and connect fluid mechanics, geometry, statistical mechanics, signal processing, and machine learning in a unified mathematical framework.