
Wasserstein Formulation & Optimal Transport

Updated 4 May 2026
  • The Wasserstein formulation is an optimal transport framework that minimizes the cost of transporting one probability measure onto another, typically with the squared Euclidean distance as the ground cost.
  • It extends to multi-marginal and dual settings, where strategies such as genetic column generation exploit sparsity to make high-dimensional problems tractable.
  • Applications include computing barycenters, mesh-free interpolation, and Wasserstein splines, advancing techniques in statistics, machine learning, and computational mathematics.

The Wasserstein formulation is a canonical framework in optimal transport theory that characterizes the minimal cost of transporting one probability measure to another with respect to a specified ground cost, most commonly the squared Euclidean distance. This formulation encompasses both classical two-marginal problems and generalizations to the multi-marginal setting, where optimal transport plans couple more than two marginals. Modern developments include dual formulations, algorithms that make high-dimensional computation tractable, and applications to Wasserstein barycenters and spline interpolation in the metric space of probability measures. The Wasserstein framework supports both geometric and variational perspectives, which are foundational for recent advances in statistics, machine learning, computational mathematics, and applied sciences (Friesecke et al., 2022).

1. Multi-Marginal Optimal Transport: Primal and Dual Wasserstein Formulations

Let $X\subset\mathbb{R}^d$, and let $\mu_k\in\mathcal{P}(X)$, $k=1,\dots,N$, be given probability measures. The multi-marginal Kantorovich problem is

$$\min_{\gamma\in\Pi(\mu_1,\dots,\mu_N)} \int_{X^N} c(x_1,\ldots,x_N)\,\mathrm{d}\gamma(x_1,\ldots,x_N),$$

where $\Pi(\mu_1,\dots,\mu_N)$ denotes the set of all couplings with prescribed marginals,

$$\Pi(\mu_1,\dots,\mu_N)=\left\{\gamma\in\mathcal{P}(X^N)\;\middle|\;(\pi_k)_\#\gamma=\mu_k\ \ \forall\,k=1,\dots,N\right\},$$

and $c:X^N\to\mathbb{R}$ is a given cost function. For quadratic cost (the Wasserstein-2 setting), $c(x_1,\ldots,x_N)$ is typically a sum of squared distances or a symmetric quadratic cost.

After discretization, each marginal is represented on a finite grid $X_k=\{a^{(k)}_1,\dots,a^{(k)}_{\ell_k}\}$, and a plan $\gamma$ is an $\ell_1\times\cdots\times\ell_N$ nonnegative tensor, interpreted as a probability mass function. The discrete problem reads

$$\min_{\gamma\in\mathbb{R}_{\ge 0}^{\ell_1\times\cdots\times\ell_N}} \langle C,\gamma\rangle \quad\text{subject to}\quad M_k(\gamma)=\mu_k,\ \ k=1,\dots,N,\qquad C_{i_1\dots i_N}=c\bigl(a^{(1)}_{i_1},\dots,a^{(N)}_{i_N}\bigr),$$

where $M_k$ extracts the $k$-th marginal, $M_k(\gamma)_i=\sum_{(i_1,\dots,i_N):\,i_k=i}\gamma_{i_1\dots i_N}$, and $\langle\cdot,\cdot\rangle$ is the Frobenius inner product. The feasible set is the so-called Kantorovich polytope.
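As an illustration of the discrete notation, the following minimal pure-Python sketch stores a plan sparsely as a dictionary that maps index tuples $(i_1,\dots,i_N)$ to masses. It is not code from the cited work. It computes the $k$-th marginal by summing over all other indices, and it evaluates the Frobenius product with the cost tensor $C_{i_1\dots i_N}=c(a^{(1)}_{i_1},\dots,a^{(N)}_{i_N})$ without storing $C$ densely. The sketch assumes one-dimensional grids and uses the sum of pairwise squared distances as the cost. The names `pairwise_sq_cost`, `marginal`, and `transport_cost` are hypothetical.

```python
from itertools import combinations


def pairwise_sq_cost(points):
    """c(x_1, ..., x_N) = sum_{i<j} (x_i - x_j)^2 for scalar points."""
    return sum((a - b) ** 2 for a, b in combinations(points, 2))


def marginal(plan, k, size):
    """k-th marginal of a sparse plan: sum the masses over all other indices."""
    out = [0.0] * size
    for idx, mass in plan.items():
        out[idx[k]] += mass
    return out


def transport_cost(plan, grids, cost):
    """Frobenius product <C, gamma> with C[i_1..i_N] = cost(a^(1)_{i_1}, ..., a^(N)_{i_N}).

    Only the support of the sparse plan is visited, so C is never stored densely.
    """
    return sum(mass * cost([grids[k][i] for k, i in enumerate(idx)])
               for idx, mass in plan.items())


grids = [[0.0, 1.0] for _ in range(3)]      # three marginals on the grid {0, 1}
plan = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}     # "diagonal" plan: all points coincide
print([marginal(plan, k, 2) for k in range(3)])       # [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
print(transport_cost(plan, grids, pairwise_sq_cost))  # 0.0
```

Storing only the support is what makes sparse plans cheap. The memory used grows with the number of nonzeros rather than with $\prod_k\ell_k$.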

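The dual formulation introduces potentials $u_k\in\mathbb{R}^{\ell_k}$, $k=1,\dots,N$, yielding

$$\max_{u_1,\dots,u_N}\ \sum_{k=1}^N\langle u_k,\mu_k\rangle \quad\text{subject to}\quad \sum_{k=1}^N u_k(i_k)\le C_{i_1\dots i_N}\ \ \text{for all }(i_1,\dots,i_N),$$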

with equality of primal and dual optima by linear programming duality.
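For potentials $u_k\in\mathbb{R}^{\ell_k}$, the dual objective is $\sum_k\langle u_k,\mu_k\rangle$, and the dual constraints read $\sum_k u_k(i_k)\le C_{i_1\dots i_N}$ for every configuration. The sketch below checks both quantities by brute force on a tiny instance. It enumerates all $\prod_k\ell_k$ configurations, which is exactly the enumeration that becomes infeasible in high dimensions. It assumes one-dimensional grids, and the helper names are hypothetical.

```python
from itertools import product


def dual_objective(potentials, marginals):
    """Dual objective sum_k <u_k, mu_k>."""
    return sum(sum(u * m for u, m in zip(u_k, mu_k))
               for u_k, mu_k in zip(potentials, marginals))


def max_dual_violation(potentials, grids, cost):
    """Largest value of sum_k u_k(i_k) - C[i_1, ..., i_N] over all configurations.

    A result <= 0 means every dual constraint holds. The loop enumerates all
    prod_k l_k configurations, so it is only usable for tiny instances.
    """
    worst = float("-inf")
    for idx in product(*(range(len(g)) for g in grids)):
        lhs = sum(u_k[i] for u_k, i in zip(potentials, idx))
        worst = max(worst, lhs - cost([g[i] for g, i in zip(grids, idx)]))
    return worst


def sq_dist(points):
    """Two-marginal quadratic cost |x_1 - x_2|^2 for scalar points."""
    return (points[0] - points[1]) ** 2


grids = [[0.0, 1.0], [0.0, 1.0]]
mus = [[0.5, 0.5], [0.5, 0.5]]
u = [[0.0, 1.0], [0.0, -1.0]]
print(max_dual_violation(u, grids, sq_dist))  # 0.0: the potentials are dual feasible
print(dual_objective(u, mus))                 # 0.0: equals the identity coupling's cost
```

In this instance the identity coupling has cost 0, so the feasible potentials above attain the primal optimum. Any dual-infeasible choice, such as $u_1=(1,1)$, $u_2=(0,0)$, shows a positive violation.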

2. The GenCol Algorithm for High-Dimensional Multi-Marginal Problems

The GenCol (genetic column generation) algorithm is designed to solve the large-scale multi-marginal optimal transport (MMOT) linear programs that arise in mesh-based Wasserstein formulations. It operates on a restricted subset of the configuration space and iteratively adds configurations (columns) whose dual constraints are violated, until the restricted solution is optimal for the full problem. The main steps are:

  1. Column restriction: Start with a small subset $\Omega\subset X_1\times\cdots\times X_N$ of configurations.
  2. Reduced LP solve: Solve the primal and dual LPs restricted to $\Omega$.
  3. Feasibility check: If the dual solution of the reduced LP satisfies every dual constraint of the full problem, terminate; the current solution is optimal.
  4. Genetic update: Otherwise, select a "parent" $x\in\Omega$ with $\gamma_x>0$, mutate one coordinate to create an "offspring" $y\notin\Omega$, and admit $y$ into $\Omega$ if it violates the dual constraint, i.e., if $\sum_{k=1}^N u_k(y_k)>C_y$.
  5. Pruning: If $|\Omega|$ exceeds a chosen inflation threshold, prune inactive columns (those carrying zero mass in the current plan).
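The genetic update and pruning steps can be sketched in a few lines, assuming that a reduced LP solve has already produced a plan on the active set $\Omega$ and dual potentials $u_k$. The LP solve itself is omitted and would be delegated to an LP solver. The sketch uses a uniformly random parent, coordinate, and new grid index, and it prunes by keeping only columns with positive mass. These rules and all function names are illustrative assumptions, not the exact choices made in GenCol.

```python
import random


def propose_offspring(plan, active, sizes, rng):
    """Pick a parent in the active set with positive mass and mutate one coordinate.

    plan maps index tuples to masses (the reduced primal solution); sizes[k] = l_k.
    """
    parents = [idx for idx in active if plan.get(idx, 0.0) > 0.0]
    parent = rng.choice(parents)
    k = rng.randrange(len(sizes))          # which marginal's index to mutate
    child = list(parent)
    child[k] = rng.randrange(sizes[k])     # uniformly drawn new grid index
    return tuple(child)


def genetic_update(plan, active, potentials, cost_of, sizes, rng, tol=1e-12):
    """Admit the offspring iff it is new and violates its dual constraint.

    cost_of(idx) returns C[idx]; potentials[k][i] is u_k(i).
    Returns True if the active set grew.
    """
    child = propose_offspring(plan, active, sizes, rng)
    if child in active:
        return False
    reduced_cost = cost_of(child) - sum(u[i] for u, i in zip(potentials, child))
    if reduced_cost < -tol:                # i.e. sum_k u_k(i_k) > C[child]
        active.add(child)
        return True
    return False


def prune(plan, active, threshold):
    """If |Omega| exceeds the threshold, drop columns that carry no mass."""
    if len(active) > threshold:
        active.intersection_update(idx for idx, m in plan.items() if m > 0.0)
```

In a complete solver, these calls would alternate with re-solving the reduced LP on $\Omega$ and updating the plan and potentials. Because an offspring differs from its parent in a single coordinate, proposals stay close to the current support.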

Key properties include:

  • The primal LP on $\Omega$ has $|\Omega|$ variables and $\sum_{k=1}^N\ell_k$ marginal constraints.
  • By sparsity, at most $\sum_{k=1}^N\ell_k-N+1$ columns are needed for an exact solution.
  • Because each offspring differs from an active parent in a single coordinate, the search stays close to the current support, while pruning keeps $\Omega$, and hence each reduced LP, small.

The GenCol algorithm empirically achieves exponential convergence and enables MMOT problems with …–… variables to be solved on standard hardware. Classical approaches such as Sinkhorn iterations and iterative Bregman projections operate on the full dense coupling, so memory constraints make them infeasible at this scale (Friesecke et al., 2022).
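The gap between the full LP and a sparse optimal plan is easy to quantify. The short computation below compares three quantities: the number of entries in the full coupling tensor, $\prod_k\ell_k$; the sparsity bound $\sum_k\ell_k-N+1$ on the number of nonzeros of an optimal vertex; and the memory a dense double-precision tensor would need. The choice of ten marginals on grids of 100 points is an arbitrary illustration, not a setting taken from the cited experiments.

```python
from math import prod


def lp_sizes(grid_sizes):
    """Return (#variables of the full LP, #marginal constraints, sparsity bound)."""
    n_vars = prod(grid_sizes)
    n_cons = sum(grid_sizes)
    n_sparse = n_cons - len(grid_sizes) + 1
    return n_vars, n_cons, n_sparse


# Illustrative choice: N = 10 marginals, each on a grid of 100 points.
n_vars, n_cons, n_sparse = lp_sizes([100] * 10)
print(f"full tensor entries: {n_vars:.3e}")                  # 1.000e+20
print(f"dense float64 storage: {8 * n_vars / 1e18:.0f} EB")  # 800 EB
print(f"marginal constraints: {n_cons}, sparsity bound: {n_sparse}")  # 1000, 991
```

For two marginals, the bound reduces to the familiar $\ell_1+\ell_2-1$ nonzeros of a basic feasible solution of the classical transportation problem.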

3. Wasserstein Barycenters and Mesh-Free Formulations

The Wasserstein barycenter is the probability measure that minimizes a weighted sum of squared Wasserstein-2 distances to a given family of marginals. The standard barycenter problem is

$$\nu^*\in\operatorname*{argmin}_{\nu\in\mathcal{P}(X)}\ \sum_{k=1}^N\lambda_k\,W_2^2(\mu_k,\nu),$$

with $\lambda_k\ge 0$, $\sum_{k=1}^N\lambda_k=1$. In the mesh-free multi-marginal formulation, the cost is

$$c(x_1,\dots,x_N)=\sum_{k=1}^N\lambda_k\,\bigl|x_k-T(x_1,\dots,x_N)\bigr|^2,$$

with $T(x_1,\dots,x_N)=\sum_{k=1}^N\lambda_k x_k$. The barycenter measure is then the pushforward $\nu^*=T_\#\gamma^*$, where $\gamma^*$ is the optimal MMOT plan. Because $T$ takes weighted means of grid points, the barycenter can be supported on a set much finer than the input grids; hence, "mesh-free." For example, with equal weights $\lambda_k=1/N$ and a common uniform grid of spacing $h$, the support lies on a grid of spacing $h/N$.

This formulation avoids a limitation of grid-based approaches, which confine the barycenter to a prescribed grid. Instead, it places the barycenter's mass at the weighted means $T(x_1,\dots,x_N)$ of the configurations in the support of the multi-marginal coupling (Friesecke et al., 2022).
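Computing the pushforward of a plan under the weighted-mean map $T(x_1,\dots,x_N)=\sum_k\lambda_k x_k$ only requires the plan's support. The sketch below assumes a sparse plan stored as a dictionary from index tuples to masses, scalar grid points, and exact rational weights via `fractions.Fraction`, so that the finer support is visible without rounding. The name `barycenter_pushforward` is hypothetical. With two marginals on the integer grid and weights $(1/2,1/2)$, the mass lands on half-integers, i.e., on a grid twice as fine.

```python
from collections import defaultdict
from fractions import Fraction


def barycenter_pushforward(plan, grids, weights):
    """Return T_# gamma as {support point: mass}, with T(x) = sum_k lambda_k x_k.

    plan maps index tuples (i_1, ..., i_N) to masses; grids[k][i] is a scalar point.
    """
    nu = defaultdict(Fraction)                 # missing keys start at Fraction(0)
    for idx, mass in plan.items():
        point = sum(w * grids[k][i] for k, (w, i) in enumerate(zip(weights, idx)))
        nu[point] += mass                      # configurations with equal mean merge
    return dict(nu)


half = Fraction(1, 2)
grids = [[0, 1, 2], [0, 1, 2]]                 # both marginals live on the integer grid
# Monotone coupling of mu_1 = (1/2, 1/2, 0) and mu_2 = (0, 1/2, 1/2); for the
# quadratic cost in one dimension the monotone coupling is optimal.
plan = {(0, 1): half, (1, 2): half}
print(barycenter_pushforward(plan, grids, [half, half]))
# {Fraction(1, 2): Fraction(1, 2), Fraction(3, 2): Fraction(1, 2)}
```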

4. Wasserstein Splines and Geodesic Interpolation

Wasserstein splines generalize classical spline interpolation to the space of probability measures endowed with the Wasserstein metric. Given marginal constraints $\mu_1,\dots,\mu_N$ at discrete times $t_1<t_2<\cdots<t_N$, the MMOT problem seeks a joint law $\gamma$ on $X^N$ minimizing

$$\int_{X^N} c(x_1,\dots,x_N)\,\mathrm{d}\gamma(x_1,\dots,x_N)$$

subject to $(\pi_k)_\#\gamma=\mu_k$ for all $k=1,\dots,N$. The cost $c$ is defined via

$$c(x_1,\dots,x_N)=\min\left\{\int_{t_1}^{t_N}\bigl|\ddot{s}(t)\bigr|^2\,\mathrm{d}t \;\middle|\; s:[t_1,t_N]\to\mathbb{R}^d,\ s(t_k)=x_k\ \ \forall\,k\right\},$$

the cubic-spline energy associated with the knot sequence. The resulting interpolants are $\mu(t)=(S_t)_\#\gamma^*$, where $\gamma^*$ is the optimal plan and $S_t(x_1,\dots,x_N)$ evaluates, at an intermediate time $t\in[t_1,t_N]$, the cubic spline through the points $(t_k,x_k)$.
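For a single configuration of scalar points, the spline cost and the evaluation map can be computed from the natural cubic spline through $(t_k,x_k)$. Among twice-differentiable interpolants, this spline minimizes $\int|\ddot s|^2\,\mathrm{d}t$. Its second derivatives $m_k=s''(t_k)$ solve a tridiagonal system with zero end values. Because $s''$ is linear on each interval, the energy equals $\sum_k\frac{h_k}{3}\bigl(m_k^2+m_km_{k+1}+m_{k+1}^2\bigr)$ with $h_k=t_{k+1}-t_k$. The sketch below assumes one-dimensional points; in $\mathbb{R}^d$, the energy would be summed over coordinates. The function names are hypothetical. The sketch evaluates the cost of one configuration, not the full MMOT problem.

```python
from bisect import bisect_right


def natural_spline_second_derivs(t, x):
    """m_k = s''(t_k) of the natural cubic spline through (t_k, x_k); m_0 = m_{n-1} = 0."""
    n = len(t)
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    m = [0.0] * n
    if n < 3:
        return m                               # two knots: the spline is a line
    # Tridiagonal system for the interior unknowns m_1, ..., m_{n-2}.
    sub = [h[i - 1] for i in range(1, n - 1)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    sup = [h[i] for i in range(1, n - 1)]
    rhs = [6.0 * ((x[i + 1] - x[i]) / h[i] - (x[i] - x[i - 1]) / h[i - 1])
           for i in range(1, n - 1)]
    for j in range(1, len(diag)):              # Thomas algorithm: forward elimination
        w = sub[j] / diag[j - 1]
        diag[j] -= w * sup[j - 1]
        rhs[j] -= w * rhs[j - 1]
    interior = [0.0] * len(diag)               # back substitution
    interior[-1] = rhs[-1] / diag[-1]
    for j in range(len(diag) - 2, -1, -1):
        interior[j] = (rhs[j] - sup[j] * interior[j + 1]) / diag[j]
    m[1:n - 1] = interior
    return m


def spline_energy(t, x):
    """Cubic-spline cost: integral of s''(t)^2 for the natural spline (s'' is piecewise linear)."""
    m = natural_spline_second_derivs(t, x)
    return sum((t[i + 1] - t[i]) / 3.0 * (m[i] ** 2 + m[i] * m[i + 1] + m[i + 1] ** 2)
               for i in range(len(t) - 1))


def spline_eval(t, x, tau):
    """S_tau(x_1, ..., x_N): value of the natural spline at a time tau in [t_0, t_{n-1}]."""
    m = natural_spline_second_derivs(t, x)
    i = min(max(bisect_right(t, tau) - 1, 0), len(t) - 2)   # interval [t_i, t_{i+1}]
    h = t[i + 1] - t[i]
    a, b = t[i + 1] - tau, tau - t[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h)
            + (x[i] / h - m[i] * h / 6.0) * a + (x[i + 1] / h - m[i + 1] * h / 6.0) * b)
```

For example, the knots $t=(0,1,2)$ and points $x=(0,1,0)$ give $m=(0,-3,0)$ and an energy of $6$. Collinear data moving at constant speed has zero energy.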

This provides a principled, variational approach to interpolate measures along smooth paths in Wasserstein space, with links to dynamic optimal transport (Friesecke et al., 2022).

5. Sparsity, Correctness, and Complexity of Wasserstein Formulations

Several theoretical properties characterize the tractability of the MMOT Wasserstein formulation:

  • Sparsity (Dubins): Every extreme point of the discrete Kantorovich polytope with marginal grid sizes $\ell_1,\dots,\ell_N$ has at most $\sum_{k=1}^N\ell_k-N+1$ nonzeros. Hence at least one optimizer $\gamma^*$ of the discrete MMOT LP has this sparsity. The full coupling has $\prod_{k=1}^N\ell_k$ entries and grows exponentially in $N$, yet this sharp bound ensures that optimal plans can be supported on a much smaller subset of configurations.
  • Algorithmic correctness (GenCol): By linear programming duality, once the dual solution of the reduced LP is feasible for the full dual problem, the reduced primal solution is a global optimum. The genetic column update drives the iteration toward this state, ensuring correctness and, in practice, finite termination.
  • Computational efficiency: Each iteration solves LPs with $|\Omega|$ variables and $\sum_{k=1}^N\ell_k$ constraints, so the overall cost is governed by the active set size $|\Omega|$, which pruning keeps manageable.
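The sparsity bound can be checked constructively on small examples. The sketch below uses the multi-marginal north-west corner rule. This is a classical greedy construction that produces a feasible (not optimal) plan; it is not part of GenCol and is used here only as an illustration. The rule walks through index tuples, places as much mass as the residual marginals allow, and advances every index whose residual mass is exhausted. Every step except the last advances at least one of the $N$ indices, so the plan has at most $\sum_k\ell_k-N+1$ nonzeros. Exact arithmetic with `fractions.Fraction` is assumed so that all residuals vanish together at the final step.

```python
from fractions import Fraction


def north_west_corner(marginals):
    """Greedy feasible plan for discrete marginals with equal total mass.

    Returns {index tuple: mass}; the support size is at most sum_k l_k - N + 1.
    """
    residual = [list(mu) for mu in marginals]  # remaining mass per grid point
    idx = [0] * len(marginals)
    plan = {}
    while all(i < len(r) for i, r in zip(idx, residual)):
        mass = min(r[i] for i, r in zip(idx, residual))
        if mass > 0:
            plan[tuple(idx)] = mass
        for k, r in enumerate(residual):
            r[idx[k]] -= mass
            if r[idx[k]] == 0:
                idx[k] += 1                    # grid point idx[k] of marginal k is used up
    return plan


mus = [[Fraction(1, 2)] * 2, [Fraction(1, 3)] * 3, [Fraction(1, 4)] * 4]
plan = north_west_corner(mus)
bound = sum(len(mu) for mu in mus) - len(mus) + 1
print(len(plan), bound)  # 6 7: six nonzeros among 2 * 3 * 4 = 24 tensor entries
```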

Empirical observations confirm rapid (often exponential) convergence and mesh-free numerical accuracy for barycenter and spline tasks in high dimensions (Friesecke et al., 2022).

6. Significance: Applications and Extensions

The general Wasserstein formulation underlies a broad range of optimal transport-based techniques. Key applications include:

  • Multi-marginal barycenters: Allowing mesh-free, high-dimensional computation, essential for statistical averaging in geometry and machine learning.
  • Wasserstein splines: Enabling smooth interpolation of distributions in generative modeling, time-series, and dynamical inference.
  • High-dimensional OT solvers: Via the GenCol scheme and sparsity guarantees, practical in scientific computing regimes previously inaccessible to full dense plan approaches.
  • Theoretical innovation: The integration of duality, sparsity, and mesh-free evaluation within a unified mathematical framework has enabled advances across computational mathematics, statistics, and applied sciences.

Further technical details, such as precise convergence proofs and explicit spline cost expressions, are given in Friesecke et al. (2022).

References (1)
