Discrete Optimal Transport Methods

Updated 24 February 2026

Discrete optimal transport approaches are mathematical and computational methods for transporting mass between discrete measures, built on linear programming, variational principles, and combinatorial optimization.
These methods leverage rich polyhedral and PDE-inspired structures to enable efficient algorithms like the Sinkhorn iteration, auction methods, and ADMM for high-dimensional and sparse problems.
Applications span data analysis, imaging, machine learning, and shape analysis, with ongoing research addressing scalability, discretization errors, and convergence in complex scenarios.

Discrete Optimal Transport (OT) approaches comprise a class of mathematical and computational methods that address the problem of transporting mass between discrete measures, typically defined on finite or countable spaces. These methods are central in modern statistics, computer science, mathematics, and applied domains, providing the backbone for algorithms in data analysis, imaging, machine learning, and geometry. Discrete OT is primarily formulated through linear programming (LP), convex optimization, or variational principles, and exhibits rich polyhedral, combinatorial, and PDE-inspired structures.

1. Mathematical Formulations of Discrete Optimal Transport

The prototypical discrete OT problem concerns finite probability measures $\mu=\sum_{i=1}^m a_i \delta_{x_i}$ and $\nu=\sum_{j=1}^n b_j \delta_{y_j}$, with nonnegative weights summing to one, supported on finite sets $\{x_i\}$ and $\{y_j\}$, together with a cost matrix $C=(c_{ij})$, typically $c_{ij} = \|x_i - y_j\|^p$ for $p \geq 1$. The primal Kantorovich LP is

$\min_{\pi \in \mathbb{R}^{m \times n}_+}\ \sum_{i=1}^m\sum_{j=1}^n c_{ij} \pi_{ij} \quad \text{s.t.}\quad \sum_{j=1}^n \pi_{ij}=a_i,\ \sum_{i=1}^m \pi_{ij}=b_j.$

Its dual is

$\max_{u \in \mathbb{R}^m, v \in \mathbb{R}^n} \sum_i a_i u_i + \sum_j b_j v_j \quad \text{s.t.}\ u_i + v_j \leq c_{ij}\ (\forall i,j).$

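When both distributions have the same number $n$ of support points with uniform mass $1/n$ (as for empirical measures of equal size), the problem specializes to the classical assignment problem. By Birkhoff's theorem, the extreme points of the feasible set are then permutation matrices scaled by $1/n$. For the quadratic cost $c_{ij}=\|x_i-y_j\|^2$, the optimal value is the squared Wasserstein-2 distance, $W_2^2(\mu,\nu)=\min_{\pi}\langle \pi,C\rangle$ over plans satisfying the marginal constraints (Merigot et al., 2020, Solomon, 2018, Schrieber et al., 2016).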
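The assignment specialization can be checked on tiny inputs by brute force. The sketch below is for illustration only: it assumes uniform weights $1/n$ on both sides and points given as tuples, and `assignment_w2_squared` is a hypothetical helper. It computes $W_2^2$ by enumerating all permutations.

```python
import itertools


def assignment_w2_squared(xs, ys):
    """Squared W2 between two uniform point clouds of equal size n.

    With weights 1/n on both sides, an optimal plan can be taken to be a
    permutation matrix scaled by 1/n, so
        W2^2 = min over permutations s of (1/n) * sum_i |x_i - y_s(i)|^2.
    Enumerates all n! permutations, so it is only usable for tiny n.
    """
    n = len(xs)
    assert n == len(ys), "the assignment form needs equal support sizes"

    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    best_cost, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        cost = sum(sq_dist(xs[i], ys[perm[i]]) for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm
```

Take the 1D points $x=(0,1,3)$ and $y=(2.5,0.5,1.5)$. The optimum matches the points in sorted order, which is the permutation `(1, 2, 0)`, at cost $0.25$. This agrees with the classical fact that monotone matching is optimal for convex costs on the line.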

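Semi-discrete OT pairs a discrete measure with a continuous one, for example when $\mu$ is atomic and $\nu$ is absolutely continuous. The problem then reduces to a finite-dimensional concave maximization over one dual weight $\psi_i$ per atom. The optimal map sends each Laguerre cell $L_i(\psi)=\{y:\ c(x_i,y)-\psi_i\leq c(x_k,y)-\psi_k\ \forall k\}$ to the atom $x_i$; for the quadratic cost these cells form a power diagram.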
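To make the Laguerre-cell picture concrete, here is a minimal one-dimensional sketch. It assumes the quadratic cost and takes $\nu$ uniform on $[0,1]$, approximated by a fine midpoint grid so that cell masses are estimated by counting grid points. It runs plain gradient ascent on the weights $\psi_i$, whose gradient is $a_i-\nu(L_i(\psi))$. The function names, grid size, and step size are illustrative assumptions. Practical solvers typically integrate exactly over cells and use damped Newton steps instead.

```python
def laguerre_masses(xs, psi, grid):
    """Fraction of grid points in each Laguerre cell (a proxy for nu(L_i))."""
    counts = [0] * len(xs)
    for y in grid:
        # Cell i collects the points y minimizing (x_i - y)^2 - psi_i.
        scores = [(x - y) ** 2 - p for x, p in zip(xs, psi)]
        counts[scores.index(min(scores))] += 1
    return [c / len(grid) for c in counts]


def semi_discrete_weights(xs, a, n_grid=1000, step=0.2, iters=200):
    """Gradient ascent on the semi-discrete dual; assumes nu = uniform on [0, 1]."""
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]
    psi = [0.0] * len(xs)
    for _ in range(iters):
        masses = laguerre_masses(xs, psi, grid)
        # Dual gradient a_i - nu(L_i(psi)): raising psi_i enlarges cell i.
        psi = [p + step * (ai - mi) for p, ai, mi in zip(psi, a, masses)]
    return psi, laguerre_masses(xs, psi, grid)
```

For atoms $0.1, 0.5, 0.9$ with masses $0.2, 0.5, 0.3$, the returned cell masses match the targets up to the grid resolution.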

Dynamical and PDE-inspired discrete formulations adapt the Benamou–Brenier dynamic formulation to graphs or grids by optimizing over time-dependent densities and fluxes with discrete continuity constraints (Papadakis et al., 2013, Erbar et al., 2017, Lavenant et al., 2018).
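For orientation, here is the continuous problem these schemes discretize: the Benamou–Brenier formula for measures $\mu_0,\mu_1$ on a domain $\Omega$. Discrete versions typically replace $\rho$ and $m$ by values on grid cells or graph edges, and the divergence by a finite-difference operator.

$W_2^2(\mu_0,\mu_1)=\min_{(\rho,m)}\int_0^1\!\!\int_\Omega \frac{|m(t,x)|^2}{\rho(t,x)}\,dx\,dt \quad \text{s.t.}\quad \partial_t\rho+\nabla\cdot m=0,\ \ \rho(0,\cdot)=\mu_0,\ \ \rho(1,\cdot)=\mu_1.$

The convention is that the integrand is $0$ where $\rho=m=0$ and $+\infty$ where $\rho=0$ but $m\neq 0$. Writing $m=\rho v$ recovers the kinetic-energy form $\int_0^1\!\int_\Omega \rho\,|v|^2\,dx\,dt$, which is jointly convex in $(\rho,m)$.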

2. Polyhedral Theory, Sparsity, and Barycenters

For fully discrete OT, the feasible region is a polyhedron. Basic feasible solutions are sparse: for the classical bipartite transport LP with $m$ sources and $n$ targets, the $m+n$ equality constraints have rank $m+n-1$ (one is redundant because total masses agree). Consequently, every vertex (extreme point) solution $\pi^*$ has support size at most $m+n-1$ and forms an acyclic bipartite graph (Zanetti et al., 2022, Hou et al., 2023). Solvers exploit this by storing and updating only the support of the plan rather than all $mn$ entries, which reduces memory and per-iteration cost.
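The vertex sparsity can be seen constructively with the classical northwest-corner rule, a common way to initialize (network) simplex. It produces a vertex of the transportation polytope, which is feasible but generally not optimal. The sketch below assumes equal total supply and demand; `northwest_corner` is an illustrative name.

```python
def northwest_corner(a, b, tol=1e-12):
    """Basic feasible transport plan via the northwest-corner rule.

    Each step saturates row i or column j and advances i or j by one, so at
    most m + n - 1 cells are ever filled and the filled cells form an
    acyclic staircase. Assumes sum(a) == sum(b).
    """
    a, b = list(a), list(b)  # remaining supply and demand
    m, n = len(a), len(b)
    plan = [[0.0] * n for _ in range(m)]
    i = j = 0
    while i < m and j < n:
        amount = min(a[i], b[j])
        plan[i][j] = amount
        a[i] -= amount
        b[j] -= amount
        if a[i] <= tol:  # supply of row i exhausted: move down
            i += 1
        else:            # demand of column j exhausted: move right
            j += 1
    return plan
```

With supplies $(5,3,2)$ and demands $(4,4,2)$, the plan matches both marginals. It uses 4 nonzero cells, within the bound $m+n-1=5$.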

Wasserstein barycenters of discrete measures are of central theoretical and computational interest. In the discrete barycenter problem, all input marginals $P_1,\dots,P_N$ are discrete with finite supports $S_i=\{x_{i1},\dots,x_{i|S_i|}\}$. Any barycenter $\bar P$ must be supported on the finite centroid set $S=\{\frac{1}{N}(x_{1k_1}+\dots+x_{Nk_N}) \mid x_{ik_i}\in S_i\}$, which leads to a high-dimensional but sparse LP. Consequently, every barycenter of discrete measures is itself discrete. Moreover, there is always a barycenter whose support has at most $\sum_{i=1}^N |S_i| - (N-1)$ points (“provable sparsity”) (Anderes et al., 2015). An exact LP formulation exists in this setting, with optimal transport plans from the barycenter to each marginal that do not split mass.
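The candidate support $S$ can be enumerated directly for small inputs. The size of $S$ can approach $\prod_i |S_i|$, whereas the sparsity bound guarantees a barycenter using only $\sum_i |S_i|-(N-1)$ of those points. This sketch assumes equal weights $1/N$ and points given as tuples; `centroid_support` is a hypothetical helper.

```python
import itertools


def centroid_support(supports):
    """All averages (x_1k1 + ... + x_NkN) / N, one point from each support.

    `supports` is a list of N lists of equal-dimension tuples. Coinciding
    centroids are merged, and the result is returned sorted.
    """
    N = len(supports)
    centroids = set()
    for combo in itertools.product(*supports):
        dim = len(combo[0])
        centroids.add(tuple(sum(p[d] for p in combo) / N for d in range(dim)))
    return sorted(centroids)
```

Two 1D measures supported on $\{0,2\}$ and $\{0,4\}$ give the candidate set $\{0,1,2,3\}$ (4 points). The sparsity bound says a barycenter using at most $2+2-1=3$ of them exists.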

3. Algorithmic Approaches and Computational Techniques

Classical LP, Auction, and Simplex

Standard simplex/network simplex: Solve the Kantorovich LP directly with specialized (network) simplex techniques or general-purpose commercial solvers. For dense problems the running time scales roughly as $O(m^2 n^2)$ in practice (Schrieber et al., 2016, Zanetti et al., 2022, Hou et al., 2023).
Auction algorithms: For balanced measures with uniform weights (the assignment case), coordinate ascent on the dual variables yields Bertsekas' auction algorithm and its $\epsilon$-scaling variants (Merigot et al., 2020, Solomon, 2018).
Column-generation/IPM hybrid: Dynamic active-set methods that maintain a sparse support during interior-point iterations, solving only small Schur complement systems and pricing new variables (Zanetti et al., 2022).
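A minimal Gauss–Seidel auction sketch for the assignment case follows, written as cost minimization. Each person values object $j$ at $-c_{ij}$ minus its current price. It uses a single fixed $\epsilon$ without $\epsilon$-scaling; the helper name and the default $\epsilon=1/(n+1)$ are assumptions of this sketch. Standard auction theory guarantees a final assignment within $n\epsilon$ of optimal, hence an optimal one for integer costs when $\epsilon<1/n$.

```python
def auction_assignment(cost, eps=None):
    """Auction algorithm for the n x n assignment problem (minimize cost).

    Returns (assigned, prices) where assigned[i] is the object of person i.
    Without eps-scaling, the number of bids can grow with max|cost| / eps.
    """
    n = len(cost)
    if eps is None:
        eps = 1.0 / (n + 1)  # < 1/n, so optimal for integer costs
    prices = [0.0] * n
    owner = [None] * n      # owner[j]: person currently holding object j
    assigned = [None] * n   # assigned[i]: object held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of each object for person i at the current prices.
        values = [-cost[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        second = max((values[j] for j in range(n) if j != best), default=values[best])
        # Bid: raise the price of the best object by (best - second) + eps.
        prices[best] += values[best] - second + eps
        if owner[best] is not None:  # the previous owner is outbid
            assigned[owner[best]] = None
            unassigned.append(owner[best])
        owner[best] = i
        assigned[i] = best
    return assigned, prices
```

For the integer cost matrix `[[4, 1, 3], [2, 0, 5], [3, 2, 2]]`, the call `auction_assignment(...)[0]` returns `[1, 0, 2]`. That is the unique optimal assignment, with total cost 5.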

Entropic Regularization and Sinkhorn

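Sinkhorn–Knopp iteration: Entropic regularization smooths the OT LP by adding an $\varepsilon$-weighted KL (entropy) term. The regularized optimal plan has the form $\operatorname{diag}(u)\,K\,\operatorname{diag}(v)$ with $K_{ij}=e^{-c_{ij}/\varepsilon}$, so it can be computed by alternately rescaling $u$ and $v$ to match the two marginals. These matrix-scaling iterations converge geometrically and parallelize well, at the price of a bias of order $O(\varepsilon \log n)$ in the transport cost (Merigot et al., 2020, Solomon, 2018, Lu et al., 2018, Schrieber et al., 2016). Sinkhorn is commonly proposed as the default approximate solver for large problems where moderate precision suffices.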
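A minimal pure-Python sketch of the scaling iteration follows. It assumes small dense inputs and runs a fixed number of iterations instead of testing a stopping criterion. Forming $e^{-C/\varepsilon}$ directly underflows for small $\varepsilon$, so practical implementations work in the log domain.

```python
import math


def sinkhorn(a, b, C, eps=0.05, iters=1000):
    """Entropic OT by Sinkhorn-Knopp matrix scaling.

    The plan is diag(u) K diag(v) with K = exp(-C / eps); u is rescaled to
    match the row marginal a, then v to match the column marginal b.
    Returns (plan, transport cost <plan, C>).
    """
    m, n = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(n)] for i in range(m)]
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]
    cost = sum(plan[i][j] * C[i][j] for i in range(m) for j in range(n))
    return plan, cost
```

Because the entropic plan is feasible for the unregularized LP, its cost is never below the exact OT cost. For example, with $a=(0.5,0.5)$, $b=(0.25,0.75)$ and $C=\begin{pmatrix}0&1\\1&0\end{pmatrix}$, the exact cost is $0.25$, and the returned cost lies above it.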

Proximal Splitting and ADMM

Dynamic (Eulerian) formulations: The Benamou–Brenier approach is discretized on regular grids or triangulated surfaces, leading to large sparse convex programs that are efficiently solved by first-order methods (Douglas–Rachford, Chambolle–Pock, ADMM) (Papadakis et al., 2013, Lavenant et al., 2018, Erbar et al., 2017). Staggered grid schemes help avoid checkerboarding and improve stability.
Graph and Markov-kernel formulations: Wasserstein-type metrics between probability densities on graphs interpolate the density along each edge with a mean such as the logarithmic mean. This yields convex action functionals whose proximal projections can be computed efficiently edge by edge, with proven $\Gamma$-convergence results (Erbar et al., 2017).
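On grid discretizations, proximal splitting schemes typically need the proximal map of the pointwise kinetic energy $J(\rho,m)=|m|^2/(2\rho)$. The factor $1/2$ is a normalization assumed in this sketch, and `prox_kinetic` is a hypothetical helper, not an API from the cited works. The first-order conditions give $m=\rho\,\tilde m/(\rho+\gamma)$, where $\rho$ is the largest real root of a cubic. This sketch finds that root by bisection.

```python
def prox_kinetic(rho_t, m_t, gamma, tol=1e-12):
    """argmin over (rho, m) of
        0.5*(rho - rho_t)^2 + 0.5*|m - m_t|^2 + gamma * |m|^2 / (2*rho),
    with the conventions J(0, 0) = 0 and J = +inf if rho <= 0 and m != 0.
    rho solves f(rho) = (rho - rho_t)*(rho + gamma)^2 - 0.5*gamma*|m_t|^2 = 0.
    """
    msq = sum(x * x for x in m_t)

    def f(r):
        return (r - rho_t) * (r + gamma) ** 2 - 0.5 * gamma * msq

    lo = max(rho_t, 0.0)  # f is increasing on [max(rho_t, 0), inf)
    if f(lo) > 0:         # no root with rho >= 0: the prox is the origin
        return 0.0, tuple(0.0 for _ in m_t)
    hi = lo + 1.0
    while f(hi) <= 0:     # grow the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):  # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) <= 0:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    if rho <= 0.0:
        return 0.0, tuple(0.0 for _ in m_t)
    # Optimal momentum for this rho.
    return rho, tuple(rho * x / (rho + gamma) for x in m_t)
```

Bisection is chosen over a closed-form cubic solution for clarity. It is robust because $f$ is monotone on the bracket, so it cannot pick a spurious root.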

High-Dimensional and Approximate Methods

For scalability:

Approximate algorithms: Monte Carlo $(1+\varepsilon)$-approximation, randomized minimum-cost-flow on geometric spanners, and multiplicative-weights boosting yield subquadratic approximation algorithms in high dimensions (Agarwal et al., 2023).
Stochastic gradient for grid discretization: Stochastic gradient methods on entropic objectives, combined with a partition of the domain into cells, give accurate approximations of the discretized marginals with proven rates; they parallelize well and adapt to the dimension (Wang et al., 2021).
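The basic idea of stochastic optimization on an entropic objective can be illustrated with averaged stochastic gradient ascent on the entropic semi-dual. This is a swapped-in, generic technique, not the cell-partitioning scheme of the cited work. The sketch assumes a 1D quadratic cost, a discrete target $\nu=\sum_j b_j\delta_{y_j}$, and a source $\mu$ accessed only through samples. For a sampled $x$, the gradient in $v_j$ is $b_j-\chi_j(x)$, where $\chi(x)$ is the softmax of $\log b_j+(v_j-c(x,y_j))/\varepsilon$. The function name and step schedule are illustrative assumptions.

```python
import math
import random


def stochastic_semidual(xs, a, ys, b, eps=0.1, iters=20000, lr=0.5, seed=0):
    """Averaged SGD on the entropic semi-dual in the target potential v."""
    rng = random.Random(seed)
    n = len(ys)

    def chi(x, v):
        # Conditional law of y given x under the plan induced by v
        # (softmax shifted by the max logit for numerical stability).
        z = [math.log(b[j]) + (v[j] - (x - ys[j]) ** 2) / eps for j in range(n)]
        top = max(z)
        w = [math.exp(zj - top) for zj in z]
        s = sum(w)
        return [wj / s for wj in w]

    v = [0.0] * n
    v_avg = [0.0] * n
    for k in range(1, iters + 1):
        x = rng.choices(xs, weights=a)[0]  # one sample from mu
        p = chi(x, v)
        step = lr / math.sqrt(k)
        # Unbiased stochastic gradient of the semi-dual: b_j - chi_j(x).
        v = [v[j] + step * (b[j] - p[j]) for j in range(n)]
        # Running mean of the iterates (Polyak-Ruppert averaging).
        v_avg = [va + (vj - va) / k for va, vj in zip(v_avg, v)]

    # Column marginal implied by v_avg, computed exactly over the discrete mu.
    marginal = [0.0] * n
    for xi, ai in zip(xs, a):
        for j, pj in enumerate(chi(xi, v_avg)):
            marginal[j] += ai * pj
    return v_avg, marginal
```

At the optimum, the plan induced by $v$ reproduces the target weights $b$. The returned `marginal` therefore serves as a convergence check.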

4. Extensions and Generalizations

Regularized and Relaxed Discrete OT

Relaxing the mass constraints and introducing graph-based regularizers enables applications to color normalization, artifact reduction, and interpolation between multimodal distributions. The regularized problem remains convex under Sobolev ($\ell_2$ on the graph) or total-variation (TV) penalties and can be solved by Frank–Wolfe, primal–dual, or LP-based algorithms. Block-coordinate descent extends regularized OT to barycenter computation (Ferradans et al., 2013).
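A different but widely used way to relax the mass constraints replaces them with KL penalties of strength $\rho$ ("unbalanced" entropic OT). This is a swapped-in technique, not the capacity-constrained and graph-regularized formulation cited above. In this relaxation, each Sinkhorn update is raised to the power $\rho/(\rho+\varepsilon)$. The sketch below assumes small dense inputs, and `unbalanced_sinkhorn` is a hypothetical helper.

```python
import math


def unbalanced_sinkhorn(a, b, C, eps=0.1, rho=1.0, iters=500):
    """Entropic OT with both marginal constraints replaced by rho * KL penalties.

    As rho -> infinity the updates approach ordinary Sinkhorn; for finite rho
    the plan's marginals may deviate from a and b, so total mass need not be
    preserved and sum(a) need not equal sum(b).
    """
    m, n = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(n)] for i in range(m)]
    u, v = [1.0] * m, [1.0] * n
    power = rho / (rho + eps)
    for _ in range(iters):
        u = [(a[i] / sum(K[i][j] * v[j] for j in range(n))) ** power for i in range(m)]
        v = [(b[j] / sum(K[i][j] * u[i] for i in range(m))) ** power for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]
```

Consider inputs of unequal total mass, such as $a=(1,1)$ and $b=(0.25,0.25)$ with $\rho=1$. The plan's total mass then lands strictly between the two totals, $0.5$ and $2$.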

Semi-Discrete, Quasi-Discrete, and “3/4-Discrete” Regimes

“3/4-discrete” OT arises when one marginal is a continuous measure supported on curves (unions of line segments) and the other is discrete. Dual formulations combined with fast Laguerre diagram computations allow scalable quasi-Newton optimization, for example when fitting curves to point clouds (Gournay et al., 2018).

Brenier-based quasi-discrete-to-discrete methods avoid entropic regularization and directly yield exact Wasserstein distances for the quadratic cost. They outperform Sinkhorn when high precision or extremal transport plans are desired (Lu et al., 2018).

Discrete-Time and Dynamical Settings

Discrete-time OT with Lagrangian costs is addressed via Kantorovich duality for time-indexed marginal constraints. Algorithms combining optimal control and splitting methods enable efficient parallel solvers without requiring solution of continuous Hamilton–Jacobi equations, with extensions to linear–Gaussian settings represented as matrix-valued SDPs (Wu et al., 2024).

5. Convergence Theory and Discretization Error

Convergence analyses establish that finite-dimensional discretizations of Kantorovich problems on compact metric spaces satisfy sharp error bounds $|I[\pi_h]-I^*|\leq \omega_c(h)+\varepsilon$, where $\omega_c$ is the modulus of continuity of the cost $c$ and $h$ is the discretization scale (Frungillo, 2024). Barycentric and geometric-median projections of the discrete plans yield approximants that converge to Monge maps under uniqueness and regularity hypotheses.
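Here is a toy illustration of error that is first order in $h$, not the setting of the cited analysis. Replace $\nu=U[0,1]$ by the uniform measure $\nu_h$ on the $n$ cell midpoints, with $h=1/n$. The triangle inequality gives $|W_1(\mu,\nu)-W_1(\mu,\nu_h)|\leq W_1(\nu,\nu_h)$, and a direct computation gives $W_1(\nu,\nu_h)=h/4$. The sketch below checks this with the 1D identity $W_1=\int_0^1|F-G|$, where $F$ and $G$ are the cumulative distribution functions (CDFs) of $\nu$ and $\nu_h$. The helper name and quadrature size are assumptions.

```python
def w1_uniform_vs_grid(n, n_quad=100000):
    """W1 between U[0, 1] and the uniform measure on midpoints (i + 0.5) / n.

    Midpoint quadrature of the 1D identity W1 = integral |F(t) - G(t)| dt.
    """
    total = 0.0
    for k in range(n_quad):
        t = (k + 0.5) / n_quad
        # G(t) = fraction of midpoints <= t = floor(t*n + 0.5) / n, capped at 1.
        g = min(n, int(t * n + 0.5)) / n
        total += abs(t - g)  # F(t) = t for the uniform law
    return total / n_quad
```

Each cell contributes $h^2/4$, so the total is $n\cdot h^2/4=1/(4n)$. Halving $h$ halves the discretization error.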

Homogenization on periodic 1D meshes reveals that discrete Benamou–Brenier-type metrics may converge not to $W_2$ but to a Wasserstein metric rescaled by the factor $\sqrt{c^*}$, with the limiting mobility controlled by the mesh microstructure and isotropy conditions (Gladbach et al., 2019).

6. Applications, Numerical Examples, and Benchmarking

Benchmark Comparisons

The DOTmark collection provides systematic benchmarks for discrete OT solvers, facilitating objective comparison of classical simplex, network simplex, Sinkhorn, shielding/geometry-aware methods, and semidiscrete convex optimization. Geometry-aware and semidiscrete methods (e.g., AHA) offer substantial speedups and precision improvements in high-resolution settings (Schrieber et al., 2016).

Barycenters, Color Transfer, and Normalization

Discrete Wasserstein barycenters with sparse support enable optimal blending or interpolation among multiple input measures, with provable non-mass-splitting transport structure (Anderes et al., 2015, Ferradans et al., 2013).

Graph-regularized and relaxed discrete OT provides a practical basis for robust color normalization and palette transfer across multi-modal distributions in image processing (Ferradans et al., 2013).

Fitting Curves, Shape Analysis, and Probability Flow

Semi-discrete and 3/4-discrete OT methods enable extraction of polyline and filamentary structures (e.g., galaxy filament tracing) and physically consistent curve approximation for large-scale point clouds (Gournay et al., 2018).

Recent work connects discrete OT to deterministic probability flow in diffusion models, defining optimal discrete flows via Kantorovich plans and providing sampling algorithms with lower variance and increased certainty in generative settings (Zhang et al., 2023).

7. Open Problems and Research Directions

Ongoing challenges include:

Discretization of dynamic OT on graphs that preserves displacement geodesics and convergence to continuum limits.
Accelerated scalable solvers for fully discrete or multi-marginal OT in high dimensions.
Unbalanced OT for mass-varying scenarios and efficient approximation of Gromov–Wasserstein or tensor-valued measures.
Quantitative analysis of discretization errors, particularly as $n \to \infty$ on manifolds or random graphs.
Adaptations of discrete PDE-based methods with guarantees of mass preservation, contraction, Ricci curvature, and compatibility with gradient flows (Solomon, 2018, Lavenant et al., 2018, Erbar et al., 2017).

Current developments continue to advance both the mathematical foundations and computational tractability of discrete optimal transport, broadening its impact and enabling novel applications across scientific disciplines.