Discrete Optimal Transport
- Discrete Optimal Transport studies the optimal reallocation of discrete masses under cost minimization with prescribed marginal constraints.
- It employs variational and polyhedral formulations, using convex functions and power diagrams to derive robust, scalable numerical algorithms.
- DOT is applied in computational geometry, statistical imaging, and machine learning, offering efficient methods for large-scale transport problems.
Discrete optimal transport (DOT) is the study of optimal transportation problems in which the measures are discrete, i.e., supported on finite sets or on domains partitioned into finitely many regions. In DOT, the goal is to minimize a prescribed transportation cost when reallocating "mass" from one distribution to another, subject to prescribed marginal constraints. DOT stands at the interface of convex geometry, combinatorial optimization, computational geometry, and modern applications in statistics, data science, and computer vision. The field admits variational principles, linear programming formulations, and deep connections to power diagrams and convex polytopes, and it leads to scalable algorithms for applications involving large discrete domains.
1. Variational and Polyhedral Formulations of Discrete Optimal Transport
DOT is grounded in a variational framework that finds its most geometric statement in the context of convex subdivisions and power diagrams. Given a compact convex domain $\Omega \subset \mathbb{R}^d$, a positive density $\mu$ on $\Omega$, a set of distinct target points $p_1, \dots, p_k \in \mathbb{R}^d$, and prescribed masses $A_1, \dots, A_k > 0$ with $\sum_{i=1}^k A_i = \mu(\Omega)$, one seeks a partition of $\Omega$ into convex regions $W_1, \dots, W_k$ such that $\mu(W_i) = A_i$ for all $i$ (Gu et al., 2013).
The partition is encoded via a "height vector" $h = (h_1, \dots, h_k) \in \mathbb{R}^k$ and via the convex, piecewise linear function
$$u_h(x) = \max_{1 \le i \le k} \{ \langle x, p_i \rangle + h_i \},$$
whose linearity cells $W_i(h) = \{ x \in \Omega : \nabla u_h(x) = p_i \}$ give the candidate partition.
The energy functional
$$E(h) = \int_0^h \sum_{i=1}^k w_i(\eta)\, d\eta_i - \sum_{i=1}^k A_i h_i, \qquad w_i(h) = \mu(W_i(h)),$$
is strictly convex (on the hyperplane $\sum_{i=1}^k h_i = 0$), $C^1$-smooth, and admits an explicit Hessian. Its gradient is the vector of cell $\mu$-masses shifted by the targets: $\nabla E(h) = (w_1(h) - A_1, \dots, w_k(h) - A_k)$. Thus, finding $h$ such that the prescribed masses are recovered is equivalent to solving $\nabla E(h) = 0$. The unique solution (modulo an additive constant) can be efficiently computed, e.g., via Newton's method (Gu et al., 2013).
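To make the variational principle concrete, here is a minimal one-dimensional sketch. It assumes $\Omega = [0,1]$ with uniform density, evaluates cell masses by brute-force grid sampling, and uses plain gradient descent on $\nabla E(h) = w(h) - A$ rather than the Newton iteration of the paper; the target points and masses are illustrative:

```python
import numpy as np

def cell_masses(p, h, n_grid=4001):
    """mu-masses of the cells of u_h(x) = max_i (x * p_i + h_i) on
    Omega = [0, 1] with uniform density (brute-force grid evaluation)."""
    x = np.linspace(0.0, 1.0, n_grid)
    winner = np.argmax(np.outer(x, p) + h, axis=1)  # active line at each x
    return np.bincount(winner, minlength=len(p)) / n_grid

def solve_heights(p, A, steps=2000, tau=0.05):
    """Minimize E(h) by gradient descent: grad E(h) = w(h) - A."""
    h = np.zeros_like(p, dtype=float)
    for _ in range(steps):
        h -= tau * (cell_masses(p, h) - A)
        h -= h.mean()  # fix the additive-constant gauge
    return h

p = np.array([0.1, 0.5, 0.9])   # target points (illustrative)
A = np.array([0.2, 0.5, 0.3])   # prescribed masses, summing to mu(Omega) = 1
h = solve_heights(p, A)
print(cell_masses(p, h))         # close to [0.2, 0.5, 0.3]
```

Because $E$ is convex, any first-order scheme with a small enough step converges; Newton's method (used below with the explicit Hessian) is far faster in practice.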
From the duality perspective, in the fully discrete setting, optimal transport is classically cast as a linear program: for empirical measures on finite sets $X = \{x_1, \dots, x_m\}$ and $Y = \{y_1, \dots, y_n\}$,
$$\min_{T \in \mathbb{R}_{\ge 0}^{m \times n}} \ \sum_{i=1}^m \sum_{j=1}^n C_{ij} T_{ij} \quad \text{subject to} \quad \sum_j T_{ij} = a_i, \qquad \sum_i T_{ij} = b_j,$$
where $C_{ij} = c(x_i, y_j)$ is a prescribed ground cost and the probability vectors $a$ and $b$ specify the mass at $x_i$ and $y_j$, respectively (Solomon, 2018).
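This linear program can be handed directly to an off-the-shelf LP solver. A minimal sketch using SciPy's `linprog` (the point sets, costs, and masses are illustrative assumptions; for squared distance in 1-D, the optimal plan is the monotone matching):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: ground cost C[i, j] = |x_i - y_j|^2.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
a = np.array([0.3, 0.3, 0.4])   # source masses
b = np.array([0.6, 0.4])        # target masses
m, n = len(x), len(y)
C = (x[:, None] - y[None, :]) ** 2

# Equality constraints on the raveled plan T: row sums = a, column sums = b.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # sum_j T_ij = a_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0            # sum_i T_ij = b_j

res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
T = res.x.reshape(m, n)
print(T)   # monotone plan: [[0.3, 0], [0.3, 0], [0, 0.4]]
```

Dense LP solvers like this scale poorly in $mn$, which is exactly what the multi-scale and entropic methods discussed below address.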
2. Geometric Structures: Power Diagrams and Links to Convex Geometry
A major structural insight is that the regions $W_i$, which partition $\Omega$ in the semidiscrete (continuous-to-discrete) setting, are exactly the cells of a power diagram (weighted Voronoi diagram) defined by the sites $p_i$ with weights $w_i = |p_i|^2 + 2h_i$. The connection is precise:
$$W_i(h) = \{\, x \in \Omega : |x - p_i|^2 - w_i \le |x - p_j|^2 - w_j \ \text{for all } j \,\}.$$
Thus, DOT can be reframed as adjusting the power diagram weights so that induced cell volumes match prescribed masses (Gu et al., 2013). This geometric viewpoint underpins both the variational approach and algorithmic strategies, leading to direct links to discrete convex polytopes (e.g., Minkowski-type problems) and to the discrete Monge–Ampère equation via the volume of the convex hull of neighboring gradients.
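The identification follows from expanding the square: the maximizer of the affine functions $\langle x, p_i\rangle + h_i$ coincides with the minimizer of the power distance $|x - p_i|^2 - w_i$ when $w_i = |p_i|^2 + 2h_i$. A quick numerical check with random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.normal(size=(5, 2))             # sites in the plane
h = rng.normal(size=5)                  # height vector
X = rng.uniform(-2, 2, size=(1000, 2))  # sample points

# Index of the line attaining the upper envelope u_h(x) = max_i <x, p_i> + h_i.
upper = np.argmax(X @ p.T + h, axis=1)

# Index of the power cell containing x, with weights w_i = |p_i|^2 + 2 h_i.
w = (p ** 2).sum(axis=1) + 2 * h
power = np.argmin(((X[:, None, :] - p[None]) ** 2).sum(-1) - w, axis=1)

agree = (upper == power).mean()
print(agree)
```

Up to floating-point ties, the two labelings agree everywhere, which is the power-diagram reformulation in computational form.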
3. Algorithmic Approaches: Convex Optimization and Multi-Scale Methods
The variational framework makes the DOT problem amenable to efficient numerical algorithms. The strictly convex energy $E(h)$ (with explicit gradient and Hessian) yields, upon restricting to the hyperplane $\sum_i h_i = 0$, a unique solution accessible by Newton's method, with explicit computation of the gradient and Hessian at each iteration. The Hessian entries are
$$\frac{\partial^2 E}{\partial h_i \partial h_j} = -\frac{1}{|p_i - p_j|} \int_{F_{ij}} \mu \, d\mathcal{H}^{d-1} \quad (i \ne j), \qquad \frac{\partial^2 E}{\partial h_i^2} = -\sum_{j \ne i} \frac{\partial^2 E}{\partial h_i \partial h_j},$$
where $F_{ij} = \overline{W_i(h)} \cap \overline{W_j(h)}$ is the shared codimension-1 face between adjacent cells (the off-diagonal entry vanishes when the cells are not adjacent).
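In one dimension the Hessian becomes transparent: the shared face $F_{ij}$ is a single breakpoint, so for uniform density the off-diagonal entry of adjacent cells is $-1/|p_i - p_j|$. A sketch assembling this explicit Hessian and running a damped Newton iteration (uniform density on $[0,1]$, grid-evaluated masses, illustrative data; starting from the Voronoi heights $h_i = -|p_i|^2/2$ keeps every cell nonempty, which damped Newton schemes require):

```python
import numpy as np

def masses_and_hessian(p, h, n_grid=8001):
    """Cell mu-masses w_i(h) and the explicit Hessian on Omega = [0, 1]
    with uniform density; p holds sorted, distinct 1-D sites."""
    x = np.linspace(0.0, 1.0, n_grid)
    winner = np.argmax(np.outer(x, p) + h, axis=1)
    w = np.bincount(winner, minlength=len(p)) / n_grid
    H = np.zeros((len(p), len(p)))
    for i, j in zip(winner[:-1], winner[1:]):
        if i != j:  # adjacent cells; shared face is a breakpoint, mu(F_ij) = 1
            H[i, j] = H[j, i] = -1.0 / abs(p[i] - p[j])
    H -= np.diag(H.sum(axis=1))  # diagonal = minus the off-diagonal row sums
    return w, H

def newton_solve(p, A, iters=20, damp=0.5):
    h = -0.5 * p ** 2  # Voronoi initialization: every cell starts nonempty
    for _ in range(iters):
        w, H = masses_and_hessian(p, h)
        # H is singular (constants in its null space): take the min-norm step
        step = np.linalg.lstsq(H, w - A, rcond=None)[0]
        h -= damp * step
    return h

p = np.array([0.1, 0.5, 0.9])
A = np.array([0.2, 0.5, 0.3])
h = newton_solve(p, A)
print(masses_and_hessian(p, h)[0])   # close to [0.2, 0.5, 0.3]
```

The damping factor guards against a cell collapsing to empty mid-iteration, which would make the Hessian lose the corresponding row.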
For fully discrete problems, linear programming approaches remain standard, but recent advances exploit sparsity and geometric structure. Hierarchical multi-scale algorithms, such as the sparse shielding neighborhood method, construct local approximations and coarse-to-fine initializations: only a sparse subset of the cost matrix is handled at each iteration, leading to dramatic reductions in run-time and memory usage. These methods verify optimality by leveraging geometric shielding conditions on the dual constraints and are the foundation of performant solvers for large DOT problems (Schmitzer, 2015, Rauch et al., 28 Feb 2025).
In applied contexts, entropic regularization (the Sinkhorn algorithm) provides scalable, highly parallelizable DOT solvers, though it introduces regularization error and may fail to achieve high precision in practice (Solomon, 2018, Lu et al., 29 Jul 2024).
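A minimal dense-matrix sketch of the Sinkhorn iteration (the data are illustrative; production implementations work in the log domain for stability at small regularization $\varepsilon$):

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.2, iters=2000):
    """Entropic-regularized OT via alternating diagonal scaling of
    K = exp(-C / eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)   # match the column marginals
        u = a / (K @ v)     # match the row marginals
    return u[:, None] * K * v[None, :]  # the regularized transport plan

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
C = (x[:, None] - y[None, :]) ** 2     # illustrative squared-distance cost
a = np.array([0.3, 0.3, 0.4])
b = np.array([0.6, 0.4])
T = sinkhorn(C, a, b)
print(T.sum(axis=1), T.sum(axis=0))    # marginals approach a and b
```

Each iteration is two matrix-vector products, which is what makes the method parallelizable; the returned plan is dense and blurred, reflecting the regularization error noted above.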
4. Fundamental Connections: Discrete Monge–Ampère Equation and Convex Duality
The discrete Monge–Ampère equation (DMAE) emerges naturally from the variational DOT framework. Whereas the classical Monge–Ampère equation constrains the volume element via the Hessian determinant of a convex function, the discrete analogue defines a "discrete Hessian determinant" at each target $p_i$ as the volume of the convex hull formed by the neighboring gradients. The key mass-balance condition,
$$\mu(W_i(h)) = A_i, \qquad i = 1, \dots, k,$$
is equivalent to requiring that the discrete Hessian determinant at $p_i$ equal $A_i$. This fact directly relates DOT to combinatorial convex geometry and supports novel variational proofs of classical rigidity theorems, including Alexandrov's theorem on the realization of polytopal metrics (Gu et al., 2013).
Moreover, the DOT variational principle offers a unified convex optimization view, encompassing Minkowski-type problems for polytopes and providing new, structurally explicit proofs for results in discrete differential geometry.
5. Applications and Algorithmic Implications
DOT and its geometric/variational formulation have several practical implications:
- Computational geometry: Efficient algorithms for power diagrams enable fast solutions to the DOT problem in high dimensions, especially in applications involving spatial partitioning, such as mesh generation, centroidal Voronoi tessellations, and tessellation-based quantization (Gu et al., 2013).
- Numerical optimal transport: The ability to compute explicit gradients and Hessians in the height weights $h$ makes the Newton-based approach highly efficient, particularly when compared to generic, dense linear programming solvers for large $k$.
- Scalable computation: Multi-scale and shielding techniques enable DOT computations for massive datasets (e.g., in statistical imaging, machine learning, and computational graphics) (Schmitzer, 2015, Rauch et al., 28 Feb 2025).
- Statistical and data science applications: DOT underwrites the computation of Wasserstein barycenters, domain adaptation, and distributional clustering in the fully discrete or discretized setting (Anderes et al., 2015).
6. Generalizations and Extensions
The discrete framework for optimal transport not only solves specific allocation problems but also generalizes to:
- Wasserstein barycenters in the discrete setting, where support can be restricted to candidate centroids and sparsity and non-mass-splitting transport properties hold (Anderes et al., 2015).
- Discrete Monge–Ampère-type equations, with rigidity and convergence properties accessible through the variational framework.
- Combinatorial analogues on graphs and higher-dimensional polytopes, incorporating the interplay of linear programming duality, convex geometry, and discrete analysis.
- Extensions to settings with one measure supported on continuous curves or polylines (“3/4‑discrete” OT) and to hybrid discrete–continuous scenarios (Gournay et al., 2018).
7. Theoretical and Algorithmic Impact
The variational principles for DOT and the identification of discrete optimal maps with power diagram adjustments provide a unifying structure. They yield direct geometric and algorithmic interpretations, robust guarantees on uniqueness and stability, and enable the derivation and analysis of efficient, large-scale solvers. The DOT variational approach delivers not just numerical efficiency, but also theoretical advancements—such as new proofs for old theorems, and a rigorous bridge between discrete geometry, classical analysis, and combinatorial optimization.
This synthesis of convex variational principles, geometric structure, and computational tractability defines discrete optimal transport as a foundational methodology in modern geometric analysis, computational mathematics, and data science.