Memory-Augmented Stateful Pools in Discrete OT

Updated 24 February 2026

Memory-Augmented Stateful Pools are frameworks that retain memory across the iterations of discrete optimal transport algorithms in order to support stateful computation.
They leverage classical optimal transport formulations, including linear programming and entropic regularization, to manage large-scale, sparsity-structured problems.
These methods improve convergence rates and enable practical applications in imaging, machine learning, and graph-based data analysis through enhanced algorithmic state management.

Discrete optimal transport approaches refer to the mathematical frameworks and algorithms for computing optimal transport (OT) maps, couplings, and associated objects when the input probability measures are finitely supported, i.e., are discrete or empirical measures. Discrete OT has become central to computational mathematics, machine learning, statistics, and imaging due to its tractable formulation, explicit connections to combinatorial optimization and convex analysis, and its suitability for large-scale computational implementations. Underpinning nearly all methodologies is the Kantorovich linear programming (LP) formalism, though significant developments have introduced regularization, PDE-based discretizations, and optimization-theoretic relaxations that exploit the structure and sparsity of discrete instances.

1. Foundational Formulations and Linear Programming

At its core, the discrete OT problem is a linear program: given probability vectors $\mu = (\mu_i)_{i=1}^n$ and $\nu = (\nu_j)_{j=1}^m$ supported on finite sets $\{x_i\}$ and $\{y_j\}$, and a ground cost matrix $C = (c_{ij})$ (typically a power of the Euclidean distance, $c_{ij} = \|x_i - y_j\|^p$), the goal is to solve

$\min_{\pi \in \mathbb{R}_+^{n \times m}} \sum_{i,j} c_{ij} \pi_{ij} \quad \text{s.t.} \quad \sum_j \pi_{ij} = \mu_i,\; \sum_i \pi_{ij} = \nu_j$

This primal LP is complemented by a dual formulation in potentials, which provides the basis for optimality conditions, algorithmic strategies, and error analysis (Solomon, 2018, Merigot et al., 2020). In the special case of two uniform empirical measures with the same number of points ($n = m$, $\mu_i = \nu_j = 1/n$), the LP reduces to the assignment problem, and an optimal plan can be chosen as a permutation matrix scaled by $1/n$. Complementarity and duality structures govern both exact and approximate algorithmic design throughout the literature.
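The primal LP above can be handed directly to a general-purpose solver. The following is a minimal sketch, assuming SciPy's `linprog` with the HiGHS backend; the helper name `discrete_ot_lp` and the toy data are illustrative and not taken from the cited works. It builds the row-sum and column-sum constraints explicitly, which is only practical for small $n$ and $m$.

```python
import numpy as np
from scipy.optimize import linprog


def discrete_ot_lp(mu, nu, C):
    """Solve the Kantorovich LP and return (plan, cost). Hypothetical helper."""
    mu, nu, C = np.asarray(mu, float), np.asarray(nu, float), np.asarray(C, float)
    n, m = C.shape
    # Row-major flattening: variable pi[i, j] sits at index i * m + j.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))  # sum_j pi_ij = mu_i
    col_sums = np.kron(np.ones((1, n)), np.eye(m))  # sum_i pi_ij = nu_j
    A_eq = np.vstack([row_sums, col_sums])
    b_eq = np.concatenate([mu, nu])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m), res.fun


# Toy instance on the real line with squared-distance cost (assumed data).
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0, 2.0])
mu = np.array([0.5, 0.5])
nu = np.array([0.25, 0.5, 0.25])
C = (x[:, None] - y[None, :]) ** 2

plan, cost = discrete_ot_lp(mu, nu, C)
print(plan)
print(cost)
```

The dense constraint matrix has $(n + m) \times nm$ entries, so this form is meant for checking small cases; larger instances call for the specialized solvers discussed in the next section.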

2. Algorithms and Numerical Methods

A spectrum of algorithms exists for solving finite-dimensional OT problems, each exploiting particular regularity or structure:

Classical Linear Programming and Network Simplex: Direct application of the transportation simplex or network-simplex algorithms allows exact solution for moderate $n$ (up to $n \approx 10^3$–$10^4$), providing crucial benchmarks such as DOTmark (Schrieber et al., 2016). LP solvers with column-generation and interior-point refinements extend feasible problem sizes, especially by dynamically exploiting sparsity (Zanetti et al., 2022, Hou et al., 2023).
Entropic Regularization (Sinkhorn): Adding an entropy penalty renders the problem strictly convex and leads to highly scalable matrix-scaling algorithms (Sinkhorn–Knopp). The algorithm alternately rescales rows and columns, achieving geometric convergence and enabling GPU acceleration at the cost of a bias controlled by the regularization parameter (Solomon, 2018, Merigot et al., 2020, Wang et al., 2021).
Assignment and Auction Algorithms: For balanced problems with equally weighted points (the discrete Monge, or assignment, setting), auction methods and Bertsekas' $\varepsilon$-scaling scheme provide efficient combinatorial solutions with strong theoretical guarantees (Merigot et al., 2020).
Geometric and Graph-based Solvers: When OT is posed on graphs, network flow algorithms and minimum-cost flow paradigms are tractable for the 1-Wasserstein distance $W_1$; extensions for the quadratic distance $W_2$ exploit Benamou–Brenier analogues or graph Laplacians (Erbar et al., 2017, Lavenant et al., 2018).
Newton and Proximal Splitting: Second-order (Newton-type) methods, including Huber-smoothing approaches, exploit solution sparsity to accelerate solution of the KKT system for large instances, notably for Wasserstein barycenter computation (Hou et al., 2023).

Large-scale implementation choices depend on support size, target accuracy, underlying geometry, and the regularity of the transport cost and marginals.
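As an illustration of the entropic approach listed above, the alternating row and column rescaling can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name `sinkhorn`, the default `eps`, and the stopping rule are illustrative choices, and the plain (non-log-domain) iteration shown here can underflow when `eps` is small relative to the largest cost.

```python
import numpy as np


def sinkhorn(mu, nu, C, eps=0.1, n_iter=2000, tol=1e-10):
    """Entropic OT by alternating row/column scaling (Sinkhorn-Knopp sketch)."""
    mu, nu, C = np.asarray(mu, float), np.asarray(nu, float), np.asarray(C, float)
    K = np.exp(-C / eps)  # Gibbs kernel
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)    # rescale rows to match mu
        v = nu / (K.T @ u)  # rescale columns to match nu
        plan = u[:, None] * K * v[None, :]
        # Columns are exact after the v-update, so monitor the row residual.
        if np.abs(plan.sum(axis=1) - mu).sum() < tol:
            break
    return plan


# Same kind of toy data as a small LP check would use (assumed values).
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0, 2.0])
mu = np.array([0.5, 0.5])
nu = np.array([0.25, 0.5, 0.25])
C = (x[:, None] - y[None, :]) ** 2

P = sinkhorn(mu, nu, C, eps=0.1)
print(P)
print((P * C).sum())  # transport cost of the regularized plan
```

The transport cost of the regularized plan is never below the unregularized LP optimum, and the gap shrinks as `eps` decreases, which is the bias mentioned in the text. For small `eps`, a log-domain variant is the usual remedy for numerical underflow.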

3. Barycenters, Multi-marginal OT, and Polyhedral Structure

For a set of discrete marginals $\mu^{(1)}, \dots, \mu^{(N)}$ with weights $\lambda_k \ge 0$, $\sum_{k=1}^N \lambda_k = 1$, and with $W_2$ denoting the quadratic Wasserstein distance, discrete Wasserstein barycenters are solutions to

$\min_{\bar\mu} \sum_{k=1}^N \lambda_k W_2^2(\bar\mu, \mu^{(k)})$

The main properties include:

Discrete support: Any barycenter measure must be supported on the set of all centroids $\{\sum_{k=1}^N \lambda_k x^{(k)} : x^{(k)} \in \operatorname{supp}(\mu^{(k)})\}$, yielding an LP of large but finite size (Anderes et al., 2015).
Sparsity: There is always an optimal barycenter with at most $\sum_{k=1}^N |\operatorname{supp}(\mu^{(k)})| - N + 1$ nonzero weights; such sparse barycenters correspond to extremal ("vertex") solutions of the LP (Anderes et al., 2015).
Non-mass-splitting transport plans: At the barycenter, each support point sends its entire mass to a unique support point in each marginal. General pairwise discrete OT does not share this property, which highlights a special feature of barycenter geometry (Anderes et al., 2015).
Algorithmic approach: Barycenter LPs are feasible for problem sizes up to tens of thousands of support points using off-the-shelf solvers. For fixed supports, specialized block coordinate descent and smoothing Newton methods further improve computational feasibility (Hou et al., 2023).

This polyhedral and sparsity structure generalizes in various ways to regularized settings and to barycenter computation with relaxed constraints.
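For very small instances, the finite LP over the centroid set can be written out explicitly. The sketch below is an illustration under stated assumptions, not the formulation used in the cited papers: the function name `barycenter_lp`, the dense constraint assembly, and the toy data are all hypothetical, and the candidate support grows as the product of the marginal support sizes, so this only scales to a handful of points.

```python
import itertools
import numpy as np
from scipy.optimize import linprog


def barycenter_lp(supports, weights, lam):
    """W2 barycenter via an LP on the centroid set (sketch for tiny inputs).

    supports[k]: (n_k, d) array, weights[k]: (n_k,) probabilities,
    lam: barycenter weights summing to 1.
    """
    supports = [np.asarray(X, float) for X in supports]
    # Candidate support: all lam-weighted centroids of one point per marginal.
    S = np.unique(
        [sum(l * p for l, p in zip(lam, combo)) for combo in itertools.product(*supports)],
        axis=0,
    )
    s = len(S)
    sizes = [len(X) for X in supports]
    # Variables: barycenter weights w (s entries), then one (s x n_k) plan per marginal.
    offsets = np.cumsum([s] + [s * n for n in sizes])
    n_var = int(offsets[-1])
    cost = np.zeros(n_var)
    rows, rhs = [], []
    for k, (X, a) in enumerate(zip(supports, weights)):
        n, start = sizes[k], int(offsets[k])
        sq = ((S[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        cost[start:start + s * n] = lam[k] * sq.ravel()
        for i in range(s):  # row sums of plan k equal w_i
            r = np.zeros(n_var)
            r[start + i * n:start + (i + 1) * n] = 1.0
            r[i] = -1.0
            rows.append(r)
            rhs.append(0.0)
        for j in range(n):  # column sums of plan k equal a_j
            r = np.zeros(n_var)
            r[start + j:start + s * n:n] = 1.0
            rows.append(r)
            rhs.append(a[j])
    res = linprog(cost, A_eq=np.array(rows), b_eq=np.array(rhs),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return S, res.x[:s], res.fun


# Two marginals on the line: a point mass at 0 and a two-point measure (assumed data).
supports = [np.array([[0.0]]), np.array([[0.0], [2.0]])]
weights = [np.array([1.0]), np.array([0.5, 0.5])]
S, w, value = barycenter_lp(supports, weights, lam=[0.5, 0.5])
print(S.ravel(), w, value)
```

In this toy case the marginals have 1 and 2 support points, so a sparse barycenter needs at most $1 + 2 - 2 + 1 = 2$ atoms, which the number of nonzero entries of `w` can be checked against.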

4. Discrete Approximations, Discretizations, and Convergence

Discrete OT serves as the basis for approximation and discretization of continuous OT problems. The standard pipeline replaces continuous measures with finite discrete approximations, with convergence characterized as follows (Frungillo, 2024):

Discrete Approximation: Partition the domain into cells of diameter at most $h$, aggregate mass to cell centers, and solve the discrete LP on the induced grids.
Convergence: As $h \to 0$, the resulting plans and map projections converge in cost and in the weak topology, with explicit error bounds of order $\omega_c(h)$, where $\omega_c$ denotes the modulus of continuity of the cost function (Frungillo, 2024).
Map Recovery: When the optimal plan is induced by a Monge map, barycentric projection from discrete plans recovers the transport map, with an error controlled by the cell diameter $h$ (Frungillo, 2024).
Extensions and Refinements: Coarse-to-fine and multiscale methods, along with tailored partitioning (e.g., kd-trees), accommodate high-dimensional and large-scale problems (Wang et al., 2021).
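The discretize, solve, and project steps listed above can be sketched in one dimension. This is a simplified illustration under stated assumptions: all function names are hypothetical, and instead of an LP solve it uses the monotone (north-west corner) coupling, which is optimal in 1D for convex costs when both supports are sorted. The barycentric projection maps each source cell center $x_i$ to $\sum_j \pi_{ij} y_j / \sum_j \pi_{ij}$.

```python
import numpy as np


def bin_to_cells(samples, lo, hi, h):
    """Aggregate 1D samples into cells of width h; return (centers, masses)."""
    edges = np.arange(lo, hi + h / 2, h)
    counts, _ = np.histogram(samples, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0
    return centers[keep], counts[keep] / counts.sum()


def monotone_plan(mu, nu):
    """North-west corner coupling of two sorted 1D discrete measures."""
    mu, nu = np.array(mu, float), np.array(nu, float)  # copies, consumed below
    plan = np.zeros((len(mu), len(nu)))
    i = j = 0
    while i < len(mu) and j < len(nu):
        t = min(mu[i], nu[j])  # move as much mass as both cells allow
        plan[i, j] = t
        mu[i] -= t
        nu[j] -= t
        if nu[j] <= 1e-15:
            j += 1
        if mu[i] <= 1e-15:
            i += 1
    return plan


def barycentric_projection(plan, y):
    """Average target location of the mass leaving each source point."""
    return plan @ y / plan.sum(axis=1)


# Assumed test case: uniform mass on [0, 1] pushed forward by T(x) = 2x + 1.
h = 0.1
xs = (np.arange(1000) + 0.5) / 1000
ys = 2 * xs + 1
xc, mu = bin_to_cells(xs, 0.0, 1.0, h)
yc, nu = bin_to_cells(ys, 1.0, 3.0, h)

plan = monotone_plan(mu, nu)
T = barycentric_projection(plan, yc)
print(np.max(np.abs(T - (2 * xc + 1))))  # deviation from the true map at cell centers
```

In higher dimensions the monotone coupling is no longer available, and the plan would come from an LP or regularized solver on the cell centers; the projection step is unchanged.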

Additional frameworks, including semi-discrete and 3/4-discrete approaches, accommodate hybrid settings with one measure continuous and one discrete, or with continuous mass along lower-dimensional structures (e.g., line segments) (Gournay et al., 2018).

5. Dynamical, Riemannian, and PDE-based Discrete OT

Discrete analogues of dynamical OT (i.e., Benamou–Brenier) have been developed to preserve deeper geometric structures:

Benamou–Brenier on Graphs and Meshes: Discrete analogues express the quadratic Wasserstein distance $W_2$ (or more general transport distances) via convex programs over time-dependent densities and momenta constrained by discrete continuity equations and mesh geometry (Papadakis et al., 2013, Erbar et al., 2017, Lavenant et al., 2018, Gladbach et al., 2019).
Action and Metric Structure: Discrete action functionals on graphs use suitable mass-averaging means (including the logarithmic mean) to guarantee convexity and discrete Ricci curvature properties (Erbar et al., 2017).
Numerical Schemes: Proximal splitting (Douglas–Rachford, primal–dual) and ADMM methods efficiently solve the resulting high-dimensional convex programs. For regular meshes, FFT or multigrid Poisson solvers accelerate the continuity projection steps (Papadakis et al., 2013, Lavenant et al., 2018).
JKO Gradient Flows: Discrete Benamou–Brenier strategies are compatible with variational time-stepping (JKO) schemes for entropy gradient flow, matching semigroup evolutions on graphs (Erbar et al., 2017).
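To make the role of the mass-averaging mean concrete, the sketch below evaluates a discrete action of the form $\frac{1}{2} \sum_{x,y} (\psi(x) - \psi(y))^2 \, \theta(\rho(x), \rho(y)) \, w(x,y)$ with the logarithmic mean $\theta(a,b) = (a-b)/(\log a - \log b)$. The function names, the symmetric edge-weight matrix `w`, and the two-node example are assumptions made for illustration; the precise normalization of the weights in the cited works may differ.

```python
import numpy as np


def log_mean(a, b):
    """Logarithmic mean: (a - b) / (log a - log b), theta(a, a) = a, 0 if either is 0."""
    a, b = np.broadcast_arrays(np.asarray(a, float), np.asarray(b, float))
    pos = (a > 0) & (b > 0)
    safe = pos & ~np.isclose(a, b)
    # Placeholder values keep log and division well defined where the formula is unused.
    a_s = np.where(safe, a, 2.0)
    b_s = np.where(safe, b, 1.0)
    theta = (a_s - b_s) / (np.log(a_s) - np.log(b_s))
    # For nearly equal positive arguments, the arithmetic mean is an accurate stand-in.
    return np.where(safe, theta, np.where(pos, 0.5 * (a + b), 0.0))


def discrete_action(rho, psi, w):
    """0.5 * sum_{x,y} (psi(x) - psi(y))^2 * theta(rho(x), rho(y)) * w[x, y]."""
    theta = log_mean(rho[:, None], rho[None, :])
    grad = psi[:, None] - psi[None, :]
    return 0.5 * np.sum(w * theta * grad ** 2)


# Two-node graph with a single symmetric edge (assumed data).
rho = np.array([np.e, 1.0])
psi = np.array([1.0, 0.0])
w = np.array([[0.0, 0.5], [0.5, 0.0]])

action = discrete_action(rho, psi, w)
print(action)
```

Other admissible means (arithmetic, geometric, harmonic) can be swapped in for `log_mean` without changing the rest of the computation; the logarithmic mean is the choice tied to entropy gradient flows in the text.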

Mesh geometry (e.g., isotropy, microstructure) plays a crucial role; in 1D and higher-dimensional settings, geometric compatibility of the mesh is necessary for convergence to the true $W_2$ metric (Gladbach et al., 2019). Otherwise, an effective (homogenized) mobility governs the limiting transport cost.

6. Regularized, Relaxed, and Approximate Discrete OT

Regularization and relaxation of the basic discrete LP structure have diverse motivations:

Entropic Regularization: Sinkhorn methods introduce strict convexity, smoothing, and scalable computation at the cost of transport plan bias. Several algorithms, such as EDOT, optimize support locations directly by minimizing the entropy-regularized Wasserstein distance, improving discretization efficiency (Wang et al., 2021).
Relaxed and Regularized Plans: Variational formulations introduce partial mass transport and graph-based gradient or total variation penalties, yielding convex programs for image and histogram matching tasks. Block coordinate descent handles multi-marginal and barycenter extensions (Ferradans et al., 2013).
Approximate OT: Fast $(1+\varepsilon)$-approximation algorithms combine spanner construction with approximate minimum-cost flow and multiplicative-weights boosting, achieving improved scaling with support size and dimension (Agarwal et al., 2023).
Brenier and Monge–Ampère Discretizations: For quadratic costs, discrete variational formulations inspired by Brenier and Monge–Ampère equations provide access to the transport potential, with convergence proofs under triangulation regularity and log-concavity (Lindsey et al., 2016).

These developments provide both exact and approximate solutions, offering trade-offs between accuracy, speed, flexibility, and the ability to handle unbalanced or noisy marginals.

7. Applications, Generalizations, and Future Directions

Discrete OT approaches have broad impact:

Applications: Image and shape matching, color transfer, barycenter-based clustering, generative model learning (notably in diffusion models through explicit probability flows (Zhang et al., 2023)), structure extraction from large spatial data (e.g., astronomy, point clouds), and graph-based signal processing are among primary application domains.
Generalizations: Discrete OT is routinely extended to nonuniform and non-Euclidean environments, addresses partial or unbalanced transport, handles constraints (e.g., capacity), and interfaces with optimal control (discrete dynamical systems with Lagrangian costs) (Wu et al., 2024, Dieci et al., 2025).
Theoretical and Algorithmic Frontiers: Open questions include discrete analogues of continuum-principled gradient flows, higher-order convergence and stability under discretization, scalable solutions for higher-order (Gromov–Wasserstein) problems or tensor-valued measures, and the preservation of metric and geometric properties under regularization and discretization (Solomon, 2018, Lu et al., 2018, Gournay et al., 2018).

As discrete OT techniques continue to blend combinatorial, convex-analytic, and geometric methods, the field remains distinguished by its theoretical richness, computational tractability, and deep interplay with modern data-driven science.