Adaptive Optimal Transport: Methods & Applications

Updated 30 May 2026

Adaptive Optimal Transport (AOT) is a framework that adjusts transport plans, constraints, and regularization based on intrinsic data features for improved robustness, interpretability, and scalability.
It employs adaptive mass transport, latent low-rank couplings, and pointwise smoothing to dynamically determine transported mass and mitigate noise and outlier effects.
AOT enhances algorithmic efficiency through Sinkhorn-type iterations, dual ascent methods, and anchor-based optimizations, proving effective in domain adaptation and deep model alignment.

Adaptive Optimal Transport (AOT) denotes a class of methodologies within the optimal transport (OT) paradigm designed to adapt the transport computation, plan structure, or regularization to intrinsic data or task features. While classical OT enforces rigid marginal matching or prescribes globally constant regularization, AOT introduces data-dependent adaptivity—across the transport constraints, cost structure, feature learning, or algorithmic solvers—enabling improved robustness, interpretability, scalability, and task-specific fidelity.

1. Mathematical Formulations and Core Variants

The label "Adaptive Optimal Transport" encompasses several mathematically distinct frameworks that share a unifying ethos: allow the transport or its constraints to adapt to characteristics of the empirical problem.

Adaptive Mass Transport: Marginal Inequality and Self-determined Matching

The AOT formulation of (Yang et al., 7 Mar 2025) is defined as follows. For Polish probability spaces $(X,\mu)$ and $(Z,\nu)$ and a continuous, potentially mixed-sign cost $c: X \times Z \to \mathbb{R}$, adaptive OT seeks a nonnegative measure $\gamma \in \mathcal{M}_+(X \times Z)$ solving

$\min_{\gamma \in \Gamma_{\leq}(\mu,\nu)} \int c(x,z)\,d\gamma(x,z)$

where

$\Gamma_{\leq}(\mu,\nu) := \{\,\gamma \in \mathcal{M}_+(X \times Z) : \gamma[A \times Z] \leq \mu[A],\;\gamma[X \times B] \leq \nu[B]\;\forall\text{ Borel }A,B\,\}$

Here $\mathcal{M}_+$ denotes nonnegative measures rather than probability measures: if $\gamma$ were required to have total mass one, the two inequalities would force both marginals to equal $\mu$ and $\nu$ exactly.

The mass to be transported is not fixed a priori but adapts optimally to the cost structure (see (Yang et al., 7 Mar 2025), Theorem 1).
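For discrete measures the inequality-constrained problem is a small linear program. The following is a minimal sketch of that discrete case, not the solver used in the cited paper; the function name `adaptive_mass_ot` and the toy cost matrix are hypothetical, and SciPy's `linprog` is assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog


def adaptive_mass_ot(cost, mu, nu):
    """Solve min <cost, gamma> s.t. row sums <= mu, column sums <= nu, gamma >= 0."""
    n, m = cost.shape
    # gamma is flattened row-major, so entry (i, j) sits at index i * m + j.
    row_sum_op = np.kron(np.eye(n), np.ones((1, m)))   # one constraint per source point
    col_sum_op = np.kron(np.ones((1, n)), np.eye(m))   # one constraint per target point
    res = linprog(
        c=cost.ravel(),
        A_ub=np.vstack([row_sum_op, col_sum_op]),
        b_ub=np.concatenate([mu, nu]),
        bounds=(0, None),
        method="highs",
    )
    return res.x.reshape(n, m)


# Toy mixed-sign cost (hypothetical): two well-matched pairs have negative cost,
# the third source point has no negative-cost partner.
cost = np.array([[-1.0, 0.5, 0.8],
                 [0.4, -0.7, 0.9],
                 [0.6, 0.3, 0.2]])
mu = np.full(3, 1 / 3)
nu = np.full(3, 1 / 3)

gamma = adaptive_mass_ot(cost, mu, nu)
transported_mass = gamma.sum()   # decided by the cost, not fixed in advance
```

In this toy instance only the two negative-cost pairs receive mass, so the third source point is left unmatched and the total transported mass is below one.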

Adaptive/Latent Structure: Anchor-based Low-rank Couplings

Adaptive OT, also called Latent OT (LOT), in (Lin et al., 2020) factorizes the coupling through a learned set of anchor points. For empirical measures $\mu = \sum_i p_i^x \delta_{x_i}$, $\nu = \sum_j p_j^y \delta_{y_j}$, AOT introduces latent supports $\mathbf{Z}_x$, $\mathbf{Z}_y$ and a three-factor coupling (source points to source anchors, source anchors to target anchors, target anchors to target points), so that the rank of the overall coupling is bounded by the number of anchors (see (Lin et al., 2020), Section 1).
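To see why routing mass through anchors yields a low-rank plan, the sketch below composes three random factor couplings and checks the marginals and rank of the result. The normalized product `P_x diag(1/u) P_z diag(1/v) P_y` is an assumed form of the three-factor coupling used for illustration; the sizes, the helper `with_row_sums`, and the random factors are hypothetical, and no optimization is performed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 9      # number of source / target points (hypothetical)
k, l = 3, 2      # number of source / target anchors (hypothetical)


def with_row_sums(shape, row_sums, rng):
    """Random positive matrix rescaled so that its rows sum to `row_sums`."""
    raw = rng.random(shape) + 0.1
    return raw * (row_sums / raw.sum(axis=1))[:, None]


p = np.full(n, 1 / n)                     # source weights
P_x = with_row_sums((n, k), p, rng)       # source points -> source anchors
u = P_x.sum(axis=0)                       # mass arriving at each source anchor
P_z = with_row_sums((k, l), u, rng)       # source anchors -> target anchors
v = P_z.sum(axis=0)                       # mass arriving at each target anchor
P_y = with_row_sums((l, m), v, rng)       # target anchors -> target points
q = P_y.sum(axis=0)                       # induced target weights

# Compose the factors; dividing by u and v turns the inner factors into
# conditional transitions so that mass is conserved through the anchors.
P = P_x @ np.diag(1 / u) @ P_z @ np.diag(1 / v) @ P_y

rank = np.linalg.matrix_rank(P, tol=1e-10)
```

The composed plan keeps the prescribed marginals `p` and `q`, while its rank cannot exceed `min(k, l)`, however many points are involved.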

Adaptive Regularization: Pointwise Smoothing Constraints

The OTARI (Optimal Transport with Adaptive Regularization) approach (Assel et al., 2023) makes the regularization locally adaptive: instead of a single global penalty, it imposes a convex constraint (entropic or quadratic) on each row and column of the plan, yielding more uniform smoothing across points (see (Assel et al., 2023), Section 1).
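The motivation for pointwise constraints can be illustrated by measuring how unevenly a single global regularization strength smooths different rows. The sketch below runs plain Sinkhorn (not OTARI itself) on a hypothetical 1D dataset with a tight cluster and one isolated point, then computes the entropy of each normalized row; the function names and data are illustrative assumptions.

```python
import numpy as np


def sinkhorn(cost, a, b, eps, n_iter=1000):
    """Standard entropic OT with one global regularization strength eps."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def row_entropies(plan):
    """Shannon entropy of each row after normalizing it to sum to one."""
    rows = plan / plan.sum(axis=1, keepdims=True)
    return -(rows * np.log(rows + 1e-300)).sum(axis=1)


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, 20), [3.0]])   # tight cluster + isolated point
y = np.concatenate([rng.normal(0.0, 0.1, 20), [3.0]])
cost = (x[:, None] - y[None, :]) ** 2
a = np.full(x.size, 1 / x.size)
b = np.full(y.size, 1 / y.size)

plan = sinkhorn(cost, a, b, eps=0.05)
H = row_entropies(plan)   # H[-1] belongs to the isolated point
```

With one global `eps`, rows in the dense cluster are spread over many targets while the isolated row stays almost deterministic, which is the kind of imbalance per-row and per-column constraints are meant to remove.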

Adaptive Learning/Adversarial Approaches

Essid et al. (2018) introduce a saddle-point, data-adaptive approach. Instead of matching histograms or enforcing fixed costs, the transport map and a discriminator are co-learned over iteratively composed "local OT" games, with no hand-crafted feature representation; both adapt to the current residual between pushforward and target ((Essid et al., 2018), Sections 2–3).

2. Theoretical Insights and Structural Properties

AOT frameworks yield results distinct from classical or partial OT.

Adaptive Mass Mechanism and Support Sparsity

Negative cost activation: Only negative-cost regions of the transport matrix can be active; positive-cost pairs carry zero mass ((Yang et al., 7 Mar 2025), Theorem 1).
Self-determined partial matching: The total amount of transported mass is neither enforced (as in classical OT) nor user-selected (as in partial OT), but dynamically determined by the cost configuration.

Duality and Variational Structure

Dual potential constraints: Dual maximization for AOT adds sign (nonpositivity) constraints on the dual potentials, whereas in classical OT the potentials are free in sign and constrained only jointly by the cost ((Yang et al., 7 Mar 2025), Theorem 3).
Latent discrepancy and quasi-metric properties: The LOT framework defines a "latent" Wasserstein discrepancy that satisfies a quasi-triangle inequality, symmetry, and nonnegativity ((Lin et al., 2020), Section 4).
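For the discrete version of the inequality-constrained problem, the dual follows from standard linear programming duality. The derivation below is a sketch in discrete notation ($\mu_i$, $\nu_j$, $c_{ij}$, potentials $\phi_i$, $\psi_j$ are notation assumed here), and the statement in the cited theorem may be phrased differently.

```latex
\begin{aligned}
\text{(primal)}\quad & \min_{\gamma \ge 0}\; \sum_{i,j} c_{ij}\,\gamma_{ij}
  \quad\text{s.t.}\quad \sum_j \gamma_{ij} \le \mu_i,\qquad \sum_i \gamma_{ij} \le \nu_j, \\[4pt]
\text{(dual)}\quad & \max_{\phi,\psi}\; \sum_i \phi_i\,\mu_i + \sum_j \psi_j\,\nu_j
  \quad\text{s.t.}\quad \phi_i + \psi_j \le c_{ij},\qquad \phi_i \le 0,\qquad \psi_j \le 0, \\[4pt]
\text{(slackness)}\quad & \gamma_{ij} > 0 \;\Longrightarrow\; c_{ij} = \phi_i + \psi_j \le 0 .
\end{aligned}
```

The last line is complementary slackness: any pair that carries mass must have nonpositive cost, which is the discrete counterpart of the statement that positive-cost pairs carry zero mass.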

Stability and Convergence

Regularization and entropic smoothing: Entropic AOT converges to classical AOT as the regularization parameter tends to zero; the unique entropic optimizer approaches the true coupling in adapted Wasserstein topology ((Eckstein et al., 2022), Section 3.2).
Sampling and statistical rates: The empirical AOT, especially in LOT or 1D distributional alignment (see (Melnyk et al., 2024)), achieves a statistical error that decays at the parametric rate in the sample size, as established by Rademacher complexity and OT-duality arguments.
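The vanishing-regularization behaviour can be observed numerically in the simplest setting. The sketch below uses ordinary discrete OT rather than the adapted (causal) problem of the cited work, which is a deliberate simplification; it compares the transport cost of Sinkhorn plans against the exact linear-programming value for decreasing regularization. Function names and data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def sinkhorn_cost(cost, a, b, eps, n_iter=5000):
    """Transport cost <cost, plan> of the entropic plan for a given eps."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float((plan * cost).sum())


def exact_cost(cost, a, b):
    """Unregularized OT value via a linear program with equality marginals."""
    n, m = cost.shape
    A_eq = np.vstack([np.kron(np.eye(n), np.ones((1, m))),
                      np.kron(np.ones((1, n)), np.eye(m))])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return float(res.fun)


rng = np.random.default_rng(1)
x, y = rng.random(6), rng.random(6)
cost = (x[:, None] - y[None, :]) ** 2
a = np.full(6, 1 / 6)
b = np.full(6, 1 / 6)

lp_value = exact_cost(cost, a, b)
# Excess cost of the entropic plan over the exact optimum, for shrinking eps.
gaps = [sinkhorn_cost(cost, a, b, eps) - lp_value for eps in (1.0, 0.3, 0.1)]
```

The excess cost shrinks as `eps` decreases, consistent with the entropic plan approaching an unregularized optimum.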

3. Algorithmic Schemes and Solvers

The adaptivity of AOT is reflected in algorithmic design.

Sinkhorn-type and Bregman Iterative Schemes

Augmented Sinkhorn for mass-inequality: AOT with inequality constraints (as in (Yang et al., 7 Mar 2025)) uses an augmented cost matrix and Sinkhorn iterations with dummy rows/columns, enforcing mass non-exceedance.
Adapted Sinkhorn for causal structures: Temporal/causal AOT employs adapted Sinkhorn algorithms with dynamic programming for pathwise constraints ((Eckstein et al., 2022), Section 4).
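The dummy row/column device for inequality marginals can be sketched as follows. This is one standard augmentation borrowed from the partial-OT literature, given here under the assumption that slack mass is routed to zero-cost dummy points; the exact construction in the cited paper may differ, and the function name and toy data are hypothetical.

```python
import numpy as np


def augmented_sinkhorn(cost, mu, nu, eps=0.2, n_iter=2000):
    """Entropic OT with marginals <= mu, nu via a zero-cost dummy row and column."""
    n, m = cost.shape
    c_aug = np.zeros((n + 1, m + 1))
    c_aug[:n, :m] = cost
    # The dummy column absorbs source mass that is not shipped,
    # the dummy row supplies target mass that is not filled.
    a = np.append(mu, nu.sum())
    b = np.append(nu, mu.sum())
    K = np.exp(-c_aug / eps)
    u = np.ones(n + 1)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return plan[:n, :m]          # drop the dummy row and column


cost = np.array([[-1.0, 0.5, 0.8],
                 [0.4, -0.7, 0.9],
                 [0.6, 0.3, 0.2]])
mu = np.full(3, 1 / 3)
nu = np.full(3, 1 / 3)

gamma = augmented_sinkhorn(cost, mu, nu)
```

Because the dummy entries always carry some mass, the real block never exceeds the prescribed marginals. Entropic smoothing leaves a small amount of mass on positive-cost pairs, which shrinks as `eps` is reduced.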

Alternating Projections & Dual Ascent

Dykstra's and Bregman projection: OTARI uses alternating Bregman projections onto the polyhedral simplex, per-row/column entropy balls, and dual ascent (L-BFGS/Adam) ((Assel et al., 2023), Sections 3a–3b).

Latent (Anchor-Based) OT Algorithms

Alternating optimization: LOT alternates between Sinkhorn-scaled updates of the factor couplings and analytic anchor point updates obtained by solving a linear system ((Lin et al., 2020), Section 3).
Complexity: The cost per outer iteration is empirically comparable to several runs of standard entropic OT.
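The anchor step of such an alternating scheme has a simple closed form in a reduced setting. The sketch below assumes a single coupling between points and anchors and a squared Euclidean cost, in which case the optimal anchors are mass-weighted means; the full LOT update couples both anchor sets through a linear system and is not reproduced here. All names and data are hypothetical.

```python
import numpy as np


def update_anchors(points, coupling):
    """Minimizer over anchors z_k of sum_{i,k} coupling[i, k] * ||points[i] - z_k||^2."""
    weights = coupling.sum(axis=0)                    # mass received by each anchor
    return (coupling.T @ points) / weights[:, None]   # mass-weighted means


def transport_cost(points, anchors, coupling):
    sq_dist = ((points[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    return float((coupling * sq_dist).sum())


rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))
anchors = rng.normal(size=(4, 2))
coupling = rng.random((50, 4))
coupling /= coupling.sum()                            # fixed coupling for this step

cost_before = transport_cost(points, anchors, coupling)
new_anchors = update_anchors(points, coupling)
cost_after = transport_cost(points, new_anchors, coupling)
```

With the coupling held fixed, the anchor update cannot increase the transport cost, which is what makes alternating between coupling and anchor steps a descent scheme.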

4. Adaptivity Beyond Classical OT: Functional Extensions

AOT methodology generalizes standard OT along several axes:

Variant / Feature	Adaptivity Mechanism	Key Distinction
Marginal constraints (Yang et al., 7 Mar 2025)	Inequality, cost-determined mass	No fixed mass, robust to outliers
Anchor/latent structure (Lin et al., 2020)	Low-rank learned anchors	Robust, interpretable, scalable
Regularization (Assel et al., 2023)	Per-point entropy/quad constraint	Uniform smoothing, less overconcentration
Adversarial learning (Essid et al., 2018)	Feature map/discriminator co-learned	High flexibility, composition of local maps

The practical effect is that AOT can "ignore" or discount noise and outliers, avoid over-smoothed or under-smoothed rows (as in global entropic OT), and generate low-complexity interpretable transport plans.

5. Applications and Empirical Evidence

AOT and its variants underpin robust and interpretable solutions in a range of domains.

Domain Adaptation and Distribution Alignment

Partial distribution alignment: AOT yields significant accuracy improvements on VisDA '17 (AOT 76.68% vs m-POT 73.59%), Office-Home (AOT 72.24% vs m-POT 70.34%), and Office-31 ((Yang et al., 7 Mar 2025), Section 6).
Anchor-based OT for cross-domain embedding: On MNIST → USPS domain adaptation, LOT raises classification accuracy from 76.9% (vanilla OT) to 86.2% ((Lin et al., 2020), Section 5).
Robust under domain drift and outliers: Only negative-cost pairs are activated; outliers receive zero mass ((Yang et al., 7 Mar 2025), Section 7).

Deep Model Alignment and Filtering

Distributional preference alignment for LLMs: AOT formalizes first-order stochastic dominance via 1D OT, outperforming DPO/KTO/IPO on AlpacaEval and Open LLM Leaderboard tasks ((Melnyk et al., 2024), Section 7).
Adaptive Kalman filtering: OTAKNet leverages geometry-aware OT losses for online adaptation to noise drift in sequential state estimation ((He et al., 9 Aug 2025), Section 6).
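For the distributional preference alignment item above, the role of 1D OT can be made concrete: in one dimension the optimal coupling matches sorted samples, so first-order stochastic dominance can be checked quantile by quantile. The sketch below uses a hinge penalty on sorted reward differences as a stand-in; the function name, the margin parameter, and the choice of hinge are assumptions, not the loss of the cited paper.

```python
import numpy as np


def fsd_violation(chosen, rejected, margin=0.0):
    """Mean hinge penalty over quantiles where `chosen` fails to dominate `rejected`.

    Assumes both arrays hold the same number of scalar rewards.
    """
    q_chosen = np.sort(chosen)        # empirical quantiles of chosen rewards
    q_rejected = np.sort(rejected)    # empirical quantiles of rejected rewards
    return float(np.maximum(0.0, margin + q_rejected - q_chosen).mean())


# Hypothetical rewards: the first pair violates the pointwise order (1.0 < 2.5),
# yet the chosen distribution dominates the rejected one at every quantile.
chosen = np.array([1.0, 3.0, 2.0])
rejected = np.array([2.5, 0.0, 0.5])

loss = fsd_violation(chosen, rejected)
loss_swapped = fsd_violation(rejected, chosen)
```

The example shows the difference from pairwise objectives: the penalty depends only on the two reward distributions, not on which rejected sample happens to be paired with which chosen one.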

Interpretability and Visualization

Cluster-level mapping: Anchor-based AOT enables direct visualization of source-target alignment at the cluster level; the intermediate coupling matrices are highly interpretable ((Lin et al., 2020), Section 6).
Samplewise mass allocation diagnostics: Transport heatmaps often display block-diagonal "class-aware" structure, confirming adaptive focus ((Yang et al., 7 Mar 2025), Section 6).

6. Adaptive OT in Extended and Special Settings

The adaptive paradigm generalizes further:

Causal/adapted OT for time series: Adapted OT incorporates temporal causal (filtration) constraints, formalized with pathwise couplings and solved by specialized LPs or backwards induction (Eckstein et al., 2022, Gunasingam et al., 24 Apr 2026).
Fast adaptive stochastic solvers: Semi-dual approaches with a penalty term enable SGD and adaptive accelerated gradient (ANAG) methods for unbalanced OT, each with an explicit convergence rate ((Genans, 11 Feb 2026), Section 6).

7. Limitations and Open Directions

Computational Complexity: Adaptive Sinkhorn methods and anchor updates introduce additional computational overhead, particularly in large-scale settings ((Yang et al., 7 Mar 2025), Section 7). Mini-batch and low-rank approaches partially mitigate this.
Choice of cost function: Mixed-sign cost design is critical for effective adaptivity; application-specific heuristics may be required ((Yang et al., 7 Mar 2025), Section 7).
Parameter selection: OTARI and anchor-based methods require careful tuning of per-point smoothing or anchor cardinality; overestimation is not usually harmful, but underestimation degrades performance.
Generalization guarantees: While empirical convergence and sampling bounds exist (e.g., the parametric rate for 1D AOT, (Melnyk et al., 2024)), theoretical results in higher-dimensional, highly adaptive regimes remain an active area.
Extensions: Potential future directions include parameterized, jointly optimized cost functions (end-to-end AOT), extension to multi-marginal or multi-view settings, and refined statistical guarantees for empirically learned adaptive couplings ((Yang et al., 7 Mar 2025), Section 7).

References

Adaptive Optimal Transport (Essid et al., 2018)
Adaptive Optimal Transport for Partial Distribution Alignment (Yang et al., 7 Mar 2025)
Latent Adaptive Optimal Transport (LOT) (Lin et al., 2020)
Optimal Transport with Adaptive Regularisation (OTARI) (Assel et al., 2023)
Computational Methods for Adapted Optimal Transport (Eckstein et al., 2022)
Distributional Preference Alignment via Optimal Transport (AOT) (Melnyk et al., 2024)
Differentiable Adaptive Kalman Filtering via Optimal Transport (He et al., 9 Aug 2025)
Fast and Large-Scale Unbalanced OT Adaptive Methods (Genans, 11 Feb 2026)
Adapted OT between Filtered Gaussian Processes (Gunasingam et al., 24 Apr 2026)