Neural OT Map Learning

Updated 30 May 2026

Neural OT map learning is a framework that uses neural networks to parameterize optimal transport maps from source to target distributions, enabling applications in domain adaptation and generative modeling.
It employs methodologies such as input-convex neural networks, adversarial minimax solvers, and flow-based dynamic formulations to achieve theoretical guarantees on convergence and stability.
The approach has broad applications across generative modeling, image translation, and inverse problems, demonstrating significant empirical improvements in efficiency and robustness.

Neural OT map learning refers to the class of algorithms and statistical frameworks that use neural networks to parameterize and optimize optimal transport (OT) maps—deterministic or stochastic—from a source distribution to a target distribution, minimizing a given cost functional. These neural estimators are central in machine learning tasks such as generative modeling, domain adaptation, image translation, and scientific inverse problems. Modern approaches utilize convex neural architectures, adversarial optimization, variational flows, and geometric regularization to efficiently approximate Monge maps or more general OT couplings, with theoretical guarantees on expressivity and statistical consistency.

1. Theoretical Foundations and Semi-Discrete OT

The Monge OT problem seeks a measurable map $T:\mathcal{X}\to\mathcal{Y}$ pushing a source measure $\mu$ onto a target measure $\nu$, minimizing the cost $\int_{\mathcal{X}} c(x,T(x))\,d\mu(x)$, where $c$ is often $c(x,y)=\frac12\lVert x-y \rVert^2$. By Brenier's theorem, for absolutely continuous $\mu$ with quadratic cost, the optimal map is the gradient of a convex potential, $T=\nabla\varphi$. In semi-discrete OT, the target measure is discrete, resulting in transport maps that assign regions (power or Laguerre cells) of the source space to target points. Brenier potentials in the semi-discrete setting admit a representation as an upper envelope of affine functions parameterized by a vector of "heights" $h$:
\[
u_h(x) = \max_{i}\{\langle x, y_i \rangle + h_i\}
\]
with the map $\nabla u_h(x)= y_{i^*(x)}$, where $i^*(x)$ is the index achieving the maximum. This structure enables efficient stochastic optimization of $h$: the heights are adjusted until the $\mu$-mass of each cell matches the prescribed mass of its target point. DPM-OT leverages this (without deep nets) for fast latent code transfer in diffusion models, achieving high sampling efficiency and strong empirical bounds on error propagation in generative modeling [2307.11308].
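The height-vector optimization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the DPM-OT implementation: it assumes the standard convex semi-discrete energy $E(h)=\mathbb{E}_\mu[u_h(X)]-\langle h,\nu\rangle$, whose gradient in $h_i$ is the $\mu$-mass of cell $i$ minus $\nu_i$. The names `laguerre_assign`, `fit_heights`, and `sample_source`, as well as the step-size schedule, are hypothetical choices made for this example.

```python
import numpy as np

def laguerre_assign(x, y, h):
    """Return i*(x) = argmax_i <x, y_i> + h_i for every row of x."""
    scores = x @ y.T + h              # shape (n_samples, n_targets)
    return np.argmax(scores, axis=1)

def fit_heights(sample_source, y, nu, steps=2000, batch=512, lr=0.5, seed=0):
    """Stochastic gradient descent on E(h) = E_mu[u_h(X)] - <h, nu>.

    The stochastic gradient is (empirical cell mass) - nu, so a cell that
    currently receives too much source mass has its height lowered.
    """
    rng = np.random.default_rng(seed)
    h = np.zeros(len(y))
    for k in range(steps):
        x = sample_source(rng, batch)
        idx = laguerre_assign(x, y, h)
        cell_mass = np.bincount(idx, minlength=len(y)) / batch
        h -= lr / np.sqrt(k + 1) * (cell_mass - nu)   # decaying step size (assumed)
        h -= h.mean()                 # heights are defined up to a constant
    return h

# Toy setup: 2D standard Gaussian source, four target points with unequal masses.
y = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
nu = np.array([0.1, 0.2, 0.3, 0.4])
gaussian = lambda rng, n: rng.standard_normal((n, 2))

h = fit_heights(gaussian, y, nu)
x_test = np.random.default_rng(1).standard_normal((20000, 2))
transported = y[laguerre_assign(x_test, y, h)]   # the map grad u_h(x) = y_{i*(x)}
```

After fitting, the fraction of fresh source samples sent to each $y_i$ should be close to $\nu_i$, which is the mass-matching condition in the text.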

2. Neural Parameterizations and ICNNs

For fully-continuous OT problems in Euclidean spaces, neural networks parameterize either the Brenier potential (for Monge maps) or the dual Kantorovich potentials:
- Input-Convex Neural Networks (ICNNs): Ensure the potential function $\varphi_\theta$ is convex in $x$. The Monge map becomes $T_\theta(x) = \nabla \varphi_\theta(x)$, with the network constructed using nonnegative weights for hidden-to-hidden connections and convex, nondecreasing activations. The semi-dual objective minimizes the empirical expectation of the potential plus that of its convex conjugate, where the conjugate is evaluated through an inner maximization in $x$. Regularization enforces strong convexity and Lipschitzness [2412.08064, 2502.01310].
- Adversarial Minimax Solvers: Parameterize both the convex potential $\phi_\theta$ and a transport map $T_\omega$ independently. The objective is optimized in a saddle-point fashion:
\[
\min_\theta\max_\omega\, \mathbb{E}_{X}\big[\langle X, T_\omega(X) \rangle - \phi_\theta(T_\omega(X))\big] + \mathbb{E}_{Y}[\phi_\theta(Y)]
\]
This supports generalization error analysis in terms of Rademacher complexity, attaining $O(N^{-1/2})$ rates under standard neural network capacity constraints [2502.01310, 2412.08064].
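To make the two parameterizations above concrete, the following sketch builds a tiny two-layer ICNN in NumPy with a hand-written gradient, and evaluates the saddle-point objective by Monte Carlo. It is an illustration under stated assumptions, not code from the cited papers: the class `TinyICNN`, the function `saddle_objective`, the softplus activation, the layer sizes, and the quadratic term `alpha` (added to make the potential strongly convex) are all choices made for this example.

```python
import numpy as np

def softplus(u):
    return np.logaddexp(0.0, u)       # convex and nondecreasing

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))   # derivative of softplus

class TinyICNN:
    """phi(x) = a . softplus(Wz softplus(W0 x + b0) + W1 x + b1) + alpha/2 |x|^2,
    with Wz >= 0 and a >= 0 entrywise, so phi is convex in x."""

    def __init__(self, dim, hidden=16, alpha=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W0 = rng.standard_normal((hidden, dim)) / np.sqrt(dim)
        self.b0 = np.zeros(hidden)
        self.Wz = np.abs(rng.standard_normal((hidden, hidden))) / hidden  # nonnegative
        self.W1 = rng.standard_normal((hidden, dim)) / np.sqrt(dim)
        self.b1 = np.zeros(hidden)
        self.a = np.abs(rng.standard_normal(hidden)) / hidden             # nonnegative
        self.alpha = alpha

    def _pre_activations(self, x):
        u1 = x @ self.W0.T + self.b0
        u2 = softplus(u1) @ self.Wz.T + x @ self.W1.T + self.b1
        return u1, u2

    def potential(self, x):
        _, u2 = self._pre_activations(x)
        return softplus(u2) @ self.a + 0.5 * self.alpha * np.sum(x**2, axis=1)

    def transport(self, x):
        """Candidate Monge map T(x) = grad phi(x), via manual backpropagation."""
        u1, u2 = self._pre_activations(x)
        g2 = sigmoid(u2) * self.a             # d phi / d u2
        g1 = (g2 @ self.Wz) * sigmoid(u1)     # d phi / d u1
        return g1 @ self.W0 + g2 @ self.W1 + self.alpha * x

def saddle_objective(phi, T, X, Y):
    """Monte Carlo estimate of E_X[<X, T(X)> - phi(T(X))] + E_Y[phi(Y)]."""
    TX = T(X)
    return np.mean(np.sum(X * TX, axis=1) - phi(TX)) + np.mean(phi(Y))

rng = np.random.default_rng(1)
X = rng.standard_normal((256, 3))
Y = rng.standard_normal((256, 3)) + 2.0
net = TinyICNN(dim=3)
value = saddle_objective(net.potential, net.transport, X, Y)
```

In an actual solver, `saddle_objective` would be minimized over the potential's parameters and maximized over the map's parameters with automatic differentiation; the manual gradient here only serves to show that the map is the gradient of a convex function and is therefore monotone.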

3. Architectural Extensions and Advanced OT Structures

Recent work expands neural OT map learning to complex geometric and dynamical regimes:
- Riemannian Manifolds: Neural potentials are composed with a geometry-preserving feature map (e.g., distance-to-landmarks) and then passed through a neural network, yielding Riemannian Monge maps as $T_\theta(x) = \exp_x(-\nabla\phi(x))$. The curse of dimensionality for discretization-based methods is overcome by continuous parameterizations, with provable sub-exponential parameter complexity in dimension [2602.03566].
- Trajectory/Flow-Based Methods: The Benamou–Brenier dynamic formulation recasts the Monge map as the endpoint of a time-dependent velocity ODE learned via saddle-point optimization. The learned velocity field is integrated to yield the transport map, supporting OT with general cost/Lagrangian structures [2602.22003].
- Natural Gradient and Wasserstein Flow: Losses (e.g., KL divergence) are lifted to the space of maps, and their $L^2$-gradients are projected onto the convex cone of Monge maps. The resulting discrete gradient flows in map space, combined with ICNN-based parameterization, yield effective, reparameterization-invariant updates that enhance convergence and stability [2603.25182].
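As a small illustration of the Riemannian form $T(x)=\exp_x(-\nabla\phi(x))$ listed above, the sketch below evaluates such a map on the unit sphere, where the exponential map has a closed form. It assumes a hand-picked potential $\phi(x)=-s\langle x,p\rangle$ rather than a learned network and does not verify $c$-concavity; the names `sphere_exp`, `riemannian_grad`, `transport`, `p`, and `s` are hypothetical.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: geodesic from x with tangent velocity v."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.where(norm > 1e-12, norm, 1.0)     # avoid 0/0 when v = 0
    return np.cos(norm) * x + np.sin(norm) * v / safe

def riemannian_grad(euclid_grad, x):
    """Project an ambient (Euclidean) gradient onto the tangent space at x."""
    return euclid_grad - np.sum(euclid_grad * x, axis=-1, keepdims=True) * x

def transport(x, euclid_grad_phi):
    """T(x) = exp_x(-grad phi(x)), with grad taken in the sphere's geometry."""
    return sphere_exp(x, -riemannian_grad(euclid_grad_phi(x), x))

# Toy potential phi(x) = -s <x, p>: its ambient gradient is the constant -s p,
# so the map pushes every point a short distance along the geodesic toward p.
p = np.array([0.0, 0.0, 1.0])
s = 0.3
grad_phi = lambda x: np.broadcast_to(-s * p, x.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)    # points on the sphere
Tx = transport(x, grad_phi)
```

In a learned setting, `grad_phi` would come from differentiating a neural potential; the projection and exponential-map steps stay the same.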

| Parameterization | Main Construction | Provable Guarantee |
|---|---|---|
| ICNN (Euclidean) | $T_\theta=\nabla\varphi_\theta$ | Statistical $L^2$ risk, plug-in rates |
| Semi-discrete $h$ | Upper envelope/height | Monge equivalence, cell partition |
| Riemannian NN | $\phi=\psi^c$, $T=\exp_x(-\nabla\phi(x))$ | Sub-exponential in dimension |
| Dynamic/Flow-based | ODE integration of $v_\theta$ | Exact recovery for quadratic cost |
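The dynamic (flow-based) entry can be illustrated with a case where the velocity field is known in closed form. The sketch below assumes a one-dimensional Gaussian-to-Gaussian problem with quadratic cost, where the Monge map is affine, $T(x)=ax+b$, and the displacement-interpolation velocity is $v(t,x)=(a-1)\,\frac{x-tb}{1+t(a-1)}+b$. The closed-form `velocity` stands in for a learned $v_\theta$; the function names and the Euler integrator are choices made for this example.

```python
import numpy as np

def velocity(t, x, a, b):
    """Velocity of displacement interpolation for the affine map T(x) = a x + b.

    A particle at x at time t started at x0 = (x - t b) / (1 + t (a - 1))
    and moves with constant velocity T(x0) - x0.
    """
    x0 = (x - t * b) / (1.0 + t * (a - 1.0))
    return (a - 1.0) * x0 + b

def integrate_flow(v, x0, n_steps=50):
    """Explicit Euler integration of dx/dt = v(t, x) from t = 0 to t = 1."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v(k * dt, x)
    return x

# Source N(0, 1) to target N(3, 0.5^2): a = sigma_1 / sigma_0, b = m_1 - a m_0.
a, b = 0.5, 3.0
x0 = np.linspace(-2.0, 2.0, 9)
x1 = integrate_flow(lambda t, x: velocity(t, x, a, b), x0)
```

Because optimal trajectories for quadratic cost are straight lines traversed at constant speed, the integrated endpoint `x1` coincides with `a * x0 + b` up to floating-point error; with a learned velocity field the integration step is the same, but the endpoint is only approximate.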

4. Statistical Guarantees and Convergence Rates

Comprehensive analyses establish non-asymptotic convergence rates for neural OT map estimators:
- Plug-in/Sieve Plug-in: For empirical measures, the estimator minimizing the semi-dual achieves
\[
\|\nabla\hat\varphi_{n,N}-\nabla\varphi_0\|^{2}_{L^2(P)} = O(\tilde n^{-1/\gamma} + \tilde N^{-1/\gamma})
\]
with $\gamma$ determined by the function class's entropy; faster $O(\tilde n^{-2/(\gamma+2)})$ rates are possible with Poincaré-type inequalities for Donsker classes [2412.08064]. These results require only mild regularity on the support, and strong convexity can be dispensed with for sieve variants.
- Adversarial Minimax Solvers: With appropriately strong convexity/Lipschitzness in the function classes, the estimated map's statistical risk is bounded by the sum of approximation and Rademacher complexity terms, yielding parametric $O(N^{-1/2})$ rates in typical architectures [2502.01310].
- Algorithmic Details: Neural OT map training involves stochastic optimization over batches, inner maximization for convex conjugates, and learning-rate/weight normalization strategies for stability. Complexity per epoch is determined by ICNN forward/backward passes and inner maximization steps.
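The inner maximization for convex conjugates mentioned under Algorithmic Details can be sketched on a potential whose conjugate is known exactly. The example assumes a quadratic potential $\varphi(x)=\tfrac12 x^\top A x$ with symmetric positive definite $A$, for which $\varphi^*(y)=\tfrac12 y^\top A^{-1} y$ and the maximizer is $A^{-1}y$. The function name `conjugate_by_ascent`, the fixed step size, and the iteration count are assumptions of this sketch; practical solvers typically use a few steps of a generic optimizer on an ICNN instead.

```python
import numpy as np

def conjugate_by_ascent(phi, grad_phi, y, step, n_steps=200):
    """Approximate phi*(y) = max_x <x, y> - phi(x) by gradient ascent in x.

    Returns the conjugate values and the maximizers, which approximate
    grad phi*(y), i.e. the inverse of the map grad phi.
    """
    x = np.zeros_like(y)
    for _ in range(n_steps):
        x = x + step * (y - grad_phi(x))   # gradient of <x, y> - phi(x) in x
    return np.sum(x * y, axis=-1) - phi(x), x

# Quadratic test potential; A is symmetric positive definite.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
phi = lambda x: 0.5 * np.einsum("ni,ij,nj->n", x, A, x)
grad_phi = lambda x: x @ A                 # valid because A is symmetric

rng = np.random.default_rng(0)
y = rng.standard_normal((5, 2))
# Step 0.4 is below 2 / lambda_max(A), so the ascent iteration contracts.
conj, x_star = conjugate_by_ascent(phi, grad_phi, y, step=0.4)

exact = 0.5 * np.einsum("ni,ij,nj->n", y, np.linalg.inv(A), y)
```

The per-sample cost of this loop is what the text refers to as the inner maximization steps: each outer update of the potential pays for `n_steps` gradient evaluations unless the maximizer is amortized by a second network.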

5. Extensions to Non-Euclidean, Unbalanced, and Discontinuous Settings

Neural OT map learning has been extended to a variety of generalized OT settings:
- Unbalanced Optimal Transport (UOT): Neural Monge maps can be adapted to reweighted source and target measures via entropic UOT couplings. The unbalanced Monge map is learned on empirical reweighted samples, improving robustness to mass mismatch and outliers in applications such as cell trajectory inference and image translation. Empirically, UOT-adapted estimators (e.g., UOT-flow-matching) yield measurable improvements in target correspondence and attribute preservation [2311.15100].
- Discontinuous Maps: OT-Net utilizes a neural network to learn the height vector associated with the semi-discrete Brenier potential. Its explicit cell partitioning supports sharp discontinuities, crucial for representing transport between measures with disjoint or non-convex support, and achieves competitive generative modeling results without mode mixing [2306.08233].
- Cost-Parameterized Monge Maps: Input-convex neural networks parameterize strictly convex, translation-invariant costs, permitting optimized Monge map estimation and ground cost learning. End-to-end differentiability is ensured via Sinkhorn tracking and implicit differentiation, supporting domain-adaptive and regularized transport [2406.08399].
- Cross-Space/Manifold and Gromov-Monge Problems: Explicit alignment via learned strong isomorphisms and Monge-gap regularization enables practical map learning between incomparable metric-measure spaces [2407.14957].
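The reweighting step in the UOT item above can be illustrated with the standard scaling iterations for entropic unbalanced OT with KL-relaxed marginals. This is a generic NumPy sketch, not the pipeline of the cited work: the function name `unbalanced_sinkhorn`, the toy point sets, and the values of `eps` (entropic regularization) and `tau` (marginal relaxation strength) are assumptions for this example.

```python
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps=0.5, tau=1.0, n_iters=500):
    """Scaling iterations for entropic OT with KL-penalized marginals.

    tau controls how strictly the marginals a and b are enforced:
    large tau approaches balanced Sinkhorn, small tau lets the plan
    drop mass that would be expensive to transport.
    """
    K = np.exp(-C / eps)
    power = tau / (tau + eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]

# Three source points and three target points, one of which is an outlier.
x = np.array([0.0, 0.1, 0.2])
y = np.array([0.0, 0.1, 3.0])
a = np.full(3, 1.0 / 3.0)
b = np.full(3, 1.0 / 3.0)
C = (x[:, None] - y[None, :]) ** 2

P_relaxed = unbalanced_sinkhorn(a, b, C, eps=0.5, tau=0.5)
P_tight = unbalanced_sinkhorn(a, b, C, eps=0.5, tau=1e4)

# Reweighted marginals on which a Monge map could then be fitted.
a_tilde, b_tilde = P_relaxed.sum(axis=1), P_relaxed.sum(axis=0)
```

With a small `tau`, the outlier at $y=3$ receives almost no mass in `b_tilde`, whereas the nearly balanced plan `P_tight` is forced to send a third of the mass to it. The cost matrix is quadratic in the batch size, which is the overhead discussed later in the article.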

6. Instabilities, Spurious Solutions, and Algorithmic Robustness

Despite the formal consistency of neural OT learning frameworks, several pitfalls require careful handling:
- Spurious Solutions: Semi-dual neural OT solvers can admit spurious critical points when uniqueness (e.g., absolute continuity of the source) fails, resulting in mappings that do not actually realize the correct transport. Smoothing strategies, such as the OTP approach, regularize the source by convolution or variance-preserving noise, ensuring uniqueness at each smoothing stage. The induced plan converges to the correct (possibly stochastic) OT plan as the noise level vanishes, enabling proper map learning in pathologically ill-posed regimes [2502.04583].
- Max-min Optimization Stability: Empirical evidence and theoretical analysis indicate adversarial semi-dual training may suffer from instability and severe sensitivity to hyperparameters. Time-based regularization (e.g., displacement interpolation, HJB consistency penalty in DIOTM [2410.03783]) and full-trajectory training provide major improvements by smoothing the learning dynamics both across time and in functional space.
- Natural Gradient and Geometry-Aware Optimization: Wasserstein natural-gradient descent in map-parameter space substantially improves convergence over Euclidean or Adam-style updates, corresponding to constrained flows in the $L^2$ metric on maps [2603.25182].

7. Application Domains and Empirical Performance

Neural OT map learning is widely applied and benchmarked:
- Generative Modeling and Diffusion Samplers: DPM-OT achieves substantial reductions in neural function calls per sample ($100\times$ speedup) without degradation in sample quality (e.g., FID $\approx3.6$, mode-mixture rates $\approx1.4\%$) [2307.11308].
- Domain Translation and GANs: AE-OT-GAN combines semi-discrete latent OT map learning with GAN discriminators, yielding improved FID and sharp image synthesis compared to GAN baselines, and resolving mode collapse systematically [2001.03698].
- Scientific Inverse Problems: Neural architectures for optical tomography (1D/2D convolutional nets) achieve sub-percent reconstruction error and several orders-of-magnitude computational speedup compared to PDE-based solvers [1910.04756].
- Unbalanced/Outlier-Rich and Cross-geometry Tasks: UOT-augmented neural transport achieves improved attribute and structure preservation across imbalanced or structurally mismatched datasets; cross-metric learning aligns and transports between manifolds or latent spaces with explicit geometric consistency [2311.15100, 2407.14957].
- Numerical Convergence: Consistent with theory, neural OT plug-in and sieve estimators attain decreasing $L^2$ error with sample size, and superiority of sieve estimators is most pronounced for heavy-tailed source distributions [2412.08064].

8. Open Challenges and Future Directions

Key limitations and ongoing research directions include:
- Finite-sample generalization for unbalanced and generalized costs: While parametric convergence rates are established for quadratic Euclidean cost and ICNN parameterizations, matching theory for unbalanced, manifold, and learned-cost regimes remains to be fully worked out [2311.15100, 2406.08399].
- Batch-level computational overhead for UOT and large-scale coupling: Entropic UOT computations scale quadratically with batch size, yet can remain competitive in practice because network backpropagation is often the dominant cost.
- Extension to high-dimensional, multi-modal, and partially observed data: Wasserstein natural gradient approaches, amortized Riemannian OT, and embedding-based cross-space alignment are actively being developed to address scalability and robustness in such contexts [2602.03566, 2603.25182, 2407.14957].
- Adaptive regularization and time-sampling: Choosing schedules for regularizer weights (e.g., $\lambda$ in DIOTM, $\tau$ in UOT) and for time or trajectory sampling (displacement interpolation) adaptively, rather than by manual tuning, remains an open problem.

Neural OT map learning thus comprises a collection of expressively parameterized, theoretically grounded, and statistically consistent algorithmic frameworks for learning optimal and problem-adapted transport maps, with broad impact across applied mathematics, machine learning, and computational sciences [2307.11308, 2412.08064, 2602.03566, 2502.01310, 2603.25182].