
Optimal Transport Warping: Methods & Applications

Updated 12 February 2026
  • Optimal Transport Warping (OTW) is a framework that aligns data distributions by solving optimal transport problems with entropy regularization.
  • OTW methods include particle evolution, neural Hamilton–Jacobi flows, and Sinkhorn iterations, enabling efficient alignment for time series and images.
  • Entropy regularization makes the OTW objective strictly convex, guaranteeing unique solutions; applications range from classification and generative modeling to spatio-temporal data analysis.

Optimal Transport Warping (OTW) is a suite of methodologies and algorithms for aligning, interpolating, or transforming probability measures, signals, or datasets under the framework of optimal transport (OT). OTW links the computation of transport maps or plans to warping operations in time, space, or general data domains, and serves as a robust alternative to traditional alignment techniques such as Dynamic Time Warping (DTW) for time series and image morphing in vision tasks. The mathematical and algorithmic structures of OTW rely on solving OT problems—often with added regularization or relaxed constraints—to produce smooth, interpretable, and often computationally efficient transport-based alignment.

1. Mathematical Principles and Entropy-Regularized OTW

At its core, OTW addresses the alignment of two distributions or datasets by minimizing a transport cost, allowing for warping between their supports. The canonical OT problem in the Kantorovich form seeks a coupling $\gamma$ with prescribed marginals $\mu$, $\nu$ and cost $c(x, y)$:

$$\min_{\gamma \in \Pi(\mu, \nu)} \int c(x, y) \, d\gamma(x, y).$$

Entropy-regularized OTW relaxes the hard marginal constraints by introducing Kullback–Leibler (KL) divergence penalties:

$$\mathcal{E}_\Lambda(\gamma \mid \mu, \nu) = \iint c(x, y) \, d\gamma(x, y) + \Lambda\, D_{KL}(\pi_{1\#}\gamma \,\|\, \mu) + \Lambda\, D_{KL}(\pi_{2\#}\gamma \,\|\, \nu),$$

where $\Lambda > 0$ tunes the strength of marginal relaxation. This formulation enables gradient-based solution methods and guarantees strict convexity and uniqueness of the minimizer for finite $\Lambda$, with $\Gamma$-convergence to the classical OT formulation as $\Lambda \rightarrow \infty$ (Theorems 2.8, 2.9 in (Liu et al., 2021)).
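On discrete supports the relaxed objective can be evaluated directly. The following sketch (illustrative only; the function name `entropic_otw_objective` is chosen here and does not come from the cited papers) computes $\mathcal{E}_\Lambda$ for a candidate coupling matrix:

```python
import numpy as np

def entropic_otw_objective(gamma, mu, nu, C, lam):
    """KL-relaxed OT objective E_Lambda for a discrete coupling.

    gamma : (n, m) nonnegative candidate coupling matrix
    mu, nu: target marginals of shape (n,) and (m,)
    C     : (n, m) cost matrix with entries c(x_i, y_j)
    lam   : marginal-relaxation weight Lambda > 0
    """
    def kl(p, q):
        # KL divergence with the convention 0 * log(0/q) = 0
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    transport_cost = np.sum(C * gamma)
    # push-forward (projected) marginals of the coupling
    g1, g2 = gamma.sum(axis=1), gamma.sum(axis=0)
    return transport_cost + lam * kl(g1, mu) + lam * kl(g2, nu)
```

When the coupling's marginals match $\mu$ and $\nu$ exactly (e.g., the independent coupling $\mu \otimes \nu$), both KL terms vanish and the objective reduces to the plain transport cost.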

Recent advances also connect OTW to solutions of nonlinear PDEs such as the Hamilton–Jacobi equation, characterizing the OT map as a viscosity solution and enabling new computational frameworks based on the method of characteristics, with guarantees for existence and uniqueness under mild regularity assumptions (Park et al., 30 Sep 2025).
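As a concrete instance (the standard quadratic-cost form from the dynamical OT literature; the cited work's exact formulation may differ), the potential solves a Hamilton–Jacobi equation whose characteristics are straight lines, so the transport map follows in closed form:

$$\partial_t u(x, t) + \tfrac{1}{2}\,\lVert \nabla_x u(x, t) \rVert^2 = 0, \qquad X(t) = x + t\,\nabla_x u(x, 0), \qquad T(x) = X(1) = x + \nabla_x u(x, 0).$$

Because $\nabla_x u$ is constant along each characteristic, solving for $u$ at the initial time suffices to evaluate the map everywhere.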

2. OTW Algorithms: Particle, Neural, and Sinkhorn-Based Methods

Multiple computational strategies instantiate OTW. Key approaches include:

  • Particle Evolution Approach: Discretizes the coupling as an empirical distribution over particle pairs $(X_i, Y_i)$, evolving their positions by Wasserstein-gradient flows of the entropy-regularized objective. The evolution is governed by interacting ODEs containing both cost gradients and KL terms, with convergence to the OT solution as $\Lambda$ increases (Liu et al., 2021).
  • Neural Hamilton–Jacobi Flows: Represents the viscosity solution to the HJ equation by a neural network $u_\theta(x, t)$, and extracts forward/backward transport maps by closed-form characteristics. The network is trained via residual minimization of the HJ equation and MMD-based matching of push-forwarded marginals, guaranteeing convergence to the optimal map without adversarial training (Park et al., 30 Sep 2025).
  • Sinkhorn-OTW for Time Series: For discrete data such as time series, the entropic-regularized OT framework (Sinkhorn iterations) computes a soft, differentiable alignment plan that serves as a warping map between sequences. The algorithm exploits the strict convexity of the entropic OT objective and efficient matrix scaling (Moradi, 8 Jan 2025).
  • Closed-form and Linear-Time OTW: For 1D data, closed-form formulas leveraging empirical CDFs yield warpings in $O(n)$ time (Latorre et al., 2023), and derivative-based density matching enables efficient alignment that provably captures warping deformations (Aldroubi et al., 8 May 2025).
  • Spatio-Temporal Alignments: Combines soft-DTW and unbalanced, entropy-regularized OT (Sinkhorn divergences) to enable differentiable and robust spatio-temporal warpings for multivariate data, with strong empirical performance for temporally and spatially structured signals (Janati et al., 2019).
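To make the Sinkhorn-based variant concrete, here is a minimal matrix-scaling sketch (an illustration assuming squared-difference costs and uniform marginals, not the cited papers' implementation):

```python
import numpy as np

def sinkhorn_alignment(x, y, eps=0.5, n_iter=500):
    """Soft alignment plan between two 1D sequences via entropic OT (Sinkhorn).

    eps is the entropic regularization strength: smaller eps gives a
    sharper (closer to hard) alignment but slower convergence.
    """
    n, m = len(x), len(y)
    mu, nu = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    C = (x[:, None] - y[None, :]) ** 2        # pairwise alignment costs
    K = np.exp(-C / eps)                      # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                   # alternating marginal scaling
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]        # plan = diag(u) K diag(v)
```

Each row of the returned plan is a soft correspondence from a point of `x` to points of `y`; differentiability of the plan in the inputs is what lets it serve as a warping map inside larger models.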

3. Theoretical Properties and Guarantees

OTW inherits and extends the rigorous guarantees of optimal transport theory:

  • Existence and Uniqueness: For convex cost $c$ and strictly convex regularization (entropy or KL penalties), the OTW objective admits a unique minimum (Liu et al., 2021). Viscosity solution frameworks retain uniqueness for a wide class of costs (Park et al., 30 Sep 2025).
  • Convergence: Under entropy or KL penalty scaling, minimizers converge (in $W_2$) to the solution of the classical Monge or Kantorovich OT problem (Liu et al., 2021). For neural PDE-based solvers, zero residual plus MMD matching implies exact recovery of the Monge map (Park et al., 30 Sep 2025).
  • Metric Properties: OTW distances with appropriate regularizers (e.g., absolute value) define valid distances or divergences, preserving triangle inequality and non-negativity, and in 1D agree with known Wasserstein metrics (Latorre et al., 2023).
  • Computational Complexity: OTW achieves significant gains in scalability: $O(n)$ time for 1D and prefix-sum-based distances (Latorre et al., 2023), $O(n^2)$ for Sinkhorn iterations (Moradi, 8 Jan 2025), linear scaling in batch size for neural solvers (Park et al., 30 Sep 2025), and efficient mesh-free discretization for particle flows (Liu et al., 2021).
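The 1D closed-form claim can be illustrated with the classical order-statistics formula (a generic sketch for equal-size samples, not the specific prefix-sum algorithm of Latorre et al.): after sorting, matching the $i$-th smallest points is optimal, so the distance costs $O(n \log n)$ overall and $O(n)$ on pre-sorted inputs.

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """Closed-form p-Wasserstein distance between equal-size 1D samples.

    The optimal plan in 1D matches order statistics, so sorting both
    samples and averaging the p-th power of the gaps is exact.
    """
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)
```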

4. Practical Applications and Empirical Benchmarks

OTW has demonstrated robust empirical performance across diverse domains:

  • Time Series Alignment: OTW outperforms DTW in robustness to noise and in aligning constant-value segments, and yields more interpretable alignments. In k-NN classification, OTW gives 3–5 percentage points of absolute accuracy gain on UCR benchmarks (e.g., ECG200: DTW 85.0%, OTW 90.5%) (Moradi, 8 Jan 2025).
  • Imaging and Generative Modeling: OTW provides deterministic, content-preserving image generation and restoration (denoising, inpainting, colorization) with superior FID and detail preservation compared to adversarial alternatives (Rout et al., 2021). Unconditional and unpaired restoration tasks on datasets such as CelebA and CIFAR10 show clear improvements in both sample quality and quantitative metrics.
  • Spatio-Temporal Data: STA (a form of OTW) jointly handles spatial and temporal variability, vastly outperforming DTW in clustering multivariate signals (ARI up to 0.99 for brain simulation vs 0.60 for DTW) (Janati et al., 2019).
  • Computational Anatomy and Image Morphing: OTW with singular sources allows precise density modulation in morphing, enabling sharp feature preservation compared to diffusion-based metamorphosis (Maas et al., 2016).
  • Deep Learning Architectures: OTW-based "layers" replace DTW in neural models, yielding order-of-magnitude speedups and reduced resource demands in architectures processing long sequences (Latorre et al., 2023).
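A minimal sketch of the k-NN use case, with a toy sorted-sample OT distance standing in for a full OTW distance (the names `ot1d` and `knn_predict` are hypothetical; sorting discards temporal order, so this is illustrative only):

```python
import numpy as np

def ot1d(a, b):
    """Toy 1-Wasserstein distance between equal-length sequences via sorted
    samples. Only a stand-in: it compares value distributions, not timing."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def knn_predict(train_X, train_y, query, dist, k=1):
    """k-NN classification with a pluggable sequence distance
    (e.g., an OTW distance in place of DTW)."""
    d = np.array([dist(seq, query) for seq in train_X])
    nearest = np.argsort(d)[:k]          # indices of the k closest sequences
    vals, counts = np.unique(train_y[nearest], return_counts=True)
    return vals[np.argmax(counts)]       # majority vote
```

Because `dist` is a parameter, the same classifier can be benchmarked with DTW, soft-DTW, or an OTW distance without other changes.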

5. Limitations, Challenges, and Open Questions

Open directions and challenges in OTW include:

  • Curse of Dimensionality: Quadratic or even higher complexity in constructing full transport plans for long sequences or high-resolution images remains a bottleneck; multi-scale, sparse, or low-rank approximations have been proposed as mitigations (Moradi, 8 Jan 2025).
  • Parameter Selection: Choice of regularization weights, especially entropy/Sinkhorn parameters, significantly impacts the sparsity/fidelity trade-off and may require cross-validation.
  • Feature Engineering: For derivative-based OTW measures, sensitivity to noise or high-frequency oscillations can limit robustness; smoothing and alternative feature densities are under exploration (Aldroubi et al., 8 May 2025).
  • Theory: Displacement convexity (exponential convergence of gradient flows) and explicit error bounds for finite (sampled) particle systems remain open (Liu et al., 2021).
  • Extension to Asymmetric Domains or Unequal-Length Series: Further generalizations to non-square alignments, online/streaming scenarios, or multivariate/time-varying support are active research areas (Moradi, 8 Jan 2025).

6. Comparative Overview and Algorithmic Landscape

| Domain / Task | OTW Approach | Key Properties |
| --- | --- | --- |
| Time series (1D, discrete) | Sinkhorn-OTW; linear-time prefix sums | $O(n)$, differentiable, robust to warping (Latorre et al., 2023; Moradi, 8 Jan 2025) |
| Spatio-temporal alignment | Entropic Sinkhorn, soft-DTW hybrid | Differentiable, spatially aware (Janati et al., 2019) |
| High-dim. generative/vision | Min-max neural OT map estimation | Deterministic warping, FID improvements (Rout et al., 2021) |
| PDE-based map construction | Neural Hamilton–Jacobi flows | Explicit maps, end-to-end, adversarial-free (Park et al., 30 Sep 2025) |
| Mesh-free continuous OT | Particle-evolving Wasserstein gradients | Direct sampling, no spatial discretization (Liu et al., 2021) |
| Image warping/morphing | Benamou–Brenier w/ singular sources | Singular mass creation, sharp feature match (Maas et al., 2016) |

7. Synthesis, Impact, and Outlook

Optimal Transport Warping unifies OT theory, regularized optimization, and algorithmic warping strategies across domains. OTW endows traditional alignment tasks with probabilistic interpretability and strong theoretical guarantees, while offering efficient, differentiable, and scalable algorithms—even in high-dimensional and complex data regimes. The variety and adaptability of OTW methods enable applications from classic time series classification to cutting-edge generative modeling and computational morphometrics.

Remaining challenges include further scaling to extremely high dimensions, integrating OTW into end-to-end learning frameworks (for example, task-invariant embedding learning), and the principled design of warping regularizers for domain-specific constraints. The field is actively evolving, and advances in scalable OT solvers, robust regularization, and neural representations continue to broaden the impact of optimal transport warping across data science, computational geometry, and machine intelligence (Latorre et al., 2023, Park et al., 30 Sep 2025, Liu et al., 2021, Maas et al., 2016, Moradi, 8 Jan 2025).
