Nested Optimal Transport Distance

Updated 9 September 2025

Nested optimal transport distance is a time-causal extension of classical optimal transport that leverages bi-causal couplings to maintain sequential feasibility.
It employs a dynamic programming formulation and parallelized tree-based quantization to efficiently compute multi-level couplings over time.
It is well suited to sequential decision problems, such as financial modeling and reinforcement learning, because the optimal values of many such problems are Lipschitz continuous with respect to it.

The nested optimal transport distance is a time-causal extension of classical optimal transport metrics, tailored to stochastic processes and decision-making scenarios in which couplings must adapt over time. Instead of a single coupling of the full path laws, nested optimal transport uses a multi-level, recursive construction in which couplings must respect the filtrations (information sets) revealed up to each time point. This time adaptation is particularly relevant for evaluating and simulating financial time series, and in any context where dynamic policies use only past and present data for optimal control or prediction.

1. Time-Causal Structure and Motivation

Classical optimal transport distances (e.g., the Wasserstein-2 distance $W_2$) measure the minimal cost of transporting mass between two probability measures $\mu, \nu$ on $\mathbb{R}^{dT}$ (interpreted as laws of $d$-dimensional paths with $T$ steps), taking the infimum over all couplings $\pi \in \mathrm{Cpl}(\mu, \nu)$:

$W_2^2(\mu, \nu) = \inf_{\pi \in \mathrm{Cpl}(\mu, \nu)} \int \|x - y\|^2\, d\pi(x, y).$
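As a minimal sketch (an illustration, not code from the source), this infimum can be computed exactly for two empirical measures with the same number $N$ of equally weighted sample paths: an optimal coupling can then be taken to be a permutation, so $W_2^2$ reduces to a linear assignment problem. The function name `w2_squared_empirical` is hypothetical; the solver is SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_squared_empirical(x: np.ndarray, y: np.ndarray) -> float:
    """Squared W_2 between two uniform empirical measures with N atoms each.

    x, y: arrays of shape (N, d*T); each row is one flattened path.
    With equal sample sizes and uniform weights an optimal coupling is a
    permutation, so the transport problem becomes an assignment problem.
    """
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes and dimensions")
    # Pairwise squared Euclidean distances ||x_i - y_j||^2 over the whole path.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())


eps = 0.1
x = np.array([[eps, 1.0], [-eps, -1.0]])   # two paths (x_1, x_2)
y = np.array([[0.0, 1.0], [0.0, -1.0]])    # two paths (y_1, y_2)
print(w2_squared_empirical(x, y))          # approximately eps**2 = 0.01
```

Here the optimal assignment pairs $(\varepsilon, 1)$ with $(0, 1)$ and $(-\varepsilon, -1)$ with $(0, -1)$, so under $W_2$ the two laws look almost identical.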

However, in sequential tasks such as hedging, optimal stopping, or reinforcement learning, admissible strategies must be adapted to the information available up to the current time $t$. The nested optimal transport distance imposes a bi-causal (or adapted) structure: at each time step, the conditional law of the coupling given the observed histories must itself be a coupling of the conditional laws of $\mu$ and $\nu$ given those histories. This restriction ensures that the transport plan does not "peek into the future" in either direction, preserving dynamic feasibility.

The resulting adapted Wasserstein-2 distance $AW_2$ is defined through its square:

$AW_2^2(\mu, \nu) = \inf_{\pi \in \mathrm{Cpl}_{\text{bi-causal}}(\mu, \nu)}\, \int \left(\sum_{t=1}^T \|x_t - y_t\|^2\right)\, d\pi(x, y),$

where $\mathrm{Cpl}_{\text{bi-causal}}(\mu,\nu)$ is the set of bi-causal couplings.
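A standard two-period illustration (not taken from the source) shows why the bi-causal restriction matters. Take $d = 1$, $T = 2$ and a small $\varepsilon > 0$: under $\mu$ the first coordinate reveals the sign of the second, while under $\nu$ the first coordinate is always $0$ and carries no information.

```latex
\mu = \tfrac12\,\delta_{(\varepsilon,\,1)} + \tfrac12\,\delta_{(-\varepsilon,\,-1)},
\qquad
\nu = \tfrac12\,\delta_{(0,\,1)} + \tfrac12\,\delta_{(0,\,-1)}

% Classical: match (\varepsilon, 1) with (0, 1) and (-\varepsilon, -1) with (0, -1).
W_2^2(\mu,\nu) = \tfrac12\,\varepsilon^2 + \tfrac12\,\varepsilon^2 = \varepsilon^2

% Adapted: given x_1 = \varepsilon and y_1 = 0 the next-step laws are
% \delta_1 and \tfrac12(\delta_1 + \delta_{-1}); their only coupling is the product.
\int |x_2 - y_2|^2 \, d\pi = \tfrac12\,(1 - 1)^2 + \tfrac12\,(1 + 1)^2 = 2
\quad\text{(the same holds for } x_1 = -\varepsilon\text{)}

AW_2^2(\mu,\nu) = \mathbb{E}\big[\,|x_1 - 0|^2\,\big] + 2 = \varepsilon^2 + 2
```

The $W_2$-optimal pairing is not bi-causal: to decide whether $y_1 = 0$ should be matched with $x_1 = \varepsilon$ or $x_1 = -\varepsilon$, it must already know $y_2$. So $W_2$ treats the two laws as nearly equal, whereas $AW_2$ registers that they carry very different information at time 1.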

2. Dynamic Programming Representation

A central result is that $AW_2^2(\mu, \nu)$ can be computed recursively via a dynamic programming principle (DPP). Given two measures $\mu$ and $\nu$ on $\mathbb{R}^{dT}$, set

$V_T^{(\mu, \nu)} \equiv 0$ and, for $t = T-1, \ldots, 0$ and histories $(x_{1:t}, y_{1:t})$,

$V_t^{(\mu, \nu)}(x_{1:t}, y_{1:t}) = \inf_{\pi \in \mathrm{Cpl}(\mu_{x_{1:t}}, \nu_{y_{1:t}})} \int \left[\|x_{t+1} - y_{t+1}\|^2 + V_{t+1}^{(\mu, \nu)}(x_{1:t+1}, y_{1:t+1})\right]\, d\pi(x_{t+1}, y_{t+1}),$

where $\mu_{x_{1:t}}$ and $\nu_{y_{1:t}}$ are the conditional laws of $x_{t+1}$ and $y_{t+1}$ given the histories $x_{1:t}$ and $y_{1:t}$ (for $t = 0$ the histories are empty and these are the first marginals). By backward induction, $V_0^{(\mu, \nu)} = AW_2^2(\mu, \nu)$.
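The recursion can be run directly on finitely supported measures. The sketch below is an illustrative implementation, not the source's code: it assumes $d = 1$, represents each measure as a dictionary from full paths to probabilities, and solves every conditional transport problem as a small linear program with SciPy's `linprog` (HiGHS backend). The helper names `discrete_ot`, `conditional` and `adapted_w2_squared` are hypothetical. On $\mu = \tfrac12\delta_{(\varepsilon,1)} + \tfrac12\delta_{(-\varepsilon,-1)}$ and $\nu = \tfrac12\delta_{(0,1)} + \tfrac12\delta_{(0,-1)}$ it returns $\varepsilon^2 + 2$, even though $W_2^2(\mu,\nu) = \varepsilon^2$.

```python
from collections import defaultdict
from functools import lru_cache

import numpy as np
from scipy.optimize import linprog


def discrete_ot(p, q, cost):
    """Optimal transport cost between weight vectors p (m,) and q (n,) for an (m, n) cost."""
    m, n = cost.shape
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0      # row i of the plan sums to p[i]
    for j in range(n):
        a_eq[m + j, j::n] = 1.0               # column j of the plan sums to q[j]
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def conditional(measure, history):
    """Conditional law of the next coordinate given a path prefix, as {value: prob}."""
    t = len(history)
    out = defaultdict(float)
    for path, prob in measure.items():        # linear scan; a tree would avoid this
        if path[:t] == history:
            out[path[t]] += prob
    total = sum(out.values())
    return {v: w / total for v, w in out.items()}


def adapted_w2_squared(mu, nu):
    """AW_2^2 via the backward DPP; mu, nu map length-T path tuples (d = 1) to probabilities."""
    T = len(next(iter(mu)))

    @lru_cache(maxsize=None)
    def value(hx, hy):                        # V_t(x_{1:t}, y_{1:t}) with t = len(hx)
        if len(hx) == T:
            return 0.0                        # terminal condition V_T = 0
        cx, cy = conditional(mu, hx), conditional(nu, hy)
        xs, ys = list(cx), list(cy)
        # Present cost plus continuation value for every pair of next values.
        cost = np.array([[(a - b) ** 2 + value(hx + (a,), hy + (b,)) for b in ys]
                         for a in xs])
        return discrete_ot(np.array([cx[a] for a in xs]),
                           np.array([cy[b] for b in ys]), cost)

    return value((), ())


eps = 0.1
mu = {(eps, 1.0): 0.5, (-eps, -1.0): 0.5}
nu = {(0.0, 1.0): 0.5, (0.0, -1.0): 0.5}
print(adapted_w2_squared(mu, nu))  # about eps**2 + 2 = 2.01
```

Memoizing `value` on the pair of histories mirrors the DPP: each $V_t$ is computed once per pair of histories and reused by every parent pair that needs it.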

This formulation exposes the essential nested structure: at each step, the cost-to-go is the present transport cost plus the expected cost-to-go at the next time, conditioned on the realized histories and optimized over admissible (conditional) couplings.

3. Robustness in Sequential Decision Problems

For dynamic problems such as hedging, optimal stopping, and reinforcement learning, robustness with respect to time-causal perturbations of the underlying probability law is paramount. Under suitable regularity assumptions, the optimal values of many such stochastic optimization problems are Lipschitz continuous with respect to the adapted Wasserstein metric:

$|v(\mu) - v(\nu)| \leq L \cdot AW_2(\mu, \nu),$

where $v(\mu)$ and $v(\nu)$ denote the optimal values of the decision problem under $\mu$ and $\nu$, respectively, and the constant $L$ depends on the problem.

This robustness property is critical for stress testing, scenario analysis, and model selection: if a simulated or generated synthetic time series model is close to the reference law in $AW_2$, the optimal values of such downstream decision problems differ by at most $L$ times that distance.

4. Algorithmic Computation and Empirical Quantization

The proposed algorithm computes the nested optimal transport distance efficiently via a two-step procedure.

a. Quantization and Tree Construction:

Samples $x^{(i)} \in \mathbb{R}^{dT}$ are quantized onto a fixed grid of mesh width $\Delta_N$ using a mapping $\varphi^N$.
Quantized paths are then organized into a tree structure by their histories. The tree structure enables efficient calculation of conditional measures.
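A minimal sketch of this step, under the assumption that $\varphi^N$ snaps every coordinate to the nearest multiple of $\Delta_N$ (the source's exact map may differ). `build_tree` is a hypothetical name; it stores integer cell indices so that equal histories compare exactly, and records, for every history, a count of the next-step cells. Normalizing a node's counts gives the conditional law used in the recursion.

```python
from collections import Counter, defaultdict

import numpy as np


def build_tree(samples, delta):
    """Quantize sample paths and group them by history (hypothetical helper).

    samples: array of shape (N, T, d). Returns {history: Counter(next cell)},
    where a history is a tuple of t grid cells (t = 0, ..., T-1) and each cell
    is a tuple of d integer indices. Normalizing a Counter gives the
    conditional law of the quantized empirical measure at that node.
    """
    _, T, _ = samples.shape
    # Assumed phi^N: snap each coordinate to the nearest multiple of delta,
    # kept as integer indices so that equal histories compare exactly.
    cells = np.rint(samples / delta).astype(int)
    tree = defaultdict(Counter)
    for path in cells:
        steps = [tuple(int(v) for v in step) for step in path]
        for t in range(T):
            tree[tuple(steps[:t])][steps[t]] += 1
    return tree


samples = np.array([[[0.1], [0.9]],
                    [[0.2], [1.1]],
                    [[-0.6], [0.0]]])   # N = 3 paths, T = 2, d = 1
tree = build_tree(samples, delta=1.0)
print(tree[()])         # Counter({(0,): 2, (-1,): 1})
print(tree[((0,),)])    # Counter({(1,): 2}): two samples share this history
```

Collapsing the first two samples into one node is exactly the repeated-history saving the tree is meant to exploit.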

b. Parallelized Dynamic Programming Recursion:

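For each node (history) in the tree, the conditional distribution of the next step is read off from the empirical frequencies of its children.
The dynamic programming recursion for $V_t^{(\hat{\mu}^N, \hat{\nu}^N)}$ runs backward from $t = T-1$ to $t = 0$. Because $V_t$ at a pair of nodes depends only on $V_{t+1}$ at their children, all pairs of quantized histories at the same depth can be evaluated in parallel, and collapsing repeated path histories reduces the number of subproblems, enabling substantial speedups.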

Once the recursion reaches $t = 0$, the computed $V_0^{(\hat{\mu}^N, \hat{\nu}^N)}$ is the nested OT distance between the quantized empirical measures.
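The level-by-level structure can be sketched as follows. This is a hypothetical illustration, not the source's implementation: it assumes $d = 1$, takes each quantized tree as a dictionary from a history to the conditional law of the next value, and uses a thread pool from `concurrent.futures` only to show that all node pairs at one depth are independent. Whether threads yield a real speedup depends on the solver; a process pool or a batched GPU solver are other options.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

import numpy as np
from scipy.optimize import linprog


def discrete_ot(p, q, cost):
    """Optimal transport cost between weight vectors p and q for a given cost matrix."""
    m, n = cost.shape
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0      # row marginals
    for j in range(n):
        a_eq[m + j, j::n] = 1.0               # column marginals
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def nested_ot_parallel(mu_tree, nu_tree, T, workers=4):
    """Backward sweep t = T-1, ..., 0 over two trees {history: {next value: prob}} (d = 1)."""
    value = {}                                # V_{t+1} for all pairs at depth t + 1

    def solve(pair):
        hx, hy = pair
        cx, cy = mu_tree[hx], nu_tree[hy]
        xs, ys = list(cx), list(cy)
        last = len(hx) == T - 1               # children are leaves, where V_T = 0
        cost = np.array([[(a - b) ** 2
                          + (0.0 if last else value[(hx + (a,), hy + (b,))])
                          for b in ys] for a in xs])
        return pair, discrete_ot(np.array([cx[a] for a in xs]),
                                 np.array([cy[b] for b in ys]), cost)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for t in reversed(range(T)):
            pairs = product([h for h in mu_tree if len(h) == t],
                            [h for h in nu_tree if len(h) == t])
            # Every V_t(hx, hy) needs only V_{t+1}, so a whole depth runs concurrently.
            value = dict(pool.map(solve, pairs))
    return value[((), ())]


mu_tree = {(): {0.0: 1.0}, (0.0,): {-1.0: 0.5, 1.0: 0.5}}
nu_tree = {(): {0.0: 1.0}, (0.0,): {-2.0: 0.5, 2.0: 0.5}}
print(nested_ot_parallel(mu_tree, nu_tree, T=2))
```

Both trees have the same root value, and the second-step laws are uniform on $\{\pm 1\}$ and $\{\pm 2\}$, so the result should be $1$ up to solver tolerance.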

Statistical Consistency:

By Theorem 2 in the source paper, as $N \to \infty$ with $\Delta_N \sim N^{-1/(dT)}$, the empirical nested OT distance converges almost surely to $AW_2^2(\mu, \nu)$. In the Markovian case, further simplification accelerates convergence, improving the rate to $O(N^{-1/(2d)})$.

5. Comparison with Standard and Nested OT Implementations

Existing algorithms for Wasserstein or nested optimal transport distances typically scale poorly with both dimension $d$ and time horizon $T$ , especially for long time series:

They do not exploit repeated subtrees in empirical quantization.
They are not naturally parallelizable.
Convergence rates are limited by naive discretization.

The presented tree-structured, parallel algorithm dramatically improves performance:

Quantization reduces computational complexity by collapsing repeated path histories.
Parallelized recursion leverages hardware for direct speedup.
The Markovian variant (for processes with the Markov property) further mitigates the curse of dimensionality.
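For Markov laws, backward induction shows that $V_t$ depends only on the current states $(x_t, y_t)$, since both the conditional laws and $V_{t+1}$ do. The sketch below (hypothetical helper `count_conditioning_keys`, again assuming nearest-grid-point quantization as a stand-in for $\varphi^N$) counts how many distinct conditioning keys appear at each step when conditioning on the full quantized history versus only on the current quantized state; the number of subproblems per step scales with the square of these counts.

```python
import numpy as np


def count_conditioning_keys(samples, delta):
    """Distinct quantized histories x_{1:t} vs. distinct current states x_t, per step.

    samples: (N, T) array of one-dimensional paths. The nearest-grid-point
    quantization is an assumed stand-in for phi^N.
    """
    cells = np.rint(samples / delta).astype(int)
    T = cells.shape[1]
    history_keys = [len({tuple(row[:t + 1]) for row in cells}) for t in range(T)]
    markov_keys = [len({row[t] for row in cells}) for t in range(T)]
    return history_keys, markov_keys


rng = np.random.default_rng(0)
walks = np.cumsum(rng.normal(size=(2000, 6)), axis=1)   # random-walk paths, T = 6
hist, markov = count_conditioning_keys(walks, delta=0.5)
print(hist)    # tends to grow towards the sample size as t increases
print(markov)  # bounded by the number of occupied grid cells
```

A random walk is Markov, so conditioning on the current state loses no information here while keeping the number of keys small.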

Empirical benchmarks, for example on Ornstein-Uhlenbeck processes and on simulated versus real Brownian motion, show orders-of-magnitude acceleration over standard implementations; these results are complemented by the theoretical convergence guarantees above.

6. Mathematical Formulation and General Properties

Key mathematical objects and results:

Notation	Definition	Significance
$W_2^2$	Infimum over all couplings of the expected squared distance between full paths	Classical Wasserstein-2
$AW_2^2$	Infimum over bi-causal couplings of the same cost, $\sum_{t=1}^T \|x_t - y_t\|^2$	Time-adapted, robust for dynamic problems
$V_t^{(\mu,\nu)}$	Dynamic programming value at time $t$ (indexed by histories)	Recursive computation of $AW_2^2$

The bi-causal (nested) coupling constraint ensures that conditionals align with available information, essential for causal evaluation of sequential models.

7. Practical Implications and Applications

The nested optimal transport distance is particularly suitable for:

Evaluating generative models and synthetic scenario generators in finance, with control over the gap in downstream tasks (hedging, policy evaluation, stopping problems).
Stress testing and robust scenario generation.
Tasks in which downstream value is sensitive to time-local information or adapted policies.

A plausible implication is that nested optimal transport distances fill a gap in the toolbox for real-world financial modeling, offering a metric that is both sensitive to the dynamic, time-causal structure of processes and robust in decision-theoretic applications.

By combining the dynamic programming formulation with parallelizable tree-based quantization, the nested optimal transport metric becomes considerably more tractable for the high-dimensional, long-horizon problems encountered in practical finance, decision analytics, and simulation-based policy evaluation.
