Conditional Optimal Transport (COT) Metric

Updated 20 May 2026

Conditional optimal transport (COT) is a framework that aligns probability measures conditionally on auxiliary variables, defining distances via fiber-wise 2-Wasserstein metrics.
It enables high-dimensional statistical learning, generative modeling, Bayesian inference, and causal inference through conditional alignment of distributions.
Computational approaches for COT include penalty relaxations, sample-based discretizations, and neural network parameterizations to efficiently solve conditional Monge and Kantorovich problems.

Conditional Optimal Transport (COT) generalizes classic optimal transport to settings where distributions are indexed or conditioned on auxiliary variables, enabling the alignment of probability measures in a fiber-wise (conditional) manner. The COT metric is central to high-dimensional statistical learning, generative modeling, Bayesian inference, and causal inference, particularly when population or empirical distributions are layered over continuous covariates or labels. Several distinct but compatible mathematical and computational formalisms exist (conditional Monge/Kantorovich, block-triangular maps, dual/adversarial formulations), with critical implications for geometry, statistical estimation, and machine learning algorithms.

1. Mathematical Formulation of Conditional Optimal Transport

Let $\rho(x|z)$ and $\mu(y|z)$ be families of conditional probability densities indexed by $z\in\mathcal Z$, representing covariate or label slices. Classical OT seeks a map or coupling minimizing a cost over all of $\mathcal X\times\mathcal Y$, whereas COT enforces alignment at each value of $z$:

$\min_{T}\; \int_{\mathcal Z}\int_{\mathcal X} c(x,T(x,z);z)\,\rho(x,z)\,dx\,dz \quad \text{s.t.} \quad T(\cdot,z)_\#\rho(\cdot|z) = \mu(\cdot|z) \;\; \forall z.$

This "conditional Monge problem" seeks a mapping $T(x,z)$ such that, for each fixed $z$, $T(\cdot,z)_\# \rho(\cdot|z)=\mu(\cdot|z)$, while minimizing the expected cost $c(x,y;z)$ (often the squared Euclidean distance $\|x-y\|^2$) (Tabak et al., 2019). The corresponding "conditional Kantorovich" problem replaces the map by couplings, enforcing the marginals $\rho(\cdot|z)$ and $\mu(\cdot|z)$ on each fiber.
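For reference, the Kantorovich relaxation of the problem above can be written in the standard form below. The symbols $\pi(x,y|z)$ (a coupling density on each fiber) and $\nu(z)$ (the marginal density on $\mathcal Z$, so that $\rho(x,z)=\rho(x|z)\,\nu(z)$) are notation assumed here for illustration and may differ from the cited papers.

```latex
\min_{\pi}\; \int_{\mathcal Z}\int_{\mathcal X\times\mathcal Y}
    c(x,y;z)\,\pi(x,y|z)\,dx\,dy\;\nu(z)\,dz
\quad \text{s.t.} \quad
\int_{\mathcal Y}\pi(x,y|z)\,dy = \rho(x|z), \qquad
\int_{\mathcal X}\pi(x,y|z)\,dx = \mu(y|z) \qquad \forall z.
```

A Monge map $T$ corresponds to the special case in which each $\pi(\cdot,\cdot|z)$ is concentrated on the graph of $T(\cdot,z)$.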

Conditional extensions to function and Hilbert spaces, block-triangular maps, and various product measures are formally treated to encompass infinite-dimensional settings and more general conditional structures (Kerrigan et al., 2024, Hosseini et al., 2023).

2. Metric Properties and Topological Structure

The COT metric distinguishes joint laws that share the same covariate marginal through the differences between their conditional distributions. For finite or continuous $\mathcal Z$: $d_{\mathrm{COT}}(\rho,\mu)^2 = \int_{\mathcal Z} W_2^2\big(\rho(\cdot|z),\mu(\cdot|z)\big)\,d\nu(z)$, where $W_2$ is the classical 2-Wasserstein distance and $\nu$ is the shared marginal on $\mathcal Z$ (Barboni et al., 2024, Tabak et al., 2019). This fiber-wise aggregation induces a genuine metric (non-negativity, symmetry, triangle inequality) on the space of joint laws with a fixed marginal on $\mathcal Z$ (Barboni et al., 2024, Generale et al., 2024).
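The fiber-wise aggregation can be illustrated with a minimal sketch, assuming a finite label set, one-dimensional outcomes, and equal sample counts per fiber (so that the 1D 2-Wasserstein distance reduces to matching sorted samples). The function names and the toy data are hypothetical and are not taken from the cited papers.

```python
import math

def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size 1D empirical measures."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts per fiber")
    xs, ys = sorted(xs), sorted(ys)
    # In 1D the optimal coupling matches order statistics.
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)

def cot_distance(samples_p, samples_q, weights):
    """sqrt( sum_z weights[z] * W2^2(P(.|z), Q(.|z)) ) over a finite label set."""
    total = 0.0
    for z, w in weights.items():
        total += w * w2_squared_1d(samples_p[z], samples_q[z])
    return math.sqrt(total)

# Two labels with equal weight; Q shifts fiber 0 by 1 and fiber 1 by 3.
p = {0: [0.0, 1.0, 2.0], 1: [10.0, 11.0]}
q = {0: [1.0, 2.0, 3.0], 1: [13.0, 14.0]}
nu = {0: 0.5, 1: 0.5}

d = cot_distance(p, q, nu)  # sqrt(0.5 * 1 + 0.5 * 9) = sqrt(5)
```

Mass is never moved across labels here: each fiber is compared only with the fiber carrying the same label, and the per-fiber costs are then averaged with the label weights.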

The topology induced by the COT metric $d_{\mathrm{COT}}$ is strictly stronger than that of the classical Wasserstein metric on the joint distribution; convergence in $d_{\mathrm{COT}}$ implies narrow convergence of joint laws and uniform convergence of all conditional distributions (Barboni et al., 2024). Regularity of COT as a functional typically requires additional topological conditions (e.g., the adapted Wasserstein topology; see Lin et al., 30 May 2025).
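A small two-atom example, sketched below under illustrative assumptions, suggests why the joint Wasserstein distance cannot control the fiber-wise one. Both laws put mass 1/2 on $z=0$ and $z=\varepsilon$, but their conditionals are swapped. The joint distance is computed by brute force over permutations, which is exact for uniform empirical measures with equally many atoms; the helper name is hypothetical.

```python
import itertools
import math

def w2_uniform(points_a, points_b):
    """Exact W2 between uniform empirical measures with the same number of atoms,
    by brute force over permutations (only sensible for a handful of atoms)."""
    n = len(points_a)
    best = min(
        sum(math.dist(points_a[i], points_b[j]) ** 2 for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    return math.sqrt(best / n)

eps = 1e-3
# Atoms are (z, y) pairs; both laws share the same marginal on z.
law_p = [(0.0, 0.0), (eps, 1.0)]
law_q = [(0.0, 1.0), (eps, 0.0)]

# Joint transport may move mass across fibers at cost eps^2 per atom.
joint_w2 = w2_uniform(law_p, law_q)

# Fiber-wise: at each z both conditionals are Dirac masses one unit apart.
fiber_sq = {0.0: (0.0 - 1.0) ** 2, eps: (1.0 - 0.0) ** 2}
cot = math.sqrt(0.5 * fiber_sq[0.0] + 0.5 * fiber_sq[eps])
```

Here `joint_w2` equals `eps` while `cot` stays at 1, so shrinking `eps` drives the joint distance to zero without reducing the fiber-wise distance. The reverse inequality always holds, since any fiber-wise coupling is also an admissible joint coupling that never moves mass in $z$.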

Key properties include:

Property	Satisfied	Source Papers
Non-negativity	Yes	(Tabak et al., 2019, Barboni et al., 2024)
Symmetry	Yes (or can be symmetrized)	(Generale et al., 2024, Hosseini et al., 2023)
Triangle inequality	Yes (standard COT); not always for unbalanced COT	(Barboni et al., 2024, Yoon et al., 7 Mar 2026)
Completeness	Yes	(Barboni et al., 2024)
Stronger topology	Yes	(Barboni et al., 2024, Lin et al., 30 May 2025)

For extensions such as conditional unbalanced OT (CUOT), some metric properties (triangle inequality, strict symmetry) may fail, but quasi-metric structure and outlier-robust divergence remain (Yoon et al., 7 Mar 2026).

3. Computational Methodologies and Approximation

Direct solution of the conditional Monge or Kantorovich problems is intractable when $z$ is continuous or high-dimensional, as pointwise enforcement on each slice is statistically and numerically prohibitive (Tabak et al., 2019). Principal computational strategies include:

Penalty Relaxation: Replace hard pushforward constraints by Kullback-Leibler or divergence penalties, parameterize test functions via Donsker-Varadhan duality, and reformulate as adversarial min-max problems allowing sample-based empirical estimation (Tabak et al., 2019).
Sample-based Algorithms: Discretize integrals and constraints by empirical means, using batches of samples $\{(x_i, z_i)\}_{i=1}^{n}$, and approach optimization over compound parameterizations (elementary map compositions, neural networks, conditional flows) (Tabak et al., 2019, Wang et al., 2023, Generale et al., 2024).
Neural Parameterization: Employ partially input-convex neural nets (PICNN) for static maps, neural ODEs for dynamic flows, and adversarial/discriminative critics for dual objectives. Theoretically, gradients and invariances necessary for optimality are preserved via proper architectural choices (Wang et al., 2023, Kerrigan et al., 2024).
Regularization and Empirical Consistency: Smoothed empirical measures (e.g., via kernel convolution) or statistical penalties (e.g., MMD) are necessary when enforcing conditional constraints using finite samples (Manupriya et al., 2023, Xu et al., 2021).
Batched/Semi-dual/Entropic Methods: In dynamic or simulation-free flows, batchwise assignment of pairs (via Sinkhorn, EMD) and Benamou-Brenier-type geodesic interpolation are employed to build scalable, amortized learning objectives (Zeghal et al., 28 Oct 2025, Generale et al., 2024, Kerrigan et al., 2024).
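The batchwise assignment step in the last item can be sketched as follows. This is a minimal illustration, not the procedure of any cited paper: it assumes scalar samples and labels, uniform batch weights, and a hypothetical surrogate cost $|x-y|^2 + \lambda\,|z_x-z_y|^2$ in which a large $\lambda$ discourages pairing across different conditions. The function names, `lam`, and `reg` are assumptions made for the example.

```python
import numpy as np

def conditional_cost(x, zx, y, zy, lam=10.0):
    """Hypothetical surrogate cost: |x - y|^2 + lam * |zx - zy|^2 (scalar inputs)."""
    return (x[:, None] - y[None, :]) ** 2 + lam * (zx[:, None] - zy[None, :]) ** 2

def sinkhorn_plan(cost, reg=0.1, n_iter=500):
    """Entropic OT plan between uniform marginals via plain Sinkhorn scaling."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)              # match row marginals
        v = b / (K.T @ u)            # match column marginals
    return u[:, None] * K * v[None, :]

# Toy batch: two samples per label on each side.
x = np.array([0.0, 0.5, 0.1, 0.6])
zx = np.array([0.0, 0.0, 1.0, 1.0])
y = np.array([0.55, 0.05, 0.65, 0.15])
zy = np.array([0.0, 0.0, 1.0, 1.0])

plan = sinkhorn_plan(conditional_cost(x, zx, y, zy))
partner = plan.argmax(axis=1)                          # most likely target per source
cross_mass = plan[zx[:, None] != zy[None, :]].sum()    # mass paired across labels
```

With these settings essentially no mass is paired across labels, and each source sample is matched to the nearest target sharing its label. The sketch works in the plain (non-log) domain, so very small `reg` or very large `lam` can underflow; a log-domain implementation would be the usual remedy.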

Computational complexity scales with the number of samples and slices (or RBF centers/neural parameters). No universal finite-sample rates exist for generic COT, but empirical and theoretical works provide explicit convergence rates for certain plug-in and discretization-based estimators (Lin et al., 30 May 2025, Manupriya et al., 2023).

4. Extensions: Dynamic, Unbalanced, and Causal OT

Dynamic COT: The Benamou–Brenier dynamic formulation extends to COT by constraining the velocity fields to be triangular (preserve the covariate) and minimize action in each fiber. Flows parameterized by neural ODEs, matched by regression against dynamically computed "bridge" velocities, yield simulation-free generative methods effective even in infinite-dimensional spaces (Kerrigan et al., 2024, Zeghal et al., 28 Oct 2025).
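The regression targets used in this kind of simulation-free training can be sketched as below, assuming straight-line (displacement) interpolation between paired samples that share the same condition. The function name and the toy pairing are hypothetical; the cited works may use different bridges and pairing schemes.

```python
import numpy as np

def conditional_bridge_batch(x0, x1, z, rng):
    """Build regression targets for a triangular velocity field v(t, x, z).

    x0, x1: paired source/target samples (row i of each shares condition z[i]).
    Returns times t, interpolants x_t, the unchanged z, and target velocities.
    """
    t = rng.uniform(size=(x0.shape[0], 1))
    x_t = (1.0 - t) * x0 + t * x1   # straight-line interpolant within the fiber
    target = x1 - x0                # constant velocity along that line
    return t, x_t, z, target        # z is passed through: zero velocity in z

rng = np.random.default_rng(0)
x0 = rng.normal(size=(5, 2))
z = np.linspace(-1.0, 1.0, 5).reshape(5, 1)
x1 = x0 + z                          # toy targets: shift each sample by its condition
t, x_t, z_out, target = conditional_bridge_batch(x0, x1, z, rng)
```

A model $v_\theta(t, x_t, z)$ would then be fitted by least squares against `target`. Because $z$ is only an input and is never transported, the learned flow is triangular by construction.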

Unbalanced and Robust Variants: The conditional unbalanced optimal transport framework (CUOT) introduces Csiszár divergence penalties to relax the exact matching of conditional distributions, ensuring outlier robustness and stable estimation in sparse or contaminated data regimes (Yoon et al., 7 Mar 2026).
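As a generic template only, and not necessarily the exact objective of the cited work, a fiber-wise unbalanced problem with Csiszár divergence penalties can be written as below. The symbols $\pi_z$ (a non-negative measure on each fiber with marginals $\pi_z^{1}$, $\pi_z^{2}$), the weights $\tau_1,\tau_2>0$, the convex functions $\varphi_1,\varphi_2$, and the marginal $\nu$ on $\mathcal Z$ are assumed notation.

```latex
\min_{\{\pi_z \ge 0\}}\; \int_{\mathcal Z} \Big[
    \int_{\mathcal X\times\mathcal Y} c(x,y;z)\,d\pi_z(x,y)
    + \tau_1\, D_{\varphi_1}\!\big(\pi_z^{1}\,\big\|\,\rho(\cdot|z)\big)
    + \tau_2\, D_{\varphi_2}\!\big(\pi_z^{2}\,\big\|\,\mu(\cdot|z)\big)
\Big]\, d\nu(z),
\qquad
D_{\varphi}(\alpha\,\|\,\beta) = \int \varphi\!\Big(\frac{d\alpha}{d\beta}\Big)\, d\beta .
```

Letting $\tau_1,\tau_2\to\infty$ formally recovers the hard marginal constraints of balanced COT, while finite weights allow mass in outlying regions to be discounted rather than transported.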

Causal Conditional OT: When the "conditioning" variable is time or a filtration, the causal optimal transport framework restricts couplings to be non-anticipative (preserving adaptedness), enabling principled distances and generative models for sequential data (Xu et al., 2020, Xu et al., 2021).
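The non-anticipativity requirement is commonly stated as the following condition on a coupling $\pi$ of two sequences $x_{1:T}$ and $y_{1:T}$; the indexing convention is assumed here for illustration.

```latex
\pi\big(dy_t \,\big|\, dx_{1:T}\big) \;=\; \pi\big(dy_t \,\big|\, dx_{1:t}\big)
\qquad \text{for all } t = 1,\dots,T-1 .
```

In words, the target value at time $t$ may depend on the source path up to time $t$ but not on its future.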

Non-Euclidean and Geometric Extensions: COT has been extended to analysis on non-Euclidean manifolds (e.g., circular optimal transport), with definitions of COT as geodesic minimizations in the relevant geometry, often allowing for efficient closed-form or linearized solutions (Martin et al., 2023).

5. Applications in Learning, Inference, and Domain Adaptation

COT is foundational to a range of applications across domains:

Conditional Generative Modeling: COT-based flows and adversarial networks enable conditional density estimation, conditional sample generation, and simulation-to-simulation transfer in physics and imaging (Tabak et al., 2019, Zeghal et al., 28 Oct 2025, Kerrigan et al., 2024).
Bayesian Inverse Problems: Block-triangular Monge maps learned via COT characterize posterior distributions conditionally on observations, with amortized inference (i.e., sample once, predict for any $z$) (Hosseini et al., 2023, Wang et al., 2023).
Causal Inference: Covariate-assisted partial identification bounds for potential outcomes can be characterized exactly via COT, with direct estimators achieving optimality and robust performance in finite samples (Lin et al., 30 May 2025).
Domain Adaptation and Barycenters: COT provides a principled approach for feature and distribution alignment, barycenter computation, domain adaptation, and latent-variable discovery, all resting on conditional distance geometry (Yang et al., 2019).
Prompt Learning and Few-Shot Classification: Relaxed COT formulations with sample-based penalties improve prompt-tuning and domain adaptation by enforcing slice-wise alignment between auxiliary and main features (Manupriya et al., 2023).
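The amortization idea behind block-triangular maps can be illustrated in the one case where the map is available in closed form: a bivariate Gaussian over $(z, y)$ with a standard normal reference. This is a minimal sketch under those assumptions, with hypothetical names; learned maps in the cited works are nonlinear and fitted from samples.

```python
import numpy as np

def gaussian_conditional_map(mean, cov):
    """Return T(z, x) = (z, m(z) + s * x) for a bivariate Gaussian over (z, y).

    For fixed z, x -> m(z) + s * x is increasing, hence it is the 1D optimal
    transport map from N(0, 1) to the Gaussian conditional law of y given z.
    """
    mz, my = mean
    szz, szy, syy = cov[0][0], cov[0][1], cov[1][1]
    slope = szy / szz                     # regression coefficient of y on z
    s = np.sqrt(syy - szy ** 2 / szz)     # conditional standard deviation

    def T(z, x):
        return z, my + slope * (z - mz) + s * x

    return T

mean = [1.0, -2.0]
cov = [[2.0, 0.8], [0.8, 1.0]]
T = gaussian_conditional_map(mean, cov)

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)   # reference samples, drawn once
_, y_given_z = T(3.0, x)           # reused for any conditioning value z
```

The same reference draw `x` can be pushed through `T` for any other value of $z$ without refitting, which is the sense in which inference is amortized. For $z = 3$ the conditional mean is $-1.2$ and the conditional variance is $0.68$, which the samples should reproduce up to Monte Carlo error.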

In the cited studies, COT-based models are reported to outperform marginal or unconditional OT in scenarios with structured covariate variability or sample imbalance (Tabak et al., 2019, Lin et al., 30 May 2025, Manupriya et al., 2023).

6. Limitations, Open Problems, and Extensions

While COT provides a powerful and flexible framework, several limitations and areas for further investigation remain:

Finite-Sample Analysis: General closed-form error bounds for high-dimensional, continuous conditional OT remain elusive; most consistency results apply under strong smoothness or regularity assumptions (Tabak et al., 2019, Lin et al., 30 May 2025).
Model Selection and Numerical Sensitivity: Performance depends on choices of kernel bandwidth, divergence penalty, map/test-function parameterization, and optimization heuristics; poorly tuned hyperparameters can lead to misestimation or instability (Tabak et al., 2019, Yoon et al., 7 Mar 2026).
Unbalanced and Latent Conditioning: Directions for future work include maximal efficiency in the unbalanced and latent-covariate settings, stronger robustness to misspecification, and efficient computation over large families of conditionals (Yoon et al., 7 Mar 2026, Tabak et al., 2019).
Structure-Preserving Flows and Metric Geometry: Extensions to general metric measure spaces, non-Euclidean geometries, and more exotic conditional dependencies (graphical, topological) are active research areas (Martin et al., 2023, Barboni et al., 2024).
Theory-Practice Gap: Practical performance hinges on the interface between theory (metric properties, existence/uniqueness, convergence) and empirical algorithm design (neural architectures, flow-matching, adversarial games), with ongoing need for robust unification.

Overall, the Conditional Optimal Transport metric and its algorithmic realizations underpin a powerful class of statistical, computational, and geometric methods for aligning, interpolating, and transforming structured distributions in modern data science (Tabak et al., 2019, Barboni et al., 2024, Yoon et al., 7 Mar 2026).