Partial Optimal Transport
- Partial Optimal Transport is a framework that generalizes classical optimal transport by allowing only a prescribed mass to be transported under relaxed marginal constraints.
- It employs methodologies such as linear programming and entropic regularization (Sinkhorn scaling) to efficiently solve the mass-constrained optimization problems.
- Applications span domain adaptation, clustering under imbalance, and scientific feature tracking, offering robust solutions in machine learning and computer vision.
Partial Optimal Transport (POT) extends classical optimal transport by permitting only a prescribed mass fraction to be transported, under marginal constraints that allow for surplus or deficit. This framework generalizes balanced OT and unbalanced OT, enabling flexible matching between distributions of unequal mass. POT is now a core tool in mathematical analysis, computational optimal transport, machine learning, computer vision, and statistics, with broad applications including domain adaptation, shape and color matching, clustering under imbalance, and scientific feature tracking. The field encompasses two-marginal, multi-marginal, and generalized settings, with algorithms spanning linear programming, scaling methods, and first-order convex optimization.
1. Mathematical Formulation: Primal, Dual, and Variants
In the two-marginal case, given nonnegative measures $\mu$ and $\nu$ on ground spaces $X$ and $Y$ and a transported mass $m \le \min(\mu(X), \nu(Y))$, the primal Kantorovich problem for POT seeks a coupling $\pi$ on $X \times Y$ satisfying
$$\min_{\pi \ge 0} \int_{X \times Y} c(x, y)\, d\pi(x, y) \quad \text{s.t.} \quad \pi_1 \le \mu, \quad \pi_2 \le \nu, \quad \pi(X \times Y) = m,$$
where $\pi_1, \pi_2$ are the marginals of $\pi$ and $c$ is a ground cost (Bai et al., 2022). In the discrete case, this amounts to searching for a matrix $T \ge 0$ with $T\mathbf{1} \le a$, $T^\top\mathbf{1} \le b$, and $\mathbf{1}^\top T\mathbf{1} = m$ (Chapel et al., 2020).
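The discrete problem above is a linear program and can be solved directly with an off-the-shelf LP solver. The following is a minimal sketch (the function name and instance are illustrative, not from the cited works), assuming a dense cost matrix and SciPy's HiGHS backend:

```python
import numpy as np
from scipy.optimize import linprog

def partial_ot_lp(a, b, C, m):
    """Solve discrete partial OT as a linear program.

    min <C, T>  s.t.  T >= 0,  T @ 1 <= a,  T.T @ 1 <= b,  sum(T) = m.
    """
    n, k = C.shape
    # Row-sum inequalities T @ 1 <= a and column-sum inequalities T.T @ 1 <= b,
    # acting on the row-major flattening of T.
    A_ub = np.zeros((n + k, n * k))
    for i in range(n):
        A_ub[i, i * k:(i + 1) * k] = 1.0      # entries of row i
    for j in range(k):
        A_ub[n + j, j::k] = 1.0               # entries of column j
    b_ub = np.concatenate([a, b])
    # Total-mass equality: 1^T T 1 = m.
    A_eq = np.ones((1, n * k))
    res = linprog(C.ravel(), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=[m], bounds=(0, None), method="highs")
    return res.x.reshape(n, k), res.fun
```

The dense constraint matrix limits this sketch to small instances; the sparse LP formulations discussed below scale much further.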
In unbalanced and generalized settings, mass destruction and creation penalties appear:
$$\min_{\pi \ge 0} \int c\, d\pi + D_1(\pi_1 \mid \mu) + D_2(\pi_2 \mid \nu),$$
where $D_1, D_2$ may be total variation, Kullback–Leibler, or other divergence functionals (Bai, 9 Jul 2024). Generalized optimal partial transport (GOPT) covers this setting.
Multi-marginal POT involves measures $\mu_1, \dots, \mu_N$ and searches for a coupling $\pi$ on the product space $X_1 \times \cdots \times X_N$ with total mass $m$ and dominated marginals $\pi_i \le \mu_i$ (Kitagawa et al., 2014, Le et al., 2021). A sharp uniqueness condition for multi-marginal problems requires the transported mass $m$ to exceed a threshold determined by the overlap of the marginal masses.
Duality for POT involves dual potentials clipped by the penalty parameter $\lambda$:
$$\max_{\varphi, \psi} \int \varphi\, d\mu + \int \psi\, d\nu \quad \text{subject to} \quad \varphi(x) + \psi(y) \le c(x, y), \quad \varphi \le \lambda, \quad \psi \le \lambda$$
(Bai et al., 2022).
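In the discrete, mass-penalty (Lagrangian) form, the clipping can be read off from LP duality; the following identity is a sketch of that calculation, with $T$ the coupling matrix and $a, b$ the discrete marginals:

$$\min_{T \ge 0,\; T\mathbf{1} \le a,\; T^\top\mathbf{1} \le b} \langle C, T\rangle + \lambda\big(\mathbf{1}^\top a + \mathbf{1}^\top b - 2\,\mathbf{1}^\top T\mathbf{1}\big) \;=\; \max_{\substack{\varphi_i + \psi_j \le C_{ij} \\ \varphi \le \lambda,\; \psi \le \lambda}} \langle \varphi, a\rangle + \langle \psi, b\rangle.$$

The standard LP dual of the left-hand side has nonpositive potentials constrained by $C - 2\lambda$; the shift $\varphi \mapsto \varphi + \lambda$, $\psi \mapsto \psi + \lambda$ absorbs the constant term and produces the clipped constraints on the right.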
2. Computational Methods and Algorithmic Complexity
Linear programming formulations discretize the domain and solve sparse LPs with inequality constraints and a total mass equality (Oberman et al., 2015). Efficient grid refinement and support growth yield near-linear scaling, and barycentric projections recover approximate maps.
Entropic regularization via Sinkhorn-type scaling is standard for large-scale instances (Bai, 9 Jul 2024). For classical POT, the iterations alternate KL (prox-divide) projections onto the source and target marginal constraints and the total-mass constraint:
$$T \leftarrow \operatorname{diag}\!\big(\min(a \oslash T\mathbf{1}, \mathbf{1})\big)\, T, \qquad T \leftarrow T\, \operatorname{diag}\!\big(\min(b \oslash T^\top\mathbf{1}, \mathbf{1})\big), \qquad T \leftarrow \frac{m}{\mathbf{1}^\top T\mathbf{1}}\, T,$$
with initialization $T = e^{-C/\varepsilon}$ (entrywise; $\oslash$ denotes elementwise division). The inequality marginal constraints are thus enforced via elementwise clipping.
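The alternating projections above can be sketched in a few lines. This is a minimal illustration of the scaling scheme, not a production solver: it cycles plain KL projections, whereas a rigorous treatment of the inequality constraints uses Dykstra-style correction terms.

```python
import numpy as np

def entropic_partial_ot(a, b, C, m, eps=0.05, n_iter=2000):
    """Entropic partial OT by cyclic KL projections (sketch).

    Alternates projections onto {T @ 1 <= a}, {T.T @ 1 <= b} and
    {sum(T) = m}, starting from the Gibbs kernel exp(-C / eps).
    """
    T = np.exp(-C / eps)
    for _ in range(n_iter):
        # Clip row scalings so that T @ 1 <= a.
        T = np.minimum(a / T.sum(axis=1), 1.0)[:, None] * T
        # Clip column scalings so that T.T @ 1 <= b.
        T = T * np.minimum(b / T.sum(axis=0), 1.0)[None, :]
        # Rescale to the prescribed total mass m.
        T = T * (m / T.sum())
    return T
```

Smaller `eps` sharpens the plan toward the unregularized LP solution at the cost of slower convergence and potential numerical underflow in the Gibbs kernel.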
Recent work demonstrates that naïvely applying balanced Sinkhorn to an instance augmented with "dummy" points fails to satisfy the mass constraint exactly; specialized rounding procedures ("Round-POT") restore feasibility with provable complexity guarantees (Nguyen et al., 2023). First-order methods such as Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD) and Dual Extrapolation further improve the dependence on problem size and on the accuracy $\varepsilon$ of an $\varepsilon$-approximate solution.
Multi-marginal entropic regularization is addressed by tensor-scaling procedures, whose cost grows exponentially in the number of marginals $N$ (Le et al., 2021).
Specialized schemes exist for one-dimensional and semi-discrete settings, e.g., thickening in an auxiliary dimension for Newton stability and regularizing the dual problem, yielding quadratic convergence rates in the regularization parameter (Cances et al., 10 Sep 2025, Bai et al., 2022).
3. Analytic Structure and Geometric Properties
Geometric analysis of POT reveals that, for the quadratic cost, the set of transported (or "active") mass grows monotonically with the transported mass $m$, and the free boundaries separating transported from idle regions are regular (Caffarelli–McCann, Figalli). In higher dimensions and for non-convex domains, the free boundary is smooth except for finitely many singular points or corners (Chen et al., 2023).
Multi-marginal POT introduces failures of monotonicity: as the mass parameter increases, active marginals may fail to be nested, and supports may shrink or otherwise behave non-monotonically—phenomena not present in the classical two-marginal case (Kitagawa et al., 2014). In multi-marginal settings, optimal couplings are not necessarily concentrated on graphs, necessitating barycentric reformulations.
The barycenter equivalence recasts $N$-marginal POT as a minimization over a single measure (the "partial barycenter"), connected to optimal plans via the average mapping, with a sharp mass-overlap uniqueness criterion.
Generalized frameworks with functional penalties admit variable destruction/creation costs and local regularity in spatially inhomogeneous settings (Bai, 9 Jul 2024). Stability results, e.g., for Fused Gromov–Wasserstein distances between measure networks, tie changes in input features to bounded changes in the induced POT metric (Li et al., 2023).
4. Extensions and Applications
POT has been exploited in machine learning for domain adaptation, positive–unlabeled learning, color/shape adaptation, and clustering under severe class imbalance.
- Domain Adaptation: Partial Wasserstein distance is used for domain alignment terms in partial domain adaptation (PDA); generalization bounds are derived via PAC-Bayes theory, assigning weights to source samples via the optimal partial coupling (Naram et al., 3 Jun 2025).
- Positive–Unlabeled Learning: Partial Wasserstein and Gromov–Wasserstein distances provide sparse, exact matching between positive and unlabeled data, lifting previous limitations of entropic methods (Chapel et al., 2020).
- Clustering under Imbalance: Progressive partial transport (SP²OT, PROTOCOL) gradually relaxes mass constraints, avoids mode collapse, and learns semantic pseudo-labeling in deep clustering of imbalanced data (Zhang et al., 4 Apr 2024, Xue et al., 14 Jun 2025).
- Sliced Partial OT: High-dimensional measures are efficiently compared by slicing to one-dimensional marginals and assembling partial OT distances, yielding metrics and robust applications in point-cloud registration and color histogram transfer (Bai et al., 2022).
- Medical Imaging: In anomaly detection tasks (MADPOT), multi-prompt features are partially aligned to local image patches, isolating abnormal regions more robustly than balanced OT (Shiri et al., 9 Jul 2025).
- Scientific Feature Tracking: Partial Fused Gromov–Wasserstein distances enable probabilistic tracking of merge tree features in time-varying data (Li et al., 2023).
In computational practice, mini-batch POT stabilizes mapping for large-scale generative modeling and deep domain adaptation, while semi-discrete 1D solvers address risk modeling and crowd motion (Nguyen et al., 2021, Cances et al., 10 Sep 2025).
5. Theoretical Guarantees and Open Questions
Existence of minimizers in POT is assured by compactness and lower semicontinuity. Uniqueness requires sharp mass-overlap criteria; in multi-marginal settings, a sharp lower bound on the transported mass is both necessary and sufficient (Kitagawa et al., 2014). Metric properties for discrete and sliced POT have been established, with distances defined via transport cost plus mass penalties (Bai et al., 2022).
Stability bounds link the change in POT cost to input perturbations, and semi-metric properties are shown for partial Gromov–Wasserstein variants (Li et al., 2023). The quadratic convergence of regularized dual solvers in 1D semi-discrete cases is established rigorously (Cances et al., 10 Sep 2025).
Open problems include the regularity and geometric description of free boundaries in higher dimensions, extensions to general cost functions beyond the quadratic case, the numerical stability of multi-marginal partial barycenters, and scalable algorithms for large numbers of marginals $N$. Further work is needed to connect functionally penalized GOPT to new application domains and to analyze convergence rates of first-order and scaling methods in the presence of entropic and KL penalties.
6. Relationships and Contrasts with Classical OT
POT generalizes the classic Kantorovich formulation to supply/demand scenarios with inherent mass mismatch, relaxing marginal equality in favor of inequalities and a fixed total transported mass. When the total masses of $\mu$ and $\nu$ agree and $m$ equals this common mass, POT reduces to classical OT.
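This reduction is easy to verify numerically: with $m$ equal to the common total mass, the inequality constraints are forced to hold with equality, so the partial cost matches the balanced cost. The sketch below (helper name and instance are illustrative) solves both LPs with SciPy's HiGHS backend:

```python
import numpy as np
from scipy.optimize import linprog

def ot_cost(a, b, C, m=None):
    """Min-cost transport. m=None gives balanced OT (equality marginals);
    otherwise partial OT with T @ 1 <= a, T.T @ 1 <= b, sum(T) = m."""
    n, k = C.shape
    rows = np.zeros((n, n * k)); cols = np.zeros((k, n * k))
    for i in range(n): rows[i, i * k:(i + 1) * k] = 1.0
    for j in range(k): cols[j, j::k] = 1.0
    if m is None:  # balanced: marginals hold with equality
        res = linprog(C.ravel(), A_eq=np.vstack([rows, cols]),
                      b_eq=np.concatenate([a, b]),
                      bounds=(0, None), method="highs")
    else:          # partial: inequality marginals plus a total-mass equality
        res = linprog(C.ravel(), A_ub=np.vstack([rows, cols]),
                      b_ub=np.concatenate([a, b]),
                      A_eq=np.ones((1, n * k)), b_eq=[m],
                      bounds=(0, None), method="highs")
    return res.fun

rng = np.random.default_rng(0)
a = np.full(4, 0.25); b = np.full(4, 0.25)
C = rng.random((4, 4))
# Full-mass partial OT coincides with balanced OT.
assert abs(ot_cost(a, b, C) - ot_cost(a, b, C, m=1.0)) < 1e-8
```

Choosing $m$ strictly below the common mass instead leaves slack in the marginal inequalities, which is exactly how POT discards outlier mass.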
Balanced OT often forces arbitrary matches due to mass constraints, while POT flexibly ignores outlier mass and restricts transport to high-confidence matches. Multi-marginal and barycenter-based formulations further reveal structural differences; e.g., convexity, monotonicity, graph-concentration of plans, and boundary regularity observed in classical settings may fail for partial and multi-marginal generalizations (Kitagawa et al., 2014, Le et al., 2021, Chen et al., 2023).
Generalized optimal partial transport (GOPT) recovers classical OT under constant penalty and total variation choices, and all balanced OT LP solvers can be re-used through suitable augmentation (Bai, 9 Jul 2024).
In summary, Partial Optimal Transport formalizes the allocation of mass between distributions under partial matching and destruction/creation penalties, extending classical optimal transport theory and its algorithmic and analytic tools to a wide range of contemporary application domains (Kitagawa et al., 2014; Oberman et al., 2015; Chapel et al., 2020; Le et al., 2021; Nguyen et al., 2021; Bai et al., 2022; Li et al., 2023; Nguyen et al., 2023; Bai et al., 2023; Chen et al., 2023; Zhang et al., 4 Apr 2024; Bai, 9 Jul 2024; Naram et al., 3 Jun 2025; Xue et al., 14 Jun 2025; Shiri et al., 9 Jul 2025).