Optimal Transport Barycenters

Updated 7 June 2026

Optimal transport barycenters are defined as minimizers of a weighted sum of Wasserstein distances to a set of input measures, providing a geometric notion of averaging for probability distributions.
They are computed via methods like LP formulations, entropic regularization, and particle-based gradient flows, balancing precision with computational complexity.
Their applications in statistics, imaging, and sensor fusion highlight their practical importance in aggregating high-dimensional and structured data.

Optimal transport barycenters, commonly referred to as Wasserstein barycenters, generalize the notion of Euclidean averaging to the metric geometry of probability measures. They are central objects in statistics, machine learning, signal processing, imaging, and computational geometry due to their ability to produce geometrically meaningful averages of distributions under optimal transport costs. The study of optimal transport barycenters encompasses foundational theory, computational complexity, algorithm design, statistical analysis, and broad applications across high-dimensional data sciences.

1. Mathematical Foundations and General Formulation

Given probability measures $\mu_1, \ldots, \mu_k$ on a metric space $(X, d)$ and barycentric weights $\lambda_1, \ldots, \lambda_k \geq 0$ with $\sum_{i=1}^k \lambda_i = 1$, the $p$-Wasserstein barycenter is defined as the minimizer (if it exists) of

$\nu^* = \operatorname{argmin}_{\nu \in \mathcal{P}_p(X)} \sum_{i=1}^k \lambda_i\, W_p^p(\mu_i, \nu)$

where $W_p$ denotes the $p$-Wasserstein distance and $\mathcal{P}_p(X)$ the set of probability measures on $X$ with finite $p$-th moment. In the Euclidean quadratic case ($p=2$, $X = \mathbb{R}^d$), $W_2$ is induced by the minimal cost of moving mass between $\mu$ and $\nu$ under the cost $\|x - y\|^2$.
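As a concrete instance of the definition, the one-dimensional quadratic case admits a direct computation: for empirical measures with the same number of equally weighted atoms, the optimal map matches sorted atoms, so the barycenter atoms are weighted averages of the order statistics. The sketch below assumes exactly that setting; the function name `barycenter_1d` and the sample data are illustrative and do not come from the cited works.

```python
import numpy as np

def barycenter_1d(samples, weights):
    """W2 barycenter of 1D empirical measures with equally many uniform atoms.

    Assumes every input measure has n atoms of mass 1/n. The barycenter is
    then uniform on the weighted averages of the sorted atoms.
    """
    samples = np.asarray(samples, dtype=float)   # shape (k, n)
    weights = np.asarray(weights, dtype=float)   # shape (k,)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    sorted_atoms = np.sort(samples, axis=1)      # order statistics per measure
    return weights @ sorted_atoms                # barycenter atoms, shape (n,)

mu1 = [0.0, 1.0, 2.0]
mu2 = [10.0, 12.0, 11.0]
bary = barycenter_1d([mu1, mu2], [0.5, 0.5])
print(bary)  # [5. 6. 7.]
```

This shortcut is specific to one dimension; in higher dimensions no sorting argument is available and the methods discussed in later sections are needed.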

A critical reformulation expresses the problem as a multi-marginal optimal transport (MMOT) problem: for $\mu_1, \ldots, \mu_k$ on $\mathbb{R}^d$, the barycenter is the pushforward under the weighted mean map $T(x_1, \ldots, x_k) = \sum_{i=1}^k \lambda_i x_i$ of an optimal coupling $\gamma$ between the $\mu_i$ with cost function $c(x_1, \ldots, x_k) = \sum_{i=1}^k \lambda_i \|x_i - T(x_1, \ldots, x_k)\|^2$ (Friesecke et al., 2022).
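For reference, the multi-marginal reformulation in the quadratic Euclidean case can be written out as follows. The symbols $\Pi$, $\gamma$, $T$, and $c$ are notation chosen here for illustration and may differ from the cited sources; the second equality in the cost follows by expanding the squares and using $\sum_i \lambda_i = 1$.

```latex
\begin{aligned}
\gamma^* &\in \operatorname{argmin}_{\gamma \in \Pi(\mu_1, \ldots, \mu_k)}
  \int_{(\mathbb{R}^d)^k} c(x_1, \ldots, x_k)\, d\gamma(x_1, \ldots, x_k), \\
T(x_1, \ldots, x_k) &= \sum_{i=1}^k \lambda_i x_i, \\
c(x_1, \ldots, x_k) &= \sum_{i=1}^k \lambda_i \,\bigl\| x_i - T(x_1, \ldots, x_k) \bigr\|^2
  = \sum_{1 \le i < j \le k} \lambda_i \lambda_j \,\| x_i - x_j \|^2, \\
\nu^* &= T_{\#} \gamma^* .
\end{aligned}
```

Here $\Pi(\mu_1, \ldots, \mu_k)$ denotes the set of couplings with marginals $\mu_1, \ldots, \mu_k$, and $T_{\#}$ denotes the pushforward under $T$.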

Existence of minimizers holds under mild compactness or moment conditions. In the discrete case, barycenters are always supported on finitely many points; for continuous measures, further analytic structure emerges (Anderes et al., 2015).

2. Computational Complexity and Hardness

Wasserstein barycenter computation is fundamentally harder than two-marginal OT. Altschuler and Boix-Adserà established that even for discrete measures with uniform weights and binary supports, computing the 2-Wasserstein barycenter is NP-hard in the ambient dimension $d$; moreover, it remains NP-hard to approximate the value to any additive error $\varepsilon$ in time polynomial in the number of measures $k$, the support size $n$, the dimension $d$, and the inverse accuracy $1/\varepsilon$ unless NP $\subseteq$ BPP (Altschuler et al., 2021). The curse of dimensionality is thus inherent: all known algorithms for exact computation scale exponentially in $d$ (or in the support size $n$).

This hardness extends to generalized barycenters with arbitrary exponent $p$, and to both weighted and unweighted cases. Single-pair (two-marginal) OT suffers no such curse: network-flow or LP solutions exist in time polynomial in the support size $n$ and the dimension $d$ (Altschuler et al., 2021).
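To make the contrast concrete, the two-marginal problem between discrete measures is a linear program with one variable per pair of support points. The following is a minimal sketch using `scipy.optimize.linprog`; the helper name `ot_lp` and the toy data are illustrative assumptions, and dedicated network-flow solvers are considerably faster in practice.

```python
import numpy as np
from scipy.optimize import linprog

def ot_lp(a, b, C):
    """Solve min <C, P> subject to P 1 = a, P^T 1 = b, P >= 0."""
    a, b, C = np.asarray(a, float), np.asarray(b, float), np.asarray(C, float)
    n, m = C.shape
    # P is flattened row-major, so entry (i, j) sits at index i * m + j.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))   # sum_j P[i, j] = a[i]
    col_sums = np.kron(np.ones((1, n)), np.eye(m))   # sum_i P[i, j] = b[j]
    res = linprog(
        C.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, res.x.reshape(n, m)

x = np.array([0.0, 1.0])
y = np.array([0.0, 2.0])
C = (x[:, None] - y[None, :]) ** 2          # squared Euclidean cost
cost, plan = ot_lp([0.5, 0.5], [0.5, 0.5], C)
print(cost)   # squared 2-Wasserstein distance between the two measures
```

The LP has $n \cdot m$ variables and $n + m$ equality constraints, which is why the two-marginal case stays polynomial while the multi-marginal analogue does not.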

Approximation results are similarly limited. Algorithms based on restricting the barycenter to the union of supports of the $\mu_i$ achieve a 2-approximation in polynomial time (Borgwardt, 2017), but improving the approximation factor below 2 is computationally intractable in high dimension under standard complexity assumptions.

3. Discrete and Regularized Barycenter Computation

Exact and Approximate LP Formulations

When marginals are discrete, all barycenters are also discrete with support contained in the set of all possible barycentric means over tuples, i.e., $\left\{ \sum_{i=1}^k \lambda_i x_i : x_i \in \operatorname{supp}(\mu_i) \right\}$ (Anderes et al., 2015). The exact LP formulation, however, is of size exponential in the number of marginals, on the order of $n^k$ for $k$ marginals with $n$ support points each, which is infeasible for large $k$ or $n$ (Borgwardt et al., 2018). Improved LP models reduce the number of variables, leveraging mass-splitting properties and combinatorial constraints, but still require exponential time in general.
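The growth of the candidate support can be seen by enumerating it directly. The sketch below lists every weighted mean obtained by picking one support point from each marginal; the function name `candidate_support` and the toy supports are illustrative. The number of tuples is the product of the support sizes, which is what makes the exact formulation impractical beyond small instances.

```python
import itertools
import numpy as np

def candidate_support(supports, weights):
    """All weighted means sum_i w_i x_i with one point x_i from each support.

    Each element of `supports` is an array of shape (n_i, d). The result has
    one row per tuple, i.e. prod(n_i) rows (duplicates are not removed).
    """
    weights = np.asarray(weights, dtype=float)
    supports = [np.asarray(s, dtype=float) for s in supports]
    means = [
        sum(w * x for w, x in zip(weights, combo))
        for combo in itertools.product(*supports)   # one point per marginal
    ]
    return np.array(means)

supports = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 2.0]],
]
cands = candidate_support(supports, [1 / 3, 1 / 3, 1 / 3])
print(cands.shape)  # (8, 2): 2 * 2 * 2 tuples in the plane
```

With $k$ marginals of $n$ points each the enumeration has $n^k$ rows, so doubling $k$ squares the size of the candidate set.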

Borgwardt (2017) demonstrated that restricting to the union of supports yields a strongly-polynomial 2-approximation LP; with further iterative refinement (Algorithm 3 of that paper), a sparse barycenter with a non-mass-splitting transport is recovered in practice in minutes on standard hardware, albeit with a worst-case error factor of 2.
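The restriction to the union of supports can be illustrated with a generic fixed-support barycenter LP. This is a sketch under stated assumptions, not necessarily the formulation used by Borgwardt (2017): it jointly optimizes the barycenter masses on the candidate atoms and one transport plan per input measure, with squared Euclidean cost. The function name `fixed_support_barycenter` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def fixed_support_barycenter(supports, masses, weights):
    """Barycenter LP with atoms restricted to the union of input supports."""
    supports = [np.asarray(s, float) for s in supports]   # each (n_i, d)
    masses = [np.asarray(a, float) for a in masses]       # each (n_i,)
    S = np.unique(np.vstack(supports), axis=0)            # candidate atoms (m, d)
    m = len(S)
    sizes = [len(X) * m for X in supports]
    n_var = m + sum(sizes)        # layout: [nu, plan_1, ..., plan_k]
    cost = np.zeros(n_var)
    A_eq, b_eq = [], []
    offset = m
    for X, a, lam, size in zip(supports, masses, weights, sizes):
        n = len(X)
        C = ((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
        cost[offset:offset + size] = lam * C.ravel()
        rows = np.zeros((n, n_var))                       # plan_i 1 = a_i
        rows[:, offset:offset + size] = np.kron(np.eye(n), np.ones((1, m)))
        cols = np.zeros((m, n_var))                       # plan_i^T 1 - nu = 0
        cols[:, offset:offset + size] = np.kron(np.ones((1, n)), np.eye(m))
        cols[:, :m] = -np.eye(m)
        A_eq += [rows, cols]
        b_eq += [a, np.zeros(m)]
        offset += size
    res = linprog(cost, A_eq=np.vstack(A_eq), b_eq=np.concatenate(b_eq),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return S, res.x[:m], res.fun

# Two Dirac masses at 0 and 2 with equal weights.
S, nu, value = fixed_support_barycenter([[[0.0]], [[2.0]]], [[1.0], [1.0]], [0.5, 0.5])
print(value)  # objective of the restricted problem
```

In this toy case the unrestricted barycenter is the Dirac mass at the midpoint 1, with objective value 1, whereas any measure supported on the union {0, 2} has objective value 2, so the factor of 2 is attained here.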

Entropic regularization (Sinkhorn) allows scalable approximate computation by smoothing the transport problem; however, this introduces blur and destroys sparsity and non-mass-splitting structure. Sliced Wasserstein and regularized barycenters offer different bias-variance tradeoffs (Li et al., 2020, Portales et al., 27 May 2025).
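A common way to compute entropic barycenters on a fixed grid is by iterative Bregman projections, which alternate Sinkhorn-style scaling updates with a weighted geometric mean of the current marginals. The sketch below assumes all histograms live on one shared grid with a single cost matrix; the function name, the regularization value `eps`, and the test histograms are illustrative choices, and no log-domain stabilization is included, so very small `eps` will underflow.

```python
import numpy as np

def entropic_barycenter(A, C, weights, eps=0.02, n_iter=300):
    """Entropic barycenter of histograms A (k, n) on a shared grid with cost C (n, n)."""
    A = np.asarray(A, dtype=float)
    weights = np.asarray(weights, dtype=float)
    K = np.exp(-np.asarray(C, dtype=float) / eps)   # Gibbs kernel
    V = np.ones_like(A)                             # scalings on the barycenter side
    for _ in range(n_iter):
        U = A / (V @ K.T)                # match the input marginals a_i
        KtU = U @ K                      # row i holds K^T u_i
        # Weighted geometric mean of the current second marginals v_i * K^T u_i.
        b = np.exp(weights @ np.log(V * KtU))
        V = b / KtU                      # force every plan to share marginal b
    return b

x = np.linspace(0.0, 1.0, 51)
C = (x[:, None] - x[None, :]) ** 2

def bump(center, width=0.08):
    h = np.exp(-0.5 * ((x - center) / width) ** 2)
    return h / h.sum()

A = np.stack([bump(0.25), bump(0.75)])
b = entropic_barycenter(A, C, [0.5, 0.5])
print(x[np.argmax(b)])  # location of the barycenter's peak on the grid
```

The result is a smooth histogram whose width grows with `eps`, which is the blur referred to above; reducing `eps` sharpens the output at the cost of slower convergence and numerical instability.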

Stochastic and Particle-based Methods

Alternatives to large-scale LPs include stochastic optimization and particle-based gradient flows. The stochastic semidiscrete method of Claici–Chien–Solomon iteratively updates discrete barycenter support locations by alternating dual optimization and support adjustments, enabling computation without regularization while maintaining sharp barycenter structure (Claici et al., 2018).

Geometry-aware particle-flow algorithms interpret the barycenter functional as a gradient flow in Wasserstein space, advecting particles along averaged OT displacement fields and using Kantorovich barycentric projections when Monge maps do not exist (You, 14 Sep 2025). This approach preserves atom sharpness, avoids entropic blur, and admits convergence and stability guarantees, with computational complexity linear in the number of input distributions and particles per iteration.
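The idea of advecting particles along averaged displacement fields can be sketched in a simplified setting. The code below is not the algorithm of You (2025): it assumes every input measure has the same number of equally weighted atoms, so that an optimal plan can be taken as a permutation and found with `scipy.optimize.linear_sum_assignment`, which removes the need for barycentric projections. The function name, step size, and initialization are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def particle_flow_barycenter(measures, weights, n_steps=50, step=0.5, seed=0):
    """Move particles along the weighted average of OT displacements."""
    measures = [np.asarray(m, dtype=float) for m in measures]   # each (n, d)
    n, d = measures[0].shape
    rng = np.random.default_rng(seed)
    # Start from a slightly perturbed copy of the first measure.
    Y = measures[0] + 0.01 * rng.standard_normal((n, d))
    for _ in range(n_steps):
        drift = np.zeros_like(Y)
        for X, lam in zip(measures, weights):
            C = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
            _, cols = linear_sum_assignment(C)   # particle j is matched to X[cols[j]]
            drift += lam * (X[cols] - Y)         # displacement toward this measure
        Y = Y + step * drift                     # explicit step along the averaged field
    return Y

X1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X2 = X1 + np.array([2.0, 4.0])                   # a translate of X1
Y = particle_flow_barycenter([X1, X2], [0.5, 0.5])
print(Y)  # particles close to X1 shifted by (1, 2)
```

Each iteration solves one assignment problem per input measure, so the per-iteration cost grows linearly with the number of inputs; the output stays a set of sharp atoms because no regularization is involved.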

Column generation and MMOT reduction algorithms exploit the sparsity of optimal transport plans for mesh-free computation of high-dimensional barycenters, as in the GenCol algorithm, which efficiently grows active supports via genetic mutation and pricing procedures (Friesecke et al., 2022).

4. High-Dimensional and Generative Approaches

Conventional LP and grid-based OT barycenter solvers are infeasible in high dimensions. Conditional normalizing flows have enabled scalable computation of Monge maps and barycenters in $\mathbb{R}^d$ by directly parameterizing invertible transport maps from a shared latent Gaussian to the input measures, using a variance minimization objective for the barycenter (Visentin et al., 28 May 2025). This methodology supports several hundred input measures, provides generative barycenters, and demonstrates stable error scaling as the dimension $d$ grows.

Deep convolutional neural network (CNN) surrogates have shown success in learning barycenter operators for high-resolution images, with inference times two orders of magnitude faster than classic solvers. These models, trained on pairs of barycenters with Sinkhorn-type losses, generalize to arbitrary numbers of input measures and perform well in large-scale sketch interpolation and color transfer (Lacombe et al., 2021).

Additionally, weak OT and energy-based model approaches reframe the barycenter problem as bi-level or single-player optimization, yielding scalable algorithms for continuous and regularized barycenters with non-Euclidean or manifold-constrained costs (Kolesov et al., 2023, Kolesov et al., 2024).

5. Statistical and Structural Properties

Barycenter estimation from empirical measures has been analyzed under finite-sample regimes. For sparse-support barycenters with a bounded number of atoms and a finite number of samples per input, the uniform statistical excess risk admits a bound that holds for Wasserstein, Sinkhorn, and sliced Wasserstein divergences and is independent of the ambient dimension $d$ (Portales et al., 27 May 2025). This provides rigorous guidance on atom count selection in relation to sample size, balancing bias and variance.

In special geometric settings, such as measures supported on trees or directed graphs, modified OT metrics (e.g., layerwise-Wasserstein) better preserve structural features under barycentric interpolation, guaranteeing that barycenter supports remain dendritic and convex (Kim et al., 2019).

Generalization to Riemannian and fiber-bundle settings is realized via disintegrated Monge–Kantorovich metrics, yielding existence and uniqueness of barycenters on arbitrary connected, complete manifolds—removing restrictions such as non-branching or absence of cut loci (Kitagawa et al., 21 Jan 2026).

6. Robustness, Extensions, and Emerging Directions

Robust barycenter estimation, resilient to outliers and class imbalance, can be formulated via semi-unbalanced optimal transport, relaxing marginal constraints with $f$-divergence penalties and enabling adversarial training on neural potentials (Gazdieva et al., 2024). Such methods interpolate between strict OT and classically robust means.

Linearized Wasserstein barycenters (LBCM) offer closed-form solutions in compatible measure families—especially in 1D, where the LBCM is weakly dense in the space of probability measures—and exhibit promising applications in covariance estimation and imputation (Werenski et al., 2024).

Emerging themes include the fusion of OT barycenter theory with deep generative models, manifold learning, non-Euclidean geometry, and scalable optimization in high-dimensional or sample-limited settings (Kim et al., 24 Jan 2025, Visentin et al., 28 May 2025, You, 14 Sep 2025, Kolesov et al., 2023, Gazdieva et al., 2024). Adapting methods to general costs, regularization types, and geometric constraints remains a central research frontier.

7. Applications and Practical Considerations

Optimal transport barycenters are used for the aggregation and averaging of distributions in fields ranging from Bayesian posterior fusion, image and signal archetyping, and distributed clustering, to sensor fusion under systematic noise (Liu et al., 5 Feb 2026, You, 14 Sep 2025). Applications exploit properties such as preservation of sharp features, interpretability of atomic supports, and adaptability to compositional, high-dimensional, or structured data.

Major computational trade-offs concern accuracy versus tractability: unregularized methods preserve sharpness but scale poorly, while regularized or neural methods scale but may obscure fine structure or introduce bias. Exploiting structure (e.g., parametric families, low-dimensional support, manifold constraints) is essential for scalable, interpretable barycenter computation in modern data regimes (Altschuler et al., 2021). Practitioners are advised to account for inherent computational hardness and to select algorithms aligned with the statistical, geometric, and performance requirements of their domain.

Representative References:

"Wasserstein barycenters are NP-hard to compute" (Altschuler et al., 2021)
"Discrete Wasserstein Barycenters: Optimal Transport for Discrete Data" (Anderes et al., 2015)
"An LP-based, Strongly-Polynomial 2-Approximation Algorithm for Sparse Wasserstein Barycenters" (Borgwardt, 2017)
"Approximative Algorithms for Multi-Marginal Optimal Transport and Free-Support Wasserstein Barycenters" (Lindheim, 2022)
"A Particle-Flow Algorithm for Free-Support Wasserstein Barycenters" (You, 14 Sep 2025)
"Computing Optimal Transport Maps and Wasserstein Barycenters Using Conditional Normalizing Flows" (Visentin et al., 28 May 2025)
"Continuous Regularized Wasserstein Barycenters" (Li et al., 2020)
"Robust Barycenter Estimation using Semi-Unbalanced Neural Optimal Transport" (Gazdieva et al., 2024)