
Wasserstein Barycentres in Optimal Transport

Updated 21 December 2025
  • Wasserstein barycentres are the Fréchet means of probability measures computed using Wasserstein distance, preserving geometric structure unlike classical averages.
  • Discrete, semi-discrete, and regularized algorithms provide convergent and scalable computation, even in high-dimensional settings.
  • Their application enhances Bayesian aggregation, model ensembling, and domain adaptation by integrating geometric properties for robust inference.

A Wasserstein barycentre is the Fréchet mean of a collection of probability measures with respect to the Wasserstein (optimal transport) distance. Given measures $\{\mu_i\}_{i=1}^m$ on a metric space $(\mathcal X, d)$ and weights $\{\lambda_i\}$, the barycentre $\mu^*$ minimizes the weighted sum of squared Wasserstein distances: $\mu^* = \arg\min_{\mu} \sum_i \lambda_i W_2^2(\mu, \mu_i)$. This aggregation preserves geometric properties of distributions in ways classical (e.g., Euclidean or KL) averaging does not. The concept of the Wasserstein barycentre underpins key developments across statistics, machine learning, large-scale inference, and optimal transport theory.
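
In one dimension the $W_2$ barycentre is available in closed form: its quantile function is the weighted average of the input quantile functions. A minimal numpy sketch of this special case (function and variable names are illustrative):

```python
import numpy as np

def barycentre_1d(samples, weights, n_quantiles=200):
    """W2 barycentre of 1-D empirical measures via quantile averaging:
    in 1-D the barycentre's quantile function is the weighted average
    of the input quantile functions."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    # Quantile function of each empirical measure on a common grid.
    quantile_fns = np.stack([np.quantile(s, qs) for s in samples])
    # The weighted average is automatically nondecreasing for
    # nonnegative weights, so it is a valid quantile function.
    return (np.asarray(weights)[:, None] * quantile_fns).sum(axis=0)

# Example: the barycentre of N(-2, 1) and N(3, 4) samples is centred
# near 0.5 = 0.5 * (-2) + 0.5 * 3.
rng = np.random.default_rng(0)
bary_q = barycentre_1d([rng.normal(-2, 1, 5000), rng.normal(3, 2, 5000)],
                       weights=[0.5, 0.5])
print(bary_q[len(bary_q) // 2])  # approximate median of the barycentre
```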

1. Mathematical Definition and Existence

Formally, let $(E,d)$ be a separable, locally compact, geodesic metric space, and $p \geq 1$. The $p$-Wasserstein distance between measures $\mu,\nu \in \mathcal{W}_p(E)$ is

$$W_p(\mu, \nu) = \bigg[ \inf_{\pi\in\Gamma(\mu,\nu)} \int_{E\times E} d(x,y)^p \, d\pi(x, y) \bigg]^{1/p}$$

where $\Gamma(\mu, \nu)$ is the set of couplings with marginals $\mu$ and $\nu$.

Given a collection $\{\mu_i\}_{i=1}^m$ and weights $\lambda_i \geq 0$ with $\sum_i \lambda_i = 1$, the $p$-Wasserstein barycentre is any minimizer

$$\mu^* \in \arg\min_{\mu\in \mathcal{W}_p(E)} \sum_{i=1}^m \lambda_i\, W_p^p(\mu, \mu_i).$$

For empirical or population versions, this minimization generalizes to an expectation over random measures. Existence of a barycentre is guaranteed under broad conditions, e.g., if $(E,d)$ is geodesic and locally compact and $\mathbb{P}\in \mathcal W_p(\mathcal W_p(E))$ (Gouic et al., 2015).

Uniqueness is not guaranteed in full generality: it holds when $E$ is a non-positively curved (NPC) space, or in $\mathbb R^d$ for $p=2$ if at least one measure is absolutely continuous, or, more generally, if the measures do not concentrate on small sets (Gouic et al., 2015). In Riemannian or manifold settings, additional structure and curvature conditions yield existence and uniqueness, and the barycentre inherits absolute continuity from the marginals (Kim et al., 2014).

2. Algorithmic Approaches

Discrete and Semi-Discrete Methods

Discrete barycentres arise when all measures are supported on finite sets (Anderes et al., 2015). The barycentre support is itself discrete, contained within the set of all centroids formed by taking one support point from each marginal. The barycentre is recovered as the solution to a large-scale multi-marginal linear program (LP), for which specialized LP solvers or block-wise ADMM methods provide globally, and in some cases linearly, convergent algorithms (Yang et al., 2018).
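
For the quadratic cost, a standard reformulation (in the style of Agueh and Carlier) makes this LP structure explicit: the barycentre is the push-forward, under the weighted-centroid map, of an optimal multi-marginal coupling,

$$\min_{\pi \in \Gamma(\mu_1,\dots,\mu_m)} \int \sum_{i=1}^m \lambda_i \left\| x_i - \bar x(x_1,\dots,x_m) \right\|^2 d\pi, \qquad \bar x(x_1,\dots,x_m) = \sum_{i=1}^m \lambda_i x_i.$$

With finitely supported marginals, $\pi$ is a nonnegative tensor with fixed marginals and the objective is linear in $\pi$, hence an LP.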

For continuous measures, semi-discrete schemes restrict the barycentre to a fixed set of support points while leaving the inputs unconstrained. The central optimization is cast via saddle-point duality in the space of weights and dual OT potentials. Parallel, streaming stochastic gradient methods operate on dual variables associated with each input, using only samples from the inputs, and produce a barycentre that can track non-stationary input distributions. These methods admit explicit error guarantees of $O(n^{-1/d})$ in the number of barycentre support points $n$ and $O(T^{-1/2})$ in the number of stochastic gradient steps $T$ (Staib et al., 2017).
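
The core building block is stochastic ascent on the semi-discrete OT dual; a minimal single-input sketch (the full scheme of Staib et al. runs one such ascent per input in parallel; all names are illustrative):

```python
import numpy as np

def semidiscrete_dual_sga(sample_input, Y, w, steps=20000, lr=0.5):
    """Stochastic ascent on the semi-discrete OT dual.

    sample_input: callable returning one draw x ~ mu (continuous input)
    Y: (n, d) fixed support points of the discrete side
    w: (n,) target weights on Y

    The dual gradient at v is w - mu(Laguerre cells), estimated
    unbiasedly by w minus the indicator of the cell containing x.
    """
    w = np.asarray(w, dtype=float)
    v = np.zeros(len(Y))
    for t in range(1, steps + 1):
        x = sample_input()
        # Laguerre cell assignment: argmin_j ||x - y_j||^2 - v_j
        j = np.argmin(((Y - x) ** 2).sum(axis=1) - v)
        g = w.copy()
        g[j] -= 1.0                       # unbiased gradient estimate
        v += (lr / np.sqrt(t)) * g        # Robbins-Monro step size
    return v
```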

Table: Discrete and Semi-Discrete Barycentre LPs

| Setting | Barycentre support | Optimization form |
| --- | --- | --- |
| All marginals discrete | Set of all centroids $\sum_i x_{i,k_i}/N$ | Multi-marginal LP (Anderes et al., 2015) |
| Semi-discrete | Barycentre on $n$ points, inputs arbitrary | Saddle-point with sampled OT duals (Staib et al., 2017) |

Regularized and Continuous Approaches

Regularization, most often entropic, enables efficient Sinkhorn-type scaling algorithms for the barycentre problem and stabilizes the numerics in high dimensions (Li et al., 2020, Dognin et al., 2019). Dual formulations of the regularized barycentre admit SGD or primal-dual methods, leveraging closed-form gradient oracles for the duals. This strategy scales to continuous input distributions without explicitly discretizing the inputs (Li et al., 2020); smoothness of the barycentre is maintained, at the cost of bias introduced by the regularization.
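
As a concrete instance, here is a minimal numpy sketch of the fixed-support entropic barycentre computed by Sinkhorn-like scaling iterations (iterative Bregman projections); names are illustrative, and the inputs are histograms on a shared grid:

```python
import numpy as np

def sinkhorn_barycentre(A, C, lambdas, eps=0.05, n_iter=500):
    """Entropic barycentre of histograms on a shared grid, via
    Sinkhorn-like scaling (iterative Bregman projections).

    A: (m, n) input histograms, one row per measure
    C: (n, n) ground cost matrix on the grid
    lambdas: (m,) barycentric weights summing to 1
    Small eps may require log-domain stabilization in practice.
    """
    K = np.exp(-C / eps)                         # Gibbs kernel
    lambdas = np.asarray(lambdas)[:, None]
    V = np.ones_like(A)
    for _ in range(n_iter):
        U = A / (K @ V.T).T                      # match input marginals
        KtU = (K.T @ U.T).T                      # (m, n)
        P = np.prod((V * KtU) ** lambdas, axis=0)  # geometric mean step
        V = P[None, :] / KtU                     # match barycentre marginal
    return P
```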

Non-entropic convex regularization (e.g., quadratic or Bregman) gives rise to alternative sample complexity and computational guarantees in empirical barycentre estimation settings (Dvinskikh, 2021).

For high-dimensional problems, projection-robust barycentre models project measures into lower-dimensional subspaces maximizing the barycentric objective, and then compute barycentres in these subspaces, reducing the effective sample and computational complexity (Huang et al., 2021).

3. Statistical Properties and Stability

Wasserstein barycentres exhibit strong statistical properties, admitting consistency results: empirical barycentres converge, in Wasserstein distance, to the population barycentre as the number of measures or data per measure increases (Gouic et al., 2015). Explicit concentration and stability rates are available under density and regularity assumptions; for example, if marginal distributions are close in $W_2$, their barycentres are close (with Hölder exponent $1/6$ for $p=2$) (Carlier et al., 2022):

$$W_2(\mu_{P},\mu_{Q}) \leq \Bigl(\tfrac{C}{\alpha}\Bigr)^{1/6} \sum_{i} w_{i}\, W_2(\mu_i,\nu_i)^{1/6}$$

where $\alpha$ is the minimal weight of the regular (well-behaved) marginals.

Approximate or regularized barycentres admit bias bounds: an entropic penalty $\lambda$ yields $W_{2}(\mu_P^\lambda,\mu_P^0)\lesssim \lambda^{1/6}$ (Carlier et al., 2022).

Population barycentres minimize expected $W_p^p$ distances with respect to a law $\mathbb{P}$ on $\mathcal{W}_p(E)$ (Gouic et al., 2015, Lau et al., 2022). In the Gaussian and location-scatter setting, the barycentre remains within the location-scatter family and is characterized by a closed-form fixed-point equation (Lau et al., 2022).
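
For centred Gaussians $N(0,\Sigma_i)$, the barycentre is $N(0,S)$ with $S = \sum_i \lambda_i (S^{1/2}\Sigma_i S^{1/2})^{1/2}$; a sketch of a standard fixed-point iteration from the literature for solving this equation, assuming scipy is available (names illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_barycentre_cov(Sigmas, lambdas, n_iter=100, tol=1e-10):
    """Covariance S of the W2 barycentre of centred Gaussians N(0, Sigma_i),
    via fixed-point iteration on S = sum_i l_i (S^{1/2} Sigma_i S^{1/2})^{1/2}."""
    S = np.eye(Sigmas[0].shape[0])
    for _ in range(n_iter):
        root = np.real(sqrtm(S))
        inv_root = np.linalg.inv(root)
        # T = sum_i lambda_i (S^{1/2} Sigma_i S^{1/2})^{1/2}
        T = sum(l * np.real(sqrtm(root @ Sig @ root))
                for l, Sig in zip(lambdas, Sigmas))
        S_next = inv_root @ T @ T @ inv_root
        if np.linalg.norm(S_next - S) < tol:
            return S_next
        S = S_next
    return S
```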

4. Extensions: Gradient Flows, Nonconvex-Concave Minimax, and Generic Costs

Recent algorithms recast the barycentre computation as a gradient flow in the Wasserstein space, directly minimizing an objective that includes both barycentric and regularization terms. Empirical and Gaussian mixture barycentres are treated via flows over particles or parameters, and theoretical convergence guarantees are derived under Polyak–Łojasiewicz inequalities (Montesuma et al., 6 Oct 2025).

The WDHA primal-dual method achieves minimax convergence for the unregularized barycentre on large discrete grids, alternating between Wasserstein-geometric (primal) descent and Sobolev-geometric (dual) ascent. This yields nearly linear-time algorithms for grids, with $O(1/T)$ convergence to stationarity and high practical performance compared to standard Sinkhorn solvers (Kim et al., 24 Jan 2025).

For generic transport costs, fixed-point iterations extend the barycentre computation beyond classical quadratic costs (Tanguy et al., 20 Dec 2024). The fixed-point map uses multi-marginal couplings and barycentric projection, converging (subsequentially) to a barycentre under continuity and uniqueness-of-barycentric-map hypotheses, and is empirically efficient for a wide class of geometries and costs.
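
For the classical quadratic cost, this fixed-point map reduces to averaging the barycentric projections of pairwise optimal plans; a minimal free-support sketch, assuming the POT library (ot.emd, ot.dist) is available (other names illustrative):

```python
import numpy as np
import ot  # POT (Python Optimal Transport), assumed available

def free_support_barycentre(inputs, lambdas, X0, n_iter=50):
    """Free-support W2 barycentre by fixed-point iteration:
    X <- sum_i lambda_i T_i(X), where T_i is the barycentric projection
    of the optimal plan between the current barycentre and input i.

    inputs: list of (Y_i, b_i) discrete measures (points, weights)
    X0: (n, d) initial barycentre support; uniform weights assumed
    """
    X = X0.copy()
    n = X.shape[0]
    a = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        X_new = np.zeros_like(X)
        for lam, (Y, b) in zip(lambdas, inputs):
            M = ot.dist(X, Y)      # squared Euclidean costs
            G = ot.emd(a, b, M)    # exact optimal plan (row sums equal a)
            # Barycentric projection: conditional mean of each atom's mass.
            X_new += lam * (G @ Y) / a[:, None]
        X = X_new
    return X
```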

5. Generalizations and Canonical Selection

The barycentre definition can be generalized to include negative weights, provided their sum remains positive, and existence results still hold in Hilbert spaces (Tornabene et al., 11 Nov 2024). Uniqueness is more delicate: it is guaranteed only when at most one coefficient is positive. In one dimension, the barycentre's quantile function is the $L^2$-projection of the (signed-)weighted average of quantile functions onto the cone of nondecreasing functions. Stability properties of these generalized barycentres mirror those of the positive-weight case under $L^2$ projection.
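
In the 1-D case this projection is exactly isotonic regression; a minimal sketch using a pool-adjacent-violators pass (helper names hypothetical):

```python
import numpy as np

def project_nondecreasing(q):
    """L2 projection onto nondecreasing sequences via the
    pool-adjacent-violators algorithm."""
    vals, counts = [], []
    for x in q:
        vals.append(float(x))
        counts.append(1)
        # Merge adjacent blocks while monotonicity is violated.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            n1, n2 = counts[-2], counts[-1]
            merged = (n1 * vals[-2] + n2 * vals[-1]) / (n1 + n2)
            vals[-2:] = [merged]
            counts[-2:] = [n1 + n2]
    return np.repeat(vals, counts)

def generalized_barycentre_quantiles(quantile_fns, signed_weights):
    """Quantile function of the generalized barycentre: isotonic
    projection of the signed-weight average of quantile functions."""
    avg = np.tensordot(signed_weights, quantile_fns, axes=1)
    return project_nondecreasing(avg)
```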

Canonical selection of barycentres among a nonunique set is addressed via Wasserstein-geometric regularization: adding a vanishingly small penalty for closeness to a reference measure selects a unique barycentre that minimizes this distance among all barycentric minimizers (Kim et al., 2017).

6. Distributed, Parallel, and Learning-Based Approaches

Scalable computation of Wasserstein barycentres includes:

  • Parallel streaming algorithms: Communicate only minimal statistics (two integers per update), leveraging a master–worker protocol for massive-scale, streaming, and time-varying inputs (Staib et al., 2017).
  • Distributed consensus via displacement interpolation: Each agent asynchronously updates its measure by pairwise geodesic interpolation, converging (in expectation) to the barycentre of all agents' initial measures (Cisneros-Velarde et al., 2020); a minimal 1-D sketch follows this list.
  • Neural network-based barycentre estimation: Deep CNNs trained on synthetic OT barycentres can infer high-dimensional barycentres rapidly for tasks such as shape morphing or color transfer; accuracy is competitive with, and speed orders of magnitude better than, traditional OT solvers in moderate dimensions (Lacombe et al., 2021).
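
A minimal 1-D sketch of the pairwise consensus step (names illustrative): in one dimension, the $W_2$ geodesic linearly interpolates quantile functions, so each update is a convex combination of quantile vectors.

```python
import numpy as np

def consensus_step(agent_quantiles, i, j, t=0.5):
    """One pairwise consensus update in 1-D: agent i moves along the W2
    geodesic toward agent j; in 1-D, displacement interpolation
    linearly interpolates quantile functions."""
    agent_quantiles[i] = (1 - t) * agent_quantiles[i] + t * agent_quantiles[j]

# Repeated random pairwise steps drive the agents' measures toward the
# barycentre of the initial measures (in expectation).
rng = np.random.default_rng(1)
qs = np.linspace(0.01, 0.99, 99)
agents = [np.quantile(rng.normal(mu, 1, 2000), qs) for mu in (-3.0, 0.0, 4.0)]
for _ in range(500):
    i, j = rng.choice(3, size=2, replace=False)
    consensus_step(agents, i, j)
```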

7. Applications and Implications

Wasserstein barycentres provide a principled, geometry-respecting aggregate of multiple distributions, fundamental in:

  • Bayesian aggregation: Efficiently fusing non-iid posterior samples across data partitions (Staib et al., 2017, Li et al., 2020).
  • Ensembling and transfer: Model ensembling via Wasserstein barycentres explicitly incorporates semantic relations and divergence smoothing, with demonstrated improvements in classification, captioning, and multi-label prediction (Dognin et al., 2019).
  • Domain adaptation: Multi-source aggregation for unsupervised transfer, leveraging barycentres as prototypes for adaptation (Montesuma et al., 6 Oct 2025).
  • Clustering and dimension reduction: Projection-robust barycentres as effective low-dimensional summaries in high-dimensional or structured data (Huang et al., 2021).
  • Generic geometry/statistics: Construction of Fréchet means in manifold or geodesic metric spaces and for measures on Riemannian manifolds (Kim et al., 2014).

The statistical, computational, and practical features of Wasserstein barycentres have made them central in modern machine learning, statistics, and computational probability, with ongoing research addressing their scalability, bias/variance tradeoffs, stability, and generalization across diverse cost structures and data geometries.
