Wasserstein Barycentres in Optimal Transport
- Wasserstein barycentres are Fréchet means of probability measures with respect to the Wasserstein distance, preserving geometric structure in ways classical averages do not.
- They can be computed with discrete, semi-discrete, and regularized algorithms that provide convergent, scalable solutions even in high-dimensional settings.
- They enhance Bayesian aggregation, model ensembling, and domain adaptation by building geometric structure into robust inference.
A Wasserstein barycentre is the Fréchet mean of a collection of probability measures with respect to the Wasserstein (optimal transport) distance. Given measures $\mu_1, \dots, \mu_n$ on a metric space and weights $\lambda_1, \dots, \lambda_n \geq 0$ with $\sum_i \lambda_i = 1$, the barycentre minimizes the weighted sum of squared Wasserstein distances: $\bar\mu \in \arg\min_\nu \sum_{i=1}^n \lambda_i W_2^2(\nu, \mu_i)$. This aggregation preserves geometric properties of distributions in ways classical (e.g., Euclidean or KL) averaging does not. The concept of the Wasserstein barycentre underpins key developments across statistics, machine learning, large-scale inference, and optimal transport theory.
1. Mathematical Definition and Existence
Formally, let $(X, d)$ be a separable, locally compact, geodesic metric space, and let $\mathcal{P}_p(X)$ denote the probability measures on $X$ with finite $p$-th moment. The $p$-Wasserstein distance between measures $\mu, \nu \in \mathcal{P}_p(X)$ is
$$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int_{X \times X} d(x, y)^p \, \mathrm{d}\pi(x, y) \right)^{1/p},$$
where $\Pi(\mu, \nu)$ is the set of couplings with marginals $\mu$ and $\nu$.
Given a collection $\mu_1, \dots, \mu_n \in \mathcal{P}_p(X)$ and weights $\lambda_1, \dots, \lambda_n \geq 0$ with $\sum_i \lambda_i = 1$, the $p$-Wasserstein barycentre is any minimizer
$$\bar\mu \in \arg\min_{\nu \in \mathcal{P}_p(X)} \sum_{i=1}^n \lambda_i \, W_p^p(\nu, \mu_i).$$
For empirical or population versions, this minimization generalizes to expectations over random measures. Existence of a barycentre is guaranteed under broad conditions, e.g., when $(X, d)$ is geodesic and locally compact and the $\mu_i$ have finite $p$-th moments (Gouic et al., 2015).
Uniqueness is not guaranteed in full generality: it holds when $X$ is a non-positively curved (NPC) space, or in $\mathbb{R}^d$ for $p = 2$ when at least one measure is absolutely continuous, or, more generally, when the measures avoid concentration on small sets (Gouic et al., 2015). In Riemannian or manifold settings, additional structure and curvature conditions yield existence and uniqueness, and the barycentre inherits absolute continuity from the marginals (Kim et al., 2014).
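In one dimension the $W_2$ barycentre is available in closed form: its quantile function is the weighted average of the input quantile functions. A minimal NumPy sketch of this fact (the samples and discretization below are illustrative choices, not values from any cited work):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two empirical measures on the real line (illustrative samples).
samples = [rng.normal(-2.0, 0.5, size=1000), rng.normal(3.0, 1.5, size=1000)]
weights = np.array([0.5, 0.5])  # barycentric weights, summing to 1

# In 1D, the W2 barycentre's quantile function is the weighted
# average of the marginals' quantile functions.
qs = np.linspace(0.0, 1.0, 513)[1:-1]           # interior quantile levels
quantiles = np.stack([np.quantile(s, qs) for s in samples])
bary_quantiles = weights @ quantiles            # barycentre quantile function

# Treating each value as a support point with mass 1/len(qs) gives an
# empirical approximation of the barycentre.
print(bary_quantiles[:5])
```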
2. Algorithmic Approaches
Discrete and Semi-Discrete Methods
Discrete barycentres arise when all measures are supported on finite sets (Anderes et al., 2015). The barycentre support is itself discrete, contained within the set of all centroids formed by one support point from each marginal. The barycentre is recovered as the solution to a large-scale multi-marginal linear program (LP), for which specialized LP and block-wise ADMM solvers provide globally, linearly convergent algorithms (Yang et al., 2018).
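As a small illustration, the fixed-support special case, where the barycentre support is restricted to a common grid rather than the full centroid set of Anderes et al., is already a linear program. A minimal sketch with SciPy's `linprog` (the grid, marginals, and weights are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Shared candidate support for the barycentre and two discrete marginals.
x = np.linspace(0.0, 4.0, 5)                    # candidate support (m = 5)
mus = [np.array([0.1, 0.2, 0.4, 0.2, 0.1]),     # marginal 1 on x
       np.array([0.4, 0.3, 0.1, 0.1, 0.1])]     # marginal 2 on x
lam = [0.5, 0.5]
m = len(x)
C = (x[:, None] - x[None, :]) ** 2              # squared-distance cost

# Variables: vec(P_1), vec(P_2) (couplings, row-major), then b (barycentre).
nv = 2 * m * m + m
c_obj = np.concatenate([lam[0] * C.ravel(), lam[1] * C.ravel(), np.zeros(m)])

rows, rhs = [], []
for i, mu in enumerate(mus):
    off = i * m * m
    for r in range(m):                          # row sums of P_i equal b
        a = np.zeros(nv)
        a[off + r * m: off + (r + 1) * m] = 1.0
        a[2 * m * m + r] = -1.0
        rows.append(a); rhs.append(0.0)
    for col in range(m):                        # column sums of P_i equal mu_i
        a = np.zeros(nv)
        a[off + col: off + m * m: m] = 1.0
        rows.append(a); rhs.append(mu[col])

a = np.zeros(nv); a[2 * m * m:] = 1.0           # barycentre mass sums to 1
rows.append(a); rhs.append(1.0)

res = linprog(c_obj, A_eq=np.array(rows), b_eq=np.array(rhs),
              bounds=[(0, None)] * nv, method="highs")
b = res.x[2 * m * m:]
print("barycentre weights:", np.round(b, 3))
```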
For continuous measures, semi-discrete schemes restrict the barycentre to a fixed set of support points while leaving the inputs unconstrained. The central optimization is cast via saddle-point duality in the space of weights and dual OT potentials. Parallel, streaming stochastic gradient methods operate on dual variables associated with each input, using only samples from the inputs, and generate a barycentre that tracks possibly non-stationary input distributions. These admit explicit error guarantees in terms of the number of barycentre support points and the number of stochastic gradient steps (Staib et al., 2017).
Table: Discrete and Semi-Discrete Barycentre Formulations
| Setting | Barycentre Support | Optimization Form |
|---|---|---|
| All marginals discrete | Set of all centroids of one support point per marginal | Multi-marginal LP (Anderes et al., 2015) |
| Semi-discrete | Fixed support points; inputs arbitrary | Saddle-point over weights and sampled OT duals (Staib et al., 2017) |
Regularized and Continuous Approaches
Regularization, most often entropic, enables efficient Sinkhorn-type scaling algorithms for the barycentre problem and stabilizes the numerics in high dimensions (Li et al., 2020, Dognin et al., 2019). Dual formulations for the regularized barycentre admit SGD or primal-dual methods, leveraging closed-form gradient oracles for the duals. This strategy scales to continuous input distributions without explicit discretization of the inputs (Li et al., 2020); smoothness is maintained at the cost of regularization bias.
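The scaling iterations behind such entropic solvers are short to state. Below is a minimal NumPy sketch of the standard iterative-Bregman-projection (Sinkhorn-type) barycentre on a fixed grid; the grid, regularization `eps`, and iteration count are illustrative choices, not values from the cited works:

```python
import numpy as np

def entropic_barycentre(hists, C, lam, eps=0.05, n_iter=500):
    """Entropic barycentre of histograms on a shared grid via
    iterative Bregman projections (Sinkhorn-type scaling)."""
    K = np.exp(-C / eps)                               # Gibbs kernel
    v = [np.ones_like(h) for h in hists]
    for _ in range(n_iter):
        u = [h / (K @ vi) for h, vi in zip(hists, v)]
        # Barycentre = weighted geometric mean of the projected marginals.
        b = np.exp(sum(l * np.log(K.T @ ui) for l, ui in zip(lam, u)))
        v = [b / (K.T @ ui) for ui in u]
    return b

# Illustrative inputs: two histograms on a 1D grid.
x = np.linspace(0, 1, 100)
h1 = np.exp(-((x - 0.2) ** 2) / 0.005); h1 /= h1.sum()
h2 = np.exp(-((x - 0.8) ** 2) / 0.005); h2 /= h2.sum()
C = (x[:, None] - x[None, :]) ** 2
b = entropic_barycentre([h1, h2], C, lam=[0.5, 0.5])
print("barycentre mode near:", x[np.argmax(b)])   # ~0.5, with entropic blur
```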
Non-entropic convex regularization (e.g., quadratic or Bregman) gives rise to alternative sample complexity and computational guarantees in empirical barycentre estimation settings (Dvinskikh, 2021).
For high-dimensional problems, projection-robust barycentre models project measures into lower-dimensional subspaces maximizing the barycentric objective, and then compute barycentres in these subspaces, reducing the effective sample and computational complexity (Huang et al., 2021).
3. Statistical Properties and Stability
Wasserstein barycentres exhibit strong statistical properties, admitting consistency results: empirical barycentres converge, in Wasserstein distance, to the population barycentre as the number of measures or the data per measure increases (Gouic et al., 2015). Explicit concentration and stability rates are available under density and regularity assumptions; for example, if the marginal distributions are close in $W_2$, their barycentres are close, with Hölder exponent $1/6$, where the constants depend on the minimal weight of the regular (well-behaved) marginals (Carlier et al., 2022).
Approximate or regularized barycentres maintain bias bounds: the entropic penalty with parameter $\varepsilon$ introduces a bias that is controlled explicitly and vanishes as $\varepsilon \to 0$ (Carlier et al., 2022).
Population barycentres minimize the expected squared Wasserstein distance $\nu \mapsto \mathbb{E}_{\mu \sim \mathbb{P}}\, W_2^2(\nu, \mu)$ with respect to a law $\mathbb{P}$ on the space of measures (Gouic et al., 2015, Lau et al., 2022). In the Gaussian and location-scatter setting, the barycentric operator preserves the location-scatter family and admits closed-form fixed-point equations (Lau et al., 2022).
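For centred Gaussians the fixed-point equation can be iterated directly on the covariance. A minimal sketch, assuming the well-known Bures-Wasserstein fixed-point update (the example covariances are illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_barycentre_cov(covs, lam, n_iter=100):
    """Covariance of the W2 barycentre of centred Gaussians via the
    standard fixed-point iteration on the Bures-Wasserstein geometry."""
    S = np.mean(covs, axis=0)                 # any PD initialization works
    for _ in range(n_iter):
        R = sqrtm(S).real                     # S^{1/2}
        Rinv = np.linalg.inv(R)
        T = sum(l * sqrtm(R @ Si @ R).real for l, Si in zip(lam, covs))
        S = Rinv @ T @ T @ Rinv               # S <- S^{-1/2} T^2 S^{-1/2}
    return S

covs = [np.array([[2.0, 0.5], [0.5, 1.0]]),
        np.array([[1.0, -0.3], [-0.3, 3.0]])]
S = gaussian_barycentre_cov(covs, lam=[0.5, 0.5])
print(np.round(S, 3))
# The barycentre mean is simply the weighted average of the input means.
```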
4. Extensions: Gradient Flows, Nonconvex-Concave Minimax, and Generic Costs
Recent algorithms recast the barycentre computation as a gradient flow in the Wasserstein space, directly minimizing an objective that includes both barycentric and regularization terms. Empirical and Gaussian mixture barycentres are treated via flows over particles or parameters, and theoretical convergence guarantees are derived under Polyak–Łojasiewicz inequalities (Montesuma et al., 6 Oct 2025).
The WDHA primal-dual method achieves convergence for the nonconvex-concave minimax formulation of the unregularized barycentre on large discrete grids, alternating between Wasserstein-geometric (primal) descent and Sobolev-geometric (dual) ascent. This yields nearly linear-time iterations on grids, with convergence to stationarity and strong practical performance compared to standard Sinkhorn solvers (Kim et al., 24 Jan 2025).
For generic transport costs, fixed-point iterations extend the barycentre computation beyond classical quadratic costs (Tanguy et al., 20 Dec 2024). The fixed-point map uses multi-marginal couplings and barycentric projection, converging (subsequentially) to a barycentre under continuity and uniqueness-of-barycentric-map hypotheses, and is empirically efficient for a wide class of geometries and costs.
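For the classical quadratic cost, the fixed-point map reduces to averaging barycentric projections of pairwise optimal couplings. A minimal free-support sketch, assuming the POT library (`ot.emd`, `ot.dist`) is available; the data and iteration counts are illustrative, and the cited work generalizes this pattern to other costs:

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed installed)

rng = np.random.default_rng(0)

# Two point clouds with uniform weights (illustrative data).
Y = [rng.normal([-2, 0], 0.5, size=(200, 2)),
     rng.normal([2, 1], 0.8, size=(200, 2))]
lam = [0.5, 0.5]
n = 100
X = rng.normal(0, 1, size=(n, 2))            # free barycentre support
a = np.full(n, 1.0 / n)

for _ in range(30):
    # Barycentric projection: push each support point to the weighted
    # mean of its images under the optimal couplings to each marginal.
    X_new = np.zeros_like(X)
    for l, Yi in zip(lam, Y):
        bi = np.full(len(Yi), 1.0 / len(Yi))
        P = ot.emd(a, bi, ot.dist(X, Yi))    # optimal coupling, squared cost
        X_new += l * (P @ Yi) / a[:, None]
    X = X_new

print("barycentre centroid:", X.mean(axis=0))  # ~midpoint of the two clouds
```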
5. Generalizations and Canonical Selection
The barycentre definition can be generalized to include negative weights, provided their sum remains positive, and existence results still hold in Hilbert spaces (Tornabene et al., 11 Nov 2024). Uniqueness is more delicate: it is guaranteed only when at most one coefficient is positive. In one dimension, the barycentre's quantile function is the $L^2$-projection of the signed combination of the marginals' quantile functions onto the cone of nondecreasing functions. Stability properties of these generalized barycentres mirror those of the positive-weight case under projections.
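The $L^2$-projection onto nondecreasing functions is exactly isotonic regression, so the one-dimensional construction can be sketched directly. A minimal sketch using scikit-learn's `IsotonicRegression` for the projection (the signed weights and samples are illustrative):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
qs = np.linspace(0, 1, 257)[1:-1]                 # quantile levels

# Signed weights with positive sum (illustrative choice).
lam = np.array([1.5, -0.5])
samples = [rng.normal(0.0, 1.0, 2000), rng.normal(0.5, 2.0, 2000)]
Q = np.stack([np.quantile(s, qs) for s in samples])

signed = lam @ Q                                  # may fail to be monotone
# L2-projection onto nondecreasing functions = isotonic regression.
bary_q = IsotonicRegression().fit_transform(qs, signed)
print("monotone:", np.all(np.diff(bary_q) >= 0))
```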
Canonical selection of barycentres among a nonunique set is addressed via Wasserstein-geometric regularization: adding a vanishingly small penalty for closeness to a reference measure selects a unique barycentre that minimizes this distance among all barycentric minimizers (Kim et al., 2017).
6. Distributed, Parallel, and Learning-Based Approaches
Scalable computation of Wasserstein barycentres includes:
- Parallel streaming algorithms: communicate only minimal statistics (two integers per update), leveraging a master–worker protocol for massive-scale streaming data and time-varying inputs (Staib et al., 2017).
- Distributed consensus via displacement interpolation: each agent asynchronously updates its measure by pairwise geodesic interpolation, converging (in expectation) to the barycentre of all agents' initial measures (Cisneros-Velarde et al., 2020); a 1D simulation sketch appears after this list.
- Neural network-based barycentre estimation: Deep CNNs trained on synthetic OT barycentres can infer high-dimensional barycentres rapidly for tasks such as shape morphing or color transfer; accuracy is competitive with, and speed orders-of-magnitude better than, traditional OT solvers in moderate dimensions (Lacombe et al., 2021).
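As referenced in the consensus bullet above, the gossip dynamics are easy to simulate in one dimension: pairwise displacement interpolation at $t = 1/2$ is the midpoint of quantile functions, so the updates reduce to averaging quantile vectors. A minimal sketch (agent count, data, and rounds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_q = 8, 200
qs = np.linspace(0, 1, n_q + 2)[1:-1]

# Each agent holds a 1D measure, represented by its quantile function.
Q = np.stack([np.quantile(rng.normal(rng.uniform(-3, 3), 1.0, 500), qs)
              for _ in range(n_agents)])
target = Q.mean(axis=0)                      # barycentre of initial measures

for _ in range(2000):
    i, j = rng.choice(n_agents, size=2, replace=False)
    # Pairwise displacement interpolation at t = 1/2: in 1D this is the
    # midpoint of the two quantile functions (a Wasserstein geodesic step).
    mid = 0.5 * (Q[i] + Q[j])
    Q[i] = Q[j] = mid

print("max deviation from barycentre:",
      np.abs(Q - target).max())              # small after many rounds
```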
7. Applications and Implications
Wasserstein barycentres provide a principled, geometry-respecting aggregate of multiple distributions, fundamental in:
- Bayesian aggregation: Efficiently fusing non-iid posterior samples across data partitions (Staib et al., 2017, Li et al., 2020).
- Ensembling and transfer: model ensembling via Wasserstein barycentres explicitly incorporates semantic relations and divergence smoothing, with demonstrated improvements in classification, captioning, and multi-label prediction (Dognin et al., 2019).
- Domain adaptation: Multi-source aggregation for unsupervised transfer, leveraging barycentres as prototypes for adaptation (Montesuma et al., 6 Oct 2025).
- Clustering and dimension reduction: Projection-robust barycentres as effective low-dimensional summaries in high-dimensional or structured data (Huang et al., 2021).
- Generic geometry/statistics: Construction of Fréchet means in manifold or geodesic metric spaces and for measures on Riemannian manifolds (Kim et al., 2014).
The statistical, computational, and practical features of Wasserstein barycentres have made them central in modern machine learning, statistics, and computational probability, with ongoing research addressing their scalability, bias/variance tradeoffs, stability, and generalization across diverse cost structures and data geometries.