Empirical OT Barycenter Cost Functional

Updated 20 September 2025

The paper introduces the empirical OT barycenter cost functional as a method for averaging probability distributions via optimal transport metrics, preserving intrinsic geometric structures.
It leverages discretization, linear programming, and barycentric projection to improve accuracy and scalability in high-dimensional settings.
The approach underpins diverse applications in imaging, multi-agent learning, and robust statistics with established convergence rates and theoretical guarantees.

The empirical OT (optimal transport) barycenter cost functional formalizes the averaging of probability distributions with respect to optimal transport discrepancies, in a way that preserves the geometric and spatial structure of the input measures. Unlike naive averaging (such as in Euclidean space), the OT barycenter uses the Wasserstein (or, more generally, OT) distance to interpolate or summarize a set of input distributions, producing a barycenter measure that sits “centrally” relative to the input distributions in Wasserstein space. Empirical variants arise in practice by estimating this cost functional from finite data or discretized measures. This functional underlies many modern methods in statistics, data science, economics, imaging, learning, and computational geometry.

1. Mathematical Formulation and Discretization

Let $\{\rho_i\}_{i=1}^n$ be input probability measures with positive weights $\{\mu_i\}_{i=1}^n$, $\sum_i \mu_i = 1$. The prototypical OT barycenter cost functional is

$J(\rho) = \sum_{i=1}^n \mu_i W_2^2(\rho, \rho_i),$

where $W_2^2$ is the squared 2-Wasserstein distance, i.e., the optimal transport cost for the quadratic cost $c(x,y) = |x-y|^2$. The barycenter is defined as

$\rho^* = \operatorname{argmin}_{\rho \in \mathcal{P}(X)} J(\rho).$

More general costs such as $c(x,y) = |x-y|^p$ with $p \geq 1$ can also be used; the quadratic case is prevalent but not exclusive.
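
As a concrete illustration of $J(\rho)$, consider the one-dimensional case. For two empirical measures on the real line with the same number of equally weighted atoms, the quadratic transport cost is the mean squared difference of the sorted samples, and the barycenter is obtained by averaging the sorted samples with the weights $\mu_i$. The sketch below assumes exactly this special setting; the function names and the toy data are illustrative and do not come from the cited works.

```python
def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two uniform empirical
    measures on the real line with the same number of atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def barycenter_1d(samples, weights):
    """Atoms of the 1D W2 barycenter: weighted average of sorted samples."""
    sorted_samples = [sorted(s) for s in samples]
    m = len(sorted_samples[0])
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_samples))
        for i in range(m)
    ]


def barycenter_cost(atoms, samples, weights):
    """J(rho) = sum_i mu_i * W_2^2(rho, rho_i) for 1D empirical measures."""
    return sum(w * w2_squared_1d(atoms, s) for w, s in zip(weights, samples))


samples = [[0.0, 1.0, 2.0], [4.0, 6.0, 5.0]]  # two toy input measures
weights = [0.5, 0.5]                           # the weights mu_i
bary = barycenter_1d(samples, weights)
print(bary)                                        # [2.0, 3.0, 4.0]
print(barycenter_cost(bary, samples, weights))     # 4.0
print(barycenter_cost(samples[0], samples, weights))  # 8.0
```

Using the first input itself as a candidate gives a cost of 8.0, twice that of the quantile-averaged measure, which is the kind of comparison the functional $J$ is designed to make.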

Empirical implementation replaces the continuous measures with discrete ones. Writing $k$ for the index of the input measure (the index $i$ in $J$ above),

$\rho \approx \sum_{i=1}^{N} a_i \delta_{x_i}, \quad \rho_k \approx \sum_{j=1}^{m_k} b_j^k \delta_{y_j^k},$

and the transport cost to the $k$-th input is determined by the matrix $c_{ij}^k = c(x_i, y_j^k)$; the barycenter cost then becomes a finite-dimensional functional. In the practical method of (Oberman et al., 2015), the barycenter computation reduces to minimizing $J(\rho)$ over empirical measures, implemented as a finite-dimensional LP.
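
To make the reduction concrete, one common way to write a fixed-support barycenter problem as a linear program is sketched below. The notation is an assumption introduced for illustration: $k = 1, \dots, n$ indexes the input measures with weights $\mu_k$, the barycenter has masses $a_i$ on $N$ fixed support points $x_i$, the $k$-th input has masses $b_j^k$ on $m_k$ points $y_j^k$, and $\pi^k$ is the transport plan between the barycenter and the $k$-th input. This is a generic formulation and may differ in detail from the one used in (Oberman et al., 2015).

$\min_{a,\, \pi^1, \dots, \pi^n} \ \sum_{k=1}^n \mu_k \sum_{i=1}^{N} \sum_{j=1}^{m_k} c(x_i, y_j^k)\, \pi_{ij}^k \quad \text{subject to} \quad \sum_{j=1}^{m_k} \pi_{ij}^k = a_i, \quad \sum_{i=1}^{N} \pi_{ij}^k = b_j^k, \quad \pi_{ij}^k \geq 0.$

The objective and all constraints are linear in the unknowns $(a, \pi^1, \dots, \pi^n)$, and the normalization $\sum_i a_i = 1$ follows from the marginal constraints because each $b^k$ sums to one.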

2. Numerical Algorithms and Barycentric Projection

The discretization process in (Oberman et al., 2015) yields a sparse, large-scale linear program. Grid refinement methods enable scalability: after coarse solutions, supports are refined and the LP is solved again, with overall computation scaling linearly with the number of grid points in the sparse version.

A key practical point is that discretized OT solutions are optimal transport plans, which may split mass, rather than maps. To improve the map approximation, a barycentric projection is used: for each source grid point $x_i$, it computes the average of the targets weighted by outgoing flow, $\bar{y}_i = \frac{\sum_j \pi_{ij} y_j}{\sum_j \pi_{ij}},$ and the plan is replaced by the plan induced by the map $x_i \mapsto \bar{y}_i$,

$\bar\pi = \sum_i a_i \delta_{(x_i, \bar{y}_i)}, \quad a_i = \sum_j \pi_{ij}.$

This can substantially improve the accuracy and sharpness of barycenters, especially for high-resolution shape data or image barycenters.
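
The barycentric projection can be sketched in a few lines. The example below assumes the plan is stored as a dense nested list and the target points as coordinate tuples; the function name and the toy plan are illustrative only, and a large-scale implementation would work on the sparse plan instead.

```python
def barycentric_projection(plan, targets):
    """Map each source point to the flow-weighted mean of its targets.

    plan[i][j]  -- mass sent from source point i to target point j
    targets[j]  -- coordinates of target point j (a tuple of floats)
    Returns one projected point per source row, or None for rows with no mass.
    """
    dim = len(targets[0])
    projected = []
    for row in plan:
        mass = sum(row)
        if mass <= 0.0:
            projected.append(None)  # no outgoing flow: projection undefined
            continue
        projected.append(tuple(
            sum(p * y[d] for p, y in zip(row, targets)) / mass
            for d in range(dim)
        ))
    return projected


# Source point 0 splits its mass evenly between two targets;
# source point 1 sends everything to the second target.
plan = [[0.25, 0.25],
        [0.0, 0.5]]
targets = [(0.0, 0.0), (2.0, 4.0)]
projected = barycentric_projection(plan, targets)
print(projected)  # [(1.0, 2.0), (2.0, 4.0)]
```

The first source point, whose mass is split, is sent to the midpoint of its two targets, which is how the projection turns a mass-splitting plan into a single-valued map.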

3. Empirical and Statistical Aspects

The empirical OT barycenter cost functional is central to high-dimensional inference, estimation, and learning. Several results establish convergence, rates and statistical properties:

The empirical cost functional $J(\rho)$ computed from finite samples converges to the population version at rates dependent on the complexity (e.g., intrinsic dimension) and smoothness of the data (Hundrieser et al., 2022, Staudt et al., 2023).
Under suitable regularity (Lipschitz/semi-concave costs, low-complexity support), rates approach $n^{-1/d}$, or even $n^{-2/d}$ for smooth costs, where $d$ is the intrinsic dimension.
In semi-discrete or low-dimensional settings, rates can be parametric: $n^{-1/2}$ convergence for empirical barycenter cost values (Hundrieser et al., 2022).
Asymptotic distributions are obtainable via Hadamard directional differentiability and the delta method. The limiting law of the empirical barycenter cost is characterized as the supremum of a Gaussian process over the set of optimal dual solutions (Hundrieser et al., 2022, Groppe et al., 17 Sep 2025).
In settings where the cost function is unknown and estimated from data, the limiting fluctuations of empirical OT barycenter cost include both sampling variation and error from cost estimation, with central limit theorems established under weak assumptions (Hundrieser et al., 2023).
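
The statements above concern how a plug-in estimate approaches its population value as the sample size grows. A minimal simulation sketch is given below, assuming a single one-dimensional pair of Gaussians (one term of $J$) and equal sample sizes; the setup, function names, and seed are illustrative assumptions. The experiment illustrates consistency only and does not verify any of the cited rates.

```python
import random


def w2_squared_1d(xs, ys):
    """Plug-in squared W2 cost between equal-size 1D samples (sorted matching)."""
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def plug_in_errors(sizes, shift=1.0, seed=0):
    """Absolute error of the plug-in estimate for N(0,1) versus N(shift,1).

    The population value is shift**2, because the optimal map between
    these two Gaussians is a translation by `shift`.
    """
    rng = random.Random(seed)
    errors = {}
    for n in sizes:
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(shift, 1.0) for _ in range(n)]
        errors[n] = abs(w2_squared_1d(xs, ys) - shift ** 2)
    return errors


errors = plug_in_errors([100, 1000, 10000])
for n, err in errors.items():
    print(n, round(err, 4))
```

Because each run uses a single random draw per sample size, the printed errors need not decrease monotonically; averaging over repetitions would be needed to estimate a rate.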

4. Extensions: Unbalanced, Robust, and Entropic OT Barycenters

Empirical OT barycenter cost functionals have been generalized to treat several real-world complexities:

Unbalanced OT / Kantorovich–Rubinstein: These functionals permit mass variation, adding a mass-penalty parameter $C$,

$\mathrm{KR}_{p,C}(\mu, \nu) := \left[ \inf_{\pi \leq (\mu, \nu)} \sum_{x,x'} d(x,x')^p \pi(x,x') + \frac{C^p}{2} [|\mu| + |\nu| - 2|\pi|] \right]^{1/p}$

with the corresponding Fréchet barycenter minimized using this cost (Heinemann et al., 2021). The parameter $C$ acts as a cutoff on transport distance: for sufficiently small $C$, the cost reduces to a multiple of total variation, while for sufficiently large $C$ and inputs of equal total mass it coincides with the Wasserstein distance.
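
As a worked example of the cutoff behavior, take two unit point masses $\mu = \delta_x$ and $\nu = \delta_{x'}$ at distance $d = d(x,x') > 0$, and read $\pi \leq (\mu, \nu)$ as the requirement that the marginals of $\pi$ are dominated by $\mu$ and $\nu$ (an assumption about the notation). Every admissible plan is then $\pi = t\,\delta_{(x,x')}$ with $t \in [0,1]$, so

$\mathrm{KR}_{p,C}^p(\delta_x, \delta_{x'}) = \min_{t \in [0,1]} \left[ t\, d^p + \frac{C^p}{2}(2 - 2t) \right] = \min_{t \in [0,1]} \left[ t\, d^p + (1-t)\, C^p \right] = \min(d, C)^p.$

Mass is transported ($t = 1$) when $d \leq C$ and is instead removed and recreated ($t = 0$) when $d > C$, giving $\mathrm{KR}_{p,C}(\delta_x, \delta_{x'}) = \min(d, C)$.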

Robust barycenters: Marginal constraints in the transport problems can be relaxed with Kullback–Leibler or ψ-divergence penalties, creating robust functionals less sensitive to outliers or class imbalance (Le et al., 2021, Gazdieva et al., 4 Oct 2024).
Entropic/smoothed OT barycenters: Adding an entropy penalty to the transport plan promotes regularity and stability, leading to functionals such as (Cuturi et al., 2018, Shen et al., 2020, Kolesov et al., 2023): $\operatorname{EOT}_\varepsilon(\mu, \nu) = \min_{\pi \in \Pi(\mu, \nu)} \int c(x,y) d\pi(x,y) - \varepsilon H(\pi),$ where $H(\pi)$ is the entropy of the plan and $\varepsilon > 0$ is the regularization strength. Entropic regularization yields strong convexity and smoothness, enabling scalable matrix-scaling (Sinkhorn) iterations and gradient-based optimization.
Multimarginal/Schrödinger barycenters: Multi-distribution barycenters using entropy-regularized multimarginal OT provide scalable, unique solutions with dimension-free parametric rates (Li et al., 4 Feb 2025).
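
For the entropic case, the minimizer of $\operatorname{EOT}_\varepsilon$ between two discrete measures can be approximated by Sinkhorn scaling. The sketch below assumes $H$ is the Shannon entropy $-\sum_{ij} \pi_{ij} \log \pi_{ij}$, so that the optimal plan has the form $\pi_{ij} = u_i K_{ij} v_j$ with $K_{ij} = e^{-c_{ij}/\varepsilon}$; the function name, the fixed iteration count, and the toy data are illustrative choices, and the pure-Python loops are meant for readability rather than speed.

```python
import math


def sinkhorn(a, b, cost, eps, n_iter=500):
    """Entropic OT between discrete measures a and b by Sinkhorn scaling.

    Returns the plan pi[i][j] = u[i] * K[i][j] * v[j], K = exp(-cost / eps).
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so that the plan's first marginal equals a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's second marginal equals b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


xs, ys = [0.0, 1.0, 2.0], [0.5, 1.5]   # support points (toy data)
a, b = [0.2, 0.5, 0.3], [0.6, 0.4]     # masses on xs and ys
cost = [[(x - y) ** 2 for y in ys] for x in xs]
plan = sinkhorn(a, b, cost, eps=0.5)

row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(a))) for j in range(len(b))]
print(row_sums, col_sums)  # approximately a and b
```

For small $\varepsilon$ the kernel entries can underflow, and practical implementations typically run the same updates in the log domain.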

5. Algorithmic Frameworks and High-Dimensional Computation

Empirical OT barycenter computation requires efficient solvers, especially for large-scale or high-dimensional data:

Sinkhorn algorithms and entropic OT: State-of-the-art methods leverage matrix scaling and block coordinate ascent in dual variables, efficiently handling high-dimensional grid or sample-based measures (Cuturi et al., 2018, Li et al., 4 Feb 2025).
Functional gradient descent in RKHS: The Sinkhorn Descent method recasts the barycenter task as functional optimization over perturbations of the identity map in an RKHS, admitting scalable updates and structure preservation (Shen et al., 2020).
Neural adversarial optimization: Modern deep learning solvers (e.g., bi-level adversarial min-max problems) learn both potential functions and transport maps parameterized by neural networks; these can handle general costs, manifold constraints, and large samples (Kolesov et al., 2023, Kolesov et al., 6 Feb 2024, Gazdieva et al., 4 Oct 2024).
Tree-based diffusion/Schrödinger bridge: For multi-input barycenters, tree-structured entropic regularization enables tractable and highly scalable consensus through iterative proportional fitting on graphs (Noble et al., 2023).
Randomized and subsampled schemes: Empirical barycenter functionals can be evaluated efficiently via plug-in estimators on subsampled measures, with explicit error bounds and massive computational speedup (Hundrieser et al., 2022).
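
To illustrate the matrix-scaling family of solvers, the sketch below computes a fixed-support entropic barycenter with iterative Bregman projections, a commonly used scaling scheme. It assumes all inputs live on one shared grid and is not claimed to be the algorithm of any of the works cited above; the function name and the toy inputs are illustrative.

```python
import math


def ibp_barycenter(measures, weights, cost, eps, n_iter=200):
    """Fixed-support entropic barycenter via iterative Bregman projections.

    measures[k][i] -- mass of input k at grid point i (all on one shared grid)
    weights[k]     -- barycenter weight of input k (weights sum to one)
    cost[i][j]     -- ground cost between grid points i and j
    """
    n = len(cost)
    K = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(n)]
    v = [[1.0] * n for _ in measures]
    bary = [1.0 / n] * n
    for _ in range(n_iter):
        # Row scaling: match each plan's first marginal to its input measure.
        u = [
            [b[i] / sum(K[i][j] * vk[j] for j in range(n)) for i in range(n)]
            for b, vk in zip(measures, v)
        ]
        # K^T u_k for every input k.
        Ktu = [
            [sum(K[i][j] * uk[i] for i in range(n)) for j in range(n)]
            for uk in u
        ]
        # Barycenter: weighted geometric mean of the current second marginals.
        bary = [
            math.exp(sum(
                w * math.log(vk[j] * kk[j])
                for w, vk, kk in zip(weights, v, Ktu)
            ))
            for j in range(n)
        ]
        # Column scaling: force every plan's second marginal to equal bary.
        v = [[bary[j] / kk[j] for j in range(n)] for kk in Ktu]
    return bary


grid = [0.0, 1.0, 2.0, 3.0, 4.0]
cost = [[(x - y) ** 2 for y in grid] for x in grid]
measures = [[1.0, 0.0, 0.0, 0.0, 0.0],   # point mass at the left end
            [0.0, 0.0, 0.0, 0.0, 1.0]]   # point mass at the right end
bary = ibp_barycenter(measures, [0.5, 0.5], cost, eps=1.0)
print([round(p, 4) for p in bary])
```

For these two point masses the result concentrates around the midpoint of the grid, with a spread controlled by `eps`; smaller values sharpen the barycenter but risk numerical underflow in the kernel.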

6. Applications, Extensions, and Impact

Empirical OT barycenter cost functionals underpin a broad spectrum of applications:

Imaging and shape analysis: High-resolution image barycenters, shape morphing, and structural averaging with sharp geometric features are achievable through the grid refinement + barycentric projection approach (Oberman et al., 2015).
Multi-agent learning and consensus: In multi-agent reinforcement learning (MARL), Sinkhorn barycenters of agents’ visitation distributions serve as soft, geometry-aware group policies, with rapid convergence and strong empirical performance (Baheri, 14 Jun 2025).
Robust statistics and testing: OT barycenter test statistics extend nonparametric ANOVA/ANOSVA to distributions, providing permutation and bootstrap-based inference for factorial designs with known asymptotic law (Groppe et al., 17 Sep 2025).
There is substantial impact in generative modeling, style transfer, and domain adaptation, where empirical barycenter functionals allow for faithful and interpretable aggregation of empirical distributions under geometric constraints (Kolesov et al., 2023, Kolesov et al., 6 Feb 2024).

7. Theoretical Guarantees and Ongoing Developments

The theoretical analysis of empirical OT barycenter cost functionals includes:

Directional differentiability and CLTs: The OT barycenter cost as a function of empirical input measures is Hadamard directionally differentiable, allowing application of central limit theory via the functional delta method (Hundrieser et al., 2022, Groppe et al., 17 Sep 2025).
Convergence in unbounded domains: Decomposition-based analysis extends sharp convergence rates to unbounded domains under moment conditions, ensuring empirical barycenter estimators are consistent for heavy-tailed or high-dimensional data (Staudt et al., 2023).
Dimension-independent sample complexity: Entropic regularization and specific barycenter constructions (e.g., the Schrödinger barycenter (Li et al., 4 Feb 2025), entropy-regularized $W_2$ barycenter (Mallery et al., 13 Jan 2025)) yield parametric or nearly parametric $n^{-1/2}$ rates even in high dimensions.

Empirical OT barycenter cost functionals thus provide mathematically principled, robust, and computationally feasible mechanisms for averaging, summarizing, and testing collections of probability distributions under geometric optimal transport metrics, with applications across statistics, machine learning, and applied computational sciences.