Wasserstein Barycenter Fusion

Updated 21 April 2026

Wasserstein barycenter fusion aggregates multiple probability distributions into their mean with respect to the Wasserstein metric.
It can be computed through a nonconvex–concave minimax formulation solved by the WDHA algorithm, which alternates Wasserstein descent on the barycenter with Sobolev ascent on the dual potentials to obtain the exact, unregularized barycenter.
In reported experiments, the approach outperforms entropic methods with faster runtimes and sharper results, which makes it suitable for large-scale multi-sensor and high-resolution image fusion.

Wasserstein barycenter fusion refers to the process of aggregating multiple probability distributions into a single representative distribution, the Wasserstein barycenter, by computing a mean in the space of probability measures endowed with the Wasserstein metric. This operation preserves geometric and spatial structure and provides a nonlinear notion of “averaging” that applies to both continuous and discrete distributions on multidimensional domains. Recent developments emphasize nearly linear-time computation without entropic blurring, convergence guarantees, and direct application to large-scale multi-modal or multi-sensor fusion tasks (Kim et al., 24 Jan 2025).

1. Mathematical Formulation and Variational Principle

Given input probability vectors $\mu_1,\dots,\mu_n\in\mathbb{R}^m$ (histograms with $\mu_i^j\ge0$ and $\sum_j\mu_i^j=1$) on a common support $\{x_1,\dots,x_m\}\subset\mathbb{R}^d$, the unregularized discrete Wasserstein-2 barycenter is defined as

$\bar\nu = \arg\min_{\nu \in \Delta_m} \frac{1}{n} \sum_{i=1}^n W_2^2(\nu, \mu_i)$

where $\Delta_m$ is the probability simplex and $W_2^2(\nu, \mu_i)$ is the squared 2-Wasserstein distance. Kantorovich duality yields the equivalent minimax variational form

$\min_{\nu\in\Delta_m}\, \max_{\varphi_1,\dots,\varphi_n \in \text{Conv}} \frac{1}{n} \sum_{i=1}^n \left\{ \sum_{j=1}^m \left( \frac{1}{2}\|x_j\|^2 - \varphi_i^j \right) \nu^j + \sum_{k=1}^m \left( \frac{1}{2}\|x_k\|^2 - (\varphi_i^*)^k \right)\mu_i^k \right\}$

where $\varphi_i$ are the Kantorovich dual potentials, $\varphi_i^*$ denotes the convex conjugate (Legendre transform) of $\varphi_i$, and "Conv" is the cone of convex vectors, that is, grid values of convex functions (Kim et al., 24 Jan 2025). Each inner maximum equals $\frac{1}{2}W_2^2(\nu,\mu_i)$. The minimax objective is therefore half the barycenter objective and has the same minimizer.
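The barycenter has no closed form in general. In one dimension, however, it is explicit: its quantile function is the average of the input quantile functions. The sketch below swaps in that 1D special case to make the objective concrete. It uses equal-size uniform samples rather than histograms on a grid, and it is not the WDHA algorithm. The function names and the two Gaussian "sensors" are hypothetical illustrations.

```python
import numpy as np


def barycenter_1d_samples(samples):
    """Equal-weight W2 barycenter of 1D uniform empirical measures.

    Assumes every row of `samples` holds N atoms with weight 1/N each.
    In 1D the barycenter's quantile function is the average of the input
    quantile functions, i.e. the coordinate-wise average of the sorted samples.
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    return sorted_samples.mean(axis=0)


def w2_squared_1d(a, b):
    """Squared W2 between two uniform empirical measures with the same number of atoms."""
    return float(np.mean((np.sort(a) - np.sort(b)) ** 2))


def barycenter_objective(nu, samples):
    """(1/n) * sum_i W2^2(nu, mu_i), the barycenter objective for 1D samples."""
    return float(np.mean([w2_squared_1d(nu, s) for s in samples]))


rng = np.random.default_rng(0)
# Two hypothetical sensors: Gaussian readings with different means and spreads.
samples = np.stack([rng.normal(-2.0, 0.5, 1000), rng.normal(3.0, 1.0, 1000)])
bar = barycenter_1d_samples(samples)
print(bar.mean(), bar.std())
```

For these Gaussian inputs the exact barycenter is the Gaussian with mean 0.5 and standard deviation 0.75, so the printed values should land near those numbers. Unlike the mixture of the inputs, the barycenter is unimodal. That is the geometric, shape-preserving averaging that the fusion setting relies on.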

2. Nonconvex–Concave Saddle Point Reformulation

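Define $\Phi=(\varphi_1,\dots,\varphi_n)$ and the block-separable objective

$J(\nu, \Phi) = \frac{1}{n} \sum_{i=1}^n I^{\mu_i}_\nu(\varphi_i), \quad I^\mu_\nu(\varphi) = \sum_{j=1}^m \left( \tfrac{1}{2} \|x_j\|^2 - \varphi^j \right) \nu^j + \sum_{k=1}^m \left( \tfrac{1}{2} \|x_k\|^2 - (\varphi^*)^k \right) \mu^k.$

The barycenter problem then reduces to

$\min_{\nu\in\Delta_m}\ \max_{\Phi\in\text{Conv}^n} J(\nu,\Phi).$

This objective is nonconvex in $\nu$ with respect to the Wasserstein geometry used for the primal update; it is not geodesically convex in general. Crucially, however, it is concave in each block $\varphi_i$. Each value $\varphi^*(y)=\max_x\{x\cdot y-\varphi(x)\}$ is a maximum of functions affine in $\varphi$, so $-\varphi^*$ is concave in $\varphi$.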
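The sketch below sanity-checks the dual objective $I^\mu_\nu$ and the concavity claim. It evaluates $I^\mu_\nu(\varphi)$ on a 1D grid, computing the conjugate by brute force over grid points. It then checks two properties:

- weak duality, $I^\mu_\nu(\varphi)\le\frac{1}{2}W_2^2(\nu,\mu)$, using the exact 1D monotone coupling;
- midpoint concavity in $\varphi$.

The grid, the two bumps, and the potentials are hypothetical test data, and the helper names are illustrative.

```python
import numpy as np


def discrete_conjugate(phi, x, y):
    """Brute-force discrete Legendre transform: phi*(y_k) = max_j (x_j * y_k - phi_j)."""
    return np.max(np.outer(y, x) - phi[None, :], axis=1)


def dual_objective(phi, nu, mu, x):
    """I^mu_nu(phi) on a 1D grid x, with the conjugate evaluated on the same grid."""
    phi_star = discrete_conjugate(phi, x, x)
    return float(np.sum((0.5 * x**2 - phi) * nu) + np.sum((0.5 * x**2 - phi_star) * mu))


def w2_squared_hist_1d(nu, mu, x):
    """Exact squared W2 between two histograms on the same sorted 1D grid.

    Uses the monotone (quantile) coupling, which is optimal in 1D.
    """
    cnu, cmu = np.cumsum(nu), np.cumsum(mu)
    levels = np.unique(np.concatenate(([0.0], cnu, cmu)))
    mids = 0.5 * (levels[:-1] + levels[1:])
    widths = np.diff(levels)
    i_nu = np.minimum(np.searchsorted(cnu, mids), len(x) - 1)  # quantile index under nu
    i_mu = np.minimum(np.searchsorted(cmu, mids), len(x) - 1)  # quantile index under mu
    return float(np.sum(widths * (x[i_nu] - x[i_mu]) ** 2))


# Hypothetical example data: two Gaussian bumps on [-1, 1].
x = np.linspace(-1.0, 1.0, 201)
nu = np.exp(-((x + 0.3) ** 2) / 0.02)
nu /= nu.sum()
mu = np.exp(-((x - 0.4) ** 2) / 0.05)
mu /= mu.sum()

phi_a = 0.5 * x**2 + 0.3 * x  # a convex potential
phi_b = np.abs(x)             # another convex potential
half_w2 = 0.5 * w2_squared_hist_1d(nu, mu, x)
for phi in (phi_a, phi_b):
    print(dual_objective(phi, nu, mu, x), "<=", half_w2)  # weak duality
```

Because the grid conjugate is a maximum of functions affine in `phi`, evaluating `dual_objective` at the average of two potentials never falls below the average of their values. This is the block-wise concavity that the dual ascent exploits.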

3. The Wasserstein-Descent Homogeneous Sobolev-Ascent (WDHA) Algorithm

WDHA alternates:

Primal descent in the Wasserstein ($W_2$) geometry (on the barycenter $\nu$)
Dual ascent in the homogeneous Sobolev ($\dot H^1$) geometry (on the potentials $\varphi_i$)

Pseudocode:

input: {μ_i}_{i=1}^n on an m-point grid; init ν⁰ ∈ Δ_m, φ_i⁰ ∈ Conv
for t = 0, …, T−1 do
    for i = 1, …, n do
        -- Sobolev ascent (dual update) --
        φ̂_i ← φ_i^t + η·∇_{Ḣ¹} I^{μ_i}_{ν^t}(φ_i^t)
        φ_i^{t+1} ← P_Conv(φ̂_i)        -- projection onto Conv
    end
    -- Wasserstein descent (primal update) --
    φ̄ ← (1/n) Σ_i φ_i^{t+1}
    ν^{t+1} ← (id − τ·(id − ∇φ̄))_# ν^t
end
output: ν^T, {φ_i^T}
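The primal update $\nu^{t+1} = (\mathrm{id}-\tau(\mathrm{id}-\nabla\bar\varphi))_\#\nu^t$ requires pushing a grid histogram forward through a map. The minimal 1D sketch below makes several discretization choices, all of them assumptions rather than necessarily those of Kim et al.:

- a uniform grid;
- finite-difference gradients;
- splitting each transported mass between the two grid nodes adjacent to its destination.

The function `primal_step_1d` is hypothetical.

```python
import numpy as np


def primal_step_1d(nu, phi_bar, x, tau):
    """One Wasserstein-descent step nu <- (id - tau*(id - grad phi_bar))_# nu on a uniform 1D grid.

    Hypothetical discretization: grad phi_bar by second-order finite differences,
    and each grid mass is split between the two nodes adjacent to its destination
    (linear interpolation). This conserves total mass and the mean of the
    transported masses.
    """
    h = x[1] - x[0]
    grad_phi = np.gradient(phi_bar, h, edge_order=2)
    target = np.clip(x - tau * (x - grad_phi), x[0], x[-1])  # keep mass on the grid
    s = (target - x[0]) / h                                    # fractional node index
    left = np.clip(np.floor(s).astype(int), 0, len(x) - 2)
    w_right = s - left                                         # share sent to node left+1
    new_nu = np.zeros_like(nu)
    np.add.at(new_nu, left, nu * (1.0 - w_right))
    np.add.at(new_nu, left + 1, nu * w_right)
    return new_nu


x = np.linspace(-1.0, 1.0, 201)
nu = np.exp(-((x - 0.4) ** 2) / 0.01)
nu /= nu.sum()
tau = 0.5
# With phi_bar = 0 the map is x -> (1 - tau) * x, so all mass moves toward the origin.
contracted = primal_step_1d(nu, np.zeros_like(x), x, tau)
# With phi_bar = |x|^2 / 2 the map is the identity, so nu is a fixed point.
unchanged = primal_step_1d(nu, 0.5 * x**2, x, tau)
```

The two calls bracket the behavior of the step. When $\nabla\bar\varphi=\mathrm{id}$, the primal gradient vanishes and $\nu$ does not move. Otherwise, mass flows along $-(\mathrm{id}-\nabla\bar\varphi)$.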

The key gradients are:

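Dual ($\dot H^1$) gradient: $\nabla_{\dot H^1} I^{\mu_i}_{\nu}(\varphi_i) = (-\Delta)^{-1}\big((\nabla\varphi_i^*)_\#\mu_i - \nu\big)$
Primal (Wasserstein) gradient: $\nabla_W J = \mathrm{id} - \nabla\bar\varphi$ with $\bar\varphi = \frac{1}{n}\sum_i \varphi_i$, the direction used in the update $\nu^{t+1} = (\mathrm{id}-\tau\nabla_W J)_\#\nu^t$ (Kim et al., 24 Jan 2025).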
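Taking the dual step in the $\dot H^1$ geometry amounts to solving a Poisson equation $-\Delta u = f$. The right-hand side is $f = (\nabla\varphi_i^*)_\#\mu_i - \nu$, which has zero mean because it is a difference of probability vectors. The sketch below solves this equation with an FFT on a 2D grid.

It rests on several assumptions:

- periodic boundary conditions and the 5-point Laplacian (the paper's boundary treatment may differ);
- a random stand-in for the pushforward $(\nabla\varphi_i^*)_\#\mu_i$, which a full solver would compute from the conjugate;
- a hypothetical step size `eta`.

```python
import numpy as np


def inverse_neg_laplacian_periodic(f, h):
    """Solve -Lap(u) = f on a periodic 2D grid (5-point stencil) via FFT.

    Assumes f has zero mean, as a difference of two probability vectors does;
    the undetermined constant in u is fixed by giving u zero mean.
    """
    n1, n2 = f.shape
    k1 = np.arange(n1)[:, None]
    k2 = np.arange(n2)[None, :]
    # Eigenvalues of the periodic 5-point -Laplacian for every Fourier mode.
    eig = (2 - 2 * np.cos(2 * np.pi * k1 / n1)) / h**2 + (2 - 2 * np.cos(2 * np.pi * k2 / n2)) / h**2
    eig[0, 0] = 1.0  # placeholder to avoid 0/0; the zero mode is reset below
    u_hat = np.fft.fft2(f) / eig
    u_hat[0, 0] = 0.0
    return np.real(np.fft.ifft2(u_hat))


def neg_laplacian_periodic(u, h):
    """Periodic 5-point -Laplacian, used here only to check the solver."""
    return (4 * u - np.roll(u, 1, 0) - np.roll(u, -1, 0)
            - np.roll(u, 1, 1) - np.roll(u, -1, 1)) / h**2


n, eta = 64, 0.1  # hypothetical grid size and ascent step
h = 1.0 / n
rng = np.random.default_rng(1)
nu = rng.random((n, n))
nu /= nu.sum()
# Hypothetical stand-in for (grad phi_i^*)_# mu_i.
pushed_mu = rng.random((n, n))
pushed_mu /= pushed_mu.sum()
phi = np.zeros((n, n))  # current potential phi_i^t (hypothetical)

grad_h1 = inverse_neg_laplacian_periodic(pushed_mu - nu, h)
phi_hat = phi + eta * grad_h1  # Sobolev-ascent step, before projection onto Conv
```

Compared with a plain $L^2$ step, the inverse Laplacian damps high-frequency components of the update by the factor $1/\lambda(k)$. This is one motivation for ascending in $\dot H^1$ rather than $L^2$.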

4. Convergence and Complexity

Under suitable density-boundedness and step-size conditions, the squared Wasserstein gradient norm of the iterates decays at an explicit rate in the number of iterations $T$. This yields an iteration-complexity bound for finding an $\varepsilon$-stationary point. The precise thresholds, rate, and constants are given in (Kim et al., 24 Jan 2025).
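The sketch below computes a natural discretization of the stationarity measure on a 2D grid and uses it as a stopping test. It assumes the measure is the squared $L^2(\nu)$ norm of the primal Wasserstein gradient $\mathrm{id}-\nabla\bar\varphi$, which appears in the primal update: $\sum_j \nu^j\,\|x_j-\nabla\bar\varphi(x_j)\|^2$. Three further items are assumptions:

- the tolerance `eps`;
- the convention that "$\varepsilon$-stationary" means a gradient norm at most $\varepsilon$;
- the finite-difference gradient.

The paper's exact measure may differ.

```python
import numpy as np


def wasserstein_grad_norm_sq(nu, phi_bar, x1, x2, h):
    """sum_j nu_j * |x_j - grad phi_bar(x_j)|^2 on a uniform 2D grid.

    This is the squared L^2(nu) norm of the primal Wasserstein gradient id - grad phi_bar.
    """
    g1, g2 = np.gradient(phi_bar, h, h, edge_order=2)
    return float(np.sum(nu * ((x1 - g1) ** 2 + (x2 - g2) ** 2)))


n = 64
grid = np.linspace(0.0, 1.0, n)
h = 1.0 / (n - 1)
x1, x2 = np.meshgrid(grid, grid, indexing="ij")
nu = np.exp(-((x1 - 0.5) ** 2 + (x2 - 0.5) ** 2) / 0.02)
nu /= nu.sum()

eps = 1e-3                              # hypothetical tolerance
phi_bar = 0.5 * (x1**2 + x2**2)         # identity transport map: nu is stationary
print(wasserstein_grad_norm_sq(nu, phi_bar, x1, x2, h) <= eps**2)  # True
```

In a solver loop, this quantity would be evaluated once per outer iteration with the current $\bar\varphi$. Its cost is $O(m)$, negligible next to the dual update.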

Per-iteration computational complexity of WDHA, with exact LP and Sinkhorn-type solvers for comparison:

Operation	Time complexity	Space complexity
WDHA dual block update (per input $\mu_i$)	$O(m\log m)$	$O(m)$
WDHA primal update	$O(m\log m)$	$O(m)$
LP for exact OT map	$\Omega(m^2)$ (dense $m\times m$ cost and plan)	$O(m^2)$
Sinkhorn-type	$O(m^2)$ per iteration with a dense kernel; iteration count grows as the accuracy $\varepsilon$ shrinks	--

An efficient WDHA implementation relies on fast Legendre transforms and FFT-based Poisson solvers.
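Fast Legendre transforms exploit the fact that the discrete conjugate $\max_j(x_jy_k-f_j)$ depends only on the lower convex hull of the points $(x_j,f_j)$. The sketch below is a linear-time 1D transform in the spirit of Lucet's linear-time Legendre transform: it builds the hull, then makes one monotone sweep over sorted slopes. It is an illustration, not the implementation used by Kim et al., and it is checked against the quadratic brute-force formula.

```python
import numpy as np


def legendre_1d_linear(x, f, y):
    """Discrete Legendre transform f*(y_k) = max_j (x_j * y_k - f_j) in O(len(x) + len(y)).

    Assumes x is strictly increasing and y is sorted ascending.
    Step 1 keeps only the lower convex hull of the points (x_j, f_j), since the
    maximum is always attained there; step 2 sweeps y once, advancing a pointer
    along the hull, because the maximizing vertex moves right as the slope y grows.
    """
    hull = []  # indices of lower-hull vertices, left to right
    for j in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # Drop b if it lies on or above the segment from point a to point j.
            if (f[b] - f[a]) * (x[j] - x[a]) >= (f[j] - f[a]) * (x[b] - x[a]):
                hull.pop()
            else:
                break
        hull.append(j)

    out = np.empty(len(y))
    i = 0
    for k, yk in enumerate(y):
        # Advance while the next hull vertex gives a value x * yk - f at least as large.
        while i + 1 < len(hull) and (
            x[hull[i + 1]] * yk - f[hull[i + 1]] >= x[hull[i]] * yk - f[hull[i]]
        ):
            i += 1
        out[k] = x[hull[i]] * yk - f[hull[i]]
    return out


rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 301)
f = rng.normal(size=x.size) + 2.0 * x**2  # hypothetical, non-convex grid values
y = np.linspace(-3.0, 3.0, 257)
fast = legendre_1d_linear(x, f, y)
brute = np.max(np.outer(y, x) - f[None, :], axis=1)  # O(len(x) * len(y)) reference
```

Both loops are amortized linear: each index enters and leaves the hull at most once, and the sweep pointer only moves forward.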

5. Application to Multi-Sensor and Multi-Modal Fusion

Each data modality or sensor provides a discrete distribution $\mu_i$ on a common spatial grid. WDHA fuses these into the barycenter $\bar\nu$, the Fréchet mean of the inputs in Wasserstein geometry (the analogue of the arithmetic mean). Unlike regularized Sinkhorn solvers, it introduces no entropic blur.

Practical points:

Algorithmically scalable (per-iteration cost $O(m\log m)$) and feasible on GPUs for grids up to $1024\times1024$.
Separability allows the transforms to be computed with efficient 1D row/column passes.
Optional projection by convex double conjugation for dual supports (Kim et al., 24 Jan 2025).
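On a 2D grid the conjugate separates:

$\varphi^*(y_1,y_2)=\max_{i}\big[x_{1,i}y_1+\max_j(x_{2,j}y_2-\varphi_{ij})\big]$

It can therefore be computed with a row pass followed by a column pass. Applying it twice gives the convex double conjugate $\varphi^{**}\le\varphi$, the convex minorant used for projection by double conjugation.

The sketch below uses a brute-force 1D pass for brevity; a linear-time 1D transform could be substituted. The dual grid `(y1, y2)` is an assumed choice, and $\varphi^{**}$ is the largest convex minorant only over slopes on that dual grid.

```python
import numpy as np


def conj_rows(F, x, y):
    """Row-wise discrete conjugate: out[r, k] = max_j (x_j * y_k - F[r, j]).

    Brute force, O(len(x) * len(y)) per row; a linear-time 1D transform could be swapped in.
    """
    return np.max(y[None, :, None] * x[None, None, :] - F[:, None, :], axis=2)


def conj_2d_separable(phi, x1, x2, y1, y2):
    """phi*(y1_k, y2_l) = max_i [x1_i * y1_k + max_j (x2_j * y2_l - phi[i, j])]."""
    g = conj_rows(phi, x2, y2)     # pass along axis 1: shape (len(x1), len(y2))
    out = conj_rows(-g.T, x1, y1)  # pass along axis 0: shape (len(y2), len(y1))
    return out.T                   # shape (len(y1), len(y2))


def project_conv(phi, x1, x2, y1, y2):
    """Convex double conjugation phi** via the dual grid (y1, y2); satisfies phi** <= phi."""
    phi_star = conj_2d_separable(phi, x1, x2, y1, y2)
    return conj_2d_separable(phi_star, y1, y2, x1, x2)


rng = np.random.default_rng(3)
x1, x2 = np.linspace(-1, 1, 15), np.linspace(-1, 1, 17)
y1, y2 = np.linspace(-2, 2, 11), np.linspace(-2, 2, 13)          # hypothetical dual grid
phi = rng.random((15, 17)) + 0.5 * np.add.outer(x1**2, x2**2)    # non-convex grid values
phi_star = conj_2d_separable(phi, x1, x2, y1, y2)
phi_cc = project_conv(phi, x1, x2, y1, y2)
```

A useful consistency check is that conjugating $\varphi^{**}$ returns $\varphi^*$ exactly. After one projection, the dual information is therefore unchanged and only non-convex wiggles in $\varphi$ are removed.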

6. Empirical Results: Accuracy and Runtime Advantages

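WDHA yields sharper and more accurate unregularized barycenters than entropic-regularized (Sinkhorn-type) approaches, namely the convolutional Wasserstein barycenter (CWB) and the debiased Sinkhorn barycenter (DSB). It also has significantly lower runtime: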

Experiment	WDHA (iterations / time / cost)	CWB (entropic; time / cost)	DSB (entropic; time / cost)
4 shapes	300 / 676 s / 74.58e-3	3731 s / 75.07e-3 (blurry)	7249 s / 74.58e-3 (blurry)
Handwritten "8"	~3300 s (sharp)	~10800 s (blurry)	~11200 s (blurry)

WDHA, marrying Wasserstein primal descent and Sobolev dual ascent, outperforms entropic-regularized algorithms in both sharpness of the barycenter (no blurring) and wall-clock time, particularly at high resolution (Kim et al., 24 Jan 2025).

7. Theoretical Guarantees and Practical Recommendations

The WDHA minimax problem (nonconvex in $\nu$, concave in the dual blocks) is well posed under boundedness and geometric convexity assumptions.
Convergence: the squared Wasserstein gradient norm decays at an explicit rate in the iteration count, which yields an iteration-complexity bound for reaching an approximately stationary point (Kim et al., 24 Jan 2025).
Memory and runtime per iteration scale as $O(m)$ and $O(m\log m)$, respectively, so memory stays linear in the number of grid points $m$ even at high resolution.
WDHA is recommended for scenarios requiring sharp barycentric fusion at scale (e.g., sensor networks, high-resolution image morphing), where entropic regularization would induce excessive bias or blur.

References

J. Kim, L. Nurbekyan, G. Peyré, "Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization," 24 Jan 2025.
