- The paper proposes two novel algorithms that leverage entropic regularization to efficiently compute Wasserstein barycenters.
- It introduces fixed-support and free-support methods whose gradients are computed cheaply via matrix scaling, with entropic regularization yielding a smooth, strictly convex objective in the weights.
- Applications in image visualization and constrained clustering highlight the practical benefits of summarizing high-dimensional data.
Fast Computation of Wasserstein Barycenters
The paper "Fast Computation of Wasserstein Barycenters" by Marco Cuturi and Arnaud Doucet addresses the problem of computing the mean of a set of empirical probability measures under the optimal transport metric. This mean, known as the Wasserstein barycenter, is significant in various applications within statistics and machine learning, particularly in scenarios requiring the comparison, summarization, and dimensionality reduction of empirical probability measures.
Summary of Contributions
The primary contribution of this work is a pair of algorithms for computing Wasserstein barycenters. Both build on the subgradient method but avoid its prohibitive per-iteration cost by extending the entropic regularization of the Wasserstein distance introduced by Cuturi (2013). The regularization yields a smooth, strictly convex objective whose gradients can be computed at substantially reduced cost with matrix scaling algorithms. The authors provide thorough theoretical underpinnings for their methods, together with practical applications to visualizing a large family of images and to solving a constrained clustering problem.
Background on Optimal Transport
The authors begin with a concise review of the Wasserstein distance and its role in optimal transport. Given two probability measures μ and ν on a space X, their p-Wasserstein distance is defined with respect to a ground metric D on X. For empirical measures, computing this distance reduces to solving a network flow (linear programming) problem.
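For two empirical measures with weight vectors $a$ and $b$ supported on points $\{x_1,\dots,x_n\}$ and $\{y_1,\dots,y_m\}$, the discrete problem takes the standard linear-programming form below (notation paraphrased rather than quoted from the paper):

$$
W_p^p(\mu,\nu) \;=\; \min_{T \in U(a,b)} \langle T, M_{XY} \rangle,
\qquad (M_{XY})_{ij} = D(x_i, y_j)^p,
\qquad U(a,b) = \{\, T \in \mathbb{R}_+^{n \times m} : T\mathbf{1}_m = a,\; T^\top \mathbf{1}_n = b \,\}.
$$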
Definition and Special Cases of Wasserstein Barycenters
The paper defines a Wasserstein barycenter problem as the minimizer of the sum of p-Wasserstein distances from an empirical measure to a set of given measures. This problem encompasses several special cases, such as finding centroids of histograms and constrained k-means.
- Centroids of Histograms: When X is a finite set and p=1, the $1$-Wasserstein distance coincides with the Earth Mover's Distance, and the barycenter is a centroid of a family of histograms.
- Euclidean X and k-Means: For p=2 and a single input measure in a Euclidean space, finding a barycenter supported on at most k points is equivalent to the k-means problem.
- Constrained k-Means: When the weights of the barycenter are constrained to lie in a subset of the simplex, the problem covers scenarios such as sensor deployment and resampling in particle filters.
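For reference, the objective these special cases share can be written in standard notation (paraphrased, not quoted from the paper) as

$$
\min_{\mu} \; \frac{1}{N} \sum_{i=1}^{N} W_p^p(\mu, \nu_i),
$$

where $\nu_1,\dots,\nu_N$ are the given measures and the minimization runs over a chosen class of candidate measures $\mu$ (e.g., measures supported on at most k points, possibly with constrained weights).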
Proposed Algorithms
- Fixed Support Algorithm (Algorithm 1): This algorithm handles the case when the support of the barycenter is fixed but the weights are variable and lie in a convex subset of the simplex. It uses a projected subgradient method for weight optimization.
- Free Support Algorithm (Algorithm 2): For cases where the support is not fixed, this algorithm alternates between updating the weights and the support points. It uses a Newton-like update for the support points and a subgradient step for the weights; a sketch of the support-point update follows this list.
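To make the alternation concrete, the snippet below sketches the support-point update for the p=2, Euclidean case: each barycenter point is moved toward the barycentric projection of the mass it sends to the input measures. It is a rough illustration under stated assumptions, not the authors' implementation; `transport_plan` is a hypothetical placeholder for any exact or regularized OT solver, and the weight step (the projected subgradient of Algorithm 1) is omitted.

```python
import numpy as np

def free_support_location_step(X, a, Ys, Bs, transport_plan, theta=1.0):
    """One support-point update of a free-support (p = 2, Euclidean) scheme.

    X  : (n, d) current barycenter support points, a : (n,) their weights.
    Ys : list of (m_i, d) support-point arrays of the input measures.
    Bs : list of (m_i,) weight vectors of the input measures.
    transport_plan(a, b, M) is assumed to return an (n, m_i) coupling
    between the barycenter and one input measure (hypothetical helper).
    """
    N = len(Ys)
    target = np.zeros_like(X, dtype=float)
    for Y, b in zip(Ys, Bs):
        # squared Euclidean cost matrix between barycenter and input support
        M = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        T = transport_plan(a, b, M)
        # barycentric projection: weighted average of where each point's mass goes
        target += (T @ Y) / a[:, None]
    target /= N
    # damped (Newton-like) move of the support toward the averaged projection
    return (1.0 - theta) * X + theta * target
```

With `theta = 1.0` this simply places each barycenter point at the average, over the N input measures, of the input points it is coupled to.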
Entropic Regularization
To address practical computational constraints, the authors extend the entropic regularization approach presented by Cuturi (2013). This regularization converts the optimization into a smoothed dual problem, allowing efficient computation of gradients via matrix scaling algorithms, notably Sinkhorn's algorithm. The result is a strictly convex objective that can be minimized more efficiently.
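A minimal sketch of this matrix scaling step, in the spirit of Cuturi (2013) but not the authors' reference implementation, is given below; the stopping rule and numerical safeguards (e.g. log-domain updates) are omitted. The dual potential it returns plays the role of the smoothed gradient with respect to the barycenter weights.

```python
import numpy as np

def sinkhorn_potentials(a, b, M, reg=0.1, n_iter=200):
    """Entropy-regularized OT via Sinkhorn's matrix scaling algorithm.

    a : (n,) source weights, b : (m,) target weights, M : (n, m) cost matrix,
    reg : regularization strength (larger = smoother and faster to converge).
    Returns the regularized transport plan T and a dual potential alpha
    (defined up to an additive constant) usable as a smoothed gradient of
    the regularized distance with respect to a.
    """
    K = np.exp(-M / reg)              # Gibbs kernel
    u = np.ones(len(a))
    for _ in range(n_iter):           # alternate the two scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)
    alpha = reg * np.log(u)           # dual potential on the a side
    return T, alpha
```

In the fixed-support setting, one iteration on the barycenter weights averages these potentials over the N input measures, takes a gradient step, and projects back onto the feasible set.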
Applications
Visualization of Perturbed Images
The authors use their algorithm to compute barycenters of digit images subjected to random scaling and translations. This application illustrates the efficacy of the proposed method in summarizing large collections of high-dimensional data. The computations demonstrate that the entropy-regularized barycenters can be efficiently computed even for a large number of images.
Constrained Clustering
The constrained clustering application uses census data on income and population across the contiguous US states. The algorithm computes clusters that ensure a more balanced assignment of weights. This use case showcases the utility of the approach in scenarios where uniformity constraints are essential, such as fair resource distribution and balanced clustering.
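As one illustrative way to encode such a balance requirement in the weight step, the snippet below projects a weight vector onto a "capped" simplex in which no cluster may hold more than a fixed fraction of the total mass. The cap is a hypothetical modeling choice introduced here for illustration, not the exact constraint set used in the paper's experiment.

```python
import numpy as np

def project_capped_simplex(v, cap):
    """Euclidean projection of v onto {a : sum(a) = 1, 0 <= a_i <= cap}.

    Requires cap >= 1 / len(v) for the set to be nonempty. The projection
    has the form clip(v - tau, 0, cap) for a scalar shift tau, found by
    bisection since the resulting total mass is monotone in tau.
    """
    lo, hi = v.min() - 1.0, v.max()          # brackets the correct shift
    for _ in range(100):
        tau = 0.5 * (lo + hi)
        total = np.clip(v - tau, 0.0, cap).sum()
        if total > 1.0:
            lo = tau                          # too much mass: shift further
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)
```

Plugging such a projection into the weight update restricts the computed barycenter, and hence the clustering it induces, to suitably balanced assignments.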
Implications and Future Research
The proposed methods enhance the capability to compute Wasserstein barycenters efficiently, making them useful in high-dimensional data analysis tasks. These algorithms open up new avenues for the application of Wasserstein barycenters in various domains, including image synthesis, finance, and spatial statistics. Future work could extend these methods to more complex problems involving multiple Wasserstein distances, such as in semi-supervised learning scenarios.
In conclusion, this paper delivers significant theoretical advancements and practical algorithms for the computation of Wasserstein barycenters, broadening their applicability and efficiency in contemporary data science and machine learning tasks.