- The paper proposes two novel algorithms that leverage entropic regularization to efficiently compute Wasserstein barycenters.
- It introduces fixed-support and free-support methods whose gradients are computed cheaply via matrix scaling, with entropic regularization yielding a smooth, strictly convex objective in the weights.
- Applications in image visualization and constrained clustering highlight the practical benefits of summarizing high-dimensional data.
Fast Computation of Wasserstein Barycenters
The paper "Fast Computation of Wasserstein Barycenters" by Marco Cuturi and Arnaud Doucet addresses the problem of computing the mean of a set of empirical probability measures under the optimal transport metric. This mean, known as the Wasserstein barycenter, is significant in various applications within statistics and machine learning, particularly in scenarios requiring the comparison, summarization, and dimensionality reduction of empirical probability measures.
Summary of Contributions
The primary contribution of this work is a pair of algorithms for computing Wasserstein barycenters. Both build on the subgradient method but avoid its prohibitive per-iteration cost by extending the entropic regularization of the Wasserstein distance introduced by Cuturi (2013). The regularization yields a smooth, strictly convex objective whose gradients can be computed at substantially reduced cost with matrix scaling algorithms. The authors provide thorough theoretical underpinnings for their methods, together with practical applications to visualizing a large family of images and to solving a constrained clustering problem.
Background on Optimal Transport
The authors begin with a concise review of the Wasserstein distance and its role in optimal transport. Given two probability measures μ and ν on a space X, their p-Wasserstein distance is defined with respect to a ground metric D on X. For empirical measures, computing this distance reduces to solving a network flow (linear programming) problem.
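For two empirical measures with weight vectors $a$ and $b$ supported on points $\{x_1,\dots,x_n\}$ and $\{y_1,\dots,y_m\}$, the discrete problem takes the standard linear-programming form below (notation paraphrased rather than quoted from the paper):

$$
W_p^p(\mu,\nu) \;=\; \min_{T \in U(a,b)} \langle T, M_{XY} \rangle,
\qquad (M_{XY})_{ij} = D(x_i, y_j)^p,
\qquad U(a,b) = \{\, T \in \mathbb{R}_+^{n \times m} : T\mathbf{1}_m = a,\; T^\top \mathbf{1}_n = b \,\}.
$$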
Definition and Special Cases of Wasserstein Barycenters
The paper defines a Wasserstein barycenter problem as the minimizer of the sum of p-Wasserstein distances from an empirical measure to a set of given measures. This problem encompasses several special cases, such as finding centroids of histograms and constrained k-means.
- Centroids of Histograms: When X is a finite set and p=1, the $1$-Wasserstein distance coincides with the Earth Mover's Distance, and the barycenter is a centroid of a family of histograms.
- Euclidean X and k-Means: For p=2 and a single input measure in a Euclidean space, finding a barycenter supported on at most k points is equivalent to the k-means problem.
- Constrained k-Means: When the weights of the barycenter are constrained to lie in a subset of the simplex, the problem covers scenarios such as sensor deployment and resampling in particle filters.
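For reference, the objective these special cases share can be written in standard notation (paraphrased, not quoted from the paper) as

$$
\min_{\mu} \; \frac{1}{N} \sum_{i=1}^{N} W_p^p(\mu, \nu_i),
$$

where $\nu_1,\dots,\nu_N$ are the given measures and the minimization runs over a chosen class of candidate measures $\mu$ (e.g., measures supported on at most k points, possibly with constrained weights).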
Proposed Algorithms
- Fixed Support Algorithm (Algorithm 1): This algorithm handles the case when the support of the barycenter is fixed but the weights are variable and lie in a convex subset of the simplex. It uses a projected subgradient method for weight optimization.
- Free Support Algorithm (Algorithm 2): For cases where the support is not fixed, this algorithm alternates between updating the weights and the support points. It uses a Newton-like update for the support points and a subgradient step for the weights; a sketch of the support-point update follows this list.
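To make the alternation concrete, the snippet below sketches the support-point update for the p=2, Euclidean case: each barycenter point is moved toward the barycentric projection of the mass it sends to the input measures. It is a rough illustration under stated assumptions, not the authors' implementation; `transport_plan` is a hypothetical placeholder for any exact or regularized OT solver, and the weight step (the projected subgradient of Algorithm 1) is omitted.

```python
import numpy as np

def free_support_location_step(X, a, Ys, Bs, transport_plan, theta=1.0):
    """One support-point update of a free-support (p = 2, Euclidean) scheme.

    X  : (n, d) current barycenter support points, a : (n,) their weights.
    Ys : list of (m_i, d) support-point arrays of the input measures.
    Bs : list of (m_i,) weight vectors of the input measures.
    transport_plan(a, b, M) is assumed to return an (n, m_i) coupling
    between the barycenter and one input measure (hypothetical helper).
    """
    N = len(Ys)
    target = np.zeros_like(X, dtype=float)
    for Y, b in zip(Ys, Bs):
        # squared Euclidean cost matrix between barycenter and input support
        M = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        T = transport_plan(a, b, M)
        # barycentric projection: weighted average of where each point's mass goes
        target += (T @ Y) / a[:, None]
    target /= N
    # damped (Newton-like) move of the support toward the averaged projection
    return (1.0 - theta) * X + theta * target
```

With `theta = 1.0` this simply places each barycenter point at the average, over the N input measures, of the input points it is coupled to.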
Entropic Regularization
To address practical computational constraints, the authors extend the entropic regularization approach presented by Cuturi (2013). This regularization converts the optimization into a smoothed dual problem, allowing efficient computation of gradients via matrix scaling algorithms, notably Sinkhorn's algorithm. The result is a strictly convex objective that can be minimized more efficiently.
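A minimal sketch of this matrix scaling step, in the spirit of Cuturi (2013) but not the authors' reference implementation, is given below; the stopping rule and numerical safeguards (e.g. log-domain updates) are omitted. The dual potential it returns plays the role of the smoothed gradient with respect to the barycenter weights.

```python
import numpy as np

def sinkhorn_potentials(a, b, M, reg=0.1, n_iter=200):
    """Entropy-regularized OT via Sinkhorn's matrix scaling algorithm.

    a : (n,) source weights, b : (m,) target weights, M : (n, m) cost matrix,
    reg : regularization strength (larger = smoother and faster to converge).
    Returns the regularized transport plan T and a dual potential alpha
    (defined up to an additive constant) usable as a smoothed gradient of
    the regularized distance with respect to a.
    """
    K = np.exp(-M / reg)              # Gibbs kernel
    u = np.ones(len(a))
    for _ in range(n_iter):           # alternate the two scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)
    alpha = reg * np.log(u)           # dual potential on the a side
    return T, alpha
```

In the fixed-support setting, one iteration on the barycenter weights averages these potentials over the N input measures, takes a gradient step, and projects back onto the feasible set.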
Applications
Visualization of Perturbed Images
The authors use their algorithm to compute barycenters of digit images subjected to random scaling and translations. This application illustrates the efficacy of the proposed method in summarizing large collections of high-dimensional data. The computations demonstrate that the entropy-regularized barycenters can be efficiently computed even for a large number of images.
Constrained Clustering
The constrained clustering application uses census data on income and population across the contiguous US states. The algorithm computes clusters that ensure a more balanced assignment of weights. This use case showcases the utility of the approach in scenarios where uniformity constraints are essential, such as fair resource distribution and balanced clustering.
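As one illustrative way to encode such a balance requirement in the weight step, the snippet below projects a weight vector onto a "capped" simplex in which no cluster may hold more than a fixed fraction of the total mass. The cap is a hypothetical modeling choice introduced here for illustration, not the exact constraint set used in the paper's experiment.

```python
import numpy as np

def project_capped_simplex(v, cap):
    """Euclidean projection of v onto {a : sum(a) = 1, 0 <= a_i <= cap}.

    Requires cap >= 1 / len(v) for the set to be nonempty. The projection
    has the form clip(v - tau, 0, cap) for a scalar shift tau, found by
    bisection since the resulting total mass is monotone in tau.
    """
    lo, hi = v.min() - 1.0, v.max()          # brackets the correct shift
    for _ in range(100):
        tau = 0.5 * (lo + hi)
        total = np.clip(v - tau, 0.0, cap).sum()
        if total > 1.0:
            lo = tau                          # too much mass: shift further
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)
```

Plugging such a projection into the weight update restricts the computed barycenter, and hence the clustering it induces, to suitably balanced assignments.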
Implications and Future Research
The proposed methods enhance the capability to compute Wasserstein barycenters efficiently, making them useful in high-dimensional data analysis tasks. These algorithms open up new avenues for the application of Wasserstein barycenters in various domains, including image synthesis, finance, and spatial statistics. Future work could extend these methods to more complex problems involving multiple Wasserstein distances, such as in semi-supervised learning scenarios.
In conclusion, this paper delivers significant theoretical advancements and practical algorithms for the computation of Wasserstein barycenters, broadening their applicability and efficiency in contemporary data science and machine learning tasks.