- The paper introduces advanced numerical formulations for optimal transport, leveraging entropic regularization and the Sinkhorn algorithm for scalable solutions.
- It details the computation of Wasserstein barycenters and unbalanced transport, providing robust methods for applications in imaging and machine learning.
- The paper also explores statistical convergence and scalability issues, outlining future directions for improved high-dimensional optimization.
Overview of "Computational Optimal Transport"
Introduction
The work under review, "Computational Optimal Transport," authored by Gabriel Peyré and Marco Cuturi, provides an extensive examination of Optimal Transport (OT) with a pronounced focus on numerical methods. Optimal Transport is a mathematical theory that seeks the most efficient way to transform one probability distribution into another, typically framed as minimizing a transportation cost. The historical roots of OT trace back to Gaspard Monge in the 18th century, with significant contributions from Leonid Kantorovich in the mid-20th century that anchored the subject firmly in optimization theory. OT has recently seen a resurgence due to the advent of scalable approximate solvers, which have broadened its application scope across domains such as imaging sciences, graphics, and machine learning.
Key Concepts and Structure
1. Theoretical Foundations
The foundation of OT lies in the concept of "cost" associated with morphing one distribution into another. Mathematically, this is framed using the Monge and Kantorovich formulations. The Kantorovich relaxation, which allows for probabilistic mass splitting, transforms the original combinatorial problem into a continuous, convex optimization problem. This leads to more tractable numerical formulations, notably through linear programming.
A fundamental tool in OT is the Wasserstein distance, which provides a metric for comparing probability distributions. For example, given two discrete probability vectors and a cost matrix between their support points, the distance is obtained by finding the transport plan of minimal total cost. The Wasserstein distance generalizes naturally to continuous measures and, unlike many divergences, remains meaningful for singular distributions whose supports do not overlap.
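To make this concrete, here is a minimal sketch in Python (using NumPy and SciPy as assumed tooling, not the authors' code; the function name and example data are illustrative) that solves the Kantorovich linear program directly for two small histograms:

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(a, b, C):
    """Solve the discrete Kantorovich LP: min <C, P> s.t. P 1 = a, P^T 1 = b, P >= 0."""
    n, m = C.shape
    # Equality constraints encode the two marginal conditions on the coupling P.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # row sums of P must equal a
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # column sums of P must equal b
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    P = res.x.reshape(n, m)
    return res.fun, P                      # optimal cost and optimal coupling

# Example: two histograms on a 1D grid with squared-distance cost (yields W_2^2).
x = np.linspace(0.0, 1.0, 5)
a = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
C = (x[:, None] - x[None, :]) ** 2
cost, P = wasserstein_lp(a, b, C)
```

For small problems the LP solver is exact; the entropic methods discussed later trade a small bias for much better scalability.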
2. Barycenters and Clustering
One practical extension of OT is the computation of Wasserstein barycenters, which generalize the notion of averaging (the Fréchet mean) to the space of probability distributions. This is crucial in applications like clustering and dictionary learning of distributions. The barycenter problem, framed as a convex optimization task over the Wasserstein space, finds useful applications in domains including image processing and Bayesian computation. The authors present techniques for computing these barycenters using entropic regularization, which smooths the optimization landscape.
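Concretely, for input histograms a_1, …, a_K with weights λ_k, the barycenter is the weighted Fréchet mean under the Wasserstein metric (a standard formulation; the notation here is illustrative):

```latex
% Wasserstein barycenter: weighted Fr\'echet mean under W_p
a^{\star} \;\in\; \arg\min_{a \in \Sigma_n} \; \sum_{k=1}^{K} \lambda_k \, W_p^p(a, a_k),
\qquad \lambda_k \ge 0, \quad \textstyle\sum_{k} \lambda_k = 1 .
```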
3. Numerical Methods
The paper extensively covers numerical solvers for OT problems. One of the highlights is the Sinkhorn algorithm, which uses entropic regularization to turn the OT problem into a sequence of simple matrix scaling iterations. The algorithm is particularly advantageous owing to its simplicity and ease of parallelization, making it suitable for large-scale applications. The text also explores multiscale schemes and approximate Newton methods for more complex OT formulations. These methods are important for handling the large problem sizes often encountered in practice.
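The core iteration is compact enough to sketch. The following minimal NumPy implementation (an illustrative sketch, not the authors' reference code; the regularization strength, iteration cap, and tolerance are arbitrary choices) alternates two diagonal scalings of the Gibbs kernel until the marginals are matched:

```python
import numpy as np

def sinkhorn(a, b, C, eps=1e-2, n_iter=1000, tol=1e-9):
    """Entropic OT via Sinkhorn iterations: returns the regularized coupling P."""
    K = np.exp(-C / eps)                      # Gibbs kernel
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iter):
        u = a / (K @ v)                       # scale rows toward marginal a
        v = b / (K.T @ u)                     # scale columns toward marginal b
        P = u[:, None] * K * v[None, :]       # current coupling diag(u) K diag(v)
        if np.abs(P.sum(axis=1) - a).max() < tol:
            break
    return P

# The regularized cost <C, P> approaches the exact LP value as eps -> 0,
# at the price of slower convergence and possible numerical underflow in K.
# cost = np.sum(sinkhorn(a, b, C) * C)
```

Because each iteration is only matrix-vector products, the method vectorizes and parallelizes naturally, which is the scalability argument made in the text.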
4. Statistical Perspectives
From a statistical standpoint, estimating Wasserstein distances from samples is challenging because the empirical estimator suffers from the curse of dimensionality. The paper reviews the convergence rates of empirical estimators of the Wasserstein distance and compares them with other statistical discrepancies, such as ϕ-divergences and Maximum Mean Discrepancies (MMD). This understanding of sample complexity and convergence behavior informs practical strategies for model fitting and estimation in probabilistic settings.
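As a rough point of comparison (a hedged summary of well-known results under suitable moment and regularity assumptions, not the paper's exact statements), the empirical Wasserstein distance degrades with dimension while kernel MMD estimators do not:

```latex
% Representative rates for n i.i.d. samples of a sufficiently regular measure on R^d
\mathbb{E}\,\big[ W_1(\hat{\mu}_n, \mu) \big] \;\asymp\; n^{-1/d} \quad (d \ge 3),
\qquad
\mathrm{MMD}(\hat{\mu}_n, \mu) \;=\; O_{\mathbb{P}}\!\big(n^{-1/2}\big) \ \text{independently of } d .
```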
5. Unbalanced Optimal Transport
Real-world problems often involve measures that do not have matching total mass. The framework of unbalanced OT expands the classical OT theory by relaxing the marginal constraints, catering to scenarios where mass creation or destruction is allowed. This formulation is particularly relevant to applications in computer vision and generative modeling where distributions may significantly differ in mass.
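One standard way to express this relaxation (a sketch using Kullback–Leibler penalties with relaxation parameters τ₁, τ₂, which are illustrative; more general divergences are also treated in the literature) replaces the hard marginal constraints with soft penalties:

```latex
% Unbalanced OT: marginal constraints relaxed into divergence penalties
\min_{P \in \mathbb{R}_{+}^{n \times m}} \;
\langle C, P \rangle
\;+\; \tau_1\, \mathrm{KL}\!\left(P \mathbf{1}_m \,\middle\|\, a\right)
\;+\; \tau_2\, \mathrm{KL}\!\left(P^{\top} \mathbf{1}_n \,\middle\|\, b\right).
```

As τ₁, τ₂ → ∞ the penalties enforce the classical marginal constraints, so balanced OT is recovered as a limiting case.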
Implications and Future Directions
Practical Implications
The techniques discussed for OT have broad implications across disciplines. In imaging sciences, OT is instrumental for tasks such as color and texture processing. In machine learning, OT provides loss functions for generative models and transport plans for domain adaptation. Its versatility in modeling and solving complex transport problems makes it a keystone of computational mathematics and data science.
Theoretical Implications
On the theoretical front, the comprehensive approach to dynamic and entropic formulations of OT opens avenues for further research into the convergence properties and stability of these methods. The integration of convex analysis and optimization theory with probabilistic modeling underscores the interplay between these mathematical domains, fostering a deeper understanding of transport phenomena.
Future Developments
Looking ahead, several key areas could benefit from further exploration:
- Algorithmic Enhancements: While the Sinkhorn algorithm is efficient, exploring hybrid methods that combine its simplicity with the robustness of interior-point methods could yield faster convergence rates, especially for high-dimensional problems.
- Statistical Learning: Embedding OT within broader machine learning frameworks, like reinforcement learning and neural network training, could harness its full potential in adaptive systems.
- Scalability: Developing parallelized versions of OT solvers that leverage modern hardware advancements, such as GPUs and TPUs, could further enhance the scalability of OT methods.
- Application Expansion: Extending the application domains of OT to fields such as economics, climate science, and even social sciences, where resource distribution and matching problems are prevalent, could provide novel insights and solutions.
In summary, "Computational Optimal Transport" offers a rich and methodical discourse on OT theory and its computational aspects. By bridging theoretical underpinnings and practical algorithms, it lays a valuable groundwork for continued innovations and applications in numerous scientific and engineering domains.