
Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances (1306.0895v1)

Published 4 Jun 2013 in stat.ML

Abstract: Optimal transportation distances are a fundamental family of parameterized distances for histograms. Despite their appealing theoretical properties, excellent performance in retrieval tasks and intuitive formulation, their computation involves the resolution of a linear program whose cost is prohibitive whenever the histograms' dimension exceeds a few hundreds. We propose in this work a new family of optimal transportation distances that look at transportation problems from a maximum-entropy perspective. We smooth the classical optimal transportation problem with an entropic regularization term, and show that the resulting optimum is also a distance which can be computed through Sinkhorn-Knopp's matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transportation solvers. We also report improved performance over classical optimal transportation distances on the MNIST benchmark problem.

Citations (3,874)

Summary

  • The paper introduces an entropic regularization method that transforms the OT problem into a strictly convex one, enabling faster and more stable computations.
  • It demonstrates that the Sinkhorn-Knopp algorithm computes distances thousands of times faster than classical Earth Mover’s Distance methods, as shown on the MNIST dataset.
  • The efficiency gains allow practical application of robust OT metrics in high-dimensional machine learning and computer vision tasks.

Lightspeed Computation of Optimal Transportation Distances Using Sinkhorn Distances

The paper "Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances" by Marco Cuturi presents a significant advancement in the computation of Optimal Transportation (OT) distances, a class of mathematical distances employed to measure the "cost" of transforming one probability distribution into another. This work focuses on overcoming the computational inefficiencies of traditional OT distances which become prohibitive for high-dimensional datasets.

Optimal Transportation distances, also known as Earth Mover's Distances (EMD), are well regarded for their theoretical robustness and practical effectiveness in applications such as image retrieval and computer vision. However, computing an OT distance requires solving a linear program (LP) for every pair of histograms, and this cost scales poorly with the histogram dimension $d$, typically as $O(d^3 \log d)$. This limitation restricts their use in large-scale machine learning tasks.

Entropic Regularization of Optimal Transportation

The key innovation introduced in this paper is the regularization of the optimal transportation problem using an entropy term, leading to the formulation of Sinkhorn distances. By adding an entropic penalty to the transport cost, Cuturi converts the original LP into a strictly convex problem, which can be efficiently solved using the Sinkhorn-Knopp matrix scaling algorithm. This algorithm ensures linear convergence and is highly parallelizable, making it feasible to execute on modern parallel computing platforms such as GPGPUs.

Algorithmic Efficiency and Implementation

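The Sinkhorn distances are computed by solving the following entropy-regularized optimization problem: $P^{\lambda} = \operatorname{argmin}_{P \in U(r,c)} \langle P, M \rangle - \frac{1}{\lambda} h(P),$ with the resulting distance given by $d_{M}^{\lambda}(r,c) = \langle P^{\lambda}, M \rangle$. Here $h(P)$ is the entropy of the transport plan $P$, $\lambda$ is the regularization parameter, $M$ is the ground distance matrix, and $U(r,c)$ is the set of transport plans whose row sums equal $r$ and whose column sums equal $c$.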
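A short derivation may help explain why a matrix scaling algorithm applies here. The sketch below assumes the entropy is $h(P) = -\sum_{i,j} p_{ij} \log p_{ij}$, writes $P^{\lambda}$ for the minimizer of the regularized problem, and introduces dual variables $\alpha$ and $\beta$ for the two marginal constraints (these names are chosen for illustration). Setting the gradient of the Lagrangian to zero shows that the optimal plan is a rescaling of the rows and columns of $K = e^{-\lambda M}$:

```latex
\begin{aligned}
\mathcal{L}(P,\alpha,\beta) &= \sum_{i,j} \Big( \tfrac{1}{\lambda}\, p_{ij} \log p_{ij} + p_{ij} m_{ij} \Big)
  + \alpha^T (P \mathbf{1} - r) + \beta^T (P^T \mathbf{1} - c), \\
\frac{\partial \mathcal{L}}{\partial p_{ij}} = 0
  &\;\Longrightarrow\; \tfrac{1}{\lambda} (\log p_{ij} + 1) + m_{ij} + \alpha_i + \beta_j = 0, \\
p_{ij} &= e^{-1/2 - \lambda \alpha_i} \; e^{-\lambda m_{ij}} \; e^{-1/2 - \lambda \beta_j}, \\
P^{\lambda} &= \operatorname{diag}(u)\, K \operatorname{diag}(v), \qquad K = e^{-\lambda M}.
\end{aligned}
```

Finding $P^{\lambda}$ therefore reduces to finding positive vectors $u$ and $v$ that make $\operatorname{diag}(u) K \operatorname{diag}(v)$ match the prescribed marginals $r$ and $c$.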

This transformation is executed efficiently using the Sinkhorn-Knopp algorithm:

  1. Initialize vectors $u$ and $v$ with ones.
  2. Iteratively update $u$ and $v$ to satisfy the constraints $P \mathbf{1} = r$ and $P^T \mathbf{1} = c$.
  3. The distance is calculated as $\sum_{i,j} u_i (K \odot M)_{ij} v_j$, where $\odot$ denotes the element-wise product and $K = e^{-\lambda M}$.
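The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the paper's implementation: the function name `sinkhorn_distance`, the stopping rule on the change in $u$, the defaults `n_iter` and `tol`, and the toy histograms are all assumptions made for this example, and the entries of `r` and `c` are assumed to be strictly positive.

```python
import math

def sinkhorn_distance(r, c, M, lam, n_iter=1000, tol=1e-9):
    """Return (<P, M>, P) for the entropy-regularized plan P (illustrative sketch)."""
    n, m = len(r), len(c)
    # K = exp(-lam * M), taken element-wise
    K = [[math.exp(-lam * M[i][j]) for j in range(m)] for i in range(n)]
    u = [1.0] * n  # step 1: start from vectors of ones
    v = [1.0] * m
    for _ in range(n_iter):
        # step 2a: rescale columns so that the column sums match c
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # step 2b: rescale rows so that the row sums match r
        u_new = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        change = max(abs(a - b) for a, b in zip(u_new, u))
        u = u_new
        if change < tol:  # assumed stopping rule: u has stopped moving
            break
    # transport plan P = diag(u) K diag(v)
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    # step 3: distance = sum_ij u_i (K ⊙ M)_ij v_j = <P, M>
    dist = sum(P[i][j] * M[i][j] for i in range(n) for j in range(m))
    return dist, P

# Toy example (hypothetical data): three bins on a line, ground cost |i - j|
r = [0.5, 0.3, 0.2]
c = [0.2, 0.2, 0.6]
M = [[abs(i - j) for j in range(3)] for i in range(3)]
dist, P = sinkhorn_distance(r, c, M, lam=2.0)
print(dist)
```

Each update costs one product with $K$ or its transpose, so a vectorized version (for example with an array library) would replace the inner sums with matrix-vector products, which is what makes the method easy to parallelize.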

The computation of Sinkhorn distances is several orders of magnitude faster than traditional OT methods. Empirical tests on the MNIST dataset demonstrate that the Sinkhorn distance can be computed thousands of times faster than the Earth Mover's Distance and performs comparably, if not better, when used in classification tasks.

Empirical Results and Computational Benchmarks

Experiments on the MNIST dataset reveal that the Sinkhorn distance shows superior performance in classifying handwritten digits. For varying sizes of the dataset, this method not only outperformed classical OT distances but did so with markedly reduced computation times.

Furthermore, empirical analyses underscore the scalability of the Sinkhorn distances. Each iteration is dominated by matrix-vector products with the $d \times d$ matrix $K$, so its cost grows quadratically with the histogram dimension $d$, well below the roughly cubic cost of LP solvers. Timings for histograms of varying dimensions confirmed that, for high-dimensional histograms, the algorithm ran in considerably less time than traditional methods.

Theoretical and Practical Implications

The introduction of Sinkhorn distances represents a pivotal step for the practical application of optimal transportation theories in the field of machine learning and data analysis. By mitigating the computational expense, these distances can now be applied to high-dimensional datasets, broadening their utility in real-world applications.

Theoretically, the entropic regularization ensures that the transportation problem remains feasible and stable, making it a robust alternative to classic OT methods. Practically, this advancement paves the way for more efficient data comparison techniques in machine learning pipelines without compromising on the accuracy or the richness of the distance metrics.

Future Developments

Further research may delve into optimizing the selection of the regularization parameter $\lambda$, improving convergence rates, and extending the applicability of Sinkhorn distances in other domains such as natural language processing and bioinformatics. Additionally, more comprehensive empirical evaluations across diverse datasets can enhance the understanding and operational benefits of Sinkhorn distances.
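To make the role of $\lambda$ concrete, the following sketch evaluates the regularized transport cost on a two-bin toy problem for several values of $\lambda$. The helper name `sinkhorn_cost`, the fixed iteration count, and the data are assumptions chosen for illustration. For this symmetric problem the exact OT cost is 0 and the regularized cost has the closed form $e^{-\lambda} / (1 + e^{-\lambda})$, so the output shows the regularized value approaching the exact one as $\lambda$ grows.

```python
import math

def sinkhorn_cost(r, c, M, lam, n_iter=200):
    """Regularized transport cost <P, M> after a fixed number of scaling steps (sketch)."""
    n, m = len(r), len(c)
    K = [[math.exp(-lam * M[i][j]) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    return sum(u[i] * K[i][j] * M[i][j] * v[j] for i in range(n) for j in range(m))

# Hypothetical two-bin problem: identical histograms, so the exact OT cost is 0
r = [0.5, 0.5]
c = [0.5, 0.5]
M = [[0.0, 1.0], [1.0, 0.0]]

costs = {lam: sinkhorn_cost(r, c, M, lam) for lam in (1.0, 5.0, 20.0)}
for lam, cost in costs.items():
    closed_form = math.exp(-lam) / (1.0 + math.exp(-lam))
    print(lam, cost, closed_form)
```

One practical caveat when pushing $\lambda$ higher: entries of $e^{-\lambda M}$ eventually underflow to zero in floating point, so very large values of $\lambda$ generally call for extra numerical care.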

In conclusion, the paper "Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances" presents a substantial improvement in the computation of OT distances. The entropic regularization technique not only makes these distances computationally feasible for large-scale and high-dimensional data but also maintains, and often enhances, their practical applicability in machine learning tasks. This balance of theoretical innovation and practical application underscores the significant potential of Sinkhorn distances in advancing the field of data science.
