- The paper presents Sinkhorn divergences, which interpolate between OT and MMD and provide a debiased, geometry-aware loss built on entropic regularization.
- It demonstrates that these divergences retain key properties, namely positivity, convexity, and metrization of convergence in law, that are essential for geometric machine learning.
- The study introduces an efficient GPU-oriented gradient formulation that accelerates computation by a factor of 2-3 over naive automatic differentiation, enabling large-scale applications.
An Analytical Examination of Interpolating Distances in Probability Measures
The paper presents a comprehensive study of Sinkhorn divergences, which serve as a bridge between Optimal Transport (OT) distances and Maximum Mean Discrepancies (MMD) for comparing probability distributions. This intersection is particularly relevant in machine learning and data science, where geometric considerations are essential for tasks such as shape matching, classification, and training generative models. Unlike traditional discrepancies such as Total Variation and the Kullback-Leibler divergence, which are blind to the geometry of the sample space, MMD and OT incorporate these spatial attributes directly into the comparison.
Sinkhorn Divergences: Theoretical and Practical Insights
Sinkhorn divergences are introduced as a parameterized family of divergences that interpolate between OT and MMD. Built on entropy-regularized OT and its dual formulation, they are positive, convex, and metrize convergence in law. Entropic regularization smooths the OT problem, substantially reducing computational overhead and allowing efficient computation on GPU architectures for large-scale applications.
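The core computational primitive is the entropy-regularized OT cost OT_ϵ, typically evaluated with Sinkhorn's fixed-point iterations on the dual potentials. The following is a minimal, illustrative sketch in PyTorch with log-domain updates and a squared Euclidean ground cost; the names (sinkhorn_cost, eps, n_iters) and defaults are assumptions for the example, not the paper's reference implementation.

```python
# Minimal sketch: entropic OT cost OT_eps between two weighted point clouds,
# computed with log-domain Sinkhorn iterations (illustrative, not reference code).
import torch

def sinkhorn_cost(x, y, a, b, eps=0.05, n_iters=100):
    """Approximate OT_eps(alpha, beta) for alpha = sum_i a_i delta_{x_i},
    beta = sum_j b_j delta_{y_j}, with ground cost C(x, y) = |x - y|^2 / 2."""
    C = 0.5 * torch.cdist(x, y, p=2) ** 2  # (n, m) cost matrix
    log_a, log_b = a.log(), b.log()
    f = torch.zeros_like(a)  # dual potential on alpha
    g = torch.zeros_like(b)  # dual potential on beta

    for _ in range(n_iters):
        # Block-coordinate ascent on the dual: numerically stable soft-min updates.
        f = -eps * torch.logsumexp(log_b[None, :] + (g[None, :] - C) / eps, dim=1)
        g = -eps * torch.logsumexp(log_a[:, None] + (f[:, None] - C) / eps, dim=0)

    # Dual objective <alpha, f> + <beta, g>; the correction term is ~0 at convergence.
    return (a * f).sum() + (b * g).sum()
```

Working in the log domain keeps the updates numerically stable even for small ϵ, which matters when probing the OT end of the interpolation.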
Key Theoretical Contributions:
- Positivity, Convexity, and Metrization: Sinkhorn divergences are shown to be symmetric, positive definite, and convex in each of their inputs, and to metrize the convergence in law of probability measures.
- Interpolation Parameter: By varying the parameter ϵ, Sinkhorn divergences sweep the geometric spectrum from OT to MMD. As ϵ→0, they approach the unregularized OT cost, capturing the transportation cost between measures. Conversely, as ϵ→∞, they converge to an MMD-type kernel norm induced by the ground cost, a simpler, convolution-based measure.
- Elimination of Entropic Bias: Entropy-regularized OT is biased: the regularized cost OT_ϵ(α, α) does not vanish, so minimizing it pulls solutions toward over-smoothed, shrunken measures that fail to represent the target faithfully. The Sinkhorn divergence corrects this by subtracting the self-comparison terms, yielding an unbiased loss with S_ϵ(α, α) = 0 (see the sketch after this list).
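Concretely, the debiased loss subtracts the two self-comparison terms from the entropic OT cost, S_ϵ(α, β) = OT_ϵ(α, β) − ½ OT_ϵ(α, α) − ½ OT_ϵ(β, β), so that S_ϵ(α, α) = 0. Below is a minimal sketch that reuses the illustrative sinkhorn_cost helper defined earlier; names and defaults remain assumptions for the example.

```python
# Hedged sketch: debiased Sinkhorn divergence built from the entropic OT cost.
# Reuses the illustrative sinkhorn_cost helper defined above.
def sinkhorn_divergence(x, y, a, b, eps=0.05, n_iters=100):
    """S_eps(alpha, beta) = OT_eps(a, b) - OT_eps(a, a)/2 - OT_eps(b, b)/2."""
    ot_ab = sinkhorn_cost(x, y, a, b, eps, n_iters)
    ot_aa = sinkhorn_cost(x, x, a, a, eps, n_iters)
    ot_bb = sinkhorn_cost(y, y, b, b, eps, n_iters)
    return ot_ab - 0.5 * ot_aa - 0.5 * ot_bb
```

Sweeping eps in such a sketch traces the interpolation described above: small values behave like the OT cost, while large values approach an MMD-type kernel loss.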
Numerical Implementation and Efficiency
The computational implementation focuses on scalability and efficiency on modern GPU hardware. By working with the dual potentials of the Sinkhorn algorithm and organizing the computation around GPU-friendly structures, the approach significantly outperforms conventional dense-tensor implementations. The authors provide an explicit gradient formulation that avoids the computational burden of differentiating through the entire Sinkhorn loop, accelerating performance by a factor of 2-3 compared to naive autograd methods.
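The sketch below illustrates one common way to realize this idea in PyTorch, assuming a dense squared Euclidean cost: the fixed-point iterations run under torch.no_grad(), and only a final soft-min update is recorded by autograd, which at convergence recovers the gradient with respect to the sample positions via the envelope theorem. This is a hedged illustration, not the authors' reference code.

```python
# Hedged sketch: entropic OT cost whose backward pass skips the Sinkhorn loop.
import torch

def sinkhorn_cost_detached(x, y, a, b, eps=0.05, n_iters=100):
    cost = lambda u, v: 0.5 * torch.cdist(u, v, p=2) ** 2
    log_a, log_b = a.log(), b.log()

    with torch.no_grad():  # cheap, graph-free fixed-point iterations
        C = cost(x, y)
        f, g = torch.zeros_like(a), torch.zeros_like(b)
        for _ in range(n_iters):
            f = -eps * torch.logsumexp(log_b[None, :] + (g[None, :] - C) / eps, dim=1)
            g = -eps * torch.logsumexp(log_a[:, None] + (f[:, None] - C) / eps, dim=0)

    # One differentiable soft-min: gradients w.r.t. x and y flow only through
    # this last cost evaluation (the converged potential g is treated as a constant).
    C = cost(x, y)
    f = -eps * torch.logsumexp(log_b[None, :] + (g[None, :] - C) / eps, dim=1)
    return (a * f).sum() + (b * g).sum()
```

Note that this dense-cost illustration is quadratic in memory; the large-scale benchmarks discussed below rely on the GPU-friendly, memory-efficient reductions mentioned above rather than on storing the full cost matrix.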
The numerical experiments underscore the divergences' capacity to scale to large datasets while retaining geometric fidelity. Benchmark tests indicate that Sinkhorn divergences can effectively handle measures with up to hundreds of thousands of samples, showcasing their applicability in real-world machine learning scenarios.
Implications and Future Directions
Sinkhorn divergences offer a promising path forward for geometric machine learning and statistical analysis. Their position between OT and MMD affords flexibility in trading geometric fidelity against computational cost, making them well suited to applications that blend geometric intuition with computational tractability. The methodological rigor and the strong theoretical guarantees provided by the authors pave the way for deploying these tools in a broader set of domains.
Future research could explore the practical deployment of Sinkhorn divergences in advanced machine learning architectures, particularly in settings where scalability and geometric precision are critical. Furthermore, studying the interaction of these divergences with other forms of regularization and constraints could yield new insights into improved algorithmic performance for classification and clustering tasks. The intersection of these theoretical tools with domain-specific applications marks an exciting frontier in machine learning research.