- The paper proposes new stochastic optimization algorithms that combine entropic regularization with a reformulation of the dual optimal transport problem as the maximization of an expectation.
- The approach outperforms traditional Sinkhorn iterations for discrete distributions and handles semi-discrete and continuous settings through efficient stochastic gradient descent.
- The research enables scalable and accurate high-dimensional distribution comparisons, paving the way for advanced applications in machine learning and density estimation.
Stochastic Optimization for Large-Scale Optimal Transport
The paper "Stochastic Optimization for Large-scale Optimal Transport" presents a robust framework aimed at alleviating the computational intensity associated with optimal transport (OT) problems. Traditionally used to compare probability distributions while preserving geometric integrity, OT is challenged by its heavy computational demands, particularly in large-scale contexts. This paper introduces novel stochastic optimization algorithms suitable for high-dimensional learning applications, facilitating operations with both discrete and continuous distributions through sampling. This negates the necessity of density discretization and mitigates associated errors.
Key Contributions
- Stochastic Optimization Algorithms: The paper proposes a new class of stochastic optimization algorithms that effectively handle large-scale OT problems. These algorithms rest on two principal ideas: recasting the dual OT problem as the maximization of an expectation, and using entropic regularization to obtain a smooth dual objective amenable to stochastic gradient methods.
- Three-fold Application Strategy:
- Discrete to Discrete: The authors demonstrate that incremental stochastic optimization methods outperform the state-of-the-art Sinkhorn algorithm for discrete distributions.
- Discrete to Continuous: For scenarios involving a discrete distribution and a continuous density, a semi-discrete reformulation solved with averaged stochastic gradient descent outperforms approaches that first discretize the continuous density (a minimal sketch of this scheme appears after the list).
- Continuous to Continuous: The paper introduces a stochastic gradient descent method in which the dual potentials are expanded in a reproducing kernel Hilbert space (RKHS). This is, to the authors' knowledge, the first method to address continuous-to-continuous OT without discretizing the input measures.
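To make the semi-discrete case concrete, below is a minimal NumPy sketch of averaged SGD on the semi-dual objective E_{X∼μ}[h_ε(X, v)] for a discrete target ν = Σ_j ν_j δ_{y_j}, where h_ε(x, v) = Σ_j v_j ν_j − ε log Σ_j ν_j exp((v_j − c(x, y_j))/ε) up to an additive constant. The function names, the c0/√k step-size schedule, and the toy sampler are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def semidual_grad(x, v, y, nu, eps, cost):
    """Gradient of the semi-dual term h_eps(x, v) with respect to v,
    for a single sample x drawn from the continuous measure mu."""
    c = cost(x, y)                # costs c(x, y_j), shape (n,)
    z = (v - c) / eps
    z -= z.max()                  # stabilize the log-sum-exp / softmax
    chi = nu * np.exp(z)
    chi /= chi.sum()              # soft assignment of x to the atoms y_j
    return nu - chi               # grad_v h_eps(x, v)

def averaged_sgd_semidual(sample_mu, y, nu, eps, cost, n_iter=10_000, c0=1.0):
    """Averaged SGD (ascent) on v -> E_{X~mu}[h_eps(X, v)].
    `sample_mu()` draws one point from the continuous measure mu."""
    n = len(nu)
    v = np.zeros(n)
    v_avg = np.zeros(n)
    for k in range(1, n_iter + 1):
        x = sample_mu()
        v += (c0 / np.sqrt(k)) * semidual_grad(x, v, y, nu, eps, cost)
        v_avg += (v - v_avg) / k  # Polyak-Ruppert running average
    return v_avg

# Toy usage: mu is a standard 2-D Gaussian sampled on the fly,
# nu is a uniform discrete measure on 50 random atoms.
rng = np.random.default_rng(0)
y_atoms = rng.standard_normal((50, 2))
nu = np.full(50, 1 / 50)
sq_cost = lambda x, yy: np.sum((yy - x) ** 2, axis=1)
v_hat = averaged_sgd_semidual(lambda: rng.standard_normal(2),
                              y_atoms, nu, eps=0.1, cost=sq_cost)
```

The averaged iterate is what the routine returns, mirroring the role of averaging in the paper's semi-discrete algorithm; the smoothness introduced by the entropic term is what makes each stochastic gradient well defined.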
Numerical Results and Implications
The paper provides empirical evidence on discrete, semi-discrete, and continuous benchmark problems, demonstrating the efficacy of the proposed stochastic approaches. Notably, the incremental algorithms for discrete problems outpace traditional Sinkhorn iterations in scalability and efficiency. These results matter most for applications that demand large-scale OT computations, such as text classification via word mover's distances, which previously incurred prohibitive computational costs.
Implications for Machine Learning and Beyond
The implications of this work are significant for machine learning and related fields that require efficient large-scale distribution comparison. The stochastic optimization methods bridge a critical gap, offering scalable solutions without sacrificing accuracy. Moreover, the ability to handle continuous distributions opens new research paths and applications, especially in density estimation and high-dimensional data analysis.
Future Prospects
This research sets the stage for further advances in efficient computation for optimal transport. Future work could explore alternative regularization schemes within the stochastic framework, improve convergence rates, and broaden applicability. Further refinement and adaptation of these algorithms to specific machine learning applications could also yield significant performance gains.
In conclusion, this paper stands as a significant contribution to the field of optimal transport, offering practical solutions for the computational challenges associated with large-scale problems. Through the lens of stochastic optimization, it revitalizes the practicality and application potential of optimal transport in various computational domains.