Accelerating Sinkhorn Algorithm with Sparse Newton Iterations (2401.12253v1)
Abstract: Computing the optimal transport distance between statistical distributions is a fundamental task in machine learning. One remarkable recent advancement is entropic regularization and the Sinkhorn algorithm, which uses only matrix scaling and guarantees an approximate solution with near-linear runtime. Despite the success of the Sinkhorn algorithm, its runtime may still be slow because of the potentially large number of iterations needed for convergence. To achieve possibly super-exponential convergence, we present Sinkhorn-Newton-Sparse (SNS), an extension of the Sinkhorn algorithm that introduces early stopping for the matrix-scaling steps and a second stage featuring a Newton-type subroutine. Adopting the variational viewpoint that the Sinkhorn algorithm maximizes a concave Lyapunov potential, we offer the insight that the Hessian matrix of the potential function is approximately sparse. Sparsifying the Hessian results in a fast $O(n^2)$ per-iteration complexity, the same as the Sinkhorn algorithm. In terms of total iteration count, we observe that the SNS algorithm converges orders of magnitude faster across a wide range of practical cases, including optimal transport between empirical distributions and computing the Wasserstein $W_1$ and $W_2$ distances of discretized densities. The empirical performance is corroborated by a rigorous bound on the approximate sparsity of the Hessian matrix.
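The abstract describes a two-stage method: Sinkhorn matrix scaling with early stopping, followed by Newton-type iterations on the concave Lyapunov potential with a sparsified Hessian. Below is a minimal sketch of that structure, assuming the standard entropic-OT dual potential $L(x, y) = \langle r, x\rangle + \langle c, y\rangle - \eta \sum_{ij} \exp((x_i + y_j - C_{ij})/\eta)$ with strictly positive marginals; the function name `sns`, the switch point `n_sinkhorn`, the row-wise sparsification threshold `rho`, and the small ridge term are illustrative choices, not the paper's tuned implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg


def sns(C, r, c, eta, n_sinkhorn=20, n_newton=20, rho=0.1, tol=1e-10):
    """Illustrative Sinkhorn-Newton-Sparse (SNS) sketch.

    Maximizes the concave Lyapunov potential
        L(x, y) = <r, x> + <c, y> - eta * sum_ij exp((x_i + y_j - C_ij) / eta),
    whose maximizer gives the entropic plan P_ij = exp((x_i + y_j - C_ij) / eta).
    """
    n, m = C.shape
    x, y = np.zeros(n), np.zeros(m)

    def plan(a, b):
        return np.exp((a[:, None] + b[None, :] - C) / eta)

    def potential(a, b):
        return r @ a + c @ b - eta * plan(a, b).sum()

    # Stage 1: plain Sinkhorn (matrix scaling) steps, stopped early.
    for _ in range(n_sinkhorn):
        x += eta * (np.log(r) - np.log(plan(x, y).sum(axis=1)))
        y += eta * (np.log(c) - np.log(plan(x, y).sum(axis=0)))

    # Stage 2: Newton-type steps with a sparsified Hessian.
    for _ in range(n_newton):
        P = plan(x, y)
        grad = np.concatenate([r - P.sum(axis=1), c - P.sum(axis=0)])
        if np.abs(grad).sum() < tol:
            break
        # -Hessian of L is (1/eta) [[diag(P 1), P], [P^T, diag(P^T 1)]];
        # dropping small entries of P (row-wise threshold) makes it sparse.
        P_sp = sp.csr_matrix(np.where(P >= rho * P.max(axis=1, keepdims=True), P, 0.0))
        H = sp.bmat([[sp.diags(P.sum(axis=1)), P_sp],
                     [P_sp.T, sp.diags(P.sum(axis=0))]]) / eta
        H = H + 1e-12 * sp.eye(n + m)  # ridge: L is shift-invariant, so H is singular
        step, _ = cg(H, grad)          # Newton (ascent) direction via sparse CG
        t = 1.0                        # crude backtracking line search on L
        while potential(x + t * step[:n], y + t * step[n:]) < potential(x, y) and t > 1e-8:
            t *= 0.5
        x, y = x + t * step[:n], y + t * step[n:]

    return plan(x, y)


# Usage sketch: random cost matrix, uniform marginals.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    C = rng.random((n, n))
    r = np.full(n, 1.0 / n)
    c = np.full(n, 1.0 / n)
    P = sns(C, r, c, eta=0.01)
    print(np.abs(P.sum(axis=1) - r).sum())  # residual marginal violation
```

Each Newton step above costs $O(n^2)$ to form the plan plus a sparse linear solve, matching the per-iteration cost of Sinkhorn scaling claimed in the abstract; the diagonal blocks of the Hessian are kept exact while only the off-diagonal blocks are thresholded in this sketch.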