Accelerating Sinkhorn Algorithm with Sparse Newton Iterations (2401.12253v1)

Published 20 Jan 2024 in math.OC, cs.LG, and stat.ML

Abstract: Computing the optimal transport distance between statistical distributions is a fundamental task in machine learning. One remarkable recent advance is entropic regularization and the Sinkhorn algorithm, which relies only on matrix scaling and guarantees an approximate solution in near-linear runtime. Despite the Sinkhorn algorithm's success, its runtime may still be slow because of the potentially large number of iterations needed for convergence. To achieve possibly super-exponential convergence, we present Sinkhorn-Newton-Sparse (SNS), an extension of the Sinkhorn algorithm that introduces early stopping for the matrix-scaling steps and a second stage featuring a Newton-type subroutine. Adopting the variational viewpoint that the Sinkhorn algorithm maximizes a concave Lyapunov potential, we offer the insight that the Hessian matrix of the potential function is approximately sparse. Sparsification of the Hessian results in a fast $O(n^2)$ per-iteration complexity, the same as the Sinkhorn algorithm. In terms of total iteration count, we observe that the SNS algorithm converges orders of magnitude faster across a wide range of practical cases, including optimal transport between empirical distributions and computation of the Wasserstein $W_1$ and $W_2$ distances of discretized densities. The empirical performance is corroborated by a rigorous bound on the approximate sparsity of the Hessian matrix.
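
The abstract outlines a two-stage scheme: run the usual matrix-scaling (Sinkhorn) updates with early stopping, then switch to a Newton-type method on the concave dual (Lyapunov) potential, with the Hessian sparsified so each step stays cheap. Below is a minimal NumPy/SciPy sketch of that structure, not the authors' implementation; the function name sns, the fixed iteration counts, the entry threshold sparsity_tol, the CG solve, and the damping term are all illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, bmat, identity
from scipy.sparse.linalg import cg
from scipy.special import logsumexp

def sns(C, r, c, eta=0.1, n_sinkhorn=50, n_newton=5,
        sparsity_tol=1e-8, damping=1e-10):
    """Sketch of Sinkhorn-Newton-Sparse: a Sinkhorn warm start with
    early stopping, then Newton steps with a thresholded (sparsified)
    Hessian. Returns the entropic-regularized transport plan."""
    n, m = C.shape
    f, g = np.zeros(n), np.zeros(m)

    # Stage 1: log-domain Sinkhorn (pure matrix scaling), stopped early.
    for _ in range(n_sinkhorn):
        f = eta * (np.log(r) - logsumexp((g[None, :] - C) / eta, axis=1))
        g = eta * (np.log(c) - logsumexp((f[:, None] - C) / eta, axis=0))

    # Stage 2: Newton iterations on the concave dual potential
    #   L(f, g) = <f, r> + <g, c> - eta * sum_ij exp((f_i + g_j - C_ij) / eta).
    # Its Hessian is built from the current plan P, which concentrates
    # near the optimum, so thresholding P yields a sparse linear system.
    for _ in range(n_newton):
        P = np.exp((f[:, None] + g[None, :] - C) / eta)
        grad = np.concatenate([r - P.sum(axis=1), c - P.sum(axis=0)])
        P_sp = csr_matrix(np.where(P > sparsity_tol, P, 0.0))
        H = bmat([[diags(P.sum(axis=1)), P_sp],
                  [P_sp.T, diags(P.sum(axis=0))]], format="csr") / eta
        step, _ = cg(H + damping * identity(n + m), grad)
        f += step[:n]  # Newton ascent direction for the concave potential
        g += step[n:]

    return np.exp((f[:, None] + g[None, :] - C) / eta)
```

For a quick check, one can take C as pairwise squared distances between two small point clouds with uniform marginals r and c; after the warm start, a few Newton steps typically shrink the marginal-constraint residual far faster than continued Sinkhorn updates, in line with the accelerated convergence the abstract reports. The small damping term is included only because the dual potential is invariant under the shift (f + t, g - t), which makes the exact Hessian singular along that direction.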
