Fast Dual Subgradient Optimization of the Integrated Transportation Distance Between Stochastic Kernels (2312.01432v1)

Published 3 Dec 2023 in cs.LG and math.OC

Abstract: A generalization of the Wasserstein metric, the integrated transportation distance, establishes a novel distance between probability kernels of Markov systems. This metric serves as the foundation for an efficient approximation technique, enabling the replacement of the original system's kernel with one having discrete support of limited cardinality. To facilitate practical implementation, we present a specialized dual algorithm capable of constructing these approximate kernels quickly, without requiring computationally expensive matrix operations. Finally, we demonstrate the efficacy of our method through several illustrative examples, showcasing its utility in practical scenarios. This advancement offers new possibilities for the streamlined analysis and manipulation of stochastic systems represented by kernels.
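
The abstract's key computational claim is that the approximating kernels can be built by a dual method that avoids expensive matrix operations. As a rough, hedged illustration of that general idea, the sketch below applies plain subgradient ascent to the semi-dual of an ordinary discrete optimal transport problem between two finitely supported distributions; the function name, step-size rule, and toy data are assumptions made for illustration and do not reproduce the authors' algorithm for the integrated transportation distance.

```python
# Illustrative sketch only: subgradient ascent on the semi-dual of a small
# discrete optimal transport problem.  This is NOT the paper's algorithm for
# the integrated transportation distance; names and step sizes are assumed.
import numpy as np

def semidual_ot(p, q, C, n_iters=2000, step0=1.0):
    """Approximate the transport cost between discrete distributions
    p (length m) and q (length n) with cost matrix C (m x n) by
    subgradient ascent on the semi-dual potential v (length n)."""
    m, n = C.shape
    v = np.zeros(n)
    best_val = -np.inf
    for k in range(1, n_iters + 1):
        # c-transform: for each source point, pick the cheapest adjusted column
        adjusted = C - v[None, :]            # shape (m, n)
        j_star = np.argmin(adjusted, axis=1)
        u = adjusted[np.arange(m), j_star]   # u_i = min_j (C_ij - v_j)
        # semi-dual objective value (a lower bound on the transport cost)
        val = p @ u + q @ v
        best_val = max(best_val, val)
        # subgradient of the concave semi-dual with respect to v
        mass_to_j = np.bincount(j_star, weights=p, minlength=n)
        g = q - mass_to_j
        # diminishing step size, the standard choice for plain subgradient ascent
        v = v + (step0 / np.sqrt(k)) * g
    return best_val

# toy usage: two uniform distributions on small grids of the unit interval
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 4)
p = np.full(5, 1 / 5)
q = np.full(4, 1 / 4)
C = np.abs(x[:, None] - y[None, :])
print(semidual_ot(p, q, C))
```

Each iteration needs only a row-wise minimum of the adjusted cost and a weighted count, never the full transport plan or any matrix factorization, which is what makes semi-dual subgradient schemes attractive when the transport matrices are too large to form explicitly.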
