Entropic Gromov-Wasserstein Distances: Stability and Algorithms (2306.00182v4)
Abstract: The Gromov-Wasserstein (GW) distance quantifies the discrepancy between metric measure spaces and provides a natural framework for aligning heterogeneous datasets. Alas, as exact computation of the GW alignment is NP-hard, entropic regularization provides an avenue towards a computationally tractable proxy. Leveraging a recently derived variational representation for the quadratic entropic GW (EGW) distance, this work derives the first efficient algorithms for solving the EGW problem subject to formal, non-asymptotic convergence guarantees. To that end, we derive smoothness and convexity properties of the objective in this variational problem, which enable its resolution by the accelerated gradient method. Our algorithms employ Sinkhorn's fixed-point iterations to compute an approximate gradient, which we model as an inexact oracle. We furnish convergence rates towards local and even global solutions (the latter holds under a precise quantitative condition on the regularization parameter), characterize the effects of gradient inexactness, and prove that stationary points of the EGW problem converge towards a stationary point of the unregularized GW problem in the limit of vanishing regularization. We provide numerical experiments that validate our theory and demonstrate the state-of-the-art empirical performance of our algorithms.
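The pipeline described in the abstract can be sketched as follows: an accelerated first-order method on the variational objective, with Sinkhorn's fixed-point iterations producing an approximate gradient. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation. The objective is assumed to be Phi(A) = 32||A||_F^2 + OT_{A,eps}(mu, nu) over centered discrete measures, with cost c_A(x, y) = -4|x|^2|y|^2 - 32 x^T A y and feasible box [-M/2, M/2]^{dx x dy}, M = sqrt(M2(mu) M2(nu)). These constants are recalled from the variational representation the abstract cites and should be checked against the paper; the constant part of the EGW distance that does not depend on A is not computed. Given that objective, the envelope (Danskin) argument yields grad Phi(A) = 64 A - 32 sum_ij pi_ij x_i y_j^T, where pi is the entropic coupling for cost c_A. Truncating Sinkhorn after finitely many steps is what makes this gradient inexact. The function names, the step size, and the FISTA-style projected momentum scheme are hypothetical choices for illustration.

```python
import numpy as np


def _logsumexp(M, axis):
    # Numerically stable log(sum(exp(M))) along the given axis.
    m = np.max(M, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(M - m), axis=axis))


def sinkhorn_coupling(C, a, b, eps, n_iter=500):
    # Log-domain Sinkhorn fixed-point iterations for entropic OT with
    # reference measure a (x) b. Stopping after n_iter steps gives an
    # approximate coupling, hence an inexact gradient downstream.
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros_like(a), np.zeros_like(b)
    for _ in range(n_iter):
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    log_pi = (f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :]
    return np.exp(log_pi)


def egw_grad(A, X, Y, a, b, eps, n_iter=500):
    # ASSUMED cost c_A(x, y) = -4|x|^2|y|^2 - 32 x^T A y (X, Y centered).
    sx, sy = np.sum(X**2, axis=1), np.sum(Y**2, axis=1)
    C = -4.0 * np.outer(sx, sy) - 32.0 * (X @ A @ Y.T)
    pi = sinkhorn_coupling(C, a, b, eps, n_iter)
    # Envelope argument: grad = 64 A - 32 * sum_ij pi_ij x_i y_j^T.
    return 64.0 * A - 32.0 * (X.T @ pi @ Y), pi


def accelerated_egw(X, Y, a, b, eps, step, n_outer=100, n_sinkhorn=500):
    # Center both measures, as the assumed variational form requires.
    Xc, Yc = X - a @ X, Y - b @ Y
    # ASSUMED box constraint [-M/2, M/2]^{dx x dy}, M = sqrt(M2(mu) M2(nu)).
    M = np.sqrt((a @ np.sum(Xc**2, axis=1)) * (b @ np.sum(Yc**2, axis=1)))
    A = A_prev = np.zeros((X.shape[1], Y.shape[1]))
    for k in range(1, n_outer + 1):
        B = A + (k - 1) / (k + 2) * (A - A_prev)            # momentum extrapolation
        G, _ = egw_grad(B, Xc, Yc, a, b, eps, n_sinkhorn)   # inexact gradient oracle
        A_prev, A = A, np.clip(B - step * G, -M / 2, M / 2)  # projected gradient step
    _, pi = egw_grad(A, Xc, Yc, a, b, eps, n_sinkhorn)
    return A, pi


# Hypothetical usage on small synthetic point clouds in R^2 and R^3.
rng = np.random.default_rng(0)
X, Y = 0.3 * rng.normal(size=(30, 2)), 0.3 * rng.normal(size=(25, 3))
a, b = np.full(30, 1 / 30), np.full(25, 1 / 25)
A, pi = accelerated_egw(X, Y, a, b, eps=1.0, step=1 / 64, n_outer=50)
```

The step size 1/64 is chosen only because, under the assumed objective, the quadratic term contributes curvature 64 while the entropic OT term is concave in A; the returned `pi` is the approximate EGW alignment between the two point clouds.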