
A Data-dependent Approach for High Dimensional (Robust) Wasserstein Alignment (2209.02905v2)

Published 7 Sep 2022 in cs.CV and cs.LG

Abstract: Many real-world problems can be formulated as the alignment between two geometric patterns. Previously, a great deal of research has focused on the alignment of 2D or 3D patterns in the field of computer vision. Recently, the alignment problem in high dimensions has found several novel applications in practice. However, research on the algorithmic aspects is still rather limited. To the best of our knowledge, most existing approaches are simple extensions of their counterparts for the 2D and 3D cases, and often suffer from issues such as high computational complexity. In this paper, we propose an effective framework to compress high dimensional geometric patterns. Any existing alignment method can be applied to the compressed geometric patterns, and the time complexity can be significantly reduced. Our idea is inspired by the observation that high dimensional data often has a low intrinsic dimension. Our framework is a "data-dependent" approach whose complexity depends on the intrinsic dimension of the input data. Our experimental results reveal that running the alignment algorithm on compressed patterns achieves quality similar to that obtained on the original patterns, while the runtimes (including the time spent on compression) are substantially lower.
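The compress-then-align idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the compression procedure, so the code below substitutes a greedy farthest-point selection as a stand-in: it picks k representative points and gives each a weight equal to the fraction of input points nearest to it. The function name `compress_pattern` and the parameters `k` and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np

def compress_pattern(points, k, seed=0):
    """Compress a point set to at most k weighted representatives.

    Assumption: greedy farthest-point selection is used here only as an
    illustrative stand-in; it is not claimed to be the paper's algorithm.
    Returns (centers, weights), where weights sum to 1.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    k = min(k, n)
    rng = np.random.default_rng(seed)

    first = int(rng.integers(n))
    chosen = [first]
    # dist[i]: distance from point i to its nearest chosen representative
    dist = np.linalg.norm(points - points[first], axis=1)
    # assign[i]: index (into chosen) of the representative nearest to point i
    assign = np.zeros(n, dtype=int)

    for j in range(1, k):
        nxt = int(np.argmax(dist))  # point farthest from all representatives
        chosen.append(nxt)
        d_new = np.linalg.norm(points - points[nxt], axis=1)
        closer = d_new < dist
        assign[closer] = j
        dist = np.where(closer, d_new, dist)

    centers = points[chosen]
    weights = np.bincount(assign, minlength=k) / n
    return centers, weights

# Synthetic input: 500 points in 50 dimensions lying on a 3-dimensional
# subspace, i.e. high ambient dimension but low intrinsic dimension.
rng = np.random.default_rng(1)
pattern = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50))
centers, weights = compress_pattern(pattern, k=20)
```

Under these assumptions, an existing alignment method would then be run on the weighted set `(centers, weights)` rather than on all 500 original points, which is where the runtime saving described in the abstract would come from.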
