Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein (2402.02239v2)

Published 3 Feb 2024 in cs.LG and stat.ML

Abstract: Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction (DR) methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem. We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets.
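To make the framework concrete, below is a minimal sketch of the distributional-reduction idea, not the authors' implementation: it alternates between (i) a Gromov-Wasserstein coupling between the data's pairwise-cost matrix and that of a few low-dimensional prototypes, computed with the POT library, and (ii) an update of the prototype geometry followed by a classical-MDS re-embedding into the target dimension. The function name `distributional_reduction`, the uniform weights, the alternating scheme, and the MDS step are illustrative choices; the paper optimizes the coupling and the embedding jointly within a single objective.

```python
# Minimal sketch (assumptions noted above), using POT's standard GW solver.
import numpy as np
import ot  # POT: Python Optimal Transport

def distributional_reduction(X, n_prototypes=10, dim=2, n_iter=20, seed=0):
    """Alternating GW heuristic: returns prototype coordinates and the coupling."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    p = np.full(n, 1.0 / n)                        # uniform weights on the data
    q = np.full(n_prototypes, 1.0 / n_prototypes)  # uniform prototype weights
    C1 = ot.dist(X, X)                             # squared Euclidean cost matrix
    Z = rng.standard_normal((n_prototypes, dim))   # random prototype initialisation
    for _ in range(n_iter):
        C2 = ot.dist(Z, Z)
        # (i) Gromov-Wasserstein coupling between the two metric-measure spaces
        T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
        # (ii) closed-form square-loss update of the prototype cost matrix
        # (the GW barycenter update of Peyré et al., 2016), then classical MDS
        # to pull prototype coordinates back into the target dimension
        C2_new = (T.T @ C1 @ T) / np.outer(q, q)
        J = np.eye(n_prototypes) - 1.0 / n_prototypes
        B = -0.5 * J @ C2_new @ J                  # double-centred Gram matrix
        w, V = np.linalg.eigh(B)
        idx = np.argsort(w)[::-1][:dim]
        Z = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
    return Z, T
```

In this sketch the two classical tasks fall out of one object: `Z` plays the role of a low-dimensional embedding of the prototypes (dimensionality reduction), while a hard cluster assignment for data point `i` can be read off the coupling as `T[i].argmax()`.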
