Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures (2306.03801v2)

Published 6 Jun 2023 in cs.LG, cs.CG, math.AT, and stat.ML

Abstract: Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarceness of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
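
As a rough illustration of the idea of treating a signed barcode as a signed measure and turning it into a feature vector, the sketch below smooths a finite signed point measure with a Gaussian kernel and samples the result on a fixed grid. The abstract does not say which vectorizations the paper uses, so the Gaussian kernel, the `bandwidth` parameter, the function name `vectorize_signed_measure`, and the toy point masses are all assumptions made for illustration, not the authors' implementation.

```python
import math

def vectorize_signed_measure(points, weights, grid, bandwidth=1.0):
    """Sample the Gaussian convolution of a signed point measure on a grid.

    points  : list of (x, y) locations of the point masses
    weights : signed multiplicities (e.g. +1 or -1), one per point
    grid    : list of (x, y) locations where the smoothed measure is sampled
    Returns one real number per grid location (the feature vector).
    """
    features = []
    for gx, gy in grid:
        value = 0.0
        for (px, py), w in zip(points, weights):
            sq_dist = (gx - px) ** 2 + (gy - py) ** 2
            # Negative masses subtract from the smoothed density.
            value += w * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        features.append(value)
    return features

# Hypothetical signed measure in the plane: two positive masses, one negative.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
weights = [1, 1, -1]

# 3 x 3 sampling grid over [0, 1] x [0, 1] (an arbitrary choice).
grid = [(0.5 * i, 0.5 * j) for i in range(3) for j in range(3)]

features = vectorize_signed_measure(points, weights, grid)
```

Because the map is linear in the weights, a positive and a negative mass at the same location cancel exactly. A fixed grid gives every input a vector of the same length, which is what downstream classifiers expect.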
