Analysing Multiscale Clusterings with Persistent Homology (2305.04281v5)
Abstract: In data clustering, it is often desirable to find not just a single partition into clusters but a sequence of partitions that describes the data at different scales (or levels of coarseness). A natural problem then is to analyse and compare the (not necessarily hierarchical) sequences of partitions that underpin such multiscale descriptions. Here, we use tools from topological data analysis and introduce the Multiscale Clustering Filtration (MCF), a well-defined and stable filtration of abstract simplicial complexes that encodes arbitrary cluster assignments in a sequence of partitions across scales of increasing coarseness. We show that the zero-dimensional persistent homology of the MCF measures the degree of hierarchy of this sequence, and the higher-dimensional persistent homology tracks the emergence and resolution of conflicts between cluster assignments across the sequence of partitions. To broaden the theoretical foundations of the MCF, we provide an equivalent construction via a nerve complex filtration, and we show that, in the hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an ultrametric space. Using synthetic data, we then illustrate how the persistence diagram of the MCF provides a feature map that can serve to characterise and classify multiscale clusterings.
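The construction described in the abstract can be sketched in a few lines of Python. The abstract does not spell out the definition, so this sketch rests on an assumption: the complex at scale t contains, for every cluster of every partition up to scale t, the full simplex on that cluster's points. The function names `mcf_filtration` and `zero_dim_persistence` are hypothetical and introduced only for illustration; the zero-dimensional persistence is computed with a standard union-find pass over the edges rather than with any particular library.

```python
from itertools import combinations


def mcf_filtration(partitions, scales, max_dim=2):
    """Map each simplex (sorted tuple of point indices) to the first scale
    at which it appears.

    Assumption: every cluster contributes the full simplex on its points,
    and the complex at scale t is the union over all scales <= t.
    `partitions[i][p]` is the cluster label of point p at `scales[i]`;
    `scales` must be increasing.
    """
    filtration = {}
    for t, labels in zip(scales, partitions):
        clusters = {}
        for point, label in enumerate(labels):
            clusters.setdefault(label, []).append(point)
        for members in clusters.values():
            # Add all faces of the cluster's simplex up to dimension max_dim.
            for k in range(1, max_dim + 2):
                for simplex in combinations(members, k):
                    filtration.setdefault(simplex, t)  # keep earliest scale
    return filtration


def zero_dim_persistence(filtration):
    """Return sorted (birth, death) pairs of connected components."""
    parent, birth = {}, {}
    for simplex, t in filtration.items():
        if len(simplex) == 1:
            parent[simplex[0]] = simplex[0]
            birth[simplex[0]] = t

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((t, s) for s, t in filtration.items() if len(s) == 2)
    bars = []
    for t, (u, v) in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            if birth[ru] > birth[rv]:
                ru, rv = rv, ru  # elder rule: the younger component dies
            parent[rv] = ru
            bars.append((birth[rv], t))
    # Components that never merge persist forever.
    bars.extend((birth[r], float("inf")) for r in {find(v) for v in parent})
    return sorted(bars)


# Hierarchical toy sequence on four points: singletons, two pairs, one cluster.
partitions = [[0, 1, 2, 3], [0, 0, 1, 1], [0, 0, 0, 0]]
filtration = mcf_filtration(partitions, scales=[1, 2, 3])
print(zero_dim_persistence(filtration))
# [(1, 2), (1, 2), (1, 3), (1, inf)]
```

In this toy sequence every merge of components coincides with a coarsening of the partition, as one would expect in the hierarchical case. For a non-hierarchical sequence such as `[[0, 0, 1], [0, 1, 1], [1, 0, 1]]`, the three edges appear at different scales while the triangle never does, which leaves an unfilled cycle of the kind that higher-dimensional persistent homology would record as a conflict between cluster assignments.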