$k$-Means Clustering for Persistent Homology (2210.10003v4)
Abstract: Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features within a dataset as a persistence diagram; it has recently gained much popularity through successful applications in many domains. However, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we prove convergence of the $k$-means clustering algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush--Kuhn--Tucker framework. Additionally, we perform numerical experiments on various representations of persistent homology, including embeddings of persistence diagrams as well as diagrams themselves and their generalizations as persistence measures; we find that $k$-means clustering performed directly on persistence diagrams and measures outperforms clustering on their vectorized representations.
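To make the setting concrete, the following is a minimal sketch of the assignment step of $k$-means when the data points are persistence diagrams, and not the paper's implementation. It assumes diagrams are lists of (birth, death) pairs, uses the 2-Wasserstein distance with a Euclidean ground metric (an assumption; other ground norms are common), and finds the optimal matching by brute force, which is only feasible for very small diagrams. The function names `wasserstein2` and `assign` are hypothetical. The centroid update, which requires a Fréchet mean of diagrams, is deliberately omitted.

```python
from itertools import permutations
from math import sqrt


def _cost(p, q):
    """Squared Euclidean cost between two points; None stands for the diagonal."""
    if p is None and q is None:
        return 0.0  # diagonal matched to diagonal costs nothing
    if p is None:
        p, q = q, p
    if q is None:
        # squared distance from (birth, death) to its projection on the diagonal
        return (p[1] - p[0]) ** 2 / 2.0
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def wasserstein2(d1, d2):
    """2-Wasserstein distance between two small persistence diagrams.

    Each diagram is padded with diagonal slots so that every point may be
    matched either to a point of the other diagram or to the diagonal.
    """
    a = list(d1) + [None] * len(d2)
    b = list(d2) + [None] * len(d1)
    if not a:
        return 0.0
    best = min(
        sum(_cost(a[i], b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(b)))
    )
    return sqrt(best)


def assign(diagrams, centroids):
    """Assignment step: index of the nearest centroid diagram for each diagram."""
    return [
        min(range(len(centroids)), key=lambda k: wasserstein2(d, centroids[k]))
        for d in diagrams
    ]


if __name__ == "__main__":
    diagrams = [[(0, 1)], [(0, 5)], [(0, 1.2)], [(0, 4.8)]]
    centroids = [[(0, 1)], [(0, 5)]]
    print(assign(diagrams, centroids))
```

Matching to the diagonal is what distinguishes this metric from a plain point-set distance: two short-lived features that are far apart, such as `(0, 1)` and `(10, 11)`, are cheaper to send to the diagonal than to match with each other. For diagrams of realistic size, the brute-force search would be replaced by a linear assignment or optimal transport solver.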