
$k$-Means Clustering for Persistent Homology (2210.10003v4)

Published 18 Oct 2022 in stat.AP, math.OC, and stat.ML

Abstract: Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features within a dataset as a persistence diagram; it has recently gained much popularity from its many successful applications across domains. However, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we prove convergence of the $k$-means clustering algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush--Kuhn--Tucker framework. Additionally, we perform numerical experiments on various representations of persistent homology, including embeddings of persistence diagrams as well as the diagrams themselves and their generalizations as persistence measures; we find that $k$-means clustering performed directly on persistence diagrams and measures outperforms clustering on their vectorized representations.
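
To make the idea of clustering directly on persistence diagrams concrete, the following is a minimal sketch of the assignment step of $k$-means in diagram space. It is not the paper's implementation. It assumes a 2-Wasserstein distance with a Euclidean ground metric, in which unmatched points are sent to the diagonal, and it finds the optimal matching by brute force, so it is only practical for very small diagrams. The function names (`wasserstein2`, `assign`) are hypothetical, and the centroid update (a Fréchet mean of diagrams) is omitted.

```python
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two (birth, death) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def diag_cost(p):
    """Squared Euclidean distance from a (birth, death) point to the diagonal."""
    birth, death = p
    return (death - birth) ** 2 / 2.0


def wasserstein2(d1, d2):
    """2-Wasserstein distance between two small persistence diagrams.

    Each diagram is a list of (birth, death) tuples. Both diagrams are
    padded with diagonal slots so every point may be matched either to a
    point of the other diagram or to the diagonal. The optimal matching is
    found by brute force (assumption: diagrams are tiny).
    """
    n, m = len(d1), len(d2)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        if i < n and j < m:
            return sq_dist(d1[i], d2[j])  # point matched to point
        if i < n:
            return diag_cost(d1[i])       # d1 point matched to the diagonal
        if j < m:
            return diag_cost(d2[j])       # d2 point matched to the diagonal
        return 0.0                        # diagonal matched to diagonal

    best = min(
        sum(cost(i, j) for i, j in enumerate(perm))
        for perm in permutations(range(size))
    )
    return best ** 0.5


def assign(diagrams, centroids):
    """Assignment step: index of the nearest centroid diagram for each diagram."""
    return [
        min(range(len(centroids)), key=lambda c: wasserstein2(d, centroids[c]))
        for d in diagrams
    ]


if __name__ == "__main__":
    diagrams = [[(0.0, 1.0)], [(0.0, 1.1)], [(0.0, 5.0)], [(0.0, 5.2)]]
    centroids = [[(0.0, 1.0)], [(0.0, 5.0)]]
    print(assign(diagrams, centroids))
```

In practice the brute-force matching would be replaced by a dedicated assignment or optimal transport solver, and the update step would recompute each centroid as a Fréchet mean of the diagrams assigned to it.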
