
$G$-Mapper: Learning a Cover in the Mapper Construction (2309.06634v3)

Published 12 Sep 2023 in cs.LG, math.AT, and stat.ML

Abstract: The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on $G$-means clustering, which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to carefully choose the cover according to the distribution of the given data. Experiments on synthetic and real-world datasets demonstrate that our algorithm generates covers for which the Mapper graphs retain the essence of the datasets, while also running quickly.
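The splitting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on one-dimensional lens values only, it swaps in a simple 1D 2-means split where the paper uses a Gaussian mixture model, and the names `anderson_darling_normal`, `two_means_1d`, `learn_cover`, and the parameters `critical`, `overlap`, and `min_points` are hypothetical. The threshold 0.752 is the commonly tabulated 5% critical value for the Anderson-Darling normality test with estimated mean and variance (with Stephens' small-sample correction), assumed here for illustration.

```python
import math
import random
import statistics

_STD_NORMAL = statistics.NormalDist()


def anderson_darling_normal(values):
    """Corrected Anderson-Darling statistic for normality (mean and variance estimated)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    if std == 0.0:
        return 0.0  # degenerate sample: treated as "do not split" (assumption)
    cdf = []
    for v in sorted(values):
        p = _STD_NORMAL.cdf((v - mean) / std)
        cdf.append(min(max(p, 1e-12), 1.0 - 1e-12))  # keep the logarithms finite
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)  # Stephens' small-sample correction


def two_means_1d(values, iters=50):
    """Stand-in for the paper's GMM step: plain 2-means on the real line."""
    c0, c1 = min(values), max(values)
    for _ in range(iters):
        left = [v for v in values if abs(v - c0) <= abs(v - c1)]
        right = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not left or not right:
            break
        n0, n1 = statistics.fmean(left), statistics.fmean(right)
        if (n0, n1) == (c0, c1):
            break
        c0, c1 = n0, n1
    return c0, c1


def learn_cover(values, critical=0.752, overlap=0.1, min_points=10):
    """Split intervals until the points in each one pass the normality test."""
    pending = [(min(values), max(values))]
    accepted = []
    while pending:
        low, high = pending.pop()
        inside = [v for v in values if low <= v <= high]
        if len(inside) < min_points or anderson_darling_normal(inside) <= critical:
            accepted.append((low, high))
            continue
        c0, c1 = two_means_1d(inside)
        if c0 == c1:  # the split failed, so keep the interval as it is
            accepted.append((low, high))
            continue
        cut = 0.5 * (c0 + c1)
        pad = 0.5 * overlap * (c1 - c0)  # overlap between the two children
        pending.append((low, cut + pad))
        pending.append((cut - pad, high))
    return sorted(accepted)


# Usage: two well-separated groups of lens values should not share one interval.
rng = random.Random(0)
lens = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(10, 1) for _ in range(200)]
for interval in learn_cover(lens):
    print(interval)
```

Because the overlap is tied to the distance between the two cluster centers, each child interval contains strictly fewer points than its parent, so the loop terminates. The union of the returned intervals always equals the original range, so every point stays covered.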

Authors (7)
  1. Enrique Alvarado
  2. Robin Belton
  3. Emily Fischer
  4. Kang-Ju Lee
  5. Sourabh Palande
  6. Sarah Percival
  7. Emilie Purvine
Citations (2)
