
A distribution-guided Mapper algorithm (2401.12237v2)

Published 19 Jan 2024 in math.AT, cs.LG, and q-bio.QM

Abstract: Motivation: The Mapper algorithm is an essential tool for exploring the shape of data in topological data analysis. Given a dataset as input, the Mapper algorithm outputs a graph representing the topological features of the whole dataset. This graph is often regarded as an approximation of a Reeb graph of the data. The classic Mapper algorithm uses fixed interval lengths and overlapping ratios, which might fail to reveal subtle features of the data, especially when the underlying structure is complex. Results: In this work, we introduce a distribution-guided Mapper algorithm named D-Mapper, which utilizes the properties of a probability model and the intrinsic characteristics of the data to generate density-guided covers and provide enhanced topological features. Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-probabilistic ones. Moreover, we introduce a metric that accounts for both the quality of overlap clustering and extended persistent homology to measure the performance of Mapper-type algorithms. Our numerical experiments indicate that D-Mapper outperforms the classical Mapper algorithm in various scenarios. We also apply D-Mapper to a SARS-CoV-2 coronavirus RNA sequence dataset to explore the topological structure of different virus variants. The results indicate that the D-Mapper algorithm can reveal both vertical and horizontal evolution processes of the viruses. Availability: Our package is available at https://github.com/ShufeiGe/D-Mapper.
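
To make the "fixed interval lengths and overlapping ratios" of the classic Mapper concrete, here is a minimal sketch of a uniform overlapping cover of a one-dimensional filter range and of the pull-back of data points into that cover. The function names `uniform_cover` and `pull_back` are hypothetical and chosen only for illustration. This is not the D-Mapper package's API, and it does not implement the distribution-guided cover described in the abstract.

```python
def uniform_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals (illustrative helper).

    `overlap` is the fraction of each interval shared with its
    right-hand neighbour, with 0 <= overlap < 1.
    """
    if n_intervals < 1 or not 0 <= overlap < 1:
        raise ValueError("need n_intervals >= 1 and 0 <= overlap < 1")
    # n intervals of length L, shifted by L * (1 - overlap) each time,
    # must span the range: L + (n - 1) * L * (1 - overlap) = hi - lo.
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def pull_back(filter_values, cover):
    """For each interval, list the indices of points whose filter value lies in it."""
    return [
        [i for i, v in enumerate(filter_values) if a <= v <= b]
        for a, b in cover
    ]


cover = uniform_cover(0.0, 1.0, n_intervals=3, overlap=0.5)
bins = pull_back([0.1, 0.3, 0.6, 0.9], cover)
```

In a full Mapper pipeline, each bin would then be clustered, and two clusters would be joined by an edge whenever they share a data point. Every interval here has the same length and overlap regardless of where the data are dense, which is the limitation the abstract attributes to the classic algorithm.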

Authors (2)
  1. Yuyang Tao (2 papers)
  2. Shufei Ge (7 papers)
Citations (1)
