A Distributed Approach for Persistent Homology Computation on a Large Scale (2404.08245v1)
Abstract: Persistent homology (PH) is a powerful mathematical method for automatically extracting relevant insights from images, such as those produced by high-resolution imaging devices like electron microscopes or new-generation telescopes. However, applying this method comes at a very high computational cost, which is bound to grow even further as new imaging devices generate ever-growing amounts of data. In this paper we present PixHomology, a novel algorithm for efficiently computing $0$-dimensional PH on 2D images that optimizes both memory usage and processing time. By leveraging the Apache Spark framework, we also present a distributed version of our algorithm, with several optimized variants, that can concurrently process large batches of astronomical images. Finally, we present the results of an experimental analysis showing that our algorithm and its distributed version are efficient in terms of required memory, execution time, and scalability, and that they consistently outperform existing state-of-the-art PH computation tools when used to process large datasets.
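The abstract does not describe how PixHomology works internally. For readers unfamiliar with $0$-dimensional PH on images, here is a minimal sketch of the standard union-find approach, not the authors' algorithm. It assumes a superlevel-set filtration (pixels enter in decreasing intensity, so bright sources show up as long-lived components), 4- or 8-connectivity between pixels, and the elder rule for merges. The function name `persistence_0d` and its `connectivity` parameter are hypothetical.

```python
import numpy as np


def _find(parent, i):
    # Union-find lookup with path halving; returns the root of i's component.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def persistence_0d(image, connectivity=4):
    """Hypothetical sketch: 0-dim persistence pairs (birth, death) of the
    superlevel-set filtration of a 2D image, computed with union-find."""
    img = np.asarray(image, dtype=float)  # float avoids overflow when negating uints
    if img.ndim != 2:
        raise ValueError("expected a 2D image")
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    h, w = img.shape
    flat = img.ravel()
    parent = [-1] * flat.size  # -1 marks a pixel not yet in the filtration
    birth = {}                 # root index -> intensity at which it was born
    pairs = []

    # Decreasing intensity; the stable sort keeps index order among ties.
    for idx in np.argsort(-flat, kind="stable"):
        idx = int(idx)
        v = float(flat[idx])
        parent[idx] = idx
        birth[idx] = v
        y, x = divmod(idx, w)
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            nidx = ny * w + nx
            if parent[nidx] == -1:
                continue  # neighbour has not entered the filtration yet
            ra, rb = _find(parent, idx), _find(parent, nidx)
            if ra == rb:
                continue
            # Elder rule: keep the component with the higher peak (earlier
            # entry on ties); the younger one dies at the current value v.
            if birth[ra] < birth[rb] or (birth[ra] == birth[rb] and ra > rb):
                ra, rb = rb, ra
            if birth[rb] > v:  # drop zero-persistence pairs
                pairs.append((birth[rb], v))
            parent[rb] = ra
            del birth[rb]

    # Surviving components never die in a superlevel filtration.
    for b in birth.values():
        pairs.append((b, float("-inf")))
    return sorted(pairs, key=lambda p: p[0] - p[1], reverse=True)


print(persistence_0d([[0, 1, 0], [0, 0, 0], [0, 3, 0]]))
```

Each pair is (birth, death) with birth ≥ death; the brightest component never dies and is reported with death `-inf`. To process a batch of images concurrently, one could in principle map this function over a PySpark RDD, e.g. `sc.parallelize(images).map(persistence_0d).collect()`; the abstract does not say whether the paper's distributed variants are structured this way.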