A Class of Topological Pseudodistances for Fast Comparison of Persistence Diagrams (2402.14489v1)
Abstract: Persistence diagrams (PDs) play a central role in topological data analysis and are used in an ever-increasing variety of applications. Comparing PD data requires computing distances among large sets of PDs, using metrics that are accurate, theoretically sound, and fast to compute. Such metrics are lacking, especially for denser, multi-dimensional PDs. On the one hand, Wasserstein-type distances offer high accuracy and theoretical guarantees but incur a high computational cost. On the other hand, distances between vectorizations such as Persistence Statistics (PSs) are cheaper to compute but lack accuracy guarantees and, in general, are not guaranteed to distinguish PDs (i.e., two different PDs may have equal PS vectors). In this work we introduce a class of pseudodistances called Extended Topological Pseudodistances (ETDs), which have tunable complexity: at the high-complexity extreme they can approximate the Sliced and classical Wasserstein distances, while at the low-complexity extreme they are computationally lighter and close to Persistence Statistics, so users can interpolate between the two. We provide theoretical comparisons showing how the new distances fit at an intermediate level between persistence vectorizations and Wasserstein distances. We also verify experimentally that ETDs outperform PSs in accuracy and outperform Wasserstein and Sliced Wasserstein distances in computational complexity.
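The abstract's remark that two different PDs may share the same PS vector can be illustrated with a small sketch. The function `toy_persistence_statistics` below is a hypothetical stand-in, not the paper's definition of PS: it assumes the vector consists of summary statistics of births, deaths, midpoints, and lifetimes, plus the point count and the lifetime entropy. The two diagrams and the projection direction `(1, 2)` are likewise chosen only for illustration.

```python
import numpy as np

def toy_persistence_statistics(pd_points):
    """Hypothetical PS-style vector: summary statistics of four 1-D marginals."""
    pts = np.asarray(pd_points, dtype=float)
    births, deaths = pts[:, 0], pts[:, 1]
    lifetimes = deaths - births
    midpoints = (births + deaths) / 2.0
    feats = []
    for values in (births, deaths, midpoints, lifetimes):
        q25, q50, q75 = np.percentile(values, [25, 50, 75])
        # mean, standard deviation, median, interquartile range, full range
        feats += [values.mean(), values.std(), q50, q75 - q25,
                  values.max() - values.min()]
    feats.append(float(len(pts)))                 # number of points
    p = lifetimes / lifetimes.sum()
    feats.append(float(-(p * np.log(p)).sum()))   # entropy of lifetimes
    return np.array(feats)

def sorted_projection(pd_points, direction):
    """Project the points onto a direction and sort the resulting values."""
    pts = np.asarray(pd_points, dtype=float)
    return np.sort(pts @ np.asarray(direction, dtype=float))

# Two different diagrams (birth, death) whose births, deaths, midpoints and
# lifetimes agree as multisets, so every statistic above coincides.
pd_a = [(1, 4), (3, 5), (2, 7), (0, 6)]
pd_b = [(2, 4), (3, 6), (1, 7), (0, 5)]

ps_a = toy_persistence_statistics(pd_a)
ps_b = toy_persistence_statistics(pd_b)
print("PS vectors equal:", np.allclose(ps_a, ps_b))

# A projection onto a direction outside those four marginals separates them.
proj_a = sorted_projection(pd_a, (1, 2))
proj_b = sorted_projection(pd_b, (1, 2))
print("projections equal:", np.allclose(proj_a, proj_b))
```

Under these assumptions the toy statistics cannot tell the two diagrams apart, while a single extra one-dimensional projection can. This is only loosely suggestive of why slicing-based comparisons carry more information than marginal statistics; it is not the ETD construction itself.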