Topological Information Retrieval with Dilation-Invariant Bottleneck Comparative Measures (2104.01672v3)
Abstract: Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously characterize a database in terms of both its hierarchy and connectivity structure. Computing persistent homology on a variety of embedded datasets reveals that some commonly used embeddings fail to preserve the connectivity. We show that those embeddings which successfully retain the database topology coincide in persistent homology by introducing two dilation-invariant comparative measures to capture this effect: in particular, they address the issue of metric distortion on manifolds. We provide an algorithm for their computation that exhibits greatly reduced time complexity over existing methods. We use these measures to perform the first instance of topology-based information retrieval and demonstrate its increased performance over the standard bottleneck distance for persistent homology. We showcase our approach on databases of different data varieties including text, videos, and medical images.
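The abstract names a dilation-invariant comparison of persistence diagrams without stating its definition. As a rough illustration only, the sketch below assumes the dissimilarity is the smallest bottleneck distance between one diagram and a positively rescaled copy of the other, and approximates it by a plain grid search over the scale factor. The function names `bottleneck` and `dilation_invariant_dissimilarity`, the search range, and the grid search itself are hypothetical choices for this example; they are not the algorithm the paper proposes, and the exact matching routine is only practical for very small diagrams.

```python
import math


def bottleneck(A, B):
    """Bottleneck distance between two small persistence diagrams.

    A and B are lists of (birth, death) pairs with death >= birth.
    """
    n, m = len(A), len(B)
    size = n + m
    if size == 0:
        return 0.0

    # Rows: points of A, then m diagonal slots.
    # Columns: points of B, then n diagonal slots.
    def cost(i, j):
        if i < n and j < m:  # point to point, L-infinity norm
            return max(abs(A[i][0] - B[j][0]), abs(A[i][1] - B[j][1]))
        if i < n:            # point of A sent to the diagonal
            return (A[i][1] - A[i][0]) / 2.0
        if j < m:            # point of B sent to the diagonal
            return (B[j][1] - B[j][0]) / 2.0
        return 0.0           # diagonal to diagonal

    C = [[cost(i, j) for j in range(size)] for i in range(size)]

    def has_perfect_matching(t):
        # Augmenting-path bipartite matching using only edges of cost <= t.
        match_col = [-1] * size

        def augment(i, seen):
            for j in range(size):
                if C[i][j] <= t and not seen[j]:
                    seen[j] = True
                    if match_col[j] == -1 or augment(match_col[j], seen):
                        match_col[j] = i
                        return True
            return False

        return all(augment(i, [False] * size) for i in range(size))

    # The optimum is one of the entries of C: binary search over them.
    levels = sorted({c for row in C for c in row})
    lo, hi = 0, len(levels) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_perfect_matching(levels[mid]):
            hi = mid
        else:
            lo = mid + 1
    return levels[lo]


def dilation_invariant_dissimilarity(A, B, c_min=0.1, c_max=10.0, steps=401):
    """Approximate min over c in [c_min, c_max] of bottleneck(A, c * B).

    Assumed definition and a naive log-spaced grid search, for illustration.
    """
    best = math.inf
    for k in range(steps):
        c = c_min * (c_max / c_min) ** (k / (steps - 1))
        scaled = [(c * b, c * d) for b, d in B]
        best = min(best, bottleneck(A, scaled))
    return best


# B is A dilated by a factor of 3, as a change of metric scale might produce.
A = [(0.0, 1.0), (0.0, 2.0)]
B = [(0.0, 3.0), (0.0, 6.0)]
print(bottleneck(A, B))                        # 3.0
print(dilation_invariant_dissimilarity(A, B))  # small, limited by grid resolution
```

In this toy case the standard bottleneck distance reports a large gap between two diagrams that differ only by scale, while the rescaled comparison is close to zero, which is the kind of metric distortion effect the abstract describes.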