GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning (2304.04970v2)
Abstract: $1$-parameter persistent homology, a cornerstone of Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (GRIL) for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to the underlying filtration functions, and that it can be easily integrated into machine learning models to augment their encoding of topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules. Further, we augment GNNs with GRIL features and observe an increase in performance, indicating that GRIL can capture additional features that enrich GNNs. We make the complete code for the proposed method available at https://github.com/soham0209/mpml-graph.
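To make the notion of a bi-filtration concrete, the following is a minimal sketch, not the GRIL algorithm and not code from the authors' repository. It assumes a graph with two hypothetical vertex functions `f1` and `f2`, takes the sublevel subgraph at each grid point $(a, b)$ (vertices with `f1 <= a` and `f2 <= b`, plus the edges between them), and counts its connected components. The resulting table is only the pointwise rank in degree $0$, which is a much coarser summary than the generalized rank invariant that GRIL is built on. The function names `count_components` and `betti0_grid` are illustrative.

```python
from itertools import product


def count_components(vertices, edges):
    """Count connected components of the graph (vertices, edges) with union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


def betti0_grid(f1, f2, edges, grid1, grid2):
    """Degree-0 rank of the sublevel subgraph at every grid point (a, b).

    A vertex v is present at (a, b) when f1[v] <= a and f2[v] <= b;
    an edge is present once both of its endpoints are present.
    """
    table = {}
    for a, b in product(grid1, grid2):
        alive = {v for v in f1 if f1[v] <= a and f2[v] <= b}
        kept = [(u, v) for u, v in edges if u in alive and v in alive]
        table[(a, b)] = count_components(alive, kept)
    return table


# Toy path graph 0 - 1 - 2 with two hypothetical vertex functions.
f1 = {0: 0, 1: 1, 2: 0}
f2 = {0: 0, 1: 0, 2: 1}
edges = [(0, 1), (1, 2)]
table = betti0_grid(f1, f2, edges, grid1=[0, 1], grid2=[0, 1])
```

In this toy example the two components visible at $(0, 1)$ merge only at $(1, 1)$, once vertex `1` enters. Seeing this requires varying both parameters jointly, which is the kind of information a $1$-parameter filtration along either function alone would not expose in the same way.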