A Vectorization Method Induced By Maximal Margin Classification For Persistent Diagrams (2407.21298v1)
Abstract: Persistent homology is an effective method for extracting topological information, represented as persistent diagrams, from spatially structured data, which makes it well suited to the study of protein structures. Attempts to incorporate persistent homology into machine learning methods for protein function prediction have produced several techniques for vectorizing persistent diagrams. However, current vectorization methods are excessively artificial: they guarantee neither that the information in a diagram is used effectively nor that the construction is well founded. To address this problem, we propose a more geometrical vectorization method for persistent diagrams based on maximal margin classification for Banach space, and additionally propose a framework that uses topological data analysis to identify proteins with specific functions. We evaluated our vectorization method on a binary protein classification task and compared it with the statistical methods that exhibit the best performance among thirteen commonly used vectorization methods. The experimental results indicate that our approach surpasses the statistical methods in both robustness and precision.
- D. Ali, A. Asaad, M. J. Jimenez, V. Nanda, E. Paluzo-Hidalgo, and M. Soriano-Trigueros, “A survey of vectorization methods in topological data analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 12, pp. 14069–14080, 2023.
- D. Barnes, L. Polanco, and J. A. Perea, “A Comparative Study of Machine Learning Methods for Persistence Diagrams,” Front. Artif. Intell., vol. 4, pp. 681174, 2021.
- U. Bauer, “Ripser: efficient computation of Vietoris–Rips persistence barcodes,” J. Appl. Comput. Topol., vol. 5, pp. 391–423, 2021.
- G. Carlsson, “Topology and data,” Bull. Amer. Math. Soc., vol. 46, no. 2, pp. 255-308, 2009.
- G. Carlsson, “Topological methods for data modelling,” Nat. Rev. Phys., vol. 2, pp. 697-708, 2020.
- F. Chazal and B. Michel, “An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists,” Front. Artif. Intell., vol. 4, pp. 667963, 2021.
- D. Chen, J. Liu, J. Wu, G.-W. Wei, F. Pan, and S.-T. Yau, “Path Topology in Molecular and Materials Sciences,” J. Phys. Chem. Lett., vol. 14, no. 4, pp. 954-964, 2023.
- I. Chevyrev, V. Nanda, and H. Oberhauser, “Persistence paths and signature features in topological data analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 1, pp. 192–202, 2020.
- Y.-M. Chung and A. Lawson, “Persistence Curves: A canonical framework for summarizing persistence diagrams,” Adv. Comput. Math., vol. 48, no. 6, pp. 1-42, 2022.
- Z.-T. Dong, C.-F. Hu, C. Zhou, and H.-W. Lin, “Vectorization of persistence barcode with applications in pattern classification of porous structures,” Comput. Graph., vol. 90, pp. 182-192, 2020.
- Z.-T. Dong, H.-W. Lin, C. Zhou, B. Zhang, and G.-C. Li, “Persistence B‑spline grids: stable vector representation of persistence diagrams based on data fitting,” Machine Learning, vol. 113, pp. 1373-1420, 2024.
- H. Edelsbrunner, D. Letscher, and A. Zomorodian, “Topological Persistence and Simplification,” Discrete Comput. Geom., vol. 28, pp. 511–533, 2002.
- A. Grover and J. Leskovec, “node2vec: Scalable Feature Learning for Networks,” KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
- M. Hein, O. Bousquet, and B. Schölkopf, “Maximal margin classification for metric spaces,” J. Comput. Syst. Sci., vol. 71, no. 3, pp. 333-359, 2005.
- Y. Hiraoka, T. Nakamura, A. Hirata, E. G. Escolar, K. Matsue, and Y. Nishiura, “Hierarchical structures of amorphous solids characterized by persistent homology,” Proc. Natl. Acad. Sci. USA, vol. 113, no. 26, pp. 7035-7040, 2016.
- S. Hwang, C.-X. Pan, B. Garcia, A. R. Davidson, T. F. Moraes, and K. L. Maxwell, “Structural and Mechanistic Insight into CRISPR-Cas9 Inhibition by Anti-CRISPR Protein AcrIIC4Hpa,” J. Mol. Biol., vol. 434, no. 5, pp. 167420, 2022.
- M. Jinek, K. Chylinski, I. Fonfara, M. Hauer, J. A. Doudna, and E. Charpentier, “A Programmable Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity,” Science, vol. 337, no. 6096, pp. 816-821, 2012.
- P. Joharinad and J. Jost, “Topology and curvature of metric spaces,” Adv. Math., vol. 356, pp. 106813, 2019.
- S. Kališnik, “Tropical coordinates on the space of persistence Barcodes,” Found. Comput. Math., vol. 19, no. 1, pp. 101–129, 2018.
- Y. Kim, S. J. Lee, H. J. Yoon, N.-K. Kim, B.-J. Lee, and J.-Y. Suh, “Anti-CRISPR AcrIIC3 discriminates between Cas9 orthologs via targeting the variable surface of the HNH nuclease domain,” FEBS J., vol. 286, no. 2, pp. 4661-4674, 2019.
- S. Lloyd, S. Garnerone, and P. Zanardi, “Quantum algorithms for topological and geometric analysis of data,” Nat. Commun., vol. 7, pp. 10138, 2016.
- J. A. Perea and G. Carlsson, “A Klein bottle based dictionary for texture representation,” Int. J. Comput. Vis., vol. 107, no. 1, pp. 75-97, 2014.
- J. A. Perea, “Topological time series analysis,” Notices Amer. Math. Soc., vol. 66, no. 5, pp. 686-694, 2019.
- G. Schuler, C. Hu, and A. Ke, “Structural basis for RNA-guided DNA cleavage by IscB-ωRNA and mechanistic comparison with Cas9,” Science, vol. 376, pp. 1476-1481, 2022.
- C. Tralie, N. Saul, and R. Bar-On, “Ripser.py: A Lean Persistent Homology Library for Python,” J. Open Source Softw., vol. 3, pp. 925, 2018.
- B. Zieliński, M. Lipiński, M. Juda, M. Zeppelzauer, and P. Dłotko, “Persistence codebooks for topological data analysis,” Artif. Intell. Rev., vol. 54, no. 3, pp. 1969–2009, 2020.