CTAGE: Curvature-Based Topology-Aware Graph Embedding for Learning Molecular Representations (2307.13275v2)
Abstract: AI-driven drug design relies heavily on molecular property prediction, which remains a difficult task. Most current approaches train deep neural networks on feature representations derived from SMILES strings or molecular graphs. These representations are concise and efficient, but they are limited in how much complex spatial information they capture. Researchers have recently recognized the importance of incorporating three-dimensional structural information into models. However, capturing spatial information requires introducing additional units in the generator, which adds design and computational cost. A method is therefore needed that predicts molecular properties by effectively incorporating spatial structural information while retaining the simplicity and efficiency of graph neural networks. In this work, we propose CTAGE, an embedding approach that uses $k$-hop discrete Ricci curvature to extract structural insights from molecular graph data. This integrates spatial structural information without increasing the training complexity of the network. Experimental results indicate that introducing node curvature significantly improves the performance of current graph neural network frameworks, validating that $k$-hop node curvature effectively reflects the relationship between molecular structure and function.
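To make the idea of a $k$-hop node curvature concrete, the following is a minimal sketch and not the paper's definition. It assumes the Forman curvature of an edge in an unweighted graph, $F(u, v) = 4 - \deg(u) - \deg(v)$, and a hypothetical aggregation that averages the edge curvatures inside the $k$-hop neighborhood of a node. The function names and the toluene-like example graph are illustrative assumptions.

```python
from collections import deque


def forman_edge_curvature(adj, u, v):
    """Forman curvature of edge (u, v) in an unweighted graph: 4 - deg(u) - deg(v)."""
    return 4 - len(adj[u]) - len(adj[v])


def k_hop_nodes(adj, source, k):
    """Return the set of nodes within k hops of `source` (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def k_hop_node_curvature(adj, source, k):
    """Mean Forman curvature over edges whose endpoints both lie within k hops.

    This aggregation rule is an assumption made for illustration only.
    Self-loops are assumed absent.
    """
    ball = k_hop_nodes(adj, source, k)
    edges = {frozenset((u, v)) for u in ball for v in adj[u] if v in ball}
    if not edges:
        return 0.0
    total = sum(forman_edge_curvature(adj, *tuple(e)) for e in edges)
    return total / len(edges)


# Heavy-atom skeleton of a toluene-like molecule: a six-membered ring
# (nodes 0-5) with one substituent (node 6) attached to node 0.
toluene_adj = {
    0: [1, 5, 6],
    1: [0, 2],
    2: [1, 3],
    3: [2, 4],
    4: [3, 5],
    5: [4, 0],
    6: [0],
}

# One curvature feature per node, here with k = 2.
node_features = {n: k_hop_node_curvature(toluene_adj, n, 2) for n in toluene_adj}
```

In this sketch the resulting per-node values could be appended to the atom features of a graph neural network. How CTAGE actually defines and injects the curvature is not specified in the abstract.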