Molecular geometric deep learning (2306.15065v1)
Abstract: Geometric deep learning (GDL) has demonstrated great power and potential in molecular data analysis. However, constructing highly efficient molecular representations remains a major challenge. Currently, covalent-bond-based molecular graphs are the de facto standard for representing molecular topology at the atomic level. Here we demonstrate, for the first time, that molecular graphs constructed only from non-covalent bonds can achieve similar or even better results than covalent-bond-based models in molecular property prediction. This demonstrates the potential of novel molecular representations beyond the de facto standard of covalent-bond-based molecular graphs. Based on this finding, we propose molecular geometric deep learning (Mol-GDL). The essential idea is to incorporate a more general molecular representation into GDL models. In Mol-GDL, molecular topology is modeled as a series of molecular graphs, each focusing on a different scale of atomic interactions. In this way, both covalent and non-covalent interactions are incorporated into the molecular representation on an equal footing. We systematically test Mol-GDL on fourteen commonly used benchmark datasets. The results show that Mol-GDL can achieve better performance than state-of-the-art (SOTA) methods. Source code and data are available at https://github.com/CS-BIO/Mol-GDL.
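To make the idea of "a series of molecular graphs, each focusing on a different scale of atomic interactions" concrete, the following is a minimal sketch rather than the Mol-GDL implementation. It assumes that "scale" means an interatomic distance interval and that each graph connects atom pairs whose Euclidean distance falls in that interval. The function name `multiscale_graphs`, the toy coordinates, and the interval boundaries are all hypothetical and chosen only for illustration.

```python
import math


def multiscale_graphs(coords, scales):
    """Return one edge set per distance interval [lo, hi).

    coords: list of (x, y, z) atomic positions (assumed to be in angstroms).
    scales: list of (lo, hi) distance intervals; these are illustrative
            assumptions, not values taken from the paper.
    """
    n = len(coords)
    graphs = []
    for lo, hi in scales:
        edges = set()
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(coords[i], coords[j])
                # Connect the pair only if it falls inside this scale.
                if lo <= d < hi:
                    edges.add((i, j))
        graphs.append(edges)
    return graphs


# Hypothetical toy "molecule": four atoms on a line, 1.5 apart.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
# Short range roughly mimics covalent contacts; longer ranges are non-covalent.
scales = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]

graphs = multiscale_graphs(coords, scales)
for (lo, hi), edges in zip(scales, graphs):
    print(f"[{lo}, {hi}): {sorted(edges)}")
```

In this toy setting the shortest interval recovers only nearest-neighbor pairs, while the longer intervals contain only pairs that are not adjacent along the chain. A graph built from the longer intervals alone would therefore be a purely "non-covalent" graph in the sense used in the abstract, under the stated distance-interval assumption.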