
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations (2306.01631v5)

Published 2 Jun 2023 in cs.LG, cs.AI, and q-bio.QM

Abstract: Molecular representation learning is vital for various downstream applications, including the analysis and prediction of molecular properties and side effects. While Graph Neural Networks (GNNs) have been a popular framework for modeling molecular data, they often struggle to capture the full complexity of molecular representations. In this paper, we introduce a novel method called GODE, which accounts for the dual-level structure inherent in molecules. Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph. GODE integrates individual molecular graph representations with multi-domain biochemical data from knowledge graphs. By pre-training two GNNs on different graph structures and employing contrastive learning, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures. This fusion yields a more robust and informative representation, enhancing molecular property predictions by leveraging both chemical and biological information. When fine-tuned across 11 chemical property tasks, our model significantly outperforms existing benchmarks, achieving an average ROC-AUC improvement of 12.7% for classification tasks and an average RMSE/MAE improvement of 34.4% for regression tasks. Notably, GODE surpasses the current leading model in property prediction, with advancements of 2.2% in classification and 7.2% in regression tasks.
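The contrastive fusion step described in the abstract can be illustrated with a minimal sketch. It assumes a symmetric InfoNCE-style objective in which each molecule embedding is pulled toward the embedding of its own knowledge graph substructure and pushed away from the others in the batch. The abstract does not specify the loss, so the function names, the `temperature` value, and the random vectors standing in for the outputs of the two pre-trained GNNs are all hypothetical and are not taken from the paper.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(mol_emb, kg_emb, temperature=0.1):
    """Symmetric InfoNCE loss between paired molecule and KG-subgraph embeddings.

    Assumption: row i of mol_emb and row i of kg_emb describe the same molecule.
    """
    m = [normalize(v) for v in mol_emb]
    k = [normalize(v) for v in kg_emb]
    logits = [
        [sum(a * b for a, b in zip(mi, kj)) / temperature for kj in k]
        for mi in m
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the molecule-to-KG and KG-to-molecule directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


# Toy stand-ins for the two encoders' outputs (8 molecules, 16 dimensions).
random.seed(0)
mol = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
kg_aligned = [[x + random.gauss(0, 0.01) for x in v] for v in mol]
kg_random = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]

aligned_loss = info_nce(mol, kg_aligned)
random_loss = info_nce(mol, kg_random)
print(aligned_loss < random_loss)
```

In this toy setup the loss is lower when the two views of each molecule agree, which is the behavior a contrastive objective of this kind is meant to reward. A real pipeline would replace the random vectors with embeddings from the molecule-level and knowledge-graph-level GNNs.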

Authors (7)
  1. Pengcheng Jiang (15 papers)
  2. Cao Xiao (84 papers)
  3. Tianfan Fu (53 papers)
  4. Jimeng Sun (181 papers)
  5. Parminder Bhatia (50 papers)
  6. Taha Kass-Hout (13 papers)
  7. Jiawei Han (263 papers)