
Multi-view biomedical foundation models for molecule-target and property prediction (2410.19704v3)

Published 25 Oct 2024 in q-bio.BM, cs.LG, and cs.AI

Abstract: Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation or view of the molecules. Here, we develop a multi-view foundation model approach that integrates graph, image, and text views of molecules. Single-view foundation models are each pre-trained on a dataset of up to 200M molecules and then aggregated into combined representations. Our multi-view model is validated on a diverse set of 18 tasks, encompassing ligand-protein binding, molecular solubility, metabolism, and toxicity. We show that the multi-view models perform robustly and are able to balance the strengths and weaknesses of the individual views. We then apply this model to screen compounds against a large (>100 targets) set of G protein-coupled receptors (GPCRs). From this library of targets, we identify 33 that are related to Alzheimer's disease. On this subset, we employ our model to identify strong binders, which are validated through structure-based modeling and identification of key binding motifs.
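The abstract sketches the core recipe: pre-train several single-view encoders, embed each molecule per view, and aggregate the per-view embeddings into one combined representation for downstream tasks. A minimal sketch of that aggregation step, assuming simple concatenation as the fusion (the paper's actual encoders and aggregation scheme are not specified here, so the encoders below are hypothetical stand-ins):

```python
import hashlib

import numpy as np

EMB_DIM = 64  # per-view embedding size (illustrative)


def encode_view(smiles: str, view: str) -> np.ndarray:
    """Stand-in for a frozen single-view encoder (graph, image, or text).

    A real encoder would be a pre-trained network over the given view of the
    molecule; here the embedding is just a deterministic pseudo-random vector
    keyed on (view, molecule), so the pipeline shape can be demonstrated.
    """
    digest = hashlib.sha256(f"{view}:{smiles}".encode()).digest()
    seed = int.from_bytes(digest[:4], "big")
    return np.random.default_rng(seed).standard_normal(EMB_DIM)


def multi_view_embedding(smiles: str) -> np.ndarray:
    """Aggregate the three views; plain concatenation is one simple fusion choice."""
    views = [encode_view(smiles, v) for v in ("graph", "image", "text")]
    return np.concatenate(views)


# Downstream use: a linear probe on the combined representation, e.g. for a
# hypothetical property-prediction task such as solubility or binding.
rng = np.random.default_rng(0)
emb = multi_view_embedding("CCO")            # ethanol, as a SMILES string
probe_weights = rng.standard_normal(emb.shape[0])
score = float(emb @ probe_weights)           # illustrative task score
print(emb.shape)
```

Concatenation keeps each view's contribution intact and lets a downstream head learn how to weight the views per task, which matches the abstract's claim that the combined model can balance the strengths and weaknesses of specific views.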

