
Improving Molecule Generation and Drug Discovery with a Knowledge-enhanced Generative Model (2402.08790v2)

Published 13 Feb 2024 in cs.LG and q-bio.QM

Abstract: Recent advancements in generative models have established state-of-the-art benchmarks in the generation of molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the utilization of extensive biomedical knowledge, often systematized within knowledge graphs, whose potential to inform and enhance generative processes has not been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called KARL. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity, and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. KARL outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.
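As a rough illustration of what "incorporating knowledge graph embeddings to guide a diffusion-based model" can look like in general, the sketch below applies a standard Gaussian forward-noising step and then builds a denoiser input that carries a conditioning vector alongside the noisy sample. This is not KARL's architecture: the abstract does not specify how the conditioning is done, so the concatenation scheme, the function names `forward_noise` and `denoiser_input`, and the toy vectors are all assumptions made for illustration.

```python
import math
import random


def forward_noise(x0, alpha_bar, rng):
    """Generic Gaussian forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.

    `alpha_bar` is the cumulative noise-schedule term in [0, 1].
    Returns the noisy sample and the noise that was added.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_signal = math.sqrt(alpha_bar)
    scale_noise = math.sqrt(1.0 - alpha_bar)
    xt = [scale_signal * a + scale_noise * e for a, e in zip(x0, eps)]
    return xt, eps


def denoiser_input(xt, t, kg_embedding):
    """Hypothetical conditioning scheme: concatenate the noisy sample,
    the timestep, and a knowledge graph embedding into one feature vector."""
    return list(xt) + [float(t)] + list(kg_embedding)


rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]        # toy stand-in for molecule features
kg_embedding = [0.2, 0.7]    # toy stand-in for a knowledge graph embedding

xt, eps = forward_noise(x0, alpha_bar=0.9, rng=rng)
features = denoiser_input(xt, t=10, kg_embedding=kg_embedding)
```

A denoising network trained on `features` to predict `eps` would see the embedding at every step, which is one common way to steer generation toward a target property; other schemes (for example, guidance through a separate property predictor) are equally plausible readings of the abstract.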
