Generative Flows on Synthetic Pathway for Drug Design

Published 6 Oct 2024 in q-bio.BM and cs.LG (arXiv:2410.04542v2)

Abstract: Generative models in drug discovery have recently gained attention as efficient alternatives to brute-force virtual screening. However, most existing models do not account for synthesizability, which limits their practical use in real-world scenarios. In this paper, we propose RxnFlow, which sequentially assembles molecules from predefined molecular building blocks and chemical reaction templates, constraining generation to a synthetic chemical pathway. We train this sequential generation process with the generative flow network (GFlowNet) objective so that it produces molecules that are both highly rewarded and diverse. To cope with the large action space of synthetic pathways, we introduce a novel action-space subsampling method. This enables RxnFlow to learn generative flows over extensive action spaces comprising combinations of 1.2 million building blocks and 71 reaction templates without significant computational overhead. Additionally, RxnFlow can use modified or expanded action spaces for generation without retraining, which allows additional objectives to be introduced or newly discovered building blocks to be incorporated. We experimentally demonstrate that RxnFlow outperforms existing reaction-based and fragment-based models in pocket-specific optimization across various target pockets. Furthermore, RxnFlow achieves state-of-the-art performance on CrossDocked2020 for pocket-conditional generation, with an average Vina score of −8.85 kcal/mol and 34.8% synthesizability.
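The abstract does not spell out how the subsampled action space enters training, so the following is only a minimal sketch of one plausible reading, under explicitly assumed choices. It assumes (1) a softmax forward policy over a flat list of action logits, (2) a uniform subsample of k actions whose summed exponentiated logits are scaled by n/k to estimate the full-space normalizer, and (3) trajectory balance as the GFlowNet training objective. The function names (`subsampled_log_normalizer`, `sample_action`, `trajectory_balance_loss`) are hypothetical and are not RxnFlow's API; the actual method may weight or stratify the subsample differently, for example per reaction template.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def subsampled_log_normalizer(logits, k, rng):
    """Estimate log(sum_a exp(logits[a])) from a uniform subsample of k actions.

    Assumption: uniform sampling without replacement. Scaling the subset sum
    by n / k makes the estimate of the normalizer itself unbiased; its
    logarithm is only approximately unbiased.
    """
    n = len(logits)
    subset = rng.sample(range(n), k)
    log_z_hat = math.log(n / k) + logsumexp([logits[i] for i in subset])
    return log_z_hat, subset


def sample_action(logits, k, rng):
    """Pick an action from a random subset and return it together with its
    approximate log-probability under the softmax over the *full* space."""
    log_z_hat, subset = subsampled_log_normalizer(logits, k, rng)
    # Relative weights inside the subset; random.choices normalizes them.
    weights = [math.exp(logits[i] - log_z_hat) for i in subset]
    action = rng.choices(subset, weights=weights, k=1)[0]
    return action, logits[action] - log_z_hat


def trajectory_balance_loss(log_z, log_pf, log_pb, log_reward):
    """Squared trajectory-balance residual for one complete trajectory:
    (log Z + sum log P_F - log R(x) - sum log P_B) ** 2."""
    residual = log_z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    n_actions = 100_000  # hypothetical stand-in for a very large action space
    logits = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
    exact = logsumexp(logits)
    estimate, _ = subsampled_log_normalizer(logits, k=1_000, rng=rng)
    print(f"exact log Z = {exact:.3f}, subsampled estimate = {estimate:.3f}")

    # Toy 3-step trajectory. log P_B = 0 assumes every state has one parent;
    # log_z and log_reward are hypothetical values, not learned or scored.
    log_pf = [sample_action(logits, 1_000, rng)[1] for _ in range(3)]
    loss = trajectory_balance_loss(log_z=0.0, log_pf=log_pf,
                                   log_pb=[0.0] * 3, log_reward=2.0)
    print(f"TB loss = {loss:.3f}")
```

The design point being illustrated is that each step only touches k logits instead of all n, while the n/k rescaling keeps the per-step log-probabilities, and hence the trajectory-balance residual, on the scale of the full action space.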
