Structure-Based Drug Design via 3D Molecular Generative Pre-training and Sampling (2402.14315v2)

Published 22 Feb 2024 in q-bio.BM and cs.LG

Abstract: Structure-based drug design aims at generating high-affinity ligands with prior knowledge of 3D target structures. Existing methods either use conditional generative models to learn the distribution of 3D ligands given target binding sites, or iteratively modify molecules to optimize a structure-based activity estimator. The former is highly constrained by data quantity and quality, which leaves optimization-based approaches more promising in practical scenarios. However, existing optimization-based approaches edit molecules in 2D space and use molecular docking to estimate activity from docking-predicted 3D target-ligand complexes. The misalignment between the action space and the objective hinders the performance of these models, especially those that employ deep learning for acceleration. In this work, we propose MolEdit3D to combine 3D molecular generation with optimization frameworks. We develop a novel 3D graph editing model to generate molecules using fragments, and pre-train this model on abundant 3D ligands to learn target-independent properties. Then we employ a target-guided self-learning strategy to improve target-related properties using self-sampled molecules. MolEdit3D achieves state-of-the-art performance on the majority of the evaluation metrics and demonstrates a strong capability to capture both target-dependent and target-independent properties.

Authors (6)
  1. Yuwei Yang
  2. Siqi Ouyang
  3. Xueyu Hu
  4. Mingyue Zheng
  5. Hao Zhou
  6. Lei Li

Summary

Enhancing Structure-Based Drug Design with 3D Molecular Generative Pre-Training and Sampling

Introduction to MolEdit3D

Structure-based drug design (SBDD) is a critical area of pharmaceutical research: the goal is to design ligands (potential drug molecules) with high affinity for a target biomolecule, using knowledge of the target's 3D structure. The paper introduces MolEdit3D, which combines 3D molecular generation with optimization frameworks to address the limitations of current SBDD methods. MolEdit3D uses a 3D graph editing model that builds molecules from fragments and is pre-trained on abundant 3D ligands. The model is then fine-tuned with a target-guided self-learning strategy to improve target-related properties. The authors report state-of-the-art performance on the majority of their evaluation metrics, including improved binding affinity over previous methods, while the generated molecules retain drug-like properties.
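The idea of building molecules from fragments directly in 3D can be made concrete with a toy data structure. This is a minimal sketch, not MolEdit3D's model: `Mol3D` and `attach_fragment` are hypothetical names, hydrogens, valence rules and rotations are ignored, and the fragment is simply translated so that its anchor atom sits one bond length from the host atom along a chosen direction. It only illustrates that a fragment edit performed in 3D yields coordinates directly, rather than a 2D graph that needs a separate step to recover a 3D structure.

```python
import math
from dataclasses import dataclass, field

Vec = tuple[float, float, float]

@dataclass
class Mol3D:
    """Toy 3D molecular graph: element symbols, Cartesian coordinates (angstrom), bonds."""
    elements: list[str] = field(default_factory=list)
    coords: list[Vec] = field(default_factory=list)
    bonds: list[tuple[int, int]] = field(default_factory=list)

def _unit(v: Vec) -> Vec:
    """Normalize a 3D vector to length 1."""
    norm = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    return (v[0] / norm, v[1] / norm, v[2] / norm)

def attach_fragment(host: Mol3D, host_atom: int, frag: Mol3D, frag_anchor: int,
                    direction: Vec, bond_length: float = 1.54) -> Mol3D:
    """Return a new molecule in which `frag` is bonded to `host` at `host_atom`.

    The fragment is rigidly translated (not rotated) so that its anchor atom
    lies `bond_length` angstrom from the host atom along `direction`.
    """
    d = _unit(direction)
    hx, hy, hz = host.coords[host_atom]
    target = (hx + bond_length * d[0], hy + bond_length * d[1], hz + bond_length * d[2])
    ax, ay, az = frag.coords[frag_anchor]
    dx, dy, dz = target[0] - ax, target[1] - ay, target[2] - az
    moved = [(x + dx, y + dy, z + dz) for x, y, z in frag.coords]
    offset = len(host.elements)  # fragment atom indices shift past the host atoms
    frag_bonds = [(i + offset, j + offset) for i, j in frag.bonds]
    return Mol3D(
        elements=host.elements + frag.elements,
        coords=host.coords + moved,
        bonds=host.bonds + frag_bonds + [(host_atom, frag_anchor + offset)],
    )

# Usage: bond a one-atom oxygen fragment to a lone carbon (hydrogens omitted).
carbon = Mol3D(["C"], [(0.0, 0.0, 0.0)], [])
oxygen = Mol3D(["O"], [(5.0, 5.0, 5.0)], [])
c_o = attach_fragment(carbon, 0, oxygen, 0, direction=(1.0, 0.0, 0.0), bond_length=1.43)
print(math.dist(c_o.coords[0], c_o.coords[1]))  # approximately 1.43
```

Translation-only placement keeps the sketch short; a realistic editor would also choose the fragment's orientation and torsions so that it fits the binding pocket.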

Overview of Existing SBDD Methods

Existing SBDD methods fall broadly into two categories: conditional generation methods and optimization-based approaches. Conditional generation methods learn the distribution of 3D ligands given a target binding site, so they are constrained by the limited quantity and quality of 3D target-ligand complex data, which limits how well they learn both target-dependent and target-independent properties. Optimization-based approaches iteratively modify molecules to improve a structure-based activity estimator, but existing ones edit molecules in 2D and rely on molecular docking to predict the 3D target-ligand complex that the estimator scores. The paper argues that this misalignment between a 2D action space and a 3D objective hinders performance, especially for methods that use deep learning to accelerate the search.
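The optimization-based loop described above can be sketched generically as greedy hill climbing over molecule edits against a black-box score. This is a minimal illustration under stated assumptions, not MolEdit3D or any specific published baseline: molecules are reduced to tuples of fragment labels, `toy_score` is a hypothetical additive stand-in for a docking score (lower is better), and all names are illustrative. In the 2D-editing approaches the paper critiques, `score` would first have to dock the edited molecule to obtain a 3D pose, which is where the mismatch between action space and objective arises.

```python
import random
from typing import Callable, Sequence

Molecule = tuple[str, ...]  # toy stand-in: a molecule as an ordered tuple of fragment labels

def propose_edits(mol: Molecule, vocab: Sequence[str], rng: random.Random, n: int) -> list[Molecule]:
    """Generate n random neighbours, each differing from `mol` by one added, deleted, or replaced fragment."""
    neighbours = []
    for _ in range(n):
        op = rng.choice(("add", "delete", "replace")) if mol else "add"
        if op == "add":
            i = rng.randrange(len(mol) + 1)
            neighbours.append(mol[:i] + (rng.choice(vocab),) + mol[i:])
        elif op == "delete":
            i = rng.randrange(len(mol))
            neighbours.append(mol[:i] + mol[i + 1:])
        else:
            i = rng.randrange(len(mol))
            neighbours.append(mol[:i] + (rng.choice(vocab),) + mol[i + 1:])
    return neighbours

def optimize(start: Molecule, score: Callable[[Molecule], float], vocab: Sequence[str],
             steps: int = 200, n_neighbours: int = 8, seed: int = 0) -> tuple[Molecule, float]:
    """Greedy hill climbing: move to the best neighbour only if it lowers the score."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(steps):
        candidates = propose_edits(best, vocab, rng, n_neighbours)
        scores = [score(c) for c in candidates]
        j = min(range(len(candidates)), key=scores.__getitem__)
        if scores[j] < best_score:
            best, best_score = candidates[j], scores[j]
    return best, best_score

# Hypothetical per-fragment contributions plus a size penalty, standing in for a docking score.
CONTRIB = {"benzene": -2.0, "amide": -1.5, "hydroxyl": -0.5, "methyl": 0.2}

def toy_score(mol: Molecule) -> float:
    return sum(CONTRIB[f] for f in mol) + 0.3 * max(0, len(mol) - 4) ** 2

best, best_score = optimize(("methyl",), toy_score, list(CONTRIB))
print(best, best_score)
```

Because the loop only ever calls `score`, any estimator can be plugged in; the cost and fidelity of that estimator, not the search itself, largely determine how useful the result is.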

Key Contributions of MolEdit3D

MolEdit3D introduces several key contributions to SBDD:

  • A 3D molecular graph editing model that builds molecules from fragments directly in 3D space, so that edits act on the same 3D structures that a structure-based objective evaluates.
  • Pre-training on abundant 3D ligands, which lets the model learn target-independent properties such as general drug-likeness.
  • A target-guided self-learning strategy that fine-tunes the model on its own sampled molecules to improve target-related properties such as binding affinity.
  • State-of-the-art results on the majority of evaluation metrics, notably binding affinity, with the generated molecules also retaining favorable drug-likeness and synthesizability.
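The two training stages listed above can be illustrated with a toy sketch. It is not MolEdit3D's model or training objective: the "pre-training" step only fits smoothed fragment frequencies from an unlabeled corpus (a target-independent prior), and the self-learning step is a cross-entropy-method-style loop that samples molecules from the current policy, keeps the best-scoring ones under a hypothetical target score, and moves the policy toward them. `fit_frequencies`, `self_learn`, `CONTRIB`, the corpus and the fragment labels are all illustrative assumptions.

```python
import random
from collections import Counter
from typing import Callable, Iterable, Sequence

Molecule = tuple[str, ...]  # toy stand-in: a molecule as a tuple of fragment labels
Policy = dict[str, float]   # categorical distribution over fragment labels

def fit_frequencies(corpus: Iterable[Molecule], vocab: Sequence[str], smoothing: float = 1.0) -> Policy:
    """Smoothed fragment frequencies; the 'pre-training' stand-in when run on unlabeled ligands."""
    counts = Counter(f for mol in corpus for f in mol)
    total = sum(counts[f] + smoothing for f in vocab)
    return {f: (counts[f] + smoothing) / total for f in vocab}

def sample(policy: Policy, length: int, rng: random.Random) -> Molecule:
    """Draw a fixed-length molecule by sampling each fragment independently from the policy."""
    frags, weights = zip(*policy.items())
    return tuple(rng.choices(frags, weights=weights, k=length))

def self_learn(policy: Policy, score: Callable[[Molecule], float], length: int = 4,
               rounds: int = 20, n_samples: int = 64, top_k: int = 8,
               alpha: float = 0.5, seed: int = 0) -> Policy:
    """Cross-entropy-style self-learning: sample, keep the best-scoring molecules, refit toward them."""
    rng = random.Random(seed)
    vocab = list(policy)
    for _ in range(rounds):
        samples = [sample(policy, length, rng) for _ in range(n_samples)]
        elites = sorted(samples, key=score)[:top_k]  # lower score = better (docking-style)
        elite_freq = fit_frequencies(elites, vocab, smoothing=0.1)
        # Convex mixture keeps the policy normalized and moves it only partway toward the elites.
        policy = {f: (1 - alpha) * policy[f] + alpha * elite_freq[f] for f in vocab}
    return policy

# Usage with hypothetical data.
VOCAB = ["benzene", "amide", "hydroxyl", "methyl"]
CORPUS = [("methyl", "hydroxyl"), ("methyl", "methyl", "amide"), ("benzene", "methyl")]
CONTRIB = {"benzene": -2.0, "amide": -1.5, "hydroxyl": -0.5, "methyl": 0.2}

prior = fit_frequencies(CORPUS, VOCAB)
tuned = self_learn(prior, lambda mol: sum(CONTRIB[f] for f in mol))
print(prior)
print(tuned)
```

Starting from the corpus-fitted prior rather than a uniform distribution is the toy analogue of fine-tuning a pre-trained model: the target score only has to shift an existing distribution instead of learning one from scratch.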

Theoretical and Practical Implications

MolEdit3D's integration of 3D generation and optimization has implications for both the practice and the theory of SBDD. Practically, it offers a more direct pathway to candidate drugs by generating and optimizing molecules within the spatial confines of target binding sites, which could shorten drug discovery timelines and improve the specificity of the resulting compounds. Theoretically, its results support the hypothesis that aligning a model's action space with the three-dimensional space in which molecules interact leads to more accurate and effective drug design methods.

Future Prospects in AI and Drug Discovery

Looking forward, combining 3D molecular generation with machine-learning optimization offers fertile ground for innovation in drug discovery. Potential directions include more complex models of molecular interaction, the incorporation of dynamic molecular simulations, and self-learning strategies that account for more nuanced biochemical and pharmacological properties. As computational power and machine learning algorithms improve, methods like MolEdit3D may help make drug discovery more precise, more efficient, and better grounded in the realities of molecular biology.

MolEdit3D's development represents an important step forward in the application of generative AI and machine learning to structure-based drug design. It not only addresses current limitations within SBDD methodologies but also opens up new avenues for research and application in the quest for more effective medicinal compounds.