Structure-Based Drug Design via 3D Molecular Generative Pre-training and Sampling (2402.14315v2)
Abstract: Structure-based drug design aims to generate high-affinity ligands using prior knowledge of 3D target structures. Existing methods either use conditional generative models to learn the distribution of 3D ligands given target binding sites, or iteratively modify molecules to optimize a structure-based activity estimator. The former approach is highly constrained by data quantity and quality, which makes optimization-based approaches more promising in practical scenarios. However, existing optimization-based approaches edit molecules in 2D space and use molecular docking to estimate activity from docking-predicted 3D target-ligand complexes. This misalignment between the action space and the objective hinders the performance of these models, especially those that employ deep learning for acceleration. In this work, we propose MolEdit3D, which combines 3D molecular generation with optimization frameworks. We develop a novel 3D graph editing model that generates molecules from fragments, and we pre-train this model on abundant 3D ligands to learn target-independent properties. We then employ a target-guided self-learning strategy that improves target-related properties using self-sampled molecules. MolEdit3D achieves state-of-the-art performance on the majority of evaluation metrics and demonstrates a strong capability to capture both target-dependent and target-independent properties.
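The optimization-based loop described in the abstract (sample molecules from a pre-trained generator, score them with a target-specific estimator, then update the generator on its best self-sampled molecules) can be sketched in a few lines. This is a minimal toy under stated assumptions, not MolEdit3D itself: molecules are fixed-length tuples of fragment labels rather than 3D graphs, the hypothetical `ToyFragmentGenerator` stands in for the pre-trained 3D editing model, a caller-supplied `score` function stands in for docking, and the update is a cross-entropy-method-style refit, swapped in for gradient-based fine-tuning.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# A "molecule" here is a hypothetical fixed-length tuple of fragment labels.
Molecule = Tuple[str, ...]


class ToyFragmentGenerator:
    """Hypothetical stand-in for a pre-trained generator: one categorical
    distribution over fragment labels per position (no 3D geometry)."""

    def __init__(self, vocab: Sequence[str], length: int, seed: int = 0) -> None:
        self.vocab: List[str] = list(vocab)
        self.length = length
        self.rng = random.Random(seed)
        uniform = 1.0 / len(self.vocab)
        self.probs: List[Dict[str, float]] = [
            {f: uniform for f in self.vocab} for _ in range(length)
        ]

    def sample(self, n: int) -> List[Molecule]:
        # Draw each position independently from its categorical distribution.
        return [
            tuple(
                self.rng.choices(self.vocab, weights=[p[f] for f in self.vocab])[0]
                for p in self.probs
            )
            for _ in range(n)
        ]

    def fit(self, elites: List[Molecule], lr: float = 0.5, eps: float = 1e-3) -> None:
        # Cross-entropy-method-style update: blend each position's distribution
        # toward the fragment frequencies in the elite set, then renormalize.
        if not elites:
            raise ValueError("need at least one elite molecule")
        for pos in range(self.length):
            counts = Counter(m[pos] for m in elites)
            for f in self.vocab:
                target = counts[f] / len(elites)
                self.probs[pos][f] = (1 - lr) * self.probs[pos][f] + lr * target + eps
            total = sum(self.probs[pos].values())
            for f in self.vocab:
                self.probs[pos][f] /= total


def self_learning(
    gen: ToyFragmentGenerator,
    score: Callable[[Molecule], float],
    rounds: int = 10,
    n_samples: int = 64,
    top_k: int = 8,
) -> List[float]:
    """Sample, score, keep the top-k, refit; returns the best score per round."""
    history: List[float] = []
    for _ in range(rounds):
        mols = gen.sample(n_samples)
        elites = sorted(mols, key=score, reverse=True)[:top_k]
        gen.fit(elites)
        history.append(score(elites[0]))
    return history


if __name__ == "__main__":
    # Hypothetical target-dependent score: number of positions matching a hidden
    # "ideal" fragment pattern (a stand-in for a negated docking score).
    ideal = ("A", "B", "C", "D", "A")

    def toy_score(m: Molecule) -> float:
        return float(sum(a == b for a, b in zip(m, ideal)))

    gen = ToyFragmentGenerator("ABCD", length=5, seed=1)
    print(self_learning(gen, toy_score, rounds=15))
```

Replacing `score` with a structure-based activity estimator and `fit` with a training step on the elite set gives the general shape of target-guided self-learning; all names here are illustrative only.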
- Yuwei Yang
- Siqi Ouyang
- Xueyu Hu
- Mingyue Zheng
- Hao Zhou
- Lei Li