
MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space (2404.12141v4)

Published 18 Apr 2024 in q-bio.BM and cs.LG

Abstract: Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and diffusion to SBDD, including mode collapse and hybrid continuous-discrete space. In this paper, we introduce MolCRAFT, the first SBDD model that operates in the continuous parameter space, together with a novel noise reduced sampling strategy. Empirical results show that our model consistently achieves superior performance in binding affinity with more stable 3D structure, demonstrating our ability to accurately model interatomic interactions. To our best knowledge, MolCRAFT is the first to achieve reference-level Vina Scores (-6.59 kcal/mol) with comparable molecular size, outperforming other strong baselines by a wide margin (-0.84 kcal/mol). Code is available at https://github.com/AlgoMole/MolCRAFT.


Summary

  • The paper proposes MolCRAFT, a framework that models molecules in a unified continuous parameter space to improve binding affinity while producing realistic 3D conformations.
  • It combines a noise-reduced sampling strategy with SE(3) equivariance to generate chemically plausible molecular geometries and reduce sampling variance.
  • MolCRAFT reports a reference-level Vina Score of -6.59 kcal/mol and improved sample efficiency over prior methods.

Exploring MolCRAFT: A Novel Approach to Structure-Based Drug Design in Continuous Parameter Space

Introduction

MolCRAFT is a framework for structure-based drug design (SBDD) that operates entirely within a continuous parameter space. Existing generative models mostly optimize binding affinity and often neglect whether the generated 3D poses are physically feasible, which leads to false positives. MolCRAFT targets both goals: feasible 3D molecular structures and high binding affinity.

Challenges of Existing SBDD Generative Models

Generative models for SBDD, including both autoregressive and diffusion-based approaches, face significant challenges in maintaining accurate and realistic molecular conformations:

  • Autoregressive Models: These tend to suffer from mode collapse, frequently generating a limited set of molecular substructures, and often fail to produce chemically plausible molecular geometries due to unnatural atom ordering.
  • Diffusion Models: While addressing some of the limitations of autoregressive models by using a non-autoregressive approach, diffusion models struggle with the integration of discrete and continuous spaces, leading to high variance and occasional generation of incomplete or disconnected molecular structures.

MolCRAFT's Approach and Methodology

MolCRAFT introduces several innovations to overcome the limitations observed in existing models:

  • Unified Continuous Parameter Space: MolCRAFT models generation in a single continuous parameter space, so that discrete atom types and continuous atom coordinates are handled together rather than in a hybrid continuous-discrete space.
  • Noise-Reduced Sampling Strategy: A sampling technique defined within the continuous parameter space reduces variance and improves the efficiency of the generative process.
  • SE(3) Equivariance: The model is SE(3)-equivariant, meaning that rotating or translating the input rotates or translates the generated coordinates in the same way. This respects the spatial symmetries of molecules, which is essential for reliably modeling interactions with the protein pocket.
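The equivariance property in the last bullet can be checked numerically. The following is a minimal sketch, not MolCRAFT's network: `egnn_style_update` is a hypothetical toy coordinate update whose weights depend only on interatomic distances, and the check uses a z-axis rotation plus a translation as one example of a rigid motion. The atom coordinates are made up for illustration.

```python
import math

def egnn_style_update(coords, step=0.1):
    """Toy update (hypothetical): each atom moves along its relative vectors
    to the other atoms, weighted by a function of the distance only."""
    updated = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            rel = [a - b for a, b in zip(xi, xj)]
            dist2 = sum(r * r for r in rel)
            weight = 1.0 / (1.0 + dist2)  # invariant under rotation/translation
            shift = [s + weight * r for s, r in zip(shift, rel)]
        updated.append(tuple(a + step * s for a, s in zip(xi, shift)))
    return updated

def rigid_transform(coords, angle, translation):
    """Rotate about the z-axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in coords]

def max_abs_diff(a, b):
    """Largest coordinate-wise difference between two point sets."""
    return max(abs(p - q) for u, v in zip(a, b) for p, q in zip(u, v))

# Made-up atom positions, used only to exercise the check.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (-0.7, 0.9, -1.1)]
angle, shift = 0.8, (2.0, -1.0, 0.5)

# Equivariance: transforming the input and then updating should match
# updating first and then transforming the output.
transform_then_update = egnn_style_update(rigid_transform(atoms, angle, shift))
update_then_transform = rigid_transform(egnn_style_update(atoms), angle, shift)
equivariance_error = max_abs_diff(transform_then_update, update_then_transform)
print(f"max deviation: {equivariance_error:.2e}")
```

The deviation should be at the level of floating-point round-off. A model that fed raw coordinates into its weights, rather than distances and relative vectors, would generally fail this check.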

Empirical Performance

MolCRAFT's empirical validation demonstrates superior performance over existing methods:

  • Binding Affinity: MolCRAFT achieves a reference-level Vina Score of -6.59 kcal/mol with comparable molecular size, outperforming other strong baselines by a wide margin (0.84 kcal/mol lower, where lower is better).
  • Conformational Stability: The model generates molecules with more stable 3D structures, reducing the occurrence of strain and steric clashes often seen with other generative models.
  • Sample Efficiency: MolCRAFT exhibits high efficiency in sample generation, producing valid and complete molecular configurations with significantly reduced computational overhead.

Theoretical Contributions and Practical Implications

On the theoretical side, the continuous parameter space gives a single framework for generating both the discrete and continuous parts of a molecule. On the practical side, it provides a drug discovery tool that generates candidate molecules with both strong predicted binding affinity and stable 3D poses for a given target pocket.
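To make "continuous parameter space" concrete, here is a minimal sketch. It assumes the update rules of Bayesian flow networks, on which MolCRAFT builds; the summary above does not spell them out. The idea is that the model carries distribution parameters (a Gaussian mean and precision for a coordinate, class probabilities for an atom type) and refines them with Bayesian updates, so both modalities evolve smoothly. The values `true_coord`, `true_type`, `num_types`, and `alpha` are hypothetical, and the sketch omits the neural network and the protein pocket entirely.

```python
import math
import random

def update_gaussian(mu, rho, y, alpha):
    """Bayesian update of a Gaussian belief (mean mu, precision rho) after
    observing y, a noisy reading of the coordinate with precision alpha."""
    new_rho = rho + alpha
    new_mu = (rho * mu + alpha * y) / new_rho
    return new_mu, new_rho

def update_categorical(theta, y):
    """Bayesian update of class probabilities theta given noisy per-class
    evidence y: theta_k is scaled by exp(y_k) and renormalized."""
    peak = max(y)  # subtract the max for numerical stability
    unnorm = [t * math.exp(v - peak) for t, v in zip(theta, y)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

rng = random.Random(0)
true_coord = 1.3               # hypothetical 1-D atom coordinate
true_type, num_types = 2, 4    # hypothetical atom-type index and class count
alpha = 0.5                    # hypothetical per-step accuracy

mu, rho = 0.0, 1.0                        # standard-normal prior
theta = [1.0 / num_types] * num_types     # uniform prior over atom types

for _ in range(20):
    # Continuous part: noisy coordinate observation with precision alpha.
    y_coord = rng.gauss(true_coord, 1.0 / math.sqrt(alpha))
    mu, rho = update_gaussian(mu, rho, y_coord, alpha)

    # Discrete part: Gaussian evidence centred on alpha * (K * onehot - 1)
    # with variance alpha * K, as in the assumed formulation.
    y_type = [
        rng.gauss(alpha * ((num_types if k == true_type else 0) - 1),
                  math.sqrt(alpha * num_types))
        for k in range(num_types)
    ]
    theta = update_categorical(theta, y_type)

print(f"coordinate belief: mean={mu:.3f}, precision={rho:.1f}")
print("atom-type probabilities:", [round(p, 3) for p in theta])
```

Both `mu` and `theta` are continuous quantities, so the atom type never has to be committed to a hard discrete sample during generation. In the actual model a network would read these parameters and predict the molecule; here the "observations" are simply drawn around fixed ground-truth values.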

Future Directions

Looking ahead, MolCRAFT's integration of continuous parameter space modeling opens the avenue for further exploration and enhancement of generative models in drug design. Potential areas for future research include improving the model's ability to generate a wider variety of molecular structures and extending the approach to handle larger and more complex biological targets.

Conclusion

MolCRAFT advances the generation of drug-like molecules by treating discrete atom types and continuous coordinates in one continuous parameter space. The result is stronger binding affinity together with more stable 3D structures, which addresses a central weakness of earlier SBDD generative models.
