Structure-based Drug Design with Equivariant Diffusion Models (2210.13695v3)

Published 24 Oct 2022 in q-bio.BM and cs.LG

Abstract: Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. Generative SBDD methods leverage structural data of drugs in complex with their protein targets to propose new drug candidates. These approaches typically place one atom at a time in an autoregressive fashion using the binding pocket as well as previously added ligand atoms as context in each step. Recently a surge of diffusion generative models has entered this domain which hold promise to capture the statistical properties of natural ligands more faithfully. However, most existing methods focus exclusively on bottom-up de novo design of compounds or tackle other drug development challenges with task-specific models. The latter requires curation of suitable datasets, careful engineering of the models and retraining from scratch for each task. Here we show how a single pre-trained diffusion model can be applied to a broader range of problems, such as off-the-shelf property optimization, explicit negative design, and partial molecular design with inpainting. We formulate SBDD as a 3D-conditional generation problem and present DiffSBDD, an SE(3)-equivariant diffusion model that generates novel ligands conditioned on protein pockets. Our in silico experiments demonstrate that DiffSBDD captures the statistics of the ground truth data effectively. Furthermore, we show how additional constraints can be used to improve the generated drug candidates according to a variety of computational metrics. These results support the assumption that diffusion models represent the complex distribution of structural data more accurately than previous methods, and are able to incorporate additional design objectives and constraints changing nothing but the sampling strategy.

Authors (13)
  1. Arne Schneuing (3 papers)
  2. Yuanqi Du (52 papers)
  3. Charles Harris (8 papers)
  4. Arian Jamasb (4 papers)
  5. Ilia Igashov (6 papers)
  6. Weitao Du (23 papers)
  7. Tom Blundell (5 papers)
  8. Carla Gomes (26 papers)
  9. Max Welling (202 papers)
  10. Michael Bronstein (77 papers)
  11. Bruno Correia (6 papers)
  12. Kieran Didi (11 papers)
  13. Pietro Lio (69 papers)
Citations (150)

Summary

  • The paper introduces DiffSBDD, which employs an SE(3)-equivariant diffusion model to generate novel ligands with promising docking scores.
  • It leverages geometric deep learning to frame structure-based drug design as a 3D-conditional generation problem for precise molecular interactions.
  • The model demonstrates versatility in scaffold hopping and property optimization, outperforming existing approaches on established datasets.

Structure-based Drug Design with Equivariant Diffusion Models

The paper introduces DiffSBDD, an approach to structure-based drug design (SBDD) that applies denoising diffusion probabilistic models (DDPMs) within geometric deep learning. Using an SE(3)-equivariant architecture, the model generates small-molecule ligands intended to bind predefined protein targets with high affinity and specificity. The paper frames SBDD as a 3D-conditional generation problem: ligands are sampled conditioned on the three-dimensional structure of a protein pocket.

DiffSBDD is built on an SE(3)-equivariant, 3D-conditional diffusion model, so its outputs transform consistently when the protein-ligand complex is rotated or translated. In silico experiments show that the generated ligands achieve promising docking scores. The same pre-trained model also applies to other drug design tasks, such as property optimization and molecular inpainting, by changing only the sampling strategy.

Methodology Overview

The model treats SBDD as a problem of sampling from a conditional distribution, using an SE(3)-equivariant diffusion process. A forward process gradually adds noise to the ligand, and a learned reverse process denoises it conditioned on the protein pocket. Because the denoising network is equivariant (not merely invariant) to rotations and translations, the generative process respects SE(3) symmetry. Following geometric deep learning principles, graph neural networks handle permutations of atoms together with rotations and translations of the molecular frame.
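
To make the equivariance property concrete, the following is a minimal sketch, not the paper's network. It assumes a toy EGNN-style coordinate update in which each atom moves along relative position vectors scaled by a function of interatomic distance. The function names (`coord_update`, `rotate_z`, `translate`) and the fixed scaling rule are illustrative assumptions standing in for learned components. Note that this toy update is also equivariant to reflections, which is a broader symmetry than SE(3).

```python
import math

def coord_update(coords, weight=0.1):
    """Toy EGNN-style update (assumed form, not the paper's architecture).

    Each atom is shifted along the vectors to all other atoms, scaled by a
    function of the squared distance, which is invariant to rotation and
    translation. That is what makes the whole update equivariant.
    """
    updated = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            dist_sq = sum(c * c for c in diff)
            scale = weight / (1.0 + dist_sq)  # stand-in for a learned function
            shift = [s + scale * c for s, c in zip(shift, diff)]
        updated.append([a + s for a, s in zip(xi, shift)])
    return updated

def rotate_z(coords, angle):
    """Rotate all points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in coords]

def translate(coords, offset):
    """Shift all points by the same offset vector."""
    return [[a + b for a, b in zip(p, offset)] for p in coords]

def transform(coords):
    """An arbitrary rigid motion: rotation followed by translation."""
    return translate(rotate_z(coords, 0.8), [2.0, -1.0, 0.5])

def max_abs_diff(a, b):
    """Largest coordinate-wise difference between two point sets."""
    return max(abs(u - v) for p, q in zip(a, b) for u, v in zip(p, q))

atoms = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.2, 0.3], [-0.7, 0.4, 1.1]]

# Equivariance: transforming then updating equals updating then transforming.
gap = max_abs_diff(coord_update(transform(atoms)), transform(coord_update(atoms)))
print("equivariance gap:", gap)  # should be at floating-point noise level
```

The check compares the two orders of operations. For an equivariant update they agree up to rounding error, whereas an update that used absolute coordinates would generally fail it.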

The model also uses an inpainting-inspired sampling procedure for partial molecular design, in which part of a molecule is held fixed while the rest is generated. This supports scaffold hopping, fragment growing, and other molecular redesign tasks.
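
The inpainting idea can be sketched as a masked sampling loop in the style of RePaint-type methods. This is a simplified illustration under stated assumptions, not the paper's implementation: `toy_denoise_step` is a hypothetical placeholder for the learned reverse step, the `(alpha_t, sigma_t)` schedule values are made up, and details such as pocket conditioning, atom types, and center-of-mass handling are omitted.

```python
import random

def noise_known(x_known, alpha_t, sigma_t, rng):
    """Forward-noise the known atoms to the current noise level."""
    return [[alpha_t * c + sigma_t * rng.gauss(0.0, 1.0) for c in atom]
            for atom in x_known]

def toy_denoise_step(z, alpha_t, sigma_t):
    """Hypothetical stand-in for the learned reverse step (just shrinks z)."""
    return [[0.9 * c for c in atom] for atom in z]

def inpaint(x_known, fixed_mask, schedule, denoise_step, seed=0):
    """Masked sampling: fixed atoms follow the known structure at every
    noise level, while the remaining atoms come from the denoiser."""
    rng = random.Random(seed)
    z = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in x_known]
    for alpha_t, sigma_t in schedule:  # ordered from noisy to clean
        generated = denoise_step(z, alpha_t, sigma_t)
        known = noise_known(x_known, alpha_t, sigma_t, rng)
        z = [k if fixed else g
             for k, g, fixed in zip(known, generated, fixed_mask)]
    return z

# Illustrative schedule; the final step is noise-free (alpha=1, sigma=0).
schedule = [(0.2, 0.98), (0.6, 0.8), (0.9, 0.44), (1.0, 0.0)]

# Two scaffold atoms are fixed; the third entry is a placeholder whose
# coordinates are ignored because its mask value is False.
scaffold = [[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [0.0, 0.0, 0.0]]
fixed_mask = [True, True, False]

result = inpaint(scaffold, fixed_mask, schedule, toy_denoise_step)
print(result)
```

Because the last schedule entry adds no noise, the fixed atoms end exactly at their known coordinates, while the unmasked atom is whatever the denoiser produced given that context.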

Empirical Evaluation

DiffSBDD is evaluated on the CrossDocked and Binding MOAD datasets. According to the docking scores of the generated molecules, it outperforms existing approaches such as 3D-SBDD and Pocket2Mol. The results also indicate that the model generates diverse, chemically viable structures with high predicted binding affinities.

Beyond de novo design, DiffSBDD can optimize molecular properties through an evolutionary procedure without retraining, which makes it suited to lead optimization. The diffusion model drives the evolutionary algorithm's exploration of local chemical space, and the paper reports notable improvements on molecular design tasks.
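
A generic evolutionary loop of this kind might look like the sketch below. It is an assumption-laden illustration rather than the paper's algorithm: `perturb` is a hypothetical stand-in for partially noising a candidate and denoising it with the pre-trained model, `score` stands in for any property oracle (for example a docking score), and candidates are plain floats so the example stays self-contained.

```python
import random

def evolve(seed_candidate, perturb, score, generations=5, population=8,
           top_k=2, rng=None):
    """Generic evolutionary loop: propose variants, score them, keep the best.

    Parents compete with their children (elitism), so the best score in the
    history can never decrease from one generation to the next.
    """
    rng = rng or random.Random(0)
    parents = [seed_candidate]
    history = [score(seed_candidate)]
    for _ in range(generations):
        children = [perturb(rng.choice(parents), rng)
                    for _ in range(population)]
        pool = parents + children
        pool.sort(key=score, reverse=True)  # higher score is better
        parents = pool[:top_k]
        history.append(score(parents[0]))
    return parents[0], history

# Toy stand-ins (hypothetical): a candidate is one number, the "property"
# peaks at 3.0, and perturbation is a small random move.
def toy_score(x):
    return -(x - 3.0) ** 2

def toy_perturb(x, rng):
    return x + rng.gauss(0.0, 0.5)

best, history = evolve(0.0, toy_perturb, toy_score)
print(best, history)
```

In this framing the generative model only supplies local variations, and all task-specific knowledge lives in the scoring function, which is why no retraining is needed when the objective changes.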

Implications and Future Directions

The implications of this paper are twofold. Practically, the model could speed up ligand design and reduce reliance on costly, time-intensive experimental screening. Theoretically, it sets a precedent for applying geometric deep learning to 3D-conditional generation problems in molecular design.

Future research may extend the model to larger biomolecular structures and test how well it generalizes to less well-characterized protein sites. Improving the synthetic accessibility of generated molecules would also be beneficial, bringing the model closer to medicinal chemistry practice.

Overall, the paper illustrates DiffSBDD's potential to advance SBDD: a single SE(3)-equivariant diffusion model can generate novel ligands and incorporate additional design objectives by changing only the sampling strategy. As AI adoption grows in drug discovery, the approach offers a basis for future SBDD methods.