Structure-based Drug Design with Equivariant Diffusion Models (2210.13695v3)
Abstract: Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. Generative SBDD methods leverage structural data of drugs in complex with their protein targets to propose new drug candidates. These approaches typically place one atom at a time in an autoregressive fashion, using the binding pocket and the previously added ligand atoms as context in each step. Recently, a surge of diffusion generative models has entered this domain, holding promise to capture the statistical properties of natural ligands more faithfully. However, most existing methods focus exclusively on bottom-up de novo design of compounds or tackle other drug development challenges with task-specific models. The latter requires curation of suitable datasets, careful engineering of the models, and retraining from scratch for each task. Here we show how a single pre-trained diffusion model can be applied to a broader range of problems, such as off-the-shelf property optimization, explicit negative design, and partial molecular design with inpainting. We formulate SBDD as a 3D-conditional generation problem and present DiffSBDD, an SE(3)-equivariant diffusion model that generates novel ligands conditioned on protein pockets. Our in silico experiments demonstrate that DiffSBDD captures the statistics of the ground truth data effectively. Furthermore, we show how additional constraints can be used to improve the generated drug candidates according to a variety of computational metrics. These results support the assumption that diffusion models represent the complex distribution of structural data more accurately than previous methods, and that they can incorporate additional design objectives and constraints by changing nothing but the sampling strategy.
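The abstract describes DiffSBDD as SE(3)-equivariant: rotating and translating the input coordinates should rotate and translate the output in the same way. The following is a minimal sketch of that property, not the paper's architecture. The function `coordinate_update`, its weighting formula, and the scalar features `h` are hypothetical stand-ins chosen only so that the property can be checked numerically with NumPy.

```python
import numpy as np

def coordinate_update(x, h):
    """Toy equivariant update (hypothetical, not the DiffSBDD network).

    Each point moves along relative position vectors, weighted by a
    function of rotation- and translation-invariant quantities only.
    """
    diff = x[:, None, :] - x[None, :, :]      # (N, N, 3) vectors x_i - x_j
    dist2 = np.sum(diff ** 2, axis=-1)        # (N, N) squared distances (invariant)
    # Hypothetical invariant weight; a learned function would go here.
    w = np.tanh(h[:, None] * h[None, :]) / (1.0 + dist2)
    return x + np.sum(w[..., None] * diff, axis=1)

def random_rotation(rng):
    """Draw a random proper rotation matrix (determinant +1) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))               # resolve the sign ambiguity of QR
    if np.linalg.det(q) < 0:                  # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                   # five toy "atoms" in 3D
h = rng.normal(size=5)                        # one invariant scalar feature per atom
R = random_rotation(rng)
t = rng.normal(size=3)

# Equivariance: updating then transforming equals transforming then updating.
out_then_transform = coordinate_update(x, h) @ R.T + t
transform_then_out = coordinate_update(x @ R.T + t, h)
equivariance_error = np.max(np.abs(out_then_transform - transform_then_out))
```

Under these assumptions, `equivariance_error` should sit at the level of floating-point round-off, because the update depends on the coordinates only through relative vectors and pairwise distances.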
Authors:
- Arne Schneuing
- Yuanqi Du
- Charles Harris
- Arian Jamasb
- Ilia Igashov
- Weitao Du
- Tom Blundell
- Carla Gomes
- Max Welling
- Michael Bronstein
- Bruno Correia
- Kieran Didi
- Pietro Lio