TacoGFN: Target-conditioned GFlowNet for Structure-based Drug Design (2310.03223v6)
Abstract: Searching the vast chemical space for drug-like molecules that bind to a protein pocket is a challenging task in drug discovery. Recently, structure-based generative models have been introduced that promise greater efficiency by learning to generate molecules for any given protein structure. However, because they learn the distribution of a limited protein-ligand complex dataset, structure-based methods do not yet outperform optimization-based methods, which generate binding molecules for just one pocket. To overcome these data limitations while still leveraging learning across protein targets, we model the reward distribution conditioned on pocket structure instead of the training data distribution. We design TacoGFN, a novel GFlowNet-based approach for structure-based drug design that generates molecules conditioned on any protein pocket structure, with probabilities proportional to their affinity and property rewards. In the generative setting of the CrossDocked2020 benchmark, TacoGFN attains a state-of-the-art success rate of $56.0\%$ and a median Vina Dock score of $-8.44$ kcal/mol while reducing generation time by multiple orders of magnitude. Fine-tuning TacoGFN further improves the median Vina Dock score to $-10.93$ kcal/mol and the success rate to $88.8\%$, outperforming all optimization-based methods.
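The core idea in the abstract is that, for a pocket $c$, molecules $x$ should be sampled with probability proportional to a reward, $p(x \mid c) = R(x, c) / Z(c)$. The following is a minimal sketch of that target distribution on a hypothetical three-molecule space, small enough that $Z(c)$ can be computed by brute-force enumeration. In real chemical space enumeration is impossible, which is why a GFlowNet policy is trained to approximate this distribution instead. Everything here is an illustrative assumption rather than TacoGFN's code: the docking scores, the property values, the multiplicative combination of affinity and property terms, and the helper names `reward`, `target_distribution`, `sample`, and `trajectory_balance_loss`. The last helper is the standard trajectory balance residual from the GFlowNet literature; the abstract does not state which training objective TacoGFN uses.

```python
import math
import random
from collections import Counter

# Hypothetical toy data (not from the paper): docking scores in kcal/mol,
# where more negative means stronger predicted binding, for three candidate
# molecules in two different pockets.
DOCK_SCORES = {
    "pocket_A": {"mol_1": -9.0, "mol_2": -7.0, "mol_3": -5.0},
    "pocket_B": {"mol_1": -5.0, "mol_2": -8.0, "mol_3": -6.0},
}
# Hypothetical pocket-independent property scores in [0, 1] (e.g. a QED-like value).
PROPERTY = {"mol_1": 0.6, "mol_2": 0.9, "mol_3": 0.8}


def reward(mol: str, pocket: str) -> float:
    """Assumed combined reward: an affinity term multiplied by a property term."""
    affinity = max(0.0, -DOCK_SCORES[pocket][mol])  # flip sign so better binding -> larger reward
    return affinity * PROPERTY[mol]


def target_distribution(pocket: str) -> dict[str, float]:
    """p(x | pocket) = R(x, pocket) / Z(pocket), computed by enumerating all molecules."""
    rewards = {m: reward(m, pocket) for m in PROPERTY}
    z = sum(rewards.values())  # partition function Z(pocket)
    return {m: r / z for m, r in rewards.items()}


def sample(pocket: str, n: int, seed: int = 0) -> Counter:
    """Draw n molecules from the reward-proportional distribution for one pocket."""
    rng = random.Random(seed)
    dist = target_distribution(pocket)
    mols = list(dist)
    return Counter(rng.choices(mols, weights=[dist[m] for m in mols], k=n))


def trajectory_balance_loss(log_z: float, log_pf: list[float],
                            log_pb: list[float], log_reward: float) -> float:
    """Squared trajectory-balance residual for one trajectory.

    In a pocket-conditioned GFlowNet, log_z would itself be a learned
    function of the pocket; here it is passed in as a number.
    """
    return (log_z + sum(log_pf) - log_reward - sum(log_pb)) ** 2


if __name__ == "__main__":
    for pocket in DOCK_SCORES:
        dist = target_distribution(pocket)
        print(pocket, {m: round(p, 3) for m, p in dist.items()})
        print("  empirical:", dict(sample(pocket, 10_000)))

    # In a tree-shaped toy space each molecule has exactly one construction
    # trajectory, so log P_B = 0. A perfect sampler then has P_F(trajectory)
    # equal to R / Z, and the trajectory-balance residual vanishes.
    pocket = "pocket_A"
    log_z = math.log(sum(reward(m, pocket) for m in PROPERTY))
    for m, p in target_distribution(pocket).items():
        loss = trajectory_balance_loss(log_z, [math.log(p)], [0.0],
                                       math.log(reward(m, pocket)))
        print(m, f"TB loss = {loss:.1e}")
```

The same three molecules receive different probabilities for the two pockets: `mol_1` gets more mass for `pocket_A`, where its hypothetical docking score is best, than for `pocket_B`. This is the sense in which the sampler is conditioned on pocket structure rather than fitted to a fixed training-data distribution.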
- Tony Shen
- Seonghwan Seo
- Grayson Lee
- Mohit Pandey
- Jason R Smith
- Artem Cherkasov
- Woo Youn Kim
- Martin Ester