Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design (2309.13957v2)
Abstract: Generative molecular design has moved from proof-of-concept to real-world applicability, as marked by the surge in very recent papers reporting experimental validation. Key challenges in explainability and sample efficiency present opportunities to enhance generative design to directly optimize expensive high-fidelity oracles and provide actionable insights to domain experts. Here, we propose Beam Enumeration to exhaustively enumerate the most probable sub-sequences from language-based molecular generative models and show that molecular substructures can be extracted. When coupled with reinforcement learning, extracted substructures become meaningful, providing a source of explainability and improving sample efficiency through self-conditioned generation. Beam Enumeration is generally applicable to any language-based molecular generative model and notably further improves the performance of the recently reported Augmented Memory algorithm, which achieved the new state-of-the-art on the Practical Molecular Optimization benchmark for sample efficiency. Given a fixed oracle budget, the combined algorithm generates more high-reward molecules and generates them faster. Beam Enumeration shows that improvements to explainability and sample efficiency for molecular design can be made synergistic.
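The enumeration step described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of "exhaustively enumerate the most probable sub-sequences": keep the `top_k` most probable tokens at every step for `n_steps` steps, which yields `top_k ** n_steps` sub-sequences. The names `toy_next_token_probs`, `beam_enumerate`, `top_k`, and `n_steps` are hypothetical, and the toy probability table stands in for a trained language-based generative model. Substructure extraction and the reinforcement learning loop are not shown.

```python
from typing import Callable, Dict, List, Tuple


def toy_next_token_probs(prefix: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model's next-token distribution."""
    if prefix.endswith("c"):
        return {"c": 0.6, "1": 0.3, "N": 0.1}
    return {"c": 0.5, "C": 0.3, "O": 0.2}


def beam_enumerate(
    next_probs: Callable[[str], Dict[str, float]],
    top_k: int,
    n_steps: int,
    start: str = "",
) -> List[Tuple[str, float]]:
    """Expand every prefix with its top_k tokens for n_steps steps.

    Unlike standard beam search, no prefix is pruned between steps,
    so the result contains top_k ** n_steps sub-sequences.
    """
    beams: List[Tuple[str, float]] = [(start, 1.0)]
    for _ in range(n_steps):
        expanded: List[Tuple[str, float]] = []
        for prefix, prob in beams:
            # Rank candidate tokens by probability and keep the top_k.
            ranked = sorted(
                next_probs(prefix).items(), key=lambda kv: kv[1], reverse=True
            )
            for token, p in ranked[:top_k]:
                expanded.append((prefix + token, prob * p))
        beams = expanded
    # Return sub-sequences ordered from most to least probable.
    return sorted(beams, key=lambda b: b[1], reverse=True)


subsequences = beam_enumerate(toy_next_token_probs, top_k=2, n_steps=3)
for seq, prob in subsequences:
    print(f"{seq}\t{prob:.3f}")
```

The number of sub-sequences grows exponentially with `n_steps`, so in this sketch the two parameters trade coverage of the model's probability mass against memory and compute.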