FSscore: A Machine Learning-based Synthetic Feasibility Score Leveraging Human Expertise (2312.12737v2)
Abstract: Determining whether a molecule can be synthesized is crucial in chemistry and drug discovery, as it guides experimental prioritization and molecule ranking in de novo design tasks. Existing scoring approaches to assess synthetic feasibility struggle to extrapolate to new chemical spaces or fail to discriminate based on subtle differences such as chirality. This work addresses these limitations by introducing the Focused Synthesizability score (FSscore), which uses machine learning to rank structures based on their relative ease of synthesis. First, a baseline trained on an extensive set of reactant-product pairs is established, which is then refined with expert human feedback tailored to specific chemical spaces. This targeted fine-tuning improves performance on these chemical scopes, enabling more accurate differentiation between molecules that are hard and easy to synthesize. The FSscore showcases how a human-in-the-loop framework can be utilized to optimize the assessment of synthetic feasibility for various chemical applications.
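The two-stage idea in the abstract (learn a ranking from reactant-product pairs, then refine it with expert-labeled pairs) can be illustrated with a toy pairwise ranking model. The sketch below is not the FSscore implementation: the linear scorer, the logistic pairwise loss, the two hand-made features (ring count, stereocenter count), and all data values are assumptions chosen only to show how fine-tuning on a few expert pairs can add sensitivity to a property, here chirality, that the baseline pairs never vary.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, features):
    """Linear score; in this sketch a higher score means 'harder to synthesize'."""
    return sum(w * x for w, x in zip(weights, features))

def train_pairwise(pairs, weights, epochs=200, lr=0.1, seed=0):
    """Update weights so that score(harder) > score(easier) for each pair.

    pairs: list of (easier_features, harder_features).
    Per-pair loss: -log(sigmoid(score(harder) - score(easier))).
    """
    rng = random.Random(seed)
    w = list(weights)
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for easier, harder in pairs:
            diff = score(w, harder) - score(w, easier)
            # d(loss)/d(diff) = -(1 - sigmoid(diff))
            g = -(1.0 - sigmoid(diff))
            for i in range(len(w)):
                w[i] -= lr * g * (harder[i] - easier[i])
    return w

# Hypothetical features: [number of rings, number of stereocenters].
# Stage 1: (reactant, product) pairs, treating the product as the harder one.
# None of these toy pairs differ in stereocenters.
reaction_pairs = [([0, 0], [1, 0]), ([1, 0], [2, 0]), ([2, 0], [3, 0])]
baseline_w = train_pairwise(reaction_pairs, weights=[0.0, 0.0])

# Stage 2: a few expert-labeled pairs that differ only in stereocenters.
expert_pairs = [([2, 0], [2, 1]), ([1, 0], [1, 2])]
finetuned_w = train_pairwise(expert_pairs, weights=baseline_w, epochs=50)

flat, chiral = [2, 0], [2, 1]
print("baseline :", score(baseline_w, flat), score(baseline_w, chiral))
print("finetuned:", score(finetuned_w, flat), score(finetuned_w, chiral))
```

In this toy setting the baseline assigns identical scores to the two molecules that differ only in stereocenter count, while the fine-tuned weights rank the chiral one as harder. The ring-count weight learned in the first stage is left unchanged by the expert pairs, because those pairs do not vary that feature.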