Assessing the Extrapolation Capability of Template-Free Retrosynthesis Models (2403.03960v1)
Abstract: Despite the acknowledged capability of template-free models in exploring unseen reaction spaces compared to template-based models for retrosynthesis prediction, their ability to venture beyond established boundaries remains relatively uncharted. In this study, we empirically assess the extrapolation capability of state-of-the-art template-free models by meticulously assembling an extensive set of out-of-distribution (OOD) reactions. Our findings demonstrate that while template-free models exhibit potential in predicting precursors with novel synthesis rules, their top-10 exact-match accuracy in OOD reactions is strikingly modest (< 1%). Furthermore, despite the capability of generating novel reactions, our investigation highlights a recurring issue where more than half of the novel reactions predicted by template-free models are chemically implausible. Consequently, we advocate for the future development of template-free models that integrate considerations of chemical feasibility when navigating unexplored regions of reaction space.
- Machine intelligence for chemical reaction space. Wiley Interdisciplinary Reviews: Computational Molecular Science, 12(5):e1604, 2022.
- Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chemical Science, 2023.
- Computer-assisted retrosynthesis based on molecular similarity. ACS central science, 3(12):1237–1245, 2017.
- Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal, 23(25):5966–5971, 2017.
- Retrosynthesis prediction with conditional graph logic network. Advances in Neural Information Processing Systems, 32, 2019.
- Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au, 1(10):1612–1620, 2021.
- Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science, 3(10):1103–1113, 2017.
- State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications, 11(1):1–11, 2020.
- Gta: Graph truncated attention for retrosynthesis. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pages 531–539. 2021.
- Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology, 3(1):015022, 2022.
- Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In International Conference on Machine Learning, pages 22475–22490. PMLR, 2022.
- Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling, 62(15):3503–3513, 2022.
- Lowe, D. M. Extraction of chemical structures and reactions from the literature. Ph.D. thesis, University of Cambridge, 2012.
- A collection of robust organic synthesis reactions for in silico molecule design. Journal of chemical information and modeling, 51(12):3093–3098, 2011.
- Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling, 59(6):2529–2537, 2019.
- What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling, 56(12):2336–2346, 2016.
- Predicting organic reaction outcomes with weisfeiler-lehman network. Advances in neural information processing systems, 30, 2017.
- A generalized-template-based graph neural network for accurate organic reactivity prediction. Nature Machine Intelligence, 4(9):772–780, 2022.
- Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS central science, 5(9):1572–1583, 2019.
- Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chemical science, 11(12):3316–3325, 2020.
- Landrum, G. Rdkit documentation. Release, 1(1-79):4, 2013.
- Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances, 7(15):eabe4166, 2021.
- Attention is all you need. Advances in neural information processing systems, 30, 2017.
- Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science, 13(31):9023–9034, 2022.
- o𝑜oitalic_o-gnn: incorporating ring priors into molecular modeling. In The Eleventh International Conference on Learning Representations. 2022.